What is the CAP Theorem?

The CAP theorem describes a trade-off in distributed systems. It is built on three properties:

  • Consistency (-C-) 
  • Availability (-A-) 
  • Partition Tolerance (-P-)

According to the CAP theorem, a distributed system can guarantee only two of these three properties at the same time; it can't have all three. That means your distributed system can be CA, CP or AP. I think the best way to understand these properties is through examples. Suppose we have three servers that work together as a distributed system: Server 1, Server 2 and Server 3.

Consistency: If I write data to one server (for example Server 1) and then read the same value back from another server (let’s say Server 2), the distributed system is consistent.

Availability: If you can perform read and write operations on every one of these servers, the distributed system is also available.

Partition Tolerance: If the connection between your servers fails and you are still able to read or write data on any of these servers, the distributed system is partition tolerant.
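To make the three definitions a bit more concrete, here is a minimal sketch of the three-server example in Python. The Server and Cluster classes and their method names are made up for this post; they are not taken from any real database, and replication is simplified to "copy the value to every server the writer can still reach".

```python
class Server:
    """One node of the toy cluster; it holds its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Three servers that copy every write to all peers they can reach."""

    def __init__(self, names):
        self.servers = {name: Server(name) for name in names}
        self.broken_links = set()  # pairs of servers that cannot talk

    def cut(self, a, b):
        # Simulate a failed connection between two servers.
        self.broken_links.add(frozenset((a, b)))

    def connected(self, a, b):
        return a == b or frozenset((a, b)) not in self.broken_links

    def write(self, via, key, value):
        # The receiving server stores the value and replicates it
        # to every peer it can still reach.
        for name, server in self.servers.items():
            if self.connected(via, name):
                server.data[key] = value

    def read(self, via, key):
        return self.servers[via].data.get(key)


cluster = Cluster(["server1", "server2", "server3"])
cluster.write("server1", "user", "alice")

# Consistency: the value written via server1 is visible on server2.
print(cluster.read("server2", "user"))  # alice

# Availability: every server answers reads.
print(cluster.read("server3", "user"))  # alice

# Partition tolerance: server2 still answers after its link to server1 fails.
cluster.cut("server1", "server2")
print(cluster.read("server2", "user"))  # alice
```

As long as nothing is written after the cut, all three properties seem to hold. The trouble starts with the first write that happens during the partition, and that is what the CA, CP and AP choices are about.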

With that said, let’s look at CA, CP and AP.

CA (Consistency-Availability): If your distributed system supports CA, it must sacrifice partition tolerance. Suppose all three of your servers are running and you send data to Server 1. Then the connection between Server 1 and Server 2 fails before Server 1 can copy that data to Server 2. If you now try to read the data you just sent to Server 1 from Server 2, you can’t read it, right? If reads and writes on your servers must be consistent, you have to stop read and write operations on Server 2, whose connection has failed, until you fix the problem. When you do this, you sacrifice partition tolerance. CA is used in systems where data consistency is very important. Relational databases such as MySQL, Oracle and PostgreSQL are usually placed in this category.
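The scenario above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the servers dictionary, the in_service set and the read and write helpers are hypothetical names, and "stop operations on Server 2" is modeled by simply removing it from the set of servers that accept requests.

```python
# Each server keeps its own copy of the data.
servers = {"server1": {}, "server2": {}, "server3": {}}

# Assumed failure: the link between Server 1 and Server 2 is down.
broken_links = {frozenset(("server1", "server2"))}

# Servers that are currently allowed to answer requests.
in_service = {"server1", "server2", "server3"}


def reachable(a, b):
    return a == b or frozenset((a, b)) not in broken_links


def write(via, key, value):
    # Store the value on every server the receiving server can reach.
    for name in servers:
        if reachable(via, name):
            servers[name][key] = value


def read(via, key):
    if via not in in_service:
        raise ConnectionError(f"{via} is out of service until the link is repaired")
    return servers[via].get(key)


write("server1", "user", "alice")

# Server 2 never received the write, so it cannot return the new value.
stale = read("server2", "user")
print("server2 sees:", stale)  # server2 sees: None

# To keep answers consistent, stop serving requests from the cut-off server.
in_service.discard("server2")
try:
    read("server2", "user")
except ConnectionError as err:
    print("refused:", err)

# The servers that are still connected keep giving the same answer.
print("server3 sees:", read("server3", "user"))  # server3 sees: alice
```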

CP (Consistency-Partition Tolerance): If you don’t want your servers to stop running after the connection is lost, and you still want them to stay consistent, you must stop write operations until you fix the problem, so that every server keeps the same data. In other words, you sacrifice availability. MongoDB, Redis and Memcached are commonly listed as CP databases.
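Here is a rough sketch of that CP behavior, assuming the simplest possible rule: while any link is broken, every write is refused, and reads keep working. The CPCluster class and the WriteRejected exception are invented for this example. Real CP databases typically use more refined rules (for example, accepting writes only on the side that still holds a majority), which this sketch does not attempt to model.

```python
class WriteRejected(Exception):
    """Raised when the cluster refuses a write during a partition."""


class CPCluster:
    def __init__(self, names):
        self.copies = {name: {} for name in names}
        self.broken_links = set()

    def cut(self, a, b):
        # Simulate a failed connection between two servers.
        self.broken_links.add(frozenset((a, b)))

    def heal(self):
        # Simulate the connection being repaired.
        self.broken_links.clear()

    def write(self, via, key, value):
        # Refuse the write while any link is down, so no copy can fall behind.
        if self.broken_links:
            raise WriteRejected(f"{via} refused the write: cluster is partitioned")
        for copy in self.copies.values():
            copy[key] = value

    def read(self, via, key):
        return self.copies[via].get(key)


cluster = CPCluster(["server1", "server2", "server3"])
cluster.write("server1", "user", "alice")
cluster.cut("server1", "server2")

try:
    cluster.write("server1", "user", "bob")
    rejected = False
except WriteRejected as err:
    rejected = True
    print(err)

# Reads still work, and every server returns the same value.
values = {cluster.read(name, "user") for name in cluster.copies}
print(values)  # {'alice'}

# Once the connection is repaired, writes are accepted again.
cluster.heal()
cluster.write("server1", "user", "bob")
print(cluster.read("server2", "user"))  # bob
```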

AP (Availability-Partition Tolerance): If you want your servers to keep serving read and write operations even when the connection between them has failed, you must sacrifice data consistency, because servers that can’t reach each other can’t keep their data in sync. Does that sound like nonsense? Think about a social media feed with millions of posts flowing through it. If the current like count of each post doesn’t need to be exactly consistent (which is also the approach Facebook and Twitter take), you can use AP for that part of the system. Cassandra, CouchDB and Dynamo are examples of AP databases.
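The like-count example can be sketched as follows. Each server keeps accepting likes during the partition, so the totals drift apart, and they are reconciled once the servers can talk again. The reconciliation used here is a grow-only counter (one count per server, merged by taking the larger value), which is one common technique and my own choice for this sketch; the post does not claim that Facebook, Twitter or any of the databases above work exactly this way. The LikeCounter class is a hypothetical name.

```python
class LikeCounter:
    """Like counts for one post, tracked per server (a grow-only counter)."""

    def __init__(self, name, peers):
        self.name = name
        # How many likes each server has accepted, as far as this server knows.
        self.counts = {peer: 0 for peer in peers}

    def like(self):
        # Always accepted, even when the other servers are unreachable.
        self.counts[self.name] += 1

    def total(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Counts only grow, so the larger number per server is the newer one.
        for peer, count in other.counts.items():
            self.counts[peer] = max(self.counts[peer], count)


names = ["server1", "server2", "server3"]
servers = {name: LikeCounter(name, names) for name in names}

# During the partition, each server accepts likes on its own.
servers["server1"].like()
servers["server1"].like()
servers["server2"].like()
totals_during = [servers[name].total() for name in names]
print(totals_during)  # [2, 1, 0] -> the servers disagree

# After the connection is repaired, the servers exchange their counts.
for a in names:
    for b in names:
        servers[a].merge(servers[b])
totals_after = [servers[name].total() for name in names]
print(totals_after)  # [3, 3, 3] -> the servers agree again
```

Nobody was ever told "try again later", but for a while different users could see different like counts for the same post. That is the price of AP.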

Alright. That’s it for this post. I hope it was useful. See you soon.
