How does Dynamo handle consistency and availability? What are the issues that must be resolved? Dynamo favors availability over strong consistency: it targets applications that can operate with weaker consistency in exchange for high availability. Consistency is handled through a process called object versioning, and a quorum-like technique is used to maintain consistency among replicas during updates. Since Dynamo is a decentralized system with very little manual administration, storage nodes can be added and removed without requiring any manual partitioning or redistribution of data. The issue that must be resolved is that algorithms which maintain a strictly consistent data access interface are forced to trade off the availability of the data under certain failure scenarios, which is unacceptable for Amazon's always-on services.

Why and how does Dynamo modify regular consistent hashing? How does this affect replication? Basic consistent hashing assigns each node a random position on a ring; an object's key is hashed to a point on the same ring, and the object is stored at the first node found by walking clockwise from that point. This basic algorithm created challenges for Amazon because it leads to non-uniform data and load distribution. Dynamo therefore uses a variant of consistent hashing in which each physical node is mapped to multiple points on the ring, called "virtual nodes." A virtual node looks like a single node in the system, but each physical node is responsible for several positions on the ring. The benefits of virtual nodes are that (1) if a node fails or becomes unavailable, its load is evenly dispersed across the remaining available nodes, and (2) when a new node is added (or a node rejoins), it accepts a roughly equivalent amount of load from each of the other nodes, so it stays balanced with its neighbors. Replication builds directly on the ring: each key is stored at its coordinator node and replicated at the next N-1 distinct physical nodes found clockwise on the ring, skipping extra positions that belong to a physical node already in the list (a sketch of this scheme appears in code after these answers).

How does Dynamo handle shopping cart inconsistency? Dynamo treats each modification of the shopping cart as a new version of the cart object, and Amazon's position when dealing with shopping cart transactions is to preserve cart items: an "add to cart" must never be lost. Dynamo therefore allows multiple versions of the shopping cart object to exist at the same time. In the normal case, a new version subsumes the previous version of the cart. If failures occur together with concurrent updates, however, conflicting versions of the cart object can result. Dynamo itself cannot reconcile these multiple versions of the same object; it returns all conflicting versions to the client, and the shopping cart application merges them (for example, by taking the union of their items) to collapse the versions back into one (see the merge sketch after these answers).

Describe the BigTable data model. How does it handle persistence and availability? The BigTable data model is a sparse, distributed, persistent, multi-dimensional sorted map indexed by a row key, a column key, and a timestamp. Row keys are "arbitrary strings (currently up to 64KB in size, although 10-100 bytes is a typical size for most of our users)" [1], and every read or write under a single row key is atomic, which makes concurrent updates to the same row easy to reason about. Column keys are grouped into sets called column families, which form the basic unit of access control; all data stored in a column family is usually of the same type, and disk and memory accounting is also performed per column family. Lastly, timestamps version each cell: because BigTable can hold many versions of the same data, the timestamps let clients retrieve whichever version they need and let old versions be garbage-collected. A rough picture of this map-of-maps model is sketched in code after these answers. For persistence, tablet data is written to a commit log and stored in immutable SSTable files on GFS; for availability, the master reassigns the tablets of a failed tablet server to other tablet servers.

Why did Facebook develop Cassandra? What features of older systems did it combine? Cassandra was developed to meet the reliability and scalability needs of the Facebook service, in particular its Inbox Search feature. Facebook is used by millions of users each day and is served by thousands of servers worldwide, so the storage system must be scalable to support continued growth and operational requirements while remaining reliable and efficient. Cassandra combines the column-family data model popularized by Google's BigTable with Dynamo's fully distributed partitioning and replication techniques, and it supports the legacy Inbox Search system's two kinds of queries, term search and interactions.
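To make the virtual-node variant of consistent hashing concrete, here is a minimal Python sketch. It is not Dynamo's actual implementation; the node names, the number of virtual nodes per physical node, and the MD5-based ring hash are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing where each physical node owns many ring positions."""

    def __init__(self, nodes, vnodes_per_node=8):
        self.ring = []  # sorted list of (ring position, physical node)
        for node in nodes:
            for i in range(vnodes_per_node):
                pos = self._hash(f"{node}#vn{i}")
                bisect.insort(self.ring, (pos, node))

    @staticmethod
    def _hash(key):
        # Map an arbitrary string to a point on the ring.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def coordinator(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[idx][1]

    def preference_list(self, key, n=3):
        """First n distinct physical nodes clockwise from the key (replication)."""
        pos = self._hash(key)
        idx = bisect.bisect(self.ring, (pos,))
        nodes = []
        for step in range(len(self.ring)):
            _, node = self.ring[(idx + step) % len(self.ring)]
            if node not in nodes:  # skip extra virtual nodes of the same host
                nodes.append(node)
            if len(nodes) == n:
                break
        return nodes

ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.coordinator("cart:alice"), ring.preference_list("cart:alice"))
```

Removing a physical node drops only its virtual positions, so its keys scatter evenly over the remaining nodes, which is the load-balancing benefit described above.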
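The shopping-cart reconciliation can be sketched as a simple client-side merge. This is only an illustration of the idea (union of items across divergent versions), not Amazon's code; the cart representation and function name are assumptions.

```python
def merge_cart_versions(versions):
    """Client-side (semantic) reconciliation of divergent cart versions.

    Each version is a dict mapping item_id -> quantity. Conflicting versions
    are collapsed by taking the union of items and keeping the larger
    quantity, so an "add to cart" is never lost (a deleted item may
    resurface, which is the trade-off the Dynamo paper describes).
    """
    merged = {}
    for version in versions:
        for item, qty in version.items():
            merged[item] = max(merged.get(item, 0), qty)
    return merged

# Two versions that diverged during a failure: Dynamo returns both branches
# to the client on read, and the application merges them into one cart.
v1 = {"book-123": 1, "lamp-9": 2}
v2 = {"book-123": 1, "mug-7": 1}
print(merge_cart_versions([v1, v2]))  # {'book-123': 1, 'lamp-9': 2, 'mug-7': 1}
```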
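The BigTable map-of-maps model can be pictured with a small in-memory analogue: row key and "family:qualifier" column key select a cell, and each cell holds timestamped versions. This is not Google's API; the helper functions are invented, and the row and column names follow the Webtable example in the BigTable paper [1].

```python
import time
from collections import defaultdict

# (row key, "family:qualifier") -> {timestamp: value}
# Real BigTable keeps rows sorted lexicographically by row key; a plain
# dict is enough for this sketch.
table = defaultdict(dict)

def put(row, column, value, ts=None):
    table[(row, column)][ts if ts is not None else time.time_ns()] = value

def get_latest(row, column):
    versions = table[(row, column)]
    return versions[max(versions)] if versions else None

# Webtable-style example: the row key is the reversed URL, "contents:"
# stores the page, and "anchor:" is a column family whose qualifiers are
# the referring sites.
put("com.cnn.www", "contents:", "<html>...</html>")
put("com.cnn.www", "anchor:cnnsi.com", "CNN")
print(get_latest("com.cnn.www", "contents:"))
```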
Why did Netflix change its architecture? Describe the rationale for each change. Netflix's first-generation architecture consisted of a single load balancer, twenty Apache/Tomcat servers, three Oracle databases and one MySQL database, a home-grown cache server, and the Cinematch recommendation system. The application was written in Java and suffered from Java garbage-collection pauses that slowed page loads; the multi-threaded code also suffered from deadlocks, causing web page timeouts and transaction locking. In addition, the architecture did not support horizontal scaling and was not conducive to high-velocity development and deployment. For example, the code base moved from trunk to a single release branch, so if a bug occurred Netflix had to roll back all code and features until that single bug was fixed and then redeploy. With the move to the cloud architecture, Netflix reorganized how work is done by breaking responsibilities into small, manageable clusters, which resulted in 100 different applications. These clusters sit at different levels of Netflix's call stack. NES (Netflix Edge Services) are the services that browsers and streaming devices connect to over the web; they sit behind AWS elastic load balancers and can call clusters at lower levels. NMTS (Netflix Mid-Tier Services) can call services at the same or lower levels (other NMTS, NBES, and IAAS, but not NES) and are exposed only through Netflix's internal discovery service. NBES (Netflix Backend Services) are mainly reserved for third-party open-source services; they are leaves in the call tree and cannot call anything else, an example being the Cassandra database that stores the data supporting the applications. IAAS (AWS IaaS services) include AWS S3 for large data (output from the video encoders, application logs) that cannot be stored effectively in local Cassandra, and AWS SQS, Amazon's message queue, for sending events.

How does Netflix handle rapidly changing demand for services? Give a scenario. With the new architecture, Netflix can scale horizontally at every level, providing maximum availability. The new architecture also supports high-velocity development and deployment: because every cluster sits behind a load balancer, Netflix can push updates as needed with little coordination. For example, if Netflix needs to push a new IAAS service for a Facebook feature, it can do so at any point, day or night, with little notification or coordination.

What is the 'thundering herd'? How is it handled? A thundering herd occurs when one service (A) overloads another service (B) by sending too many requests. The overload causes requests to B to time out, and those timeouts in turn trigger B's service tier to add additional resources (servers) to accommodate the load (a toy sketch of this reaction appears below).
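As a rough illustration of the reaction described above (timeouts in the overloaded tier triggering additional servers), here is a toy autoscaling check in Python. The threshold, window size, and class and method names are invented for this sketch and are not Netflix's actual tooling.

```python
from collections import deque

class TimeoutScaler:
    """Toy reaction to a thundering herd: when too many recent requests to
    service B time out, grow B's server pool to absorb the load."""

    def __init__(self, pool_size, window=100, timeout_ratio=0.2):
        self.pool_size = pool_size
        self.window = deque(maxlen=window)  # 1 = timed out, 0 = succeeded
        self.timeout_ratio = timeout_ratio

    def record(self, timed_out):
        self.window.append(1 if timed_out else 0)
        full = len(self.window) == self.window.maxlen
        if full and sum(self.window) / len(self.window) > self.timeout_ratio:
            self.scale_up()

    def scale_up(self):
        self.pool_size += 1    # stand-in for launching another instance
        self.window.clear()    # start measuring against the new capacity
        print(f"scaling service B to {self.pool_size} servers")

scaler = TimeoutScaler(pool_size=4)
for _ in range(100):           # service A floods B; the calls time out
    scaler.record(timed_out=True)
```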