Question Set 4

How is a collection different than a relational table?
Relational databases define columns at the table level, whereas MongoDB defines fields at the document level. A collection can be considered a scaled-down version of a relational table: it groups documents without enforcing a fixed set of columns. A document can contain more information than a row, because it can hold nested fields and arrays.

How are embedded documents similar to joins? How different?
Embedded documents can be used to simulate the join structure of relational databases, for example by adding the ObjectId of another document. The difference is that this is only an artificial join, since MongoDB does not support or require joins.

Why is MongoDB good for a logging use case?
Applications today need to be able to provide multiple types of user data (logins, profiles, etc.). MongoDB gives developers a query language for delivering that information quickly. Also, with its Map/Reduce capability, MongoDB can provide user data in real time for quick decisions. Lastly, with horizontal scaling through auto-sharding, MongoDB can scale quickly and efficiently with a growing user base. Together these benefits help a logging use case bring new features to market quickly while providing a great user experience and a lower cost of ownership.

What issues related to adaptability were important to the Guardian decision?
The ability to scale and to deploy new features rapidly.

Describe the evolution of the Guardian architecture in stages with special attention to the model at each stage.

Early Period 1995 - During this period the Guardian site was very static and had a very manual process for updating content. The core was primarily Perl/CGI sitting on top of an Oracle database. It was very custom, experimental software.

Mid-Period 2000s - During this period the Guardian used Vignette and AOLserver, was written in Tcl, and used Apache and an Oracle database. This new architecture made it faster to develop application features than the previous design.
It primarily used a templating engine for content organization. Later on, these templates became harder and harder to use, given all the different languages used to develop them.

Modern Period 2005-2009 - Used J2EE with Spring and Hibernate, with the relational database abstracted behind an ORM.

Partial NoSQL 2009-2010 - Moved away from the RDBMS, decoupling applications from the database schema through a Core API model. The read API was delivered using Apache Solr hosted on the Amazon EC2 service. This model provided a loose and flexible scheme for scaling.

Full NoSQL (In Development) - Uses the new Core API layer to decouple applications from the database.

How is database query different for MongoDB vs. CouchDB vs. relational? Include discussion of update queries.
MongoDB and CouchDB are document databases: they store JSON documents and lack the traditional relational schema of tables and columns. MongoDB has a query language similar to relational SQL that allows you to extract parts of a JSON document. CouchDB pulls data from what are essentially stored procedures called views. A view is made up of a map function and, optionally, a reduce function. Relational queries run against a collection of tables of data items, all of which are formally described and organized according to the relational model or schema.

What are important schema upgrade/change issues?
An important issue for a schema upgrade is making sure the new version of an application can deal with the different versions of the documents in the database. Guardian’s solution is to add a version key to each document. This version key is updated each time an application modifies the document, which makes it possible to trace an issue back to the document or to the application.

What are important replication issues for MongoDB? Include sharding, consistency, flexibility, others, etc.
When replicating MongoDB you might experience sharding issues when one shard becomes hot or too large, causing traffic issues.
If sharding is implemented early on, MongoDB's auto-sharding feature will take over and auto-balance the shards. With older versions of MongoDB there is no guarantee of durability when running on a single node. Also, MongoDB does not confirm that a write succeeded; one must manually check the database to ensure the write reached the master node. In MongoDB versions 1.8 and later, a single-server durability mode is available to ensure data durability, but it comes at a significant performance cost.

Describe the new Guardian identity model with MongoDB. How does it fit into the current complete architecture and API? What is the future plan for the architecture?
The new Guardian model reduced the amount of code and resources needed for user/identity information. The new MongoDB model implements two collections in a list format. This simple model allows developers to access user data quickly and effectively. The new identity model will sit on top of the new Core API and is thus connected to both the MongoDB and Oracle databases. Future plans are to possibly bring other applications onto the API in order to totally decouple applications from the database.

Summarize and discuss problems and controversies for MongoDB.
From the articles there are roughly three main issues or controversies surrounding MongoDB. These deal with global write locks, sharding, and replication. The write lock controversy concerns a deficiency in which a write locks the whole mongod instance, preventing it from servicing other queries while it writes to the database. This becomes an issue when the server is under a heavy write load and cannot respond to concurrent read requests. Currently 10gen is evaluating alternatives to alleviate this issue. Another issue revolves around sharding: setting up shards is difficult, contrary to MongoDB’s documentation, which describes an auto-sharding feature. Also, under heavy load sharding does not work.
It is theorized that this is caused by chunks moving between shards so quickly that the network stops accepting additional packets. The last issue touches on replication, where the current master/slave configuration has potential gaps that cause slaves to have incorrect or missing data, or cause the replication process to simply stop with an error. The theory is that this potentially missed data results from a missing checksum: a checksum would confirm that an entire chunk of data was replicated if both sums match.
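
The checksum idea in the last answer can be illustrated with a small sketch. This is not how MongoDB replication works internally; it only shows the general technique of comparing a digest computed on the sending side with one computed on the receiving side. The function names, the sample documents, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def chunk_checksum(documents):
    """Return a SHA-256 hex digest over an ordered list of serialized documents.

    Hypothetical helper: `documents` is a list of bytes objects.
    """
    digest = hashlib.sha256()
    for doc in documents:
        # Length-prefix each document so [b"ab", b"c"] and [b"a", b"bc"] differ.
        digest.update(len(doc).to_bytes(8, "big"))
        digest.update(doc)
    return digest.hexdigest()


def chunk_replicated_intact(source_docs, replica_docs):
    """True only if the replica holds exactly the documents the source sent."""
    return chunk_checksum(source_docs) == chunk_checksum(replica_docs)


# Made-up sample data standing in for one chunk on the master.
source = [b'{"_id": 1, "user": "alice"}', b'{"_id": 2, "user": "bob"}']
complete_copy = list(source)   # a slave that received everything
partial_copy = source[:1]      # a slave that silently missed a document

print(chunk_replicated_intact(source, complete_copy))
print(chunk_replicated_intact(source, partial_copy))
```

If both sums match, the slave can be assumed to hold the whole chunk; a mismatch signals that the chunk should be sent again rather than trusted.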