Recently I was given the daunting task of developing an administration interface for a popular website. No problem I thought, and I began copying over a new copy of CakePHP to begin development. I had been notified that the original site used Mongo for it’s database, and after searching Google for approximately 2 seconds I found a Mongo plugin for Cake. A logical person would believe that this would be the end of the story and I developed the site like I would any other, and no problems were had because of the seamless interface.
If only it were that simple.
I had never used any database implementation other than MySQL. I never saw a use for anything else, as the efficiency was not necessary for the sites I was developing. Playing with Mongo a bit and seeing it seek records in microseconds from a database with millions of entries was an eye opener to other alternatives. I quickly ran into problems though.
I discovered very quickly that the setup this particular site used, and it’s many iterations of development over the years, left it in a patchy state such that I could not tell which data was needed for editing on the administration interface I was developing. Then when I finally figured out the necessary fields I found many of the records pulled did not contain them, throwing errors all over the interface. I was thinking to myself the entire time that it would be easier to hack together a C++ program to convert this database to MySQL than it would be to cobble together a solution for Mongo.
I didn’t wind up going that route for the simple administration interface I was building, but I took a gander at the Apache error log for the site while I was there, and saw errors galore. Mostly “Too many records to sort” or other such issues. While looking into solutions to fix these problems I was introduced to the Map/Reduce functions of Mongo. Cutting down data sets to precisely the information needed, and essentially caching them in their own database table would speed this site up immensely and eliminate most, if not all of the errors I was seeing. As well as give us a chance to standardize the data contained across all the records. Unfortunately this site being designed in the way it was, we determined it would be easier to rebuild it in the near future than potentially break it while trying to fix errors on an otherwise working site.
What I learned from all of this is that NoSQL is a beautiful implementation when done correctly. If you meticulously standardize all inserted data, and build in reducing and caching routines and cron jobs from the beginning, you will see powerful performance from your site for a long, long time. That is, if you believe your site will reach such a size, across many collections (tables) to warrant such optimization. Otherwise I would suggest sticking with MySQL. If you have to later you can convert MySQL to NoSQL fairly easily, not so much the other way around.