12 Feb
2010

Is NoSQL Finally Going Mainstream?


It’s been a while since I enjoyed my adventures with CouchDB. I sure wish I could find some extra time to pick this up again, but getting some sleep at night is nice too once in a while. I’ve noticed that OO databases and document/key-value stores have been getting more and more traction lately, and I must say that it’s about time.

Rob Conery hits the nail right on the head in his post on Reporting in NoSQL.

Put as gently as I can – relational systems are an answer to a problem that we faced 30 years ago. What you’re doing now is nothing other than compensating for a lack of imagination from the platform developers. Think about it – we code using Object Oriented approaches, we store those objects in a relational system.

What we should all learn in this industry is to stop assuming that a relational database is the default option for storing the data of every solution we build. This is what we have been doing for a long time and it’s pure madness, plain and simple.

RDBMS don’t fit for holding your application’s data, and they don’t fit for reporting. They’re a solution for a problem that doesn’t exist anymore. Time to kick them to the curb.

The most typical setup you see is a single relational database that is used both for storing the data of an application and for reporting on that data. The relational schema usually sits somewhere between normalized and denormalized tables, which means compromising on both needs. You can get away with this for small to medium-sized applications, but when you start working on mission-critical solutions with higher volumes, this compromise isn’t going to cut it anymore. This is why Greg Young, Udi Dahan and Mark Nijhof, amongst others, are advocating command query separation. For these kinds of solutions, you want the best option for handling commands, which could be an OO database or a document/key-value store (with or without event sourcing), and for reporting you’d want the best option available as well, such as an OLAP system. What I’m describing here is just the elevator pitch, so if you want to learn more, do check out the resources that these gentlemen have already put on their blogs.
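To make the separation a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not how Greg, Udi or Mark implement it: the `OrderPlaced` event, the `InMemoryBus`, and the `SalesReport` read model are hypothetical names, and a plain list and dictionary stand in for the command-side store and the reporting store.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical domain event; the name and fields are illustrative."""
    order_id: str
    customer: str
    amount: float


class InMemoryBus:
    """Toy synchronous bus: hands each event to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[OrderPlaced], None]] = []

    def subscribe(self, handler: Callable[[OrderPlaced], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: OrderPlaced) -> None:
        for handler in self._subscribers:
            handler(event)


class OrderCommandHandler:
    """Write side: validates the command, records the event, publishes it."""

    def __init__(self, bus: InMemoryBus) -> None:
        self._bus = bus
        # Stand-in for the command-side store (event log, document DB, ...).
        self.event_log: List[OrderPlaced] = []

    def place_order(self, order_id: str, customer: str, amount: float) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        event = OrderPlaced(order_id, customer, amount)
        self.event_log.append(event)
        self._bus.publish(event)


class SalesReport:
    """Read side: a denormalized total per customer, kept up to date by events."""

    def __init__(self) -> None:
        self.total_by_customer: DefaultDict[str, float] = defaultdict(float)

    def on_order_placed(self, event: OrderPlaced) -> None:
        self.total_by_customer[event.customer] += event.amount


bus = InMemoryBus()
report = SalesReport()
bus.subscribe(report.on_order_placed)

commands = OrderCommandHandler(bus)
commands.place_order("o-1", "alice", 40.0)
commands.place_order("o-2", "alice", 2.5)
commands.place_order("o-3", "bob", 10.0)

print(dict(report.total_by_customer))
```

The command side never queries the report, and the report never touches the write store. Each side can then be backed by whatever storage suits it best.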

I hope that one day we realize that a relational database was just a means for optimizing file storage, which is hardly a need anymore these days. We shouldn’t be struggling with how to solve the impedance mismatch between relational databases and OO programming in any kind of application. The one thing we should care about is how to provide solid and clean solutions to our businesses without having to worry about tables and those zealots with their holy database schemas. Just store the objects you want and worry about other things like the so-called “ilities” and being able to respond to business needs in a timely manner.
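As a rough illustration of what “just store the objects” can look like, here is a small Python sketch. It assumes a hypothetical `DocumentStore` class that keeps one JSON document per key in memory; it is a stand-in for a document database such as CouchDB, not its actual API, and the `Order` and `OrderLine` types are made up for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class OrderLine:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    lines: List[OrderLine] = field(default_factory=list)


class DocumentStore:
    """Hypothetical in-memory stand-in for a document DB: one JSON doc per key."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def save(self, order: Order) -> None:
        # The whole aggregate is serialized as one document; no tables or joins.
        self._docs[order.order_id] = json.dumps(asdict(order))

    def load(self, order_id: str) -> Order:
        raw = json.loads(self._docs[order_id])
        lines = [OrderLine(**line) for line in raw.pop("lines")]
        return Order(lines=lines, **raw)


store = DocumentStore()
order = Order("o-1", "alice", [OrderLine("sku-1", 2), OrderLine("sku-2", 1)])
store.save(order)
loaded = store.load("o-1")
```

The order and its lines go in and come out as a single unit, so there is no mapping layer between the object and a set of tables.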

23 thoughts on “Is NoSQL Finally Going Mainstream?”

  1. Hello Jan, the latest Herding Code touches on this topic as well. The shift that has to be made is to start questioning whether something is good/proper to use in the current scenario. The same can be applied to programming languages as well.

  2. How do you get from:

    “What we should all learn in this industry is to stop assuming that a relational database is the default option for storing the data of every solution we build.”

    which is true, to:

    “I hope that one day we realize that a relational database was just a means for optimizing file storage, which is hardly a need anymore these days”

    which is false?

    More importantly (since that was somewhat of a rhetorical question), *how* do you get the data from, say, a CouchDB instance to an OLAP system? It’s nice to say it is possible, but I don’t see that Greg, Udi or Mark have really talked in detail about that.

    Getting data from a relational database to an OLAP system is hard enough as it is.

    I hope that NoSQL doesn’t get turned into a religious movement. It should be another tool in the toolkit. Okay, a pretty big tool, and a cool one too, but still. Relational Databases will be around for decades at least.

  3. I hope the NoSQL movement won’t take the same route as Agile and forget all that we have learned in the past. Even though there are cases where relational databases cause a lot of friction, this is far from the majority of the cases. I am not sure Eric hits the mark when he says that in 99.9% of the cases an RDBMS is the best route, but he definitely has a point (http://www.eflorenzano.com/blog/post/my-thoughts-nosql/).

    Also I don’t think it is the size of the application that often mandates CQRS, it’s the complexity of the domain. Remember before you get drunk on the CQRS juice that most applications do not fit with this approach. It is often overkill.

    It is easy to get carried away by new great ideas, but we should not forget what we have learned in the past. In the end we should all do like Aristotle and try to find a middle ground.

  4. @jdn I heard and read plenty of stuff from Greg, Udi and Mark on how to get a data flow between a CouchDB instance and an OLAP system. It’s definitely not a batch process that is kicked off every 15 minutes :-).

    @BjartN I didn’t mean to say that CQRS depends on the size of the application. I’d rather like to think it depends on the lifetime of the application, and that it’s not appropriate for apps that don’t have a long lifespan.

  5. What I’ve heard and read is that you publish events which your reporting store subscribes to.

    What I’m interested in is seeing a detailed implementation of it using CouchDB and an OLAP system. Like code and stuff. I don’t remember seeing anything specific.

  6. And about batching:

    “Greg actually mentioned a really nice way of caching these SQL statements, he would batch them in a single batch and execute the batch if it gets older then x seconds, or (and this is the interesting part) whenever a read request came in. So when a read request comes in this SQL statement is appended to the batch and the whole thing is executed, ensuring that the read request will always have the latest data available to that part of the system.”

    From Mark’s CQRS a la Greg Young post. 😉

  7. @jdn
    When I talked about batch, I actually meant batch apps that run every 15 minutes or so for pumping data from one data store to another. This is not the same as using events. The latency with these kinds of batch apps is much higher.

  8. “What we should all learn in this industry is to stop assuming that a relational database is the default option for storing the data of every solution we build.”

    While people with common sense never consider an RDBMS in this way, it seems to me that you NoSQL guys are making the same error on the opposite side. Given that not every application in the next decades will be like Facebook, Joomla or SharePoint, I think that transactional systems will use RDBMS for quite some time… and please, don’t talk about OLAP and reporting… that is an adults’ playground…

  9. It’s the whole reporting / querying issue that gets me.

    Folks like Udi and Greg have posted ideas on CQRS and how to hook up Domain Events, Buses or something else to handle the work of moving data from one system to another, but all seem to require an inordinate amount of plumbing to get it to work. I am eagerly awaiting part two of Rob Conery’s NoSQL series (http://blog.wekeroad.com/2010/02/06/nosql-a-practical-approach-part-1) as I am hoping he can show how this can be handled.

  10. @Silvano You don’t have to work at Facebook, Amazon, etc. in order to just save an object into a persistent store without having to deal with the dreadful impedance mismatch. We are paid to provide solutions, not to deal with SQL, tables and DBA folks. Just persist the object and go deal with the real issues.

    @JoeYoung The whole point of NServiceBus is to have a bus with as little plumbing as possible. Looking forward to the next posts from Rob as well to see how he deals with things. Did you take a look at the code of Mark’s sample application? It uses a simple in-memory bus that you can use for small apps as well.

  11. @Jan, sorry but in the real world data usually survives the apps. You’re not paid to just persist an object somewhere… you need to create data models that will be used in the future by different applications, using different programming platforms… maybe they will be used in completely different ways from how they were collected… today, and for a long time to come, relational data models are the best way to achieve this goal. I know, it’s hard to understand for a “pure” developer, but life is a little bit more complicated than “persist an object…” :))

  12. “Mainstream” is a stretch, but when I hear people on the street complaining about how their SQL Database is a nightmare to deal with (and they’re not doing complex things with it), it’s certainly progress.

    Our startup, Drawn To Scale, is building a platform around the points you’ve made: we’re solving different problems today than we did 30 years ago when the RDBMS was designed. It turns out that when you build something from the ground up to handle what we *really* do with data, it becomes scalable, fast, and easy to use 🙂

  13. @Silvano If it’s just about the data, then why don’t we just hand out Excel? The data is associated with the app, not the other way around. Having multiple apps on the same database smells like a shared database:

    https://elegantcode.com/2009/03/28/about-a-shared-database/

    I never ever saw this work properly. And to quote Jeremy D. Miller on this

    “With very few exceptions, I’d say that at this point that if you’re writing ADO.Net code or SQL by hand, you’re stealing money from your employer.”

  14. *sigh*

    Yet another developer who doesn’t understand the difference between the Relational Model and almost-Relational RDBMSes.

    Coupled with the common trait in developers that they always get “props” for working out new tech, and never for fully understanding old tech, it’s no wonder that this newest of database strawmen will die slowly, like XML and OO databases before it.

    Let’s keep it simple: the Relational Model is nothing more than a logical way to understand and therefore access/validate a data model. It has nothing to do with files, disks, tables, key-value stores, etc. That’s the physical implementation thereof.

    There is no such thing as semi- or unstructured data. There is always a structure. The question is how detailed it needs to be. I could create a “relational” database with one “table” and two attributes: Key and Value. I shove everything in it and call it ArmchairDB. I think the NoSQL people, like almost all developers, conflate the physical layer with the logical one all too often. I also find it funny when some NoSQL databases offer some sort of “logical querying language” that eventually grows up into some knock-off of SQL.

    Lastly, am I the only guy who’s never struggled with the “impedance mismatch”? Or, said another way, doesn’t see it as such, but rather as the semantic mismatch that inevitably exists between any two systems built with a different vision? In fact, the Relational Model, if well implemented, does have an easy solution for this so-called mismatch. If one offered proper domain support, RDBMSes would be even easier to use.

  15. I don’t suppose you could man up enough to make your point without being personally insulting to a man who has done so much for the craft of software development and our community in general?

    Oh, we won’t remove your comment, because it should stand in posterity to your rudeness. You should also note at this point that whatever technical message you were trying to make is now lost in this little drama.

    Good job, Mr. Professional.

  16. OODB – You order a car and it arrives outside your house.

    RDB – You order a car, and it’s posted piece by piece (field) through your letter box where it has to be reassembled.

  17. @David:

    Insulting? My apologies. It was not intended to be personally insulting.

    But, if you want to take the aggressive approach that “relational database was just a means for optimizing file storage”, then you best be prepared for like responses *on that topic*, because I will stand by the assertion that that is completely and utterly wrong.

    RM, and its derivative language SQL, is very powerful. Good for all scenarios? Absolutely not, but still one of the most powerful tools in any dev’s arsenal. I guess if that is perceived as insulting to Jan to find some*thing* they said or believe wrong, then I am at a loss.

    I could just as easily say “NoSQL databases answer the difficulty developers have with designing a suitable structure to data by letting anarchy reign and allowing them to store anything in any unstructured way”, but that wouldn’t be true, would it?

    I meant no disrespect to Jan as a person. I’m sure that if we met for a beer, we’d all have a good time, although you probably now would refuse. 🙂 I shouldn’t have let my frustrations with the rampant NoSQL hype color my post.

    @Nick:

    Follow up to your analogy:

    RDB – After you assemble your car, which the dealer could have assembled for you before shipping, you then order a missing piston. You get the missing piston.

    OODB – You want to store just a piston in the warehouse, you can’t. You have to create a fake car around it. Once you store the piston, you order a piston, and you get a car, and instructions on how to trace down the part chart to get to the piston.

    I’m sure we can go on and on…

  18. @PT
    Did you ever take a serious look into a specific NoSQL data store (e.g. CouchDB)? There is no ‘data model’ that constrains the documents one stores. Map-reduce isn’t ‘knock-off of SQL’ either.

    I agree that you can use an RDBMS with a key/value table, and it has been done, but that somewhat defeats the point of buying Oracle, and SQL isn’t going to be of much help either.

    When I develop an app, I start with the domain model because that’s where the interesting functionality of the business domain lives. This is the place where I provide a model that corresponds with the feedback of the domain experts. I don’t care about a data model or database because it doesn’t interest the business (nor should it), so it doesn’t interest me either. When I’m done with a fully fledged model, the shape of the domain is almost always different from the tables laid out by the DB folks. I just need to be able to store these things and move on. NHibernate reduces some of the pain, but the mismatch is still there and a NoSQL data store reduces it even further.

    I want to conclude by saying that the largest apps/websites in the world all run from a NoSQL DB so why wouldn’t we learn, pick this up and move on?

    PS: I’m sure you meant no disrespect. Discussions like these can get passionate and intense. Been there 🙂

  19. Hmm. So many interesting topics flying at once in the conversation that it’s hard to keep it brief and focused! 🙂

    The root of the problem I have is getting devs to differentiate the logical from the physical in the mixed-up world of SQL, RDBMSes, NoSQL, CAP, BASE, and other such complex beasts. There’s so much conflating going on. And, the movement’s extremely poor choice of a name only compounds the issue.

    First point: You will not get an argument from me that NoSQL (ugh!) databases currently address the physical, “CAP Theorem” problem better than ol’skool RDBMSes. Many RDBMSes (esp. startup-friendly, popular DBs like MySQL) weren’t built to address distributed petabytes of simple structure data with limited consistency needs but overwhelming partitioning ones.

    That doesn’t mean that:
    a) all systems have those kinds of FBish needs. In fact, FB and its ilk lie at the far end of the bell curve, and no, the intertubes won’t change that. 🙂

    b) that RDBMSes can’t be made adaptable to these scenarios. (ex: Drizzle)

    c) that we need to push down those CAP needs on to simpler web applications (which I am not sure if that is what you mean by “the largest apps/websites in the world all run from a NoSQL DB so why wouldn’t we learn, pick this up and move on”).

    Second point: SQL, and the Relational Model that fathers it, is a different topic. I am not sure why one would think that the Relational Model is not layerable[sic] over distributed key-value stores, but I am open to being beat on the head with a good reason! 🙂 I am far from a guru in the world of developing NoSQL implementations.

    PS: Your comment on your practices of the domain as a starting point is interesting. I naturally gravitate to the exact opposite. I start from both the absolute “bottom” and absolute “top” of a typical web app, namely I use ORM (Object-Role Modeling) to capture the conceptual structure of the data (and many of its constraints), and use cases to capture the behaviors in the system, and from there I use the domain model as the Play-Doh layer to merge those two worlds.

  20. Many interesting topics indeed 🙂 The NoSQL data store I mentioned (CouchDB) is a document DB (not a distributed key/value store) that stores JSON objects. Not sure whether it is viable to layer a relational model on top of that. There are other NoSQL DBs like MongoDB (which is also a document DB) that I think provide some sort of schema, but I’m not entirely sure because I haven’t looked into that product (yet) and don’t know the specifics of that.

    Sure, it is possible to use something like MySQL for high-traffic sites. FriendFeed uses it this way, but not with traditional table schemas. You can read this article to learn more about the specifics (if you haven’t already) -> http://bret.appspot.com/entry/how-friendfeed-uses-mysql

    The approach I mentioned about starting with the domain is called Domain-Driven Design (DDD for short). If you want to learn more about this approach, I can highly recommend the book DDD – Tackling Complexity in the Heart of Software (http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215/ref=sr_1_1?ie=UTF8&s=books&qid=1266439284&sr=8-1)
