Suppose you design a system that is chock full of interfaces, specifically things like some version of IRepository, so that you can change out your backing store/database more easily.
A common criticism of this sort of design is that it is unrealistic to think you will actually change your main backing store/database in a production system. My own experience is that while it does happen (a current client project I am working on involves changing the backing database for a set of applications from SQL Server to Oracle, for instance), it doesn’t happen often, and you often end up changing your interfaces anyway.
However, the times are a-changing, and the number of situations where you might want to design with this in mind is increasing. With a central project that I am working on, two obvious ‘innovations’ (they aren’t exactly new of course) involve NoSQL and what I lovingly like to call “all that Cloud Computing shit.” My default implementation uses a traditional RDBMS (SQL Server mainly, I’m not really qualified to say much about Oracle…a little, but not much), but I can very easily see this project needing to remain largely the same while running on something like Azure.
Because of this, I’ve been very interested in RavenDB, Ayende Rahien’s long-in-the-making document database, recently released to RTM. It is a .NET-based solution written (with help) by someone who knows a bit about .NET and writing software (to say the least), and it appears to fit a need quite nicely. So, I was very interested in looking at using it ‘in anger.’
And then I saw this…
Has Ayende lost his mind?
I happened to come across a post by Rob Ashton and found this gem:
“Let’s say there are 100,000 books in the document store and we invoke the following code:
Book[] books = documentSession.Query<Book>().ToArray();
How many books do you expect for there to be in that collection?…..
Thankfully RavenDB safeguards against this kind of sloppy code and automatically limits the number of results returned back. Both the .NET client and server have this behaviour built into them and this means you’ll only get (at the moment), 128 objects coming back for the above query….
Currently the server itself will only let you page 1024 objects at one time, so you can’t be lazy and make a call to Take(100000) because it won’t let you.”
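To make the quoted behaviour concrete, here is a minimal sketch of what a capped query and an explicit paging loop might look like. It is an in-memory stand-in, not RavenDB's API: the CappedStore class and its Query and QueryAll methods are hypothetical names made up for illustration, and the only things taken from the quote are the two numbers (128 by default, 1024 at most per request).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a store that caps query results.
// This is NOT RavenDB's API; only the 128 and 1024 values come from the quoted post.
public static class CappedStore
{
    public const int DefaultPageSize = 128;   // used when the caller asks for no limit
    public const int MaxPageSize = 1024;      // hard ceiling on any single request

    // Returns at most one page of items, clamping the requested size to the cap.
    public static List<int> Query(IReadOnlyList<int> source, int skip = 0, int? take = null)
    {
        int pageSize = Math.Min(take ?? DefaultPageSize, MaxPageSize);
        return source.Skip(skip).Take(pageSize).ToList();
    }

    // Walks the whole source explicitly, one capped request at a time.
    public static List<int> QueryAll(IReadOnlyList<int> source)
    {
        var results = new List<int>();
        while (true)
        {
            List<int> page = Query(source, results.Count, MaxPageSize);
            results.AddRange(page);
            if (page.Count < MaxPageSize) break;   // a short page means the end was reached
        }
        return results;
    }
}
```

With 100,000 items in the source, Query with no arguments would return 128 of them, Query with take: 100000 would return 1024, and QueryAll would return all 100,000 by issuing 98 capped requests. In other words, under this kind of design the large result set is still reachable, but only if the caller asks for it page by page.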
I was immediately appalled. Why?
Why did I think Ayende had lost his mind?
My immediate reaction was that this was akin to breaking “Select *”. It’s a query engine. If I issue a query, I expect the query to do exactly what I asked it to do. Alt.NET is dead (long live Alt.NET) but there was this notion that doing Alt.NET type stuff is akin to running with scissors, and it seemed to me that Ayende was abandoning that, and not only abandoning it, but tying a user’s shoelaces together. If I want a query to return 100,000 results, then that is what I want (since a lot of work that I do is around ETL type stuff, I often query large result sets…and yes, an RDBMS is different from a document database). Don’t magically limit what I want to do with some magical number.
Think of a trade management system for stock trades. Why would I want to limit some processing to this magical number? Yes, it is not a generally good idea in such a situation to pull back 100,000 results, but let *me* decide that. Why cripple the query engine in its core? Isn’t it up to me to decide?
Rob suggested that I bring it to the Google group, which I did.
In my defense, I intended my question about this to be half-serious/half-humorous, but I don’t use emoticons and I wasn’t really paying a lot of attention, so it didn’t quite come across that way. You can check the discussion here.
Paraphrasing roughly, Ayende’s response was basically, “Okay, dude, I did it the way that I did it, if you want to change how it might work, submit a patch already, whiny ‘please stop and smell the flowers’ complaining guy.”
But a patch, in my mind at the time, was treating the symptom, not the disease, the disease being that Ayende was crippling basic functionality of a query engine. RavenDB (in my mind) was supposed to be an Enterprise-level product (whatever that means) and it seemed to be designed to prevent bad developers from doing bad things, causing friction for the rest of us (who perhaps mistakenly think of ourselves as not being bad developers).
And a patch seemed sub-optimal for other reasons. The patch would require explicitly implementing a setting. What if a new version came out and someone forgot that they needed to explicitly re-set the setting? What if, at 3AM when that joyful production issue came in as they often do, no one knew or remembered that this hard-coded ‘cripple’ value was in the core code base? Sure, you can create an integration test for this, but do you really know the test will be run?
Digression….I hate git. Why? For many reasons. The list of things I want to learn (and yes, I actually maintain a list) is in the dozens. I’m old and slow and command line tools are dangerous. And TortoiseGit doesn’t work (randomly, try to clone using it, it hangs….why? Who knows). And I spent a bit of time getting up to speed on Subversion, only to find all of the kool kidz were dumping Subversion to go to git. And then I read some kool kid who posted about why git sucked and we should all use Mercurial. So what do I do? Spend my very limited resources learning git, only to find out that that one kool kid was right, and I’m going to have to change again? But I digress.
So, I took the conversation off-line to ask my “Git Clueless” questions. I ended up having to download the source from GitHub, and then email Ayende the patch. Along the way, we talked about the design philosophy, and he kept coming up with similar cases that seemed totally irrelevant to me (TFS will limit query results….so what, TFS is an application on top of the query engine…Azure will throw an error if you try to return too many results and it uses too much memory…so what, SQL Server has a setting for that as well, that happens if a developer does bad things, it still isn’t crippling the query engine…etc. etc. etc.).
We agreed to talk about it on Skype. Figuring out the different time zones took a little bit of time (I actually spent a minute trying to figure out if GMT was zero-based…LOL).
Finally, I took a minute to think about a basic question….Ayende is a smart dude, what led him to do this stuff?
I’ll paraphrase all of this, but since it was a nice, friendly conversation, I don’t think I’m mis-characterizing anything. As always, Ayende is one of the most approachable people to talk with.
Keep in mind that RavenDB is a product. Also keep in mind that Ayende has a ton of experience through his work with NHibernate and his profiler tools, and dealing with client experiences with those things.
His position is that “Sure, bad developers do bad things. But, so do good developers.” His experience has been that multiple ‘problems’ have turned out to be simple problems with developers not limiting queries, and that by putting in these hard-coded limits, it prevents those things from occurring. And since he accepts patches to allow you to explicitly override these hard-coded limits, an end-user has the ability to take control.
So, has Ayende really lost his mind?
From a purist, idealistic point of view, I still cringe at this hard-coded ‘crippling’ of a query engine. From a practical standpoint, you have to consider (among other things) some fundamental truths about software development. No matter how true and good your development practices are, you will suffer production issues (and these are really all that matter, in the end…in production, does your software do what it is supposed to do?). Given that fact, in this situation, you could have ‘mirror’ issues:
- RavenDB doesn’t restrict queries and developers don’t properly analyze their queries, and so everything works fine in testing (where the doc db equivalent of “select *” returns a small enough set of results to be workable), but then 3 months into production, that same select chokes because of memory issues.
- RavenDB does restrict queries and production issues occur because this restriction is magical, forgotten, or whatever.
From his experience, Ayende chose to limit query results. As an idealistic end user, I don’t like this, but I do understand better why he did it the way that he did.
I did point out to him that there is nothing in the official documentation that specifies where these limits are applied, which he recognized, so I hope that gets updated at some point.
And in the end, he did readily accept a patch that allows me to use RavenDB in the manner I am most familiar with.
Would I recommend RavenDB?
Whether RavenDB actually fits any particular situation is up to any particular person to decide, but after having talked with Ayende, I still plan on giving RavenDB a serious run for its money as the NoSQL variant of a major project I’m working on.
The licensing is still….a work in progress. At one point, an Enterprise license was priced at $8000. It is now something like an OEM license for $999 a year, plus a goat (okay, I made that part up).
If you are looking at a NoSQL option in the .NET space, take a look and decide for yourself. It looks pretty good to me.