Back to the fun stuff. Series link here.
As a reminder of what I’m talking about, here’s the picture from Mark:
What I’m going to be talking about is section #1 from Mark’s picture, and in particular, I’m going to go over a number of concepts, including:
- The Reporting Store
- Eventual Consistency
- You don’t need your Domain
- Once you have one, why not more?
“Reporting” doesn’t just mean Reporting
One of the first things that I found difficult when learning about CQRS was the use of the term “Reporting.” Because I come from a SQL background, when I hear the term “Reporting” in IT contexts, I think about reports, e.g. last month’s sales report. Because of how the term is normally used, I’m not sure if there is a different word that should be used here. However, especially with all of the stuff I just wrote about traditional reporting, there are a couple of concepts that are starting to make more sense to me, since it turns out that some of the concepts of CQRS are simply things that we’ve been doing already, but refined.
The first thing to keep in mind is this:
- The Reporting Store (as separated from the Event Store, which is up in section #3 of the picture) is a logical concept, and as such does not have to be physically separated. Having said that, however, it probably will be.
Most Microsoft shops are already familiar with the idea of a Reporting Store, as they probably already have one in one form or another, be it a replicated version of their main database, or an OLAP store using SSAS (or perhaps some other tool). In a traditional shop, this is how traditional reporting tends to be done. You generate/run your reports off of the Reporting Store, which means you don’t tax your main database to do so.
There is nothing particularly CQRS-y about this, but once you accept that you have a separate Reporting Store, with enough ingenuity sparked by genuine need, you think about different ways of using it. Back in the dot.com heyday, a common problem involved how to generate/cache the storefront, so that you didn’t have to hit the database on every page. This is, obviously, still a common task, but there are a lot more ways that you can implement this now than there were then.
We chose to generate our site, and generate it off of the replica database. Basically, we would create the HTML once on special generation servers for every page, and then use MSMQ to push them out to the web farm (there were products that implemented this and caching, but some of them ran six figures IIRC). There’s nothing magic or special about this, of course, but what I’m hoping to convey is that the notion of a Reporting Store within CQRS isn’t magical or special either. In fact, if I had been able to tie my previous experience with replication with the idea of what a Reporting Store was, I think it would have been easier to learn.
Does this mean that using SQL Server Replication is the same thing as implementing CQRS? Of course not. For one thing, it doesn’t really matter to set up a litmus test of what counts as ‘really’ implementing CQRS, but if there were one, there would be differences. As I’ll try to explain in a bit, the theory behind CQRS provides a general benefit, a theoretical construct, that is of value in itself and goes beyond any particular technical implementation like Replication.
Let me emphasize:
- I am *not* saying that SQL Server Replication is the same thing as CQRS. It is a technology that *could* be used as part of an implementation of CQRS, but it is a separate thing.
- Some of the *concepts* of CQRS are similar to concepts we have been using for quite a while, such as pulling data from a Reporting Store that is separate from the main database/Event Store.
When is it okay to use something like Replication? Finding an answer to that can provide us with the beneficial theoretical construct I just mentioned, and answering that question depends on understanding and applying the notion of Eventual Consistency.
It will get there eventually
If you go to a business user who works with, e.g., the rolling last 30 day sales report, and ask them if it is okay to use stale data, they probably won’t give you an affirmative answer. But if you ask them essentially the same question in a different way, they probably will.
Suppose they have a morning status meeting with the head of the Marketing Department to go over the rolling last 30 day sales report, and, as always happens, they print this out in multiple copies (the myth of the paperless office is why I don’t believe in any ‘revolutionary’ movements like NoSQL, but I digress) and take it to the meeting. Suppose replication failed at 10 PM EST the previous night, and so whatever sales might have occurred in that small timeframe are therefore missed. Does this invalidate the report?
The answer you will get is something along the lines of “Not really.” In most cases, you don’t have enough sales in that small timeframe overnight so it doesn’t matter that much, but “let me know when replication has caught up so that I can regenerate the report, just to be sure.”
That is the gist of Eventual Consistency. It is acceptable for there to be a gap between the ‘real time’ data, and the data that is viewed in some other context. Once you find out that it is acceptable for there to be a gap, then the next step is to find out how big of a gap is acceptable.
Suppose the business user is looking at the current day sales report. If he is looking at it at 2 PM EST, and replication has been down for 4 hours, that might then be unacceptable. ‘Eventual’ doesn’t mean that next week is acceptable. But suppose the business user prints out his report for his daily afternoon status meeting. Between the time he prints out his report and when it is viewed by the head of Marketing, there may have been additional sales. That is acceptable. It is accepted that from the time the report is printed and when it is looked at 15 minutes later, it might be slightly outdated.
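To make the idea concrete, here is a minimal sketch in Python (the report names, lag values, and thresholds are all hypothetical): the point is that the acceptable gap is a per-report business decision, not a technical constant.

```python
from datetime import timedelta

# Hypothetical per-report staleness budgets: how big a gap is acceptable
# depends on the report and its business context, not on the technology.
ACCEPTABLE_LAG = {
    "rolling_30_day_sales": timedelta(hours=12),   # an overnight gap is fine
    "current_day_sales": timedelta(minutes=30),    # 4 hours down is not
}

def is_fresh_enough(report_name: str, replication_lag: timedelta) -> bool:
    """Return True if the reporting store is consistent enough for this report."""
    return replication_lag <= ACCEPTABLE_LAG[report_name]

# The 30-day report tolerates replication being down for 2 hours overnight...
assert is_fresh_enough("rolling_30_day_sales", timedelta(hours=2))
# ...but the current-day report at 2 PM with a 4-hour lag does not.
assert not is_fresh_enough("current_day_sales", timedelta(hours=4))
```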
Once you have established that it is okay for there to be a gap between the actual “this is the value at this exact moment” data and what is viewed by the end ‘user’ (the ‘user’ could actually be another system), then you can start to think of ways of using your Reporting Store for other things.
Whether these ways are valid will depend on the context. What does your business do, who are your end users, what data will they commonly be seeing, and how will they be acting on it?
In my mind, CQRS offers a theoretical construct to help us here:
- Anything that doesn’t involve a command is a prime candidate for acceptable Eventual Consistency. Anything that involves a command may be a candidate, if the result of the command doesn’t need to be immediately evident.
Now, this should be considered only as a starting point, and it certainly doesn’t answer the contextual questions that it needs to answer, but it can help.
For instance, contrast a desired difference between an order review page and a product list page. When a customer presses submit on the order review page, it is probably the case that you don’t want to immediately show an order confirmation page without knowing the order went through (though you *might* want to). On the other hand, when a customer goes to a product list page, it probably doesn’t matter if the page is a few minutes old (though it *might* matter).
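As a sketch of that contrast, assuming hypothetical `write_store` and `reporting_store` objects: the command waits for the write side to confirm an outcome before the confirmation page is shown, while the query reads from the possibly stale reporting side.

```python
# A sketch of the heuristic, with hypothetical stores: commands go to the
# write side and wait for an outcome; queries read the (possibly stale)
# reporting store.
class Storefront:
    def __init__(self, write_store, reporting_store):
        self.write_store = write_store
        self.reporting_store = reporting_store

    def submit_order(self, order):
        # A command: the customer probably expects to know the order went
        # through, so we wait for the write side to confirm before showing
        # a confirmation page.
        return self.write_store.save_order(order)

    def product_list(self, category):
        # A query: a few minutes of staleness usually doesn't matter here.
        return self.reporting_store.products_in(category)
```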
The points I want to emphasize here include:
- The particular examples I’ve given don’t really matter; what matters is examining the context.
- You don’t absolutely *have* to implement Eventual Consistency to consider having a separate Reporting Store to be beneficial.
This second point will become more evident when talking about section #4 of Mark’s picture (though I will touch on some of it below), but a brief note is important here. In a thread on the DDD mailing list, Greg has emphasized that you should start off without Eventual Consistency, and then work your way towards it as the need arises. This is common sense, and could be considered a simple application of YAGNI here (though YAGNI is unfortunately too often just used as an excuse). Once you appreciate the concept of Eventual Consistency, it’s an easy temptation to think of all the places where you could possibly implement it without a clear understanding of the drawbacks (and there are always drawbacks).
When you query your Reporting Store, you can ignore your domain
More specifically, if you need to query your reporting store, you don’t need to go through your domain model, and as a matter of fact, it would probably be a bad idea to do so. To paraphrase a comment from Udi, why should data come across 5 layers through 3 model transformations when all you need to do is populate a screen?
Typically, our end user screens will contain information from multiple entities. A typical pattern is to find the parent entity you need, load all of the relevant child entities, and then pass that entity into a mapper, which then produces a DTO with a flattened representation, which is then passed back to your screen and bound to it somehow.
Skip it. Query your reporting store and get a DTO with that flattened representation immediately.
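Here is a minimal sketch of what that thin query might look like, using SQLite purely for illustration (the `orders`/`customers` schema is made up): one SQL statement against the reporting store produces the flattened DTO the screen needs, with no domain entities and no mapper in between.

```python
import sqlite3

# A sketch of the 'thin data layer': one query against the reporting store
# returns the already-flat DTO the screen will bind to. Table and column
# names here are illustrative, not from any particular schema.
def order_summary_dto(conn: sqlite3.Connection, order_id: int) -> dict:
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        """SELECT o.id, c.name AS customer_name, o.total, o.placed_at
           FROM orders o JOIN customers c ON c.id = o.customer_id
           WHERE o.id = ?""",
        (order_id,),
    ).fetchone()
    return dict(row)  # already flat: bind it straight to the screen
```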
Does this mean you should start rooting around in your code, eliminating any reference you have to AutoMapper? Of course not. But once you start to think about how you can skip going through your Domain Model for queries, some other options open up:
- Put a stored procedure on top of your Reporting Store to return a ViewModel per query.
- Transform the data from your main database/Event Store to your Reporting Store so that you have a table per ViewModel that you can do a simple select from.
- Query off of an OLAP store to do the same.
And so on and so forth. The possibilities aren’t endless, and none of them should be pursued without thought, but it does open up a different avenue for you.
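As a sketch of the second option above, the table-per-ViewModel approach, again using SQLite with made-up names: the transform step maintains a denormalized `product_list_view` table shaped exactly like the screen, so the query side is a bare select.

```python
import sqlite3

# A sketch of 'table per ViewModel': the transform step keeps a denormalized
# table shaped like the product list page, so the query does no joins and
# no mapping. All names are illustrative.
def project_product(conn, product_id, name, price, category):
    # Run whenever the main database/Event Store changes: write the
    # already-flattened row the product list page will display.
    conn.execute(
        "INSERT OR REPLACE INTO product_list_view VALUES (?, ?, ?, ?)",
        (product_id, name, price, category),
    )

def product_list(conn, category):
    # The query side: one table, one simple select.
    return conn.execute(
        "SELECT id, name, price FROM product_list_view WHERE category = ?",
        (category,),
    ).fetchall()
```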
Can you have more than one Reporting Store?
Once you start to think about how to use a Reporting Store with an eye towards Eventual Consistency, even more possibilities open up.
To go back to the dot.com example I gave previously, we used MSMQ to push individual page updates across to our entire web farm. It was, given the day and our abilities, a bit crude. At times, a particular server might process individual pages more slowly than others. From an operational perspective, it worked well enough that we lived with it. A monitoring server could notice that a particular web server was slow, and pull it out of active duty. But for the most part, on most days, almost any updated page would hit each server at about the same time.
To think of a possible CQRS implementation of the same idea, why not have a Reporting Store on each web server that subscribes to events being published out of your domain? Going back to the simplistic product list page example I mentioned previously, imagine having a SQL Server Express instance on each web server that could process those events. If it is acceptable in the context of your environment to have Eventual Consistency here, and if your environment is robust enough to process these events evenly (and if it was workable in 1998, it surely can be today with more advanced service bus technology), then this opens up an avenue for immediate horizontal scalability of your ‘query-facing’ infrastructure. As your traffic increases, add another web server with its own Reporting Store. If you have a limited number of processes that utilize Eventual Consistency (think back to Greg’s emphasis on starting slow here), then you have a limited number of events subscribed to by a larger and larger number of machines.
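A toy sketch of the idea, with an in-memory queue standing in for the service bus and a dictionary standing in for each server’s SQL Server Express instance (event names are invented): every server applies the same event stream, so every local Reporting Store eventually converges on the same product list.

```python
import queue

# Each web server hosts its own local Reporting Store and applies the
# domain events it subscribes to. The event shapes here are hypothetical.
class LocalReportingStore:
    def __init__(self):
        self.products = {}

    def apply(self, event):
        # Every server applies the same events, so each local store
        # eventually converges on the same view of the product list.
        if event["type"] == "ProductAdded":
            self.products[event["id"]] = event["name"]
        elif event["type"] == "ProductRemoved":
            self.products.pop(event["id"], None)

def drain(bus: queue.Queue, stores):
    # Deliver each published event to every subscribed web server's store.
    while not bus.empty():
        event = bus.get()
        for store in stores:
            store.apply(event)
```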
If you think about it, this is a different way of achieving the caching that you might already be doing today, but from a central architectural perspective. Once again, I don’t think anyone who has either used or considered CQRS is suggesting you start to rip and replace memcached or Velocity here. But you might think about ways to fit memcached or Velocity into CQRS, because CQRS offers a general, scalable set of architectural patterns.
When and why might you not want to do any of this?
I personally find the notion of Eventual Consistency, and that of a separate Query layer that skips your Domain model, to be compelling. It ties together concepts that I have already been familiar with into a general architectural model. Having said that, those concepts that I was already familiar with had drawbacks, and CQRS doesn’t magically solve them.
From previous posts that I’ve made, a familiar reader will know that I have pretty extensive experience with Operations, and all that might entail. In particular, I believe pretty strongly in what I’m going to vaguely call here ‘planning for expected catastrophe.’
Starting with SQL Server Replication as a base technology, it sometimes fails. Sometimes it fails easily (an agent stops processing transactions, which merely requires a ‘right-click restart’, and might only take a few minutes to fix if your monitoring is good), and sometimes it fails hard (the entire Replica has become invalid, and must be recreated from scratch, which can take hours to accomplish).
Even though the technology of today is light-years ahead of what we had even 10 years ago, planning for ‘failing hard’ is still something that I think has to be central to planning software. If your Reporting Store suddenly is unavailable, what can you do? What we did with our relatively crude system was build in a switch (more or less) that let us immediately go back to processing off of our main database. We would still generate the site if we could, but even there, we had an emergency ‘oh, good Lord’ switch that would allow our site to skip generation altogether, and hope we had enough hardware to weather the load until we could fix the Reporting Store. Obviously, if both the Reporting Store and the main database went down and our off-site log shipping failed… well, at that point we might be polishing up our resumes anyway. Some catastrophes can’t be recovered from.
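That switch can be as simple as a flag in a query router. Here is a hypothetical sketch (the store objects are stand-ins for real connections):

```python
# A sketch of the 'oh, good Lord' switch: queries normally hit the
# reporting store, but a single flag routes them back to the main
# database when replication fails hard. Store objects are hypothetical.
class QueryRouter:
    def __init__(self, reporting_store, main_database):
        self.reporting_store = reporting_store
        self.main_database = main_database
        self.reporting_store_healthy = True

    def store_for_queries(self):
        # Flip reporting_store_healthy off during a catastrophe and every
        # query transparently falls back to the main database.
        if self.reporting_store_healthy:
            return self.reporting_store
        return self.main_database
```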
Another more basic reason why you might not want to do any of this is because it does require a certain amount of sophistication and a probably larger amount of faith that it will work. I don’t think a lot of advanced developers will be turned off here, and a good case study of how this works out in actual practice is in the story of MySpace. The architecture there was built under certain assumptions, and then once the limit was hit, the architecture was rebuilt. Something like CQRS, in my opinion, gives you a built-in scalability potential, but it isn’t a panacea.
Even if you choose to embrace Eventual Consistency and building a Query layer, there is another thing to keep in mind. Look at the picture again:
Pay attention to that little line from the thin data layer in section #1 that points back to the services box in section #4. When push comes to shove, sometimes it is okay to default back to calling into your Domain. If you took Greg’s cautionary message to heart, you could start off by building a Query layer that does almost the opposite of what I’ve been describing. All queries ignore the Reporting Store unless and until the Reporting Store ‘proves itself’ within your context, and then you start pointing them there accordingly. Given my experience, you should probably never need to go to this extreme, but it is there for you if you need it.
Why you should consider doing at least some of this anyway
Suppose you have an application that you hope will need to scale, but you don’t know that you need it today. What we did in the past, and what e.g. MySpace did, was build to the scalability you knew you needed today, and then when you hit that limit, you punted. Though I don’t think I’ve done as good a job as I could have, by any stretch of the imagination, I think that CQRS offers an architecture that lets you build to match the scalability you have today, and then easily expand it. Your query layer can hit a Reporting Store that, as implemented, simply is your main database/Event Store. You can code your code and architect your architecture as if these stores were physically separate, since you only need to worry about the separation at the logical level.
At a fundamental level, building a CQRS-style query layer allows you to logically segment your code between queries and commands. Which leads to the next topic, the command layer, the topic of the next post in this series.