As a reminder of what I’m talking about, here’s the picture from Mark:
What I’m going to talk about is section #4 of Mark’s picture. In particular, I’m going to go over the following concepts:
- Eventual Consistency
- Event Handler per view model
- Persisting internal events and publishing external events inside a transaction
- Publishing architecture
In a previous post, I talked in detail about what Eventual Consistency is all about. I will briefly recap here.
At the heart of cqrs in this architecture is the idea that you issue commands that return void. Each command is sent to a command handler, which calls into the relevant aggregate root to fulfill it, and the aggregate root emits internal events as a result. These events are persisted in the event store and also published for external consumption; one of those consumers updates the read-only store used by queries. Subsequent queries can then see these updates.
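To make the write side of that flow concrete, here is a minimal in-process sketch. The `PlaceOrder` command, `OrderPlaced` event, `Order` aggregate root and `PlaceOrderHandler` are hypothetical names chosen for illustration, the event store is a plain list, and, for brevity, "publishing" calls the read-store updater directly instead of going through a message queue.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical command and event types, for illustration only.
@dataclass(frozen=True)
class PlaceOrder:
    order_id: str
    product_id: str
    quantity: int


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    product_id: str
    quantity: int


class Order:
    """Aggregate root: validates a command and records the resulting internal events."""

    def __init__(self, order_id: str) -> None:
        self.order_id = order_id
        self.uncommitted_events: list = []

    def place(self, product_id: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.uncommitted_events.append(OrderPlaced(self.order_id, product_id, quantity))


class PlaceOrderHandler:
    """Command handler: calls into the aggregate root, then persists and publishes its events."""

    def __init__(self, event_store: list, publish: Callable[[object], None]) -> None:
        self.event_store = event_store
        self.publish = publish

    def handle(self, command: PlaceOrder) -> None:  # commands have a void return
        order = Order(command.order_id)
        order.place(command.product_id, command.quantity)
        for event in order.uncommitted_events:
            self.event_store.append(event)  # persist in the event store
            self.publish(event)             # publish for external consumption
        order.uncommitted_events.clear()


# Read-only store used by queries; a plain dict stands in for a separate database.
read_store: dict = {}


def update_read_store(event: object) -> None:
    if isinstance(event, OrderPlaced):
        read_store[event.order_id] = {"product": event.product_id, "qty": event.quantity}


event_store: list = []
handler = PlaceOrderHandler(event_store, update_read_store)
handler.handle(PlaceOrder("o-1", "p-42", 2))
print(read_store["o-1"])  # {'product': 'p-42', 'qty': 2}
```

Note that `handle` returns nothing: the caller learns about the outcome only by querying the read-only store afterwards.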
What’s important to note here is that no single transaction wraps this process end to end. There is a transaction involved, but it does not extend from the issuance of the command to the update of the read-only store. That leaves a temporal gap: a command is issued, and even when it succeeds, there is no guarantee that its result is immediately available for querying.
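The temporal gap can be sketched with an in-memory queue between publishing and consuming. This is only an illustration under assumptions: a `deque` stands in for a real message queue such as MSMQ, and `ItemRenamed` and `EventBus` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRenamed:  # hypothetical event type
    item_id: str
    name: str


class EventBus:
    """Queues published events; subscribers only see them when the queue is drained.

    Publishing and consuming happen at different times, and no transaction spans both.
    """

    def __init__(self) -> None:
        self.pending: deque = deque()
        self.subscribers: list = []

    def publish(self, event) -> None:
        self.pending.append(event)

    def dispatch_pending(self) -> None:
        while self.pending:
            event = self.pending.popleft()
            for subscriber in self.subscribers:
                subscriber(event)


read_store = {"i-1": "old name"}


def apply_item_renamed(event: ItemRenamed) -> None:
    read_store[event.item_id] = event.name


bus = EventBus()
bus.subscribers.append(apply_item_renamed)

# The command side has succeeded and published its event...
bus.publish(ItemRenamed("i-1", "new name"))
# ...but a query issued right now still sees the old value: the temporal gap.
print(read_store["i-1"])  # old name
# Once the consumer catches up, the read-only store is consistent again.
bus.dispatch_pending()
print(read_store["i-1"])  # new name
```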
From a certain traditional perspective, this seems to be a flaw. How do you know, from the querying side, when the update is available to be read? Strictly speaking, you don’t. In a well-built architecture, it will probably be available only milliseconds later, but the key word here is “probably.”
However, viewed from another perspective, what appears to be a flaw is in fact par for the course for most applications, and it is already acceptable to the business. As long as the read-only store eventually becomes consistent with the results of successful commands, this temporal gap is perfectly fine.
Why eventual consistency is acceptable
Many examples show why eventual consistency is acceptable; let me describe two of them, both centered on a standard e-Commerce store. To highlight the important points, let us assume that the store isn’t using cqrs, but instead uses a common standard setup: the main database handles all transactions and then replicates them to a reporting database using the stock replication features found in major OLTP systems, such as Replication within Microsoft SQL Server.
Suppose the marketing department has created a new email campaign and wants to get a ‘real-time’ report of how well that campaign is working. The head of the marketing department generates the ‘real-time’ report and prints it out to carry into a meeting.
The first thing to note is that replication has a built-in temporal lag. Depending on how robust the replication infrastructure is, this lag might only be seconds behind the main database. In some instances, under heavy traffic, the lag could be longer. Regardless, there is already a lag built into the system.
Now suppose for a moment that the report is generated off of the main database instead of a reporting database, so there is no replication lag. The moment after the report is generated, it is, theoretically, out of date: it doesn’t capture the orders created and persisted after it was generated, and it stays out of date from then until it is discussed and analyzed in the meeting. But this is okay, as the business can operate successfully regardless. It doesn’t actually need a report that is accurate to the millisecond; it just needs one that is reasonably up to date.
Now let’s look at it from the perspective of a user browsing the e-Commerce store. In a typical scenario, only products with available inventory are viewable on the site (to prevent sales of products that can’t be back-ordered, for instance). The user browses to the category of interest and picks a product. The web site generates a product detail page with full information on that product, so that the user can examine it and decide whether or not to purchase it.
The moment after the product detail page is generated and displayed to the user, it is, theoretically, out of date. For popular items, there is absolutely no guarantee that the product won’t sell out before the user gets around to adding it to their shopping basket. Even if they do add it successfully, there is absolutely no guarantee that it will still be available by the time they start checking out.
What this highlights is that eventual consistency is a fact of life (and of business) that exists whether or not your system is architecturally designed around it.
Why you might want to architecturally design with eventual consistency in mind
In a word, “scalability.”
When I was working with high volume e-Commerce stores involving properties like NASCAR and the NBA, it was a common theme that we wanted to limit the usage of our main database to taking people’s credit card numbers. This is why we used replication. We didn’t want to limit the scalability of our main database because it needed to run reports or because we needed to generate product detail pages off of it. We did those off of the replica (actually, it was a little more complicated than that, but you should get the idea). We obviously wanted our reports to be accurate, and we obviously wanted to only display product detail pages that had product we could sell, but at the end of the day, given the choice between limiting the number of orders we could process and limiting the number of times we displayed an ‘out of date’ product detail page, we chose the latter.
What’s important to note here is that, no matter what, you always have to make this choice. It isn’t as if adopting cqrs changes this. What cqrs can do is allow you to break down the process flow of your application in such a way as to optimize that flow (more on this below).
cqrs still insists on an important transactional component: creating internal events and publishing them externally
If your aggregate roots act on the commands they receive and produce internal events that are then persisted in the event store, you want to ensure that those events are also available externally for consumption. If a tree falls in a forest, no one cares whether it makes a sound, but if an aggregate root produces an internal event, you want to make sure it makes a sound: if an event cannot be published externally, the operation should fail and throw an exception.
event handler per view model
Replication typically involves a (more or less) straight one-for-one update from your main database to your replica, and the schemas are typically (more or less) identical. This can cause performance problems when querying the replica, as you have to join across multiple tables to get the data your views need to display.
A well-designed cqrs system lets one published event update one or more read-only store tables, so that your reporting queries or your UI queries are optimized to deliver exactly the information needed exactly when it is needed.
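Here is a minimal sketch of one event handler per view model: a single published event fans out to several handlers, each maintaining its own denormalized "table" shaped for one view, so no joins are needed at query time. The `OrderPlaced` event, the three views and the dictionary-based routing are hypothetical; in practice each view would be a table in the read-only store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:  # hypothetical event
    order_id: str
    customer: str
    product: str
    total: float


# Denormalized read-only "tables", each shaped for exactly one view.
order_summary_view: dict = {}      # order_id -> row for an order detail page
customer_history_view: dict = {}   # customer -> rows for an order history page
sales_by_product_view: dict = {}   # product -> running total for a sales report


def update_order_summary(e: OrderPlaced) -> None:
    order_summary_view[e.order_id] = {"customer": e.customer, "product": e.product, "total": e.total}


def update_customer_history(e: OrderPlaced) -> None:
    customer_history_view.setdefault(e.customer, []).append({"order_id": e.order_id, "total": e.total})


def update_sales_by_product(e: OrderPlaced) -> None:
    sales_by_product_view[e.product] = sales_by_product_view.get(e.product, 0.0) + e.total


# One handler per view model; each published event is routed to all of them.
HANDLERS = {
    OrderPlaced: [update_order_summary, update_customer_history, update_sales_by_product],
}


def on_event(event) -> None:
    for handler in HANDLERS.get(type(event), []):
        handler(event)


on_event(OrderPlaced("o-1", "alice", "cap", 20.0))
on_event(OrderPlaced("o-2", "alice", "jersey", 80.0))
print(customer_history_view["alice"])  # both of alice's orders, read without a join
```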
cqrs publishing architecture
I don’t think you can describe an ‘ultimate’ architecture without understanding the needs of the application in question. There is no one size fits all solution. However, there are some guidelines that I think are useful to keep in mind.
As mentioned above, your read-only store should be optimized to provide the exact information needed when it is needed, and so it should probably be significantly denormalized.
Just as SQL Server Replication can fail and then be restarted either from scratch or from the moment of failure, a cqrs architecture should allow for the same. Since you are storing all of the events that are produced internally, you should have mechanisms that let you replay them, either from scratch or from the moment of failure. And because you don’t want to reprocess, for example, previously placed orders (which could trigger credit card processing again), you should also have a way to identify an event that is republished externally as a replay.
cqrs may be new in name, but the concepts that underlie it are definitely not new. Generically, the flow of such a system can be described as follows:
command triggered → command handled → aggregate root acts on the command it receives → aggregate root publishes the resulting event internally → internal event is persisted in the event store → internal event is published externally → external event is consumed by the read-only store → read-only store is queried in an eventually consistent manner
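A sketch of replaying from a position, under stated assumptions: events are wrapped in a hypothetical `Envelope` carrying their position in the event store and an `is_replay` flag. Side-effecting handlers such as the hypothetical `charge_card` skip replayed events, while idempotent read-model handlers rebuild from them. The position returned by `replay` can serve as a checkpoint for resuming after a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    position: int    # sequence number in the event store
    event: dict
    is_replay: bool  # True when republished from the store, not newly produced


class EventStore:
    def __init__(self) -> None:
        self.events: list = []

    def append(self, event: dict) -> int:
        self.events.append(event)
        return len(self.events) - 1

    def read_from(self, position: int):
        for pos in range(position, len(self.events)):
            yield pos, self.events[pos]


def replay(store: EventStore, subscribers: list, from_position: int = 0) -> int:
    """Republish stored events from a position (0 = from scratch); return the next position."""
    next_position = from_position
    for pos, event in store.read_from(from_position):
        envelope = Envelope(pos, event, is_replay=True)
        for subscriber in subscribers:
            subscriber(envelope)
        next_position = pos + 1
    return next_position


charges: list = []
read_store: dict = {}


def charge_card(env: Envelope) -> None:
    if env.is_replay:  # never repeat a side effect for a replayed event
        return
    charges.append(env.event["order_id"])


def update_orders_view(env: Envelope) -> None:
    read_store[env.event["order_id"]] = env.event["total"]  # idempotent: safe to replay


store = EventStore()
subscribers = [charge_card, update_orders_view]
for e in ({"order_id": "o-1", "total": 20.0}, {"order_id": "o-2", "total": 80.0}):
    pos = store.append(e)
    for s in subscribers:
        s(Envelope(pos, e, is_replay=False))  # live publishing

read_store.clear()  # simulate losing the read-only store
checkpoint = replay(store, subscribers, from_position=0)
print(charges)     # ['o-1', 'o-2'] (no duplicate charges)
print(read_store)  # {'o-1': 20.0, 'o-2': 80.0}
```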
Each of these pieces of the flow can be scaled separately. If you are using something like MSMQ, you can have entirely separate queues along the way for different command handlers or event handlers, so that, for instance, the highest-traffic ones can run on different sets of hardware.
The flow that I just described makes the most sense (in my mind) when you have an application that applies DDD principles, and there is plenty of material on that approach readily available through Google. Where I find cqrs most interesting is in its purest/strictest form, where commands and queries are separate objects, applied to applications that don’t necessarily involve aggregate roots. ETL scenarios are an obvious candidate, but there are many others.
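A small sketch of giving each handler type its own queue so the flows can be scaled independently. Python's `queue.Queue` and threads stand in for MSMQ queues and separate machines, and the event type names and worker counts are hypothetical: the high-traffic `ProductViewed` flow simply gets more workers than `OrderPlaced`.

```python
import queue
import threading

# One queue per handler type; each can be given its own number of workers.
queues = {"OrderPlaced": queue.Queue(), "ProductViewed": queue.Queue()}
results: dict = {"OrderPlaced": [], "ProductViewed": []}
lock = threading.Lock()


def route(event_type: str, payload: str) -> None:
    queues[event_type].put(payload)


def worker(event_type: str) -> None:
    q = queues[event_type]
    while True:
        payload = q.get()
        if payload is None:  # sentinel: shut this worker down
            return
        with lock:
            results[event_type].append(payload)  # stand-in for real handling


# Scale each flow separately: more workers for the high-traffic queue.
workers = {"OrderPlaced": 1, "ProductViewed": 4}
threads = []
for event_type, count in workers.items():
    for _ in range(count):
        t = threading.Thread(target=worker, args=(event_type,))
        t.start()
        threads.append(t)

for i in range(3):
    route("OrderPlaced", f"order-{i}")
for i in range(100):
    route("ProductViewed", f"view-{i}")

# One sentinel per worker, queued after all payloads, then wait for shutdown.
for event_type, count in workers.items():
    for _ in range(count):
        queues[event_type].put(None)
for t in threads:
    t.join()

print(len(results["OrderPlaced"]), len(results["ProductViewed"]))  # 3 100
```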
I hope that I have been able to lay out the very basic details of cqrs for dummies, from a dummy. I will continue to update this series based on what I learn in the future.