Series link here.
As a reminder of what I’m talking about, here’s the picture from Mark:
What I’m going to be talking about is section #3 from Mark’s picture, and in particular, I’m going to be going over a number of concepts including:
- Domain events
- Write-only event store, or “Accountants don’t use erasers”
- Compensating actions
- Automatic audit log
- Replaying Events
- Data mining
- Event Sourcing
The general nature of Domain Events
As we saw from the previous discussion about the command layer, task-specific commands are created and validated (in the sense of passing input validation, e.g., an email address must actually be an email address, a last name cannot be longer than 25 characters, and so forth). They are then passed through the command handler and enter the domain. To use the previous example, the relevant domain entity will receive a ReduceCustomerCreditRatingDueToLatePayment command through the appropriate method. (Which entity is relevant will depend on the domain, but it is easy enough to imagine a domain model where it is the Customer entity. Note also that I am not talking about where the entity comes from, which I will touch on when I talk about Event Sourcing.)
One thing that is important to keep in mind is that the domain entity can either accept or reject a command (depending on the context), but that a Domain Event is raised either way, and stored in the Event Store. Commands are generally accepted or rejected depending on whether they pass what you might call ‘business’ validation. What are the rules of the domain? Let us suppose that customers can have a status level of the stereotypical Platinum/Gold/Silver, etc. type, and that our domain has a rule that a customer’s credit rating will not be reduced due to one late payment if they are of Gold status or above. In this instance, the command would be rejected if it was received by a Customer that was of at least Gold status and had no previous late payments; otherwise, it would be accepted.
In either case, a Domain Event would be produced (CustomerCreditRatingReducedEvent and CustomerCreditRatingReductionRejectedEvent could be names for these events; in general, you want your Domain Events to be named such that they are understandable by your business users, so they should be involved in naming them) and would, among other things, be stored in the Event Store.
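A minimal sketch of this accept-or-reject rule, in Python purely for illustration (the series is not tied to a language). The event names come from the text above; the `Status` enum, the `previous_late_payments` field, the 10-point penalty and the rating values are assumptions invented for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    SILVER = 1
    GOLD = 2
    PLATINUM = 3


@dataclass(frozen=True)
class CustomerCreditRatingReducedEvent:
    customer_id: str
    new_rating: int


@dataclass(frozen=True)
class CustomerCreditRatingReductionRejectedEvent:
    customer_id: str
    reason: str


@dataclass
class Customer:
    customer_id: str
    status: Status
    credit_rating: int
    previous_late_payments: int = 0  # assumed to be maintained elsewhere

    def reduce_credit_rating_due_to_late_payment(self, penalty: int = 10):
        """Apply the business rule; an event comes back either way."""
        if self.status >= Status.GOLD and self.previous_late_payments == 0:
            # Rejected: the customer's state is left untouched.
            return CustomerCreditRatingReductionRejectedEvent(
                self.customer_id, "first late payment at Gold status or above"
            )
        self.credit_rating -= penalty
        return CustomerCreditRatingReducedEvent(self.customer_id, self.credit_rating)


gold = Customer("c-1", Status.GOLD, credit_rating=700)
silver = Customer("c-2", Status.SILVER, credit_rating=700)
gold_event = gold.reduce_credit_rating_due_to_late_payment()
silver_event = silver.reduce_credit_rating_due_to_late_payment()
```

Here the Gold customer's command is rejected and the Silver customer's is accepted, yet both calls hand back an event that can be stored.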
Why would you want to store events produced by a rejected command? Your domain isn’t changed by a rejected command, the data isn’t affected, so why go through the trouble? There are a number of reasons why it is important to do this, but think of even rejected commands as events that have significance to the business. It’s a tricky concept, but try to picture it this way: when moving to cqrs and moving to the use of commands, one of the things you are trying to focus on is the importance or the significance of things that are happening and the specific tasks that people are trying to accomplish. Thought of in this light, it can be very important to know that these things are being rejected.
To keep with the example, it might very well be important to the business to know that a certain percentage of late payments are not affecting credit ratings. If you are in an expensive niche market, it might be very valuable business knowledge to know that the rate of late payments amongst your elite customers is increasing. The business might want to take action if it knows this. You might want to know if your standards for Gold status are too lenient. There are all sorts of scenarios that you might want to know about, and knowing that commands are being rejected can be the key to this knowledge.
I will touch on this again when I talk about Auditing and Data Mining, but for now, let’s talk about the Event Store.
The Event Store, or why accountants don’t use erasers
The Event Store, as you might have guessed, records the history of the events that occur in your domain. There is no technical restriction on how these events are persisted or with what technology. You could use an RDBMS such as SQL Server, you could use a Document Database like RavenDB, you could use an ODBMS like db4o, you could use just about anything, as long as you can store the events in a clear fashion (when talking about Event Sourcing, it may turn out that some technologies seem better than others, but we’ll get to that). You want to be able to trace the history of events as they occur to any entity in your domain, so somehow recording the order is important (timestamps are generally good enough), but how this is implemented is, well, an implementation detail.
There’s one other very important aspect of the Event Store and that is that it is write only (or, more precisely, append only). Events come in and are recorded, but once they are recorded, they aren’t changed, and they aren’t deleted either. The idea, as very well explained by Pat Helland in his post “Accountants Don’t Use Erasers”, is that you continually record events as they happen, and you want to keep this record intact. You might need to make a correction to a previous event, but you don’t do that by changing the record of the previous event, but by producing a new event altogether.
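As a rough illustration, an in-memory store with this property might look like the Python sketch below. The `EventStore` and `StoredEvent` classes, the address events and the use of a sequence number for ordering (instead of a timestamp) are all assumptions made for the example, not anything prescribed by cqrs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned once stored
class StoredEvent:
    sequence: int  # position in the store, used here for ordering
    entity_id: str
    name: str
    data: dict


class EventStore:
    """Offers append and read, and deliberately nothing else."""

    def __init__(self) -> None:
        self._events: list[StoredEvent] = []

    def append(self, entity_id: str, name: str, data: dict) -> StoredEvent:
        event = StoredEvent(len(self._events) + 1, entity_id, name, dict(data))
        self._events.append(event)
        return event

    def events_for(self, entity_id: str) -> list[StoredEvent]:
        # Returns a new list, so callers cannot reorder the store itself.
        return [e for e in self._events if e.entity_id == entity_id]


store = EventStore()
original = store.append("c-1", "CustomerAddressRecorded", {"city": "Pittsburg"})
# The typo is corrected by a new event, not by editing the old one.
store.append("c-1", "CustomerAddressCorrected", {"city": "Pittsburgh"})
```

After the correction, both events are still in the store, in order, and the mistaken one is still readable exactly as it was recorded.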
Again, there are many reasons why you would want to do this. A very basic one is that by doing so, you have a historical picture of the state of your domain as it appeared at any given time. You know what happened in your domain and when. Another good reason involves the notion of Compensating Actions.
I find it easier to understand the notion of Compensating Actions when using a standard example involving inventory allocation and order placement.
Suppose I’m a customer on an e-Commerce site and I order the limited edition Penguins Winter Classic Jersey, of which there are only 300. A command is issued that enters the domain, where it is processed and produces an event that is stored in the event store.
Due to a tragic fire at the vendor warehouse, 200 of the jerseys are lost. Because of this, up to 200 orders can no longer be fulfilled (for the sake of argument, let’s pretend you can’t re-source them from another vendor) and so will need to be handled.
A bad company will just cancel the orders, while a better company will try to mollify the miffed customers, but in either case, you don’t want to delete the orders. You don’t want to delete the original event, or change it. Instead, you perform a Compensating Action, which produces a new event and might leave your domain in the same state as if you had just deleted or changed the original event.
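The jersey example can be sketched in a few lines of Python. The event names `OrderPlaced` and `OrderCancelledDueToInventoryLoss`, the order ids and the dictionary layout are hypothetical, chosen only to show the shape of the idea.

```python
events = [
    {"type": "OrderPlaced", "order_id": "o-1", "item": "winter-classic-jersey"},
    {"type": "OrderPlaced", "order_id": "o-2", "item": "winter-classic-jersey"},
]


def open_orders(history):
    """Fold the history into the set of orders that still need fulfilling."""
    orders = set()
    for event in history:
        if event["type"] == "OrderPlaced":
            orders.add(event["order_id"])
        elif event["type"] == "OrderCancelledDueToInventoryLoss":
            orders.discard(event["order_id"])
    return orders


# The compensating action appends a new event; the original stays put.
events.append(
    {
        "type": "OrderCancelledDueToInventoryLoss",
        "order_id": "o-2",
        "reason": "warehouse fire",
    }
)
remaining = open_orders(events)
```

The resulting state no longer includes order o-2, just as if it had been deleted, but the history still shows that it was placed and why it was cancelled.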
You want to do this with everything in your domain.
It might be helpful to think of the contrast between traditional ACID semantics in a relational database with the idea of long-running transactions. When performing a typical unit of work in a good old RDBMS, there’s the idea that all of the actions succeed or fail as a unit. If one of them fails, then they all should fail, and the end result is that the data within the database looks as if the attempt never took place (it will actually be recorded in the transaction log normally). While this is going on, tables are locked (among other things).
Obviously, when you have a unit of work that might take a number of days (placing and fulfilling an order, approving a mortgage, etc.), you don’t want to lock tables during that time, so in such long running transactions you manage failures differently, and what you do in cqrs with the Event Store is to treat every unit of work as if it were a long running transaction, no matter the time span involved. This is how businesses typically work.
An advantage of this way of processing and storing events is that you get an automatic audit log.
Anyone who has worked as a DBA has probably had to deal with trying to determine the cause of transaction failures. An ACID style unit of work was attempted, and something failed, so the transaction rolled back. With certain tools, you can read the transaction log of your RDBMS, but transaction logs are transient to a certain extent, as they eventually get truncated. What happens when you need to know why a transaction failed a month ago?
Your application can or will typically log a lot of information as it is happening (LogHelper.LogInformation anyone?), but this is additional code that you need to write, and doesn’t necessarily cover every method. Just last week, I needed to add additional logging code to a certain area of an application because I knew something was failing, but I didn’t know why (I still don’t know why as a matter of fact, it’s under investigation).
If *every* important action in your application is logged as an event, then you automatically have an audit log that can tell you what, when and why things happened (obviously, if you have any sort of system failure that prevents events from being produced or logged, having an Event Store can’t help you here, but you can’t solve everything).
Even better, if you design your applications correctly, you can replay them.
Suppose that you start from scratch with an Event Store. The Event Store will log the creation of your domain entities, and then every event that either directly affected them and caused them to change, or logged the rejection of a command that wanted to change them. With this, you have (conceptually at least) the ability to relive the life of your entities at every step.
Outside of this series, I talked about a bug that I was trying to troubleshoot, where I made a fundamental mistake of misreading what the logging system was telling me. Having an Event Store doesn’t automatically prevent an end user (me, in this case) from misreading information, PEBKAC still rules. But, having the entire history of the life of your domain entities can help tremendously in this area. Having task specific commands that produce explicit events of their acceptance or rejection gives you a trail that you can examine step by step when you need to.
Over and above the diagnostic benefit of having this trail when trying to troubleshoot an issue, you also have the positive benefit of having a source for data mining.
As I mentioned previously, if a command is issued against your domain to reduce a customer’s credit rating due to a late payment, that command could be accepted or rejected, and the appropriate event is recorded.
Because you have this permanent record of the events that were produced from commands being processed by your domain, you have a rich source of information that exists for present and future business analysis. You can never know for sure in advance what information will be of value to the business at any given time. It might not occur to you that knowing credit rating reducing commands are being rejected is that important. In hindsight, once something is important, you often wish you had more information to analyze. The Event Store gives you this. You know what was processed and when and whether it was accepted or not, and why it was rejected.
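As a small illustration of the kind of question you can ask after the fact, the Python sketch below counts rejected credit rating reductions per month. The sample history, the `occurred_on` field and its dates are invented for the example; a real store would hold whatever your events actually record.

```python
from collections import Counter

REJECTED = "CustomerCreditRatingReductionRejectedEvent"
REDUCED = "CustomerCreditRatingReducedEvent"

# Hypothetical sample data standing in for a query against the Event Store.
history = [
    {"type": REDUCED, "occurred_on": "2024-01-14"},
    {"type": REJECTED, "occurred_on": "2024-01-20"},
    {"type": REJECTED, "occurred_on": "2024-02-03"},
    {"type": REJECTED, "occurred_on": "2024-02-17"},
    {"type": REDUCED, "occurred_on": "2024-02-21"},
]


def rejections_per_month(events):
    """Count rejection events by the YYYY-MM prefix of their date."""
    months = (e["occurred_on"][:7] for e in events if e["type"] == REJECTED)
    return dict(Counter(months))


def rejection_rate(events):
    """Share of credit rating commands that ended in a rejection."""
    outcomes = [e for e in events if e["type"] in (REJECTED, REDUCED)]
    if not outcomes:
        return 0.0
    return sum(e["type"] == REJECTED for e in outcomes) / len(outcomes)


by_month = rejections_per_month(history)
rate = rejection_rate(history)
```

Neither question had to be anticipated when the events were written; both are answered from the stored history alone.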
No system or architecture is perfect, and cqrs isn’t. You still have to do a lot of work, and you still need to design it wisely. But, by tracking the history of your domain, step by step, as an architectural feature, you automatically have a wealth of information that you might not have otherwise.
Event Sourcing is a big topic, and as always, a good source of information is available from reviewing Martin Fowler’s discussion of the topic. But, if you take a look at Mark’s picture at the top of the post, you should notice that not only are there arrows that go from the domain to the Event Store, but also some that come from the Event Store to the domain. I’m going to try and explain why this is, but I am far from an expert here (the reason why this series is called ‘cqrs for dummies’ is that I’m one of the dummies), so you should examine what actual experts like Greg Young have to say as well.
Suppose you want to know what the current state of a customer is in your system. In a typical RDBMS, you query the Customer table based on their ID, and it gives you that information straight away.
With Event Sourcing and an Event Store, it is a little different. You query the Event Store for the events recorded against that customer’s ID. Conceptually, you start from the most recent event and work your way backwards until you reach the creation event for the customer, then replay forward from there, applying each later event in order, and you end up with the current state.
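A minimal Python sketch of rebuilding state by replay follows. It assumes the store can hand back events oldest first, so it simply filters by customer and reads forward, which gives the same result as locating the creation event and replaying from there. The `CustomerState` shape, the `CustomerCreated` event and the `amount` field are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CustomerState:
    customer_id: str = ""
    status: str = ""
    credit_rating: int = 0


def apply(state, event):
    """Return the state that results from one event."""
    if event["type"] == "CustomerCreated":
        return CustomerState(event["customer_id"], event["status"], event["credit_rating"])
    if event["type"] == "CustomerCreditRatingReducedEvent":
        return replace(state, credit_rating=state.credit_rating - event["amount"])
    return state  # rejections (and unknown events) change nothing


def current_state(store, customer_id):
    state = CustomerState()
    for event in store:  # assumed to be ordered oldest to newest
        if event["customer_id"] == customer_id:
            state = apply(state, event)
    return state


store = [
    {"type": "CustomerCreated", "customer_id": "c-1", "status": "Silver", "credit_rating": 700},
    {"type": "CustomerCreated", "customer_id": "c-2", "status": "Gold", "credit_rating": 720},
    {"type": "CustomerCreditRatingReducedEvent", "customer_id": "c-1", "amount": 10},
    {"type": "CustomerCreditRatingReductionRejectedEvent", "customer_id": "c-2"},
]
customer = current_state(store, "c-1")
```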
On the face of it, this seems like it would be pretty inefficient in comparison, and, to a certain extent, it is. One way of improving the efficiency of the process is to build in the notion of a snapshot. Given a certain number of events for a domain entity, you store a snapshot of the current state of that entity in the Event Store, and then when you need to get the current state of it, you only need to work your way backwards in the Event Store to the most recent snapshot (or the creation event) before replaying forward.
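The snapshot idea can be sketched as follows, again in Python and again under assumptions: the stream holds the events of a single customer, oldest first, and a hypothetical `CustomerSnapshot` entry records the credit rating at the moment it was taken.

```python
def apply(rating, event):
    """Return the credit rating that results from one event."""
    if event["type"] == "CustomerCreated":
        return event["credit_rating"]
    if event["type"] == "CustomerCreditRatingReducedEvent":
        return rating - event["amount"]
    return rating  # rejections leave the rating unchanged


def load_credit_rating(stream):
    """Return (current rating, number of events that had to be replayed)."""
    start, rating = 0, 0
    # Walk backwards until the most recent snapshot (or the very beginning).
    for index in range(len(stream) - 1, -1, -1):
        if stream[index]["type"] == "CustomerSnapshot":
            start, rating = index + 1, stream[index]["credit_rating"]
            break
    tail = stream[start:]
    for event in tail:  # replay forward from that point
        rating = apply(rating, event)
    return rating, len(tail)


stream = [
    {"type": "CustomerCreated", "credit_rating": 700},
    {"type": "CustomerCreditRatingReducedEvent", "amount": 10},
    {"type": "CustomerCreditRatingReducedEvent", "amount": 10},
    {"type": "CustomerSnapshot", "credit_rating": 680},
    {"type": "CustomerCreditRatingReducedEvent", "amount": 10},
    {"type": "CustomerCreditRatingReductionRejectedEvent"},
]
rating, replayed = load_credit_rating(stream)
```

With the snapshot in place only the two events after it are replayed; without it, the same rating is reached by replaying everything from the creation event.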
However, one thing to keep in mind is that for a lot of the systems that use cqrs, the current state of an entity is kept in memory once it is needed. Trading systems that use cqrs might create the entity on demand using the replay model and then keep it in memory during trading hours. So, while a first call might be technically inefficient, it doesn’t really affect the system negatively during active operation.
One could, of course, keep a separate set of tables of the current state of an entity which is continually updated by each event that enters the Event Store, but there are inefficiencies involved there as well.
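A sketch of that alternative, assuming a hypothetical `CurrentCreditRatings` table that is updated every time an event is appended (the class, the `record` helper and the event fields are illustrative only):

```python
class CurrentCreditRatings:
    """Current-state table of credit ratings, keyed by customer id."""

    def __init__(self):
        self.rows = {}

    def handle(self, event):
        if event["type"] == "CustomerCreated":
            self.rows[event["customer_id"]] = event["credit_rating"]
        elif event["type"] == "CustomerCreditRatingReducedEvent":
            self.rows[event["customer_id"]] -= event["amount"]
        # Rejections change nothing in this table.


def record(store, table, event):
    """Append to the event store, then bring the table up to date."""
    store.append(event)
    table.handle(event)


store, table = [], CurrentCreditRatings()
record(store, table, {"type": "CustomerCreated", "customer_id": "c-1", "credit_rating": 700})
record(store, table, {"type": "CustomerCreditRatingReducedEvent", "customer_id": "c-1", "amount": 10})
record(store, table, {"type": "CustomerCreditRatingReductionRejectedEvent", "customer_id": "c-1"})
```

Reading the current rating becomes a single lookup, at the price of extra work on every write and a second copy of the data to keep consistent with the event history.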
There is some indication that using non-RDBMS for an event store when going with the replay model may have significant performance benefits, but I can’t say for sure.
Building a cqrs-based system that uses an Event Store offers a number of benefits, such as having a perpetual record of the history of the entities in your domain, the ability to replay these events for diagnostic and data mining purposes, and the ability to know when, how, and why your domain has changed.
As always, this isn’t to suggest it is the best way to design a system for every situation. The tradeoffs involved with designing a system that is different in many ways from traditional RDBMS ACID applications have to be weighed in every case. But I think there are clear benefits to considering this path.
In the next post, I will talk about the External Event layer, where we go full circle and figure out how the Reporting Store handles the events that allow the query layer to do its job successfully.