The Good of Event Sourcing - Projections
It was in 2009 in Utrecht, the Netherlands, that I first learned about Event Sourcing and the Command Query Responsibility Segregation (CQRS) pattern, at a training Greg Young gave there. I remember being awed by the scalability and architectural simplicity those styles provided. However, I also remember the technical complexity that comes with them. In 2012, I led the first steps of migrating a CQRS-based system to Event Sourcing. I knew it would be non-trivial, but I still underestimated the number of new challenges I would run into over the course of four years. During that time, I experienced first-hand how a large group of developers had to deal with the transition. Since then I've talked about this many times, both in the Netherlands and abroad.
Event Sourcing is a brilliant solution for high-performance or complex business systems, but you need to be aware that it also introduces challenges most people don't tell you about. In June, I blogged about the things I would do differently next time. But after attending another introduction to Event Sourcing recently, I realized it is time to talk about some real experiences. In this multi-part series, I will share the good, the bad and the ugly of Event Sourcing to prepare you for the road ahead. Let's start with the good.
The power of projections
Event Sourcing requires you to store the domain changes as a series of historical, intention-revealing events. Because of this simple structure, you can't run the queries you may be used to from working with relational databases. Instead, you'll need to build and maintain queryable representations of those events. However, these projections can be optimized for the purpose they serve. So if your user interface requires the data to be grouped in a certain way, you can store the data pre-grouped in your persistent storage. By the time the query is executed, the data no longer needs any grouping, thereby off-loading your database. You can do the same with aggregated calculations, like a count per grouping. What's important to realize is that you'll end up with multiple autonomous projections that are built from the same events and each have a single purpose. Consequently, duplication of data is a very normal thing in ES.
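To make that a bit more tangible, here is a minimal Python sketch of two single-purpose projections fed by the same events. Everything in it is made up for illustration: the OrderPlaced and OrderCancelled events, the projection classes and the plain list that stands in for an event store. One projection keeps a ready-made count per country, the other keeps the order ids pre-grouped the way a list screen might need them.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical events; the names and fields are illustrative only.
@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    country: str


@dataclass(frozen=True)
class OrderCancelled:
    order_id: str
    country: str


class OrdersPerCountry:
    """Single-purpose projection: a ready-made count per country."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, event):
        if isinstance(event, OrderPlaced):
            self.counts[event.country] += 1
        elif isinstance(event, OrderCancelled):
            self.counts[event.country] -= 1


class OpenOrdersByCountry:
    """Second projection of the same events: order ids pre-grouped for a list view."""

    def __init__(self):
        self.groups = defaultdict(set)

    def apply(self, event):
        if isinstance(event, OrderPlaced):
            self.groups[event.country].add(event.order_id)
        elif isinstance(event, OrderCancelled):
            self.groups[event.country].discard(event.order_id)


# A plain list stands in for the event store here.
events = [
    OrderPlaced("o-1", "NL"),
    OrderPlaced("o-2", "NL"),
    OrderPlaced("o-3", "BE"),
    OrderCancelled("o-2", "NL"),
]

counts = OrdersPerCountry()
open_orders = OpenOrdersByCountry()
for event in events:
    counts.apply(event)
    open_orders.apply(event)
```

Note that both projections hold overlapping data, and neither knows the other exists. Answering "how many orders per country" or "which orders are open in NL" is now a lookup instead of a grouping query.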
An added benefit of using projections is that there's no technical dependency between the event store and the storage used by the projections that consume its events. Storing the projections in a completely different database (or cluster) from the event store is perfectly fine. In fact, since each projection is completely independent from the others in terms of the data that it uses, you can easily partition the underlying tables without introducing any side-effects. But that is not all. Considering that same autonomy, why would you use the same storage mechanism for all projections? Maybe you have a projection that doesn't contain that much data and can be rebuilt in-memory at start-up. Or what about a projection that is written to a local embedded version of RavenDB instead of a relatively slow relational database? Especially in load-balanced scenarios, a shared database can be a bottleneck. Having the option to keep the projection on the (load-balanced) front-end machines increases scalability and avoids network overhead.
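As a rough illustration of that freedom, the Python sketch below runs the same projection logic against two different stores: a dictionary that could be rebuilt in memory at start-up, and a SQLite database standing in for any persistent store. The CounterStore interface, both store classes and the UserLoggedIn event are assumptions of mine, not part of any particular framework.

```python
import sqlite3
from typing import Protocol


class CounterStore(Protocol):
    """Hypothetical minimal interface a projection needs from its storage."""

    def increment(self, key: str) -> None: ...
    def get(self, key: str) -> int: ...


class InMemoryStore:
    """Small projection that can simply be rebuilt at start-up."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        return self._counts.get(key, 0)


class SqliteStore:
    """Same projection data, kept in a SQLite database instead."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS counts (key TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )

    def increment(self, key):
        cursor = self._db.execute("UPDATE counts SET n = n + 1 WHERE key = ?", (key,))
        if cursor.rowcount == 0:  # no row yet for this key
            self._db.execute("INSERT INTO counts (key, n) VALUES (?, 1)", (key,))
        self._db.commit()

    def get(self, key):
        row = self._db.execute("SELECT n FROM counts WHERE key = ?", (key,)).fetchone()
        return row[0] if row else 0


def project_logins(events, store: CounterStore):
    """The projection logic itself does not care where the data ends up."""
    for event in events:
        if event["type"] == "UserLoggedIn":
            store.increment(event["user"])


events = [
    {"type": "UserLoggedIn", "user": "alice"},
    {"type": "UserLoggedIn", "user": "bob"},
    {"type": "UserLoggedIn", "user": "alice"},
]

memory_store = InMemoryStore()
sqlite_store = SqliteStore()
project_logins(events, memory_store)
project_logins(events, sqlite_store)
```

Swapping the storage is a decision you can make per projection, because nothing else depends on it.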
Independence of place and time
Having discussed that, you might wonder whether these projections need to be in sync with the domain at all. In other words, do you need to update the projection in the same call or transaction that triggered the event in the first place? The answer to that (and many other design challenges) is: it depends. I generally prefer to run the projection code asynchronously from the command handling. It gives you the most flexibility and allows you to reason about a projection without the need to consider anything else. It also enables you to decide how and when that projection is rebuilt. You can even have projections that represent the domain at a certain point in time, simply by projecting the events up to that point. How cool is that? However, if the accuracy of a particular projection is important for handling a command, you may decide to treat it differently. Be aware though: if you decide to update your domain and projection in the same database transaction, it will hurt performance and scalability.
Now, one more thing. Given how autonomous each projection is and the way it is optimized to give you the aggregated data in a format that suits your needs, you can imagine how it resolves the friction between the object-oriented world and the relational database world. In fact, you don't need an Object-Relational Mapper like NHibernate or Entity Framework at all. A solution that uses raw SQL, something like Dapper or a NoSQL solution like RavenDB will work perfectly fine.
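Here is a small Python sketch of what that independence can look like, under a few assumptions of mine: events carry a sequence number and a date, the log is ordered by both, and the StockChanged event and StockLevels projection are invented for the example. The projection remembers a checkpoint so it can catch up whenever it likes, and a second instance stops at a cut-off date to show the state at that moment.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical event; the sequence number gives its position in the log.
@dataclass(frozen=True)
class StockChanged:
    sequence: int
    occurred_on: date
    product: str
    delta: int


class StockLevels:
    """Projection that catches up on its own schedule and remembers how far it got."""

    def __init__(self):
        self.levels = {}
        self.checkpoint = 0  # sequence number of the last event applied

    def catch_up(self, log, until=None):
        # Assumes the log is ordered by sequence and by date.
        for event in log:
            if event.sequence <= self.checkpoint:
                continue  # already projected
            if until is not None and event.occurred_on > until:
                break  # point-in-time projection: ignore anything later
            self.levels[event.product] = self.levels.get(event.product, 0) + event.delta
            self.checkpoint = event.sequence


log = [
    StockChanged(1, date(2016, 1, 10), "bolt", 100),
    StockChanged(2, date(2016, 2, 3), "bolt", -30),
    StockChanged(3, date(2016, 3, 15), "nut", 50),
]

# The "live" projection runs behind the domain and catches up in steps.
current = StockLevels()
current.catch_up(log[:2])  # first run sees only the first two events
current.catch_up(log)      # later run picks up where it left off

# A separate instance replays the same events up to a cut-off date.
end_of_january = StockLevels()
end_of_january.catch_up(log, until=date(2016, 1, 31))
```

Because the checkpoint belongs to the projection, rebuilding is nothing more than starting a fresh instance at checkpoint zero and replaying the log.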
Building a reporting node
Quite often, the first question you get when you tell somebody about Event Sourcing is how you build reports from that. Since all your events are persisted in a single database table (or whatever storage mechanism you use), it will be non-trivial to connect your ETL product of choice to it. But you do have two options here. First, you could build an Atom feed on top of your event store so that more sophisticated ETL products can subscribe to what happens in the event store. Greg Young's Event Store does just that. But if you really need a traditional relational model to dig through, you're not out of luck yet. Just build another set of projections that look like a relational database model, completely asynchronous from the other projections or your domain. You could even build some kind of replication process that allows you to run those projections on a completely different machine.
So what are your experiences with Event Sourcing? Any interesting usages of projections that you would not have been able to build without Event Sourcing? I'd love to know what you think, so leave a comment below. Oh, and follow me at @ddoomen to get regular updates on my everlasting quest for better projections.
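The second option could look roughly like the Python sketch below, which projects events into an ordinary relational table that a reporting tool can query with plain SQL. The invoice events, the table layout and the use of an in-memory SQLite database are all assumptions for the sake of the example; a real reporting node would use whatever database your reporting tools expect.

```python
import sqlite3

# Hypothetical events as they might come out of an event store, in order.
events = [
    {"type": "InvoiceRaised", "invoice_id": "inv-1", "customer": "Acme", "amount_cents": 12000},
    {"type": "InvoiceRaised", "invoice_id": "inv-2", "customer": "Acme", "amount_cents": 8000},
    {"type": "InvoiceRaised", "invoice_id": "inv-3", "customer": "Globex", "amount_cents": 5000},
    {"type": "InvoicePaid", "invoice_id": "inv-1"},
]

# The reporting database; in-memory here, a separate machine in practice.
db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE invoices (
        invoice_id   TEXT PRIMARY KEY,
        customer     TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,
        paid         INTEGER NOT NULL DEFAULT 0
    )
    """
)


def project(event):
    """Translate each event into a change on the relational reporting model."""
    if event["type"] == "InvoiceRaised":
        db.execute(
            "INSERT INTO invoices (invoice_id, customer, amount_cents) VALUES (?, ?, ?)",
            (event["invoice_id"], event["customer"], event["amount_cents"]),
        )
    elif event["type"] == "InvoicePaid":
        db.execute("UPDATE invoices SET paid = 1 WHERE invoice_id = ?", (event["invoice_id"],))


for event in events:
    project(event)
db.commit()

# A report is now an ordinary SQL query against the projected table.
outstanding = db.execute(
    "SELECT customer, SUM(amount_cents) FROM invoices "
    "WHERE paid = 0 GROUP BY customer ORDER BY customer"
).fetchall()
```

The reporting tables never feed back into the domain, so you can reshape or rebuild them whenever the reporting needs change.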