Just read an interesting article over at Intelligent Enterprise, titled “Report Warns Of Data Warehouse ‘Bottleneck’ In Real-Time Analytics”. It certainly reflects what we have been seeing: people are looking to Event Processing technology (whether you call it Complex Event Processing or Event Stream Processing) to overcome the limitations of more traditional data collection, analysis and reporting technologies - such as operational data stores and data warehouses - when it comes to real-time analysis and monitoring.
A couple of observations:
“Real-time” means different things to different people. While the CS purists are quick to point out that real-time really just refers to predictable response time, in general usage people take it to mean “right now” or “as soon as something happens”. But as the article points out, for some users, a few seconds or even a few minutes can be “right now”. For someone who goes from an overnight report to data that’s a few minutes old, it certainly feels like “right now”. But for other users, and other applications, “right now” can mean milliseconds. Placing a trade in the markets before the opportunity is lost requires sub-second response. Shutting down that nuclear reactor that just sprung a leak…I hope they don’t consider a few minutes to be “right now”.
This is where event processing can help. Data can be streamed into an event processor as soon and as fast as it appears, and the event processor can apply complex rules and operations to combine it with other data, filter, summarize, compute new values, look for patterns, etc. - producing a stream of new “events” as a result. The lag time between the input and the output - for a high-performance complex event processing (CEP) engine - will be sub-second (or even sub-millisecond in some cases).
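To make the idea concrete, here is a minimal sketch in plain Python of the kind of logic an event processor applies: it filters incoming events, keeps a small rolling summary per key, and emits a new event when a pattern shows up. This is an illustration only, not Aleri’s engine or its modeling language; the names `Tick`, `Alert` and `detect_spikes`, the window size, and the 5% threshold are all assumptions made up for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Tick:
    """Hypothetical input event: one price update for a symbol."""
    symbol: str
    price: float


@dataclass(frozen=True)
class Alert:
    """Hypothetical output event: a price that jumped away from its recent average."""
    symbol: str
    price: float
    average: float


def detect_spikes(ticks: Iterable[Tick], window: int = 3,
                  threshold: float = 0.05) -> Iterator[Alert]:
    """Yield an Alert whenever a price differs from the average of the
    previous `window` prices for the same symbol by more than `threshold`."""
    history = defaultdict(lambda: deque(maxlen=window))
    for tick in ticks:
        if tick.price <= 0:
            # Filter step: drop malformed input.
            continue
        recent = history[tick.symbol]
        if len(recent) == window:
            # Summarize step: rolling average over the last `window` prices.
            average = sum(recent) / window
            # Pattern step: emit a new event as soon as the condition holds.
            if abs(tick.price - average) / average > threshold:
                yield Alert(tick.symbol, tick.price, average)
        recent.append(tick.price)
```

Because `detect_spikes` is a generator, each output event is produced as soon as the input event that triggers it is consumed, rather than after a batch has been collected. A real CEP engine does far more (joins across streams, time-based windows, persistence), but the shape is the same: events in, derived events out.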
One thing I think the article didn’t mention is the case that we see most often: using an event processor to complement a data warehouse. The article talks about event stream processing as an alternative to a data warehouse, but frankly I have yet to see a customer use Aleri’s CEP engine that way. What I do see is CEP being deployed alongside the warehouse. The data warehouse still serves as the ultimate data repository, and the event processor takes on one (or both) of two functions:
1) as a real-time monitoring application. The event processor sits alongside the data warehouse, monitoring and analyzing the same data that is flowing into the data warehouse. But while the data warehouse holds the data to make it available for querying, the event processor analyzes it as soon as it arrives in order to generate alerts, initiate an immediate response, or make low-latency information available to a user to support real-time decision making.
2) as a front-end pre-processor for data flowing into the data warehouse. This is the case that I think gets overlooked a lot. You can almost think of this as “STL” in the sense that it’s an alternative to ETL that is event-driven. Here, data streams into the event processor in real time. The event processor correlates and transforms the data as it arrives, and collects it to be loaded into the data warehouse in micro-batches. Micro-batch loading works around the limit on how fast the data warehouse can absorb data, and at the same time avoids large batches that introduce unnecessary latency. Because the event processor can combine data from different sources and transform it in real time, the data gets loaded into the data warehouse in a format that is optimal for reporting. This can significantly improve the speed of applications that use the data in the warehouse, since the data is already in the form they need, rather than being stored in the warehouse in a raw form and then combined and transformed at query time.
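The micro-batch idea in the second case can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how Aleri’s product does it: `MicroBatcher`, `transform`, the event fields (`symbol`, `price`, `qty`) and the `load` callback standing in for a warehouse bulk loader are all hypothetical names invented for the example.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def transform(event: Row) -> Row:
    """Hypothetical transformation: reshape a raw event into the
    report-ready row the warehouse tables are assumed to expect."""
    return {
        "symbol": str(event["symbol"]).upper(),
        "notional": float(event["price"]) * int(event["qty"]),
    }


class MicroBatcher:
    """Buffer transformed rows and hand them to `load` in small batches.

    A batch is flushed when it reaches `max_rows` or when its oldest row
    is at least `max_age_s` seconds old. For simplicity the age is only
    checked when a new event arrives; a production version would also
    flush from a timer so a quiet stream cannot strand rows in the buffer.
    """

    def __init__(self, load: Callable[[List[Row]], None],
                 max_rows: int = 500, max_age_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load            # stand-in for a warehouse bulk loader
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock          # injectable so the logic can be tested
        self._buffer: List[Row] = []
        self._oldest: Optional[float] = None

    def on_event(self, event: Row) -> None:
        """Transform one event as it arrives and buffer the result."""
        now = self._clock()
        if not self._buffer:
            self._oldest = now
        self._buffer.append(transform(event))
        too_big = len(self._buffer) >= self._max_rows
        too_old = now - self._oldest >= self._max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        """Load whatever is buffered as a single batch, if anything."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._oldest = None
            self._load(batch)
```

The two limits are the knobs that trade warehouse load against latency: a larger `max_rows` means fewer, bigger loads, while a smaller `max_age_s` caps how stale the warehouse can get. The right values depend entirely on the warehouse and the workload.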
Bottom line: at Aleri we see event processing (or CEP) as a powerful ally of the data warehouse rather than as an alternative to it.