Sergey Babkin, Sr. Engineer
It’s been nagging me for a while how much the model we have, with its consistency rules, resembles the logical circuits in hardware. This seems to be true for pretty much any Complex Event Processing (CEP) engine, but especially so for the Aleri approach, which treats the streams as a sort of materialized view, with changes in the underlying tables being propagated throughout the dependent views.
The data processing in the models is a pure waterfall. Each stream takes data from upstream, does some processing, and outputs the result downstream. Exactly like the logical circuits! The data comes in here, goes through a few intermediate steps, and the results come out there. No loops. And for a good reason: as every hardware engineer or data flow engineer knows, loops that are used carelessly can easily bring the device into a generation mode. Sometimes that’s desired, but usually “generation” means that the device produces random junk in ever-increasing quantities.
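The waterfall described above can be illustrated with a rough sketch. This is not Aleri code or any real CEP API; the Stream class and its connect and push methods are hypothetical names made up for the illustration, and the sketch only shows records flowing one way through a loop-free chain.

```python
class Stream:
    """Hypothetical stream node: transforms a record and passes it downstream."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # record -> record, or None to drop the record
        self.downstream = []        # streams fed by this one
        self.seen = []              # records this stream has emitted, kept for inspection

    def connect(self, other):
        """Attach a downstream stream and return it, so connections can be chained."""
        self.downstream.append(other)
        return other

    def push(self, record):
        """Process one record and propagate the result downstream."""
        result = self.transform(record)
        if result is None:
            return
        self.seen.append(result)
        for stream in self.downstream:
            stream.push(result)


# A three-stage waterfall: source -> doubled -> large. No stream feeds back upstream.
source = Stream("source", lambda r: r)
doubled = Stream("doubled", lambda r: r * 2)
large = Stream("large", lambda r: r if r > 5 else None)
source.connect(doubled).connect(large)

for value in (1, 2, 3, 4):
    source.push(value)
```

Connecting large back to source in this sketch would make push recurse without end, which is a small-scale picture of the runaway behavior that loops can cause.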
The hardware people have learned to deal with it by using memory. A device can be seen as an automaton, with the state of the automaton being the “memory”:
input -> [ device (logic, memory) ] -> output
At each step the device takes the input signal, processes it according to the current memory state, and produces the output and the new memory state. How does it know when to make the next step? The synchronization signal does that. So it can be expressed more precisely as:
(input, synchronization) -> [ device (logic, memory) ] -> output
The synchronization signal may have a fixed period (as we’re used to from reading about CPUs) or be determined from the input. An important point is that once a synchronization cycle starts, the input can’t change any more. We collect the input and then say “that’s it, it’s processing time”, and we stop accepting input; any input coming in after that stays beyond the synchronization barrier. Then we process the collected input, send out the output, and update the memory state. The cycle has completed, and we’re ready for the next one: take in the input collected beyond the synchronization barrier, and repeat.
(Note: this also has direct analogies in RDBMS transactions, but more on this later.)
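The synchronization cycle can be sketched in a few lines of Python. This is only an illustration under assumptions: ClockedDevice, feed, sync and running_total are hypothetical names invented here, not part of any hardware description language or CEP product, and the "logic" is a toy running total.

```python
class ClockedDevice:
    """Hypothetical automaton: logic plus memory, advanced by a sync signal."""

    def __init__(self, logic, memory):
        self.logic = logic    # (batch, memory) -> (output, new_memory)
        self.memory = memory  # state carried from one cycle to the next
        self._pending = []    # input collected while waiting for the next cycle

    def feed(self, item):
        """Accept input; it is only queued until the next synchronization."""
        self._pending.append(item)

    def sync(self):
        """Run one cycle: freeze the collected input, process it, update memory."""
        # Input fed from this point on lands in the fresh list, beyond the barrier.
        batch, self._pending = self._pending, []
        output, self.memory = self.logic(batch, self.memory)
        return output


def running_total(batch, memory):
    """Toy logic: add the batch to the remembered total and output the total."""
    total = memory + sum(batch)
    return total, total


device = ClockedDevice(running_total, 0)
device.feed(1)
device.feed(2)
first = device.sync()   # processes [1, 2] against memory 0
device.feed(10)
second = device.sync()  # processes [10] against the memory left by the first cycle
```

The swap at the top of sync is the barrier: the batch being processed can no longer change, and anything arriving during processing waits for the following cycle.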
What the CEP model lacks for the hardware analogy to be complete is the “memory”: the state of the whole model. Sure, each stream has its own state, and this state may have non-trivial dependencies on the order of data received. Aleri FlexStream is a good example of this. But this state is localized to a stream. It’s not visible to the model as a whole and does not directly affect other streams. And there is no easy way to make it so, since each stream operates independently, and data may be queued between them in an unpredictable way.
But there are components that can be strung together to break out of the waterfall paradigm and to build loops, global model state, and proper synchronization. More on this in the next installment…