Event Processing
On enrichment - and the difference between BRMS and EP
A recent article on IBM developerWorks discusses two ways to enrich the data used by rules with data from external databases. One is to do the enrichment at the request level, before calling the "decision server" (the current name for using a BRMS through a request-response protocol); the other is to do the enrichment during the rule processing itself. The article describes how each option is implemented and discusses the pros and cons. The benefits of enrichment at the request level are less complexity and better performance of the rule component; the benefits of enrichment at the rule level are the ability to handle dynamic data and closer specialization to the exact data used by the rule.
Thinking about event processing, there is a similarity to the BRMS case: an event can be enriched either by the event producer or as part of the event processing itself, and the arguments are not far from those in the BRMS case. There is one fundamental difference, however: in event processing the work is not done using a "request-response" protocol. Moreover, the different parts of the system are decoupled, so the event producer does not necessarily know for what purposes the event is going to be used, and different uses may need different types of enrichment. The dynamic aspect applies here as well. In highly dynamic systems there may be race conditions between updates to the database that serves as the enrichment source and its use in enrichment, unless the enrichment locks the data in the database until it has been used by the event processing system. That requires the event processing system to exhibit transactional behavior for part of its work, but I will not get into this issue now.
Bottom line: the considerations in event-driven architecture are somewhat different from those in request-response systems, which are the most common kind in computing.
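To make the dynamic-data argument concrete, here is a minimal Python sketch. It is not taken from the developerWorks article; the in-memory `CUSTOMER_DB`, the `tier` attribute and both function names are hypothetical stand-ins for an external database and its lookups. It contrasts a value attached by the producer with a value read at processing time, after the reference data has changed.

```python
# Hypothetical reference data standing in for an external database.
CUSTOMER_DB = {"c1": {"tier": "gold"}, "c2": {"tier": "basic"}}


def enrich_at_producer(event, db):
    """Producer-side enrichment: attach the lookup result before emitting."""
    enriched = dict(event)
    enriched["tier"] = db[event["customer"]]["tier"]
    return enriched


def discount_consumer(event, db):
    """Consumer that enriches during processing, reading the current value."""
    tier = db[event["customer"]]["tier"]
    return 0.2 if tier == "gold" else 0.0


raw = {"customer": "c2", "amount": 100}
snapshot = enrich_at_producer(raw, CUSTOMER_DB)

# The reference data changes after the event was produced.
CUSTOMER_DB["c2"]["tier"] = "gold"

stale_tier = snapshot["tier"]                         # frozen at production time
late_discount = discount_consumer(raw, CUSTOMER_DB)   # sees the update
```

The producer-side copy is cheap for every consumer but can go stale; the processing-side lookup is current but costs a lookup per use and is exposed to the race conditions mentioned above.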
Mobile Apps Take Data Without Permission
Timo Elliot's presentation on "Business in the moment - from reactive to proactive"
Timo Elliot from SAP recently gave a talk at the Gartner BI meeting in London entitled "Business in the moment - from reactive to proactive". You can download the presentation from a link in Timo's Blog posting. In a follow-up post on his Blog, Timo refers to a FreshDirect example that illustrates the proactive behavior:
“FreshDirect has an operations center that manages its fleet of delivery trucks. In a large metropolitan area like New York, traffic doesn’t always flow predictably. A traditional approach to BI would be to print a report showing the level of on-time deliveries (OTDs) the day before and then ask the transportation department what went wrong for the orders that were delivered late. FreshDirect uses analytics in a more impactful way.”
“The company monitors the delivery rate of every truck and enters that data into the BI system on an ongoing basis. Every hour, it uses the previous hour’s data to predict how many deliveries will be on-time in the next hour. If the predicted OTD rate is below FreshDirect’s target, the company sends out an auxiliary truck or trucks to help make deliveries. The company holds 10 trucks in reserve for just this purpose.”
I'll bring more proactive stories when I find out about them...
ACM DEBS 2012- Second Call for Papers
2012: The Year Analytics Means Business
Killing elephants - is MapReduce dying?
My English teacher in the last grade of high school had an interesting taste in literature, and taught us the story "Shooting an Elephant" by Orwell. I was not a very good student in English and forgot about it until reading Colin Clark's Blog posting entitled "It's time to kill the elephant". From time to time various people claim that various things are dead or dying. Some readers may still remember the discussion about whether SOA is dead. Recently the Forbes Blog announced the death of ERP. Colin's contribution to the hunt is the observation that MapReduce is dying (or should be dying) and that batch processing should be replaced by more real-time processing. His evidence is that Google is dumping MapReduce and using Colossus for its search technology. While this fact is certainly true, I think that there are still many types of analytic procedures that are done off-line using batch processes, so while the use of real-time analytics will substantially increase given supporting infrastructure, I am not sure that batch will die soon (the same goes for SOA and ERP)... Old soldiers never die - they just fade away (s-l-o-w-l-y).
Crash course to build simple EP application using Esper
A crash course entitled "A simple introduction to complex event processing", claimed to take less than an hour, has been posted. It teaches by example, and the example is indeed very simple: finding a "decreasing" or "increasing" pattern over two consecutive events and setting the color to green or red. The main emphasis is on the setup: how to obtain, define and use events, and how to configure the engine (thread pools, listeners, etc.). However, there is not much about what event processing can actually do; this is probably the next lesson.
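For readers who want to see the shape of that example without installing anything, here is a plain Python sketch of the same idea. It does not use Esper or its query language; the function name, the numeric `prices` input, and the mapping of increasing to green and decreasing to red are assumptions made for illustration.

```python
def color_changes(prices):
    """Yield (previous, current, color) for each pair of consecutive events."""
    previous = None
    for current in prices:
        # Equal consecutive values are neither increasing nor decreasing.
        if previous is not None and current != previous:
            color = "green" if current > previous else "red"
            yield previous, current, color
        previous = current


ticks = [10.0, 10.5, 10.5, 9.8]
result = list(color_changes(ticks))
```

An engine such as Esper would express the pairing of consecutive events declaratively and deliver matches to a listener; the loop above only mimics that behavior for a single in-memory stream.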
Esper is contrasted with commercial products, since its open source model allows developers to play with it, use it for toy examples, and use it for daily work that is not necessarily a commercial application of a big enterprise. In our days of enterprise computing this approach certainly has a role to play. It should be noted that Esper is not the only open source offering in this area, and that some of the commercial products offer a free development version (not access to the source code, but enabling developers to use the product for these purposes for free).
Anyway -- if you wish to learn Esper, it is a good start.
On revision and compensation in event processing
There are a couple of reasons why I have returned to this topic now.
One of them is the investigation of uncertain events: since more information about uncertain events may be acquired over time, a revision might be needed.
The second is the work on future events. Future events are obtained using a forecasting process, which is not a one-time process but can be sensitive to additional events that happen between the forecast and the occurrence of the future event; thus the forecast itself may be revised.
The issue of revision may entail the need for compensation for decisions and actions already taken.
- In some cases it is easy, for example when no action was taken and it is still possible to take action.
- In some cases it is impossible, if an action has been taken and this action cannot be retracted.
- In the remaining cases it might be possible, but not always cost-effective, since it might have the cascading effect of compensating for a large number of actions.
After getting the students' work, I'll write again about this issue.
Uncertainty in event processing
Indeed, there has been a lot of work on uncertainty in data over the years in the research community, but very little of it got into products. The conception has been that while data may be noisy, a cleansing process is applied before the data is used. Now, with the "big data" trend, this assumption does not always hold: the nature of the data (streaming data that needs to be processed online), its volume, and its velocity imply that in many cases the data cannot be cleansed before processing, and that decisions may be based on noisy, sometimes incomplete or uncertain data. Veracity (data in doubt) was thus added as one of the four Vs of big data.
Uncertainty in events is not really different from uncertainty in data (which may represent either a fact or an event).
Some of the uncertainty types are:
- Uncertainty whether the event occurred (or forecast to occur)
- Uncertainty about when the event occurred (or forecast to occur)
- Uncertainty about where the event occurred (or forecast to occur)
- Uncertainty about the content of an event (attributes' value)
There are more uncertainties related to the processing of events:
- Aggregation of uncertain events (where some of them might be missing)
- Uncertainty whether a derived event matches the situation it needs to detect. This is a crucial point: the pattern indicates some situation that we wish to detect, but sometimes the situation is not well-defined by a single pattern. Example: a threshold-oriented pattern such as "event E occurs at least 4 times during one hour". Such a pattern has false positives and false negatives; if event E occurs only 3 times during an hour, it does not necessarily indicate that the situation did not happen.
We are planning to submit a tutorial proposal for DEBS'12 on uncertainty in events, and are now working on it. I'll write more on that during the next few months.
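One way to reason about the threshold example when the occurrence of each event is itself uncertain is to attach a probability to every reported event and compute the probability that the threshold was really reached. The Python sketch below is only an illustration: the function name is hypothetical, the events are assumed to fall inside the one-hour window already, and treating them as independent is a simplifying assumption.

```python
def prob_at_least(probabilities, k):
    """P(at least k of the independent uncertain events really occurred)."""
    # dist[j] = probability that exactly j of the events seen so far occurred
    dist = [1.0]
    for p in probabilities:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1.0 - p)   # this event did not occur
            nxt[j + 1] += mass * p       # this event occurred
        dist = nxt
    return sum(dist[k:])


# Three certain reports can never satisfy "at least 4 times".
certain_three = prob_at_least([1.0, 1.0, 1.0], 4)

# Five noisy reports: the pattern holds with some probability, not certainty.
noisy_five = prob_at_least([0.9, 0.9, 0.9, 0.9, 0.5], 4)
```

With such a value the derived event can carry its own certainty instead of a hard yes or no, which is one possible answer to the false positive and false negative problem described above.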
Why Alerts Suck and Monitoring Solutions Need to Become Smarter
Risk-Based Passenger Screening Could Make Air Travel Safer
Modelling Choreography (with events, states and business rules)
This week the BCS SPA group held a fascinating session titled “Modelling Choreography” by requirements analyst Ashley McNeile.
Ashley described some of the past efforts to model and implement choreographies, using types of process algebra such as Robin Milner’s Calculus of Communicating Systems (CCS) and its derivative Pi-Calculus. However, Ashley used sequences of events and states (i.e. a state diagram), which he also compared to Michael Jackson’s formalised object lifecycles (e.g. JSD / Jackson Diagrams). Various W3C efforts have described choreographies too - e.g. WS-CDL. Of course the latest modelling construct for choreography is BPMN2!
To illustrate his practice, Ashley described an example - modelling bank account transactions via Protocol Modelling (using simple state diagrams):
- state model 1: defined the close and withdraw events on an active account
- state model 2: defined the freeze and release account events
- state model 3: this had no state transitions, but defined the state by the associated constraints (or business rules)
  - if balance < 0 then account state is overdrawn
  - one cannot close an account if it is overdrawn
- all 3 state models operate in parallel.
To analyse these state models they can be combined into a single state model (with all combinations of states, and all events), and then the unreachable states can be filtered out. The interesting thing here is (1) the analysis of state models for completeness and (2) the use of incomplete state diagrams as a business notation for textual (policy or constraint) business rules.
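The parallel composition of the three bank account models can be sketched in a few lines of Python. This is not Ashley's notation or tooling; the class, the state names `released` and `frozen`, and the rule that a frozen account refuses withdrawals are assumptions added for illustration. The only rules taken from the talk are that an overdrawn state is derived from balance < 0 and that an overdrawn account cannot be closed.

```python
class Account:
    """Three small models composed in parallel over one account."""

    def __init__(self, balance=0):
        self.balance = balance
        self.lifecycle = "active"     # model 1: active -> closed
        self.access = "released"      # model 2: released <-> frozen

    def overdrawn(self):
        # Model 3 has no transitions; its state is derived from the data.
        return self.balance < 0

    def allowed(self, event):
        """An event is accepted only if no model that knows it refuses it."""
        checks = {
            "withdraw": [self.lifecycle == "active", self.access == "released"],
            "close": [self.lifecycle == "active", not self.overdrawn()],
            "freeze": [self.access == "released"],
            "release": [self.access == "frozen"],
        }
        return all(checks[event])

    def handle(self, event, amount=0):
        if not self.allowed(event):
            return False
        if event == "withdraw":
            self.balance -= amount
        elif event == "close":
            self.lifecycle = "closed"
        elif event == "freeze":
            self.access = "frozen"
        elif event == "release":
            self.access = "released"
        return True


acct = Account(balance=50)
log = [
    acct.handle("withdraw", 80),   # accepted, balance becomes -30
    acct.handle("close"),          # refused: the account is overdrawn
    acct.handle("freeze"),         # accepted
    acct.handle("withdraw", 10),   # refused: the account is frozen
]
```

Each model stays small and readable on its own, and the business rule about overdrawn accounts appears once, as a guard on `close`, rather than being scattered over a single large combined diagram.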
Other observations:
- These types of business rule apply to states and data; they can be extracted and modified (by a developer, or state modeller) into event rules or guards in a state transition diagram. Is it interesting to specify these business rules up front before mapping to events and processes? Yes from a business perspective, as new events or states might affect or be affected by existing business rules.
- Using a state to specify a business rule (in terms of the state and output) is an interesting notation that lends itself well to mapping to appropriate events (or indeed processes). Could it catch on in the business rule community?
- The use of an explicit choreography language has not had much success it seems. Google WS-CDL and most entries are dated 2009 or earlier. BPMN2’s choreography may yet prove useful but possibly the concepts are too difficult for business modellers yet imply a co-operative design process for developers that rarely occurs in practice (beyond “this is the interface”!).
- At the end of the day, the sequence of events in a business system is just a complex event - which maybe can tell you if the choreography is valid or not.
I’ll add a link to the slides to help explain all the above when they become available…
Annex: a Distributed System Choreography Development Process:
The following steps describe how to develop state diagrams for choreography purposes:
- Define participants and messages (/events) that interact between them
- Define states with events as messages from and to, with only 1 sender per state
- Project the states out to individual participants - i.e the parts of the state model for each participant - allowing ambiguous states but ensuring these have no sends
- Merge the states for each participant
- Enact - check each event at a time to prove feasibility of the interacting state models
Event Processing Platforms vs Engines
Opher Etzion just made an interesting classification of the CEP tools market in his observations on the Bloor Research comments on CEP and Big Data, part of an increasing amount of coverage on CEP. To wit:
- Event Processing Platform is a software that enables the creation of event processing network, handle the routing of events among agents, management, and other common infrastructure issues.
- Event Processing Engine is a software that enables the creation of the actual function - in the EPN term implementing agents.
In the CEP Market analysis we don’t try to distinguish between these - probably because it would be contentious. For example, to some folks an “event processing network” is managed as a single process - possibly multi-threaded, but bounded on a single machine instance. To others (like TIBCO) the network is a message or event distribution mechanism for breaking the constraints of a single process or system (e.g. performance, scalability, and fault tolerance constraints). Furthermore “event processing agents” might be viewed as “event processing operations” - like a single pattern detection query, or a pattern matching rule, arranged in some kind of activity or business process diagram - or as more autonomous processing agents that can handle a number of operations and cooperate declaratively towards some solution.
If one views an Event Processing Platform as one that handles routing across multiple processes and distributed systems, then the number of potential candidates is reduced somewhat [*1]. Of course, any CEP engine can be used across multiple systems with a shared middleware infrastructure, but individually they are “blind” to the other agents and the design tools do not handle the cooperative nature of the agents. Of course, one can set up a message type to include management information to allow for some semblance of distributed control, but this is more likely to be a developer task than a platform capability.
Looking at something like TIBCO BusinessEvents, we can see this satisfies the requirements of a (physically distributed) Event Processing Platform:
- Enables a (computer) network of event processing agents - typically as a minimum of rule agents and cache /datagrid agents, in pretty much any configuration.
- Enables a (single process) network of event processing operations - typically the network is implemented as declarative rules, but can be visualised as a network in a report.
- Enables different types of Event Processing Engines - apart from the rule agents, you can also have (continuous) query agents. Rule agents can also be customised as “decision agents” (executing decision rules, or decision tables), “analytics agents” (executing predictive analytics models in Spotfire S+ or R), or “optimization agents” (executing NuOpt optimization routines in Spotfire Statistical Services) [*2]
Notes:
[*1] Other candidates for an Event Processing Platform across distributed systems include IBM Infosphere Streams (although IBM is very quiet these days about that), and EventZero. If there are any others, please mention them in the comments; if there are enough, we’ll update the Market Analysis with this classification…
[*2] Note that invoking Spotfire services involves invoking the Spotfire platform under the control of a rules agent; from an architecture point of view these are just SOA services, like calling BusinessWorks services during event processing.
On "CEP and Big Data 2" - comments on Philip Howard's observations.
Philip deals with four issues:
- whether the name CEP is appropriate or should be changed?
- who should be credited as the pioneer of this area?
- whether CEP implies real-time processing?
- who are the CEP big data platforms?
Here is a summary of my views on each of these topics.
The name "Complex Event Processing"
Exactly four years ago I posted on this Blog an explanation of "why I prefer to use the name event processing without any prefix, infix or suffix". My particular dislike of the term "complex event processing" stems from the ambiguity in the name: some people (including David Luckham, who coined the term) view it as processing of complex events, while others interpret it as complex processing of events, and then debate when something is complex enough and what type of complexity is needed to qualify as CEP. Moreover, some of the vendors use this term for products that are neither of the two. I think that two words are enough for the name of a discipline; examples are information retrieval, machine learning, image processing and many more. Thus, from my point of view the term "event processing" subsumes all other terms like complex event processing, business event processing, event stream processing and more.
Who gets the pioneering credit
Philip, as a good UK patriot, wonders why Wikipedia and other sources give credit to David Luckham and forget the Apama work that came from Cambridge, UK. Looking at the Wikipedia entry, it has one mention of David, as well as other references (like our EPIA book). It indeed does not mention Apama or any paper by John Bates, but this being Wikipedia, anybody can suggest additions.
David Luckham had a major influence on this area, since he was the first to publish a full book and expose the young area to the general public. An article in IEEE Computer, published in 2009, investigated the history of the area and determined that in the 1990s there were four parallel projects that can be classified as starting points: David Luckham's project at Stanford, John Bates' project in Cambridge (UK, not Boston), Mani Chandy's at Caltech, and our Amit project at the IBM Haifa Research Lab. I share Philip's view that John Bates should have full credit as one of the pioneers, and still view David Luckham as the "elder statesman" of the community.
Is CEP necessarily associated with real-time?
I have written several times about this topic, most recently in response to Chris Carlson, to whom Philip also responds. There is some abuse of the term real-time in the industry: while its meaning is "within time constraints", many people interpret it as "with very low latency". These are not the same. Event processing is a functionality with some applications that require very low latency, some that require reacting within real-time constraints (which can be: 2 hours), some that require both, and some that require neither.
Who are the CEP big data platforms?
I have taken upon myself the limitation not to state opinions on commercial products within this Blog, leaving that to analysts. Thus I will make only one comment. There is a distinction between two types of software entities, which are sometimes confused in the language people use:
- Event Processing Platform is a software that enables the creation of event processing network, handle the routing of events among agents, management, and other common infrastructure issues.
- Event Processing Engine is a software that enables the creation of the actual function - in the EPN term implementing agents.
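The split between the two roles can be illustrated with a toy Python sketch. Everything in it is hypothetical (the `Platform` class, the agent functions, the event types); it is not modelled on any product. The platform part only knows about routing events among agents, while the agent functions, standing in for what an engine would execute, hold the actual processing logic.

```python
from collections import defaultdict


class Platform:
    """Hypothetical platform: owns the network and routes events to agents."""

    def __init__(self):
        self.routes = defaultdict(list)   # event type -> subscribed agents
        self.outputs = []                 # events that no agent consumes

    def subscribe(self, event_type, agent):
        self.routes[event_type].append(agent)

    def publish(self, event_type, payload):
        agents = self.routes.get(event_type, [])
        if not agents:
            self.outputs.append((event_type, payload))
            return
        for agent in agents:
            # Each agent returns zero or more derived (type, payload) events.
            for derived_type, derived_payload in agent(payload):
                self.publish(derived_type, derived_payload)


def filter_large(payload):
    """Engine-side logic for a filter agent: keep orders above 100."""
    if payload["amount"] > 100:
        yield ("large_order", payload)


def to_alert(payload):
    """Engine-side logic for a transformation agent."""
    yield ("alert", "review order %s" % payload["id"])


platform = Platform()
platform.subscribe("order", filter_large)
platform.subscribe("large_order", to_alert)
platform.publish("order", {"id": "A1", "amount": 250})
platform.publish("order", {"id": "A2", "amount": 20})
```

In this sketch the agent logic could be swapped for a different engine without touching the routing, and the routing could be distributed across machines without touching the agents, which is the point of keeping the two notions apart.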
On spime
According to Wikipedia: Spime is a neologism for a currently theoretical object that can be tracked through space and time throughout the lifetime of the object. The name “spime” for this concept was coined by author Bruce Sterling.
Spime comes from the combination of the words space and time, and is said to be enabled by the Internet of Things. In event processing terminology, a spime is the collection of events that happened to a single entity during its life-span, where each event has both time and space properties recorded as part of the event. Any person may have a spime associated with them, which can span from birth and actually last long after the person's death; e.g. if I am writing now about Isaac Asimov, this can be considered an event in Asimov's spime, although he is not a living entity. Spimes can also relate to something of more limited length, like a certain flight, or the event processing course I taught this semester.
In some cases it makes more sense to have spime processing rather than individual event processing, and to have patterns associated with spimes. This, of course, has a strong relationship to event processing. I've recently started to look at spime processing and will write more about it in the future.
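As a rough illustration of the event processing reading of a spime, the Python sketch below keeps the events of one entity together, each with a time and a place. The class and field names, the flight identifier and the airport codes are all hypothetical; the sketch only shows that patterns can then be asked of the whole spime rather than of single events.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    entity: str
    kind: str
    time: float    # e.g. hours since the start of observation
    place: str


@dataclass
class Spime:
    entity: str
    events: list = field(default_factory=list)

    def add(self, event):
        if event.entity != self.entity:
            raise ValueError("event belongs to another entity")
        self.events.append(event)
        self.events.sort(key=lambda e: e.time)   # keep chronological order

    def span(self):
        """Time of the first and last recorded event."""
        return self.events[0].time, self.events[-1].time

    def places(self):
        """Distinct places in order of first visit."""
        seen = []
        for e in self.events:
            if e.place not in seen:
                seen.append(e.place)
        return seen


flight = Spime("flight-123")
flight.add(Event("flight-123", "landed", 5.5, "LHR"))
flight.add(Event("flight-123", "boarding", 0.0, "TLV"))
flight.add(Event("flight-123", "takeoff", 1.0, "TLV"))
```

Here `flight.span()` and `flight.places()` are examples of questions that only make sense over the whole history of the entity.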
On Pecha Kucha
There is a YouTube presentation, itself in Pecha Kucha style, about how to prepare Pecha Kucha style presentations. I should try it once. There are also Pecha Kucha nights, which seem to be marathons of Pecha Kucha presentations.
Is computer science a science or engineering?
I remember a heated discussion at a conference years ago about whether computer science is a science or engineering. My daughter had a "science day" at the high school that she'll attend next year, and while they teach computer science, they don't view it as a science; for them science consists of biology, chemistry, physics and some of their derivatives.
Recently I came across an article in "Scientific American", about U.S. science degrees. In this article, as you can see in the picture below, computer science is neither classified as science nor as engineering, it is actually classified as technology. Interesting -- I think that computer science is not monolithic, and various sub-disciplines may be classified differently.
CEP and Big Data 2
Collision in the Making Between Self-Driving Cars and How the World Works
Human Event Processing at WEF
“Gentlemen’s magazine” Esquire has an article by Ryan D’Agostino about TIBCO CEO Vivek Ranadive and mentions the new tibbr-based application for coordinating strategies and tactics among world leaders at WEF.
TopCom, … is a private communications platform for the two hundred most powerful people in the world.
TopCom is being officially launched in late January at the annual meeting of the World Economic Forum in Davos, Switzerland. It is basically a customized, ridiculously secure version of tibbr, a platform developed by Tibco as a kind of combination Facebook, Twitter, e-mail, texting, and Skype. It is a private social network, essentially - in this case, for world leaders.
… The top two hundred WEF members - basically, the people who run the world - can speak to one another on a given subject, and then they can choose to loop in members from lower tiers (experts, academics, etc.) as needed, widening the pool of knowledge on whatever problem is on the table.
…Tibco consulted with both the Japanese prime minister at the time of last year’s tsunami, Naoto Kan, and his successor, Noda, when it was developing its presentation for the WEF board of directors, to find out what would have been useful to them at the time of the disaster. Schwab, too, collaborated. The result, which will be on display in Davos, is the first time a global organization will introduce its own proprietary communications platform. …
