Stream Punctuation and RDF Stream Processing

Definition by Tucker et al. [1] and Maier et al. [2]:

''A punctuation is a pattern p inserted into the data stream with the meaning that no data item i matching p will occur further on in the stream.''

For event processing systems, events are the fundamental unit of information [3]. This means each event is processed atomically, i.e. completely or not at all. For RDF stream processing systems this can cause problems if events are modelled as graphs consisting of multiple quadruples: how can the receiver of an event know that all quadruples pertaining to the event have been transmitted, so that it can start processing the event?

For streams of RDF graphs, punctuation can be used like this: a punctuation is a pattern ''p'' inserted into the quadruple stream with the meaning that no quadruple i belonging to graph p will occur further on in the stream.
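To make graph-level punctuation concrete, here is a minimal Python sketch of a receiver that buffers quadruples per graph and hands a graph over as one atomic event as soon as a punctuation for that graph arrives. It assumes in-band punctuation, i.e. a special marker item in the stream, modelled by a hypothetical `Punctuation` class; quads are plain `(subject, predicate, object, graph)` tuples, and `assemble_events` is an illustrative name, not part of any RDF library.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union

# A quad as (subject, predicate, object, graph); plain strings for simplicity.
Quad = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Punctuation:
    """Hypothetical in-band marker: no more quads from `graph` will follow."""
    graph: str


def assemble_events(
    stream: Iterable[Union[Quad, Punctuation]],
) -> Iterator[Tuple[str, List[Quad]]]:
    """Buffer quads per graph; emit a graph as one event once it is punctuated."""
    pending = defaultdict(list)  # graph name -> quads received so far
    closed = set()               # graphs that have already been punctuated
    for item in stream:
        if isinstance(item, Punctuation):
            closed.add(item.graph)
            # The punctuation guarantees completeness, so the whole graph
            # can be processed atomically and its buffer released.
            yield item.graph, pending.pop(item.graph, [])
        else:
            graph = item[3]
            if graph in closed:
                # A quad after its graph's punctuation violates the contract.
                raise ValueError(f"quad for already punctuated graph {graph}")
            pending[graph].append(item)
    # Graphs that were never punctuated stay buffered: incomplete events
    # are not emitted.


# Usage: quads of two events arrive interleaved.
stream = [
    ("ex:e1", "rdf:type", "ex:Event", "ex:g1"),
    ("ex:e2", "rdf:type", "ex:Event", "ex:g2"),
    ("ex:e1", "ex:temp", "42", "ex:g1"),
    Punctuation("ex:g1"),
    Punctuation("ex:g2"),
]
for graph, quads in assemble_events(stream):
    print(graph, len(quads))
```

Because the buffer is keyed by graph, quads of different events may be interleaved on the wire; each event is released only when its own punctuation arrives.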

Punctuation could be implemented using special ("magic") quadruples, but when using the Web stack(!) we can do punctuation out-of-band, i.e. implement it on a lower layer of the stack. For example, we can communicate using ''chunked transfer encoding'' from HTTP/1.1 (Fielding et al. 1999, Section 3.6.1) [4]. Each chunk contains one complete graph, so once a chunk has been received, the receiver knows that the event is complete and can process it further in an atomic fashion: it is guaranteed that no more quads for this graph will arrive later. With HTTP chunked connections, no special (or magic) quads are needed.
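The chunk framing itself is simple, so the idea can be sketched directly on the raw message body. The Python sketch below assumes each graph is serialized as N-Quads text and that the application sees the undecoded body bytes; `encode_chunks` and `decode_chunks` are hypothetical helper names. In practice many HTTP client libraries decode the chunked coding transparently and do not expose chunk boundaries, so a receiver relying on them may need lower-level access.

```python
from typing import Iterable, Iterator


def encode_chunks(graphs: Iterable[str]) -> bytes:
    """Frame each serialized graph (assumed N-Quads text) as one HTTP/1.1 chunk."""
    out = bytearray()
    for graph in graphs:
        data = graph.encode("utf-8")
        if not data:
            continue  # a zero-size chunk would signal the end of the body
        # chunk = chunk-size (hex) CRLF chunk-data CRLF
        out += f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk followed by an empty trailer section
    return bytes(out)


def decode_chunks(body: bytes) -> Iterator[str]:
    """Yield each chunk as one complete graph; the chunk boundary acts as punctuation."""
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk size is hexadecimal; ignore any ";name=value" chunk extensions.
        size = int(body[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return  # last-chunk: no more graphs (trailers are not handled here)
        data = body[pos:pos + size]
        if len(data) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        pos += size + 2
        yield data.decode("utf-8")


# Usage: two single-quad graphs travel as two chunks.
g1 = '<http://example.org/e1> <http://example.org/p> "a" <http://example.org/g1> .\n'
g2 = '<http://example.org/e2> <http://example.org/p> "b" <http://example.org/g2> .\n'
wire = encode_chunks([g1, g2])
for graph in decode_chunks(wire):
    print(repr(graph))
```

One design caveat: RFC 2616 (Section 13.5.1) lists Transfer-Encoding as a hop-by-hop header, so the sketch assumes chunk boundaries survive end to end, i.e. that no intermediary re-chunks the stream.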

''Chunked transfer encoding'' is also used by the RDF publish/subscribe middleware Ztreamy [5] to provide long-lived connections over pure HTTP with the goal of disseminating events to subscribers. Further related work [6] investigates the exchange of RDF over different protocols, such as XMPP on top of HTTP (and thus TCP), and even UDP. However, none of these approaches provides pure HTTP stream URIs that can easily be referenced in Linked Data.


  1. Tucker, P.; Maier, D.; Sheard, T. & Fegaras, L. Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 2003, 15, 555-568 [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1198390]

  2. Maier, D.; Li, J.; Tucker, P.; Tufte, K. & Papadimos, V. Semantics of Data Streams and Operators. Proceedings of the 10th International Conference on Database Theory, Springer-Verlag, 2005, 37-52 [http://datalab.cs.pdx.edu/niagaraST/icdt05.pdf]

  3. Gupta, A. & Jain, R. Managing Event Information: Modeling, Retrieval, and Applications. Morgan & Claypool Publishers, 2011

  4. Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P. & Berners-Lee, T. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616, RFC Editor, 1999 [http://www.w3.org/Protocols/rfc2616/rfc2616.html]

  5. Fisteus, J. A.; García, N. F.; Fernández, L. S. & Fuentes-Lorenzo, D. Ztreamy: A middleware for publishing semantic streams on the Web. Web Semantics: Science, Services and Agents on the World Wide Web, 2014, 25, 16-23

  6. Shinavier, J. Optimizing real-time RDF data streams. CoRR, 2010, abs/1011.3595 [http://arxiv.org/abs/1011.3595]

Highlights from Research Project PLAY

My research project PLAY had its final review meeting on Tuesday, November 26th, 2013.

Notable highlights from the project are:

  • Use of open Web Standards for event modelling, pattern modelling and access control for real-time/streaming data
  • Integrated Prototype (delivering a runnable architecture with all Open Source components)
  • Contributions to the Open Source community, e.g. continued maintenance of RDF2Go, and MultiActive objects for ProActive

Benefits PLAY can provide as a semantics-based event-driven platform in a nuclear crisis management use case:

  • Eliminating superfluous, inaccurate or irrelevant information
  • Automating some analysis or actions based on predefined business rules
  • Reducing the time of information transmission between devices, stakeholders and decision makers
  • Increasing the reliability of knowledge (exhaustiveness)
  • Improving the agility of the crisis stakeholders

Benefits and opportunities PLAY can provide to ORANGE Telecom as an Event-driven Architecture and elastic platform:

  • QoS problems experienced by customers of LiveBox Pro can be detected in real time
  • Improvement of knowledge about customer experience, and reduction of after-sales costs for residential customers of 2G/3G mobile data services
  • Ability to compare PLAY with, and challenge, other Open Source platforms such as Storm/Kafka + Hadoop/HBase, as well as commercial middleware products

Links:

  1. Opher Etzion (who is a reviewer of PLAY together with Silvia Vecchi) blogged about PLAY: http://epthinking.blogspot.de/2013/11/on-play-project.html
  2. Project reports and published papers are here: PLAY publications


Tutorial on Complex Event Processing

I gave a tutorial on Complex Event Processing, and more specifically on the PLAY semantic event format and query language, at the Winter School on Knowledge Technologies for Complex Business Environments in Ljubljana on December 2, 2011, at the Faculty of Mechanical Engineering (FME).

CEP - Complex Event Processing

Roland Stühmer

Abstract:

Real-time capability has become one of the crucial characteristics of modern applications and is changing the game in data processing. Because it supports continual monitoring, real-time data processing has become an important mechanism in many application areas: traffic management, logistics, eHealth and smart grids, to name but a few. In this talk we present technologies for processing real-time data on the fly, the challenges involved, and possible solutions to these challenges, such as using Web-friendly standards to create open and extensible systems for real-time data.

See also: