Stream Punctuation and RDF Stream Processing

Definition by Tucker et al. [1] and Maier et al. [2]:

''A punctuation is a pattern p inserted into the data stream with the meaning that no data item i matching p will occur further on in the stream.''

For event processing systems, events are the fundamental unit of information [3]. This means each event is processed atomically, i.e. completely or not at all. For RDF stream processing systems this can cause problems if events are modelled as graphs consisting of multiple quadruples: how can the receiver of an event know that all quadruples pertaining to the event have been transmitted, so that it can start processing the event?

For streams of RDF graphs, punctuation can be used as follows: a punctuation is a pattern ''p'' inserted into the quadruple stream with the meaning that no further quadruples belonging to the graph identified by ''p'' will occur in the stream.
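The in-band variant of this idea can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Punctuation` class, the `complete_graphs` function and the representation of quadruples as 4-tuples of strings are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple, Union

# Assumed representation: (subject, predicate, object, graph)
Quad = Tuple[str, str, str, str]


class Punctuation:
    """Hypothetical marker: no further quads for `graph` will follow."""

    def __init__(self, graph: str) -> None:
        self.graph = graph


def complete_graphs(
    stream: Iterable[Union[Quad, Punctuation]]
) -> Iterator[Tuple[str, List[Quad]]]:
    """Buffer quads per graph and release a graph once it is punctuated."""
    pending: Dict[str, List[Quad]] = defaultdict(list)
    for item in stream:
        if isinstance(item, Punctuation):
            # The graph is complete: hand it over as one atomic event.
            yield item.graph, pending.pop(item.graph, [])
        else:
            # Quads of different graphs may interleave, so buffer by graph.
            pending[item[3]].append(item)


stream = [
    ("s1", "p", "o", "g1"),
    ("s2", "p", "o", "g2"),
    ("s3", "p", "o", "g1"),
    Punctuation("g1"),
    Punctuation("g2"),
]
for graph, quads in complete_graphs(stream):
    print(graph, len(quads))
```

The receiver never sees a partially transmitted graph; quads of a graph that is never punctuated simply stay in the buffer.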

Punctuation could be implemented with special ("magic") quadruples, but on the Web stack it can instead be done out-of-band, i.e. on a lower layer of the stack. For example, events can be communicated using ''chunked transfer encoding'' (Fielding et al. 1999, Section 3.6.1) [4] from HTTP/1.1. Each chunk contains one complete graph, so once a chunk has been received the receiver knows that the event is complete and can process it atomically: no quads for this graph will arrive later. With HTTP chunked connections, no special (or magic) quads are needed.
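A receiver for this scheme could look roughly like the sketch below. It decodes the chunk framing from RFC 2616, Section 3.6.1 (hexadecimal size line, CRLF, data, CRLF, terminated by a zero-size chunk) and yields each chunk payload as one complete graph. The names `iter_chunks` and `encode_chunk` and the example N-Quads payloads are assumptions for illustration. The sketch also assumes that chunk boundaries reach the receiver exactly as the sender wrote them, which an HTTP intermediary that re-chunks the body would not guarantee, and it ignores chunk extensions and trailers.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(body: BinaryIO) -> Iterator[bytes]:
    """Yield the payload of each chunk of an HTTP/1.1 chunked body.

    `body` is assumed to be a buffered binary stream positioned just
    after the response headers.
    """
    while True:
        size_line = body.readline()
        if not size_line:
            raise ValueError("stream ended before the zero-size chunk")
        # Anything after ';' is a chunk extension, ignored in this sketch.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break  # last chunk; trailers, if any, are not parsed here
        data = body.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        if body.read(2) != b"\r\n":
            raise ValueError("missing CRLF after chunk data")
        yield data  # one chunk == one complete graph (event)


def encode_chunk(payload: bytes) -> bytes:
    """Frame a payload as a single chunk (sender side, for the demo)."""
    return b"%x\r\n" % len(payload) + payload + b"\r\n"


# Two hypothetical events, each serialised as N-Quads in its own chunk.
g1 = b'<http://example.org/s1> <http://example.org/p> "1" <http://example.org/g1> .\n'
g2 = b'<http://example.org/s2> <http://example.org/p> "2" <http://example.org/g2> .\n'
wire = encode_chunk(g1) + encode_chunk(g2) + b"0\r\n\r\n"

for event in iter_chunks(io.BytesIO(wire)):
    print(event.decode("utf-8"), end="")
```

Note that many HTTP client libraries decode chunking transparently and hide the boundaries, so a receiver that relies on them generally has to read the framing itself, as done here.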

''Chunked transfer encoding'' is also used by the RDF publish/subscribe middleware Ztreamy [5], which uses it to provide long-lived connections over pure HTTP for disseminating events to subscribers. Further related work [6] investigates the exchange of RDF over other protocols, such as XMPP on top of HTTP (and thus TCP), and even UDP. However, none of these protocols provides pure HTTP stream URIs, which are easily referenced in Linked Data.


  1. Tucker, P.; Maier, D.; Sheard, T. & Fegaras, L. Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 2003, 15, 555-568 [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1198390]

  2. Maier, D.; Li, J.; Tucker, P.; Tufte, K. & Papadimos, V. Semantics of Data Streams and Operators. Proceedings of the 10th International Conference on Database Theory, Springer-Verlag, 2005, 37-52 [http://datalab.cs.pdx.edu/niagaraST/icdt05.pdf]

  3. Gupta, A. & Jain, R. Managing Event Information: Modeling, Retrieval, and Applications. Morgan & Claypool Publishers, 2011

  4. Fielding, R.; Gettys, J.; Mogul, J.; Frystyk, H.; Masinter, L.; Leach, P. & Berners-Lee, T. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616, RFC Editor, 1999 [http://www.w3.org/Protocols/rfc2616/rfc2616.html]

  5. Fisteus, J. A.; García, N. F.; Fernández, L. S. & Fuentes-Lorenzo, D. Ztreamy: A middleware for publishing semantic streams on the Web. Web Semantics: Science, Services and Agents on the World Wide Web, 2014, 25, 16-23

  6. Shinavier, J. Optimizing real-time RDF data streams. CoRR, 2010, abs/1011.3595 [http://arxiv.org/abs/1011.3595]