CRSP Engine

Continuous RSPQL Stream Processing (CRSP) Engine


What is CRSP?


The Continuous RSPQL Stream Processing (CRSP) Engine is a tool for applying continuous RSP-QL queries to streams of RDF graph data. Its primary goal is to run any valid RSP-QL query over any valid RDF graph stream and output the correct result. Users enter a query and a graph stream through a frontend graphical user interface. These inputs are passed to our Data Stream Management System, which parses the query to check that its RSP-QL syntax is correct. The engine then evaluates the query over the given stream and returns the result to the user through the same interface.

How does the CRSP-Engine work?


The CRSP-Engine is an extension of the RDF4J SPARQL parser. The extension consists of three new components:

  1. An RSP-QL parser.
  2. A graph stream handler.
  3. A frontend GUI.

Project Scope

The CRSP-Engine was created as a 12-week student project in collaboration with a research group, with the intention of releasing it as an open source project after submission. The team consisted of three people, so the scope had to be narrowed slightly. The initial plan was to implement the full set of RSP-QL features over RDF graph streams. We quickly learned this was not feasible, given our team’s lack of experience in the field and the lack of precedent for this kind of engine. As a result, we broke the project into four objectives:

  1. Parse and validate RSP-QL queries.
  2. Parse, verify and internalise RDF graph streams.
  3. Apply at least one RSP-QL language feature (windowing) and all underlying SPARQL queries to internal graph streams.
  4. Create a user-friendly and intuitive GUI.

CRSP-Engine’s Parser


Our engine validates queries written in RSP-QL’s extended syntax through modifications to the abstract syntax tree of RDF4J’s SPARQL parser. These modifications were made with the JJTree preprocessor and the JavaCC compiler, which generate a new RSP-QL parser from a grammar specification designed by our team and based on the syntax of a complex RSP-QL query. The new parser and grammar allow the CRSP-Engine to understand the continuous windows specified inside a query and how they relate to RDF graph stream data. The CRSP-Engine uses these windows to evaluate a query over a subset of an RDF graph stream by comparing the query window to the time attribute attached to each graph in the stream.
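The window comparison described above can be illustrated with a minimal Python sketch. This is not the engine's code (the engine is built on RDF4J in Java). The function name `graphs_in_window`, the dictionary layout and the half-open interval [start, end) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def graphs_in_window(stream, window_end, window_range):
    """Select graphs whose observedAt time falls inside the window.

    Assumption: the window is the half-open interval
    [window_end - window_range, window_end).
    """
    window_start = window_end - window_range
    return [g for g in stream if window_start <= g["observedAt"] < window_end]

# Hypothetical stream: each graph carries an ID and an observedAt time.
stream = [
    {"id": "g1", "observedAt": datetime(2024, 1, 1, 22, 0)},
    {"id": "g2", "observedAt": datetime(2024, 1, 1, 22, 3)},
    {"id": "g3", "observedAt": datetime(2024, 1, 1, 22, 7)},
]

# A 5-minute window closing at 22:08 covers 22:03 up to (not including) 22:08.
selected = graphs_in_window(stream, datetime(2024, 1, 1, 22, 8), timedelta(minutes=5))
print([g["id"] for g in selected])  # ['g2', 'g3']
```

Only the graphs selected this way would then be handed to the underlying SPARQL evaluation.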

CRSP-Engine’s Graph Implementation


Graphs are represented in the CRSP-Engine as our own internal object. Each graph object has the following properties: graphData, an observedAt time and a unique ID. The internal graph structure uses RDF4J’s Model interface to hold the RDF data of each graph, which makes computation over that data simpler. The observedAt time denotes when the entire graph was observed, which is used to correlate graphs with windows.

Graph streams are represented internally as a list of these graphs, with a one-to-one correspondence between the graphs in the list and the graphs in the original stream. Graph streams are read into the CRSP-Engine as JSON. We chose JSON because it is a common data interchange format on the web, which should provide scalable and robust options in the future.
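As a rough illustration of this structure, the Python sketch below models a graph with the three properties named above and loads a stream from JSON into a list. The JSON layout (`id`, `observedAt`, `triples`) and the names `Graph` and `load_graph_stream` are hypothetical. The engine's actual input format is not specified here, and the real implementation holds the RDF data in an RDF4J Model rather than a list of tuples.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Graph:
    """Illustrative stand-in for the engine's internal graph object."""
    graph_id: str        # unique ID
    observed_at: str     # time at which the whole graph was observed
    graph_data: list = field(default_factory=list)  # triples (an RDF4J Model in the engine)

def load_graph_stream(text):
    """Parse a JSON array into a list of Graph objects, one per stream graph."""
    return [
        Graph(
            graph_id=record["id"],
            observed_at=record["observedAt"],
            graph_data=[tuple(t) for t in record["triples"]],
        )
        for record in json.loads(text)
    ]

# Hypothetical JSON layout, invented for this sketch.
sample = """[
  {"id": "g1", "observedAt": "2024-01-01T22:00:00",
   "triples": [[":axel", ":isIn", ":RedRoom"], [":darko", ":isIn", ":RedRoom"]]},
  {"id": "g2", "observedAt": "2024-01-01T22:01:00",
   "triples": [[":axel", ":isIn", ":BlueRoom"]]}
]"""

graph_stream = load_graph_stream(sample)
print(len(graph_stream), graph_stream[0].graph_id)
```

Keeping the list in the same order as the input preserves the one-to-one correspondence with the original stream.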

CRSP-Engine’s Graphical User Interface


Our engine also features a simple GUI that allows a user to enter a query and a graph stream and then view the result. A query can be supplied as a text file or typed in manually as a query string. A graph stream is supplied as a JSON file. After the engine has applied the query over the graph stream, the user can view the result in a text box in the GUI or in an output text file.

What is RDF?


The Resource Description Framework (RDF) is a data model used to relate items of data to one another. RDF statements are expressed as triples in the form subject–predicate–object where:

Subject: Denotes a resource.
Predicate: Defines the relationship between the subject and the object.
Object: Denotes a resource or literal.

High level example of an RDF triple:

Apple (subject) sells (predicate) iPhone (object)

In-depth information on RDF can be found at the following links:

https://www.w3.org/RDF/

https://www.w3.org/TR/rdf-concepts/

What are RDF Streams?


RDF streams are collections of RDF data sent or received continuously. Typically these streams contain RDF triples, as shown above, with an extra time attribute attached to each triple.

E.g. <s1,p1,o1>,t1 where s1=subject, p1=predicate, o1=object and t1=timestamp.

How do RDF Graph Streams differ from RDF Streams?


RDF graph streams collect all the RDF data for a given event into one graph and assign that graph a timestamp. This allows an RDF graph to hold numerous triples that relate to a specific event, e.g. the temperature in a room at 10pm. Moving the time element up a layer, from the triple to the graph, gives the data more structure and makes the format more usable.

General Format: 
:GraphID {RDF Triple Data} {Timestamp When Graph Data Was Observed}

Example:
:g1{:axel :isIn :RedRoom. :darko :isIn :RedRoom.} {:g1 :observedAt t1}
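To make the difference concrete, the Python sketch below turns a stream of individually timestamped triples into a graph stream by grouping the triples that share a timestamp. It assumes that triples with the same timestamp belong to the same event, which is a simplification. The function name `to_graph_stream`, the generated graph IDs and the `:BlueRoom` triple are invented for illustration.

```python
from collections import defaultdict

def to_graph_stream(timestamped_triples):
    """Group (triple, timestamp) pairs into one graph per timestamp.

    Assumption: triples sharing a timestamp describe the same event.
    """
    grouped = defaultdict(list)
    for triple, timestamp in timestamped_triples:
        grouped[timestamp].append(triple)
    # Emit graphs in timestamp order, each with a generated ID.
    return [
        {"id": f"g{i}", "triples": grouped[t], "observedAt": t}
        for i, t in enumerate(sorted(grouped), start=1)
    ]

# RDF stream: every triple carries its own timestamp.
rdf_stream = [
    ((":axel", ":isIn", ":RedRoom"), 1),
    ((":darko", ":isIn", ":RedRoom"), 1),
    ((":axel", ":isIn", ":BlueRoom"), 2),
]

# RDF graph stream: one timestamp per graph.
for graph in to_graph_stream(rdf_stream):
    print(graph["id"], graph["observedAt"], graph["triples"])
```

The first graph produced corresponds to the `:g1` example above: two triples about the Red Room sharing a single observation time.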

What is RSP-QL?


RSP-QL (RDF Stream Processing Query Language) is a new language for the semantic web that queries streams of RDF graphs. RSP-QL expands the application domain of existing RSP languages by allowing queries to be applied directly to multiple triples that are grouped inside graphs. Such application domains include real-time reasoning over sensors, urban computing, and social semantic data.

The grammar of RSP-QL extends the features currently present in C-SPARQL, CQELS, and SPARQL-Stream. It also contains features that are missing from all other RSP languages, such as the FROM NAMED WINDOW clause in the dataset declaration and the WINDOW keyword in the WHERE clause.

An example of a simple SELECT query using this new syntax is shown below:

PREFIX ex: <http://example.org/>
SELECT ?p ?o
FROM NAMED WINDOW ex:win ON ex:examples [RANGE PT5m STEP PT1m]
WHERE {
	WINDOW ex:win {
		ex:Paris ?p ?o
	}
}
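
The RANGE and STEP parameters in the window declaration can be read as the width of each window and the interval at which a new window opens. The Python sketch below lists the first few windows for the query above. It assumes PT5m and PT1m mean five minutes and one minute, that windows start at an arbitrary start time `t0`, and that each window is an (open, close) pair. The function name `sliding_windows` is hypothetical and the exact boundary semantics of the engine may differ.

```python
from datetime import datetime, timedelta

def sliding_windows(t0, window_range, step, count):
    """Return the first `count` windows as (open, close) pairs.

    Window i opens at t0 + i * step and closes window_range later.
    """
    return [
        (t0 + i * step, t0 + i * step + window_range)
        for i in range(count)
    ]

# Assumed reading of [RANGE PT5m STEP PT1m]: 5-minute windows, one per minute.
t0 = datetime(2024, 1, 1, 22, 0)
windows = sliding_windows(t0, timedelta(minutes=5), timedelta(minutes=1), 3)

for opened, closed in windows:
    print(opened.time(), "->", closed.time())
# 22:00:00 -> 22:05:00
# 22:01:00 -> 22:06:00
# 22:02:00 -> 22:07:00
```

Because the step is shorter than the range, consecutive windows overlap, so a graph observed at 22:04 would be evaluated in all three windows.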

More information on RSP-QL can be found on its GitHub page and on the W3C Community Group on RDF Stream Processing website:

https://github.com/streamreasoning/RSP-QL

https://www.w3.org/community/rsp/