Name

cx:sparql — Perform SPARQL queries.

Synopsis

<p:declare-step type="cx:sparql" xmlns:cx="http://xmlcalabash.com/ns/extensions">
     <p:input port="source" sequence="true" primary="true"/>
     <p:input port="query"/>
     <p:output port="result" sequence="true"/>
</p:declare-step>

Description

This step uses the Jena project libraries to perform SPARQL queries on semantic web data. The sequence of triple documents that appears on the source port is used to construct a graph, which is then queried.
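As a rough sketch of how the step might be called from an XProc 1.0 pipeline, the example below repeats the declaration from the Synopsis and passes the triple documents from the pipeline's own source port straight through to cx:sparql. Two things here are assumptions rather than documented behavior: that the query text is supplied as the string content of a wrapper element on the query port, and the wrapper's name (`query` is only a placeholder).

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                name="main" version="1.0">
  <!-- The sem:triples documents to be queried -->
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>

  <!-- Step signature, copied from the Synopsis -->
  <p:declare-step type="cx:sparql">
    <p:input port="source" sequence="true" primary="true"/>
    <p:input port="query"/>
    <p:output port="result" sequence="true"/>
  </p:declare-step>

  <!-- The primary source input is bound implicitly to the pipeline's source -->
  <cx:sparql>
    <p:input port="query">
      <p:inline>
        <!-- Assumption: the query is read from the text of this element;
             the element name "query" is hypothetical -->
        <query>SELECT ?s ?p ?o WHERE { ?s ?p ?o }</query>
      </p:inline>
    </p:input>
  </cx:sparql>
</p:declare-step>
```

The query shown simply selects every triple in the constructed graph, which makes it a convenient first check that the source documents were read as expected.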

The format of sem:triples files is straightforward: each file contains one or more sem:triple elements. Each sem:triple in turn contains a sem:subject, a sem:predicate, and a sem:object.

The subject and predicate are always IRIs; the object is either an IRI or a literal value. The object is an IRI unless it has a datatype or xml:lang attribute, in which case it is a literal.

If any IRI begins with “http://marklogic.com/semantics/blank/”, it represents a blank node.
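To make the rules above concrete, here is a small illustrative sem:triples document. It assumes the sem prefix is bound to the MarkLogic semantics namespace, http://marklogic.com/semantics; the example.com IRIs, the FOAF predicates, and the blank node identifier are made up for the example.

```xml
<sem:triples xmlns:sem="http://marklogic.com/semantics">
  <!-- Object with no datatype or xml:lang attribute: an IRI -->
  <sem:triple>
    <sem:subject>http://example.com/people/alice</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/knows</sem:predicate>
    <sem:object>http://example.com/people/bob</sem:object>
  </sem:triple>
  <!-- Object with a datatype attribute: a typed literal -->
  <sem:triple>
    <sem:subject>http://example.com/people/alice</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/age</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#integer">42</sem:object>
  </sem:triple>
  <!-- Object with an xml:lang attribute: a language-tagged literal -->
  <sem:triple>
    <sem:subject>http://example.com/people/alice</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/name</sem:predicate>
    <sem:object xml:lang="en">Alice</sem:object>
  </sem:triple>
  <!-- Subject IRI with the blank-node prefix: a blank node -->
  <sem:triple>
    <sem:subject>http://marklogic.com/semantics/blank/1234</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/name</sem:predicate>
    <sem:object xml:lang="en">Anonymous</sem:object>
  </sem:triple>
</sem:triples>
```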

What the heck is this format?

This format is a serialization of the internal format that MarkLogic uses to represent semantics data. It's convenient for me and easy to convert into other formats. Eventually, I'll add serialization options to produce more common formats.

Implementation

This step is implemented by the xmlcalabash1-rdf module. The jar file from that project must be in the class path in order to use this step.