Name
cx:rdf-load — Load RDF triples from semantic web data sources.
Synopsis
<p:declare-step type="cx:rdf-load" xmlns:cx="http://xmlcalabash.com/ns/extensions">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:option name="href" required="true"/>                    <!-- anyURI -->
  <p:option name="language"/>                                <!-- string -->
  <p:option name="graph"/>                                   <!-- string -->
  <p:option name="max-triples-per-document" select="100"/>   <!-- long -->
</p:declare-step>
Description
This step uses the Jena project libraries to extract RDF triples from semantic web data sources. The results are returned in a sequence of XML documents that encode the triples directly.
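As a rough illustration of how the step might be called, here is a minimal XProc 1.0 pipeline sketch. Several things in it are assumptions, not taken from this page: the file name example.ttl is hypothetical, the step signature is declared inline (copied from the synopsis) instead of being imported from a library, and the source port is given an empty sequence because this page does not say what the step does with documents arriving on it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <!-- The step may emit several documents, so allow a sequence. -->
  <p:output port="result" sequence="true"/>

  <!-- Signature copied from the synopsis. Declaring it inline is an
       assumption; the implementing jar must still be on the class path. -->
  <p:declare-step type="cx:rdf-load">
    <p:input port="source" sequence="true"/>
    <p:output port="result" sequence="true"/>
    <p:option name="href" required="true"/>
    <p:option name="language"/>
    <p:option name="graph"/>
    <p:option name="max-triples-per-document" select="100"/>
  </p:declare-step>

  <!-- example.ttl is a hypothetical data source. -->
  <cx:rdf-load href="example.ttl">
    <p:input port="source">
      <p:empty/>
    </p:input>
  </cx:rdf-load>
</p:declare-step>
```

Only href is set because it is the one required option; the others keep their declared defaults.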
The format of sem:triples files is straightforward: each file contains one or more sem:triple elements. Each sem:triple in turn contains a sem:subject, a sem:predicate, and a sem:object.
The subject and predicate are always IRIs; the object is either an IRI or a literal value. The object is an IRI unless it has a datatype or xml:lang attribute, in which case it is a literal.
If any IRI begins with “http://marklogic.com/semantics/blank/”, it represents a blank node.
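A small hand-written sample may make these rules concrete. It is a sketch, not output captured from the step: the example.com IRIs, the blank node identifier, and the literal values are invented, and the binding of the sem prefix to http://marklogic.com/semantics is an assumption based on the MarkLogic origin of the format.

```xml
<sem:triples xmlns:sem="http://marklogic.com/semantics">
  <!-- Object without datatype or xml:lang: an IRI. -->
  <sem:triple>
    <sem:subject>http://example.com/people/alice</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/knows</sem:predicate>
    <sem:object>http://example.com/people/bob</sem:object>
  </sem:triple>
  <!-- Object with a datatype attribute: a typed literal. -->
  <sem:triple>
    <sem:subject>http://example.com/people/alice</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/age</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#integer">42</sem:object>
  </sem:triple>
  <!-- Object with an xml:lang attribute: a language-tagged literal. -->
  <sem:triple>
    <sem:subject>http://example.com/people/alice</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/name</sem:predicate>
    <sem:object xml:lang="en">Alice</sem:object>
  </sem:triple>
  <!-- Subject IRI with the blank-node prefix: a blank node. -->
  <sem:triple>
    <sem:subject>http://marklogic.com/semantics/blank/1234</sem:subject>
    <sem:predicate>http://xmlns.com/foaf/0.1/name</sem:predicate>
    <sem:object xml:lang="en">Anonymous</sem:object>
  </sem:triple>
</sem:triples>
```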
What the heck is this format?
This format is a serialization of the internal format that MarkLogic uses to represent semantics data. It's convenient for me and easy to convert into other formats. Eventually, I'll add serialization options to produce more common formats.
Implementation
This step is implemented by the xmlcalabash1-rdf module. The jar file from that project must be in the class path in order to use this step.