Documentation
Woefully incomplete, to be sure, but at least something exists now.
What is it?
Calabash is an implementation of XProc: An XML Pipeline Language.
This is a beta release. It passes all of the tests in the XProc Test Suite, but is known to be incomplete.
Prerequisites
Calabash is built with Java 1.6 on top of Saxon. To run it, you'll need:
- Java 1.6 or later
- Saxon 9.3.0.x or later
- Apache HTTP Client (and Apache Commons logging and codec) if you want to use p:http-request
- Saxon-SA if you want to use p:validate-with-xml-schema
- The XQuery API for Java (XQJ) if you want to use p:xquery
- Jing if you want to use p:validate-with-relax-ng or cx:nvdl.
Note: This is a change in V0.9.9. Previous versions of XML Calabash relied on ISO RELAX and Sun's Multi-Schema Validator for p:validate-with-relax-ng. Switching to Jing allows XML Calabash to support RELAX NG compact syntax.
I may add a command-line switch to allow users to select which validator they want to use.
- TagSoup if you want to be able to parse text/html content with the p:unescape-markup step
- XEP if you want to use the p:xsl-formatter step. (You may possibly need the XEP developers kit in addition to XEP.)
(Note: dependency on commercial applications for core XProc steps is a temporary expediency. Eventually, I'll integrate support for open source tools like Xerces and FOP. Patches welcome.)
Calabash also implements several extension steps. These are not part of the XProc core language standard and cannot be expected to reliably interoperate with other implementations.
- cx:collection-manager provides a mechanism for associating sets of documents with collection() URIs for use in the p:xslt (2.0) and p:xquery steps.
- cx:delta-xml provides an integration with DeltaXML's commercial XML-diffing tool suite. Requires the DeltaXML tools, naturally.
- cx:message is a debugging aid; it's the identity step with a message option that will be printed on stderr.
- cx:unzip provides access to the files in a ZIP archive.
- ml:adhoc-query, ml:invoke-module, and ml:insert-document implement Mark Logic Server XCC primitives (incompletely).
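As a rough illustration of how an extension step might be called from a pipeline, the sketch below passes a document through cx:message. The namespace URI bound to the cx prefix and the library URI in p:import are assumptions that are not stated on this page; check them against the release you are running.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Assumed location of the extension step declarations. -->
  <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

  <!-- Copies source to result unchanged and prints the message on stderr. -->
  <cx:message message="Document passed through cx:message"/>
</p:declare-step>
```

Because the step uses a Calabash-specific namespace, a pipeline like this should not be expected to run on other XProc implementations.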
How do I use it?
Download the latest release. Inside the archive you'll find calabash.jar. Make sure that jar file and the prerequisites are on your class path. Then you can run it from the command line:
java com.xmlcalabash.drivers.Main options pipeline.xpl
For example:
$ java com.xmlcalabash.drivers.Main xpl/pipe.xpl
<doc xmlns:p="http://www.w3.org/ns/xproc">
Congratulations! You've run your first pipeline!
</doc>
You can use -iport=file to change the inputs and -oport=file to change the output location.
For example:
$ java com.xmlcalabash.drivers.Main -isource=pipe.xpl -oresult=/tmp/out.xml xpl/pipe.xpl
That will run xpl/pipe.xpl, using pipe.xpl as the input and writing the result to /tmp/out.xml.
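For orientation, a pipeline that behaves the way the two examples above describe might look like the following. This is a minimal sketch, not the pipe.xpl shipped in the archive, which may differ. It assumes a default binding on the source port, so the inline document is used only when no -isource option is given.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Default binding: used only if the caller does not bind the source port. -->
  <p:input port="source">
    <p:inline>
      <doc>Congratulations! You've run your first pipeline!</doc>
    </p:inline>
  </p:input>
  <p:output port="result"/>

  <!-- Copies whatever arrives on source to result. -->
  <p:identity/>
</p:declare-step>
```

Run with no options, a pipeline like this writes the inline document to standard output. Run with -isource and -oresult, it copies the named input file to the named output file instead.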
If you run java com.xmlcalabash.drivers.Main with no options, it will print a short usage summary.
Simple pipelines from the command line
Starting with version 0.9.18, XML Calabash supports simple, linear pipelines on the command line. The basic idea is that you list each of the steps with the -s option. You can precede each step with its inputs and parameters and follow it with its options.
For example, to run an XSLT step, you could do something like this:
$ java com.xmlcalabash.drivers.Main \
-isource=doc.xml -istylesheet=style.xsl -s p:xslt
To validate the input and then process it with XSLT, you could do something like this:
$ java com.xmlcalabash.drivers.Main \
-isource=doc.xml -ischema=schema.xsd -s p:validate-with-xml-schema \
-istylesheet=style.xsl -s p:xslt
This works by constructing a literal pipeline from the steps passed on the command line and then running that pipeline.
If you run with the --debug option, you can see the pipeline that was constructed.
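To give a sense of what such a pipeline amounts to, here is a hand-written approximation of the validate-then-transform example. It is a sketch only: the pipeline Calabash actually constructs may be organized differently, so treat the --debug output as authoritative. The step name "main" is an arbitrary choice made for this illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <!-- With several document inputs, one must be marked primary explicitly. -->
  <p:input port="source" primary="true"/>
  <p:input port="schema"/>
  <p:input port="stylesheet"/>
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>

  <!-- Step 1: validate the primary input against the schema. -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:pipe step="main" port="schema"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Step 2: transform the validated document; its source is the previous result. -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:pipe step="main" port="stylesheet"/>
    </p:input>
  </p:xslt>
</p:declare-step>
```

Saved under a hypothetical name such as two-step.xpl, this could be run with -isource=doc.xml -ischema=schema.xsd -istylesheet=style.xsl followed by the pipeline file name. Each step reads the primary output of the step before it, which is what makes the pipeline linear.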
If you load a library or libraries with the --library option, you can refer to those steps in your pipeline, but each step must have a single primary input and a single primary output.
What do I do if it all goes wrong?
Tell Norm.