Name

cx:pretty-print — Reformat whitespace in a document.

Synopsis

<p:declare-step type="cx:pretty-print" xmlns:cx="http://xmlcalabash.com/ns/extensions">
     <p:input port="source"/>
     <p:output port="result"/>
</p:declare-step>

Description

The cx:pretty-print step reformats an XML document by passing it through the following XSLT stylesheet, serializing the result, and then reparsing it[1].

<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
                xmlns:saxon='http://icl.com/saxon'
                exclude-result-prefixes='saxon'
                version='2.0'>

  <xsl:output method='xml' indent='yes' saxon:indent-spaces='2'/>

  <xsl:strip-space elements='*'/>

  <xsl:template match='*'>
    <xsl:copy>
      <xsl:copy-of select='@*'/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match='comment()'>
    <xsl:choose>
      <xsl:when test="preceding-sibling::node()[1]/self::text()
                      and contains(preceding-sibling::text()[1], '&#10;')">
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>

    <xsl:copy/>

    <xsl:choose>
      <xsl:when test="following-sibling::node()[1]/self::text()
                      and contains(following-sibling::text()[1], '&#10;')">
      </xsl:when>
      <xsl:when test="following-sibling::node()[1]/self::comment()
                      or following-sibling::node()[1]/self::processing-instruction()">
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match='processing-instruction()'>
    <xsl:choose>
      <xsl:when test="preceding-sibling::node()[1]/self::text()
                      and contains(preceding-sibling::text()[1], '&#10;')">
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>

    <xsl:copy/>

    <xsl:choose>
      <xsl:when test="following-sibling::node()[1]/self::text()
                      and contains(following-sibling::text()[1], '&#10;')">
      </xsl:when>
      <xsl:when test="following-sibling::node()[1]/self::comment()
                      or following-sibling::node()[1]/self::processing-instruction()">
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>&#10;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

Serializing the pretty-printed output and reparsing it should normalize the whitespace so that the document prints with reasonable line breaks and indentation. However, there are two caveats:

  • Nothing in this process breaks very long runs of text into lines of reasonable length.

  • If the parser validates the input, it may remove insignificant whitespace.

Your mileage may vary.
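
As a rough usage sketch (not taken from the XML Calabash distribution), a pipeline along the following lines would pass its source document through cx:pretty-print. It assumes XML Calabash 1.x running an XProc 1.0 pipeline, and it assumes that the extension step declarations are loaded by importing the library URI shown; adjust or drop the import if your installation makes the step available some other way.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Assumption: this import makes the cx:* extension steps visible. -->
  <p:import href="http://xmlcalabash.com/extension/steps/library-1.0.xpl"/>

  <!-- Reads the pipeline's source port by default; its result port
       becomes the pipeline's result. -->
  <cx:pretty-print/>
</p:declare-step>
```

The step takes no options, so the pipeline only needs to connect a document to the source port and read the reformatted document from result.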


[1] Technically, the stylesheet used is /etc/prettyprint.xsl in the XML Calabash jar file.