Name

cx:metadata-extractor — Extract metadata from images.

Synopsis

<p:declare-step type="cx:metadata-extractor" xmlns:cx="http://xmlcalabash.com/ns/extensions">
     <p:output port="result"/>
     <p:option name="href"/>                                       <!-- anyURI -->
</p:declare-step>

Description

The cx:metadata-extractor step returns an XML description of the metadata associated with an image. For example, the metadata associated with the image in Figure 1, “Amaryllis” is shown in Example 2, “Amaryllis metadata”.

Amaryllis
Figure 1. Amaryllis
Example 2. Amaryllis metadata
<c:metadata xmlns:c="http://www.w3.org/ns/xproc-step"
            href="amaryllis.jpg">
   <c:tag dir="Jpeg" type="0x0000" name="Data Precision">8 bits</c:tag>
   <c:tag dir="Jpeg" type="0x0001" name="Image Height">336 pixels</c:tag>
   <c:tag dir="Jpeg" type="0x0003" name="Image Width">500 pixels</c:tag>
   <c:tag dir="Jpeg" type="0x0005" name="Number of Components">3</c:tag>
   <c:tag dir="Jpeg" type="0x0006" name="Component 1">Y component: Quantization table 0, Sampling factors 1 horiz/1 vert</c:tag>
   <c:tag dir="Jpeg" type="0x0007" name="Component 2">Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
   <c:tag dir="Jpeg" type="0x0008" name="Component 3">Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
</c:metadata>

In practice, the amount of metadata associated with an image is almost arbitrarily large. Example 3, “Digital Camera metadata”, for example, shows the metadata associated with an image from a digital camera.

Example 3. Digital Camera metadata
<c:metadata xmlns:c="http://www.w3.org/ns/xproc-step"
            href="file:/Volumes/Data/Pictures/2012/01/14/20120114-085508.jpg">
   <c:tag dir="Exif" type="0x010f" name="Make">Panasonic</c:tag>
   <c:tag dir="Exif" type="0x0110" name="Model">DMC-FS25</c:tag>
   <c:tag dir="Exif" type="0x0112" name="Orientation">Right side, top (Rotate 90 CW)</c:tag>
   <c:tag dir="Exif" type="0x011a" name="X Resolution">180 dots per inch</c:tag>
   <c:tag dir="Exif" type="0x011b" name="Y Resolution">180 dots per inch</c:tag>
   <c:tag dir="Exif" type="0x0128" name="Resolution Unit">Inch</c:tag>
   <c:tag dir="Exif" type="0x0131" name="Software">Ver.1.1  </c:tag>
   <c:tag dir="Exif" type="0x0132" name="Date/Time">2012-01-14T08:55:08</c:tag>
   <c:tag dir="Exif" type="0x0213" name="YCbCr Positioning">Datum point</c:tag>
   <c:tag dir="Exif" type="0x829a" name="Exposure Time">1/40 sec</c:tag>
   <c:tag dir="Exif" type="0x829d" name="F-Number">F5.6</c:tag>
   <c:tag dir="Exif" type="0x8822" name="Exposure Program">Program normal</c:tag>
   <c:tag dir="Exif" type="0x8827" name="ISO Speed Ratings">160</c:tag>
   <c:tag dir="Exif" type="0x9000" name="Exif Version">2.21</c:tag>
   <c:tag dir="Exif" type="0x9003" name="Date/Time Original">2012-01-14T08:55:08</c:tag>
   <c:tag dir="Exif" type="0x9004" name="Date/Time Digitized">2012-01-14T08:55:08</c:tag>
   <c:tag dir="Exif" type="0x9101" name="Components Configuration">YCbCr</c:tag>
   <c:tag dir="Exif" type="0x9102" name="Compressed Bits Per Pixel">4 bits/pixel</c:tag>
   <c:tag dir="Exif" type="0x9204" name="Exposure Bias Value">0 EV</c:tag>
   <c:tag dir="Exif" type="0x9205" name="Max Aperture Value">F3.3</c:tag>
   <c:tag dir="Exif" type="0x9207" name="Metering Mode">Multi-segment</c:tag>
   <c:tag dir="Exif" type="0x9208" name="Light Source">Unknown (4)</c:tag>
   <c:tag dir="Exif" type="0x9209" name="Flash">Flash fired, auto</c:tag>
   <c:tag dir="Exif" type="0x920a" name="Focal Length">14.2 mm</c:tag>
   <c:tag dir="Exif" type="0xa000" name="FlashPix Version">1.00</c:tag>
   <c:tag dir="Exif" type="0xa001" name="Color Space">sRGB</c:tag>
   <c:tag dir="Exif" type="0xa002" name="Exif Image Width">4000 pixels</c:tag>
   <c:tag dir="Exif" type="0xa003" name="Exif Image Height">3000 pixels</c:tag>
   <c:tag dir="Exif" type="0xa217" name="Sensing Method">One-chip color area sensor</c:tag>
   <c:tag dir="Exif" type="0xa300" name="File Source">Digital Still Camera (DSC)</c:tag>
   <c:tag dir="Exif" type="0xa301" name="Scene Type">Directly photographed image</c:tag>
   <c:tag dir="Exif" type="0xa401" name="Custom Rendered">Normal process</c:tag>
   <c:tag dir="Exif" type="0xa402" name="Exposure Mode">Auto exposure</c:tag>
   <c:tag dir="Exif" type="0xa403" name="White Balance">Auto white balance</c:tag>
   <c:tag dir="Exif" type="0xa404" name="Digital Zoom Ratio">Digital zoom not used.</c:tag>
   <c:tag dir="Exif" type="0xa405" name="Focal Length 35">80mm</c:tag>
   <c:tag dir="Exif" type="0xa406" name="Scene Capture Type">Standard</c:tag>
   <c:tag dir="Exif" type="0xa407" name="Gain Control">Low gain up</c:tag>
   <c:tag dir="Exif" type="0xa408" name="Contrast">None</c:tag>
   <c:tag dir="Exif" type="0xa409" name="Saturation">None</c:tag>
   <c:tag dir="Exif" type="0xa40a" name="Sharpness">None</c:tag>
   <c:tag dir="Exif" type="0xc4a5" name="Unknown tag (0xc4a5)">...</c:tag>
   <c:tag dir="Exif" type="0x0103" name="Compression">JPEG (old-style)</c:tag>
   <c:tag dir="Exif" type="0x0201" name="Thumbnail Offset">10740 bytes</c:tag>
   <c:tag dir="Exif" type="0x0202" name="Thumbnail Length">4153 bytes</c:tag>
   <c:tag dir="Exif" type="0xf001" name="Thumbnail Data">[4153 bytes of thumbnail data]</c:tag>
   <c:tag dir="Panasonic Makernote" type="0x0001" name="Quality Mode">2</c:tag>
   <c:tag dir="Panasonic Makernote" type="0x0002" name="Version">0 1 1 0</c:tag>
   <c:tag dir="Panasonic Makernote" type="0x0003" name="Unknown tag (0x0003)">1</c:tag>
   ...
   <c:tag dir="Panasonic Makernote" type="0x001c" name="Macro Mode">Off</c:tag>
   <c:tag dir="Panasonic Makernote" type="0x001f" name="Record Mode">Unknown (37)</c:tag>
   <c:tag dir="Interoperability" type="0x0001" name="Interoperability Index">Recommended Exif Interoperability Rules (ExifR98)</c:tag>
   <c:tag dir="Interoperability" type="0x0002" name="Interoperability Version">1.00</c:tag>
   <c:tag dir="Jpeg" type="0x0000" name="Data Precision">8 bits</c:tag>
   <c:tag dir="Jpeg" type="0x0001" name="Image Height">3000 pixels</c:tag>
   <c:tag dir="Jpeg" type="0x0003" name="Image Width">4000 pixels</c:tag>
   <c:tag dir="Jpeg" type="0x0005" name="Number of Components">3</c:tag>
   <c:tag dir="Jpeg" type="0x0006" name="Component 1">Y component: Quantization table 0, Sampling factors 1 horiz/2 vert</c:tag>
   <c:tag dir="Jpeg" type="0x0007" name="Component 2">Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
   <c:tag dir="Jpeg" type="0x0008" name="Component 3">Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert</c:tag>
</c:metadata>

The cx:metadata-extractor step relies on Drew NoakesMetadata Extractor libraries. If the image identified is not a JPEG image, then the underlying Java Image properties are returned, if possible. For example, a PNG image might return results such as those shown in Example 4, “PNG metadata”.

Example 4. PNG metadata
<c:metadata xmlns:c="http://www.w3.org/ns/xproc-step"
            href="cover.png">
   <c:tag dir="Exif" type="0x9000" name="Exif Version">0</c:tag>
   <c:tag dir="Jpeg" type="0x0001" name="Image Height">480 pixels</c:tag>
   <c:tag dir="Jpeg" type="0x0003" name="Image Width">364 pixels</c:tag>
</c:metadata>

The “Exif Version” of “0” is returned to identify this case.

Implementation

This step is implemented by the xmlcalabash1-metadata-extractor module. The jar file from that project must be in the class path in order to use this step.