XMLCatalog

An XMLCatalog is a catalog of public resources such as DTDs or entities that are referenced in an XML document. Catalogs are typically used to make web references to resources point to a locally cached copy of the resource.

This allows the XML parser, XSLT processor, or other consumer of XML documents to efficiently substitute a local copy for a resource available on the web.

Note: This data type uses, but does not depend on, external libraries not included in the Apache Ant distribution. See Library Dependencies for more information.

This data type provides a catalog of resource locations based on the OASIS "Open Catalog" standard. The catalog entries are used both for Entity resolution and URI resolution, in accordance with the org.xml.sax.EntityResolver and javax.xml.transform.URIResolver interfaces as defined in the Java API for XML Processing (JAXP) Specification.

For example, in a web.xml file, the DTD is referenced as:

    <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
      "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">

The XML processor, without XMLCatalog support, would need to retrieve the DTD from the URL specified whenever validation of the document was required.

This can be very time-consuming during the build process, especially where network throughput is limited. Alternatively, you can do the following:

  1. Copy web-app_2_2.dtd onto your local disk somewhere (either in the filesystem or even embedded inside a jar or zip file on the classpath).
  2. Create an <xmlcatalog> with a <dtd> element whose location attribute points to the file.
  3. Success! The XML processor will now use the local copy instead of calling out to the internet.
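
Step 2 might look like the following minimal sketch. It assumes the DTD was copied into a hypothetical dtds directory under the project basedir; the id webAppDTDs is likewise only illustrative.

    <!-- Assumption: web-app_2_2.dtd was copied to ${basedir}/dtds/ -->
    <xmlcatalog id="webAppDTDs">
        <dtd
            publicId="-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
            location="dtds/web-app_2_2.dtd"/>
    </xmlcatalog>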

XMLCatalogs can appear inside tasks that support this feature or at the same level as target (that is, as children of project) for reuse across different tasks, e.g. XML Validation and XSLT Transformation. The XML Validate task uses XMLCatalogs for entity resolution. The XSLT Transformation task uses XMLCatalogs for both entity and URI resolution.
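
As a rough illustration of this reuse, the sketch below declares one catalog as a child of project and references it from both an xmlvalidate and an xslt task. The project name, target names, catalog id, and file paths are all assumptions made for the example.

    <project name="docs" default="validate" basedir=".">
        <!-- Declared at the same level as the targets, so any task can reference it -->
        <xmlcatalog id="sharedCatalog">
            <dtd
                publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
                location="docbook/docbookx.dtd"/>
        </xmlcatalog>

        <!-- Entity resolution during validation -->
        <target name="validate">
            <xmlvalidate file="src/doc/manual.xml">
                <xmlcatalog refid="sharedCatalog"/>
            </xmlvalidate>
        </target>

        <!-- Entity and URI resolution during transformation -->
        <target name="transform">
            <xslt in="src/doc/manual.xml"
                  out="build/manual.html"
                  style="src/xsl/manual.xsl">
                <xmlcatalog refid="sharedCatalog"/>
            </xslt>
        </target>
    </project>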

XMLCatalogs are specified as either a reference to another XMLCatalog, defined previously in a build file, or as a list of dtd or entity locations. In addition, external catalog files may be specified in a nested catalogpath, but they will be ignored unless the resolver library from xml-commons is available in the system classpath. Due to backwards incompatible changes in the resolver code after the release of resolver 1.0, Ant will not support version 1.0 of resolver.jar; we expect a resolver release 1.1 to happen before Ant 1.6 is released. A separate classpath for entity resolution may be specified inline via nested classpath elements; otherwise the system classpath is used for this as well.

XMLCatalogs can also be nested inside other XMLCatalogs. For example, a "superset" XMLCatalog could be made by including several nested XMLCatalogs that referred to other, previously defined XMLCatalogs.
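
A "superset" catalog of this kind could be sketched as follows. The three ids and the relative locations are hypothetical.

    <xmlcatalog id="docbookDTDs">
        <dtd
            publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
            location="docbook/docbookx.dtd"/>
    </xmlcatalog>

    <xmlcatalog id="webDTDs">
        <dtd
            publicId="-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
            location="web-app_2_2.dtd"/>
    </xmlcatalog>

    <!-- The superset only refers to the two catalogs defined above -->
    <xmlcatalog id="supersetDTDs">
        <xmlcatalog refid="docbookDTDs"/>
        <xmlcatalog refid="webDTDs"/>
    </xmlcatalog>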

Resource locations can be specified either inline or in external catalog files, or both. In order to use an external catalog file, the xml-commons resolver library ("resolver.jar") must be on the classpath. External catalog files may be in either plain text format or XML format. If the xml-commons resolver library is not found in the classpath, external catalog files specified in catalogpath will be ignored and a warning will be logged. In this case, however, processing of inline entries will proceed normally.

Currently, only <dtd> and <entity> elements may be specified inline; these roughly correspond to OASIS catalog entry types PUBLIC and URI respectively. By contrast, external catalog files may use any of the entry types defined in the OASIS specification.

Entity/DTD/URI Resolution Algorithm

When an entity, DTD, or URI is looked up by the XML processor, the XMLCatalog searches its list of entries to see if any match. That is, it attempts to match the publicId attribute of each entry with the PublicID or URI of the entity to be resolved. Assuming a matching entry is found, XMLCatalog then executes the following steps:

1. Filesystem lookup

The location is first looked up in the filesystem. If the location is a relative path, the Ant project basedir attribute is used as the base directory. If the location specifies an absolute path, it is used as is. Once we have an absolute path in hand, we check to see if a valid and readable file exists at that path. If so, we are done. If not, we proceed to the next step.
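
The two cases can be sketched side by side. The catalog ids are assumptions, and the paths are borrowed from the examples later on this page.

    <!-- Relative location: looked up as ${basedir}/docbook/docbookx.dtd -->
    <xmlcatalog id="relativeLocation">
        <dtd
            publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
            location="docbook/docbookx.dtd"/>
    </xmlcatalog>

    <!-- Absolute location: used as is -->
    <xmlcatalog id="absoluteLocation">
        <dtd
            publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
            location="/home/dion/downloads/docbook/docbookx.dtd"/>
    </xmlcatalog>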

2. Classpath lookup

The location is next looked up in the classpath. Recall that jar files are merely fancy zip files. For classpath lookup, the location is used as is (no base is prepended). We use a Classloader to attempt to load the resource from the classpath. For example, if hello.jar is in the classpath and it contains foo/bar/blat.dtd, an entry whose location is foo/bar/blat.dtd will be resolved from that jar. An entry whose location is just blat.dtd will not.
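
A minimal sketch of the hello.jar case, using a nested classpath rather than the system classpath. The lib directory, the catalog id, and the public identifier are hypothetical.

    <xmlcatalog id="classpathLookup">
        <!-- Assumption: hello.jar lives in ${basedir}/lib and contains foo/bar/blat.dtd -->
        <classpath>
            <pathelement location="lib/hello.jar"/>
        </classpath>
        <!-- The location must be the full resource path inside the jar -->
        <dtd
            publicId="-//Example//DTD Blat V1.0//EN"
            location="foo/bar/blat.dtd"/>
    </xmlcatalog>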

3a. Apache xml-commons resolver lookup

What happens next depends on whether the resolver library from xml-commons is available on the classpath. If so, we defer all further attempts at resolving to it. The resolver library supports extremely sophisticated functionality like URL rewriting and so on, which can be accessed by making the appropriate entries in external catalog files (XMLCatalog does not yet provide inline support for all of the entries defined in the OASIS standard).
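
To give a feel for what an external catalog can express beyond the inline entries, here is a sketch of an OASIS XML catalog file that rewrites a whole family of system identifiers to a local directory. The local directory is an assumption, and the example only takes effect when the resolver library is on the classpath.

    <?xml version="1.0"?>
    <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
        <!-- Any system identifier starting with this prefix is redirected
             to the (hypothetical) local directory -->
        <rewriteSystem
            systemIdStartString="http://java.sun.com/j2ee/dtds/"
            rewritePrefix="file:///usr/local/share/dtds/"/>
    </catalog>

Assuming that file is saved as /my/catalogs/catalog.xml, it could be wired into a build file through a catalogpath:

    <xmlcatalog id="rewritingCatalog">
        <catalogpath>
            <pathelement location="/my/catalogs/catalog.xml"/>
        </catalogpath>
    </xmlcatalog>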

3b. URL-space lookup

Finally, we attempt to make a URL out of the location. At first, this may seem to defeat the purpose of XMLCatalogs: why go back out to the internet? But in fact, this can be used to (in a sense) implement HTTP redirects, substituting one URL for another. The mapped-to URL might also be served by a local web server. If the URL resolves to a valid and readable resource, we are done. Otherwise, we give up. In this case, the XML processor will perform its normal resolution algorithm. Depending on the processor configuration, further resolution failures may or may not result in fatal (i.e. build-ending) errors.
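
The redirect idea might be sketched like this, assuming a local web server on a hypothetical host and port that serves a copy of the DTD. The catalog id is also illustrative.

    <xmlcatalog id="urlLookup">
        <!-- Assumption: a local web server serves the DTD at this URL -->
        <dtd
            publicId="-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
            location="http://localhost:8080/dtds/web-app_2_2.dtd"/>
    </xmlcatalog>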

XMLCatalog attributes

| Attribute | Description | Required |
|-----------|-------------|----------|
| id | a unique name for an XMLCatalog, used for referencing the XMLCatalog's contents from another XMLCatalog | No |
| refid | the id of another XMLCatalog whose contents you would like to be used for this XMLCatalog | No |

XMLCatalog nested elements

dtd/entity

The dtd and entity elements used to specify XMLCatalogs are identical in structure:

| Attribute | Description | Required |
|-----------|-------------|----------|
| publicId | The public identifier used when defining a dtd or entity, e.g. "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN" | Yes |
| location | The location of the local replacement to be used for the public identifier specified. This may be specified as a file name, resource name found on the classpath, or a URL. Relative paths will be resolved according to the base, which by default is the Ant project basedir. | Yes |

classpath

The classpath to use for entity resolution. The nested <classpath> is a path-like structure.

catalogpath

The nested catalogpath element is a path-like structure listing catalog files to search. All files in this path are assumed to be OASIS catalog files, in either plain text format or XML format. Entries specifying nonexistent files will be ignored. If the resolver library from xml-commons is not available in the classpath, all catalogpaths will be ignored and a warning will be logged.

Examples

Set up an XMLCatalog with a single dtd referenced locally in a user's home directory:

    <xmlcatalog>
        <dtd 
            publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
            location="/home/dion/downloads/docbook/docbookx.dtd"/>
    </xmlcatalog>

Set up an XMLCatalog with multiple DTDs to be found either in the filesystem (relative to the Ant project basedir) or in the classpath:

    <xmlcatalog id="commonDTDs">
        <dtd 
            publicId="-//OASIS//DTD DocBook XML V4.1.2//EN"
            location="docbook/docbookx.dtd"/>
        <dtd 
            publicId="-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
            location="web-app_2_2.dtd"/>
    </xmlcatalog>

Set up an XMLCatalog with a combination of DTDs and entities as well as a nested XMLCatalog and external catalog files in both formats:

    <xmlcatalog id="allcatalogs">
        <dtd 
            publicId="-//ArielPartners//DTD XML Article V1.0//EN"
            location="com/arielpartners/knowledgebase/dtd/article.dtd"/>
        <entity 
            publicId="LargeLogo"
            location="com/arielpartners/images/ariel-logo-large.gif"/>
        <xmlcatalog refid="commonDTDs"/>
        <catalogpath>
            <pathelement location="/etc/sgml/catalog"/>
            <fileset 
                dir="/anetwork/drive"
                includes="**/catalog"/>
            <fileset 
                dir="/my/catalogs"
                includes="**/catalog.xml"/>
        </catalogpath>
    </xmlcatalog>

To reference the above XMLCatalog in an xslt task:

    <xslt basedir="${source.doc}"
          destdir="${dest.xdocs}"
          extension=".xml"
          style="${source.xsl.converter.docbook}"
          includes="**/*.xml"
          force="true">
        <xmlcatalog refid="allcatalogs"/>
    </xslt>