brian m. carlson. Code: version 2 of the GNU General Public License; the Creative Commons Attribution-ShareAlike 3.0 License.

My website is built mostly from DocBook 5.0 XML files, which are converted into HTML, XHTML, Atom, and (for some files) PDF. Currently, the build involves a lot of XSLT stylesheets and is driven by a set of makefiles for NetBSD make (Debian package pmake).

One problem that I have, though, is that the build takes a long time and isn't very resilient. I'm using tools built on libxml2 and libxslt1.1, and they cause a few problems. Since I want to make sure all my XML validates, I validate my DocBook 5.0 source against the RELAX NG schema. libxml2's RELAX NG support is buggy, however, and it sometimes marks valid files as invalid. Because it also breaks on the Atom RELAX NG schema, I end up having to call some Java tools (like msv), and starting Java repeatedly is slow.

Java programs aren't in and of themselves slow, but starting the VM is very expensive. I happen to know that Java has a lot of really great XML support, and I was vaguely familiar with ant, so I figured that I'd try building my website with ant.

This has been a colossal disaster. That's not specifically because of ant, but because different pieces of the underlying infrastructure have had subtle breakage. One requirement for the project is that, except for the actual upload of the finished product to the server, it cannot require Internet access. This generally requires that all the pieces have catalog support. It is quite fortuitous, therefore, that networking is mostly broken in Debian's IcedTea packages, since anything that tries to reach the network fails right away. Another important requirement is that XInclude work properly, since my blog and writings use that to generate complete pages.

Today, my problem has been with XInclude. The strategy I originally took was to call xmllint, but that seems kind of dodgy, since my hope is to keep everything in Java. Another strategy I tried was to define a Java system property that would tell Xerces (the XML parser) to automatically XInclude when parsing. This strategy will work in Ant 1.8, but not in Ant 1.7.
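
For comparison, here is a minimal sketch of what XInclude processing looks like when the parser is driven directly from Java code rather than through ant. It does not use the system property mentioned above (the post does not name it); it uses the standard JAXP switch `setXIncludeAware` instead. The class name `XIncludeSketch`, the file names, and the sample content are all made up for illustration, and `Files.writeString` assumes Java 11 or later.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeSketch {
    // Parse a file with XInclude processing switched on in the parser itself.
    static Document parseWithXInclude(Path file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // XInclude processing needs namespace awareness
        factory.setXIncludeAware(true);    // ask the parser to expand xi:include elements
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parsing from a File gives the parser a base URI for relative hrefs.
        return builder.parse(file.toFile());
    }

    // Hypothetical demo: write two small files and return the name of the
    // first child of the root element after inclusion.
    static String demo() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-sketch");
        Files.writeString(dir.resolve("part.xml"), "<para>included text</para>");
        Files.writeString(dir.resolve("main.xml"),
            "<article xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
          + "<xi:include href=\"part.xml\"/>"
          + "</article>");
        Document doc = parseWithXInclude(dir.resolve("main.xml"));
        return doc.getDocumentElement().getFirstChild().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

This only helps where you control how the parser is created; inside an ant build, the parser is created by ant or by the XSLT task, which is why a global setting such as a system property was attractive in the first place.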

The temporary strategy I've settled on is using an XSLT 2.0 stylesheet called xipr, which is essentially an XInclude implementation in XSLT 2.0. Since it requires an XSLT 2.0 processor, that means I'll be using Saxon-B 9. xipr works great, except that I found that it fails whilst trying to include portions of an XHTML file. This is because, even with DTD validation turned off, Saxon-B's doc-available() function tries to parse the to-be-included file, including accessing the DTD. Since Saxon-B does not use catalogs by default, this results in a failed lookup (because remember, IcedTea has broken networking at the moment), and Saxon-B reports that the document is not available. xipr, not knowing any better, gives up, and the build process is halted.
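
To make the catalog idea concrete, here is a rough sketch of resolving a DTD reference from a local catalog so that parsing never touches the network. It is not a fix for Saxon-B's doc-available(): it uses the `javax.xml.catalog` API, which only exists in Java 9 and later and so was not available in the IcedTea setup described here (the Apache xml-commons resolver filled that role at the time). The class name `OfflineDtdSketch`, the file names, and the one-line stub DTD are invented for illustration; a real setup would point the catalog at a full local copy of the XHTML DTDs.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class OfflineDtdSketch {
    // Parse a document, resolving external DTD references through an OASIS XML catalog.
    static Document parseOffline(Path xml, URI catalog) throws Exception {
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);   // DTD lookups go to the catalog, not the network
        return builder.parse(xml.toFile());
    }

    // Hypothetical demo: a stub DTD, a catalog mapping the XHTML public ID to it,
    // and a page that depends on an entity declared in that DTD.
    static String demo() throws Exception {
        Path dir = Files.createTempDirectory("catalog-sketch");
        Files.writeString(dir.resolve("xhtml1-stub.dtd"), "<!ENTITY nbsp \"&#160;\">\n");
        Files.writeString(dir.resolve("catalog.xml"),
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
          + "<public publicId=\"-//W3C//DTD XHTML 1.0 Strict//EN\" uri=\"xhtml1-stub.dtd\"/>"
          + "</catalog>");
        Files.writeString(dir.resolve("page.xhtml"),
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
          + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
          + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>a&nbsp;b</p></body></html>");
        Document doc = parseOffline(dir.resolve("page.xhtml"),
                                    dir.resolve("catalog.xml").toUri());
        // The text can only be read if the entity from the local DTD was resolved.
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The point of the sketch is the shape of the solution: the parser still sees the DOCTYPE and still asks for the DTD, but the resolver answers from local files. The open problem in the build is getting the parser that Saxon-B uses inside doc-available() to go through a resolver like this.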

I'm looking into other methods of attack. Making my documents standalone does not seem to solve the problem.