Comments for DocBook Doclet

06 Sep 2002 01:15 xmldoc

Important tool with some current limitations

This is an important addition to the DocBook
toolchain -- at least as important for its ability to
do "standalone" HTML->DocBook conversion as it is for
its ability to produce DocBook from Java source
documentation. And as far as I know there are no other
open-source tools available for converting HTML to
DocBook.

As of the 0.29 release, however, I think you can't yet
expect it to always produce valid DocBook; the output may
need some manual cleanup (though it does always generate
clean, well-formed XML -- nicely indented, even).

The validity limitations I've seen relate mostly to
the fact that HTML permits certain kinds of markup
instances that really aren't complete, though they are
valid against the HTML DTD. DocBook requires more complete
structures, so when these markup instances get converted
to DocBook, the result may not be valid.

For example, in HTML, it's valid for a definition
list (dl element) to contain only a term (dt) with no
corresponding description (dd). But the DocBook Doclet
will convert that to a Variablelist containing a Term
but no associated Listitem (the equivalent of dd). This
generates validity errors because each entry in a
Variablelist (a Varlistentry) must contain a Listitem.
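
To make that case concrete, here is a minimal Python sketch for
finding such entries in converted output. It is not part of DocBook
Doclet; the sample markup and the function name
entries_missing_listitem are made up for illustration, and the
sample only assumes the converter emits the standard lowercase
DocBook element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical converter output; the second entry is what a dt with
# no dd would turn into (a term with no listitem).
SAMPLE = """\
<variablelist>
  <varlistentry>
    <term>Complete entry</term>
    <listitem><para>Has a description.</para></listitem>
  </varlistentry>
  <varlistentry>
    <term>Term only</term>
  </varlistentry>
</variablelist>
"""


def entries_missing_listitem(root):
    """Return the varlistentry elements that have no listitem child."""
    return [
        entry
        for entry in root.iter("varlistentry")
        if entry.find("listitem") is None
    ]


root = ET.fromstring(SAMPLE)
for entry in entries_missing_listitem(root):
    print("missing listitem after term:", entry.findtext("term"))
```

A check like this only catches this one pattern; running a real
validating parser against the DocBook DTD is still the way to find
everything.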

But validity errors like that are fairly easy to
find and clean up manually, so it's not that big of a
limitation. For future releases, it would be very
useful to have some logic in DocBook Doclet to detect
and automatically correct certain instances like that,
so that they don't need to be corrected manually.
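
As a rough idea of what such automatic correction could look like,
here is a small Python sketch that post-processes converted output.
It is only an assumption about one possible approach, not how
DocBook Doclet works; the function name add_placeholder_listitems
is hypothetical. It adds a Listitem holding an empty Para to any
entry that lacks one, since a Listitem itself needs at least one
block-level child.

```python
import xml.etree.ElementTree as ET


def add_placeholder_listitems(root):
    """Add a placeholder listitem to each varlistentry lacking one.

    Returns the number of entries that were repaired.
    """
    fixed = 0
    # Materialize the list first so the tree is not modified
    # while it is being iterated.
    for entry in list(root.iter("varlistentry")):
        if entry.find("listitem") is None:
            listitem = ET.SubElement(entry, "listitem")
            ET.SubElement(listitem, "para")  # empty placeholder paragraph
            fixed += 1
    return fixed


doc = ET.fromstring(
    "<variablelist><varlistentry>"
    "<term>Term only</term>"
    "</varlistentry></variablelist>"
)
repaired = add_placeholder_listitems(doc)
print(repaired, "entry repaired")
print(ET.tostring(doc, encoding="unicode"))
```

The placeholder makes the entry structurally complete, but someone
still has to decide whether the empty description should be filled
in or the term moved elsewhere.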


