doclifter is a tool that transcodes {n,t,g}roff documentation to DocBook XML markup. It parses man, mandoc, ms, me, or TkMan page sources, does structural analysis, and recognizes common troff-markup cliches. The result is usable without further hand-hacking about 95% of the time.
| Tags | Documentation Text Processing Markup SGML XML |
|---|---|
| Licenses | BSD Three-Clause |
| Operating Systems | OS Independent |
| Implementation | Python |
Recent releases


Release Notes: An improvement was made for lynxprep handling.


Release Notes: This release cleans up glitches revealed by pychecker. It fixes buggy interpretation of the ms .AI macro. It maps the TBL "box" attribute to DocBook frame="border".


Release Notes: The groff \m color extension is now handled. Manpages generated by reStructuredText are dealt with gracefully. Groff-style \F font escapes are coped with better. A partial interpretation of troff \h was added.


Release Notes: eqn markup is now handled if the eqn -TMathml switch produces results. A -w option was added for strict portability checking. All troff glyphs are now mapped; bracket-pile characters, yogh, hooked-o, and underdot were added. A warning is now issued for sequences that look like glyphs but can't be mapped. Table handling for mdoc pages has been much improved. Tests for requests that can't be turned into structures are stricter.


Release Notes: A bug in db2man.xsl was worked around. Markus Hoenicka's requested behavior for multiple-file conversions was implemented. Translation of groff extended .cc and .c2 requests was implemented. The .TA macro that duplicates .ta in X.org manual pages is now ignored. The program can cope with unresolved .Sx references in mdoc. .Ex and .Ee are handled. The X consortium macro preamble is now handled better. .RS/.RE is now fully handled, with no more spurious warnings.
Recent comments
18 Sep 2002 01:38
Re: Unique and powerful addition to DocBook toolchain
> The only significant problem I've run
> into with the
> 1.0.0 version is in the implementation
> it uses for dealing
> with ISO character entities: In some XML
> instances, it
> generates internal DTD subsets that
> include entity
> declarations which reference the SGML
> versions of the ISO
> character-entity sets instead of the XML
> versions.
>
> But that's a really minor issue, and one
> that I'm sure
> Eric will probably have fixed in the
> next release.
Your wish is granted. :-)
05 Sep 2002 23:06
Unique and powerful addition to DocBook toolchain
This is an important addition to the DocBook toolchain.
It fills a big need and is unique in that (as far as I
know) there are no other tools available -- open-source or
proprietary -- for converting man/roff docs to DocBook.
There's some very clever logic in it for making
inferences about structure from some of the
not-that-explicitly-structured roff markup and turning it
into fairly structured DocBook markup. In particular, it
can:
* parse command/function synopses and convert them into
DocBook markup (using "real" markup like Cmdsynopsis, Arg,
Replaceable, etc.)
* recognize things like use of italics in a FILES
section to mark filenames, and convert them to correct
DocBook markup (e.g., using the Filename element)
* recognize patterns such as URLs, email addresses, man
page references, and C program listings, and convert them
to correct DocBook markup
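To make the last kind of inference concrete, here is a minimal sketch of recognizing man page references such as ls(1) in running text and wrapping them in DocBook citerefentry markup. It is not doclifter's actual code; the pattern MANREF and the function mark_manrefs are hypothetical names invented for illustration, and the real heuristics are presumably more careful.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical pattern (an assumption, not doclifter's own): a name followed
# by a section number in parentheses, e.g. ls(1), printf(3), curses(3x).
MANREF = re.compile(r"\b([A-Za-z_][\w.+-]*)\(([1-9][a-z]*)\)")


def mark_manrefs(text):
    """Escape XML special characters, then wrap man page references
    in DocBook citerefentry markup."""
    def repl(match):
        return (
            "<citerefentry><refentrytitle>%s</refentrytitle>"
            "<manvolnum>%s</manvolnum></citerefentry>"
            % (match.group(1), match.group(2))
        )
    # Escape first so that the markup added by repl() is not escaped itself.
    return MANREF.sub(repl, escape(text))


if __name__ == "__main__":
    print(mark_manrefs("See ls(1) and a < b"))
```

A call such as main() is left alone because the pattern requires a section number inside the parentheses.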
The only significant problem I've run into with the
1.0.0 version is in the implementation it uses for dealing
with ISO character entities: In some XML instances, it
generates internal DTD subsets that include entity
declarations which reference the SGML versions of the ISO
character-entity sets instead of the XML versions.
A workaround is simply to delete any ISO character
entity declarations from doclifter-generated XML documents.
The declarations are actually redundant at best, because
both the DocBook XML and SGML DTDs already reference the
appropriate sets.
But that's a really minor issue, and one that I'm sure
Eric will probably have fixed in the next release.
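The workaround described above (deleting the ISO character entity declarations) could be scripted roughly as follows. This is only a sketch: the exact form of the declarations in doclifter 1.0.0 output is not shown here, so the shape assumed in the comments is a guess, and strip_iso_entities is a hypothetical helper name.

```python
import re

# Assumed shape of the declarations (not taken from real doclifter output):
#   <!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES ...//EN">
#   %ISOnum;
ISO_DECL = re.compile(
    r'<!ENTITY\s+%\s+(\w+)\s+PUBLIC\s+"ISO 8879[^"]*"[^>]*>\s*'
)

# The internal DTD subset: everything between "[" and "]>" in the DOCTYPE.
SUBSET = re.compile(r"(<!DOCTYPE[^\[>]*\[)(.*?)(\]\s*>)", re.S)


def strip_iso_entities(xml_text):
    """Remove ISO entity-set declarations and their %name; references
    from the internal DTD subset, leaving everything else untouched."""
    def clean(match):
        subset = match.group(2)
        names = ISO_DECL.findall(subset)
        subset = ISO_DECL.sub("", subset)
        for name in names:
            # Drop the parameter entity reference that pulled the set in.
            subset = re.sub(r"%" + re.escape(name) + r";\s*", "", subset)
        return match.group(1) + subset + match.group(3)

    return SUBSET.sub(clean, xml_text, count=1)
```

Documents without an internal subset pass through unchanged, and other declarations in the subset are kept. An empty subset ("[ ]") may be left behind; that is still well-formed XML.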