doclifter

doclifter helps lift documents with nroff markup to XML-DocBook. Lifting documents from presentation level to semantic level is hard, and a really good job requires human polishing. This tool aims to do everything that can be mechanized, and to preserve in XML comments any troff-level information that might have structural implications. TBL tables are translated into DocBook table markup, PIC into SVG, and EQN into MathML (relying on pic2svg and GNU eqn for the last two).

Recent releases

  •  17 Jun 2013 21:37

    Release Notes: New logic prevents spurious warnings from .in +N just before .nf. Many more instances of .ta are now automatically handled. Multi-file compilation was broken, and is now repaired.

  •  02 Jun 2013 21:02

    Release Notes: This release deals with W3C moving a math DTD, improves .Bl/.El handling and updates canned strings in mdoc, accepts \(hy in name sections, handles   (inadvertently omitted from DocBook v4), and adds -V for a version option.

  •  19 Mar 2013 12:48

    Release Notes: Trailing comments after table rows are now preserved (example: matherr(3)). Support for some previously missing groff extension glyphs was added. Handling of .Bd/.Ed in mdoc was improved.

  •  31 Jul 2012 22:38

    Release Notes: This release handles foojzs pages better, interprets some cases of .rj, recognizes "Feature Test" as a function synopsis ender, handles m, r, and d troff conditionals, processes .ti with positive indent into <blockquote> around the following line, supports all mdoc special-character strings, improves recognition of program listings, and fixes a brown-paper-bag bug in processing of mdoc.

  •  25 Jun 2012 23:20

    Release Notes: This release fixes a bug in command-synopsis parsing. It now lifts 97% of 11,029 pages in a full Ubuntu Precise Pangolin release.

Recent comments

      18 Sep 2002 01:38 esr

      Re: Unique and powerful addition to DocBook toolchain

      > The only significant problem I've run into with the
      > 1.0.0 version is in the implementation it uses for dealing
      > with ISO character entities: In some XML instances, it
      > generates internal DTD subsets that include entity
      > declarations which reference the SGML versions of the ISO
      > character-entity sets instead of the XML versions.
      >
      > But that's a really minor issue, and one that I'm sure
      > Eric will probably have fixed in the next release.

      Your wish is granted. :-)

      05 Sep 2002 23:06 xmldoc Thumbs up

      Unique and powerful addition to DocBook toolchain

      This is an important addition to the DocBook toolchain.
      It fills a big need and is unique in that (as far as I
      know) there are no other tools available -- open-source or
      proprietary -- for converting man/roff docs to DocBook.

      There's some very clever logic in it for making
      inferences about structure from some of the
      not-that-explicitly-structured roff markup and turning it
      into fairly structured DocBook markup. In particular, it
      can:

      * parse command/function synopses and convert them into
      DocBook markup (using "real" markup like Cmdsynopsis, Arg,
      Replaceable, etc.)

      * recognize things like use of italics in a FILES
      section to mark filenames, and convert them to correct
      DocBook markup (e.g., using the Filename element)

      * recognize patterns such as URLs, email addresses, man
      page references, and C program listings, and convert them
      to correct DocBook markup
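
To give a feel for the kind of pattern recognition listed above, here is a minimal sketch in Python. It is not doclifter's actual code: the regular expressions and the function names (lift_inline, _markup) are hypothetical and far cruder than the real heuristics. The output uses the DocBook v4 elements ulink, email, and citerefentry.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Illustrative patterns only (assumptions, not doclifter's own rules).
URL_RE = r'(?P<url>https?://[^\s<>"]+[^\s<>".,;:!?)])'
EMAIL_RE = r'(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)'
MANREF_RE = r'(?P<name>[A-Za-z_][\w.+-]*)\((?P<sect>[1-9][a-z]*)\)'
TOKEN_RE = re.compile("|".join((URL_RE, EMAIL_RE, MANREF_RE)))


def _markup(match):
    """Wrap one recognized token in the corresponding DocBook element."""
    if match.group("url"):
        url = match.group("url")
        return "<ulink url=%s>%s</ulink>" % (quoteattr(url), escape(url))
    if match.group("email"):
        return "<email>%s</email>" % escape(match.group("email"))
    # Otherwise the man-page reference alternative matched, e.g. name(3).
    return ("<citerefentry><refentrytitle>%s</refentrytitle>"
            "<manvolnum>%s</manvolnum></citerefentry>"
            % (escape(match.group("name")), match.group("sect")))


def lift_inline(text):
    """Escape plain text and mark up URLs, emails, and man references."""
    out = []
    pos = 0
    for m in TOKEN_RE.finditer(text):
        out.append(escape(text[pos:m.start()]))  # text before the token
        out.append(_markup(m))
        pos = m.end()
    out.append(escape(text[pos:]))  # trailing text after the last token
    return "".join(out)
```

Calling lift_inline("See matherr(3) or mail user@example.org") wraps the man-page reference in citerefentry and the address in email, and XML-escapes everything else. A real lifter also has to use context (section names, font changes), which this sketch ignores.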

      The only significant problem I've run into with the
      1.0.0 version is in the implementation it uses for dealing
      with ISO character entities: In some XML instances, it
      generates internal DTD subsets that include entity
      declarations which reference the SGML versions of the ISO
      character-entity sets instead of the XML versions.

      A workaround is simply to delete any ISO character
      entity declarations from doclifter-generated XML documents.
      The declarations are actually redundant at best, because
      both the DocBook XML and SGML DTDs already reference the
      appropriate sets.
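
The workaround above could be scripted along these lines. This is a rough sketch under stated assumptions: it assumes the offending declarations are parameter entities whose public identifier begins with "ISO 8879" (as the standard ISO entity sets do), and the function name strip_iso_entity_sets is made up for illustration. It works on the document text with regular expressions rather than a full XML parser, so check the result by hand.

```python
import re

# Assumed shape of a declaration to remove, for example:
#   <!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES ...//EN">
ISO_DECL_RE = re.compile(
    r'<!ENTITY\s+%\s+(?P<name>[\w.-]+)\s+PUBLIC\s+"ISO 8879[^"]*"[^>]*>\s*')


def strip_iso_entity_sets(xml_text):
    """Remove ISO entity-set declarations and their %name; references."""
    names = [m.group("name") for m in ISO_DECL_RE.finditer(xml_text)]
    stripped = ISO_DECL_RE.sub("", xml_text)
    for name in names:
        # Drop the parameter-entity reference that pulled the set in.
        stripped = re.sub(r"%" + re.escape(name) + r";\s*", "", stripped)
    return stripped
```

Other entity declarations in the internal subset are left alone, since only declarations with an ISO 8879 public identifier match the pattern.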

      But that's a really minor issue, and one that I'm sure
      Eric will probably have fixed in the next release.
