pdftohtml

pdftohtml converts Portable Document Format (PDF) files to HTML. This release converts text and links. Bold and italic faces are preserved, but high-level HTML structures (such as lists or tables) are not yet generated. Images are ignored in the current version, but you can extract them from the PDF file with pdfimages, which is distributed with Xpdf.

Recent releases

  •  27 Jun 2003 21:07

     Release Notes: Medium bugfixes in font handling; outline generation was added.

  •  23 Feb 2003 00:53

     Release Notes: Updated to use the Xpdf 2.01 sources.

  •  19 Jun 2002 01:07

     Release Notes: A new command-line option to extract hidden text was added. Fragmented links (hrefs) are now joined together, as are paragraphs. A crash bug was fixed. Xpdf was updated to 1.01, and complex mode can now produce noframes (single HTML file) output.

  •  22 Apr 2002 22:04

     Release Notes: Updated to use Xpdf 1.0. Added an experimental option to specify the output encoding (UTF-8 might work) and the ability to specify user and master passwords. Fixed a core dump on documents that contain only Type 3 fonts and a bug where inline images were not handled properly. Ghostscript is now executed from the program itself, PostScript output is produced for complex mode, and pdftops was removed. Several XML-related bugs and some memory leaks were also fixed.

  •  13 Jun 2000 12:22

     No changes have been submitted for this release.

Recent comments

08 Apr 2003 13:07 meshko

Re: Unprofessional first impression
Have you considered changing your nickname to 519r4?
You are right, though: I should change to -0.35. Underscores are ugly.

> The filename of this project gives an unprofessional
> first impression. The ugly pdftohtml_0_35.tar.gz
> should of course be pdf2html-0.35.tar.gz.

08 Apr 2003 11:39 sigra

Unprofessional first impression
The filename of this project gives an unprofessional
first impression. The ugly pdftohtml_0_35.tar.gz
should of course be pdf2html-0.35.tar.gz.

09 Aug 2002 02:02 jleavens

Works great
Great command-line tool, just compile and go.