pdftohtml converts Portable Document Format (PDF) files to HTML. This release converts text and links. Bold and italic typefaces are preserved, but high-level HTML structures (such as lists or tables) are not yet generated. Images are ignored in the current version, but you can extract them from the PDF file using pdfimages, which is distributed with Xpdf.
| Tags | Text Processing |
|---|---|
| Licenses | GPL |
Recent releases


Release Notes: Moderate bug fixes in font handling, and outline generation was added.


Release Notes: Updates to use Xpdf 2.01 sources.


Release Notes: A new command-line option to extract hidden text was added. Parts of hrefs (links) are now joined together, as are paragraphs. A crash bug was fixed. Xpdf was updated to 1.01, and the ability to produce noframes (single HTML file) output in complex mode was added.


Release Notes: This release updates to Xpdf 1.0. It adds an experimental option to specify the output encoding (UTF-8 might work) and an option to specify the user/master password. Ghostscript is now executed from the program itself, PS output is produced for complex mode, and pdftops was removed. Fixes cover a core dump on documents containing only Type 3 fonts, improper handling of inline images, several XML-related bugs, and some memory leaks.


No changes have been submitted for this release.
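
The options mentioned in the release notes above (complex mode, noframes output, hidden text, output encoding, user/master password) could be combined from a script roughly as follows. This is only a sketch: the flag names `-c`, `-noframes`, `-hidden`, `-enc`, `-upw`, and `-opw` are assumed from later pdftohtml builds and may differ in the releases listed here, and the helper names `build_pdftohtml_command` and `convert` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_pdftohtml_command(
    pdf_path: str,
    output_base: str,
    complex_mode: bool = False,
    noframes: bool = False,
    hidden_text: bool = False,
    encoding: Optional[str] = None,
    user_password: Optional[str] = None,
    owner_password: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for pdftohtml.

    All flag names below are assumptions; check `pdftohtml -h`
    for the build you actually have installed.
    """
    cmd = ["pdftohtml"]
    if complex_mode:
        cmd.append("-c")          # assumed flag for complex mode
    if noframes:
        cmd.append("-noframes")   # assumed flag for single-file output
    if hidden_text:
        cmd.append("-hidden")     # assumed flag for extracting hidden text
    if encoding is not None:
        cmd += ["-enc", encoding]         # assumed flag for output encoding
    if user_password is not None:
        cmd += ["-upw", user_password]    # assumed flag for user password
    if owner_password is not None:
        cmd += ["-opw", owner_password]   # assumed flag for master (owner) password
    cmd += [pdf_path, output_base]
    return cmd


def convert(pdf_path: str, output_base: str, **options) -> int:
    """Run pdftohtml and return its exit status."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH")
    cmd = build_pdftohtml_command(pdf_path, output_base, **options)
    return subprocess.run(cmd, check=False).returncode
```

For instance, `convert("manual.pdf", "manual", complex_mode=True, noframes=True)` would request a single HTML file in complex mode, provided the installed build accepts those flags.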
Recent comments
08 Apr 2003 13:07
Re: Unprofessional first impression
Have you considered changing your nickname to 519r4?
You are right though, I should change to -0.35, underscores are ugly.
> The filename of this project gives an
> unprofessional
> first impression. The ugly
> pdftohtml_0_35.tar.gz
> should of course be
> pdf2html-0.35.tar.gz.
08 Apr 2003 11:39
Unprofessional first impression
The filename of this project gives an unprofessional
first impression. The ugly pdftohtml_0_35.tar.gz
should of course be pdf2html-0.35.tar.gz.
09 Aug 2002 02:02
Works great
Great command line tool, just compile and go.