pdftohtml

pdftohtml converts Portable Document Format (PDF) files to HTML. This release converts text and links. Bold and italic faces are preserved, but high-level HTML structures (such as lists or tables) are not yet generated. Images are ignored in the current version, but you can extract them from the PDF file using pdfimages, which is distributed with Xpdf.
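As a rough illustration of the workflow described above, the following Python sketch runs pdftohtml for the text and links, then pdfimages for the images. It assumes both programs are on the PATH and uses only their basic, option-free invocations. The function names build_commands and convert and the default image root "img" are hypothetical, chosen only for this example.

```python
import shutil
import subprocess


def build_commands(pdf_path, image_root="img"):
    """Return the argv lists for pdftohtml and pdfimages (hypothetical helper)."""
    pdf = str(pdf_path)
    return [
        ["pdftohtml", pdf],              # converts text and links to HTML
        ["pdfimages", pdf, image_root],  # writes extracted images using image_root as a prefix
    ]


def convert(pdf_path, image_root="img"):
    """Run both tools in turn, skipping any that is not installed."""
    for argv in build_commands(pdf_path, image_root):
        if shutil.which(argv[0]) is None:
            print(f"{argv[0]} not found on PATH, skipping")
            continue
        # check=True raises CalledProcessError if the tool exits with an error
        subprocess.run(argv, check=True)
```

Calling convert("report.pdf") would then attempt both steps. Output file names and locations are decided by the tools themselves and are not assumed here.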

Recent releases

  •  27 Jun 2003 14:07

Release Notes: Medium bug fixes in font handling; outline generation was added.

  •  22 Feb 2003 16:53

Release Notes: Updates to use Xpdf 2.01 sources.

  •  18 Jun 2002 21:07

Release Notes: A new command-line option to extract hidden text was added. Parts of hrefs (links) are joined together, as are paragraphs. A crash bug was fixed. Xpdf was updated to 1.01, and a new ability to produce noframes (single HTML file) output for complex mode was added.

  •  22 Apr 2002 18:04

Release Notes: This release updates to Xpdf 1.0 and adds an experimental ability to specify the output encoding (UTF-8 might work), the ability to specify a user/master password, execution of Ghostscript from the program itself, and production of PS output (for complex mode); pdftops was removed. It fixes a core dump on documents with only Type 3 fonts, a bug with inline images not being handled properly, several XML-related bugs, and some memory leaks.

  •  30 Jan 2001 06:13

    No changes have been submitted for this release.

    Recent comments

    08 Apr 2003 13:07 meshko

    Re: Unprofessional first impression
    Have you considered changing your nickname to 519r4?
    You are right though, I should change to -0.35; underscores are ugly.

    > The filename of this project gives an unprofessional
    > first impression. The ugly pdftohtml_0_35.tar.gz
    > should of course be pdf2html-0.35.tar.gz.

    08 Apr 2003 11:39 sigra

    Unprofessional first impression
    The filename of this project gives an unprofessional
    first impression. The ugly pdftohtml_0_35.tar.gz
    should of course be pdf2html-0.35.tar.gz.

    09 Aug 2002 02:02 jleavens Thumbs up

    Works great
    Great command line tool, just compile and go.
