PdfParser

PdfParser is a standalone PHP library for extracting data from PDF files. It loads and parses objects and headers, extracts metadata, and extracts text from pages in order. It supports compressed PDFs, the Mac OS Roman character encoding, and hex and octal encoding in text sections, and it complies with PSR-0 (autoloading) and PSR-1 (coding style). Secured documents are not currently supported.
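To illustrate what "hex and octal encoding in text sections" refers to, here is a minimal sketch of decoding the two string forms defined by the PDF format: hex strings written between angle brackets, and octal escapes (a backslash followed by one to three octal digits) inside literal strings. The function names `decodeHexString` and `decodeOctalEscapes` are hypothetical and are not part of PdfParser's API; the sketch assumes PHP 7 or later and does not handle every corner case (for example, an escaped backslash followed by digits).

```php
<?php
// Hypothetical helpers for illustration only; this is not PdfParser's code.

/**
 * Decode a PDF hex string token such as "<48656C6C6F>".
 */
function decodeHexString(string $token): string
{
    // Whitespace inside a hex string is ignored, so drop it first,
    // then strip the surrounding angle brackets.
    $hex = trim(preg_replace('/\s+/', '', $token), '<>');

    // If the final digit is missing, it is assumed to be 0.
    if (strlen($hex) % 2 === 1) {
        $hex .= '0';
    }

    // hex2bin() returns false on invalid input; the cast turns that
    // into an empty string in this sketch.
    return (string) hex2bin($hex);
}

/**
 * Replace octal escapes (backslash plus 1 to 3 octal digits) with
 * the byte they encode.
 */
function decodeOctalEscapes(string $text): string
{
    return preg_replace_callback(
        '/\\\\([0-7]{1,3})/',
        function (array $m): string {
            // Keep only the low 8 bits, so oversized values wrap to one byte.
            return chr(octdec($m[1]) & 0xFF);
        },
        $text
    );
}
```

Under these assumptions, both `decodeHexString('<48656C6C6F>')` and `decodeOctalEscapes('\110\145\154\154\157')` yield the bytes of `Hello`. The decoded bytes would still need to be mapped through the font's character encoding (such as Mac OS Roman) before being treated as text.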

Last announcement

Referenced by Softpedia 03 Feb 2014 10:36

Proud to be referenced in the Softpedia Linux software database: http://linux.softpedia.com/get/Printing/PdfParser-103281.shtml

Recent releases

  •  17 Feb 2014 09:42

    Release Notes: This release fixes some parsing bugs (fonts, secured files, etc.). The TCPDF dependency needs to be updated.

  •  29 Jan 2014 23:43

    Release Notes: This release fixes XObject text extraction and adds a text fallback for missing fonts.

  •  26 Jan 2014 17:34

    Release Notes: The project has changed its license from the GPLv2 to the GPLv3 to match TCPDF requirements.

  •  25 Jan 2014 18:34

    Release Notes: This release updates the parser to support content array objects outside the header (a rewrite of the method Page::getText and a hotfix).

  •  25 Jan 2014 12:24

    Release Notes: This release adds support for specific date formats and escaped spaces.

Recent comments

06 Sep 2013 14:36 smalot

The library is currently under active development, with a focus on charset encoding handling. Any help would be appreciated.
