PdfParser is a standalone PHP library that provides tools for extracting data from PDF files. It loads and parses objects and headers, extracts metadata, and extracts text from pages in order. It supports compressed PDFs, the Mac OS Roman character encoding, and hex and octal encoding in text sections, and it complies with PSR-0 (autoloading) and PSR-1 (basic coding standard). Secured documents are not currently supported.
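To make "hex and octal encoding in text sections" concrete, here is a minimal Java sketch of how PDF string syntax encodes bytes. It is not PdfParser's own (PHP) code. Under the PDF specification, a literal string in parentheses may use backslash escapes, including one to three octal digits per byte (`\110` is "H"). A hexadecimal string in angle brackets carries two hex digits per byte, and a missing final digit is treated as 0. The class name `PdfStringDecoder` and its methods are hypothetical. The sketch returns raw character codes only. Mapping those codes to Unicode depends on the font's encoding (for example MacRomanEncoding), which is outside its scope.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PdfStringDecoder {

    // Decodes the body of a literal string (the text between the outer parentheses).
    // Assumes each source char is a single byte; only its low 8 bits are kept.
    public static byte[] decodeLiteral(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') {
                out.write(c);
                continue;
            }
            if (i >= body.length()) {
                break; // lone trailing backslash: ignored in this sketch
            }
            char e = body.charAt(i++);
            switch (e) {
                case 'n': out.write('\n'); break;
                case 'r': out.write('\r'); break;
                case 't': out.write('\t'); break;
                case 'b': out.write('\b'); break;
                case 'f': out.write('\f'); break;
                case '(': case ')': case '\\': out.write(e); break;
                case '\r':
                    // Backslash + end-of-line is a line continuation; swallow CRLF too.
                    if (i < body.length() && body.charAt(i) == '\n') i++;
                    break;
                case '\n':
                    break;
                default:
                    if (e >= '0' && e <= '7') {
                        // Octal escape: read up to three octal digits in total.
                        int value = e - '0';
                        int digits = 1;
                        while (digits < 3 && i < body.length()
                                && body.charAt(i) >= '0' && body.charAt(i) <= '7') {
                            value = value * 8 + (body.charAt(i++) - '0');
                            digits++;
                        }
                        out.write(value & 0xFF); // high-order overflow is dropped
                    } else {
                        out.write(e); // unknown escape: the backslash is ignored
                    }
            }
        }
        return out.toByteArray();
    }

    // Decodes the body of a hexadecimal string (the text between '<' and '>').
    public static byte[] decodeHex(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 if none
        for (char c : body.toCharArray()) {
            int d = Character.digit(c, 16);
            if (d < 0) continue; // skip whitespace (and, in this sketch, any non-hex char)
            if (high < 0) {
                high = d;
            } else {
                out.write((high << 4) | d);
                high = -1;
            }
        }
        if (high >= 0) out.write(high << 4); // odd digit count: final digit padded with 0
        return out.toByteArray();
    }

    // Interprets raw codes as ISO-8859-1 purely for display purposes.
    public static String toText(byte[] codes) {
        return new String(codes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toText(decodeLiteral("\\110\\145llo, PDF\\041"))); // Hello, PDF!
        System.out.println(toText(decodeHex("48656C6C6F")));                   // Hello
    }
}
```

Working on bytes rather than `String` keeps the decoder independent of any text encoding. This matters because PDF strings are sequences of character codes whose meaning is fixed only later, by the font.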
jPDFText is a Java library for extracting text from PDF documents, so that their textual content can be archived, stored, searched, or indexed. It is built on Qoppa's proprietary PDF technology and does not need any third-party software or drivers. Its main features are loading PDF documents from files, network drives, URLs, or input streams; extracting text; and extracting words as a vector of Strings. Because it is written entirely in Java, applications that use it remain platform independent, and nothing else needs to be installed or configured when deploying.