PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF specification (including v1.7, used by Acrobat 8, 9, and X); extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean); decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable-bit-length ciphers; and extraction of all document metadata (including form data, bookmarks, and annotations). It also integrates with Jakarta Lucene and can update interactive forms.
Guardian Grab simplifies downloading the digital editions of The Guardian and The Observer newspapers, which are available through Newspaper Direct. It interacts with the Newspaper Direct site to log on, identify available sections, and download the newspaper in PDF, mobi (Kindle), or ePub format. Downloads are arranged by paper, date, and format under a specified directory. Guardian Grab also maintains a directory holding the latest copy of each paper for easy syncing.
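The download arrangement described above can be illustrated with a minimal Python sketch. This is not Guardian Grab's code: the exact directory layout, the `latest` directory name, the file naming scheme, and the function names `archive_path` and `store_download` are all assumptions made for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_path(root: Path, paper: str, issue_date: date, fmt: str) -> Path:
    """Build <root>/<paper>/<YYYY-MM-DD>/<format>/<file> (an assumed layout)."""
    stamp = issue_date.isoformat()
    return root / paper / stamp / fmt / f"{paper}-{stamp}.{fmt}"


def store_download(root: Path, paper: str, issue_date: date,
                   fmt: str, data: bytes) -> Path:
    """Write one downloaded issue into the archive and refresh the 'latest' copy."""
    target = archive_path(root, paper, issue_date, fmt)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)

    # Hypothetical "latest" directory: one file per paper and format.
    # Simplification: this always overwrites, so it assumes issues are
    # stored in date order.
    latest = root / "latest" / f"{paper}.{fmt}"
    latest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(target, latest)
    return target
```

With this layout, syncing a device only requires copying the `latest` directory, while the dated tree keeps the full archive.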