PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable bit length ciphers, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). It also includes easy integration with Jakarta Lucene and the ability to update interactive forms.
PDFtk Server is a simple command-line tool for doing everyday things with PDF documents. You can use it to merge PDF documents or collate PDF page scans, split PDF pages into a new document, rotate PDF documents or pages, decrypt input as necessary (password required), encrypt output as desired, fill PDF forms with X/FDF data and/or flatten forms, generate FDF data stencils from PDF forms, apply a background watermark or a foreground stamp, report PDF metrics, bookmarks, and metadata, add/update PDF bookmarks or metadata, attach files to PDF pages or the PDF document, unpack PDF attachments, burst a PDF document into single pages, uncompress and re-compress page streams, and repair corrupted PDF files (where possible).
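As a rough illustration of three of the operations listed above (merge, burst, and form filling), the following Python sketch builds the corresponding pdftk argument lists without running them. The helper names (`merge_command`, `burst_command`, `fill_form_command`) and the file names are hypothetical and chosen only for this example; the pdftk operations used are `cat`, `burst`, `fill_form`, `output`, and `flatten`.

```python
from typing import List


def merge_command(inputs: List[str], output: str) -> List[str]:
    """Build a pdftk command that concatenates the input PDFs into one file."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    # Equivalent shell form: pdftk a.pdf b.pdf cat output merged.pdf
    return ["pdftk", *inputs, "cat", "output", output]


def burst_command(input_pdf: str) -> List[str]:
    """Build a pdftk command that splits a PDF into single-page files."""
    # Equivalent shell form: pdftk book.pdf burst
    return ["pdftk", input_pdf, "burst"]


def fill_form_command(form_pdf: str, fdf_data: str, output: str,
                      flatten: bool = False) -> List[str]:
    """Build a pdftk command that fills a PDF form from an FDF/XFDF file."""
    cmd = ["pdftk", form_pdf, "fill_form", fdf_data, "output", output]
    if flatten:
        # 'flatten' merges the filled-in fields into the page content.
        cmd.append("flatten")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, printed as shell-style command lines.
    print(" ".join(merge_command(["a.pdf", "b.pdf"], "merged.pdf")))
    print(" ".join(burst_command("book.pdf")))
    print(" ".join(fill_form_command("form.pdf", "data.fdf", "filled.pdf", flatten=True)))
```

On a machine where pdftk is installed, each returned list could be passed to `subprocess.run(cmd, check=True)`. Keeping the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.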
Guardian Grab eases the process of downloading the digital editions of The Guardian and The Observer newspapers, which are available through Newspaper Direct. Guardian Grab interacts with the Newspaper Direct site to log on, identify available sections, and download the newspaper in PDF, mobi (Kindle), or ePub format. Downloads are arranged by paper, date, and format under a specified directory. Guardian Grab also maintains a directory holding the latest copy of each paper for easy syncing.