PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable-bit-length ciphers, and extraction of all document metadata (including form data, bookmarks, and annotations). It also integrates easily with Jakarta Lucene and can update interactive forms.
PDFtk Server is a simple command-line tool for doing everyday things with PDF documents. You can use it to merge PDF documents or collate PDF page scans, split PDF pages into a new document, rotate PDF documents or pages, decrypt input as necessary (password required), encrypt output as desired, fill PDF forms with X/FDF data and/or flatten forms, generate FDF data stencils from PDF forms, apply a background watermark or a foreground stamp, report PDF metrics, bookmarks, and metadata, add/update PDF bookmarks or metadata, attach files to PDF pages or the PDF document, unpack PDF attachments, burst a PDF document into single pages, uncompress and re-compress page streams, and repair corrupted PDF files (where possible).
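As a rough illustration of the merge and burst operations listed above, the following Python sketch builds the argument lists for the `pdftk` executable and runs them. The helper names (`merge_args`, `burst_args`, `run_pdftk`) and the file names are hypothetical, and the sketch assumes `pdftk` is installed and on the PATH.

```python
import shutil
import subprocess


def merge_args(inputs, output):
    """Argument list that concatenates the input PDFs into one output file."""
    return ["pdftk", *inputs, "cat", "output", output]


def burst_args(source, pattern="pg_%04d.pdf"):
    """Argument list that splits a PDF into one file per page.

    The pattern is a printf-style template that pdftk fills in with the
    page number.
    """
    return ["pdftk", source, "burst", "output", pattern]


def run_pdftk(args):
    """Run a prepared command, failing early if pdftk is not installed."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("pdftk was not found on the PATH")
    subprocess.run(args, check=True)
```

A call such as `run_pdftk(merge_args(["a.pdf", "b.pdf"], "merged.pdf"))` would then invoke the tool. Building the command as a list, rather than as a shell string, avoids quoting problems with file names that contain spaces.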
PDFXMLRPC enables client-server PDF creation from text over the Internet or an intranet, running over XML-RPC and HTTP. It consists of a library that you can use to create an XML-RPC PDF server, a library that you can use to create an XML-RPC PDF client, and clients created using these libraries. The client can send text to the server repeatedly, using XML-RPC method calls. The server converts that text to PDF content, and in response to a different method call from the client, sends that PDF content back to the client. The client then saves that PDF content to a local PDF file.
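The exchange described above can be sketched with Python's standard `xmlrpc` modules. This is not PDFXMLRPC's own API: the method names `add_text` and `get_pdf`, the `PdfService` class, and the bare-bones `text_to_pdf` writer are all assumptions made for illustration.

```python
import threading
from xmlrpc.client import Binary, ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def text_to_pdf(lines):
    """Build a minimal one-page PDF (Helvetica, 12 pt) from lines of text."""
    def esc(s):
        # Backslashes and parentheses must be escaped inside PDF strings.
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    ops = ["BT", "/F1 12 Tf", "14 TL", "72 720 Td"]
    ops += ["(%s) Tj T*" % esc(line) for line in lines]
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1", "replace")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


class PdfService:
    """Hypothetical server side: accumulates text, returns it as PDF."""

    def __init__(self):
        self._lines = []

    def add_text(self, text):
        # Called repeatedly by the client; returns the running line count.
        self._lines.extend(text.splitlines())
        return len(self._lines)

    def get_pdf(self):
        # A separate call sends the PDF back as XML-RPC base64 data.
        return Binary(text_to_pdf(self._lines))


def demo(path):
    """Start a local server, send two lines of text, save the returned PDF."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(PdfService())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    client = ServerProxy("http://127.0.0.1:%d/" % server.server_address[1])
    client.add_text("Hello from the client")
    client.add_text("Second line")
    pdf = client.get_pdf().data
    with open(path, "wb") as fh:
        fh.write(pdf)
    server.shutdown()
    server.server_close()
    return pdf
```

Calling `demo("out.pdf")` runs both roles in one process; in a real deployment the server and client would live on different machines, and the server would need to keep separate text buffers per client.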
The ReportLab Toolkit is a library for programmatically creating documents in PDF format. It can quickly and easily create or automate complex, data-driven documents. It features a real document layout engine, flowable objects (such as paragraphs, headlines, tables, images, and graphics), support for embedded Type-1 or TTF fonts, support for Asian, Hebrew, and Arabic characters, support for bitmap images in any popular format, support for vector graphics, a library of reusable primitive shapes, and an extensible widget library. It includes simple demos and more complex tools. It works with any data source.
getxbook is a collection of tools to download books from websites. There are tools to download from Google Books' "book preview", Amazon's "look inside the book", and Barnes and Noble's "book viewer". There is an optional GUI written in Tcl/Tk, and some shell scripts using OCR to create plain text or searchable PDFs and DjVu files from the downloaded books.