PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable-bit-length ciphers, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). It also includes easy integration with Jakarta Lucene and the ability to update interactive forms.
libHaru is a library for generating PDF files. It supports generating PDF files with lines, text, images, outlines, text annotations, and link annotations; compressing documents with Deflate (the PDF FlateDecode filter); embedding PNG and JPEG images; embedding Type1 and TrueType fonts; creating encrypted PDF files; using various character sets; using CJK fonts and encodings; and basic use of U3D.
offrss is a standalone program that downloads your favorite feeds and then shows them in your favorite Web browser by spawning a simple local Web server. It downloads not only the feeds' text but also the pictures, so you can read comic strips and enjoy posts with pictures in them while offline. It can also generate PDFs from text. It remembers what you have and haven't read, and all the information stays in normal files, so you can easily synchronize it to any device that may not have an Internet connection. It can also work as a CGI to serve your feeds on your Web site, and it can update the feeds from crontab. It has few build dependencies and can be cross-compiled easily.