PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF specification (including v1.7, used by Acrobat 8, 9, and X), extracts text encoded with double-byte character sets (including Chinese, Japanese, and Korean), decrypts documents encrypted with 40-bit, 128-bit, 256-bit, and variable-bit-length ciphers, and extracts all metadata a PDF document provides (including form data, bookmarks, and annotations). It also includes integration with Jakarta Lucene and the ability to update interactive forms.
ProteomeCommons.org IO Framework is a Java framework for handling spectra and peak lists. It reads and writes a number of spectrum and peak list formats, and it provides a simple Java object model for working with them. All classes support two ways of handling peak list and spectrum data: loaded fully in memory or processed as a stream. The goal of the framework is to support all the popular MS and MS/MS data formats, so that users need not spend time or effort working out how to read and write peak list or spectrum files.