PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable bit length ciphers, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). Easy integration with Jakarta Lucene is included, as well as interactive form update capability.
The National Space Science Data Center's (NSSDC) Common Data Format (CDF) is a self-describing data abstraction for the storage and manipulation of multidimensional data in a platform- and discipline-independent fashion. It consists of a scientific data management package (known as the "CDF Library") that allows programmers and application developers to manage and manipulate scalar, vector, and multi-dimensional data arrays.