jPDFText is a Java library for extracting text from PDF documents. The extracted text can be used for archiving, storage, searching, or indexing. jPDFText is built on top of Qoppa's proprietary PDF technology, so it needs no third-party software or drivers, and nothing extra has to be installed or configured when deploying. Its main features are loading PDF documents from files, network drives, URLs, or input streams; extracting text; and extracting words as a vector of Strings. It is written entirely in Java, which allows your application to remain platform independent.
Japplis Toolbox is a compilation of text utilities in one application. It can encode and decode URL, Base64, Hex, SoundEx, or Metaphone. It can convert numbers between binary, octal, decimal, and hexadecimal, and convert numbers to dates. It gives you text information such as character count, word count, MD5, or SHA. You can get Java system properties, environment variables, or Swing default values. It checks and finds regular expressions. It can also manipulate lines of text by sorting, reversing, shuffling, deleting duplicates, trimming spaces, or numbering lines.
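To give a feel for the kind of operations listed above, here is a minimal Python sketch of a few of them using only the standard library. It is not Japplis Toolbox code (the application itself is Java-based), and the function names convert_number, text_info, dedupe_lines, and number_lines are invented for this illustration.

```python
import base64
import hashlib
from urllib.parse import quote, unquote

# Format codes for the four bases named in the description.
_BASE_FORMATS = {2: "b", 8: "o", 10: "d", 16: "x"}


def convert_number(value, from_base, to_base):
    """Convert a number written in one base into another base (2, 8, 10, 16)."""
    return format(int(value, from_base), _BASE_FORMATS[to_base])


def text_info(text):
    """Return character count, word count, and MD5 digest of a text."""
    return {
        "characters": len(text),
        "words": len(text.split()),
        "md5": hashlib.md5(text.encode("utf-8")).hexdigest(),
    }


def dedupe_lines(text):
    """Delete duplicate lines, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


def number_lines(text):
    """Prefix every line with its 1-based line number."""
    lines = text.splitlines()
    return "\n".join(f"{i}: {line}" for i, line in enumerate(lines, start=1))
```

URL and Base64 encoding need no helper at all: quote("a b&c") and unquote(...) cover the former, and base64.b64encode(b"hello") and base64.b64decode(...) cover the latter.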
Pymur provides Python bindings to the C++-based Lemur Toolkit. The Lemur Toolkit is an open-source toolkit designed to facilitate research in language modeling and information retrieval. Lemur supports a wide range of industrial and research language applications such as ad-hoc retrieval, site-search, and text mining.
a2b invokes conversion tools in sequence to convert files from one type to another, possibly performing some extra processing along the way. It is a wrapper for many programs like gif2png, latex2html, netpbm, mencoder, ffmpeg, oggenc, etc., that can convert content from one type to another. It can convert text, documents, images, audio, video, and more. Some examples: a2b test.mp3 test.ogg; mencoder_opts="-ss 60 -endpos 10" a2b video.avi clip.flv; a2b -g=subtitles dvd://8 movie.sub; a2b -q http://sam.nipl.net/ sam.aac. That last example downloads the author's Web page, converts it to text with lynx -dump, speaks the text with flite or espeak, and converts the audio file to AAC with faac.
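The idea of chaining converters can be sketched as a shortest-path search over a table of known single-step conversions. The Python below is an illustration only, not a2b's actual implementation: the CONVERTERS table and the find_chain function are hypothetical, and the tool names are simply those mentioned in the description.

```python
from collections import deque

# Hypothetical table of single-step conversions: (source, target) -> tool.
CONVERTERS = {
    ("html", "txt"): "lynx -dump",
    ("txt", "wav"): "espeak",
    ("wav", "aac"): "faac",
    ("wav", "ogg"): "oggenc",
    ("gif", "png"): "gif2png",
}


def find_chain(src, dst, converters=CONVERTERS):
    """Return the shortest list of tools turning src into dst, or None."""
    queue = deque([(src, [])])
    seen = {src}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == dst:
            return chain
        # Breadth-first search: try every converter that accepts this format.
        for (source, target), tool in converters.items():
            if source == fmt and target not in seen:
                seen.add(target)
                queue.append((target, chain + [tool]))
    return None
```

With this table, find_chain("html", "aac") yields ["lynx -dump", "espeak", "faac"], which mirrors the Web-page-to-AAC example above. Running the tools themselves (for instance through subprocess) is left out of the sketch.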
Notational Velocity is a Mac OS X desktop application that stores and retrieves notes. The same area is used both for creating notes and for searching. As the user types the title for a new note, related notes appear below, letting the user file the information there instead. Likewise, if a search reveals nothing, one need simply press return to create a note with the appropriate title.
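The combined search-and-create behaviour described above can be modelled in a few lines. This Python sketch is purely illustrative and says nothing about how Notational Velocity is actually written; the NoteStore class and its methods are hypothetical, and opening the first match when return is pressed on a non-empty result list is an assumption of the sketch.

```python
class NoteStore:
    """Toy model of a single field used for both searching and creating notes."""

    def __init__(self):
        self.notes = {}  # title -> body

    def search(self, query):
        """Return titles of notes whose title or body contains the query."""
        needle = query.lower()
        return sorted(
            title
            for title, body in self.notes.items()
            if needle in title.lower() or needle in body.lower()
        )

    def press_return(self, query):
        """Open the first matching note, or create one titled after the query."""
        matches = self.search(query)
        if matches:
            return matches[0]  # assumption: return selects the first match
        self.notes[query] = ""  # nothing found: the query becomes a new note
        return query
```

Typing "groceries" into an empty store and pressing return creates a note with that title; typing "groc" afterwards lists it as a related note.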
Thot takes text in a wiki-like format as input and produces output in several formats: HTML, LaTeX, DocBook, and PDF. Although delivered with only one input language (Dokuwiki format), Thot is very versatile and easy to extend. For example, the initial version allows you to embed different kinds of entities in a document: source language listings, GraphViz DOT graphs, LaTeX math, etc.