TXR is a data-munging language. Its pattern language provides template-based matching of entire documents or large sections of them, and TXR also includes a language for functional and imperative programming. It is written in C and is distributed as a utility that is portable to Unix-like platforms and Windows.
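To make "template-based matching" concrete, here is a minimal Python sketch of the idea, loosely modeled on TXR's `@variable` placeholder notation. It is not TXR's implementation or its full syntax: it matches one line at a time rather than whole documents, and the function names `compile_template` and `match_lines` are hypothetical. Literal text in the template must match exactly, and each `@name` binds whatever text sits in that position.

```python
import re

def compile_template(template):
    """Turn a template like '@name is @age years old' into a regex (hypothetical helper)."""
    # re.split with a capturing group alternates: literal, var, literal, var, ...
    parts = re.split(r"@(\w+)", template)
    pattern, seen = "", set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)          # literal text must match exactly
        elif part in seen:
            pattern += f"(?P={part})"           # a repeated variable must bind the same text
        else:
            pattern += f"(?P<{part}>.+?)"       # first occurrence binds some non-empty text
            seen.add(part)
    return re.compile(pattern)

def match_lines(template, lines):
    """Return one dict of variable bindings per line that matches the whole template."""
    regex = compile_template(template)
    results = []
    for line in lines:
        m = regex.fullmatch(line)
        if m:
            results.append(m.groupdict())
    return results
```

For instance, matching `"@name is @age years old"` against `["Alice is 30 years old", "no match here"]` yields a single binding with `name` set to `"Alice"` and `age` set to `"30"`; the non-matching line is skipped.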
The AlchemyAPI Android SDK enables real-time semantic analysis of text, HTML, or Internet-hosted Web page content. The SDK can extract Concepts, Named Entities, Keywords and Tags, and Categories; convert HTML into clean text; and detect the language of the input. It can analyze text in eight languages: English, French, German, Italian, Portuguese, Russian, Spanish, and Swedish. Example code and a demo application are included to help you get started.
The PHP Text Diff Highlight class finds and displays the differences between two text strings. It applies a diff algorithm to the two strings and returns a list of changes (a patch) that transforms the original string into the final one; the patch shows which text must be added or removed. Differences can be computed in three modes: by character, by word, or by line. The class can also render the result as HTML, marking added and removed text with distinct insertion and deletion styles. The example page serves as an interactive tool that shows the changes as the user edits the original and modified texts.
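The class itself is PHP; as a language-neutral illustration of the three diff modes and the HTML rendering, here is a minimal Python sketch built on the standard library's `difflib.SequenceMatcher`. It is a stand-in, not the class's API: `SequenceMatcher` uses its own matching heuristic, which is not necessarily the diff algorithm the PHP class uses, and the function names, the `("=", "-", "+")` patch tuples, and the `<ins>`/`<del>` markup are assumptions chosen for the example.

```python
import difflib
import html
import re

def tokenize(text, mode):
    """Split text into diff units: single characters, words, or lines."""
    if mode == "char":
        return list(text)
    if mode == "word":
        # Keep whitespace runs as their own tokens so joining them restores the text exactly.
        return re.findall(r"\s+|\S+", text)
    if mode == "line":
        return text.splitlines(keepends=True)
    raise ValueError(f"unknown mode: {mode!r}")

def diff_ops(old, new, mode="word"):
    """Return a patch as a list of (op, text) pairs: '=' keep, '-' remove, '+' add."""
    a, b = tokenize(old, mode), tokenize(new, mode)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("=", "".join(a[i1:i2])))
            continue
        # 'replace' yields a removal and an addition; 'delete' or 'insert' yields one of them.
        if i2 > i1:
            ops.append(("-", "".join(a[i1:i2])))
        if j2 > j1:
            ops.append(("+", "".join(b[j1:j2])))
    return ops

def apply_ops(ops):
    """Patch the original into the final string: keep '=' and '+' text, drop '-' text."""
    return "".join(text for op, text in ops if op != "-")

def to_html(ops):
    """Render the patch as HTML, wrapping removed text in <del> and added text in <ins>."""
    out = []
    for op, text in ops:
        escaped = html.escape(text)
        if op == "-":
            out.append(f"<del>{escaped}</del>")
        elif op == "+":
            out.append(f"<ins>{escaped}</ins>")
        else:
            out.append(escaped)
    return "".join(out)
```

For example, `to_html(diff_ops("the quick brown fox", "the slow brown fox"))` produces `the <del>quick</del><ins>slow</ins> brown fox`. Switching `mode` to `"char"` or `"line"` changes only the granularity of the units being compared; the patch format and rendering stay the same.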
Apache OpenNLP is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are typically the building blocks of more advanced text-processing services.
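OpenNLP performs these tasks with trained statistical models. To show only the input and output shape of the first two tasks, sentence segmentation and tokenization, here is a deliberately naive rule-based Python sketch. It is a stand-in, not OpenNLP's API or method: the function names are hypothetical, and simple regular expressions like these mis-split abbreviations such as "Dr.", which is exactly the kind of case a trained model is meant to handle.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: words (allowing one apostrophe, as in "don't") or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

text = "OpenNLP is a toolkit. Don't panic! Does it tokenize?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Each sentence comes out as a list of word and punctuation tokens, which is the form that later stages such as part-of-speech tagging and chunking consume.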
Find What I Mean is a search library that tolerates errors in queries: it corrects typos, extra letters, and similar mistakes. This is especially useful when searching for an item in a list, because with traditional exact-match search the query must be spelled perfectly or it returns no matches.
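The description does not say how Find What I Mean decides that a query is "close enough", so as a generic illustration of error-tolerant search, here is a Python sketch that ranks list items by Levenshtein edit distance to the query and keeps those within a threshold. Edit distance counts substituted, inserted, and deleted characters, which covers typos and extra letters. This is an assumed technique, not the library's algorithm, and the default `max_distance=2` is an arbitrary choice.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]

def fuzzy_search(query, items, max_distance=2):
    """Return items within max_distance edits of the query, closest first (case-insensitive)."""
    q = query.lower()
    scored = sorted((levenshtein(q, item.lower()), item) for item in items)
    return [item for dist, item in scored if dist <= max_distance]
```

An exact-match lookup for `"banan"` in `["apple", "banana", "orange", "grape"]` finds nothing, whereas `fuzzy_search("banan", ...)` returns `["banana"]`, since only one letter is missing. Comparing the query against every item is linear in the list size, which is fine for short lists; larger collections usually need an index.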