ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing C++, Java, or Sather actions. It is similar to the popular compiler generator YACC, however ANTLR is much more powerful and easy to use. ANTLR-produced parsers are not only highly efficient, but are both human-readable and human-debuggable (especially with the interactive ParseView debugging tool). ANTLR can generate parsers, lexers, and tree-parsers in either C++, Java, or Sather. ANTLR is currently written in Java.
Algraeph is a tool for manual alignment of linguistic graphs, such as phrase structure trees or dependency structures, where each node corresponds to a subsequence of the analyzed input sentence. It allows you to express the similarity between two graphs by aligning their nodes and attaching relation labels to these alignments. Graphs are read from one or more graphbanks (or treebanks) in the GraphML or Alpino formats. Alignment relations are user-defined and are stored in a simple XML format, which can be used for further processing. The resulting parallel graph corpus is a useful data set for many tasks in computational linguistics and natural language processing.
An Gramadóir is a grammar checking engine that is designed for the rapid development of grammar checkers for minority languages and other languages with limited computational resources. Rule specifications are given according to a simple syntax combining XML and regular expressions. Part-of-speech tagging can be learned from text corpora using statistical methods. It is currently implemented for Irish (Gaeilge).
Apertium is a machine translation platform, initially aimed at related-language pairs, but recently expanded to deal with more divergent language pairs (such as English-Catalan). The platform provides a language-independent machine translation engine, tools to manage the linguistic data necessary to build a machine translation system for a given language pair, and linguistic data for a growing number of language pairs.
Arabic Wordlist is a project to deliver an English to Arabic translated word list to be used in translations and/or dictionaries. The word list contains in excess of 83,500 words (and growing), and spans a variety of categories (i.e. it is general in nature). This word list is encoded in UTF-8, and is expected to be used in many online free dictionaries.
Biaroza is a multi-dictionary system for human languages which aims to set a standard on such type of software. It works internally (and externally if you want so) in UTF-8. The software itself supports querying by particles, customizable in/out filtering, and interface mode (for using with another software) among other features.
Booleano is an interpreter of Boolean expressions; a library to define and run filters available as text (e.g. in a natural language) or in Python code. In order to handle text-based filters, Booleano ships with a fully-featured parser whose grammar is adaptive: Its properties can be overridden using simple configuration directives. On the other hand, the library exposes a Pythonic API for filters written in pure Python. These filters are particularly useful to build reusable conditions from objects provided by a third party library.
ByteName is a tool that for each byte of the input prints a line consisting of the byte offset, the byte in hex, octal, binary, and decimal, and its description in a selected single-byte encoding. A command line flag suppresses printing of lines corresponding to ASCII characters, which is useful for locating stray non-ASCII codes. It can also generate a chart for a specified encoding or, for a specified codepoint, generate descriptions in all known encodings.