BabelKit is an interface to a universal multilingual database code table. It takes all of the programming work out of maintaining multiple database code definition sets in multiple languages. The code administration and translation page lets developers define new virtual code tables, new languages, enter all codes and their descriptions, and then translate them into all languages of interest. Perl and PHP classes retrieve the code descriptions and automatically generate HTML code selection elements in the user's language. This makes internationalization and localization of Web sites and database interfaces much easier.
Biaroza is a multi-dictionary system for human languages which aims to set a standard for this type of software. It works internally (and externally, if desired) in UTF-8. The software supports querying by particles, customizable input and output filtering, and an interface mode (for use with other software), among other features.
Booleano is an interpreter of Boolean expressions: a library to define and run filters available as text (e.g. in a natural language) or in Python code. To handle text-based filters, Booleano ships with a full-featured parser whose grammar is adaptive: its properties can be overridden using simple configuration directives. The library also exposes a Pythonic API for filters written in pure Python. These filters are particularly useful for building reusable conditions from objects provided by a third-party library.
ByteName is a tool that prints one line for each byte of the input, consisting of the byte offset, the byte in hex, octal, binary, and decimal, and its description in a selected single-byte encoding. A command-line flag suppresses the lines for ASCII characters, which is useful for locating stray non-ASCII codes. It can also generate a chart for a specified encoding or, for a specified codepoint, generate descriptions in all known encodings.
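The per-byte listing can be illustrated with a minimal Python sketch. This is not ByteName's own code: the function name `describe_bytes`, its parameters, the tab-separated layout, and the use of Unicode character names as the "description" are all assumptions made for illustration.

```python
import unicodedata


def describe_bytes(data: bytes, encoding: str = "latin-1", skip_ascii: bool = False):
    """Return one tab-separated line per byte: offset, hex, octal, binary, decimal, name.

    Hypothetical helper; the column layout is an assumption, not ByteName's format.
    """
    lines = []
    for offset, value in enumerate(data):
        # Optionally suppress ASCII bytes to make stray non-ASCII codes stand out.
        if skip_ascii and value < 0x80:
            continue
        try:
            char = bytes([value]).decode(encoding)
            # Control characters have no Unicode name, so supply a fallback.
            name = unicodedata.name(char, "<unnamed or control character>")
        except UnicodeDecodeError:
            name = "<undefined in this encoding>"
        lines.append(
            f"{offset}\t{value:02X}\t{value:03o}\t{value:08b}\t{value:d}\t{name}"
        )
    return lines


if __name__ == "__main__":
    for line in describe_bytes(b"caf\xe9", encoding="latin-1", skip_ascii=True):
        print(line)
```

With `skip_ascii=True`, only the byte at offset 3 is reported in the usage example, which mirrors the described use of the flag for finding non-ASCII bytes.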
CharEntry is a tool for inserting non-ASCII characters into text, with particular emphasis on linguistic notation. It provides charts of the consonants, vowels, and diacritics of the International Phonetic Alphabet as well as a chart of precomposed accented characters. Clicking on a character inserts it into a text region, the contents of which may be saved to a file or copied and pasted elsewhere. A widget for inserting characters by Unicode codepoint is also provided. Furthermore, it is possible to read the definition of a custom character chart from a file.
Ciao is a complete Prolog system subsuming ISO-Prolog with a novel modular design which allows both restricting and extending the language. Ciao extensions currently include feature terms (records), higher-order programming, functions, constraints, objects, persistent predicates, a good base for distributed execution (agents), and concurrency. Libraries also support WWW programming, sockets, and external interfaces (C, Java, Tcl/Tk, relational databases, etc.). An Emacs-based environment, a stand-alone compiler, and a toplevel shell are also provided.
The Computational Linguistics Toolset is a set of tools for computational linguistics. It contains reusable code for cleaning, splitting, refining, and sampling corpora (ICE, Penn, and a native one), for tagging them using the TnT tagger, for doing permutation statistics on N-grams (useful for finding statistically significant syntactic differences between any two sets of tagged texts), and various examination tools. The tools themselves are well documented.
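The idea of permutation statistics on N-grams can be sketched generically in Python. The toolset's actual statistic is not described here, so everything below is an assumption: the function names `ngrams`, `rate`, and `permutation_p_value` are hypothetical, texts are represented as lists of tags, and the test statistic is the absolute difference in relative frequency of one N-gram between the two sets.

```python
import random


def ngrams(tags, n):
    """Return all contiguous n-grams of a tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def rate(texts, gram):
    """Relative frequency of `gram` among all n-grams of the same length in `texts`."""
    total = 0
    hits = 0
    for tags in texts:
        grams = ngrams(tags, len(gram))
        total += len(grams)
        hits += grams.count(gram)
    return hits / total if total else 0.0


def permutation_p_value(set_a, set_b, gram, rounds=1000, seed=0):
    """Estimate how often a random split of the pooled texts differs at least
    as much as the observed split (a generic permutation test, assumed design)."""
    rng = random.Random(seed)
    observed = abs(rate(set_a, gram) - rate(set_b, gram))
    pooled = list(set_a) + list(set_b)
    extreme = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        a, b = pooled[:len(set_a)], pooled[len(set_a):]
        if abs(rate(a, gram) - rate(b, gram)) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (rounds + 1)
```

A small p-value would suggest that the N-gram is used at genuinely different rates in the two sets, rather than the difference arising from how the texts happened to be grouped.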
Connexor Machinese analyzers process sequences of written words, identify and classify the various entities in them, and show how these relate to each other, marking the language with a simple and systematic notation. Currently, the Machinese product family includes: Machinese Phrase Tagger, a fast, lightweight morphosyntactic tagger; Machinese Syntax, a full-scale dependency parser; Machinese Semantics, a dependency parser with semantic analysis; and Machinese Metadata, an entity extractor.
Convert character set converts text strings between different character set encodings. It supports conversion between single-byte character sets, from single-byte to multi-byte character sets (UTF-8), and from multi-byte to single-byte. All conversion output can be saved with numeric entities, which makes it independent of the browser's character set. The main requirement is that each character must exist in both character sets; otherwise the conversion returns an error.
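The conversion behaviour described above can be approximated with Python's built-in codecs. This is a sketch under stated assumptions, not the tool's implementation: the names `convert` and `to_entities` are hypothetical, and a Python `UnicodeEncodeError` stands in for the error the tool returns when a character is missing from the target set.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode `data` from `source` to `target`.

    Raises UnicodeEncodeError if a character does not exist in the target
    character set (assumed to correspond to the tool's error case).
    """
    text = data.decode(source)
    return text.encode(target, errors="strict")


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity,
    so the result no longer depends on any particular character set."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    utf8_bytes = "caf\u00e9".encode("utf-8")
    print(convert(utf8_bytes, "utf-8", "latin-1"))  # multi-byte to single-byte
    print(to_entities("caf\u00e9"))
```

The `xmlcharrefreplace` error handler is part of the standard library; it is used here only to illustrate the numeric-entity output mentioned in the description.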