193 projects tagged "Linguistic"

Download Website Updated 15 May 2011 uni2ascii

Pop 187.87
Vit 12.06

uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. They can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, PostScript, Python, Scheme, and Tcl.
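To make the idea of a 7-bit ASCII equivalent concrete, here is a minimal Python sketch of one such round trip, using the `\uXXXX` / `\UXXXXXXXX` escape style. It is an illustration only and not how uni2ascii or ascii2uni are implemented; the function names `to_ascii_escapes` and `from_ascii_escapes` are hypothetical.

```python
import re

def to_ascii_escapes(text: str) -> str:
    """Replace each non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")     # Basic Multilingual Plane
        else:
            out.append(f"\\U{cp:08X}")     # supplementary planes
    return "".join(out)

# Matches either escape form produced above.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")

def from_ascii_escapes(text: str) -> str:
    """Turn \\uXXXX and \\UXXXXXXXX escapes back into the characters they name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
```

Note that this sketch does not protect a literal backslash followed by `u` in the input, so such text would not survive a round trip unchanged.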

Download Website Updated 11 Jan 2010 msort

Pop 180.88
Vit 11.35

Msort sorts files in sophisticated ways. Records may be fixed size, newline-separated blocks, or terminated by any specified character. Key fields may be selected by position, tag, or character range. For each key, distinct exclusions, multigraphs, substitutions, and a sort order may be defined, or locale collation rules may be used instead. Comparisons may be lexicographic, numeric, numeric string, hybrid, random, by string length, angle, domain name, date, time, month name, or ISO8601 timestamp. Keys may be reversed so as to generate reverse dictionaries. Optional keys are supported. Unicode is supported, including full case-folding. Msort itself has a somewhat complex command line interface, but may be driven by an optional GUI.

Download Website Updated 12 Mar 2005 HaTx

Pop 19.90
Vit 1.75

HaTx is a script intended for adding diacritic marks to (Czech) text. It is based on statistical methods. Statistics are gathered from training data, stored in a database, and then used.
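The description does not say which statistical model HaTx uses. As a rough illustration of the general approach (gather statistics from training data, store them, then apply them), the Python sketch below uses the simplest possible model: for each diacritic-free word form, remember the most frequent accented form seen in training. All names here (`strip_diacritics`, `train`, `restore`) are hypothetical and not taken from HaTx.

```python
import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def train(corpus_words):
    """Map each bare form to the accented form seen most often in training."""
    counts = defaultdict(Counter)
    for word in corpus_words:
        counts[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}

def restore(words, model):
    """Replace each word by its most likely accented form; unknown words pass through."""
    return [model.get(word, word) for word in words]
```

A word-level lookup like this cannot handle unseen words or context-dependent choices; a real tool would likely need a richer model.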

Download Website Updated 28 Jan 2007 ByteName

Pop 46.58
Vit 3.92

ByteName is a tool that prints one line for each byte of the input, consisting of the byte offset, the byte in hex, octal, binary, and decimal, and its description in a selected single-byte encoding. A command line flag suppresses printing of lines corresponding to ASCII characters, which is useful for locating stray non-ASCII codes. It can also generate a chart for a specified encoding or, for a specified codepoint, generate descriptions in all known encodings.
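The per-byte report described above can be approximated in a few lines of Python. This is a sketch under assumptions, not ByteName's code or output format: the function name `describe_bytes`, the tab-separated layout, and the use of Unicode character names as the description are all choices made for illustration.

```python
import unicodedata

def describe_bytes(data: bytes, encoding: str = "latin-1", skip_ascii: bool = False):
    """Return one line per byte: offset, hex, octal, binary, decimal, description."""
    lines = []
    for offset, b in enumerate(data):
        if skip_ascii and b < 0x80:
            continue  # mimic a "suppress ASCII" option to expose stray bytes
        try:
            ch = bytes([b]).decode(encoding)
            name = unicodedata.name(ch, "<unnamed>")  # control codes have no name
        except UnicodeDecodeError:
            name = "<undefined in this encoding>"
        lines.append(f"{offset}\t0x{b:02X}\t0o{b:03o}\t{b:08b}\t{b}\t{name}")
    return lines
```

Calling `describe_bytes(data, skip_ascii=True)` on a file's contents lists only the bytes at or above 0x80, together with their offsets.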

No download Website Updated 30 Jan 2005 libtranslate

Pop 21.10
Vit 1.00

libtranslate is a library for translating text and Web pages between natural languages. Its modular infrastructure allows the user to implement new translation services separately from the core library. libtranslate is shipped with a generic module that supports Web-based translation services such as Babel Fish, Google Language Tools, and SYSTRAN. Moreover, the generic module allows new services to be added simply by adding a few lines to an XML file. The libtranslate distribution includes a powerful command line interface.

Download Website Updated 30 Jan 2005 GNOME Translate

Pop 25.10
Vit 1.00

GNOME Translate is a GNOME interface to libtranslate. It can translate a text or Web page between several natural languages, and it can automatically detect the source language as you type in text.

Download Website Updated 15 Nov 2009 minpair

Pop 43.08
Vit 4.53

Minpair consists of two programs, a C command-line program and a Tcl/Tk GUI, each of which can independently generate a complete list of minimal pairs (words differing in exactly one segment) for use in linguistic research. The GUI may also be used to control the faster CLI program. Both allow sequences of characters to be defined as single segments. Unicode is fully supported. It is also possible to obtain a list of pairs differing in exactly two positions for use in finding phonological rules.
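As an illustration of what a minimal-pair search involves, the Python sketch below splits words into segments, treating user-supplied character sequences as single segments, and then reports pairs whose segment lists have equal length and differ in exactly one position. It is a brute-force sketch that assumes a minimal pair means a single substitution; it does not reflect Minpair's actual algorithm, and the names `segment` and `minimal_pairs` are hypothetical.

```python
from itertools import combinations

def segment(word: str, multigraphs=()):
    """Split a word into segments, matching the longest multigraph first."""
    ordered = sorted((m for m in multigraphs if m), key=len, reverse=True)
    segs, i = [], 0
    while i < len(word):
        for m in ordered:
            if word.startswith(m, i):
                segs.append(m)
                i += len(m)
                break
        else:
            segs.append(word[i])  # no multigraph matched: single character
            i += 1
    return tuple(segs)

def minimal_pairs(words, multigraphs=()):
    """Return word pairs whose segment sequences differ in exactly one position."""
    by_length = {}
    for w in words:
        segs = segment(w, multigraphs)
        by_length.setdefault(len(segs), []).append((w, segs))
    pairs = []
    for group in by_length.values():
        for (w1, s1), (w2, s2) in combinations(group, 2):
            if sum(a != b for a, b in zip(s1, s2)) == 1:
                pairs.append((w1, w2))
    return pairs
```

Changing the comparison to `== 2` would give the pairs differing in exactly two positions. The pairwise comparison is quadratic in the number of words of each length, so a large lexicon would call for a smarter index.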

Download Website Updated 18 Mar 2005 Universal Text Recognizer and Converter

Pop 39.24
Vit 1.00

The Universal Text Recognizer and Converter (Utrac) is a command-line tool and a C library that recognizes the encoding of an input file (UTF-8, ISO-8859-1, CP437, etc.) and its end-of-line type (CR, LF, or CRLF). It features automatic recognition (depending on the file and on the system's locale, reliable in most cases), assistance for verification or manual recognition, and conversion to another charset and/or end-of-line type.
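The end-of-line part of this recognition is simple enough to sketch in Python. The code below counts each line-ending style and reports the most common one, and adds a strict UTF-8 validity check as the most basic encoding test. It is an assumption-laden illustration, not Utrac's method: the names `detect_eol` and `is_valid_utf8` are hypothetical, and telling ISO-8859-1 from CP437 needs heuristics that are not shown here.

```python
def detect_eol(data: bytes) -> str:
    """Return the dominant end-of-line type: 'CRLF', 'CR', 'LF', or 'none'."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # lone CR (classic Mac OS style)
    lf = data.count(b"\n") - crlf   # lone LF (Unix style)
    counts = {"CRLF": crlf, "CR": cr, "LF": lf}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode as strict UTF-8 (pure ASCII also qualifies)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A file that fails the UTF-8 check is in some other encoding, but which one cannot be decided from validity alone, which is presumably why the tool also consults the system locale.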

Download Website Updated 22 Mar 2007 SlpTK

Pop 21.33
Vit 1.60

SlpTK is an ANSI C library, a set of utilities, and scripts for natural language processing. It provides data structures and processing routines for the lexical and syntactic levels.

Download Website Updated 20 Nov 2009 po for anything

Pop 41.86
Vit 3.19

The goal of po4a (po for anything) is to ease the creation and maintenance of translations using gettext tools in areas where they were not expected, such as documentation.

