193 projects tagged "Linguistic"

HaTx (updated 12 Mar 2005)

Pop 19.90
Vit 1.75

HaTx is a script for adding diacritic marks to (Czech) text. It uses statistical methods: statistics are gathered from training data, stored in a database, and then used to restore the marks.
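
The statistical idea described above can be illustrated with a minimal Python sketch. This is not HaTx's actual code or data format: the function names, the in-memory dictionary standing in for the database, and the tiny training sample are all assumptions made for illustration. For each unaccented form, the sketch picks the accented form seen most often in training.

```python
import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word):
    """Remove combining marks, e.g. 'být' -> 'byt'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def train(corpus_words):
    """Count how often each accented form occurs for each stripped form.
    (Hypothetical stand-in for the statistics database.)"""
    counts = defaultdict(Counter)
    for word in corpus_words:
        counts[strip_diacritics(word)][word] += 1
    return counts

def restore(text_words, counts):
    """Replace each word with its most frequent accented form, if known."""
    restored_words = []
    for word in text_words:
        candidates = counts.get(word)
        restored_words.append(candidates.most_common(1)[0][0] if candidates else word)
    return restored_words

# Tiny made-up training sample; a real system would need far more data.
model = train("být či nebýt to je otázka být".split())
restored = restore("byt ci nebyt to je otazka".split(), model)
print(" ".join(restored))  # být či nebýt to je otázka
```

A word-level frequency table like this cannot choose between two valid accented forms of the same word; handling that would need context, which the sketch leaves out.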

msort (updated 11 Jan 2010)

Pop 180.88
Vit 11.35

Msort sorts files in sophisticated ways. Records may be fixed size, newline-separated blocks, or terminated by any specified character. Key fields may be selected by position, tag, or character range. For each key, distinct exclusions, multigraphs, substitutions, and a sort order may be defined or locale collation rules used. Comparisons may be lexicographic, numeric, numeric string, hybrid, random, by string length, angle, domain name, date, time, month name, or ISO8601 timestamp. Keys may be reversed so as to generate reverse dictionaries. Optional keys are supported. Unicode is supported, including full case-folding. Msort itself has a somewhat complex command line interface, but may be driven by an optional GUI.
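
Some of the comparison types listed above can be approximated with key functions in Python's built-in `sorted`. The sketch below does not use msort or its command line options; the word lists and the multigraph trick (treating "ch" as one letter that sorts after "c", as in traditional Spanish ordering) are assumptions chosen for illustration.

```python
words = ["running", "sing", "cat", "bat", "ring"]

# Reverse dictionary: compare words by their reversed spelling,
# so that words with the same ending are grouped together.
reverse_dictionary = sorted(words, key=lambda w: w[::-1])
print(reverse_dictionary)  # ['running', 'ring', 'sing', 'bat', 'cat']

# Sort by string length, breaking ties lexicographically.
by_length = sorted(words, key=lambda w: (len(w), w))
print(by_length)  # ['bat', 'cat', 'ring', 'sing', 'running']

def multigraph_key(word):
    """Treat 'ch' as a single letter ordered after plain 'c'.
    U+FFFF is used as a placeholder that sorts after every real letter."""
    return word.replace("ch", "c\uffff")

spanish = ["cubo", "chico", "dama", "casa"]
with_multigraph = sorted(spanish, key=multigraph_key)
print(with_multigraph)  # ['casa', 'cubo', 'chico', 'dama']
```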

uni2ascii (updated 15 May 2011)

Pop 187.87
Vit 12.06

uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. They can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, PostScript, Python, Scheme, and Tcl.
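
To make the idea of a 7-bit ASCII equivalent concrete, here is a small Python sketch of two such representations: HTML hexadecimal numeric character references and backslash-u escapes. It does not call uni2ascii or reproduce its options; the function names are hypothetical and only these two formats are covered.

```python
import re

def to_html_hex(text):
    """Replace each non-ASCII character with an HTML hex reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)

def from_html_hex(text):
    """Inverse of to_html_hex: decode &#x...; references back to characters."""
    return re.sub(r"&#x([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), text)

def to_backslash_u(text):
    """Use \\uXXXX for BMP characters and \\UXXXXXXXX above U+FFFF."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")
        else:
            parts.append(f"\\U{cp:08X}")
    return "".join(parts)

print(to_html_hex("naïve"))                 # na&#xEF;ve
print(from_html_hex(to_html_hex("naïve")))  # naïve
print(to_backslash_u("č"))                  # \u010D
```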

Transtalo (updated 16 Jul 2005)

Pop 17.18
Vit 3.71

Transtalo is an automatic translator. It consists of a library interface and modules for source and destination languages (called input and output modules). These modules communicate with each other through sentence files in an XML format.

ISCII Utilities (updated 05 Oct 2005)

Pop 23.43
Vit 1.81

ISCII Utilities consists of two programs for analyzing text files encoded according to the Indian Script Code for Information Interchange (ISCII), the Indian national standard. IsciiName identifies each code, printing the byte offset, the code in hex, and an explanation of the meaning of the code. ATR codes for writing system transition and display mode are interpreted. CountIsciiChars counts the codes in an ISCII file and classifies them according to their type and function. The original purpose was computing accurate letter counts for reading studies, but this information is also useful when processing ISCII-encoded text.

Unicode Utilities (updated 18 Feb 2009)

Pop 89.97
Vit 5.83

The Unicode Utilities are a set of programs for manipulating and analyzing Unicode text. uniname prints any combination of the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings. unifuzz tests other programs' Unicode handling.
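
The kind of per-character report described for uniname can be sketched with Python's standard `unicodedata` module. This is only an illustration of the listed fields (character offset, byte offset, hex code value, UTF-8 encoding, name); the function names are hypothetical, and the validity check merely reports yes or no rather than explaining the result as ExplicateUTF8 is described as doing.

```python
import unicodedata

def describe(text):
    """Return one row per character:
    (char offset, byte offset, hex code point, UTF-8 bytes in hex, name)."""
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((char_offset, byte_offset, f"{ord(ch):04X}",
                     encoded.hex(), unicodedata.name(ch, "<unnamed>")))
        byte_offset += len(encoded)
    return rows

def is_valid_utf8(data):
    """Report whether a byte sequence decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

for row in describe("aé€"):
    print(row)
# (0, 0, '0061', '61', 'LATIN SMALL LETTER A')
# (1, 1, '00E9', 'c3a9', 'LATIN SMALL LETTER E WITH ACUTE')
# (2, 3, '20AC', 'e282ac', 'EURO SIGN')
```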

Redet (updated 12 Sep 2008)

Pop 219.34
Vit 10.98

Redet is a tool for developing and executing regular expressions using any of more than 50 search programs, editors, and programming languages, intended both for developing regular expressions for use elsewhere and as a search tool in its own right. For each program in each locale, a palette showing the available constructs is provided. The properties of each program are determined by runtime tests, which guarantees that they will be correct for the program version and locale. Additional features include persistent history, extensive help, a variety of character entry tools, and the ability to change locale while running. Redet is highly configurable and fully supports Unicode.

Keyano (updated 07 Mar 2005)

Pop 18.11
Vit 2.26

Keyano is a graphical front end for popular Unix applications such as play, aplay, festival, and fortune. It can turn your PC into an audio/visual sampler that works similarly to the samplers now used by DJs. It also includes vocal dictionary and text reader capabilities, as well as a spelling tutorial and an early version of a chatter bot. In alphabet mode, for example, you can type "A B C" and it says the letters out loud while showing them on screen.

ddc-concordance (updated 17 Jan 2007)

Pop 31.19
Vit 2.27

ddc-concordance is a search engine for linguists. It lets you search for words or sequences of words together with morphological patterns. It was created to help linguists find a particular collocation or word in a given context.

IPA Zounds (updated 17 Feb 2006)

Pop 25.14
Vit 2.63

IPA Zounds models language sound changes by applying a given set of sound change rules to a given lexicon. It has a built-in model of the International Phonetic Alphabet, allowing users to write input words in IPA characters and rules using those characters or the distinctive features of the model.
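
Applying an ordered set of sound change rules to a lexicon can be sketched in a few lines of Python. This is not IPA Zounds' rule syntax or its feature model: the rules are plain regular expressions, the vowel inventory is simplified to five letters, and the words and rules are made up for illustration rather than taken from an attested historical derivation.

```python
import re

# Each rule is (pattern, replacement); rules apply in the order listed.
# Hypothetical rules: voice p and t between vowels, then drop final e.
RULES = [
    (r"(?<=[aeiou])p(?=[aeiou])", "b"),
    (r"(?<=[aeiou])t(?=[aeiou])", "d"),
    (r"e$", ""),
]

def apply_rules(word, rules=RULES):
    """Run every rule over the word in sequence and return the result."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

lexicon = ["lupo", "vita", "pane"]
changed = {word: apply_rules(word) for word in lexicon}
print(changed)  # {'lupo': 'lubo', 'vita': 'vida', 'pane': 'pan'}
```

Because the rules run in sequence, their order matters: a later rule sees the output of the earlier ones, not the original word.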
