193 projects tagged "Linguistic"

Linguaphile (updated 22 Jul 2002)
Pop 65.45
Vit 1.49

Linguaphile is a simple command-line language translator. It is open source, platform independent, and written in Perl. Linguaphile currently supports the following languages: Afrikaans, Alawa, Albanian, Arrernte, Basque, Belarusian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, German, Greek, Hawaiian, Hungarian, Icelandic, Indonesian, Interlingua, Irish, Italian, Kala Lagaw Ya, Korean, Kriol, Latvian, Lithuanian, Malay, Maltese, Maori, Norwegian, Pitjantjatjara, Polish, Portuguese, Romanian, Russian, Samoan, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Thai, Tok Pisin, Turkish, Ukrainian, Warlpiri, and Welsh. The Spanish-to-English translation is the most useful at this stage.

Text-Tokenizer (updated 05 May 2012; no download)
Pop 65.31
Vit 7.54

Text-Tokenizer is a Perl module based on a flex-generated lexical analyzer that can be used for parsing text (configuration) files. With this module, a simple, full-featured configuration parser can be written very easily.

Japana (updated 04 Feb 2007)
Pop 62.77
Vit 4.03

Japana is a small HTTP proxy written in Perl. It converts Japanese characters (Hiragana, Katakana, and Kanji) into ASCII (Romaji) on the fly. The translation is done with the kakasi library (an older version that does not need kakasi still exists).

WordGenerator (updated 12 Dec 2008)
Pop 60.97
Vit 3.95

WordGenerator generates hypothetical words from specifications of their syllable structure. The user specifies the maximum length of the words in syllables, the abstract structure of syllables in the language (in terms of such units as consonants and vowels or onsets and rhymes), and the actual sounds that comprise each abstract class (e.g. the list of vowels in the language); WordGenerator then generates the words that conform to this specification. Such lists are useful to field linguists exploring the vocabulary of a language, and to designers of artificial languages.
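The generation step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WordGenerator's own code or input format: the function name `generate_words`, the convention that each character of a pattern string names one sound class, and the tiny sound inventory are all assumptions made for the example.

```python
from itertools import product

def generate_words(syllable_patterns, classes, max_syllables):
    """Return every word of 1..max_syllables syllables allowed by the spec.

    syllable_patterns: abstract syllable shapes such as "CV" (hypothetical
                       notation: one character per sound class).
    classes:           maps each class symbol to its list of actual sounds.
    """
    # Expand each abstract pattern into the concrete syllables it allows.
    syllables = []
    for pattern in syllable_patterns:
        for sounds in product(*(classes[symbol] for symbol in pattern)):
            syllables.append("".join(sounds))
    syllables = list(dict.fromkeys(syllables))  # drop duplicates, keep order

    # Concatenate syllables into words of every permitted length.
    words = []
    for length in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=length):
            words.append("".join(combo))
    return list(dict.fromkeys(words))

# Assumed toy language: two consonants, two vowels, syllables of shape CV or V.
classes = {"C": ["p", "t"], "V": ["a", "i"]}
words = generate_words(["CV", "V"], classes, max_syllables=2)
```

The number of candidate words grows exponentially with the maximum length, so a real tool would likely stream results rather than build the whole list in memory.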

HumAn Language GENerator (updated 19 Sep 2004)
Pop 60.08
Vit 1.51

HALoGEN is a general-purpose natural language generation system that aims to be both powerful and easy to use. It consists of a symbolic generator, a forest ranker, and some sample inputs. The symbolic generator includes the Sensus Ontology dictionary based on WordNet. The forest ranker includes a 250-million-word n-gram language model (unigram, bigram, and trigram) trained on Wall Street Journal newspaper text. The symbolic generator is written in Lisp and requires a Lisp interpreter.
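To give a feel for how an n-gram model can rank alternative outputs, here is a minimal Python sketch. It is not HALoGEN's ranker: it uses only bigrams with add-one smoothing, a three-sentence toy corpus, and hypothetical function names (`train_bigrams`, `log_prob`), all chosen for illustration.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count bigrams and their left-hand contexts over whitespace tokens."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent token pairs
    return contexts, bigrams

def log_prob(sentence, contexts, bigrams, vocab_size):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total

# Toy training text (assumed for the example).
corpus = ["the market rose", "the market fell", "stocks rose"]
contexts, bigrams = train_bigrams(corpus)
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

# Rank two candidate realizations and keep the more fluent one.
candidates = ["the market rose", "market the rose"]
best = max(candidates, key=lambda s: log_prob(s, contexts, bigrams, len(vocab)))
```

With this toy corpus the model prefers the word order it has seen, which is the basic behavior a ranker relies on when choosing among many candidate sentences.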

Dowser (updated 06 Oct 2004)
Pop 55.48
Vit 2.24

Dowser is a Web research and archiving tool that clusters results from search engines, associates words that appear in previous searches, and keeps a local cache of all the results you click on in a searchable database along with summaries and links to related information. It helps you to keep track of what you find, with no advertising.

Poliqarp (updated 02 Aug 2012; no download)
Pop 55.43
Vit 7.64

Poliqarp is a universal suite of utilities for processing large corpora. It includes a concordancer that works on binary corpora compiled for efficient searching and a corpus builder. It supports positional tagsets, ambiguities in the texts, and Unicode.
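For readers unfamiliar with concordancers, the following Python sketch shows the basic keyword-in-context (KWIC) idea on a plain token list. It does not reflect Poliqarp's binary corpus format, query language, or API; the function name `kwic` and the sample sentence are assumptions for illustration only.

```python
def kwic(tokens, keyword, width=2):
    """Return (left context, match, right context) for each hit of keyword."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Assumed sample text, already split into tokens.
tokens = "the cat sat on the mat near the door".split()
hits = kwic(tokens, "the")
for left, match, right in hits:
    # Right-align the left context so the matches line up in a column.
    print(f"{left:>10} [{match}] {right}")
```

A linear scan like this is fine for small texts; tools built for large corpora typically precompile an index so that searches do not have to read every token.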

JSesh (updated 02 Mar 2013; no download)
Pop 54.99
Vit 7.64

JSesh is an editor for ancient Egyptian hieroglyphic texts. It can export the text into picture formats, such as WMF files for easy inclusion in word processors. JSesh can also be used as a library for other projects concerning ancient Egyptian.

Arabic Wordlist (updated 24 Aug 2004)
Pop 54.93
Vit 1.82

Arabic Wordlist is a project to deliver an English-to-Arabic word list for use in translations and dictionaries. The word list contains more than 83,500 words (and growing) and spans a variety of categories (i.e. it is general in nature). The word list is encoded in UTF-8 and is expected to be used in many free online dictionaries.

Grok (updated 05 Dec 2001)
Pop 53.74
Vit 2.37

Grok is a library of Java components for performing various natural language tasks. These include several preprocessing tasks, chart parsing, a large categorial grammar for English (induced from the Penn Treebank), and some knowledge representation components (basic coreference, salience tracking, etc.). The library also has a companion kit that provides a GUI for the components, several of which are implementations of interfaces in the Quipu OpenNLP API.
