193 projects tagged "Linguistic"

Download Website Updated 26 Mar 2006 dbacl

Pop 176.95
Vit 4.91

dbacl is a digramic Bayesian text classifier. Given some text, it calculates the posterior probabilities that the input resembles one of any number of previously learned document collections. It can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish an English text from a French text. It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
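As a rough illustration of the idea of computing posterior probabilities over learned categories, here is a minimal sketch. It is not dbacl's implementation: it swaps in plain word-bigram counts with add-one smoothing and a uniform prior in place of dbacl's maximum entropy models, and the function names and tiny training texts are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def train(documents):
    """Count word bigrams across a list of training documents."""
    counts = Counter()
    for doc in documents:
        counts.update(bigrams(doc))
    return counts


def log_likelihood(text, counts, vocab_size):
    """Log-probability of the text's bigrams under add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[bg] + 1) / (total + vocab_size))
        for bg in bigrams(text)
    )


def posteriors(text, models):
    """Posterior probability of each category, assuming a uniform prior."""
    vocab = set(bigrams(text))
    for counts in models.values():
        vocab.update(counts)
    scores = {
        name: log_likelihood(text, counts, len(vocab))
        for name, counts in models.items()
    }
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(scores.values())
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}


# Hypothetical toy categories, for illustration only.
models = {
    "english": train(["the cat sat on the mat", "the dog sat on the rug"]),
    "french": train(["le chat est sur le tapis", "le chien est sur le lit"]),
}
result = posteriors("the cat sat on the rug", models)
```

Here `result` maps each category name to a probability, and the values sum to 1. A real classifier would train on far larger collections and would not rely on a uniform prior.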

No download Website Updated 05 Oct 2013 Apache Solr

Pop 163.58
Vit 12.31

Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g. Word and PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
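To show what using the HTTP API from another language can look like, the sketch below builds a query URL for Solr's `select` handler with JSON output. The host, port, core name (`docs`), field names, and the helper `build_select_url` are assumptions made for the example, and the exact URL path depends on how a given Solr installation is configured.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10,
                     facet_field=None, highlight=False):
    """Build a URL for Solr's select handler that asks for JSON output."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Faceted search: request counts per value of the given field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight:
        # Hit highlighting.
        params.append(("hl", "true"))
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical server, core, and field names.
url = build_select_url(
    "http://localhost:8983", "docs", "title:lucene", facet_field="category"
)
```

The resulting URL can be fetched with any HTTP client, and the response body parsed as JSON.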

No download No website Updated 21 Feb 2005 Zoe Intertwingle

Pop 158.29
Vit 7.12

Zoe is a Web-based email client with a built-in SMTP and POP3 server and Google-like search functionality that lives on your desktop. It is written in Java and uses Lucene technology to provide instant searching and threading of your email messages.

Download Website Updated 06 Apr 2010 Glossword

Pop 137.06
Vit 7.18

Glossword is a system to publish dictionaries, glossaries, and encyclopedias. It features an installation wizard, support for multiple languages, visual themes, multi-domain installation, an administrative interface with multi-user support, built-in search and cache engines, the ability to export/import dictionaries in XML format, and W3C-validated code. Glossword is useful for any sort of dictionary-like content, including sites with game cheat codes, online translators, references, and various kinds of CMS solutions.

Download Website Updated 22 Oct 2007 Diogenes

Pop 110.93
Vit 4.69

Diogenes is a tool for searching and browsing the Latin and ancient Greek texts published on CD-ROM by the Packard Humanities Institute and the Thesaurus Linguae Graecae. It comes as an easy-to-install stand-alone application for GNU/Linux, Mac OS X, and Windows, based on the Firefox browser (i.e. Xulrunner). Alternatively, it can be installed by a network administrator as a server on a local network, and users then access it via an ordinary Web browser. There is also a command-line tool which can optionally format output as LaTeX instead of HTML.

Download Website Updated 29 Dec 2007 GNU Talk Filters

Pop 109.61
Vit 4.78

The GNU Talk Filters are filter programs that convert ordinary English text into text that mimics a stereotyped or otherwise humorous dialect. Some of these filters have been in the public domain for many years, but here they are provided as a single integrated package. The filters include austro, b1ff, brooklyn, chef, cockney, drawl, dubya, fudd, funetak, jethro, jive, kraut, pansy, pirate, postmodern, redneck, valspeak, and warez. This package provides the filters both as individual executables and collectively as a C library, so they can be easily embedded in other programs.

Download No website Updated 06 May 2014 yawl

Pop 106.16
Vit 3.50

This is a comprehensive "word game" word list for UNIX/Linux. It is a superset of the author's ENABLE list, the "OSW", and various lists researched by the author's colleague, Alan Beale. At 264,093 words, it is the largest list of its kind, suitable for use in all manner of crossword-type board games and word construction games, as well as for a spell checker dictionary. The YAWL package now includes two anagramming utilities (supplied as source code and built with the included Makefile). There is also a shell script that extends the UNIX "strings" system command. This is the word list package recommended for the author's Quackey word game.

Download Website Updated 30 Jan 2001 Ciao Prolog

Pop 104.23
Vit 1.00

Ciao is a complete Prolog system subsuming ISO-Prolog with a novel modular design which allows both restricting and extending the language. Ciao extensions currently include feature terms (records), higher-order, functions, constraints, objects, persistent predicates, a good base for distributed execution (agents), and concurrency. Libraries also support WWW programming, sockets, and external interfaces (C, Java, TCL/Tk, relational databases, etc.). An Emacs-based environment, a stand-alone compiler, and a toplevel shell are also provided.

Download Website Updated 22 Jan 2004 Pythoñol

Pop 93.21
Vit 2.11

Pythoñol is an all-in-one program that helps English speakers learn Spanish. It features pronunciation, verb conjugation, a dictionary with over 70,000 words, a thesaurus, quizzes, full-text translation, idioms, a verb browser, and a large reference section.

Download Website Updated 18 Feb 2009 Unicode Utilities

Pop 89.97
Vit 5.83

The Unicode Utilities are a set of programs for manipulating and analyzing Unicode text. uniname prints any combination of the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings. unifuzz tests other programs' Unicode handling.
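The kind of per-character information described above can be approximated with Python's standard library. The sketch below is not the utilities' own code. The functions `describe`, `histogram`, and `is_valid_utf8` are hypothetical names that loosely mirror what uniname, unihist, and ExplicateUTF8 are described as reporting.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return one row per character: character offset, byte offset,
    code point, UTF-8 bytes in hex, the character, and its Unicode name."""
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((
            char_offset,
            byte_offset,
            f"U+{ord(ch):04X}",
            encoded.hex(),
            ch,
            unicodedata.name(ch, "<no name>"),
        ))
        byte_offset += len(encoded)
    return rows


def histogram(text):
    """Count how often each character occurs in the text."""
    return Counter(text)


def is_valid_utf8(data):
    """Report whether a byte sequence decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


rows = describe("a\u00f1")
```

For the two-character string above, the second row starts at character offset 1 and byte offset 1 and covers two bytes, because U+00F1 takes two bytes in UTF-8.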
