194 projects tagged "Linguistic"

Verbiste
Updated 20 May 2013 | Popularity 360.16 | Vitality 208.04

Verbiste is a French conjugation system implemented as a C++ library, a GNOME applet, and two command-line tools. It can conjugate verbs and analyze conjugated verbs to determine their mood, tense, and person. Its knowledge base contains over 6,700 verbs.
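Both directions of the task can be illustrated for the simplest case. The following is a minimal sketch, not Verbiste's API or algorithm: it assumes regular first-group (-er) verbs in the present indicative only, and the function names and the `known_verbs` set (a stand-in for a real knowledge base) are hypothetical. Conjugation appends endings to the stem; analysis strips each candidate ending and keeps only infinitives the knowledge base recognizes.

```python
from __future__ import annotations

# Present-indicative endings for regular first-group (-er) French verbs.
PRESENT_ER_ENDINGS = {
    "je": "e",
    "tu": "es",
    "il/elle": "e",
    "nous": "ons",
    "vous": "ez",
    "ils/elles": "ent",
}


def conjugate_present_er(infinitive: str) -> dict[str, str]:
    """Conjugate a regular -er verb in the present indicative (stem + ending)."""
    if not infinitive.endswith("er"):
        raise ValueError("this sketch only handles regular -er verbs")
    stem = infinitive[:-2]
    return {person: stem + ending for person, ending in PRESENT_ER_ENDINGS.items()}


def analyze_present_er(form: str, known_verbs: set[str]) -> list[tuple[str, str]]:
    """Guess (infinitive, person) pairs for a present-indicative -er form.

    Each matching ending is stripped and "er" is appended; a candidate is kept
    only if that infinitive appears in `known_verbs` (a hypothetical stand-in
    for a real verb knowledge base).
    """
    candidates = []
    for person, ending in PRESENT_ER_ENDINGS.items():
        if form.endswith(ending) and len(form) > len(ending):
            infinitive = form[: -len(ending)] + "er"
            if infinitive in known_verbs:
                candidates.append((infinitive, person))
    return candidates
```

For example, `conjugate_present_er("parler")` yields parle, parles, parle, parlons, parlez, parlent, and `analyze_present_er("parle", {"parler"})` returns both `("parler", "je")` and `("parler", "il/elle")`, showing why analysis can be ambiguous. Spelling-change verbs (manger gives mangeons), elision (j'aime), and irregular verbs are deliberately out of scope here.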

ANTLR
Updated 11 Feb 2013 | Popularity 306.66 | Vitality 6.82

ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing C++, Java, or Sather actions. It is similar to the popular compiler generator YACC, but ANTLR is much more powerful and easier to use. ANTLR-produced parsers are highly efficient and are also human-readable and human-debuggable (especially with the interactive ParseView debugging tool). ANTLR can generate parsers, lexers, and tree parsers in C++, Java, or Sather. ANTLR itself is currently written in Java.

pyPEG
Updated 03 May 2013 | Popularity 280.40 | Vitality 52.18

pyPEG is a quick and easy way to create a parser in a Python program. pyPEG expresses a PEG (parsing expression grammar) as Python data structures, so grammars can be built dynamically to parse nearly any context-free language. The output is a plain Python data structure called pyAST or, alternatively, XML.
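The idea of writing a PEG as ordinary Python data can be shown with a tiny interpreter. This is a sketch under stated assumptions, not pyPEG's actual API: the encoding below (strings as literals, compiled regexes as terminals, tuples as sequences, lists as ordered choice, functions as named rules) and the `parse`, `atom`, and `total` names are hypothetical. Ordered choice is the defining PEG feature: the first alternative that matches wins, with no backtracking into later alternatives once one succeeds.

```python
import re

# Hypothetical grammar encoding (not pyPEG's own):
#   str        -> literal text
#   re.Pattern -> regex terminal
#   tuple      -> sequence (all parts must match in order)
#   list       -> ordered choice (first alternative that matches wins)
#   function   -> named rule returning another grammar expression


def parse(expr, text, pos=0):
    """Return (tree, new_pos) on success, or None on failure."""
    if isinstance(expr, str):
        return (expr, pos + len(expr)) if text.startswith(expr, pos) else None
    if isinstance(expr, re.Pattern):
        m = expr.match(text, pos)
        return (m.group(), m.end()) if m else None
    if isinstance(expr, tuple):
        children = []
        for part in expr:
            result = parse(part, text, pos)
            if result is None:
                return None
            child, pos = result
            children.append(child)
        return children, pos
    if isinstance(expr, list):
        for alternative in expr:
            result = parse(alternative, text, pos)
            if result is not None:
                return result
        return None
    if callable(expr):
        result = parse(expr(), text, pos)
        if result is None:
            return None
        tree, pos = result
        return (expr.__name__, tree), pos  # label the subtree with the rule name
    raise TypeError(f"unsupported grammar expression: {expr!r}")


NUMBER = re.compile(r"\d+")


def atom():
    return [("(", total, ")"), NUMBER]


def total():
    # Right-recursive: a plain PEG interpreter like this loops on left recursion.
    return [(atom, "+", total), atom]
```

Parsing `"1+2"` with `parse(total, "1+2")` returns the plain nested structure `("total", [("atom", "1"), "+", ("total", ("atom", "2"))])` together with end position 3; a caller should check that the end position equals `len(text)` to confirm the whole input was consumed.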

Redet
Updated 12 Sep 2008 | Popularity 273.00 | Vitality 11.44

Redet is a tool for developing and executing regular expressions using any of more than 50 search programs, editors, and programming languages, intended both for developing regular expressions for use elsewhere and as a search tool in its own right. For each program in each locale, a palette showing the available constructs is provided. The properties of each program are determined by runtime tests, which guarantees that they will be correct for the program version and locale. Additional features include persistent history, extensive help, a variety of character entry tools, and the ability to change locale while running. Redet is highly configurable and fully supports Unicode.

TAMS Analyzer
Updated 12 Feb 2013 | Popularity 260.37 | Vitality 65.29

TAMS (Text Analysis Markup System) Analyzer is a system for qualitative or ethnographic coding and for data extraction and analysis.

translate word
Updated 03 May 2013 | Popularity 226.22 | Vitality 41.77

translate word is a command-line program that translates words into other languages. It uses built-in dictionaries and also queries the Google Translate and FreeTranslation online engines.

msort
Updated 11 Jan 2010 | Popularity 221.99 | Vitality 12.22

Msort sorts files in sophisticated ways. Records may be fixed size, newline-separated blocks, or terminated by any specified character. Key fields may be selected by position, tag, or character range. For each key, distinct exclusions, multigraphs, substitutions, and a sort order may be defined or locale collation rules used. Comparisons may be lexicographic, numeric, numeric string, hybrid, random, by string length, angle, domain name, date, time, month name, or ISO8601 timestamp. Keys may be reversed so as to generate reverse dictionaries. Optional keys are supported. Unicode is supported, including full case-folding. Msort itself has a somewhat complex command line interface, but may be driven by an optional GUI.
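Three of the ideas above (records ending in an arbitrary terminator, multiple keys with different comparison types, and reversed keys for reverse dictionaries) can be sketched with Python's `sorted` and tuple keys. This is an illustration only, not msort's command-line interface or implementation; the function names and the `name:count` record layout are hypothetical.

```python
from __future__ import annotations


def split_records(data: str, terminator: str) -> list[str]:
    """Split `data` into records that end with `terminator`, dropping empty pieces."""
    return [record for record in data.split(terminator) if record]


def reverse_dictionary(words: list[str]) -> list[str]:
    """Order words by their reversed spelling, so shared endings sort together."""
    return sorted(words, key=lambda word: word[::-1])


def sort_by_count_then_name(records: list[str], sep: str = ":") -> list[str]:
    """Primary key: second field compared numerically; secondary key: first field."""
    def key(record: str) -> tuple[int, str]:
        name, count = record.split(sep)
        return int(count), name  # int() gives numeric, not lexicographic, order
    return sorted(records, key=key)
```

With `data = "pear:3;apple:10;fig:3;"`, `split_records(data, ";")` gives three records, and `sort_by_count_then_name` orders them `fig:3`, `pear:3`, `apple:10`; a purely lexicographic sort would put `apple:10` first because `"1" < "3"`. `reverse_dictionary(["walking", "sing", "talked", "bring", "walked"])` groups the words by ending: talked, walked, walking, bring, sing.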

Apache Lucene
Updated 22 Jul 2012 | Popularity 221.80 | Vitality 15.53

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform ones.

International Components for Unicode (C/C++)
Updated 26 May 2011 | Popularity 217.66 | Vitality 14.46

ICU provides a Unicode implementation with functions for formatting numbers, dates, times, and currencies according to locale conventions, for transliteration, and for parsing text in those formats. It provides flexible patterns for formatting messages, where the pattern determines the order of the variable parts of the message and the format of each variable. These patterns can be stored in resource files for translation into different languages. More than 100 codepage converters are included for interacting with non-Unicode systems.
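Why a message pattern, rather than the calling code, should control argument order can be shown with Python's built-in `str.format`, whose indexed `{0}`/`{1}` placeholders may appear in any order. This is an analogy, not ICU's MessageFormat API; the `MESSAGES` catalog stands in for translated resource files, and both it and `format_message` are hypothetical.

```python
# Hypothetical per-locale catalog standing in for translated resource files.
MESSAGES = {
    "en": "File {0} is on disk {1}.",
    "de": "Auf Datenträger {1} befindet sich die Datei {0}.",
}


def format_message(locale: str, *args: str) -> str:
    """Look up the locale's pattern and substitute the arguments by index."""
    return MESSAGES[locale].format(*args)
```

The caller always passes the file name first and the disk second; `format_message("de", "report.txt", "C")` still produces "Auf Datenträger C befindet sich die Datei report.txt." because the German pattern places `{1}` before `{0}`.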

uni2ascii
Updated 15 May 2011 | Popularity 200.91 | Vitality 13.95

uni2ascii and ascii2uni convert in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted-Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. They can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, PostScript, Python, Scheme, and Tcl.
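Three of these ASCII representations are easy to sketch with the Python standard library: hexadecimal HTML numeric character references, Java-style `\uXXXX` escapes (where characters outside the Basic Multilingual Plane become UTF-16 surrogate pairs), and URI percent-encoding of UTF-8 bytes. This is an assumption-laden illustration, not uni2ascii's option set or output format; the function names are hypothetical, and only one decoder is shown to demonstrate the reverse direction.

```python
import re
from urllib.parse import quote


def to_html_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hex HTML numeric character reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


def from_html_ncr(text: str) -> str:
    """Decode hex numeric character references back to characters."""
    return re.sub(r"&#x([0-9A-Fa-f]+);", lambda m: chr(int(m.group(1), 16)), text)


def to_u_escape(text: str) -> str:
    """Java-style \\uXXXX escapes; non-BMP characters become surrogate pairs."""
    out = []
    for c in text:
        if ord(c) < 128:
            out.append(c)
            continue
        units = c.encode("utf-16-be")  # 2 bytes per UTF-16 code unit
        for i in range(0, len(units), 2):
            out.append(f"\\u{int.from_bytes(units[i:i + 2], 'big'):04X}")
    return "".join(out)


def to_uri(text: str) -> str:
    """Percent-encode the UTF-8 bytes of every character outside the unreserved set."""
    return quote(text, safe="")
```

For "café", these give `caf&#xE9;`, `caf\u00E9`, and `caf%C3%A9` respectively; the percent form encodes the two UTF-8 bytes of é, while the other two encode its single code point U+00E9.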
