192 projects tagged "Text Processing"

Download Website Updated 07 Apr 2014 Verbiste

Pop 336.12
Vit 135.35

Verbiste is a French conjugation system implemented as a C++ library, a GNOME applet, and two command-line tools. It can conjugate verbs and analyze conjugated verbs to determine their mode, tense, and person. The knowledge base contains over 6700 verbs.

Download Website Updated 10 Jan 2014 pyPEG

Pop 202.55
Vit 27.13

pyPEG is a quick and easy way to create a parser in Python programs. Grammars are written as a PEG (parsing expression grammar) expressed in Python data structures, so they can be built dynamically to parse nearly any context-free language. The output is a plain Python data structure called pyAST or, alternatively, XML.

Download Website Updated 21 Dec 2013 queXC

Pop 80.12
Vit 8.94

queXC is a Web-based data cleaning and coding/classification system that takes a data file (such as data collected from a questionnaire) and cleans the text input fields by correcting their spacing and spell checking them. It allows operators to code text fields to existing coding schemes, or to create a coding scheme on the fly. Multiple operators can code and clean simultaneously, and operators can be assigned to particular codes. The queXC system includes some coding schemes created from ABS (Australian Bureau of Statistics) data. It can be used as an open source replacement for NVivo in some situations.

No download Website Updated 05 Oct 2013 Apache Solr

Pop 164.38
Vit 13.51

Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g. Word and PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
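
As a rough illustration of the HTTP API mentioned above, the following Python sketch only builds a query URL for Solr's `select` handler and does not contact a server. The host, port, core name (`collection1`), and field names (`title`, `category`) are assumptions for illustration, and `build_solr_query_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, core, text, facet_field=None,
                         highlight_field=None, rows=10):
    """Build a URL for Solr's select handler (hypothetical helper).

    base_url and core are deployment-specific assumptions; the query
    parameters (q, wt, rows, hl, hl.fl, facet, facet.field) are standard
    Solr request parameters.
    """
    params = [("q", text), ("wt", "json"), ("rows", str(rows))]
    if highlight_field:
        # Ask Solr to highlight matches in the given field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_field:
        # Ask Solr to return facet counts for the given field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Assumed local installation and example field names.
url = build_solr_query_url(
    "http://localhost:8983/solr",
    "collection1",
    "title:lucene",
    facet_field="category",
    highlight_field="title",
)
print(url)
```

Fetching the resulting URL with any HTTP client would return the matching documents as JSON, which is why no Solr-specific client library is needed for simple queries.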

Download Website Updated 05 Oct 2013 Apache Lucene

Pop 257.79
Vit 21.40

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially one that must run cross-platform.

Download Website Updated 21 Mar 2013 JOrtho

Pop 69.91
Vit 8.41

JOrtho is a spell checker for Java. The library works with any JTextComponent from the Swing framework and checks as you type. The dictionary is based on the free Wiktionary.org and is available for multiple languages. JOrtho's features are the highlighting of potentially misspelled words, a context menu with suggestions for correct forms of the word, and a context menu with an option to change the checking language. At the moment, nine languages are available for spell checking: English, German, French, Spanish, Italian, Russian, Polish, Dutch, and Arabic.

No download Website Updated 02 Mar 2013 jsesh

Pop 55.93
Vit 7.98

JSesh is an editor for ancient Egyptian hieroglyphic texts. It can export the text into picture formats, such as WMF files for easy inclusion in word processors. JSesh can also be used as a library for other projects concerning ancient Egyptian.

No download Website Updated 12 Feb 2013 TAMS Analyzer

Pop 185.96
Vit 32.97

TAMS (Text Analysis Markup System) Analyzer is a qualitative or ethnographic coding and data extraction-analysis system.

Download Website Updated 11 Feb 2013 ANTLR

Pop 296.90
Vit 5.53

ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing C++, Java, or Sather actions. It is similar to the popular compiler generator YACC; however, ANTLR is much more powerful and easier to use. ANTLR-produced parsers are not only highly efficient but also human-readable and human-debuggable (especially with the interactive ParseView debugging tool). ANTLR can generate parsers, lexers, and tree parsers in C++, Java, or Sather. ANTLR itself is currently written in Java.

No download Website Updated 02 Aug 2012 Poliqarp

Pop 54.63
Vit 7.84

Poliqarp is a universal suite of utilities for processing large corpora. It includes a concordancer that works on binary corpora compiled for efficient searching and a corpus builder. It supports positional tagsets, ambiguities in the texts, and Unicode.
