Computational Linguistics Toolset

The Computational Linguistics Toolset is a collection of reusable tools for computational linguistics. It contains code for cleaning, splitting, refining, and sampling corpora (ICE, Penn, and a native format); for tagging them with the TnT tagger; for running permutation statistics on N-grams (useful for finding statistically significant syntactic differences between any two sets of tagged texts); and various examination tools. The tools themselves are well documented.
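The idea behind the permutation statistics can be illustrated with a generic two-sample permutation test on POS-tag N-grams. This is a minimal sketch under stated assumptions, not the toolset's own implementation: the test statistic (absolute difference in pooled relative frequency), the choice to shuffle whole texts between the two groups, and the function names `ngram_counts`, `relative_freqs`, and `permutation_test` are all hypothetical and are not taken from NgramPermutator or PermutationStatter.

```python
import random
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

NGram = Tuple[str, ...]


def ngram_counts(tags: Sequence[str], n: int) -> Counter:
    """Count the POS-tag n-grams occurring in one tagged text."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def relative_freqs(texts: List[Counter]) -> Dict[NGram, Fraction]:
    """Pool the n-gram counts of a group of texts into exact relative frequencies."""
    pooled: Counter = Counter()
    for counts in texts:
        pooled.update(counts)
    total = sum(pooled.values())
    return {g: Fraction(c, total) for g, c in pooled.items()} if total else {}


def permutation_test(group_a, group_b, n=2, n_perm=1000, seed=0) -> Dict[NGram, float]:
    """Return a two-sided permutation p-value for every tag n-gram (hypothetical helper).

    Statistic: absolute difference in pooled relative frequency between the groups.
    Group labels are shuffled at the level of whole texts, not individual n-grams.
    """
    rng = random.Random(seed)
    texts = [ngram_counts(t, n) for t in list(group_a) + list(group_b)]
    size_a = len(group_a)
    grams = set().union(*texts)

    def differences(order: List[int]) -> Dict[NGram, Fraction]:
        # The first size_a indices form group A, the rest form group B.
        fa = relative_freqs([texts[i] for i in order[:size_a]])
        fb = relative_freqs([texts[i] for i in order[size_a:]])
        return {g: abs(fa.get(g, 0) - fb.get(g, 0)) for g in grams}

    order = list(range(len(texts)))
    observed = differences(order)  # statistic for the real group assignment
    extreme: Counter = Counter()
    for _ in range(n_perm):
        rng.shuffle(order)
        for g, d in differences(order).items():
            if d >= observed[g]:
                extreme[g] += 1
    # Add-one correction so that no p-value is exactly zero.
    return {g: (extreme[g] + 1) / (n_perm + 1) for g in grams}
```

For example, `permutation_test(texts_a, texts_b, n=2)` returns a small p-value for a tag bigram that is much more frequent in one set of texts than in the other, and a p-value of 1.0 for a bigram whose relative frequency is identical in both. Shuffling whole texts rather than single N-grams is one common way to respect the fact that N-grams from the same text are not independent; exact `Fraction` arithmetic avoids spurious ties or misses caused by floating-point rounding.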

Recent releases

  •  22 Apr 2007 14:53

     Release Notes: A CorpusTagsetReducer tool was added to the corpus task-set for filtering out tags and tag types. RowChecker, TableScaler, and TableTurner tools were added to the examine-set for checking the alignment of tags and words and for manipulating tab-delimited output tables. Several smaller fixes and additions were made.

  •  30 Nov 2006 13:45

     Release Notes: Compression was made the default for NgramPermutator and PermutationStatter and was removed as an option. A bug in NgramPermutator's compression that had prevented data from being created since version 1.1.2 was fixed.

  •  10 Oct 2006 15:04

     Release Notes: Full support for the manual n-gram search function (-n option) was added to Tag Sample Finder.

  •  22 May 2006 15:08

     Release Notes: PermStatResultSelector was added, a tool that selects and sorts significant POS-tag n-grams by weight for each compared sub-corpus. The Goall script was restructured for permutation testing, and a few minor bugs were fixed.

  •  13 Dec 2005 18:22

     Release Notes: Tools for disambiguation were added. They allow semantic disambiguation to be done about ten times faster than was previously possible using the WordNet::Similarity package alone. Two extra corpus tools for preparing the ICE corpus for disambiguation were added.