Libtextcat is a library that implements the text classification technique described in Cavnar & Trenkle, "N-Gram-Based Text Categorization". It was developed primarily for language guessing, a task on which it is known to perform with near-perfect accuracy. Considerable effort went into making the implementation fast and efficient: the language guesser processes over 100 documents per second on a simple PC, which makes it practical for many uses.
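For readers unfamiliar with the technique, the following is a minimal Python sketch of the rank-order n-gram comparison described by Cavnar & Trenkle. It is not libtextcat's API or code: the function names (`ngram_profile`, `out_of_place`, `guess_language`), the profile size of 300, the whitespace tokenization, and the toy training sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n),
    ordered from most to least frequent."""
    counts = Counter()
    for token in text.lower().split():
        # Pad each token so word boundaries show up in the n-grams.
        padded = f"_{token}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the
    # ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles. N-grams missing
    from the language profile receive a fixed maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(r - rank[gram]) if gram in rank else max_penalty
        for r, gram in enumerate(doc_profile)
    )


def guess_language(text, models):
    """Return the name of the model whose profile is closest to the text."""
    doc = ngram_profile(text)
    return min(models, key=lambda name: out_of_place(doc, models[name]))


# Toy models built from one sentence each (illustrative only; real
# language models are trained on far more text).
ENGLISH_SAMPLE = "the quick brown fox jumps over the lazy dog"
DUTCH_SAMPLE = "de snelle bruine vos springt over de luie hond"
MODELS = {
    "english": ngram_profile(ENGLISH_SAMPLE),
    "dutch": ngram_profile(DUTCH_SAMPLE),
}
```

Calling `guess_language(some_text, MODELS)` returns the key of the closest profile. With models this small the guess is only reliable for text that closely resembles the training sentences.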
|Tags||Scientific/Engineering, Artificial Intelligence, Software Development, Libraries, Text Processing, Linguistic|
|Operating Systems||POSIX, Linux|
Release Notes: A long-overdue autoconfig script has been added.
Release Notes: The distribution now contains Gertjan van Noord's language models for the automatic recognition of over 70 languages. The makefiles were cleaned up to make them more portable.