TWSI is software that produces lexical substitutions in context for more than 1,000 frequent nouns in English text. It relies on a supervised word sense disambiguation system trained on sense-labeled occurrences of the target words. A separate classification model is trained for each word and used to decide which sense an unseen occurrence most likely belongs to. Each sense is associated with a list of substitutions, which are injected into the text as inline annotations.
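The pipeline described above (one classifier per target word, then substitutions attached inline) can be illustrated with a minimal sketch. This is not TWSI's actual model, data, or annotation format: the training examples, the sense labels such as `bank@finance`, the bracketed output syntax, and the simple word-overlap classifier are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sense-labeled training data: (target word, context sentence, sense label).
TRAINING = [
    ("bank", "she deposited money at the bank", "bank@finance"),
    ("bank", "the bank approved the loan", "bank@finance"),
    ("bank", "we sat on the bank of the river", "bank@river"),
    ("bank", "the river bank was muddy", "bank@river"),
]

# Hypothetical substitution lists associated with each sense.
SUBSTITUTIONS = {
    "bank@finance": ["financial institution", "lender"],
    "bank@river": ["shore", "riverside"],
}


def train(examples):
    """Build one model per target word: sense -> counts of context words."""
    models = defaultdict(lambda: defaultdict(Counter))
    for target, sentence, sense in examples:
        context = [w for w in sentence.lower().split() if w != target]
        models[target][sense].update(context)
    return models


def classify(models, target, tokens):
    """Pick the sense whose training contexts overlap most with the tokens."""
    context = [w for w in tokens if w != target]
    scores = {
        sense: sum(counts[w] for w in context)
        for sense, counts in models[target].items()
    }
    return max(scores, key=scores.get)


def annotate(models, text):
    """Append the substitutions of the predicted sense after each target word."""
    tokens = text.lower().split()
    out = []
    for tok in tokens:
        if tok in models:
            sense = classify(models, tok, tokens)
            out.append(f"{tok}[{' | '.join(SUBSTITUTIONS[sense])}]")
        else:
            out.append(tok)
    return " ".join(out)


models = train(TRAINING)
print(annotate(models, "he walked along the bank of the river"))
```

A real system would use richer features and a trained statistical classifier per word; the word-overlap score here only stands in for that step.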
The Language Detection Library for Java is a Java library that detects the natural language in which a text is written. This task is also known as "language identification", "language guessing", and "language recognition". The library achieves over 99% precision for more than 40 languages. The supported languages are Afrikaans, Arabic, Bulgarian, Bengali, Czech, German, Greek, English, Spanish, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Macedonian, Malayalam, Marathi, Nepali, Dutch, Punjabi, Polish, Portuguese, Romanian, Russian, Slovak, Somali, Albanian, Swedish, Swahili, Tamil, Telugu, Thai, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese, and Simplified/Traditional Chinese.
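To give a feel for the language identification task, here is a minimal sketch of one common approach: comparing character trigram profiles by cosine similarity. It does not use the library's API and is not claimed to be the library's algorithm. The three tiny sample texts, the `PROFILES` table, and the function names are assumptions made for illustration; a practical detector is trained on far larger corpora.

```python
import math
from collections import Counter

# Hypothetical, very small training samples; real profiles need much more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a simple "
          "english sentence with the usual words",
    "de": "der schnelle braune fuchs springt über den faulen hund und dies ist "
          "ein einfacher deutscher satz mit den üblichen wörtern",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et ceci "
          "est une phrase française simple avec les mots habituels",
}


def trigrams(text):
    """Count character trigrams of the lowercased, space-padded text."""
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * \
        math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


# One trigram profile per language, built from the sample text.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def detect(text):
    """Return the language code whose profile is most similar to the text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(query, PROFILES[lang]))


for sentence in ["this is the dog", "der hund ist faul", "le chien est paresseux"]:
    print(sentence, "->", detect(sentence))
```

With profiles this small, short or mixed-language inputs are easily misclassified, which is why production detectors rely on large training corpora and probabilistic scoring.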