Projects / Bitextor

Bitextor

Bitextor is an application whose objective is to generate translation memories using multilingual Web sites as a corpus source. It downloads all the HTML files in a Web site, it performs a preprocess to convert them to a coherent and suitable format and, finally, applies a set of heuristics (based mainly on HTML tag structure and text block length) to make pairs of files which are candidates to contain the same text in different languages. From these candidates, translation memories are generated in TMX format using the library LibTagAligner, which uses the HTML tags and the length of text chunks to perform the alignment.

Tags
Licenses
Operating Systems
Implementation
Screenshot

Project Spotlight

Bible-Discovery

Bible study and concordance software.

Screenshot

Project Spotlight

SchemaCrawler

A command line tool to output your database schema and data in diff-able form.