Sanzang is a compact, simple, cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is well suited to ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable: any user can write his or her own translation rules, which are stored in a plain text file and applied at runtime.
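To make the idea of plain-text rules applied at runtime concrete, here is a minimal Ruby sketch. The `source|target` line format, the `load_rules` and `apply_rules` names, and the greedy longest-match strategy are all assumptions for illustration; Sanzang's actual table format and matching code are not reproduced here.

```ruby
# Hypothetical rule format: one "source|target" pair per line.
# This is NOT Sanzang's real table syntax; it only illustrates rules
# that live in a plain text file and are applied at runtime.
RULES_TEXT = <<~TABLE
  三藏|Tripitaka
  三|three
  藏|store
  法師|Dharma master
TABLE

# Parse rule lines into a Hash of source term => target text.
def load_rules(text)
  text.each_line.each_with_object({}) do |line, rules|
    line = line.chomp
    next if line.empty?
    source, target = line.split("|", 2)
    rules[source] = target
  end
end

# Greedy longest-match substitution: at each position try the longest
# source term first; characters with no rule are passed through as-is.
def apply_rules(text, rules)
  max_len = rules.keys.map(&:length).max || 0
  tokens = []
  i = 0
  while i < text.length
    match = nil
    max_len.downto(1) do |len|
      candidate = text[i, len]
      if rules.key?(candidate)
        match = candidate
        break
      end
    end
    if match
      tokens << rules[match]
      i += match.length
    else
      tokens << text[i]
      i += 1
    end
  end
  tokens
end
```

With this table, `apply_rules("三藏法師", load_rules(RULES_TEXT)).join(" ")` gives `"Tripitaka Dharma master"`. Trying the longest term first is what keeps `三藏` from being split into "three store".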
|Tags|Command Line Tools, Machine Translation, Chinese character, CJK, Text Processing|
|Operating Systems|Linux, BSD, Unix, Mac OS X, Windows, POSIX|
You can now use a Web interface to Sanzang to try its translation method. This is called "Sanzang on the Web," and it is configured to use an early translation table for translating from the Chinese Buddhist canon (Taisho Tripitaka) into Hanyu Pinyin and English. Try it at: http://www.lapislazulitexts.com/szweb/
Release Notes: The vocabulary-building code was updated for more efficient term matching. The TextFormatter class was refactored into a Formatting module. Methods were added for merging translation tables into one another. The Sanzang module's "require" statements were consolidated into a central location.
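Merging translation tables can be pictured with plain Ruby hashes. In this hypothetical sketch a table maps a source term to an array of output columns (for example pinyin and English). `merge_tables` and `terms_longest_first` are illustrative names, not Sanzang's API, and the rule that the overlay wins on conflicts is an assumption.

```ruby
# Hypothetical table shape: source term => [pinyin, English].
# Merge two tables into a new Hash; when both define the same term,
# the block makes the conflict rule explicit: the overlay's row wins.
# Neither input Hash is modified.
def merge_tables(base, overlay)
  base.merge(overlay) { |_term, _base_row, overlay_row| overlay_row }
end

# List terms longest-first (ties broken by string order), the order a
# longest-match scanner would want to try them in.
def terms_longest_first(table)
  table.keys.sort_by { |term| [-term.length, term] }
end

base = {
  "三" => ["sān", "three"],
  "藏" => ["zàng", "store"]
}
overlay = {
  "三藏" => ["sān zàng", "Tripitaka"],
  "藏" => ["zàng", "treasury"]
}
merged = merge_tables(base, overlay)
```

Because `Hash#merge` returns a new hash, the base table stays unchanged and can still be used on its own.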
Release Notes: This release makes the translation table initialization code faster, cleaner, and simpler. It adds an RDoc option that sets the documentation encoding to UTF-8 for RDoc 3.x, so the documentation builds properly (including when installed as a gem). The example and test translation tables were also adjusted to avoid leading spaces and other deprecated table formatting.
Release Notes: Horizontal space formatting has been updated so that spaces are never added to the end of a line, and the horizontal spacing code has been made more robust. A transcoding bug in Sanzang::Translator#translate_io was also fixed; it was triggered when using Sanzang's internals as a library, calling the method with file paths as arguments, and using an encoding other than UTF-8.
Release Notes: This is a minor release that adds a new feature while maintaining backward compatibility. The Sanzang translation method now automatically handles horizontal spacing between translated terms, so translation tables no longer need to include extra spacing as part of their format.
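The effect of automatic spacing can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not Sanzang's formatter: translated output is modeled as an array of terms per line, `space_terms` and `format_lines` are hypothetical names, and punctuation gets no special treatment here.

```ruby
# Join the translated terms of one line with single spaces. Each term is
# stripped and empty terms are dropped, so spacing baked into a table entry
# is ignored and no space is ever left at the end of the line.
def space_terms(terms)
  terms.map(&:strip).reject(&:empty?).join(" ")
end

# Apply the spacing rule line by line, keeping the original line breaks.
def format_lines(lines_of_terms)
  lines_of_terms.map { |terms| space_terms(terms) }.join("\n")
end
```

Since spacing is decided at output time, a table entry such as `"Tripitaka "` with a stray trailing space produces the same result as `"Tripitaka"`.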
Release Notes: This is a bugfix release that primarily resolves issues with internal transcoding between UTF-8 and other encodings. Additionally, because JRuby's encoding support is limited, Sanzang on JRuby now uses UTF-8 by default.
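The transcoding pattern these notes describe can be sketched with Ruby's built-in `String#encode`. The method names `translate_in_encoding` and `default_encoding` are hypothetical, and routing all processing through UTF-8 is an assumption about the design rather than a description of Sanzang's internals.

```ruby
# Decode raw input bytes in the caller's encoding, run the translation
# step (the block) on a UTF-8 string, then encode the result back.
def translate_in_encoding(bytes, encoding)
  enc = Encoding.find(encoding)
  utf8_text = bytes.dup.force_encoding(enc).encode(Encoding::UTF_8)
  result = yield(utf8_text)
  result.encode(enc)
end

# Hypothetical default: UTF-8 on JRuby, otherwise Ruby's external encoding.
def default_encoding
  RUBY_ENGINE == "jruby" ? Encoding::UTF_8 : Encoding.default_external
end
```

For example, GB18030 input can be passed as raw bytes with a block such as `{ |t| t.sub("三藏", "Tripitaka") }`, and the result comes back GB18030-encoded. With its default options, `String#encode` raises an error on invalid or unmappable characters rather than silently substituting them.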