
Sanzang

Sanzang is a compact, simple, cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is well suited to ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Users can develop their own translation rules, which are stored in a plain text file and applied at runtime.
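As a rough illustration of rules kept in a text file and applied at runtime, the Ruby sketch below parses a small rule table and substitutes the longest matching source term at each position. It is not Sanzang's implementation: the "source|target" rule format and the names parse_rules and apply_rules are assumptions made only for this example.

```ruby
# Illustrative sketch only; this is not Sanzang's code or table format.
# Assumed (hypothetical) rule format: one "source|target" pair per line.

def parse_rules(text)
  rules = {}
  text.each_line do |line|
    source, target = line.chomp.split("|", 2)
    # Skip blank or malformed lines.
    next if source.nil? || source.empty? || target.nil?
    rules[source] = target
  end
  rules
end

def apply_rules(text, rules)
  # Put longer source terms first so that a multi-character term is
  # preferred over any shorter term it contains.
  pattern = Regexp.union(rules.keys.sort_by { |source| -source.length })
  text.each_line.map do |line|
    # Pad each translated term with spaces, then collapse runs of spaces
    # and trim the line so no space is left at its start or end.
    line.chomp.gsub(pattern) { |match| " #{rules[match]} " }.squeeze(" ").strip
  end.join("\n")
end

rules = parse_rules("如是|thus\n我聞|I have heard\n我|I\n")
puts apply_rules("如是我聞", rules)   # prints "thus I have heard"
```

In practice the rule text would be read from a file (for example with File.read) instead of an inline string. Sorting by length matters here: without it, the single-character rule for 我 could match before the two-character rule for 我聞.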

Last announcement

Sanzang on the Web 08 Nov 2013 09:40

You can now use a Web interface to Sanzang to try its translation method. This is called "Sanzang on the Web," and it is configured to use an early translation table for translating from the Chinese Buddhist canon (Taisho Tripitaka) into Hanyu Pinyin and English. Try it at: http://www.lapislazulitexts.com/szweb/

Recent releases

  •  04 Mar 2014 02:01

    Release Notes: The vocabulary-building code was updated for more efficient term matching. The TextFormatter class was refactored into a Formatting module. Methods were added for merging translation tables into one another. The Sanzang module's "require" statements were consolidated into a central location.

  •  13 Feb 2014 10:41

    Release Notes: This release cleans up the translation table initialization code, making it faster and simpler. It adds an RDoc option that sets the documentation encoding to UTF-8 for RDoc 3.x, so that the documentation builds properly (including when installed as a gem). It also adjusts the example and test translation tables so that they no longer use leading spaces and other deprecated table formatting.

  •  29 Jan 2014 07:20

    Release Notes: Horizontal space formatting has been updated so that spaces are never added to the end of a line. The horizontal spacing code has also been made more robust. A transcoding bug in Sanzang::Translator#translate_io was also fixed. It was triggered when using Sanzang internals as a library, calling the method with file paths as arguments, and using an encoding other than UTF-8.

  •  27 Jan 2014 09:19

    Release Notes: This is a minor release containing a new feature while maintaining backward compatibility. The Sanzang translation method has been updated to automatically handle horizontal spacing between translated terms. This means that translation tables no longer need to include extra spacing as part of their format.

  •  29 Dec 2013 13:28

    Release Notes: This is a bugfix release that primarily resolves issues with internal transcoding between UTF-8 and other encodings. Additionally, because JRuby encoding support is limited, Sanzang on JRuby now uses UTF-8 by default.
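The 04 Mar 2014 release notes mention methods for merging translation tables into one another. The release notes do not describe how conflicts are resolved, so the Ruby sketch below only shows one plausible behaviour: tables are modelled as plain Hashes of source term to translation, and entries from later tables override earlier ones. The name merge_tables and the Hash representation are assumptions for illustration, not Sanzang's API.

```ruby
# Hypothetical sketch; Sanzang's own table class and merge methods may differ.
# Each table is modelled as a Hash mapping a source term to its translation.

def merge_tables(base, *others)
  # Hash#merge returns a new Hash, so neither input table is modified.
  # When a source term appears in several tables, the later table wins.
  others.reduce(base) { |merged, table| merged.merge(table) }
end

general = { "佛" => "Buddha", "法" => "dharma" }
custom  = { "法" => "Dharma", "僧" => "Sangha" }

merged = merge_tables(general, custom)
puts merged["法"]    # prints "Dharma"
puts merged.size     # prints 3
```

Letting the later table win makes it possible to layer a small, text-specific table over a larger general one without editing either file.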
