The Cainteoir Engine is a library for reading and recording different document formats (ePub, HTML, MHT, RTF, email, and others) to various audio output formats (such as PulseAudio, WAV, and Ogg/Vorbis). It also provides the following command-line tools: cainteoir, a front-end to the Cainteoir text-to-speech library; metadata, which extracts metadata from documents to RDF tuples; and tagcloud, which generates tag clouds and tag cloud data.
| Tags | epub, HTML, Text to speech, RDF, Metadata, espeak, Speech, Sound/Audio, MHT or EML, RTF |
|---|---|
| Licenses | GPLv3 |
| Operating Systems | Linux |
| Implementation | C++, Python |
| Translations | English |
Recent releases


Release Notes: This release supports espeak using installed mbrola voices, translates language and region names using the iso-codes package, implements the BCP 47 standard for interpreting language, script, and region tags, recognizes h1..h6 as table of contents entries in (X)HTML, supports newsgroup information for email, supports property attributes on an empty property element in RDF/XML, and fixes detection of HTML that is valid XML but is not marked as XHTML.
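To illustrate what interpreting language, script, and region tags involves, here is a minimal Python sketch. It is not the Cainteoir implementation: the names `LanguageTag` and `parse_language_tag` are hypothetical, and the pattern covers only the `language[-script][-region]` prefix of a BCP 47 tag, ignoring extlang, variant, extension, and private-use subtags.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern (an assumption for illustration): language[-script][-region].
# Any further subtags are accepted but ignored.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-.*)?$"
)


class LanguageTag(NamedTuple):
    """Hypothetical container for the three subtags handled here."""
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_language_tag(tag: str) -> Optional[LanguageTag]:
    """Split a tag into language, script, and region, or return None if it does not match."""
    match = _TAG.match(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    # Conventional casing: lowercase language, Titlecase script, UPPERCASE region.
    return LanguageTag(
        match.group("language").lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )
```

Because tags are case-insensitive, the sketch normalizes casing, so `parse_language_tag("zh-hant-tw")` and `parse_language_tag("zh-Hant-TW")` produce the same value.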


Release Notes: This version releases the file handle when finished recording Ogg files. It has improved HTML parsing. It supports epub 3 @refines and @datatype metadata. It handles malformed entities in OPF (epub package) documents.


Release Notes: The shared-mime-info database is now used for MIME type detection. iconv is used for character encoding conversions. All Content-Transfer-Encoding types in MIME headers are supported, including Base64. Audio error handling during reading/recording was improved. Additional language code mappings for new espeak voices are supported. UND is now supported as a language identifier for Calibre eBooks. XML-encoded HTML without an associated xmlns is now supported.
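As a rough illustration of what handling Content-Transfer-Encoding and character encodings means for an email document, the following Python sketch undoes the transfer encoding and then decodes the bytes using the declared charset. It uses Python's standard `email` package rather than the Cainteoir API; the helper name `decoded_text` and the sample message are assumptions made for the example.

```python
from email import message_from_string

# Sample single-part message (made up for this example) with a Base64 body.
RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8sIHdvcmxkIQ==\n"
)


def decoded_text(raw: str) -> str:
    """Return the body of a single-part message as text."""
    message = message_from_string(raw)
    # decode=True reverses the Content-Transfer-Encoding (base64,
    # quoted-printable, ...) and returns the raw bytes of the body.
    payload = message.get_payload(decode=True)
    # Fall back to UTF-8 when the message declares no charset (an assumption).
    charset = message.get_content_charset() or "utf-8"
    return payload.decode(charset)


print(decoded_text(RAW))
```

The same helper handles a quoted-printable body without changes, since the transfer encoding is read from the message headers.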


Release Notes: This release supports the epub 2.0 table of contents and epub 3.0 metadata, and adds basic support for SSML. The cainteoir tool can now read or record a range of sections in a document's table of contents, and allows the voice reading speed, pitch, and volume to be set on the command line.


Release Notes: Single-file HTML pages (MHTML, MHT) are supported. RTF documents are supported. The heuristics for estimating the total reading time for documents were improved. Support for HTML documents was improved. The program no longer crashes when opening an epub with a missing OPF file.