SILVERCODERS DocToText is a powerful utility that converts documents in many formats to plain text. It includes a console application and a C/C++ library, which allows embedding text extraction mechanisms into other applications. It supports MS Office binary formats (MS Word (DOC), MS Excel (XLS, XLSB), MS PowerPoint (PPT), and Rich Text Format (RTF)), OpenDocument formats (text documents (ODT), spreadsheets (ODS), presentations (ODP), and graphics (ODG)), Office Open XML formats (MS Word (DOCX), MS Excel (XLSX), and MS PowerPoint (PPTX)), iWork formats (PAGES, NUMBERS, KEYNOTE), OpenDocument Flat XML formats (FODP, FODS, FODT), Portable Document Format (PDF), email files (EML), and HyperText Markup Language (HTML). DocToText can extract text not only from the document body but also from annotations (comments) embedded in ODT, DOC, DOCX, or RTF files, and it can read metadata such as the author, last modification date, or number of pages. It can be used as a fast console viewer, and it is able to convert corrupted OpenDocument and Office Open XML documents, so it can recover text even when other recovery methods have failed.
SILVERCODERS SqlSync gives you the ability to compare data stored in two SQL databases and synchronize it. The results contain a list of records that differ, records that are missing, and records that are additional. It can generate SQL queries to make the second database identical to the first, and can execute them automatically. It can synchronize selected objects in the databases or whole databases. You can control the process using command line arguments, and the results are both readable and easy to use for further processing with shell scripts. A C language library allows comparing databases from other applications. PostgreSQL 8.1.x, MySQL 5.0.x, FirebirdSQL 1.5.x, Microsoft SQL Server 2000, Microsoft SQL Server 2005, and Oracle 10g are supported. The two databases being compared can be driven by two different SQL servers.
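The comparison described above (records that differ, are missing, or are additional, followed by generated SQL that makes the second database match the first) can be illustrated with a minimal Python sketch. This is not SqlSync's code or API: the function names, the `items` table, and the use of SQLite through Python's standard `sqlite3` module are all assumptions chosen so the example is self-contained.

```python
import sqlite3

def fetch_rows(conn, table, key):
    """Return the column names and a {key_value: row_tuple} map for one table."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    k = cols.index(key)
    return cols, {row[k]: row for row in cur}

def compare(first, second, table, key):
    """Classify rows of `second` relative to `first` (hypothetical helper)."""
    cols, a = fetch_rows(first, table, key)
    _, b = fetch_rows(second, table, key)
    missing = sorted(k for k in a if k not in b)       # in first, absent from second
    additional = sorted(k for k in b if k not in a)    # only in second
    differing = sorted(k for k in a if k in b and a[k] != b[k])
    return cols, a, missing, additional, differing

def sync_statements(table, key, cols, a, missing, additional, differing):
    """Build (sql, params) pairs that make the second table match the first."""
    stmts = []
    placeholders = ", ".join("?" for _ in cols)
    for k in missing:
        stmts.append((f"INSERT INTO {table} ({', '.join(cols)}) "
                      f"VALUES ({placeholders})", a[k]))
    for k in differing:
        assignments = ", ".join(f"{c} = ?" for c in cols)
        stmts.append((f"UPDATE {table} SET {assignments} WHERE {key} = ?",
                      a[k] + (k,)))
    for k in additional:
        stmts.append((f"DELETE FROM {table} WHERE {key} = ?", (k,)))
    return stmts

def make_db(rows):
    """Create an in-memory database with a small illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn

first = make_db([(1, "apple"), (2, "pear"), (3, "plum")])
second = make_db([(1, "apple"), (2, "peach"), (4, "fig")])

cols, a, missing, additional, differing = compare(first, second, "items", "id")
for sql, params in sync_statements("items", "id", cols, a,
                                   missing, additional, differing):
    second.execute(sql, params)  # apply the generated statements
```

The sketch loads both tables into memory and keys rows by a single primary key column, which is only practical for small tables; table and column names are interpolated directly into the SQL, so they must come from a trusted source.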
SILVERCODERS OCR Server is a server-based optical character recognition (OCR) and PDF conversion solution for enterprises. It converts printed documents to editable and searchable formats like plain text, RTF, PDF, and HTML, providing highly accurate recognition in 189 languages. It is available as a Linux application or a stand-alone machine, with a fully documented API, very good performance, and flexible licensing rules. It has been designed specifically to work with document management systems such as SILVERCODERS DocStorage.
SILVERCODERS DocStorage is a utility to improve document management. You can have one database for all invoices, guarantees, protocols, and other documents. DocStorage can extract plain text from documents in DOC, XLS, PPT, PDF, RTF, ODT, ODS, ODP, DOCX, XLSX, PPTX, and many other formats. It can use an OCR engine to extract plain text even from scanned documents. It can perform a global full-text search across all documents regardless of format. It supports document versioning, duplicate detection, document notes, and document signing. It provides full integration with software suites like Microsoft Office and OpenOffice.
Re: prior art
> Thanks for both the utility and
> description update then; will have a
> look :-)
There is one more thing: catdoc has not been actively developed since 2005. DocToText was started in 2006 and will gain new features, such as PDF support. You can consider it a future replacement.
We could have tried to add something to catdoc, but we started a new project because of licensing issues (we need to use doctotext in our commercial software).
Re: prior art
> ...is it much better than catdoc(1)? :)
It supports more formats (OpenDocument, Office Open XML) and, as far as I know, it handles some problematic DOC documents better.