Algraeph is a tool for manual alignment of linguistic graphs, such as phrase structure trees or dependency structures, where each node corresponds to a subsequence of the analyzed input sentence. It allows you to express the similarity between two graphs by aligning their nodes and attaching relation labels to these alignments. Graphs are read from one or more graphbanks (or treebanks) in the GraphML or Alpino formats. Alignment relations are user-defined and are stored in a simple XML format, which can be used for further processing. The resulting parallel graph corpus is a useful data set for many tasks in computational linguistics and natural language processing.
XAware is a solution for building real-time data integrations and data services. It uses an Eclipse-based designer and a run-time engine implemented with the Spring Framework. XAware has extensive built-in support for database transactions, messaging systems, structured and unstructured text, XML schemas, and more.
SILVERCODERS DocToText is a powerful utility that converts documents in many formats to plain text. It includes a console application and a C/C++ library that lets other applications embed its text extraction mechanisms. It supports MS Office binary formats (MS Word (DOC), MS Excel (XLS, XLSB), MS PowerPoint (PPT), and Rich Text Format (RTF)), OpenDocument formats (text documents (ODT), spreadsheets (ODS), presentations (ODP), and graphics (ODG)), Office Open XML formats (MS Word (DOCX), MS Excel (XLSX), and MS PowerPoint (PPTX)), iWork formats (PAGES, NUMBERS, KEYNOTE), OpenDocument Flat XML formats (FODP, FODS, FODT), Portable Document Format (PDF), email files (EML), and HyperText Markup Language (HTML). DocToText can extract text not only from the document body but also from annotations (comments) embedded in ODT, DOC, DOCX, or RTF files, and it can read metadata such as the author, last modification date, or number of pages. It also works as a fast console viewer and can convert corrupted OpenDocument and Office Open XML documents, recovering text even when other recovery methods have failed.
Boxes is a text filter that can draw any kind of box around its input text. Box designs range from simple boxes to complex ASCII art. A box can also be removed and repaired, even if it has been badly damaged by edits to the text inside it. Because the generated boxes may be open on any side, the program can also be used to create regional comments in any programming language. New box designs of all sorts can easily be added and shared by appending them to a free-format configuration file. In addition to being a command-line tool, Boxes integrates well with any text editor that supports filters.