RSS 231 projects tagged "Text Processing"

Download Website Updated 20 Apr 2014 Asymptote

Screenshot
Pop 760.10
Vit 344.02

Asymptote is a powerful descriptive 2D and 3D vector graphics language for technical drawing, inspired by MetaPost but with an improved C++-like syntax. It provides for figures the same high-quality level of typesetting that LaTeX does for scientific text. Asymptote is a programming language as opposed to just a graphics program. It can exploit the best features of script (command-driven) and graphical user interface (GUI) methods. High-level graphics commands are implemented in the language itself, allowing them to be easily tailored to specific applications.

Download Website Updated 09 Apr 2014 Highlight

Screenshot
Pop 927.49
Vit 184.49

Highlight is a universal converter from source code to HTML, XHTML, RTF, TeX, LaTeX, SVG, BBCode, and terminal escape sequences. (X)HTML and SVG output are formatted by Cascading Style Sheets. It supports more than 170 programming languages, and includes 80 highlighting color themes. The configuration files are Lua scripts with plug-in support. The converter includes some features to provide a consistent layout of the output code.

Download Website Updated 02 Apr 2014 poppler

Screenshot
Pop 692.18
Vit 99.81

Poppler is a PDF rendering library derived from xpdf. It has been enhanced to utilize modern libraries, and new features have been added. It also provides basic command line utilities.

Download Website Updated 01 Apr 2014 gjots

Screenshot
Pop 468.03
Vit 79.99

gjots lets you organize text notes in a convenient, hierarchical way. It can be used for notes, jottings, bits and pieces, recipes, and even PINs and passwords, using encryption. It can also be used to "mind-map" larger compositions like manuals, Web pages, articles, etc. It is a bit like the KDE program "kjots", but uses the GTK library and supports a hierarchy of folders. Files can be output to HTML with an automatic table of contents or to docbook XML. Encryption is supported with ccrypt(1), gpg(1), and openssl(1), so that musings can be kept private.

Download Website Updated 04 Mar 2014 Sanzang

Screenshot
Pop 312.55
Vit 9.83

Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is very suitable for working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop his or her own translation rules, and these rules are simply stored in a text file and applied at runtime.

Download Website Updated 23 Feb 2014 loook

Screenshot
Pop 312.51
Vit 18.18

Loook searches for text strings in OpenOffice.org (and StarOffice 6.0 or later) files. AND, OR, and phrase searches are supported. It doesn't create an index, but searching should be fast enough, unless you have very many files.

No download Website Updated 07 Jan 2014 SILVERCODERS DocToText

Screenshot
Pop 178.59
Vit 15.33

SILVERCODERS DocToText is a powerful utility which can convert documents in many formats to plain text. It includes a console application and C/C++ library, which allows embedding text extraction mechanisms into other applications. It supports MS Office binary formats (MS Word (DOC), MS Excel (XLS, XLSB), MS PowerPoint (PPT), and Rich Text Format (RTF)), OpenDocument formats (text documents (ODT), spreadsheets (ODS), presentations (ODP) and graphics (ODG)), Office Open XML formats (MS Word (DOCX), MS Excel (XLSX), and MS PowerPoint (PPTX)), iWork formats (PAGES, NUMBERS, KEYNOTE), OpenDocument Flat XML formats (FODP, FODS, FODT), Portable Document Format (PDF), Email files (EML), and HyperText Markup Language (HTML). DocToText can extract text not only from the document body but also from annotations (comments) embedded in odt, doc, docx, or rtf files and read metadata like author, last modification date, or number of pages. It can be used as a fast console viewer, and is able to convert corrupted OpenDocument and Office Open XML documents. It can be used to recover text even if other recovery methods failed.

Download Website Updated 06 Jan 2014 HTMLDOC

Screenshot
Pop 743.95
Vit 38.77

HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in a batch document processing environment, as a Web-based report generation application, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as Mac OS X and Windows 2000 and higher.

Download Website Updated 05 Jan 2014 Mini-XML

Screenshot
Pop 252.59
Vit 26.90

Mini-XML is a small XML parsing library that you can use to read XML and XML-like data files in your application without requiring large non-standard libraries. It only requires an ANSI C compatible compiler (GCC works, as do most vendors' ANSI C compilers) and a "make" program. It supports reading of UTF-8 and UTF-16 and writing of UTF-8 encoded XML strings and files, and provides a hierarchical view of the file via a linked-list tree structure of typed nodes and functions for managing, traversing, indexing, and searching the tree.

Download Website Updated 23 Dec 2013 GNU libextractor

Screenshot
Pop 479.90
Vit 50.15

libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files. It includes a shell-command and bindings for Java (JNI) and Python.

Screenshot

Project Spotlight

libdvbpsi

A library designed for MPEG TS and DVB PSI tables decoding and generation.

Screenshot

Project Spotlight

WavePacket (C++ version)

A library to solve the Schroedinger equation numerically.