2131 projects tagged "Text Processing"

GNU awk (updated 27 Jan 2014; popularity 516.82, vitality 19.62)

The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs with just a few lines of code.
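Here is an example of the kind of job meant. Printing the first and third colon-separated fields of each record, which is what `awk -F: '{ print $1, $3 }'` does, can be sketched in Python. This is a hypothetical stand-in written for this listing, not part of GNU awk. The helper name `select_fields` is made up. It mimics only awk's splitting on a single-character field separator and its default single-space output separator.

```python
def select_fields(lines, sep, indices):
    """Rough Python analogue of awk -F<sep> '{ print $i, $j, ... }'.

    indices are awk-style field numbers: 0 is the whole record and
    1 is the first field. Fields past the end of a record print as "".
    """
    out = []
    for line in lines:
        record = line.rstrip("\n")
        fields = record.split(sep)
        picked = []
        for i in indices:
            if i == 0:
                picked.append(record)          # $0 is the whole record
            elif i <= len(fields):
                picked.append(fields[i - 1])   # $1 is the first field
            else:
                picked.append("")              # missing fields are empty
        out.append(" ".join(picked))           # awk's default OFS is one space
    return out


if __name__ == "__main__":
    records = [
        "root:x:0:0:root:/root:/bin/sh",
        "daemon:x:1:1:daemon:/usr/sbin:/bin/false",
    ]
    for row in select_fields(records, ":", [1, 3]):
        print(row)  # prints "root 0", then "daemon 1"
```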

OGDL (updated 23 Jan 2014; popularity 71.75, vitality 10.31)

OGDL (Ordered Graph Data Language) is a structured textual format for representing graphs of information. Its grammar is very simple, allowing for very compact parsers. It is a readable substitute for XML in data applications (such as configuration files).

jcpp (updated 14 Jan 2014; popularity 100.63, vitality 13.03)

JCPP is a complete, compliant, standalone, pure Java implementation of the C preprocessor. It is intended for people writing C-style compilers in Java with tools such as SableCC, ANTLR, JLex, and CUP. It has successfully preprocessed much of the source code of the GNU C library.

GNU m4 (updated 12 Jan 2014; popularity 630.73, vitality 18.10)

GNU m4 is an implementation of the traditional Unix macro processor. It is mostly SVR4-compatible, although it has some extensions (for example, handling more than nine positional parameters to macros). GNU m4 also has built-in functions for including files, running shell commands, doing arithmetic, and more. Autoconf needs GNU m4 to generate configure scripts, but not to run them.
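The following rough Python sketch shows what positional parameters in a macro body mean. It performs argument substitution and accepts multi-digit references. It is a toy built on assumptions, not GNU m4's implementation. The `$N` spelling for parameters past the ninth is assumed here, so check the GNU m4 manual for the exact syntax. Quoting, rescanning of the expanded text, and other special parameters such as `$#` are left out. The function name `expand_body` is hypothetical.

```python
import re

# Assumed syntax: "$" followed by one or more digits, e.g. $1, $9, $12.
_PARAM = re.compile(r"\$(\d+)")


def expand_body(name, body, args):
    """Substitute positional parameters into a macro body (toy version).

    $0 expands to the macro name; $N (N >= 1) expands to the N-th
    argument, or to the empty string when fewer than N were supplied.
    """
    def repl(match):
        n = int(match.group(1))  # greedy \d+ reads "$10" as parameter 10
        if n == 0:
            return name
        return args[n - 1] if n <= len(args) else ""

    return _PARAM.sub(repl, body)


if __name__ == "__main__":
    ten_args = [str(i) for i in range(1, 11)]
    print(expand_body("pick", "[$10|$2|$11]", ten_args))  # [10|2|]
```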

pyPEG (updated 10 Jan 2014; popularity 183.30, vitality 21.41)

pyPEG is a quick and easy way to create a parser in Python programs. It expresses a PEG (parsing expression grammar) as Python data structures, so it can be used dynamically to parse nearly any context-free language. The output is a plain Python data structure called pyAST or, alternatively, XML.
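A tiny PEG interpreter can sketch the idea of writing a grammar as plain Python data. This is not pyPEG's API. The conventions below are assumptions made for illustration: a `str` is a literal, a compiled regex is a terminal, a tuple is a sequence, a list is an ordered choice, the hypothetical `Many` class is zero-or-more repetition, and a function is a named rule. Functions are called lazily so that rules can refer to each other recursively. As in any PEG, the first alternative that matches wins.

```python
import re


class Many:
    """Zero-or-more repetition (PEG e*): greedy, never backtracks."""
    def __init__(self, expr):
        self.expr = expr


def parse(expr, text, pos=0):
    """Match expr at text[pos:]; return (value, new_pos) or None."""
    if isinstance(expr, str):                       # literal terminal
        return (expr, pos + len(expr)) if text.startswith(expr, pos) else None
    if isinstance(expr, re.Pattern):                # regex terminal
        m = expr.match(text, pos)
        return (m.group(), m.end()) if m else None
    if isinstance(expr, tuple):                     # sequence e1 e2 ...
        values = []
        for sub in expr:
            result = parse(sub, text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    if isinstance(expr, list):                      # ordered choice e1 / e2 / ...
        for alt in expr:
            result = parse(alt, text, pos)
            if result is not None:
                return result
        return None
    if isinstance(expr, Many):
        values = []
        while True:
            result = parse(expr.expr, text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                return values, pos
            value, pos = result
            values.append(value)
    if callable(expr):                              # named rule, tagged with its name
        result = parse(expr(), text, pos)
        if result is None:
            return None
        return (expr.__name__, result[0]), result[1]
    raise TypeError(f"unsupported grammar element: {expr!r}")


# Example grammar: sums of integers with parentheses.
NUMBER = re.compile(r"\d+")


def term():
    return [NUMBER, ("(", sum_expr, ")")]


def sum_expr():
    return (term, Many(("+", term)))


if __name__ == "__main__":
    text = "1+(2+3)"
    value, end = parse(sum_expr, text)
    print(end == len(text))  # True: the whole input matched
    print(value)
```

`parse` returns a `(value, position)` pair or `None`. Comparing the position with `len(text)` tells you whether the whole input matched. A left-recursive rule such as `sum_expr -> sum_expr "+" term` would recurse without end in this naive interpreter, which is why the example grammar uses repetition instead.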

SILVERCODERS DocToText (updated 07 Jan 2014; popularity 160.17, vitality 12.64)

SILVERCODERS DocToText is a utility that converts documents in many formats to plain text. It includes a console application and a C/C++ library that allows text extraction to be embedded in other applications. Supported formats are MS Office binary formats (MS Word DOC, MS Excel XLS and XLSB, MS PowerPoint PPT); Rich Text Format (RTF); OpenDocument formats (text documents ODT, spreadsheets ODS, presentations ODP, and graphics ODG); Office Open XML formats (MS Word DOCX, MS Excel XLSX, MS PowerPoint PPTX); iWork formats (PAGES, NUMBERS, KEYNOTE); OpenDocument Flat XML formats (FODP, FODS, FODT); Portable Document Format (PDF); email files (EML); and HyperText Markup Language (HTML).

DocToText can extract text from the document body and also from annotations (comments) embedded in ODT, DOC, DOCX, and RTF files. It can also read metadata such as author, last modification date, and number of pages. It can be used as a fast console viewer and is able to convert corrupted OpenDocument and Office Open XML documents. It can recover text even when other recovery methods have failed.

HTMLDOC (updated 06 Jan 2014; popularity 705.28, vitality 31.83)

HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in batch document-processing environments, as a Web-based report generator, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as on Mac OS X and Windows 2000 and later.

Mini-XML (updated 05 Jan 2014; popularity 225.06, vitality 22.18)

Mini-XML is a small XML parsing library that you can use to read XML and XML-like data files in your application without requiring large non-standard libraries. It requires only an ANSI C-compatible compiler (GCC works, as do most vendors' ANSI C compilers) and a make program. It reads UTF-8- and UTF-16-encoded XML and writes UTF-8-encoded XML strings and files. It also provides a hierarchical view of the file as a linked-list tree of typed nodes, with functions for managing, traversing, indexing, and searching the tree.
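The following hypothetical Python sketch illustrates a linked-list tree of typed nodes. Each node links to its parent, its first and last child, and its next sibling. A depth-first walk and a search by element name are included. The class and function names are invented for illustration and do not reflect Mini-XML's actual C API.

```python
from enum import Enum, auto


class NodeType(Enum):
    ELEMENT = auto()
    TEXT = auto()


class Node:
    """Typed tree node linked to its parent, first/last child and next sibling."""
    def __init__(self, node_type, value, parent=None):
        self.type = node_type
        self.value = value        # element name or text content
        self.parent = parent
        self.first_child = None
        self.last_child = None
        self.next = None          # next sibling
        if parent is not None:
            parent._append(self)

    def _append(self, child):
        # Append in O(1) by keeping a pointer to the last child.
        if self.last_child is None:
            self.first_child = child
        else:
            self.last_child.next = child
        self.last_child = child


def walk(node):
    """Yield node and all its descendants in document (depth-first) order."""
    yield node
    child = node.first_child
    while child is not None:
        yield from walk(child)
        child = child.next


def find_element(root, name):
    """Return the first element named `name` in document order, or None."""
    for node in walk(root):
        if node.type is NodeType.ELEMENT and node.value == name:
            return node
    return None


if __name__ == "__main__":
    root = Node(NodeType.ELEMENT, "config")
    server = Node(NodeType.ELEMENT, "server", root)
    Node(NodeType.TEXT, "example.org", server)
    Node(NodeType.ELEMENT, "port", root)
    print([n.value for n in walk(root)])  # ['config', 'server', 'example.org', 'port']
```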

GNU libextractor (updated 23 Dec 2013; popularity 464.30, vitality 42.07)

libextractor is a library for extracting metadata from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction and to be trivially extensible by linking against external extractors for additional file types. Its goal is to give developers of file-sharing networks, file managers, and WWW-indexing bots a universal library for obtaining metadata about files. It includes a shell command and bindings for Java (JNI) and Python.

RSS-Planet (updated 21 Dec 2013; popularity 69.09, vitality 2.82)

RSS-Planet is a script which fetches headlines from various news Web sites via RSS feeds and then plots the story titles on a world map using xplanet.