402 projects tagged "Text Processing"

LibAxl (updated 06 Jun 2013)

Pop 225.25
Vit 85.06

LibAxl is an efficient implementation of the XML 1.0 standard. It has no external library dependencies and offers a clean implementation based on opaque types, with a consistent API for manipulating XML documents. It is memory efficient and thread safe, with a small footprint (111k). It also supports XML Namespaces.

John the Ripper (updated 30 May 2013)

Pop 1,241.28
Vit 106.75

John the Ripper is a fast password cracker, currently available for many flavors of Unix, Windows, DOS, BeOS, and OpenVMS. Its primary purpose is to detect weak Unix passwords. It supports several crypt(3) password hash types commonly found on Unix systems, as well as Windows LM hashes. In addition, many other hashes and ciphers are supported in the community-enhanced version (-jumbo), and some in John the Ripper Pro.

PCRE (updated 28 May 2013)

Pop 883.91
Vit 105.25

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5, with just a few differences. PCRE is used by many programs, including Exim, Postfix, and PHP.

TXR (updated 17 May 2013)

Pop 290.69
Vit 47.14

TXR is a new data munging language to replace the likes of awk and Perl. TXR's special pattern language provides template-based matching of entire documents or large sections of documents. It also contains a language for functional and imperative programming. It is written in C and takes the form of a utility that is portable to Unix-like platforms and Windows.

Template Data Interface (TDI) (updated 12 May 2013)

Pop 61.81
Vit 4.04

Template Data Interface (TDI, /ʹtedɪ/) is a markup templating system written in Python with (optional but recommended) speedup code written in C. Unlike most templating systems, TDI does not invent its own language to provide functionality. Instead, you simply mark the nodes you want to manipulate within the template document. The template is parsed, and the marked nodes are presented to your Python code, where they can be modified in any way you want.

ChkTeX (updated 19 Apr 2013)

Pop 201.87
Vit 24.02

ChkTeX finds syntax and typographical errors in LaTeX text.

tpl (updated 13 Mar 2013)

Pop 87.00
Vit 3.46

Tpl makes it easy to serialize your C data using just a handful of API functions. The data is stored in its native binary form for maximum efficiency. C, Perl, and XML are supported. Data is portable across CPU types and OSs from Unix to Mac to Windows.

SILVERCODERS DocToText (updated 08 Mar 2013)

Pop 162.73
Vit 14.02

SILVERCODERS DocToText is a utility that converts documents in many formats to plain text. It includes a console application and a C/C++ library, which allows text extraction to be embedded in other applications. It supports MS Office binary formats (MS Word (DOC), MS Excel (XLS), MS PowerPoint (PPT), and Rich Text Format (RTF)), OpenDocument formats (text documents (ODT), spreadsheets (ODS), and presentations (ODP)), Office Open XML formats (MS Word (DOCX), MS Excel (XLSX), and MS PowerPoint (PPTX)), and HyperText Markup Language (HTML). DocToText can extract text not only from the document body but also from annotations (comments) embedded in ODT, DOC, DOCX, or RTF files, and it can read metadata such as the author, last modification date, or number of pages. It can be used as a fast console viewer, and it can convert corrupted OpenDocument and Office Open XML documents, so it may recover text even when other recovery methods have failed.

pngslice (updated 28 Feb 2013)

Pop 82.37
Vit 3.31

The pngslice utility creates ragged images in HTML using Eric Meyer's "ragged float" technique. The idea is to slice the image into thin strips that are trimmed and stacked vertically. The program produces the trimmed PNG slices and a fragment of HTML to include them.
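
The slicing idea described for pngslice can be sketched roughly as follows. This is only an illustration of the general technique, not pngslice's own code: the function names `slice_ragged` and `strips_to_html`, the 0/1 opacity grid standing in for a real PNG, and the `slice-N.png` file names are all assumptions made for the example.

```python
def slice_ragged(alpha, strip_height):
    """Split an opacity grid into horizontal strips and trim each one.

    alpha is a list of rows, each a list of 0/1 values (1 = opaque pixel).
    Returns one (top, height, width) tuple per strip, where width reaches
    just past the rightmost opaque pixel found in that strip.
    """
    strips = []
    for top in range(0, len(alpha), strip_height):
        rows = alpha[top:top + strip_height]
        width = 0
        for row in rows:
            # Scan from the right edge to find the last opaque pixel.
            for x in range(len(row) - 1, -1, -1):
                if row[x]:
                    width = max(width, x + 1)
                    break
        strips.append((top, len(rows), width))
    return strips


def strips_to_html(strips, prefix="slice"):
    """Emit one floated <img> per strip so the strips stack vertically.

    The file naming scheme (prefix-N.png) is hypothetical.
    """
    lines = []
    for i, (_top, height, width) in enumerate(strips):
        lines.append(
            f'<img src="{prefix}-{i}.png" width="{width}" height="{height}" '
            f'style="float:left;clear:left" alt="">'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    grid = [
        [1, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    print(strips_to_html(slice_ragged(grid, 2)))
```

Because each strip is only as wide as its own content, text flowing beside the floated stack follows the outline of the image instead of its bounding box. Thinner strips follow the outline more closely at the cost of more image files.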

GNU texinfo (updated 18 Feb 2013)

Pop 445.59
Vit 18.00

Texinfo is a documentation system that uses a single source to produce both online information (Info, HTML, XML, DocBook) and printed output (DVI, PDF).

Project Spotlight

meetmint

A meeting minutes tool for Web browsers.

gd

A library used to create PNGs, JPEGs, and other images.