2131 projects tagged "Text Processing"

Download Website Updated 07 Oct 2012 pdf2djvu

Pop 169.10
Vit 14.10

pdf2djvu creates DjVu files from PDF files. It can extract graphics, the text layer, hyperlinks, the document outline (bookmarks), and metadata.

No download Website Updated 03 Oct 2012 XML parser class

Pop 105.88
Vit 7.31

XML parser class is a PHP class that parses arbitrary XML input and builds an array with the structure of all tag and data elements. Optionally, it can keep track of the position of each element to locate elements that may be contextually in error. It supports a parsed file cache to minimize the overhead of parsing the same file repeatedly. It also offers optimized parsing of simplified XML (SML) formats, in which tag attributes are ignored.
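
The idea of collecting tag and data elements together with their positions can be sketched in Python. This is only an illustration built on the standard library's expat bindings, not the PHP class's actual API, which this listing does not show; the function name `parse_with_positions` and the dictionary keys are hypothetical.

```python
import xml.parsers.expat


def parse_with_positions(xml_text):
    """Return a flat list of tag and data elements with source line numbers.

    Hypothetical helper for illustration; the PHP class described above
    may structure its array differently.
    """
    elements = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        # Record each opening tag and the line on which it starts.
        elements.append({
            "type": "tag",
            "name": name,
            "attributes": attrs,
            "line": parser.CurrentLineNumber,
        })

    def on_data(data):
        # Skip whitespace-only chunks between tags.
        if data.strip():
            elements.append({
                "type": "data",
                "text": data.strip(),
                "line": parser.CurrentLineNumber,
            })

    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = on_data
    parser.Parse(xml_text, True)
    return elements
```

With the line numbers recorded, a caller that finds an unexpected element can report where it occurred in the input.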

Download Website Updated 24 Sep 2012 Serene

Pop 34.81
Vit 4.01

Serene is a validation engine that implements the JAXP 1.3 Validation Framework API for RELAX NG, using an algorithm designed to produce good messages and to handle ambiguity and conflicts clearly. It also implements the JAXP Validation Framework API for ISO Schematron and supports Schematron markup embedded in RELAX NG schemas.

No download Website Updated 21 Sep 2012 SILVERCODERS DocStorage

Pop 34.64
Vit 3.92

SILVERCODERS DocStorage is a utility to improve document management. You can have one database for all invoices, guarantees, protocols, and other documents. DocStorage can extract plain text from documents in DOC, XLS, PPT, PDF, RTF, ODT, ODS, ODP, DOCX, XLSX, PPTX, and many other formats. It can use an OCR engine to extract plain text even from scanned documents. It can perform a global full-text search across all documents regardless of format. It supports document versioning, duplicate detection, document notes, and document signing. It provides full integration with software suites like Microsoft Office and OpenOffice.

Download Website Updated 15 Sep 2012 LilyPond

Pop 264.45
Vit 19.02

LilyPond is a music typesetter. It produces beautiful sheet music using a text file as input. LilyPond is part of the GNU Project.

Download Website Updated 11 Sep 2012 Universal Office Converter

Pop 171.71
Vit 4.90

unoconv (Universal Office Converter) converts between any document formats that LibreOffice understands. It uses LibreOffice's UNO bindings for non-interactive conversion of documents. Supported document formats include Open Document Format (.odt), MS Word (.doc), MS Office Open/MS OOXML (.xml), Portable Document Format (.pdf), HTML, XHTML, RTF, Docbook (.xml), and more. Image, spreadsheet, and presentation formats are also supported.

No download Website Updated 06 Sep 2012 VTD-XML

Pop 93.72
Vit 9.79

Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of efficient XML processing technologies centered around a non-extractive XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as an XML parser, a native XML indexer, a file format that uses binary data to enhance the text XML, an incremental XML content modifier, an XML slicer/splitter/assembler, or an XML editor/eraser.

Download Website Updated 05 Sep 2012 aephea

Pop 24.29
Vit 2.23

Aephea is a text-based authoring tool for HTML. It enforces well-formedness with a simpler and stricter TeX-like syntax and provides useful extensions and abstractions with facilities for adding new ones. It emphasizes a single unified approach that stays close to HTML itself and promotes and utilizes CSS extensively. Abstractions such as dictionary stacks, arithmetic, and iteration are part of Aephea.

Download Website Updated 04 Sep 2012 ePiX

Pop 193.52
Vit 20.44

ePiX creates mathematically accurate, publication-quality figures, plots, and animations. The input syntax is easy to learn, and the output is expressly designed for use with LaTeX. Complete documentation and dozens of sample files are included.

Download Website Updated 30 Aug 2012 urlwatch

Pop 163.75
Vit 7.46

urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification includes the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same folder. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cron job.
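
The combination of a filter hook and a unified diff can be sketched as follows. This is a minimal illustration using Python's standard `difflib`, not urlwatch's own code; the names `strip_volatile` and `describe_change`, and the "Generated at" line it filters out, are assumptions made for the example.

```python
import difflib


def strip_volatile(text):
    """Hypothetical filter hook: drop lines that change on every fetch."""
    return "\n".join(
        line for line in text.splitlines()
        if not line.startswith("Generated at")
    )


def describe_change(url, old_text, new_text, filter_hook=strip_volatile):
    """Return a notification body, or None when nothing relevant changed."""
    old_lines = filter_hook(old_text).splitlines()
    new_lines = filter_hook(new_text).splitlines()
    if old_lines == new_lines:
        return None
    # Build a unified diff between the stored and the freshly fetched page.
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile="previous", tofile="current", lineterm="",
    )
    return "CHANGED: " + url + "\n" + "\n".join(diff)
```

A scheduled job could call `describe_change` with the saved state and the new download, and send an email only when the result is not `None`.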
