2000 projects tagged "Text Processing"

GNU Parallel (updated 22 May 2013)

Pop 587.16
Vit 188.56

GNU parallel is a shell tool for executing jobs in parallel locally or using remote computers. A job is typically a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. If you use xargs today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. If you use ppss or pexec you will find GNU parallel will often make the command easier to read. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
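The idea of running one job per input line while keeping the output as a sequential loop would produce it can be illustrated with a minimal Python sketch. This is not how GNU parallel is implemented; the function name `run_for_each` and its `jobs` parameter are hypothetical and chosen only for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_for_each(command, inputs, jobs=4):
    """Run `command + [item]` once per input item, several at a time.

    Returns each job's captured stdout in input order, so the combined
    result matches what a sequential loop would have produced.
    (Hypothetical helper, not part of GNU parallel.)
    """
    def run_one(item):
        done = subprocess.run(
            command + [item], capture_output=True, text=True, check=True
        )
        return done.stdout

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map yields results in the order of `inputs`,
        # regardless of which job happens to finish first.
        return list(pool.map(run_one, inputs))


if __name__ == "__main__":
    # Upper-case each input with a tiny child Python process.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
    print("".join(run_for_each(child, ["a", "b", "c"])), end="")
```

Capturing each job's output separately and joining it afterwards is what keeps lines from different jobs from interleaving, which is the property that makes the result safe to pipe into another program.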

Verbiste (updated 20 May 2013)

Pop 360.16
Vit 208.04

Verbiste is a French conjugation system implemented as a C++ library, a GNOME applet, and two command-line tools. It can conjugate verbs and analyze conjugated verbs to determine their mode, tense, and person. The knowledge base contains over 6700 verbs.

TCPDF (updated 19 May 2013)

Pop 1,880.13
Vit 627.40

TCPDF is a PHP class for generating PDF documents without requiring external extensions. TCPDF supports all ISO page formats and custom page formats, custom margins and units of measure, UTF-8 Unicode, RTL languages, HTML, barcodes, TrueTypeUnicode, TrueType, OpenType, Type1, and CID-0 fonts, images, graphic functions, clipping, bookmarks, JavaScript, forms, page compression, digital signatures, and encryption.

TXR (updated 17 May 2013)

Pop 271.64
Vit 101.13

TXR is a new data-munging language intended to replace the likes of awk and Perl. TXR's special pattern language provides template-based matching of entire documents or large sections of documents. It also contains a language for functional and imperative programming. It is written in C and takes the form of a utility that is portable to Unix-like platforms and Windows.

Recoll (updated 14 May 2013)

Pop 328.28
Vit 107.53

Recoll is a personal full-text desktop search tool based on Xapian. It provides an easy-to-use, feature-rich Qt-based GUI that is simple to administer. Text, HTML, PDF, PostScript, MS Word, OpenOffice, WordPerfect, KWord, AbiWord, maildir, and mailbox mail folder formats are supported, along with their compressed versions and quite a few others. Powerful query facilities are provided. Multiple character sets are supported, and internal processing and storage use Unicode UTF-8. Stemming is performed at query time, and the stemming language can be switched after indexing.

Template Data Interface (TDI) (updated 12 May 2013, no download listed)

Pop 55.50
Vit 6.83

Template Data Interface (TDI, /ʹtedɪ/) is a markup templating system written in Python with (optional but recommended) speedup code written in C. Unlike most templating systems, TDI does not invent its own language to provide functionality. Instead, you simply mark the nodes you want to manipulate within the template document. The template is parsed, and the marked nodes are presented to your Python code, where they can be modified in any way you want.
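The "mark nodes, then modify them in Python" approach can be sketched with the standard library alone. The sketch below does not use TDI or its real API: the `mark` attribute name, the `render` function, and the handler dictionary are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

MARK = "mark"  # hypothetical attribute name, not TDI's actual syntax


def render(template, handlers):
    """Parse `template`, hand each marked node to its handler, and
    return the resulting markup. The marker attribute is removed so it
    does not appear in the output."""
    root = ET.fromstring(template)
    for node in root.iter():
        name = node.attrib.pop(MARK, None)
        if name in handlers:
            handlers[name](node)  # plain Python code modifies the node
    return ET.tostring(root, encoding="unicode")


def set_title(node):
    node.text = "Hello"


if __name__ == "__main__":
    template = '<div><h1 mark="title">placeholder</h1><p>static</p></div>'
    print(render(template, {"title": set_title}))
```

The template stays a plain markup document with no embedded logic; all decisions live in ordinary Python functions, which is the design choice the description above emphasizes.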

Sanzang (updated 12 May 2013)

Pop 86.93
Vit 1.87

Sanzang is a compact and simple cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is very suitable for working with ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Any user can develop his or her own translation rules, and these rules are simply stored in a text file and applied at runtime.
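As a rough illustration of rules kept in a text file and applied at runtime, the following Python sketch performs table-based substitution. It is not Sanzang's code, and the `source|target` line format, the `#` comment convention, and the function names `load_rules` and `translate` are assumptions made for this example.

```python
import re


def load_rules(text):
    """Read rules from text with one hypothetical `source|target` pair
    per line; blank lines and lines starting with `#` are skipped."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split("|", 1)
        rules[source] = target
    return rules


def translate(text, rules):
    """Replace every known source term with its target term."""
    if not rules:
        return text
    # Try longer terms first so a multi-character term wins over
    # any shorter term it contains.
    terms = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return pattern.sub(lambda m: rules[m.group(0)] + " ", text).strip()


if __name__ == "__main__":
    table = "# demo rules\n三|three\n三藏|Tripitaka\n法師|Dharma master\n"
    print(translate("三藏法師", load_rules(table)))
```

Because the rules are plain data, a user can change the translation by editing the text file, with no change to the program.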

The Epeios XML preprocessor (updated 06 May 2013)

Pop 121.61
Vit 31.97

The 'expp' tool (the Epeios XML preprocessor) reads an XML file and transforms it into another XML file. It simplifies the writing of XML files by supporting macros, the definition and testing of variables, the inclusion of files, and more. This is done by writing predefined tags from a given namespace directly in the source XML file; the 'expp' tool then recognizes and handles these tags. The tool is also available as a Java native component.

FBReaderJ (updated 06 May 2013, no download listed)

Pop 1,810.34
Vit 77.24

FBReaderJ is an e-book reader for the Android platform. It is a Java clone of the FBReader book reader, written by the same authors. FBReaderJ supports several e-book formats: oeb, epub, and fb2. Direct reading from zip, tar, and gzip archives is supported.

pyPEG (updated 03 May 2013)

Pop 280.40
Vit 52.18

pyPEG is a quick and easy solution for creating a parser in Python programs. pyPEG expresses a PEG grammar as Python data structures, so it can be used dynamically to parse nearly every context-free language. The output is a plain Python data structure called pyAST, or, as an alternative, XML.
