39 projects tagged "Text Processing"

Ruty (updated 07 Mar 2007, no download listed)

Popularity 24.49, Vitality 47.91

Ruty is a Ruby templating engine inspired by the Django templating engine. It can handle any kind of text-based data.
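To show what "inspired by the Django templating engine" means in practice, here is a minimal Python sketch of Django-style variable tags (`{{ user.name|upper }}`): a dotted name is resolved against a context and then piped through named filters. It is not Ruty's implementation or API. The `render` and `lookup` functions and the small filter table are hypothetical, and real engines also handle block tags such as loops and conditionals.

```python
import html
import re

# Hypothetical filter table: Django-style templates pipe a value through
# named filters, e.g. {{ name|upper }}.
FILTERS = {
    "upper": str.upper,
    "lower": str.lower,
    "escape": html.escape,
}

# Matches {{ var }} or {{ var|filter1|filter2 }}, allowing inner whitespace.
VAR_TAG = re.compile(r"\{\{\s*([\w.]+)((?:\|\w+)*)\s*\}\}")


def lookup(context, dotted):
    """Resolve a dotted name such as 'user.name' against nested dicts."""
    value = context
    for part in dotted.split("."):
        value = value[part]
    return value


def render(template, context):
    """Replace every variable tag with its value, after applying its filters."""
    def substitute(match):
        value = str(lookup(context, match.group(1)))
        # group(2) looks like "|upper|escape"; drop the empty leading piece.
        for name in filter(None, match.group(2).split("|")):
            value = FILTERS[name](value)
        return value

    return VAR_TAG.sub(substitute, template)


print(render("Hello, {{ user.name|upper }}!", {"user": {"name": "ruty"}}))
# prints "Hello, RUTY!"
```

Because the template is treated as plain text, the same approach works for any text-based output (HTML, e-mail, config files).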

Winnow (updated 01 Jan 2011, no download or website listed)

Popularity 23.32, Vitality 30.04

Winnow efficiently trains and runs any number of distinct Bayesian (Naive Bayes) classifiers on large sets of content. It is fast and works well even with very small or unbalanced training sets. It has been used to power an innovative Web feed reader whose smart tags learn which content you want to see and find it across more sources than you could follow with a traditional feed reader. It works particularly well with Ruby and Ruby on Rails.
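The description says Winnow is built on Naive Bayes classifiers. The sketch below shows the textbook multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, which is what such a classifier computes at its core. It is not Winnow's code or API: the `NaiveBayes` class and its methods are hypothetical, tokenization is a naive whitespace split, and nothing here reproduces Winnow's handling of unbalanced training sets.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()  # number of training documents per label
        self.word_counts = {}        # label -> Counter of token occurrences
        self.vocab = set()           # every token seen in training

    def train(self, text, label):
        tokens = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.vocab.update(tokens)

    def score(self, text, label):
        """log P(label) + sum of log P(token | label), Laplace-smoothed."""
        total_docs = sum(self.doc_counts.values())
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        log_prob = math.log(self.doc_counts[label] / total_docs)
        for token in text.lower().split():
            log_prob += math.log((counts[token] + 1) / denom)
        return log_prob

    def classify(self, text):
        """Return the label with the highest log-probability."""
        return max(self.doc_counts, key=lambda label: self.score(text, label))


clf = NaiveBayes()
clf.train("ruby rails gem release", "ruby")
clf.train("new rails plugin for ruby", "ruby")
clf.train("football match score tonight", "sports")
print(clf.classify("rails gem"))  # prints "ruby"
```

Working in log space avoids floating-point underflow when many token probabilities are multiplied, and the +1 smoothing keeps an unseen token from zeroing out a whole class.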

Xapian and Omega (updated 28 Jun 2012)

Popularity 310.06, Vitality 21.32

Xapian is a search engine library that scales to collections containing hundreds of millions of documents. It is written in C++, with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that lets developers easily add advanced indexing and search facilities to their own applications. It supports the probabilistic information retrieval model as well as a rich set of boolean query operators.

Omega is a Web search application built on the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.) or data exported from arbitrary sources (e.g. SQL databases).
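To illustrate combining probabilistic ranking with boolean operators, here is a minimal Python sketch of Okapi BM25 scoring (a standard probabilistic weighting scheme) with a boolean AND filter applied first. It only demonstrates the idea. Xapian's default weighting is a BM25 variant, but its formula details and parameters differ from this sketch, and `bm25_search` is a hypothetical function, not part of Xapian's bindings. The `k1` and `b` values are the commonly quoted defaults, chosen here as assumptions.

```python
import math
from collections import Counter


def bm25_search(docs, query_terms, required=(), k1=1.2, b=0.75):
    """Rank docs by BM25 over query_terms, best first.

    Documents missing any term in `required` are dropped (boolean AND
    filter) before ranking.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    results = []
    for doc_id, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        if any(tf[term] == 0 for term in required):
            continue  # boolean filter: every required term must appear
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # The "+1" keeps idf positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes longer-than-average documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            results.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(results, reverse=True)]


docs = [
    "xapian is a search engine library",
    "omega is a web search application built on xapian",
    "ruby templating engine",
]
print(bm25_search(docs, ["search", "engine"]))                      # [0, 2, 1]
print(bm25_search(docs, ["search", "engine"], required=("xapian",)))  # [0, 1]
```

The short third document outranks the longer second one because both match one query term, and BM25's length normalization favors the shorter match.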

SiSU (updated 19 Jan 2009)

Popularity 177.17, Vitality 19.45

SiSU (Structured information, Serialized Units) is a lightweight-markup-based framework for structuring and publishing text, with granular search. From a plaintext file with minimal markup, it produces plain text, HTML, XHTML, XML, ODF, LaTeX, and PDF, and it populates an SQL database at the object/paragraph level for granular searches. You prepare documents in your text editor of choice, then run SiSU from the command line to generate the desired output formats.

QDBM: Quick DataBase Manager (updated 05 Mar 2007)

Popularity 195.16, Vitality 12.31

QDBM is an embedded database library compatible with GDBM and NDBM. It provides both a hash database and a B+ tree database. It was developed with GDBM as a reference, with three goals: higher processing speed, smaller database files, and a simpler API.
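Why offer both a hash database and a B+ tree database? A hash table answers exact-key lookups but keeps keys in no useful order, while a B+ tree keeps keys sorted, so it can also serve range and prefix scans. The Python sketch below shows that difference using a sorted key list maintained with `bisect` as a stand-in for a B+ tree's ordered leaves. It is not QDBM's API or on-disk structure; the `SortedIndex` class is hypothetical.

```python
import bisect


class SortedIndex:
    """Ordered key/value store.

    A sorted key list stands in for a B+ tree's ordered leaves; a plain
    dict (a hash table) holds the values for exact-key lookups.
    """

    def __init__(self):
        self._keys = []    # kept in sorted order for range scans
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Exact-key lookup: the only query a hash database handles well."""
        return self._values.get(key)

    def range(self, low, high):
        """Yield (key, value) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        for key in self._keys[start:end]:
            yield key, self._values[key]


idx = SortedIndex()
for word in ["grep", "glark", "gdbm", "qdbm", "ndbm"]:
    idx.put(word, len(word))
print(idx.get("qdbm"))             # 4
print(list(idx.range("g", "h")))   # [('gdbm', 4), ('glark', 5), ('grep', 4)]
```

With only a hash table, the prefix query for keys starting with "g" would require scanning every key.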

glark (updated 10 Feb 2007)

Popularity 126.33, Vitality 6.77

glark offers grep-like searching of text files with very powerful, complex regular expressions (e.g., "/foo\w+/ and /bar[^\d]*baz$/ within 4 lines of each other"). It highlights matches, displays context (preceding and succeeding lines), supports case-insensitive matching, and automatically excludes non-text files. It supports most options of GNU grep.
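The "within 4 lines of each other" example describes a proximity AND: two patterns must each match, on lines no more than N apart. The Python sketch below implements just that semantics. It is not glark's implementation or command-line syntax; `within_lines` is a hypothetical helper, and it uses Python's `re` dialect rather than Ruby's.

```python
import re


def within_lines(lines, pattern_a, pattern_b, distance):
    """Return (i, j) pairs where pattern_a matches line i, pattern_b
    matches line j, and the two lines are at most `distance` apart."""
    a = re.compile(pattern_a)
    b = re.compile(pattern_b)
    hits_a = [i for i, line in enumerate(lines) if a.search(line)]
    hits_b = [j for j, line in enumerate(lines) if b.search(line)]
    # Pair every hit of one pattern with every nearby hit of the other.
    return [(i, j) for i in hits_a for j in hits_b if abs(i - j) <= distance]


lines = [
    "foobar start",   # 0: matches foo\w+
    "middle",
    "bar then baz",   # 2: matches bar[^\d]*baz$ (2 lines from line 0)
    "x", "x", "x", "x", "x",
    "bar and baz",    # 8: matches too, but 8 lines from line 0
]
print(within_lines(lines, r"foo\w+", r"bar[^\d]*baz$", 4))  # [(0, 2)]
```

Pairing every hit with every other hit is quadratic; a real tool would slide a window over the file instead, but the result is the same.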

deplate (updated 08 Feb 2009)

Popularity 79.42, Vitality 6.14

deplate converts wiki-like markup to LaTeX (standard classes, koma, dramatist, sweave), HTML/PHP (single page, chunked/website, HTML, or s5-based slideshow), DocBook (article, book, man/ref page), and really plain text. Currently supported input formats are viki and Ruby's rdoc. The viki markup supports footnotes, citations, index, table of contents, embedded LaTeX for mathematics, integration with R for dynamically generated figures and tables, and more. Output can be customized via page templates.

xmltv2html (updated 22 Feb 2006)

Popularity 50.20, Vitality 5.11

xmltv2html is a script that transforms the XML output of XMLTV into HTML.
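As a rough illustration of the XMLTV-to-HTML transformation, the Python sketch below parses XMLTV-style `<channel>` and `<programme>` elements with the standard library's `xml.etree.ElementTree` and emits one HTML table row per programme. It is not xmltv2html's code or output format. The sample document and the `xmltv_to_html` function are hypothetical, and only the basic `start`/`channel` attributes and `title`/`display-name` children are handled.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample using only a few basic XMLTV elements.
SAMPLE = """<tv>
  <channel id="news.example"><display-name>News One</display-name></channel>
  <programme start="20060222180000 +0000" stop="20060222183000 +0000"
             channel="news.example">
    <title>Evening News</title>
  </programme>
</tv>"""


def xmltv_to_html(xml_text):
    """Render each <programme> as a row: start time, channel name, title."""
    root = ET.fromstring(xml_text)
    # Map channel ids to human-readable names, falling back to the id.
    names = {
        ch.get("id"): ch.findtext("display-name", default=ch.get("id"))
        for ch in root.iter("channel")
    }
    rows = []
    for prog in root.iter("programme"):
        start = prog.get("start", "")[:12]  # keep YYYYMMDDhhmm
        channel = names.get(prog.get("channel"), prog.get("channel", ""))
        title = prog.findtext("title", default="")
        # Escape text so characters like & and < stay valid HTML.
        cells = "".join(f"<td>{html.escape(v)}</td>" for v in (start, channel, title))
        rows.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(xmltv_to_html(SAMPLE))
```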

Syck (updated 23 Nov 2009)

Popularity 96.66, Vitality 5.07

Syck is a YAML parser library that is designed to load data into scripting languages. Extensions for Ruby, PHP, and Python are included.

ZenWeb (updated 17 Mar 2004)

Popularity 90.50, Vitality 4.79

ZenWeb is a system for building entire Web sites, not just pages. It lets you focus on the content and structure of the site, leaving page construction, markup, layout, and navigation as secondary concerns. It provides tools for complete Web site design and creation, simple paragraph-to-HTML generation with embellishments, and a rich set of tools for creating, modifying, and customizing pages and sites.
