14 projects tagged "Indexing"

Download Website Updated 30 Jan 2001 Sary

Pop 37.04
Vit 2.66

Sary is a suffix array library with accompanying tools. It provides fast full-text search for text files on the order of 10 to 100 MB using a data structure called a suffix array. It can also restrict searches to specific fields in a text file by assigning index points to those fields.
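The idea behind suffix array search can be sketched in a few lines of Python. This is only an illustration of the data structure, not Sary's actual API or implementation: the function names `build_suffix_array` and `find_all` and the `index_points` parameter are hypothetical, and the naive sort-based construction here would be far too slow for the 10 to 100 MB files Sary targets.

```python
from typing import Iterable, List, Optional


def build_suffix_array(text: str,
                       index_points: Optional[Iterable[int]] = None) -> List[int]:
    """Return suffix start positions sorted by the suffix they begin.

    index_points (hypothetical parameter) limits the array to chosen
    offsets, e.g. the starts of particular fields. By default every
    character offset is indexed.
    """
    positions = range(len(text)) if index_points is None else index_points
    # Naive construction: sort positions by the suffix text itself.
    return sorted(positions, key=lambda i: text[i:])


def find_all(text: str, suffix_array: List[int], pattern: str) -> List[int]:
    """Return all indexed offsets where pattern occurs, in ascending order."""
    m = len(pattern)

    # First binary search: first suffix whose m-character prefix >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: first suffix whose m-character prefix > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_all(text, sa, "ana"))
```

Because all suffixes that start with the pattern sit next to each other in the sorted array, two binary searches are enough to locate every match. Passing a smaller set of index points shrinks the array and limits matches to those offsets.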

Download Website Updated 16 Mar 2005 Radsearch

Pop 33.59
Vit 2.09

Radsearch is a text utility that retrieves records from a text file or a list of text files, given a keyword and a record delimiter. It was written specifically to allow quick retrieval of all login and logout records for a particular user in Radiusd log files.
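A minimal Python sketch of keyword-and-delimiter record retrieval follows. It does not reflect Radsearch's real options or implementation: the function name `find_records`, the blank-line delimiter, and the sample log contents are all assumptions made for illustration.

```python
from typing import List


def find_records(text: str, keyword: str, delimiter: str = "\n\n") -> List[str]:
    """Split text into records on delimiter and keep those containing keyword."""
    records = (record.strip() for record in text.split(delimiter))
    return [record for record in records if record and keyword in record]


# Hypothetical log excerpt: records are separated by blank lines.
sample_log = (
    "User-Name = alice\nAcct-Status-Type = Start\n\n"
    "User-Name = bob\nAcct-Status-Type = Start\n\n"
    "User-Name = alice\nAcct-Status-Type = Stop\n"
)

for record in find_records(sample_log, "alice"):
    print(record)
    print("--")
```

Treating the delimiter as a parameter lets the same routine handle single-line records (delimiter `"\n"`) and multi-line records alike.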

No download Website Updated 28 Nov 2005 X-Hive/DB

Pop 90.99
Vit 3.78

X-Hive/DB is a native XML database designed for software developers who require advanced XML data processing and storage functionality within their applications. Its Java API contains methods for storing, querying, retrieving, transforming, and publishing XML data. X-Hive/DB supports all major W3C standards, such as XQuery, XPath, DOM, XPointer, and XML Schema.

Download Website Updated 03 Nov 2002 Marko

Pop 39.33
Vit 1.42

Marko is a simple toolset that lets you create Markov chain databases from a corpus (or two) of text and then compare unknown texts against these databases. For any two Marko databases, you can calculate the probability that the unknown text is related to one rather than the other. Possible applications include intelligent mail filtering, plagiarism detection, and historical research.
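The comparison described above can be sketched as follows. This is an assumption-laden illustration rather than Marko's actual method or file format: it uses a word-level first-order Markov chain with add-one smoothing, and the names `train`, `log_likelihood`, and `probability_of_first` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Set, Tuple

Model = Tuple[Dict[str, Counter], Set[str]]


def train(corpus: str) -> Model:
    """Count word-to-word transitions in corpus (a first-order Markov chain)."""
    words = corpus.lower().split()
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for prev, cur in zip(words, words[1:]):
        transitions[prev][cur] += 1
    return transitions, set(words)


def log_likelihood(model: Model, text: str, vocab_size: int) -> float:
    """Log-probability of text's transitions under model, add-one smoothed."""
    transitions, _ = model
    words = text.lower().split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        counts = transitions.get(prev, Counter())
        total += math.log((counts[cur] + 1) / (sum(counts.values()) + vocab_size))
    return total


def probability_of_first(model_a: Model, model_b: Model, text: str) -> float:
    """Probability that text came from model_a rather than model_b.

    Assumes equal prior odds for the two models.
    """
    vocab_size = len(model_a[1] | model_b[1] | set(text.lower().split()))
    log_odds = (log_likelihood(model_a, text, vocab_size)
                - log_likelihood(model_b, text, vocab_size))
    return 1.0 / (1.0 + math.exp(-log_odds))


pets = train("the cat sat on the mat the cat ate")
markets = train("stocks fell as markets closed stocks rose")
print(probability_of_first(pets, markets, "the cat sat"))
```

A result above 0.5 means the unknown text's word transitions look more like the first corpus than the second. A mail filter built this way would train one model on wanted mail and one on unwanted mail.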

Download Website Updated 12 Apr 2006 Docco

Pop 82.77
Vit 2.36

Docco is a personal document retrieval tool based on the Apache Lucene indexing engine. It lets you create an index of files on your file system, which you can then search by keyword. Not only is this much faster than recursing through your file system on every search, it also offers extended query options such as wildcards and fuzzy search, as well as a visualization of result set intersections.

Download No website Updated 28 Jan 2006 The Revisionist

Pop 45.18
Vit 1.58

The Revisionist is a tool for extracting and indexing hidden metadata (such as deleted or modified text) from large collections of MS Word files. It can operate on whole Web sites or on SMB or NFS directories. It is handy for pen-testing, or it can be used just to spot embarrassing secrets.

Download Website Updated 14 Jan 2010 Doodle

Pop 153.81
Vit 6.75

Doodle is a desktop search engine for Linux. It searches your hard drive for files using pattern matching on meta-data. It extracts file-format specific meta-data using libextractor and builds a suffix tree to index the files. The index can then be searched rapidly. It is similar to locate, but can take advantage of information such as ID3 tags. It is possible to do full-text indexing using the appropriate libextractor plugins. It also supports using FAM to keep the database up-to-date.

Download Website Updated 07 Jul 2004 Alb

Pop 17.83
Vit 1.42

Alb generates hierarchical, captioned, XHTML 1.1/CSS 2.1 Web galleries from image directories.

Download Website Updated 12 Oct 2006 PDFBox

Pop 139.94
Vit 2.81

PDFBox is a Java library for manipulating PDF documents and extracting contents from existing PDF documents.

No download Website Updated 15 Apr 2005 regain

Pop 35.58
Vit 54.41

regain is a desktop search engine. It supports most common file formats, including Word, Excel, PowerPoint, OpenOffice/StarOffice, PDF, RTF, HTML, and more. It indexes both file systems and Web sites. It is platform-independent and can also be used as a server-side search engine.
