28 projects tagged "Indexing"

harvest (updated 30 Nov 2003)
Pop 190.33
Vit 8.65

Harvest is a system to collect information and make it searchable using a Web interface. It can collect information using HTTP, FTP, NNTP, and local files. Supported formats include HTML, DVI, PS, fulltext, mail, man pages, news, troff, WordPerfect, C sources, and many more. Adding support for new formats is easy due to Harvest's modular design.

HTMLDOC (updated 06 Jan 2014)
Pop 751.91
Vit 40.05

HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in a batch document processing environment, as a Web-based report generation application, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as Mac OS X and Windows 2000 and higher.

Namazu (updated 15 Dec 2004)
Pop 149.24
Vit 3.69

Namazu is a full-text search system intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files. Supported document types: HTML, Mail/News, MHonArc, RFC, TeX (with detex), man (with groff), Word (with wvWare), PDF (with pdftotext) and plain text.

WebGlimpse (updated 29 Oct 2008)
Pop 175.63
Vit 11.34

WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.

wf (updated 17 Jan 2008)
Pop 41.17
Vit 2.94

wf scans a text file or standard input, counts how often each word occurs across the whole text, and writes each word with its frequency to stdout.
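As a rough illustration of the kind of word-frequency count described above, here is a minimal Python sketch. It is not wf's implementation: the tokenization rule (lowercased runs of letters, digits, and apostrophes), the tab-separated output, and the function names `word_frequencies` and `format_report` are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its occurrence count.

    Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def format_report(freqs):
    """Render one 'word<TAB>count' line per word, most frequent first.

    Ties are broken alphabetically so the output is deterministic.
    """
    ordered = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word}\t{count}" for word, count in ordered)
```

To mimic reading from standard input, one could pass `sys.stdin.read()` to `word_frequencies` and print the result of `format_report`.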

YASE (updated 13 Apr 2009)
Pop 79.66
Vit 3.30

YASE is a text indexing and retrieval system. It allows you to index your document collection very easily. All words are indexed and can be optionally stemmed. The query tool supports searching all/any terms and can rank query results by relevance using the cosine measure.
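For readers unfamiliar with the cosine measure mentioned above, the following Python sketch ranks documents by the cosine of the angle between query and document term-count vectors. It is an illustration under stated assumptions, not YASE's code: it uses raw term counts with whitespace tokenization, applies no stemming or term weighting, and the names `tokenize`, `cosine`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace and lowercase (assumed, simplistic tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return the names of matching documents, most similar to the query first.

    docs maps a document name to its text; documents with zero similarity
    are dropped, and ties are broken by name.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name)
              for name, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored if score > 0]
```

A document that consists only of the query term scores 1.0, while a document that mixes it with other terms scores lower, which is why the cosine measure favors documents whose vocabulary is concentrated on the query.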

XMLDB (updated 22 Dec 2001)
Pop 54.41
Vit 1.78

XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and recalling documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB into a complete three-tier Web application development suite.

Radsearch (updated 16 Mar 2005)
Pop 26.68
Vit 2.06

Radsearch is a text utility used to retrieve records from a text file or list of text files, given a keyword and delimiter. This utility was written with the specific purpose of allowing quick retrieval of all login and logout records for a particular user in Radiusd log files.
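The keyword-and-delimiter retrieval described above could look roughly like the Python sketch below. This is not Radsearch's code or interface: the function name `find_records`, the default blank-line delimiter, the plain substring match, and the sample log text are all assumptions chosen for illustration.

```python
def find_records(text, keyword, delimiter="\n\n"):
    """Split text into records on the delimiter and keep those containing keyword.

    Assumptions: records are separated by a literal delimiter string
    (a blank line by default) and matching is a case-sensitive substring test.
    """
    records = (record.strip() for record in text.split(delimiter))
    return [record for record in records if record and keyword in record]


# Hypothetical log excerpt, invented for this example only.
sample_log = (
    'User-Name = "alice"\nAcct-Status-Type = Start\n\n'
    'User-Name = "bob"\nAcct-Status-Type = Start\n\n'
    'User-Name = "alice"\nAcct-Status-Type = Stop\n'
)

alice_records = find_records(sample_log, '"alice"')
```

Here `alice_records` holds the two records for the hypothetical user `alice`, one for login (`Start`) and one for logout (`Stop`).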

swish-e (updated 19 Aug 2005)
Pop 76.11
Vit 1.20

SWISH-E is a fast, powerful, flexible, free, and easy to use system for indexing collections of Web pages or other files.

Ron's Indexing Program (updated 21 Apr 2002)
Pop 24.08
Vit 1.00

Ripunix is a command-line-based system for indexing, searching, and browsing very large (multi-gigabyte) collections of plain text such as Project Gutenberg. It is optimized for efficiently maintaining these very large databases on machines with very small computing resources.
