119 projects tagged "Indexing"

harvest (updated 30 Nov 2003)

Pop 188.35
Vit 8.65

Harvest is a system to collect information and make it searchable using a Web interface. It can collect information using HTTP, FTP, NNTP, and local files. Supported formats include HTML, DVI, PS, fulltext, mail, man pages, news, troff, WordPerfect, C sources, and many more. Adding support for new formats is easy due to Harvest's modular design.

HTMLDOC (updated 06 Jan 2014)

Pop 756.01
Vit 40.45

HTMLDOC converts HTML files and Web pages into indexed HTML, PostScript, and PDF files suitable for online viewing and printing. It can be used as a standalone GUI application, in a batch document processing environment, as a Web-based report generation application, or in embedded environments to support printing of HTML content. It runs on all Unix platforms as well as Mac OS X and Windows 2000 and higher.

Namazu (updated 15 Dec 2004)

Pop 147.78
Vit 3.69

Namazu is a full-text search system designed to be easy to use. It works not only as a small or medium scale Web search engine, but also as a personal search system for email or other files. Supported document types: HTML, Mail/News, MHonArc, RFC, TeX (with detex), man (with groff), Word (with wvWare), PDF (with pdftotext), and plain text.

Net::Z3950::SimpleServer (updated 14 Jun 2004)

Pop 49.72
Vit 2.63

Net::Z3950::SimpleServer is a Perl module which implements the server side of the Z39.50 (information retrieval) protocol. It hides the complexity of network exchanges, packet serialization, and session handling. You are required only to implement simple callbacks to support searching and record retrieval. It is the basis of the "Zoogle" project, which is a Z39.50 gateway to the Google web index.

Sary (updated 30 Jan 2001)

Pop 24.74
Vit 2.66

Sary is a suffix array library with accompanying tools. It provides fast full-text search facilities for text files on the order of 10 to 100 MB using a data structure called a suffix array. It can also search specific fields in a text file by assigning index points to those fields.
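
The suffix array idea can be illustrated with a minimal Python sketch. This is not Sary's API or implementation: the function names `build_suffix_array` and `find_all`, and the `index_points` parameter, are hypothetical names chosen for this example. The naive sort shown here is only practical for small inputs; a real library would build the array far more efficiently.

```python
def build_suffix_array(text, index_points=None):
    """Return suffix start offsets sorted by the suffix they begin.

    index_points (hypothetical parameter) restricts indexing to chosen
    offsets, for example the start of each field; by default every
    character offset is indexed.
    """
    positions = range(len(text)) if index_points is None else index_points
    return sorted(positions, key=lambda i: text[i:])


def _bound(text, suffix_array, pattern, upper):
    """Binary search for the first suffix whose prefix is >= pattern
    (or > pattern when upper is True)."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        prefix = text[start:start + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_all(text, suffix_array, pattern):
    """Return the sorted offsets of indexed suffixes that start with pattern."""
    first = _bound(text, suffix_array, pattern, upper=False)
    last = _bound(text, suffix_array, pattern, upper=True)
    return sorted(suffix_array[first:last])
```

Because the suffixes are kept in sorted order, every match for a pattern sits in one contiguous slice of the array, so a lookup needs only two binary searches rather than a scan of the whole text.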

SWISH++ (updated 15 Mar 2006)

Pop 236.75
Vit 9.25

SWISH++ is a Unix-based file indexing and searching engine, typically used to index and search files on Web sites. Although originally based on SWISH-E, SWISH++ is a complete rewrite that is described as at least 10 times faster and able to handle much larger numbers of files. Additionally, it has unique features such as selective non-indexing, on-the-fly filters, user-selectable stemming, and more.

WebGlimpse (updated 29 Oct 2008)

Pop 174.42
Vit 11.35

WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.

wf (updated 17 Jan 2008)

Pop 40.25
Vit 2.94

wf scans a text file or standard input, counts how often each word occurs across the whole text, and writes each word with its frequency to stdout.
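
A word-frequency count of this kind can be sketched in a few lines of Python. This is an illustration only, not wf's code: the tokenization rule (lowercased runs of letters, digits, and apostrophes), the output order, and the function names `word_frequencies` and `format_report` are all assumptions made for this example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs in text.

    Assumption: a word is a run of ASCII letters, digits, or
    apostrophes, compared case-insensitively. wf's real rules may differ.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def format_report(frequencies):
    """Return one 'word count' line per word, most frequent first,
    with ties broken alphabetically."""
    ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word} {count}" for word, count in ordered)
```

To mimic the command-line behaviour described above, the text could be read from a file or from `sys.stdin.read()` and the report passed to `print`.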

YASE (updated 13 Apr 2009)

Pop 78.71
Vit 3.30

YASE is a text indexing and retrieval system. It allows you to index your document collection very easily. All words are indexed and can be optionally stemmed. The query tool supports searching all/any terms and can rank query results by relevance using the cosine measure.
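
Ranking by the cosine measure can be sketched as follows. This is a generic illustration rather than YASE's implementation: it assumes raw term counts as vector weights and whitespace tokenization, and the names `tokenize`, `cosine`, and `rank` are hypothetical. YASE's actual weighting scheme is not stated in the description above.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase whitespace-separated terms (an assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine of the angle between two term-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (name, score) pairs for documents sharing at least one term
    with the query, highest cosine score first.

    documents maps a document name to its text.
    """
    query_vector = Counter(tokenize(query))
    scored = [
        (name, cosine(query_vector, Counter(tokenize(body))))
        for name, body in documents.items()
    ]
    matches = [pair for pair in scored if pair[1] > 0]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))
```

The score depends on the direction of the term vectors rather than their length, so a short document devoted to the query terms can outrank a longer one that mentions them only in passing.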

XM Tool (updated 03 Feb 2001)

Pop 25.50
Vit 1.00

XM Tool is a series of Perl snippets that can be called separately or combined into more complex Perl scripts. It uses XMLish (plain) text as the representation between stages. A sample processor that reads C/JavaDoc sources and generates HTML or even DocBook is provided.
