21 projects tagged "Indexing"

harvest (updated 30 Nov 2003)

Pop 188.35
Vit 8.65

Harvest is a system to collect information and make it searchable using a Web interface. It can collect information using HTTP, FTP, NNTP, and local files. Supported formats include HTML, DVI, PS, fulltext, mail, man pages, news, troff, WordPerfect, C sources, and many more. Adding support for new formats is easy due to Harvest's modular design.

Namazu (updated 15 Dec 2004)

Pop 147.78
Vit 3.69

Namazu is a full-text search system intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files. Supported document types: HTML, Mail/News, MHonArc, RFC, TeX (with detex), man (with groff), Word (with wvWare), PDF (with pdftotext) and plain text.

Net::Z3950::SimpleServer (updated 14 Jun 2004)

Pop 49.72
Vit 2.63

Net::Z3950::SimpleServer is a Perl module which implements the server side of the Z39.50 (information retrieval) protocol. It hides the complexity of network exchanges, packet serialization, and session handling. You are required only to implement simple callbacks to support searching and record retrieval. It is the basis of the "Zoogle" project, which is a Z39.50 gateway to the Google web index.

WebGlimpse (updated 29 Oct 2008)

Pop 174.42
Vit 11.35

WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking and ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, query logging, result caching, and more. Localized search interfaces are provided in multiple languages, including Spanish, German, French, Italian, Norwegian, Finnish, Russian, and Hebrew. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.

XM Tool (updated 03 Feb 2001)

Pop 25.50
Vit 1.00

XM Tool is a series of Perl snippets that can be called separately or combined into more complex Perl scripts. It uses XMLish (plain) text as the representation between stages. A sample processor that reads C/JavaDoc sources and generates HTML or even DocBook is provided.

gelapas (updated 24 Oct 2001)

Pop 20.05
Vit 1.00

gelapas crawls the file tree and extracts information from files. The default settings (and the shorthand options) are useful for extracting information such as the title or meta tags from HTML files, but it can also be used for other kinds of documents.
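The kind of extraction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not gelapas itself: the names `HeadExtractor`, `extract`, and `crawl` are hypothetical, and the sketch assumes that "information" means the `<title>` text plus `<meta name=... content=...>` pairs of `.html` and `.htm` files.

```python
import os
from html.parser import HTMLParser


class HeadExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract(path):
    """Parse one HTML file and return its title and meta tags."""
    parser = HeadExtractor()
    with open(path, encoding="utf-8", errors="replace") as handle:
        parser.feed(handle.read())
    return {"path": path, "title": parser.title.strip(), "meta": parser.meta}


def crawl(root, extensions=(".html", ".htm")):
    """Walk the tree under root and yield one record per HTML file."""
    for directory, _subdirs, files in os.walk(root):
        for filename in sorted(files):
            if filename.lower().endswith(extensions):
                yield extract(os.path.join(directory, filename))
```

Calling `list(crawl("."))` would return one dictionary per HTML file below the current directory. Other document types would need their own extractor in place of `HeadExtractor`.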

Zoogle (updated 16 Apr 2002)

Pop 30.20
Vit 1.00

Zoogle is a Z39.50 gateway to the Google web index. With this gateway you can search the Google index with any Z39.50 client. It is based upon Google's official search API, the popular YAZ toolkit, and the Perl module Net::Z3950::SimpleServer.

Sherlock Holmes (updated 13 Apr 2009)

Pop 97.16
Vit 4.80

Sherlock Holmes is a modular system for gathering, indexing, and searching textual and image data. Its most popular application is indexing Web pages, at scales ranging from small Web sites to whole top-level domains, but other data sources, parsers, and user interfaces can be added easily.

Marko (updated 03 Nov 2002)

Pop 29.58
Vit 1.42

Marko is a simple toolset that lets you create Markov chain databases from a corpus (or two) of text and then compare unknown texts against these databases. For any two Marko databases, you can calculate the probability that the unknown text is related to one rather than the other. Possible applications include intelligent mail filtering, plagiarism detection, and historical research.
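A minimal sketch of this comparison, under stated assumptions, might look as follows. It is not Marko's actual implementation: the function names are hypothetical, the chain is a first-order, word-level one, add-one smoothing handles unseen transitions, and both corpora are given equal prior weight.

```python
import math
from collections import Counter, defaultdict


def train(text):
    """Count word-to-word transitions in a corpus (first-order Markov chain)."""
    words = text.lower().split()
    transitions = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        transitions[current][following] += 1
    return transitions


def log_likelihood(model, text, vocab_size):
    """Log-probability of the text's transitions, with add-one smoothing."""
    words = text.lower().split()
    total = 0.0
    for current, following in zip(words, words[1:]):
        row = model.get(current, Counter())
        total += math.log((row[following] + 1) / (sum(row.values()) + vocab_size))
    return total


def probability_of_first(model_a, model_b, text, vocab_size):
    """Probability that the text belongs to corpus A rather than B (equal priors)."""
    difference = log_likelihood(model_b, text, vocab_size) - log_likelihood(
        model_a, text, vocab_size
    )
    # Clamp the exponent so math.exp cannot overflow on very long texts.
    difference = max(min(difference, 700.0), -700.0)
    return 1.0 / (1.0 + math.exp(difference))


# Toy corpora, invented purely for illustration.
spam = "buy cheap pills now buy cheap watches now"
ham = "the meeting is at noon the report is due at noon"
vocab_size = len(set((spam + " " + ham).split()))
spam_model, ham_model = train(spam), train(ham)

p_spam = probability_of_first(spam_model, ham_model, "buy cheap pills", vocab_size)
p_ham = probability_of_first(spam_model, ham_model, "the report is due", vocab_size)
```

Here `p_spam` comes out above 0.5 and `p_ham` below it, meaning the first text looks more like the first corpus and the second text more like the second. A real tool would likely use larger corpora and possibly higher-order chains.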

Automated Topic Classifier (updated 03 Jan 2003, no download available)

Pop 26.27
Vit 1.00

The Automated Topic Classifier takes a list of titles and uses Bayesian analysis to classify them according to a table in a known database. It is used by the Globewide Network Academy distance learning catalog to semi-automatically classify courses.
