26 projects tagged "Indexing"

Download Website Updated 14 Jun 2004 Net::Z3950::SimpleServer

Pop 49.90
Vit 2.63

Net::Z3950::SimpleServer is a Perl module which implements the server side of the Z39.50 (information retrieval) protocol. It hides the complexity of network exchanges, packet serialization, and session handling. You are required only to implement simple callbacks to support searching and record retrieval. It is the basis of the "Zoogle" project, which is a Z39.50 gateway to the Google web index.

Download Website Updated 30 Jan 2001 Sary

Pop 25.10
Vit 2.66

Sary is a suffix array library and a set of tools. It provides fast full-text search facilities for text files on the order of 10 to 100 MB using a data structure called a suffix array. It can also search specific fields in a text file by assigning index points to those fields.
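
To illustrate the suffix array idea mentioned above, here is a minimal Python sketch. It is not Sary's API or implementation: the function names are hypothetical, construction is the naive sort-all-suffixes approach, and every character position is treated as an index point, whereas Sary lets you choose index points.

```python
def build_suffix_array(text):
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction (sorts full suffixes); fine for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return all start positions of pattern, found by binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))
```

Because all suffixes sharing a prefix sit next to each other in the sorted array, each lookup needs only two binary searches, which is what makes this structure attractive for full-text search.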

Download Website Updated 22 Dec 2001 XMLDB

Pop 53.94
Vit 1.78

XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and recalling documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB into a complete three-tier Web application development suite.

Download Website Updated 23 Dec 2013 GNU libextractor

Pop 490.62
Vit 51.82

libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files. It includes a shell command and bindings for Java (JNI) and Python.

No download Website Updated 19 Feb 2013 Managing Gigabytes for Java

Pop 212.98
Vit 16.96

MG4J is a highly customizable, high-performance, full-text Java search engine for large document collections. It provides state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.
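
As a rough illustration of the BM25 scoring named above, the following Python sketch ranks a handful of documents against a query. It is not MG4J code (MG4J is a Java library): the function name, the whitespace tokenizer, the default parameters `k1=1.2` and `b=0.75`, and the smoothed IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))` are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score each document in docs (a non-empty list of strings) for query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n

    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
for doc, score in zip(docs, bm25_scores(docs, "quick fox")):
    print(f"{score:.3f}  {doc}")
```

BM25F extends the same idea to documents with several weighted fields, which this sketch does not attempt.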

Download Website Updated 05 Mar 2007 QDBM: Quick DataBase Manager

Pop 189.17
Vit 12.03

QDBM is an embedded database library compatible with GDBM and NDBM. It provides both a hash database and a B+ tree database, and was developed with reference to GDBM with three goals: higher processing speed, smaller database files, and a simpler API.

Download Website Updated 02 May 2003 Pyndex

Pop 42.66
Vit 1.74

Pyndex is a simple and fast full-text indexer implemented in Python. It also includes an easy-to-use Bayesian classifier, and it uses Metakit as its storage back-end. It works well for quickly adding a search feature to an application, and is also well suited to in-memory indexing and searching. It can handle phrase queries. It performs best in applications involving a few thousand documents, but its scaling is mostly limited by available memory.
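
One common way to support phrase queries in an in-memory full-text index is to store word positions alongside document IDs. The sketch below shows that technique under stated assumptions: the `PositionalIndex` class and its methods are hypothetical, it does not use Metakit, and it is not Pyndex's actual API or storage layout.

```python
from collections import defaultdict


class PositionalIndex:
    """In-memory inverted index mapping term -> doc_id -> word positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        # Whitespace tokenization and lowercasing are simplifying assumptions.
        for pos, term in enumerate(text.lower().split()):
            self.postings[term][doc_id].append(pos)

    def phrase(self, query):
        """Return the set of doc_ids containing the query words consecutively."""
        terms = query.lower().split()
        if not terms or any(t not in self.postings for t in terms):
            return set()
        hits = set()
        for doc_id, starts in self.postings[terms[0]].items():
            for start in starts:
                # The k-th query term must appear k positions after the first.
                if all(start + k in self.postings[t].get(doc_id, ())
                       for k, t in enumerate(terms)):
                    hits.add(doc_id)
                    break
        return hits


index = PositionalIndex()
index.add(1, "full text search engine")
index.add(2, "search the full text")
print(index.phrase("text search"))
```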

Download Website Updated 20 Jul 2003 SearchAssist

Pop 27.48
Vit 1.42

SearchAssist is a simple but practical search engine application that uses a ternary search tree. It uses Java's dynamic loading feature to make the search engine highly customizable, and takes Mozilla bookmarks as input. A Swing UI allows users to enter search words and view the results.
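
For readers unfamiliar with ternary search trees, here is a minimal Python sketch of the data structure. It is not SearchAssist's code (which is written in Java); the class names, the key-to-value mapping, and the example bookmark entries are illustrative assumptions.

```python
class Node:
    """One character of a key, with lower/equal/higher child links."""

    def __init__(self, ch):
        self.ch = ch
        self.lo = None
        self.eq = None
        self.hi = None
        self.value = None  # set only on the node that ends a stored key


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key, value):
        if not key:
            raise ValueError("key must be non-empty")
        self.root = self._insert(self.root, key, 0, value)

    def _insert(self, node, key, i, value):
        ch = key[i]
        if node is None:
            node = Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i, value)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i, value)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1, value)
        else:
            node.value = value
        return node

    def get(self, key):
        """Return the stored value for key, or None if it is absent."""
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node = node.eq
                i += 1
            else:
                return node.value
        return None


tree = TernarySearchTree()
tree.insert("mozilla", "http://www.mozilla.org/")
tree.insert("moz", "short entry")
print(tree.get("mozilla"))
```

Each node compares a single character, so lookups behave like a binary search tree at every character position while sharing common prefixes between keys.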

Download Website Updated 28 Jun 2012 Xapian and Omega

Pop 403.31
Vit 16.33

Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).

Download Website Updated 15 May 2004 StringSearch

Pop 61.69
Vit 1.75

The StringSearch library provides implementations of algorithms of the Boyer-Moore family and the Shift-Or (bit-parallel) family, for use in Java programs that need fast string searching algorithms.
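
To give a feel for the bit-parallel family mentioned above, here is a minimal Shift-Or sketch in Python. It is not the StringSearch library's API (that library is for Java); the function name and return convention are assumptions made for this example.

```python
def shift_or_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    all_ones = (1 << m) - 1

    # masks[ch] has a 0 bit at position i wherever pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, all_ones) & ~(1 << i)

    # Bit j of state is 0 exactly when pattern[:j + 1] matches the text
    # ending at the current position.
    state = all_ones
    for pos, ch in enumerate(text):
        state = ((state << 1) | masks.get(ch, all_ones)) & all_ones
        if state & (1 << (m - 1)) == 0:
            return pos - m + 1
    return -1


print(shift_or_search("hello world", "world"))
```

In languages with fixed-width integers the pattern length is usually limited to the machine word size; Python's arbitrary-precision integers hide that limit here.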
