15 projects tagged "Indexing"

Download Website Updated 30 Jan 2001 Sary

Pop 37.04
Vit 2.66

Sary is a suffix array library with accompanying tools. It provides fast full-text search for text files on the order of 10 to 100 MB using a data structure called a suffix array. It can also search specific fields in a text file by assigning index points to those fields.
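As a rough illustration of the idea (not Sary's own API or implementation), a suffix array can be sketched in a few lines of Python. The names `build_suffix_array`, `find_all`, and the `index_points` parameter are hypothetical, and the naive sort-based construction here is only suitable for small inputs.

```python
def build_suffix_array(text, index_points=None):
    """Return the indexed suffix start offsets, sorted by the suffix they begin.

    index_points is a hypothetical parameter: when given, only those
    offsets (for example, the starts of words or fields) are indexed.
    Naive construction; fine for a demo, too slow for large files.
    """
    points = range(len(text)) if index_points is None else index_points
    return sorted(points, key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return the sorted offsets of indexed suffixes that start with pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "to be or not to be"
full_index = build_suffix_array(text)
word_index = build_suffix_array(text, index_points=[0, 3, 6, 9, 13, 16])

print(find_all(text, full_index, "be"))  # every occurrence of "be"
print(find_all(text, word_index, "o"))   # only words that begin with "o"
```

Because the suffixes are sorted, each lookup is two binary searches rather than a scan of the whole text, and restricting the index points narrows matches to the chosen field or word boundaries.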

Download Website Updated 03 Feb 2001 XM Tool

Pop 37.20
Vit 1.00

XM Tool is a series of Perl snippets that can be called separately or combined into more complex Perl scripts. It uses XMLish (plain) text as the representation between stages. A sample processor that reads C/JavaDoc sources and generates HTML or even DocBook is provided.

Download Website Updated 22 Dec 2001 XMLDB

Pop 63.43
Vit 1.78

XMLDB uses an RDBMS to persist arbitrary XML documents. Due to its storage mechanism, searching for and retrieving documents is extremely quick. You can also perform XSL transformations on documents with surprising speed. The library can be used in any program to store libxml2 documents. A PHP module is also included, making XMLDB a complete three-tier Web application development suite.

No download Website Updated 03 Jan 2003 Automated Topic Classifier

Pop 29.83
Vit 1.00

The Automated Topic Classifier takes a list of titles and uses Bayesian analysis to classify them against a table of topics stored in a known database. It is used by the Globewide Network Academy distance learning catalog to semi-automatically classify courses.
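The kind of Bayesian title classification described above can be sketched as a small naive Bayes classifier. This is an illustrative assumption about the general technique, not the project's actual code: the class name, the in-memory training data (standing in for the database table), and the add-one smoothing are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTitleClassifier:
    """Minimal multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter()              # topic -> number of titles
        self.vocab = set()

    def train(self, title, topic):
        words = title.lower().split()
        self.word_counts[topic].update(words)
        self.doc_counts[topic] += 1
        self.vocab.update(words)

    def classify(self, title):
        """Return the topic with the highest log-posterior, or None if untrained."""
        words = title.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_topic, best_score = None, -math.inf
        for topic, n_docs in self.doc_counts.items():
            # Log prior from the share of training titles in this topic.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[topic].values())
            for w in words:
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[topic][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_topic, best_score = topic, score
        return best_topic


clf = NaiveBayesTitleClassifier()
clf.train("Introduction to Calculus", "math")
clf.train("Linear Algebra Basics", "math")
clf.train("World History Survey", "history")
clf.train("History of Ancient Rome", "history")

print(clf.classify("Advanced Calculus"))
print(clf.classify("Roman History"))
```

In a semi-automatic workflow such as the one described, the predicted topic would presumably be offered as a suggestion for a human to confirm rather than applied blindly.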

Download Website Updated 05 Mar 2007 QDBM: Quick DataBase Manager

Pop 194.24
Vit 12.34

QDBM is an embedded database library compatible with GDBM and NDBM. It provides both hash and B+ tree databases and was developed with reference to GDBM, with three goals: higher processing speed, smaller database files, and a simpler API.

Download Website Updated 02 May 2003 Pyndex

Pop 54.09
Vit 1.75

Pyndex is a simple and fast full-text indexer implemented in Python. It also includes an easy-to-use Bayesian classifier. It uses Metakit as its storage back-end. It works well for quickly adding a search feature to an application, and is also well suited to in-memory indexing and searching. It can handle phrase queries. It performs best in applications involving a few thousand documents, but its scaling is mostly limited by available memory.
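To show what in-memory indexing with phrase queries involves, here is a minimal positional inverted index. It is a sketch of the general technique only: the `PositionalIndex` class and its methods are hypothetical, it does not use Metakit, and it does not reflect Pyndex's actual API.

```python
from collections import defaultdict


class PositionalIndex:
    """In-memory inverted index that records word positions per document."""

    def __init__(self):
        # term -> doc_id -> list of positions where the term occurs
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        for pos, term in enumerate(text.lower().split()):
            self.postings[term][doc_id].append(pos)

    def phrase(self, query):
        """Return sorted ids of documents containing the words of query in sequence."""
        terms = query.lower().split()
        if not terms or any(t not in self.postings for t in terms):
            return []
        hits = []
        for doc_id, positions in self.postings[terms[0]].items():
            for p in positions:
                # Each following term must appear at the next consecutive position.
                if all(
                    p + k in self.postings[t].get(doc_id, ())
                    for k, t in enumerate(terms[1:], 1)
                ):
                    hits.append(doc_id)
                    break
        return sorted(hits)


index = PositionalIndex()
index.add(1, "the quick brown fox")
index.add(2, "the brown quick fox")
index.add(3, "a quick brown dog")

print(index.phrase("quick brown"))
print(index.phrase("brown fox"))
```

Since every posting list lives in a Python dictionary, memory use grows with the total number of word occurrences, which matches the note above that scaling is mostly bounded by available memory.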

Download No website Updated 11 May 2004 Lupy

Pop 91.39
Vit 3.68

Lupy is a full-text indexer for Python. It is a port of Jakarta Lucene to Python, and reads, writes, and searches indexes in Lucene binary format. Like Lucene, it is sophisticated, scalable, and Unicode aware.

Download Website Updated 14 Mar 2004 The Lucene Application Layer

Pop 35.89
Vit 1.00

LUALA is an acronym for LUcene Application LAyer. It is an intermediate level API for document indexing and searching. It uses the low-level API of Lucene.

Download Website Updated 15 Mar 2005 Ellogon

Pop 65.00
Vit 1.83

Ellogon is a multi-lingual, cross-platform, general-purpose language engineering environment, developed to aid both researchers in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, it offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.

Download No website Updated 28 Jan 2006 The Revisionist

Pop 45.18
Vit 1.58

The Revisionist is a tool for extracting and indexing hidden metadata (such as deleted or modified text) from large collections of MS Word files. It can operate on whole Web sites or on SMB or NFS directories. It is handy for pen-testing, or it can be used just to spot embarrassing secrets.
