72 projects tagged "Text Processing"

No download Website Updated 04 Feb 2002 Java Search Engine

Pop 76.14
Vit 66.77

Java Search Engine is a server-side search engine for Web sites, written entirely in Java. It features HTML and PDF indexing, a built-in Web crawler, support for international encodings, word and phrase search, and results returned as excerpts with the matched words highlighted (as Google does). It is available as an EJB, JSP, servlet, or Java API library. For non-Java environments, it is available as an XML server with XSLT support.

No download Website Updated 10 Jun 2004 @1 FAQ Publisher

Pop 21.31
Vit 60.01

@1 FAQ Publisher is a MySQL-based online FAQ management system.

No download Website Updated 03 Jul 2004 Amberfish

Pop 12.92
Vit 59.83

Amberfish is a general purpose text/XML retrieval utility. It features indexing of both free text and nested fields, built-in support for XML documents, structured queries allowing generalized field/tag paths, hierarchical result sets, automatic searching across multiple databases, efficient indexing, and relatively low memory requirements.

Download No website Updated 29 Aug 2005 BTE

Pop 10.58
Vit 56.19

BTE (Body Text Extractor) is a Python module that extracts the main body of text from a Web page. Many Web articles consist of a main body which constitutes the relevant part of the particular page. Surrounding this body is irrelevant information such as copyright notices, advertising, links to sponsors, etc. BTE identifies and extracts the main body text of an article.
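As a rough illustration of the idea, the sketch below uses one heuristic often applied to this task: treat the page as a sequence of tag tokens and word tokens, then pick the contiguous span that contains the most words relative to tags. This is an assumption made for illustration, not BTE's actual code; the names `extract_body_text` and `_Tokenizer` are hypothetical, and the sketch does not treat script or style content specially.

```python
from html.parser import HTMLParser


class _Tokenizer(HTMLParser):
    """Flatten HTML into a sequence of (is_tag, text) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append((True, tag))

    def handle_endtag(self, tag):
        self.tokens.append((True, tag))

    def handle_data(self, data):
        # Each whitespace-separated word becomes one non-tag token.
        self.tokens.extend((False, word) for word in data.split())


def extract_body_text(html):
    """Return the words in the token span that maximizes words minus tags."""
    parser = _Tokenizer()
    parser.feed(html)
    parser.close()

    # Maximum-sum contiguous span, scoring words as +1 and tags as -1.
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, (is_tag, _) in enumerate(parser.tokens):
        run_sum += -1 if is_tag else 1
        if run_sum <= 0:
            # A span that has dropped to zero cannot help; restart after it.
            run_sum, run_start = 0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1

    span = parser.tokens[best_start:best_end]
    return " ".join(text for is_tag, text in span if not is_tag)


page = (
    "<html><body><div><a href='/'>Home</a> <a href='/ads'>Sponsors</a></div>"
    "<p>Text retrieval systems index documents so that queries "
    "can be answered quickly.</p>"
    "<div><a href='/legal'>Copyright</a></div></body></html>"
)
body = extract_body_text(page)
```

On this sample page the navigation links and the copyright notice fall outside the selected span, so only the paragraph text is returned.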

Download Website Updated 23 Dec 2013 GNU libextractor

Pop 486.72
Vit 50.96

libextractor is a library used to extract metadata from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain metadata about files. It includes a shell command and bindings for Java (JNI) and Python.

Download Website Updated 04 Apr 2014 Terrier

Pop 210.88
Vit 42.66

Terrier is software for the rapid development of Web, intranet, and desktop search engines. More generally, it is a modular platform for building large-scale information retrieval applications, providing indexing and probabilistic retrieval functionalities. It comes with a desktop search application.

Download Website Updated 05 Oct 2013 Apache Lucene

Pop 259.46
Vit 21.14

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform ones.

No download Website Updated 19 Feb 2013 Managing Gigabytes for Java

Pop 212.98
Vit 16.87

MG4J is a highly customizable, high-performance, full-text Java search engine for large document collections. It provides state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.
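For readers unfamiliar with BM25, the sketch below shows the textbook form of the scoring function on pre-tokenized documents. It is not MG4J's implementation (MG4J is a Java library); the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "full text search engine library".split(),
    "build system for software projects".split(),
    "text retrieval and text indexing".split(),
]
scores = bm25_scores("text search".split(), docs)
```

The first document matches both query terms and ranks highest, the third matches only "text", and the second matches nothing and scores zero.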

Download Website Updated 28 Jun 2012 Xapian and Omega

Pop 403.31
Vit 16.28

Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).

No download Website Updated 05 Oct 2013 Apache Solr

Pop 163.42
Vit 13.35

Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g. Word and PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
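Because the API is plain HTTP, a client only needs to build a query URL. The sketch below assembles a request for Solr's `select` handler using common query parameters (`q`, `rows`, `wt`, `hl`, `hl.fl`, `facet`, `facet.field`). The helper name `solr_select_url`, the host and port, the core name `articles`, and the field names `text` and `category` are all hypothetical; it only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10,
                    highlight_field=None, facet_field=None):
    """Build a URL for a Solr select request that returns JSON."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if highlight_field:
        # Ask Solr to return highlighted snippets for this field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    if facet_field:
        # Ask Solr to return facet counts for this field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical server address, core, and field names.
url = solr_select_url(
    "http://localhost:8983/solr",
    "articles",
    "text:retrieval",
    rows=5,
    highlight_field="text",
    facet_field="category",
)
```

The resulting URL can be fetched with any HTTP client, and the JSON response parsed with the language's standard JSON library.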
