Managing Gigabytes for Java

MG4J is a highly customizable, high-performance, full-text Java search engine for large document collections. It provides state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.

Recent releases

  •  22 Sep 2011 13:04

    Release Notes: This is the first release of the big version of MG4J, which is able to handle up to 2^63 terms and documents.

  •  15 Sep 2011 01:20

    Release Notes: This is part of a parallel release of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that prepares the way for "big" versions supporting more than 2^31 entries in arrays (simulated), elements in lists, terms, documents, nodes, etc. There were several refinements to semantics, and a few subtle, longstanding bugs were fixed.

  •  12 Dec 2010 16:50

    Release Notes: Thanks to fastutil 6, MG4J is no longer dependent on COLT. A few bugs have been fixed.

  •  06 Jun 2009 23:36

    Release Notes: Major improvements were made to indexing. Lightweight compressed-collection construction was added. A skipping system with variable quanta was added. Memory mapping is used for large indices. Many bugs were fixed.

  •  29 Feb 2008 10:23

    Release Notes: All new stemmers from Snowball were generating empty strings, causing major indexing problems. This has been fixed.
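
The 15 Sep 2011 notes mention "simulated" arrays with more than 2^31 entries. Java arrays are indexed by int, so one common way to get a long-indexed array is to split it into fixed-size segments and address each element by a (segment, offset) pair. The sketch below illustrates that idea only; it is not the fastutil or MG4J API. The class name BigIntArray, its methods, and the segmentShift parameter are all hypothetical.

```java
// Hypothetical sketch of a long-indexed int array built from int[] segments.
// Not the fastutil/MG4J API; all names here are illustrative assumptions.
public final class BigIntArray {
    private final int segmentShift;   // log2 of the segment size
    private final int segmentMask;    // segment size minus one
    private final int[][] segments;
    private final long length;

    public BigIntArray(long length, int segmentShift) {
        if (length < 0) throw new IllegalArgumentException("negative length: " + length);
        if (segmentShift < 1 || segmentShift > 30) throw new IllegalArgumentException("bad shift: " + segmentShift);
        this.length = length;
        this.segmentShift = segmentShift;
        this.segmentMask = (1 << segmentShift) - 1;

        // Number of segments, rounding up so a partial last segment is counted.
        int segmentCount = (int) ((length + segmentMask) >>> segmentShift);
        segments = new int[segmentCount][];
        long remaining = length;
        for (int i = 0; i < segmentCount; i++) {
            // Every segment is full-sized except possibly the last one.
            segments[i] = new int[(int) Math.min(1L << segmentShift, remaining)];
            remaining -= segments[i].length;
        }
    }

    public long length() {
        return length;
    }

    public int get(long index) {
        checkIndex(index);
        // High bits select the segment, low bits select the offset inside it.
        return segments[(int) (index >>> segmentShift)][(int) (index & segmentMask)];
    }

    public void set(long index, int value) {
        checkIndex(index);
        segments[(int) (index >>> segmentShift)][(int) (index & segmentMask)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index: " + index);
    }
}
```

With a large shift (for example 27, an assumed value) each segment holds 2^27 entries, so the total capacity is bounded by available memory rather than by the int index range. A small shift is convenient for experimenting, since the segment boundaries are then easy to reach.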
