MG4J is a highly customizable, high-performance, full-text Java search engine for large document collections. It provides state-of-the-art features (such as BM25/BM25F scoring) and new research algorithms.
| Tags | Internet, Web, Indexing/Search, Text Processing, Indexing, Software Development, Libraries, Java Libraries |
|---|---|
| Licenses | LGPLv3+ |
| Operating Systems | OS Independent |
| Implementation | Java |
Recent releases


Release Notes: This is the first release of the "big" version of MG4J, which can handle up to 2^63 terms and documents.


Release Notes: This is part of a parallel release of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that prepares the way for "big" versions supporting more than 2^31 entries in (simulated) arrays, elements in lists, terms, documents, nodes, etc. Several refinements were made to semantics, and a few subtle, longstanding bugs were fixed.


Release Notes: Thanks to fastutil 6, MG4J no longer depends on COLT. A few bugs were fixed.


Release Notes: Major improvements were made to indexing. Lightweight compressed-collection construction was added. A skipping system with variable quanta was added. Memory mapping is used for large indices. Many bugs were fixed.


Release Notes: All new stemmers from Snowball were generating empty strings, causing major indexing problems. This has been fixed.