JWPL


JWPL is a language-independent, database-driven, high-performance Wikipedia API that provides structured access to information nuggets such as redirects, categories, articles, and link structure. It contains three further components: a MediaWiki markup parser, which can be used to analyze the contents of a Wikipedia page in more depth or as a standalone parser for other text; TimeMachine, which reconstructs a snapshot of Wikipedia from a specific date, or multiple snapshots from a time span; and RevisionMachine, which offers efficient access to the history of articles through a dedicated storage format that reduces storage space by 98%. This format enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.
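
To make the storage idea concrete, here is a minimal sketch of how a revision history can be kept compact while still allowing random access. It is not JWPL's actual format: it assumes a delta-based scheme in which every few revisions are stored in full and the ones in between are stored only as the changed middle section relative to their predecessor. The class name `DeltaRevisionStore`, the `snapshotInterval` parameter, and the prefix/suffix delta are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy revision store (hypothetical, not JWPL code): periodic full snapshots, deltas in between. */
class DeltaRevisionStore {
    /** One stored entry: a full text when prefix is -1, otherwise a prefix/suffix delta. */
    private static final class Entry {
        final int prefix;   // characters shared with the start of the previous revision
        final int suffix;   // characters shared with the end of the previous revision
        final String text;  // full text, or only the changed middle part
        Entry(int prefix, int suffix, String text) {
            this.prefix = prefix;
            this.suffix = suffix;
            this.text = text;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final int snapshotInterval;
    private String latest = "";

    DeltaRevisionStore(int snapshotInterval) {
        this.snapshotInterval = snapshotInterval;
    }

    /** Appends a revision and returns its index. */
    int add(String text) {
        int index = entries.size();
        if (index % snapshotInterval == 0) {
            entries.add(new Entry(-1, 0, text)); // full snapshot
        } else {
            int max = Math.min(latest.length(), text.length());
            int p = 0;
            while (p < max && latest.charAt(p) == text.charAt(p)) {
                p++;
            }
            int s = 0;
            // The suffix may not overlap the prefix in either string.
            while (s < max - p
                    && latest.charAt(latest.length() - 1 - s) == text.charAt(text.length() - 1 - s)) {
                s++;
            }
            entries.add(new Entry(p, s, text.substring(p, text.length() - s)));
        }
        latest = text;
        return index;
    }

    /** Rebuilds revision {@code index} from the nearest earlier snapshot. */
    String get(int index) {
        int start = index - index % snapshotInterval;
        String text = entries.get(start).text;
        for (int i = start + 1; i <= index; i++) {
            Entry e = entries.get(i);
            text = text.substring(0, e.prefix) + e.text + text.substring(text.length() - e.suffix);
        }
        return text;
    }

    /** Total number of characters actually stored across all entries. */
    int storedChars() {
        int total = 0;
        for (Entry e : entries) {
            total += e.text.length();
        }
        return total;
    }

    public static void main(String[] args) {
        DeltaRevisionStore store = new DeltaRevisionStore(3);
        store.add("Hello");
        store.add("Hello world");
        store.add("Hello, world");
        System.out.println(store.get(2));
        System.out.println(store.storedChars());
    }
}
```

In this sketch, reading any revision replays at most `snapshotInterval - 1` deltas, which is what keeps access random rather than sequential. A larger interval saves more space at the cost of slower reads.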

Recent releases

  •  15 Aug 2012 10:25

    Release Notes: This release contains major changes to the JWPL API. Some of these changes, such as the switch from the JWPL parser to the SWEBLE parser, are still ongoing, hence the minor release number. Nevertheless, JWPL should be stable. The JWPL parser has been moved to its own module, so projects that use the parser have to add it as an additional dependency. Individual bugfixes and enhancements are listed in the changelogs of the individual modules.

  •  20 Feb 2012 23:07

    Release Notes: This release fixes a bug in the API that prevented fetching inlink IDs. Several improvements have been made to Hibernate session handling.

  •  09 Feb 2012 13:08

    Release Notes: JWPL Core now depends on Hibernate 4.0.0-final. The PageIterator can now iterate over a predefined list of pages. All components of the RevisionMachine can now produce data files in addition to SQL dumps. A severe error in the DiffTool that caused exceptions when creating a new revision dump has been fixed.