JWPL is a language-independent, database-driven, high-performance Wikipedia API that provides structured access to information nuggets such as redirects, categories, articles, and link structure. It includes three further components: a MediaWiki markup parser, which can be used to analyze the contents of a Wikipedia page or run standalone on other text; TimeMachine, which reconstructs a snapshot of Wikipedia from a specific date, or multiple snapshots from a time span; and RevisionMachine, which offers efficient access to the history of articles using a dedicated storage format that decreases storage space by 98%. This format enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.
|Tags||Wikipedia API revisions edit history JWPL converter Parser MediaWiki|
|Operating Systems||Platform Independent|
|Implementation||Java 1.6+ hibernate|
Release Notes: This release contains major changes to the JWPL API. Some of these changes are still ongoing, in particular the switch from the JWPL parser to the SWEBLE parser, which is why this is only a minor release. Nevertheless, JWPL should be stable. The JWPL parser has been moved to its own module, so projects that use the parser have to add it as an additional dependency. Individual bug fixes and enhancements are listed in the changelogs of the individual modules.
Release Notes: This release fixes a bug in the API that prevented fetching inlink IDs. Several improvements to Hibernate session handling have been made.
Release Notes: JWPL Core now depends on Hibernate 4.0.0-final. The PageIterator can now iterate over a predefined list of pages. All components of the RevisionMachine can now produce data files in addition to SQL dumps. A severe error in the DiffTool that caused exceptions when creating a new revision dump has been fixed.