HtmlRipper is a Java package that extracts dynamic data from Web pages using predefined rule sets. It allows multiple data sets to be combined into a single dynamic Web page, and is suited to building data mining, page analysis, Web page filtering, and article clipping software. The package includes a sample rules-enabled browser and rules editor.
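The idea of rule-driven extraction can be illustrated with a minimal sketch. This is not HtmlRipper's API or rule format, neither of which is described here. The `RuleSketch` class, the `Rule` type, and the `extract` method are hypothetical names, and plain regular expressions from `java.util.regex` stand in for whatever rule syntax the package actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleSketch {

    /** Hypothetical rule: a field name plus a regex whose first group captures the value. */
    static final class Rule {
        final String field;
        final Pattern pattern;

        Rule(String field, Pattern pattern) {
            this.field = field;
            this.pattern = pattern;
        }
    }

    /** Applies every rule to the page text and collects all captured values per field. */
    static Map<String, List<String>> extract(String html, List<Rule> rules) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Rule rule : rules) {
            List<String> values = new ArrayList<>();
            Matcher m = rule.pattern.matcher(html);
            while (m.find()) {
                values.add(m.group(1).trim());
            }
            result.put(rule.field, values);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample page; a real tool would fetch this over HTTP.
        String html = "<html><head><title>Daily News</title></head>"
                + "<body><h2>First story</h2><h2>Second story</h2></body></html>";

        // Assumed rule set: one rule per field to pull out of the page.
        List<Rule> rules = List.of(
                new Rule("title", Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE)),
                new Rule("headline", Pattern.compile("<h2>(.*?)</h2>", Pattern.CASE_INSENSITIVE)));

        // Prints {title=[Daily News], headline=[First story, Second story]}
        System.out.println(extract(html, rules));
    }
}
```

Keeping the rules as data separate from the extraction loop is what would let a rule set be edited, stored, or shared without changing the code that applies it.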
Tags: Internet, Web, Indexing/Search
Release Notes: Minor bugfixes to the example browser and ripping classes. The documentation has been updated. Shell scripts have been added to execute the various example programs in the package.
Release Notes: The classes within the jar file have been repackaged. Minor bugfixes. The interface with the updated Fishcroft Java Utilities has received minor modifications.
Release Notes: As part of the rule set creation process, the REB browser can now transfer rule sets to a central rule set repository on the Web for use by any standard or rules-enabled browser. Minor bugfixes were made. Servlet classes for creating rule repositories are now included.
Release Notes: Minor bugs in the rule creation routines of the REB browser were corrected.
Release Notes: A sample rules-enabled Web browser with rule creation and editing features has been added. It supports GET/POST and helper applications. The rules manipulation classes have been rewritten.