HTML::TableExtract is a Perl module that simplifies the extraction of information from tables within HTML documents. Tables, no matter how nested or clustered, can be targeted symbolically with column headers or by more specific depth and count information.
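The two targeting styles described above can be sketched as follows. This is a minimal illustration, not taken from the module's own documentation: the sample HTML and the `extract_rows` helper are hypothetical, while `new`, `parse`, `tables`, and `rows`, together with the `headers`, `depth`, and `count` options, are assumed to behave as in the module's published interface.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample document, invented for this illustration.
my $html = <<'HTML';
<table><tr><th>Item</th><th>Qty</th><th>Price</th></tr><tr><td>Widget</td><td>3</td><td>4.50</td></tr><tr><td>Gadget</td><td>1</td><td>12.00</td></tr></table>
HTML

# Hypothetical helper: parse $html with the given constructor options and
# return every row of every matched table as a comma-joined string.
sub extract_rows {
    my (%opts) = @_;
    my $te = HTML::TableExtract->new(%opts);
    $te->parse($html);
    $te->eof;    # inherited from HTML::Parser; flushes any buffered input
    my @out;
    for my $ts ($te->tables) {
        for my $row ($ts->rows) {
            # Cells without content may be undef, so substitute ''.
            push @out, join(',', map { defined $_ ? $_ : '' } @$row);
        }
    }
    return @out;
}

# Symbolic targeting: match tables by column header text.
print "$_\n" for extract_rows(headers => ['Item', 'Price']);

# Positional targeting: first table (count 0) at the top nesting level (depth 0).
print "$_\n" for extract_rows(depth => 0, count => 0);
```

With header targeting, only the named columns should come back, in the order the headers were listed, and the header row itself is left out. With depth and count targeting, the whole table is returned, header row included.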
|Tags||Internet, Web, Indexing/Search, Software Development, Libraries, Perl Modules, Text Processing, Markup, HTML/XHTML|
|Operating Systems||OS Independent|
Release Notes: A subtable slicing bug and an hrow() attachment bug were fixed. Tests were added.
Release Notes: Element interactions in TREE() mode were tightened up for examining rows, columns, and cells; earlier versions ran into trouble when dereferencing scalars versus objects. The space() method of HTML::TableExtract::Table has been documented, and tests for it have been added. POD tests have been added. There are documentation updates and fixes.
Release Notes: Tables can now be selected by table tag attributes. The lineage() method now returns row and column information as well as depth and count for each ancestor (a potential backwards incompatibility: entries are now 4-element arrays rather than 2-element arrays). Header matching and column retention enhancements were made. Old-style procedures were deprecated in preparation for their conversion to methods. Various bugfixes were made.