xpath2rss is an XPath-to-RSS scraper. XPath is better suited to HTML scraping than regular expressions (the usual approach) because it works on the document's structure rather than treating the page as one big string. As a result, xpath2rss is more reliable than a regex-based scraper and, once you get the hang of XPath, much easier to use.
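To illustrate the idea of selecting by structure, here is a minimal sketch using only the Python standard library. It is not xpath2rss's own code or config format: the page, the XPath expression, and the helper names `scrape_items` and `to_rss` are all assumptions made up for this example, and `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed input.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML page used only for this example.
PAGE = """<html>
  <body>
    <div class="news">
      <h2><a href="/posts/1">First post</a></h2>
      <h2><a href="/posts/2">Second post</a></h2>
    </div>
    <div class="sidebar">
      <h2><a href="/about">About</a></h2>
    </div>
  </body>
</html>"""


def scrape_items(page, item_xpath):
    """Return a title/link dict for every element matched by item_xpath."""
    root = ET.fromstring(page)
    items = []
    for link in root.findall(item_xpath):
        title = "".join(link.itertext()).strip()
        items.append({"title": title, "link": link.get("href")})
    return items


def to_rss(channel_title, channel_link, items):
    """Build a bare-bones RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Scraped feed (example)"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


# Select only the links inside the "news" div; the sidebar link is skipped
# because the expression follows the document structure.
ITEMS = scrape_items(PAGE, ".//div[@class='news']/h2/a")
FEED = to_rss("Example news", "http://example.com/", ITEMS)
print(FEED)
```

A regex such as `<h2><a href="(.*?)">` would also match the sidebar heading here; the XPath expression avoids that by naming the enclosing `div`.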
| Tags | Internet, Web, Dynamic Content, Text Processing, Markup, HTML/XHTML, XML |
|---|---|
| Operating Systems | OS Independent |
| Implementation | Python |
Recent releases


Release Notes: This release adds support for HTML and XHTML (see the README for caveats), arbitrary channel metadata and item attributes, a check for a link tag pointing to any available feeds when scraping, and a robots.txt check to be polite. It introduces a new config file format (hopefully the last time it will change) with an XSLT stylesheet to convert existing config files, switches to RSS.py for RSS generation, and includes general code cleanup.
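The two politeness checks mentioned in these notes can be sketched roughly as follows. This is an assumption about the general technique, not the project's actual code: the robots.txt text, the sample page, the user agent string, and the helpers `may_fetch` and `find_feeds` are invented for illustration, and everything is parsed from in-memory strings so nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and XHTML page, held in memory for the example.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

PAGE = """<html>
  <head>
    <title>Example</title>
    <link rel="stylesheet" type="text/css" href="/style.css" />
    <link rel="alternate" type="application/rss+xml" href="/feed.rss" />
  </head>
  <body><p>Hello</p></body>
</html>"""

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


def may_fetch(robots_txt, user_agent, url):
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def find_feeds(page):
    """Return the href of every <link> element that advertises a feed."""
    root = ET.fromstring(page)
    return [
        link.get("href")
        for link in root.iter("link")
        if link.get("rel") == "alternate" and link.get("type") in FEED_TYPES
    ]


ALLOWED = may_fetch(ROBOTS_TXT, "example-scraper", "http://example.com/news")
BLOCKED = may_fetch(ROBOTS_TXT, "example-scraper", "http://example.com/private/x")
FEEDS = find_feeds(PAGE)
print(ALLOWED, BLOCKED, FEEDS)
```

If `find_feeds` returns anything, the site already publishes a feed and scraping may be unnecessary.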


Release Notes: This release replaces xml.sax.writer with the spiffier xml.sax.saxutils.XMLGenerator and fixes more encoding errors.
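For readers unfamiliar with `xml.sax.saxutils.XMLGenerator`, the sketch below shows how it can emit a small RSS document with correct escaping. It is written against the current Python 3 standard library and is only an illustration: the `write_rss` and `text_element` helpers and the sample data are assumptions, not taken from xpath2rss.

```python
import io
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

NO_ATTRS = AttributesImpl({})


def text_element(gen, name, text):
    """Write <name>text</name>; XMLGenerator escapes the text for us."""
    gen.startElement(name, NO_ATTRS)
    gen.characters(text)
    gen.endElement(name)


def write_rss(channel_title, items, encoding="utf-8"):
    """Return an RSS 2.0 document as a string built from SAX events."""
    buf = io.StringIO()
    gen = XMLGenerator(buf, encoding=encoding)
    gen.startDocument()
    gen.startElement("rss", AttributesImpl({"version": "2.0"}))
    gen.startElement("channel", NO_ATTRS)
    text_element(gen, "title", channel_title)
    for title, link in items:
        gen.startElement("item", NO_ATTRS)
        text_element(gen, "title", title)
        text_element(gen, "link", link)
        gen.endElement("item")
    gen.endElement("channel")
    gen.endElement("rss")
    gen.endDocument()
    return buf.getvalue()


# Non-ASCII text and markup characters pass through the generator safely.
OUT = write_rss("Example", [("Café & bar", "http://example.com/1")])
print(OUT)
```

Because the generator handles the XML declaration and character escaping, the calling code never has to assemble markup by string concatenation, which is a common source of encoding errors.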