urlwatch is a script that watches URLs and notifies you (via email) of any changes. Each change notification includes the URL that changed and a unified diff of what changed. The script works out of a single directory, so there is no need to install anything; state files are kept in the same folder. Parts of a page that change on every visit can be stripped with a filter hook function, so that they do not trigger spurious notifications. The script is typically run as a cron job.
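As a rough illustration of the two mechanisms described above (a filter hook that strips changing content, and a unified diff in the notification), here is a minimal Python sketch. It is not urlwatch's own code: the `filter_hook(url, data)` signature, the example URL and the "Generated at" pattern are hypothetical, and the diff comes from the standard library's `difflib.unified_diff`.

```python
import difflib
import re


def filter_hook(url, data):
    """Hypothetical filter hook: strip parts of a page that change on every fetch.

    The (url, data) signature is an assumption for illustration only.
    """
    if url == "http://example.com/":
        # Remove a (hypothetical) timestamp line that changes on every request.
        return re.sub(r"Generated at .*\n", "", data)
    return data


def report_change(url, old, new):
    """Return a unified diff of old vs. new content, or None if unchanged."""
    if old == new:
        return None
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=url + " (old)",
        tofile=url + " (new)",
    )
    return "".join(diff)


url = "http://example.com/"
old = filter_hook(url, "Hello\nGenerated at 10:00\n")
new = filter_hook(url, "Hello world\nGenerated at 10:05\n")
print(report_change(url, old, new))
```

Because the timestamp line is filtered out before comparison, only the real change ("Hello" becoming "Hello world") shows up in the diff.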
| Tags | Internet, Web, Dynamic Content, Indexing/Search, Site Management, Link Checking, Text Processing, Filters, Markup |
| Operating Systems | OS Independent |
Release Notes: This release added UTF-8 support when converting HTML pages to text with Lynx and with Debian's html2text command-line application (via the html2text module).
Release Notes: This bugfix release fixes an issue with the html2txt module when it is used in filter hook scripts. All users of previous releases are advised to upgrade. No new features have been introduced.
Release Notes: This release adds support for watching Web pages via HTTP POST requests: each URL in the urls.txt file can now be followed by POST data. Starting with this release, urlwatch supports Python 3.x in addition to Python 2.x. For Python versions earlier than 3.2, you have to install "futures" from PyPI, because urlwatch now depends on this module (available in the standard library as concurrent.futures since Python 3.2) to fetch Web pages concurrently for better bandwidth utilization.
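The sketch below shows one way a urls.txt line carrying POST data could be split and fetched, with pages downloaded concurrently through `concurrent.futures` (the module that the "futures" package backports to older Pythons). It assumes the POST data is separated from the URL by whitespace and that lines starting with `#` are comments; check the documentation shipped with your release for the real urls.txt syntax. `parse_urls`, `fetch` and `fetch_all` are hypothetical names, not urlwatch's API.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def parse_urls(lines):
    """Split each urls.txt-style line into (url, post_data or None).

    Assumption: POST data follows the URL after the first run of whitespace;
    blank lines and lines starting with '#' are skipped.
    """
    jobs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        url = parts[0]
        post_data = parts[1] if len(parts) > 1 else None
        jobs.append((url, post_data))
    return jobs


def fetch(job):
    """Fetch one URL, sending a POST request when post_data is present."""
    url, post_data = job
    data = post_data.encode("utf-8") if post_data is not None else None
    # urlopen issues a POST automatically when data is not None.
    with urllib.request.urlopen(url, data=data, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(jobs, fetcher=fetch, max_workers=4):
    """Fetch all jobs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetcher, jobs))
```

Passing a custom `fetcher` makes it possible to exercise the concurrency without network access; `ThreadPoolExecutor.map` returns results in the same order as the input jobs, so each result can be matched back to its URL.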
Release Notes: This minor maintenance release makes writing filters less error-prone: a filter that returns None is now treated as "do not filter" instead of causing an error.
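A minimal sketch of how a caller can treat a `None` return value from a filter as "do not filter". The function names here are hypothetical and only illustrate the idea; they are not taken from urlwatch.

```python
def apply_filter(filter_func, url, data):
    """Apply a user filter, treating a None return value as "do not filter".

    A filter that returns None (for example, because it has no return
    statement for some URLs) leaves the content unchanged.
    """
    result = filter_func(url, data)
    return data if result is None else result


def my_filter(url, data):
    # Hypothetical filter that only handles one URL and implicitly
    # returns None for every other URL.
    if url == "http://example.com/":
        return data.replace("\r\n", "\n")


print(apply_filter(my_filter, "http://example.org/", "unchanged"))  # prints: unchanged
```

Without the `None` check, forgetting a `return` in one branch of a filter would silently replace the page content with `None`.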
Release Notes: When a shell command monitored for its output exits with a non-zero status, urlwatch now reports an error for it instead of showing it as if everything had been deleted from the output.
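The following sketch shows the idea of checking a command's exit status before treating its output as content to compare. The `run_command` helper and its return convention are hypothetical; it requires Python 3.5+ for `subprocess.run`.

```python
import subprocess


def run_command(command):
    """Run a monitored shell command (hypothetical helper).

    Returns (output, error). A non-zero exit status yields an error message
    instead of output, so a failed command is not mistaken for a command
    whose output was entirely deleted.
    """
    proc = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        return None, "command %r exited with status %d: %s" % (
            command, proc.returncode, proc.stderr.strip())
    return proc.stdout, None


output, error = run_command("echo hello")
```

Keeping the error separate from the output lets a report list failed commands on their own instead of diffing an empty string against the previous output.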