NetCrawler is the frontend to a Web crawling system. This command-line application downloads all of the pages within a domain, then parses and processes all of the related content (images, text, audio, video), saving it in an XML document for later processing. It is definitely alpha quality, but has been used quite extensively.
xpath2rss is an XPath to RSS scraper. XPath makes a better HTML scraper than regex (the typical solution) because it understands the structure of the document, rather than just treating it as a big string. As a result, xpath2rss is a more reliable scraper, and much easier to use, once you get the hang of XPath.
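As a rough illustration of the structural advantage described above (this is not xpath2rss's own code or configuration format), the following Python sketch scrapes the same page once with a path expression and once with a regex, then wraps the result in a bare-bones RSS document. The sample page, the function names, and the feed title are all hypothetical, and the standard library's ElementTree supports only a subset of XPath and requires well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Real-world HTML usually needs a
# tolerant parser before any XPath engine can read it.
PAGE = """<html><body>
<div class="news">
  <div class="item"><a href="/a">First story</a></div>
  <div class="item"><a href="/b">Second
      story</a></div>
</div>
<div class="ads"><a href="/buy">Buy now</a></div>
</body></html>"""


def scrape_with_xpath(page):
    """Return (title, link) pairs for links inside the news block only."""
    root = ET.fromstring(page)
    # The path names the structure: an <a> inside an "item" div inside the "news" div.
    links = root.findall(".//div[@class='news']/div[@class='item']/a")
    # Collapse whitespace so a title wrapped across lines reads as one string.
    return [(" ".join(a.text.split()), a.get("href")) for a in links]


def scrape_with_regex(page):
    """Naive regex version for contrast; returns (link, title) pairs.

    It sees the page as one string, so it also matches the ad link and
    misses the title that is wrapped across two lines.
    """
    return re.findall(r'<a href="([^"]+)">(.*)</a>', page)


def to_rss(items, title="Example feed"):
    """Build a minimal RSS 2.0 skeleton (a real channel also needs link and description)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for text, href in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(scrape_with_xpath(PAGE))
    print(scrape_with_regex(PAGE))
    print(to_rss(scrape_with_xpath(PAGE)))
```

The path expression keeps working if attributes are reordered or whitespace changes inside the links, which is where the regex version tends to break.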
Integratis is a Web development framework that supports quick scripting and selective optimization of interactive Web sites. It is implemented in C using Pthreads and SysV shared memory. It features a multi-threaded application server; CGI and command-line clients that directly access objects in shared memory; an intelligent, designer-friendly HTML parser that automatically pre-populates forms, rewrites query strings, and executes server-side scripts in multiple scripting languages; an object-relational mapping layer; a built-in scripting language; and a single OO framework that allows classes to be implemented in multiple scripting languages or as shared libraries.
Silva is a CMS for organizations that manage multiple or complex Web sites. Content is stored in clean XML, independent of layout and presentation. Features include versioning, a workflow system, an integrated visual editor, content reuse, sophisticated access control, multi-site management, extensive import/export facilities, fine-grained templating, and high-resolution image storage and manipulation. Silva is built on top of the Zope Web application platform.
Pysite is a tool written in Python to generate Web sites based on the contents of files in a given directory tree. It crawls through a specified directory tree looking for files matching certain configurable patterns (like intro, title-fr, and this-is-another-page-body), building up an output Web page for each set of files found. It features valid XHTML output, support for multiple language output, simple hierarchical templates (just put a template in a directory and it will be used for all subdirectories), and basic plaintext to XHTML conversion.
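The hierarchical template rule described above (a template placed in a directory applies to all of its subdirectories) can be sketched as a nearest-ancestor lookup. This is only an illustration of the idea, not Pysite's actual implementation: the file name template.html and the function find_template are assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical template file name; Pysite's real naming may differ.
TEMPLATE_NAME = "template.html"


def find_template(directory, root):
    """Return the nearest template at or above `directory`, stopping at `root`.

    Walks upward one directory at a time, so a template in a subdirectory
    overrides the one inherited from its parents. Returns None if no
    template exists between `directory` and `root`.
    """
    current = Path(directory).resolve()
    root = Path(root).resolve()
    while True:
        candidate = current / TEMPLATE_NAME
        if candidate.is_file():
            return candidate
        # Stop at the site root (or the filesystem root as a safety net).
        if current == root or current == current.parent:
            return None
        current = current.parent


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp)
        (site / "docs" / "fr").mkdir(parents=True)
        (site / "news").mkdir()
        (site / TEMPLATE_NAME).write_text("site-wide template")
        (site / "docs" / TEMPLATE_NAME).write_text("docs template")
        # docs/fr has no template of its own, so it inherits the one in docs.
        print(find_template(site / "docs" / "fr", site).read_text())
        # news falls back to the site-wide template at the root.
        print(find_template(site / "news", site).read_text())
```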
PyHtmlTable is a class for Python CGIs to generate HTML tables on the fly. It lets you set individual row and cell attributes via arbitrary dictionaries, and span rows and columns. The table grows automatically if cells are set outside its initial range. New rows and columns can be inserted anywhere in the table, and table data can be populated in bulk from arrays at arbitrary locations. It also provides default cell attributes for table-wide uniformity, which can be overridden on a cell-by-cell basis. PyHtmlTable is intended to be a functional equivalent to Table.pm or Table.php.
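To make the auto-grow and attribute-override behaviour concrete, here is a small self-contained Python sketch. It does not use or reproduce PyHtmlTable's API: the class GrowingTable and its methods set_cell, fill, and render are hypothetical names chosen for this example, and row/column spanning and insertion are left out for brevity.

```python
from html import escape


class GrowingTable:
    """Hypothetical stand-in showing auto-grow, bulk fill, and attribute defaults."""

    def __init__(self, rows=0, cols=0, default_cell_attrs=None):
        self.rows, self.cols = rows, cols
        self.default_cell_attrs = dict(default_cell_attrs or {})
        self.cells = {}       # (row, col) -> cell text
        self.cell_attrs = {}  # (row, col) -> per-cell attribute overrides

    def set_cell(self, row, col, text, attrs=None):
        # Grow the table when a cell lands outside the current range.
        self.rows = max(self.rows, row + 1)
        self.cols = max(self.cols, col + 1)
        self.cells[(row, col)] = str(text)
        if attrs:
            self.cell_attrs[(row, col)] = dict(attrs)

    def fill(self, top, left, data):
        """Bulk-populate from a list of lists, starting at (top, left)."""
        for r, row_values in enumerate(data):
            for c, value in enumerate(row_values):
                self.set_cell(top + r, left + c, value)

    @staticmethod
    def _format_attrs(attrs):
        return "".join(
            f' {name}="{escape(str(value), quote=True)}"'
            for name, value in attrs.items()
        )

    def render(self):
        lines = ["<table>"]
        for r in range(self.rows):
            tds = []
            for c in range(self.cols):
                # Per-cell attributes override the table-wide defaults.
                attrs = {**self.default_cell_attrs, **self.cell_attrs.get((r, c), {})}
                text = escape(self.cells.get((r, c), ""))
                tds.append(f"<td{self._format_attrs(attrs)}>{text}</td>")
            lines.append("  <tr>" + "".join(tds) + "</tr>")
        lines.append("</table>")
        return "\n".join(lines)


if __name__ == "__main__":
    table = GrowingTable(1, 1, default_cell_attrs={"align": "left"})
    table.fill(1, 1, [["a", "b"]])  # lands outside the 1x1 range, so the table grows
    table.set_cell(0, 0, "total", {"align": "right"})
    print(table.render())
```

Storing cells in a dictionary keyed by position keeps growth cheap; a list-of-lists layout would be the more natural choice if rows and columns also had to be inserted in the middle of the table.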