Xapian is a search engine library that scales to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that lets developers easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of Boolean query operators. Omega is a Web search application built on the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PostScript, etc.) or data exported from arbitrary sources (e.g. SQL databases).
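To illustrate what combining probabilistic ranking with Boolean operators means in practice, here is a minimal plain-Python sketch. It does not use Xapian's API: the toy corpus, the function names (`build_index`, `bm25`, `boolean_and`), and the parameters k1=1.2 and b=0.75 (common textbook values) are assumptions for illustration only. Xapian's default weighting scheme belongs to the same BM25 family, but its exact formula and defaults differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus; this is plain Python, not Xapian's API.
DOCS = {
    "d1": "xapian is a search engine library",
    "d2": "omega is a web search application built on xapian",
    "d3": "a template engine for python",
}


def tokenize(text):
    # Naive lowercase/whitespace tokenizer (a real indexer would also stem).
    return text.lower().split()


def build_index(docs):
    # Inverted index: term -> {doc_id: term frequency}, plus document lengths.
    index, lengths = {}, {}
    for doc_id, text in docs.items():
        terms = tokenize(text)
        lengths[doc_id] = len(terms)
        for term, tf in Counter(terms).items():
            index.setdefault(term, {})[doc_id] = tf
    return index, lengths


def bm25(query, index, lengths, k1=1.2, b=0.75):
    # Probabilistic (BM25-style) ranking; returns [(doc_id, score), ...] best first.
    n = len(lengths)
    avgdl = sum(lengths.values()) / n
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        df = len(postings)
        if df == 0:
            continue
        # IDF with +1 inside the log so it never goes negative.
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer documents get a smaller boost per occurrence.
            norm = k1 * (1 - b + b * lengths[doc_id] / avgdl)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    return scores.most_common()


def boolean_and(terms, index):
    # Boolean AND: documents containing every term, with no ranking.
    sets = [set(index.get(t, {})) for t in terms]
    return set.intersection(*sets) if sets else set()
```

For example, after `index, lengths = build_index(DOCS)`, the query `bm25("xapian search", index, lengths)` ranks the shorter `d1` above `d2` because both match each term once, while `boolean_and(["xapian", "web"], index)` simply filters down to `{"d2"}`.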
IMDbPY is a Python package for retrieving and managing data from the IMDb movie database about movies, people, characters, and companies. It can retrieve data both from IMDb's Web server and from a local copy of the whole database. Programmers can easily use IMDbPY to give their own programs access to IMDb's data. A few simple example scripts are included in the package.
itools is a collection of Python libraries which provides a wide range of capabilities, including an abstraction over directory and file resources, a search engine, type marshallers, datatype schemas, i18n support, URI handlers, a Web programming interface, a workflow interface, and support for data formats such as (X)HTML, XML, iCalendar, RSS 2.0, and XLIFF.
pyratemp is probably one of the smallest complete template engines for Python, with fewer than 500 lines of code. Its templates use only a very small set of special syntax. This reduces complexity and the probability of bugs, and leads to an easy-to-use and intuitive user interface. It uses embedded Python expressions (evaluated in a "sandbox"), is well documented, has full Unicode support, and produces very good error messages, which are especially helpful when writing new templates.
HarvestMan is a multithreaded offline browser. It offers many features for customizing offline browsing, including URL filters, word filters, domain filters, URL priorities, depth-fetching, fetch levels, file limits, time limits, support for the robot exclusion protocol, and more. It is useful for downloading an entire Web site, or selected files from one, to the hard disk for later offline browsing. It supports the HTTP/HTTPS and FTP protocols and can work through proxies.
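As a rough illustration of how domain filters, URL filters, a depth limit, and a file limit interact in an offline browser, here is a minimal breadth-first crawl sketch. It is not HarvestMan's code or configuration: the in-memory `LINKS` graph (standing in for real HTTP fetches), the `crawl` function, and its parameters are all hypothetical.

```python
import re
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory link graph standing in for real HTTP fetches.
LINKS = {
    "http://example.com/": [
        "http://example.com/a.html",
        "http://other.org/x.html",
        "http://example.com/big.iso",
    ],
    "http://example.com/a.html": ["http://example.com/b.html"],
    "http://example.com/b.html": ["http://example.com/c.html"],
    "http://example.com/c.html": [],
}


def crawl(start, max_depth, allowed_domains, exclude_patterns, max_files):
    # Breadth-first crawl returning the URLs "fetched", in fetch order.
    excludes = [re.compile(p) for p in exclude_patterns]
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue and len(fetched) < max_files:  # file limit
        url, depth = queue.popleft()
        fetched.append(url)  # a real crawler would download and save the page here
        if depth >= max_depth:  # depth limit: do not follow links any deeper
            continue
        for link in LINKS.get(url, []):
            if link in seen:
                continue
            if urlparse(link).hostname not in allowed_domains:
                continue  # domain filter
            if any(rx.search(link) for rx in excludes):
                continue  # URL filter
            seen.add(link)
            queue.append((link, depth + 1))
    return fetched
```

With a depth limit of 2, only `example.com` allowed, and `.iso` files excluded, `crawl("http://example.com/", 2, {"example.com"}, [r"\.iso$"], 10)` visits the root, `a.html`, and `b.html`, skipping the off-site link, the ISO file, and `c.html` (which lies at depth 3).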
ClearSilver is a high-performance, powerful, and language-neutral HTML template system. It enforces a separation between presentation code and application logic, which makes writing, debugging, and maintaining Web pages easier. It can be used from C/C++, Python, Perl, Java, and Ruby, and runs on Windows and Unix.