BTE (Body Text Extractor) is a Python module that extracts the main body of text from a Web page. Many Web articles consist of a main body of text, which is the relevant part of the page, surrounded by irrelevant material such as copyright notices, advertising, and links to sponsors. BTE identifies this main body and extracts it.
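The listing does not say how BTE locates the body. One common way to describe body-text extraction is as a search over the page's token sequence: choose the contiguous window that contains many words and few tags, while leaving the tag-heavy navigation and footer material outside it. The sketch below illustrates that idea and is not BTE's own code. It assumes a simple regex tokenizer that splits the page into tags and whitespace-separated words, and `extract_body` is a hypothetical name. Maximizing (tags before the window) + (words inside it) + (tags after it) is equivalent to finding the maximum-sum run when each word scores +1 and each tag scores -1, so Kadane's algorithm solves it in one pass.

```python
import re

# Hypothetical tokenizer: a token is either a tag "<...>" or a run of non-space text.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def extract_body(html):
    """Return the words in the window that maximizes tags outside + words inside."""
    tokens = TOKEN_RE.findall(html)
    # Inside the window a word gains 1 and a tag loses 1; the tags outside the
    # window only add a constant, so the best window is the max-sum contiguous run.
    best_sum, best_start, best_end = 0, 0, -1
    cur_sum, cur_start = 0, 0
    for k, tok in enumerate(tokens):
        score = -1 if tok.startswith("<") else 1
        if cur_sum <= 0:
            # Restart the run at this token (Kadane's algorithm).
            cur_sum, cur_start = score, k
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, k
    window = tokens[best_start:best_end + 1]
    # Keep only the words from the chosen window and drop any tags inside it.
    return " ".join(t for t in window if not t.startswith("<"))


page = ("<html><body><div><a>Home</a><a>Ads</a></div>"
        "<p>This is the main article text here.</p>"
        "<div><a>Copyright</a></div></body></html>")
print(extract_body(page))
```

On the sample page, the short navigation links and the footer are tag-dense, so the highest-scoring window is the paragraph text.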
DharmaDoc automates most of the tedious work involved in setting up a local Web server that contains a Buddhist reference library. The program lets you download and install documents and generates a search engine index for them. Afterward, you can type a Buddhist topic of interest into a Web browser and get a wealth of information.
Dowser is a Web research and archiving tool that clusters results from search engines, associates words that appear in previous searches, and keeps a local cache of all the results you click on in a searchable database along with summaries and links to related information. It helps you to keep track of what you find, with no advertising.
Ferret CMS is a content management system based on Zope. It aims to be simple and intuitive for non-technical users and easy to install and maintain for administrators, while offering developers flexibility and extensibility. The goal is to get a Web site up and running within five minutes. It includes built-in tools, such as a search engine and a workflow mechanism, that make viewing, creating, and administering content easier.
HarvestMan is a multithreaded offline browser. It offers many options for customizing offline browsing, including URL filters, word filters, domain filters, URL priorities, depth-fetching, fetch levels, file limits, time limits, support for the robot exclusion protocol, and more. It is useful for downloading an entire Web site, or selected files from a site, to the hard disk for later offline browsing. It supports the HTTP, HTTPS, and FTP protocols and can work through proxies.
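As a rough illustration of how several of these options can combine when an offline browser decides whether to fetch a link, the sketch below checks a depth limit, a domain filter, a regex-based URL filter, and robot exclusion in turn. It is not HarvestMan's code or configuration API. `FetchFilter`, `should_fetch`, and the `ExampleBot` user agent are hypothetical, and the robots.txt rules are passed in as lines so the example runs without network access. The robot-exclusion check uses the standard library's `urllib.robotparser.RobotFileParser`.

```python
import re
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class FetchFilter:
    """Hypothetical per-link filter for an offline browser (not HarvestMan's API)."""

    def __init__(self, allowed_domains, url_exclude_patterns, max_depth,
                 robots_txt_lines, user_agent="ExampleBot"):
        self.allowed_domains = set(allowed_domains)
        self.exclude = [re.compile(p) for p in url_exclude_patterns]
        self.max_depth = max_depth
        self.user_agent = user_agent
        # Parse robots.txt rules supplied as lines instead of fetching them.
        self.robots = RobotFileParser()
        self.robots.parse(robots_txt_lines)

    def should_fetch(self, url, depth):
        # Depth limit: stop following links beyond max_depth hops from the start page.
        if depth > self.max_depth:
            return False
        # Domain filter: only stay on explicitly allowed hosts.
        host = urlparse(url).hostname or ""
        if host not in self.allowed_domains:
            return False
        # URL filter: skip any URL that matches an exclusion pattern.
        if any(p.search(url) for p in self.exclude):
            return False
        # Robot exclusion: honor the site's robots.txt rules for our user agent.
        return self.robots.can_fetch(self.user_agent, url)


filt = FetchFilter(["example.com"], [r"\.zip$"], 2,
                   ["User-agent: *", "Disallow: /private/"])
print(filt.should_fetch("https://example.com/docs/index.html", 1))
```

Running the cheap local checks first means robots.txt is only consulted for links that already pass the depth, domain, and URL filters.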