BTE (Body Text Extractor) is a Python module that extracts the main body of text from a Web page. Many Web articles consist of a main body, which is the relevant part of the page, surrounded by irrelevant material such as copyright notices, advertising and links to sponsors. BTE identifies the main body of an article and extracts its text.
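To make the idea of body text extraction concrete, here is a minimal sketch of one possible heuristic: treat the page as a sequence of tags and words, and pick the contiguous span that is densest in words and sparsest in tags. This is an illustration under stated assumptions, not BTE's actual code or API; the function name `extract_body` and the tokenizing regular expression are hypothetical.

```python
import re

# Hypothetical tokenizer: a token is either an HTML tag or a run of non-space text.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def extract_body(html):
    """Return the words of the span with the most words and fewest tags.

    Each word scores +1 and each tag scores -1; the best span is the
    maximum-sum contiguous run of tokens (Kadane's algorithm).
    """
    tokens = TOKEN_RE.findall(html)
    best_sum, best_start, best_end = 0, 0, 0
    cur_sum, cur_start = 0, 0
    for i, tok in enumerate(tokens):
        score = -1 if tok.startswith("<") else 1
        if cur_sum <= 0:
            # A non-positive running sum cannot help; start a new span here.
            cur_sum, cur_start = 0, i
        cur_sum += score
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    words = [t for t in tokens[best_start:best_end] if not t.startswith("<")]
    return " ".join(words)


if __name__ == "__main__":
    page = (
        '<html><body><div><a href="x">Home</a> <a href="y">About</a></div>'
        "<p>The quick brown fox jumps over the lazy dog near the river bank.</p>"
        '<div><a href="z">Copyright</a></div></body></html>'
    )
    print(extract_body(page))
```

Link-heavy navigation and footer regions score poorly because nearly every word in them is wrapped in tags, so the selected span tends to settle on the article text. A real page would also need script, style and comment content removed before tokenizing.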
HEBCI is a technique that allows a Web form handler to transparently detect the character set in which its data was encoded. The form includes carefully chosen character references, and the bytes the browser submits for them reveal the encoding it used. The handler can thus guarantee that data is in a standard encoding without relying on the (often unreliable) encoding negotiation between Web server and browser.
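The inference step can be sketched as a fingerprint lookup. Assume the form carries a hidden field (hypothetically named `hebci`) whose value is written as the character references `&deg;&divide;&mdash;`. The browser submits those three characters in whatever encoding it is using, so the handler can compare the received bytes against what each candidate encoding would produce. The probe characters, the candidate list and the function names below are illustrative assumptions, not the published HEBCI fingerprint table.

```python
# Probe characters: degree sign, division sign, em dash (assumed for illustration).
PROBE = "\u00b0\u00f7\u2014"

# Candidate encodings to recognise, given as Python codec names (also an assumption).
CANDIDATES = ["utf-8", "cp1252", "mac_roman"]


def build_fingerprints(probe=PROBE, candidates=CANDIDATES):
    """Map the bytes each encoding produces for the probe to that encoding's name."""
    table = {}
    for name in candidates:
        try:
            raw = probe.encode(name)
        except UnicodeEncodeError:
            # This encoding cannot represent the probe; this sketch skips it.
            continue
        table.setdefault(raw, name)
    return table


FINGERPRINTS = build_fingerprints()


def detect_encoding(raw_probe):
    """Return the codec name matching the submitted probe bytes, or None."""
    return FINGERPRINTS.get(raw_probe)


def decode_form_value(raw_value, raw_probe):
    """Decode a submitted field using the encoding inferred from the probe."""
    encoding = detect_encoding(raw_probe)
    if encoding is None:
        raise ValueError("unrecognised probe bytes; encoding unknown")
    return raw_value.decode(encoding)


if __name__ == "__main__":
    # Simulate a browser that submitted the form in Windows-1252.
    probe_bytes = PROBE.encode("cp1252")
    print(detect_encoding(probe_bytes))
    print(decode_form_value("caf\u00e9".encode("cp1252"), probe_bytes))
```

Once decoded to a Python string, the value can be re-encoded to whatever standard encoding the application stores. A fuller version would also handle browsers that cannot represent a probe character in their encoding and substitute something else for it.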
The Historical Event Markup and Linking Project (Heml) provides an XML schema for historical events and a Java Web application that transforms conforming documents into hyperlinked timelines, maps and tables. It aims to provide an information-rich interchange format for historical data, and thus to add a historical component to the growing movement for a 'Semantic Web.'