HtmlCleaner is an HTML parser. HTML found on the Web is usually dirty, ill-formed, and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring order to the tags, attributes, and ordinary text. For a given HTML document, HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those that most Web browsers use to create a Document Object Model. However, the user may provide custom tag and rule sets for tag filtering and balancing.
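To make the idea of tag balancing concrete, here is a minimal sketch in Python using only the standard library. It is not HtmlCleaner's API or its rule set: the `TagBalancer` class, the `balance` function, and the short `VOID_TAGS` list are hypothetical names chosen for illustration, and the only rule applied is "close whatever is still open, drop closing tags that match nothing".

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Assumption: a small, incomplete set of elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalancer(HTMLParser):
    """Illustrative balancer: tracks open tags on a stack and emits XML-style markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack of elements still waiting for a close tag
        self.out = []        # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. <input disabled>) get their own name as value.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID_TAGS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: drop it
        # Close every element opened after `tag`, then `tag` itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))


def balance(html):
    """Return a balanced markup fragment for possibly ill-formed HTML."""
    parser = TagBalancer()
    parser.feed(html)
    parser.close()
    # Close anything the input left open.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(balance("<p><b>bold<i>both</b> tail<br>"))
```

A real cleaner also has to handle a missing root element, comments, doctype declarations, and per-tag nesting rules, none of which this sketch attempts.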
Changing directories in bash can be tedious if you have long names or deeply nested paths. Creating aliases or adding entries to CDPATH can help, but only up to a point. Bashcd adds six new commands that use find, the locate database, the mdfind database, or other contextual information to make changing directories easier.
pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL, it works online, without holding an exclusive lock on the tables while they are being processed. pg_repack is efficient, with performance comparable to using CLUSTER directly.
EXIP provides a C library for the parsing and serialization of Efficient XML Interchange (EXI) format streams. The focus is portability and efficiency for embedded systems development. The project was started at the EISLAB research group in the Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, and is part of research efforts to bring resource-constrained embedded devices, such as wireless sensor nodes, closer to the enterprise business processes taking place in processing, manufacturing, and communication industries.
uterus is a codec library for financial tick data with an emphasis on market data integrity and maintainability. It comes with a set of tools to convert (mux) and print (demux) data from some sources, and to perform standard tasks like selecting instruments, creating snapshots and candles from tick data, etc. Special care is taken to provide longevity and consistency. All timestamps are internally converted to coordinated time, and price and quantity quotes are converted to a monetary datatype which doesn't suffer from rounding errors. Most importantly, metadata is stored along with the payload data in an inseparable unit, to provide self-contained and self-documenting files or network streams.
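The two normalisation steps mentioned above (timestamps and exact monetary values) can be illustrated with a short Python sketch. This is not uterus code: the `normalise_tick` function is a hypothetical name, Python's `decimal.Decimal` merely stands in for the library's own monetary datatype, and "coordinated time" is assumed here to mean UTC.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def normalise_tick(local_time, price, quantity):
    """Return (utc_timestamp, price, quantity) with exact decimal values.

    `price` and `quantity` are passed as strings so that no binary
    floating-point rounding happens before the decimal conversion.
    """
    if local_time.tzinfo is None:
        raise ValueError("timestamp needs an explicit UTC offset")
    return (
        local_time.astimezone(timezone.utc),  # assumption: coordinated time = UTC
        Decimal(price),
        Decimal(quantity),
    )


# Binary floats cannot represent 0.1 exactly, so repeated sums drift.
float_total = sum([0.1] * 3)
decimal_total = sum([Decimal("0.1")] * 3)
print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True

# A quote stamped 09:30 at UTC-5 becomes 14:30 UTC.
tick = normalise_tick(
    datetime(2012, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5))),
    "101.25",
    "300",
)
print(tick[0].isoformat())
```

Rejecting timestamps without an offset is a design choice of this sketch: guessing a local zone would silently undermine the consistency the conversion is meant to provide.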