Weka is a collection of machine learning algorithms for solving real-world data mining problems. It is written in Java and runs on almost any platform. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka is also well-suited for developing new machine learning schemes. The development version contains a GUI with visualization tools and direct database access.
The Chemistry Development Kit (CDK) is a library of Java classes for chemo- and bioinformatics, computational chemistry, and chemometrics. Its algorithms cover substructure search, SMILES parsing and generation, Gasteiger charge calculation, QSAR descriptor calculation, 3D structure generation, and 2D layout and rendering. It also reads and writes many file formats, performs atom typing, and more.
The libmba package is a collection of mostly independent C modules that may be useful in any project. It includes the usual ADTs (linkedlist, hashmap, pool, stack, and varray), a flexible memory allocator, a CSV parser, a path canonicalization routine, an I18N text abstraction, a configuration file module, portable semaphores and condition variables, and more. The modules are designed to be integrated individually into existing codebases, so users do not have to commit to the entire library. The code has no typedefs and few comments, but it is accompanied by extensive man pages and HTML documentation.
PhpLabware is a database system with features similar to FileMaker and Access, but it is completely Web-based. Relational databases can be defined through the Web interface and extended with PHP plugin code. Access for users and groups is controlled at the record level. It was written by and for life sciences researchers, but it should be generally useful as a Web-based front end to SQL databases.
GeneX Va is a gene expression database that supports the storage and analysis of data from Affymetrix GeneChip arrays. It is designed to serve as a secure repository and archive for the data of many researchers. Although it is typically expected to be installed as part of a microarray center, the software is compact enough to be installed for a single department or even a single user. It includes an Analysis Tree package that provides a growing set of analytical tools, and its plug-in architecture allows easy expansion. The "Va" in the name denotes the University of Virginia version, a complete rewrite of NCGR's original GeneX.
Berkeley DB XML is a native XML database engine for embedding within your product. Distributed as a C++ library with language bindings for Java, Perl, Python, PHP, and Tcl, it integrates directly into your application; it is not a standalone database server. It provides XQuery access to a database of document containers. XML documents are stored and indexed in their native format, with Berkeley DB serving as the transactional database engine.
The chemical-mime-data package is a collection of data files that add support for various chemical MIME types to Linux/Unix desktops such as KDE and GNOME. Chemical MIME types were proposed in 1995 but apparently have never been registered with IANA. They are nevertheless widely used, and the project aims to support these important but unofficial MIME types. The initial data was taken from Henry Rzepa's "The Chemical MIME Home Page".
The NCBI C++ Toolkit provides portable libraries and applications to support genetics research. These include libraries for networking, SQL and BerkeleyDB access, CGI and HTML handling, and ASN.1 and XML handling, as well as sequence alignment engines, sequence retrieval engines, BLAST database engines, FLTK and OpenGL graphics toolkits, and basic system utilities.