The Full-text Index Data structure library, libfid for short, is a portable software library for accessing indexed data through a simple C interface. Among other things, it implements functions for reading indexed data from files and for performing common operations such as fast string matching. Alphabet handling for mapping between printable and binary alphabets is integrated from the ground up. Currently, the enhanced suffix array is the only full-text index data structure supported. A very simple program for constructing enhanced suffix arrays is included.
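To illustrate the kind of string matching a suffix-array index enables, here is a minimal Python sketch. It is not libfid's C interface: the function names build_suffix_array and find_occurrences are hypothetical, and the code uses a plain suffix array with binary search rather than an enhanced suffix array, which adds further tables to speed up such queries.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction by sorting suffixes; adequate for short illustrative inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text using binary search."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in suffix_array[start:lo] begins with pattern.
    return sorted(suffix_array[start:lo])
```

All suffixes that begin with the pattern are adjacent in the sorted order, so two binary searches are enough to delimit every occurrence at once.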
ParseGenBank is a minimalistic, incremental, event-driven GenBank Flat File Format parser. It aims to efficiently produce events for keywords, features, qualifiers, and coding data. It does not attempt to parse the contents of feature or qualifier data, but provides a framework on which such a system can be built.
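The event-driven approach can be sketched in Python as a generator that reads one line at a time and emits an event for each keyword, feature, qualifier, and sequence line. This is not ParseGenBank's API: the function name genbank_events and the event tuple layout are hypothetical. The sketch assumes the standard flat file column layout (feature keys starting in column 6, locations and qualifiers in column 22) and, like the parser described above, leaves feature and qualifier contents uninterpreted. Continuation lines are reported as separate events rather than joined.

```python
def genbank_events(lines):
    """Yield (event, name, value) tuples for each line of a GenBank flat file."""
    section = "header"
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("//"):
            # Record terminator.
            yield ("end_record", None, None)
            section = "header"
        elif not line[0].isspace():
            # Top-level keyword such as LOCUS, FEATURES, or ORIGIN in column 1.
            parts = line.split(None, 1)
            keyword = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            yield ("keyword", keyword, value)
            if keyword == "FEATURES":
                section = "features"
            elif keyword == "ORIGIN":
                section = "sequence"
            else:
                section = "header"
        elif section == "features":
            body = line[21:]
            if line[5:6].strip():
                # A feature key starts in column 6; its location starts in column 22.
                yield ("feature", line[5:21].strip(), body.strip())
            elif body.startswith("/"):
                # Qualifier of the form /name=value, passed through unparsed.
                name, _, value = body[1:].partition("=")
                yield ("qualifier", name, value)
            else:
                yield ("continuation", None, body.strip())
        elif section == "sequence":
            # Drop the leading position number and the spacing between blocks.
            yield ("sequence", None, "".join(line.split()[1:]))
        else:
            # Indented header line (sub-keyword or wrapped text), left uninterpreted.
            yield ("continuation", None, line.strip())
```

Because the generator holds only the current line and the current section name, a caller can process arbitrarily large files incrementally, for example with for event, name, value in genbank_events(open(path)).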
The NCBI C++ Toolkit provides portable libraries and applications for assisting genetic science. These include libraries for networking, SQL and BerkeleyDB access, CGI and HTML handling, ASN.1 and XML handling, sequence alignment engines, sequence retrieval engines, BLAST database engines, FLTK and OpenGL graphics toolkits, and basic system utilities.
The Java Machine Learning Library is a set of reference implementations of machine learning algorithms. These algorithms are well documented, both in the source code and on the documentation site. Along with the machine learning algorithms themselves, many supporting components are provided: distance measures, evaluation criteria, data sets for validation purposes, and some sample code.
CGAL, the Computational Geometry Algorithms Library, is a large C++ library of geometric data structures and algorithms such as Delaunay triangulations, mesh generation, Boolean operations on polygons, and various geometry processing algorithms. CGAL is used in various areas: computer graphics, scientific visualization, computer-aided design and modeling, geographic information systems, molecular biology, medical imaging, robotics and motion planning, and numerical methods.
PROMPT is a system for retrieval, analysis, mapping and comparison of protein sets. It provides easy mapping of different types of sequence identifiers, automatic data retrieval and integration, many analysis and comparison algorithms, and a full-featured GUI application. Exhaustive statistical tests are conducted automatically in appropriate cases, but can also be run manually. All analysis results can be viewed or visualized and exported in various formats. All methods can be used from your own Java code or through BeanShell scripting in your own scripts, pipelines, or grid systems.
MyCGR implements the research from Peggy Cénac's thesis on using CGR (Chaos Game Representation) to build a new family of tests for the structure of sequences. It can empirically check the level and power of the tests and apply them to DNA sequences. It can generalize the dinucleotide abundance profile to a CGR-based relative abundance profile and apply this profile to DNA sequences to build taxonomy trees and to define CGR-trees.
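For readers unfamiliar with the Chaos Game Representation itself, here is a minimal Python sketch of how a DNA sequence is mapped to points in the unit square. It is not MyCGR's code and does not implement the statistical tests or abundance profiles. The name cgr_points is hypothetical, and the assignment of nucleotides to corners below is one common convention, assumed here for illustration.

```python
# Assumed corner assignment for the four nucleotides in the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the list of CGR points visited while reading a DNA sequence."""
    x, y = 0.5, 0.5  # start at the center of the square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip ambiguous symbols such as N
        # Move halfway from the current point toward the corner of this base.
        x = (x + corner[0]) / 2
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points
```

Each point encodes the recent history of the sequence: the last k nucleotides determine which sub-square of side 2^-k the point falls in, so counting points per sub-square gives k-mer frequencies.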