Fqutils provides a basic set of bioinformatics command line tools for working with sequence data in FASTQ format. It complements Greg Hannon's fine Fastx Toolkit suite. Fqutils handles the full FASTQ format as described by the published standard, including records whose sequence and quality score information spans multiple lines. It is intended for the early stages of post-sequencing pipelines and quality assessment processes.
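To illustrate why multi-line records need care, here is a minimal Python sketch of a FASTQ reader. It is not Fqutils code; the function name `parse_fastq` is hypothetical. The key assumption is that the quality block ends once it is as long as the sequence, since a quality line may itself begin with `@`.

```python
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) tuples from FASTQ text lines.

    Hypothetical helper for illustration only. Sequence and quality data
    may each span several lines.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError("record truncated before '+' separator")
        sequence = "".join(seq_parts)

        # Quality lines are read by length, not by looking for '@',
        # because '@' is also a valid quality character.
        qual_parts = []
        qual_len = 0
        while qual_len < len(sequence):
            line = next(it, None)
            if line is None:
                raise ValueError("record truncated in quality block")
            qual_parts.append(line)
            qual_len += len(line)
        quality = "".join(qual_parts)
        if len(quality) != len(sequence):
            raise ValueError("quality length differs from sequence length")

        yield header[1:], sequence, quality
```

Passing an open file handle, for example `parse_fastq(open("reads.fq"))` with a file name of your own, yields one tuple per record regardless of how the record was wrapped.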
ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. It supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, and XML). It runs on most multiprocessor systems that have MPI installed, and over a wide variety of interconnects, including InfiniBand, Quadrics, and Ethernet. It is designed to run a large number of queries against either large or small databases. It parallelizes the BLAST calculations by dynamically scheduling them across processors using a fault-resilient scheme.
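The general idea of dynamic scheduling with retries can be sketched in a few lines of Python. This is not ScalaBLAST's scheme, which is built on MPI and whose details are not described here; it uses threads and a shared queue as a stand-in, and the names `run_dynamic`, `worker_fn`, and `max_attempts` are hypothetical.

```python
import queue
import threading


def run_dynamic(tasks, worker_fn, n_workers=4, max_attempts=3):
    """Run worker_fn over tasks; idle workers pull the next pending task.

    Illustrative only. A task whose worker_fn call raises is put back on
    the queue until it has been tried max_attempts times.
    Returns (results, failures), both keyed by task index.
    """
    pending = queue.Queue()
    for task_id, task in enumerate(tasks):
        pending.put((task_id, task, 1))

    results = {}
    failures = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task_id, task, attempt = pending.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker to do
            try:
                value = worker_fn(task)
            except Exception as exc:
                if attempt < max_attempts:
                    # Reschedule the failed task for another attempt.
                    pending.put((task_id, task, attempt + 1))
                else:
                    with lock:
                        failures[task_id] = exc
            else:
                with lock:
                    results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failures
```

Because workers take tasks as they become free rather than receiving a fixed share up front, slow queries do not leave other workers idle.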
The Scalable Assembler at Notre Dame (SAND) replaces the early stages of the Celera Assembler with scalable versions that can run on collections of commodity computers. By harnessing clusters, clouds, grids, or just random machines in your office, many bioinformatics tasks can be reduced from weeks or months down to minutes or hours.
mkESA is a program for constructing enhanced suffix arrays (ESAs) from biological sequence data. The program is based on an implementation of Manzini's lightweight Deep-Shallow algorithm, which can also utilize multiple CPUs/cores for extra performance. The generated output is compatible with the output of mkvtree from the Vmatch package.
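For readers unfamiliar with the data structure, an enhanced suffix array pairs a suffix array with extra tables, most commonly the longest-common-prefix (LCP) table. The Python sketch below builds both for a short string. It uses plain sorting, not Manzini's Deep-Shallow algorithm, and it does not produce mkvtree-compatible output; the function names are hypothetical.

```python
def suffix_array(text):
    """Return suffix start positions in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_table(text, sa):
    """Return lcp, where lcp[k] is the length of the common prefix of the
    suffixes at sa[k-1] and sa[k], and lcp[0] is 0 (Kasai-style scan)."""
    n = len(text)
    rank = [0] * n
    for pos, start in enumerate(sa):
        rank[start] = pos

    lcp = [0] * n
    h = 0  # length of the current match, carried between iterations
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
lcp = lcp_table("banana", sa)
print(sa)   # [5, 3, 1, 0, 4, 2]
print(lcp)  # [0, 1, 3, 0, 0, 2]
```

The naive sort is only suitable for small inputs; lightweight construction algorithms exist precisely because genome-scale sequences make this approach impractical.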
The Full-text Index Data structure library, libfid for short, is a portable software library for accessing indexed data through a simple C interface. Among other things, it implements functions for reading indexed data from files and for performing common operations such as fast string matching. Easy alphabet handling for mapping between printable and binary alphabets is integrated from the ground up. Currently, the enhanced suffix array is the only full-text index data structure supported. A very simple program for constructing enhanced suffix arrays is included.
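As an illustration of string matching over a suffix array, the following Python sketch locates all occurrences of a pattern with two binary searches. It does not use or mirror the libfid C interface; `find_occurrences` is a hypothetical name, and the suffix array is built by naive sorting for brevity.

```python
def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern in text.

    sa must be the suffix array of text. Suffixes that start with the
    pattern form one contiguous block of sa, found by binary search.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Each search takes a number of comparisons logarithmic in the text length, which is what makes an index worthwhile when many patterns are matched against the same data.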
xpress-analyzer is a suite of programs for analyzing functional genomics data. Programs are included for analysis of variance, multiple test correction for significance tests, a nonlinear least-squares fitting algorithm (four parameters), general linear models, Kolmogorov-Smirnov distance, nearest neighbor algorithm, principal component analysis, and Student's t-test. It is intended to be used as a faster alternative to R on large data sets.
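As an example of one of the listed statistics, the two-sample Kolmogorov-Smirnov distance is the largest gap between the empirical cumulative distribution functions of two samples. The Python sketch below is a straightforward reference version, not xpress-analyzer code; the name `ks_distance` is hypothetical.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Return the two-sample Kolmogorov-Smirnov distance between a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted = sorted(a)
    b_sorted = sorted(b)

    d = 0.0
    # The gap between the two step functions can only change at observed
    # values, so it is enough to evaluate it there.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)  # fraction of a <= x
        fb = bisect_right(b_sorted, x) / len(b_sorted)  # fraction of b <= x
        d = max(d, abs(fa - fb))
    return d


print(ks_distance([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

A distance of 0 means the empirical distributions coincide, and 1 means the samples do not overlap at all.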
PDB2PQR is a Python software package that automates many of the common tasks of preparing structures for continuum electrostatics calculations, providing a platform-independent utility for converting protein files in PDB format to PQR format. These tasks include adding a limited number of missing heavy atoms to biomolecular structures, determining side-chain pKas, placing missing hydrogens, optimizing the protein for favorable hydrogen bonding, and assigning charge and radius parameters from a variety of force fields.
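To show what the charge and radius parameters look like downstream, here is a minimal Python sketch that reads atom records from PQR text. It is not part of PDB2PQR. It assumes the whitespace-delimited PQR layout in which the last five fields of an ATOM or HETATM record are x, y, z, charge, and radius; the function name and the sample coordinates and parameters are made up for illustration.

```python
def parse_pqr_atom(line):
    """Parse one whitespace-delimited PQR ATOM/HETATM record into a dict.

    Hypothetical helper. Assumes the final five fields are
    x, y, z, charge, radius; the chain ID field may be absent.
    """
    fields = line.split()
    if len(fields) < 10 or fields[0] not in ("ATOM", "HETATM"):
        raise ValueError(f"not a PQR atom record: {line!r}")
    x, y, z, charge, radius = (float(v) for v in fields[-5:])
    return {
        "atom_name": fields[2],
        "residue_name": fields[3],
        "position": (x, y, z),
        "charge": charge,
        "radius": radius,
    }


# Made-up records for illustration; not real force-field parameters.
sample = [
    "ATOM 1 N ALA 1 0.000 0.000 0.000 -0.2500 1.8000",
    "ATOM 2 H ALA 1 1.000 0.000 0.000 0.2500 0.6000",
]
atoms = [parse_pqr_atom(line) for line in sample]
total_charge = sum(atom["charge"] for atom in atoms)
print(total_charge)  # 0.0
```

Summing the charges like this is a quick sanity check that a prepared structure carries the net charge you expect before running an electrostatics calculation.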