Intelligent systems would benefit from being able to localize and track sound sources in real-life settings. Such a capability can help locate a person or an interesting event in the environment, and also enhances other capabilities such as speech recognition. The challenge is not only to localize simultaneous sound sources, but to track them over time. The ManyEars project proposes a robust sound source localization and tracking method using an array of eight microphones. The method is based on a frequency-domain implementation of a steered beamformer along with a particle-filter-based tracking algorithm. Tests on a mobile robot show that the algorithm can localize and track multiple moving sources of different types in real time over a range of 7 meters. These capabilities allow the robot to interact with people more naturally in real-life settings. The ManyEars project provides an easy-to-use C library for microphone array processing, including sound source localization, tracking, and separation. A Qt GUI is also available for fine-tuning the parameters.
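The steered-beamformer idea can be sketched generically as frequency-domain delay-and-sum: for each candidate direction, phase-align the microphone spectra and sum them; broadband energy peaks where the alignment matches a real source. The sketch below is plain NumPy, not ManyEars code; the constants and function name are illustrative.

```python
import numpy as np

C = 343.0   # speed of sound in air, m/s (illustrative constant)
FS = 16000  # sample rate, Hz (illustrative constant)
N = 512     # analysis frame / FFT length

def steered_response_power(frames, mic_pos, directions):
    """Steered-response power of one snapshot.

    frames:     (n_mics, N) time-domain samples, one row per microphone.
    mic_pos:    (n_mics, 3) microphone coordinates in metres.
    directions: (n_dirs, 3) unit vectors toward candidate source directions.
    Returns a (n_dirs,) array; the largest entry marks the likeliest direction.
    """
    spectra = np.fft.rfft(frames, axis=1)          # (n_mics, N//2 + 1)
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)
    # Far-field time advance of each microphone relative to the array origin.
    tau = directions @ mic_pos.T / C               # (n_dirs, n_mics)
    # Phase-align every microphone spectrum toward each candidate direction,
    # then sum across microphones (delay-and-sum in the frequency domain).
    steer = np.exp(-2j * np.pi * tau[:, :, None] * freqs[None, None, :])
    aligned_sum = (steer * spectra[None, :, :]).sum(axis=1)
    return (np.abs(aligned_sum) ** 2).sum(axis=1)
```

A real system would scan a dense grid of directions and feed the per-frame peaks to a tracker; this sketch only shows the localization core.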
Baudline is a time-frequency browser designed for scientific visualization of the spectral domain. Signal analysis is performed by Fourier, correlation, and raster transforms that create colorful spectrograms with vibrant detail. Conduct test and measurement experiments with the built-in function generator, or play back audio files with a multitude of effects and filters. The baudline signal analyzer combines fast digital signal processing, versatile high-speed displays, and continuous capture tools for hunting down and studying elusive signal characteristics.
FMS is a tool to create all kinds of sounds from scratch. You can play any sound (sine, triangular, vowels, etc.) with any property settings (frequency, volume, balance, sweep, etc.) and modulations thereof. It also features tools to save sounds, play .MUS music, graphically display sounds, and make real noise.
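Parameterized tone generation of this kind can be illustrated with a toy sketch (not FMS code; the `tone` function and its arguments are hypothetical):

```python
import numpy as np

def tone(freq, seconds, rate=44100, volume=1.0, shape="sine"):
    """Generate one tone with illustrative property settings.

    freq:    frequency in Hz
    seconds: duration
    rate:    sample rate in Hz
    volume:  peak amplitude in [0, 1]
    shape:   "sine" or "triangular"
    """
    t = np.arange(int(seconds * rate)) / rate
    phase = 2 * np.pi * freq * t
    if shape == "sine":
        wave = np.sin(phase)
    elif shape == "triangular":
        # Triangle wave expressed via arcsin of a sine, peak-normalized.
        wave = (2.0 / np.pi) * np.arcsin(np.sin(phase))
    else:
        raise ValueError(f"unknown shape: {shape}")
    return volume * wave
```

Sweeps, balance, and modulation would layer on top of this by varying `freq` and `volume` over time.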
eXtace is a visual sound display/analysis program. It requires Esound (esd) for its audio source. It computes fast Fourier transforms of the audio data in real time. Its displays include a 3D wireframe flying landscape, a 3D textured flying landscape, a 16-256 channel graphic EQ, three types of scopes, a 3D "spike" flying landscape, and two forms of spectrograms. The 3D traces can be picked up, manipulated, and displayed at nearly any angle. eXtace also features a 3D direction control widget for controlling the angle and speed at which the trace runs away, and a gradient/colormap editor for changing the colormap to suit your needs. No OpenGL is required.
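The short-time FFT behind spectrogram-style displays like these can be sketched as follows (a generic illustration, not eXtace source; the frame and hop sizes are arbitrary):

```python
import numpy as np

def spectrogram(samples, frame_len=512, hop=256):
    """Return a (n_frames, frame_len//2 + 1) array of magnitudes in dB.

    Each row is the windowed FFT of one overlapping frame of the signal,
    the same data a real-time display would render as one spectrogram column.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mags + 1e-12)  # small floor avoids log(0)
```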
MARF is a general cross-platform framework with a collection of algorithms for audio (voice, speech, and sound) and natural language text analysis and recognition, along with sample applications (identification, NLP, etc.) of its use. MARF can run distributed over the network (CORBA, Java RMI, and Java XML-RPC Web Services) and may act as a library in applications or be used as a source for learning and extension.
Pause determines the location of silences in an audio file for use in fragmentation of large recordings, studies of pause duration, and the like. It generates both a nicely formatted table intended to be read by people and a simple tab-delimited file that is easily parsed by software.
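A minimal sketch of this kind of silence localization, assuming a simple frame-RMS threshold (the function name, frame size, and threshold are illustrative, not Pause's actual defaults):

```python
import numpy as np

def find_silences(samples, rate, frame_ms=20, threshold=0.01, min_dur=0.25):
    """Yield (start_sec, end_sec) spans whose frame RMS stays below threshold.

    Spans shorter than min_dur seconds are ignored so that ordinary gaps
    between words are not reported as pauses.
    """
    frame = int(rate * frame_ms / 1000)
    n = len(samples) // frame
    rms = np.sqrt(np.mean(samples[:n * frame].reshape(n, frame) ** 2, axis=1))
    quiet = rms < threshold
    start = None
    for i, q in enumerate(quiet):
        if q and start is None:
            start = i                      # a silent run begins
        elif not q and start is not None:
            if (i - start) * frame / rate >= min_dur:
                yield (start * frame / rate, i * frame / rate)
            start = None
    if start is not None and (n - start) * frame / rate >= min_dur:
        yield (start * frame / rate, n * frame / rate)
```

Each yielded span could then be emitted as one tab-delimited line, e.g. `print(f"{start:.3f}\t{end:.3f}")`, matching the machine-readable output described above.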
The Snack sound extension adds commands for sound play/record and sound visualization, e.g. waveforms and spectrograms. It supports in-memory sound objects, file-based audio, streaming audio, the WAV, AU, AIFF, and MP3 file formats, and synchronous and asynchronous playback. The visualization canvas item types update in real time and can output PostScript. New commands and file formats can be added using the Snack C API.