Sally is a tool for mapping a set of strings to a set of vectors. This mapping is referred to as embedding and allows techniques of machine learning and data mining to be applied for the analysis of string data. It can be used with data such as text documents, DNA sequences, or log files. The vector space model or bag-of-words model is used. Strings are characterized by a set of features, where each feature is associated with one dimension of the vector space. Occurrences of the features in each string are counted. Alternatively, binary or TF-IDF values can be computed. Vectors can be output in plain text, LibSVM, or Matlab format.
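The embedding described above can be illustrated with a minimal sketch. This is not Sally's code or interface: the function names `embed` and `to_libsvm` are hypothetical, features are assumed to be whitespace-delimited tokens, and the TF-IDF weighting shown (term count times log(N/df)) is only one common variant.

```python
import math
from collections import Counter


def embed(strings, mode="count"):
    """Map each string to a sparse vector stored as {feature: value}."""
    # Assumption: features are whitespace-delimited tokens.
    counts = [Counter(s.split()) for s in strings]
    if mode == "count":
        return [dict(c) for c in counts]
    if mode == "binary":
        return [{f: 1 for f in c} for c in counts]
    if mode == "tfidf":
        n = len(strings)
        # Document frequency: number of strings containing each feature.
        df = Counter(f for c in counts for f in c)
        return [{f: tf * math.log(n / df[f]) for f, tf in c.items()}
                for c in counts]
    raise ValueError(f"unknown mode: {mode}")


def to_libsvm(vectors, label=0):
    """Render sparse vectors as LibSVM lines: 'label index:value ...'."""
    # Each distinct feature gets one dimension; LibSVM indices start at 1.
    features = sorted({f for v in vectors for f in v})
    index = {f: i + 1 for i, f in enumerate(features)}
    lines = []
    for v in vectors:
        pairs = sorted((index[f], x) for f, x in v.items())
        lines.append(" ".join([str(label)] + [f"{i}:{x:g}" for i, x in pairs]))
    return lines


if __name__ == "__main__":
    for line in to_libsvm(embed(["a b a", "b c"])):
        print(line)
```

Storing each vector as a dictionary keeps the representation sparse, which matters because most features do not occur in any given string.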
wgms3d is a full-vectorial electromagnetic waveguide mode solver. It computes the modes of dielectric waveguides at a specified wavelength using a second-order finite-difference method. The waveguide cross section may consist of several adjacent regions of constant refractive index (i.e., step-index profiles). Dielectric interfaces do not have to be aligned with the discretization grid; they may be arbitrarily slanted or curved. The entire waveguide may be curved along the propagation direction. Leakage and curvature losses can be computed using Perfectly Matched Layers as absorbing boundaries.
openEMS is an electromagnetic field solver using the FDTD method. It employs a fully 3D graded mesh in Cartesian or cylindrical coordinates. Matlab (or Octave) is used as an easy and flexible scripting interface. Advanced features include multi-threading, SIMD (SSE), and MPI support for high-speed FDTD.
The underling library provides simple, scalable means to manipulate MPI-parallel, three-dimensional pencil decompositions using FFTW. Pencil decompositions are a natural way to distribute O(n^3) data across O(n^2) processors and are well suited to memory-intensive, structured spectral turbulence simulations and postprocessing codes. The library may be useful in other domains as well. It is written in C99 and may be used by C89 or C++ applications.
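To make the idea of a pencil decomposition concrete, the following sketch computes which index ranges one process owns when an nx × ny × nz grid is distributed over a two-dimensional pr × pc process grid. It does not use underling, MPI, or FFTW; the names `block_range` and `x_pencil` are hypothetical, and the choice of splitting y over the first process-grid dimension and z over the second is an assumption made for illustration.

```python
def block_range(n, parts, rank):
    """Half-open range [start, stop) of block `rank` when n items are
    split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each receive one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def x_pencil(shape, grid, coords):
    """Index ranges owned by process `coords` in an x-pencil layout.

    Each process holds the full x extent (the 'pencil') and a block of
    the y-z plane: y is split over pr processes, z over pc processes.
    """
    nx, ny, nz = shape
    pr, pc = grid
    r, c = coords
    return (0, nx), block_range(ny, pr, r), block_range(nz, pc, c)


if __name__ == "__main__":
    # An 8 x 8 x 8 grid on a 2 x 4 process grid, as seen by process (1, 2).
    print(x_pencil((8, 8, 8), (2, 4), (1, 2)))
```

Because only two of the three dimensions are split, up to ny × nz processes can take part, which is where the O(n^2) processor count for O(n^3) data comes from.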
Salad (short for Letter Salad) is an efficient and flexible implementation of the well-known anomaly detection method Anagram by Wang et al. (RAID 2006). Salad is based on n-gram models, that is, data is represented as all of its substrings of length n. During training, these n-grams are stored in a Bloom filter. This enables the detector to represent a large number of n-grams in little memory while still allowing efficient access to the data. Salad extends Anagram by allowing various n-gram types, a 2-class version of the detector for classification, and various model analysis modes.
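The training and detection steps described above can be sketched roughly as follows. This is not Salad's implementation: the class and function names, the filter size, the number of hash functions, and the use of salted BLAKE2b hashes are all assumptions for illustration. The score used here is the fraction of n-grams in the input that were not seen during training.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array queried through several hash functions."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, item):
        # Derive independent hash functions by salting BLAKE2b differently.
        for seed in range(self.hashes):
            digest = hashlib.blake2b(item, digest_size=8,
                                     salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.bits

    def add(self, item):
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.array[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(data, n):
    """All substrings of length n of a bytes object."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def train(samples, n=3):
    """Store the n-grams of all training samples in a Bloom filter."""
    bf = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            bf.add(gram)
    return bf


def score(bf, data, n=3):
    """Fraction of n-grams in `data` that the filter has not seen."""
    grams = ngrams(data, n)
    if not grams:
        return 0.0
    return sum(gram not in bf for gram in grams) / len(grams)


if __name__ == "__main__":
    model = train([b"GET /index.html HTTP/1.1"])
    print(score(model, b"GET /index.html HTTP/1.1"))
    print(score(model, b"\x90\x90\x90\x90\x90\x90"))
```

A Bloom filter never forgets an item that was added, so data seen during training always scores 0.0. It can report false positives, which means an unseen n-gram is occasionally treated as known; a larger bit array lowers that rate at the cost of memory.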
KaHIP - Karlsruhe High Quality Partitioning - is a family of graph partitioning programs that tackle the balanced graph partitioning problem. It focuses on solution quality and implements flow-based methods, more-localized local searches, and several parallel and sequential meta-heuristics.
Harry is a small tool for comparing strings and measuring their similarity. It implements several common distance and kernel functions for strings, as well as some exotic similarity measures. For example, Harry supports the Levenshtein (edit) distance, the Jaro-Winkler distance, and the compression distance. Harry is implemented using OpenMP, so computation speeds up linearly with the number of available CPU cores. Efficient implementations and effective caching further speed up the comparison of strings.
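As an illustration of one of the measures named above, the following sketch computes the Levenshtein distance with the standard dynamic-programming recurrence and fills a pairwise distance matrix. It is not Harry's code: the function names are hypothetical, and the sketch is sequential, with no OpenMP-style parallelism or caching.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def distance_matrix(strings):
    """Symmetric matrix of pairwise Levenshtein distances."""
    n = len(strings)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(strings[i], strings[j])
    return matrix


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(distance_matrix(["ab", "abc", "b"]))
```

Each entry of the matrix depends only on its own pair of strings, so the pairs can be computed independently of one another. That independence is what makes this kind of computation straightforward to spread across CPU cores.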