Harry is a small tool for comparing strings and measuring their similarity. It implements several common distance and kernel functions for strings, as well as some exotic similarity measures. For example, Harry supports the Levenshtein (edit) distance, the Jaro-Winkler distance, and the compression distance. Harry is implemented using OpenMP, so comparing a set of strings speeds up linearly with the number of available CPU cores. Efficient implementations and effective caching further speed up the comparison of strings.
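As an illustration of one of the measures named above, the Levenshtein distance can be sketched with the textbook dynamic-programming recurrence. This is a minimal sketch, not Harry's implementation; the function name `levenshtein` is chosen here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.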
Salad (short for Letter Salad) is an efficient and flexible implementation of the well-known anomaly detection method Anagram by Wang et al. (RAID 2006). Salad is based on n-gram models, that is, data is represented as all of its substrings of length n. During training these n-grams are stored in a Bloom filter. This enables the detector to represent a large number of n-grams in little memory while still allowing efficient access to the data. Salad extends Anagram by allowing various n-gram types, a 2-class version of the detector for classification, and various model analysis modes.
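The idea of storing n-grams in a Bloom filter and scoring new data by the share of unseen n-grams can be sketched as follows. This is a minimal sketch under stated assumptions, not Salad's code: the class `BloomFilter`, the helpers `ngrams`, `train` and `anomaly_score`, the filter size, and the SHA-256-based hashing are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash positions per item.
    Size and hash count are illustrative assumptions."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive independent-looking positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(data: bytes, n: int):
    """All substrings of length n, in order of appearance."""
    return (data[i:i + n] for i in range(len(data) - n + 1))


def train(samples, n: int = 3) -> BloomFilter:
    """Insert every n-gram of the training samples into the filter."""
    model = BloomFilter()
    for sample in samples:
        for gram in ngrams(sample, n):
            model.add(gram)
    return model


def anomaly_score(model: BloomFilter, data: bytes, n: int = 3) -> float:
    """Fraction of n-grams in data that the model has not seen."""
    grams = list(ngrams(data, n))
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in model)
    return unseen / len(grams)


if __name__ == "__main__":
    model = train([b"GET /index.html HTTP/1.1"])
    print(anomaly_score(model, b"GET /index.html HTTP/1.1"))
    print(anomaly_score(model, b"\x90\x90\x90\x90"))
```

A Bloom filter never forgets an inserted item, so data seen during training always scores 0.0. It can report false positives, which means a score may slightly underestimate how much of the input is new.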
FEniCS is a collection of free software for automated, efficient solution of differential equations. It has an extensive list of features, including automated solution of variational problems, automated error control and adaptivity, a comprehensive library of finite elements, high performance linear algebra, and many more. It is organized as a collection of interoperable components, including the problem-solving environment DOLFIN, the form compiler FFC, the finite element tabulator FIAT, the just-in-time compiler Instant, the code generation interface UFC, the form language UFL, and a range of additional components.
Gerbil consists of an interactive visualization tool targeted at multispectral and hyperspectral image data, and a toolbox of common algorithms, e.g. for segmentation. Multispectral imaging has been gaining popularity and has been gradually applied to many fields besides remote sensing. However, due to the high dimensionality of the data, both human observers and computers have difficulty interpreting this wealth of information. Gerbil facilitates the visualization of the relationship between spectral and topological information in a novel fashion. It puts emphasis on the spectral gradient, which is shown to provide enhanced information for many reflectance analysis tasks. It also includes a rich toolbox for evaluation of image segmentation and other algorithms in the multispectral domain. The parallel coordinates visualization technique is combined with hashing for a highly interactive visual connection between spectral distribution, spectral gradient, and topology.
Sally is a tool for mapping a set of strings to a set of vectors. This mapping is referred to as embedding and allows techniques of machine learning and data mining to be applied for the analysis of string data. It can be used with data such as text documents, DNA sequences, or log files. The vector space model or bag-of-words model is used. Strings are characterized by a set of features, where each feature is associated with one dimension of the vector space. Occurrences of the features in each string are counted. Alternatively, binary or TF-IDF values can be computed. Vectors can be output in plain text, LibSVM, or Matlab format.
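The embedding described above can be sketched in a few lines. This is a minimal sketch, not Sally's implementation: the function names `features`, `embed` and `to_libsvm`, the whitespace delimiter, and the default label are assumptions made for illustration, and the output line only follows the general `label index:value` style of the LibSVM format.

```python
from collections import Counter


def features(string: str, n=None):
    """Whitespace-delimited tokens if n is None, else character n-grams."""
    if n is None:
        return string.split()
    return [string[i:i + n] for i in range(len(string) - n + 1)]


def embed(strings, n=None, binary=False):
    """Map strings to vectors; each distinct feature is one dimension."""
    counts = [Counter(features(s, n)) for s in strings]
    vocabulary = sorted(set().union(*counts))
    vectors = []
    for count in counts:
        if binary:
            # 1 if the feature occurs at all, 0 otherwise.
            vectors.append([1 if count[f] else 0 for f in vocabulary])
        else:
            # Number of occurrences of each feature.
            vectors.append([count[f] for f in vocabulary])
    return vocabulary, vectors


def to_libsvm(vector, label=0) -> str:
    """Sparse text line: label followed by 1-based index:value pairs."""
    parts = [str(label)]
    parts += [f"{i}:{v}" for i, v in enumerate(vector, start=1) if v]
    return " ".join(parts)


if __name__ == "__main__":
    vocabulary, vectors = embed(["a b a", "b c"])
    print(vocabulary)
    for vector in vectors:
        print(to_libsvm(vector))
```

Real string data produces very high-dimensional, sparse vectors, which is why a sparse output format is the natural fit; the dense lists here are only for readability.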
Malheur is a tool for the automatic analysis of malware behavior (program behavior recorded from malicious software in a sandbox environment). It is designed to support the regular analysis of malicious software and the development of detection and defense measures. It allows for identifying novel classes of malware with similar behavior and assigning unknown malware to discovered classes. It can be applied to recorded program behavior of various formats as long as monitored events are separated by delimiter symbols, e.g. as in reports generated by the popular malware sandboxes CWSandbox, Anubis, Norman Sandbox, and Joebox.
Bayon is a simple and fast data clustering tool for large-scale data sets. It is useful for surveying large-scale data by partitioning it into groups that are easier to understand. Bayon supports two hard-clustering methods: repeated bisection clustering and K-means clustering. In the output of these methods, each input document is assigned to exactly one cluster. Optionally, Bayon can also report the similar clusters for each input document, as a soft-clustering method would.
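Hard clustering with K-means can be sketched as below. This is a minimal sketch, not Bayon's implementation: the function name `kmeans`, the squared Euclidean distance, the fixed iteration count, and the choice of the first k points as initial centroids are all assumptions made to keep the example short and deterministic.

```python
def kmeans(points, k, iterations=20):
    """Assign each point to exactly one of k clusters (hard clustering)."""
    # Assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for index, point in enumerate(points):
            assignment[index] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(point, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(column) / len(members)
                                for column in zip(*members)]
    return assignment


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, k=2))
```

Each point receives a single cluster index, which is what distinguishes hard clustering from soft clustering, where a point can belong to several clusters with different weights.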