Talend Open Studio for Data Quality helps you profile your data. The ergonomic interface allows you to define metrics (indicators) and collect statistics on your data in a few clicks. It comes with a set of regular expressions that help you identify bad data. You can also create your own regular expressions and use them in data profiling analyses. Each indicator has many options that change its behavior so that it gives you more pertinent information. Data quality options on indicators alert you when your data quality is not what you expected.
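The idea of a pattern-based indicator with a quality threshold can be sketched in a few lines of Python. This is only an illustration of the concept, not Talend's implementation: the e-mail pattern, the function names `profile_column` and `below_threshold`, and the indicator names are all assumptions made up for this example.

```python
import re

# Hypothetical pattern for illustration; not one of the patterns shipped with the tool.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def profile_column(values, pattern):
    """Return simple indicator counts for one column of string values."""
    # Treat None and the empty string as null values.
    non_null = [v for v in values if v is not None and v != ""]
    matching = sum(1 for v in non_null if pattern.fullmatch(v))
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "match_count": matching,
        "non_match_count": len(non_null) - matching,
    }


def below_threshold(stats, min_match_ratio):
    """Flag the column when the share of matching non-null values is too low."""
    checked = stats["match_count"] + stats["non_match_count"]
    if checked == 0:
        return False  # nothing to check, so nothing to flag
    return stats["match_count"] / checked < min_match_ratio
```

Calling `profile_column(column_values, EMAIL_PATTERN)` and passing the result to `below_threshold` with a minimum ratio mimics the "alert when quality is not what you expected" behavior in a very reduced form.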
With MetaModel, you use a type-safe SQL-like API for querying any datastore. It is a data access framework providing a common interface for exploration and querying of different types of datastores. It isn't a data mapping framework. Instead, it emphasizes abstraction of metadata and the ability to add data sources at runtime, making MetaModel great for generic data processing applications, but less so for applications modeled around a particular domain.
DataCleaner is a data quality analysis tool that allows you to perform data profiling, validation, and minor ETL-like tasks. These activities help you administer and monitor your data quality in order to ensure that your data is useful and applicable to your business situation. It can be used for master data management (MDM) methodologies, data warehousing projects, statistical research, preparation for extract-transform-load activities, and more.
Evergreen is an integrated library system originally developed by the Georgia PINES consortium for use as its automation system. It now includes contributions from around the world. It was designed from scratch for large-scale deployment in very large public library and state-wide consortium environments with tens of millions of records and hundreds of libraries, but can also scale down to the smallest of single-branch libraries.
Wandora is a general purpose data extraction, management, and publishing application based on Topic Maps and Java. Wandora has a graphical user interface, layered presentation of knowledge, several data storage options, rich data extraction, import and export capabilities, and an embedded HTTP server that enables dynamic publication of Topic Maps. Wandora is well suited for rapid ontology construction and knowledge mashups.
dbacl is a digramic Bayesian text classifier. Given some text, it calculates the posterior probabilities that the input resembles one of any number of previously learned document collections. It can be used to sort incoming email into arbitrary categories such as spam, work, and play, or simply to distinguish an English text from a French text. It fully supports international character sets, and uses sophisticated statistical models based on the Maximum Entropy Principle.
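To make "posterior probabilities over previously learned document collections" concrete, here is a minimal sketch of a Bayesian text classifier. It is deliberately simpler than what dbacl describes: it uses single-word (unigram) counts with add-one smoothing rather than digram or maximum entropy models, and the class name `NaiveBayes` and its methods are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real tokenization."""
    return text.lower().split()


class NaiveBayes:
    """Unigram naive Bayes with add-one smoothing (not dbacl's actual model)."""

    def __init__(self):
        self.word_counts = {}        # category -> Counter of word frequencies
        self.doc_counts = Counter()  # category -> number of learned documents
        self.vocabulary = set()

    def learn(self, category, text):
        tokens = tokenize(text)
        self.word_counts.setdefault(category, Counter()).update(tokens)
        self.doc_counts[category] += 1
        self.vocabulary.update(tokens)

    def posteriors(self, text):
        """Return P(category | text) for every learned category."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary) + 1  # one extra slot for unseen words
        log_scores = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)  # prior
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            log_scores[category] = score
        # Normalize in log space to avoid underflow on long inputs.
        peak = max(log_scores.values())
        weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}
```

After calling `learn` once or more per category, `posteriors(text)` returns a dictionary whose values sum to 1; the category with the largest value is the best guess. At least one category must be learned before `posteriors` is called.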
Haystack is a tool designed to let every individual manage all of their information in the way that makes the most sense to them. It removes the arbitrary barriers created by applications that handle only certain information "types" and record only a fixed set of relationships defined by the developer, so users can define whichever arrangements of, connections between, and views of information they find most effective. Such personalization of information management is intended to improve users' ability to find what they need when they need it.
Doodle is a desktop search engine for Linux. It searches your hard drive for files using pattern matching on meta-data. It extracts file-format specific meta-data using libextractor and builds a suffix tree to index the files. The index can then be searched rapidly. It is similar to locate, but can take advantage of information such as ID3 tags. It is possible to do full-text indexing using the appropriate libextractor plugins. It also supports using FAM to keep the database up-to-date.
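The benefit of a suffix-based index is that any substring of a metadata keyword can be looked up directly. The sketch below uses a naive, uncompressed suffix trie to show the principle; it is not Doodle's implementation, it does not call libextractor, and the class name `SuffixIndex`, the file paths, and the keywords are hypothetical.

```python
class SuffixIndex:
    """Naive suffix trie mapping substrings of keywords to file paths.

    Every suffix of every keyword is inserted, so any substring of a keyword
    is a path starting at the root. A real suffix tree compresses these paths;
    this version trades memory for simplicity.
    """

    def __init__(self):
        self.root = {"files": set(), "children": {}}

    def add(self, path, keyword):
        """Index one metadata keyword (for example an ID3 artist) for a file."""
        keyword = keyword.lower()
        for start in range(len(keyword)):
            node = self.root
            for ch in keyword[start:]:
                node = node["children"].setdefault(
                    ch, {"files": set(), "children": {}}
                )
                node["files"].add(path)

    def search(self, substring):
        """Return the set of files with a keyword containing the substring."""
        node = self.root
        for ch in substring.lower():
            node = node["children"].get(ch)
            if node is None:
                return set()
        return set(node["files"])
```

A lookup costs time proportional to the length of the query rather than the number of indexed files, which is why this kind of index can be searched rapidly once it has been built.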
The Gaudí Database Visual Editor is a Java application that allows you to visually design the tables of a database using a JDBC 2.0 (or higher) driver. It saves generated diagrams in XML format. It also generates Java code that binds an object to a table from a database and XML code for generating GUIs.