All releases of DataCleaner


Release Notes: Data transformations can be used to preprocess, extract, refine, combine, and calculate data items in jobs. Filtering, sampling, and subflow management allow you to define criteria that include or exclude particular items of data. Reporting was enriched with charts, graphs, and navigation trees, among other things. New DQ functions were added, including date gap analysis, phonetic similarity finding, and synonym lookups. More options and DQ measures were added for existing data quality functions such as the pattern finder and string analyzer. Profiling jobs can be reused, so you can define your processing flow once and run it on any data. Support for MS Excel 2007+ spreadsheets was added.


Release Notes: The MetaModel version was updated to 1.2, which adds support for two new datastores: dBase databases (.dbf files) and MS Access databases (.mdb files). A bug that caused "file not found" errors for text-file dictionaries was fixed. Many of the other underlying libraries were updated, improving performance and stability.


Release Notes: Excel spreadsheet support, SQL Server support, and CSV file performance were improved. A bug that caused certain database connection errors to go unreported to the user was fixed. A bug that caused re-opening of database dictionaries to throw a NullPointerException was fixed. A bug related to dictionary lookups of null values was fixed. Support was added for Teradata databases, connection templates for SQL Server connections, and selection of file encoding when reading CSV files. A minor bug relating to reading files on the classpath when running in Java WebStart mode was fixed.


Release Notes: Memory use of the Value Distribution profile was improved; it now caches on disk with Berkeley DB when necessary. The app is now a single JAR file that can be served through Java WebStart. The app automatically downloads regexes from the RegexSwap. A bug in matching number columns in dictionaries was fixed. A bug with invalid characters in XML formats was fixed. File suffix case is now ignored, so both .CSV and .csv files can be opened. The number of columns shown in the preview window is automatically restricted if there are too many to show on the screen.


Release Notes: An additional HTML export format has been added to the built-in export formats (usable when exporting Profiler results in the desktop app and when executing the runjob command-line tool). The export format can be chosen directly from the desktop app. Four new measures were added to the String Analysis profile: average characters and maximum/minimum/average white spaces.


Release Notes: The license was changed to LGPL. The profiler and validator can be executed using multiple threads. DataCleaner tasks can be executed from the command line for batch operation. More elaborate status information is given during profiler and validator execution. Date mask matcher and regex matcher profiles were added. Regexes can be loaded from the online RegexSwap repository. Popular database drivers are automatically downloaded and installed. More file types are supported, such as .dat and .txt. XML file support was improved. Memory use of the Time analysis profile was improved. Logging during profiling and validation was improved. An information schema is provided for file-based datastores. Columns in the datastore tree are lazy-loaded.


Release Notes: This release adds multi-threaded execution, a command-line interface (runjob.sh/runjob.cmd), some UI updates, and a few bug fixes.


Release Notes: The new online RegexSwap system has been integrated to support browsing and downloading of regexes. Popular database drivers are now downloaded and installed automatically. Templates for JDBC connection strings were added. Profiling and validation results now include detailed execution status and monitoring capabilities. Database and XML file compatibility was improved by updated MetaModel libraries.


Release Notes: This is a major functionality update, with many new features built upon the 1.4 stabilization release. The license was changed to LGPL. New profiles were added for a date mask matcher and a regex matcher. More file types are supported (.dat and .txt). XML file support was improved.


Release Notes: The "Repeated values" profile was replaced with the more advanced "Value distribution" profile. Drill-to-details options were added for the Dictionary Matcher profile. A new application logo was made. Many small bug fixes and UI refinements were made. Many sample dictionaries and regexes were added.