All releases of DataCleaner


Release Notes: This release adds minor bugfixes, performance improvements, and a few new features. Among the most important are greatly improved batch loading performance, a convenient "write data" menu in the main window, double-click renaming of job components, syntax coloring in the JavaScript transformer and filter, and fixes for a potential deadlock when starting the application.


Release Notes: Support for MongoDB databases, both for read and write operations. Integration with EasyDQ.com, which provides Customer DQ functions in the cloud. Duplicate detection (a.k.a. deduplication or fuzzy matching) analyzers. A "Table lookup" component for doing lookups of multiple values from a table. An "Insert into table" component for inserting records into any kind of table (e.g. database tables, CSV files, Excel sheets, or MongoDB collections). Job-level variables which allow for parameterizable jobs that can be instrumented from the command line.


Release Notes: International data support: Transliterate transformer and Character set distribution. Pattern Finder and Value Distribution can now perform analysis based on group-by columns. Chart coloring and layout have been improved. Excel spreadsheet writing has been added to the output options. Documentation improvements and command line interface support.


Release Notes: A new extension architecture allows third-party extensions to be registered in a central marketplace and easily installed into DataCleaner. Support was added for analyzing SAS data sets and fixed-width value data sets. Support was added for Japanese characters. Integrity checks were added that fail when CSV file record formats are inconsistent. The documentation was completely rewritten.


Release Notes: Quick filtering of datastores was added. Reference data for countries is now provided. Minor UI improvements were made. Support was added for adding extension packages. A command line interface for executing jobs was added. Number formatting options were added in the "Convert to Number" transformer.


Release Notes: Window management was simplified by making most operations available through the single job builder window. Jobs can now be stopped before they have finished. Bar and line charts have been added to many analyzer results. Preview data now contains paging controls to browse further into the data. Most common database drivers are included by default. Various minor improvements and bugfixes were made.


Release Notes: First-time ease of use was improved by disabling all buttons before source data is selected. Where possible, filters can now optimize the query of a job. This was implemented for the "Max rows", "Equals", and "Not null" filters. The visualization of execution flow now allows removing column items and filter outcome items, making the graph more comprehensible, especially for very large jobs. A bug was fixed when passing null values to the email standardizer. "Mixed" tokens are properly presented in the Pattern finder.


Release Notes: Minor bugfixes and improvements were made. Filter outcomes were added to the flow visualization. A bug was fixed in the widget for selecting the tokenizer's tokens. The "Equals" filter can now have multiple values with which to compare. Some minor cosmetic improvements were made.


Release Notes: Data transformations can be used to preprocess, extract, refine, combine, and calculate data items in jobs. Filtering, sampling, and subflow management allow you to define criteria that exclude and include particular items of data. Reporting was enriched with charts, graphs, navigation trees, etc. New DQ functions were added for date gap analysis, phonetic similarity finding, synonym lookups, etc. More options and DQ measures were added for existing data quality functions like the pattern finder, string analyzer, and more. Profiling jobs can be reused, so you can define your processing flow once and run it on any data. Support for MS Excel 2007+ spreadsheets was added.


Release Notes: The MetaModel version was updated to 1.2, which adds support for two new datastores: dBase databases (.dbf files) and MS Access databases (.mdb files). A bug pertaining to text-file dictionary "file not found" errors was fixed. Many of the other underlying libraries have been updated, providing improvements to performance and stability.