DataCleaner is a data quality analysis tool that lets you perform data profiling, validation, and minor ETL-like tasks. These activities help you administer and monitor data quality so that your data is useful and applicable to your business situation. It can be used in master data management (MDM) methodologies, data warehousing projects, statistical research, preparation for extract-transform-load (ETL) activities, and more.
Latest announcement
Today it was announced that Human Inference, the European data quality authority, has finished its acquisition of the eobjects.org site, to active...
Recent releases


Release Notes: Support for Apache CouchDB has been added (both read and write). A new writer for UPDATE TABLE operations complements the existing INSERT INTO TABLE component. Drill-to-detail information is now saved in result files. Error handling when connecting to EasyDQ web services has been improved. The table model can be configured manually when working with NoSQL datastores.


Release Notes: A bug in the Table lookup transformation that prevented it from having multiple output columns has been fixed. The escape character for CSV files is now configurable. A minor bug related to empty strings in the Concatenator has been fixed. Support for the Cubrid database has been added. Converter transformations can now work on multiple fields, not just single fields.


Release Notes: This release adds saving, archiving, and sharing of data profiling results; automatic merging of duplicates (golden record creation); checking of contacts against sanction lists (due diligence checks); transformers for NoSQL data structures; specification of datastore connection properties on the command line; drill-to-detail in value distributions; more user-friendly database connection configuration; and execution and scheduling of jobs via Pentaho Data Integration/Kettle.


Release Notes: This release adds minor bug fixes, performance improvements, and a few new features. The most important are greatly improved batch loading performance, a convenient "write data" menu in the main window, double-click renaming of job components, syntax coloring in the JavaScript transformer and filter, and a fix for a potential deadlock when starting the application.


Release Notes: Support for MongoDB databases, for both read and write operations. Integration with EasyDQ.com, which provides customer DQ functions in the cloud. Duplicate detection (a.k.a. deduplication or fuzzy matching) analyzers. A "Table lookup" component for looking up multiple values from a table. An "Insert into table" component for inserting records into any kind of table (e.g. database tables, CSV files, Excel sheets, or MongoDB collections). Job-level variables, which make jobs parameterizable so they can be controlled from the command line.