remove-duplicate-rows adaptor removes duplicate rows from a datatable.
If a duplicate is found, those duplicated rows are placed in a separate datatable.
data
Type: datatable
Required: Yes
The datatable containing duplicate rows.
column names
Type: list
Required: No
A list of columns to compare for duplicate values. If unspecified, entire rows will be compared.
case sensitive
Type: boolean
Required: No
When set to True, lowercase and uppercase letters are treated as different. When set to False, lowercase and uppercase letters are treated as equivalent. If unspecified, defaults to False
data
Type: datatable
A datatable containing only unique rows, including first instance of duplicate rows.
duplicates
Type: datatable
A datatable containing duplicate rows.
Example 1: Default behaviour.
data:
column name: null (empty)
case sensitive: null (empty)
data:
duplicates: Empty Table
-> By default all the rows are compared, since all the values in the id column are unique no duplicate rows were removed.
Example 2: Specify column names to compare.
data:
column name:
case sensitive: null (empty)
data:
duplicates:
-> Removed the duplicate rows 5 and 6.
Example 3: Specify case sensitive column names to compare.
data:
column name:
case sensitive: True
data:
duplicates:
-> Removed only the duplicate row 5 as row 6 code column is in lowercase and does not match row 3 .
Removing duplicated rows after using append-datatables adaptor on two partially-overlapping datasets.
Determining which rows are duplicated, and how many times they are duplicated.