remove-duplicate-rows
Description
The remove-duplicate-rows adaptor removes duplicate rows from a datatable.
The first instance of each duplicated row is kept; any later instances are placed in a separate datatable.
Inputs
data
Type: datatable
Required: Yes
The datatable from which duplicate rows are removed.
column names
Type: list
Required: No
A list of columns to compare for duplicate values. If unspecified, entire rows will be compared.
case sensitive
Type: boolean
Required: No
When set to True, lowercase and uppercase letters are treated as different. When set to False, lowercase and uppercase letters are treated as equivalent. If unspecified, defaults to False.
Outputs
data
Type: datatable
A datatable containing only unique rows, including the first instance of each duplicated row.
duplicates
Type: datatable
A datatable containing the duplicate rows that were removed from the input data.
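The behaviour described by these inputs and outputs can be sketched in plain Python. This is only an illustration under stated assumptions, not the adaptor's implementation: the function name `remove_duplicate_rows`, the representation of a datatable as a list of dicts, and the column name `id` for the first column are all hypothetical.

```python
def remove_duplicate_rows(data, column_names=None, case_sensitive=False):
    """Hypothetical sketch: split rows into (unique, duplicates).

    `data` is assumed to be a list of dicts that share the same keys.
    """
    seen = set()
    unique, duplicates = [], []
    for row in data:
        # If no columns are given, compare entire rows.
        columns = column_names if column_names else list(row)
        key = tuple(
            # Fold case on strings unless the comparison is case sensitive.
            value if case_sensitive or not isinstance(value, str) else value.casefold()
            for value in (row[name] for name in columns)
        )
        if key in seen:
            duplicates.append(row)  # a later instance of an already-seen row
        else:
            seen.add(key)
            unique.append(row)      # the first instance is kept
    return unique, duplicates


# Illustrative rows; the column name "id" is an assumption.
rows = [
    {"id": 1, "code": "GB", "name": "United Kingdom"},
    {"id": 2, "code": "TR", "name": "Turkey"},
    {"id": 3, "code": "US", "name": "United States"},
    {"id": 4, "code": "IND", "name": "India"},
    {"id": 5, "code": "IND", "name": "India"},
    {"id": 6, "code": "us", "name": "United States"},
]

unique, duplicates = remove_duplicate_rows(rows, ["code", "name"])
print([row["id"] for row in duplicates])
```

Keying each row on a tuple of the compared values keeps the first occurrence in its original position, which matches the description of the data output.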
Examples
Example 1: Default behaviour.
Inputs:
data:
1  GB   United Kingdom
2  TR   Turkey
3  US   United States
4  IND  India
5  IND  India
6  us   United States
column names: null (empty)
case sensitive: null (empty)
Outputs:
data:
1  GB   United Kingdom
2  TR   Turkey
3  US   United States
4  IND  India
5  IND  India
6  us   United States
duplicates: Empty Table
-> By default, entire rows are compared. Because every value in the id column is unique, no two rows are identical and no rows were removed.
Example 2: Specify column names to compare.
Inputs:
data:
1  GB   United Kingdom
2  TR   Turkey
3  US   United States
4  IND  India
5  IND  India
6  us   United States
column names:
code
name
case sensitive: null (empty)
Outputs:
data:
1  GB   United Kingdom
2  TR   Turkey
3  US   United States
4  IND  India
duplicates:
1  IND  India
2  us   United States
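-> Removed the duplicate rows 5 and 6. Row 5 matches row 4, and row 6 matches row 3 because the comparison is case insensitive by default.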
Example 3: Specify column names to compare, with case-sensitive comparison.
Inputs:
data:
1  GB   United Kingdom
2  TR   Turkey
3  US   United States
4  IND  India
5  IND  India
6  us   United States
column names:
code
name
case sensitive: True
Outputs:
data:
1  GB   United Kingdom
2  TR   Turkey
3  US   United States
4  IND  India
5  us   United States
duplicates:
1  IND  India
-> Removed only the duplicate row 5. Row 6 is kept because its code value is in lowercase and therefore does not match row 3.
Use case
Removing duplicated rows after using the append-datatables adaptor on two partially overlapping datasets.
Determining which rows are duplicated, and how many times they are duplicated.
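For the counting use case, one possible approach is to tally the rows of the duplicates output by the compared column values. The sketch below is plain Python rather than any adaptor API: the helper name `count_extra_copies`, the list-of-dicts table representation, and the column name `id` are assumptions made for illustration.

```python
from collections import Counter


def count_extra_copies(duplicates, column_names, case_sensitive=False):
    """Hypothetical helper: count rows in a duplicates table per compared key."""
    def key(row):
        return tuple(
            # Fold case on strings unless the comparison is case sensitive.
            value if case_sensitive or not isinstance(value, str) else value.casefold()
            for value in (row[name] for name in column_names)
        )
    return Counter(key(row) for row in duplicates)


# Illustrative rows shaped like the duplicates found in Example 2.
duplicates = [
    {"id": 5, "code": "IND", "name": "India"},
    {"id": 6, "code": "us", "name": "United States"},
]

extra = count_extra_copies(duplicates, ["code", "name"])
# Each key also has one retained first instance in the data output,
# so the total number of occurrences is the extra count plus one.
total = {key: count + 1 for key, count in extra.items()}
print(total)
```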