remove-duplicate-rows
Description
remove-duplicate-rows
adaptor removes duplicate rows from a datatable.
If a duplicate is found, those duplicated rows are placed in a separate datatable.
Inputs
data
Type: datatable
Required: Yes
The datatable containing duplicate rows.
column names
Type: list
Required: No
A list of columns to compare for duplicate values. If unspecified, entire rows will be compared.
case sensitive
Type: boolean
Required: No
When set to True
, lowercase and uppercase letters are treated as different. When set to False
, lowercase and uppercase letters are treated as equivalent. If unspecified, defaults to False
Outputs
data
Type: datatable
A datatable containing only unique rows, including first instance of duplicate rows.
duplicates
Type: datatable
A datatable containing duplicate rows.
Examples
Example 1: Default behaviour.
Inputs:
data:
1
GB
United Kingdom
2
TR
Turkey
3
US
United States
4
IND
India
5
IND
India
6
us
United States
column name:
null (empty)
case sensitive:
null (empty)
Outputs:
data
:
1
GB
United Kingdom
2
TR
Turkey
3
US
United States
4
IND
India
5
IND
India
6
us
United States
duplicates:
Empty Table
-> By default all the rows are compared, since all the values in the id
column are unique no duplicate rows were removed.
Example 2: Specify column names to compare.
Inputs:
data:
1
GB
United Kingdom
2
TR
Turkey
3
US
United States
4
IND
India
5
IND
India
6
us
United States
column name:
code
name
case sensitive:
null (empty)
Outputs:
data
:
1
GB
United Kingdom
2
TR
Turkey
3
US
United States
4
IND
India
duplicates:
1
IND
India
2
us
United States
-> Removed the duplicate rows 5 and 6.
Example 3: Specify case sensitive column names to compare.
Inputs:
data:
1
GB
United Kingdom
2
TR
Turkey
3
US
United States
4
IND
India
5
IND
India
6
us
United States
column name:
code
name
case sensitive:
True
Outputs:
data
:
1
GB
United Kingdom
2
TR
Turkey
3
US
United States
4
IND
India
5
us
United States
duplicates:
1
IND
India
-> Removed only the duplicate row 5 as row 6 code
column is in lowercase and does not match row 3 .
Use case
Removing duplicated rows after using
append-datatables
adaptor on two partially-overlapping datasets.Determining which rows are duplicated, and how many times they are duplicated.
Last updated