remove-duplicate-rows

Description

The remove-duplicate-rows adaptor removes duplicate rows from a datatable.

When duplicates are found, the duplicated rows are placed in a separate datatable.

Inputs

data
Type: datatable
Required: Yes
The datatable to check for duplicate rows.

column names
Type: list
Required: No
A list of columns to compare for duplicate values. If unspecified, entire rows are compared.

case sensitive
Type: boolean
Required: No
When set to True, lowercase and uppercase letters are treated as different. When set to False, they are treated as equivalent. If unspecified, defaults to False.
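
As a rough illustration of how the column names and case sensitive inputs shape the comparison, the sketch below builds a comparison key for a single row in plain Python. It is not the adaptor's implementation: the function name comparison_key, the modelling of a row as a dict, and the use of str.casefold for case-insensitive matching are all assumptions made for illustration.

```python
def comparison_key(row, column_names=None, case_sensitive=False):
    """Build a hashable key; two rows count as duplicates when their keys are equal.

    Hypothetical helper: a row is modelled as a dict of column name -> value.
    """
    # If no columns are given, compare the entire row.
    columns = column_names if column_names else list(row)
    values = []
    for column in columns:
        value = row[column]
        # Assumption: case folding is applied to string values only.
        if not case_sensitive and isinstance(value, str):
            value = value.casefold()
        values.append(value)
    return tuple(values)


row_a = {"code": "IND", "name": "India"}
row_b = {"code": "ind", "name": "India"}

# Default (case-insensitive): the two rows share a key.
print(comparison_key(row_a) == comparison_key(row_b))

# Case-sensitive: the differing code values keep the keys apart.
print(
    comparison_key(row_a, case_sensitive=True)
    == comparison_key(row_b, case_sensitive=True)
)
```

Restricting column_names to a subset narrows the key, so rows that differ only in the ignored columns would be treated as duplicates.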

Outputs

data
Type: datatable
A datatable containing only unique rows, including the first instance of each duplicated row.

duplicates
Type: datatable
A datatable containing the duplicate rows that were removed.
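
The relationship between the two outputs can be sketched in a few lines of Python. This is a minimal illustration, not the adaptor's code: it assumes rows are tuples, that the first occurrence of a row stays in data, and that every later occurrence goes to duplicates, which is the behaviour the examples on this page show. The name split_duplicates and the ("XX", "Example") row are hypothetical.

```python
def split_duplicates(rows, key=lambda row: row):
    """Return (data, duplicates): first occurrences and later repeats, in input order."""
    seen = set()
    data, duplicates = [], []
    for row in rows:
        k = key(row)
        if k in seen:
            duplicates.append(row)  # a later repeat of an already-seen row
        else:
            seen.add(k)
            data.append(row)  # the first instance is kept
    return data, duplicates


# Illustrative rows only; ("XX", "Example") is a made-up placeholder.
rows = [("IND", "India"), ("IND", "India"), ("XX", "Example")]
data, duplicates = split_duplicates(rows)
print(data)
print(duplicates)
```

The key argument is where a column subset or case-insensitive comparison would plug in; by default the whole row is compared exactly.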

Examples

Example 1: Default behaviour.

Inputs:

data:

code | name
     | United Kingdom
     | Turkey
     | United States
IND  | India
IND  | India
     | United States

column names: null (empty)

case sensitive: null (empty)

Outputs:

data:

code | name
     | United Kingdom
     | Turkey
     | United States
IND  | India
IND  | India
     | United States

duplicates: Empty Table

-> By default, entire rows are compared. Because every value in the id column is unique, no duplicate rows were removed.

Example 2: Specify column names to compare.

Inputs:

data:

code | name
     | United Kingdom
     | Turkey
     | United States
IND  | India
IND  | India
     | United States

column names:

code
name

case sensitive: null (empty)

Outputs:

data:

code | name
     | United Kingdom
     | Turkey
     | United States
IND  | India

duplicates:

code | name
IND  | India
     | United States

-> Rows 5 and 6 were removed as duplicates: row 5 matches row 4, and row 6 matches row 3 because the comparison ignores case by default.

Example 3: Compare the specified columns with case sensitivity enabled.

Inputs:

data:

code | name
     | United Kingdom
     | Turkey
     | United States
IND  | India
IND  | India
     | United States

column names:

code
name

case sensitive: True

Outputs:

data:

code | name
     | United Kingdom
     | Turkey
     | United States
IND  | India
     | United States

duplicates:

code | name
IND  | India

-> Only row 5 was removed as a duplicate. The code value in row 6 is lowercase, so with case sensitive set to True it does not match row 3.

Use case

Removing duplicated rows after using the append-datatables adaptor on two partially overlapping datasets.
Determining which rows are duplicated and how many times each one is duplicated.
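
For the second use case, one possible way to count how often each row was duplicated is to tally the rows of the duplicates output. The sketch below assumes that output has been exported as a list of tuples; the variable names and sample rows are illustrative, and the ("XX", "Example") row is a made-up placeholder.

```python
from collections import Counter

# Hypothetical contents of the duplicates output, one tuple per row.
duplicates = [
    ("IND", "India"),
    ("IND", "India"),
    ("XX", "Example"),
]

# Each count is the number of extra copies beyond the first instance kept in data.
extra_copies = Counter(duplicates)

for row, count in extra_copies.items():
    print(row, "appeared", count + 1, "times in the input")
```

Counter compares values exactly, so if the adaptor ran with case sensitive set to False, the values would need to be normalised (for example with str.casefold) before counting.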

