# remove-duplicate-rows

## Description

<mark style="color:purple;">`remove-duplicate-rows`</mark> *adaptor removes duplicate rows from a datatable*.

If a duplicate is found, those duplicated rows are placed in a separate datatable.

## Inputs

**`data`**\
Type: `datatable`\
Required: Yes\
The datatable containing duplicate rows.

**`column names`**\
Type: `list`\
Required: No\
A list of columns to compare for duplicate values. If unspecified, entire rows will be compared.

**`case sensitive`**\
Type: `boolean`\
Required: No\
When set to `True`, lowercase and uppercase letters are treated as different. When set to `False`, lowercase and uppercase letters are treated as equivalent. If unspecified, defaults to `False`

## Outputs

**`data`**\
Type: `datatable`\
A datatable containing only unique rows, including first instance of duplicate rows.

**`duplicates`**\
Type: `datatable`\
A datatable containing duplicate rows.

## Examples

### Example 1: Default behaviour.

#### Inputs:

`data:`

| id | code | name           |
| -- | ---- | -------------- |
| 1  | GB   | United Kingdom |
| 2  | TR   | Turkey         |
| 3  | US   | United States  |
| 4  | IND  | India          |
| 5  | IND  | India          |
| 6  | us   | United States  |

`column name:` *null (empty)*

`case sensitive:` *null (empty)*

#### Outputs:

`data`:

| id | code | name           |
| -- | ---- | -------------- |
| 1  | GB   | United Kingdom |
| 2  | TR   | Turkey         |
| 3  | US   | United States  |
| 4  | IND  | India          |
| 5  | IND  | India          |
| 6  | us   | United States  |

`duplicates:` Empty Table

-> By default all the rows are compared, since all the values in the `id` column are unique no duplicate rows were removed.

### Example 2: Specify column names to compare.

#### Inputs:

`data:`

| id | code | name           |
| -- | ---- | -------------- |
| 1  | GB   | United Kingdom |
| 2  | TR   | Turkey         |
| 3  | US   | United States  |
| 4  | IND  | India          |
| 5  | IND  | India          |
| 6  | us   | United States  |

`column name:`

1. code
2. name

`case sensitive:` *null (empty)*

#### Outputs:

`data`:

| id | code | name           |
| -- | ---- | -------------- |
| 1  | GB   | United Kingdom |
| 2  | TR   | Turkey         |
| 3  | US   | United States  |
| 4  | IND  | India          |

`duplicates:`

| id | code | name          |
| -- | ---- | ------------- |
| 1  | IND  | India         |
| 2  | us   | United States |

-> Removed the duplicate rows *5* and *6*.

### Example 3: Specify case sensitive column names to compare.

#### Inputs:

`data:`

| id | code | name           |
| -- | ---- | -------------- |
| 1  | GB   | United Kingdom |
| 2  | TR   | Turkey         |
| 3  | US   | United States  |
| 4  | IND  | India          |
| 5  | IND  | India          |
| 6  | us   | United States  |

`column name:`

1. code
2. name

`case sensitive:` True

#### Outputs:

`data`:

| id | code | name           |
| -- | ---- | -------------- |
| 1  | GB   | United Kingdom |
| 2  | TR   | Turkey         |
| 3  | US   | United States  |
| 4  | IND  | India          |
| 5  | us   | United States  |

`duplicates:`

| id | code | name  |
| -- | ---- | ----- |
| 1  | IND  | India |

-> Removed only the duplicate row *5* as row *6* `code` column is in lowercase and does not match row *3* .

## Use case

* Removing duplicated rows after using [<mark style="color:purple;">`append-datatables`</mark>](https://cgps.gitbook.io/data-flo/reference-guide/append-datatables) adaptor on two partially-overlapping datasets.
* Determining which rows are duplicated, and how many times they are duplicated.
