add-jittering

Description

The add-jittering adaptor adds random noise to numeric columns in a datatable. This is helpful, for example, when visualising geographical coordinates on a map.

Jittering shifts each value by a small random amount, so points that would otherwise sit on top of each other are spread out slightly. This gives a clearer picture of the density and distribution of the points.

Inputs

data
Type: datatable
Required: Yes
The datatable containing the columns to be jittered.

columns
Type: list
Required: Yes
The names of the columns to which jittering will be added.

range
Type: number
Required: No
The maximum distance a jittered value can lie from the original value. If unspecified, defaults to 1.

unit
Type: text
Required: No
The unit of the range value. When jittering geographical coordinates, use kilometers, meters, miles, or none. If unspecified, defaults to none.

digits
Type: number
Required: No
The number of digits to appear after the decimal point; should be a value between 0 and 100, inclusive. If unspecified, defaults to 6.
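
The inputs above can be read as the following minimal Python sketch. It is not the adaptor's implementation: the function name add_jitter, the list-of-dicts table representation, and the choice of uniformly distributed noise are all assumptions made for illustration. The documentation only states that values move by at most range and are rounded to digits decimal places.

```python
import random

def add_jitter(rows, columns, range_=1.0, digits=6, rng=None):
    """Return a copy of rows with random noise added to the given columns.

    Hypothetical helper: uniform noise in [-range_, +range_] is an
    assumption, not documented behaviour of the adaptor.
    """
    rng = rng or random.Random()
    jittered = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left untouched
        for col in columns:
            noise = rng.uniform(-range_, range_)
            new_row[col] = round(row[col] + noise, digits)
        jittered.append(new_row)
    return jittered

table = [{"id": "Sample A", "value": 4.0}, {"id": "Sample B", "value": 4.0}]
print(add_jitter(table, ["value"], range_=1.0, digits=6))
```

Columns that are not listed in columns (here, id) pass through unchanged.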

Outputs

data
Type: datatable
A datatable with jittered values in the specified columns.

Examples

Example 1: Jitter geographical coordinates using the default range.

Inputs:

data:

id        latitude  longitude
Sample A  4.0       8.0
Sample B  5.0       9.0
Sample C  5.0       9.0
Sample D  4.0       8.0

columns:

  1. latitude

  2. longitude

unit: kilometers

Outputs:

data:

id        latitude  longitude
Sample A  3.997208  8.000714
Sample B  5.002011  9.001153
Sample C  4.996007  9.00033
Sample D  4.001309  7.994139

-> In this example, we jitter geographical coordinates by moving them up to one kilometer (the default range of 1, with unit set to kilometers), so that they do not stack on top of each other when plotted on a map.
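
When unit is a distance such as kilometers, the range has to be translated into degrees before it can be added to a latitude or longitude. How the adaptor performs this conversion is not documented here. The sketch below assumes a spherical-Earth approximation in which one degree of latitude spans about 111.32 km and a degree of longitude shrinks with the cosine of the latitude. The function name and constants are illustrative only.

```python
import math

KM_PER_DEGREE_LAT = 111.32  # approximate; assumes a spherical Earth
KM_PER_UNIT = {"kilometers": 1.0, "meters": 0.001, "miles": 1.609344}

def range_in_degrees(range_value, unit, latitude):
    """Convert a jitter range to (lat_degrees, lon_degrees) at a latitude."""
    km = range_value * KM_PER_UNIT[unit]
    lat_deg = km / KM_PER_DEGREE_LAT
    # Meridians converge toward the poles, so one kilometer covers more
    # degrees of longitude at higher latitudes.
    lon_deg = km / (KM_PER_DEGREE_LAT * math.cos(math.radians(latitude)))
    return lat_deg, lon_deg

lat_deg, lon_deg = range_in_degrees(1, "kilometers", latitude=4.0)
print(round(lat_deg, 6), round(lon_deg, 6))
```

Under this approximation, one kilometer near the equator is roughly 0.009 degrees, which is consistent with the size of the shifts in the output table above.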

Example 2: Jitter patient ages.

Inputs:

data:

Name       Age
Patient A  20
Patient B  60
Patient C  32

columns:

  1. Age

range: 10

digits: 0

Outputs:

data:

Name       Age
Patient A  28
Patient B  55
Patient C  33

-> In this example, we jitter patient ages to help anonymise the information by adding or subtracting a random value between 0 and 10 (the value specified in range). We set digits to 0 to return whole numbers (integers).
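
The effect of range: 10 with digits: 0 can be illustrated with a few lines of Python. This is a sketch under assumptions rather than the adaptor's code: the noise is drawn uniformly, and the generator is seeded only so the sketch is repeatable. The values it prints will not match the output table above, because the adaptor's random draws are not reproducible from this page.

```python
import random

ages = {"Patient A": 20, "Patient B": 60, "Patient C": 32}
max_shift = 10  # corresponds to range: 10
rng = random.Random(42)  # seeded for repeatability of this sketch only

# round() with no second argument returns an integer, mirroring digits: 0
jittered = {
    name: round(age + rng.uniform(-max_shift, max_shift))
    for name, age in ages.items()
}

# Every jittered age stays within max_shift of the original age
for name, age in jittered.items():
    assert abs(age - ages[name]) <= max_shift

print(jittered)
```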

Use Cases

  • Scatter plots: Applying jittering to the data points makes it easier to see the density and distribution of points in areas with high overlap.

  • Categorical data: Jittering adds variability to the position of data points, making it easier to distinguish categories with high densities of data. Without jittering, dense clusters of points might create the impression of a single point or a smaller number of points.

  • Data with small sample size: In cases where the sample size is small and data points are concentrated around a few values, jittering can be useful to prevent points from directly overlapping and to give a better sense of the underlying distribution.
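
The overlap problem described in these use cases can be made concrete by counting distinct positions before and after jittering. The sketch below reuses the coordinates from Example 1. The helper jitter_point, the uniform noise, and the 0.009 degree amount (roughly one kilometer near the equator) are assumptions for illustration.

```python
import random

points = [(4.0, 8.0), (5.0, 9.0), (5.0, 9.0), (4.0, 8.0)]
rng = random.Random(0)  # seeded for repeatability of this sketch only

def jitter_point(point, amount):
    """Shift both coordinates by independent uniform noise (assumed)."""
    x, y = point
    return (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))

jittered = [jitter_point(p, 0.009) for p in points]

print(len(set(points)), "distinct positions before jittering")
print(len(set(jittered)), "distinct positions after jittering")
```

The four input rows occupy only two distinct positions, so a plot would show two markers. After jittering, each row should have its own position and all four become visible.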

Note: Jittering introduces a degree of artificiality to the visualisation, and this should be made clear to users interpreting it.
