add-jittering
Description
The add-jittering adaptor adds random noise to numeric columns in a datatable, which can be helpful when visualising geographical coordinates on a map.
Jittering shifts each value in the specified columns by a small random amount, spreading overlapping data points apart and giving a clearer picture of their density and distribution.
Inputs
data
Type: datatable
Required: Yes
The datatable containing the columns to be jittered.
columns
Type: list
Required: Yes
The names of the columns to which jittering will be added.
range
Type: number
Required: No
The maximum distance from the original value. If unspecified, defaults to 1.
unit
Type: text
Required: No
The unit of the range value. For jittering geographical coordinates, use either kilometers, meters, miles, or none. If unspecified, defaults to none.
digits
Type: number
Required: No
The number of digits to keep after the decimal point. Must be a value between 0 and 100, inclusive. If unspecified, defaults to 6.
Outputs
data
Type: datatable
A datatable with jittered values in the specified columns.
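The inputs above can be read as a simple recipe: for each named column, add a random offset no larger than range, then round to digits decimal places. The following is a minimal Python sketch of that recipe, not the adaptor's actual implementation. The function name add_jitter, the list-of-dicts table, the seed parameter, and the choice of a uniform distribution are all assumptions made for illustration; the documentation does not state which distribution the adaptor uses.

```python
import random

def add_jitter(rows, columns, jitter_range=1.0, digits=6, seed=None):
    """Return a copy of rows with random noise added to the given columns.

    Hypothetical helper: uniform noise is an assumption, not the
    adaptor's documented behaviour.
    """
    rng = random.Random(seed)
    jittered = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for col in columns:
            # Offset is at most jitter_range away from the original value.
            offset = rng.uniform(-jitter_range, jitter_range)
            new_row[col] = round(row[col] + offset, digits)
        jittered.append(new_row)
    return jittered

rows = [{"name": "Sample A", "x": 4.0}, {"name": "Sample B", "x": 5.0}]
result = add_jitter(rows, ["x"], jitter_range=1.0, digits=6, seed=0)
```

Columns that are not listed (here, name) pass through unchanged, which mirrors the description of the output datatable.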
Examples
Example 1: Default behaviour.
Inputs:
data:
Sample A    4.0    8.0
Sample B    5.0    9.0
Sample C    5.0    9.0
Sample D    4.0    8.0
columns: latitude, longitude
unit: kilometers
Outputs:
data:
Sample A    3.997208    8.000714
Sample B    5.002011    9.001153
Sample C    4.996007    9.00033
Sample D    4.001309    7.994139
-> In this example, we jitter geographical coordinates by moving each point up to one kilometer, so that points do not stack on top of each other when plotted on a map.
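When unit is a distance such as kilometers, the range has to be translated into degrees before it can be added to a latitude or longitude. How the adaptor performs this conversion is not documented; the sketch below shows one common approximation, assuming a spherical Earth where one degree of latitude is about 111.32 km and one degree of longitude shrinks with the cosine of the latitude. The function name jitter_coordinate and the choice of sampling uniformly over a disc are hypothetical.

```python
import math
import random

KM_PER_DEGREE_LAT = 111.32  # approximate; treats the Earth as a sphere

def jitter_coordinate(lat, lon, range_km=1.0, digits=6, rng=random):
    """Move a point by at most range_km in a random direction (sketch)."""
    # Square root keeps the points uniformly spread over the disc.
    distance = range_km * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    d_lat = distance * math.cos(bearing) / KM_PER_DEGREE_LAT
    # A degree of longitude covers less ground as latitude increases.
    km_per_degree_lon = KM_PER_DEGREE_LAT * math.cos(math.radians(lat))
    d_lon = distance * math.sin(bearing) / km_per_degree_lon
    return round(lat + d_lat, digits), round(lon + d_lon, digits)

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
points = [(4.0, 8.0), (5.0, 9.0), (5.0, 9.0), (4.0, 8.0)]
jittered = [jitter_coordinate(lat, lon, rng=rng) for lat, lon in points]
```

Under this approximation, one kilometer corresponds to roughly 0.009 degrees of latitude, which is the same order of magnitude as the offsets in the example output above. The approximation breaks down near the poles, where the cosine term approaches zero.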
Example 2: Jitter patient ages.
Inputs:
data:
Patient A    20
Patient B    60
Patient C    32
columns: Age
range: 10
digits: 0
Outputs:
data:
Patient A    28
Patient B    55
Patient C    33
-> In this example, we jitter patient ages to help anonymise the data by adding or subtracting a random value between 0 and 10 (the value specified in range). We set digits to 0 to return whole numbers (integers).
Use Cases
Scatter plots: Applying jittering to the data points makes it easier to see the density and distribution of points in areas with high overlap.
Categorical data: Jittering adds variability to the position of data points, making it easier to distinguish categories with high densities of data. Without jittering, dense clusters of points might create the impression of a single point or a smaller number of points.
Data with small sample size: In cases where the sample size is small and data points are concentrated around a few values, jittering can be useful to prevent points from directly overlapping and to give a better sense of the underlying distribution.
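The overlap problem described in these use cases can be made concrete by counting how many distinct positions a plot would actually draw. The sketch below reuses the coordinates from Example 1 and is only an illustration: the helper jitter_point, the range of 0.01, and the uniform noise are assumptions, not the adaptor's behaviour.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is repeatable

# Four samples that share two coordinates, as in Example 1.
points = [(4.0, 8.0), (5.0, 9.0), (5.0, 9.0), (4.0, 8.0)]

def jitter_point(point, jitter_range=0.01):
    """Shift both coordinates by uniform noise in [-jitter_range, jitter_range]."""
    x, y = point
    return (x + rng.uniform(-jitter_range, jitter_range),
            y + rng.uniform(-jitter_range, jitter_range))

jittered = [jitter_point(p) for p in points]

print(len(set(points)))    # distinct positions before jittering
print(len(set(jittered)))  # distinct positions after jittering
```

Before jittering, the four samples collapse onto two plotted positions; after jittering, each sample gets its own position while staying close to its original value.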
Note: Jittering introduces a degree of artificiality to the visualisation, and this should be made clear to users interpreting it.