---
title: "Nd"
output: gfm
bibliography: "man/references/bibliography.bib"
---
<!-- README.md is generated from README.qmd. Please edit that file -->
```{r Nd-setup}
#| include: false
library(Nd)
knitr::opts_chunk$set(
echo = TRUE, collapse = TRUE, comment = "#>",
out.width = "100%", fig.path = "man/figures/README-", fig.width = 12
)
library(ggplot2)
ggplot2::theme_set(ggplot2::theme_minimal(base_size = 18))
```
<!-- badges: start -->
<!-- badges: end -->
The `Nd` R package extends the S3 class
[`survival::Surv`](https://rdrr.io/cran/survival/man/Surv.html) with a
focus on left-censored data, referred to as *non-detect* data, which are
commonly found in environmental statistics.
## Installation
This package is currently **not** on CRAN and can be installed from
[GitHub](https://github.com/) with:
```{.r}
# install.packages("devtools")
devtools::install_github("nclJoshCowley/Nd")
```
## Description
We can construct an `Nd` object using the supplied generic.
```{r Nd-x-numeric}
x <- Nd::Nd(c(1, 3, 7), is_nd = c(TRUE, FALSE, FALSE))
print(x)
```
```{r Nd-x-character}
x <- Nd::Nd(c("<1", "<1", "<1", "3.2", "7.5", "9.4", "<10", "<10", "12.1"))
print(x)
```
Printing the example shows several observations with conditional formatting
for left-censored data.
The underlying structure of this object is made up of a numerical component,
`value`, and a logical component, `is_nd`, both accessible via the `$`
operator.
```{r Nd-x-parts}
x$value
x$is_nd
```
Note that the meaning of the numerical component depends on the logical part:
- if `is_nd` is `TRUE`, the value represents the detection limit
and the observation is said to be left-censored.
- otherwise, the value represents the measured concentration of the uncensored
observation.
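
As a rough illustration of this two-part structure, the following base R sketch splits strings such as `"<1"` into a numeric part and a logical flag. The helper `parse_nd_strings` is hypothetical and is not part of `Nd`; the package's own parsing may differ.

```r
# Hypothetical helper, not part of the Nd package: split strings such as
# "<1" or "3.2" into the two components described above.
parse_nd_strings <- function(x) {
  # A leading "<" marks a non-detect (left-censored observation).
  is_nd <- startsWith(x, "<")
  # Drop the "<" so that the remaining text can be read as a number.
  value <- as.numeric(sub("^<", "", x))
  data.frame(value = value, is_nd = is_nd)
}

parts <- parse_nd_strings(c("<1", "3.2", "<10", "12.1"))
parts$value  # detection limits where is_nd is TRUE, measurements otherwise
parts$is_nd
```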
## Artificial Censor
An alternative method of creating left-censored data is to artificially censor
existing data.
For example, we can censor an existing dataset at some chosen quantile and
see how `Nd` objects appear in data frames.
```{r Nd-artificial-quantile}
data("mtcars", package = "datasets")
mtcars$mpg <- Nd::artificial_censor(mtcars$mpg, quantiles = 0.5)
utils::head(mtcars)
```
Alternatively, we could have censored at some known detection limit.
```{r Nd-artificial-detlim}
data("mtcars", package = "datasets")
mtcars$mpg <- Nd::artificial_censor(mtcars$mpg, detlims = 21)
utils::head(mtcars)
```
Multiple quantiles or detection limits are supported; each value is censored
at the smallest detection limit it falls below.
```{r Nd-artificial-detlims}
data("mtcars", package = "datasets")
mtcars$mpg <- Nd::artificial_censor(mtcars$mpg, detlims = c(20, 21, 22))
utils::head(mtcars)
```
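
The rule for multiple detection limits can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function `censor_at_limits` is hypothetical, and it assumes that "falls below" means strictly less than the limit.

```r
# Hypothetical base R sketch; Nd::artificial_censor may differ in its details.
censor_at_limits <- function(x, detlims) {
  detlims <- sort(detlims)
  # For each observation, find the smallest detection limit strictly above it,
  # or NA when the observation is at or above every limit.
  lowest_limit_above <- vapply(
    x,
    function(xi) {
      above <- detlims[detlims > xi]
      if (length(above) > 0) above[1] else NA_real_
    },
    numeric(1)
  )
  is_nd <- !is.na(lowest_limit_above)
  data.frame(
    # Censored observations store the limit; the rest keep the measurement.
    value = ifelse(is_nd, lowest_limit_above, x),
    is_nd = is_nd
  )
}

censored <- censor_at_limits(
  c(18.7, 20.5, 21.0, 21.4, 22.8),
  detlims = c(20, 21, 22)
)
censored
```

Under this reading, every observation below the largest limit is censored, and only the recorded detection limit varies between observations.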
The helper function `cens_ratio` reports the ratio of observations that are
left-censored (at any detection limit).
```{r}
Nd::cens_ratio(mtcars$mpg)
```
## Imputation
Often, we want to use a method that only accepts numerical dependent variables,
such as a linear model fitted with `stats::lm`, so the censored values must
first be replaced with numbers.
Such approaches have drawn criticism within the statistical community due to
their ad-hoc nature and the bias they introduce
[@george2021censoring; @helsel2011statistics; @singh2002robust].
<!-- (Singh & Nocerino, 2002; Helsel, 2011; George et al., 2021). -->
We can impute (replace `Nd` with `numeric`) using the `impute` generic, which
accepts several imputation methods:
1. `"DL2"`: replace `ND<1.0` with half the detection limit, `0.5`;
1. `"ROS"`: regression on order statistics, see `?NADA::ros`;
1. a user-supplied function of the form `f(value, is_nd, ...)` that returns
a numeric vector of equal length.
Using our example,
```{r Nd-impute}
impute(x, "DL2")
impute(x, "ROS")
impute(x, function(value, is_nd) ifelse(is_nd, -1, value))
```
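
To make the `f(value, is_nd, ...)` form concrete, here is a minimal base R sketch of a `"DL2"`-style substitution. The names `impute_dl2` and `apply_imputation` and the example vectors are hypothetical stand-ins rather than the package's internals.

```r
# Hypothetical stand-ins for the two components of an Nd object.
value <- c(1, 3.2, 10, 12.1)
is_nd <- c(TRUE, FALSE, TRUE, FALSE)

# "DL2"-style substitution: half the detection limit for non-detects,
# the measured value otherwise.
impute_dl2 <- function(value, is_nd, ...) {
  ifelse(is_nd, value / 2, value)
}

# Hypothetical driver that accepts any function of the form f(value, is_nd, ...)
# and checks that it returns a numeric vector of equal length.
apply_imputation <- function(value, is_nd, f, ...) {
  out <- f(value, is_nd, ...)
  stopifnot(is.numeric(out), length(out) == length(value))
  out
}

dl2 <- apply_imputation(value, is_nd, impute_dl2)
dl2
```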
## Visualisation
Visualising censored data is non-trivial since censored observations
represent a range of possible values instead of a single value.
More practical advice is given in @usgs2020.
We supply the following methods for visualising data, but these are by no
means complete and pull requests are welcome.
### Quantile Plot
Since it is possible to order censored data, we can plot the rank of each
observation against the value (concentration or detection limit).
Such a plot highlights the extreme points of the data and the detection limit
structure.
```{r Nd-autoplot}
ggplot2::autoplot(Nd_example, type = "quantile") +
  ggplot2::labs(title = "Quantile Plot")
```
### Time-Series Plot
Ideally this would be a `ggproto` object, but that is still in development.
Instead, the following function adds multiple layers composed of
`ggplot2::geom_point` and `ggplot2::geom_line`, which is most useful
when the `x` variable is time.
Note that using the detection limits as plotting points may be misleading
[@usgs2020].
```{r Nd-plot-layer-line}
plot_data <- tibble::enframe(rev(Nd_example), name = "xval", value = "yval")
ggplot2::ggplot(plot_data, ggplot2::aes(x = .data$xval, y = .data$yval)) +
layer_Nd_line(.data$yval)
```
## References
::: {#refs}
:::