Dev #13

Merged
merged 5 commits into from
Jul 17, 2024
2 changes: 1 addition & 1 deletion .github/workflows/Documenter.yml
@@ -24,7 +24,7 @@ jobs:
runs-on: ubuntu-latest
timeout-minutes: 30
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- uses: julia-actions/julia-buildpkg@latest
- uses: julia-actions/julia-docdeploy@latest
env:
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "MetidaStats"
uuid = "75cdad26-409a-4e43-8ad7-d54b4fa665a0"
authors = ["PharmCat <[email protected]>"]
version = "0.2.1"
version = "0.2.2"

[deps]

66 changes: 66 additions & 0 deletions README.md
@@ -11,3 +11,69 @@ Metida descriptive statistics.
```
import Pkg; Pkg.add(url = "https://github.com/PharmCat/MetidaStats.jl.git")
```

## Import DataFrame

```
data = CSV.File("somedata.csv") |> DataFrame

# variables to analyze
vars = [:Cmax, :AUClast]

# sorting variables
sort = [:form, :period]

ds = dataimport(data; vars = vars, sort = sort)
```

## Get descriptive statistics

```
descriptives(ds, stats = [:n, :mean, :var])
```

## Or without dataimport step

```
descriptives(data; vars = vars, sort = sort, stats = [:n, :mean, :var])
```

Keywords:

- `skipmissing` - drop NaN and Missing values, default = true;
- `skipnonpositive` - drop non-positive values (and NaN, Missing) for "log-statistics" - :geom, :geomean, :logmean, :logvar, :geocv;
- `stats` - default set `stats = [:n, :mean, :sd, :se, :median, :min, :max]`;
- `corrected` - use corrected var (true);
- `level` - level for confidence intervals (0.95);
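
As a reading aid for `level` and `corrected`, the interval statistics can be written out as follows. This is a sketch taken from the `descriptives_` code touched in this PR, not a formal specification: `q` is the Student's t quantile with `n - 1` degrees of freedom (computed only when `n > 1`), and `sd` is the standard deviation computed with the `corrected` setting.

```latex
\begin{aligned}
q &= t_{\,n-1,\; 1 - (1 - \mathrm{level})/2} \\
\mathrm{se} &= \mathrm{sd} / \sqrt{n} \\
\mathrm{lci},\ \mathrm{uci} &= \bar{x} \mp q \cdot \mathrm{sd} \\
\mathrm{lmeanci},\ \mathrm{umeanci} &= \bar{x} \mp q \cdot \mathrm{se}
\end{aligned}
```

With the default `level = 0.95`, `q` is the 0.975 quantile of the t distribution.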

Possible values for `stats` are:

* :n - number of observations;
* :posn - positive (non-negative) number of observations;
* :mean - arithmetic mean;
* :var - variance;
* :bvar - variance with no correction;
* :geom - geometric mean;
* :logmean - arithmetic mean for log-transformed data;
* :logvar - variance for log-transformed data;
* :sd - standard deviation (or σ);
* :se - standard error;
* :cv - coefficient of variation;
* :geocv - coefficient of variation for log-transformed data;
* :lci - lower confidence interval;
* :uci - upper confidence interval;
* :lmeanci - lower confidence interval for mean;
* :umeanci - upper confidence interval for mean;
* :median - median;
* :min - minimum;
* :max - maximum;
* :range - range;
* :q1 - lower quartile;
* :q3 - upper quartile;
* :iqr - interquartile range;
* :kurt - kurtosis;
* :skew - skewness;
* :harmmean - harmonic mean;
* :ses - standard error of skewness;
* :sek - standard error of kurtosis;
* :sum - sum.
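
The "log-statistics" in the list above can be illustrated with a short sketch that uses only the `Statistics` standard library. It assumes that non-positive, `NaN` and `missing` values are dropped first (as `skipnonpositive` describes) and that `:geom` is the exponent of the log-mean; `positive_only` and `log_stats` are hypothetical names for illustration, not MetidaStats functions.

```julia
using Statistics

# Hypothetical helper (not part of MetidaStats): keep strictly positive,
# non-missing, non-NaN values only.
positive_only(x) = [v for v in x if !ismissing(v) && !isnan(v) && v > 0]

# Sketch of the log-statistics: logmean, logvar, geometric mean and the
# geometric CV, CV = sqrt(exp(logvar) - 1), as given in the docstring.
function log_stats(x; corrected = true)
    logvec  = log.(positive_only(x))
    logmean = mean(logvec)
    logvar  = var(logvec; corrected = corrected)
    return (n_used  = length(logvec),
            logmean = logmean,
            logvar  = logvar,
            geom    = exp(logmean),
            geocv   = sqrt(exp(logvar) - 1))
end

res = log_stats([1.0, 2.0, 4.0, -3.0, missing])
```

Here the negative and missing entries are ignored, so the statistics are computed from the three positive values only.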
8 changes: 4 additions & 4 deletions docs/src/index.md
@@ -4,7 +4,7 @@
CurrentModule = MetidaStats
```

Metida descriptive statistics.
Metida descriptive statistics - provides tables of categorized descriptive statistics from tabular data.

*This program comes with absolutely no warranty. No liability is accepted for any loss and risk to public health resulting from use of this software.

@@ -37,19 +37,19 @@ ds[1:5, :]

### Import:

```
```@example dsexample
di = MetidaStats.dataimport(ds, vars = [:var1, :var2], sort = [:col, :row])
```

### Statistics:

```
```@example dsexample
des = MetidaStats.descriptives(di; skipmissing = true, skipnonpositive = true, stats = MetidaStats.STATLIST)
```

### Make DataFrame

```
```@example dsexample
df = DataFrame(des)
```

78 changes: 41 additions & 37 deletions src/descriptive.jl
@@ -82,38 +82,41 @@ end
* kwargs:
- `skipmissing` - drop NaN and Missing values, default = true;
- `skipnonpositive` - drop non-positive values (and NaN, Missing) for "log-statistics" - :geom, :geomean, :logmean, :logvar, :geocv;
- `stats` - default set `stats = [:n, :mean, :sd, :se, :median, :min, :max]`
- `stats` - default set `stats = [:n, :mean, :sd, :se, :median, :min, :max]`;
- `corrected` - use corrected var (true);
- `level` - level for confidence intervals (0.95);

Possible values for `stats` are:

* :n - number of observations;
:posn - positive (non-negative) number of observations;
:mean - arithmetic mean;
:var - variance;
:bvar - variance with no correction;
:geom - geometric mean;
:logmean - arithmetic mean for log-transformed data;
:logvar - variance for log-transformed data ``σ^2_{log}``;
:sd - standard deviation (or σ);
:se - standard error;
:cv - coefficient of variation;
:geocv - coefficient of variation for log-transformed data (``CV = sqrt{exp(σ^2_{log})-1}``);
:lci - lower confidence interval;
:uci - upper confidence interval;
:lmeanci - lower confidence interval for mean;
:umeanci - lower confidence interval for mean;
:median - median,;
:min - minimum;
:max - maximum;
:range - range;
:q1 - lower quartile;
:q3,
:iqr,
:kurt,
:skew,
:harmmean,
:ses,
:sek,
:sum
* :posn - positive (non-negative) number of observations;
* :mean - arithmetic mean;
* :var - variance;
* :bvar - variance with no correction;
* :geom - geometric mean;
* :logmean - arithmetic mean for log-transformed data;
* :logvar - variance for log-transformed data ``σ^2_{log}``;
* :sd - standard deviation (or σ);
* :se - standard error;
* :cv - coefficient of variation;
* :geocv - coefficient of variation for log-transformed data (``CV = sqrt{exp(σ^2_{log})-1}``);
* :lci - lower confidence interval;
* :uci - upper confidence interval;
* :lmeanci - lower confidence interval for mean;
* :umeanci - upper confidence interval for mean;
* :median - median;
* :min - minimum;
* :max - maximum;
* :range - range;
* :q1 - lower quartile;
* :q3 - upper quartile;
* :iqr - interquartile range;
* :kurt - kurtosis;
* :skew - skewness;
* :harmmean - harmonic mean;
* :ses - standard error of skewness;
* :sek - standard error of kurtosis;
* :sum - sum.

"""
function descriptives(data, vars, sort = nothing; kwargs...)
@@ -124,6 +127,7 @@ function descriptives(data, vars, sort = nothing; kwargs...)
if eltype(vars) <: Integer vars = Tables.columnnames(data)[vars] end
if !isnothing(sort)
vars = setdiff(vars, sort)
if length(sort) == 0 sort = nothing end
end
descriptives(dataimport_(data, vars, sort); kwargs...)
end
@@ -211,10 +215,10 @@ function descriptives_(obsvec, kwargs, logstats, cicalk)
end
n_ = length(vec)
if cicalk
if n_ > 1 q = quantile(TDist(n_ - 1), 1 - (1-kwargs[:level])/2) end
if n_ > 1 q = quantile(TDist(n_ - 1), 1 - (1 - kwargs[:level]) / 2) end # add tdist / normal option # add multiple CI ?
end
# skipnonpositive
#logstats = makelogvec #calk logstats
# logstats = makelogvec #calk logstats
if logstats
if kwargs[:skipnonpositive]
logvec = log.(skipnonpositive(obsvec))
@@ -272,21 +276,21 @@ function descriptives_(obsvec, kwargs, logstats, cicalk)
elseif s == :uci
haskey(result, :mean) || begin result[:mean] = sum(vec) / n_ end
haskey(result, :sd) || begin result[:sd] = std(vec; corrected = kwargs[:corrected], mean = result[:mean]) end
result[s] = result[:mean] + q*result[:sd]
result[s] = result[:mean] + q * result[:sd]
elseif s == :lci
haskey(result, :mean) || begin result[:mean] = sum(vec) / n_ end
haskey(result, :sd) || begin result[:sd] = std(vec; corrected = kwargs[:corrected], mean = result[:mean]) end
result[s] = result[:mean] - q*result[:sd]
result[s] = result[:mean] - q * result[:sd]
elseif s == :umeanci
haskey(result, :mean) || begin result[:mean] = sum(vec) / n_ end
haskey(result, :sd) || begin result[:sd] = std(vec; corrected = kwargs[:corrected], mean = result[:mean]) end
haskey(result, :se) || begin result[:se] = result[:sd] / sqrt(n_) end
result[s] = result[:mean] + q*result[:se]
result[s] = result[:mean] + q * result[:se]
elseif s == :lmeanci
haskey(result, :mean) || begin result[:mean] = sum(vec) / n_ end
haskey(result, :sd) || begin result[:sd] = std(vec; corrected = kwargs[:corrected], mean = result[:mean]) end
haskey(result, :se) || begin result[:se] = result[:sd] / sqrt(n_) end
result[s] = result[:mean] - q*result[:se]
result[s] = result[:mean] - q * result[:se]
elseif s == :median
result[s] = median(vec)
elseif s == :min
@@ -403,13 +407,13 @@ function MetidaBase.metida_table_(obj::DataSet{DS}; sort = nothing, stats = noth
stats ⊆ STATLIST || error("Some statistics not known!")
if isa(stats, Symbol) stats = [stats] end
if isnothing(sort)
ressetl = collect(intersect(resset, stats))
ressetl = sortbyvec!(collect(intersect(resset, stats)), collect(keys(first(obj).result)))
else
ressetl = sortbyvec!(collect(intersect(resset, stats)), sort)
end
else
if isnothing(sort)
ressetl = collect(resset)
ressetl = sortbyvec!(collect(resset), collect(keys(first(obj).result)))
else
ressetl = sortbyvec!(collect(resset), sort)
end
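
The last hunk replaces a plain `collect` with `sortbyvec!`, which appears to order the selected statistics by a reference vector (here, the keys of the first result) so that table columns come out in a predictable order. A minimal sketch of that idea follows. It assumes that items missing from the reference go last; `sort_by_reference` is a hypothetical name for illustration and is not the MetidaBase implementation of `sortbyvec!`.

```julia
# Hypothetical helper: order `items` by their position in `reference`.
# Items that do not occur in `reference` are placed at the end.
function sort_by_reference(items::Vector{Symbol}, reference::Vector{Symbol})
    pos = Dict(s => i for (i, s) in enumerate(reference))
    return sort(items; by = s -> get(pos, s, typemax(Int)))
end

# A set has no defined order, so collecting it may yield any permutation.
selected = collect(Set([:sd, :n, :mean]))
ordered  = sort_by_reference(selected, [:n, :mean, :sd, :se])
```

Sorting against the result keys keeps the column order tied to the order in which the statistics were computed, rather than to the iteration order of a set.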