diff --git a/README.txt b/README.md
similarity index 77%
rename from README.txt
rename to README.md
index 0c515b4..d389977 100644
--- a/README.txt
+++ b/README.md
@@ -1,13 +1,16 @@
-= Statsample
+Statsample
+==========
-http://ruby-statsample.rubyforge.org/
+[http://ruby-statsample.rubyforge.org/](http://ruby-statsample.rubyforge.org/)
-== DESCRIPTION:
+DESCRIPTION:
+------------
A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head (June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
Include:
+
* Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
* Imports and exports datasets from and to Excel, CSV and plain text files.
* Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and gamma. Tetrachoric and polychoric correlations are provided by the +statsample-bivariate-extension+ gem.
@@ -24,9 +27,10 @@ Include:
* Creates reports on text, html and rtf, using ReportBuilder gem
* Graphics: Histogram, Boxplot and Scatterplot
-== PRINCIPLES
+PRINCIPLES
+----------
-* Software Design:
+* Software Design:
* One module/class for each type of analysis
* Options can be set as hash on initialize() or as setters methods
* Clean API for interactive sessions
@@ -36,13 +40,14 @@ Include:
* Statistical Design
* Results are tested against text results, SPSS and R outputs.
* Go beyond Null Hypothesis Testing, using confidence intervals and effect sizes when possible
- * (When possible) All references for methods are documented, providing sensible information on documentation
+ * (When possible) All references for methods are documented, providing sensible information on documentation
-== FEATURES:
+FEATURES:
+--------
* Classes for manipulation and storage of data:
* Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
- * Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
+  * Statsample::Dataset: a group of Statsample::Vector, analogous to an Excel spreadsheet or a data frame in R. The base of almost all operations on statsample.
* Statsample::Multiset: multiple datasets with same fields and type of vectors
* Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. You can also create contrasts using Statsample::Anova::Contrast
* Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
@@ -52,10 +57,10 @@ Include:
* Logit Regression: Statsample::Regression::Binomial::Logit
* Probit Regression: Statsample::Regression::Binomial::Probit
* Factorial Analysis algorithms on Statsample::Factor module.
- * Classes for Extraction of factors:
+ * Classes for Extraction of factors:
* Statsample::Factor::PCA
* Statsample::Factor::PrincipalAxis
- * Classes for Rotation of factors:
+ * Classes for Rotation of factors:
* Statsample::Factor::Varimax
* Statsample::Factor::Equimax
* Statsample::Factor::Quartimax
@@ -64,7 +69,7 @@ Include:
* Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retains components as long as the variance in the correlation matrix represents systematic variance.
* Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the relative importance of one predictor relative to another on multiple regression
* Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables
- * Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recomended by Azen & Budescu (2003) link[http://psycnet.apa.org/journals/met/8/2/129/].
+  * Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recommended by Azen & Budescu (2003) [http://psycnet.apa.org/journals/met/8/2/129/](http://psycnet.apa.org/journals/met/8/2/129/).
* Module Statsample::Codification, to help to codify open questions
* Converters to import and export data:
* Statsample::Database : Can create sql to create tables, read and insert data
@@ -73,7 +78,7 @@ Include:
* Statsample::Mx : Write Mx Files
* Statsample::GGobi : Write Ggobi files
* Module Statsample::Crosstab provides functions to create crosstabs for categorical data
-* Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
+* Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
* Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standardized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha if deleted.
* Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
* Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
@@ -81,7 +86,7 @@ Include:
* Module Statsample::Test provides several methods and classes to perform inferential statistics
* Statsample::Test::BartlettSphericity
* Statsample::Test::ChiSquare
- * Statsample::Test::F
+ * Statsample::Test::F
* Statsample::Test::KolmogorovSmirnov (only D value)
* Statsample::Test::Levene
* Statsample::Test::UMannWhitney
@@ -90,85 +95,91 @@ Include:
* Statsample::Graph::Boxplot
* Statsample::Graph::Histogram
* Statsample::Graph::Scatterplot
-* Module Statsample::TimeSeries provides basic support for time series.
+* Module Statsample::TimeSeries provides basic support for time series.
* Gem +statsample-sem+ provides a DSL to R libraries +sem+ and +OpenMx+
-* Close integration with gem reportbuilder, to easily create reports on text, html and rtf formats.
-
-== Examples of use:
+* Close integration with gem `reportbuilder`, to easily create reports on text, html and rtf formats.
-See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/]
+Examples of use:
+--------------
-=== Boxplot
+See multiple examples of use at [http://github.com/clbustos/statsample/tree/master/examples/](http://github.com/clbustos/statsample/tree/master/examples/)
+Boxplot
+-------
+```ruby
require 'statsample'
- ss_analysis(Statsample::Graph::Boxplot) do
+ ss_analysis(Statsample::Graph::Boxplot) do
n=30
a=rnorm(n-1,50,10)
b=rnorm(n, 30,5)
c=rnorm(n,5,1)
a.push(2)
boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
- end
+ end
Statsample::Analysis.run # Open svg file on *nix application defined
-
-=== Correlation matrix
-
+```
+Correlation matrix
+------------------
+```ruby
require 'statsample'
# Note R like generation of random gaussian variable
# and correlation matrix
-
+
ss_analysis("Statsample::Bivariate.correlation_matrix") do
samples=1000
ds=data_frame(
- 'a'=>rnorm(samples),
+ 'a'=>rnorm(samples),
'b'=>rnorm(samples),
'c'=>rnorm(samples),
'd'=>rnorm(samples))
- cm=cor(ds)
+ cm=cor(ds)
summary(cm)
end
-
- Statsample::Analysis.run_batch # Echo output to console
+ Statsample::Analysis.run_batch # Echo output to console
+```
-== REQUIREMENTS:
+REQUIREMENTS:
+-------------
-Optional:
+Optional:
* Plotting: gnuplot and rbgnuplot, SVG::Graph
-* Factorial analysis and polychorical correlation(joint estimate and polychoric series): gsl library and rb-gsl (http://rb-gsl.rubyforge.org/). You should install it using gem install gsl.
-
-Note: Use gsl 1.12.109 or later.
+* Factorial analysis and polychoric correlation (joint estimate and polychoric series): gsl library and rb-gsl [http://rb-gsl.rubyforge.org/](http://rb-gsl.rubyforge.org/). You should install it using `gem install gsl`.
-== RESOURCES
+**Note**: Use gsl 1.12.109 or later.
-* Source code on github: http://github.com/clbustos/statsample
-* API: http://ruby-statsample.rubyforge.org/statsample/
-* Bug report and feature request: http://github.com/clbustos/statsample/issues
-* E-mailing list: http://groups.google.com/group/statsample
+RESOURCES:
+----------
-== INSTALL:
+* Source code on github: [http://github.com/clbustos/statsample](http://github.com/clbustos/statsample)
+* API: [http://ruby-statsample.rubyforge.org/statsample/](http://ruby-statsample.rubyforge.org/statsample/)
+* Bug report and feature request: [http://github.com/clbustos/statsample/issues](http://github.com/clbustos/statsample/issues)
+* E-mailing list: [http://groups.google.com/group/statsample](http://groups.google.com/group/statsample)
- $ sudo gem install statsample
+INSTALL:
+---------
+ `$ sudo gem install statsample`
-On *nix, you should install statsample-optimization to retrieve gems gsl, statistics2 and a C extension to speed some methods.
+On \*nix, you should install statsample-optimization to retrieve the gems gsl, statistics2 and a C extension that speeds up some methods.
-There are available precompiled version for Ruby 1.9 on x86, x86_64 and mingw32 archs.
+Precompiled versions are available for Ruby 1.9 on the x86, x86\_64 and mingw32 architectures.
- $ sudo gem install statsample-optimization
+ `$ sudo gem install statsample-optimization`
-If you use Ruby 1.8, you should compile statsample-optimization, usign parameter --platform ruby
+If you use Ruby 1.8, you should compile statsample-optimization using the parameter `--platform ruby`:
- $ sudo gem install statsample-optimization --platform ruby
+ `$ sudo gem install statsample-optimization --platform ruby`
-If you need to work on Structural Equation Modeling, you could see +statsample-sem+. You need R with +sem+ or +OpenMx+ [http://openmx.psyc.virginia.edu/] libraries installed
+If you need to work on Structural Equation Modeling, see _statsample-sem_. You need R with the _sem_ or _OpenMx_ [http://openmx.psyc.virginia.edu/](http://openmx.psyc.virginia.edu/) libraries installed
- $ sudo gem install statsample-sem
+ `$ sudo gem install statsample-sem`
A setup.rb file is also available:
- sudo gem ruby setup.rb
+  `sudo ruby setup.rb`
-== LICENSE:
+LICENSE:
+-------
GPL-2 (See LICENSE.txt)
diff --git a/features/acf.feature b/features/acf.feature
new file mode 100644
index 0000000..da31ed1
--- /dev/null
+++ b/features/acf.feature
@@ -0,0 +1,31 @@
+Feature: ACF
+
+ As a statistician
+ So that I can evaluate autocorrelation of a series
+ I want to evaluate acf
+
+Background: a timeseries
+
+ Given the following values in a timeseries:
+ | timeseries |
+ | 10 20 30 40 50 60 70 80 90 100 |
+ | 110 120 130 140 150 160 170 180 190 200 |
+
+Scenario: cross-check acf for 10 lags
+ When I provide 10 lags for acf
+ And I calculate acf
+ Then I should get 11 values in resultant acf
+ And I should see "1.0, 0.85, 0.7015037593984963, 0.556015037593985, 0.4150375939849624, 0.2800751879699248, 0.15263157894736842, 0.034210526315789476, -0.07368421052631578, -0.16954887218045114, -0.2518796992481203" as complete series
+
+Scenario: cross-check acf for 5 lags
+ When I provide 5 lags for acf
+ And I calculate acf
+ Then I should get 6 values in resultant acf
+ And I should see "1.0, 0.85, 0.7015037593984963, 0.556015037593985, 0.4150375939849624, 0.2800751879699248" as complete series
+
+Scenario: first value should be 1.0
+ When I provide 2 lags for acf
+ And I calculate acf
+ Then I should get 3 values in resultant acf
+ And I should see 1.0 as first value
+
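The expected series in these scenarios follows from the standard sample autocorrelation formula. A plain-Ruby sketch (independent of statsample; the `acf` helper below is ours, not the gem's) reproduces the first values from the Background table:

```ruby
# Sample autocorrelation: r_k = c_k / c_0, where
# c_k = (1/n) * sum_t (x_t - m)(x_{t+k} - m) and m is the series mean.
def acf(series, max_lags)
  n = series.size
  m = series.sum.to_f / n
  d = series.map { |x| x - m }
  c0 = d.map { |x| x * x }.sum / n
  (0..max_lags).map do |k|
    ck = d[0...n - k].zip(d[k...n]).map { |a, b| a * b }.sum / n
    ck / c0
  end
end

series = (1..20).map { |t| t * 10 }  # 10 20 ... 200, as in the Background table
acf(series, 2)  # first values: 1.0, 0.85, 0.7015..., matching the scenarios
```

Note the result always has `max_lags + 1` entries, since lag 0 is included, which is why the scenarios expect 11 values for 10 lags.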
diff --git a/features/pacf.feature b/features/pacf.feature
new file mode 100644
index 0000000..ff13afd
--- /dev/null
+++ b/features/pacf.feature
@@ -0,0 +1,42 @@
+Feature: PACF
+
+ As a statistician
+ So that I can quickly evaluate partial autocorrelation of a series
+ I want to evaluate pacf
+
+Background: a timeseries
+
+ Given the following values in a timeseries:
+ | timeseries |
+ | 10 20 30 40 50 60 70 80 90 100 |
+ | 110 120 130 140 150 160 170 180 190 200 |
+
+Scenario: check pacf for 10 lags with unbiased
+ When I provide 10 lags for pacf
+ When I provide yw yule walker as method
+ Then I should get Array as resultant output
+ Then I should get 11 values in resultant pacf
+
+Scenario: check pacf for 5 lags with mle
+ When I provide 5 lags for pacf
+ When I provide mle yule walker as method
+ Then I should get Array as resultant output
+ Then I should get 6 values in resultant pacf
+
+Scenario: check first value of pacf
+ When I provide 5 lags for pacf
+ When I provide yw yule walker as method
+ Then I should get Array as resultant output
+ And I should see 1.0 as first value
+
+Scenario: check all values in pacf for 5 lags with mle
+ When I provide 5 lags for pacf
+ When I provide mle yule walker as method
+ Then I should get Array as resultant output
+ And I should see "1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738" as complete series
+
+Scenario: check all values in pacf for 5 lags with unbiased
+ When I provide 5 lags for pacf
+ When I provide yw yule walker as method
+ Then I should get Array as resultant output
+ And I should see "1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522" as complete series
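The mle and unbiased scenarios differ only through the autocovariance denominator: 'mle' divides every c_k by n, while 'yw' divides by (n - k), which scales the lag-1 value by n/(n - 1). A quick plain-Ruby check against the Background series (no statsample dependency):

```ruby
series = (1..20).map { |t| t * 10 }  # the Background series
n = series.size
m = series.sum.to_f / n
d = series.map { |x| x - m }
s0 = d.map { |x| x * x }.sum                         # sum of squared deviations
s1 = d[0...-1].zip(d[1..-1]).map { |a, b| a * b }.sum  # lag-1 cross products

mle_lag1 = (s1 / n) / (s0 / n)        # both denominators n => 0.85
yw_lag1  = (s1 / (n - 1)) / (s0 / n)  # unbiased denominator (n - 1)
[mle_lag1, yw_lag1]  # second values: 0.85 (mle) and 0.8947... (yw), as above
```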
diff --git a/features/step_definitions.rb b/features/step_definitions.rb
new file mode 100644
index 0000000..aefc354
--- /dev/null
+++ b/features/step_definitions.rb
@@ -0,0 +1,37 @@
+require 'statsample'
+include Statsample::TimeSeries
+
+Given /^the following values in a timeseries:$/ do |series|
+ arr = []
+ series.hashes.each do |sequence|
+    arr += sequence['timeseries'].split(' ').map(&:to_i)
+ end
+ @timeseries = arr.to_ts
+end
+
+When /^I provide (\d+) lags for p?acf$/ do |lags|
+ @lags = lags.to_i
+end
+
+When /^I provide (\w+) yule walker as method$/ do |method|
+ @method = method
+end
+
+Then /^I should get (\w+) as resultant output$/ do |klass|
+ @result = @timeseries.pacf(@lags, @method)
+ assert_equal @result.class.to_s, klass
+end
+
+Then /^I should get (\w+) values in resultant p?acf$/ do |values_count|
+ assert_equal @result.size, values_count.to_i
+end
+
+And /^I should see (\d+\.\d) as first value$/ do |first_value|
+ assert_equal @result.first, first_value.to_f
+end
+
+And /^I should see \"(.+)\" as complete series$/ do |series|
+ series = series.split(',').map(&:to_f)
+ assert_equal @result, series
+end
+
diff --git a/features/step_definitions_acf.rb b/features/step_definitions_acf.rb
new file mode 100644
index 0000000..ca5479c
--- /dev/null
+++ b/features/step_definitions_acf.rb
@@ -0,0 +1,9 @@
+require 'statsample'
+require 'debugger'
+include Statsample::TimeSeries
+
+# All shared instance variables and cucumber DSL steps are DRYed up in step_definitions.rb
+And /^I calculate acf$/ do
+ @result = @timeseries.acf(@lags)
+end
+
diff --git a/lib/statsample/analysis.rb b/lib/statsample/analysis.rb
index bf0e7b1..b0036a4 100644
--- a/lib/statsample/analysis.rb
+++ b/lib/statsample/analysis.rb
@@ -3,7 +3,7 @@
module Statsample
# DSL to create analysis without hazzle.
- # * Shortcuts methods to avoid use complete namescapes, many based on R
+ # * Shortcuts methods to avoid use complete namespaces, many based on R
# * Attach/detach vectors to workspace, like R
# == Example
# an1=Statsample::Analysis.store(:first) do
diff --git a/lib/statsample/arima.rb b/lib/statsample/arima.rb
new file mode 100644
index 0000000..70a5dc8
--- /dev/null
+++ b/lib/statsample/arima.rb
@@ -0,0 +1,28 @@
+require 'debugger'
+module Statsample
+ module ARIMA
+ class ARIMA < Statsample::TimeSeries
+
+ def arima(ds, p, i, q)
+ if q.zero?
+ self.ar(p)
+ elsif p.zero?
+        self.ma(q)
+ end
+ end
+
+ def ar(p)
+ #AutoRegressive part of model
+ #http://en.wikipedia.org/wiki/Autoregressive_model#Definition
+ #For finding parameters(to fit), we will use either Yule-walker
+ #or Burg's algorithm(more efficient)
+
+      debugger
+
+ end
+
+ def yule_walker()
+ end
+ end
+ end
+end
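A minimal sketch of the Yule-Walker fit that the `ar(p)` stub aims at: solve R * phi = r, where R is the Toeplitz matrix of autocorrelations r_0..r_{p-1} and r is the vector r_1..r_p. This is standalone illustration (`yule_walker_ar` is our name, not part of the gem's API):

```ruby
require 'matrix'

# Yule-Walker estimate of AR(p) coefficients:
# build autocorrelations r_0..r_p, then solve R * phi = r.
def yule_walker_ar(series, p)
  n = series.size
  m = series.sum.to_f / n
  d = series.map { |x| x - m }
  c = (0..p).map do |k|
    d[0...n - k].zip(d[k...n]).map { |a, b| a * b }.sum / n
  end
  r = c.map { |v| v / c[0] }
  big_r = Matrix.build(p, p) { |i, j| r[(i - j).abs] }  # Toeplitz structure
  (big_r.inverse * Vector.elements(r[1..p])).to_a
end

# For AR(1) the estimate collapses to the lag-1 autocorrelation:
yule_walker_ar((1..20).map { |t| t * 10 }, 1)  # => [0.85]
```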
diff --git a/lib/statsample/multiset.rb b/lib/statsample/multiset.rb
index 88fe595..230fe3c 100644
--- a/lib/statsample/multiset.rb
+++ b/lib/statsample/multiset.rb
@@ -1,212 +1,225 @@
module Statsample
# Multiset joins multiple dataset with the same fields and vectors
- # but with different number of cases.
+ # but with different number of cases.
# This is the base class for stratified and cluster sampling estimation
class Multiset
# Name of fields
attr_reader :fields
# Array with Statsample::Dataset
attr_reader :datasets
+
# To create a multiset
# * Multiset.new(%w{f1 f2 f3}) # define only fields
def initialize(fields)
- @fields=fields
- @datasets={}
+ @fields = fields
+ @datasets = {}
end
- def self.new_empty_vectors(fields,ds_names)
- ms=Multiset.new(fields)
- ds_names.each{|d|
- ms.add_dataset(d,Dataset.new(fields))
- }
- ms
+
+ def self.new_empty_vectors(fields, ds_names)
+ ms = Multiset.new(fields)
+ ds_names.each do |d|
+ ms.add_dataset(d, Dataset.new(fields))
+ end
+ ms
end
+
# Generate a new dataset as a union of partial dataset
# If block given, this is applied to each dataset before union
def union(&block)
- union_field={}
- types={}
- names={}
- labels={}
- each do |k,ds|
+ union_field = {}
+ types = {}
+ names = {}
+ labels = {}
+ each do |k, ds|
if block
- ds=ds.dup
- yield k,ds
+ ds = ds.dup
+ yield k, ds
end
@fields.each do |f|
- union_field[f]||=Array.new
+ union_field[f] ||= Array.new
union_field[f].concat(ds[f].data)
- types[f]||=ds[f].type
- names[f]||=ds[f].name
- labels[f]||=ds[f].labels
+ types[f] ||= ds[f].type
+ names[f] ||= ds[f].name
+ labels[f] ||= ds[f].labels
end
end
-
+
@fields.each do |f|
- union_field[f]=union_field[f].to_vector(types[f])
- union_field[f].name=names[f]
- union_field[f].labels=labels[f]
+ union_field[f] = union_field[f].to_vector(types[f])
+ union_field[f].name = names[f]
+ union_field[f].labels = labels[f]
end
- ds_union=union_field.to_dataset
- ds_union.fields=@fields
+ ds_union = union_field.to_dataset
+ ds_union.fields = @fields
ds_union
end
+
def datasets_names
- @datasets.keys.sort
+ @datasets.keys.sort
end
+
def n_datasets
- @datasets.size
+ @datasets.size
end
- def add_dataset(key,ds)
- if(ds.fields!=@fields)
- raise ArgumentError, "Dataset(#{ds.fields.to_s})must have the same fields of the Multiset(#{@fields})"
+
+ def add_dataset(key, ds)
+ if(ds.fields != @fields)
+ raise ArgumentError, "Dataset(#{ds.fields.to_s})must have the same fields of the Multiset(#{@fields})"
else
- @datasets[key]=ds
+ @datasets[key]=ds
end
end
+
def sum_field(field)
- @datasets.inject(0) {|a,da|
- stratum_name=da[0]
- vector=da[1][field]
- val=yield stratum_name,vector
+ @datasets.inject(0) do |a,da|
+ stratum_name = da[0]
+ vector = da[1][field]
+ val = yield stratum_name,vector
a+val
- }
+ end
end
+
def collect_vector(field)
- @datasets.collect {|k,v|
+ @datasets.collect {|k, v|
yield k, v[field]
}
end
-
+
def each_vector(field)
- @datasets.each {|k,v|
+ @datasets.each {|k, v|
yield k, v[field]
}
end
+
def[](i)
@datasets[i]
end
+
def each(&block)
- @datasets.each {|k,ds|
- next if ds.cases==0
- block.call(k,ds)
- }
+ @datasets.each do |k, ds|
+ next if ds.cases == 0
+ block.call(k, ds)
+ end
end
end
+
class StratifiedSample
class << self
# mean for an array of vectors
def mean(*vectors)
- n_total=0
- means=vectors.inject(0){|a,v|
- n_total+=v.size
- a+v.sum
- }
- means.to_f/n_total
+ n_total = 0
+ means=vectors.inject(0) do |a,v|
+ n_total += v.size
+ a + v.sum
+ end
+ means.to_f / n_total
end
-
+
def standard_error_ksd_wr(es)
- n_total=0
- sum=es.inject(0){|a,h|
- n_total+=h['N']
- a+((h['N']**2 * h['s']**2) / h['n'].to_f)
- }
+ n_total = 0
+ sum=es.inject(0) do |a,h|
+ n_total += h['N']
+ a + ((h['N'] ** 2 * h['s'] ** 2) / h['n'].to_f)
+ end
(1.to_f / n_total)*Math::sqrt(sum)
end
-
-
+
+
def variance_ksd_wr(es)
standard_error_ksd_wr(es)**2
end
def calculate_n_total(es)
- es.inject(0) {|a,h| a+h['N'] }
+ es.inject(0) {|a, h| a+h['N'] }
end
# Source : Cochran (1972)
-
+
def variance_ksd_wor(es)
- n_total=calculate_n_total(es)
- es.inject(0){|a,h|
- val=((h['N'].to_f / n_total)**2) * (h['s']**2 / h['n'].to_f) * (1 - (h['n'].to_f / h['N']))
- a+val
- }
+ n_total = calculate_n_total(es)
+ es.inject(0) do |a, h|
+ val = ((h['N'].to_f / n_total)**2) * (h['s']**2 / h['n'].to_f) * (1 - (h['n'].to_f / h['N']))
+ a+val
+ end
end
+
def standard_error_ksd_wor(es)
Math::sqrt(variance_ksd_wor(es))
end
-
-
-
+
def variance_esd_wor(es)
- n_total=calculate_n_total(es)
- sum=es.inject(0){|a,h|
- val=h['N']*(h['N']-h['n'])*(h['s']**2 / h['n'].to_f)
- a+val
- }
- (1.0/(n_total**2))*sum
+ n_total = calculate_n_total(es)
+ sum=es.inject(0) do |a,h|
+ val = h['N'] * (h['N'] - h['n']) * (h['s']**2 / h['n'].to_f)
+ a + val
+ end
+ (1.0 / (n_total**2)) * sum
end
-
-
+
def standard_error_esd_wor(es)
Math::sqrt(variance_ksd_wor(es))
end
# Based on http://stattrek.com/Lesson6/STRAnalysis.aspx
def variance_esd_wr(es)
- n_total=calculate_n_total(es)
- sum=es.inject(0){|a,h|
- val= ((h['s']**2 * h['N']**2) / h['n'].to_f)
- a+val
- }
- (1.0/(n_total**2))*sum
+ n_total = calculate_n_total(es)
+ sum = es.inject(0) do |a,h|
+ val = ((h['s']**2 * h['N']**2) / h['n'].to_f)
+ a + val
+ end
+ (1.0 / (n_total**2)) * sum
end
+
def standard_error_esd_wr(es)
Math::sqrt(variance_esd_wr(es))
end
-
+
def proportion_variance_ksd_wor(es)
n_total=calculate_n_total(es)
- es.inject(0){|a,h|
- val= (((h['N'].to_f / n_total)**2 * h['p']*(1-h['p'])) / (h['n'])) * (1- (h['n'].to_f / h['N']))
- a+val
- }
+ es.inject(0) do |a,h|
+ val = (((h['N'].to_f / n_total)**2 * h['p']*(1-h['p'])) / (h['n'])) * (1- (h['n'].to_f / h['N']))
+ a + val
+ end
end
+
def proportion_sd_ksd_wor(es)
- Math::sqrt(proportion_variance_ksd_wor(es))
+ Math::sqrt(proportion_variance_ksd_wor(es))
end
-
-
+
def proportion_sd_ksd_wr(es)
- n_total=calculate_n_total(es)
- sum=es.inject(0){|a,h|
- val= (h['N']**2 * h['p']*(1-h['p'])) / h['n'].to_f
- a+val
- }
- Math::sqrt(sum) * (1.0/n_total)
+ n_total = calculate_n_total(es)
+ sum = es.inject(0) do |a,h|
+ val = (h['N']**2 * h['p'] * (1 - h['p'])) / h['n'].to_f
+ a + val
+ end
+ Math::sqrt(sum) * (1.0 / n_total)
end
+
def proportion_variance_ksd_wr(es)
- proportion_variance_ksd_wor(es)**2
+ proportion_variance_ksd_wor(es)**2
end
-
+
def proportion_variance_esd_wor(es)
      n_total = calculate_n_total(es)
-
- sum=es.inject(0){|a,h|
- a=(h['N']**2 * (h['N']-h['n']) * h['p']*(1.0-h['p'])) / ((h['n']-1)*(h['N']-1))
- a+val
- }
+
+ sum = es.inject(0) do |a,h|
+        val = (h['N']**2 * (h['N'] - h['n']) * h['p'] * (1.0 - h['p'])) / ((h['n'] - 1) * (h['N'] - 1))
+ a + val
+ end
Math::sqrt(sum) * (1.0/n_total**2)
end
+
def proportion_sd_esd_wor(es)
- Math::sqrt(proportion_variance_ksd_wor(es))
+ Math::sqrt(proportion_variance_ksd_wor(es))
end
end
-
- def initialize(ms,strata_sizes)
- raise TypeError,"ms should be a Multiset" unless ms.is_a? Statsample::Multiset
- @ms=ms
- raise ArgumentError,"You should put a strata size for each dataset" if strata_sizes.keys.sort!=ms.datasets_names
- @strata_sizes=strata_sizes
- @population_size=@strata_sizes.inject(0) {|a,x| a+x[1]}
- @strata_number=@ms.n_datasets
- @sample_size=@ms.datasets.inject(0) {|a,x| a+x[1].cases}
+ #class singleton methods end above.
+
+ def initialize(ms, strata_sizes)
+ raise TypeError, "ms should be a Multiset" unless ms.is_a? Statsample::Multiset
+ @ms = ms
+ raise ArgumentError, "You should put a strata size for each dataset" if strata_sizes.keys.sort!=ms.datasets_names
+ @strata_sizes = strata_sizes
+ @population_size = @strata_sizes.inject(0) { |a, x| a + x[1] }
+ @strata_number = @ms.n_datasets
+ @sample_size = @ms.datasets.inject(0) { |a,x| a + x[1].cases }
end
# Number of strata
def strata_number
@@ -226,14 +239,14 @@ def stratum_size(h)
@strata_sizes[h]
end
def vectors_by_field(field)
- @ms.datasets.collect{|k,ds|
+ @ms.datasets.collect{|k, ds|
ds[field]
}
end
# Population proportion based on strata
- def proportion(field, v=1)
- @ms.sum_field(field) {|s_name,vector|
- stratum_ponderation(s_name)*vector.proportion(v)
+ def proportion(field, v = 1)
+ @ms.sum_field(field) {|s_name, vector|
+ stratum_ponderation(s_name) * vector.proportion(v)
}
end
# Stratum ponderation.
@@ -242,69 +255,70 @@ def stratum_ponderation(h)
@strata_sizes[h].to_f / @population_size
end
alias_method :wh, :stratum_ponderation
-
+
# Population mean based on strata
def mean(field)
- @ms.sum_field(field) {|s_name,vector|
- stratum_ponderation(s_name)*vector.mean
+ @ms.sum_field(field) {|s_name, vector|
+ stratum_ponderation(s_name) * vector.mean
}
end
# Standard error with estimated population variance and without replacement.
# Source: Cochran (1972)
def standard_error_wor(field)
- es=@ms.collect_vector(field) {|s_n, vector|
- {'N'=>@strata_sizes[s_n],'n'=>vector.size, 's'=>vector.sds}
+ es = @ms.collect_vector(field) {|s_n, vector|
+ {'N' => @strata_sizes[s_n], 'n' => vector.size, 's' => vector.sds}
}
-
+
StratifiedSample.standard_error_esd_wor(es)
end
-
+
# Standard error with estimated population variance and without replacement.
# Source: http://stattrek.com/Lesson6/STRAnalysis.aspx
-
+
def standard_error_wor_2(field)
- sum=@ms.sum_field(field) {|s_name,vector|
- s_size=@strata_sizes[s_name]
- (s_size**2 * (1-(vector.size.to_f / s_size)) * vector.variance_sample / vector.size.to_f)
- }
- (1/@population_size.to_f)*Math::sqrt(sum)
+ sum = @ms.sum_field(field) do |s_name, vector|
+ s_size = @strata_sizes[s_name]
+ (s_size**2 * (1 - (vector.size.to_f / s_size)) * vector.variance_sample / vector.size.to_f)
+ end
+ (1 / @population_size.to_f) * Math::sqrt(sum)
end
-
+
def standard_error_wr(field)
- es=@ms.collect_vector(field) {|s_n, vector|
- {'N'=>@strata_sizes[s_n],'n'=>vector.size, 's'=>vector.sds}
+ es = @ms.collect_vector(field) {|s_n, vector|
+ {'N' => @strata_sizes[s_n], 'n' => vector.size, 's' => vector.sds}
}
-
+
StratifiedSample.standard_error_esd_wr(es)
end
- def proportion_sd_esd_wor(field,v=1)
- es=@ms.collect_vector(field) {|s_n, vector|
- {'N'=>@strata_sizes[s_n],'n'=>vector.size, 'p'=>vector.proportion(v)}
+
+ def proportion_sd_esd_wor(field, v = 1)
+ es = @ms.collect_vector(field) {|s_n, vector|
+ {'N' => @strata_sizes[s_n], 'n' => vector.size, 'p' => vector.proportion(v)}
}
-
+
StratifiedSample.proportion_sd_esd_wor(es)
end
-
- def proportion_standard_error(field,v=1)
- prop=proportion(field,v)
- sum=@ms.sum_field(field) {|s_name,vector|
- nh=vector.size
- s_size=@strata_sizes[s_name]
- (s_size**2 * (1-(nh/s_size)) * prop * (1-prop) / (nh -1 ))
+
+ def proportion_standard_error(field, v = 1)
+ prop=proportion(field, v)
+ sum = @ms.sum_field(field) {|s_name, vector|
+ nh = vector.size
+ s_size = @strata_sizes[s_name]
+ (s_size**2 * (1 - (nh / s_size)) * prop * (1 - prop) / (nh -1))
}
(1.quo(@population_size)) * Math::sqrt(sum)
end
# Cochran(1971), p. 150
- def variance_pst(field,v=1)
- sum=@ms.datasets.inject(0) {|a,da|
- stratum_name=da[0]
- ds=da[1]
- nh=ds.cases.to_f
- s_size=@strata_sizes[stratum_name]
- prop=ds[field].proportion(v)
- a + (((s_size**2 * (s_size-nh)) / (s_size-1))*(prop*(1-prop) / (nh-1)))
- }
- (1/@population_size.to_f ** 2)*sum
+ def variance_pst(field, v = 1)
+ sum=@ms.datasets.inject(0) do |a, da|
+ stratum_name = da[0]
+ ds = da[1]
+ nh = ds.cases.to_f
+ s_size = @strata_sizes[stratum_name]
+ prop = ds[field].proportion(v)
+ a + (((s_size**2 * (s_size - nh)) / (s_size - 1)) * (prop * (1 - prop) / (nh - 1)))
+ end
+ (1 / @population_size.to_f ** 2) * sum
end
end
end
diff --git a/lib/statsample/srs.rb b/lib/statsample/srs.rb
index 6dcc7d2..752cba5 100644
--- a/lib/statsample/srs.rb
+++ b/lib/statsample/srs.rb
@@ -61,7 +61,7 @@ def proportion_confidence_interval_z(p, n_sample, n_population, margin=0.95)
# Proportion confidence interval with x value
# Uses estimated proportion, sample without replacement
- def proportion_confidence_interval(p, sam,pop , x)
+ def proportion_confidence_interval(p, sam, pop, x)
#f=sam.quo(pop)
one_range=x * Math::sqrt((qf(sam, pop) * p * (1-p)).quo(sam-1)) + (1.quo(sam * 2.0))
[p-one_range, p+one_range]
@@ -112,7 +112,7 @@ def proportion_total_sd_ep_wor(prop, sam, pop)
########################
#
- # :SECTION: Mean stimation
+ # :SECTION: Mean estimation
#
########################
diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index d9e7aad..3a236b3 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -1,9 +1,10 @@
+require 'statsample/timeseries/pacf'
module Statsample::TimeSeriesShorthands
# Creates a new Statsample::TimeSeries object
# Argument should be equal to TimeSeries.new
def to_time_series(*args)
- Statsample::TimeSeries::TimeSeries.new(self, :scale, *args)
- end
+ Statsample::TimeSeries::TimeSeries.new(self, :scale, *args)
+ end
alias :to_ts :to_time_series
end
@@ -17,6 +18,7 @@ module TimeSeries
# Collection of data indexed by time.
# The order goes from earliest to latest.
class TimeSeries < Statsample::Vector
+ include Statsample::TimeSeries::Pacf
# Calculates the autocorrelation coefficients of the series.
#
# The first element is always 1, since that is the correlation
@@ -29,10 +31,10 @@ class TimeSeries < Statsample::Vector
# ts.acf # => array with first 21 autocorrelations
# ts.acf 3 # => array with first 3 autocorrelations
#
- def acf maxlags = nil
- maxlags ||= (10 * Math.log10(size)).to_i
+ def acf max_lags = nil
+ max_lags ||= (10 * Math.log10(size)).to_i
- (0..maxlags).map do |i|
+ (0..max_lags).map do |i|
if i == 0
1.0
else
@@ -45,6 +47,16 @@ def acf maxlags = nil
end
end
+ def pacf(max_lags = nil, method = 'yw')
+ #parameters:
+    #max_lags => maximum number of lags for pacf
+ #method => for autocovariance in yule_walker:
+    #'yw' for unbiased Yule-Walker, 'mle' for biased maximum likelihood
+
+ max_lags ||= (10 * Math.log10(size)).to_i
+ Pacf::Pacf.pacf_yw(self, max_lags, method)
+ end
+
# Lags the series by k periods.
#
# The convention is to set the oldest observations (the first ones
@@ -86,7 +98,7 @@ def lag k = 1
# # => [0.69, 0.23, 0.44, 0.71, ...]
#
# ts.diff # => [nil, -0.46, 0.21, 0.27, ...]
- #
+ #
def diff
self - self.lag
end
diff --git a/lib/statsample/timeseries/pacf.rb b/lib/statsample/timeseries/pacf.rb
new file mode 100644
index 0000000..09836b9
--- /dev/null
+++ b/lib/statsample/timeseries/pacf.rb
@@ -0,0 +1,100 @@
+module Statsample
+ module TimeSeries
+ module Pacf
+ class Pacf
+
+ def self.pacf_yw(timeseries, max_lags, method = 'yw')
+ #partial autocorrelation by yule walker equations.
+ #Inspiration: StatsModels
+ pacf = [1.0]
+        (1..max_lags).each do |i|
+ pacf << yule_walker(timeseries, i, method)[-1]
+ end
+ pacf
+ end
+
+ def self.yule_walker(ts, k = 1, method='yw')
+ #From the series, estimates AR(p)(autoregressive) parameter
+        #using Yule-Walker equations. See -
+ #http://en.wikipedia.org/wiki/Autoregressive_moving_average_model
+
+ #parameters:
+ #ts = series
+ #k = order, default = 1
+ #method = can be 'yw' or 'mle'. If 'yw' then it is unbiased, denominator
+ #is (n - k)
+
+ #returns:
+ #rho => autoregressive coefficients
+ ts = ts - ts.mean
+ n = ts.size
+ if method.downcase.eql? 'yw'
+ #unbiased => denominator = (n - k)
+ denom =->(k) { n - k }
+ else
+ #mle
+ #denominator => (n)
+ denom =->(k) { n }
+ end
+ r = Array.new(k + 1) { 0.0 }
+ r[0] = ts.map { |x| x ** 2 }.inject(:+).to_f / denom.call(0).to_f
+
+ 1.upto(k) do |l|
+ r[l] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
+ x.inject(:*)
+ end.inject(:+).to_f / denom.call(l).to_f
+ end
+
+ r_R = toeplitz(r[0...-1])
+
+ mat = Matrix.columns(r_R).inverse()
+ solve_matrix(mat, r[1..r.size])
+ end
+
+ def self.toeplitz(arr)
+ #Generates Toeplitz matrix -
+ #http://en.wikipedia.org/wiki/Toeplitz_matrix
+ #Toeplitz matrix are equal when they are stored in row &
+ #column major
+ #=> arr = [0, 1, 2, 3]
+ #=> result:
+ #[[0, 1, 2, 3],
+ # [1, 0, 1, 2],
+ # [2, 1, 0, 1],
+ # [3, 2, 1, 0]]
+ eplitz_matrix = Array.new(arr.size) { Array.new(arr.size) }
+
+ 0.upto(arr.size - 1) do |i|
+ j = 0
+ index = i
+ while i >= 0 do
+ eplitz_matrix[index][j] = arr[i]
+ j += 1
+ i -= 1
+ end
+ i = index + 1; k = 1
+ while i < arr.size do
+ eplitz_matrix[index][j] = arr[k]
+ i += 1; j += 1; k += 1
+ end
+ end
+ eplitz_matrix
+ end
+
+ def self.solve_matrix(matrix, out_vector)
+ solution_vector = Array.new(out_vector.size, 0)
+ matrix = matrix.to_a
+ k = 0
+ matrix.each do |row|
+ row.each_with_index do |element, i|
+ solution_vector[k] += element * 1.0 * out_vector[i]
+ end
+ k += 1
+ end
+ solution_vector
+ end
+
+ end
+ end
+ end
+end
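The `toeplitz` helper above fills the matrix with two index-walking loops. Since element (i, j) of this matrix depends only on |i - j|, the same construction can be sketched in a few lines of plain Ruby (a standalone illustration, not the statsample implementation):

```ruby
# Symmetric Toeplitz matrix from a vector: element (i, j) is arr[(i - j).abs].
def toeplitz(arr)
  n = arr.size
  Array.new(n) { |i| Array.new(n) { |j| arr[(i - j).abs] } }
end

toeplitz([0, 1, 2, 3])
# => [[0, 1, 2, 3],
#     [1, 0, 1, 2],
#     [2, 1, 0, 1],
#     [3, 2, 1, 0]]
```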
diff --git a/statsample.gemspec b/statsample.gemspec
new file mode 100644
index 0000000..34798d9
--- /dev/null
+++ b/statsample.gemspec
@@ -0,0 +1,30 @@
+Gem::Specification.new do |s|
+ s.name = 'statsample'
+ s.version = '1.1.0.2013'
+ s.date = '2013-04-28'
+ s.summary = 'StatSample - Ruby Statistical library'
+ s.description = "A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 and 1.9.3, and JRuby 1.4 (Ruby 1.8.7 compatible)."
+ s.authors = ["Claudio Bustos"]
+ s.email = ["clbustos@gmail.com"]
+ s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
+ s.files = `git ls-files -- lib/*`.split("\n")
+ s.extra_rdoc_files += ['History.txt', 'Manifest.txt', 'README.md', 'references.txt']
+
+ s.homepage = "http://github.com/clbustos/statsample"
+ s.require_path = "lib"
+ s.rubyforge_project = "ruby-statsample"
+ s.test_files = `git ls-files -- test features`.split("\n")
+
+ DEPENDENCIES = [{:gem => "spreadsheet", :version => "~> 0.8.5"}, {:gem => "reportbuilder", :version => "~> 1.4.2"},
+ {:gem => "minimization", :version => "~> 0.2.1"}, {:gem => "fastercsv", :version => "~> 1.5.5"},
+ {:gem => "dirty-memoize", :version => "~> 0.0.4"}, {:gem => "extendmatrix", :version => "~> 0.3.1"},
+ {:gem => "statsample-bivariate-extension", :version => "~> 1.1.0"}, {:gem => "rserve-client", :version => "~> 0.3.0"},
+ {:gem => "rubyvis", :version => "~> 0.5.2"}, {:gem => "gettext", :version => "~> 2.3.9"},
+ {:gem => "mocha", :version => "~> 0.14.0"}, {:gem => "hoe-git", :version => "~> 1.5.0"},
+ {:gem => "minitest", :version => "~> 4.2"}, {:gem => "shoulda", :version => "~> 3.5.0"},
+ {:gem => "cucumber", :version => "~>1.3.3"}
+ ]
+ DEPENDENCIES.each do |dependency|
+ s.add_dependency(dependency[:gem], dependency[:version])
+ end
+end
diff --git a/test/helpers_tests.rb b/test/helpers_tests.rb
index eca4b75..8eba53b 100644
--- a/test/helpers_tests.rb
+++ b/test/helpers_tests.rb
@@ -25,10 +25,8 @@ def self.should_with_gsl(name,&block)
else
skip("Requires GSL")
end
-
end
end
-
end
end
diff --git a/test/test_pacf.rb b/test/test_pacf.rb
new file mode 100644
index 0000000..278214b
--- /dev/null
+++ b/test/test_pacf.rb
@@ -0,0 +1,52 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+class StatsampleTimeSeriesPacfTestCase < MiniTest::Unit::TestCase
+ context(Statsample::TimeSeries) do
+ include Statsample::TimeSeries
+ setup do
+ @ts = (1..20).map { |x| x * 10 }.to_ts
+ #set up a lambda to call pacf with variable lags and methods
+ @pacf_proc =->(k, method) { @ts.pacf(k, method) }
+ end
+
+ should "return correct pacf size for lags = 5" do
+ assert_equal @pacf_proc.call(5, 'yw').size, 6
+ assert_equal @pacf_proc.call(5, 'mle').size, 6
+ #first element is 1.0
+ end
+
+ should "return correct pacf size for lags = 10" do
+ assert_equal @pacf_proc.call(10, 'yw').size, 11
+ assert_equal @pacf_proc.call(10, 'mle').size, 11
+ #first element is 1.0
+ end
+
+ should "have first element as 1.0" do
+ assert_equal @pacf_proc.call(10, 'yw')[0], 1.0
+ assert_equal @pacf_proc.call(10, 'mle')[0], 1.0
+ end
+
+ should "give correct pacf results for unbiased yule-walker" do
+ result_10 = [1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522, -0.15470588235294147, -0.17938011883732036, -0.2151192288178601, -0.2707082833133261, -0.3678160919540221]
+ result_5 = [1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522]
+ assert_equal @pacf_proc.call(10, 'yw'), result_10
+ assert_equal @pacf_proc.call(5, 'yw'), result_5
+
+ #Checking for lag = (1..10)
+ 1.upto(10) do |i|
+ assert_equal @pacf_proc.call(i, 'yw'), result_10[0..i]
+ end
+ end
+
+ should "give correct pacf results for mle yule-walker" do
+ result_10 = [1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738, -0.0776780981161499, -0.07744984679625189, -0.0765803323191094, -0.07480650005932366, -0.07179435184923755]
+ result_5 = [1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738]
+ assert_equal @pacf_proc.call(10, 'mle'), result_10
+ assert_equal @pacf_proc.call(5, 'mle'), result_5
+
+ #Checking for lag = (1..10)
+ 1.upto(10) do |i|
+ assert_equal @pacf_proc.call(i, 'mle'), result_10[0..i]
+ end
+ end
+ end
+end
diff --git a/test/test_regression2.rb b/test/test_regression2.rb
new file mode 100644
index 0000000..099c340
--- /dev/null
+++ b/test/test_regression2.rb
@@ -0,0 +1,235 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+
+class StatsampleRegressionTestCase < MiniTest::Unit::TestCase
+ include Statsample::Shorthand
+ # context "Example with missing data" do
+
+ def setup
+ @x=[0.285714285714286, 0.114285714285714, 0.314285714285714, 0.2, 0.2, 0.228571428571429, 0.2, 0.4, 0.714285714285714, 0.285714285714286, 0.285714285714286, 0.228571428571429, 0.485714285714286, 0.457142857142857, 0.257142857142857, 0.228571428571429, 0.285714285714286, 0.285714285714286, 0.285714285714286, 0.142857142857143, 0.285714285714286, 0.514285714285714, 0.485714285714286, 0.228571428571429, 0.285714285714286, 0.342857142857143, 0.285714285714286, 0.0857142857142857].to_scale
+
+ @y=[nil, 0.233333333333333, nil, 0.266666666666667, 0.366666666666667, nil, 0.333333333333333, 0.3, 0.666666666666667, 0.0333333333333333, 0.333333333333333, nil, nil, 0.533333333333333, 0.433333333333333, 0.4, 0.4, 0.5, 0.4, 0.266666666666667, 0.166666666666667, 0.666666666666667, 0.433333333333333, 0.166666666666667, nil, 0.4, 0.366666666666667, nil].to_scale
+ @ds={'x'=>@x,'y'=>@y}.to_dataset
+ @lr=Statsample::Regression::Multiple::RubyEngine.new(@ds,'y')
+ end
+
+ def test_correct_values
+ assert_in_delta(0.455,@lr.r2,0.001)
+ assert_in_delta(0.427,@lr.r2_adjusted, 0.001)
+ assert_in_delta(0.1165,@lr.se_estimate,0.001)
+ assert_in_delta(15.925,@lr.f,0.0001)
+ assert_in_delta(0.675, @lr.standarized_coeffs['x'],0.001)
+ assert_in_delta(0.778, @lr.coeffs['x'],0.001, "coeff x")
+ assert_in_delta(0.132, @lr.constant,0.001,"constant")
+ assert_in_delta(0.195, @lr.coeffs_se['x'],0.001,"coeff x se")
+ assert_in_delta(0.064, @lr.constant_se,0.001,"constant se")
+ end
+ #end
+ def test_error
+ samples=100
+
+ a,b=rand,rand
+
+ x1=samples.times.map { rand}.to_scale
+ x2=samples.times.map {rand}.to_scale
+ x3=samples.times.map {|i| x1[i]*(1+a)+x2[i]*(1+b)}.to_scale
+ y=samples.times.map {|i| x1[i]+x2[i]+x3[i]+rand}.to_scale
+
+ ds={'x1'=>x1,'x2'=>x2,'x3'=>x3,'y'=>y}.to_dataset
+
+ assert_raise(Statsample::Regression::LinearDependency) {
+ Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+ }
+ end
+ def test_parameters
+ @x=[13,20,10,33,15].to_vector(:scale)
+ @y=[23,18,35,10,27].to_vector(:scale)
+ reg=Statsample::Regression::Simple.new_from_vectors(@x,@y)
+ _test_simple_regression(reg)
+ ds={'x'=>@x,'y'=>@y}.to_dataset
+ reg=Statsample::Regression::Simple.new_from_dataset(ds,'x','y')
+ _test_simple_regression(reg)
+ reg=Statsample::Regression.simple(@x,@y)
+ _test_simple_regression(reg)
+
+ end
+ def _test_simple_regression(reg)
+
+ assert_in_delta(40.009, reg.a,0.001)
+ assert_in_delta(-0.957, reg.b,0.001)
+ assert_in_delta(4.248, reg.standard_error,0.002)
+ assert(reg.summary)
+ end
+
+ def test_summaries
+ a=10.times.map{rand(100)}.to_scale
+ b=10.times.map{rand(100)}.to_scale
+ y=10.times.map{rand(100)}.to_scale
+ ds={'a'=>a,'b'=>b,'y'=>y}.to_dataset
+ lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+ assert(lr.summary.size>0)
+ end
+ def test_multiple_dependent
+ complete=Matrix[
+ [1,0.53,0.62,0.19,-0.09,0.08,0.02,-0.12,0.08],
+ [0.53,1,0.61,0.23,0.1,0.18,0.02,-0.1,0.15],
+ [0.62,0.61,1,0.03,0.1,0.12,0.03,-0.06,0.12],
+ [0.19,0.23,0.03,1,-0.02,0.02,0,-0.02,-0.02],
+ [-0.09,0.1,0.1,-0.02,1,0.05,0.06,0.18,0.02],
+ [0.08,0.18,0.12,0.02,0.05,1,0.22,-0.07,0.36],
+ [0.02,0.02,0.03,0,0.06,0.22,1,-0.01,-0.05],
+ [-0.12,-0.1,-0.06,-0.02,0.18,-0.07,-0.01,1,-0.03],
+ [0.08,0.15,0.12,-0.02,0.02,0.36,-0.05,-0.03,1]]
+ complete.extend Statsample::CovariateMatrix
+ complete.fields=%w{adhd cd odd sex age monly mwork mage poverty}
+
+ lr=Statsample::Regression::Multiple::MultipleDependent.new(complete, %w{adhd cd odd})
+
+
+ assert_in_delta(0.197, lr.r2yx,0.001)
+ assert_in_delta(0.197, lr.r2yx_covariance,0.001)
+ assert_in_delta(0.07, lr.p2yx,0.001)
+
+ end
+
+ def test_multiple_regression_pairwise_2
+ @a=[1,3,2,4,3,5,4,6,5,7,3,nil,3,nil,3].to_vector(:scale)
+ @b=[3,3,4,4,5,5,6,6,4,4,2,2,nil,6,2].to_vector(:scale)
+ @c=[11,22,30,40,50,65,78,79,99,100,nil,3,7,nil,7].to_vector(:scale)
+ @y=[3,4,5,6,7,8,9,10,20,30,30,40,nil,50,nil].to_vector(:scale)
+ ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+ lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+ assert_in_delta(2407.436,lr.sst,0.001)
+ assert_in_delta(0.752,lr.r,0.001, "pairwise r")
+ assert_in_delta(0.565,lr.r2,0.001)
+ assert_in_delta(1361.130,lr.ssr,0.001)
+ assert_in_delta(1046.306,lr.sse,0.001)
+ assert_in_delta(3.035,lr.f,0.001)
+ end
+
+
+ def test_multiple_regression_gsl
+ if Statsample.has_gsl?
+ @a=[1,3,2,4,3,5,4,6,5,7].to_vector(:scale)
+ @b=[3,3,4,4,5,5,6,6,4,4].to_vector(:scale)
+ @c=[11,22,30,40,50,65,78,79,99,100].to_vector(:scale)
+ @y=[3,4,5,6,7,8,9,10,20,30].to_vector(:scale)
+ ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+ lr=Statsample::Regression::Multiple::GslEngine.new(ds,'y')
+ assert(lr.summary.size>0)
+ model_test(lr,'gsl')
+ predicted=[1.7857, 6.0989, 3.2433, 7.2908, 4.9667, 10.3428, 8.8158, 10.4717, 23.6639, 25.3198]
+ c_predicted=lr.predicted
+ predicted.each_index{|i|
+ assert_in_delta(predicted[i],c_predicted[i],0.001)
+ }
+ residuals=[1.2142, -2.0989, 1.7566, -1.29085, 2.033, -2.3428, 0.18414, -0.47177, -3.66395, 4.6801]
+ c_residuals=lr.residuals
+ residuals.each_index{|i|
+ assert_in_delta(residuals[i],c_residuals[i],0.001)
+ }
+ else
+ skip "Regression::Multiple::GslEngine not tested (no Gsl)"
+ end
+ end
+
+
+
+ def model_test_matrix(lr,name='undefined')
+
+ stan_coeffs={'a'=>0.151,'b'=>-0.547,'c'=>0.997}
+ unstan_coeffs={'a'=>0.695, 'b'=>-4.286, 'c'=>0.266}
+
+ unstan_coeffs.each_key{|k|
+ assert_in_delta(unstan_coeffs[k], lr.coeffs[k],0.001,"b coeffs - #{name}")
+ }
+
+ stan_coeffs.each_key{|k|
+ assert_in_delta(stan_coeffs[k], lr.standarized_coeffs[k],0.001, "beta coeffs - #{name}")
+ }
+
+ assert_in_delta(11.027,lr.constant,0.001)
+
+ assert_in_delta(0.955,lr.r,0.001)
+ assert_in_delta(0.913,lr.r2,0.001)
+
+ assert_in_delta(20.908, lr.f,0.001)
+ assert_in_delta(0.001, lr.probability, 0.001)
+ assert_in_delta(0.226,lr.tolerance("a"),0.001)
+
+ coeffs_se={"a"=>1.171,"b"=>1.129,"c"=>0.072}
+
+
+
+ ccoeffs_se=lr.coeffs_se
+ coeffs_se.each_key{|k|
+ assert_in_delta(coeffs_se[k],ccoeffs_se[k],0.001)
+ }
+ coeffs_t={"a"=>0.594,"b"=>-3.796,"c"=>3.703}
+ ccoeffs_t=lr.coeffs_t
+ coeffs_t.each_key{|k|
+ assert_in_delta(coeffs_t[k], ccoeffs_t[k],0.001)
+ }
+
+ assert_in_delta(639.6,lr.sst,0.001)
+ assert_in_delta(583.76,lr.ssr,0.001)
+ assert_in_delta(55.840,lr.sse,0.001)
+ assert(lr.summary.size>0, "#{name} without summary")
+ end
+ def model_test(lr,name='undefined')
+ model_test_matrix(lr,name)
+ assert_in_delta(4.559, lr.constant_se,0.001)
+ assert_in_delta(2.419, lr.constant_t,0.001)
+
+ assert_in_delta(1.785,lr.process([1,3,11]),0.001)
+ end
+ def test_regression_matrix
+ @a=[1,3,2,4,3,5,4,6,5,7].to_vector(:scale)
+ @b=[3,3,4,4,5,5,6,6,4,4].to_vector(:scale)
+ @c=[11,22,30,40,50,65,78,79,99,100].to_vector(:scale)
+ @y=[3,4,5,6,7,8,9,10,20,30].to_vector(:scale)
+ ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+ cor=Statsample::Bivariate.correlation_matrix(ds)
+
+ lr=Statsample::Regression::Multiple::MatrixEngine.new(cor,'y', :y_mean=>@y.mean, :x_mean=>{'a'=>ds['a'].mean, 'b'=>ds['b'].mean, 'c'=>ds['c'].mean}, :cases=>@a.size, :y_sd=>@y.sd , :x_sd=>{'a' => @a.sd, 'b' => @b.sd, 'c' => @c.sd})
+ assert_nil(lr.constant_se)
+ assert_nil(lr.constant_t)
+ model_test_matrix(lr, "correlation matrix")
+
+ covariance=Statsample::Bivariate.covariance_matrix(ds)
+ lr=Statsample::Regression::Multiple::MatrixEngine.new(covariance,'y', :y_mean=>@y.mean, :x_mean=>{'a'=>ds['a'].mean, 'b'=>ds['b'].mean, 'c'=>ds['c'].mean}, :cases=>@a.size)
+ assert(lr.summary.size>0)
+
+ model_test(lr , "covariance matrix")
+ end
+ def test_regression_rubyengine
+ @a=[nil,1,3,2,4,3,5,4,6,5,7].to_vector(:scale)
+ @b=[nil,3,3,4,4,5,5,6,6,4,4].to_vector(:scale)
+ @c=[nil,11,22,30,40,50,65,78,79,99,100].to_vector(:scale)
+ @y=[nil,3,4,5,6,7,8,9,10,20,30].to_vector(:scale)
+ ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+ lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+ assert_equal(11, lr.total_cases)
+ assert_equal(10, lr.valid_cases)
+ model_test(lr, 'rubyengine with missing data')
+
+ predicted=[nil,1.7857, 6.0989, 3.2433, 7.2908, 4.9667, 10.3428, 8.8158, 10.4717, 23.6639, 25.3198]
+ c_predicted = lr.predicted
+ predicted.each_index do |i|
+ if c_predicted[i].nil?
+ assert(predicted[i].nil?, "Actual #{i} is nil, but expected #{predicted[i]}")
+ else
+ assert_in_delta(predicted[i], c_predicted[i], 0.001)
+ end
+ end
+ residuals=[nil,1.2142, -2.0989, 1.7566, -1.29085, 2.033, -2.3428, 0.18414, -0.47177, -3.66395, 4.6801]
+ c_residuals=lr.residuals
+ residuals.each_index do |i|
+ if c_residuals[i].nil?
+ assert(residuals[i].nil?)
+ else
+ assert_in_delta(residuals[i],c_residuals[i],0.001)
+ end
+ end
+
+ end
+end
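The expectations in `_test_simple_regression` (a ≈ 40.009, b ≈ -0.957) follow directly from the ordinary least-squares formulas b = Sxy/Sxx and a = ȳ - b·x̄. A dependency-free sketch over the same data from `test_parameters` reproduces them:

```ruby
# Simple OLS regression on the test_parameters data:
# slope b = Sxy / Sxx, intercept a = mean(y) - b * mean(x).
x = [13, 20, 10, 33, 15]
y = [23, 18, 35, 10, 27]

mean = ->(v) { v.inject(:+).to_f / v.size }
mx, my = mean.(x), mean.(y)

sxy = x.zip(y).map { |xi, yi| (xi - mx) * (yi - my) }.inject(:+)
sxx = x.map { |xi| (xi - mx) ** 2 }.inject(:+)

b = sxy / sxx    # slope, about -0.957
a = my - b * mx  # intercept, about 40.009
```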
diff --git a/test/test_srs.rb b/test/test_srs.rb
index 1d18cf9..a486423 100644
--- a/test/test_srs.rb
+++ b/test/test_srs.rb
@@ -6,4 +6,50 @@ def test_std_error
assert_equal(108,Statsample::SRS.estimation_n(0.05,0.5,150,0.95).to_i)
assert_in_delta(0.0289,Statsample::SRS.proportion_sd_kp_wor(0.5,100,150),0.001)
end
+
+ def test_fpc
+ sample_size = 80
+ population_size = 10
+
+ assert_equal -7, Statsample::SRS.fpc_var(sample_size, population_size).to_i
+ end
+
+ def test_qf
+ sample_size = 80
+ population_size = 10
+
+ assert_equal -7, Statsample::SRS.qf(sample_size, population_size).to_i
+ #non sample fraction = 1 - sample fraction
+ assert_equal -7, 1 - sample_size.quo(population_size)
+ end
+
+ def test_proportion_confidence_interval_t
+ proportion = 59
+ n_sample, n_population = 15, 10
+ confidence_interval = Statsample::SRS.proportion_confidence_interval_t(proportion, n_sample, n_population).map(&:to_f)
+
+ assert_in_delta 35.2559, confidence_interval[0], 0.001
+ assert_in_delta 82.7440, confidence_interval[1], 0.001
+ end
+
+ def test_proportion_confidence_interval_z
+ #estimated proportion with normal distribution.
+ proportion = 59
+ n_sample, n_population = 15, 10
+ confidence_interval = Statsample::SRS.proportion_confidence_interval_z(proportion, n_sample, n_population).map(&:to_f)
+
+ assert_in_delta 37.2991, confidence_interval[0], 0.001
+ assert_in_delta 80.7008, confidence_interval[1], 0.001
+ end
+
+ def test_proportion_sd_kp_wor
+ #standard deviation without replacement strategy for known proportion
+ assert_in_delta 11.6979, Statsample::SRS.proportion_sd_kp_wor(40, -30, 20), 0.001
+ end
+
+ def test_proportion_sd_ep_wr
+ #standard deviation with replacement for estimated proportion
+ assert_in_delta 35.5527, Statsample::SRS.proportion_sd_ep_wr(80, -4), 0.001
+ assert_in_delta 38.9700, Statsample::SRS.proportion_sd_ep_wr(68, -2), 0.001
+ end
end
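The `-7` expectations in `test_fpc` and `test_qf` are consistent with the textbook simple-random-sampling definitions fpc_var = (N - n)/(N - 1) and qf = 1 - n/N. The helpers below are hypothetical reimplementations making the arithmetic explicit, not statsample's code:

```ruby
# Finite population correction (variance form) and the complement of the
# sampling fraction, assuming the textbook SRS definitions.
# n = sample size, pop = population size.
def fpc_var(n, pop)
  (pop - n).to_f / (pop - 1)
end

def qf(n, pop)
  1 - n.to_f / pop
end

fpc_var(80, 10).to_i # => -7 (-70/9 truncated toward zero)
qf(80, 10).to_i      # => -7
```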
diff --git a/test/test_stratified.rb b/test/test_stratified.rb
index eb8ef45..7f0ef8d 100644
--- a/test/test_stratified.rb
+++ b/test/test_stratified.rb
@@ -3,6 +3,11 @@
class StatsampleStratifiedTestCase < MiniTest::Unit::TestCase
def initialize(*args)
+ @es = [
+ {"N" => 5, "s" => 3, "n" => 2},
+ {"N" => 10, "s" => 5, "n" => 2},
+ {"N" => 30, "s" => 32, "n" => 12}
+ ]
super
end
def test_mean
@@ -14,4 +19,69 @@ def test_mean
popv=pop.to_vector(:scale)
assert_equal(popv.mean,Statsample::StratifiedSample.mean(av,bv))
end
+
+ def test_standard_error_ksd_wr
+ # Standard error for known sample design with replacement
+ es = [
+ {"N" => 5, "s" => 3, "n" => 2},
+ {"N" => 10, "s" => 5, "n" => 2},
+ {"N" => 30, "s" => 32, "n" => 12}
+ ]
+ assert_in_delta 6.21279, Statsample::StratifiedSample.standard_error_ksd_wr(es), 0.001
+ end
+
+ def test_variance_ksd_wr
+ # variance of known sample design with replacement
+ result = Statsample::StratifiedSample.variance_ksd_wr(@es)
+
+ assert_in_delta 38.5987, result, 0.001
+ assert_in_delta Statsample::StratifiedSample.standard_error_ksd_wr(@es), Math::sqrt(result), 0.001
+ end
+
+ def test_calculate_n_total
+ # calculates total 'n' from the hash
+
+ assert_equal 45, Statsample::StratifiedSample.calculate_n_total(@es)
+ assert_equal 15, Statsample::StratifiedSample.calculate_n_total(@es.first(2))
+ assert_equal 40, Statsample::StratifiedSample.calculate_n_total(@es.last(2))
+ end
+
+ def test_variance_ksd_wor
+ # variance of known sample design without replacement
+
+ assert_in_delta 23.2827, Statsample::StratifiedSample.variance_ksd_wor(@es), 0.001
+ assert_in_delta 4.7444, Statsample::StratifiedSample.variance_ksd_wor(@es.first(2)), 0.001
+ end
+
+ def test_standard_error_ksd_wor
+ # standard error for known sample design without replacement
+ assert_in_delta 4.8252, Statsample::StratifiedSample.standard_error_ksd_wor(@es), 0.001
+
+ #for fragment sample design
+ assert_in_delta 5.4244, Statsample::StratifiedSample.standard_error_ksd_wor(@es.last 2), 0.001
+ end
+
+ def test_variance_esd_wor
+ # variance of estimated sample design without replacement strategry
+ assert_in_delta 23.2827, Statsample::StratifiedSample.variance_esd_wor(@es), 0.001
+ end
+
+ def test_standard_error_esd_wor
+ # Standard error for estimated sample design without replacement
+ assert_in_delta 4.82521, Statsample::StratifiedSample.standard_error_esd_wor(@es), 0.001
+ end
+
+ def test_standard_error_esd_wr
+ res = Statsample::StratifiedSample.standard_error_esd_wr(@es)
+ assert_in_delta 6.21279, res, 0.001
+ assert_in_delta Statsample::StratifiedSample.variance_esd_wr(@es), res ** 2, 0.001
+ end
+
+ def test_variance_esd_wr
+ # variance for estimated sample design with replacement
+
+ assert_in_delta 38.5987, Statsample::StratifiedSample.variance_esd_wr(@es), 0.001
+ end
+
end
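The 38.5987 expected by `test_variance_ksd_wr` agrees with the standard with-replacement stratified variance, V = Σ_h (N_h/N)² · s_h² / n_h. A plain-Ruby check over the same strata (assumed formula, not necessarily statsample's implementation):

```ruby
# Stratified variance, known sample design, with replacement:
#   V = sum over strata h of (N_h / N)^2 * s_h^2 / n_h
es = [
  { "N" => 5,  "s" => 3,  "n" => 2  },
  { "N" => 10, "s" => 5,  "n" => 2  },
  { "N" => 30, "s" => 32, "n" => 12 }
]

n_total = es.map { |h| h["N"] }.inject(:+).to_f

variance = es.map { |h| (h["N"] / n_total) ** 2 * h["s"] ** 2 / h["n"] }.inject(:+)

variance            # ~38.5987, matching the variance_ksd_wr expectation
Math.sqrt(variance) # ~6.2128, matching the standard_error_ksd_wr expectation
```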
diff --git a/test/test_test_f.rb b/test/test_test_f.rb
index b7cc4a8..9145d7a 100644
--- a/test/test_test_f.rb
+++ b/test/test_test_f.rb
@@ -1,33 +1,35 @@
require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
class StatsampleTestFTestCase < MiniTest::Unit::TestCase
- context(Statsample::Test::F) do
- setup do
- @ssb=84
- @ssw=68
- @df_num=2
- @df_den=15
- @f=Statsample::Test::F.new(@ssb.quo(@df_num),@ssw.quo(@df_den), @df_num, @df_den)
- end
- should "have #f equal to msb/msw" do
- assert_equal((@ssb.quo(@df_num)).quo(@ssw.quo(@df_den)), @f.f)
- end
- should "have df total equal to df_num+df_den" do
- assert_equal(@df_num + @df_den, @f.df_total)
- end
- should "have probability near 0.002" do
- assert_in_delta(0.002, @f.probability, 0.0005)
- end
- should "be coerced into float" do
- assert_equal(@f.to_f, @f.f)
- end
-
- context("method summary") do
- setup do
- @summary=@f.summary
- end
- should "have size > 0" do
- assert(@summary.size>0)
- end
- end
+ def setup
+ @ssb=84
+ @ssw=68
+ @df_num=2
+ @df_den=15
+ @f=Statsample::Test::F.new(@ssb.quo(@df_num),@ssw.quo(@df_den), @df_num, @df_den)
+ @summary=@f.summary
+ end
+
+ def test_f_value
+ #should have #f equal to msb/msw
+ assert_equal((@ssb.quo(@df_num)).quo(@ssw.quo(@df_den)), @f.f)
+ end
+
+ def test_df_with_num_and_den
+ # df total should be equal to df_num+df_den
+ assert_equal(@df_num + @df_den, @f.df_total)
+ end
+
+ def test_probability
+ #should have probability near 0.002
+ assert_in_delta(0.002, @f.probability, 0.0005)
+ end
+
+ def test_float_coercion
+ #should be coerced into float
+ assert_equal(@f.to_f, @f.f)
+ end
+
+ def test_summary_size
+ assert(@summary.size>0)
end
end
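`test_f_value` asserts that `#f` equals msb/msw; with ssb = 84, ssw = 68 and df = (2, 15) that ratio is about 9.26, which a short plain-Ruby check confirms:

```ruby
# F statistic as a ratio of mean squares: F = (ssb / df_num) / (ssw / df_den).
ssb, ssw = 84, 68
df_num, df_den = 2, 15

msb = ssb.to_f / df_num # between-groups mean square, 42.0
msw = ssw.to_f / df_den # within-groups mean square, ~4.533
f = msb / msw           # ~9.2647
```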
diff --git a/test/test_tseries.rb b/test/test_tseries.rb
index 35f2cc9..9306ae0 100644
--- a/test/test_tseries.rb
+++ b/test/test_tseries.rb
@@ -24,8 +24,17 @@ def test_acf
end
def test_lag
- assert_in_delta 16.66, @xiu.lag[@xiu.size - 1], 0.001
- assert_in_delta 16.36, @xiu.lag(2)[@xiu.size - 1], 0.001
+ #test of default lag
+ lag1 = @xiu.lag
+
+ assert_in_delta 16.66, lag1[lag1.size - 1], 0.001
+ assert_in_delta 16.36, lag1[lag1.size - 2], 0.001
+
+ #test with different lagging unit
+ lag2 = @xiu.lag(2)
+
+ assert_in_delta 16.36, lag2[lag2.size - 1], 0.001
+ assert_in_delta 16.56, lag2[lag2.size - 2], 0.001
end
def test_delta
diff --git a/test/test_wald.rb b/test/test_wald.rb
new file mode 100644
index 0000000..30fb87b
--- /dev/null
+++ b/test/test_wald.rb
@@ -0,0 +1,84 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+
+class StatsampleTestWaldTest < MiniTest::Unit::TestCase
+
+ # Wald
+ include Statsample::Shorthand
+ include Statsample::TimeSeries
+ def setup
+ @wald = 100.times.map do
+ rand(100)
+ end.to_ts
+ end
+
+ def generate_acf_series(lags)
+ @wald.acf(lags)
+ end
+
+ def compute_mean(acf)
+ acf.inject(:+) / acf.size
+ end
+
+ def compute_variance(acf)
+ result = acf.map { |x| x ** 2 }.inject(:+) / acf.size
+ return result.to_f
+ end
+
+ def generate_acf_summation(lags)
+ acf_series = @wald.acf(lags)
+ acf_series.map do |x|
+ x ** 2
+ end.inject(:+)
+ end
+
+ def compute_chi_df(lags)
+ observed = Matrix[@wald.acf(lags)]
+ Statsample::Test.chi_square(observed).df
+ end
+
+ def test_with_5_lags
+ acf = generate_acf_series(5)
+ assert_equal generate_acf_summation(5), @wald.acf(5).map { |x| x ** 2 }.inject(:+)
+ assert_in_delta compute_mean(acf).to_i, 0
+ assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+ assert_equal (generate_acf_summation(5)/5).to_i, compute_chi_df(5)
+
+ end
+
+ def test_with_10_lags
+ acf = generate_acf_series(10)
+ assert_equal generate_acf_summation(10), @wald.acf(10).map { |x| x ** 2 }.inject(:+)
+ assert_in_delta compute_mean(acf).to_i, 0
+ assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+ assert_equal (generate_acf_summation(10)/10).to_i, compute_chi_df(10)
+ end
+
+ def test_with_15_lags
+ acf = generate_acf_series(15)
+ assert_equal generate_acf_summation(15), @wald.acf(15).map { |x| x ** 2 }.inject(:+)
+ assert_in_delta compute_mean(acf).to_i, 0
+ assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+ assert_equal (generate_acf_summation(15)/15).to_i, compute_chi_df(15)
+ end
+
+ def test_with_20_lags
+ acf = generate_acf_series(20)
+ assert_equal generate_acf_summation(20), @wald.acf(20).map { |x| x ** 2 }.inject(:+)
+ assert_in_delta compute_mean(acf).to_i, 0
+ assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+ assert_equal (generate_acf_summation(20)/20).to_i, compute_chi_df(20)
+ end
+
+ def test_with_different_series
+ @wald = 200.times.map { rand }.to_ts
+ acf_series = @wald.acf(10)
+ lhs = acf_series.each { |x| x ** 2 }.inject(:+)
+ rhs = Statsample::Test.chi_square(Matrix[acf_series]).df
+ assert_equal (lhs/10).to_i, rhs
+ end
+
+ def test_pacf
+ #TODO
+ end
+end
+
diff --git a/test/test_wald2.rb b/test/test_wald2.rb
new file mode 100644
index 0000000..1a3eecd
--- /dev/null
+++ b/test/test_wald2.rb
@@ -0,0 +1,71 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+
+class StatsampleWaldTest < MiniTest::Unit::TestCase
+ # The Wald test checks a series of n autocorrelations against a
+ # chi-square distribution with matching degrees of freedom. It is
+ # particularly useful for testing the fit of an ARIMA model on its residuals.
+
+ include Statsample::TimeSeries
+ include Statsample::Shorthand
+ include Distribution
+
+ def setup
+ #create time series to evaluate later
+ @wald = 100.times.map { rand(100) }.to_ts
+ end
+
+ def sum_of_squares_of_acf_series(lags)
+ #perform sum of squares for a series of acf with specified lags
+ acf = @wald.acf(lags)
+ return acf.map { |x| x ** 2 }.inject(:+)
+ end
+
+ def chisquare_cdf(sum_of_squares, lags)
+ 1 - ChiSquare.cdf(sum_of_squares, lags)
+ end
+
+
+ def test_wald_with_5_lags
+ #number of lags for acf = 5
+ lags = 5
+ sum_of_squares = sum_of_squares_of_acf_series(lags)
+ assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+ assert_equal @wald.acf(lags).size, lags + 1
+ end
+
+
+ def test_wald_with_10_lags
+ #number of lags for acf = 10
+ lags = 10
+ sum_of_squares = sum_of_squares_of_acf_series(lags)
+ assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+ assert_equal @wald.acf(lags).size, lags + 1
+ end
+
+
+ def test_wald_with_15_lags
+ #number of lags for acf = 15
+ lags = 15
+ sum_of_squares = sum_of_squares_of_acf_series(lags)
+ assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+ assert_equal @wald.acf(lags).size, lags + 1
+ end
+
+
+ def test_wald_with_20_lags
+ #number of lags for acf = 20
+ lags = 20
+ sum_of_squares = sum_of_squares_of_acf_series(lags)
+ assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+ assert_equal @wald.acf(lags).size, lags + 1
+ end
+
+
+ def test_wald_with_25_lags
+ #number of lags for acf = 25
+ lags = 25
+ sum_of_squares = sum_of_squares_of_acf_series(lags)
+ assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+ assert_equal @wald.acf(lags).size, lags + 1
+ end
+end
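Both Wald test files lean on the fact that for white noise the sample autocorrelations are approximately N(0, 1/n), so n·Σ r_k² behaves like a chi-square variate with as many degrees of freedom as lags (the Box-Pierce form of the statistic). Below is a dependency-free sketch of the sample acf and that sum of squares; the acf definition here is the textbook one and may differ in detail from statsample's `acf`:

```ruby
# Sample autocorrelation r_k for lags 0..max_lags, then the Box-Pierce
# style statistic q = n * sum of squared autocorrelations for k >= 1.
def acf(series, max_lags)
  n = series.size
  mean = series.inject(:+).to_f / n
  dev = series.map { |x| x - mean }
  c0 = dev.map { |d| d * d }.inject(:+)
  (0..max_lags).map do |k|
    (0...(n - k)).map { |t| dev[t] * dev[t + k] }.inject(:+) / c0
  end
end

series = Array.new(200) { rand }
r = acf(series, 10)
r.first # => 1.0 by construction
q = series.size * r[1..-1].map { |v| v * v }.inject(:+)
# q is then compared against a chi-square quantile with 10 degrees of freedom
```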