From af7b6245390c63af379686b8822e6e2d9330d63e Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Wed, 24 Apr 2013 22:23:20 +0530
Subject: [PATCH 01/22] Added Gemspec with new gems listing

---
 README.txt         | 174 ---------------------------------------------
 statsample.gemspec |  98 +++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 174 deletions(-)
 delete mode 100644 README.txt
 create mode 100644 statsample.gemspec

diff --git a/README.txt b/README.txt
deleted file mode 100644
index 0c515b4..0000000
--- a/README.txt
+++ /dev/null
@@ -1,174 +0,0 @@
-= Statsample
-
-http://ruby-statsample.rubyforge.org/
-
-
-== DESCRIPTION:
-
-A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
-
-Include:
-* Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
-* Imports and exports datasets from and to Excel, CSV and plain text files.
-* Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and  gamma.  Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.
-* Intra-class correlation
-* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
-* Tests: F, T, Levene, U-Mannwhitney.
-* Regression: Simple, Multiple (OLS), Probit  and Logit
-* Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.
-* Reliability analysis for simple scale and a DSL to easily analyze multiple scales using factor analysis and correlations, if you want it.
-* Basic time series support
-* Dominance Analysis, with multivariate dependent and bootstrap (Azen & Budescu)
-* Sample calculation related formulas
-* Structural Equation Modeling (SEM), using R libraries +sem+ and +OpenMx+
-* Creates reports on text, html and rtf, using ReportBuilder gem
-* Graphics: Histogram, Boxplot and Scatterplot
-
-== PRINCIPLES
-
-* Software Design: 
-  * One module/class for each type of analysis
-  * Options can be set as hash on initialize() or as setters methods
-  * Clean API for interactive sessions
-  * summary() returns all necessary informacion for interactive sessions
-  * All statistical data available though methods on objects
-  * All (important) methods should be tested. Better with random data.
-* Statistical Design
-  * Results are tested against text results, SPSS and R outputs.
-  * Go beyond Null Hiphotesis Testing, using confidence intervals and effect sizes when possible
-  * (When possible) All references for methods are documented, providing sensible information on documentation 
-
-== FEATURES:
-
-* Classes for manipulation and storage of data:
-  * Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
-  * Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample. 
-  * Statsample::Multiset: multiple datasets with same fields and type of vectors
-* Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. Also you can create contrast using Statsample::Anova::Contrast
-* Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
-* Multiple types of regression.
-  * Simple Regression :  Statsample::Regression::Simple
-  * Multiple Regression: Statsample::Regression::Multiple
-  * Logit Regression:    Statsample::Regression::Binomial::Logit
-  * Probit Regression:    Statsample::Regression::Binomial::Probit
-* Factorial Analysis algorithms on Statsample::Factor module.
-  * Classes for Extraction of factors: 
-    * Statsample::Factor::PCA
-    * Statsample::Factor::PrincipalAxis
-  * Classes for Rotation of factors: 
-    * Statsample::Factor::Varimax
-    * Statsample::Factor::Equimax
-    * Statsample::Factor::Quartimax
-  * Classes for calculation of factors to retain
-    * Statsample::Factor::ParallelAnalysis performs Horn's 'parallel analysis' to a principal components analysis to adjust for sample bias in the retention of components.
-    * Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retain components as long as the variance in the correlation matrix represents systematic variance.
-* Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the relative importance of one predictor relative to another on multiple regression
-  * Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables
-  * Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recomended by  Azen & Budescu (2003) link[http://psycnet.apa.org/journals/met/8/2/129/]. 
-* Module Statsample::Codification, to help to codify open questions
-* Converters to import and export data:
-  * Statsample::Database : Can create sql to create tables, read and insert data
-  * Statsample::CSV : Read and write CSV files
-  * Statsample::Excel : Read and write Excel files
-  * Statsample::Mx    : Write Mx Files
-  * Statsample::GGobi : Write Ggobi files
-* Module Statsample::Crosstab provides function to create crosstab for categorical data
-* Module Statsample::Reliability provides functions to analyze scales with psychometric methods. 
-  * Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standarized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha is deleted.
-  * Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
-  * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
-* Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several type of samples
-* Module Statsample::Test provides several methods and classes to perform inferencial statistics
-  * Statsample::Test::BartlettSphericity
-  * Statsample::Test::ChiSquare
-  * Statsample::Test::F  
-  * Statsample::Test::KolmogorovSmirnov (only D value)
-  * Statsample::Test::Levene
-  * Statsample::Test::UMannWhitney
-  * Statsample::Test::T
-* Module Graph provides several classes to create beautiful graphs using rubyvis
-  * Statsample::Graph::Boxplot
-  * Statsample::Graph::Histogram
-  * Statsample::Graph::Scatterplot
-* Module Statsample::TimeSeries provides basic support for time series. 
-* Gem +statsample-sem+ provides a DSL to R libraries +sem+ and +OpenMx+
-* Close integration with gem <tt>reportbuilder</tt>, to easily create reports on text, html and rtf formats.
-
-== Examples of use:
-
-See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/]
-
-=== Boxplot
-
-    require 'statsample'
-    ss_analysis(Statsample::Graph::Boxplot) do 
-      n=30
-      a=rnorm(n-1,50,10)
-      b=rnorm(n, 30,5)
-      c=rnorm(n,5,1)
-      a.push(2)
-      boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
-    end    
-    Statsample::Analysis.run # Open svg file on *nix application defined
-
-=== Correlation matrix
-
-    require 'statsample'
-    # Note R like generation of random gaussian variable
-    # and correlation matrix
-    
-    ss_analysis("Statsample::Bivariate.correlation_matrix") do
-      samples=1000
-      ds=data_frame(
-        'a'=>rnorm(samples), 
-        'b'=>rnorm(samples),
-        'c'=>rnorm(samples),
-        'd'=>rnorm(samples))
-      cm=cor(ds) 
-      summary(cm)
-    end
-    
-    Statsample::Analysis.run_batch # Echo output to console
-
-
-== REQUIREMENTS:
-
-Optional: 
-
-* Plotting: gnuplot and rbgnuplot, SVG::Graph
-* Factorial analysis and polychorical correlation(joint estimate and polychoric series): gsl library and rb-gsl (http://rb-gsl.rubyforge.org/). You should install it using <tt>gem install gsl</tt>. 
-
-<b>Note</b>: Use gsl 1.12.109 or later.
-
-== RESOURCES
-
-* Source code on github: http://github.com/clbustos/statsample
-* API: http://ruby-statsample.rubyforge.org/statsample/
-* Bug report and feature request: http://github.com/clbustos/statsample/issues
-* E-mailing list: http://groups.google.com/group/statsample
-
-== INSTALL:
-
-  $ sudo gem install statsample
-
-On *nix, you should install statsample-optimization to retrieve gems gsl, statistics2 and a C extension to speed some methods. 
-
-There are available precompiled version for Ruby 1.9 on x86, x86_64 and mingw32 archs.
-
-  $ sudo gem install statsample-optimization
-
-If you use Ruby 1.8, you should compile statsample-optimization, usign parameter <tt>--platform ruby</tt>
-
-  $ sudo gem install statsample-optimization --platform ruby
-
-If you need to work on Structural Equation Modeling, you could see +statsample-sem+. You need R with +sem+ or +OpenMx+ [http://openmx.psyc.virginia.edu/] libraries installed
-
-  $ sudo gem install statsample-sem
-
-Available setup.rb file
-
-  sudo gem ruby setup.rb
-
-== LICENSE:
-
-GPL-2 (See LICENSE.txt)
diff --git a/statsample.gemspec b/statsample.gemspec
new file mode 100644
index 0000000..205f0e2
--- /dev/null
+++ b/statsample.gemspec
@@ -0,0 +1,98 @@
+# -*- encoding: utf-8 -*-
+
+Gem::Specification.new do |s|
+  s.name = "statsample"
+  s.version = "1.1.0.20110912141756"
+
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Claudio Bustos"]
+  s.date = "2013-19-04"
+  s.description = "A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2, 1.9.3 and JRuby 1.4 (Ruby 1.8.7 compatible).\n\nInclude:\n* Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).\n* Imports and exports datasets from and to Excel, CSV and plain text files.\n* Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and  gamma.  Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.\n* Intra-class correlation\n* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.\n* Tests: F, T, Levene, U-Mannwhitney.\n* Regression: Simple, Multiple (OLS), Probit  and Logit\n* Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.\n* Reliability analysis for simple scale and a DSL to easily analyze multiple scales using factor analysis and correlations, if you want it.\n* Dominance Analysis, with multivariate dependent and bootstrap (Azen & Budescu)\n* Sample calculation related formulas\n* Structural Equation Modeling (SEM), using R libraries +sem+ and +OpenMx+\n* Creates reports on text, html and rtf, using ReportBuilder gem\n* Graphics: Histogram, Boxplot and Scatterplot"
+  s.email = ["clbustos@gmail.com"]
+  s.executables = ["statsample"]
+  s.extra_rdoc_files = ["History.txt", "LICENSE.txt", "Manifest.txt", "README.txt", "references.txt"]
+  s.files = ["History.txt", "LICENSE.txt", "Manifest.txt", "README.txt", "Rakefile", "benchmarks/correlation_matrix_15_variables.rb", "benchmarks/correlation_matrix_5_variables.rb", "benchmarks/correlation_matrix_methods/correlation_matrix.ds", "benchmarks/correlation_matrix_methods/correlation_matrix.html", "benchmarks/correlation_matrix_methods/correlation_matrix.rb", "benchmarks/correlation_matrix_methods/correlation_matrix.xls", "benchmarks/correlation_matrix_methods/correlation_matrix_gsl_ruby.ods", "benchmarks/correlation_matrix_methods/correlation_matrix_with_graphics.ods", "benchmarks/correlation_matrix_methods/results.ds", "benchmarks/factor_map.rb", "benchmarks/helpers_benchmark.rb", "bin/statsample", "data/locale/es/LC_MESSAGES/statsample.mo", "doc_latex/manual/equations.tex", "examples/boxplot.rb", "examples/correlation_matrix.rb", "examples/dataset.rb", "examples/dominance_analysis.rb", "examples/dominance_analysis_bootstrap.rb", "examples/histogram.rb", "examples/icc.rb", "examples/levene.rb", "examples/multiple_regression.rb", "examples/multivariate_correlation.rb", "examples/parallel_analysis.rb", "examples/polychoric.rb", "examples/principal_axis.rb", "examples/reliability.rb", "examples/scatterplot.rb", "examples/t_test.rb", "examples/tetrachoric.rb", "examples/u_test.rb", "examples/vector.rb", "examples/velicer_map_test.rb", "grab_references.rb", "lib/spss.rb", "lib/statsample.rb", "lib/statsample/analysis.rb", "lib/statsample/anova.rb", "lib/statsample/anova/contrast.rb", "lib/statsample/anova/oneway.rb", "lib/statsample/anova/twoway.rb", "lib/statsample/bivariate.rb", "lib/statsample/bivariate/pearson.rb", "lib/statsample/codification.rb", "lib/statsample/converter/csv.rb", "lib/statsample/converter/spss.rb", "lib/statsample/converters.rb", "lib/statsample/crosstab.rb", "lib/statsample/dataset.rb", "lib/statsample/dominanceanalysis.rb", "lib/statsample/dominanceanalysis/bootstrap.rb", "lib/statsample/factor.rb", "lib/statsample/factor/map.rb", "lib/statsample/factor/parallelanalysis.rb", "lib/statsample/factor/pca.rb", "lib/statsample/factor/principalaxis.rb", "lib/statsample/factor/rotation.rb", "lib/statsample/graph.rb", "lib/statsample/graph/boxplot.rb", "lib/statsample/graph/histogram.rb", "lib/statsample/graph/scatterplot.rb", "lib/statsample/histogram.rb", "lib/statsample/matrix.rb", "lib/statsample/mle.rb", "lib/statsample/mle/logit.rb", "lib/statsample/mle/normal.rb", "lib/statsample/mle/probit.rb", "lib/statsample/multiset.rb", "lib/statsample/regression.rb", "lib/statsample/regression/binomial.rb", "lib/statsample/regression/binomial/logit.rb", "lib/statsample/regression/binomial/probit.rb", "lib/statsample/regression/multiple.rb", "lib/statsample/regression/multiple/alglibengine.rb", "lib/statsample/regression/multiple/baseengine.rb", "lib/statsample/regression/multiple/gslengine.rb", "lib/statsample/regression/multiple/matrixengine.rb", "lib/statsample/regression/multiple/rubyengine.rb", "lib/statsample/regression/simple.rb", "lib/statsample/reliability.rb", "lib/statsample/reliability/icc.rb", "lib/statsample/reliability/multiscaleanalysis.rb", "lib/statsample/reliability/scaleanalysis.rb", "lib/statsample/reliability/skillscaleanalysis.rb", "lib/statsample/resample.rb", "lib/statsample/rserve_extension.rb", "lib/statsample/shorthand.rb", "lib/statsample/srs.rb", "lib/statsample/test.rb", "lib/statsample/test/bartlettsphericity.rb", "lib/statsample/test/chisquare.rb", "lib/statsample/test/f.rb", "lib/statsample/test/kolmogorovsmirnov.rb", "lib/statsample/test/levene.rb", "lib/statsample/test/t.rb", "lib/statsample/test/umannwhitney.rb", "lib/statsample/vector.rb", "lib/statsample/vector/gsl.rb", "po/es/statsample.mo", "po/es/statsample.po", "po/statsample.pot", "references.txt", "setup.rb", "test/fixtures/bank2.dat", "test/fixtures/correlation_matrix.rb", "test/fixtures/crime.txt", "test/fixtures/hartman_23.matrix", "test/fixtures/repeated_fields.csv", "test/fixtures/test_binomial.csv", "test/fixtures/test_csv.csv", "test/fixtures/test_xls.xls", "test/fixtures/tetmat_matrix.txt", "test/fixtures/tetmat_test.txt", "test/helpers_tests.rb", "test/test_analysis.rb", "test/test_anova_contrast.rb", "test/test_anovaoneway.rb", "test/test_anovatwoway.rb", "test/test_anovatwowaywithdataset.rb", "test/test_anovawithvectors.rb", "test/test_bartlettsphericity.rb", "test/test_bivariate.rb", "test/test_codification.rb", "test/test_crosstab.rb", "test/test_csv.rb", "test/test_dataset.rb", "test/test_dominance_analysis.rb", "test/test_factor.rb", "test/test_factor_map.rb", "test/test_factor_pa.rb", "test/test_ggobi.rb", "test/test_gsl.rb", "test/test_histogram.rb", "test/test_logit.rb", "test/test_matrix.rb", "test/test_mle.rb", "test/test_multiset.rb", "test/test_regression.rb", "test/test_reliability.rb", "test/test_reliability_icc.rb", "test/test_reliability_skillscale.rb", "test/test_resample.rb", "test/test_rserve_extension.rb", "test/test_srs.rb", "test/test_statistics.rb", "test/test_stest.rb", "test/test_stratified.rb", "test/test_test_f.rb", "test/test_test_kolmogorovsmirnov.rb", "test/test_test_t.rb", "test/test_umannwhitney.rb", "test/test_vector.rb", "test/test_xls.rb", "web/Rakefile"]
+  s.homepage = "http://ruby-statsample.rubyforge.org/"
+  s.post_install_message = "***************************************************\nThanks for installing statsample.\n\nOn *nix, you could install statsample-optimization\nto retrieve gems gsl, statistics2 and a C extension\nto speed some methods.\n\n  $ sudo gem install statsample-optimization\n\nOn Ubuntu, install  build-essential and libgsl0-dev \nusing apt-get. Compile ruby 1.9x from \nsource code first.\n\n  $ sudo apt-get install build-essential libgsl0-dev\n\n\n*****************************************************\n"
+  s.rdoc_options = ["--main", "README.txt"]
+  s.require_paths = ["lib"]
+  s.rubyforge_project = "ruby-statsample"
+  s.rubygems_version = "1.8.10"
+  s.summary = "A suite for basic and advanced statistics on Ruby"
+  s.test_files = ["test/test_ggobi.rb", "test/test_vector.rb", "test/test_crosstab.rb", "test/test_bivariate.rb", "test/test_dominance_analysis.rb", "test/test_analysis.rb", "test/test_dataset.rb", "test/test_test_t.rb", "test/test_regression.rb", "test/test_reliability_icc.rb", "test/test_histogram.rb", "test/test_multiset.rb", "test/test_logit.rb", "test/test_xls.rb", "test/test_test_kolmogorovsmirnov.rb", "test/test_factor_map.rb", "test/test_stratified.rb", "test/test_rserve_extension.rb", "test/test_reliability_skillscale.rb", "test/test_anova_contrast.rb", "test/test_anovaoneway.rb", "test/test_matrix.rb", "test/test_reliability.rb", "test/test_anovatwowaywithdataset.rb", "test/test_factor_pa.rb", "test/test_factor.rb", "test/test_resample.rb", "test/test_anovawithvectors.rb", "test/test_bartlettsphericity.rb", "test/test_statistics.rb", "test/test_stest.rb", "test/test_srs.rb", "test/test_test_f.rb", "test/test_csv.rb", "test/test_anovatwoway.rb", "test/test_gsl.rb", "test/test_mle.rb", "test/test_umannwhitney.rb", "test/test_codification.rb"]
+
+  if s.respond_to? :specification_version then
+    s.specification_version = 3
+
+    if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+      s.add_runtime_dependency(%q<spreadsheet>, ["~> 0.6.5"])
+      s.add_runtime_dependency(%q<reportbuilder>, ["~> 1.4"])
+      s.add_runtime_dependency(%q<minimization>, ["~> 0.2.0"])
+      s.add_runtime_dependency(%q<fastercsv>, ["> 0"])
+      s.add_runtime_dependency(%q<dirty-memoize>, ["~> 0.0"])
+      s.add_runtime_dependency(%q<extendmatrix>, ["~> 0.3.1"])
+      s.add_runtime_dependency(%q<statsample-bivariate-extension>, ["> 0"])
+      s.add_runtime_dependency(%q<rserve-client>, ["~> 0.2.5"])
+      s.add_runtime_dependency(%q<rubyvis>, ["~> 0.5.0"])
+      s.add_runtime_dependency(%q<distribution>, ["~> 0.3"])
+      s.add_development_dependency(%q<hoe>, ["~> 0"])
+      s.add_development_dependency(%q<shoulda>, ["~> 0"])
+      s.add_development_dependency(%q<minitest>, ["~> 2.0"])
+      s.add_development_dependency(%q<rserve-client>, ["~> 0"])
+      s.add_development_dependency(%q<gettext>, ["~> 2.3.8"])
+      s.add_development_dependency(%q<mocha>, ["~> 0"])
+      s.add_development_dependency(%q<hoe-git>, ["~> 1.5.0"])
+      s.add_development_dependency(%q<hoe>, ["~> 2.12"])
+      s.add_development_dependency(%q<rspec>, ["~> 2.13.0"])
+      s.add_development_dependency(%q<rspec-core>, ["~> 2.13.1"])
+      s.add_development_dependency(%q<rspec-mocks>, ["~> 2.13.0"])
+      s.add_development_dependency(%q<rspec-expectations>, ["~> 2.13.0"])
+    else
+      s.add_dependency(%q<spreadsheet>, ["~> 0.6.5"])
+      s.add_dependency(%q<reportbuilder>, ["~> 1.4"])
+      s.add_dependency(%q<minimization>, ["~> 0.2.0"])
+      s.add_dependency(%q<fastercsv>, ["> 0"])
+      s.add_dependency(%q<dirty-memoize>, ["~> 0.0"])
+      s.add_dependency(%q<extendmatrix>, ["~> 0.3.1"])
+      s.add_dependency(%q<statsample-bivariate-extension>, ["> 0"])
+      s.add_dependency(%q<rserve-client>, ["~> 0.2.5"])
+      s.add_dependency(%q<rubyvis>, ["~> 0.5.0"])
+      s.add_dependency(%q<distribution>, ["~> 0.3"])
+      s.add_dependency(%q<hoe>, ["~> 0"])
+      s.add_dependency(%q<shoulda>, ["~> 0"])
+      s.add_dependency(%q<minitest>, ["~> 2.0"])
+      s.add_dependency(%q<rserve-client>, ["~> 0"])
+      s.add_dependency(%q<gettext>, ["~> 2.3.8"])
+      s.add_dependency(%q<mocha>, ["~> 0"])
+      s.add_dependency(%q<hoe-git>, ["~> 1.5.0"])
+      s.add_dependency(%q<hoe>, ["~> 2.12"])
+      s.add_dependency(%q<rspec>, ["~> 2.13.0"])
+      s.add_dependency(%q<rspec-core>, ["~> 2.13.1"])
+      s.add_dependency(%q<rspec-mocks>, ["~> 2.13.0"])
+      s.add_dependency(%q<rspec-expectations>, ["~> 2.13.0"])
+    end
+  else
+    s.add_dependency(%q<spreadsheet>, ["~> 0.6.5"])
+    s.add_dependency(%q<reportbuilder>, ["~> 1.4"])
+    s.add_dependency(%q<minimization>, ["~> 0.2.0"])
+    s.add_dependency(%q<fastercsv>, ["> 0"])
+    s.add_dependency(%q<dirty-memoize>, ["~> 0.0"])
+    s.add_dependency(%q<extendmatrix>, ["~> 0.3.1"])
+    s.add_dependency(%q<statsample-bivariate-extension>, ["> 0"])
+    s.add_dependency(%q<rserve-client>, ["~> 0.2.5"])
+    s.add_dependency(%q<rubyvis>, ["~> 0.5.0"])
+    s.add_dependency(%q<distribution>, ["~> 0.3"])
+    s.add_dependency(%q<hoe>, ["~> 0"])
+    s.add_dependency(%q<shoulda>, ["~> 0"])
+    s.add_dependency(%q<minitest>, ["~> 2.0"])
+    s.add_dependency(%q<rserve-client>, ["~> 0"])
+    s.add_dependency(%q<gettext>, ["~> 2.3.8"])
+    s.add_dependency(%q<mocha>, ["~> 0"])
+    s.add_dependency(%q<hoe-git>, ["~> 1.5.0"])
+    s.add_dependency(%q<hoe>, ["~> 2.12"])
+    s.add_dependency(%q<rspec>, ["~> 2.13.0"])
+    s.add_dependency(%q<rspec-core>, ["~> 2.13.1"])
+    s.add_dependency(%q<rspec-mocks>, ["~> 2.13.0"])
+    s.add_dependency(%q<rspec-expectations>, ["~> 2.13.0"])
+  end
+end

From 2efd9fb5511070bd07517fc5b8dc71cdeafd85f6 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Wed, 24 Apr 2013 22:25:11 +0530
Subject: [PATCH 02/22] Converted README to proper markdown. Easily readable
 now.

---
 README.md | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 185 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..d389977
--- /dev/null
+++ b/README.md
@@ -0,0 +1,185 @@
+Statsample
+==========
+
+[http://ruby-statsample.rubyforge.org/](http://ruby-statsample.rubyforge.org/)
+
+
+DESCRIPTION:
+------------
+
+A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
+
+Include:
+
+* Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
+* Imports and exports datasets from and to Excel, CSV and plain text files.
+* Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and  gamma.  Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.
+* Intra-class correlation
+* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
+* Tests: F, T, Levene, U-Mannwhitney.
+* Regression: Simple, Multiple (OLS), Probit  and Logit
+* Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.
+* Reliability analysis for simple scale and a DSL to easily analyze multiple scales using factor analysis and correlations, if you want it.
+* Basic time series support
+* Dominance Analysis, with multivariate dependent and bootstrap (Azen & Budescu)
+* Sample calculation related formulas
+* Structural Equation Modeling (SEM), using R libraries +sem+ and +OpenMx+
+* Creates reports on text, html and rtf, using ReportBuilder gem
+* Graphics: Histogram, Boxplot and Scatterplot
+
+PRINCIPLES
+----------
+
+* Software Design:
+  * One module/class for each type of analysis
+  * Options can be set as hash on initialize() or as setters methods
+  * Clean API for interactive sessions
+  * summary() returns all necessary informacion for interactive sessions
+  * All statistical data available though methods on objects
+  * All (important) methods should be tested. Better with random data.
+* Statistical Design
+  * Results are tested against text results, SPSS and R outputs.
+  * Go beyond Null Hiphotesis Testing, using confidence intervals and effect sizes when possible
+  * (When possible) All references for methods are documented, providing sensible information on documentation
+
+FEATURES:
+--------
+
+* Classes for manipulation and storage of data:
+  * Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
+  * Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
+  * Statsample::Multiset: multiple datasets with same fields and type of vectors
+* Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. Also you can create contrast using Statsample::Anova::Contrast
+* Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
+* Multiple types of regression.
+  * Simple Regression :  Statsample::Regression::Simple
+  * Multiple Regression: Statsample::Regression::Multiple
+  * Logit Regression:    Statsample::Regression::Binomial::Logit
+  * Probit Regression:    Statsample::Regression::Binomial::Probit
+* Factorial Analysis algorithms on Statsample::Factor module.
+  * Classes for Extraction of factors:
+    * Statsample::Factor::PCA
+    * Statsample::Factor::PrincipalAxis
+  * Classes for Rotation of factors:
+    * Statsample::Factor::Varimax
+    * Statsample::Factor::Equimax
+    * Statsample::Factor::Quartimax
+  * Classes for calculation of factors to retain
+    * Statsample::Factor::ParallelAnalysis performs Horn's 'parallel analysis' to a principal components analysis to adjust for sample bias in the retention of components.
+    * Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retain components as long as the variance in the correlation matrix represents systematic variance.
+* Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the relative importance of one predictor relative to another on multiple regression
+  * Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables
+  * Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recomended by  Azen & Budescu (2003) link[http://psycnet.apa.org/journals/met/8/2/129/].
+* Module Statsample::Codification, to help to codify open questions
+* Converters to import and export data:
+  * Statsample::Database : Can create sql to create tables, read and insert data
+  * Statsample::CSV : Read and write CSV files
+  * Statsample::Excel : Read and write Excel files
+  * Statsample::Mx    : Write Mx Files
+  * Statsample::GGobi : Write Ggobi files
+* Module Statsample::Crosstab provides function to create crosstab for categorical data
+* Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
+  * Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standarized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha is deleted.
+  * Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
+  * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
+* Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several type of samples
+* Module Statsample::Test provides several methods and classes to perform inferencial statistics
+  * Statsample::Test::BartlettSphericity
+  * Statsample::Test::ChiSquare
+  * Statsample::Test::F
+  * Statsample::Test::KolmogorovSmirnov (only D value)
+  * Statsample::Test::Levene
+  * Statsample::Test::UMannWhitney
+  * Statsample::Test::T
+* Module Graph provides several classes to create beautiful graphs using rubyvis
+  * Statsample::Graph::Boxplot
+  * Statsample::Graph::Histogram
+  * Statsample::Graph::Scatterplot
+* Module Statsample::TimeSeries provides basic support for time series.
+* Gem +statsample-sem+ provides a DSL to R libraries +sem+ and +OpenMx+
+* Close integration with gem `reportbuilder`, to easily create reports on text, html and rtf formats.
+
+Examples of use:
+--------------
+
+See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/](http://github.com/clbustos/statsample/tree/master/examples/)
+
+Boxplot
+-------
+```ruby
+    require 'statsample'
+    ss_analysis(Statsample::Graph::Boxplot) do
+      n=30
+      a=rnorm(n-1,50,10)
+      b=rnorm(n, 30,5)
+      c=rnorm(n,5,1)
+      a.push(2)
+      boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
+    end
+    Statsample::Analysis.run # Open svg file on *nix application defined
+```
+Correlation matrix
+------------------
+```ruby
+    require 'statsample'
+    # Note R like generation of random gaussian variable
+    # and correlation matrix
+
+    ss_analysis("Statsample::Bivariate.correlation_matrix") do
+      samples=1000
+      ds=data_frame(
+        'a'=>rnorm(samples),
+        'b'=>rnorm(samples),
+        'c'=>rnorm(samples),
+        'd'=>rnorm(samples))
+      cm=cor(ds)
+      summary(cm)
+    end
+
+    Statsample::Analysis.run_batch # Echo output to console
+```
+
+REQUIREMENTS:
+-------------
+
+Optional:
+
+* Plotting: gnuplot and rbgnuplot, SVG::Graph
+* Factorial analysis and polychorical correlation(joint estimate and polychoric series): gsl library and rb-gsl [http://rb-gsl.rubyforge.org/](http://rb-gsl.rubyforge.org/). You should install it using `gem install gsl`.
+
+**Note**: Use gsl 1.12.109 or later.
+
+RESOURCES:
+----------
+
+* Source code on github: [http://github.com/clbustos/statsample](http://github.com/clbustos/statsample)
+* API: [http://ruby-statsample.rubyforge.org/statsample/](http://ruby-statsample.rubyforge.org/statsample/)
+* Bug report and feature request: [http://github.com/clbustos/statsample/issues](http://github.com/clbustos/statsample/issues)
+* E-mailing list: [http://groups.google.com/group/statsample](http://groups.google.com/group/statsample)
+
+INSTALL:
+---------
+  `$ sudo gem install statsample`
+
+On \*nix, you should install statsample-optimization to retrieve gems gsl, statistics2 and a C extension to speed some methods.
+
+There are available precompiled version for Ruby 1.9 on x86, x86\_64 and mingw32 archs.
+
+  `$ sudo gem install statsample-optimization`
+
+If you use Ruby 1.8, you should compile statsample-optimization, usign parameter `--platform ruby`
+
+  `$ sudo gem install statsample-optimization --platform ruby`
+
+If you need to work on Structural Equation Modeling, you could see _statsample-sem_. You need R with _sem_ or _OpenMx_ [http://openmx.psyc.virginia.edu/](http://openmx.psyc.virginia.edu/) libraries installed
+
+  `$ sudo gem install statsample-sem`
+
+Available setup.rb file
+
+  `sudo gem ruby setup.rb`
+
+LICENSE:
+-------
+
+GPL-2 (See LICENSE.txt)

From 80f7a73be015d2ac7f531fb2125363a6617f0fab Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Wed, 24 Apr 2013 22:26:14 +0530
Subject: [PATCH 03/22] Added specs for Simple Random Sampling(SRS) and Time
 Series Analysis. Removed depracated Shoulda includes from helper_tests
 Modified param list in srs.rb for whitespaces.

---
 lib/statsample/srs.rb |  2 +-
 test/helpers_tests.rb |  6 ------
 test/test_srs.rb      | 46 +++++++++++++++++++++++++++++++++++++++++++
 test/test_tseries.rb  | 13 ++++++++++--
 4 files changed, 58 insertions(+), 9 deletions(-)

diff --git a/lib/statsample/srs.rb b/lib/statsample/srs.rb
index 6dcc7d2..a372431 100644
--- a/lib/statsample/srs.rb
+++ b/lib/statsample/srs.rb
@@ -61,7 +61,7 @@ def proportion_confidence_interval_z(p, n_sample, n_population, margin=0.95)
       # Proportion confidence interval with x value
       # Uses estimated proportion, sample without replacement
       
-      def proportion_confidence_interval(p, sam,pop , x)
+      def proportion_confidence_interval(p, sam, pop, x)
           #f=sam.quo(pop)
           one_range=x * Math::sqrt((qf(sam, pop) * p * (1-p)).quo(sam-1)) + (1.quo(sam * 2.0))
           [p-one_range, p+one_range]
diff --git a/test/helpers_tests.rb b/test/helpers_tests.rb
index 82c6b2e..419b24f 100644
--- a/test/helpers_tests.rb
+++ b/test/helpers_tests.rb
@@ -14,10 +14,6 @@
 module MiniTest
   class Unit
     class TestCase
-      include Shoulda::InstanceMethods
-      extend Shoulda::ClassMethods
-      include Shoulda::Assertions
-      
       def self.should_with_gsl(name,&block)
         should(name) do
           if Statsample.has_gsl?
@@ -25,10 +21,8 @@ def self.should_with_gsl(name,&block)
           else
             skip("Requires GSL")
           end
-         
         end
       end
-      
     end
   end
 
diff --git a/test/test_srs.rb b/test/test_srs.rb
index 1d18cf9..a486423 100644
--- a/test/test_srs.rb
+++ b/test/test_srs.rb
@@ -6,4 +6,50 @@ def test_std_error
     assert_equal(108,Statsample::SRS.estimation_n(0.05,0.5,150,0.95).to_i)
     assert_in_delta(0.0289,Statsample::SRS.proportion_sd_kp_wor(0.5,100,150),0.001)
   end
+
+  def test_fpc
+    sample_size = 80
+    population_size = 10
+
+    assert_equal -7, Statsample::SRS.fpc_var(sample_size, population_size).to_i
+  end
+
+  def test_qf
+    sample_size = 80
+    population_size = 10
+
+    assert_equal -7, Statsample::SRS.qf(sample_size, population_size).to_i
+    #non sample fraction = 1 - sample fraction
+    assert_equal -7, 1 - sample_size.quo(population_size)
+  end
+
+  def test_proportion_confidence_interval_t
+    proportion = 59
+    n_sample, n_population = 15, 10
+    confidence_interval = Statsample::SRS.proportion_confidence_interval_t(proportion, n_sample, n_population).map(&:to_f)
+
+    assert_in_delta 35.2559, confidence_interval[0], 0.001
+    assert_in_delta 82.7440, confidence_interval[1], 0.001
+  end
+
+  def test_proportion_confidence_interval_z
+    #estimated proportion with normal distribution.
+    proportion = 59
+    n_sample, n_population = 15, 10
+    confidence_interval = Statsample::SRS.proportion_confidence_interval_z(proportion, n_sample, n_population).map(&:to_f)
+
+    assert_in_delta 37.2991, confidence_interval[0], 0.001
+    assert_in_delta 80.7008, confidence_interval[1], 0.001
+  end
+
+  def test_proportion_sd_kp_wor
+    #standard deviation without replacement strategy for known proportion
+    assert_in_delta 11.6979, Statsample::SRS.proportion_sd_kp_wor(40, -30, 20), 0.001
+  end
+
+  def test_proportion_sd_ep_wor
+    #standard deviation with replacement for estimated proportion
+    assert_in_delta 35.5527, Statsample::SRS.proportion_sd_ep_wr(80, -4), 0.001
+    assert_in_delta 38.9700, Statsample::SRS.proportion_sd_ep_wr(68, -2), 0.001
+  end
 end
diff --git a/test/test_tseries.rb b/test/test_tseries.rb
index 35f2cc9..9306ae0 100644
--- a/test/test_tseries.rb
+++ b/test/test_tseries.rb
@@ -24,8 +24,17 @@ def test_acf
   end
 
   def test_lag
-    assert_in_delta 16.66,    @xiu.lag[@xiu.size - 1], 0.001
-    assert_in_delta 16.36, @xiu.lag(2)[@xiu.size - 1], 0.001
+    #test of default lag
+    lag1 = @xiu.lag
+
+    assert_in_delta 16.66, lag1[lag1.size - 1], 0.001
+    assert_in_delta 16.36, lag1[lag1.size - 2], 0.001
+
+    #test with different lagging unit
+    lag2 = @xiu.lag(2)
+
+    assert_in_delta 16.36, lag2[lag2.size - 1], 0.001
+    assert_in_delta 16.56, lag2[lag2.size - 2], 0.001
   end
 
   def test_delta

From b95ca730c2f9715ed9def88c08d2d872b84f76ab Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Wed, 1 May 2013 21:17:27 +0530
Subject: [PATCH 04/22] Added tests for StratifiedSample with proper comments
 on their significance Edited lib/statsample/multiset.rb for proper
 indentation, spacing and ruby patterns.

---
 lib/statsample/multiset.rb | 324 +++++++++++++++++++------------------
 lib/statsample/srs.rb      |   2 +-
 test/test_stratified.rb    |  70 ++++++++
 3 files changed, 240 insertions(+), 156 deletions(-)

diff --git a/lib/statsample/multiset.rb b/lib/statsample/multiset.rb
index 88fe595..230fe3c 100644
--- a/lib/statsample/multiset.rb
+++ b/lib/statsample/multiset.rb
@@ -1,212 +1,225 @@
 module Statsample
   # Multiset joins multiple dataset with the same fields and vectors
-  # but with different number of cases. 
+  # but with different number of cases.
   # This is the base class for stratified and cluster sampling estimation
   class Multiset
     # Name of fields
     attr_reader :fields
     # Array with Statsample::Dataset
     attr_reader :datasets
+
     # To create a multiset
     # * Multiset.new(%w{f1 f2 f3}) # define only fields
     def initialize(fields)
-        @fields=fields
-        @datasets={}
+      @fields = fields
+      @datasets = {}
     end
-    def self.new_empty_vectors(fields,ds_names) 
-        ms=Multiset.new(fields)
-        ds_names.each{|d|
-            ms.add_dataset(d,Dataset.new(fields))
-        }
-        ms
+
+    def self.new_empty_vectors(fields, ds_names)
+      ms = Multiset.new(fields)
+      ds_names.each do |d|
+        ms.add_dataset(d, Dataset.new(fields))
+      end
+      ms
     end
+
     # Generate a new dataset as a union of partial dataset
     # If block given, this is applied to each dataset before union
     def union(&block)
-      union_field={}
-      types={}
-      names={}
-      labels={}
-      each do |k,ds|
+      union_field = {}
+      types = {}
+      names = {}
+      labels = {}
+      each do |k, ds|
         if block
-          ds=ds.dup
-          yield k,ds
+          ds = ds.dup
+          yield k, ds
         end
         @fields.each do |f|
-          union_field[f]||=Array.new
+          union_field[f] ||= Array.new
           union_field[f].concat(ds[f].data)
-          types[f]||=ds[f].type
-          names[f]||=ds[f].name
-          labels[f]||=ds[f].labels
+          types[f] ||= ds[f].type
+          names[f] ||= ds[f].name
+          labels[f] ||= ds[f].labels
         end
       end
-      
+
       @fields.each do |f|
-        union_field[f]=union_field[f].to_vector(types[f])
-        union_field[f].name=names[f]
-        union_field[f].labels=labels[f]
+        union_field[f] = union_field[f].to_vector(types[f])
+        union_field[f].name = names[f]
+        union_field[f].labels = labels[f]
       end
-      ds_union=union_field.to_dataset
-      ds_union.fields=@fields
+      ds_union = union_field.to_dataset
+      ds_union.fields = @fields
       ds_union
     end
+
     def datasets_names
-        @datasets.keys.sort
+      @datasets.keys.sort
     end
+
     def n_datasets
-        @datasets.size
+      @datasets.size
     end
-    def add_dataset(key,ds)
-      if(ds.fields!=@fields)
-       raise ArgumentError, "Dataset(#{ds.fields.to_s})must have the same fields of the Multiset(#{@fields})"
+
+    def add_dataset(key, ds)
+      if(ds.fields != @fields)
+        raise ArgumentError, "Dataset(#{ds.fields.to_s})must have the same fields of the Multiset(#{@fields})"
       else
-          @datasets[key]=ds
+        @datasets[key]=ds
       end
     end
+
     def sum_field(field)
-      @datasets.inject(0) {|a,da|
-        stratum_name=da[0]
-        vector=da[1][field]
-        val=yield stratum_name,vector
+      @datasets.inject(0) do |a,da|
+        stratum_name = da[0]
+        vector = da[1][field]
+        val = yield stratum_name,vector
         a+val
-      }
+      end
     end
+
     def collect_vector(field)
-      @datasets.collect {|k,v|
+      @datasets.collect {|k, v|
         yield k, v[field]
       }
     end
-    
+
     def each_vector(field)
-      @datasets.each {|k,v|
+      @datasets.each {|k, v|
         yield k, v[field]
       }
     end
+
     def[](i)
       @datasets[i]
     end
+
     def each(&block)
-      @datasets.each {|k,ds|
-        next if ds.cases==0
-        block.call(k,ds)
-      }
+      @datasets.each do |k, ds|
+        next if ds.cases == 0
+        block.call(k, ds)
+      end
     end
   end
+
   class StratifiedSample
     class << self
       # mean for an array of vectors
       def mean(*vectors)
-        n_total=0
-        means=vectors.inject(0){|a,v|
-          n_total+=v.size
-          a+v.sum
-        }
-        means.to_f/n_total
+        n_total = 0
+        means=vectors.inject(0) do |a,v|
+          n_total += v.size
+          a + v.sum
+        end
+        means.to_f / n_total
       end
-      
+
       def standard_error_ksd_wr(es)
-        n_total=0
-        sum=es.inject(0){|a,h|
-            n_total+=h['N']
-            a+((h['N']**2 * h['s']**2) / h['n'].to_f)
-        }
+        n_total = 0
+        sum=es.inject(0) do |a,h|
+          n_total += h['N']
+          a + ((h['N'] ** 2 * h['s'] ** 2) / h['n'].to_f)
+        end
         (1.to_f / n_total)*Math::sqrt(sum)
       end
-      
-      
+
+
       def variance_ksd_wr(es)
         standard_error_ksd_wr(es)**2
       end
       def calculate_n_total(es)
-        es.inject(0) {|a,h| a+h['N'] }
+        es.inject(0) {|a, h| a+h['N'] }
       end
       # Source : Cochran (1972)
-      
+
       def variance_ksd_wor(es)
-      n_total=calculate_n_total(es)
-      es.inject(0){|a,h|
-        val=((h['N'].to_f / n_total)**2) * (h['s']**2 / h['n'].to_f) * (1 - (h['n'].to_f / h['N']))
-        a+val
-      }
+        n_total = calculate_n_total(es)
+        es.inject(0) do |a, h|
+          val = ((h['N'].to_f / n_total)**2) * (h['s']**2 / h['n'].to_f) * (1 - (h['n'].to_f / h['N']))
+          a+val
+        end
       end
+
       def standard_error_ksd_wor(es)
         Math::sqrt(variance_ksd_wor(es))
       end
-      
-      
-      
+
       def variance_esd_wor(es)
-        n_total=calculate_n_total(es)
-        sum=es.inject(0){|a,h|
-          val=h['N']*(h['N']-h['n'])*(h['s']**2 / h['n'].to_f)
-          a+val
-        }
-        (1.0/(n_total**2))*sum
+        n_total = calculate_n_total(es)
+        sum=es.inject(0) do |a,h|
+          val = h['N'] * (h['N'] - h['n']) * (h['s']**2 / h['n'].to_f)
+          a + val
+        end
+        (1.0 / (n_total**2)) * sum
       end
-      
-      
+
       def standard_error_esd_wor(es)
         Math::sqrt(variance_ksd_wor(es))
       end
       # Based on http://stattrek.com/Lesson6/STRAnalysis.aspx
       def variance_esd_wr(es)
-        n_total=calculate_n_total(es)
-          sum=es.inject(0){|a,h|
-            val= ((h['s']**2 * h['N']**2) / h['n'].to_f)
-            a+val
-          }
-          (1.0/(n_total**2))*sum
+        n_total = calculate_n_total(es)
+        sum = es.inject(0) do |a,h|
+          val = ((h['s']**2 * h['N']**2) / h['n'].to_f)
+          a + val
+        end
+        (1.0 / (n_total**2)) * sum
       end
+
       def standard_error_esd_wr(es)
         Math::sqrt(variance_esd_wr(es))
       end
-      
+
       def proportion_variance_ksd_wor(es)
         n_total=calculate_n_total(es)
-          es.inject(0){|a,h|
-            val= (((h['N'].to_f / n_total)**2 * h['p']*(1-h['p'])) / (h['n'])) * (1- (h['n'].to_f / h['N']))
-            a+val
-          }
+        es.inject(0) do |a,h|
+          val = (((h['N'].to_f / n_total)**2 * h['p']*(1-h['p'])) / (h['n'])) * (1- (h['n'].to_f / h['N']))
+          a + val
+        end
       end
+
       def proportion_sd_ksd_wor(es)
-          Math::sqrt(proportion_variance_ksd_wor(es))
+        Math::sqrt(proportion_variance_ksd_wor(es))
       end
-      
-      
+
       def proportion_sd_ksd_wr(es)
-        n_total=calculate_n_total(es)
-        sum=es.inject(0){|a,h|
-          val= (h['N']**2 * h['p']*(1-h['p'])) / h['n'].to_f
-          a+val
-        }
-        Math::sqrt(sum) * (1.0/n_total)
+        n_total = calculate_n_total(es)
+        sum = es.inject(0) do |a,h|
+          val = (h['N']**2 * h['p'] * (1 - h['p'])) / h['n'].to_f
+          a + val
+        end
+        Math::sqrt(sum) * (1.0 / n_total)
       end
+
       def proportion_variance_ksd_wr(es)
-          proportion_variance_ksd_wor(es)**2
+        proportion_variance_ksd_wor(es)**2
       end
-      
+
       def proportion_variance_esd_wor(es)
         n_total=n_total=calculate_n_total(es)
-        
-        sum=es.inject(0){|a,h|
-          a=(h['N']**2 * (h['N']-h['n']) * h['p']*(1.0-h['p'])) / ((h['n']-1)*(h['N']-1))
-          a+val
-        }
+
+        sum = es.inject(0) do |a,h|
+          a = (h['N']**2 * (h['N'] - h['n']) * h['p'] * (1.0-h['p'])) / ((h['n'] - 1) * (h['N'] - 1))
+          a + val
+        end
         Math::sqrt(sum) * (1.0/n_total**2)
       end
+
       def proportion_sd_esd_wor(es)
-          Math::sqrt(proportion_variance_ksd_wor(es))
+        Math::sqrt(proportion_variance_ksd_wor(es))
       end
     end
-    
-    def initialize(ms,strata_sizes)
-      raise TypeError,"ms should be a Multiset" unless ms.is_a? Statsample::Multiset
-      @ms=ms
-      raise ArgumentError,"You should put a strata size for each dataset" if strata_sizes.keys.sort!=ms.datasets_names
-      @strata_sizes=strata_sizes
-      @population_size=@strata_sizes.inject(0) {|a,x| a+x[1]}
-      @strata_number=@ms.n_datasets
-      @sample_size=@ms.datasets.inject(0) {|a,x| a+x[1].cases}
+    #class singleton methods end above.
+
+    def initialize(ms, strata_sizes)
+      raise TypeError, "ms should be a Multiset" unless ms.is_a? Statsample::Multiset
+      @ms = ms
+      raise ArgumentError, "You should put a strata size for each dataset" if strata_sizes.keys.sort!=ms.datasets_names
+      @strata_sizes = strata_sizes
+      @population_size = @strata_sizes.inject(0) { |a, x| a + x[1] }
+      @strata_number = @ms.n_datasets
+      @sample_size = @ms.datasets.inject(0) { |a,x| a + x[1].cases }
     end
     # Number of strata
     def strata_number
@@ -226,14 +239,14 @@ def stratum_size(h)
       @strata_sizes[h]
     end
     def vectors_by_field(field)
-      @ms.datasets.collect{|k,ds|
+      @ms.datasets.collect{|k, ds|
         ds[field]
       }
     end
     # Population proportion based on strata
-    def proportion(field, v=1)
-      @ms.sum_field(field) {|s_name,vector|
-      stratum_ponderation(s_name)*vector.proportion(v)
+    def proportion(field, v = 1)
+      @ms.sum_field(field) {|s_name, vector|
+        stratum_ponderation(s_name) * vector.proportion(v)
       }
     end
     # Stratum ponderation.
@@ -242,69 +255,70 @@ def stratum_ponderation(h)
       @strata_sizes[h].to_f / @population_size
     end
     alias_method :wh, :stratum_ponderation
-    
+
     # Population mean based on strata
     def mean(field)
-      @ms.sum_field(field) {|s_name,vector|
-      stratum_ponderation(s_name)*vector.mean
+      @ms.sum_field(field) {|s_name, vector|
+        stratum_ponderation(s_name) * vector.mean
       }
     end
     # Standard error with estimated population variance and without replacement.
     # Source: Cochran (1972)
     def standard_error_wor(field)
-      es=@ms.collect_vector(field) {|s_n, vector|
-        {'N'=>@strata_sizes[s_n],'n'=>vector.size, 's'=>vector.sds}
+      es = @ms.collect_vector(field) {|s_n, vector|
+        {'N' => @strata_sizes[s_n], 'n' => vector.size, 's' => vector.sds}
       }
-      
+
       StratifiedSample.standard_error_esd_wor(es)
     end
-    
+
     # Standard error with estimated population variance and without replacement.
     # Source: http://stattrek.com/Lesson6/STRAnalysis.aspx
-    
+
     def standard_error_wor_2(field)
-      sum=@ms.sum_field(field) {|s_name,vector|
-        s_size=@strata_sizes[s_name]
-      (s_size**2 * (1-(vector.size.to_f / s_size)) * vector.variance_sample / vector.size.to_f)
-      }
-      (1/@population_size.to_f)*Math::sqrt(sum)
+      sum = @ms.sum_field(field)  do |s_name, vector|
+        s_size = @strata_sizes[s_name]
+        (s_size**2 * (1 - (vector.size.to_f / s_size)) * vector.variance_sample / vector.size.to_f)
+      end
+      (1 / @population_size.to_f) * Math::sqrt(sum)
     end
-    
+
     def standard_error_wr(field)
-      es=@ms.collect_vector(field) {|s_n, vector|
-        {'N'=>@strata_sizes[s_n],'n'=>vector.size, 's'=>vector.sds}
+      es = @ms.collect_vector(field) {|s_n, vector|
+        {'N' => @strata_sizes[s_n], 'n' => vector.size, 's' => vector.sds}
       }
-      
+
       StratifiedSample.standard_error_esd_wr(es)
     end
-    def proportion_sd_esd_wor(field,v=1)
-      es=@ms.collect_vector(field) {|s_n, vector|
-        {'N'=>@strata_sizes[s_n],'n'=>vector.size, 'p'=>vector.proportion(v)}
+
+    def proportion_sd_esd_wor(field, v = 1)
+      es = @ms.collect_vector(field) {|s_n, vector|
+        {'N' => @strata_sizes[s_n], 'n' => vector.size, 'p' => vector.proportion(v)}
       }
-      
+
       StratifiedSample.proportion_sd_esd_wor(es)
     end
-    
-    def proportion_standard_error(field,v=1)
-      prop=proportion(field,v)
-      sum=@ms.sum_field(field) {|s_name,vector|
-        nh=vector.size
-        s_size=@strata_sizes[s_name]
-        (s_size**2 * (1-(nh/s_size)) * prop * (1-prop) / (nh -1 ))
+
+    def proportion_standard_error(field, v = 1)
+      prop=proportion(field, v)
+      sum = @ms.sum_field(field) {|s_name, vector|
+        nh = vector.size
+        s_size = @strata_sizes[s_name]
+        (s_size**2 * (1 - (nh / s_size)) * prop * (1 - prop) / (nh -1))
       }
       (1.quo(@population_size)) * Math::sqrt(sum)
     end
     # Cochran(1971), p. 150 
-    def variance_pst(field,v=1)
-      sum=@ms.datasets.inject(0) {|a,da|
-        stratum_name=da[0]
-        ds=da[1]
-        nh=ds.cases.to_f
-        s_size=@strata_sizes[stratum_name]
-        prop=ds[field].proportion(v)
-        a + (((s_size**2 * (s_size-nh)) / (s_size-1))*(prop*(1-prop) / (nh-1)))
-      }
-      (1/@population_size.to_f ** 2)*sum
+    def variance_pst(field, v = 1)
+      sum=@ms.datasets.inject(0) do |a, da|
+        stratum_name = da[0]
+        ds = da[1]
+        nh = ds.cases.to_f
+        s_size = @strata_sizes[stratum_name]
+        prop = ds[field].proportion(v)
+        a + (((s_size**2 * (s_size - nh)) / (s_size - 1)) * (prop * (1 - prop) / (nh - 1)))
+      end
+      (1 / @population_size.to_f ** 2) * sum
     end
   end
 end
diff --git a/lib/statsample/srs.rb b/lib/statsample/srs.rb
index a372431..752cba5 100644
--- a/lib/statsample/srs.rb
+++ b/lib/statsample/srs.rb
@@ -112,7 +112,7 @@ def proportion_total_sd_ep_wor(prop, sam, pop)
       
       ########################
       #
-      # :SECTION:  Mean stimation
+      # :SECTION:  Mean estimation
       #
       ########################
 
diff --git a/test/test_stratified.rb b/test/test_stratified.rb
index eb8ef45..7f0ef8d 100644
--- a/test/test_stratified.rb
+++ b/test/test_stratified.rb
@@ -3,6 +3,11 @@
 class StatsampleStratifiedTestCase < MiniTest::Unit::TestCase
 
   def initialize(*args)
+    @es = [
+      {"N" => 5, "s" => 3, "n" => 2},
+      {"N" => 10, "s" => 5, "n" => 2},
+      {"N" => 30, "s" => 32, "n" => 12}
+    ]
     super
   end
   def test_mean
@@ -14,4 +19,69 @@ def test_mean
     popv=pop.to_vector(:scale)
     assert_equal(popv.mean,Statsample::StratifiedSample.mean(av,bv))
   end
+
+  def test_standard_error_ksd_wr
+    # Standard error for known sample design with replacement
+    es = [
+      {"N" => 5, "s" => 3, "n" => 2},
+      {"N" => 10, "s" => 5, "n" => 2},
+      {"N" => 30, "s" => 32, "n" => 12}
+    ]
+    assert_in_delta 6.21279, Statsample::StratifiedSample.standard_error_ksd_wr(es), 0.001
+  end
+
+  def test_variance_ksd_wr
+    # variance of known sample desgin with replacement
+    result = Statsample::StratifiedSample.variance_ksd_wr(@es)
+
+    assert_in_delta 38.5987, result, 0.001
+    assert_in_delta Statsample::StratifiedSample.standard_error_ksd_wr(@es), Math::sqrt(result), 0.001
+  end
+
+  def test_calculate_n_total
+    # calculates total 'n' from the hash
+
+    assert_equal 45, Statsample::StratifiedSample.calculate_n_total(@es)
+    assert_equal 15, Statsample::StratifiedSample.calculate_n_total(@es.first(2))
+    assert_equal 40, Statsample::StratifiedSample.calculate_n_total(@es.last(2))
+  end
+
+  def test_variance_ksd_wor
+    # variance of known sample desgin without replacement
+
+    assert_in_delta 23.2827, Statsample::StratifiedSample.variance_ksd_wor(@es), 0.001
+    assert_in_delta 4.7444, Statsample::StratifiedSample.variance_ksd_wor(@es.first(2)), 0.001
+  end
+
+  def test_standard_error_ksd_wor
+    # standard error for known sample design without replacement
+    #
+    assert_in_delta 4.8252, Statsample::StratifiedSample.standard_error_ksd_wor(@es), 0.001
+
+    #for fragment sample design
+    assert_in_delta 5.4244, Statsample::StratifiedSample.standard_error_ksd_wor(@es.last 2), 0.001
+  end
+
+  def test_variance_esd_wor
+    # variance of estimated sample design without replacement strategry
+    assert_in_delta 23.2827, Statsample::StratifiedSample.variance_esd_wor(@es), 0.001
+  end
+
+  def test_standard_error_esd_wor
+    # Standard error for estimated sample design without replacement
+    assert_in_delta 4.82521, Statsample::StratifiedSample.standard_error_esd_wor(@es), 0.001
+  end
+
+  def test_standard_error_esd_wr
+    res = Statsample::StratifiedSample.standard_error_esd_wr(@es)
+    assert_in_delta 6.21279, res, 0.001
+    assert_in_delta Statsample::StratifiedSample.variance_esd_wr(@es), res ** 2, 0.001
+  end
+
+  def test_variance_esd_wr
+    # variance for estimated sample dsign with replacement
+
+    assert_in_delta 38.5987, Statsample::StratifiedSample.variance_esd_wr(@es), 0.001
+  end
+
 end

From fcfa01ec367ebd66e358d865f5af2ed40b2d87b5 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sat, 22 Jun 2013 09:59:54 +0530
Subject: [PATCH 05/22] corrected doc typo

---
 lib/statsample/analysis.rb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/statsample/analysis.rb b/lib/statsample/analysis.rb
index bf0e7b1..b0036a4 100644
--- a/lib/statsample/analysis.rb
+++ b/lib/statsample/analysis.rb
@@ -3,7 +3,7 @@
 
 module Statsample
   # DSL to create analysis without hazzle. 
-  # * Shortcuts methods to avoid use complete namescapes, many based on R  
+  # * Shortcuts methods to avoid use complete namespaces, many based on R  
   # * Attach/detach vectors to workspace, like R
   # == Example
   #  an1=Statsample::Analysis.store(:first) do

From 47ff467f9866798e0917cd75ccb21ad046c3b0d3 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sat, 22 Jun 2013 10:27:31 +0530
Subject: [PATCH 06/22] Regression tests

---
 test/test_regression2.rb | 235 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 235 insertions(+)
 create mode 100644 test/test_regression2.rb

diff --git a/test/test_regression2.rb b/test/test_regression2.rb
new file mode 100644
index 0000000..2c6b9ad
--- /dev/null
+++ b/test/test_regression2.rb
@@ -0,0 +1,235 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+
+class StatsampleRegressionTestCase < MiniTest::Unit::TestCase
+  include Statsample::Shorthand
+  #  context "Example with missing data" do
+
+  def setup
+    @x=[0.285714285714286, 0.114285714285714, 0.314285714285714, 0.2, 0.2, 0.228571428571429, 0.2, 0.4, 0.714285714285714, 0.285714285714286, 0.285714285714286, 0.228571428571429, 0.485714285714286, 0.457142857142857, 0.257142857142857, 0.228571428571429, 0.285714285714286, 0.285714285714286, 0.285714285714286, 0.142857142857143, 0.285714285714286, 0.514285714285714, 0.485714285714286, 0.228571428571429, 0.285714285714286, 0.342857142857143, 0.285714285714286, 0.0857142857142857].to_scale
+
+    @y=[nil, 0.233333333333333, nil, 0.266666666666667, 0.366666666666667, nil, 0.333333333333333, 0.3, 0.666666666666667, 0.0333333333333333, 0.333333333333333, nil, nil, 0.533333333333333, 0.433333333333333, 0.4, 0.4, 0.5, 0.4, 0.266666666666667, 0.166666666666667, 0.666666666666667, 0.433333333333333, 0.166666666666667, nil, 0.4, 0.366666666666667, nil].to_scale      
+    @ds={'x'=>@x,'y'=>@y}.to_dataset
+    @lr=Statsample::Regression::Multiple::RubyEngine.new(@ds,'y')
+  end
+
+  def test_correct_values #should "have correct values" do
+    assert_in_delta(0.455,@lr.r2,0.001)
+    assert_in_delta(0.427,@lr.r2_adjusted, 0.001)
+    assert_in_delta(0.1165,@lr.se_estimate,0.001)
+    assert_in_delta(15.925,@lr.f,0.0001)
+    assert_in_delta(0.675, @lr.standarized_coeffs['x'],0.001)
+    assert_in_delta(0.778, @lr.coeffs['x'],0.001, "coeff x")
+    assert_in_delta(0.132, @lr.constant,0.001,"constant")
+    assert_in_delta(0.195, @lr.coeffs_se['x'],0.001,"coeff x se")
+    assert_in_delta(0.064, @lr.constant_se,0.001,"constant se")
+  end
+  #end
+  def test_error# "return an error if data is linearly dependent" do
+    samples=100
+
+    a,b=rand,rand
+
+    x1=samples.times.map { rand}.to_scale
+    x2=samples.times.map {rand}.to_scale
+    x3=samples.times.map {|i| x1[i]*(1+a)+x2[i]*(1+b)}.to_scale
+    y=samples.times.map {|i| x1[i]+x2[i]+x3[i]+rand}.to_scale
+
+    ds={'x1'=>x1,'x2'=>x2,'x3'=>x3,'y'=>y}.to_dataset
+
+    assert_raise(Statsample::Regression::LinearDependency) {
+      Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+    }
+  end
+  def test_parameters
+    @x=[13,20,10,33,15].to_vector(:scale)
+    @y=[23,18,35,10,27].to_vector(:scale)
+    reg=Statsample::Regression::Simple.new_from_vectors(@x,@y)
+    _test_simple_regression(reg)
+    ds={'x'=>@x,'y'=>@y}.to_dataset
+    reg=Statsample::Regression::Simple.new_from_dataset(ds,'x','y')
+    _test_simple_regression(reg)
+    reg=Statsample::Regression.simple(@x,@y)
+    _test_simple_regression(reg)
+
+  end
+  def _test_simple_regression(reg)
+
+    assert_in_delta(40.009, reg.a,0.001)
+    assert_in_delta(-0.957, reg.b,0.001)
+    assert_in_delta(4.248, reg.standard_error,0.002)
+    assert(reg.summary)
+  end
+
+  def test_summaries
+    a=10.times.map{rand(100)}.to_scale
+    b=10.times.map{rand(100)}.to_scale
+    y=10.times.map{rand(100)}.to_scale
+    ds={'a'=>a,'b'=>b,'y'=>y}.to_dataset
+    lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+    assert(lr.summary.size>0)
+  end
+  def test_multiple_dependent
+    complete=Matrix[
+                    [1,0.53,0.62,0.19,-0.09,0.08,0.02,-0.12,0.08],
+                    [0.53,1,0.61,0.23,0.1,0.18,0.02,-0.1,0.15],
+                    [0.62,0.61,1,0.03,0.1,0.12,0.03,-0.06,0.12],
+                    [0.19,0.23,0.03,1,-0.02,0.02,0,-0.02,-0.02],
+                    [-0.09,0.1,0.1,-0.02,1,0.05,0.06,0.18,0.02],
+                    [0.08,0.18,0.12,0.02,0.05,1,0.22,-0.07,0.36],
+                    [0.02,0.02,0.03,0,0.06,0.22,1,-0.01,-0.05],
+                    [-0.12,-0.1,-0.06,-0.02,0.18,-0.07,-0.01,1,-0.03],
+                    [0.08,0.15,0.12,-0.02,0.02,0.36,-0.05,-0.03,1]]
+    complete.extend Statsample::CovariateMatrix
+    complete.fields=%w{adhd cd odd sex age monly mwork mage poverty}
+
+    lr=Statsample::Regression::Multiple::MultipleDependent.new(complete, %w{adhd cd odd})
+
+
+    assert_in_delta(0.197, lr.r2yx,0.001)
+    assert_in_delta(0.197, lr.r2yx_covariance,0.001)
+    assert_in_delta(0.07, lr.p2yx,0.001)
+
+  end
+
+  def test_multiple_regression_pairwise_2
+    @a=[1,3,2,4,3,5,4,6,5,7,3,nil,3,nil,3].to_vector(:scale)
+    @b=[3,3,4,4,5,5,6,6,4,4,2,2,nil,6,2].to_vector(:scale)
+    @c=[11,22,30,40,50,65,78,79,99,100,nil,3,7,nil,7].to_vector(:scale)
+    @y=[3,4,5,6,7,8,9,10,20,30,30,40,nil,50,nil].to_vector(:scale)
+    ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+    lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+    assert_in_delta(2407.436,lr.sst,0.001)
+    assert_in_delta(0.752,lr.r,0.001, "pairwise r")
+    assert_in_delta(0.565,lr.r2,0.001)
+    assert_in_delta(1361.130,lr.ssr,0.001)
+    assert_in_delta(1046.306,lr.sse,0.001)
+    assert_in_delta(3.035,lr.f,0.001)
+  end
+
+
+  def test_multiple_regression_gsl
+    if Statsample.has_gsl?
+      @a=[1,3,2,4,3,5,4,6,5,7].to_vector(:scale)
+      @b=[3,3,4,4,5,5,6,6,4,4].to_vector(:scale)
+      @c=[11,22,30,40,50,65,78,79,99,100].to_vector(:scale)
+      @y=[3,4,5,6,7,8,9,10,20,30].to_vector(:scale)
+      ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+      lr=Statsample::Regression::Multiple::GslEngine.new(ds,'y')
+      assert(lr.summary.size>0)
+      model_test(lr,'gsl')
+      predicted=[1.7857, 6.0989, 3.2433, 7.2908, 4.9667, 10.3428, 8.8158, 10.4717, 23.6639, 25.3198]
+      c_predicted=lr.predicted
+      predicted.each_index{|i|
+        assert_in_delta(predicted[i],c_predicted[i],0.001)
+      }
+      residuals=[1.2142, -2.0989, 1.7566, -1.29085, 2.033, -2.3428, 0.18414, -0.47177, -3.66395, 4.6801]
+      c_residuals=lr.residuals
+      residuals.each_index{|i|
+        assert_in_delta(residuals[i],c_residuals[i],0.001)
+      }
+    else
+      skip "Regression::Multiple::GslEngine not tested (no Gsl)"
+    end
+  end
+
+
+
+  def model_test_matrix(lr,name='undefined')
+
+    stan_coeffs={'a'=>0.151,'b'=>-0.547,'c'=>0.997}
+    unstan_coeffs={'a'=>0.695, 'b'=>-4.286, 'c'=>0.266}
+
+    unstan_coeffs.each_key{|k|
+      assert_in_delta(unstan_coeffs[k], lr.coeffs[k],0.001,"b coeffs - #{name}")
+    }
+
+    stan_coeffs.each_key{|k|
+      assert_in_delta(stan_coeffs[k], lr.standarized_coeffs[k],0.001, "beta coeffs - #{name}")
+    }
+
+    assert_in_delta(11.027,lr.constant,0.001)
+
+    assert_in_delta(0.955,lr.r,0.001)
+    assert_in_delta(0.913,lr.r2,0.001)
+
+    assert_in_delta(20.908, lr.f,0.001)
+    assert_in_delta(0.001, lr.probability, 0.001)
+    assert_in_delta(0.226,lr.tolerance("a"),0.001)
+
+    coeffs_se={"a"=>1.171,"b"=>1.129,"c"=>0.072}
+
+
+
+    ccoeffs_se=lr.coeffs_se
+    coeffs_se.each_key{|k|
+      assert_in_delta(coeffs_se[k],ccoeffs_se[k],0.001)
+    }
+    coeffs_t={"a"=>0.594,"b"=>-3.796,"c"=>3.703}
+    ccoeffs_t=lr.coeffs_t
+    coeffs_t.each_key{|k|
+      assert_in_delta(coeffs_t[k], ccoeffs_t[k],0.001)
+    }
+
+    assert_in_delta(639.6,lr.sst,0.001)
+    assert_in_delta(583.76,lr.ssr,0.001)
+    assert_in_delta(55.840,lr.sse,0.001)
+    assert(lr.summary.size>0, "#{name} without summary")
+  end
+  def model_test(lr,name='undefined')
+    model_test_matrix(lr,name)
+    assert_in_delta(4.559, lr.constant_se,0.001)
+    assert_in_delta(2.419, lr.constant_t,0.001)
+
+    assert_in_delta(1.785,lr.process([1,3,11]),0.001)
+  end
+  def test_regression_matrix
+    @a=[1,3,2,4,3,5,4,6,5,7].to_vector(:scale)
+    @b=[3,3,4,4,5,5,6,6,4,4].to_vector(:scale)
+    @c=[11,22,30,40,50,65,78,79,99,100].to_vector(:scale)
+    @y=[3,4,5,6,7,8,9,10,20,30].to_vector(:scale)
+    ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+    cor=Statsample::Bivariate.correlation_matrix(ds)
+
+    lr=Statsample::Regression::Multiple::MatrixEngine.new(cor,'y', :y_mean=>@y.mean, :x_mean=>{'a'=>ds['a'].mean, 'b'=>ds['b'].mean, 'c'=>ds['c'].mean}, :cases=>@a.size, :y_sd=>@y.sd , :x_sd=>{'a' => @a.sd, 'b' => @b.sd, 'c' => @c.sd})
+    assert_nil(lr.constant_se)
+    assert_nil(lr.constant_t)
+    model_test_matrix(lr, "correlation matrix")
+
+    covariance=Statsample::Bivariate.covariance_matrix(ds)
+    lr=Statsample::Regression::Multiple::MatrixEngine.new(covariance,'y', :y_mean=>@y.mean, :x_mean=>{'a'=>ds['a'].mean, 'b'=>ds['b'].mean, 'c'=>ds['c'].mean}, :cases=>@a.size)
+    assert(lr.summary.size>0)
+
+    model_test(lr , "covariance matrix")
+  end
+  def test_regression_rubyengine
+    @a=[nil,1,3,2,4,3,5,4,6,5,7].to_vector(:scale)
+    @b=[nil,3,3,4,4,5,5,6,6,4,4].to_vector(:scale)
+    @c=[nil,11,22,30,40,50,65,78,79,99,100].to_vector(:scale)
+    @y=[nil,3,4,5,6,7,8,9,10,20,30].to_vector(:scale)
+    ds={'a'=>@a,'b'=>@b,'c'=>@c,'y'=>@y}.to_dataset
+    lr=Statsample::Regression::Multiple::RubyEngine.new(ds,'y')
+    assert_equal(11, lr.total_cases)
+    assert_equal(10, lr.valid_cases)
+    model_test(lr, 'rubyengine with missing data')
+
+    predicted=[nil,1.7857, 6.0989, 3.2433, 7.2908, 4.9667, 10.3428, 8.8158, 10.4717, 23.6639, 25.3198]
+    c_predicted = lr.predicted
+    predicted.each_index do |i|
+      if c_predicted[i].nil?
+        assert(predicted[i].nil?, "Actual #{i} is nil, but expected #{predicted[i]}")
+      else
+        assert_in_delta(predicted[i], c_predicted[i], 0.001)
+      end
+    end
+    residuals=[nil,1.2142, -2.0989, 1.7566, -1.29085, 2.033, -2.3428, 0.18414, -0.47177, -3.66395, 4.6801]
+    c_residuals=lr.residuals
+    residuals.each_index do |i|
+      if c_residuals[i].nil?
+        assert(residuals[i].nil?)
+      else
+        assert_in_delta(residuals[i],c_residuals[i],0.001)
+      end
+    end
+
+  end
+end

From 734f3be565090b7168b286fec90a48c07089fdba Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sun, 23 Jun 2013 23:24:46 +0530
Subject: [PATCH 07/22] Repaired F tests, removed shoulda references

---
 test/test_regression2.rb |  4 +--
 test/test_test_f.rb      | 60 +++++++++++++++++++++-------------------
 2 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/test/test_regression2.rb b/test/test_regression2.rb
index 2c6b9ad..099c340 100644
--- a/test/test_regression2.rb
+++ b/test/test_regression2.rb
@@ -12,7 +12,7 @@ def setup
     @lr=Statsample::Regression::Multiple::RubyEngine.new(@ds,'y')
   end
 
-  def test_correct_values #should "have correct values" do
+  def test_correct_values
     assert_in_delta(0.455,@lr.r2,0.001)
     assert_in_delta(0.427,@lr.r2_adjusted, 0.001)
     assert_in_delta(0.1165,@lr.se_estimate,0.001)
@@ -24,7 +24,7 @@ def test_correct_values #should "have correct values" do
     assert_in_delta(0.064, @lr.constant_se,0.001,"constant se")
   end
   #end
-  def test_error# "return an error if data is linearly dependent" do
+  def test_error
     samples=100
 
     a,b=rand,rand
diff --git a/test/test_test_f.rb b/test/test_test_f.rb
index b7cc4a8..9145d7a 100644
--- a/test/test_test_f.rb
+++ b/test/test_test_f.rb
@@ -1,33 +1,35 @@
 require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
 class StatsampleTestFTestCase < MiniTest::Unit::TestCase
-  context(Statsample::Test::F) do 
-    setup do
-      @ssb=84
-      @ssw=68
-      @df_num=2
-      @df_den=15
-      @f=Statsample::Test::F.new(@ssb.quo(@df_num),@ssw.quo(@df_den), @df_num, @df_den)
-    end
-    should "have #f equal to msb/msw" do
-      assert_equal((@ssb.quo(@df_num)).quo(@ssw.quo(@df_den)), @f.f)
-    end
-    should "have df total equal to df_num+df_den" do
-      assert_equal(@df_num + @df_den, @f.df_total)
-    end
-    should "have probability near 0.002" do 
-      assert_in_delta(0.002, @f.probability, 0.0005)
-    end
-    should "be coerced into float" do
-      assert_equal(@f.to_f, @f.f)
-    end
-    
-    context("method summary") do
-      setup do
-        @summary=@f.summary
-      end
-      should "have size > 0" do
-        assert(@summary.size>0)
-      end
-    end
+  def setup
+    @ssb=84
+    @ssw=68
+    @df_num=2
+    @df_den=15
+    @f=Statsample::Test::F.new(@ssb.quo(@df_num),@ssw.quo(@df_den), @df_num, @df_den)
+    @summary=@f.summary
+  end
+
+  def test_f_value
+    #should have #f equal to msb/msw
+    assert_equal((@ssb.quo(@df_num)).quo(@ssw.quo(@df_den)), @f.f)
+  end
+
+  def test_df_with_num_and_den
+    # df total should be equal to df_num+df_den
+    assert_equal(@df_num + @df_den, @f.df_total)
+  end
+
+  def test_probability
+    #should have probability near 0.002
+    assert_in_delta(0.002, @f.probability, 0.0005)
+  end
+
+  def test_float_coercion
+    #should be coerced into float
+    assert_equal(@f.to_f, @f.f)
+  end
+
+  def test_summary_size
+    assert(@summary.size>0)
   end
 end

From acc618ef5fd26577e65484024d4086ede4f226d6 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sun, 23 Jun 2013 23:25:38 +0530
Subject: [PATCH 08/22] Prepared gemspec with updated version of dependencies
 for proper build and devs. To build: gem build statsample.gemspec To install:
 gem install ./statsample*.gem

---
 statsample.gemspec | 117 ++++++++++-----------------------------------
 1 file changed, 24 insertions(+), 93 deletions(-)

diff --git a/statsample.gemspec b/statsample.gemspec
index 205f0e2..92bed6b 100644
--- a/statsample.gemspec
+++ b/statsample.gemspec
@@ -1,98 +1,29 @@
-# -*- encoding: utf-8 -*-
-
 Gem::Specification.new do |s|
-  s.name = "statsample"
-  s.version = "1.1.0.20110912141756"
-
-  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
-  s.authors = ["Claudio Bustos"]
-  s.date = "2013-19-04"
-  s.description = "A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2, 1.9.3 and JRuby 1.4 (Ruby 1.8.7 compatible).\n\nInclude:\n* Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).\n* Imports and exports datasets from and to Excel, CSV and plain text files.\n* Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and  gamma.  Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.\n* Intra-class correlation\n* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.\n* Tests: F, T, Levene, U-Mannwhitney.\n* Regression: Simple, Multiple (OLS), Probit  and Logit\n* Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.\n* Reliability analysis for simple scale and a DSL to easily analyze multiple scales using factor analysis and correlations, if you want it.\n* Dominance Analysis, with multivariate dependent and bootstrap (Azen & Budescu)\n* Sample calculation related formulas\n* Structural Equation Modeling (SEM), using R libraries +sem+ and +OpenMx+\n* Creates reports on text, html and rtf, using ReportBuilder gem\n* Graphics: Histogram, Boxplot and Scatterplot"
-  s.email = ["clbustos@gmail.com"]
-  s.executables = ["statsample"]
-  s.extra_rdoc_files = ["History.txt", "LICENSE.txt", "Manifest.txt", "README.txt", "references.txt"]
-  s.files = ["History.txt", "LICENSE.txt", "Manifest.txt", "README.txt", "Rakefile", "benchmarks/correlation_matrix_15_variables.rb", "benchmarks/correlation_matrix_5_variables.rb", "benchmarks/correlation_matrix_methods/correlation_matrix.ds", "benchmarks/correlation_matrix_methods/correlation_matrix.html", "benchmarks/correlation_matrix_methods/correlation_matrix.rb", "benchmarks/correlation_matrix_methods/correlation_matrix.xls", "benchmarks/correlation_matrix_methods/correlation_matrix_gsl_ruby.ods", "benchmarks/correlation_matrix_methods/correlation_matrix_with_graphics.ods", "benchmarks/correlation_matrix_methods/results.ds", "benchmarks/factor_map.rb", "benchmarks/helpers_benchmark.rb", "bin/statsample", "data/locale/es/LC_MESSAGES/statsample.mo", "doc_latex/manual/equations.tex", "examples/boxplot.rb", "examples/correlation_matrix.rb", "examples/dataset.rb", "examples/dominance_analysis.rb", "examples/dominance_analysis_bootstrap.rb", "examples/histogram.rb", "examples/icc.rb", "examples/levene.rb", "examples/multiple_regression.rb", "examples/multivariate_correlation.rb", "examples/parallel_analysis.rb", "examples/polychoric.rb", "examples/principal_axis.rb", "examples/reliability.rb", "examples/scatterplot.rb", "examples/t_test.rb", "examples/tetrachoric.rb", "examples/u_test.rb", "examples/vector.rb", "examples/velicer_map_test.rb", "grab_references.rb", "lib/spss.rb", "lib/statsample.rb", "lib/statsample/analysis.rb", "lib/statsample/anova.rb", "lib/statsample/anova/contrast.rb", "lib/statsample/anova/oneway.rb", "lib/statsample/anova/twoway.rb", "lib/statsample/bivariate.rb", "lib/statsample/bivariate/pearson.rb", "lib/statsample/codification.rb", "lib/statsample/converter/csv.rb", "lib/statsample/converter/spss.rb", "lib/statsample/converters.rb", "lib/statsample/crosstab.rb", "lib/statsample/dataset.rb", "lib/statsample/dominanceanalysis.rb", "lib/statsample/dominanceanalysis/bootstrap.rb", "lib/statsample/factor.rb", "lib/statsample/factor/map.rb", "lib/statsample/factor/parallelanalysis.rb", "lib/statsample/factor/pca.rb", "lib/statsample/factor/principalaxis.rb", "lib/statsample/factor/rotation.rb", "lib/statsample/graph.rb", "lib/statsample/graph/boxplot.rb", "lib/statsample/graph/histogram.rb", "lib/statsample/graph/scatterplot.rb", "lib/statsample/histogram.rb", "lib/statsample/matrix.rb", "lib/statsample/mle.rb", "lib/statsample/mle/logit.rb", "lib/statsample/mle/normal.rb", "lib/statsample/mle/probit.rb", "lib/statsample/multiset.rb", "lib/statsample/regression.rb", "lib/statsample/regression/binomial.rb", "lib/statsample/regression/binomial/logit.rb", "lib/statsample/regression/binomial/probit.rb", "lib/statsample/regression/multiple.rb", "lib/statsample/regression/multiple/alglibengine.rb", "lib/statsample/regression/multiple/baseengine.rb", "lib/statsample/regression/multiple/gslengine.rb", "lib/statsample/regression/multiple/matrixengine.rb", "lib/statsample/regression/multiple/rubyengine.rb", "lib/statsample/regression/simple.rb", "lib/statsample/reliability.rb", "lib/statsample/reliability/icc.rb", "lib/statsample/reliability/multiscaleanalysis.rb", "lib/statsample/reliability/scaleanalysis.rb", "lib/statsample/reliability/skillscaleanalysis.rb", "lib/statsample/resample.rb", "lib/statsample/rserve_extension.rb", "lib/statsample/shorthand.rb", "lib/statsample/srs.rb", "lib/statsample/test.rb", "lib/statsample/test/bartlettsphericity.rb", "lib/statsample/test/chisquare.rb", "lib/statsample/test/f.rb", "lib/statsample/test/kolmogorovsmirnov.rb", "lib/statsample/test/levene.rb", "lib/statsample/test/t.rb", "lib/statsample/test/umannwhitney.rb", "lib/statsample/vector.rb", "lib/statsample/vector/gsl.rb", "po/es/statsample.mo", "po/es/statsample.po", "po/statsample.pot", "references.txt", "setup.rb", "test/fixtures/bank2.dat", "test/fixtures/correlation_matrix.rb", "test/fixtures/crime.txt", "test/fixtures/hartman_23.matrix", "test/fixtures/repeated_fields.csv", "test/fixtures/test_binomial.csv", "test/fixtures/test_csv.csv", "test/fixtures/test_xls.xls", "test/fixtures/tetmat_matrix.txt", "test/fixtures/tetmat_test.txt", "test/helpers_tests.rb", "test/test_analysis.rb", "test/test_anova_contrast.rb", "test/test_anovaoneway.rb", "test/test_anovatwoway.rb", "test/test_anovatwowaywithdataset.rb", "test/test_anovawithvectors.rb", "test/test_bartlettsphericity.rb", "test/test_bivariate.rb", "test/test_codification.rb", "test/test_crosstab.rb", "test/test_csv.rb", "test/test_dataset.rb", "test/test_dominance_analysis.rb", "test/test_factor.rb", "test/test_factor_map.rb", "test/test_factor_pa.rb", "test/test_ggobi.rb", "test/test_gsl.rb", "test/test_histogram.rb", "test/test_logit.rb", "test/test_matrix.rb", "test/test_mle.rb", "test/test_multiset.rb", "test/test_regression.rb", "test/test_reliability.rb", "test/test_reliability_icc.rb", "test/test_reliability_skillscale.rb", "test/test_resample.rb", "test/test_rserve_extension.rb", "test/test_srs.rb", "test/test_statistics.rb", "test/test_stest.rb", "test/test_stratified.rb", "test/test_test_f.rb", "test/test_test_kolmogorovsmirnov.rb", "test/test_test_t.rb", "test/test_umannwhitney.rb", "test/test_vector.rb", "test/test_xls.rb", "web/Rakefile"]
-  s.homepage = "http://ruby-statsample.rubyforge.org/"
-  s.post_install_message = "***************************************************\nThanks for installing statsample.\n\nOn *nix, you could install statsample-optimization\nto retrieve gems gsl, statistics2 and a C extension\nto speed some methods.\n\n  $ sudo gem install statsample-optimization\n\nOn Ubuntu, install  build-essential and libgsl0-dev \nusing apt-get. Compile ruby 1.9x from \nsource code first.\n\n  $ sudo apt-get install build-essential libgsl0-dev\n\n\n*****************************************************\n"
-  s.rdoc_options = ["--main", "README.txt"]
-  s.require_paths = ["lib"]
-  s.rubyforge_project = "ruby-statsample"
-  s.rubygems_version = "1.8.10"
-  s.summary = "A suite for basic and advanced statistics on Ruby"
-  s.test_files = ["test/test_ggobi.rb", "test/test_vector.rb", "test/test_crosstab.rb", "test/test_bivariate.rb", "test/test_dominance_analysis.rb", "test/test_analysis.rb", "test/test_dataset.rb", "test/test_test_t.rb", "test/test_regression.rb", "test/test_reliability_icc.rb", "test/test_histogram.rb", "test/test_multiset.rb", "test/test_logit.rb", "test/test_xls.rb", "test/test_test_kolmogorovsmirnov.rb", "test/test_factor_map.rb", "test/test_stratified.rb", "test/test_rserve_extension.rb", "test/test_reliability_skillscale.rb", "test/test_anova_contrast.rb", "test/test_anovaoneway.rb", "test/test_matrix.rb", "test/test_reliability.rb", "test/test_anovatwowaywithdataset.rb", "test/test_factor_pa.rb", "test/test_factor.rb", "test/test_resample.rb", "test/test_anovawithvectors.rb", "test/test_bartlettsphericity.rb", "test/test_statistics.rb", "test/test_stest.rb", "test/test_srs.rb", "test/test_test_f.rb", "test/test_csv.rb", "test/test_anovatwoway.rb", "test/test_gsl.rb", "test/test_mle.rb", "test/test_umannwhitney.rb", "test/test_codification.rb"]
+  s.name               = 'statsample'
+  s.version            = '1.1.0.2013'
+  s.date               = '2013-04-28'
+  s.summary            = 'StatSample - Ruby Statistical library'
+  s.description        = "A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7 1.9.1, 1.9.2 and 1.9.3, JRuby 1.4(Ruby 1.8.7 compatible)."
+  s.authors            = ["Claudio Bustos"]
+  s.email              = ["clbustos@gmail.com"]
+  s.executables        = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) }
+  s.files              = `git ls-files -- lib/*`.split("\n")
+  s.extra_rdoc_files  += ['History.txt', 'Manifest.txt', 'README.md', 'references.txt']
 
-  if s.respond_to? :specification_version then
-    s.specification_version = 3
+  s.homepage           = "http://github.com/clbustos/statsample"
+  s.require_path       = "lib"
+  s.rubyforge_project  = "ruby-statsample"
+  s.test_files         = `git ls-files -- {test}/*`.split("\n")
 
-    if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
-      s.add_runtime_dependency(%q<spreadsheet>, ["~> 0.6.5"])
-      s.add_runtime_dependency(%q<reportbuilder>, ["~> 1.4"])
-      s.add_runtime_dependency(%q<minimization>, ["~> 0.2.0"])
-      s.add_runtime_dependency(%q<fastercsv>, ["> 0"])
-      s.add_runtime_dependency(%q<dirty-memoize>, ["~> 0.0"])
-      s.add_runtime_dependency(%q<extendmatrix>, ["~> 0.3.1"])
-      s.add_runtime_dependency(%q<statsample-bivariate-extension>, ["> 0"])
-      s.add_runtime_dependency(%q<rserve-client>, ["~> 0.2.5"])
-      s.add_runtime_dependency(%q<rubyvis>, ["~> 0.5.0"])
-      s.add_runtime_dependency(%q<distribution>, ["~> 0.3"])
-      s.add_development_dependency(%q<hoe>, ["~> 0"])
-      s.add_development_dependency(%q<shoulda>, ["~> 0"])
-      s.add_development_dependency(%q<minitest>, ["~> 2.0"])
-      s.add_development_dependency(%q<rserve-client>, ["~> 0"])
-      s.add_development_dependency(%q<gettext>, ["~> 2.3.8"])
-      s.add_development_dependency(%q<mocha>, ["~> 0"])
-      s.add_development_dependency(%q<hoe-git>, ["~> 1.5.0"])
-      s.add_development_dependency(%q<hoe>, ["~> 2.12"])
-      s.add_development_dependency(%q<rspec>, ["~> 2.13.0"])
-      s.add_development_dependency(%q<rspec-core>, ["~> 2.13.1"])
-      s.add_development_dependency(%q<rspec-mocks>, ["~> 2.13.0"])
-      s.add_development_dependency(%q<rspec-expectations>, ["~> 2.13.0"])
-    else
-      s.add_dependency(%q<spreadsheet>, ["~> 0.6.5"])
-      s.add_dependency(%q<reportbuilder>, ["~> 1.4"])
-      s.add_dependency(%q<minimization>, ["~> 0.2.0"])
-      s.add_dependency(%q<fastercsv>, ["> 0"])
-      s.add_dependency(%q<dirty-memoize>, ["~> 0.0"])
-      s.add_dependency(%q<extendmatrix>, ["~> 0.3.1"])
-      s.add_dependency(%q<statsample-bivariate-extension>, ["> 0"])
-      s.add_dependency(%q<rserve-client>, ["~> 0.2.5"])
-      s.add_dependency(%q<rubyvis>, ["~> 0.5.0"])
-      s.add_dependency(%q<distribution>, ["~> 0.3"])
-      s.add_dependency(%q<hoe>, ["~> 0"])
-      s.add_dependency(%q<shoulda>, ["~> 0"])
-      s.add_dependency(%q<minitest>, ["~> 2.0"])
-      s.add_dependency(%q<rserve-client>, ["~> 0"])
-      s.add_dependency(%q<gettext>, ["~> 2.3.8"])
-      s.add_dependency(%q<mocha>, ["~> 0"])
-      s.add_dependency(%q<hoe-git>, ["~> 1.5.0"])
-      s.add_dependency(%q<hoe>, ["~> 2.12"])
-      s.add_dependency(%q<rspec>, ["~> 2.13.0"])
-      s.add_dependency(%q<rspec-core>, ["~> 2.13.1"])
-      s.add_dependency(%q<rspec-mocks>, ["~> 2.13.0"])
-      s.add_dependency(%q<rspec-expectations>, ["~> 2.13.0"])
-    end
-  else
-    s.add_dependency(%q<spreadsheet>, ["~> 0.6.5"])
-    s.add_dependency(%q<reportbuilder>, ["~> 1.4"])
-    s.add_dependency(%q<minimization>, ["~> 0.2.0"])
-    s.add_dependency(%q<fastercsv>, ["> 0"])
-    s.add_dependency(%q<dirty-memoize>, ["~> 0.0"])
-    s.add_dependency(%q<extendmatrix>, ["~> 0.3.1"])
-    s.add_dependency(%q<statsample-bivariate-extension>, ["> 0"])
-    s.add_dependency(%q<rserve-client>, ["~> 0.2.5"])
-    s.add_dependency(%q<rubyvis>, ["~> 0.5.0"])
-    s.add_dependency(%q<distribution>, ["~> 0.3"])
-    s.add_dependency(%q<hoe>, ["~> 0"])
-    s.add_dependency(%q<shoulda>, ["~> 0"])
-    s.add_dependency(%q<minitest>, ["~> 2.0"])
-    s.add_dependency(%q<rserve-client>, ["~> 0"])
-    s.add_dependency(%q<gettext>, ["~> 2.3.8"])
-    s.add_dependency(%q<mocha>, ["~> 0"])
-    s.add_dependency(%q<hoe-git>, ["~> 1.5.0"])
-    s.add_dependency(%q<hoe>, ["~> 2.12"])
-    s.add_dependency(%q<rspec>, ["~> 2.13.0"])
-    s.add_dependency(%q<rspec-core>, ["~> 2.13.1"])
-    s.add_dependency(%q<rspec-mocks>, ["~> 2.13.0"])
-    s.add_dependency(%q<rspec-expectations>, ["~> 2.13.0"])
+  DEPENDENCIES         = [{:gem => "spreadsheet", :version => "~> 0.8.5"}, {:gem => "reportbuilder", :version => "~> 1.4.2"},
+   {:gem => "minimization", :version => "~> 0.2.1"}, {:gem => "fastercsv", :version => "~> 1.5.5"}, 
+   {:gem => "dirty-memoize", :version => "~> 0.0.4"}, {:gem => "extendmatrix", :version => "~> 0.3.1"},
+   {:gem => "statsample-bivariate-extension", :version => "~> 1.1.0"}, {:gem => "rserve-client", :version => "~> 0.3.0"},
+   {:gem => "rubyvis", :version => "~> 0.5.2"}, {:gem => "gettext", :version => "~> 2.3.9"},
+   {:gem => "mocha", :version => "~> 0.14.0"}, {:gem => "hoe-git", :version => "~> 1.5.0"},
+   {:gem => "minitest", :version => "~> 5.0.5"}, {:gem => "shoulda", :version => "~> 3.5.0"},
+  ]
+  DEPENDENCIES.each do |dependency|
+    s.add_dependency(dependency[:gem], dependency[:version])
   end
 end

From fd06759440604eb64f3a47a628d7d3693f6f25fb Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sun, 30 Jun 2013 05:52:23 +0530
Subject: [PATCH 09/22] Wald test implemented

---
 lib/statsample/timeseries.rb |  4 +--
 test/test_wald.rb            | 53 ++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 2 deletions(-)
 create mode 100644 test/test_wald.rb

diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index d9e7aad..eada4a5 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -2,8 +2,8 @@ module Statsample::TimeSeriesShorthands
   # Creates a new Statsample::TimeSeries object
   # Argument should be equal to TimeSeries.new
   def to_time_series(*args)
-		Statsample::TimeSeries::TimeSeries.new(self, :scale, *args)
-	end
+    Statsample::TimeSeries::TimeSeries.new(self, :scale, *args)
+  end
 
   alias :to_ts :to_time_series
 end
diff --git a/test/test_wald.rb b/test/test_wald.rb
new file mode 100644
index 0000000..73a3459
--- /dev/null
+++ b/test/test_wald.rb
@@ -0,0 +1,53 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+
+class StatsampleTestWaldTest < MiniTest::Unit::TestCase
+  include Statsample::Shorthand
+  include Statsample::TimeSeries
+  def setup
+    @wald = 100.times.map do
+      rand(100)
+    end.to_ts
+  end
+
+  def generate_acf_series(lags)
+    acf_series = @wald.acf(lags)
+    acf_series.map do |x|
+      x ** 2
+    end.inject(:+)
+  end
+
+  def compute_chi_df(lags)
+    observed = Matrix[@wald.acf(lags)]
+    Statsample::Test.chi_square(observed).df
+  end
+
+  def test_with_5_lags
+    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+    assert_equal generate_acf_series(5), @wald.acf(5).map { |x| x ** 2 }.inject(:+)
+  end
+
+  def test_with_10_lags
+    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+  end
+
+  def test_with_15_lags
+    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+  end
+
+  def test_with_20_lags
+    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+  end
+
+  def test_with_different_series
+    @wald = 200.times.map { rand }.to_ts
+    acf_series = @wald.acf(10)
+    lhs = acf_series.each { |x| x ** 2 }.inject(:+)
+    rhs = Statsample::Test.chi_square(Matrix[acf_series]).df
+    assert_equal (lhs/10).to_i, rhs
+  end
+
+  def test_pacf
+    #TODO
+  end
+end
+

From 67aa8c98d8c6c8a6fd5ae9eec88332e4c9a1673b Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sun, 30 Jun 2013 06:17:27 +0530
Subject: [PATCH 10/22] Implemented wald tests for acf mean, variance and
 distributed chi-square

---
 test/test_wald.rb | 41 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/test/test_wald.rb b/test/test_wald.rb
index 73a3459..30fb87b 100644
--- a/test/test_wald.rb
+++ b/test/test_wald.rb
@@ -1,6 +1,8 @@
 require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
 
 class StatsampleTestWaldTest < MiniTest::Unit::TestCase
+
+  # Wald 
   include Statsample::Shorthand
   include Statsample::TimeSeries
   def setup
@@ -10,6 +12,19 @@ def setup
   end
 
   def generate_acf_series(lags)
+    @wald.acf(lags)
+  end
+
+  def compute_mean(acf)
+    acf.inject(:+) / acf.size
+  end
+
+  def compute_variance(acf)
+    result = acf.map { |x| x ** 2 }.inject(:+) / acf.size
+    return result.to_f
+  end
+
+  def generate_acf_summation(lags)
     acf_series = @wald.acf(lags)
     acf_series.map do |x|
       x ** 2
@@ -22,20 +37,36 @@ def compute_chi_df(lags)
   end
 
   def test_with_5_lags
-    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
-    assert_equal generate_acf_series(5), @wald.acf(5).map { |x| x ** 2 }.inject(:+)
+    acf = generate_acf_series(5)
+    assert_equal generate_acf_summation(5), @wald.acf(5).map { |x| x ** 2 }.inject(:+)
+    assert_in_delta compute_mean(acf).to_i, 0
+    assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+    assert_equal (generate_acf_summation(5)/5).to_i, compute_chi_df(5)
+
   end
 
   def test_with_10_lags
-    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+    acf = generate_acf_series(10)
+    assert_equal generate_acf_summation(10), @wald.acf(10).map { |x| x ** 2 }.inject(:+)
+    assert_in_delta compute_mean(acf).to_i, 0
+    assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+    assert_equal (generate_acf_summation(10)/10).to_i, compute_chi_df(10)
   end
 
   def test_with_15_lags
-    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+    acf = generate_acf_series(15)
+    assert_equal generate_acf_summation(15), @wald.acf(15).map { |x| x ** 2 }.inject(:+)
+    assert_in_delta compute_mean(acf).to_i, 0
+    assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+    assert_equal (generate_acf_summation(15)/15).to_i, compute_chi_df(15)
   end
 
   def test_with_20_lags
-    assert_equal (generate_acf_series(5)/5).to_i, compute_chi_df(5)
+    acf = generate_acf_series(20)
+    assert_equal generate_acf_summation(20), @wald.acf(20).map { |x| x ** 2 }.inject(:+)
+    assert_in_delta compute_mean(acf).to_i, 0
+    assert_in_delta compute_variance(acf), 1.0/acf.size, 0.1
+    assert_equal (generate_acf_summation(20)/20).to_i, compute_chi_df(20)
   end
 
   def test_with_different_series

From ac99e293c0d89e21a638780db75abca818fc55d0 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Mon, 1 Jul 2013 20:38:36 +0530
Subject: [PATCH 11/22] Wald test. To test acf and fit on an ARIMA model.

---
 test/test_wald2.rb | 66 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
 create mode 100644 test/test_wald2.rb

diff --git a/test/test_wald2.rb b/test/test_wald2.rb
new file mode 100644
index 0000000..7f59e2c
--- /dev/null
+++ b/test/test_wald2.rb
@@ -0,0 +1,66 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+
+class StatsampleWaldTest < MiniTest::Unit::TestCase
+  # Wald test is useful to test a series of n acf with Chi-square
+  # degree of freedom. It is extremely useful to test fit the fit of
+  # an ARIMA model to test the residuals.
+
+  include Statsample::TimeSeries
+  include Statsample::Shorthand
+  include Distribution
+
+  def setup
+    #create time series to evaluate later
+    @wald = 100.times.map { rand(100) }.to_ts
+  end
+
+  def sum_of_squares_of_acf_series(lags)
+    #perform sum of squares for a series of acf with specified lags
+    acf = @wald.acf(lags)
+    return acf.map { |x| x ** 2 }.inject(:+)
+  end
+
+  def chisquare_cdf(sum_of_squares, lags)
+    1 - ChiSquare.cdf(sum_of_squares, lags)
+  end
+
+
+  def test_wald_with_5_lags
+    #number of lags for acf = 5
+    lags = 5
+    sum_of_squares = sum_of_squares_of_acf_series(lags)
+    assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+  end
+
+
+  def test_wald_with_10_lags
+    #number of lags for acf = 10
+    lags = 10
+    sum_of_squares = sum_of_squares_of_acf_series(lags)
+    assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+  end
+
+
+  def test_wald_with_15_lags
+    #number of lags for acf = 15
+    lags = 15
+    sum_of_squares = sum_of_squares_of_acf_series(lags)
+    assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+  end
+
+
+  def test_wald_with_20_lags
+    #number of lags for acf = 20
+    lags = 20
+    sum_of_squares = sum_of_squares_of_acf_series(lags)
+    assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+  end
+
+
+  def test_wald_with_25_lags
+    #number of lags for acf = 25
+    lags = 25
+    sum_of_squares = sum_of_squares_of_acf_series(lags)
+    assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+  end
+end

From 99b31c5ae2cf2835922c9b013fbefca00b450324 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Wed, 3 Jul 2013 12:39:52 +0530
Subject: [PATCH 12/22] Implementing pacf with yule walker for unbiased and mle
 method.

---
 lib/statsample/timeseries.rb | 90 ++++++++++++++++++++++++++++++++++++
 test/test_wald2.rb           |  5 ++
 2 files changed, 95 insertions(+)

diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index eada4a5..0ed56eb 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -45,6 +45,96 @@ def acf maxlags = nil
         end
       end
 
+
+      def yule_walker(k = 1, method='yw')
+        #From the series, estimates AR(p)(autoregressive) parameter
+        #using Yule-Waler equation. See -
+        #http://en.wikipedia.org/wiki/Autoregressive_moving_average_model
+
+        #parameters:
+        #ts = series
+        #k = order, default = 1
+        #method = can be 'yw' or 'mle'. If 'yw' then it is unbiased, denominator
+        #is (n - k)
+
+        #returns:
+        #rho => autoregressive coefficients
+        ts = self #timeseries
+        ts -= ts.mean
+        n = ts.size
+        if method.downcase.eql? 'yw'
+          #unbiased => denominator = (n - k)
+          denom =->(k) { n - k }
+        else
+          #mle
+          #denominator => (n)
+          denom =->(k) { n }
+        end
+        r = Array.new(k + 1) { 0.0 }
+        r[0] = ts.map { |x| x ** 2 }.inject(:+) / denom.call(0)
+
+        for k in (1..(k+1))
+          r[k] = (ts[0...-k] * ts[k...ts.size]).inject(:+) / denom.call(k)
+        end
+
+        R = toeplitz(r[0...-1])
+
+        #TODO: return the solved matrix equation of (R, r[1..r.size])
+        #pending
+      end
+
+
+      def toeplitz(arr)
+        #Generates Toeplitz matrix - http://en.wikipedia.org/wiki/Toeplitz_matrix
+        #=> arr = [0, 1, 2, 3]
+        #=> result:
+        #[[0, 1, 2, 3],
+        # [1, 0, 1, 2],
+        # [2, 1, 0, 1],
+        # [3, 2, 1, 0]]
+        eplitz_matrix = Array.new(arr.size) { Array.new(arr.size) }
+
+        0.upto(arr.size - 1) do |i|
+          j = 0
+          index = i
+          while i >= 0 do
+            eplitz_matrix[index][j] = arr[i]
+            j += 1
+            i -= 1
+          end
+          i = index + 1; k = 1
+          while i < arr.size do
+            eplitz_matrix[index][j] = arr[k]
+            i += 1; j += 1; k += 1
+          end
+        end
+
+        return eplitz_matrix
+      end
+
+
+      def pacf_yw(maxlags, method = 'yw')
+        #partial autocorrelation by yule walker equations.
+        #Inspiration: StatsModels
+        xm = self - self.mean
+        pacf = [1]
+        (1..maxlags).map do |i|
+          pacf << yule_walker(i, method)[0][-1]
+        end
+        return pacf
+      end
+
+
+      def pacf(maxlags = nil, method = 'yw')
+        #parameters:
+        #maxlags => maximum number of lags for pcf
+        #method => for autocovariance in yule_walker:
+          #'yw' for 'yule-walker unbaised', 'mle' for biased maximum likelihood
+
+        maxlags ||= (10 * Math.log10(size)).to_i
+        return pacf_yw(maxlags, method)
+      end
+
       # Lags the series by k periods.
       #
       # The convention is to set the oldest observations (the first ones
diff --git a/test/test_wald2.rb b/test/test_wald2.rb
index 7f59e2c..1a3eecd 100644
--- a/test/test_wald2.rb
+++ b/test/test_wald2.rb
@@ -30,6 +30,7 @@ def test_wald_with_5_lags
     lags = 5
     sum_of_squares = sum_of_squares_of_acf_series(lags)
     assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+    assert_equal @wald.acf(lags).size, lags + 1
   end
 
 
@@ -38,6 +39,7 @@ def test_wald_with_10_lags
     lags = 10
     sum_of_squares = sum_of_squares_of_acf_series(lags)
     assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+    assert_equal @wald.acf(lags).size, lags + 1
   end
 
 
@@ -46,6 +48,7 @@ def test_wald_with_15_lags
     lags = 15
     sum_of_squares = sum_of_squares_of_acf_series(lags)
     assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+    assert_equal @wald.acf(lags).size, lags + 1
   end
 
 
@@ -54,6 +57,7 @@ def test_wald_with_20_lags
     lags = 20
     sum_of_squares = sum_of_squares_of_acf_series(lags)
     assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+    assert_equal @wald.acf(lags).size, lags + 1
   end
 
 
@@ -62,5 +66,6 @@ def test_wald_with_25_lags
     lags = 25
     sum_of_squares = sum_of_squares_of_acf_series(lags)
     assert_in_delta chisquare_cdf(sum_of_squares, lags), 1, 0.05
+    assert_equal @wald.acf(lags).size, lags + 1
   end
 end

From 2baf8a094a02cfc338246d019d673831967d939e Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Fri, 5 Jul 2013 22:21:54 +0530
Subject: [PATCH 13/22] Implemented pacf with yule-walker.

---
 lib/statsample/timeseries.rb | 43 +++++++++++++++++++++++++-----------
 statsample.gemspec           |  2 +-
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index 0ed56eb..11fa6d2 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -46,7 +46,7 @@ def acf maxlags = nil
       end
 
 
-      def yule_walker(k = 1, method='yw')
+      def yule_walker(series, k = 1, method='yw')
         #From the series, estimates AR(p)(autoregressive) parameter
         #using Yule-Waler equation. See -
         #http://en.wikipedia.org/wiki/Autoregressive_moving_average_model
@@ -59,8 +59,8 @@ def yule_walker(k = 1, method='yw')
 
         #returns:
         #rho => autoregressive coefficients
-        ts = self #timeseries
-        ts -= ts.mean
+        ts = series #timeseries
+        ts = ts - ts.mean
         n = ts.size
         if method.downcase.eql? 'yw'
           #unbiased => denominator = (n - k)
@@ -71,21 +71,39 @@ def yule_walker(k = 1, method='yw')
           denom =->(k) { n }
         end
         r = Array.new(k + 1) { 0.0 }
-        r[0] = ts.map { |x| x ** 2 }.inject(:+) / denom.call(0)
+        r[0] = ts.map { |x| x ** 2 }.inject(:+).to_f / denom.call(0).to_f
 
-        for k in (1..(k+1))
-          r[k] = (ts[0...-k] * ts[k...ts.size]).inject(:+) / denom.call(k)
+        for l in (1..k)
+          r[k] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
+            x.inject(:*)
+          end.inject(:+).to_f / denom.call(l).to_f
         end
 
-        R = toeplitz(r[0...-1])
+        r_R = toeplitz(r[0...-1])
 
-        #TODO: return the solved matrix equation of (R, r[1..r.size])
-        #pending
+        mat = Matrix.columns(r_R).inverse()
+        solve_matrix(mat, r[1..r.size])
+      end
+
+      def solve_matrix(matrix, out_vector)
+        solution_vector = Array.new(out_vector.size,0 )
+        matrix = matrix.to_a
+        k = 0
+        matrix.each do |row|
+          row.each_with_index do |element, i|
+            solution_vector[k] += element * 1.0 * out_vector[i]
+          end
+          k += 1
+        end
+        return solution_vector
       end
 
 
       def toeplitz(arr)
-        #Generates Toeplitz matrix - http://en.wikipedia.org/wiki/Toeplitz_matrix
+        #Generates Toeplitz matrix -
+        #http://en.wikipedia.org/wiki/Toeplitz_matrix
+        #Toeplitz matrix are equal when they are stored in row &
+        #column major
         #=> arr = [0, 1, 2, 3]
         #=> result:
         #[[0, 1, 2, 3],
@@ -116,10 +134,9 @@ def toeplitz(arr)
       def pacf_yw(maxlags, method = 'yw')
         #partial autocorrelation by yule walker equations.
         #Inspiration: StatsModels
-        xm = self - self.mean
-        pacf = [1]
+        pacf = [1.0]
         (1..maxlags).map do |i|
-          pacf << yule_walker(i, method)[0][-1]
+          pacf << yule_walker(self, i, method)[-1]
         end
         return pacf
       end
diff --git a/statsample.gemspec b/statsample.gemspec
index 92bed6b..de16f4d 100644
--- a/statsample.gemspec
+++ b/statsample.gemspec
@@ -21,7 +21,7 @@ Gem::Specification.new do |s|
    {:gem => "statsample-bivariate-extension", :version => "~> 1.1.0"}, {:gem => "rserve-client", :version => "~> 0.3.0"},
    {:gem => "rubyvis", :version => "~> 0.5.2"}, {:gem => "gettext", :version => "~> 2.3.9"},
    {:gem => "mocha", :version => "~> 0.14.0"}, {:gem => "hoe-git", :version => "~> 1.5.0"},
-   {:gem => "minitest", :version => "~> 5.0.5"}, {:gem => "shoulda", :version => "~> 3.5.0"},
+   {:gem => "minitest", :version => "~> 4.2"}, {:gem => "shoulda", :version => "~> 3.5.0"},
   ]
   DEPENDENCIES.each do |dependency|
     s.add_dependency(dependency[:gem], dependency[:version])

From 935e107763d039e1989a14b5e37bd485dd0b045a Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sat, 6 Jul 2013 02:31:56 +0530
Subject: [PATCH 14/22] Corrected a loop mistake in pacf implementation.
 Completed and verified working on pacf - yule-walker unbiased and mle. :)
 Next : tests

---
 lib/statsample/timeseries.rb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index 11fa6d2..770c308 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -74,7 +74,7 @@ def yule_walker(series, k = 1, method='yw')
         r[0] = ts.map { |x| x ** 2 }.inject(:+).to_f / denom.call(0).to_f
 
         for l in (1..k)
-          r[k] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
+          r[l] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
             x.inject(:*)
           end.inject(:+).to_f / denom.call(l).to_f
         end

From d83c19bc83107cf57b759f165e264b84820caad4 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sat, 6 Jul 2013 13:44:39 +0530
Subject: [PATCH 15/22] Additional comments on pacf usage documented

---
 lib/statsample/timeseries.rb | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index 770c308..5804efe 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -141,7 +141,13 @@ def pacf_yw(maxlags, method = 'yw')
         return pacf
       end
 
-
+      ## USAGE ##
+      ## series = (1..100).map { rand(100) }.to_ts
+      #  series.pacf(10, 'yw') 
+      #  => yule-walker unbiased partial-autocorrelation
+      #  series.pacf(10, 'mle')
+      #  => yule-walker mle  pacf
+      #
       def pacf(maxlags = nil, method = 'yw')
         #parameters:
         #maxlags => maximum number of lags for pcf

From 1efd7fcbd188dd230299b670af81d66fcb100ba2 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sat, 6 Jul 2013 21:01:44 +0530
Subject: [PATCH 16/22] Initial tests for pacf

---
 test/test_pacf.rb | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 test/test_pacf.rb

diff --git a/test/test_pacf.rb b/test/test_pacf.rb
new file mode 100644
index 0000000..981a545
--- /dev/null
+++ b/test/test_pacf.rb
@@ -0,0 +1,42 @@
+require(File.expand_path(File.dirname(__FILE__)+'/helpers_tests.rb'))
+class StatsampleTimeSeriesPacfTestCase < MiniTest::Unit::TestCase
+  context(Statsample::TimeSeries) do
+    include Statsample::TimeSeries
+    setup do
+      @ts = (1..20).map { |x| x * 10 }.to_ts
+      #setting up a proc to get a closure for pacf calling with variable lags and methods
+      @pacf_proc =->(k, method) { @ts.pacf(k, method) }
+    end
+
+    should "return correct correct pacf size for lags = 5" do
+      assert_equal @pacf_proc.call(5, 'yw').size, 6 
+      assert_equal @pacf_proc.call(5, 'mle').size, 6
+      #first element is 1.0
+    end
+
+    should "return correct correct pacf size for lags = 10" do
+      assert_equal @pacf_proc.call(10, 'yw').size, 11 
+      assert_equal @pacf_proc.call(10, 'mle').size, 11
+      #first element is 1.0
+    end
+
+    should "have first element as 1.0" do
+      assert_equal @pacf_proc.call(10, 'yw')[0], 1.0
+      assert_equal @pacf_proc.call(10, 'mle')[0], 1.0
+    end
+
+    should "give correct pacf results for unbiased yule-walker" do
+      result_10 = [1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522, -0.15470588235294147, -0.17938011883732036, -0.2151192288178601, -0.2707082833133261, -0.3678160919540221]
+      result_5 = [1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522]
+      assert_equal @pacf_proc.call(10, 'yw'), result_10
+      assert_equal @pacf_proc.call(5, 'yw'), result_5
+    end
+
+    should "give correct pacf results for mle yule-walker" do
+      result_10 = [1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738, -0.0776780981161499, -0.07744984679625189, -0.0765803323191094, -0.07480650005932366, -0.07179435184923755]
+      result_5 = [1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738]
+      assert_equal @pacf_proc.call(10, 'mle'), result_10
+      assert_equal @pacf_proc.call(5, 'mle'), result_5
+    end
+  end
+end

From 05dad29d1bf043016176059c17aba3b2674239cf Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sun, 7 Jul 2013 21:05:15 +0530
Subject: [PATCH 17/22] Tested assertions for pacf in range (1..10)

---
 test/test_pacf.rb | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/test/test_pacf.rb b/test/test_pacf.rb
index 981a545..278214b 100644
--- a/test/test_pacf.rb
+++ b/test/test_pacf.rb
@@ -30,6 +30,11 @@ class StatsampleTimeSeriesPacfTestCase < MiniTest::Unit::TestCase
       result_5 = [1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522]
       assert_equal @pacf_proc.call(10, 'yw'), result_10
       assert_equal @pacf_proc.call(5, 'yw'), result_5
+
+      #Checking for lag = (1..10)
+      1.upto(10) do |i|
+        assert_equal @pacf_proc.call(i, 'yw'), result_10[0..i]
+      end
     end
 
     should "give correct pacf results for mle yule-walker" do
@@ -37,6 +42,11 @@ class StatsampleTimeSeriesPacfTestCase < MiniTest::Unit::TestCase
       result_5 = [1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738]
       assert_equal @pacf_proc.call(10, 'mle'), result_10
       assert_equal @pacf_proc.call(5, 'mle'), result_5
+
+      #Checking for lag = (1..10)
+      1.upto(10) do |i|
+        assert_equal @pacf_proc.call(i, 'mle'), result_10[0..i]
+      end
     end
   end
 end

From 623acbf0a39fa3ddedbffd5db728e9d543c5398b Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Mon, 8 Jul 2013 22:34:22 +0530
Subject: [PATCH 18/22] Abstracted code for Pacf in a new
 Statsample::TimeSeries::Pacf module. Addressed few concerns by Monica and
 Claudio. :)

---
 lib/statsample/timeseries.rb      | 121 +++---------------------------
 lib/statsample/timeseries/pacf.rb | 100 ++++++++++++++++++++++++
 2 files changed, 110 insertions(+), 111 deletions(-)
 create mode 100644 lib/statsample/timeseries/pacf.rb

diff --git a/lib/statsample/timeseries.rb b/lib/statsample/timeseries.rb
index 5804efe..3a236b3 100644
--- a/lib/statsample/timeseries.rb
+++ b/lib/statsample/timeseries.rb
@@ -1,3 +1,4 @@
+require 'statsample/timeseries/pacf'
 module Statsample::TimeSeriesShorthands
   # Creates a new Statsample::TimeSeries object
   # Argument should be equal to TimeSeries.new
@@ -17,6 +18,7 @@ module TimeSeries
     # Collection of data indexed by time.
     # The order goes from earliest to latest.
     class TimeSeries < Statsample::Vector
+      include Statsample::TimeSeries::Pacf
       # Calculates the autocorrelation coefficients of the series.
       #
       # The first element is always 1, since that is the correlation
@@ -29,10 +31,10 @@ class TimeSeries < Statsample::Vector
       #  ts.acf   # => array with first 21 autocorrelations
       #  ts.acf 3 # => array with first 3 autocorrelations
       #
-      def acf maxlags = nil
-        maxlags ||= (10 * Math.log10(size)).to_i
+      def acf max_lags = nil
+        max_lags ||= (10 * Math.log10(size)).to_i
 
-        (0..maxlags).map do |i|
+        (0..max_lags).map do |i|
           if i == 0
             1.0
           else
@@ -45,117 +47,14 @@ def acf maxlags = nil
         end
       end
 
-
-      def yule_walker(series, k = 1, method='yw')
-        #From the series, estimates AR(p)(autoregressive) parameter
-        #using Yule-Waler equation. See -
-        #http://en.wikipedia.org/wiki/Autoregressive_moving_average_model
-
+      def pacf(max_lags = nil, method = 'yw')
         #parameters:
-        #ts = series
-        #k = order, default = 1
-        #method = can be 'yw' or 'mle'. If 'yw' then it is unbiased, denominator
-        #is (n - k)
-
-        #returns:
-        #rho => autoregressive coefficients
-        ts = series #timeseries
-        ts = ts - ts.mean
-        n = ts.size
-        if method.downcase.eql? 'yw'
-          #unbiased => denominator = (n - k)
-          denom =->(k) { n - k }
-        else
-          #mle
-          #denominator => (n)
-          denom =->(k) { n }
-        end
-        r = Array.new(k + 1) { 0.0 }
-        r[0] = ts.map { |x| x ** 2 }.inject(:+).to_f / denom.call(0).to_f
-
-        for l in (1..k)
-          r[l] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
-            x.inject(:*)
-          end.inject(:+).to_f / denom.call(l).to_f
-        end
-
-        r_R = toeplitz(r[0...-1])
-
-        mat = Matrix.columns(r_R).inverse()
-        solve_matrix(mat, r[1..r.size])
-      end
-
-      def solve_matrix(matrix, out_vector)
-        solution_vector = Array.new(out_vector.size,0 )
-        matrix = matrix.to_a
-        k = 0
-        matrix.each do |row|
-          row.each_with_index do |element, i|
-            solution_vector[k] += element * 1.0 * out_vector[i]
-          end
-          k += 1
-        end
-        return solution_vector
-      end
-
-
-      def toeplitz(arr)
-        #Generates Toeplitz matrix -
-        #http://en.wikipedia.org/wiki/Toeplitz_matrix
-        #Toeplitz matrix are equal when they are stored in row &
-        #column major
-        #=> arr = [0, 1, 2, 3]
-        #=> result:
-        #[[0, 1, 2, 3],
-        # [1, 0, 1, 2],
-        # [2, 1, 0, 1],
-        # [3, 2, 1, 0]]
-        eplitz_matrix = Array.new(arr.size) { Array.new(arr.size) }
-
-        0.upto(arr.size - 1) do |i|
-          j = 0
-          index = i
-          while i >= 0 do
-            eplitz_matrix[index][j] = arr[i]
-            j += 1
-            i -= 1
-          end
-          i = index + 1; k = 1
-          while i < arr.size do
-            eplitz_matrix[index][j] = arr[k]
-            i += 1; j += 1; k += 1
-          end
-        end
-
-        return eplitz_matrix
-      end
-
-
-      def pacf_yw(maxlags, method = 'yw')
-        #partial autocorrelation by yule walker equations.
-        #Inspiration: StatsModels
-        pacf = [1.0]
-        (1..maxlags).map do |i|
-          pacf << yule_walker(self, i, method)[-1]
-        end
-        return pacf
-      end
-
-      ## USAGE ##
-      ## series = (1..100).map { rand(100) }.to_ts
-      #  series.pacf(10, 'yw') 
-      #  => yule-walker unbiased partial-autocorrelation
-      #  series.pacf(10, 'mle')
-      #  => yule-walker mle  pacf
-      #
-      def pacf(maxlags = nil, method = 'yw')
-        #parameters:
-        #maxlags => maximum number of lags for pcf
+        #max_lags => maximum number of lags for pcf
         #method => for autocovariance in yule_walker:
           #'yw' for 'yule-walker unbaised', 'mle' for biased maximum likelihood
 
-        maxlags ||= (10 * Math.log10(size)).to_i
-        return pacf_yw(maxlags, method)
+        max_lags ||= (10 * Math.log10(size)).to_i
+       Pacf::Pacf.pacf_yw(self, max_lags, method)
       end
 
       # Lags the series by k periods.
@@ -199,7 +98,7 @@ def lag k = 1
       #            # => [0.69, 0.23, 0.44, 0.71, ...]
       #
       #  ts.diff   # => [nil, -0.46, 0.21, 0.27, ...]
-      #  
+      #
       def diff
         self - self.lag
       end
diff --git a/lib/statsample/timeseries/pacf.rb b/lib/statsample/timeseries/pacf.rb
new file mode 100644
index 0000000..09836b9
--- /dev/null
+++ b/lib/statsample/timeseries/pacf.rb
@@ -0,0 +1,100 @@
+module Statsample
+  module TimeSeries
+    module Pacf
+      class Pacf
+
+        def self.pacf_yw(timeseries, max_lags, method = 'yw')
+          #partial autocorrelation by yule walker equations.
+          #Inspiration: StatsModels
+          pacf = [1.0]
+          (1..max_lags).map do |i|
+            pacf << yule_walker(timeseries, i, method)[-1]
+          end
+          pacf
+        end
+
+        def self.yule_walker(ts, k = 1, method='yw')
+          #From the series, estimates AR(p)(autoregressive) parameter
+          #using Yule-Waler equation. See -
+          #http://en.wikipedia.org/wiki/Autoregressive_moving_average_model
+
+          #parameters:
+          #ts = series
+          #k = order, default = 1
+          #method = can be 'yw' or 'mle'. If 'yw' then it is unbiased, denominator
+          #is (n - k)
+
+          #returns:
+          #rho => autoregressive coefficients
+          ts = ts - ts.mean
+          n = ts.size
+          if method.downcase.eql? 'yw'
+            #unbiased => denominator = (n - k)
+            denom =->(k) { n - k }
+          else
+            #mle
+            #denominator => (n)
+            denom =->(k) { n }
+          end
+          r = Array.new(k + 1) { 0.0 }
+          r[0] = ts.map { |x| x ** 2 }.inject(:+).to_f / denom.call(0).to_f
+
+          1.upto(k) do |l|
+            r[l] = (ts[0...-l].zip(ts[l...ts.size])).map do |x|
+              x.inject(:*)
+            end.inject(:+).to_f / denom.call(l).to_f
+          end
+
+          r_R = toeplitz(r[0...-1])
+
+          mat = Matrix.columns(r_R).inverse()
+          solve_matrix(mat, r[1..r.size])
+        end
+
+        def self.toeplitz(arr)
+          #Generates Toeplitz matrix -
+          #http://en.wikipedia.org/wiki/Toeplitz_matrix
+          #Toeplitz matrix are equal when they are stored in row &
+          #column major
+          #=> arr = [0, 1, 2, 3]
+          #=> result:
+          #[[0, 1, 2, 3],
+          # [1, 0, 1, 2],
+          # [2, 1, 0, 1],
+          # [3, 2, 1, 0]]
+          eplitz_matrix = Array.new(arr.size) { Array.new(arr.size) }
+
+          0.upto(arr.size - 1) do |i|
+            j = 0
+            index = i
+            while i >= 0 do
+              eplitz_matrix[index][j] = arr[i]
+              j += 1
+              i -= 1
+            end
+            i = index + 1; k = 1
+            while i < arr.size do
+              eplitz_matrix[index][j] = arr[k]
+              i += 1; j += 1; k += 1
+            end
+          end
+          eplitz_matrix
+        end
+
+        def self.solve_matrix(matrix, out_vector)
+          solution_vector = Array.new(out_vector.size, 0)
+          matrix = matrix.to_a
+          k = 0
+          matrix.each do |row|
+            row.each_with_index do |element, i|
+              solution_vector[k] += element * 1.0 * out_vector[i]
+            end
+            k += 1
+          end
+          solution_vector
+        end
+
+      end
+    end
+  end
+end

From 8849e9e1e50503eb0ea5e28821c4aa6b1597c480 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Thu, 11 Jul 2013 15:43:19 +0530
Subject: [PATCH 19/22] Listed cucumber gem in gemspec, added 'features'
 directory in test_files

---
 features/pacf.feature | 3 +++
 statsample.gemspec    | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)
 create mode 100644 features/pacf.feature

diff --git a/features/pacf.feature b/features/pacf.feature
new file mode 100644
index 0000000..75d7025
--- /dev/null
+++ b/features/pacf.feature
@@ -0,0 +1,3 @@
+Feature: PACF
+
+
diff --git a/statsample.gemspec b/statsample.gemspec
index de16f4d..34798d9 100644
--- a/statsample.gemspec
+++ b/statsample.gemspec
@@ -13,7 +13,7 @@ Gem::Specification.new do |s|
   s.homepage           = "http://github.com/clbustos/statsample"
   s.require_path       = "lib"
   s.rubyforge_project  = "ruby-statsample"
-  s.test_files         = `git ls-files -- {test}/*`.split("\n")
+  s.test_files         = `git ls-files -- test features`.split("\n")
 
   DEPENDENCIES         = [{:gem => "spreadsheet", :version => "~> 0.8.5"}, {:gem => "reportbuilder", :version => "~> 1.4.2"},
    {:gem => "minimization", :version => "~> 0.2.1"}, {:gem => "fastercsv", :version => "~> 1.5.5"}, 
@@ -22,6 +22,7 @@ Gem::Specification.new do |s|
    {:gem => "rubyvis", :version => "~> 0.5.2"}, {:gem => "gettext", :version => "~> 2.3.9"},
    {:gem => "mocha", :version => "~> 0.14.0"}, {:gem => "hoe-git", :version => "~> 1.5.0"},
    {:gem => "minitest", :version => "~> 4.2"}, {:gem => "shoulda", :version => "~> 3.5.0"},
+   {:gem => "cucumber", :version => "~>1.3.3"}
   ]
   DEPENDENCIES.each do |dependency|
     s.add_dependency(dependency[:gem], dependency[:version])

From 3616c9ecb30e58dacf3d8340867e8e8afb33955f Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Thu, 11 Jul 2013 15:44:11 +0530
Subject: [PATCH 20/22] Full featured cucumber tests for pacf module.

---
 features/pacf.feature        | 39 ++++++++++++++++++++++++++++++++++++
 features/step_definitions.rb | 39 ++++++++++++++++++++++++++++++++++++
 2 files changed, 78 insertions(+)
 create mode 100644 features/step_definitions.rb

diff --git a/features/pacf.feature b/features/pacf.feature
index 75d7025..ff13afd 100644
--- a/features/pacf.feature
+++ b/features/pacf.feature
@@ -1,3 +1,42 @@
 Feature: PACF
 
+  As a statistician
+  So that I can quickly evaluate partial autocorrelation of a series
+  I want to evaluate pacf
 
+Background: a timeseries
+
+  Given the following values in a timeseries:
+    | timeseries |
+    | 10  20  30  40  50  60  70  80  90  100 |
+    | 110 120 130 140 150 160 170 180 190 200 |
+
+Scenario: check pacf for 10 lags with unbiased
+  When I provide 10 lags for pacf
+  When I provide yw yule walker as method
+  Then I should get Array as resultant output
+  Then I should get 11 values in resultant pacf
+
+Scenario: check pacf for 5 lags with mle
+  When I provide 5 lags for pacf
+  When I provide mle yule walker as method
+  Then I should get Array as resultant output
+  Then I should get 6 values in resultant pacf
+
+Scenario: check first value of pacf
+  When I provide 5 lags for pacf
+  When I provide yw yule walker as method
+  Then I should get Array as resultant output
+  And I should see 1.0 as first value
+
+Scenario: check all values in pacf for 5 lags with mle
+  When I provide 5 lags for pacf
+  When I provide mle yule walker as method
+  Then I should get Array as resultant output
+  And I should see "1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738" as complete series 
+
+Scenario: check all values in pacf for 5 lags with unbiased
+  When I provide 5 lags for pacf
+  When I provide yw yule walker as method
+  Then I should get Array as resultant output
+  And I should see "1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522" as complete series 
diff --git a/features/step_definitions.rb b/features/step_definitions.rb
new file mode 100644
index 0000000..2c0539b
--- /dev/null
+++ b/features/step_definitions.rb
@@ -0,0 +1,39 @@
+require 'statsample'
+require 'debugger'
+include Statsample::TimeSeries
+
+Given /^the following values in a timeseries:$/ do |series|
+  arr = []
+  series.hashes.each do |sequence|
+    arr += sequence['timeseries'].split(' ').map(&:to_i).to_ts
+  end
+  @timeseries = arr.to_ts
+end
+
+When /^I provide (\d+) lags for pacf$/ do |lags|
+  @lags = lags.to_i
+end
+
+When /^I provide (\w+) yule walker as method$/ do |method|
+  @method = method
+end
+
+Then /^I should get (\w+) as resultant output$/ do |klass|
+  @result = @timeseries.pacf(@lags, @method)
+  assert_equal @result.class.to_s, klass
+end
+
+Then /^I should get (\w+) values in resultant pacf$/ do |values_count|
+  assert_equal @result.size, values_count.to_i
+  @timeseries
+end
+
+And /^I should see (\d+\.\d) as first value$/ do |first_value|
+  assert_equal @result.first, first_value.to_f
+end
+
+And /^I should see \"(.+)\" as complete series$/ do |series|
+  series = series.split(',').map(&:to_f)
+  assert_equal @result, series
+end
+

From 46287c0501b6ac2452ee9d4dbeb17f485292a8b6 Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Thu, 11 Jul 2013 16:17:30 +0530
Subject: [PATCH 21/22] Adding cucumber tests for autocorrelation.

---
 features/acf.feature             | 31 +++++++++++++++++++++++++++++++
 features/step_definitions.rb     |  6 ++----
 features/step_definitions_acf.rb |  9 +++++++++
 3 files changed, 42 insertions(+), 4 deletions(-)
 create mode 100644 features/acf.feature
 create mode 100644 features/step_definitions_acf.rb

diff --git a/features/acf.feature b/features/acf.feature
new file mode 100644
index 0000000..da31ed1
--- /dev/null
+++ b/features/acf.feature
@@ -0,0 +1,31 @@
+Feature: ACF
+
+  As a statistician
+  So that I can evaluate autocorrelation of a series
+  I want to evaluate acf
+
+Background: a timeseries
+
+  Given the following values in a timeseries:
+    | timeseries |
+    | 10  20  30  40  50  60  70  80  90  100 |
+    | 110 120 130 140 150 160 170 180 190 200 |
+
+Scenario: cross-check acf for 10 lags
+  When I provide 10 lags for acf
+  And I calculate acf
+  Then I should get 11 values in resultant acf
+  And I should see "1.0, 0.85, 0.7015037593984963, 0.556015037593985, 0.4150375939849624, 0.2800751879699248, 0.15263157894736842, 0.034210526315789476, -0.07368421052631578, -0.16954887218045114, -0.2518796992481203" as complete series 
+
+Scenario: cross-check acf for 5 lags
+  When I provide 5 lags for acf
+  And I calculate acf
+  Then I should get 6 values in resultant acf
+  And I should see "1.0, 0.85, 0.7015037593984963, 0.556015037593985, 0.4150375939849624, 0.2800751879699248" as complete series
+
+Scenario: first value should be 1.0
+  When I provide 2 lags for acf
+  And I calculate acf
+  Then I should get 3 values in resultant acf
+  And I should see 1.0 as first value
+
diff --git a/features/step_definitions.rb b/features/step_definitions.rb
index 2c0539b..aefc354 100644
--- a/features/step_definitions.rb
+++ b/features/step_definitions.rb
@@ -1,5 +1,4 @@
 require 'statsample'
-require 'debugger'
 include Statsample::TimeSeries
 
 Given /^the following values in a timeseries:$/ do |series|
@@ -10,7 +9,7 @@
   @timeseries = arr.to_ts
 end
 
-When /^I provide (\d+) lags for pacf$/ do |lags|
+When /^I provide (\d+) lags for p?acf$/ do |lags|
   @lags = lags.to_i
 end
 
@@ -23,9 +22,8 @@
   assert_equal @result.class.to_s, klass
 end
 
-Then /^I should get (\w+) values in resultant pacf$/ do |values_count|
+Then /^I should get (\w+) values in resultant p?acf$/ do |values_count|
   assert_equal @result.size, values_count.to_i
-  @timeseries
 end
 
 And /^I should see (\d+\.\d) as first value$/ do |first_value|
diff --git a/features/step_definitions_acf.rb b/features/step_definitions_acf.rb
new file mode 100644
index 0000000..ca5479c
--- /dev/null
+++ b/features/step_definitions_acf.rb
@@ -0,0 +1,9 @@
+require 'statsample'
+require 'debugger'
+include Statsample::TimeSeries
+
+#all instance variable and cucumber DSL s DRYed up in step_definitions.rb
+And /^I calculate acf$/ do
+  @result = @timeseries.acf(@lags)
+end
+

From e74ec7e35baf4e0c3752837b58b454e9085ff5de Mon Sep 17 00:00:00 2001
From: Ankur Goel <ankurgel@gmail.com>
Date: Sun, 14 Jul 2013 17:49:36 +0530
Subject: [PATCH 22/22] basic prototype of arima module. currently ongoing:
 general simulation for AR(p)

---
 lib/statsample/arima.rb | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 lib/statsample/arima.rb

diff --git a/lib/statsample/arima.rb b/lib/statsample/arima.rb
new file mode 100644
index 0000000..70a5dc8
--- /dev/null
+++ b/lib/statsample/arima.rb
@@ -0,0 +1,28 @@
+require 'debugger'
+module Statsample
+  module ARIMA
+    class ARIMA < Statsample::TimeSeries
+
+      def arima(ds, p, i, q)
+        if q.zero? 
+          self.ar(p)
+        elsif p.zero?
+          self.ma(p)
+        end
+      end
+
+      def ar(p)
+        #AutoRegressive part of model
+        #http://en.wikipedia.org/wiki/Autoregressive_model#Definition
+        #For finding parameters(to fit), we will use either Yule-walker
+        #or Burg's algorithm(more efficient)
+
+        degugger
+
+      end
+
+      def yule_walker()
+      end
+    end
+  end
+end