CodeReview Round 1 #14

Open
wants to merge 23 commits into
base: master
af7b624
Added Gemspec with new gems listing
AnkurGel Apr 24, 2013
2efd9fb
Converted README to proper markdown. Easily readable now.
AnkurGel Apr 24, 2013
80f7a73
Added specs for Simple Random Sampling(SRS) and Time Series Analysis.
AnkurGel Apr 24, 2013
b95ca73
Added tests for StratifiedSample with proper comments on their signif…
AnkurGel May 1, 2013
fcfa01e
corrected doc typo
AnkurGel Jun 22, 2013
47ff467
Regression tests
AnkurGel Jun 22, 2013
734f3be
Repaired F tests, removed shoulda references
AnkurGel Jun 23, 2013
acc618e
Prepared gemspec with updated version of dependencies for proper buil…
AnkurGel Jun 23, 2013
05aa48e
Merged from clbustos. Merge remote-tracking branch 'upstream/master'
AnkurGel Jun 26, 2013
fd06759
Wald test implemented
AnkurGel Jun 30, 2013
67aa8c9
Implemented wald tests for acf mean, variance and distributed chi-square
AnkurGel Jun 30, 2013
ac99e29
Wald test. To test acf and fit on an ARIMA model.
AnkurGel Jul 1, 2013
99b31c5
Implementing pacf with yule walker for unbiased and mle method.
AnkurGel Jul 3, 2013
2baf8a0
Implemented pacf with yule-walker.
AnkurGel Jul 5, 2013
935e107
Corrected a loop mistake in pacf implementation.
AnkurGel Jul 5, 2013
d83c19b
Additional comments on pacf usage documented
AnkurGel Jul 6, 2013
1efd7fc
Initial tests for pacf
AnkurGel Jul 6, 2013
05dad29
Tested assertions for pacf in range (1..10)
AnkurGel Jul 7, 2013
623acbf
Abstracted code for Pacf in a new Statsample::TimeSeries::Pacf module…
AnkurGel Jul 8, 2013
8849e9e
Listed cucumber gem in gemspec, added 'features' directory in test_files
AnkurGel Jul 11, 2013
3616c9e
Full featured cucumber tests for pacf module.
AnkurGel Jul 11, 2013
46287c0
Adding cucumber tests for autocorrelation.
AnkurGel Jul 11, 2013
e74ec7e
basic prototype of arima module.
AnkurGel Jul 14, 2013
111 changes: 61 additions & 50 deletions README.txt → README.md
@@ -1,13 +1,16 @@
= Statsample
Statsample
==========

http://ruby-statsample.rubyforge.org/
[http://ruby-statsample.rubyforge.org/](http://ruby-statsample.rubyforge.org/)


== DESCRIPTION:
DESCRIPTION:
------------

A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).

Include:

* Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
* Imports and exports datasets from and to Excel, CSV and plain text files.
* Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and gamma. Tetrachoric and polychoric correlations are provided by the +statsample-bivariate-extension+ gem.
@@ -24,9 +27,10 @@ Include:
* Creates reports on text, html and rtf, using ReportBuilder gem
* Graphics: Histogram, Boxplot and Scatterplot

== PRINCIPLES
PRINCIPLES
----------

* Software Design:
* Software Design:
* One module/class for each type of analysis
* Options can be set as hash on initialize() or as setters methods
* Clean API for interactive sessions
@@ -36,13 +40,14 @@ Include:
* Statistical Design
* Results are tested against text results, SPSS and R outputs.
* Go beyond Null Hypothesis Testing, using confidence intervals and effect sizes when possible
* (When possible) All references for methods are documented, providing sensible information on documentation
* (When possible) All references for methods are documented, providing sensible information on documentation

== FEATURES:
FEATURES:
--------

* Classes for manipulation and storage of data:
* Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
* Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
* Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
* Statsample::Multiset: multiple datasets with same fields and type of vectors
* Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. Also you can create contrast using Statsample::Anova::Contrast
* Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
@@ -52,10 +57,10 @@ Include:
* Logit Regression: Statsample::Regression::Binomial::Logit
* Probit Regression: Statsample::Regression::Binomial::Probit
* Factorial Analysis algorithms on Statsample::Factor module.
* Classes for Extraction of factors:
* Classes for Extraction of factors:
* Statsample::Factor::PCA
* Statsample::Factor::PrincipalAxis
* Classes for Rotation of factors:
* Classes for Rotation of factors:
* Statsample::Factor::Varimax
* Statsample::Factor::Equimax
* Statsample::Factor::Quartimax
@@ -64,7 +69,7 @@ Include:
* Statsample::Factor::MAP performs Velicer's Minimum Average Partial (MAP) test, which retains components as long as the variance in the correlation matrix represents systematic variance.
* Dominance Analysis. Based on Budescu and Azen papers, dominance analysis is a method to analyze the relative importance of one predictor relative to another on multiple regression
* Statsample::DominanceAnalysis class can report dominance analysis for a sample, using uni or multivariate dependent variables
* Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recommended by Azen & Budescu (2003) link[http://psycnet.apa.org/journals/met/8/2/129/].
* Statsample::DominanceAnalysis::Bootstrap can execute bootstrap analysis to determine dominance stability, as recommended by Azen & Budescu (2003) link[http://psycnet.apa.org/journals/met/8/2/129/].
* Module Statsample::Codification, to help to codify open questions
* Converters to import and export data:
* Statsample::Database : Can create sql to create tables, read and insert data
@@ -73,15 +78,15 @@ Include:
* Statsample::Mx : Write Mx Files
* Statsample::GGobi : Write Ggobi files
* Module Statsample::Crosstab provides function to create crosstab for categorical data
* Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
* Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
* Class Statsample::Reliability::ScaleAnalysis provides statistics like mean and standard deviation for a scale, Cronbach's alpha and standardized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha if deleted.
* Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
* Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
* Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several types of samples
* Module Statsample::Test provides several methods and classes to perform inferential statistics
* Statsample::Test::BartlettSphericity
* Statsample::Test::ChiSquare
* Statsample::Test::F
* Statsample::Test::F
* Statsample::Test::KolmogorovSmirnov (only D value)
* Statsample::Test::Levene
* Statsample::Test::UMannWhitney
@@ -90,85 +95,91 @@ Include:
* Statsample::Graph::Boxplot
* Statsample::Graph::Histogram
* Statsample::Graph::Scatterplot
* Module Statsample::TimeSeries provides basic support for time series.
* Module Statsample::TimeSeries provides basic support for time series.
* Gem +statsample-sem+ provides a DSL to R libraries +sem+ and +OpenMx+
* Close integration with gem <tt>reportbuilder</tt>, to easily create reports on text, html and rtf formats.

== Examples of use:
* Close integration with gem `reportbuilder`, to easily create reports on text, html and rtf formats.

See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/]
Examples of use:
--------------

=== Boxplot
See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/](http://github.com/clbustos/statsample/tree/master/examples/)

Boxplot
-------
```ruby
require 'statsample'
ss_analysis(Statsample::Graph::Boxplot) do
ss_analysis(Statsample::Graph::Boxplot) do
n=30
a=rnorm(n-1,50,10)
b=rnorm(n, 30,5)
c=rnorm(n,5,1)
a.push(2)
boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
end
end
Statsample::Analysis.run # Open svg file on *nix application defined

=== Correlation matrix

```
Correlation matrix
------------------
```ruby
require 'statsample'
# Note R like generation of random gaussian variable
# and correlation matrix

ss_analysis("Statsample::Bivariate.correlation_matrix") do
samples=1000
ds=data_frame(
'a'=>rnorm(samples),
'a'=>rnorm(samples),
'b'=>rnorm(samples),
'c'=>rnorm(samples),
'd'=>rnorm(samples))
cm=cor(ds)
cm=cor(ds)
summary(cm)
end

Statsample::Analysis.run_batch # Echo output to console

Statsample::Analysis.run_batch # Echo output to console
```

== REQUIREMENTS:
REQUIREMENTS:
-------------

Optional:
Optional:

* Plotting: gnuplot and rbgnuplot, SVG::Graph
* Factorial analysis and polychorical correlation(joint estimate and polychoric series): gsl library and rb-gsl (http://rb-gsl.rubyforge.org/). You should install it using <tt>gem install gsl</tt>.

<b>Note</b>: Use gsl 1.12.109 or later.
* Factorial analysis and polychorical correlation(joint estimate and polychoric series): gsl library and rb-gsl [http://rb-gsl.rubyforge.org/](http://rb-gsl.rubyforge.org/). You should install it using `gem install gsl`.

== RESOURCES
**Note**: Use gsl 1.12.109 or later.

* Source code on github: http://github.com/clbustos/statsample
* API: http://ruby-statsample.rubyforge.org/statsample/
* Bug report and feature request: http://github.com/clbustos/statsample/issues
* E-mailing list: http://groups.google.com/group/statsample
RESOURCES:
----------

== INSTALL:
* Source code on github: [http://github.com/clbustos/statsample](http://github.com/clbustos/statsample)
* API: [http://ruby-statsample.rubyforge.org/statsample/](http://ruby-statsample.rubyforge.org/statsample/)
* Bug report and feature request: [http://github.com/clbustos/statsample/issues](http://github.com/clbustos/statsample/issues)
* E-mailing list: [http://groups.google.com/group/statsample](http://groups.google.com/group/statsample)

$ sudo gem install statsample
INSTALL:
---------
`$ sudo gem install statsample`

On *nix, you should install statsample-optimization to retrieve gems gsl, statistics2 and a C extension to speed some methods.
On \*nix, you should install statsample-optimization to retrieve gems gsl, statistics2 and a C extension to speed some methods.

Precompiled versions are available for Ruby 1.9 on x86, x86_64 and mingw32 archs.
Precompiled versions are available for Ruby 1.9 on x86, x86\_64 and mingw32 archs.

$ sudo gem install statsample-optimization
`$ sudo gem install statsample-optimization`

If you use Ruby 1.8, you should compile statsample-optimization, using parameter <tt>--platform ruby</tt>
If you use Ruby 1.8, you should compile statsample-optimization, using parameter `--platform ruby`

$ sudo gem install statsample-optimization --platform ruby
`$ sudo gem install statsample-optimization --platform ruby`

If you need to work on Structural Equation Modeling, you could see +statsample-sem+. You need R with +sem+ or +OpenMx+ [http://openmx.psyc.virginia.edu/] libraries installed
If you need to work on Structural Equation Modeling, you could see _statsample-sem_. You need R with _sem_ or _OpenMx_ [http://openmx.psyc.virginia.edu/](http://openmx.psyc.virginia.edu/) libraries installed

$ sudo gem install statsample-sem
`$ sudo gem install statsample-sem`

Available setup.rb file

sudo ruby setup.rb
`sudo ruby setup.rb`

== LICENSE:
LICENSE:
-------

GPL-2 (See LICENSE.txt)
31 changes: 31 additions & 0 deletions features/acf.feature
@@ -0,0 +1,31 @@
Feature: ACF

As a statistician
So that I can evaluate autocorrelation of a series
I want to evaluate acf

Background: a timeseries

Given the following values in a timeseries:
| timeseries |
| 10 20 30 40 50 60 70 80 90 100 |
| 110 120 130 140 150 160 170 180 190 200 |

Scenario: cross-check acf for 10 lags
When I provide 10 lags for acf
And I calculate acf
Then I should get 11 values in resultant acf
And I should see "1.0, 0.85, 0.7015037593984963, 0.556015037593985, 0.4150375939849624, 0.2800751879699248, 0.15263157894736842, 0.034210526315789476, -0.07368421052631578, -0.16954887218045114, -0.2518796992481203" as complete series

Scenario: cross-check acf for 5 lags
When I provide 5 lags for acf
And I calculate acf
Then I should get 6 values in resultant acf
And I should see "1.0, 0.85, 0.7015037593984963, 0.556015037593985, 0.4150375939849624, 0.2800751879699248" as complete series

Scenario: first value should be 1.0
When I provide 2 lags for acf
And I calculate acf
Then I should get 3 values in resultant acf
And I should see 1.0 as first value
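
The expected values in these scenarios can be cross-checked outside statsample. A minimal plain-Ruby sketch follows. The helper name `sample_acf` is hypothetical and not part of the gem, and it assumes the usual biased estimator, in which each lag's sum of cross-products is divided by the lag-0 sum of squares.

```ruby
# Hypothetical helper for cross-checking, not the statsample implementation.
# Returns the sample autocorrelation at lags 0..max_lag (biased estimator).
def sample_acf(series, max_lag)
  n = series.size
  mean = series.sum.to_f / n
  deviations = series.map { |x| x - mean }
  sum_of_squares = deviations.sum { |d| d * d }

  (0..max_lag).map do |lag|
    # Sum of products of deviations that are `lag` steps apart.
    cross = (0...(n - lag)).sum { |t| deviations[t] * deviations[t + lag] }
    cross / sum_of_squares
  end
end

# Same input as the Background table: 10, 20, ..., 200.
series = (1..20).map { |i| i * 10 }
puts sample_acf(series, 5).inspect
```

For a request of `k` lags the result has `k + 1` entries because lag 0 is included, which is what the "11 values" and "6 values" steps check.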

42 changes: 42 additions & 0 deletions features/pacf.feature
@@ -0,0 +1,42 @@
Feature: PACF

As a statistician
So that I can quickly evaluate partial autocorrelation of a series
I want to evaluate pacf

Background: a timeseries

Given the following values in a timeseries:
| timeseries |
| 10 20 30 40 50 60 70 80 90 100 |
| 110 120 130 140 150 160 170 180 190 200 |

Scenario: check pacf for 10 lags with unbiased
When I provide 10 lags for pacf
When I provide yw yule walker as method
Then I should get Array as resultant output
Then I should get 11 values in resultant pacf

Scenario: check pacf for 5 lags with mle
When I provide 5 lags for pacf
When I provide mle yule walker as method
Then I should get Array as resultant output
Then I should get 6 values in resultant pacf

Scenario: check first value of pacf
When I provide 5 lags for pacf
When I provide yw yule walker as method
Then I should get Array as resultant output
And I should see 1.0 as first value

Scenario: check all values in pacf for 5 lags with mle
When I provide 5 lags for pacf
When I provide mle yule walker as method
Then I should get Array as resultant output
And I should see "1.0, 0.85, -0.07566212829370711, -0.07635069706072706, -0.07698628638512295, -0.07747034005560738" as complete series

Scenario: check all values in pacf for 5 lags with unbiased
When I provide 5 lags for pacf
When I provide yw yule walker as method
Then I should get Array as resultant output
And I should see "1.0, 0.8947368421052632, -0.10582010582010604, -0.11350188273265083, -0.12357534824820737, -0.13686534216335522" as complete series
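
To see where the two sets of expected values come from, here is a rough sketch of the partial autocorrelation computed with the Durbin-Levinson recursion, which is mathematically equivalent to solving the Yule-Walker equations at each order and keeping the last coefficient. This names a swapped-in technique, not necessarily how the PR computes it. The method names and the `unbiased:` keyword are assumptions for illustration; the sketch assumes that `mle` divides each lag's autocovariance by `n` and that `yw` (unbiased) divides it by `n - lag`.

```ruby
# Hypothetical helpers for cross-checking, not the statsample implementation.

# Autocorrelations at lags 0..max_lag. With unbiased: true the lag-k
# autocovariance is divided by (n - k) instead of n.
def autocorrelations(series, max_lag, unbiased: false)
  n = series.size
  mean = series.sum.to_f / n
  dev = series.map { |x| x - mean }
  c0 = dev.sum { |d| d * d } / n
  (0..max_lag).map do |k|
    cross = (0...(n - k)).sum { |t| dev[t] * dev[t + k] }
    cross / (unbiased ? n - k : n) / c0
  end
end

# Partial autocorrelations at lags 0..max_lag via Durbin-Levinson.
def pacf_durbin_levinson(series, max_lag, unbiased: false)
  r = autocorrelations(series, max_lag, unbiased: unbiased)
  pacf = [1.0]
  phi = [] # phi[j] is coefficient j + 1 of the previous AR(k - 1) fit
  (1..max_lag).each do |k|
    num = r[k] - phi.each_with_index.sum { |coef, j| coef * r[k - 1 - j] }
    den = 1.0 - phi.each_with_index.sum { |coef, j| coef * r[j + 1] }
    reflection = num / den
    # Update the AR coefficients to order k; the new last one is the PACF value.
    phi = phi.each_with_index.map { |coef, j| coef - reflection * phi[k - 2 - j] } + [reflection]
    pacf << reflection
  end
  pacf
end

series = (1..20).map { |i| i * 10 }
puts pacf_durbin_levinson(series, 5).inspect                 # "mle" variant
puts pacf_durbin_levinson(series, 5, unbiased: true).inspect # "yw" variant
```

Under these assumptions the low-order lags agree with the `mle` and `yw` values expected in the scenarios above; small differences in the last digits are possible because the recursion and a direct matrix solve round differently.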
37 changes: 37 additions & 0 deletions features/step_definitions.rb
@@ -0,0 +1,37 @@
require 'statsample'
include Statsample::TimeSeries

Given /^the following values in a timeseries:$/ do |series|
  arr = []
  series.hashes.each do |sequence|
    arr += sequence['timeseries'].split(' ').map(&:to_i) # plain integers; converted once below
  end
  @timeseries = arr.to_ts
end

When /^I provide (\d+) lags for p?acf$/ do |lags|
  @lags = lags.to_i
end

When /^I provide (\w+) yule walker as method$/ do |method|
  @method = method
end

Then /^I should get (\w+) as resultant output$/ do |klass|
  @result = @timeseries.pacf(@lags, @method)
  assert_equal klass, @result.class.to_s # expected value first, then actual
end

Then /^I should get (\w+) values in resultant p?acf$/ do |values_count|
  assert_equal values_count.to_i, @result.size
end

And /^I should see (\d+\.\d) as first value$/ do |first_value|
  assert_equal first_value.to_f, @result.first
end

And /^I should see \"(.+)\" as complete series$/ do |series|
  series = series.split(',').map(&:to_f)
  assert_equal series, @result
end

9 changes: 9 additions & 0 deletions features/step_definitions_acf.rb
@@ -0,0 +1,9 @@
require 'statsample'
require 'debugger'
include Statsample::TimeSeries

# All instance variables and shared Cucumber steps are DRYed up in step_definitions.rb
And /^I calculate acf$/ do
  @result = @timeseries.acf(@lags)
end

2 changes: 1 addition & 1 deletion lib/statsample/analysis.rb
@@ -3,7 +3,7 @@

module Statsample
# DSL to create analysis without hassle.
# * Shortcuts methods to avoid use complete namescapes, many based on R
# * Shortcuts methods to avoid use complete namespaces, many based on R
# * Attach/detach vectors to workspace, like R
# == Example
# an1=Statsample::Analysis.store(:first) do
28 changes: 28 additions & 0 deletions lib/statsample/arima.rb
@@ -0,0 +1,28 @@
require 'debugger'
module Statsample
  module ARIMA
    class ARIMA < Statsample::TimeSeries

      def arima(ds, p, i, q)
        if q.zero?
          self.ar(p)
        elsif p.zero?
          self.ma(q) # pure MA model: pass the MA order q, not p
        end
      end

      def ar(p)
        # Autoregressive part of the model
        # http://en.wikipedia.org/wiki/Autoregressive_model#Definition
        # To find the parameters (to fit), we will use either Yule-Walker
        # or Burg's algorithm (more efficient)

        debugger # breakpoint left in the prototype

      end

      def yule_walker()
      end
    end
  end
end
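
The comment in `ar` mentions fitting the autoregressive parameters with Yule-Walker. As a rough sketch of that idea, and not the implementation this prototype will end up with, the Yule-Walker equations can be solved directly from a list of autocorrelations with Ruby's standard `matrix` library. The method name `yule_walker_coefficients` and its signature are assumptions made for illustration.

```ruby
require 'matrix'

# Hypothetical helper (not part of statsample): solve the Yule-Walker
# equations R * phi = r for an AR(order) model. R is the Toeplitz matrix
# built from acf[0..order - 1]; the right-hand side is acf[1..order].
def yule_walker_coefficients(acf, order)
  raise ArgumentError, 'need autocorrelations up to the requested order' if acf.size <= order

  toeplitz = Matrix.build(order, order) { |i, j| acf[(i - j).abs].to_f }
  rhs = Vector[*acf[1..order].map(&:to_f)]
  toeplitz.lup.solve(rhs).to_a
end

# Autocorrelations at lags 0..2, taken from the expected values in features/acf.feature.
acf = [1.0, 0.85, 0.7015037593984963]
puts yule_walker_coefficients(acf, 2).inspect
```

The last coefficient of an order-k fit is the lag-k partial autocorrelation, so the second value printed here should land close to the lag-2 `mle` value expected in `features/pacf.feature`.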