small minimal quantile and CDF estimator for multiple large data streams

hville/sample-distribution

sample distribution

takes a large, continuous data stream and continuously retains only a reduced approximation of its empirical CDF

small, simple, no dependencies

Example • Features • Limitations • Why • API • License

Example

import Recorder from 'sample-distribution'

// recorder with 5 retained samples
var recorder = new Recorder(5);
// the semicolon above is required because the next line starts with '['
[5,3,6,2,7,1,8].forEach(recorder.push, recorder)

console.log('minimum:', recorder.Q(0)) // minimum:0
console.log('median:', recorder.Q(0.5)) // median:4
console.log('maximum:', recorder.Q(1)) // maximum:8
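
To sanity-check the estimates on a small input, it can help to compare them with exact quantiles computed from the full sample. The sketch below is a hypothetical baseline (exactQuantile is not part of this module) that stores and sorts every sample and uses the nearest-rank definition. Since the recorder keeps a reduced set of values and interpolates ranks, its estimates may differ from these exact values.

```javascript
// Exact nearest-rank quantile over a fully stored sample.
// Hypothetical helper for comparison only, not part of sample-distribution.
function exactQuantile(samples, p) {
  const sorted = Float64Array.from(samples).sort() // typed arrays sort numerically
  const n = sorted.length
  // nearest rank: smallest stored value with at least p*n samples at or below it
  const index = Math.min(n - 1, Math.max(0, Math.ceil(p * n) - 1))
  return sorted[index]
}

const samples = [5, 3, 6, 2, 7, 1, 8]
console.log('minimum:', exactQuantile(samples, 0))   // minimum: 1
console.log('median:', exactQuantile(samples, 0.5))  // median: 5
console.log('maximum:', exactQuantile(samples, 1))   // maximum: 8
```

This baseline needs memory proportional to the stream length, which is exactly what a fixed-size recorder avoids.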

Features

  • constant memory use, with no separate compression steps and no triggered garbage collection
  • about 3-5x faster than other implementations
  • mean-preserving compression (the empirical CDF has the same average as the samples)
  • no value interpolation: values are kept as-is, but ranks are interpolated
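
The last point can be pictured as a lookup in a small table of retained values and their ranks. The following is only a sketch under assumptions, not the module's actual code: interpolatedCdf, and the convention that ranks are cumulative counts joined by straight lines, are hypothetical choices made for illustration.

```javascript
// Hypothetical piecewise-linear CDF over retained (value, rank) pairs.
// vs: retained values in ascending order
// rs: cumulative rank associated with each retained value
// n:  total number of samples received
function interpolatedCdf(vs, rs, n, x) {
  if (x < vs[0]) return 0
  const last = vs.length - 1
  if (x >= vs[last]) return rs[last] / n
  // find the bracketing pair vs[i] <= x < vs[i + 1]
  let i = 0
  while (vs[i + 1] <= x) ++i
  // interpolate the rank, never the stored values
  const t = (x - vs[i]) / (vs[i + 1] - vs[i])
  return (rs[i] + t * (rs[i + 1] - rs[i])) / n
}

const vs = [1, 2, 5, 8]
const rs = [1, 2, 4, 7]
console.log(interpolatedCdf(vs, rs, 7, 3.5)) // rank 3 out of 7
```

Here 3.5 lies halfway between the retained values 2 and 5, so its rank is taken halfway between their ranks 2 and 4.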

Limitations

  • only the mean is preserved; the other moments (variance, skew, kurtosis) are not
  • the reported moments are those of the generated curve, which only approximate those of the source samples

Background and related projects

This module attempts to match the underlying CDF as closely as possible by adjusting local ranks to keep the local average when discarding values. By some measures, the root mean square error over the original sample values is about 10 times lower than that of the 2 other implementations it is compared against. Run npm run compare or see ./util/compare.js for a benchmark and error comparison across multiple distribution types.
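
The idea of keeping the local average when discarding a value can be illustrated with a small weight-redistribution sketch. This is an assumption-laden illustration, not the module's implementation: it uses per-point weights instead of the stored ranks, and dropPreservingMean and weightedMean are hypothetical names. The discarded point's weight is split between its two neighbours in proportion to its distance from each, which leaves the weighted mean unchanged.

```javascript
// Hypothetical illustration: drop the point at index i (0 < i < last) and split
// its weight between its two neighbours so that the weighted mean is unchanged.
function dropPreservingMean(values, weights, i) {
  const a = values[i - 1], b = values[i], c = values[i + 1]
  const toRight = weights[i] * (b - a) / (c - a) // share moved to the right neighbour
  const toLeft = weights[i] - toRight            // remainder moves to the left neighbour
  const newValues = values.slice()
  const newWeights = weights.slice()
  newWeights[i - 1] += toLeft
  newWeights[i + 1] += toRight
  newValues.splice(i, 1)
  newWeights.splice(i, 1)
  return { values: newValues, weights: newWeights }
}

function weightedMean(values, weights) {
  let sum = 0, total = 0
  for (let k = 0; k < values.length; ++k) {
    sum += values[k] * weights[k]
    total += weights[k]
  }
  return sum / total
}

const before = { values: [1, 2, 5], weights: [1, 1, 1] }
const after = dropPreservingMean(before.values, before.weights, 1)
console.log(after.weights) // [ 1.75, 1.25 ]
console.log(weightedMean(before.values, before.weights) === weightedMean(after.values, after.weights)) // true
```

The mean survives because a*(c-b)/(c-a) + c*(b-a)/(c-a) = b, so the two shares carry the same total value as the discarded point. Higher moments are not preserved by this step, which is consistent with the limitations listed above.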

API

Creation

  • var recorder = new Recorder(L): creates a recorder that will keep L values and associated L ranks

Properties - read-only getters

  • .N number: total samples received
  • .E number: average of samples received and of the resulting approximated cdf
  • .S number: standard deviation of the resulting approximated cdf
  • .V number: variance of the resulting approximated cdf

Properties - internal

  • .vs array: internal store of retained sample values
  • .rs array: internal store of retained value approximated ranks

Methods

  • .push(number) void: sample value(s) to be added
  • .F(value:number) number: cumulative probability (cdf) of the specified value
  • .f(value:number) number: probability density (pdf) of the specified value
  • .Q(probability:number) number: estimated value for specified probability
  • .M(order:number) number: estimated raw moment about the origin (i.e. E = M(1), V = M(2) - E^2)
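
The identities E = M(1) and V = M(2) - E^2 can be checked on a fully stored sample. The sketch below does not use the recorder: rawMoment is a hypothetical helper that computes exact raw moments from an array, so the values are those of the samples themselves rather than of the approximated cdf.

```javascript
// k-th raw moment (about the origin) of a fully stored sample: the mean of x^k
function rawMoment(samples, order) {
  let sum = 0
  for (const x of samples) sum += Math.pow(x, order)
  return sum / samples.length
}

const data = [5, 3, 6, 2, 7, 1, 8]
const mean = rawMoment(data, 1)                    // E = M(1)
const variance = rawMoment(data, 2) - mean * mean  // V = M(2) - E^2
const deviation = Math.sqrt(variance)              // S = sqrt(V)
console.log(mean, variance, deviation)
```

For this input the mean is 32/7 and the (population) variance is 292/49. A recorder's .V and .S describe the approximated curve, so they may differ from these exact figures.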

Transferable through this.data

  • const main = new Recorder( new Float64Array(buffer, offset, length) )
  • const copy = new Recorder( main.data ) transferable TypedArray
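
For the first form, the arguments follow the standard Float64Array constructor: the offset is expressed in bytes and must be a multiple of 8, while the length is a number of elements. The sketch below only shows how such a view is carved out of a larger buffer; SLOTS is a hypothetical size, since the length a Recorder expects for a given number of retained values is not stated here.

```javascript
const SLOTS = 10 // hypothetical element count, chosen only for illustration
const buffer = new ArrayBuffer(4 * SLOTS * Float64Array.BYTES_PER_ELEMENT)

// byteOffset is in bytes and must be a multiple of 8 for a Float64Array
const offset = 2 * SLOTS * Float64Array.BYTES_PER_ELEMENT
const view = new Float64Array(buffer, offset, SLOTS)

view[0] = 42
// a second view over the same region sees the same memory, no copy is made
const alias = new Float64Array(buffer, offset, SLOTS)
console.log(alias[0]) // 42
```

Carving several views out of one buffer in this way is one possible layout for keeping multiple streams in a single allocation.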

License

MIT © Hugo Villeneuve
