The Haskell Data Science Kit (HDSK) project is an attempt to create a well-documented, well-tested, and performant data science library implemented in the Haskell language.
Sources suggest that, in spite of Haskell's large potential for performance gains over current de facto methods [1], its adoption in the data science community lags for a variety of reasons. The greatest of these seems to be the dearth [2] of easy-to-use data science libraries. Indeed, at the time of writing, searching for "data science" on GitHub yields 14 Haskell-language repositories and 5,807 Python-language repositories [3]. This project seeks to address that issue by presenting a unified (though modular) library of data science utilities that support the entire life cycle of a data science project.
Disclaimer: At the time of writing, I am still a beginner in Haskell. This project is as much about the goal stated above as it is about my learning and practicing Haskell itself and the software development ecosystem around it. So, I make no guarantees that my implementation of any given function will be optimal or idiomatic (and where it isn't, pull requests are gladly welcomed!).
To use HDSK within your stack project, you must add this repository to the `extra-deps` list in `stack.yaml`. NOTE: this step will change once HDSK is released on Hackage.
```yaml
extra-deps:
  - git: git@github.com:wbadart/hdsk.git
    commit: a52bed4216f607628e71594256dafd550ffe2d3e
```
The commit hash listed above was the most recent commit at the time of writing. Be sure that the commit you use is recent enough to contain the features you need.
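Adding the repository to `extra-deps` only tells stack where to find HDSK. Your own package still has to declare it as a dependency. Here is a minimal sketch for a project that uses hpack's `package.yaml`. It assumes the package is named `hdsk`, which is inferred from the repository name and not confirmed here:

```yaml
# package.yaml of your own project (a sketch)
# "hdsk" is the assumed package name of this library
dependencies:
  - base >= 4.7 && < 5
  - hdsk
```

If your project uses a hand-written `.cabal` file instead, add `hdsk` to the `build-depends` field of the relevant component.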
The cabal file generated by stack has been checked in. If you use cabal without stack, you can install the library from a fresh clone of the repository.
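One way to consume such a clone from another cabal project is to list both directories in that project's `cabal.project`. The sketch below assumes two things, both hypothetical: the clone sits next to your project at `../hdsk`, and the package is named `hdsk`. Adjust both to your setup.

```
-- cabal.project for your own project (a sketch, not part of HDSK)
-- "./" is your package; "../hdsk" is an assumed path to the local HDSK clone
packages:
  ./
  ../hdsk
```

Your package's `.cabal` file would then list `hdsk` under `build-depends`, just as with stack.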
Please see willbadart.com/hdsk for library documentation. Further project info, such as planned features, is made available on the wiki.
You'll notice that a key theme in this document has been promoting adoption. For that reason, I am developing this project, and will eventually release it, under the BSD-3-Clause [11] license, which is generally permissive. BSD-3-Clause is also one of the more popular licenses in the Haskell community [12].
Please see LICENSE for the full text.