Ramp - Rapid Machine Learning Prototyping

Ramp is a Python library for rapid prototyping of machine learning solutions. It is a lightweight, pandas-based machine learning framework that plugs into existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms, and transformations quickly and efficiently.

Documentation: http://ramp.readthedocs.org

Why Ramp?

Clean, declarative syntax
Complex feature transformations

Chain and combine features:

Normalize(Log('x'))
Interactions([Log('x1'), (F('x2') + F('x3')) / 2])
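
To make the chained expressions above concrete, here is a rough pandas/NumPy sketch of the arithmetic they describe. It does not use Ramp itself. It assumes that Normalize means centering to zero mean and scaling to unit standard deviation, that Log is the natural logarithm, and that Interactions produces pairwise products; the column values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; the column names mirror the expressions above.
df = pd.DataFrame({
    "x": [1.0, 10.0, 100.0, 1000.0],
    "x1": [1.0, 2.0, 4.0, 8.0],
    "x2": [2.0, 4.0, 6.0, 8.0],
    "x3": [4.0, 4.0, 2.0, 0.0],
})


def normalize(series):
    """Center to zero mean and scale to unit standard deviation (assumed semantics)."""
    return (series - series.mean()) / series.std()


# Rough equivalent of Normalize(Log('x'))
norm_log_x = normalize(np.log(df["x"]))

# Rough equivalent of Interactions([Log('x1'), (F('x2') + F('x3')) / 2]):
# with two input features there is a single pairwise product.
log_x1 = np.log(df["x1"])
mean_x2_x3 = (df["x2"] + df["x3"]) / 2
interaction = log_x1 * mean_x2_x3
```

The point of the declarative form is that these steps are described once and evaluated later against whatever data is in play, instead of being computed eagerly as in this sketch.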

Reduce feature dimension:

DimensionReduction([F('x%d'%i) for i in range(100)], decomposer=PCA(n_components=3))
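
For comparison, the same reduction written directly against scikit-learn might look like the sketch below. The random data and the output column names (`pc0`, `pc1`, `pc2`) are assumptions made for the example; only `PCA(n_components=3)` comes from the line above.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# 50 rows of made-up data in 100 columns named x0..x99.
rng = np.random.RandomState(0)
df = pd.DataFrame(
    rng.normal(size=(50, 100)),
    columns=["x%d" % i for i in range(100)],
)

# Project the 100 columns onto their first 3 principal components.
pca = PCA(n_components=3)
reduced = pd.DataFrame(
    pca.fit_transform(df.values),
    index=df.index,
    columns=["pc0", "pc1", "pc2"],
)
```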

Incorporate residuals or predictions to blend with other models:

Residuals(simple_model_def) + Predictions(complex_model_def)
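
One way to read this expression is sketched below in plain scikit-learn: the residuals of a simple model and the predictions of a more complex one are computed out-of-fold and added into a single feature column. This is an illustration under assumptions, not Ramp's implementation; the two estimators, the synthetic data, and the use of `cross_val_predict` to avoid leaking the target are all choices made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem with one linear and one nonlinear term.
rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(scale=0.1, size=200)

# Out-of-fold predictions of the simple model, and what it fails to explain.
simple_pred = cross_val_predict(LinearRegression(), X, y, cv=5)
residuals = y - simple_pred

# Out-of-fold predictions of the more complex model.
complex_pred = cross_val_predict(
    DecisionTreeRegressor(max_depth=4, random_state=0), X, y, cv=5
)

# Analogue of Residuals(simple_model_def) + Predictions(complex_model_def).
blended_feature = residuals + complex_pred
```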

Data context awareness

Any feature that uses the target ("y") variable will automatically respect the current training and test sets. Similarly, preparation data (a feature's mean and stdev, for example) is stored and tracked between data contexts.
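
The idea of preparation data can be sketched without Ramp: statistics are computed on the training rows only, stored, and then reused unchanged on the test rows. The class and method names below (`NormalizePrep`, `prepare`, `apply`) are hypothetical and chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]})
train_idx = [0, 1, 2, 3]
test_idx = [4, 5]


class NormalizePrep:
    """Hypothetical holder for a feature's mean and standard deviation."""

    def prepare(self, series):
        # Learn the statistics from the training rows only.
        self.mean = series.mean()
        self.std = series.std()
        return self

    def apply(self, series):
        # Reuse the stored statistics on any set of rows.
        return (series - self.mean) / self.std


prep = NormalizePrep().prepare(df.loc[train_idx, "x"])
train_x = prep.apply(df.loc[train_idx, "x"])
test_x = prep.apply(df.loc[test_idx, "x"])
```

Because the test rows never influence the stored mean and standard deviation, the large test values stay far from zero after normalization instead of being rescaled to look typical.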

Composability

All features, estimators, and their fits are composable, pluggable, and storable.

Easy extensibility

Ramp has a simple API, allowing you to plug in estimators from scikit-learn, rpy2 and elsewhere, or easily build your own feature transformations, metrics, feature selectors, reporters, or estimators.

Quick start

Getting started with Ramp: Classifying insults

Or, the quintessential Iris example:

import pandas
from ramp import *
import urllib2
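# import the submodules explicitly; a bare "import sklearn" does not load them
import sklearn.ensemble
import sklearn.linear_model
from sklearn import decomposition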


# fetch and clean iris data from UCI
data = pandas.read_csv(urllib2.urlopen(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"))
data = data.drop([149]) # bad line
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
data.columns = columns


# all features
features = [FillMissing(f, 0) for f in columns[:-1]]

# features, log transformed features, and interaction terms
expanded_features = (
    features +
    [Log(F(f) + 1) for f in features] +
    [
        F('sepal_width') ** 2,
        combo.Interactions(features),
    ]
)


# Define several models and feature sets to explore,
# run 5 fold cross-validation on each and print the results.
# We define 2 models and 4 feature sets, so this will be
# 4 * 2 = 8 models tested.
shortcuts.cv_factory(
    data=data,

    target=[AsFactor('class')],
    metrics=[
        [metrics.GeneralizedMCC()],
        ],
    # report feature importance scores from Random Forest
    reporters=[
        [reporters.RFImportance()],
        ],

    # Try out two algorithms
    model=[
        sklearn.ensemble.RandomForestClassifier(
            n_estimators=20),
        sklearn.linear_model.LogisticRegression(),
        ],

    # and 4 feature sets
    features=[
        expanded_features,

        # Feature selection
        [trained.FeatureSelector(
            expanded_features,
            # use random forest's importance to trim
            selectors.RandomForestSelector(classifier=True),
            target=AsFactor('class'), # target to use
            n_keep=5, # keep top 5 features
            )],

        # Reduce feature dimension (pointless on this dataset)
        [combo.DimensionReduction(expanded_features,
                            decomposer=decomposition.PCA(n_components=4))],

        # Normalized features
        [Normalize(f) for f in expanded_features],
    ]
)
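
For readers who want to see the shape of this experiment without Ramp, here is a sketch in plain scikit-learn that crosses two models with four feature sets and runs 5-fold cross-validation on each pairing. It is not equivalent to the example above: the feature sets are simplified (no interaction terms or feature selection), the data comes from `sklearn.datasets.load_iris` instead of the UCI download, and `matthews_corrcoef` stands in for the generalized MCC metric.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

X, y = load_iris(return_X_y=True)


def add_logs(X):
    """Append log(x + 1) of every column to the raw columns."""
    return np.hstack([X, np.log(X + 1)])


# Four simplified feature sets (assumptions for this sketch).
feature_sets = {
    "raw": FunctionTransformer(),  # identity transform
    "raw+log": FunctionTransformer(add_logs),
    "pca": make_pipeline(FunctionTransformer(add_logs), PCA(n_components=4)),
    "normalized": make_pipeline(FunctionTransformer(add_logs), StandardScaler()),
}

# Two algorithms, built fresh for every pairing.
models = {
    "random_forest": lambda: RandomForestClassifier(n_estimators=20, random_state=0),
    "logistic": lambda: LogisticRegression(max_iter=1000),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scorer = make_scorer(matthews_corrcoef)

# 4 feature sets * 2 models = 8 cross-validated scores.
results = {}
for fname, transform in feature_sets.items():
    for mname, make_model in models.items():
        pipeline = make_pipeline(transform, make_model())
        scores = cross_val_score(pipeline, X, y, cv=cv, scoring=scorer)
        results[(fname, mname)] = scores.mean()

for key, score in sorted(results.items()):
    print(key, round(score, 3))
```

Wrapping each feature set in a pipeline means transformations such as PCA and scaling are refit inside every fold, which is the same concern the data context handling addresses.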

Status

Ramp is currently alpha software, so expect bugs, bug fixes, and API changes.

Requirements

NumPy
SciPy
pandas
PyTables
scikit-learn
gensim

Author

Ken Van Haren. Email with feedback/questions: [email protected] @squaredloss
