This project and corresponding website are no longer maintained by Two Sigma. We continue to encourage independent development.
This repo is still a work in progress and not ready for public collaboration.
Smooshr 2.0 (new name TBD) is a no-code data pipeline builder and runner that allows users to configure a repeatable set of steps to process, clean up, and validate data.
- Clone this repo
- Install JavaScript dependencies:
yarn install
- Create your Python virtual environment:
python -m venv venv
- Start your virtual environment:
source venv/bin/activate
- Install pip-tools:
pip install pip-tools
- Install Python dependencies:
yarn py-install
- Set up your env files locally:
cp .env.frontend.sample .env.frontend && cp .env.server.sample .env.server
NOTE: Python dependencies are managed in the requirements.in file. Think of requirements.in as your package.json equivalent. When you want to add a new Python library, add it to requirements.in and then run yarn py-install. This will update requirements.txt with the correct versions and install them. You should not use pip install [lib] directly because this does not update the requirements.txt file automatically.
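As a rough illustration of how the two files relate, the sketch below checks that every package named in requirements.in has a matching entry in requirements.txt. The helper names and the sample file contents are hypothetical and not part of this repo; the parsing is deliberately simple and only handles plain package lines.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does (case and -/_/. insensitive)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def package_names(requirements_text: str) -> set:
    """Extract package names from requirements-style text."""
    names = set()
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank line, comment, or an option such as "-r other.txt"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def missing_pins(requirements_in: str, requirements_txt: str) -> set:
    """Return packages listed in requirements.in with no entry in requirements.txt."""
    return package_names(requirements_in) - package_names(requirements_txt)


# Hypothetical file contents, for illustration only.
example_in = "fastapi\nSQLAlchemy>=1.4\nuvicorn[standard]\n"
example_txt = "fastapi==0.95.0\n    # via -r requirements.in\nsqlalchemy==1.4.46\n"

print(missing_pins(example_in, example_txt))  # prints {'uvicorn'}
```

A non-empty result like this one would suggest that requirements.in was edited without re-running yarn py-install.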
Now you are ready to run the app.
The frontend is served using Vite.
- Make sure your frontend dependencies are up to date:
yarn install
- Start the app:
yarn dev
The API server is built with FastAPI.
- Activate your Python venv (if you haven't already):
source venv/bin/activate
- Make sure your database is created and up to date:
yarn db-upgrade
- Start the API server:
yarn api
The auto-generated API docs (using Swagger) can be found at localhost:8000/docs
Unit tests for the server can be run with yarn api-test.
If you made any changes to server/models, ask yourself the following questions:
Did you create a new model?
If yes, then add the new model to models/__init__.py so it can be picked up by SQLAlchemy when it imports the models directory.
Next, run yarn db-new-migration "[migration title]" to create a new db migration script that adds a database table for this model. Read the managing database migrations section for more information.
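Why the import in models/__init__.py matters can be sketched with a plain-Python analogy. This is not SQLAlchemy code: the Base class below is a hypothetical stand-in that records each subclass at the moment its class body runs, which is roughly how a declarative base learns about tables. The module and model names are also assumptions made for illustration.

```python
class Base:
    """Stand-in for a declarative base: remembers every subclass that gets defined."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Registration happens when the class statement executes, i.e. on import.
        Base.registry[cls.__tablename__] = cls


def import_dataset_module():
    """Simulates importing a module such as models/dataset.py (hypothetical name)."""

    class Dataset(Base):
        __tablename__ = "datasets"

    return Dataset


known_before = sorted(Base.registry)
import_dataset_module()
known_after = sorted(Base.registry)

print(known_before)  # prints []
print(known_after)   # prints ['datasets']
```

Until the module that defines a model is imported, the base has no record of it, so a migration tool comparing known models against the database would not see the new table.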
Do your updates require a database migration?
If you are updating a model that gets written to the database then it's highly likely this will require a database migration.
- Autogenerate a migration script:
yarn db-new-migration "[migration title]"
- Go to migrations/versions and open your new migration script. Alembic tries to autogenerate the migration code. It's generally successful with simple migrations, like adding new columns, but it doesn't know what to do for more complicated migrations that involve editing an existing column.
- Verify that your autogenerated migration script is correct, and edit it manually if it is not.
- IMPORTANT: also implement the downgrade function. As a rule, running an upgrade followed by a downgrade should result in the original database without any loss of data.
- When ready, test your migration and verify it works:
yarn db-upgrade
- Test the downgrade and verify you didn't lose any data:
yarn db-downgrade
If everything is good then you're ready to commit this change and submit a PR!
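The upgrade-then-downgrade rule from the checklist above can be sketched outside of Alembic. The example below uses Python's built-in sqlite3 module as a stand-in for Alembic's operations, with a hypothetical datasets table and notes column. It is an illustration of the round-trip check, not the migration code this project generates.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """Add a nullable column; existing rows are untouched."""
    conn.execute("ALTER TABLE datasets ADD COLUMN notes TEXT")


def downgrade(conn: sqlite3.Connection) -> None:
    """Undo upgrade() by rebuilding the table without the notes column."""
    conn.execute("CREATE TABLE datasets_old (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO datasets_old (id, name) SELECT id, name FROM datasets")
    conn.execute("DROP TABLE datasets")
    conn.execute("ALTER TABLE datasets_old RENAME TO datasets")


def snapshot(conn: sqlite3.Connection):
    """Return (column names, rows) so two database states can be compared."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(datasets)")]
    rows = conn.execute("SELECT * FROM datasets ORDER BY id").fetchall()
    return columns, rows


# Hypothetical starting schema and data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO datasets (name) VALUES (?)", [("sales",), ("users",)])

before = snapshot(conn)
upgrade(conn)
after_upgrade = snapshot(conn)
downgrade(conn)
after_downgrade = snapshot(conn)

print(after_upgrade[0])           # prints ['id', 'name', 'notes']
print(before == after_downgrade)  # prints True
```

Comparing a snapshot taken before the upgrade with one taken after the downgrade is the same check the steps above ask you to do by hand. Anything written to the new column in between is expected to disappear, but the original rows should come back unchanged.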
There are 3 commands you will need:
yarn db-new-migration "[migration title]"
This will autogenerate a migration script to update the database based on the specifications in server/models/. Remember to always manually check and edit the script because the autogenerated code is usually only correct for simple migrations. Also remember that your downgrade function should be correct too.
yarn db-upgrade
Upgrades or initializes a database all the way to the latest version.
yarn db-downgrade
Downgrades the database by a single version.
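The difference between the last two commands (all the way up versus one step down) can be modeled with a short sketch. It assumes a simple linear migration history, and the revision names are hypothetical. Real revision identifiers and ordering come from the scripts in migrations/versions.

```python
# Hypothetical revision identifiers, oldest first.
REVISIONS = ["a1_create_users", "b2_add_email", "c3_add_datasets"]


def upgrade_to_head(current):
    """Return the revisions to apply, in order, to reach the newest version.

    current is None for a database that has never been migrated.
    """
    start = 0 if current is None else REVISIONS.index(current) + 1
    return REVISIONS[start:]


def downgrade_one(current):
    """Return the version the database is at after stepping back one revision."""
    position = REVISIONS.index(current)
    return REVISIONS[position - 1] if position > 0 else None


print(upgrade_to_head(None))             # every revision, oldest first
print(upgrade_to_head("b2_add_email"))   # prints ['c3_add_datasets']
print(downgrade_one("c3_add_datasets"))  # prints b2_add_email
```

Under this model, returning to a much older version means running the downgrade command once per revision.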