This repository has been archived by the owner on Jan 9, 2025. It is now read-only.

Smooshr 2.0 (name TBD) is a no-code data pipeline builder and runner that allows users to configure a repeatable set of steps to process, clean up, and validate data.

tsdataclinic/smooshr-next

This project and corresponding website are no longer maintained by Two Sigma. We continue to encourage independent development.

Smooshr 2.0

This repo is still a work in progress and not ready for public collaboration.

Smooshr 2.0 (new name TBD) is a no-code data pipeline builder and runner that allows users to configure a repeatable set of steps to process, clean up, and validate data.

Local development

Initial set up

  1. Clone this repo
  2. Install JavaScript dependencies with yarn install
  3. Create your python virtual environment: python -m venv venv
  4. Start your virtual environment: source venv/bin/activate
  5. Install pip-tools: pip install pip-tools
  6. Install python dependencies: yarn py-install
  7. Set up your env files locally: cp .env.frontend.sample .env.frontend && cp .env.server.sample .env.server

NOTE: Python dependencies are managed in the requirements.in file. Think of requirements.in as your package.json equivalent. When you want to add a new Python library, add it to requirements.in and then run yarn py-install. This will update requirements.txt with the correct versions and install them. You should not use pip install [lib] directly because this does not update the requirements.txt file automatically.

Now you are ready to run the app.

Starting the Front-End Server

The frontend is served using Vite.

  1. Make sure you're up to date on your frontend dependencies with yarn install.
  2. Start the app with yarn dev

Starting the API Server

The API server is built with FastAPI.

  1. Activate your Python venv (if you haven't already): source venv/bin/activate
  2. Run yarn db-upgrade to make sure your database is created and up-to-date.
  3. Start the API server: yarn api

The auto-generated API docs (using Swagger) can be found at localhost:8000/docs

Run the unit tests for the server with yarn api-test.

Updating models

If you made any changes to server/models, ask yourself the following questions:

Did you create a new model?

If yes, then add the new model to models/__init__.py so it can be picked up by SQLAlchemy when it imports the models directory.

Next, run yarn db-new-migration "[migration title]" to create a new db migration script that adds a database table for this model. Read the managing database migrations section for more information.
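
Why the import matters: SQLAlchemy's declarative base records a model's table at the moment the model class is defined, so a model whose module is never imported stays invisible to it. The sketch below imitates that behavior with the standard library only. It is an analogy, not SQLAlchemy's implementation or this repo's code, and Base, User, Dataset, and known_tables are hypothetical names.

```python
class Base:
    """Stand-in for a declarative base: remembers every subclass it sees."""

    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Registration is a side effect of the class body running, which
        # only happens when the defining module is actually imported.
        Base.registry[cls.__tablename__] = cls


class User(Base):
    __tablename__ = "user"


def known_tables():
    """Tables a migration tool could see at this point."""
    return sorted(Base.registry)


tables_before = known_tables()


# Defining (in a real project: importing) a second model registers it too.
class Dataset(Base):
    __tablename__ = "dataset"


tables_after = known_tables()
print(tables_before, tables_after)
```

Until the Dataset class body runs, the registry has no entry for it. Importing each model from models/__init__.py plays the same role: it guarantees the class definition executes whenever the models package is imported.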

Do your updates require a database migration?

If you are updating a model that gets written to the database then it's highly likely this will require a database migration.

  1. Run yarn db-new-migration "[migration title]" to autogenerate a migration script.
  2. Go to migrations/versions and open your new migration script. Alembic tries to autogenerate the migration code. It's generally successful with simple migrations, like adding new columns, but it is unreliable for more complicated migrations, such as those that edit an existing column.
  3. Verify that your auto-generated migration script is correct. If it is not, edit it manually.
  4. IMPORTANT: also implement the downgrade function. As a rule, running an upgrade followed by a downgrade should result in the original database without any loss of data.
  5. When ready, run yarn db-upgrade to test your migration. Verify it works.
  6. Run yarn db-downgrade to test the downgrade. Verify you didn't lose any data.

If everything is good then you're ready to commit this change and submit a PR!
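
The round-trip rule from step 4 (upgrade, then downgrade, and end up with the original database) can be checked mechanically. The sketch below shows the idea using Python's built-in sqlite3 module rather than Alembic, so it runs on its own. The dataset table, its columns, and the upgrade, downgrade, and snapshot helpers are all hypothetical and are not taken from this repo.

```python
import sqlite3


def upgrade(conn):
    # Simple migration: add a nullable column; existing rows are untouched.
    conn.execute("ALTER TABLE dataset ADD COLUMN description TEXT")


def downgrade(conn):
    # Reverse it by rebuilding the table without the new column. A rebuild
    # is used so the sketch does not depend on the SQLite version.
    conn.execute(
        "CREATE TABLE dataset_old (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO dataset_old (id, name) SELECT id, name FROM dataset")
    conn.execute("DROP TABLE dataset")
    conn.execute("ALTER TABLE dataset_old RENAME TO dataset")


def snapshot(conn):
    """Return (column names, rows) so two database states can be compared."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(dataset)")]
    rows = conn.execute("SELECT * FROM dataset ORDER BY id").fetchall()
    return columns, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO dataset (name) VALUES (?)", [("a.csv",), ("b.csv",)])

before = snapshot(conn)
upgrade(conn)
after_upgrade = snapshot(conn)
downgrade(conn)
after_downgrade = snapshot(conn)

# True when the schema and every original row survived the round trip.
print(after_downgrade == before)
```

Comparing a snapshot taken before the upgrade with one taken after the downgrade catches both schema drift and lost rows. Values written to the new column after the upgrade are expected to disappear on downgrade; the rule is about data that existed beforehand.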

Managing database migrations

There are 3 commands you will need:

  1. yarn db-new-migration "[migration title]"

     This will autogenerate a migration script to update the database based on the specifications in server/models/. Remember to always manually check and edit the script because the autogenerated code is usually only correct for simple migrations. Also remember that your downgrade function should be correct too.

  2. yarn db-upgrade

     Upgrades or initializes a database all the way to the latest version.

  3. yarn db-downgrade

     Downgrades the database by a single version.
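
The two commands are not symmetric: an upgrade jumps to the newest version, while a downgrade steps back only once. The toy model below makes that explicit. It assumes a strictly linear history, the revision ids are hypothetical (real ones live in migrations/versions), and it never touches a database.

```python
# Hypothetical revision ids, oldest first.
REVISIONS = ["rev_a", "rev_b", "rev_c"]


def upgrade_to_latest(current):
    """Return the revision after an upgrade: always the newest one."""
    return REVISIONS[-1]


def downgrade_one(current):
    """Return the revision after one step back; None means an empty database."""
    if current is None:
        raise ValueError("nothing to downgrade")
    index = REVISIONS.index(current)
    return REVISIONS[index - 1] if index > 0 else None


state = None                      # fresh database, no migrations applied
state = upgrade_to_latest(state)  # one upgrade reaches the newest revision
after_upgrade = state
state = downgrade_one(state)      # one downgrade undoes only the last step
after_one_downgrade = state
print(after_upgrade, after_one_downgrade)
```

Under this model, undoing several migrations takes one downgrade per migration, whereas a single upgrade is enough to get back to the latest version.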
