An extensible framework and GUI for processing data files. TACT is a DMAC tool that helps organize Python scripts by placing their functionality behind a control layer and an API, which users can access via the command line or a GUI.
The goal of this project is to collect a wide array of processing scripts into a tool which is useful to both technical and non-technical users.
- Clone this repo locally
- Use Conda/Mamba to create a new environment from the `tact/environment.yml` file
- Activate the new `tact` environment
- Using VS Code, add the following configs to your `launch.json` file:
    {
        "name": "TACT API",
        "type": "python",
        "request": "launch",
        "env": {
            "PYTHONPATH": "${workspaceRoot}"
        },
        "program": "API/tact_api.py",
        "console": "integratedTerminal"
    },
    {
        "name": "Streamlit GUI",
        "type": "python",
        "request": "launch",
        "module": "streamlit",
        "args": [
            "run",
            "UI/streamlit/TACT.py"
        ],
        "env": {
            "API_URL": "http://127.0.0.1:5000"
        }
    }

Note: You may need to provide absolute paths to the program files.
- Launch `TACT API` from the Run and Debug menu
- Launch `Streamlit GUI` from the Run and Debug menu
- TACT should open in a new browser window
TACT is config-driven, and the basic workflow is as follows:
- Application starts up, reads config files
- User modifies config files via the API (e.g. by providing a file path to the target file)
- Config changes are saved
- User initiates specific operations via API
- Repeat
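The workflow above can be sketched in a few lines of plain Python. This is only an illustration of the read-config, modify, save, run loop, not TACT's actual code: `ConfigStore`, `run_operation`, the `OPERATIONS` registry, and the `target_file` config key are all hypothetical names, and TACT itself exposes these steps through its Flask API rather than direct function calls.

```python
import csv
import json
from pathlib import Path


def normalize_headers(rows):
    """Lowercase column names and replace spaces with underscores."""
    return [
        {key.strip().lower().replace(" ", "_"): value for key, value in row.items()}
        for row in rows
    ]


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Hypothetical registry mapping operation names to functions.
OPERATIONS = {
    "normalize_headers": normalize_headers,
    "drop_duplicate_rows": drop_duplicate_rows,
}


class ConfigStore:
    """Hypothetical stand-in for the config layer."""

    def __init__(self, path):
        # Step 1: read the config file at startup.
        self.path = Path(path)
        self.config = json.loads(self.path.read_text())

    def update(self, **changes):
        # Steps 2 and 3: apply the user's changes, then save them.
        self.config.update(changes)
        self.path.write_text(json.dumps(self.config, indent=2))


def run_operation(store, name):
    # Step 4: run a named operation against the file the config points to.
    if name not in OPERATIONS:
        raise KeyError(f"Unknown operation: {name}")
    with open(store.config["target_file"], newline="") as handle:
        rows = list(csv.DictReader(handle))
    return OPERATIONS[name](rows)
```

In this sketch, a caller would create a `ConfigStore`, call `store.update(target_file="path/to/data.csv")`, and then call `run_operation(store, "normalize_headers")` as many times as needed with different operation names.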
Diagram of TACT structure:
- Dataset cleaning (for tabular data)
  - Standardize datetime formats
  - Delete columns
  - Normalize column headers
  - Batch replace row values
  - Drop duplicate rows
  - Drop empty columns
- Dataset transformation
  - Row enumeration (flipping from wide to long)
  - Append constant values to transformed rows
  - Drop records with value of `0` in target columns
  - Split input column into multiple columns
  - Set occurrence status for each row
  - Generate UUID for each record
  - Row combination (find records that share 1 - n column values and combine them into a single row)
  - Conditional Append (append values to all records that match a given criteria)
- Taxonomic name matching
- ERDDAP `datasets.xml` generation
- IOOS QARTOD testing
- Ad-hoc dataset QA/QC
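
To make the row enumeration feature concrete, here is a minimal sketch of a wide-to-long flip that also appends constant values, drops zero-valued records, and assigns a UUID to each output record. It uses plain Python lists of dicts to stay dependency-free, whereas TACT's own dataset operations use Pandas. The function name `enumerate_rows`, its parameters, and the output column names `variable`, `value`, and `id` are assumptions made for illustration.

```python
import uuid


def enumerate_rows(rows, id_columns, value_columns, constants=None, drop_zero=True):
    """Flip wide rows to long form: one output record per (row, value column).

    Assumes the cells in value_columns are numeric (or numeric strings).
    """
    constants = constants or {}
    long_rows = []
    for row in rows:
        for column in value_columns:
            value = row[column]
            # Skip records whose value is 0 in the target column.
            if drop_zero and float(value) == 0:
                continue
            # Carry over the identifying columns unchanged.
            record = {key: row[key] for key in id_columns}
            record["variable"] = column
            record["value"] = value
            # Append the same constant values to every transformed row.
            record.update(constants)
            # Give each output record its own UUID.
            record["id"] = str(uuid.uuid4())
            long_rows.append(record)
    return long_rows


wide = [
    {"site": "A", "cod": 3, "haddock": 0},
    {"site": "B", "cod": 0, "haddock": 5},
]
long_form = enumerate_rows(
    wide,
    id_columns=["site"],
    value_columns=["cod", "haddock"],
    constants={"basis": "HumanObservation"},
)
```

With the sample input, `long_form` holds two records: site A with `cod` = 3 and site B with `haddock` = 5. The zero-valued cells produce no output rows.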
- API: Flask
- GUI: Streamlit
- Dataset operations: Pandas
- Package management: Micromamba
- Improve documentation (add local development steps)
- CI/CD steps
- Testing
- Move to a different GUI framework
- Complete in-process functionality
- Add new functionality!
TACT started as a purely internal tool, and is still a work in progress! Please feel free to open issues and PRs with improvements/fixes.