Kaisen_Yao_IDS706_Week3_Individual/
├── .devcontainer/
│ ├── devcontainer.json
│ └── Dockerfile
├── .github/
│ └── workflows/
│   ├── format.yml
│   ├── install.yml
│   ├── lint.yml
│   └── test.yml
├── .gitignore
├── Dockerfile
├── LICENSE
├── main.ipynb
├── main.py
├── Makefile
├── mylib/
│ ├── __init__.py
│ └── lib.py
├── README.md
├── repeat.sh
├── requirements.txt
├── setup.sh
├── test_lib.py
└── test_main.py
This project builds on the previous three mini-projects to demonstrate continuous integration best practices for data science projects. It uses a dataset that provides an urbanization index for U.S. congressional districts, including each district's urbanization index, rural and urban population distributions, and partisan lean.
- Open Codespaces
- Wait for the container to be built and the pinned requirements from `requirements.txt` to be installed
- If running locally, `git clone` the repository and use `make install`
- Format code: `make format`
- Lint code: `make lint`
- Test code: `make test`
Whenever code is pushed to the repository, the following will be automatically generated and committed via GitHub Actions:
- Descriptive statistics of the dataset.
- Visualizations, including:
  - Urbanization Index Distribution (Histogram)
  - Urbanization Grouping Over Time (Line Chart)
  - Population Distribution by District Type (Bar Chart)
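
As a rough illustration of the descriptive-statistics step listed above, here is a minimal sketch that uses only the Python standard library. It is not the code in `main.py` or `mylib/lib.py`: the helper names `load_column` and `describe`, the column name `urbanindex`, and the four sample rows are all assumptions made up for this example, not real data from the dataset.

```python
import csv
import io
import statistics

# Tiny made-up sample for illustration only; NOT real dataset values.
# The column names "district" and "urbanindex" are assumptions.
SAMPLE_CSV = """district,urbanindex
A-1,8.5
A-2,10.0
B-1,11.5
B-2,14.0
"""


def load_column(csv_text, column):
    """Parse CSV text and return one column as a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    stats = describe(load_column(SAMPLE_CSV, "urbanindex"))
    for name, value in stats.items():
        print(f"{name}: {value}")
```

In a real run the CSV text would come from the dataset file rather than an inline string, and the resulting summary could be written to a report that the workflow then commits.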
The descriptive statistics and visualizations are regenerated whenever someone pushes to the repository: GitHub Actions runs `make generate_and_push` and commits the results as `actions-user`. You can find them here: descriptive statistics and visualizations.