Mestrado em Bioinformática, Universidade do Minho, 2022-2023.
A package of machine learning algorithms built to help students grasp the concepts of the course. Students should implement essential algorithms from scratch using numpy and pandas. All implementations must follow a common, simple API.
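As an illustration of what a "common and simple API" could look like, here is a minimal sketch. The names Transformer, MeanCenterer, fit, transform and fit_transform are assumptions made for this example and are not taken from the package itself; the actual API is defined during the course.

```python
from abc import ABC, abstractmethod

import numpy as np


class Transformer(ABC):
    """Hypothetical base class: every transformer exposes fit and transform."""

    @abstractmethod
    def fit(self, X: np.ndarray) -> "Transformer":
        """Learn parameters from X and return self."""

    @abstractmethod
    def transform(self, X: np.ndarray) -> np.ndarray:
        """Apply the learned parameters to X."""

    def fit_transform(self, X: np.ndarray) -> np.ndarray:
        # Shared convenience method: identical for every subclass.
        return self.fit(X).transform(X)


class MeanCenterer(Transformer):
    """Toy transformer (assumption): subtracts the per-column mean."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X: np.ndarray) -> "MeanCenterer":
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X: np.ndarray) -> np.ndarray:
        if self.mean_ is None:
            raise ValueError("Call fit before transform.")
        return X - self.mean_


X = np.array([[1.0, 10.0], [3.0, 30.0]])
centered = MeanCenterer().fit_transform(X)
print(centered)
```

Because every class shares the same method names, scripts can swap one algorithm for another without changing the surrounding code.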
To get started, fork the repository from GitHub and clone it to your local machine.
Fork the following GitHub repository: https://github.com/cruz-f/si
Then, clone the repository to your local machine:
git clone https://github.com/YOUR_USERNAME/si.git
Open the repository in your favorite IDE and install the dependencies (if missing):
pip install -r requirements.txt
or
pip install numpy pandas scipy matplotlib
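To find out whether any dependency is actually missing before installing, a small check along these lines may help. It is only a sketch: the helper name missing_packages is hypothetical, and the package list simply mirrors the pip command above.

```python
from importlib.util import find_spec

# Package names mirrored from the pip command above.
REQUIRED = ("numpy", "pandas", "scipy", "matplotlib")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```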
Note: You can also create a similar Python package and push it to your GitHub.
Make a change to the repository: Add your co-authorship to the __init__.py file (within the si folder):
__author__ = "YOUR_NAME"
__credits__ = ["YOUR_NAME"]
__license__ = "Apache License 2.0"
__version__ = "0.0.1"
__maintainer__ = "YOUR_NAME"
__email__ = "YOUR_EMAIL"
Then, commit it to your local repository and publish it to your GitHub:
git add src/si/__init__.py
git commit -m "Adding my co-authorship to __init__.py file"
git push origin main
Note: you can also use the IDE Git tools.
The package is organized as follows:
si
├── src
│ ├── si
│ │ ├── __init__.py
│ │ ├── data
│ │ │ ├── __init__.py
├── datasets
│ ├── README.md
│ ├── ...
├── scripts
│ ├── README.md
│ ├── ...
├── ... (python package configuration files)
A tour of Python packages:
- The src folder contains the source code of the package. It should contain an intermediate folder called si (the name of the package), which holds the modules of the package. All Python packages and subpackages must also contain a file called __init__.py.
- The datasets folder contains the datasets used in the scripts.
- The scripts folder contains the scripts used to test the package and include examples.
Note: It is also common to have a tests folder to include the unit tests of the package. However, we will not cover this topic in this course.
Note: A Python package also contains many configuration files (e.g., setup.py, requirements.txt, etc.).
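For those creating a similar package from scratch, the layout described above can be sketched with the standard library. The helper names scaffold and missing_init are hypothetical, and the example writes to a temporary directory so nothing on disk is touched.

```python
import tempfile
from pathlib import Path


def scaffold(root: Path) -> None:
    """Create the src/si layout plus the datasets and scripts folders."""
    for package in (root / "src" / "si", root / "src" / "si" / "data"):
        package.mkdir(parents=True, exist_ok=True)
        # Every package and subpackage gets its own __init__.py.
        (package / "__init__.py").touch()
    for folder in ("datasets", "scripts"):
        (root / folder).mkdir(exist_ok=True)
        (root / folder / "README.md").touch()


def missing_init(src: Path) -> list:
    """Return the folders under src that lack an __init__.py file."""
    return [
        d for d in src.rglob("*")
        if d.is_dir() and not (d / "__init__.py").exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "si"
    scaffold(root)
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    missing = missing_init(root / "src")

print(created)
print(missing)
```

The missing_init check can also be pointed at an existing src folder to spot subpackages that were created without an __init__.py.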
All datasets are available at: https://www.dropbox.com/sh/oas4yru2r9n61hk/AADpRunbqES44W49gx9deRN5a?dl=0
This package is heavily inspired by and adapted from https://github.com/vmspereira/si.