Heart Disease Indicators Data Processing and Visualization

Project Overview

The main objective of this project is to develop a Python program for processing and visualizing data related to key health indicators of patients with heart disease.

Dataset

The dataset comprises 319,795 patients and includes 18 variables such as BMI, smoking habits, alcohol consumption, and various health conditions. Here's a brief description of each variable:

Column Name	Description
`HeartDisease`	Indicates whether the patient has coronary artery disease or has suffered a myocardial infarction (binary: Yes/No).
`BMI`	Body Mass Index, calculated from the height and weight of the patient (numeric).
`Smoking`	Indicates if the patient has smoked at least 100 cigarettes in their lifetime (binary: Yes/No).
`AlcoholDrinking`	Indicates if the patient is a heavy drinker (men: >14 drinks/week, women: >7 drinks/week) (binary: Yes/No).
`Stroke`	Indicates if the patient has ever had a stroke (binary: Yes/No).
`PhysicalHealth`	Number of days in the past 30 days that the patient’s physical health was not good (numeric: 0-30).
`MentalHealth`	Number of days in the past 30 days that the patient’s mental health was not good (numeric: 0-30).
`DiffWalking`	Indicates if the patient has serious difficulty walking or climbing stairs (binary: Yes/No).
`Sex`	The sex of the patient (categorical: Male/Female).
`AgeCategory`	Age category of the patient, divided into 14 groups (categorical).
`Race`	Imputed race of the patient (categorical).
`Diabetic`	Indicates if the patient has diabetes (binary: Yes/No).
`PhysicalActivity`	Indicates if the patient engaged in physical activity or exercise in the past 30 days outside of their regular job (binary: Yes/No).
`GenHealth`	The patient's general health status (categorical).
`SleepTime`	The average number of hours of sleep the patient gets in a 24-hour period (numeric).
`Asthma`	Indicates if the patient currently has or has had asthma (binary: Yes/No).
`KidneyDisease`	Indicates if the patient has kidney disease, excluding kidney stones, bladder infection, or incontinence (binary: Yes/No).
`SkinCancer`	Indicates if the patient has or has had skin cancer (binary: Yes/No).

Program

The program implements a command-line interface (CLI) with the following functionalities:

1. Data Loading

LOAD: Prompts the user for a dataset filename and loads it into memory as a DataFrame. Displays summary information including column data types and memory usage.
LOADF: Automatically loads the provided dataset (heart_cleaned.csv) and displays summary information.

2. Data Cleaning

CLEAR: Clears the DataFrame from memory and shows the number of discarded records.

3. Data Processing

AGEAVERAGE: Creates a new continuous variable AgeAverage derived from the AgeCategory column.
SHOWFEATURES: Displays two lists containing categorical and numerical variables in the DataFrame.
NUMSTAT: Provides statistical summaries for numerical columns.
CATSTAT: Provides summaries for categorical columns.

4. Data Visualization

CATGRAPH: Generates pie charts to visually represent the distribution of categorical variables using Plotly.
BMI-HD: Visualizes the relationship between BMI and heart disease using Seaborn, including box plots and histograms.
AGE-HDKDSC: Plots the relationship between age and the incidence of heart disease, kidney disease, and skin cancer using Seaborn.
COMATRIX: Computes and visualizes the correlation matrix for categorical variables.

5. Program Termination

QUIT: Clears the DataFrame and exits the program.

Project Structure

/scripts: Contains the main script main.py that implements the above functionalities.
/data: Directory to store datasets.
/results: Directory to save results for each run, such as visualizations and model outputs.
/logs: Directory to save logs for each run, including prints, error messages and code details.
Dockerfile: Instructions to containerize the application.
requirements.txt: List of Python dependencies.
README.md: This file.

Requirements

The code is written in Python and requires the following libraries:

Pandas
Requests
NumPy
Matplotlib
Seaborn
Plotly

Installation

Clone the repository:

git clone https://github.com/HugoTex98/EDA-Heart-Disease-Data.git
cd EDA-Heart-Disease-Data

Create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required dependencies:
```
pip install -r requirements.txt
```

Building and Running the Docker Container

To build and run the Docker container for this project, follow the steps below:

1. Build the Docker Image

First, navigate to the root directory of the project (where Dockerfile is located) and run the following command to build the Docker image:

docker build -t eda-heart-disease-data .

This command builds the Docker image using the instructions in the Dockerfile and tags it as breast-cancer-prediction.

2. Run the Docker Container

Once the image is built, you can run the Docker container using the following command:

docker run -it --rm eda-heart-disease-data

-it: Runs the container in interactive mode, allowing you to interact with the terminal inside the container.
--rm: Automatically removes the container once it stops running, keeping your environment clean.
eda-heart-disease-data: The name of the Docker image you built.

Since the project is not a web application and does not expose any ports, the output from the script will be directly visible in the terminal where the container runs. If the script generates output files, they will be saved in the container's file system.

To stop the container while it’s running, you can do so by pressing Ctrl + C in the terminal where the container is running.

Usage

Run the program using the command-line interface:

python main.py

Follow the prompts to load data, process it, and visualize various aspects related to heart disease indicators.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Heart Disease Indicators Data Processing and Visualization

Project Overview

Dataset

Program

1. Data Loading

2. Data Cleaning

3. Data Processing

4. Data Visualization

5. Program Termination

Project Structure

Requirements

Installation

Building and Running the Docker Container

1. Build the Docker Image

2. Run the Docker Container

Usage

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
__pycache__		__pycache__
dataset		dataset
logs		logs
results/20240919_233408		results/20240919_233408
scripts		scripts
Dockerfile		Dockerfile
README.md		README.md
requirements.txt		requirements.txt

HugoTex98/EDA-Heart-Disease-Data

Folders and files

Latest commit

History

Repository files navigation

Heart Disease Indicators Data Processing and Visualization

Project Overview

Dataset

Program

1. Data Loading

2. Data Cleaning

3. Data Processing

4. Data Visualization

5. Program Termination

Project Structure

Requirements

Installation

Building and Running the Docker Container

1. Build the Docker Image

2. Run the Docker Container

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages