Table 2 - Statistical quantitative description of the category features error #2

viniciusparede · 2023-05-21T12:16:20Z

Description

Table 2 in the article presents the quantitative statistics of the categorical features very clearly. When reproducing the same table, I found a discrepancy with the article, specifically in the value reported for patients who survived and have anemia.

This is most likely just a typographical error and does not compromise the published work. I opened this issue to document the case. Once again, congratulations on the achievement and the work conducted.

Steps to Reproduce

Load the provided dataset.
Use a data analysis or data manipulation tool (for example, pandas).
Reproduce the results of Table 2 of the article.

The Python code below reproduces Table 2 of the article.

import pandas as pd

# Load data
csv_link = "https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv"
df = pd.read_csv(csv_link)

variables = ["anaemia", "high_blood_pressure", "diabetes", "sex", "smoking"]
summary_list = []

# Iterate over selected variables
for var in variables:
    # Calculate counts and percentages for the full sample
    full_sample_count = df[var].value_counts()
    full_sample_percent = df[var].value_counts(normalize=True) * 100

    # Calculate counts and percentages for patients who died
    dead_sample_count = df[df["DEATH_EVENT"] == 1][var].value_counts()
    dead_sample_percent = (
        df[df["DEATH_EVENT"] == 1][var].value_counts(normalize=True) * 100
    )

    # Calculate counts and percentages for patients who survived
    survived_sample_count = df[df["DEATH_EVENT"] == 0][var].value_counts()
    survived_sample_percent = (
        df[df["DEATH_EVENT"] == 0][var].value_counts(normalize=True) * 100
    )

    # Create temporary DataFrame for each variable and value
    for val in [0, 1]:
        temp_df = pd.DataFrame(
            {
                "Variable": var,
                "Bool": val,
                "Full Sample #": full_sample_count.get(val, 0),
                "Full Sample %": full_sample_percent.get(val, 0),
                "Dead Patients #": dead_sample_count.get(val, 0),
                "Dead Patients %": dead_sample_percent.get(val, 0),
                "Survived Patients #": survived_sample_count.get(val, 0),
                "Survived Patients %": survived_sample_percent.get(val, 0),
            },
            index=[0],
        )
        summary_list.append(temp_df)

# Concatenate all temporary DataFrames into a single summary DataFrame
summary_df = pd.concat(summary_list, ignore_index=True)
summary_df = summary_df.round(2)
# A bare expression only renders in a notebook; print so the table also shows in a script
print(summary_df)
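
As an independent cross-check of a single cell, such as survived patients with anaemia, the same counts and within-group percentages can be obtained more compactly with `pd.crosstab`. The following is only a sketch: the helper name `category_summary` and the six-row `toy` DataFrame are illustrative assumptions, and the toy values are synthetic, not taken from the dataset or the article. To check the real table, pass the `df` loaded above instead of `toy`.

```python
import pandas as pd


def category_summary(df, var, outcome="DEATH_EVENT"):
    """Counts and within-group percentages of a binary feature by outcome.

    Hypothetical helper for illustration; not part of the article's code.
    """
    # Rows: values of `var`; columns: outcome values plus an "All" margin
    counts = pd.crosstab(df[var], df[outcome], margins=True, margins_name="All")
    # Drop the "All" row so that only the feature values remain
    counts = counts.drop(index="All")
    # Percentage of each feature value within each column (outcome group)
    percents = (counts / counts.sum(axis=0) * 100).round(2)
    # Two-level columns: ("#", group) for counts, ("%", group) for percentages
    return pd.concat({"#": counts, "%": percents}, axis=1)


# Synthetic toy data (an assumption for demonstration, NOT the real dataset)
toy = pd.DataFrame(
    {
        "anaemia": [1, 1, 0, 0, 1, 0],
        "DEATH_EVENT": [0, 1, 0, 0, 1, 0],
    }
)

summary = category_summary(toy, "anaemia")
print(summary)

# Single cell of interest: survived (DEATH_EVENT == 0) patients with anaemia == 1
print(summary.loc[1, ("#", 0)], summary.loc[1, ("%", 0)])
```

Because the percentages are computed within each outcome column, the `("%", 0)` column corresponds to the "Survived Patients %" column of the script above, and `("%", 1)` to "Dead Patients %".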
