
Table 2 - Statistical quantitative description of the category features error #2

Open
viniciusparede opened this issue May 21, 2023 · 0 comments


Description

Table 2 of the article presents the quantitative statistics of the categorical features of the dataset very clearly. When reproducing the table, I found a discrepancy with the article in one value: the figure reported for patients who survived and have anaemia.
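A quick way to isolate that single cell is a cross-tabulation of `anaemia` against `DEATH_EVENT` with `pandas.crosstab`. This is a minimal sketch that assumes the column names of the UCI CSV (`anaemia`, `DEATH_EVENT`, with 0 = survived and 1 = died); the helper name `anaemia_by_outcome` is hypothetical.

```python
import pandas as pd


def anaemia_by_outcome(df: pd.DataFrame):
    """Cross-tabulate anaemia (rows) against DEATH_EVENT (columns).

    Hypothetical helper. Returns (counts, percents):
    - counts: raw counts with "Total" row/column margins
    - percents: percentages within each outcome column (each column sums to 100)
    The "survived with anaemia" cell is .loc[1, 0] in both tables.
    """
    counts = pd.crosstab(
        df["anaemia"], df["DEATH_EVENT"], margins=True, margins_name="Total"
    )
    # normalize="columns" divides each cell by its column total
    percents = (
        pd.crosstab(df["anaemia"], df["DEATH_EVENT"], normalize="columns")
        .mul(100)
        .round(2)
    )
    return counts, percents
```

Calling `counts, percents = anaemia_by_outcome(df)` on the DataFrame loaded from the UCI CSV gives the survived-with-anaemia count as `counts.loc[1, 0]` and its within-group percentage as `percents.loc[1, 0]`.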

This is most likely just a typo and does not compromise the published work; I opened this issue only to document the case. Once again, congratulations on the achievement and the work.

Steps to Reproduce

  1. Load the provided dataset (heart_failure_clinical_records_dataset.csv).
  2. Use a data analysis/manipulation tool (e.g. pandas).
  3. Reproduce the results from Table 2 of the article.

Below is Python code that reproduces Table 2 of the article.

```python
import pandas as pd

# Load the heart failure clinical records dataset from the UCI repository
csv_link = "https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv"
df = pd.read_csv(csv_link)

variables = ["anaemia", "high_blood_pressure", "diabetes", "sex", "smoking"]
summary_list = []

# Split the cohort by outcome (DEATH_EVENT: 1 = died, 0 = survived)
dead = df[df["DEATH_EVENT"] == 1]
survived = df[df["DEATH_EVENT"] == 0]

# Iterate over selected variables
for var in variables:
    # Counts and percentages for the full sample
    full_sample_count = df[var].value_counts()
    full_sample_percent = df[var].value_counts(normalize=True) * 100

    # Counts and percentages (within the group) for patients who died
    dead_sample_count = dead[var].value_counts()
    dead_sample_percent = dead[var].value_counts(normalize=True) * 100

    # Counts and percentages (within the group) for patients who survived
    survived_sample_count = survived[var].value_counts()
    survived_sample_percent = survived[var].value_counts(normalize=True) * 100

    # One summary row per binary value (0 or 1) of the variable
    for val in [0, 1]:
        summary_list.append(
            {
                "Variable": var,
                "Bool": val,
                "Full Sample #": full_sample_count.get(val, 0),
                "Full Sample %": full_sample_percent.get(val, 0),
                "Dead Patients #": dead_sample_count.get(val, 0),
                "Dead Patients %": dead_sample_percent.get(val, 0),
                "Survived Patients #": survived_sample_count.get(val, 0),
                "Survived Patients %": survived_sample_percent.get(val, 0),
            }
        )

# Build the summary DataFrame from the collected rows and round to 2 decimals
summary_df = pd.DataFrame(summary_list).round(2)
print(summary_df.to_string(index=False))
```
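To document a discrepancy systematically rather than by eye, the reproduced `summary_df` can be compared cell by cell against values transcribed by hand from Table 2 of the article. The sketch below assumes a hypothetical helper `flag_mismatches` and a hypothetical input format: a dict keyed by `(Variable, Bool, column)` tuples that use the column names of `summary_df` above. The published numbers themselves must be copied from the article.

```python
import pandas as pd

RESULT_COLUMNS = ["Variable", "Bool", "Column", "Reproduced", "Published"]


def flag_mismatches(
    reproduced: pd.DataFrame, published: dict, tol: float = 0.0
) -> pd.DataFrame:
    """Return one row per cell where the reproduced value differs from the article.

    Hypothetical helper. `published` maps (variable, value, column) -> number
    transcribed from Table 2; `tol` is the allowed absolute difference.
    """
    mismatches = []
    for (var, val, column), expected in published.items():
        row = reproduced[
            (reproduced["Variable"] == var) & (reproduced["Bool"] == val)
        ]
        # Skip entries with no matching row in the reproduced table
        if row.empty:
            continue
        actual = row.iloc[0][column]
        if abs(actual - expected) > tol:
            mismatches.append(
                {
                    "Variable": var,
                    "Bool": val,
                    "Column": column,
                    "Reproduced": actual,
                    "Published": expected,
                }
            )
    return pd.DataFrame(mismatches, columns=RESULT_COLUMNS)
```

For example, `flag_mismatches(summary_df, published)` with `published = {("anaemia", 1, "Survived Patients #"): ...}` filled in from the article returns one row per disagreeing cell. Counts can be compared exactly; for the percentage columns, which are rounded to two decimals, a small `tol` such as `0.01` avoids flagging pure rounding differences.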

