Table 2 in the article presents the quantitative statistics of the categorical attributes very clearly. When reproducing the same table, I noticed a discrepancy between my results and the article, specifically in the value reported for patients who survived and have anaemia.
It is worth noting that this is likely just a typing error and does not compromise the published work. I opened this issue to document the case. Once again, congratulations on the achievement and the work conducted.
Steps to Reproduce
1. Load the provided dataset.
2. Use a data analysis or data manipulation tool.
3. Reproduce the results from Table 2 of the article.

The Python code below reproduces Table 2 of the article.
import pandas as pd

# Load data
csv_link = "https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv"
df = pd.read_csv(csv_link)

variables = ["anaemia", "high_blood_pressure", "diabetes", "sex", "smoking"]
summary_list = []

# Iterate over selected variables
for var in variables:
    # Calculate counts and percentages for the full sample
    full_sample_count = df[var].value_counts()
    full_sample_percent = df[var].value_counts(normalize=True) * 100

    # Calculate counts and percentages for patients who died
    dead_sample_count = df[df["DEATH_EVENT"] == 1][var].value_counts()
    dead_sample_percent = (
        df[df["DEATH_EVENT"] == 1][var].value_counts(normalize=True) * 100
    )

    # Calculate counts and percentages for patients who survived
    survived_sample_count = df[df["DEATH_EVENT"] == 0][var].value_counts()
    survived_sample_percent = (
        df[df["DEATH_EVENT"] == 0][var].value_counts(normalize=True) * 100
    )

    # Create temporary DataFrame for each variable and value
    for val in [0, 1]:
        temp_df = pd.DataFrame(
            {
                "Variable": var,
                "Bool": val,
                "Full Sample #": full_sample_count.get(val, 0),
                "Full Sample %": full_sample_percent.get(val, 0),
                "Dead Patients #": dead_sample_count.get(val, 0),
                "Dead Patients %": dead_sample_percent.get(val, 0),
                "Survived Patients #": survived_sample_count.get(val, 0),
                "Survived Patients %": survived_sample_percent.get(val, 0),
            },
            index=[0],
        )
        summary_list.append(temp_df)

# Concatenate all temporary DataFrames into a single summary DataFrame
summary_df = pd.concat(summary_list, ignore_index=True)
summary_df = summary_df.round(2)
summary_df
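As an independent cross-check of the loop above, the same counts and percentages can probably be obtained more compactly with `pd.crosstab`. The sketch below is only an illustration: the helper name `crosstab_summary` and the small `toy` DataFrame are hypothetical and the toy values are made up, not taken from the dataset. Only the column names `anaemia` and `DEATH_EVENT` come from the code above.

```python
import pandas as pd


def crosstab_summary(df, var, outcome="DEATH_EVENT"):
    """Return (counts, percents) of `var` split by `outcome`.

    Hypothetical helper for illustration; not part of the article's code.
    """
    # Absolute counts, with an extra "Full" row/column for the whole sample
    counts = pd.crosstab(df[var], df[outcome], margins=True, margins_name="Full")
    # Percentages within each outcome group (each column sums to 100)
    percents = pd.crosstab(df[var], df[outcome], normalize="columns") * 100
    return counts, percents.round(2)


# Made-up toy data, used only so the sketch runs without a download
toy = pd.DataFrame(
    {
        "anaemia": [0, 1, 1, 0, 1, 0, 0, 1],
        "DEATH_EVENT": [0, 0, 1, 1, 0, 0, 1, 0],
    }
)

counts, percents = crosstab_summary(toy, "anaemia")
print(counts)
print(percents)
```

To check the disputed cell on the real data, one could pass the `df` loaded above instead of `toy` and read `counts.loc[1, 0]` (patients with anaemia who survived) and `percents.loc[1, 0]`.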