As a CEO of different hospitals spread across the States I am planning to design and implement a new hospital specialised in some of the most prevalent and resource-demanding health conditions:
- Diabetes
- Cancer
- Hyperthesion
- Asthma
- Arthritis
- Obesity
As a result, a market revision is needed in order to find the best conditions (geographical, population, disease prevalence, etc.) to place the hospital.
- Check the prevalence of each condition across gender and age and which one is the most common disease
- Study the different admission types. Is there any relation to the age of patients?
- Study the effects of test results. Are they related with the length of stay?
- Month of admission
- Study other hospitals. How many patients do they have of 0- blood type?
- Health expenditure in figures. Who are the patients who spent less amount of money (i.e. 1000$-2000$) in hospital stays?
- Classify the patients where they had a cheap/medium/expensive bill after being discharged from hospital.
- Which patients stayed in the V.I.P. room (i.e. room 463)
- Database with 10,000 patient records from different USA hospitals (Kaggle)
- No missing data, no duplicates considered
- Data on admissions, bloodtype, medical conditions, test results, age, hospitals and doctors
- Brief EDA: unique, nunique, valuecount, summarize, rename variables
- Homogeneity of information: random.randint
- Variables: prim/sec keys, unique
- States creation: State list from Google, random number --> merge by index
- Id creation for connexions: from Python to SQL, from SQL to Python
- Connection to SQL
import pymysql
import sqlalchemy as alch
from getpass import getpass
import pandas as pd
import random
password= getpass("Por favor, introduce tu contraseña:")
dbName="healthcare"
# Connect to MySQL
conn = pymysql.connect(
host='localhost',
user='root',
password=password,
database='healthcare'
)
connectionData=f"mysql+pymysql://root:{password}@localhost/healthcare"
engine = alch.create_engine(connectionData)
1. Check the prevalence of each condition across gender and age and which one is the most common disease
query=pd.read_sql_query('''
select gender as Gender, medical_cond as Medical_condition, count(medical_cond) as N from patient p join bloodtype bt on p.blood_id=bt.blood_id
join medicalcond mc on p.medcond_id=mc.medcond_id
group by gender, medical_cond
order by Medical_condition desc;''', engine
)
query
Gender | Medical_condition | N | |
---|---|---|---|
0 | Male | Obesity | 790 |
1 | Female | Obesity | 838 |
2 | Male | Hypertension | 852 |
3 | Female | Hypertension | 836 |
4 | Female | Diabetes | 825 |
5 | Male | Diabetes | 798 |
6 | Male | Cancer | 816 |
7 | Female | Cancer | 887 |
8 | Male | Asthma | 834 |
9 | Female | Asthma | 874 |
10 | Male | Arthritis | 835 |
11 | Female | Arthritis | 815 |
query=pd.read_sql_query('''
select medical_cond as Medical_condition, count(p.medcond_id) as N from medicalcond mc join patient p on p.medcond_id=mc.medcond_id
group by p.medcond_id having N=(select max(N) from (select count(p.medcond_id) as N, medical_cond from medicalcond mc
join patient p on p.medcond_id=mc.medcond_id
group by p.medcond_id)sub1)''', engine
)
query
Medical_condition | N | |
---|---|---|
0 | Asthma | 1708 |
query=pd.read_sql_query('''
select admission_type as AdmissionType, round(avg(age),2) as MeanAge from patient p join admission a on p.pat_id=a.pat_id
join admission_type aty on a.admtype_id=aty.admtype_id
group by AdmissionType
order by MeanAge desc''', engine
)
query
AdmissionType | MeanAge | |
---|---|---|
0 | Urgent | 51.96 |
1 | Elective | 51.44 |
2 | Emergency | 51.35 |
query=pd.read_sql_query('''
select medical_cond as Medical_condition, round(avg(age),2) as MeanAge from patient p join medicalcond mc on p.medcond_id=mc.medcond_id
group by Medical_condition
order by MeanAge desc''', engine
)
query
Medical_condition | MeanAge | |
---|---|---|
0 | Diabetes | 51.80 |
1 | Obesity | 51.63 |
2 | Cancer | 51.58 |
3 | Arthritis | 51.53 |
4 | Asthma | 51.45 |
5 | Hypertension | 50.74 |
query=pd.read_sql_query('''
select test_result as TestResult, datediff(disc_date,adm_date) as LengthStay from admission ad join test_results tr on ad.testres_id=tr.testres_id
group by TestResult, LengthStay
order by LengthStay desc
limit 6''', engine
)
query
TestResult | LengthStay | |
---|---|---|
0 | Inconclusive | 30 |
1 | Normal | 30 |
2 | Abnormal | 30 |
3 | Abnormal | 29 |
4 | Normal | 29 |
5 | Inconclusive | 29 |
query=pd.read_sql_query('''
select date_format(convert(adm_date,date), '%%M') as Mes, count(*) as N from admission
group by Mes
order by N desc''', engine
)
query
Mes | N | |
---|---|---|
0 | July | 1005 |
1 | April | 985 |
2 | October | 980 |
3 | August | 968 |
4 | May | 960 |
5 | June | 959 |
6 | January | 948 |
7 | December | 935 |
8 | March | 926 |
9 | November | 918 |
10 | September | 877 |
11 | February | 874 |
query=pd.read_sql_query('''
select hospital as Hospital, count(blood_type) as N_BloodType from patient p join medicalcond mc on p.medcond_id=mc.medcond_id
join admission ad on p.pat_id=ad.pat_id join hospital h on ad.hospital_id=h.hospital_id
join bloodtype bt on p.blood_id=bt.blood_id
group by Hospital, blood_type
having blood_type='o-'
order by N_BloodType desc
limit 5''', engine
)
query
# 0- is the universal blood type so good for transfusions
Hospital | N_BloodType | |
---|---|---|
0 | Martinez PLC | 9 |
1 | Williams LLC | 6 |
2 | Smith-Williams | 6 |
3 | Burton, Harris and Phillips | 6 |
4 | Booth LLC | 6 |
query=pd.read_sql_query('''
select p.pat_id, name as Patient, billing_amount as $ from admission a join patient p on a.pat_id=p.pat_id
where billing_amount between 0 and 800
order by $ desc''', engine
)
query
pat_id | Patient | $ | |
---|---|---|---|
0 | 9279 | Mrs. Sandra Wood | 800.0 |
7. Classify the patients where they had a cheap/medium/expensive bill after being discharged from hospital.
query=pd.read_sql_query('''
select billing_amount as $,
CASE
WHEN billing_amount >20000 THEN 'Medium bill'
WHEN billing_amount >40000 THEN 'High bill'
ELSE 'Low bill'
END as Bill_Type
from admission a join patient p on a.pat_id=p.pat_id
limit 5''', engine
)
query
$ | Bill_Type | |
---|---|---|
0 | 26702.0 | Medium bill |
1 | 16492.0 | Low bill |
2 | 16492.0 | Low bill |
3 | 22986.0 | Medium bill |
4 | 22986.0 | Medium bill |
query=pd.read_sql_query('''
select hospital_id as Hospital, max(billing_amount) over (partition by hospital_id) as MaxBill$
from admission
where billing_amount > 2000
order by Hospital
limit 10''', engine
)
query
Hospital | MaxBill$ | |
---|---|---|
0 | 1 | 35608.0 |
1 | 1 | 35608.0 |
2 | 2 | 4934.0 |
3 | 3 | 7060.0 |
4 | 4 | 4063.0 |
5 | 5 | 39238.0 |
6 | 6 | 38813.0 |
7 | 6 | 38813.0 |
8 | 7 | 19144.0 |
9 | 8 | 30432.0 |
9. List three patients who stayed in the V.I.P. room (Room 463). How many patients stayed in that room? -- subquery
query=pd.read_sql_query('''
select name as Patient from patient where pat_id in (select pat_id from admission where room_number=463)
limit 3''', engine
)
query
Patient | |
---|---|
0 | David Copeland |
1 | Lori Smith |
2 | Brian Cox |
query=pd.read_sql_query('''
select count(*) from patient where pat_id in (select pat_id from admission where room_number=463)
''', engine
)
query
count(*) | |
---|---|
0 | 18 |
alter table admission
add column pat_nation VARCHAR(50) default null after hospital_id;
UPDATE admission
SET
pat_nation = 'American'
WHERE
adm_id > 0;
All patients are from an American Country
# Commit changes
conn.commit()
# Close connection
conn.close()