Skip to content

This project is based on SQL and Tableau. The aim is to do a market research in the USA healthcare system to analyze the implementation and construction of a new hospital! Let's look for disease prevalence, healthcare providesrs, stays, patients...and let's see what conclusions can we extract from this!

Notifications You must be signed in to change notification settings

aidamouref/Project-Hospital

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PROJECT II: IMPLEMENTING A NEW HOSPITAL IN THE USA HEALTHCARE SYSTEM

hospital.jpg

INTRO

As a CEO of different hospitals spread across the States I am planning to design and implement a new hospital specialised in some of the most prevalent and resource-demanding health conditions:

  • Diabetes
  • Cancer
  • Hyperthesion
  • Asthma
  • Arthritis
  • Obesity

As a result, a market revision is needed in order to find the best conditions (geographical, population, disease prevalence, etc.) to place the hospital.

OBJECTIVES

  • Check the prevalence of each condition across gender and age and which one is the most common disease
  • Study the different admission types. Is there any relation to the age of patients?
  • Study the effects of test results. Are they related with the length of stay?
  • Month of admission
  • Study other hospitals. How many patients do they have of 0- blood type?
  • Health expenditure in figures. Who are the patients who spent less amount of money (i.e. 1000$-2000$) in hospital stays?
  • Classify the patients where they had a cheap/medium/expensive bill after being discharged from hospital.
  • Which patients stayed in the V.I.P. room (i.e. room 463)

METHODOLOGY

  • Database with 10,000 patient records from different USA hospitals (Kaggle)
  • No missing data, no duplicates considered
  • Data on admissions, bloodtype, medical conditions, test results, age, hospitals and doctors
  • Brief EDA: unique, nunique, valuecount, summarize, rename variables
  • Homogeneity of information: random.randint
  • Variables: prim/sec keys, unique
  • States creation: State list from Google, random number --> merge by index
  • Id creation for connexions: from Python to SQL, from SQL to Python
  • Connection to SQL

Connexion Schema

Diagram.jpg

RESULTS

import pymysql
import sqlalchemy as alch
from getpass import getpass
import pandas as pd
import random
password= getpass("Por favor, introduce tu contraseña:") 
dbName="healthcare"
# Connect to MySQL
conn = pymysql.connect(
    host='localhost',
    user='root',
    password=password,
    database='healthcare'
)
connectionData=f"mysql+pymysql://root:{password}@localhost/healthcare"
engine = alch.create_engine(connectionData)

LET'S START QUERYING!!

1. Check the prevalence of each condition across gender and age and which one is the most common disease

query=pd.read_sql_query('''
select gender as Gender, medical_cond as Medical_condition, count(medical_cond) as N from patient p join bloodtype bt on p.blood_id=bt.blood_id
join medicalcond mc on p.medcond_id=mc.medcond_id
group by gender, medical_cond
order by Medical_condition desc;''', engine
)
query
Gender Medical_condition N
0 Male Obesity 790
1 Female Obesity 838
2 Male Hypertension 852
3 Female Hypertension 836
4 Female Diabetes 825
5 Male Diabetes 798
6 Male Cancer 816
7 Female Cancer 887
8 Male Asthma 834
9 Female Asthma 874
10 Male Arthritis 835
11 Female Arthritis 815
Most prevalent condition (subquery)
query=pd.read_sql_query('''
select medical_cond as Medical_condition, count(p.medcond_id) as N  from medicalcond mc join patient p on p.medcond_id=mc.medcond_id
group by p.medcond_id having N=(select max(N) from (select count(p.medcond_id) as N, medical_cond from medicalcond mc 
join patient p on p.medcond_id=mc.medcond_id
group by p.medcond_id)sub1)''', engine
)
query
Medical_condition N
0 Asthma 1708

2. Study the effect of age. Is there any relation to the admission type or to the health condition?

query=pd.read_sql_query('''
select admission_type as AdmissionType, round(avg(age),2) as MeanAge  from patient p join admission a on p.pat_id=a.pat_id
join admission_type aty on a.admtype_id=aty.admtype_id
group by AdmissionType
order by MeanAge desc''', engine
)
query
AdmissionType MeanAge
0 Urgent 51.96
1 Elective 51.44
2 Emergency 51.35
query=pd.read_sql_query('''
select medical_cond as Medical_condition, round(avg(age),2) as MeanAge from patient p join medicalcond mc on p.medcond_id=mc.medcond_id
group by Medical_condition
order by MeanAge desc''', engine
)
query
Medical_condition MeanAge
0 Diabetes 51.80
1 Obesity 51.63
2 Cancer 51.58
3 Arthritis 51.53
4 Asthma 51.45
5 Hypertension 50.74

3. Study the effects of test results. Are they related with the length of stay?

query=pd.read_sql_query('''
select test_result as TestResult, datediff(disc_date,adm_date) as LengthStay from admission ad join test_results tr on ad.testres_id=tr.testres_id
group by TestResult, LengthStay
order by LengthStay desc
limit 6''', engine
)
query
TestResult LengthStay
0 Inconclusive 30
1 Normal 30
2 Abnormal 30
3 Abnormal 29
4 Normal 29
5 Inconclusive 29

4. Which is the month with more admissions?

query=pd.read_sql_query('''
select date_format(convert(adm_date,date), '%%M') as Mes, count(*) as N  from admission
group by Mes
order by N desc''', engine
)
query
Mes N
0 July 1005
1 April 985
2 October 980
3 August 968
4 May 960
5 June 959
6 January 948
7 December 935
8 March 926
9 November 918
10 September 877
11 February 874

5 Study other hospitals. How many patients do they have 0- blood type?

query=pd.read_sql_query('''
select hospital as Hospital, count(blood_type) as N_BloodType  from patient p join medicalcond mc on p.medcond_id=mc.medcond_id 
join admission ad on p.pat_id=ad.pat_id join hospital h on ad.hospital_id=h.hospital_id
join bloodtype bt on p.blood_id=bt.blood_id
group by Hospital, blood_type
having blood_type='o-' 
order by N_BloodType desc
limit 5''', engine
)
query

# 0- is the universal blood type so good for transfusions
Hospital N_BloodType
0 Martinez PLC 9
1 Williams LLC 6
2 Smith-Williams 6
3 Burton, Harris and Phillips 6
4 Booth LLC 6

6. Who are the patients who spent less amount of money (i.e. 0$-800$) in hospital stays?

query=pd.read_sql_query('''
select p.pat_id, name as Patient, billing_amount as $ from admission a join patient p on a.pat_id=p.pat_id
where billing_amount between 0 and 800
order by $ desc''', engine
)
query
pat_id Patient $
0 9279 Mrs. Sandra Wood 800.0

7. Classify the patients where they had a cheap/medium/expensive bill after being discharged from hospital.

query=pd.read_sql_query('''
select billing_amount as $,
CASE
    WHEN billing_amount >20000 THEN 'Medium bill'
    WHEN billing_amount >40000 THEN 'High bill'
    ELSE 'Low bill'
END as Bill_Type
from admission a join patient p on a.pat_id=p.pat_id
limit 5''', engine
)
query
$ Bill_Type
0 26702.0 Medium bill
1 16492.0 Low bill
2 16492.0 Low bill
3 22986.0 Medium bill
4 22986.0 Medium bill

8. Select the maximum bill per hospital (window function).

query=pd.read_sql_query('''
select hospital_id as Hospital, max(billing_amount) over (partition by hospital_id) as MaxBill$
from admission
where billing_amount > 2000
order by Hospital
limit 10''', engine
)
query
Hospital MaxBill$
0 1 35608.0
1 1 35608.0
2 2 4934.0
3 3 7060.0
4 4 4063.0
5 5 39238.0
6 6 38813.0
7 6 38813.0
8 7 19144.0
9 8 30432.0

9. List three patients who stayed in the V.I.P. room (Room 463). How many patients stayed in that room? -- subquery

query=pd.read_sql_query('''
select name as Patient from patient where pat_id in (select pat_id from admission where room_number=463)
limit 3''', engine
)
query
Patient
0 David Copeland
1 Lori Smith
2 Brian Cox
query=pd.read_sql_query('''
select count(*) from patient where pat_id in (select pat_id from admission where room_number=463)
''', engine
)
query
count(*)
0 18

10. Add a column with the patient's nationality

alter table admission
add column pat_nation VARCHAR(50) default null after hospital_id;

UPDATE admission 
SET 
    pat_nation = 'American'
WHERE
    adm_id > 0;

All patients are from an American Country

# Commit changes
conn.commit()

# Close connection
conn.close()

STRENGTHS AND WEEAKNESSESS

Database clean and easy to work with (S)
Nice N for robust conclusions (S)
Mix of real and fake data (S)
Time constraints (W)
Very homogeneus data at some point -- solution: randomised (W)
Some desired variables were missing --> incorporated manually (W)

CONCLUSIONS

By looking a the data, efforts should be directed to asthmatic patients with a mean age of 50, although all the disease are similarly prevalente if we don't take into account other variables (e.g. gender). More staff should be hired during July (tourism and parties?). There are no age differences of patients regarding health disease or test result.

About

This project is based on SQL and Tableau. The aim is to do a market research in the USA healthcare system to analyze the implementation and construction of a new hospital! Let's look for disease prevalence, healthcare providesrs, stays, patients...and let's see what conclusions can we extract from this!

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published