Diagnosis Trends in Medical Reviews

This project analyzes the "Independent Medical Reviews" dataset using Hadoop MapReduce, with Apache Pig for data processing, Apache Hive for data storage, and Tableau for data visualization.

Prerequisites

Cloudera Quickstart Docker Image
Apache Pig and Hive (included in the Cloudera Docker image)
Tableau Desktop with ODBC driver installed
Access to the "Independent Medical Reviews" dataset
The Pig script (diagnosis_correlation.pig) and the dataset (Independent_Medical_Reviews_Filled.csv) copied into the Docker container
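
One way to get the two files into a running container is `docker cp`. The sketch below is not part of the project; the function name `copy_inputs`, the `/tmp` destination, and the `DOCKER` override variable are assumptions made for illustration. The container ID must come from your own `docker ps` output.

```shell
#!/bin/sh
# Sketch: copy the Pig script and the dataset into a running container.
# Assumptions (not from the project): the helper name, the default /tmp
# destination, and the DOCKER variable used to swap in a dry-run command.
copy_inputs() {
    container="$1"          # container ID or name, as shown by `docker ps`
    dest="${2:-/tmp}"       # directory inside the container (assumed)
    for f in diagnosis_correlation.pig Independent_Medical_Reviews_Filled.csv; do
        # Setting DOCKER=echo prints the commands instead of running them.
        ${DOCKER:-docker} cp "$f" "$container:$dest/" || return 1
    done
}
```

Run it from the directory that holds both files, for example `copy_inputs <container-id> /tmp`. The later `hdfs` and `pig` commands then need to be run from that same directory inside the container.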

Setup and Execution Guide

Once the pipeline is set up, you can open the Tableau workbook to view the dashboard. You will need to edit the connection to use your IP address and the credentials below.

Username - hive

Password - cloudera

Setting Up the Environment

Start the Cloudera Docker container:

docker run --hostname=quickstart.cloudera --privileged=true -t -i --publish-all=true -p 8888:8888 -p 7180:7180 -p 80:80 cloudera/quickstart /usr/bin/docker-quickstart

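Access the Cloudera environment from the container's terminal or through a web browser. The command above publishes ports 8888 (Hue), 7180 (Cloudera Manager), and 80.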

Data Processing with Apache Pig

Load the dataset into HDFS:

hdfs dfs -mkdir -p /medical_reviews
hdfs dfs -put Independent_Medical_Reviews_Filled.csv /medical_reviews/
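
Before running the Pig script, it can help to confirm that the upload actually landed in HDFS. The following is a minimal sketch; the helper name `verify_upload` and the `HDFS` override variable are assumptions, and the default path simply combines the target directory and file name used above.

```shell
#!/bin/sh
# Sketch: check that the dataset exists in HDFS.
# `hdfs dfs -test -e PATH` exits with status 0 when PATH exists.
verify_upload() {
    path="${1:-/medical_reviews/Independent_Medical_Reviews_Filled.csv}"
    if ${HDFS:-hdfs} dfs -test -e "$path"; then
        echo "found: $path"
    else
        echo "missing: $path" >&2
        return 1
    fi
}
```

Calling `verify_upload` with no arguments checks the default path; a non-zero exit status means the file is not there yet.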

Execute the Pig script for data processing:

pig diagnosis_correlation.pig

Data Storage in Hive

Access Hive and create a table for the processed data:

CREATE TABLE medical_data (
    treatment_category STRING,
    diagnosis_category STRING,
    case_count INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
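
The table above expects three comma-separated fields per row, with an integer in the last position. Whether the Pig script writes its output in exactly that shape is an assumption here (Pig's default field delimiter is a tab unless the script specifies otherwise), so a quick pre-load check may save a table full of NULLs. The helper name `check_rows` is hypothetical.

```shell
#!/bin/sh
# Sketch: validate rows read from stdin against the medical_data layout
# (three comma-separated fields, the third an unsigned integer).
# Prints each offending line and returns non-zero if any were found.
check_rows() {
    awk -F',' '
        NF != 3 || $3 !~ /^[0-9]+$/ { bad++; printf "line %d: %s\n", NR, $0 }
        END { exit (bad > 0) }
    '
}
```

A possible use is `hdfs dfs -cat /output/treatment_diagnosis_correlation/part-* | check_rows`, where the `part-*` file naming is the usual MapReduce output convention rather than something confirmed by this project. Category values that themselves contain commas would also be flagged.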

Load the data into the Hive table:

LOAD DATA INPATH '/output/treatment_diagnosis_correlation' INTO TABLE medical_data;
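
To confirm the load worked, you can query the table from the container's shell with `hive -e`. This sketch lists the pairs with the highest counts; the helper name `top_pairs`, the default limit, and the `HIVE` override variable are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: show the treatment/diagnosis pairs with the highest case counts.
# Column and table names come from the CREATE TABLE statement above.
top_pairs() {
    limit="${1:-10}"    # number of rows to show (assumed default)
    ${HIVE:-hive} -e "SELECT treatment_category, diagnosis_category, case_count FROM medical_data ORDER BY case_count DESC LIMIT ${limit};"
}
```

If the rows come back with NULL values, the delimiter in the table definition probably does not match the delimiter in the Pig output.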

Visualization in Tableau

Set up an ODBC connection in Tableau to the Hive server.
Import data from the medical_data Hive table using the Cloudera Hadoop connection in Tableau.
Log in with the username "hive" and the password "cloudera".
Create the heatmap visualizations.
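
Because the container is started with `--publish-all=true`, any exposed port that is not mapped explicitly gets a random host port. Tableau needs the host port that maps to HiveServer2, which listens on port 10000 by default. The sketch below looks that mapping up with `docker port`; it assumes the image exposes port 10000, and the helper name `hive_endpoint` and the `DOCKER` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: print the host address and port mapped to HiveServer2 (10000).
# Assumption: the container exposes port 10000; otherwise docker reports
# an error and the port would need to be published explicitly.
hive_endpoint() {
    container="$1"      # container ID or name, as shown by `docker ps`
    ${DOCKER:-docker} port "$container" 10000
}
```

Use the port it reports, together with your machine's IP address, when editing the connection in the Tableau workbook.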
