FDA Product Indications

Now that new information is sitting in our "landing" bucket, we want to combine it with our existing FDA Product Labels data. For that, we can return to our EMR cluster to run that MapReduce task.

Combine Products & Extracted Indications

Using our EMR cluster, we'll load both our FDA Product Labels data and our indications extraction data. Using the shared ID field as a key, we'll then be able to join the data together and save that information back to the "curated" bucket ready for consumption.

Connect to your EMR Cluster (as described in 02_EMR_Cluster)
Run pyspark
Open fda.indications.py in an editor
Update the values for BUCKET_LANDING and BUCKET_CURATED with the appropriate values
Copy the code and paste it into the pyspark shell
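
The join described above might look roughly like the following in PySpark. This is a minimal sketch, not the contents of fda.indications.py: the helper names (`s3_uri`, `join_on_id`, `combine`), the JSON input format, and the Parquet output format are assumptions made for illustration. Only the `id` key, the `extracted_text` column, and the two bucket variables come from this walkthrough.

```python
# Hypothetical sketch of the join step; adjust paths and formats to your data.
BUCKET_LANDING = "<YOUR_LANDING_BUCKET>"
BUCKET_CURATED = "<YOUR_CURATED_BUCKET>"


def s3_uri(bucket, prefix):
    """Build an s3:// URI for a key prefix, normalising stray slashes."""
    return "s3://{}/{}/".format(bucket, prefix.strip("/"))


def join_on_id(labels_df, indications_df):
    """Attach the extracted indications to each product label by `id`.

    A left join keeps labels that have no extracted indications.
    """
    return labels_df.join(
        indications_df.select("id", "extracted_text"), on="id", how="left"
    )


def combine(spark, labels_uri, indications_uri, output_uri):
    """Load both datasets, join them on `id`, and write the result."""
    labels_df = spark.read.json(labels_uri)            # assumed JSON input
    indications_df = spark.read.json(indications_uri)  # assumed JSON input
    joined = join_on_id(labels_df, indications_df)
    joined.write.mode("overwrite").parquet(output_uri)  # assumed Parquet output
    return joined
```

In the pyspark shell, where `spark` is already defined, `combine` could be called with URIs built by `s3_uri`, for example `s3_uri(BUCKET_LANDING, "indications")` for the extraction output. The key prefixes are placeholders; use whatever prefixes your buckets actually contain.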

Run the Crawler

Because we have just added a new column to the FDA Product Labels data, the data's schema has changed. We can update our Athena tables by rerunning our Glue Crawler.

From the AWS Glue Crawler Dashboard
Click on your newly created Crawler
Click "Run crawler"
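
If you would rather script this step than click through the console, the same rerun can be sketched with boto3's Glue client. This is a hedged example, not part of the original walkthrough: the function name `run_crawler_and_wait`, the polling interval, and the crawler name are placeholders, and the code assumes your credentials allow `glue:StartCrawler` and `glue:GetCrawler`.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=15, sleep=time.sleep):
    """Start a Glue crawler and block until it returns to the READY state.

    `glue` is a boto3 Glue client, e.g. boto3.client("glue").
    Returns the status of the last crawl as reported by Glue
    (for example "SUCCEEDED"), or None if Glue reports no last crawl.
    """
    glue.start_crawler(Name=crawler_name)
    while True:
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler moves through RUNNING and STOPPING before READY.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        sleep(poll_seconds)
```

A call might look like `run_crawler_and_wait(boto3.client("glue"), "<YOUR_CRAWLER>")` after `import boto3`. Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real account.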

Create the Athena View

Go to the AWS Athena Dashboard
Run the query below (remember to replace <YOUR_DATABASE> and <YOUR_TABLE>)

Product Indications:

CREATE OR REPLACE VIEW product_labels_indications AS
SELECT id, LOWER(indications) AS indications, effective_date
FROM "<YOUR_DATABASE>"."<YOUR_TABLE>"
CROSS JOIN UNNEST(extracted_text) AS t(indications);
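
To make the effect of CROSS JOIN UNNEST concrete, here is a small pure-Python sketch of what the view computes: each element of the `extracted_text` array becomes its own lowercased row, alongside the product's `id` and `effective_date`. The function name `unnest_indications` is made up for illustration, and the sketch only mirrors the query's logic; it does not talk to Athena.

```python
def unnest_indications(rows):
    """Mimic the view: one output row per element of `extracted_text`.

    Rows whose array is empty or missing produce no output, which matches
    how CROSS JOIN UNNEST drops them.
    """
    flattened = []
    for row in rows:
        for indication in row.get("extracted_text") or []:
            flattened.append(
                {
                    "id": row["id"],
                    "indications": indication.lower(),
                    "effective_date": row["effective_date"],
                }
            )
    return flattened
```

A product with two extracted indications therefore appears as two rows in the view, and a product with none does not appear at all. If you need to keep such products, a LEFT JOIN UNNEST would be one option to explore.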
