Analysis of World Development Indicators (WDI) using big data technologies, specifically Databricks, Apache Spark, and Scala.

spshah1701/World-Development-Indicators

World Development Indicators Analysis

Introduction

The World Development Indicators (WDI) is the World Bank's most comprehensive collection of cross-country development data. Its website provides access to the data, along with information about data coverage, curation, and methodologies, and lets users discover which types of indicators are available.

Tools and Technologies

  • Databricks
  • Apache Spark
  • Scala

Data Description

  1. Country.csv

247 rows representing the countries.
31 columns describing various attributes of the countries.

  2. Indicators.csv

5656458 rows representing data instances.
6 columns describing indicators of the countries.

Need for using Big Data Technologies

Indicators.csv is about 550 MB, which motivates the use of Apache Spark, implemented in Scala on Databricks. This combination provides a scalable framework for processing large datasets efficiently.

My Implementation

Note: This link will be valid till 01-06-2024.

Project Setup and Replication

  • Create a free Databricks Community Edition account.
  • Create a new cluster and wait until it is active and running.
  • Upload the World Development Indicators.dbc Notebook to Databricks and connect it to the above cluster.
  • Upload the data (CSV files) to Databricks after downloading it from the source.
  • Run the cells, view and analyse the data as desired.
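
Before running the cells, it can help to confirm that the uploaded CSV files landed where the notebook expects them. The following is a minimal sketch, assuming the files were uploaded to `/FileStore/tables/` (the folder used when loading Indicators.csv in this notebook; your upload location may differ) and that the cell runs in a Databricks notebook, where `dbutils` is predefined. The name `uploadDir` is only illustrative.

```scala
// Assumption: default upload folder; adjust if your files went elsewhere.
val uploadDir = "/FileStore/tables/"

// List the folder and keep only the CSV files.
val csvFiles = dbutils.fs.ls(uploadDir).filter(_.name.endsWith(".csv"))

// Print each file name with its size in MB.
csvFiles.foreach { f =>
  println(f"${f.name}%-20s ${f.size / (1024.0 * 1024.0)}%.1f MB")
}
```

If Country.csv and Indicators.csv do not appear in the listing, the load step will fail with a path-not-found error, so this check is worth doing first.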

Access the data

%scala 

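// Read Indicators.csv into a DataFrame. The first row holds the column names,
// and inferSchema makes Spark scan the file to guess each column's type.
val Indicators = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/Indicators.csv")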

display(Indicators)
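
Because `inferSchema` requires an extra pass over the file, an explicit schema is a possible alternative for a file of this size. The sketch below is an assumption-laden illustration, not the notebook's actual code: `CountryName`, `IndicatorCode`, `Year`, and `Value` appear in the query later in this README, while `CountryCode` and `IndicatorName`, the column order, and the chosen types are assumptions that should be checked against `Indicators.printSchema()`. The names `indicatorsSchema` and `IndicatorsTyped` are hypothetical.

```scala
import org.apache.spark.sql.types._

// Assumption: six columns in this order; verify against the inferred schema.
val indicatorsSchema = StructType(Seq(
  StructField("CountryName", StringType),
  StructField("CountryCode", StringType),   // assumed column
  StructField("IndicatorName", StringType), // assumed column
  StructField("IndicatorCode", StringType),
  StructField("Year", IntegerType),
  StructField("Value", DoubleType)
))

// Read the CSV with the declared schema instead of inferring it.
val IndicatorsTyped = spark.read
  .option("header", "true")
  .schema(indicatorsSchema)
  .csv("/FileStore/tables/Indicators.csv")

IndicatorsTyped.printSchema()
```

Declaring the schema trades a little upfront typing for a faster, more predictable load, at the cost of having to keep the declaration in sync with the file.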

Create or Replace Temporary view

A temporary view lets you run SQL queries against the DataFrame as if it were a SQL table.

%scala

Indicators.createOrReplaceTempView("Indicators")
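
Once the view is registered, it can also be queried from Scala with `spark.sql`, which is a quick way to check that it exists and holds data. This is a minimal sketch, assuming the `Indicators` view was created in the same notebook session; `rowCount` and `viewExists` are illustrative names.

```scala
// Confirm the session-scoped view is registered in the catalog.
val viewExists = spark.catalog.tableExists("Indicators")

// Count the rows through SQL rather than through the DataFrame directly.
val rowCount = spark.sql("SELECT COUNT(*) AS n FROM Indicators").first().getLong(0)

println(s"View registered: $viewExists, rows: $rowCount")
```

The count should match the row total given in the Data Description section if the whole file was loaded. Note that a temporary view lives only for the current Spark session, so it has to be recreated after the cluster restarts.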

Write desired SQL queries for data visualization and analysis

%sql

SELECT CountryName, Value, Year
FROM Indicators
WHERE IndicatorCode = 'NY.GNP.PCAP.CD'
  AND Year = 1962
  AND CountryName IN ('Japan', 'China', 'France', 'United States')
ORDER BY Value ASC;

Output

(Image: result of the query above.)
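
The SQL query above can also be expressed with the DataFrame API, which some readers may prefer when building queries programmatically in Scala. This is a sketch of an equivalent query, assuming the `Indicators` DataFrame loaded earlier; the name `gniPerCapita1962` is illustrative.

```scala
import org.apache.spark.sql.functions.col

// Same filters, projection, and ordering as the SQL query.
val gniPerCapita1962 = Indicators
  .filter(col("IndicatorCode") === "NY.GNP.PCAP.CD")
  .filter(col("Year") === 1962)
  .filter(col("CountryName").isin("Japan", "China", "France", "United States"))
  .select("CountryName", "Value", "Year")
  .orderBy(col("Value").asc)

display(gniPerCapita1962)
```

Both forms should produce the same result, so the choice between them is mostly a matter of style.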
