World Development Indicators Analysis

Introduction

The World Development Indicators (WDI) is the World Bank's most comprehensive collection of cross-country development data. Its website provides access to the data, along with information about data coverage, curation, and methodologies, and lets users discover which indicators are available.

Tools and Technologies

Databricks
Apache Spark
Scala

Data Description

Country.csv

247 rows representing the countries.
31 columns describing various attributes of the countries.

Indicators.csv

5,656,458 rows, each representing a single data point.
6 columns describing the country, indicator, year, and value of each data point.
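To check these figures after uploading the files, a small helper can report the row and column counts of a loaded DataFrame. This is a minimal sketch: the helper name datasetShape and the Country.csv path in the usage comment are assumptions, not part of the original notebook.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: returns (row count, column count) for a DataFrame.
// count() triggers a full scan, so on Indicators.csv it reads the whole file.
def datasetShape(df: DataFrame): (Long, Int) = (df.count(), df.columns.length)

// Usage in the notebook (path for Country.csv is assumed):
// val Country = spark.read.option("header", "true").csv("/FileStore/tables/Country.csv")
// datasetShape(Country)     // compare with 247 rows, 31 columns
// datasetShape(Indicators)  // compare with 5,656,458 rows, 6 columns
```

If the counts differ from the ones listed above, the download or upload may be incomplete.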

Need for using Big Data Technologies

Indicators.csv is about 550 MB, so this project processes it with Apache Spark, using Scala on Databricks. This combination provides a scalable framework for processing large datasets efficiently.

My Implementation

Published Notebook

Note: This link will be valid until 01-06-2024.

Project Setup and Replication

Create a free Databricks Community Edition account.
Create a new cluster and wait until it is running.
Import the World Development Indicators.dbc notebook into your Databricks workspace and attach it to the cluster.
Download the data (CSV files) from the source and upload them to Databricks (the notebook reads them from /FileStore/tables/).
Run the cells, then view and analyse the data as desired.
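Before running the notebook, it can help to confirm that the uploaded CSV files are where the code expects them. Below is a minimal sketch using the Hadoop FileSystem API. On Databricks, a path without a scheme such as /FileStore/tables resolves against DBFS. The helper name missingFiles is hypothetical.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

// Hypothetical helper: returns the names from `names` that do not exist under `dir`.
// The FileSystem is resolved from the path, so the same code works on DBFS or a local disk.
def missingFiles(spark: SparkSession, dir: String, names: Seq[String]): Seq[String] = {
  val fs = new Path(dir).getFileSystem(spark.sparkContext.hadoopConfiguration)
  names.filterNot(name => fs.exists(new Path(dir, name)))
}

// Usage in the notebook; an empty result means both files are present:
// missingFiles(spark, "/FileStore/tables", Seq("Country.csv", "Indicators.csv"))
```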

Access the data

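%scala

// Load Indicators.csv from DBFS into a DataFrame.
// header: the first row supplies the column names; inferSchema: Spark detects column types.
val Indicators = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("/FileStore/tables/Indicators.csv")

display(Indicators)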
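With inferSchema enabled, Spark reads the data an extra time to determine column types, which is noticeable on a file of this size. One alternative is to supply the schema explicitly. Below is a sketch under stated assumptions. Only CountryName, IndicatorCode, Year, and Value appear in this notebook's query. The CountryCode and IndicatorName columns, and the column order, are assumed from the public WDI CSV export. By default Spark applies a user-supplied CSV schema by position, so compare this sketch with the output of Indicators.printSchema() before relying on it.

```scala
import org.apache.spark.sql.types._

// Assumed schema for Indicators.csv; the order must match the file's header,
// because Spark maps a user-supplied CSV schema onto columns by position.
val indicatorsSchema = StructType(Seq(
  StructField("CountryName", StringType, nullable = true),
  StructField("CountryCode", StringType, nullable = true),   // assumed column
  StructField("IndicatorName", StringType, nullable = true), // assumed column
  StructField("IndicatorCode", StringType, nullable = true),
  StructField("Year", IntegerType, nullable = true),
  StructField("Value", DoubleType, nullable = true)
))

// Usage in the notebook (replaces the inferSchema option):
// val Indicators = spark.read.format("csv")
//   .option("header", "true")
//   .schema(indicatorsSchema)
//   .load("/FileStore/tables/Indicators.csv")
```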

Create or Replace Temporary View

A temporary view lets you run SQL queries against the DataFrame as if it were a SQL table. The view exists only for the current Spark session.

%scala

Indicators.createOrReplaceTempView("Indicators")
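The registered view can also be queried from Scala cells through spark.sql, which returns an ordinary DataFrame. Below is a minimal sketch. The function name rowsPerYear and its per-year count query are illustrative and are not part of the original notebook.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical example: counts rows per Year in a registered temporary view.
// The view name is interpolated directly into the SQL, so pass only trusted, known names.
def rowsPerYear(spark: SparkSession, view: String): DataFrame =
  spark.sql(s"SELECT Year, COUNT(*) AS n FROM $view GROUP BY Year ORDER BY Year")

// Usage in the notebook, after Indicators.createOrReplaceTempView("Indicators"):
// display(rowsPerYear(spark, "Indicators"))
```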

Write desired SQL queries for data visualization and analysis

%sql

-- GNI per capita (NY.GNP.PCAP.CD) for four countries in 1962, lowest first
SELECT CountryName, Value, Year
FROM Indicators
WHERE IndicatorCode = 'NY.GNP.PCAP.CD'
  AND Year = 1962
  AND CountryName IN ('Japan', 'China', 'France', 'United States')
ORDER BY Value ASC;
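The SQL query above can also be written with the DataFrame API, which is convenient when the result feeds further Scala code. Below is a sketch that assumes the Indicators DataFrame loaded earlier. The function name gniPerCapita and its year and country parameters are hypothetical generalisations of the fixed values in the query.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// DataFrame-API version of the query: keep one indicator, one year and the given
// countries, project three columns, and sort by Value ascending.
def gniPerCapita(indicators: DataFrame, year: Int, countries: Seq[String]): DataFrame =
  indicators
    .filter(col("IndicatorCode") === "NY.GNP.PCAP.CD")
    .filter(col("Year") === year)
    .filter(col("CountryName").isin(countries: _*))
    .select("CountryName", "Value", "Year")
    .orderBy(col("Value").asc)

// Usage in the notebook:
// display(gniPerCapita(Indicators, 1962, Seq("Japan", "China", "France", "United States")))
```

Passing the year and countries as parameters makes it easy to repeat the comparison for other years without editing the SQL text.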

Repository: spshah1701/World-Development-Indicators
