kazdaghli/Relations-countries-2018-GDELT

NoSQL_Project

GDELT NoSQL project

The aim of this project is to explore relations between countries in 2018 based on specific queries on the GDELT database.

Architecture

[Figure: architecture diagram]

Cassandra Configuration

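Replication factor (RF) = 3

Write consistency (W) = QUORUM (2 replicas)

Read consistency (R) = ONE (1 replica)

W + R = RF (2 + 1 = 3), so the strong-consistency condition W + R > RF is not met and reads are eventually consistent.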
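The consistency trade-off above can be checked with a few lines of Python. This is a minimal sketch of the standard rule that reads are strongly consistent only when W + R > RF. The function names `quorum` and `is_strongly_consistent` are illustrative and not part of the project.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas in a majority quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_replicas: int, read_replicas: int,
                           replication_factor: int) -> bool:
    """Reads always overlap the latest write only when W + R > RF."""
    return write_replicas + read_replicas > replication_factor


RF = 3
W = quorum(RF)  # QUORUM write on RF = 3 touches 2 replicas
R = 1           # ONE read touches a single replica

# The project's setting: W + R = RF, so only eventual consistency
print(is_strongly_consistent(W, R, RF))
# Reading at QUORUM as well would make W + R > RF
print(is_strongly_consistent(W, quorum(RF), RF))
```

Reading at ONE keeps read latency low, which is presumably why it was chosen here; the cost is that a read may briefly return stale data.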

EMR Automation

Scripts to automate the cluster creation:

1. bootstrap_cassandra.sh: bootstraps Cassandra on cluster creation

2. cluster_configuration.py: configures the cluster (number and types of instances, etc.) and links it to Spark and Cassandra

3. create_cluster.sh: launches the EMR cluster creation
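As an illustration of what a cluster configuration step might look like, the sketch below builds a request dictionary in the shape accepted by boto3's EMR `run_job_flow` call. It is not the content of cluster_configuration.py. The function name `build_cluster_request`, the release label, the instance type, the S3 path and the IAM role names are all assumptions chosen for the example.

```python
def build_cluster_request(name, instance_type, core_count, bootstrap_script_uri):
    """Build keyword arguments for an EMR run_job_flow request (hypothetical values)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-5.20.0",  # assumption: any release that ships Spark
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Runs on every node at start-up, e.g. a Cassandra install script
        "BootstrapActions": [
            {"Name": "bootstrap-cassandra",
             "ScriptBootstrapAction": {"Path": bootstrap_script_uri, "Args": []}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default EMR roles
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request(
    name="gdelt-cluster",                      # hypothetical cluster name
    instance_type="m4.xlarge",                 # hypothetical instance type
    core_count=4,
    bootstrap_script_uri="s3://my-bucket/bootstrap_cassandra.sh",  # hypothetical path
)
# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review or version before any cluster is actually launched.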

Data Loading and preprocessing

For the events and mentions tables: https://github.com/sarah911/NoSQL_Project/blob/master/2E4E4Q6WY/note.json

For the gkg table: https://github.com/sarah911/NoSQL_Project/blob/master/2E1J1S7FX/note.json

Queries

Q1: Find the number of articles and events for a triplet (Date, Country, Language)
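The logic of Q1 can be sketched in plain Python on a handful of in-memory rows. This is not the project's Spark or Cassandra code. The record layout (`date`, `country`, `language`, `event_id`, `article_url`) is a simplified, hypothetical view of joined events and mentions data.

```python
from collections import defaultdict


def count_articles_and_events(mentions):
    """Count distinct articles and events per (date, country, language) triplet."""
    events = defaultdict(set)
    articles = defaultdict(set)
    for m in mentions:
        key = (m["date"], m["country"], m["language"])
        events[key].add(m["event_id"])
        articles[key].add(m["article_url"])
    return {key: {"articles": len(articles[key]), "events": len(events[key])}
            for key in events}


# Hypothetical sample rows: one row per article mentioning an event
sample = [
    {"date": "2018-03-01", "country": "FR", "language": "fra",
     "event_id": 1, "article_url": "http://example.org/a"},
    {"date": "2018-03-01", "country": "FR", "language": "fra",
     "event_id": 1, "article_url": "http://example.org/b"},
    {"date": "2018-03-01", "country": "FR", "language": "fra",
     "event_id": 2, "article_url": "http://example.org/b"},
    {"date": "2018-03-01", "country": "US", "language": "eng",
     "event_id": 3, "article_url": "http://example.org/c"},
]

counts = count_articles_and_events(sample)
print(counts[("2018-03-01", "FR", "fra")])
```

In Cassandra, the same triplet would be a natural partition key, so each lookup reads a single partition.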

Q2: Find events of an actor in the past 6 months
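A minimal sketch of the Q2 filter, assuming each event is a dictionary with hypothetical `actor1`, `actor2` and `date` fields. "Six months" is taken here as six calendar months before a reference date; the project may define the window differently.

```python
import calendar
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day`, clamping the day of month."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def events_for_actor(events, actor, as_of, months=6):
    """Events involving `actor` between `months` months before `as_of` and `as_of`."""
    cutoff = months_before(as_of, months)
    selected = [e for e in events
                if actor in (e["actor1"], e["actor2"])
                and cutoff <= e["date"] <= as_of]
    return sorted(selected, key=lambda e: e["date"])


# Hypothetical sample events
sample_events = [
    {"event_id": 1, "actor1": "FRANCE", "actor2": "GERMANY", "date": date(2018, 11, 2)},
    {"event_id": 2, "actor1": "USA", "actor2": "FRANCE", "date": date(2018, 8, 15)},
    {"event_id": 3, "actor1": "FRANCE", "actor2": "ITALY", "date": date(2018, 2, 1)},
    {"event_id": 4, "actor1": "USA", "actor2": "CHINA", "date": date(2018, 10, 1)},
]

recent = events_for_actor(sample_events, "FRANCE", as_of=date(2018, 12, 31))
print([e["event_id"] for e in recent])
```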

Q3: Find actors with the most negative or positive views based on ( Date, Country, Language )
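One way to read Q3 is to average a tone score per actor for a given (date, country, language) and rank the result. The sketch below does this on in-memory rows. The field names (`actor`, `tone`) and the use of a plain mean are assumptions, not the project's confirmed method.

```python
from collections import defaultdict


def rank_actors_by_tone(rows, day, country, language, top_n=3):
    """Rank actors by mean tone for one (date, country, language) triplet."""
    tones = defaultdict(list)
    for r in rows:
        if (r["date"], r["country"], r["language"]) == (day, country, language):
            tones[r["actor"]].append(r["tone"])
    averages = {actor: sum(values) / len(values) for actor, values in tones.items()}
    ascending = sorted(averages.items(), key=lambda item: item[1])
    return {
        "most_negative": ascending[:top_n],
        "most_positive": ascending[::-1][:top_n],
    }


# Hypothetical sample rows; negative tone means negative coverage
sample_rows = [
    {"date": "2018-05-01", "country": "FR", "language": "fra", "actor": "A", "tone": -2.0},
    {"date": "2018-05-01", "country": "FR", "language": "fra", "actor": "A", "tone": -4.0},
    {"date": "2018-05-01", "country": "FR", "language": "fra", "actor": "B", "tone": 5.0},
    {"date": "2018-05-01", "country": "FR", "language": "fra", "actor": "C", "tone": 1.0},
    {"date": "2018-05-02", "country": "FR", "language": "fra", "actor": "D", "tone": -9.0},
]

ranking = rank_actors_by_tone(sample_rows, "2018-05-01", "FR", "fra", top_n=1)
print(ranking)
```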

Q4: Find the most divisive actors, countries and organizations on a given date

Q5: The evolution of relations between countries

Part 1: Based on actor names (events table)

Part 2: Based on actor countries (mentions table)

Part 3: Based on articles written in one country about another (GKG table)

The final presentation

Presentation
