Dockerized realtime twitter streaming to MongoDB. Data synced with Elasticsearch and Kibana. Flask webapp created to search and display tweets.

Twitter Stream

Check out a Live Demo of the search engine webapp integrated with Elasticsearch here.

Table of Contents

  1. Features
  2. Project Components
  3. Requirements
  4. Installation
  5. Kibana
    1. Kibana Dashboard
    2. Kibana Search
  6. Search Webapp
    1. Main Page & Dashboard
    2. Search Output

Features

  • Dockerized realtime tweet streaming to MongoDB based on search rules, using Tweepy to connect to the Twitter API;
  • MongoDB collection continuously synced with an Elasticsearch index using Monstache;
  • MongoDB queried with Mongo Express, a web-based MongoDB admin interface;
  • Kibana used to visualize and search tweets;
  • Flask search webapp served by Nginx.
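
The streaming flow can be sketched roughly as follows, assuming Tweepy v4's `StreamingClient` (Twitter API v2) and pymongo. The database and collection names mirror the `.env` defaults (`tweetdb`/`tweets`), but the document field names and class structure are illustrative, not the repo's actual code:

```python
import json


def tweet_to_doc(data: dict) -> dict:
    """Map a Twitter API v2 tweet payload to a MongoDB document.

    The field names here are illustrative, not the repo's actual schema.
    """
    return {"_id": data["id"], "text": data["text"]}


def run_stream(bearer_token: str, mongo_uri: str, rule: str) -> None:
    # Third-party imports kept local so tweet_to_doc stays dependency-free.
    import tweepy
    from pymongo import MongoClient

    collection = MongoClient(mongo_uri)["tweetdb"]["tweets"]

    class MongoStream(tweepy.StreamingClient):
        def on_data(self, raw_data):
            payload = json.loads(raw_data)
            if "data" in payload:
                collection.insert_one(tweet_to_doc(payload["data"]))

    stream = MongoStream(bearer_token)
    stream.add_rules(tweepy.StreamRule(rule))  # register the filtered-stream rule
    stream.filter()  # blocks; delivers matching tweets as they arrive
```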

Project Components

All components of the project are dockerized. The Streaming Client is built from twitter_stream/Dockerfile and the Search Webapp from flask_search/Dockerfile; all remaining containers are created from DockerHub images.
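
The wiring described above corresponds to a compose file along these lines. The service names, images, and ports here are an illustrative sketch (ports taken from the instructions below), not the repo's exact docker-compose.yml:

```yaml
version: "3"
services:
  mongo:
    image: bitnami/mongodb:latest        # replica set needed for change streams
  mongo-express:
    image: mongo-express:latest
    ports: ["8081:8081"]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
  kibana:
    image: docker.elastic.co/kibana/kibana:7.17.0
    ports: ["5601:5601"]
  monstache:
    image: rwynn/monstache:latest        # syncs MongoDB -> Elasticsearch
  twitter-stream:
    build: ./twitter_stream              # the Tweepy streaming client
    env_file: .env
  flask-search:
    build: ./flask_search                # Flask webapp behind Nginx
    ports: ["80:80"]
```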

Requirements

  • Docker and Docker Compose;
  • A Twitter developer account with API credentials (API key, API secret key, and bearer token).

Installation

  1. Clone the repo:
$ git clone https://github.com/tngaspar/twitter-stream-mongo.git
  2. Create a .env file in the project root folder with the following parameters:
API_KEY=[Twitter API key]
API_SECRET_KEY=[Twitter API secret key]
BEARER_TOKEN=[Twitter API bearer token]
MDB_HOST_NAME=mongodb://root:[Password]@mongo:27017/
MDB_DATABASE_NAME=tweetdb
MDB_COLLECTION_NAME=tweets
SEARCH_RULE=[Twitter Filtered Stream rule]
MONGODB_ROOT_PASSWORD=[choose Password]
MONGODB_REPLICA_SET_KEY=[choose ReplicaKey]

Replace all fields between brackets. You can find the Twitter documentation for the SEARCH_RULE here. By default, the rule implicitly includes lang:en, -is:retweet and -is:reply, so there is no need to add these parameters.
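
For illustration, a rule matching the kind of keywords used for the demo data might look like this (a hypothetical example using the filtered-stream OR operator and quoted phrases):

```
SEARCH_RULE=("software engineer" OR "data science") jobs
```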

  3. Add the password to mongo-url in monstache/monstache.config.toml:
mongo-url = "mongodb://root:[Password]@mongo:27017" 

Replace fields between brackets.
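
For reference, a minimal Monstache configuration along these lines would sync the tweets collection into Elasticsearch on startup and then follow change streams. The keys shown are standard Monstache options, but the repo's file may set others:

```toml
mongo-url = "mongodb://root:[Password]@mongo:27017"
elasticsearch-urls = ["http://elasticsearch:9200"]

# Sync the tweets collection both on startup and via change streams.
direct-read-namespaces = ["tweetdb.tweets"]
change-stream-namespaces = ["tweetdb.tweets"]
```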

  4. In the project root directory, run docker-compose:
$ docker-compose up -d

After this, all containers should be up and running and the stream should be collecting tweets.

If running locally, you can browse MongoDB through Mongo Express at localhost:8081 and search the gathered tweets in Kibana at localhost:5601. The search webapp should also be up and accessible at 0.0.0.0 and localhost (port 80).

Kibana

Kibana allows search and analysis of tweet data from Elasticsearch.

Kibana Dashboard:

This dashboard can be imported into Kibana by navigating to Stack Management > Saved Objects > Import and importing the file doc/kibana_dashboard.ndjson.

Kibana Search:

Kibana uses syntax from Apache Lucene to query and filter data. Find out more here.

Here's a simple example:
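
Assuming the tweet body is indexed in a text field (the field name is an assumption about the index mapping), queries like these work in the Kibana search bar:

```
text:python
text:"machine learning"
text:data AND text:jobs
text:engineer*
```

The first matches tweets containing the term, the second an exact phrase, the third requires both terms, and the last is a wildcard match.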

Search Webapp

The flask_search webapp provides a user interface for Elasticsearch's search functionality, acting as a search engine over the records in the index.
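
A search route like the following conveys the idea, using Flask and the elasticsearch Python client. The route path, `text` default field, and response shape are assumptions for illustration, not the repo's actual implementation:

```python
def build_query(user_input: str, size: int = 20) -> dict:
    """Build an Elasticsearch query_string search body for the tweet index.

    The 'text' default field is an assumption about the index mapping.
    """
    return {
        "query": {"query_string": {"query": user_input, "default_field": "text"}},
        "size": size,
    }


def create_app():
    # Third-party imports kept local so build_query stays dependency-free.
    from flask import Flask, request, jsonify
    from elasticsearch import Elasticsearch

    app = Flask(__name__)
    es = Elasticsearch("http://elasticsearch:9200")  # compose service name assumed

    @app.route("/search")
    def search():
        body = build_query(request.args.get("q", "*"))
        hits = es.search(index="tweets", body=body)["hits"]["hits"]
        return jsonify([hit["_source"] for hit in hits])

    return app
```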

Main Page & Dashboard:

The main page shows the search bar and a snapshot of the Kibana Dashboard.

Search Output:

Search example with tweets gathered using software engineer, data, jobs and other related keywords as the streaming search rule.

(back to top)