Skip to content

A quick Elasticsearch/Logstash/Kibana (ELK) 7.x environment to quickly ingest realtime filtered tweets, perform Natural Language Processing (NLP), store in Elasticsearch (ES), and visualize in Kibana.

Notifications You must be signed in to change notification settings

v01t/tweets-nlp-elasticsearch

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 

Repository files navigation

A Python 3.6 project to load near-realtime filtered Twitter API data into Elasticsearch, and then visualize the results in Kibana.

Uses an ELK (Elasticsearch/Logstash/Kibana) 7.x Docker container for easy reproducibility. Tested with Python 3.6 (Anaconda Distro) on Win10 and MacOS 10.13.3.

To get Elasticsearch and Kibana up and running locally quickly, run these 2 docker commands.

Note: Make sure you have at least 4GB of RAM assigned to Docker. Increase the limit on mmap counts to 262,144 or more on Mac or Linux (I implement this workaround in the below Mac 'docker run' command). See here for more info on this: http://elk-docker.readthedocs.io/ Warning: Do not pull this public image on a low-bandwidth or metered Internet connection. It pulls down well over 5GB of traffic, total.

docker pull sebp/elk:721

Windows:

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5000:5000 -it --name elk sebp/elk:721

Mac:

docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5000:5000 -e MAX_MAP_COUNT="262144" -it --name elk sebp/elk:721

More info on this Docker container can be found here: https://hub.docker.com/r/sebp/elk/

After the container is up and running, you should be able to hit these 2 URLs if everything is working properly:

Elasticsearch: http://localhost:9200/

Which should return a similar response to this:

{
  "name" : "elk",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "randomstring",
  "version" : {
    "number" : "7.2.1",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "fe6cb20",
    "build_date" : "2019-07-24T17:58:29.979462Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Kibana: http://localhost:5601/

Should return the Kibana Web GUI When we first open the Kibana Web GUI, we need to enter the text 'twitteranalysis' where it asks for the Index name or pattern on the Management tab (on the left).

After cloning this repo locally, we can run pip install to get all of the Python libraries loaded into your activate python environment: pip install -r requirements.txt

Python libraries used:

https://github.com/sloria/textblob
https://github.com/tweepy/tweepy
https://elasticsearch-py.readthedocs.io/en/master/api.html

After filling in your Twitter API credentials in the config.yml (rename the config.yml.template to config.yml), you can let the script run for a few minutes and load up the Index in Elasticsearch. After data (analyzed tweets) is loaded up in Elasticsearch, try a few URLs like this:

http://localhost:9200/twitteranalysis/_search?q=python http://localhost:9200/twitteranalysis/_search?q=sentiment:Positive&message=python

Feel free to open an issue if you have any problems getting this up and running.

About

A quick Elasticsearch/Logstash/Kibana (ELK) 7.x environment to quickly ingest realtime filtered tweets, perform Natural Language Processing (NLP), store in Elasticsearch (ES), and visualize in Kibana.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%