Elasticsearch Adapter

This page explains how to use Elasticsearch with Cloudberry to set up a small instance of TwitterMap on a local machine.

Requirements:

- System: Linux or macOS
- Python 3.0+ (the scripts must be runnable with the command `python3`)
- Java 8 JDK and sbt
- At least 2 GB of memory
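Before starting, you can check that the command-line tools used in this guide are on your `PATH`. The helper below is a hypothetical sketch (`require_cmds` is not part of Cloudberry). It only checks that each command exists, not its version, so confirm the versions separately, for example with `python3 --version` and `java -version`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every required command that is missing from PATH.
# Presence only: it cannot tell Java 8 from another Java version.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_cmds python3 java sbt wget curl git && echo "all tools found"`. The list covers the tools that the later steps invoke directly.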

1. Set up Elasticsearch

Step 1.1: Create a directory named `quick-start` under your home directory and enter it:

```bash
mkdir ~/quick-start
cd ~/quick-start
```

Step 1.2: Download Elasticsearch 6.7.2:

```bash
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.tar.gz
```
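Elastic publishes a SHA-512 checksum file next to each download (the same URL with `.sha512` appended), so you can verify the archive before extracting it. The helper below is a sketch with a hypothetical name; it assumes either `sha512sum` (GNU coreutils, usual on Linux) or `shasum` (usual on macOS) is installed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify a file against a checksum file in the
# "<hash>  <filename>" format understood by both sha512sum -c and shasum -c.
verify_sha512() {
  local checksum_file="$1"
  if command -v sha512sum >/dev/null 2>&1; then
    sha512sum -c "$checksum_file"       # GNU coreutils (Linux)
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 512 -c "$checksum_file"   # Perl shasum (macOS)
  else
    echo "no SHA-512 tool found" >&2
    return 1
  fi
}
```

Run it from `~/quick-start`, where the archive was downloaded, because the checksum file refers to the archive by its relative name:

```bash
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.tar.gz.sha512
verify_sha512 elasticsearch-6.7.2.tar.gz.sha512
```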

Step 1.3: Extract the archive:

```bash
tar -xzf elasticsearch-6.7.2.tar.gz
```

Step 1.4: Enter the `elasticsearch-6.7.2/` directory:

```bash
cd elasticsearch-6.7.2/
```

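Step 1.5: Run Elasticsearch:

```bash
./bin/elasticsearch
```

Or start it in daemon mode, writing its process ID to a file named `pid`:

```bash
./bin/elasticsearch -d -p pid
```

- To shut down Elasticsearch running in daemon mode, kill the process whose ID is stored in the `pid` file:

  ```bash
  pkill -F pid
  ```

In daemon mode, the startup messages are written to the log files under `logs/` instead of the terminal.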
Wait until you see messages similar to the following:

```
[INFO ][o.e.n.Node               ] [7Z9-8gl] initialized
[INFO ][o.e.n.Node               ] [7Z9-8gl] starting ...
[INFO ][o.e.t.TransportService   ] [7Z9-8gl] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[INFO ][o.e.c.s.MasterService    ] [7Z9-8gl] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[INFO ][o.e.c.s.ClusterApplierService] [7Z9-8gl] new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[INFO ][o.e.h.n.Netty4HttpServerTransport] [7Z9-8gl] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node               ] [7Z9-8gl] started
```
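If you start Elasticsearch in daemon mode or from a script, you may prefer to poll the HTTP port instead of watching the log. A minimal sketch, assuming `curl` is installed and the node listens on `localhost:9200` as in the log above; `wait_for_es` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an Elasticsearch HTTP endpoint until it answers
# with a successful status code, or give up after a number of attempts.
wait_for_es() {
  local url="${1:-http://localhost:9200}"   # endpoint to poll
  local tries="${2:-30}"                    # maximum number of attempts
  local delay="${3:-2}"                     # seconds to sleep between attempts
  local i
  for ((i = 1; i <= tries; i++)); do
    # -s: silent, -f: non-zero exit on HTTP errors, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "Elasticsearch is up at $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "Elasticsearch did not respond at $url after $tries attempts" >&2
  return 1
}
```

Usage: `./bin/elasticsearch -d -p pid && wait_for_es http://localhost:9200 30 2`.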

Step 1.6: Check the health status of your Elasticsearch cluster

Open a new terminal window and run:

```bash
curl -X GET "localhost:9200/_cluster/health?pretty"
```

The cluster health status must be `green` or `yellow`. A `red` status means that at least one primary shard is not allocated in the cluster; `yellow` means all primary shards are allocated but some replica shards are not, which is normal on a single-node cluster.
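For scripts, the `_cat/health` API with `h=status` returns just the status word, which is easier to check than the JSON above. In this sketch the helper names are hypothetical; it treats `green` and `yellow` as usable and anything else, including no response at all, as a failure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only for the statuses this guide accepts.
health_is_usable() {
  case "$1" in
    green|yellow) return 0 ;;
    *) return 1 ;;   # red, empty (unreachable), or anything unexpected
  esac
}

# Hypothetical helper: fetch the bare status word; h=status selects that single column.
es_status() {
  curl -s "${1:-http://localhost:9200}/_cat/health?h=status" | tr -d '[:space:]'
}
```

Usage: `health_is_usable "$(es_status)" && echo "cluster is usable" || echo "cluster is red or unreachable"`.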

2. Install Cloudberry & TwitterMap

Clone the Cloudberry GitHub repository:

```bash
cd ~/quick-start
git clone https://github.com/ISG-ICS/cloudberry.git
```
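After cloning, a quick check that the files used in Steps 3 to 5 are where this guide expects them can save a confusing failure later. This is a hypothetical helper; the relative paths are taken from this page and may differ in other revisions of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm that the files used in Steps 3-5 exist
# under the cloned repository (paths as given in this guide).
check_cloudberry_layout() {
  local root="${1:-$HOME/quick-start/cloudberry}"
  local ok=0 f
  for f in \
    "cloudberry/neo/conf/application.conf" \
    "examples/twittermap/web/conf/application.conf" \
    "examples/twittermap/script/ingestTweetToElasticCluster.sh"; do
    if [[ ! -e "$root/$f" ]]; then
      echo "not found: $root/$f" >&2
      ok=1
    fi
  done
  return "$ok"
}
```

Usage: `check_cloudberry_layout && echo "repository layout looks as expected"`.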

3. Download and ingest sample tweets into Elasticsearch

Step 3.1: Download the sample tweets data file

```bash
cd ~/quick-start/cloudberry/examples/twittermap/script/
wget http://cloudberry.ics.uci.edu/img/sample.json.gz
```

Note: This file is `sample.json.gz`, which is different from the `sample.adm.gz` file used in the Quick Start tutorial.
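An interrupted download produces a truncated archive, so it can be worth testing the file before ingesting it. `gzip -t` checks the compressed data without writing anything to disk; the wrapper name `check_gzip` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: test a .gz file's integrity without extracting it.
check_gzip() {
  local file="${1:-sample.json.gz}"
  if gzip -t "$file" 2>/dev/null; then
    echo "OK: $file is a valid gzip archive"
  else
    echo "corrupt or incomplete: $file (try downloading it again)" >&2
    return 1
  fi
}
```

Usage, from the `script/` directory: `check_gzip sample.json.gz`.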

Step 3.2: Ingest the sample tweets into the Elasticsearch cluster

```bash
cd ~/quick-start/cloudberry/examples/twittermap/
./script/ingestTweetToElasticCluster.sh
```

When the script completes, you should see output similar to the following:

```
[info] Showing high-level information about indices in Elasticsearch cluster AFTER ingesting data...

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   twitter.ds_tweet fQiZx9wBQNKkqRB9fMw9Xw   4   0      73348            0       58mb           58mb

[success] Finish ingesting tweets
```
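To re-check the ingestion later without rerunning the script, you can ask Elasticsearch for the number of documents in the `twitter.ds_tweet` index through the `_count` API. This sketch extracts the number with `sed`, so it needs no JSON tool; the helper names are hypothetical, and the expected total depends on the sample file (73348 in the run shown above).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the "count" field from a _count API response.
# Prints nothing if the response has no "count" field (e.g. an error response).
count_from_json() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: query the document count of the twitter.ds_tweet index.
tweet_count() {
  curl -s "${1:-http://localhost:9200}/twitter.ds_tweet/_count" | count_from_json
}
```

Usage: `echo "documents in twitter.ds_tweet: $(tweet_count)"`.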

4. Configure Cloudberry

Edit the file `~/quick-start/cloudberry/cloudberry/neo/conf/application.conf`.

Step 4.1: Comment out lines 89 and 96, which contain the AsterixDB settings, by adding a leading `#`:

```
line 89: asterixdb.url = "http://localhost:19002/query/service"
line 96: asterixdb.lang = SQLPP
```

Step 4.2: Uncomment lines 93 and 101, which contain the Elasticsearch settings, by removing the leading `#`:

```
line 93: #elasticsearch.url = "http://localhost:9200"
line 101: #asterixdb.lang = elasticsearch
```

Step 4.3: Update lines 86 and 87 to tune the `DRUM` parameters for Elasticsearch:

```
line 86: berry.firstquery.gap = "60 days"
line 87: berry.query.gap = "180 days"
```
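The line numbers in Steps 4.1 to 4.3 refer to one revision of `application.conf` and may shift in others, so it helps to locate the settings by name before and after editing. A minimal sketch using `grep -n`; the helper name is hypothetical and the key names are the ones listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the settings touched in Steps 4.1-4.3 together
# with their line numbers, whether they are currently commented out or not.
show_backend_settings() {
  local conf="${1:-$HOME/quick-start/cloudberry/cloudberry/neo/conf/application.conf}"
  grep -n -E '^[[:space:]]*#?[[:space:]]*(asterixdb\.url|asterixdb\.lang|elasticsearch\.url|berry\.firstquery\.gap|berry\.query\.gap)[[:space:]]*=' "$conf"
}
```

After the edits, the `asterixdb.url` and `asterixdb.lang = SQLPP` lines should start with `#`, while `elasticsearch.url` and `asterixdb.lang = elasticsearch` should not.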

5. Configure TwitterMap

Edit the file `~/quick-start/cloudberry/examples/twittermap/web/conf/application.conf`.

Step 5.1: Update lines 94 and 96 to set the start date and end date of temporal queries:

```
line 94: startDate = "2019-01-04T18:29:23.000"
line 96: endDate = "2019-11-10T09:00:23.000"
```
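A swapped or mistyped date silently yields empty temporal queries, so a quick sanity check can help. This sketch assumes each key appears on its own uncommented line of the form `key = "value"` as shown above; the helper names are hypothetical. Because both timestamps use the same fixed-width format, comparing them as strings gives the same order as comparing them as dates.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the quoted value of KEY ($1) from FILE ($2),
# matching lines like:  startDate = "2019-01-04T18:29:23.000"
conf_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Hypothetical helper: succeed only if both dates are present and startDate < endDate.
check_date_range() {
  local conf="${1:-$HOME/quick-start/cloudberry/examples/twittermap/web/conf/application.conf}"
  local start end
  start="$(conf_value startDate "$conf")"
  end="$(conf_value endDate "$conf")"
  if [[ -n "$start" && -n "$end" && "$start" < "$end" ]]; then
    echo "OK: $start -> $end"
  else
    echo "check startDate/endDate in $conf" >&2
    return 1
  fi
}
```

Usage: `check_date_range` (defaults to the TwitterMap configuration file edited above).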

6. Now you can start Cloudberry & TwitterMap as in Quick Start!

To start Cloudberry and TwitterMap, see Steps 2.2 and 2.4 in Quick Start.
