Docker revamp #19

Open
wants to merge 58 commits into
base: master
58 commits
b292b6d
Add docker compose for kibana and elasticsearch
heyqule Aug 22, 2019
4aef0a3
Merge remote-tracking branch 'upstream/master'
heyqule Aug 22, 2019
8ee20b8
Add support for multiple set of nltk tokens. Controls by --index
heyqule Aug 23, 2019
9dca234
Fequency adjustment
heyqule Aug 23, 2019
74e6b36
Fully automate the build with docker
heyqule Aug 25, 2019
53408c9
Add support to bypass fetching stock price outside of regular hours.
heyqule Aug 26, 2019
d5393e7
Fix time display
heyqule Aug 26, 2019
4dc0238
Optimization
heyqule Aug 26, 2019
b7cda28
Fix hour() error
heyqule Aug 26, 2019
a8bf22a
Fix Cache Cleaning issue
heyqule Aug 27, 2019
aea6a59
Change startup.sh to startup.sample.sh
heyqule Aug 27, 2019
64ddc35
Add Curl to python instance for cleaning purposes.
heyqule Aug 27, 2019
004c17c
Clean cache
heyqule Aug 28, 2019
3cd6de0
Change Kibana template
heyqule Aug 28, 2019
fde9181
Move news out of original sentiment script
heyqule Aug 29, 2019
92a9447
Break down News SA
heyqule Aug 29, 2019
4205c8c
remove exposed ports
heyqule Aug 29, 2019
ff5a8cf
Elasticsearch / Kibana 7.3 change
heyqule Aug 31, 2019
e310886
Add ndjson importer
heyqule Aug 31, 2019
10502c8
Add ndjson importer
heyqule Aug 31, 2019
7675154
Remove kibana 5.6 export
heyqule Aug 31, 2019
c2a7010
Fix kibana importer
heyqule Sep 2, 2019
cb42d10
Update Copyright
heyqule Sep 2, 2019
fd9fe56
Change to wt
heyqule Sep 2, 2019
3a3b452
Change Mapping to 7.3 format
heyqule Sep 2, 2019
f3c1895
Disable twitter sentiment stream in start.sh
heyqule Sep 2, 2019
e6c9f1b
Rename original py to og.py
heyqule Sep 2, 2019
3fc49a6
Change config handling
heyqule Sep 8, 2019
e86efe7
Fix twitter
heyqule Sep 9, 2019
5f1d87f
Since it's single node insance, disable replica
heyqule Sep 9, 2019
3e862ec
Refactors
heyqule Sep 9, 2019
84c6324
Minor Import script adjustment
heyqule Sep 9, 2019
4cd6af4
Index structure change
heyqule Sep 9, 2019
622eae1
Fix message body
heyqule Sep 9, 2019
baa9d5f
Optimiaztion
heyqule Sep 10, 2019
040887b
Add delay before fetching from elasticsearch .
heyqule Sep 10, 2019
56901dc
Kibana change
heyqule Sep 10, 2019
abbc740
Kibana - remove legend
heyqule Sep 10, 2019
a6002ac
Add kibana listener
heyqule Sep 10, 2019
c6cf17b
Revert ndjson
heyqule Sep 10, 2019
bda22a4
Attempt to fix stock price operant error
heyqule Sep 10, 2019
b7226d4
Fix elastic mapping
heyqule Sep 11, 2019
5cde9c9
Add delay for Seek Alpha
heyqule Sep 11, 2019
c3431c4
Add delay for Seek Alpha
heyqule Sep 11, 2019
d086bc6
- Separate sentiment for message and title
heyqule Sep 14, 2019
f84a379
- Kibana adjustment
heyqule Sep 14, 2019
60e06fc
- Config adjustment
heyqule Sep 14, 2019
0d7c7a4
- Improve Kibana dashboard
heyqule Sep 17, 2019
fb6bea1
- Improve Kibana Dashboard
heyqule Sep 22, 2019
097c774
- Additonal Readme change
heyqule Sep 22, 2019
efc7387
- Fix kibana tmp folder issue
heyqule Sep 22, 2019
175dd61
- Minor change to spawn timers
heyqule Sep 22, 2019
a178733
Minor Refactor
heyqule Sep 24, 2019
9c55d3d
Merge branch 'master' into master
shirosaidev Oct 11, 2019
6f38025
Fix issue found by shaggy63
heyqule Oct 12, 2019
5d17a6c
Merge remote-tracking branch 'origin/master'
heyqule Oct 12, 2019
646b0d9
Disable unnecessary exposed ports
heyqule Oct 12, 2019
b985410
Add copyright blocks to non-py files
heyqule Oct 16, 2019
4 changes: 2 additions & 2 deletions docker-compose.yml
@@ -22,8 +22,8 @@ services:
soft: 2048
hard: 2048
#expose this for local dev only!

What happens when this is exposed permanently?

#ports:
# - "9200:9200"
ports:
- "9200:9200"
redis:

Why has Redis been added to the containers?

Author

It serves as an article cache. When news articles are fetched from the news page, an article that has already been added won't be added again.
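
A minimal sketch of the deduplication idea described in this reply, assuming a key-value store that offers exists and set like the rds client in the diff. InMemoryCache and add_if_new are hypothetical names used so the example runs without a Redis server; they are not part of this PR.

```python
import hashlib


class InMemoryCache:
    """Hypothetical stand-in for the Redis client (only exists/set)."""

    def __init__(self):
        self._store = {}

    def exists(self, key):
        # Return a count of matching keys (0 or 1 here).
        return 1 if key in self._store else 0

    def set(self, key, value):
        self._store[key] = value
        return True


def add_if_new(cache, headline, url):
    """Return True if the article is new and was recorded, else False."""
    # Same keying scheme as the diff: MD5 of headline text plus URL.
    key = hashlib.md5((headline + url).encode()).hexdigest()
    if cache.exists(key):
        return False
    cache.set(key, 1)
    return True


cache = InMemoryCache()
first = add_if_new(cache, "Stock rallies", "https://example.com/a")
second = add_if_new(cache, "Stock rallies", "https://example.com/a")
print(first, second)  # True False
```

Because the key lives in an external store rather than in a Python list, the record survives after the fetching process exits, which is the property the reply relies on.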

build:
context: ./redis-docker
@@ -7,8 +7,7 @@
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from Initializer.LoggerInit import *

from Sentiment.Initializer.LoggerInit import *

def get_page_text(url):

Empty file.
File renamed without changes.
File renamed without changes.
Empty file.
@@ -1,22 +1,19 @@
import hashlib
import re
import time
from datetime import datetime

import nltk
import hashlib

try:
import urllib.parse as urlparse
except ImportError:
import urlparse

from config import *
from Initializer.str_unicode import *
from Helper.Sentiment import *
from Initializer.ElasticSearchInit import es
from Initializer.RedisInit import rds
from Initializer.LoggerInit import *

What happened to LoggerInit?


from Sentiment.Initializer.ElasticSearchInit import es
from Sentiment.Initializer.str_unicode import *
from Sentiment.Initializer.RedisInit import rds
from Sentiment.Helper.Sentiment import *


class NewsHeadlineListener:
@@ -29,8 +26,9 @@ def __init__(self, symbol,url=None):
# add any new headlines
for htext, htext_url in new_headlines:

md5Hash = hashlib.md5( (htext+htext_url).encode() ).hexdigest()
if rds.exists(md5Hash):
md5_hash = hashlib.md5((htext+htext_url).encode()).hexdigest()

if rds.exists(md5_hash) is 0:

It would be easier to read if we said: if not rds.exists(md5_hash):. The is operator checks whether two names refer to the same object (identity), not whether their values are equal. Although the current expression may evaluate to true, it is confusing.
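
A small sketch of the point raised in this comment, assuming the client's exists call returns an integer count (0 when the key is absent). The helper name is_uncached is hypothetical. It shows the truthiness check the reviewer suggests, and separately shows that identity and equality are different questions.

```python
def is_uncached(exists_result):
    """Treat any falsy result (0 or False) as 'not cached yet'."""
    return not exists_result


new_result = is_uncached(0)    # key absent
seen_result = is_uncached(1)   # key present

# Identity versus equality: two equal lists are still distinct objects.
first = [0]
second = [0]
equal_but_distinct = (first == second, first is second)

print(new_result, seen_result, equal_but_distinct)
# True False (True, False)
```

The truthiness form does not depend on how the interpreter happens to store small integers, which is why it reads as the safer choice here.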


datenow = datetime.utcnow().isoformat()
# output news data
@@ -49,6 +47,7 @@ def __init__(self, symbol,url=None):
for t in nltk_tokens_ignored:
if t in tokens:
logger.info("Text contains token from ignore list, not adding")
rds.set(md5_hash,1,2628000)

Magic numbers. Why has True been replaced with the more abstract 1, and why has a seemingly random 2628000 parameter been added?

Author

It's better if you review directly from the latest commit. Changes like this one are outdated and no longer exist in the latest commit.

https://github.com/shirosaidev/stocksight/pull/19/files
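
For readers wondering about the number itself: 2628000 is one twelfth of a 365-day year in seconds, and in redis-py the third positional argument of set is ex, the expiry in seconds. A hedged sketch of how the call could be written with named constants follows. The constant names, the remember helper, and RecordingCache are hypothetical, and that the author intended "about one month" is an assumption, not something stated in the PR.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# One twelfth of a 365-day year; equals the 2628000 used in the diff.
CACHE_TTL_SECONDS = 365 * SECONDS_PER_DAY // 12

# Hypothetical marker value: only the key's presence matters.
CACHE_MARKER = 1


class RecordingCache:
    """Hypothetical stand-in that records set() calls instead of using Redis."""

    def __init__(self):
        self.calls = []

    def set(self, name, value, ex=None):
        self.calls.append((name, value, ex))
        return True


def remember(cache, key):
    """Mark a key as seen, letting it expire after CACHE_TTL_SECONDS."""
    cache.set(key, CACHE_MARKER, ex=CACHE_TTL_SECONDS)


cache = RecordingCache()
remember(cache, "abc")
print(CACHE_TTL_SECONDS, cache.calls)
# 2628000 [('abc', 1, 2628000)]
```

Passing ex by keyword makes it clear at the call site that the number is a lifetime rather than part of the stored value.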

continue
# check required tokens from config
tokenspass = False
@@ -65,6 +64,7 @@ def __init__(self, symbol,url=None):
break
if not tokenspass:
logger.info("Text does not contain token from required list, not adding")
rds.set(md5_hash,1,2628000)
continue

# get sentiment values
@@ -80,7 +80,7 @@ def __init__(self, symbol,url=None):
"polarity": polarity,
"subjectivity": subjectivity,
"sentiment": sentiment})
rds.set(md5Hash,True)

This made more sense. I am not familiar with Redis, but I intuitively knew this was marking the hash as present in the data structure server.

Author

The original console script stored the article cache as a Python list. Since the Python process exits once it has finished fetching the articles in Docker, the cache had to move to a third-party data store.

rds.set(md5_hash,1,2628000)


def get_news_headlines(self, url):
Empty file added src/Sentiment/__init__.py
Empty file.
4 changes: 2 additions & 2 deletions src/delindex.py
@@ -1,7 +1,7 @@
import argparse

from Initializer.LoggerInit import *
from Initializer.ElasticSearchInit import es
from Sentiment.Initializer.ElasticSearchInit import es
from Sentiment.Initializer.LoggerInit import *

Thank you for alphabetizing. Good clean code principles.


if __name__ == '__main__':

9 changes: 1 addition & 8 deletions src/news.sentiment.py
@@ -11,21 +11,14 @@
"""

import argparse
import json

Were these removed because they are imported indirectly through NewsHeadlineListener? Looks a lot cleaner without all the imports.

Author

Yes, they are imported through NewsHeadlineListener. PyCharm suggested removing them.

import re
import sys
import time

import nltk
import requests

try:
import urllib.parse as urlparse
except ImportError:
import urlparse

# import elasticsearch host, twitter keys and tokens

Is this done in NewsHeadlineListener or do we need to add our own import statement?

from NewsHeadlineListener import *
from Sentiment.NewsHeadlineListener import *


STOCKSIGHT_VERSION = '0.1-b.6'
2 changes: 1 addition & 1 deletion src/startup.sh
@@ -1,7 +1,7 @@
#!/bin/bash


sleep 20;
sleep 30;

while true
do