This repository has been archived by the owner on Jun 15, 2021. It is now read-only.

Development postgres #271

Open
wants to merge 40 commits into master
Commits (40)
818c4f9
add warning if missing regex
jamesmeneghello Jan 6, 2016
5daddfd
fix for https://github.com/Murodese/pynab/issues/255
brookesy2 Jan 15, 2016
1ef1e26
fix for https://github.com/Murodese/pynab/issues/255
brookesy2 Jan 15, 2016
141c3ee
Merge pull request #256 from brookesy2/development-postgres
jamesmeneghello Jan 16, 2016
1a1e76f
better error handling in pre.py
brookesy2 Jan 18, 2016
4fbf477
better error handling in pre.py
brookesy2 Jan 18, 2016
1c58a3c
Merge pull request #259 from brookesy2/development-postgres
jamesmeneghello Jan 19, 2016
8750835
check for None before regex search
brookesy2 Feb 5, 2016
1cb3480
Merge pull request #267 from brookesy2/development-postgres
jamesmeneghello Feb 7, 2016
0caffca
fix #266: add a uniqhash check before release processing
jamesmeneghello Feb 14, 2016
b9af2a9
fix #264: add a note about six on ubuntu14.04 and add missing package…
jamesmeneghello Feb 14, 2016
ed31ac1
fix #263: fix a commented-out codeline
jamesmeneghello Feb 14, 2016
dea3b37
fix #262: force a version of regex that we know to work
jamesmeneghello Feb 14, 2016
225edc2
fix #261: update readme to include webui re-chowns and service start …
jamesmeneghello Feb 14, 2016
33be621
fix for https://github.com/Murodese/pynab/issues/275
brookesy2 Apr 29, 2016
7fce477
Merge pull request #276 from brookesy2/development-postgres
jamesmeneghello Apr 29, 2016
c51ee8f
Fix #268: Rename additional types of pre-releases.
gkoh Apr 22, 2016
7bbc966
Merge pull request #278 from gkoh/upstream
brookesy2 May 6, 2016
eaceb82
Update requirements.txt
feld May 18, 2016
cc44eea
init/supervisor/pynab.conf: Don't autorestart backfill
feld May 18, 2016
834a17b
Fix #293, use the new valid nZEDb regex URL.
gkoh Sep 15, 2016
a3da51b
Merge pull request #294 from gkoh/development-postgres
brookesy2 Sep 15, 2016
0367db7
Fix #270.
gkoh Sep 19, 2016
ca8226e
Fixes irc collections dependency
styks1987 Sep 28, 2016
37b3378
Merge pull request #297 from styks1987/development-postgres
brookesy2 Sep 28, 2016
06019fe
Lock pytvmaze version to 1.*
NeilBetham Oct 17, 2016
e5ae211
Merge pull request #299 from NeilBetham/development-postgres
brookesy2 Oct 18, 2016
02a3a70
Merge pull request #283 from feld/development-postgres
brookesy2 Nov 15, 2016
bfceb88
Update deletes to select models, not just ID column
NeilBetham Oct 18, 2016
df70b89
Update regex version to match what is available
Herkemer Dec 28, 2016
74e8a5e
Merge pull request #305 from Herkemer/development-postgres
brookesy2 Dec 28, 2016
1c0f1d4
Fix #270.
gkoh Sep 19, 2016
4241ca4
Fix #307: Add supportedparams in capabilities
gkoh Feb 21, 2017
fdb1ffe
Merge pull request #295 from gkoh/development-postgres
brookesy2 Feb 21, 2017
749f3b0
Fixing pre import script
ctero May 28, 2017
481ecfd
Merge pull request #310 from ctero/development-postgres
brookesy2 Jun 5, 2017
27af5a2
Merge pull request #308 from gkoh/issue/307
brookesy2 Jun 5, 2017
45ae7fc
Merge pull request #301 from NeilBetham/development-postgres
brookesy2 Sep 8, 2017
49c2eb2
Updated take_last to keep="last" to support newer pandas versions. Be…
brookesy2 Sep 8, 2017
bb57cd6
Merge pull request #313 from brookesy2/development-postgres
brookesy2 Sep 8, 2017
27 changes: 26 additions & 1 deletion README.md
@@ -469,6 +469,13 @@ environment variable for requests. For example, with Apache:

SetEnvIf X-Forwarded-Protocol https HTTPS=1

To start and stop nginx/uwsgi, follow your OS's service directions. On Ubuntu, this looks like:

> sudo service nginx start/stop/restart
> sudo service uwsgi start/stop/restart

If any service fails to start, you can view the logs in /var/log/[nginx, uwsgi].

### Using the miscellaneous scripts ###

Categorise all uncategorised releases - this runs automatically after import.
@@ -560,9 +567,10 @@ A semi-reliable way to install the required packages is below (be careful of sud

> sudo apt-get install npm nodejs-legacy ruby ruby-compass

Run the npm install:
Run the npm install (chown the dir back to your user temporarily):

> cd webui
> chown -R <user>:<group> *
> npm install [not using sudo]

Install necessary build tools (using sudo):
@@ -581,6 +589,10 @@ Then initiate the build:
> bower install
> grunt build

Then chown back to www-data:

> chown -R www-data:www-data *

This will build a working and optimised version of the UI into the dist/ directory, which
will then be hosted by your webserver as part of api.py. Note that you can disable the web
interface in the main configuration.
@@ -627,6 +639,19 @@ Run the following:
> gem install sass --no-ri --no-rdoc
> gem install compass --no-ri --no-rdoc

- Using Ubuntu 14.04, I'm getting strange errors with parts of pynab referring to the "six" package.

(from @JameZUK)

This issue only presented itself with a highly frustrating error when trying to start the prebot. The error inside supervisor looks like: pynab:prebot: ERROR (abnormal termination)

To fix it, just remove six from the base Ubuntu install and force pip3 to upgrade it to the latest version:

> sudo rm /usr/lib/python2.7/dist-packages/six.pyc
> sudo rm /usr/lib/python2.7/dist-packages/six.py
> sudo rm -rf /usr/lib/python2.7/dist-packages/six-1.5.2.egg-info
> sudo pip3 install six --upgrade


Newznab API
===========
24 changes: 23 additions & 1 deletion config_sample.py
@@ -292,7 +292,7 @@
# expects data in newznab sql dump format
# 'http://www.newznab.com/getregex.php?newznabID=<id>'
'regex_type': 'nzedb',
'regex_url': 'https://raw.githubusercontent.com/nZEDb/nZEDb/master/resources/db/schema/data/10-release_naming_regexes.tsv',
'regex_url': 'https://raw.githubusercontent.com/nZEDb/nZEDb/0.x/resources/db/schema/data/10-release_naming_regexes.tsv',

# blacklist_url: url to retrieve blacklists from
# generally leave alone
@@ -396,3 +396,25 @@
# db: database name in mongo
'db': 'pynab',
}

# Prebot to scrape pre/request ID's
# Currently Regex is only set up for nZEDbPRE on irc.synirc.net
# Defaults should most likely be kept (Except for nick)
prebot = {
# nick: nick of the prebot
# try not to use random characters as this may result in a ban
# REMEMBER TO SET THIS
'nick': '',

# channel: channel to join
# default: #nZEDbPRE
'channel': '#nZEDbPRE',

# server: IRC server to join
# default: irc.synirc.net
'server': 'irc.synirc.net',

# port: port used to connect to the IRC server
# default: 6667
'port': 6667,
}
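
Two hunks up, regex_url was repointed at the nZEDb 0.x branch (commit 834a17b, issue #293) because the old master URL was no longer valid. Below is a quick standalone sketch of pulling and splitting that file; it is not pynab's built-in regex importer, and it assumes the upstream file remains a plain tab-separated dump.

import requests

REGEX_URL = ('https://raw.githubusercontent.com/nZEDb/nZEDb/0.x/'
             'resources/db/schema/data/10-release_naming_regexes.tsv')

response = requests.get(REGEX_URL, timeout=30)
response.raise_for_status()

# assumed format: one regex definition per line, tab-separated columns
rows = [line.split('\t') for line in response.text.splitlines() if line.strip()]
print('fetched {} regex rows'.format(len(rows)))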
2 changes: 1 addition & 1 deletion init/supervisor/pynab.conf
@@ -36,7 +36,7 @@ user=www-data
[program:backfill]
command=/usr/bin/python3 /opt/pynab/scan.py backfill
autostart=false
autorestart=true
autorestart=false
stopsignal=QUIT
user=www-data

6 changes: 3 additions & 3 deletions postprocess.py
@@ -195,19 +195,19 @@ def main():
# delete any orphan nzbs
log.info('postprocess: deleting orphan nzbs...')
# noinspection PyComparisonWithNone
deleted_nzbs = db.query(NZB.id).filter(NZB.release == None).delete(synchronize_session='fetch')
deleted_nzbs = db.query(NZB).filter(NZB.release == None).delete(synchronize_session='fetch')
log.info('postprocess: deleted {} orphaned nzbs.'.format(deleted_nzbs))

# delete any orphan nfos
log.info('postprocess: deleting orphan nfos...')
# noinspection PyComparisonWithNone
deleted_nfos = db.query(NFO.id).filter(NFO.release == None).delete(synchronize_session='fetch')
deleted_nfos = db.query(NFO).filter(NFO.release == None).delete(synchronize_session='fetch')
log.info('postprocess: deleted {} orphaned nfos.'.format(deleted_nfos))

# delete any orphan sfvs
log.info('postprocess: deleting orphan sfvs...')
# noinspection PyComparisonWithNone
deleted_sfvs = db.query(SFV.id).filter(SFV.release == None).delete(synchronize_session='fetch')
deleted_sfvs = db.query(SFV).filter(SFV.release == None).delete(synchronize_session='fetch')
log.info('postprocess: deleted {} orphaned sfvs.'.format(deleted_sfvs))

db.commit()
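
The postprocess.py change (commit bfceb88, "Update deletes to select models, not just ID column") switches each bulk delete from querying a single column to querying the full mapped entity. A minimal, self-contained sketch of the resulting pattern follows, using a toy schema rather than pynab's real models.

from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Release(Base):
    __tablename__ = 'releases'
    id = Column(Integer, primary_key=True)


class NZB(Base):
    __tablename__ = 'nzbs'
    id = Column(Integer, primary_key=True)
    release_id = Column(Integer, ForeignKey('releases.id'), nullable=True)
    release = relationship('Release')


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as db:
    db.add_all([NZB(), NZB(release=Release())])
    db.commit()

    # query the full entity (not NZB.id) so the bulk delete can synchronize
    # the session; == None (not "is") lets SQLAlchemy emit IS NULL
    deleted_nzbs = (db.query(NZB)
                      .filter(NZB.release == None)
                      .delete(synchronize_session='fetch'))
    db.commit()
    print('deleted {} orphaned nzbs'.format(deleted_nzbs))  # -> 1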
20 changes: 13 additions & 7 deletions prebot.py
@@ -14,15 +14,14 @@
# Thanks to Joel Rosdahl <[email protected]> for this script
# Taken from https://bitbucket.org/jaraco/irc/src

import string
import random

import irc.bot
import irc.strings
from docopt import docopt

import pynab.pre
from pynab import log_init, log
import config


class TestBot(irc.bot.SingleServerIRCBot):
@@ -42,15 +41,22 @@ def on_pubmsg(self, c, e):


def main():
channel = "#nZEDbPRE"
nickname = ''.join([random.choice(string.ascii_letters) for n in range(8)])
log.info("Pre: Bot Nick - {}".format(nickname))
bot = TestBot(channel, nickname, "irc.synirc.net", 6667)
channel = config.prebot.get('channel')
nick = config.prebot.get('nick')
server = config.prebot.get('server')
port = config.prebot.get('port')

log.info("Pre: Bot Nick - {}".format(nick))
bot = TestBot(channel, nick, server, port)
bot.start()


if __name__ == '__main__':
arguments = docopt(__doc__, version=pynab.__version__)

if arguments['start']:
log_init('prebot')
main()
if config.prebot.get('nick'):
main()
else:
log.warn("Pre: Bot nick not set in config, please update and restart the bot")
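
prebot.py now takes channel, nick, server and port from config.prebot and refuses to start when the nick is unset. The TestBot class body is mostly elided in this diff, so the sketch below is only a hedged reconstruction of how such a bot is typically wired with the irc package (following the upstream testbot example the file credits); in particular, the on_pubmsg handoff to pynab.pre.nzedbirc is an assumption.

import irc.bot

import pynab.pre
import config


class PreBot(irc.bot.SingleServerIRCBot):
    def __init__(self, channel, nickname, server, port=6667):
        irc.bot.SingleServerIRCBot.__init__(self, [(server, port)], nickname, nickname)
        self.channel = channel

    def on_welcome(self, c, e):
        # join the pre channel once the server accepts the connection
        c.join(self.channel)

    def on_pubmsg(self, c, e):
        # each public message is a raw pre line; hand it to the parser
        pynab.pre.nzedbirc(e.arguments[0])


if __name__ == '__main__':
    bot = PreBot(config.prebot.get('channel'),
                 config.prebot.get('nick'),
                 config.prebot.get('server'),
                 config.prebot.get('port'))
    bot.start()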
3 changes: 3 additions & 0 deletions pynab/binaries.py
@@ -111,6 +111,9 @@ def process():
db.query(Regex).filter(Regex.id==reg.id).delete()
db.commit()

if not all_regex:
log.warning('binary: no regexes available for any groups being processed. update your regex?')

# noinspection PyComparisonWithNone
query = db.query(Part).filter(Part.group_name.in_(relevant_groups)).filter(Part.binary_id == None)
total_parts = query.count()
11 changes: 9 additions & 2 deletions pynab/db.py
@@ -356,12 +356,19 @@ def to_json(obj):
obj = json.dumps(dict, default=json_serial)
return obj

def create_hash(context):
def _create_hash(name, group_id, posted):
return hashlib.sha1('{}.{}.{}'.format(
name,
group_id,
posted
).encode('utf-8')).hexdigest()

def create_hash(context):
return _create_hash(
context.current_parameters['name'],
context.current_parameters['group_id'],
context.current_parameters['posted']
).encode('utf-8')).hexdigest()
)

class Release(Base):
__tablename__ = 'releases'
2 changes: 1 addition & 1 deletion pynab/groups.py
@@ -120,7 +120,7 @@ def scan(group_name, direction='forward', date=None, target=None, limit=None):

iterations += 1

if limit and iterations >= 3:#* config.scan.get('message_scan_limit') >= limit:
if limit and config.scan.get('message_scan_limit') >= limit:
log.info(
'group: {}: scan limit reached, ending early (will continue later)'.format(group_name))
return False
81 changes: 45 additions & 36 deletions pynab/pre.py
@@ -10,56 +10,65 @@
def nzedbirc(unformattedPre):
formattedPre = parseNzedbirc(unformattedPre)

with db_session() as db:
p = db.query(Pre).filter(Pre.name == formattedPre['name']).first()

if not p:
p = Pre(**formattedPre)
else:
for k, v in formattedPre.items():
setattr(p, k, v)

try:
db.add(p)
log.info("pre: Inserted/Updated - {}".format(formattedPre["name"]))
except Exception as e:
log.debug("pre: Error - {}".format(e))
if formattedPre is not None:
with db_session() as db:
p = db.query(Pre).filter(Pre.name == formattedPre['name']).first()

if not p:
p = Pre(**formattedPre)
else:
for k, v in formattedPre.items():
setattr(p, k, v)

try:
db.add(p)
log.info("pre: Inserted/Updated - {}".format(formattedPre["name"]))
except Exception as e:
log.debug("pre: Error - {}".format(e))


#Message legend: DT: PRE Time(UTC) | TT: Title | SC: Source | CT: Category | RQ: Requestid | SZ: Size | FL: Files | FN: Filename
#Sample: NEW: [DT: 2015-01-09 16:08:45][TT: Sample-Release][SC: sample-source][CT: 0DAY][RQ: N/A][SZ: N/A][FL: N/A][FN: N/A]
#Sample: NEW: [DT: 2016-04-29 14:57:16] [TT: RELEASE] [SC: GROUP] [CT: CATEGORY] [RQ: REQUEST] [SZ: 3550MB] [FL: 71x50MB] [FN: N/A]
def parseNzedbirc(unformattedPre):
CLEAN_REGEX = regex.compile('[\x02\x0F\x16\x1D\x1F]|\x03(\d{,2}(,\d{,2})?)?')
PRE_REGEX = regex.compile(
'(?P<preType>.+): \[DT: (?<pretime>.+)\] \[TT: (?P<name>.+)\] \[SC: (?P<source>.+)\] \[CT: (?P<category>.+)\] \[RQ: (?P<request>.+)\] \[SZ: (?P<size>.+)\] \[FL: (?P<files>.+)\] \[FN: (?P<filename>.+)\]')

formattedPre = {}

try:
formattedPre = PRE_REGEX.search(unformattedPre).groupdict()
except Exception as e:
log.debug("pre: Error parsing nzedbirc - {}".format(e))
if unformattedPre is not None:
try:
cleanPre = regex.sub(CLEAN_REGEX, '', unformattedPre);
formattedPre = PRE_REGEX.search(cleanPre).groupdict()
except Exception as e:
log.debug("pre: Message prior to error - {}".format(unformattedPre))
log.debug("pre: Error parsing nzedbirc - {}".format(e))
formattedPre = None

if formattedPre['preType'] == "NUK":
formattedPre['nuked'] = True
else:
formattedPre['nuked'] = False
if formattedPre is not None:
if formattedPre['preType'] == "NUK":
formattedPre['nuked'] = True
else:
formattedPre['nuked'] = False

#Deal with splitting out requests if they exist
if formattedPre['request'] != "N/A":
formattedPre['requestid'] = formattedPre['request'].split(":")[0]
formattedPre['requestgroup'] = formattedPre['request'].split(":")[1]
else:
formattedPre['requestid'] = None
#Deal with splitting out requests if they exist
if formattedPre['request'] != "N/A":
formattedPre['requestid'] = formattedPre['request'].split(":")[0]
formattedPre['requestgroup'] = formattedPre['request'].split(":")[1]
else:
formattedPre['requestid'] = None

formattedPre['searchname'] = releases.clean_release_name(formattedPre['name'])
formattedPre['searchname'] = releases.clean_release_name(formattedPre['name'])

#remove any columns we dont need. Perhaps a way to filter these out via regex? Or a way to ignore via sqlalchemy
formattedPre.pop("preType", None)
formattedPre.pop("size", None)
formattedPre.pop("files", None)
formattedPre.pop("request", None)
#remove any columns we dont need. Perhaps a way to filter these out via regex? Or a way to ignore via sqlalchemy
formattedPre.pop("preType", None)
formattedPre.pop("size", None)
formattedPre.pop("files", None)
formattedPre.pop("request", None)

return formattedPre
return formattedPre
else:
return None


# orlydb scraping
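The parseNzedbirc rework above wraps the regex work in None checks and strips IRC colour/formatting codes before matching. As a quick standalone check of what the patterns extract, here is the second sample message from the comments run through the same cleaning and matching steps (patterns copied verbatim; note they need the regex module rather than stdlib re, since the pattern uses the (?<name>...) group syntax).

import regex

CLEAN_REGEX = regex.compile('[\x02\x0F\x16\x1D\x1F]|\x03(\d{,2}(,\d{,2})?)?')
PRE_REGEX = regex.compile(
    '(?P<preType>.+): \[DT: (?<pretime>.+)\] \[TT: (?P<name>.+)\] \[SC: (?P<source>.+)\] '
    '\[CT: (?P<category>.+)\] \[RQ: (?P<request>.+)\] \[SZ: (?P<size>.+)\] '
    '\[FL: (?P<files>.+)\] \[FN: (?P<filename>.+)\]')

line = ('NEW: [DT: 2016-04-29 14:57:16] [TT: RELEASE] [SC: GROUP] [CT: CATEGORY] '
        '[RQ: REQUEST] [SZ: 3550MB] [FL: 71x50MB] [FN: N/A]')

parsed = PRE_REGEX.search(regex.sub(CLEAN_REGEX, '', line)).groupdict()
print(parsed['preType'], parsed['pretime'], parsed['name'])
# NEW 2016-04-29 14:57:16 RELEASE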
14 changes: 12 additions & 2 deletions pynab/releases.py
@@ -7,7 +7,7 @@
from sqlalchemy.orm import *

from pynab import log
from pynab.db import to_json, db_session, engine, Binary, Part, Release, Group, Category, Blacklist
from pynab.db import to_json, db_session, engine, Binary, Part, Release, Group, Category, Blacklist, _create_hash
import pynab.categories
import pynab.nzbs
import pynab.rars
@@ -197,6 +197,7 @@ def process():
r = db.query(Release).filter(Release.name == completed_binary[1]).filter(
Release.posted == completed_binary[2]
).first()

if r:
# if it does, we have a duplicate - delete the binary
db.query(Binary).filter(Binary.id == completed_binary[0]).delete()
@@ -205,6 +206,15 @@
# if it's a really big file, we want to deal with it differently
binary = db.query(Binary).filter(Binary.id == completed_binary[0]).first()

# get the group early for use in uniqhash
group = db.query(Group).filter(Group.name == binary.group_name).one()

# check if the uniqhash already exists too
dupe_release = db.query(Release).filter(Release.uniqhash == _create_hash(binary.name, group.id, binary.posted)).first()
if dupe_release:
db.query(Binary).filter(Binary.id == completed_binary[0]).delete()
continue

# this is an estimate, so it doesn't matter too much
# 1 part nfo, 1 part sfv or something similar, so ignore two parts
# take an estimate from the middle parts, since the first/last
@@ -354,7 +364,7 @@ def process():
release.search_name = clean_release_name(binary.name)

# assign the release group
release.group = db.query(Group).filter(Group.name == binary.group_name).one()
release.group = group

# give the release a category
release.category_id = pynab.categories.determine_category(binary.name, binary.group_name)
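The releases.py hunks above add a uniqhash pre-check (fix #266) built on the _create_hash helper split out in pynab/db.py. Below is a toy end-to-end demonstration of why the check matters, using a simplified schema and SQLite; in pynab itself uniqhash is filled in by the create_hash column default, and the real query also filters on group and posted date via the hash inputs.

import hashlib

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


def _create_hash(name, group_id, posted):
    # mirrors the helper added in pynab/db.py
    return hashlib.sha1('{}.{}.{}'.format(name, group_id, posted)
                        .encode('utf-8')).hexdigest()


class Release(Base):
    __tablename__ = 'releases'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    uniqhash = Column(String, unique=True)


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

binary = {'name': 'Some.Release-GRP', 'group_id': 1, 'posted': '2016-02-14 10:00:00'}

with Session(engine) as db:
    db.add(Release(name=binary['name'],
                   uniqhash=_create_hash(binary['name'], binary['group_id'],
                                         binary['posted'])))
    db.commit()

    # second pass over the same binary: uniqhash is unique, so without this
    # check the insert would fail; with it, the duplicate is skipped quietly
    uniqhash = _create_hash(binary['name'], binary['group_id'], binary['posted'])
    if db.query(Release).filter(Release.uniqhash == uniqhash).first():
        print('duplicate uniqhash, skipping binary')  # this branch is taken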
4 changes: 3 additions & 1 deletion pynab/requests.py
@@ -58,8 +58,10 @@ def process(limit=None):
# no longer need to check group
updated_release = group_requests.get(str(pre.requestid))
updated_release.pre_id = pre.id
updated_release.name = pre.name
updated_release.search_name = pre.searchname
db.merge(updated_release)
log.info("requests: found pre request id {} ({}) for {}".format(pre.requestid, group_name,
updated_release.name))

db.commit()
db.commit()
11 changes: 7 additions & 4 deletions requirements.txt
@@ -6,8 +6,8 @@ bottle
xmltodict
pynzb
requests
roman
regex>=2.4.44
roman>=2.0.0
regex==2014.05.17
lxml
daemonize
colorlog
@@ -23,11 +23,14 @@ sleekxmpp
eventlet
requests-futures
irc
six>=1.10.0
colorama
pandas
beautifulsoup4
pySmartDL
pytvmaze
pytvmaze<2.0
git+https://github.com/PyMySQL/PyMySQL.git
nltk

jaraco.itertools
jaraco.collections
jaraco.text