Agent fails to download content when there is a content warning #7

Open
ankenyr opened this issue Jun 5, 2017 · 6 comments
Comments

@ankenyr
Contributor

ankenyr commented Jun 5, 2017

For a video like the following:
https://www.youtube.com/watch?ajax=1&v=G0Qz-ZCwbaQ
the agent is unable to retrieve the metadata. It would be nice to have a mechanism for this. I imagine you would need to provide a username/password, though this might cause problems if a person is using two-factor authentication. I don't know if you know a better way to approach this, @sander1. Is there a secure way for Plex agents to store passwords? I imagine I could just make a one-off account used only for requesting metadata.

@ankenyr
Contributor Author

ankenyr commented Nov 5, 2017

Hi @sander1, I finally took a look into this and I believe I have a solution, but I am running into a bit of a problem. I modified the original lines a bit, but they can easily be changed back. The idea is: if there is a key 'content' in the JSON, we pass. If not, we check whether 'verify_age' appears in the value for key 'location'. If so, we use urllib2.Request to download the page again and BeautifulSoup to grab the correct information. This works well enough in standard Python, but it fails inside the plugin with a certificate error. Pay no mind to the fact that I have shitty if/elif statements. I will fix those up; this was mostly for testing while I was debugging.

    req = HTTP.Request(YOUTUBE_VIDEO_DETAILS % metadata.id)
    json = req.content[4:]  # drop the 4-character prefix before the JSON body
    parsed_json = JSON.ObjectFromString(json)
    Log(parsed_json)
    if 'content' in parsed_json:
        pass
    elif 'verify_age' in parsed_json.get('location', ''):
        # Age-gated video: request the page directly and parse the HTML.
        headers = {"User-agent" : "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6; en-us) AppleWebKit/530.1+ (KHTML, like Gecko) Version/3.2.1 Safari/525.27.1", "Accept-encoding" : "gzip"}
        # The URL template must not be quoted, otherwise the literal string is requested.
        req = urllib2.Request(YOUTUBE_VIDEO_DETAILS % metadata.id, None, headers)
        f = urllib2.urlopen(req)
        resp = f.read()
        soup = BeautifulSoup(resp)
        Log(soup)
    json_obj = parsed_json.get('content')  # None when the video is age-gated
    Log('JSON_OBJ')
    Log(json_obj)
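
A minimal sketch of the branching described above, written so it runs outside Plex. It uses the standard `json` module in place of the framework's `JSON.ObjectFromString`. The function names, the sample payloads in the usage note, and the guess that the four sliced characters are a `)]}'` guard prefix are all assumptions; the thread does not confirm any of them.

```python
import json

# Assumption: the four characters removed with [4:] are a guard prefix such
# as ")]}'". The thread does not say what they actually are.
ASSUMED_PREFIX = ")]}'"


def parse_details(raw):
    """Strip the assumed prefix (if present) and decode the JSON body."""
    if raw.startswith(ASSUMED_PREFIX):
        raw = raw[len(ASSUMED_PREFIX):]
    return json.loads(raw)


def classify(parsed):
    """Return 'content', 'age_gate', or 'unknown' for a decoded response."""
    if parsed.get('content'):
        return 'content'
    # 'location' may be missing or null, so fall back to an empty string.
    if 'verify_age' in (parsed.get('location') or ''):
        return 'age_gate'
    return 'unknown'
```

For example, `classify(parse_details(")]}'" + '{"location": "/verify_age?next_url=/watch"}'))` yields `'age_gate'` for this made-up payload, and only that branch would need the extra page download.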

Error message:

2017-11-05 10:01:22,581 (-c8f84c0) :  CRITICAL (agentkit:1078) - Exception in the update function of agent named 'YouTube', called with guid 'com.plexapp.agents.youtube://uKZOBcZVMD0?lang=xn' (most recent call last):
  File "/volume1/@appstore/Plex Media Server/Resources/Plug-ins-1bf240a65/Framework.bundle/Contents/Resources/Versions/2/Python/Framework/api/agentkit.py", line 1076, in _update
    agent.update(obj, media, lang, **kwargs)
  File "/volume1/Plex/Library/Application Support/Plex Media Server/Plug-ins/YouTube-Agent.bundle/Contents/Code/__init__.py", line 56, in update
    f = urllib2.urlopen(req)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 1241, in https_open
    context=self._context)
  File "/volume1/@appstore/Plex Media Server/Resources/Python/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)>

I replicated how HTTP.py in the Plex framework does the request, so I am confused why my code gets this error. The ssl.py library in Plex is exactly the same as the one on my Synology server. I suspect Plex does something that prevents me from using urllib2 directly. Since it seems you work for Plex, is there any insight you could provide?
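
One possible cause, which is an assumption and not confirmed anywhere in this thread, is that the Python bundled with Plex looks for CA certificates in a location that does not exist on the host. A small diagnostic sketch like the following, run in both interpreters, would show where each one looks. The function name is hypothetical; `ssl.get_default_verify_paths()` is available from Python 2.7.9.

```python
import ssl


def describe_ca_paths():
    """Report where this interpreter's OpenSSL looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved locations; None means nothing was found on disk.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Compiled-in defaults, reported even if they do not exist.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        "openssl_version": ssl.OPENSSL_VERSION,
    }


if __name__ == "__main__":
    for key, value in sorted(describe_ca_paths().items()):
        print("%s: %s" % (key, value))
```

If `cafile` and `capath` both come back as `None` inside the plugin but not in the system Python, that would be consistent with the `CERTIFICATE_VERIFY_FAILED` error even though ssl.py itself is identical.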

@sander1
Owner

sander1 commented Nov 5, 2017

The problem is that even if you write code to follow the verify_age?next=... URL, it does not lead you to useful data. It only gives you other JSON-formatted data with a note that you have to log in to watch the video.

@ankenyr
Contributor Author

ankenyr commented Nov 6, 2017

It isn't about following the URL; the watch page itself has the data. You could forgo using HTTP.Request entirely and get the required information as shown below.

import urllib2
from BeautifulSoup import BeautifulSoup
req = urllib2.Request("https://www.youtube.com/watch?v=seRxE3b6m_w")
f = urllib2.urlopen(req)
resp = f.read()
soup = BeautifulSoup(resp)
print soup.find("strong", {"class": "watch-time-text"}).text
Published on Aug 12, 2009
print soup.find("div", {"class": "yt-user-info"}).text
vlogbrothers
print soup.find("div", {"id": "watch-description-text"}).text
OK...this is only a little bit embarrassing.It's a song about vegetables that look like penises...hopefully it won't get flagged I mean...they're just vegetables!HERE ARE A LOT OF LINKS TO NERDFIGHTASTIC THINGS:Shirts and Stuff:http://dftba.com/artist/30/VlogbrothersHank's Music:http://dftba.com/artist/15/Hank-GreenJohn's Books:http://amzn.to/j3LYqo======================Hank's Twitter:http://www.twitter.com/hankgreenHank's Facebook:http://www.facebook.com/hankimonHank's tumblr:http://edwardspoonhands.tumblr.comJohn's Twitter:http://www.twitter.com/realjohngreenJohn's Facebook:http://www.facebook.com/johngreenfansJohn's tumblr:http://fishingboatproceeds.tumblr.com======================Other ChannelsCrash Course:http://www.youtube.com/crashcourseSciShow:http://www.youtube.com/scishowGaming:http://www.youtube.com/hankgamesVidCon:http://www.youtube.com/vidconHank's Channel:http://www.youtube.com/hankschannelTruth or Fail:http://www.youtube.com/truthorfail======================Nerdfighteriahttp://effyeahnerdfighters.com/http://effyeahnerdfighters.com/nftumblrshttp://reddit.com/r/nerdfightershttp://nerdfighteria.info/A Bunny(\(\( - -)((') (')

I imagine there is a reason to use HTTP.py, as it does a lot of other stuff, but in this case it doesn't return everything we could use. If you think it makes more sense to modify HTTP.py, I am happy to make the changes there and send a PR. I figured this approach would be simpler, but that damn SSL error is stumping me. Let me know what you think, @sander1.
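
A sketch of the same extraction that does not fail when an element is missing (`soup.find(...)` returns `None` in that case, so `.text` would raise). To keep it runnable without third-party packages, it swaps BeautifulSoup for the standard library's `HTMLParser`; this is a substitute technique, not what the code above uses. The class and function names are hypothetical, and the selectors are the ones quoted above from the 2017 page, which may no longer exist.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# field name -> (tag, attribute, value), taken from the session above
DEFAULT_TARGETS = {
    "published": ("strong", "class", "watch-time-text"),
    "channel": ("div", "class", "yt-user-info"),
    "description": ("div", "id", "watch-description-text"),
}


class FieldExtractor(HTMLParser):
    """Collect the text inside the first-level match of each target element."""

    def __init__(self, targets):
        HTMLParser.__init__(self)
        self.targets = targets
        self.fields = {}
        self._current = None   # field currently being collected
        self._depth = 0        # nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get a matching end tag
        if self._current is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        for name, (t, attr, value) in self.targets.items():
            if tag == t and value in (attrs.get(attr) or "").split():
                self._current, self._depth = name, 1
                self.fields[name] = ""
                return

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_fields(html, targets=DEFAULT_TARGETS):
    """Return a dict of the fields found; missing fields are simply absent."""
    parser = FieldExtractor(targets)
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.fields.items()}
```

Callers can then use `extract_fields(resp).get("published")` and treat `None` as "not present on this page" instead of catching an exception.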

@ankenyr
Contributor Author

ankenyr commented Nov 8, 2017

Any thoughts on that SSL error @sander1?

@sander1
Owner

sander1 commented Nov 10, 2017

I kept running into SSL issues too, and I haven't been able to "fix" it within the Plex framework (I'm not the dev of the framework btw).

For now I use a dirty workaround that does not validate the SSL certificates by adding:

import ssl, urllib2

Then use a function like this:

def GetData(url):

    req = urllib2.Request(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"})
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    data = urllib2.urlopen(req, context=ssl_context).read()

    return data

Example in the HGTV Canada URL Service.

@ankenyr
Contributor Author

ankenyr commented Nov 10, 2017

I assume you would not want code that skips certificate validation checked in to the repo. Should we open an issue on the framework about this? It seems like a valid issue. I really appreciate your help, btw!
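
For comparison, a sketch of a variant that keeps validation on. A bare `ssl.SSLContext(ssl.PROTOCOL_TLSv1)` defaults to `CERT_NONE`, which is why the workaround above does not validate; `ssl.create_default_context()` instead enables certificate and hostname checks. The function names and the optional `cafile` argument (a path to a CA bundle you supply) are hypothetical, and whether this succeeds inside Plex's bundled Python is untested here.

```python
import ssl

try:
    from urllib.request import Request, urlopen   # Python 3
except ImportError:
    from urllib2 import Request, urlopen           # Python 2.7.9+


def make_verifying_context(cafile=None):
    """Build a context that verifies certificates and hostnames.

    cafile is an optional path to a CA bundle; with None, the
    interpreter's default trust store is used.
    """
    return ssl.create_default_context(cafile=cafile)


def get_data_verified(url, cafile=None):
    """Fetch url with certificate validation enabled."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    context = make_verifying_context(cafile)
    return urlopen(req, context=context).read()
```

If the default trust store is the problem inside the plugin, passing an explicit `cafile` would be one way to test that theory without disabling validation.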
