
Please complete doc and test it #3

Open
movingheart opened this issue Jun 27, 2016 · 2 comments

@movingheart

Some suggestions:

  1. Complete the documentation on how to use this; please give an example in Scrapy.
  2. This code has some bugs, e.g. https://github.com/movingheart/django_example/blob/master/QQ%E5%9B%BE%E7%89%8720160628005154.png
@denity commented Apr 16, 2017

Has this bug been fixed yet?

@redapple (Contributor)

@denity, if you're referring to:

2017-05-18 11:25:57 [twisted] CRITICAL: Unhandled Error
Traceback (most recent call last):
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/twisted/protocols/basic.py", line 571, in dataReceived
    why = self.lineReceived(line)
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/twisted/web/http.py", line 1811, in lineReceived
    self.allContentReceived()
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/twisted/web/http.py", line 1906, in allContentReceived
    req.requestReceived(command, path, version)
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/twisted/web/http.py", line 771, in requestReceived
    self.process()
--- <exception caught here> ---
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/twisted/web/server.py", line 190, in process
    self.render(resrc)
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/twisted/web/server.py", line 241, in render
    body = resrc.render(self)
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/scrapy_jsonrpc/txweb.py", line 11, in render
    return self.render_object(r, txrequest)
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/scrapy_jsonrpc/txweb.py", line 14, in render_object
    r = self.json_encoder.encode(obj) + "\n"
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/scrapy_jsonrpc/serialize.py", line 89, in encode
    return super(ScrapyJSONEncoder, self).encode(o)
  File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
    return _iterencode(o, 0)
  File "/home/paul/.virtualenvs/scrapy-jsonrpc.py2/local/lib/python2.7/site-packages/scrapy_jsonrpc/serialize.py", line 109, in default
    return super(ScrapyJSONEncoder, self).default(o)
  File "/usr/lib/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
exceptions.TypeError: <scrapy.crawler.Crawler object at 0x7f14cac75dd0> is not JSON serializable

which you get when accessing http://localhost:<webserviceport>/crawler,
then I believe it's not a valid bug.
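
For context, that failure is just the standard library's JSON encoder refusing an object it doesn't know how to serialize. A minimal stand-in class (not the real scrapy.crawler.Crawler) reproduces the same TypeError:

import json

class FakeCrawler(object):
    """Stand-in for scrapy.crawler.Crawler -- any plain object triggers this."""
    pass

try:
    json.dumps(FakeCrawler())
except TypeError as e:
    print(e)  # "... is not JSON serializable" (Python 2.7 wording)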

With Python 2.7, Scrapy 1.3.3, scrapy-jsonrpc, and a simple spider like this:

# -*- coding: utf-8 -*-
import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]

    def start_requests(self):
        for i in range(0, 1000):
            yield scrapy.Request('http://httpbin.org/get?q=%d' % i)

    def parse(self, response):
        pass
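
The extension itself is enabled in the project settings. The snippet below follows the scrapy-jsonrpc README, but treat the setting names as assumptions and verify them against your installed version:

# settings.py -- enable the JSON-RPC web service extension
# (names taken from the scrapy-jsonrpc README; verify for your version)
EXTENSIONS = {
    'scrapy_jsonrpc.webservice.WebService': 500,
}
JSONRPC_ENABLED = True  # the service in the session below listens on port 6025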

I also get that error when accessing the webservice endpoint in my browser.

But this is not the intended way to interact with this RPC extension.

Users should use it the way example-client.py does.

Example usage:
(note: the warnings below should be addressed with #11)

$ python example-client.py -H localhost -P 6025 list-running
/home/paul/src/scrapy-jsonrpc/scrapy_jsonrpc/serialize.py:8: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
spider:7f9fe4276890:example

Internally, this does an HTTP GET on /crawler/engine/open_spiders:

GET /crawler/engine/open_spiders HTTP/1.1
Accept-Encoding: identity
Host: localhost:6025
Connection: close
User-Agent: Python-urllib/2.7

HTTP/1.1 200 OK
Content-Length: 32
Access-Control-Allow-Headers:  X-Requested-With
Server: TwistedWeb/17.1.0
Connection: close
Date: Thu, 18 May 2017 09:34:49 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE
Content-Type: application/json

["spider:7f9fe4276890:example"]

In other words, the /crawler resource is not usable directly (at least with GET in a browser).
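
For completeness, that same GET is easy to replicate outside the example client; here is a minimal sketch (Python 3 stdlib, host and port taken from the session above):

import json
from urllib.request import urlopen

# GET /crawler/engine/open_spiders, as example-client.py's list-running does
with urlopen("http://localhost:6025/crawler/engine/open_spiders") as resp:
    open_spiders = json.loads(resp.read().decode("utf-8"))

print(open_spiders)  # e.g. ["spider:7f9fe4276890:example"]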

The example client has bugs too, though: stats, for example, are available at /crawler/stats, not /stats.

list-available does a POST on /crawler/spiders:

$ python example-client.py -H localhost -P 6025 list-available
/home/paul/src/scrapy-jsonrpc/scrapy_jsonrpc/serialize.py:8: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
/home/paul/src/scrapy-jsonrpc/scrapy_jsonrpc/jsonrpc.py:40: ScrapyDeprecationWarning: Call to deprecated function unicode_to_str. Use scrapy.utils.python.to_bytes instead.
  data = unicode_to_str(json.dumps(req))
example

POST /crawler/spiders HTTP/1.1
Accept-Encoding: identity
Content-Length: 59
Host: localhost:6025
Content-Type: application/x-www-form-urlencoded
Connection: close
User-Agent: Python-urllib/2.7

{"params": {}, "jsonrpc": "2.0", "method": "list", "id": 1}

HTTP/1.1 200 OK
Content-Length: 51
Access-Control-Allow-Headers:  X-Requested-With
Server: TwistedWeb/17.1.0
Connection: close
Date: Thu, 18 May 2017 09:37:16 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE
Content-Type: application/json

{"jsonrpc": "2.0", "result": ["example"], "id": 1}

get-global-stats does another POST, this time on /crawler/stats:

$ python example-client.py -H localhost -P 6025 get-global-stats
/home/paul/src/scrapy-jsonrpc/scrapy_jsonrpc/serialize.py:8: ScrapyDeprecationWarning: Module `scrapy.spider` is deprecated, use `scrapy.spiders` instead
  from scrapy.spider import Spider
/home/paul/src/scrapy-jsonrpc/scrapy_jsonrpc/jsonrpc.py:40: ScrapyDeprecationWarning: Call to deprecated function unicode_to_str. Use scrapy.utils.python.to_bytes instead.
  data = unicode_to_str(json.dumps(req))
log_count/DEBUG                          115
scheduler/dequeued                       113
log_count/INFO                           12
downloader/response_count                113
downloader/response_status_count/200     113
log_count/WARNING                        4
scheduler/enqueued/memory                113
downloader/response_bytes                72569
start_time                               2017-05-18 09:32:18
scheduler/dequeued/memory                113
scheduler/enqueued                       113
downloader/request_bytes                 24743
response_received_count                  113
downloader/request_method_count/GET      114
downloader/request_count                 114


POST /crawler/stats HTTP/1.1
Accept-Encoding: identity
Content-Length: 64
Host: localhost:6025
Content-Type: application/x-www-form-urlencoded
Connection: close
User-Agent: Python-urllib/2.7

{"params": {}, "jsonrpc": "2.0", "method": "get_stats", "id": 1}

HTTP/1.1 200 OK
Content-Length: 528
Access-Control-Allow-Headers:  X-Requested-With
Server: TwistedWeb/17.1.0
Connection: close
Date: Thu, 18 May 2017 09:38:54 GMT
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PATCH, PUT, DELETE
Content-Type: application/json

{"jsonrpc": "2.0", "result": {"log_count/DEBUG": 115, "scheduler/dequeued": 113, "log_count/INFO": 12, "downloader/response_count": 113, "downloader/response_status_count/200": 113, "log_count/WARNING": 4, "scheduler/enqueued/memory": 113, "downloader/response_bytes": 72569, "start_time": "2017-05-18 09:32:18", "scheduler/dequeued/memory": 113, "scheduler/enqueued": 113, "downloader/request_bytes": 24743, "response_received_count": 113, "downloader/request_method_count/GET": 114, "downloader/request_count": 114}, "id": 1}
