Writing non-200 messages to the console #14

Open
lanegoolsby opened this issue Jun 8, 2022 · 0 comments

I am trying to crawl a site that intermittently returns an error response code. I suspect the site is returning an HTTP 403 response with an empty payload because it has request rate throttling enabled. However, I can't confirm that, because the crawler's output does not include enough detail.

Is there a way to get more verbose messages? If not, could this be added?

I am running the crawler as a Docker image on Mac.

Here's the error message I receive.

DEBUG:typesense.api_call:our.internal.typesense.server:443 is healthy. Status code: 400
ERROR:scrapy.core.scraper:Spider error processing <GET https://our.internal.site/some/path/> (referer: None)
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/root/src/documentation_spider.py", line 177, in parse_from_start_url
    self.add_records(response, from_sitemap=False)
  File "/root/src/documentation_spider.py", line 149, in add_records
    self.typesense_helper.add_records(records, response.url, from_sitemap)
  File "/root/src/typesense_helper.py", line 63, in add_records
    transformed_records[i:i + 50])
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/documents.py", line 56, in import_
    api_response = self.api_call.post(self._endpoint_path('import'), docs_import, params, as_json=False)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 145, in post
    timeout=self.config.connection_timeout_seconds)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 113, in make_request
    error_message = r.json().get('message', 'API error.')
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
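
The traceback suggests the failure happens while the error message itself is being extracted: `r.json()` is called on a response whose body is apparently empty or not JSON, so the underlying status code and body never reach the console. As a rough sketch of the kind of verbose output I have in mind (the helper names `describe_error_response` and `log_if_error` are hypothetical and are not part of the typesense client or the scraper), the status code and a snippet of the body could be reported without assuming the body is JSON:

```python
import json
import logging

logger = logging.getLogger("crawler.diagnostics")  # hypothetical logger name


def describe_error_response(status_code, body_text, max_body_chars=200):
    """Build a readable message for an error response without assuming JSON."""
    snippet = (body_text or "").strip()
    if not snippet:
        # An empty payload is reported explicitly instead of raising.
        return f"HTTP {status_code} with an empty body"
    try:
        payload = json.loads(snippet)
    except json.JSONDecodeError:
        # Fall back to a truncated raw body (for example an HTML error page).
        return f"HTTP {status_code} with a non-JSON body: {snippet[:max_body_chars]!r}"
    if isinstance(payload, dict) and "message" in payload:
        return f"HTTP {status_code}: {payload['message']}"
    return f"HTTP {status_code} with JSON body: {snippet[:max_body_chars]}"


def log_if_error(status_code, body_text):
    """Log a descriptive error line for any non-2xx status code."""
    if not 200 <= status_code < 300:
        logger.error(describe_error_response(status_code, body_text))
```

With something like this, calling `log_if_error(403, "")` would log the status code and note that the body was empty, instead of surfacing only a `JSONDecodeError`.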