I am trying to crawl a site that intermittently returns an error response code. I suspect the site is returning an HTTP 403 response with an empty payload because it has request-rate throttling enabled. However, I can't confirm that, because the crawler doesn't report enough detail to confirm or rule it out.
Is there a way to get more verbose error messages? If not, could this be added?
I am running the crawler as a Docker image on a Mac.
Here's the error message I receive.
```
DEBUG:typesense.api_call:our.internal.typesense.server:443 is healthy. Status code: 400
ERROR:scrapy.core.scraper:Spider error processing <GET https://our.internal.site/some/path/> (referer: None)
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/twisted/internet/defer.py", line 662, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/root/src/documentation_spider.py", line 177, in parse_from_start_url
    self.add_records(response, from_sitemap=False)
  File "/root/src/documentation_spider.py", line 149, in add_records
    self.typesense_helper.add_records(records, response.url, from_sitemap)
  File "/root/src/typesense_helper.py", line 63, in add_records
    transformed_records[i:i + 50])
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/documents.py", line 56, in import_
    api_response = self.api_call.post(self._endpoint_path('import'), docs_import, params, as_json=False)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 145, in post
    timeout=self.config.connection_timeout_seconds)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/typesense/api_call.py", line 113, in make_request
    error_message = r.json().get('message', 'API error.')
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/requests/models.py", line 900, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
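From the traceback, the exception appears to be raised in `make_request` in `typesense/api_call.py`, during the `import_` call that sends records to the Typesense server. That code calls `r.json()` on a response body that is not JSON (likely empty), so the `JSONDecodeError` hides the HTTP status code. Below is a minimal sketch of a more informative error message. It is not the library's actual code. `describe_error` and `FakeResponse` are hypothetical names. `FakeResponse` stands in for a `requests.Response` and exposes only the `status_code`, `text`, and `json()` members the sketch uses.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for requests.Response (only the members used below)."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        # Like requests, parsing an empty or non-JSON body raises a JSON decode error.
        return json.loads(self.text)


def describe_error(response):
    """Hypothetical helper: build an error message that always includes the status code.

    If the body is not valid JSON, fall back to (a prefix of) the raw body,
    or a placeholder when the body is empty, instead of letting the
    decode error escape and hide the status code.
    """
    try:
        payload = response.json()
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError, and so is the
        # JSON decode error raised by requests' Response.json().
        body = response.text.strip() or "<empty body>"
        return f"HTTP {response.status_code}: {body[:200]}"
    if isinstance(payload, dict):
        message = payload.get("message", "API error.")
    else:
        message = "API error."
    return f"HTTP {response.status_code}: {message}"


if __name__ == "__main__":
    print(describe_error(FakeResponse(403, "")))
    print(describe_error(FakeResponse(400, '{"message": "Bad JSON."}')))
```

With this approach, an empty 403 would surface as `HTTP 403: <empty body>` rather than `Expecting value: line 1 column 1 (char 0)`, which would be enough to confirm or rule out the throttling theory.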