
transient errors #38

Open
thehesiod opened this issue Feb 4, 2019 · 7 comments

@thehesiod

thehesiod commented Feb 4, 2019

Have started seeing more of these:
InternalTransientError: Temporary error in fetching URL: https://www.googleapis.com/bigquery/v2/projects/[PROJ]/datasets/gae_streaming/tables/Host/insertAll, please re-try at _get_fetch_result (/base/alloc/tmpfs/dynamic_runtimes/python27g/7e468a4e2dbc991a/python27/python27_lib/versions/1/google/appengine/api/urlfetch.py:446)

as well:
TimeoutError: (<requests.packages.urllib3.contrib.appengine.AppEngineManager object at 0x2a583ad9b4d0>, DeadlineExceededError('Deadline exceeded while waiting for HTTP response from URL: https://www.googleapis.com/bigquery/v2/projects/[PROJ]/datasets/gae_streaming/tables/Host/insertAll',))

and
ConnectionError: Connection closed unexpectedly by server at URL: https://www.googleapis.com/bigquery/v2/projects/[PROJ]/datasets/gae_streaming/tables/Host/insertAll

Sounds like there are some missing retries. Does this mean that our BigQuery tables will be missing entries? If these are retried, then something should be changed so they don't show up in Stackdriver error reporting.

@thehesiod
Author

another:

BadGateway: 502 POST https://www.googleapis.com/bigquery/v2/projects/[PROJ]/datasets/gae_streaming/tables/Host/insertAll: <!DOCTYPE html> <html lang=en> <meta charset=utf-8> <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width"> <title>Error 502 (Server Error)!!1</title> <style> *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px} </style> <a href=//www.google.com/><span id=logo aria-label=Google></span></a> <p><b>502.</b> <ins>That’s an error.</ins> <p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>That’s all we know.</ins>
at api_request (/base/data/home/apps/m~[PROJ]/20190130t125536.415774778466604461/external/gcloud_core_archive/google/cloud/_http.py:293)
at retry_target (/base/data/home/apps/m~[PROJ]/20190130t125536.415774778466604461/external/gcloud_api_core_archive/google/api_core/retry.py:177)
at retry_wrapped_func (/base/data/home/apps/m~[PROJ]/20190130t125536.415774778466604461/external/gcloud_api_core_archive/google/api_core/retry.py:260)
at _call_api (/base/data/home/apps/m~[PROJ]/20190130t125536.415774778466604461/external/gcloud_bigquery_archive/google/cloud/bigquery/client.py:311)

@msuozzo
Member

msuozzo commented Feb 5, 2019

Given that these requests are issued from deferred tasks, I believe these failures will be retried by the queue itself.
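
For context, the insertions get enqueued roughly like this (a minimal sketch; the task body and row payload are hypothetical, not upvote's actual code), so a task that raises simply gets re-run by the queue:

```python
# Sketch only: hypothetical task body and payload, assuming the App Engine
# Python 2.7 runtime and the deferred library.
from google.appengine.ext import deferred


def _stream_rows(table_name, rows):
    """Hypothetical task body that performs the BigQuery insertAll call."""
    # ... build the BigQuery client and insert `rows` into `table_name` ...
    # Any exception raised here fails the task, and the push queue re-runs it
    # according to the retry_parameters configured in queue.yaml.
    pass


# _queue routes the task onto the queue whose retry policy applies.
deferred.defer(_stream_rows, 'Host', [{'hostname': 'example'}],
               _queue='bigquery-streaming')
```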

@chief8192
Contributor

Yeah all BigQuery insertions take place on the bigquery-streaming queue, which is configured to retry a lot:
https://github.com/google/upvote/blob/master/upvote/gae/queue.yaml#L102-L103
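
For anyone following along, a queue's retry behavior is driven by a retry_parameters block in queue.yaml; the values below are purely illustrative (the real bigquery-streaming settings are at the link above):

```yaml
# Illustrative shape only -- see the linked queue.yaml for the actual values.
- name: bigquery-streaming
  rate: 5/s
  retry_parameters:
    task_retry_limit: 10       # stop retrying after this many attempts
    min_backoff_seconds: 1     # initial delay before the first retry
    max_backoff_seconds: 300   # cap on the exponential backoff
```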

@thehesiod
Author

Cool. How could the code be updated so that it doesn't log Stackdriver errors that then need to be investigated?

@thehesiod
Author

or maybe this is a stackdriver/queue configuration thing?

@msuozzo
Member

msuozzo commented Feb 5, 2019

So you're getting separate "potential issues" filed because the failures are occurring at different places in the App Engine stack. These should eventually peter out (there are only so many places an HTTP request can fail... maybe), but the better alternative would be to surround the potential request site(s) (link, which is actually like 60 lines...) with a broad try..except that raises a new error or (maybe) re-raises the original.
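
Something like this (a rough sketch; the wrapper exception and function names are made up, not what upvote actually does):

```python
# Sketch: funnel every transport-level failure from the insert call into one
# exception type so Stackdriver error reporting groups them as a single error.
class TransientBigQueryInsertError(Exception):
    """Single wrapper type for transient insertAll failures."""


def _insert_rows_wrapped(bq_client, table, rows):
    try:
        return bq_client.insert_rows(table, rows)
    except Exception as e:  # deliberately broad: urlfetch, urllib3, api_core, ...
        # Re-raising as one type still fails the deferred task (so the queue
        # retries it), but all the distinct stack locations collapse into a
        # single Stackdriver "potential issue".
        raise TransientBigQueryInsertError('insertAll failed: %r' % (e,))
```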

With this, you should only get a single alert in stackdriver for a unique error (although you'll still see the errors occurring in any graphs or request metrics).

@thehesiod
Author

Yes, so how would we change it so that it's only logged as an error after it has run out of retries? That's a lot of noise to keep track of. Seems like a general issue with tasks.
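
E.g. something along these lines (just a sketch; MAX_RETRIES is made up and would need to track task_retry_limit in queue.yaml, and reading the X-AppEngine-TaskRetryCount header from os.environ assumes the Python 2.7 runtime):

```python
import logging
import os

from google.appengine.ext import deferred

MAX_RETRIES = 10  # hypothetical; would need to match task_retry_limit in queue.yaml


def _insert_rows_quietly(bq_client, table, rows):
    try:
        return bq_client.insert_rows(table, rows)
    except Exception:
        # App Engine sets X-AppEngine-TaskRetryCount on task queue requests.
        retry_count = int(os.environ.get('HTTP_X_APPENGINE_TASKRETRYCOUNT', '0'))
        if retry_count < MAX_RETRIES:
            logging.warning('Transient insertAll failure (attempt %d); retrying.',
                            retry_count)
            # SingularTaskFailure makes the deferred handler fail the task
            # without logging an error-level exception, so the queue still
            # retries it but Stackdriver error reporting stays quiet.
            raise deferred.SingularTaskFailure()
        raise  # retries exhausted: let the real error surface as an error
```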
