RA fails in batches due to TIMEOUT #1316

PHA-SYSOPS · 2023-06-20T10:00:56Z

Remote attestation (RA) appears to fail in batches. The error is two-pronged: it shows up differently in pruntime v1 and v2.

For v2 you get this error, which makes the cause hard to understand:
error 18; this may indicate that infrastructure for the epid attestation requested by gramine is missing on this machine

For v1 you get a more descriptive error: SGX_RA_TIMEOUT

There are 2 conditions that trigger this error:

1. Networking issues, mostly DNS, e.g. the Docker container uses the wrong DNS servers, such as 127.0.0.1, which won't work.
2. The reply from Intel is delayed, mostly due to routing issues between ISPs and Microsoft Azure.
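To rule out the DNS condition quickly, one option is to inspect the resolver configuration inside the container. The following is a minimal sketch, not part of pruntime: `loopback_nameservers` is a hypothetical helper that takes the text of a resolv.conf-style file and returns any `nameserver` entries pointing at a loopback address. A hit is only a hint, since Docker's embedded resolver on user-defined networks (127.0.0.11) is itself a loopback address and works fine.

```rust
use std::net::IpAddr;

/// Hypothetical helper: returns the `nameserver` entries in a
/// resolv.conf-style text that point at a loopback address.
fn loopback_nameservers(resolv_conf: &str) -> Vec<IpAddr> {
    resolv_conf
        .lines()
        .filter_map(|line| {
            // Only lines of the form "nameserver <address>" are considered;
            // comments, "search" and "options" lines are skipped.
            let mut parts = line.split_whitespace();
            match (parts.next(), parts.next()) {
                (Some("nameserver"), Some(addr)) => addr.parse::<IpAddr>().ok(),
                _ => None,
            }
        })
        .filter(|ip| ip.is_loopback())
        .collect()
}
```

Inside the container, this could be fed the contents of `/etc/resolv.conf` (for example via `std::fs::read_to_string`), with any returned addresses logged as a warning before attempting RA.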

We are specifically looking at the second condition here. If you run tcpdump, you will notice that the reply arrives more than 8 seconds after the request (in my case between 8.2 and 11.7 seconds). That is slow, but it should not be a problem in itself. Intel does not rate-limit by delaying replies; it only returns HTTP status codes for that (see: https://www.intel.in/content/www/in/en/support/articles/000090552/software/intel-security-products.html).

The underlying code for this is:

`fn get_report_from_intel(quote: &[u8], ias_key: &str) -> Result<(String, String, String)> {
    let encoded_quote = base64::encode(quote);
    let encoded_json = format!("{{\"isvEnclaveQuote\":\"{encoded_quote}\"}}\r\n");

    let mut res_body_buffer = Vec::new(); //container for body of a response
    let timeout = Some(Duration::from_secs(8));

    let url: reqwest::Url = format!("https://{ias_host}{ias_report_endpoint}").parse()?;
    info!(from=%url, "Getting RA report");`

As we can see, there is no failure handling or retry here, and the 8-second timeout is hardcoded. I would request that the timeout and the number of retries be made configurable via environment variables, and that the request be wrapped in a retry loop to solve this.
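A rough sketch of what this request could look like, using only the standard library. Everything here is an assumption for illustration: the variable names `RA_TIMEOUT_SECS` and `RA_MAX_RETRIES`, the `RaConfig` struct, and the `with_retries` helper are hypothetical and not taken from pruntime. The defaults mirror the current behaviour (8 seconds, no retries).

```rust
use std::env;
use std::time::Duration;

/// Hypothetical retry settings for the RA request.
#[derive(Debug, Clone, PartialEq)]
struct RaConfig {
    timeout: Duration,
    max_retries: u32,
}

/// Parses an optional string, falling back to `default` when the value
/// is absent or does not parse.
fn parse_or<T: std::str::FromStr>(value: Option<String>, default: T) -> T {
    value.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

impl RaConfig {
    /// Reads the settings from the environment (variable names are assumptions).
    fn from_env() -> Self {
        RaConfig {
            timeout: Duration::from_secs(parse_or(env::var("RA_TIMEOUT_SECS").ok(), 8u64)),
            max_retries: parse_or(env::var("RA_MAX_RETRIES").ok(), 0u32),
        }
    }
}

/// Calls `attempt` up to `1 + max_retries` times, passing the configured
/// timeout each time. Returns the first `Ok`, or the last `Err`.
fn with_retries<T, E>(
    cfg: &RaConfig,
    mut attempt: impl FnMut(Duration) -> Result<T, E>,
) -> Result<T, E> {
    let mut retries_used = 0;
    loop {
        match attempt(cfg.timeout) {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if retries_used >= cfg.max_retries => return Err(err),
            // Otherwise count the retry and try again.
            Err(_) => retries_used += 1,
        }
    }
}
```

In `get_report_from_intel`, the closure passed to `with_retries` would perform the HTTPS request using the timeout it receives instead of the hardcoded `Duration::from_secs(8)`. A delay between attempts could be added in the retry arm if immediate retries turn out to be too aggressive.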


kvinwang · 2023-06-20T10:44:14Z

Looks reasonable to me. Let me improve it later.

kvinwang mentioned this issue Jun 25, 2023

pruntime: Configurable RA timeout and retries #1321

Merged
