Skip to content
This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Response doesn't end cleanly. #27

Open
3 tasks
guanzo opened this issue May 31, 2023 · 4 comments
Open
3 tasks

Response doesn't end cleanly. #27

guanzo opened this issue May 31, 2023 · 4 comments

Comments

@guanzo
Copy link
Contributor

guanzo commented May 31, 2023

Run $ curl "http://138.199.41.51/ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG?format=car&dag-scope=entity&bytes=0:1048575" -k -v -o mexico-1mb.car -H "Accept: application/vnd.ipld.car"

138.199.41.51 is a core node. You can also repro with a localhost L1.

The request should timeout after 2 minutes with this error message: transfer closed with outstanding read data remaining

Full output

guanzo@DESKTOP-24333KG:~$ curl "http://localhost/ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG?format=car&dag-scope=entity&bytes=0:1048575" -k -v -o mexico-1mb.car -H "Accept: application/vnd.ipld.car"
*   Trying 127.0.0.1:80...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG?format=car&dag-scope=entity&car-scope=file&depth=1&entity-bytes=0:1048575&bytes=0:1048575 HTTP/1.1
> Host: localhost
> User-Agent: curl/7.81.0
> Accept: application/vnd.ipld.car
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx
< Date: Wed, 31 May 2023 03:08:04 GMT
< Content-Type: application/vnd.ipld.car; version=1
< Transfer-Encoding: chunked
< Connection: keep-alive
< cache-control: public, max-age=29030400, immutable
< content-disposition: attachment; filename="bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu.car"
< etag: bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu.car
< Server-Timing: fetch;dur=0.312074,indexer;dur=0.000067;candidates-found=118921159;candidates-filtered=119026583,retrieval;dur=118.942823;first-byte-received=12931692937
< Server-Timing: shim; dur=15900.205737
< Server-Timing: shim_lassie; dur=15899.976966
< Server-Timing: shim_lassie_headers; dur=15899.552953
< Server-Timing: shim_lassie_body; dur=0.63105
< x-ipfs-path: /ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG
< x-content-type-options: nosniff
< Saturn-Node-Id: 12926ed7-03b9-479e-a9b2-480f1d4a70b9
< Saturn-Node-Version: 0_dev
< Saturn-Transfer-Id: 60e903b530aa1d74e730d5754612db22
< Saturn-Cache-Status: HIT
< Server-Timing: nginx;dur=0, nginx_uct;dur=, nginx_uht;dur=, nginx_urt;dur=
< Timing-Allow-Origin: *
< Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Headers: DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range
< Access-Control-Expose-Headers: *
<
{ [51844 bytes data]
100 1024k    0 1024k    0     0   8722      0 --:--:--  0:02:00 --:--:--     0* transfer closed with outstanding read data remaining
100 1024k    0 1024k    0     0   8722      0 --:--:--  0:02:00 --:--:--     0
* Closing connection 0
curl: (18) transfer closed with outstanding read data remaining

I believe this is related to chunked transfer encoding. curl is waiting for the "terminating chunk" (0/r/n/r/n) so it knows the response is done.

If you remove the -o mexico-1mb.car, the request ends immediately and the error message changes to * Failed reading the chunked-encoded stream, which confirms it's related to chunked transfer encoding.

I know that nginx automatically handles chunked transfer encoding. What I don't know is how that interacts with modules that write the response body. Does nginx add the encoding before this module, after this module, or not at all if a module "claims" the response.

The expected behavior is for the curl response to end cleanly with no error. If you remove &bytes=0:1048575 from the request, it works fine because the module isn't involved.


Handling intentional missing terminating chunks.

If lassie is unable to download the full CAR, it will signal a streaming error by not sending the terminating chunk.

The L1 does the same.

Unfortunately, every http hop handler needs to explicitly forward this error signaling, I think the module may have to as well. But i'm not sure, this is something we need to investigate. What does the module see when the L1 closes uncleanly?


Action items

  • Investigate how nginx's automatic chunked transfer encoding behavior interacts with the module. Does the module need to implement chunked transfer encoding or can it hand the response to nginx's chunked transfer logic instead?
  • Determine if the module needs to explicitly forward unclean ends initiated by lassie/L1
  • Improve L1 integration test. Checking for http 200 is not enough. This is something I can do as it's not rust related.
    • Check the CAR file for specific cids based on the byte range request.
    • Check that the curl output does not contain chunked transfer encoding errors. Either transfer closed with outstanding read data remaining or Failed reading the chunked-encoded stream.
@guanzo
Copy link
Contributor Author

guanzo commented May 31, 2023

Reproduction in golang

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	_"time"
)

func main() {
	CID := "bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG"
	urlStr := fmt.Sprintf("http://localhost/ipfs/%s?format=car&dag-scope=entity&bytes=0:1048575", CID)
	fmt.Println(urlStr)

	req, err := http.NewRequest("GET", urlStr, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/vnd.ipld.car")

	client := http.Client{
		//Timeout: time.Second * 5, // Assume file is cached.
	}
	res, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	fmt.Println(res.StatusCode)

	file, err := os.Create("mexico.car")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	defer file.Close()

	_, err = io.Copy(file, res.Body)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
}

The script will timeout after 2 minutes with this output.

guanzo@DESKTOP-24333KG:~/workspace/filecoin-saturn/L1-replay-go$ go run cmd/test.go
http://localhost/ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG?format=car&dag-scope=entity&bytes=0:1048575
200
Error: unexpected EOF

guanzo@DESKTOP-24333KG:~/workspace/filecoin-saturn/L1-replay-go$ car ls mexico.car
bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu
bafybeigl2zyz4v2xqccop7wnkmdisytl7pswvjfqw3h3gnh4n4dgpxfnfu
bafkreihifi6v4or6ym4piecrabb7fojd3dlcx6yvrwnokez46fqp3kg67u

I can still car ls the file as its a valid CAR, the issue is the connection not closing cleanly.

This explains the high volume of "Unexpected EOF" errors that caboose is reporting. Since the connection just hangs until a timeout, it may have other negative impacts across rhea.

@willscott
Copy link
Contributor

when i run the curl command above i just get a timeout

@guanzo
Copy link
Contributor Author

guanzo commented May 31, 2023

Output from just now

guanzo@DESKTOP-24333KG:~$ curl "http://138.199.41.51/ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG?format=car&dag-scope=entity&bytes=0:1048575" -k -v -o mexico-1mb.car -H "Accept: application/vnd.ipld.car"
*   Trying 138.199.41.51:80...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to 138.199.41.51 (138.199.41.51) port 80 (#0)
> GET /ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG?format=car&dag-scope=entity&bytes=0:1048575 HTTP/1.1
> Host: 138.199.41.51
> User-Agent: curl/7.81.0
> Accept: application/vnd.ipld.car
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx
< Date: Wed, 31 May 2023 18:45:37 GMT
< Content-Type: application/vnd.ipld.car; version=1
< Transfer-Encoding: chunked
< Connection: keep-alive
< cache-control: public, max-age=29030400, immutable
< content-disposition: attachment; filename="bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu.car"
< etag: bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu.car
< Server-Timing: fetch;dur=0.062903,indexer;dur=0.000207;candidates-found=65907108;candidates-filtered=65974010,retrieval;dur=66.034397;connected=66047731;first-byte-received=831192398
< Server-Timing: shim; dur=871.48291
< Server-Timing: shim_lassie; dur=871.0812589999999
< Server-Timing: shim_lassie_headers; dur=870.569542
< Server-Timing: shim_lassie_body; dur=1.506149
< x-ipfs-path: /ipfs/bafybeifpz6onienrgwvb3mw5rg7piq5jh63ystjn7s5wk6ttezy2gy5xwu/Mexico.JPG
< x-content-type-options: nosniff
< Saturn-Node-Id: a6a3592c-7f1a-41f5-9293-11d6b5647e5e
< Saturn-Node-Version: 911_2015bfd
< Saturn-Transfer-Id: 47d4c50a421e0d2ff97b915da0354999
< Saturn-Cache-Status: HIT
< Server-Timing: nginx;dur=0, nginx_uct;dur=, nginx_uht;dur=, nginx_urt;dur=
< Timing-Allow-Origin: *
< Strict-Transport-Security: max-age=63072000; includeSubDomains; preload
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Access-Control-Allow-Headers: DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range
< Access-Control-Expose-Headers: *
<
{ [1460 bytes data]
100 1024k    0 1024k    0     0   113k      0 --:--:--  0:00:09 --:--:--     0

It downloads 1MB (1024k) then hangs for 2 minutes.

Should be reproducible on a local L1.

@guanzo
Copy link
Contributor Author

guanzo commented Jun 5, 2023

I think something is wrong with my machine, I can't repro on another machine.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants