car extract -f bafkr... (raw root cid) not possible #537

Open
agmap opened this issue Sep 19, 2024 · 8 comments
Labels
good first issue Good issue for new contributors kind/enhancement A net-new feature or improvement to an existing feature

Comments

@agmap

agmap commented Sep 19, 2024

I have some CAR files, fetched with lassie, whose roots are raw CIDs (bafk....).
I can't extract these CAR files using car extract -f <CID>.
I get the error: skipping raw root bafkr...

@willscott
Member

do you not want car get for this case?
Extract expects to work with unixfs data and in this case of a raw root there's no file name / metadata present to allow extraction.

@agmap
Author

agmap commented Sep 19, 2024

No, I want to extract the content into a real file. It is a small image that is not packed into a directory.
And yes, I know the file name and extension are not included.

@willscott
Member

but that's what car get is doing - it's getting a specific block from the car, and putting it into a file

@agmap
Author

agmap commented Sep 19, 2024

It is a single image smaller than 256 KB, so its CID starts with "bafk". Running car extract -f bafk....car does not work.
The image seems to be just one single block, because it is small.

An image bigger than 256 KB has a CID that begins with "bafy". Running car extract -f bafy....car works.

So I am really puzzled.

Also, I can't find a car get command in the CLI.
So how do I extract that image? And why do I need to use a different command for the same purpose?

@willscott
Member

https://github.com/ipld/go-car/blob/master/cmd/car/get.go#L37

car get-block

@agmap
Author

agmap commented Sep 19, 2024

OK, that works:
car get-block <CID>.car <CID> 1.png

But I also want to use car extract -f <CID>.
Otherwise I always have to check first whether it is a small or a larger file and use different commands for the same purpose.

It would really help if car extract -f <CID> worked for all files.
Can you please also look into the lassie issues I created on GitHub?
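
Until extract handles raw roots, the "check first" step described above can be done on the CID string itself rather than on the file size. The following is only a sketch under stated assumptions: it handles base32 CIDv1 strings (the "bafk..." / "bafy..." form seen in this thread), decodes them with the Go standard library instead of go-cid, and the names rootCodec and subcommandFor are made up for illustration. It returns only the subcommand names mentioned in this thread, extract and get-block.

```go
package main

import (
	"encoding/base32"
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

// Multicodec codes for the two root kinds discussed in this thread.
const (
	codecRaw   = 0x55 // raw bytes; base32 CIDv1 strings start with "bafk"
	codecDagPB = 0x70 // dag-pb (UnixFS); base32 CIDv1 strings start with "bafy"
)

// cidBase32 is the lowercase, unpadded base32 alphabet behind the multibase prefix "b".
var cidBase32 = base32.NewEncoding("abcdefghijklmnopqrstuvwxyz234567").WithPadding(base32.NoPadding)

// rootCodec (hypothetical helper) returns the multicodec code of a base32 CIDv1 string.
func rootCodec(c string) (uint64, error) {
	if len(c) < 2 || c[0] != 'b' {
		return 0, errors.New("not a base32 (multibase 'b') CID")
	}
	raw, err := cidBase32.DecodeString(c[1:])
	if err != nil {
		return 0, fmt.Errorf("decoding base32: %w", err)
	}
	if len(raw) < 2 || raw[0] != 0x01 {
		return 0, errors.New("not a CIDv1")
	}
	// The codec follows the version byte as an unsigned varint.
	codec, n := binary.Uvarint(raw[1:])
	if n <= 0 {
		return 0, errors.New("malformed codec varint")
	}
	return codec, nil
}

// subcommandFor (hypothetical helper) maps a root codec to the car subcommand
// that currently works for it according to this thread.
func subcommandFor(codec uint64) string {
	switch codec {
	case codecRaw:
		return "get-block"
	case codecDagPB:
		return "extract"
	default:
		return "" // other codecs are out of scope for this sketch
	}
}

func main() {
	// Usage: pass one or more CID strings as arguments.
	for _, c := range os.Args[1:] {
		codec, err := rootCodec(c)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", c, err)
			continue
		}
		fmt.Printf("%s: codec 0x%x, subcommand %q\n", c, codec, subcommandFor(codec))
	}
}
```

Checking the codec is more reliable than guessing from the file size, since whether a file ends up as a single raw block depends on how it was added, not only on how big it is.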

@willscott willscott added kind/enhancement A net-new feature or improvement to an existing feature good first issue Good issue for new contributors labels Sep 19, 2024
@rvagg
Member

rvagg commented Sep 25, 2024

Reasonable request, the problem is here:

if root.Prefix().Codec == cid.Raw {
	if verbose {
		fmt.Fprintf(logger, "skipping raw root %s\n", root)
	}
	return 0, nil
}

The complication is that if you've bundled your file without a containing directory, all we have is a block of raw bytes with no metadata, so we don't know its name, which makes extracting it a problem. We don't even know that this block is a "file" per se; it's just a block of bytes. Ideally you should bundle your files wrapped in a directory; if you're using kubo there's a command-line option for that, IIRC. That results in two blocks, but at least the first block contains metadata and clearly indicates that the block is unixfs and therefore a file.

When we get to the condition I linked above, we could just extract the raw as a file with the CID as its name, and print to stderr that this was done because the root was not wrapped in a unixfs directory. The user needs some kind of warning that an assumption was made.
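
A rough sketch of that fallback, using only the standard library, might look like the following. This is not the go-car implementation: extractRawRoot, its signature, and the warning text are assumptions made for illustration, and the logger variable simply mirrors the name used in the snippet above, pointed at stderr. In the real command the block bytes would come from the CAR's blockstore; here they are passed in directly.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// logger mirrors the name used in the extract snippet; stderr is assumed here.
var logger io.Writer = os.Stderr

// extractRawRoot (hypothetical helper) writes the bytes of a raw root block to
// dir/<cidStr> and warns that the file name is an assumption, because a raw
// block carries no unixfs metadata.
func extractRawRoot(dir, cidStr string, data []byte) (string, error) {
	// Refuse anything that could escape dir; a CID string never contains a
	// path separator, so this only rejects malformed input.
	if cidStr == "" || cidStr == "." || cidStr == ".." || filepath.Base(cidStr) != cidStr {
		return "", errors.New("CID string is not usable as a file name")
	}
	path := filepath.Join(dir, cidStr)
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", err
	}
	fmt.Fprintf(logger,
		"warning: root %s is a raw block not wrapped in a unixfs directory; wrote %d bytes to %s\n",
		cidStr, len(data), path)
	return path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "car-extract-sketch")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	// Placeholder name and bytes; real values would come from the CAR file.
	path, err := extractRawRoot(dir, "bafkrei-placeholder", []byte("raw block bytes"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("extracted to", path)
}
```

Printing the warning to stderr keeps stdout clean for scripts while still telling the user that the file name was made up from the CID.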

@agmap
Author

agmap commented Sep 25, 2024

In about 69% of all my NFTs, the IPFS file links in the NFT's URI are not wrapped in a directory, so the file name and file extension are missing.
Please extract the file to my filesystem with the CID as the file name, either without a file extension or with an extension derived from the file signature.
When I load the CID through the public gateway ipfs.io and download it, it has the correct file extension. Browsers also seem to derive the file extension from the magic numbers / file signatures.
https://en.wikipedia.org/wiki/List_of_file_signatures
https://mark0.net/soft-trid-e.html
https://gist.github.com/qti3e/6341245314bf3513abb080677cd1c93b
https://www.garykessler.net/library/file_sigs.html
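
For illustration, signature-based extension guessing can be sketched in Go with the standard library's content sniffer, http.DetectContentType, which looks at the first bytes of the data. This is an assumption about one possible approach, not a description of what the gateway or go-car does; the extensionFor helper and the small MIME-to-extension table are made up for the example and cover only a few common types.

```go
package main

import (
	"fmt"
	"net/http"
)

// extByMIME is a deliberately small, hand-picked table (an assumption for this
// sketch); anything not listed gets no extension.
var extByMIME = map[string]string{
	"image/png":       ".png",
	"image/jpeg":      ".jpg",
	"image/gif":       ".gif",
	"image/webp":      ".webp",
	"application/pdf": ".pdf",
	"video/mp4":       ".mp4",
}

// extensionFor (hypothetical helper) sniffs the leading bytes of data and
// returns a file extension, or "" when the type is unknown or not in the table.
func extensionFor(data []byte) string {
	return extByMIME[http.DetectContentType(data)]
}

func main() {
	// The 8-byte PNG file signature stands in for a real block here.
	pngHeader := []byte("\x89PNG\r\n\x1a\n")
	fmt.Printf("guessed extension: %q\n", extensionFor(pngHeader))
}
```

An extracted raw root could then be named with the CID plus the guessed extension, falling back to the bare CID when the lookup returns an empty string.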

3 participants