Add Archiver.stream(ArchiveStream) support #51

Open
netwolfuk opened this issue Oct 15, 2016 · 8 comments

@netwolfuk

A Debian archive file (.deb) is an "ar" file containing a set of tar.gz files.

I'd love to be able to pass the ArchiveStream from the "ar" into a new Archiver.stream() call to extract a file from the embedded tar.gz.

I'm not familiar enough with Streams to know if that would work. Archiver.stream() appears to take a File only.

Thoughts?

thrau added the feature label Oct 15, 2016
@thrau
Owner

thrau commented Oct 15, 2016

i've never thought much about nesting archive streams, it's certainly not possible in any way with the current API.

but it sounds like a fun thing to implement. on first glance i think this will involve some wrapping mechanism of the stream in CommonsArchiveEntry. it also means extending ArchiveEntry. i have some time on my hands tomorrow, i'll have a look at it.

@thrau
Owner

thrau commented Oct 16, 2016

actually, this should work

Archiver arArchiver = ArchiverFactory.createArchiver("ar");
Archiver tarGzArchiver = ArchiverFactory.createArchiver("tar", "gz");

ArchiveStream stream = arArchiver.stream(new File("/home/thomas/bar.ar"));

ArchiveEntry entry;
while ((entry = stream.getNextEntry()) != null) {
    if (entry.getName().endsWith(".tar.gz")) {
        tarGzArchiver.extract(stream, new File("/tmp/")); // will extract the contents of the nested archive
        // will close the stream! see #52
    }
}

stream.close();

the problem in the current version is that Archiver.extract(InputStream, File) will close the input stream. therefore any subsequent calls to the stream will throw an IOException. you can extract one file. which is pretty stupid i admit.
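
For readers stuck on a version where the consumer still closes its input, one common workaround is to hand it a wrapper whose close() does nothing. The sketch below uses only JDK classes; `NonClosingInputStream` and `NonClosingDemo` are hypothetical names, not part of jarchivelib, and `readOneByteAndClose` is only a stand-in for a consumer that closes its input, not the library's actual extract method.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper (not a jarchivelib class): forwards reads, ignores close(). */
final class NonClosingInputStream extends FilterInputStream {
    NonClosingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public void close() {
        // Deliberately a no-op so the caller keeps ownership of the wrapped stream.
    }
}

final class NonClosingDemo {
    /** Stand-in for a consumer that closes its input when it is done. */
    static int readOneByteAndClose(InputStream in) throws IOException {
        try (InputStream s = in) {
            return s.read();
        }
    }

    /** Returns true if the outer stream is still open and readable after the consumer ran. */
    static boolean outerSurvivesConsumer() {
        final boolean[] closed = {false};
        InputStream outer = new ByteArrayInputStream(new byte[] {1, 2, 3}) {
            @Override
            public void close() throws IOException {
                closed[0] = true; // record that close() reached the outer stream
                super.close();
            }
        };
        try {
            int first = readOneByteAndClose(new NonClosingInputStream(outer));
            int second = outer.read(); // the outer stream continues where the consumer stopped
            return first == 1 && second == 2 && !closed[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(outerSurvivesConsumer());
    }
}
```

With such a wrapper the loop over the outer archive can keep calling getNextEntry() after each nested extraction, and the owner of the outer stream closes it once at the end.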

i'll fix #52 and deploy a snapshot

@thrau
Owner

thrau commented Oct 16, 2016

you can use 0.8.0-SNAPSHOT to try it

@netwolfuk
Author

Wow, thanks for the speedy response. I've been thinking about this over the weekend and have now realised that my request was very poorly worded.
Inside the second stream, I am streaming out the file to a ByteArrayOutputStream, which means I may not even need to go near the filesystem. I know I did say "extract" but I am really streaming it.

My current code looks like this:

    @Rule
    public TemporaryFolder folder = new TemporaryFolder();
    @Test
    public void getControlFileAsStringTest() throws IOException {
        File controlTarGz = getControlTarGzFromDeb(new File("src/test/resources/build-essential_11.6ubuntu6_amd64.deb"),
                folder.getRoot());
        String controlFileContents = getControlFromControlTarGz(controlTarGz);
        System.out.println(controlFileContents);
    }


    public File getControlTarGzFromDeb(File debFile, File tmpLocation) throws IOException {

        Archiver archiver = ArchiverFactory.createArchiver(ArchiveFormat.AR);
        ArchiveStream stream = archiver.stream(debFile);
        ArchiveEntry entry;

        File controlTarGzFile = null;

        while((entry = stream.getNextEntry()) != null) {
            // access each archive entry individually using the stream
            // or extract it using entry.extract(destination)
            // or fetch meta-data using entry.getName(), entry.isDirectory(), ...
            System.out.println(entry.getName());
            if (entry.getName().equals("control.tar.gz")){
                controlTarGzFile = entry.extract(tmpLocation);
            }
        }
        stream.close();

        return controlTarGzFile;
    }

    public String getControlFromControlTarGz(File controlTarGzFile) throws IOException {
        Archiver archivertgz = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP);
        ArchiveStream stream = archivertgz.stream(controlTarGzFile);
        ArchiveEntry entry;
        ByteArrayOutputStream baos= new ByteArrayOutputStream();

        while((entry = stream.getNextEntry()) != null) {
            if (entry.getName().equals("./control")){
                IOUtils.copy(stream, baos);
            }
        }
        stream.close(); // release the underlying file handle, as getControlTarGzFromDeb does

        return baos.toString( StandardCharsets.UTF_8.toString() );
    }

Ultimately, it would be really nice to stream from the entry. Something like this:

    public String getControlStringFromArFile(File arFile) throws IOException {
        Archiver archiverAr = ArchiverFactory.createArchiver(ArchiveFormat.AR);
        Archiver archivertgz = ArchiverFactory.createArchiver(ArchiveFormat.TAR, CompressionType.GZIP);
        ArchiveStream stream = archiverAr.stream(arFile);
        ArchiveEntry entry, entry2;
        ByteArrayOutputStream baos= new ByteArrayOutputStream();

        while((entry = stream.getNextEntry()) != null) {
            // The ar contains a tgz file named control.tar.gz
            if (entry.getName().equals("control.tar.gz")) {
                ArchiveStream stream2 = archivertgz.stream(entry);
                while((entry2 = stream2.getNextEntry()) != null) {
                    //The control.tar.gz contains a text file named control 
                    if (entry2.getName().equals("./control")){
                        IOUtils.copy(stream2, baos);
                    }
                }
            }
        }

        return baos.toString( StandardCharsets.UTF_8.toString() );
    }

In the meantime, I'll test out your changes.

@thrau
Owner

thrau commented Oct 16, 2016

the point of jarchivelib is to make it convenient to handle archives as File objects. for what you are trying to achieve, i would suggest using commons-compress directly, as they already have an excellent archive/compression stream API, which jarchivelib only makes use of.
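
To illustrate the stream-in-stream idea without pulling in any dependency, here is a minimal sketch that substitutes the JDK's java.util.zip classes for the commons-compress ones. A zip nested inside a zip plays the role of the tar.gz inside the ar; with commons-compress the same shape would presumably be built from ArArchiveInputStream, GzipCompressorInputStream and TarArchiveInputStream. The class name, method names, and entry names below are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class NestedStreamSketch {

    /** Builds a zip with a single entry, entirely in memory. */
    static byte[] zipOf(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads innerName out of the archive stored as outerName, without touching the filesystem. */
    static String readNested(InputStream source, String outerName, String innerName) throws IOException {
        try (ZipInputStream outer = new ZipInputStream(source)) {
            ZipEntry outerEntry;
            while ((outerEntry = outer.getNextEntry()) != null) {
                if (!outerEntry.getName().equals(outerName)) {
                    continue;
                }
                // While positioned on an entry, the outer stream yields only that entry's bytes,
                // so it can feed a second archive stream directly. The inner stream is not closed
                // here because closing it would also close the outer one.
                ZipInputStream inner = new ZipInputStream(outer);
                ZipEntry innerEntry;
                while ((innerEntry = inner.getNextEntry()) != null) {
                    if (innerEntry.getName().equals(innerName)) {
                        ByteArrayOutputStream out = new ByteArrayOutputStream();
                        byte[] buffer = new byte[4096];
                        int n;
                        while ((n = inner.read(buffer)) != -1) {
                            out.write(buffer, 0, n);
                        }
                        return new String(out.toByteArray(), StandardCharsets.UTF_8);
                    }
                }
            }
        }
        return null; // not found
    }

    /** Round trip: wrap a text file in two archive layers, then stream it back out. */
    static String demo() {
        try {
            byte[] inner = zipOf("./control", "Package: demo\n".getBytes(StandardCharsets.UTF_8));
            byte[] outer = zipOf("control.zip", inner);
            return readNested(new ByteArrayInputStream(outer), "control.zip", "./control");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

The design point is that only the outermost stream is ever closed; each inner layer is a plain wrapper that reads until the enclosing entry is exhausted.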

@netwolfuk
Author

Thanks Thomas for taking the time to respond. I will look into that. It's just that your API is much nicer to deal with ;-) Sorry to have wasted your time. I do appreciate your help thus far.

@abarsov

abarsov commented Sep 20, 2017

The one thing completely missing from commons-compress is restoring the Unix file mode when extracting an archive from a stream.
ZipFile resolves the file mode from the archive, and this library does a good job of restoring it with the help of the FileModeMapper class.
Restoring the Unix file mode from ZipArchiveInputStream, however, isn't possible, because it simply skips the entire Central Directory Record.
So it seems a custom re-implementation of ZipArchiveInputStream would be required.
If implemented, that would be a really great feature; I haven't managed to find any library that can extract from a stream and restore the Unix file mode at the same time.
(The main idea is to iterate through the entries and, once all of them have been processed, parse the additional file information from the Central Directory Record. That information could then be used to restore file modes even after all files have already been extracted.)
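
As a rough illustration of the last step of that idea, the sketch below turns a Unix mode into the JDK's PosixFilePermission set. It assumes the mode has already been recovered from the central directory, where zips written on Unix hosts typically keep it in the upper 16 bits of the external file attributes field. `UnixModeSketch` and its methods are hypothetical names, and this is not how FileModeMapper is actually implemented.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.Set;

final class UnixModeSketch {

    // Permissions listed from the highest bit (0400, owner read) to the lowest (0001, others execute).
    private static final PosixFilePermission[] BITS = {
        PosixFilePermission.OWNER_READ,
        PosixFilePermission.OWNER_WRITE,
        PosixFilePermission.OWNER_EXECUTE,
        PosixFilePermission.GROUP_READ,
        PosixFilePermission.GROUP_WRITE,
        PosixFilePermission.GROUP_EXECUTE,
        PosixFilePermission.OTHERS_READ,
        PosixFilePermission.OTHERS_WRITE,
        PosixFilePermission.OTHERS_EXECUTE,
    };

    /** Assumption: the Unix mode sits in the upper 16 bits of the zip external attributes. */
    static int modeFromExternalAttributes(long externalAttributes) {
        return (int) ((externalAttributes >> 16) & 0xFFFF);
    }

    /** Maps the low nine permission bits of a Unix mode; file-type bits are ignored. */
    static Set<PosixFilePermission> toPermissions(int mode) {
        Set<PosixFilePermission> permissions = EnumSet.noneOf(PosixFilePermission.class);
        for (int i = 0; i < BITS.length; i++) {
            int mask = 0400 >> i; // octal literal: 0400, 0200, 0100, 0040, ...
            if ((mode & mask) != 0) {
                permissions.add(BITS[i]);
            }
        }
        return permissions;
    }

    public static void main(String[] args) {
        System.out.println(toPermissions(0755));
    }
}
```

On a POSIX file system the resulting set could be applied to an already extracted file with java.nio.file.Files.setPosixFilePermissions; on other file systems that call throws UnsupportedOperationException, which is part of why a portable solution is awkward.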

@thrau
Owner

thrau commented Sep 21, 2017

thanks for the input!
I've been mulling over file permissions for a while, and they're tricky because a) not all archive formats support them properly, which makes it hard to generalize, and b) java's support for portable file permissions isn't very good, and will require resorting to hacks like the FileModeMapper.
i have some ideas for a major release, but the API will be completely new, and i don't plan to fiddle with them in 0.x.x
