Fix checksum verification (#56)
To investigate the intermittent checksum errors, I added some code to print stats on the downloaded package index file whenever it hits a checksum error. What stood out was that, whenever the checksum failed, the number of bytes written was always slightly off from the number of bytes expected, e.g.:

```
release_url: http://ports.ubuntu.com/ubuntu-ports/dists/noble-security/InRelease
url: http://ports.ubuntu.com/ubuntu-ports/dists/noble-security/universe/binary-arm64/by-hash/SHA256/184473504b3e97aec550d1bb687b0871e8113f60ab7e8ff136f60f4ecdca9b20
uncompressed size: 1461506
compressed size:   365359
expected size:     365376
```
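
For reference, the debug accounting boils down to something like the following. This is a hypothetical sketch, not the commit's code; the function and parameter names are illustrative and only mirror the log excerpt above:

```rust
use sha2::{Digest, Sha256};

// Hypothetical sketch: compare the bytes actually written against the size and
// SHA-256 advertised by the release index, and dump both sides on a mismatch.
fn report_checksum_mismatch(written: &[u8], expected_size: u64, expected_sha256: &str) {
    let compressed_size = written.len() as u64;
    let calculated_hash = format!("{:x}", Sha256::digest(written));
    if compressed_size != expected_size || calculated_hash != expected_sha256 {
        eprintln!("compressed size:   {compressed_size}");
        eprintln!("expected size:     {expected_size}");
        eprintln!("calculated sha256: {calculated_hash}");
        eprintln!("expected sha256:   {expected_sha256}");
    }
}
```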

These errors also often go away when the automation is re-run. This suggests that some bytes are not being consumed during the `async_copy(reader, writer)` step. After several attempts at forcing the remaining bytes to be written, two changes seemed to provide stability (a combined sketch follows the list):

- Removing the `AsyncBufWriter`

    Since we need an `AsyncBufReader` in front of the `GzipDecoder`, we are already buffering reads into large chunks before sending them to the writer, so why not write directly to the file instead of passing the data through a buffered writer as well?

- Setting `reader.multiple_members(true)`

    You can gzip-compress multiple files into a single stream, and many gzip readers decode these automatically. Here, though, the gzip content is not composed of multiple files, so why would setting this flag cause all of the bytes to be written? I suspect that, because of how the read process works, reaching the end of the stream triggers an extra read to check whether there are more members, and that somehow causes a flush.
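
Putting both changes together, a minimal standalone sketch looks like this. It assumes the reader is `async-compression`'s tokio `GzipDecoder` (the reader's type isn't visible in the hunk below), and the function and path names are illustrative; download and error handling are elided:

```rust
use async_compression::tokio::bufread::GzipDecoder;
use tokio::fs::File;
use tokio::io::{copy, AsyncWriteExt, BufReader};

// Sketch only: decompress a gzipped package index straight to a file on disk.
async fn decompress_to_file(gz_path: &str, out_path: &str) -> std::io::Result<u64> {
    let compressed = File::open(gz_path).await?;
    // The decoder requires an AsyncBufRead, so reads are already buffered here.
    let mut reader = GzipDecoder::new(BufReader::new(compressed));
    // Keep decoding gzip members until EOF instead of stopping after the first
    // member's trailer.
    reader.multiple_members(true);

    // Write straight to the file; no extra AsyncBufWriter layer in between.
    let mut writer = File::create(out_path).await?;
    let bytes = copy(&mut reader, &mut writer).await?;
    // Flush anyway so nothing is left sitting in an in-process buffer.
    writer.flush().await?;
    Ok(bytes)
}
```

With the only remaining buffer on the read side, there is no writer-side buffer whose flush timing has to be reasoned about.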

Surprisingly, explicitly flushing or shutting down the `AsyncBufWriter` did not provide the expected stability for these checksum issues. I've modified the code to explicitly call `flush()` on the write stream anyway.

I'm still not 100% convinced the instability is completely fixed, but so far these changes seem like a reasonable thing to try out and observe for failures.
colincasey authored Oct 10, 2024
1 parent 4e6757a commit ac78be9
Showing 1 changed file with 18 additions and 7 deletions.
src/create_package_index.rs

```diff
@@ -23,7 +23,9 @@ use sequoia_openpgp::Cert;
 use serde::{Deserialize, Serialize};
 use sha2::{Digest, Sha256};
 use tokio::fs::{read_to_string as async_read_to_string, write as async_write, File as AsyncFile};
-use tokio::io::{copy as async_copy, BufReader as AsyncBufReader, BufWriter as AsyncBufWriter};
+use tokio::io::{
+    copy as async_copy, AsyncWriteExt, BufReader as AsyncBufReader, BufWriter as AsyncBufWriter,
+};
 use tokio::sync::oneshot::channel;
 use tokio::sync::oneshot::error::RecvError;
 use tokio::task::{JoinError, JoinSet};
@@ -342,12 +344,14 @@ async fn get_package_list(
         ),
     ));

-    let mut writer = AsyncFile::create(&package_index_path)
-        .await
-        .map_err(|e| {
-            CreatePackageIndexError::WritePackagesLayer(package_index_path.clone(), e)
-        })
-        .map(AsyncBufWriter::new)?;
+    // Enable support for multistream gz files. In this mode, the reader expects the input to
+    // be a sequence of individually gzipped data streams, each with its own header and trailer,
+    // ending at EOF. This is standard behavior for gzip readers.
+    reader.multiple_members(true);
+
+    let mut writer = AsyncFile::create(&package_index_path).await.map_err(|e| {
+        CreatePackageIndexError::WritePackagesLayer(package_index_path.clone(), e)
+    })?;

     async_copy(&mut reader, &mut writer).await.map_err(|e| {
         CreatePackageIndexError::WritePackageIndexFromResponse(
@@ -356,6 +360,13 @@
         )
     })?;

+    writer.flush().await.map_err(|e| {
+        CreatePackageIndexError::WritePackageIndexFromResponse(
+            package_index_path.clone(),
+            e,
+        )
+    })?;
+
     let calculated_hash = format!("{:x}", hasher.finalize());

     if hash != calculated_hash {
```
