Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary reader fails to parse a sequence of NOPs followed by a Binary Version Marker (BVM) #891

Closed
sassaville opened this issue Jan 3, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@sassaville
Copy link

I wrote the following program to parse a binary Ion file containing multiple values:

use std::env;
use std::fs::File;
use ion_rs::result::IonResult;
use ion_rs::element::reader::ElementReader;
use ion_rs::ReaderBuilder;

fn main() -> IonResult<()> {
    let file_name = env::args().nth(1).unwrap_or("example.ion".to_string());
    let ion_file = File::open(file_name).unwrap();
    let mut reader = ReaderBuilder::default().build(ion_file)?;
    for element in reader.elements() {
        match element {
            Ok(value) => println!("value: {}", value),
            Err(error) => {
                println!("error: {}", error);
                break;
            }
        }
    }

    Ok(())
}

However, when given a particular input file the program always failed after parsing the first struct:

% cargo run bad_example.ion
...
value: {'object-id': 0}
error: found a non-value in value position

I eventually figured out that the Ion rust binary parser was having trouble because the input file followed a peculiar, but valid, Ion binary format. The input file was generated by a C program that serialized each Ion struct into a buffer initially filled will NULL (\0) bytes, and then appended the entire buffer, including trailing NULL bytes, to the output file meaning that the file ended up with the following Ion format:

BVM
Struct (symbol table)
Struct (data)
NOP
NOP
...
NOP
BVM
Struct (symbol table)
Struct (data)
NOP
NOP
...

If I stripped NOPs from the input file, then the rust program had no trouble parsing it:

dev-dsk-saville-2c-0ecfb2c2 % cargo run good_example.ion
...
value: {'object-id': 0}
value: {'object-id': 1}
value: {'object-id': 2}
value: {'object-id': 3}
value: {'object-id': 4}
value: {'object-id': 5}
value: {'object-id': 6}
value: {'object-id': 7}
value: {'object-id': 8}
value: {'object-id': 9}

While the original input file with NOPs is weird, and inefficient, it's still valid Ion that other parsers have no trouble with. For example, the Python library for Amazon Ion parses the file with NOPs just fine.

I believe the problem is in the interaction between read_next_item and read_sequence_item in raw_binary_reader.rs where read_next_item forwards to read_sequence_item by default, which calls consume_nop_padding to consume NOPs, but then expects type_descriptor to point at a value when it actually ends up pointing at a BVM. I think NOPs at the top level probably need to be consumed in read_next_item where the code can handle finding a BVM after consuming NOPs.

I've attached ion_rust_nop_example.zip containing:

  1. main.rs -- rust program used for parsing
  2. bad_example.ion -- ion binary file containing NOPs that the program fails to fully parse
  3. good_example.ion -- ion binary file with NOPs removed that the program parsed successfully
@sassaville sassaville added the bug Something isn't working label Jan 3, 2025
@zslayton
Copy link
Contributor

zslayton commented Jan 9, 2025

Related: amazon-ion/ion-tests#86

@zslayton
Copy link
Contributor

Per offline conversation, the reported issue affects v0.18.1 (the last non-RC version of ion-rust published), but works properly in recent releases (verified in at least rc.10 and later).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants