I wrote the following program to parse a binary Ion file containing multiple values:
```rust
use std::env;
use std::fs::File;

use ion_rs::result::IonResult;
use ion_rs::element::reader::ElementReader;
use ion_rs::ReaderBuilder;

fn main() -> IonResult<()> {
    // Take the input path from the first argument, falling back to a default.
    let file_name = env::args().nth(1).unwrap_or("example.ion".to_string());
    let ion_file = File::open(file_name).unwrap();
    let mut reader = ReaderBuilder::default().build(ion_file)?;
    // Print every top-level value; stop at the first error.
    for element in reader.elements() {
        match element {
            Ok(value) => println!("value: {}", value),
            Err(error) => {
                println!("error: {}", error);
                break;
            }
        }
    }
    Ok(())
}
```
However, when given a particular input file the program always failed after parsing the first struct:
```
% cargo run bad_example.ion
...
value: {'object-id': 0}
error: found a non-value in value position
```
I eventually figured out that the Ion Rust binary parser was having trouble because the input file followed a peculiar, but valid, binary Ion layout. The input file was generated by a C program that serialized each Ion struct into a buffer initially filled with NUL (\0) bytes and then appended the entire buffer, including the trailing NUL bytes, to the output file. In binary Ion a 0x00 byte is a one-byte NOP pad, so every struct in the file ended up followed by a run of NOP padding.
While the original input file with NOPs is unusual and inefficient, it's still valid Ion that other parsers have no trouble with. For example, the Python library for Amazon Ion parses the file with NOPs just fine.
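To make the layout concrete, here is a minimal sketch of how a writer like the one described above could produce such a file. It is not the original C program: the buffer size, the `padded_buffer` helper, and the use of small integers instead of structs are all assumptions made for illustration. It also assumes each buffer starts with its own Ion version marker (BVM), which is what the failure mode discussed in this issue suggests.

```rust
/// Binary Ion 1.0 version marker (BVM).
const BVM: [u8; 4] = [0xE0, 0x01, 0x00, 0xEA];

/// Hypothetical fixed buffer size; the size used by the real writer is unknown.
const BUFFER_SIZE: usize = 16;

/// Mimics the writer described in the issue: place a BVM and one encoded value
/// at the start of a zero-filled buffer, then return the whole buffer,
/// trailing zero bytes included. Each 0x00 byte is a one-byte NOP pad.
fn padded_buffer(encoded_value: &[u8]) -> Vec<u8> {
    let mut buffer = vec![0u8; BUFFER_SIZE];
    let used = BVM.len() + encoded_value.len();
    assert!(used <= BUFFER_SIZE, "value does not fit in the buffer");
    buffer[..BVM.len()].copy_from_slice(&BVM);
    buffer[BVM.len()..used].copy_from_slice(encoded_value);
    buffer
}

fn main() {
    // 0x21 0x01 encodes the integer 1 and 0x21 0x02 the integer 2.
    // (Integers stand in for the structs in the real file to keep this short.)
    let mut file = Vec::new();
    file.extend(padded_buffer(&[0x21, 0x01]));
    file.extend(padded_buffer(&[0x21, 0x02]));

    // Dump one buffer per line so the BVM / value / NOP pattern is visible.
    for chunk in file.chunks(BUFFER_SIZE) {
        let hex: Vec<String> = chunk.iter().map(|b| format!("{:02x}", b)).collect();
        println!("{}", hex.join(" "));
    }
}
```

Each line of the dump starts with the BVM bytes, followed by the value and then zero bytes up to the end of the buffer, so a reader has to be prepared to see a BVM immediately after a run of NOPs.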
I believe the problem is in the interaction between `read_next_item` and `read_sequence_item` in `raw_binary_reader.rs`. By default, `read_next_item` forwards to `read_sequence_item`, which calls `consume_nop_padding` to consume NOPs but then expects `type_descriptor` to point at a value when it actually ends up pointing at a BVM. I think NOPs at the top level probably need to be consumed in `read_next_item`, where the code can handle finding a BVM after consuming NOPs.
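The control flow suggested above can be sketched with a toy top-level scanner. This is not the code in `raw_binary_reader.rs`; `Item`, `read_var_uint`, and `scan_top_level` are hypothetical names, and the length handling is deliberately simplified (it only finds item boundaries and does not validate or decode values). The point it illustrates is that after skipping a NOP pad the loop goes back to the top, where a BVM is checked for before anything is treated as a value.

```rust
#[derive(Debug, PartialEq)]
enum Item {
    VersionMarker { major: u8, minor: u8 },
    Value { type_code: u8, offset: usize },
}

/// Reads a VarUInt starting at `pos`; returns (value, bytes consumed).
fn read_var_uint(bytes: &[u8], pos: usize) -> Result<(usize, usize), String> {
    let mut value = 0usize;
    for (i, &b) in bytes.get(pos..).unwrap_or(&[]).iter().enumerate() {
        value = (value << 7) | (b & 0x7F) as usize;
        if b & 0x80 != 0 {
            return Ok((value, i + 1)); // high bit marks the final byte
        }
    }
    Err("unterminated VarUInt".to_string())
}

/// Walks the top level of a binary Ion stream, skipping NOP padding.
fn scan_top_level(bytes: &[u8]) -> Result<Vec<Item>, String> {
    let mut items = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let descriptor = bytes[pos];
        // At the top level, 0xE0 begins a version marker. Checking for it on
        // every iteration means a BVM that follows NOP padding is handled.
        if descriptor == 0xE0 {
            match bytes.get(pos..pos + 4) {
                Some(&[0xE0, major, minor, 0xEA]) => {
                    items.push(Item::VersionMarker { major, minor });
                    pos += 4;
                    continue;
                }
                _ => return Err(format!("malformed version marker at offset {}", pos)),
            }
        }
        let type_code = descriptor >> 4;
        let length_code = descriptor & 0x0F;
        let mut header_len = 1;
        let body_len = match (type_code, length_code) {
            (_, 15) => 0, // typed null: no body
            (1, _) => 0,  // bool: the value is stored in the length nibble
            // L = 14 (or a struct with L = 1) means a VarUInt length follows.
            (13, 1) | (_, 14) => {
                let (len, consumed) = read_var_uint(bytes, pos + 1)?;
                header_len += consumed;
                len
            }
            (_, l) => l as usize,
        };
        let end = pos + header_len + body_len;
        if end > bytes.len() {
            return Err(format!("item at offset {} runs past end of input", pos));
        }
        // Type code 0 with L != 15 is NOP padding: skip it and loop again.
        let is_nop = type_code == 0 && length_code != 15;
        if !is_nop {
            items.push(Item::Value { type_code, offset: pos });
        }
        pos = end;
    }
    Ok(items)
}

fn main() {
    // BVM, int 1, three NOP bytes, BVM, int 2, one NOP byte.
    let bytes = [
        0xE0, 0x01, 0x00, 0xEA, 0x21, 0x01, 0x00, 0x00, 0x00,
        0xE0, 0x01, 0x00, 0xEA, 0x21, 0x02, 0x00,
    ];
    for item in scan_top_level(&bytes).expect("scan failed") {
        println!("{:?}", item);
    }
}
```

A real reader would also need to reset symbol table state when it sees a BVM and reject reserved type codes; the sketch leaves both out.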
Per offline conversation, the reported issue affects v0.18.1 (the last non-RC version of ion-rust published), but the file parses properly in recent releases (verified in at least rc.10 and later).
If I stripped the NOPs from the input file, the Rust program had no trouble parsing it.
I've attached ion_rust_nop_example.zip containing:
- `main.rs` -- Rust program used for parsing
- `bad_example.ion` -- binary Ion file containing NOPs that the program fails to fully parse
- `good_example.ion` -- binary Ion file with NOPs removed that the program parses successfully