Skipping empty rows #159
Comments
Has this issue still not been addressed?
@jaredforth Correct. Re-reading this issue now, I am just not convinced that this change should be made. At the very least, this would be a breaking change, and I'm not convinced of the utility of including empty rows. The only reason given is for consistency with Python's csv reader, but I don't think that's a good enough reason on its own.
To add another voice to this: it has also just caught me out. I'm running in flexible mode, and an empty row has meaning for the structure of the file I'm trying to parse. I have a workaround in this case, but wanted to add another voice in support of this feature.
Folks could help their case here by providing a more detailed motivation for why they need this feature. Give some examples. Explain why there is no other way or easy workaround.
We are working on a POC with this, so it's not blocking anything at the moment, but I figured I'd add our use case. We use JSON Schema to define TSV/CSV files and then would like to use this library to help us validate that a TSV/CSV file adheres to the schema. To give a more concrete example, we allow some fields to be required and some to be nullable. The nullable value can be an empty string, "N/A", or really anything we want. We'd like to parse the whole file and know if there is an empty line, because if required columns are missing or the nullable column definition is not adhered to, the file should fail validation. Would it be possible to add this as an optional param like "skip_empty_lines" that defaults to true, so it wouldn't be a breaking change?
@gblock0 I see. So it sounds like the idea here is that you want to do some kind of verification step that possibly precludes the existence of empty lines in a CSV file, but since this parser completely skips them and makes them inaccessible to you, it's not possible to (easily) check whether they exist.
Yes, if I were to add this, then it would have to come in under an option, because I won't release a breaking change for it. FWIW, I don't see this happening any time soon. IIRC, this behavior is baked in all the way down in the underlying parser.
@BurntSushi Yep, exactly. Totally understand that it's not a priority, and if I were more familiar with Rust I'd take a swing at it. Thanks for hearing me out and for all the work you do!
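As an aside, one possible workaround for the validation use case described above is to scan the raw input for blank lines yourself before handing it to the reader. A minimal sketch, assuming the whole input fits in memory; the helper and file name are illustrative, and the simple line scan ignores newlines inside quoted fields:

use std::error::Error;

// Illustrative helper, not part of the csv crate: report whether the raw
// input contains any blank lines, since the reader never yields them as records.
fn has_blank_lines(data: &str) -> bool {
    data.lines().any(|line| line.trim().is_empty())
}

fn main() -> Result<(), Box<dyn Error>> {
    let data = std::fs::read_to_string("input.csv")?;
    if has_blank_lines(&data) {
        eprintln!("validation failed: input contains blank lines");
    }
    // Parse as usual; the blank lines simply never show up here.
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.records() {
        println!("{:?}", result?);
    }
    Ok(())
}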
I tried to do the first problem of AoC, but since it skips the empty lines, csv is not useful here 🤷‍♂️
I only skimmed the problem, but I don't see any CSV data there. It's just line delimited. There's only a single column.
Yeah, it is not CSV in the strict sense, just a file. I'm using csv as a generic file reader whenever the file has a column structure (this being the simplest case, with just one column). So granted, not a use case that really justifies this change.
To add another specific use case: I'm using xsv fixlengths and xsv table to view a rather interesting CSV, essentially multiple CSVs and some other random info separated by blank lines. Losing the blank lines makes it substantially harder to distinguish between the different sections quickly.
I have a use case for this feature. I am reading a CSV with one column that is deserialized using serde. The column is an optional usize. If the empty rows are skipped, all the None values are skipped, and hence the data is parsed incorrectly.
I was actually able to fix this by wrapping every value in my CSV with quotes when I serialized the data earlier in the pipeline. I still think this feature should be added, since not everyone has the flexibility to change their data pipelines.
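For concreteness, a minimal sketch of the kind of setup described above, assuming a single-column CSV deserialized into an Option<usize> with serde; the struct and field names are illustrative, and the serde crate with its derive feature is assumed:

use std::error::Error;

use serde::Deserialize;

// Hypothetical single-column record; an empty line should map to `None`,
// but the reader drops empty rows entirely.
#[derive(Debug, Deserialize)]
struct Row {
    value: Option<usize>,
}

fn read_rows(data: &str) -> Result<Vec<Row>, Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .flexible(true)
        .from_reader(data.as_bytes());
    let mut rows = Vec::new();
    for result in rdr.deserialize() {
        rows.push(result?);
    }
    Ok(rows)
}

fn main() -> Result<(), Box<dyn Error>> {
    // The blank middle line is skipped, so this yields two rows instead of
    // the three the input visually contains; the None row is lost. Quoting
    // every value (so the empty row becomes "") is the workaround mentioned above.
    let rows = read_rows("1\n\n3\n")?;
    println!("{:?}", rows);
    Ok(())
}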
This feature would be very useful to me as well. Same as Yarn's comment, I'm working with a CSV that basically has multiple tables embedded inside of it. Here is a very minimal example of a real-world CSV that some financial companies export.
Being able to see the trailing empty lines would signal that I'm approaching a different table of data. I was able to test Yarn's PR and it seems to work the way I expect. It looks like a non-breaking change that only adds a new option to the builder. I tested by changing my Cargo.toml file to point to Yarn's PR branch:

[dependencies]
csv = { git = "https://github.com/Yarn/rust-csv.git" }

And then calling:

use std::{error::Error, process};
fn example() -> Result<(), Box<dyn Error>> {
    let file = std::fs::File::open("documents/example2.csv")?;
    let reader = std::io::BufReader::new(file);
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .flexible(true)
        // skip_blank_lines is the new builder option added by Yarn's PR.
        .skip_blank_lines(false)
        .trim(csv::Trim::None)
        .from_reader(reader);
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
What version of the csv crate are you using?
1.1.1
Briefly describe the question, bug or feature request.
The library skips empty rows even when flexible is set to true.
The default behavior in Python's csv library, Excel, and Numbers is to keep empty rows, so I think we should follow that (users may have a hard time finding the difference between the data they see in Excel and the result from the Rust code).
Include a complete program demonstrating a problem.
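A minimal sketch of such a program, using an assumed inline CSV with a blank line between two data rows (the data and options are illustrative):

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Header row, two data rows, and a blank line between them.
    let data = "a,b\n1,2\n\n3,4\n";
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .from_reader(data.as_bytes());
    for result in rdr.records() {
        let record = result?;
        // Prints only ["1", "2"] and ["3", "4"]; the blank line never
        // appears as a record, even with flexible(true).
        println!("{:?}", record);
    }
    Ok(())
}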
What is the observed behavior of the code above?
What is the expected or desired behavior of the code above?
Here is the Python code for the same CSV: