Skipping empty rows #159
Comments
Has this issue still not been addressed?
@jaredforth Correct. Re-reading this issue now, I am just not convinced that this change should be made. At the very least, this would be a breaking change, and I'm not convinced of the utility of including empty rows. The only reason given is for consistency with Python's csv reader, but I don't think that's a good enough reason on its own.
To add another voice to this: it has also just caught me out. I'm running in flexible mode, and an empty row has meaning for the structure of the file I'm trying to parse. I have a workaround in this case, but wanted to add another voice in support of this feature.
Folks could help their case here by providing a more detailed motivation for why they need this feature. Give some examples. Explain why there is no other way or easy workaround.
We are working on a POC with this, so it's not blocking anything at the moment, but I figured I'd add our use case. We use JSON Schema to define TSV/CSV files and then would like to use this library to help us validate that a TSV/CSV file adheres to the schema. To give a more concrete example, we allow some fields to be required and some to be nullable. The nullable value can be an empty string, "N/A", or really anything we want. We'd like to parse the whole file and know if there is an empty line, because if required columns are missing or the nullable column definition is not adhered to, the file should fail validation. Would it be possible to add this as an optional param like "skip_empty_lines" that defaults to true, so it wouldn't be a breaking change?
@gblock0 I see. So it sounds like the idea here is that you want to do some kind of verification step that possibly precludes the existence of empty lines in a CSV file, but since this parser completely skips them and makes them inaccessible to you, it's not possible to (easily) check whether they exist.
Yes, if I were to add this, then it would have to come in under an option, because I won't release a breaking change for it. FWIW, I don't see this happening any time soon. IIRC, this behavior is baked in all the way down in the underlying parser.
@BurntSushi Yep, exactly. Totally understand that it's not a priority, and if I were more familiar with Rust I'd take a swing at it. Thanks for hearing me out and for all the work you do!
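As an aside, one possible workaround for the validation use case described above is to scan the raw input for blank lines yourself before handing it to the reader. A minimal sketch, assuming the whole input fits in memory; the helper and file name are illustrative, and the simple line scan ignores newlines inside quoted fields:

use std::error::Error;

// Illustrative helper, not part of the csv crate: report whether the raw
// input contains any blank lines, since the reader never yields them as records.
fn has_blank_lines(data: &str) -> bool {
    data.lines().any(|line| line.trim().is_empty())
}

fn main() -> Result<(), Box<dyn Error>> {
    let data = std::fs::read_to_string("input.csv")?;
    if has_blank_lines(&data) {
        eprintln!("validation failed: input contains blank lines");
    }
    // Parse as usual; the blank lines simply never show up here.
    let mut rdr = csv::Reader::from_reader(data.as_bytes());
    for result in rdr.records() {
        println!("{:?}", result?);
    }
    Ok(())
}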
I tried to do the first problem of AoC, but since it skips the empty lines, csv is not useful here 🤷‍♂️
I only skimmed the problem, but I don't see any CSV data there. It's just line delimited. There's only a single column.
Yeah, it is not CSV in the strict sense, just a file. I'm using csv as a generic file reader whenever the file has a column structure (this being the simplest case, with just one column). So granted, not a use case that really justifies this change.
To add another specific use case: I'm using xsv fixlengths and xsv table to view a rather interesting CSV, essentially multiple CSVs and some other random info separated by blank lines. Losing the blank lines makes it substantially harder to distinguish between the different sections quickly.
I have a use case for this feature. I am reading a CSV with one column that is deserialized using serde. The column is an optional usize. If the empty rows are skipped, all the None values are skipped, and hence the data is parsed incorrectly.
I was actually able to fix this by wrapping every value in my CSV with quotes when I serialized the data earlier in the pipeline. I still think this feature should be added, since not everyone has the flexibility to change their data pipelines.
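For concreteness, a minimal sketch of the kind of setup described above, assuming a single-column CSV deserialized into an Option<usize> with serde; the struct and field names are illustrative, and the serde crate with its derive feature is assumed:

use std::error::Error;

use serde::Deserialize;

// Hypothetical single-column record; an empty line should map to `None`,
// but the reader drops empty rows entirely.
#[derive(Debug, Deserialize)]
struct Row {
    value: Option<usize>,
}

fn read_rows(data: &str) -> Result<Vec<Row>, Box<dyn Error>> {
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .flexible(true)
        .from_reader(data.as_bytes());
    let mut rows = Vec::new();
    for result in rdr.deserialize() {
        rows.push(result?);
    }
    Ok(rows)
}

fn main() -> Result<(), Box<dyn Error>> {
    // The blank middle line is skipped, so this yields two rows instead of
    // the three the input visually contains; the None row is lost. Quoting
    // every value (so the empty row becomes "") is the workaround mentioned above.
    let rows = read_rows("1\n\n3\n")?;
    println!("{:?}", rows);
    Ok(())
}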
This feature would be very useful to me as well. Same as Yarn's comment, I'm working with a CSV that basically has multiple tables embedded inside of it. Here is a very minimal example of a real-world CSV that some financial companies export.
Being able to see the trailing empty lines would signal that I'm approaching a different table of data. I was able to test Yarn's PR and it seems to work the way I expect. It looks like a non-breaking change that only adds a new option to the builder. I tested by changing my Cargo.toml file to point to Yarn's PR branch:

[dependencies]
csv = { git = "https://github.com/Yarn/rust-csv.git" }

And then calling:

use std::{error::Error, process};
fn example() -> Result<(), Box<dyn Error>> {
    let file = std::fs::File::open("documents/example2.csv")?;
    let reader = std::io::BufReader::new(file);
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .flexible(true)
        // skip_blank_lines is the new builder option added by Yarn's PR.
        .skip_blank_lines(false)
        .trim(csv::Trim::None)
        .from_reader(reader);
    for result in rdr.records() {
        let record = result?;
        println!("{:?}", record);
    }
    Ok(())
}

fn main() {
    if let Err(err) = example() {
        println!("error running example: {}", err);
        process::exit(1);
    }
}
What version of the csv crate are you using?
1.1.1
Briefly describe the question, bug or feature request.
The library skips empty rows even when flexible is set to true.
The default behavior in Python's csv library, Excel, and Numbers is to keep empty rows, so I think we should follow that (users may have a hard time finding the difference between the data they see in Excel and the result from the Rust code).
Include a complete program demonstrating a problem.
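A minimal sketch of such a program, using an assumed inline CSV with a blank line between two data rows (the data and options are illustrative):

use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Header row, two data rows, and a blank line between them.
    let data = "a,b\n1,2\n\n3,4\n";
    let mut rdr = csv::ReaderBuilder::new()
        .flexible(true)
        .from_reader(data.as_bytes());
    for result in rdr.records() {
        let record = result?;
        // Prints only ["1", "2"] and ["3", "4"]; the blank line never
        // appears as a record, even with flexible(true).
        println!("{:?}", record);
    }
    Ok(())
}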
What is the observed behavior of the code above?
What is the expected or desired behavior of the code above?
Here is the Python code for the same CSV: