CSV Helper should stop parsing on line break when LineBreakInQuotedFieldIsBadData is set? #2152
Replies: 2 comments
-
I apologize, I meant to give a little context on one of our use cases and why we looked into using these settings. In the solution we are building, we process very large CSV files, in the GBs. One hundred million rows is our current cap, and there is a desire to raise even that. When an end user provides a file that has an opening quote but no closing one, the buffer keeps growing on that single column until we hit Out of Memory. This happens because the suspect line is early in the file and no further quotes are seen for a long span within the file. This is what led us to find the
-
I believe this is fixed in pull request #2155.
-
When CSV Helper parses a record that contains quotes, it follows the spec and allows a quoted field to span multiple lines within the file. However, I believe that when you set
LineBreakInQuotedFieldIsBadData = true
this should not be the case, because we are explicitly stating that we do not allow line breaks in columns. Without this property set, it is valid for the parser to keep reading the file looking for the closing quote. But since we have instructed that line breaks are not permitted, I would expect it to stop reading the field when it reaches a line break without having found a closing quote. Essentially, this setting lets us loosen the spec contract to match our "business logic" needs. The issue here is that we are "losing" valid lines of data.
.NET Fiddle showing the usage
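For readers without access to the fiddle, the scenario can be sketched roughly as follows. This is an illustrative reconstruction, not the original fiddle: the sample data is made up, and it assumes a recent CsvHelper version in which `BadDataFound` receives an args object exposing `RawRecord`. Older versions use a different delegate signature.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical sample: row 1 opens a quote that is never closed,
// followed by two otherwise valid rows.
var csv = "Id,Name\n1,\"unterminated\n2,two\n3,three\n";

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    // Tell the parser that a line break inside a quoted field is bad data.
    LineBreakInQuotedFieldIsBadData = true,

    // Log whatever the parser reports as bad instead of throwing.
    BadDataFound = args => Console.WriteLine($"Bad data: {args.RawRecord}"),
};

using var reader = new StringReader(csv);
using var parser = new CsvParser(reader, config);

// Print each record the parser yields, so it is visible whether
// rows 2 and 3 come back as separate records or are swallowed
// into the unterminated field.
while (parser.Read())
{
    Console.WriteLine($"Row {parser.Row}: {string.Join(" | ", parser.Record)}");
}
```

The behavior I would expect with the setting enabled is that rows 2 and 3 are still returned as their own records. What actually gets printed depends on the CsvHelper version in use.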
Related (I think): #1341