-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiline Cells do not have normalized line endings #140
Comments
Your suggestions sound very good to me. Perhaps this could be extended to all property types that implement ExcelMapper/ExcelMapper/ExcelMapper.cs Lines 1373 to 1378 in d133b2b
Would you like to take a crack at it? BTW the newline behavior is the same in other parser libraries, e.g. AngleSharp or Linq to XML: var xml = "<x>12\r\n34</x>";
var e = XElement.Parse(xml);
var s = e.Value; // -> "12\n34"; |
I glad to hear it. Yes, I would like a crack at it. But as you can tell from the crawling pace on the other issue over at XmlSchemaClassGenerator, I do not have much time. This will probably change for the next two weeks, though. Regarding the XML parser behavior, I take no issue with something that just parses XML (without even deserializing into .NET objects) leaving the strings in the state in which the XML processor left them as per the XML spec. The fact that some parsers use .NET strings at all instead of, say a UTF-8 |
No worries, we don't have any deadlines 😄 It's an interesting topic which I hadn't given much thought in this context (git's autocrlf options have been driving me mad every once in a while, though 🙄). Regarding serializers, both XmlSerializer and JsonSerializer leave the newline characters alone, i.e. they pass through what's in the original XML or JSON. This is probably the best approach as it's the only way to allow roundtripping. Unfortunately, it's out of the question here due to NPOI's behavior. |
Hi all,
when mapping multiline cells to a
string
property, the string does not have the usual line endings of a .NET string in the runtime environment, but will always have LF line endings. This is because NPOI returns it that way. I assume this stems from NPOI's XML processor replacing all line endings in the XLSX's internal XML files with LF on reading (as it should be according to the XML Spec, see https://www.w3.org/TR/REC-xml/#sec-line-ends).(Interestingly enough, multiline cells from the old XLS format will also be returned by NPOI with LF line endings. Not sure why that is, but hey, at least it is consistent across formats.)
This leads to .NET users of ExcelMapper who want to get the lines of a multiline cell having to be aware of this implementation detail of XLSX. They cannot just split the string with
Environment.NewLine
as they would with pretty much every other string they encounter, but have to split with\n
. Even worse, people would not even notice on Linux, but get subtle bugs on Windows.I already tried to address this issue at NPOI, but they dismissed it. To my great surprise their goal apparently is not to abstract away how the XLSX format works under the hood, but still require users to know intricate details such as this.
Maybe ExcelMapper is the right layer of abstraction to get rid of this pesky subtlety.
If it is, I suggest adding a default mapping mechanism to a
string[]
property that will map multiline string cells.I also suggest introducing the ability to have line endings normalized to
Environment.NewLine
when a multiline cell is mapped to astring
property. As this is a breaking change, it could be enabled by configuration.To make this even more general (and probably simpler to code), there could be a default mapping to an array of any type by parsing the individual lines of a multiline string cell. This would also work well with custom value parsers.
What do you think?
The text was updated successfully, but these errors were encountered: