Replies: 8 comments
-
I feel like it would be nice if CsvConfiguration were thread-safe for multiple CsvReaders/CsvWriters OR up to one writer, since that feels like a common use-case. It feels intuitive to me that I should be able to prepare a configuration once and never change it again and then use it in a read-only way concurrently across readers. My guess is that the main thing barring this from working is any lazily-constructed datastructures within the configuration (e. g. caches), since these cause the configuration to be mutated internally as it is used by CsvReaders/CsvWriters. One approach could be to identify and section-off all such caches and just make those thread-safe (on a side note, what happens to these caches today if the configuration is further modified after things are cached?). If you decide to go this route, one nice datastructure for this in .NET (if you're dealing with dictionaries) is Another way to achieve this would be to move any lazy-constructed objects out of the configuration and into a separate class that exists per-instance of CsvReader/CsvWriter. The only downside of this is that you don't get caching across reader/writer instances. Does that definition of thread-safety make sense to you? Is it true that lazy-constructed caches or other structures are the root cause of this issue? |
Beta Was this translation helpful? Give feedback.
-
It has been those caches most of the time, if not all of the time. There are potentially other areas too though. Moving the caches to the reader/writer wouldn't be good because you'd then have make changes on both of them, instead of the single spot in configuration. I'm sure there are people making configuration changes in the middle of reading/writing. Think about a web instance where someone makes the config a singleton and many endpoints will read/write a bunch of different files. I think the more appropriate way would be for a user to create a method that generates a config for them. They could then have consistency of the config and each reader/writer would have their own instance. My current though it it's probably best to not allow configs to be shared. The constructor should copy values to its own instance. All the properties on the config for collection types shouldn't have a setter to eliminate the user from setting the same instance in multiple places. Etc. Are there any reasons why someone would want a single instance shared instead of just duplicate values? Why would people need to make a change after reading/writing has begun and have it affect every reader/writer? There could be a reason; I just can't think of that it would be. |
Beta Was this translation helpful? Give feedback.
-
I think the main reason is that it is pretty easy to end up on multiple threads and it is intuitive that you can create an object and then use it across those threads without concern as long as you don't continue to modify it. For example, consider code like this:
Could this code be changed to pass in a If the configuration stores caches internally, then another reason to allow shareable configurations would be performance. If you construct a new configuration for each reader/writer then you don't benefit from that internal caching except within the context of a single file.
I 100% agree that this doesn't feel like a use-case that should be supported. Seems to me like the common pattern would be to construct a configuration upfront and then use it either once or a bunch of times.
By this do you mean that the configuration will be copied by the CsvReader/Writer constructor to avoid re-use? Seems like this would work although you don't get shared caches of course.
It's not what I would personally pick, but I think that if the documentation and examples call this out as being necessary then it would at least help folks figure out the issue on their own. One tricky thing is that the error you hit today with improper use is pretty opaque; I think that many people would not immediately jump to the conclusion that multi-threading was an issue. |
Beta Was this translation helpful? Give feedback.
-
Yes. Basically clone the config and have it's own internal copy. I think the cache properties are the only issue. Adding new values to either of the cache objects (or removing them) happens rarely. Reading from them happens a lot. It looks like reading from a Now the questions is, are there other places that can cause issues? |
Beta Was this translation helpful? Give feedback.
-
Hypothesis: Users who currently re-use config object can re-use the code that creates that config object. I don't see a valid scenario where someone wants side-effects (modify one config instance, use this in another instance). a) Make CsvConfiguration immutable. Remove any code that might change instances. C#9's new b) Disallow re-use (by e.g. getting into CsvReader constructor like I wouldn't go the way to make some mutable CsvConfiguration thread-safe (by locking / using concurrent data types internally). |
Beta Was this translation helpful? Give feedback.
-
I like everything you said here. :) |
Beta Was this translation helpful? Give feedback.
-
I started looking into the span/memory stuff again. I'm going to include this in that. That takes the config and copies values locally to help speed things up, so this would actually work out well. |
Beta Was this translation helpful? Give feedback.
-
All config values that are used are copied locally for performance reasons which has made this not an issue anymore. |
Beta Was this translation helpful? Give feedback.
-
First off, I'm not an expert on threading, and probably not that great with it in general. Feel free to tell me I'm wrong about something, or suggest a better way of doing something.
There have been a few issues recently due to
CsvConfiguration
being used in multiple threads.CsvHelper
should be thread safe if you're not sharing instances across threads. That means any static code will have locking around it. People have been using a singleCsvConfiguration
in multiple instances ofCsvReader
andCsvWriter
across threads. I believe that is where the issues are coming from.The big question is, how do we solve this? We could make
CsvConfiguration
thread safe, but that will slows things down a lot, from what I understand. I'd rather have a user lock around calls they're making instead.CsvHelper
itself is reading and writing here too which is a problem. I previously everything insideCsvConfiguration
instances to avoid threading issues.Here are some options.
CsvConfiguration
thread safe. It's not reallyCsvConfiguration
itself, it's all of the classes it's holding instances of. This could quickly become most classes in the system. I don't like this idea.CsvConfiguration
not being thread safe and that you shouldn't use the same instance in multiple places.Dispose
methods could remove the configuration from the list.CsvConfiguration.Clone()
method to easily allow a user to copy a configuration to use in another place.CsvConfiguration
and setter for the property. There could be aCsvConfiguration.Import
method or something if you want to use the same values across multiple instances.What are some other options? It could be a combination of several of these things.
Beta Was this translation helpful? Give feedback.
All reactions