-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AnnexScalaInstance jar cache misses due to ordering and path comparison problems #64
Conversation
* the path. This is analogous to Bazel's File.short_path function. Fair warning: this function is super good enough, | ||
* but is likely not perfect. | ||
*/ | ||
def bazelShortPath(path: Path, replaceExternal: Boolean = true): Path = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this in a function because I use this code in the path mapping PR for the deps checker and want to be able to reuse it in that PR
val mapBuilder = Map.newBuilder[Path, Path] | ||
// We want to compare short paths to avoid the Bazel sandbox prefix and arch/config specific parts of the path | ||
// We use a tree map because we want the keys sorted for comparison purposes | ||
val mapBuilder = TreeMap.newBuilder[Path, Path] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just do:
val key = allJars
.map { jar =>
FileUtil.bazelShortPath(
absoluteWorkDir.relativize(jar.toPath().toAbsolutePath().normalize()),
replaceExternal = false,
)
}
.toSet
?
Then, we wouldn't have to worry about sorting or order. Also, why do we construct a :
-separated string instead of using a Set
as the key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use the map later on in this file as I need to copy the jar from its original location to its destination location.
I use the string for the keys because classpaths typically are jars concatenated with :
.
I have to loop through the keys once anyway to make the string. I have to loop through the keys once to make the set.
Benefit of the set is that I don't have to insert things into a treemap, so that should be faster.
Overall I think using a map and the set as the cache key will be faster and safer. I'll make some updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better I can make the key while making the map provided set has a builder. That would be a nice micro optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, that would be a good optimization. Even if constructing a Set
turns out to be slightly slower because we can't reuse the result of map
ing through, I think the safety benefits outweigh the performance detriments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up using an unsorted map and a set. I use the set for the key and that seems to work great. I also tried using the .values
Iterable as the key from the sorted map, but that resulted in cache misses. Didn't bother to dig in and find out why. I figure creating a set and an unsorted map is probably roughly the same cost as creating a sorted map.
*/ | ||
def bazelShortPath(path: Path, replaceExternal: Boolean = true): Path = { | ||
val nameCount = path.getNameCount() | ||
val pathString = path.toAbsolutePath().normalize().toString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ever used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. It's a remnant of another approach. I'll remove.
} | ||
|
||
// Handle difference between Bazel's external directory being referred to as .. in the short_path | ||
if (replaceExternal && shortPath.startsWith("external")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more correct way of doing this might be:
val shortPathNameCount = shortPath.getNameCount()
if (shortPathNameCount > 0 && shortPath.subpath(0, 1) == Paths.get("external")) {
Paths.get("..", shortPath.subpath(1, shortPathNameCount))
} else {
shortPath
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon first reading I like this.
I think shortPathNameCount > 0 && shortPath.subpath(0, 1) == Paths.get("external")
is equivalent to shortPath.startsWith("external")
, so I don't think I'll change that.
I'll play around with the Paths.get approach to avoid the string operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a technicality, but shortPath.startsWith("external")
would match something like externalfoo/...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, yes. I realized that when working on this yesterday and ended up using your suggestion.
8c4b8ff
to
eabce1e
Compare
…arison problems The jar cache in AnnexScalaInstance is a ConcurrentHashMap that uses a string representation of the compiler classpath (a list of jars) as its key. Problem is that list of jars which is used to create the key is not sorted, so you would sometimes get cache misses for the same classpath because the list had a different ordering. At a large enough scale, this would cause the JVM to run out of CodeHeap and crash. Presumably this was because we'd create up to n! AnnexScalaInstances for every AnnexScalaInstance and then classload for every single one of those instances. This change prevents those cache misses and thus prevents the worker from crashing. This should also fix cache misses with path mapping by comparing the short path, so we don't include the bazel arch/config specific parts of the path.
eabce1e
to
5b052ec
Compare
The jar cache in AnnexScalaInstance is a ConcurrentHashMap that uses a string representation of the compiler classpath (a list of jars) as its key.
Problem is that list of jars which is used to create the key is not sorted, so you would sometimes get cache misses for the same classpath because the list had a different ordering.
At a large enough scale, this would cause the JVM to run out of CodeHeap and crash. Presumably this was because we'd create up to n! AnnexScalaInstances for every AnnexScalaInstance and then classload for every single one of those instances.
This change prevents those cache misses and thus prevents the worker from crashing.
This should also fix cache misses with path mapping by comparing the short path, so we don't include the bazel arch/config specific parts of the path.