Slicing: compare number of processors correctly #76
Hello,
we updated our version of CTF and ran into performance issues with slicing in our code.
After some digging around, we found this line in the `slice` method.
In version 1.4.1 the `if` statement used the `<` operator. Some time after that, it was changed to `<=`, which, as far as I understand, means the `else` block is no longer taken when the processor counts are equal. In other words, when the number of processors that `tsr_B` is distributed over equals that of `tsr_A`, CTF also checks the dimensions and padding of `A`, and therefore has to read the data from `A`, which makes this case slower.
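To make the difference concrete, here is a minimal Python model of the branch selection. It is not CTF code: the function name `takes_if_branch` is hypothetical, and the operand order of the comparison (number of processors of `tsr_B` on the left, `tsr_A` on the right) is an assumption. The point it illustrates does not depend on that order, because `<` and `<=` disagree only when the two counts are equal.

```python
# Hypothetical model of the branch in the slice method (not CTF code).
# np_b / np_a stand for the number of processors tsr_B / tsr_A are
# distributed over. The operand order is an assumption; for the equal
# case, which is the one that matters here, it makes no difference.

def takes_if_branch(np_b: int, np_a: int, inclusive: bool) -> bool:
    """Return True if the `if` block (checks dimensions/padding of A and
    reads data from A) would run, False if the `else` block would run."""
    return np_b <= np_a if inclusive else np_b < np_a


if __name__ == "__main__":
    for np_b, np_a in [(4, 8), (8, 8), (16, 8)]:
        strict = "if" if takes_if_branch(np_b, np_a, inclusive=False) else "else"
        loose = "if" if takes_if_branch(np_b, np_a, inclusive=True) else "else"
        print(f"np_B={np_b:2d} np_A={np_a:2d}  '<': {strict}  '<=': {loose}")
```

Only the row with equal processor counts (8 and 8) switches branches, so restoring `<` should only change behaviour in that case.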
If this is correct, this pull request fixes the problem. We have tested it, and the results confirmed our suspicion. For slices of large tensors, the difference in run time between the `<` and the `<=` versions is up to 50% according to our benchmarks. For small tensors, however, the two appear to be roughly equivalent, which makes sense.
Thank you very much for your great project!