Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix scalar tensor.mean #8970

Merged
merged 18 commits into from
Aug 23, 2022
Merged

fix scalar tensor.mean #8970

merged 18 commits into from
Aug 23, 2022

Conversation

Flowingsun007
Copy link
Contributor

for (int32_t i = 0; i < naxis; i++) {
reduce_axis[i] = JUST(maybe_wrap_dim(axis[i], ndim));
axis_num[reduce_axis[i]]++;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bug的原因在这一行,当x.mean(-1)里的tensor x是scalar tensor时,则参数axis:{1}; ndim=0,reduce_axis为空vector,reduce_axis[i]会引发segment fault。

@github-actions
Copy link
Contributor

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@@ -32,22 +32,21 @@ bool IsInplaceValid(const std::shared_ptr<Tensor>& x) {
Maybe<std::vector<int32_t>> CheckAxis(const std::vector<int32_t>& axis, const int32_t& ndim) {
const int32_t naxis = axis.size();

int32_t reduce_ndim = naxis;
if (naxis == 0 || ndim == 0) reduce_ndim = ndim;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里能解释下吗?没反应过来

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

哦,这里其实就是把两个判断合到一起了,如果naxis == 0,则reduce_ndim设置为ndim;如果ndim == 0(scalar tensor的情况),则reduce_ndim == 0(也==ndim)

const int32_t axis_i = JUST(maybe_wrap_dim(axis[i], ndim));
axis_num[axis_i]++;
CHECK_OR_RETURN(axis_num[axis_i] < 2) << Error::RuntimeError() << "dim " << axis_i
<< " appears multiple times in the list of dims";
Copy link
Contributor

@marigoold marigoold Aug 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里 else 分支似乎对返回值没有影响?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里应该是写错了(漏了),我再看一下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

@Flowingsun007 Flowingsun007 requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 22, 2022 09:58
@github-actions
Copy link
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8970/

@github-actions
Copy link
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 128.8ms (= 12876.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 144.5ms (= 14453.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.12 (= 144.5ms / 128.8ms)

OneFlow resnet50 time: 75.8ms (= 7580.4ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.3ms (= 8533.2ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 85.3ms / 75.8ms)

OneFlow resnet50 time: 49.4ms (= 9870.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 54.2ms (= 10846.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.10 (= 54.2ms / 49.4ms)

OneFlow resnet50 time: 36.8ms (= 7357.1ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.8ms (= 8960.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.22 (= 44.8ms / 36.8ms)

OneFlow resnet50 time: 28.4ms (= 5673.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 36.1ms (= 7219.6ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.27 (= 36.1ms / 28.4ms)

OneFlow swin dataloader time: 0.412s (= 82.425s / 200, num_workers=1)
PyTorch swin dataloader time: 0.149s (= 29.862s / 200, num_workers=1)
Relative speed: 0.362 (= 0.149s / 0.412s)

OneFlow swin dataloader time: 0.117s (= 23.477s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 7.914s / 200, num_workers=4)
Relative speed: 0.337 (= 0.040s / 0.117s)

OneFlow swin dataloader time: 0.060s (= 12.043s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.405s / 200, num_workers=8)
Relative speed: 0.366 (= 0.022s / 0.060s)

❌ OneFlow resnet50 time: 137.0ms (= 13703.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.9ms (= 16388.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 163.9ms / 137.0ms)

OneFlow resnet50 time: 85.4ms (= 8541.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.1ms (= 11210.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.31 (= 112.1ms / 85.4ms)

OneFlow resnet50 time: 59.0ms (= 11808.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.9ms (= 15582.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 77.9ms / 59.0ms)

OneFlow resnet50 time: 45.5ms (= 9093.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.0ms (= 14200.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 71.0ms / 45.5ms)

OneFlow resnet50 time: 38.8ms (= 7763.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.9ms (= 13783.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.78 (= 68.9ms / 38.8ms)

@@ -35,23 +35,23 @@ bool IsScalarTensor(const std::shared_ptr<Tensor>& x) {

Maybe<std::vector<int32_t>> CheckAxis(const std::vector<int32_t>& axis, const int32_t& ndim) {
const int32_t naxis = axis.size();

int32_t reduce_ndim = naxis;
if (naxis == 0 || ndim == 0) reduce_ndim = ndim;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没有加大括号

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if后只有一句表达式,感觉不加也行😂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我们代码的标准是要加的

Comment on lines 49 to 50
CHECK_OR_RETURN_ERROR(axis_num[axis_i] < 2) << Error::RuntimeError() << "dim " << axis_i
<< " appears multiple times in the list of dims";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这一段复用

static inline Maybe<std::bitset<dim_bitset_size>> dim_list_to_bitset(
const std::vector<int32_t>& dims, int64_t ndims) {
CHECK_LE_OR_RETURN(ndims, (int64_t)dim_bitset_size)
<< Error::RuntimeError() << "Only tensors with up to " << dim_bitset_size
<< " dims are supported";
std::bitset<dim_bitset_size> seen;
for (int32_t i = 0; i < dims.size(); i++) {
size_t dim = JUST(maybe_wrap_dim(dims[i], ndims));
CHECK_OR_RETURN(!seen[dim]) << Error::RuntimeError() << "The dim " << dim
<< " appears multiple times in the list of dims";
seen[dim] = true;
}
return seen;
}
这个函数吧,有 wrap_dim 和 axis_num 的检察,另外这里判断 axis_num[axis_i] < 2 也不太好

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

@github-actions
Copy link
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8970/

@github-actions
Copy link
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.5ms (= 12954.4ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.1ms (= 14314.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 143.1ms / 129.5ms)

OneFlow resnet50 time: 76.4ms (= 7637.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 89.6ms (= 8963.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.17 (= 89.6ms / 76.4ms)

OneFlow resnet50 time: 48.7ms (= 9736.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.6ms (= 11911.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.22 (= 59.6ms / 48.7ms)

OneFlow resnet50 time: 35.6ms (= 7123.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.0ms (= 9401.0ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.32 (= 47.0ms / 35.6ms)

OneFlow resnet50 time: 29.3ms (= 5865.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 37.5ms (= 7497.2ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.28 (= 37.5ms / 29.3ms)

OneFlow swin dataloader time: 0.258s (= 51.662s / 200, num_workers=1)
PyTorch swin dataloader time: 0.151s (= 30.166s / 200, num_workers=1)
Relative speed: 0.584 (= 0.151s / 0.258s)

OneFlow swin dataloader time: 0.071s (= 14.164s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.216s / 200, num_workers=4)
Relative speed: 0.580 (= 0.041s / 0.071s)

OneFlow swin dataloader time: 0.039s (= 7.896s / 200, num_workers=8)
PyTorch swin dataloader time: 0.023s (= 4.507s / 200, num_workers=8)
Relative speed: 0.571 (= 0.023s / 0.039s)

❌ OneFlow resnet50 time: 138.9ms (= 13886.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.3ms (= 16831.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 168.3ms / 138.9ms)

OneFlow resnet50 time: 87.1ms (= 8714.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.6ms (= 10361.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 103.6ms / 87.1ms)

OneFlow resnet50 time: 58.5ms (= 11707.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.8ms (= 15765.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.8ms / 58.5ms)

OneFlow resnet50 time: 45.1ms (= 9027.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.3ms (= 14067.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 70.3ms / 45.1ms)

OneFlow resnet50 time: 39.5ms (= 7894.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 66.9ms (= 13389.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 66.9ms / 39.5ms)

@Flowingsun007 Flowingsun007 merged commit 6318057 into master Aug 23, 2022
@Flowingsun007 Flowingsun007 deleted the dev_fix_scalar_tensor.mean branch August 23, 2022 07:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants