
improve demo/mnist dataProvider speed #713

Merged: 2 commits merged into PaddlePaddle:develop on Dec 16, 2016

Conversation

wangyang59

In response to issue #688, change the dataProvider in the MNIST demo to speed up training.

@wangyang59
Author

Hi @reyoung
Following your suggestions, I have changed the dataProvider in demo/mnist to speed up training by using a cache and numpy loading.
The initial data loading time is reduced; however, I didn't observe an obvious speed improvement during training. I was wondering if you could help provide some insight.
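For reference, a minimal sketch of the approach (not the exact diff from this PR): read the raw MNIST idx files with numpy instead of a per-byte Python loop, and let the data provider cache the decoded pass in memory. The decorator, type names, and normalization below follow the old PyDataProvider2 API as I remember it, so treat them as assumptions.

```python
import numpy as np
from paddle.trainer.PyDataProvider2 import provider, dense_vector, integer_value, CacheType


@provider(
    input_types={'pixel': dense_vector(28 * 28),
                 'label': integer_value(10)},
    cache=CacheType.CACHE_PASS_IN_MEM)  # keep decoded samples in RAM after the first pass
def process(settings, filename):
    # filename is assumed to be a prefix such as "data/raw_data/train",
    # with the idx files sitting next to it.
    with open(filename + "-images-idx3-ubyte", "rb") as f:
        f.read(16)  # skip the 16-byte idx3 header
        images = np.fromfile(f, dtype=np.uint8).reshape(-1, 28 * 28)
        images = images.astype(np.float32) / 255.0  # illustrative normalization
    with open(filename + "-labels-idx1-ubyte", "rb") as f:
        f.read(8)  # skip the 8-byte idx1 header
        labels = np.fromfile(f, dtype=np.uint8)
    for img, lbl in zip(images, labels):
        yield {'pixel': img.tolist(), 'label': int(lbl)}
```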

@backyes
Contributor

backyes commented Dec 3, 2016

@wangyang59

With GPU enabled and the VGG model, the dataProvider was not the bottleneck.

I1203 18:05:06.540743 15120 Stat.cpp:133] Stat=forwardBackward                TID=15120  total=21833.5    avg=218.334    max=431.113    min=142.575    count=100
I1203 18:05:06.540753 15120 Stat.cpp:133] Stat=PyDP2.getNextBatchInternal     TID=15243  total=54543.9    avg=534.744    max=54353      min=1.759      count=102
I1203 18:05:06.540802 15120 Stat.cpp:133] Stat=TrainBatch                     TID=15120  total=22172.9    avg=221.728    max=432.781    min=145.168    count=100

The time of TrainBatch is almost equal to that of forwardBackward, and the time of PyDP2.getNextBatchInternal is hidden by the buffer, so using a RAM cache does not help for this model.
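To illustrate the buffering point in general terms (this is not Paddle's implementation, just a generic sketch): a background thread keeps filling a bounded queue with batches, so as long as preparing a batch is on average faster than the trainer consumes one, the provider cost stays off the critical path.

```python
import queue
import threading
import time


def prefetching_batches(load_batch, num_batches, buffer_size=4):
    """Generic double-buffering sketch: load batches in a background thread."""
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        for i in range(num_batches):
            buf.put(load_batch(i))   # blocks when the buffer is full
        buf.put(None)                # sentinel: no more batches

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buf.get()
        if batch is None:
            break
        yield batch


# Toy usage: "loading" (10 ms) overlaps with "training" (50 ms), so the
# loader time is hidden behind the consumer, as in the stats above.
def slow_load(i):
    time.sleep(0.01)
    return i


for batch in prefetching_batches(slow_load, num_batches=20):
    time.sleep(0.05)  # stand-in for forwardBackward
```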

The following stats are from the second 100 batches:

I1203 18:08:25.162082 15120 Stat.cpp:133] Stat=PyDP2.getNextBatchInternal     TID=15243  total=204.234    avg=2.042      max=4.782      min=1.876      count=100
I1203 18:08:25.162092 15120 Stat.cpp:133] Stat=getNextBatchInternal           TID=15243  total=204.434    avg=2.044      max=4.783      min=1.879      count=100
I1203 18:08:25.162101 15120 Stat.cpp:133] Stat=controller_dequeue             TID=15120  total=13016      avg=2.829      max=33.726     min=0          count=4600

With CPU only, the data provider will also not be the bottleneck: since the CPU is slower than the GPU for the VGG model, the CPU will be busy handling forwardBackward.

Maybe you should reduce the model complexity to improve speed, if necessary.

@backyes
Contributor

backyes commented Dec 3, 2016

@wangyang59

But I also agree with @reyoung: numpy could be better than handwritten code for handling the input data. :-)

@qingqing01
Contributor

qingqing01 commented Dec 5, 2016

The small_vgg model used in this demo is more time-consuming than the corresponding config in Caffe. BatchNorm is used after each convolution layer in this model, and I suspect the time may be consumed in the BatchNorm layers with trainer_count = 10. I also suggest using a simpler model, such as LeNet.
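As a rough illustration of that suggestion (a sketch only; the helper names and parameters follow my recollection of the trainer_config_helpers API of that era and may need adjusting against the installed version), a LeNet-style config could look something like:

```python
from paddle.trainer_config_helpers import *

settings(batch_size=128, learning_rate=0.1 / 128.0,
         learning_method=MomentumOptimizer(0.9))

img = data_layer(name='pixel', size=28 * 28)

# Two conv+pool stages instead of the small_vgg conv groups with BatchNorm.
conv_pool_1 = simple_img_conv_pool(input=img, num_channel=1, filter_size=5,
                                   num_filters=20, pool_size=2, pool_stride=2,
                                   act=ReluActivation())
conv_pool_2 = simple_img_conv_pool(input=conv_pool_1, num_channel=20, filter_size=5,
                                   num_filters=50, pool_size=2, pool_stride=2,
                                   act=ReluActivation())

predict = fc_layer(input=conv_pool_2, size=10, act=SoftmaxActivation())
label = data_layer(name='label', size=10)
outputs(classification_cost(input=predict, label=label))
```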

@wangyang59
Author

@backyes
Thanks a lot for providing the profiling information for the demo. I was just wondering: how did you get those statistics?

Collaborator

@reyoung left a comment


LGTM, but it should maybe be updated before being merged, because Paddle currently uses pre-commit to automatically format Python code.

See #874

@wangyang59
Author

@reyoung I have used pre-commit to format the code. Please have a look and let me know if there are other problems. Thanks~

@reyoung reyoung merged commit 2be7ec9 into PaddlePaddle:develop Dec 16, 2016