more details about train? #9
Open
piaoger opened this issue Nov 1, 2017 · 9 comments
@piaoger

piaoger commented Nov 1, 2017

Nice to see your Rust implementation.
I'd like to use it for 2x upscaling. Can you provide more details about how to train the model? Or do you have a sample dataset or model for a quick setup?

@millardjn
Owner

Hi,

I used a few different datasets, but ImageNet is the main one. To train you just need a reasonably varied set of images; even 100 photos is enough.

Unfortunately, sometime recently rust-llvm got worse at SLP auto-vectorization, so the currently released matrix multiplication is suddenly much slower. I've got a fix that relies on loop auto-vectorization instead but haven't released it yet.

I'll get back to you this weekend once I've fixed the performance issues.

@piaoger
Author

piaoger commented Nov 3, 2017

Thanks for your response.
When I tested model training for the 2x factor, I provided the "TRAINING_FOLDER"/"PARAMETER_FILE" params and also changed "FACTOR" to 2, but it seems nothing happened on my MacBook Pro.
Besides the performance issue, am I missing something?

@millardjn
Owner

I've updated the dependencies, so you might want to rebase.

Changing FACTOR should be all that is required, although it means that the built-in res/*.rsr files have to be removed or replaced as they are only for FACTOR=3. This is what my training command looks like:
cargo run --release -- train -r -v D:/ML/set14/original D:/test.rsr D:/ML/Imagenet

For performance the right env flags are required:
RUSTFLAGS=-C target-cpu=native
MATMULFLAGS=arch_sandybridge (or, if your CPU is new enough, arch_haswell)

If you are on nightly you can also use prefetch and ftz_daz:
MATMULFLAGS=arch_sandybridge, ftz_daz, prefetch
See matrixmultiply_mt for details.
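
Putting those together, a full training invocation would look something like this (a rough sketch assuming a bash-style shell and reusing the example paths from above; drop ftz_daz and prefetch if you are not on nightly):
RUSTFLAGS="-C target-cpu=native" MATMULFLAGS="arch_sandybridge, ftz_daz, prefetch" cargo run --release -- train -r -v D:/ML/set14/original D:/test.rsr D:/ML/Imagenet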

@millardjn
Owner

Regarding nothing happening, that's worrying.
I've had one other report of that happening on OSX: #3. Unfortunately I can't test on OSX.

Could you tell me what gets printed to stdout when running an upscale task and when running train?

Upscaling would normally print: Upscaling using imagenet neural net parameters... Writing file... Done
Training would normally print: Loading paths for D:/ML/Imagenet ... and then the training error for each batch.

When first running train it can take a while to start if there are a lot of files in the training_folder.

If you have time could you clone matrixmultiply_mt and run:
cargo test
and if you have nightly:
cargo bench
to check if it completes fine? It's possible there is a deadlock there.
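
For reference, the sequence would be something like this (a sketch; I'm assuming the source is hosted at github.com/millardjn/matrixmultiply_mt, so adjust the URL if needed):
git clone https://github.com/millardjn/matrixmultiply_mt
cd matrixmultiply_mt
cargo test
cargo bench    # requires nightly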

This should help pinpoint what's happening. Thanks.

@piaoger
Author

piaoger commented Nov 4, 2017

I updated the deps with "cargo update" and tried again. Still nothing happens, and CPU usage stays close to 0.0%. So I cloned matrixmultiply_mt and ran "cargo test" and "cargo bench".

  • cargo test is fine and all tests pass quickly.
  • cargo bench seems to have something wrong.
    I started the bench run below and this is the result when I returned almost one hour later.
running 61 tests
test mat_mul_f32::m0004            ... bench:         584 ns/iter (+/- 139)
test mat_mul_f32::m0005            ... bench:       1,023 ns/iter (+/- 229)
test mat_mul_f32::m0006            ... bench:       1,048 ns/iter (+/- 224)
test mat_mul_f32::m0007            ... bench:       1,114 ns/iter (+/- 122)
test mat_mul_f32::m0008            ... bench:       1,133 ns/iter (+/- 539)
test mat_mul_f32::m0009            ... bench:       2,041 ns/iter (+/- 698)
test mat_mul_f32::m0012            ... bench:       2,273 ns/iter (+/- 827)
test mat_mul_f32::m0016            ... bench:       3,801 ns/iter (+/- 1,836)
test mat_mul_f32::m0032            ... bench:      18,220 ns/iter (+/- 8,100)
test mat_mul_f32::m0064            ... bench:      85,042 ns/iter (+/- 8,352)
test mat_mul_f32::m0127            ... bench:     234,191 ns/iter (+/- 17,730)
test mat_mul_f32::m0256            ... bench:   1,323,189 ns/iter (+/- 99,209)
test mat_mul_f32::m0512            ... bench:  10,057,554 ns/iter (+/- 4,495,104)
test mat_mul_f32::mix128x10000x128 ... 

Also, in further bench runs I found the CPU usage is really low (0.0%) from mat_mul_f32::m0064 onward.

@piaoger
Author

piaoger commented Nov 4, 2017

I also run "cargo test" and ""cargo bench" on my ubuntu machine, both of them are working fine.
It seems that it's only a Mac OSX issue.

Blow is bechmark:

running 61 tests
test mat_mul_f32::m0004            ... bench:         667 ns/iter (+/- 61)
test mat_mul_f32::m0005            ... bench:       1,247 ns/iter (+/- 162)
test mat_mul_f32::m0006            ... bench:       1,246 ns/iter (+/- 111)
test mat_mul_f32::m0007            ... bench:       1,345 ns/iter (+/- 160)
test mat_mul_f32::m0008            ... bench:       1,463 ns/iter (+/- 193)
test mat_mul_f32::m0009            ... bench:       2,449 ns/iter (+/- 308)
test mat_mul_f32::m0012            ... bench:       2,662 ns/iter (+/- 356)
test mat_mul_f32::m0016            ... bench:       4,690 ns/iter (+/- 623)
test mat_mul_f32::m0032            ... bench:      22,270 ns/iter (+/- 2,925)
test mat_mul_f32::m0064            ... bench:     232,781 ns/iter (+/- 49,175)
test mat_mul_f32::m0127            ... bench:     467,294 ns/iter (+/- 113,538)
test mat_mul_f32::m0256            ... bench:   1,326,382 ns/iter (+/- 425,530)
test mat_mul_f32::m0512            ... bench:   5,773,158 ns/iter (+/- 1,125,111)
test mat_mul_f32::mix128x10000x128 ... bench:   5,171,025 ns/iter (+/- 363,278)
test mat_mul_f32::mix16x4          ... bench:       3,555 ns/iter (+/- 277)
test mat_mul_f32::mix32x2          ... bench:       3,522 ns/iter (+/- 273)
test mat_mul_f32::mix97            ... bench:     345,694 ns/iter (+/- 47,583)
test mat_mul_f32::skew1024x01      ... bench:     140,752 ns/iter (+/- 24,186)
test mat_mul_f32::skew1024x02      ... bench:     154,346 ns/iter (+/- 30,854)
test mat_mul_f32::skew1024x03      ... bench:     148,338 ns/iter (+/- 19,113)
test mat_mul_f32::skew1024x04      ... bench:     150,207 ns/iter (+/- 29,046)

I will go back to rusty-sr on Linux :)

@millardjn
Owner

Thank you very much for doing this testing, it's good to know where the problem is!

I've updated matrixmultiply_mt to use the parking_lot crate, which supplies alternative Condvar and Mutex implementations. Apparently they are implemented differently and are more broadly compatible; hopefully it helps.

If it still doesn't work on OSX then turning off multithreading might be an option in the short term:
MATMULFLAGS= ... , no_multithreading
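
For example, combined with the arch flag from before, the train command would look roughly like this (a sketch reusing my earlier example paths; keep whichever other flags you already had):
MATMULFLAGS="arch_sandybridge, no_multithreading" cargo run --release -- train -r -v D:/ML/set14/original D:/test.rsr D:/ML/Imagenet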

I'll ask around and see if anyone knows of open OSX issues. The only one I know of has been fixed: jemalloc/jemalloc#895.

@piaoger
Author

piaoger commented Nov 4, 2017

Where is the update to matrixmultiply_mt? The latest update in matrixmultiply_mt is 17 hours old.

For rusty_sr itself, I still have questions about how to use it:

  1. To try training a new 2x-factor model, I downloaded the test/train datasets from the SRCNN project (see below). Does that make sense? And can you provide more information about the PARAMETER_FILE argument, what is it for?
    homepage: http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
    VALIDATION FOLDER: SRCNN/Test/Test14
    TRAINING_FOLDER: SRCNN/Training
  2. My commandline
    cargo run --release -- train -v ../SRCNN/Test/Test14 ./test.rsr ./SRCNN/Training
  3. Change FACTOR from 3 to 2.

I started a new 2x-factor training with the above arguments a couple of hours ago and it's running :) Please let me know if I got something wrong so that I can restart the training with the right arguments.

@millardjn
Owner

millardjn commented Nov 5, 2017

Crates.io "Last Updated" is wrong, not sure why. Version 0.1.4 is the new one.

That training setup looks correct :).

The PARAMETER_FILE argument is where the weights learned by the neural network get saved (every 100 steps, when the validation PSNR gets printed). You can then use them when upscaling later using -c or --custom:
./rusty_sr --custom ./test.rsr ./input.png ./output.png

If you want your new parameters/weights to be used by default you'll have to put test.rsr in the /res folder and then change a few parts of the code. If you train on a few different datasets you can include them all in /res and reprogram how the -p / --parameters argument works on these lines:

  • main.rs:26-28
  • main.rs:54
  • main.rs:144-152
