Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for png decoding on linux #1881

Closed
wants to merge 8 commits into from

Conversation

r-zenine
Copy link
Contributor

@r-zenine r-zenine commented Feb 13, 2020

Hi @cpuhrsch

As discussed here, This PR add's c++ operators to decode png.

Thanks

@r-zenine r-zenine requested a review from cpuhrsch February 13, 2020 06:19
@r-zenine r-zenine changed the title Add's support for png decoding on linux WIP: Add's support for png decoding on linux Feb 13, 2020
@r-zenine
Copy link
Contributor Author

It's still a WIP, I have some weird issues with the search paths.

@r-zenine r-zenine force-pushed the read-png-linux branch 5 times, most recently from caeac7e to d102d39 Compare February 13, 2020 14:42
@r-zenine
Copy link
Contributor Author

Hi @cpuhrsch,

There first PR is done. The failures are due to pipelines timing out. I don't know how to fix that (already tried with and without ninja ). What could we do ?

Thanks

@cpuhrsch
Copy link
Contributor

Hello @r-zenine,

Thanks for splitting this out!

Master is currently timing out as well, I imagine it's because our builds are starting to take too long. Could you try increasing the timeout? Especially with submodule libraries that are being built from source we're increasing the build times yet again. You could also build in parallel via the -j or --parallel flags of cmake. While Ninja causes an OOM, cmake might not.

On another note, let's make zlib a submodule as well. This way we're not increasing the number of dependencies. I thinks this is the main repository. Another comment on submodules: Since you can choose the commit it might make sense to pin it to a recent release for now, if you haven't done that already. That way we should run into less risk of a dependency bug.

  • Christian

@codecov-io
Copy link

codecov-io commented Feb 13, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@b6f28ec). Click here to learn what that means.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff            @@
##             master   #1881   +/-   ##
========================================
  Coverage          ?   0.43%           
========================================
  Files             ?      93           
  Lines             ?    7418           
  Branches          ?    1119           
========================================
  Hits              ?      32           
  Misses            ?    7378           
  Partials          ?       8
Impacted Files Coverage Δ
torchvision/io/image.py 0% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b6f28ec...75f4130. Read the comment docs.

@r-zenine
Copy link
Contributor Author

Hi, @cpuhrsch,

Hope you are doing well!

I have added zlib as a vendored dependency and pinned the versions for both libpng and zlib.

I increased the timeout for both circleci (30 minutes) and travis (1h).

There is one last pipeline that is failing. The unit test for ops in failing on windows. I tried to rebase from master in the hope of getting a fix for that with no success.

Have a good day!

Let me know if there is anything else left to do!

@cpuhrsch
Copy link
Contributor

That Windows build is a mystery to me. Paging Dr. Windows @peterjc123 to take a look if he has some time.

Copy link
Contributor

@cpuhrsch cpuhrsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks fine to me. We'll just need to figure out the Windows error.

@peterjc123
Copy link
Contributor

peterjc123 commented Feb 14, 2020

Well, the test failures are not related to this PR. Looks like sth is breaking at the PyTorch side.

@cpuhrsch
Copy link
Contributor

cpuhrsch commented Feb 14, 2020

Ok, let's wait for the next nightly and try again, since the PyTorch repo is running clear.

@r-zenine r-zenine changed the title WIP: Add's support for png decoding on linux Add's support for png decoding on linux Feb 14, 2020
@r-zenine
Copy link
Contributor Author

r-zenine commented Feb 17, 2020

Hi @cpuhrsch,

Please do not merge this PR. While working on the jpeg one. I figured out that even tough this approach works for now, it is not stricly what we wanted.

Here is what is happening:
We are building the extension code code against vendored versions of libpng and zlib. However, the code is running against the system wide dynamic libraries. It works because libpng and zlib are ubiquitous in linux systems.

As in torch, we would need to ship the .so in package using data_files or package_data arguments in the setup function. My struggle now is that when I do this, the code fails because the dynamic library is not found.

There is an option runtime_library_dir the might help. But there is little documentation about it.
Do you happen to know how it's done in pytorch ?

Thanks

Copy link
Contributor

@cpuhrsch cpuhrsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for pointing this out. Let's working on actually making this link statically.

setup.py Outdated Show resolved Hide resolved
@r-zenine
Copy link
Contributor Author

Hi @cpuhrsch ,

I have pushed a version with libpng and zlib linked dynamically and shipped with the package.
The windows pipeline is failing without any useful error message. I will try to investigate further.

What do you think of the approach ? Do we still try to link everything statically?

have a good day !

@cpuhrsch
Copy link
Contributor

Hello @r-zenine,

Yes, ideally we link everything statically. Shipping dynamic libraries can be a nightmare and in particular building against a library that has dynamic dependencies can lead to lots of conflicts. If we can build something statically it's always preferred and if it seems possible we should try until it works.

Thanks,
Christian

@r-zenine
Copy link
Contributor Author

hi @cpuhrsch ,

It works ! We are linking everything statically !
Thanks for pushing hard for this !

The only thing that remains is the windows error that has nothing to do with the PR !

Have a good day

@r-zenine
Copy link
Contributor Author

Hi @cpuhrsch ,

What are the next steps ? Anything else i have to do ?

Have a good day !

@cpuhrsch
Copy link
Contributor

Great! Thanks again for your work! I'll take a closer look at the Windows build and also a few other things related to this PR setup. This might take a few days / week.

@r-zenine
Copy link
Contributor Author

Hi @cpuhrsch ,

Let me work on MacOs support on the same PR. I'll try to push that on thursday. I will let you handle the windows pipeline and will get back to work on the png support.

Thanks a lot for the help !

@r-zenine
Copy link
Contributor Author

Hi @cpuhrsch ,

I just added macos support in the build. Tests for image are passing everywhere but other pipelines are failing because of other tests. It's probably due to something that happened with pytorch nightly.

Have a good day !!

@r-zenine r-zenine changed the title Add's support for png decoding on linux Add's support for png decoding on linux and macos Feb 20, 2020
@cpuhrsch cpuhrsch requested a review from fmassa February 24, 2020 23:10
@cpuhrsch cpuhrsch changed the title Add's support for png decoding on linux and macos Add support for png decoding on linux and macos Feb 25, 2020
@r-zenine
Copy link
Contributor Author

Hi @cpuhrsch ,

I think the conda pipelines are failing because of a linking issue. There is a weird warning in the build :

WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: /lib64/libm.so.6
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: /lib64/libc.so.6
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libtorch_python.so
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: /lib64/ld-linux-x86-64.so.2
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: /lib64/libpthread.so.0
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): Needed DSO lib/libgcc_s.so.1 found in [u'libgcc-ng']
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): .. but [u'libgcc-ng'] not in reqs/run, (i.e. it is overlinking) (likely) or a missing dependency (less likely)
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libpng16.so.16
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libtorch_cpu.so
   INFO (torchvision,lib/python3.5/site-packages/torchvision/_C.so): Needed DSO lib/libcudart.so.10.0 found in defaults::cudatoolkit-10.0.130-0
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libc10_cuda.so
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): Needed DSO lib/libstdc++.so.6 found in [u'libstdcxx-ng']
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): .. but [u'libstdcxx-ng'] not in reqs/run, (i.e. it is overlinking) (likely) or a missing dependency (less likely)
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libtorch.so
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libtorch_cuda.so
WARNING (torchvision,lib/python3.5/site-packages/torchvision/_C.so): did not find - or even know where to look for: $RPATH/libc10.so
WARNING (torchvision): dso library package defaults::python-3.5.6-hc3d631a_0 in requirements/run but it is not used (i.e. it is overdepending or perhaps statically linked? If that is what you want then add it to `build/ignore_run_exports`)

Do you have any idea where it is coming from ? Do we have that on master, or is it specific to this PR ?

Thanks

@cpuhrsch
Copy link
Contributor

Hello @r-zenine,

We'll need to wait on @fmassa for some insight here. He's currently working on a hi-pri deadline so this will take a bit longer than expected.

Is this working locally for you? You could also try installing this within a docker container (if your environment allows you to set that up) to get local feedback on the linking behavior etc.

In particular, if you take a look at the CI jobs you can see that they pull in the "pytorch/conda-cuda" container. This could be worthwhile looking into so you can get a faster turnaround time on experiments.

Thanks,
Christian

@r-zenine
Copy link
Contributor Author

r-zenine commented Mar 4, 2020

Hi @fmassa , @cpuhrsch

Here is a status of what's going on, I tried to statically link the libraries (it works with wheel not conda and I could not figure out a fix with conda despite dedicating many hours to it). I also tried dynamically linking against libpng and shipping the library with the package ( like what pytorch does for it's dependencies). It's seems to be working on most pipeline. There is a bug with the linking in macos conda that I still need to figure out.

If we go that way, it might make updating the windows pipelines easier.

Thoughts ?

@r-zenine r-zenine changed the title Add support for png decoding on linux and macos Add support for png decoding on linux Mar 8, 2020
r-zenine added 8 commits March 8, 2020 15:40
- Pipelines are timing out, increasing to 30 minutes.
- Vendors zlib and add's libpng as a dependency.
- Ship libpng.so and zlib.so with package on linux
- Installs cmake on macos
- Fix conda not found. in macos_wheel
- Update $PATH on macOs_wheel for cmake
- activate tests for macos
@andfoy
Copy link
Contributor

andfoy commented Jun 15, 2020

@r-zenine, thanks for your contribution! We've migrated your commits to #2301, so we will close this one

@andfoy andfoy closed this Jun 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants