PR: Add libpng and libjpeg-turbo requirement into conda recipe #2301

Merged: 107 commits, Jun 30, 2020

@andfoy (Contributor) commented Jun 8, 2020

Fixes #2291

@andfoy changed the title from "PR: Add libpng requirement into conda recipe" to "PR: Add libpng and libjpeg-turbo requirement into conda recipe" on Jun 8, 2020

@fmassa (Member) commented Jun 9, 2020

Hi @andfoy,

Could you try rebasing your changes on top of #1881 and #1909 to test how the changes perform when the dependencies on libpng and libjpeg-turbo are enabled?

setup.py (resolved)

@fmassa (Member) left a comment

Thanks a lot, it's great that all tests pass!

I have a few questions / comments.
In particular, I'm wondering whether using the raw libjpeg API would let us fix some of the incompatibilities that arise from the other libraries. IIRC, libjpeg-turbo has the same ABI as libjpeg, so one could switch to it at runtime.

Also, could you add a note somewhere (maybe in the README?) explaining the steps users should follow to get the image extensions compiled on their system?
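As a rough illustration of what such README steps boil down to, a build script can look for the libpng and TurboJPEG headers and enable the image extension only when both are found (the commit list further down mentions disabling the extension if libpng or turbojpeg are not found). This is a hypothetical sketch, not torchvision's actual `setup.py`: the function names are made up, and it assumes conda-build exports `CONDA_BUILD` plus `PREFIX` (Unix) or `LIBRARY_PREFIX` (Windows).

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Headers the (hypothetical) image extension needs at compile time.
REQUIRED_HEADERS = ("png.h", "turbojpeg.h")


def candidate_include_dirs(env: Optional[Mapping[str, str]] = None,
                           is_windows: Optional[bool] = None) -> List[Path]:
    """Hypothetical helper: directories to search for image-library headers."""
    env = os.environ if env is None else env
    is_windows = (os.name == "nt") if is_windows is None else is_windows
    dirs: List[Path] = []
    if env.get("CONDA_BUILD"):
        # Assumption: conda-build exposes the host prefix as PREFIX on Unix
        # and as LIBRARY_PREFIX (PREFIX/Library) on Windows.
        prefix = env.get("LIBRARY_PREFIX" if is_windows else "PREFIX")
        if prefix:
            dirs.append(Path(prefix) / "include")
    if not is_windows:
        # Typical system header locations on Linux/macOS.
        dirs += [Path("/usr/include"), Path("/usr/local/include")]
    return dirs


def find_header(name: str, dirs: List[Path]) -> Optional[Path]:
    """Return the first directory that contains `name`, or None."""
    for d in dirs:
        if (d / name).is_file():
            return d
    return None


def image_extension_enabled(dirs: List[Path]) -> bool:
    """Build the image extension only if every required header was found."""
    return all(find_header(h, dirs) is not None for h in REQUIRED_HEADERS)
```

Searching explicit directories keeps the decision deterministic; a real build would also need the matching library paths and link flags.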

Comment on lines +25 to +26
# Pillow introduces unwanted conflicts with libjpeg-turbo, as it depends on jpeg
# The fix depends on https://github.com/conda-forge/conda-forge.github.io/issues/673
Member

Could you explain when / how these conflicts materialize?
Could that lead to segfaults when the user imports torchvision and PIL?

@andfoy (Contributor Author), Jun 24, 2020

The conflict here is that we're substituting the libjpeg library, which Pillow primarily relies on. However, this may not affect us, since AFAIK libjpeg-turbo and libjpeg share the same ABI.

Most Linux distributions ship libturbojpeg (the library itself) and libjpeg-turbo (the libjpeg implementation using turbojpeg) as separate packages, so they don't have this problem: we link against libturbojpeg, not against libjpeg-turbo.
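To see the distinction from the linking side, one can check which API a shared library actually exports. The sketch below uses Python's `ctypes` and assumes the TurboJPEG API is identified by the `tjInitDecompress` symbol and the classic libjpeg API by `jpeg_CreateDecompress` (the function behind the `jpeg_create_decompress` macro). The helper name is hypothetical, and the result depends on how a given library was built and which name `find_library` resolves on your platform.

```python
import ctypes
import ctypes.util
from typing import Dict, Optional


def exported_jpeg_apis(libname: str) -> Optional[Dict[str, bool]]:
    """Report which JPEG APIs a shared library exports, or None if it can't be loaded."""
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # hasattr() on a CDLL performs a symbol lookup and is False if the symbol is absent.
    return {
        "turbojpeg": hasattr(lib, "tjInitDecompress"),
        "libjpeg": hasattr(lib, "jpeg_CreateDecompress"),
    }
```

For example, `exported_jpeg_apis("turbojpeg")` and `exported_jpeg_apis("jpeg")` can be compared on a given machine to see which library provides which API.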

In conda-forge, we have this conflict because the libjpeg-turbo recipe produces both the turbo-flavored libjpeg and libturbojpeg. I spoke to @isuruf, one of the conda-forge maintainers, and he told me that they are working towards a solution in conda-forge/conda-forge.github.io#673, but right now there is no alternative that avoids this conflict.

I guess we aren't having any problems with this conflicting installation, since the tests are passing. However, we should encourage users to install PyTorch and torchvision in a separate environment, to prevent other errors this conflict might cause that we're not yet aware of.

Member

Hum, this makes me a bit worried, because AFAIK we don't have tests running on OSX, only Linux and Windows (we do compile on OSX though).
So if there is a problem happening in OSX we wouldn't be able to see it in CI.

Do you think everything would be safer if we used the libjpeg API?

Contributor Author

Aren't they running on the binary_macos_conda_*_pyxx CircleCI pipelines?

Member

Oh ok, yeah, we were not running tests for OSX with wheels, only conda.

But still, do you think using the libjpeg API would make things safer?

Contributor Author

I think our setup doesn't have much trouble right now, since we link against libturbojpeg and not directly against libjpeg. If we were compiling against libjpeg, we would be in trouble. Right now, the only conflict we have is the one on conda-forge.

A provisional solution would be to compile libturbojpeg (without libjpeg) ourselves and publish it to the pytorch conda channel until conda-forge/conda-forge.github.io#673 is fixed. What do you think?

packaging/torchvision/meta.yaml (outdated, resolved)
@@ -1,3 +1,6 @@
channel_sources:
- defaults,conda-forge
Member

Is it now safe to use conda-forge as well? At some point we had issues with it.

Contributor Author

It seems only libturbojpeg is being pulled from there; all other packages are pulled from defaults or the main PyTorch channel.

.travis.yml (outdated, resolved)

@fmassa (Member) left a comment
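One way to double-check which packages come from conda-forge is to inspect the installed environment. This sketch assumes the records printed by `conda list --json` carry `name` and `channel` fields, with `channel` holding the short channel name (e.g. `conda-forge`); the helper names are hypothetical.

```python
import json
import subprocess
from typing import Dict, List, Optional


def conda_list_records(prefix: Optional[str] = None) -> List[Dict]:
    """Run `conda list --json` (optionally for a given env prefix) and parse the output."""
    cmd = ["conda", "list", "--json"]
    if prefix:
        cmd += ["--prefix", prefix]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def packages_from_channel(records: List[Dict], channel: str) -> List[str]:
    """Sorted names of installed packages whose `channel` field equals `channel`."""
    return sorted(r["name"] for r in records if r.get("channel") == channel)
```

If the observation above holds, `packages_from_channel(conda_list_records(), "conda-forge")` on a freshly built environment should list only the libjpeg-turbo package.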

Thanks a lot @andfoy !

@fmassa merged commit 766721b into pytorch:master on Jun 30, 2020
@andfoy deleted the add_libpng branch on June 30, 2020 17:23

@peterjc123 (Contributor) commented Jun 30, 2020

This caused a failure on master:

```
_________________________ ImageTester.test_decode_jpeg _________________________
RuntimeError: No such operator image::decode_jpeg

During handling of the above exception, another exception occurred:

self = <test_image.ImageTester testMethod=test_decode_jpeg>

    def test_decode_jpeg(self):
        for img_path in get_images(IMAGE_ROOT, "jpg"):
            img_pil = torch.from_numpy(np.array(Image.open(img_path)))
            size = os.path.getsize(img_path)
            img_ljpeg = decode_jpeg(torch.from_file(img_path, dtype=torch.uint8, size=size))

            norm = img_ljpeg.shape[0] * img_ljpeg.shape[1] * img_ljpeg.shape[2] * 255
            err = torch.abs(img_ljpeg.flatten().float() - img_pil.flatten().float()).sum().float() / (norm)

            self.assertLessEqual(err, 1e-2)

        with self.assertRaisesRegex(ValueError, "Expected a non empty 1-dimensional tensor."):
            decode_jpeg(torch.empty((100, 1), dtype=torch.uint8))

        with self.assertRaisesRegex(ValueError, "Expected a torch.uint8 tensor."):
            decode_jpeg(torch.empty((100, ), dtype=torch.float16))

        with self.assertRaisesRegex(RuntimeError, "Error while reading jpeg headers"):
>           decode_jpeg(torch.empty((100), dtype=torch.uint8))
E           AssertionError: "Error while reading jpeg headers" does not match "No such operator image::decode_jpeg"
```
 

@ezyang (Contributor) commented Jun 30, 2020

This PR has broken the doc push job on the main PyTorch repo:


```
Jun 30 18:53:59                  from /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:12:
Jun 30 18:53:59 /usr/include/pngconf.h:383:12: error: '__pngconf' does not name a type
Jun 30 18:53:59             __pngconf.h__ in libpng already includes setjmp.h;
Jun 30 18:53:59             ^
Jun 30 18:53:59 /usr/include/pngconf.h:384:12: error: '__dont__' does not name a type
Jun 30 18:53:59             __dont__ include it again.;
Jun 30 18:53:59             ^
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp: In function 'at::Tensor decodePNG(const at::Tensor&)':
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:35:5: error: 'png_const_bytep' does not name a type
Jun 30 18:53:59      png_const_bytep ptr;
Jun 30 18:53:59      ^
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:37:10: error: 'struct decodePNG(const at::Tensor&)::Reader' has no member named 'ptr'
Jun 30 18:53:59    reader.ptr = png_const_bytep(datap) + 8;
Jun 30 18:53:59           ^
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:37:37: error: 'png_const_bytep' was not declared in this scope
Jun 30 18:53:59    reader.ptr = png_const_bytep(datap) + 8;
Jun 30 18:53:59                                      ^
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp: In lambda function:
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:42:27: error: 'struct decodePNG(const at::Tensor&)::Reader' has no member named 'ptr'
Jun 30 18:53:59          std::copy(reader->ptr, reader->ptr + bytes, output);
Jun 30 18:53:59                            ^
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:42:40: error: 'struct decodePNG(const at::Tensor&)::Reader' has no member named 'ptr'
Jun 30 18:53:59          std::copy(reader->ptr, reader->ptr + bytes, output);
Jun 30 18:53:59                                         ^
Jun 30 18:53:59 /var/lib/jenkins/workspace/vision/torchvision/csrc/cpu/image/readpng_cpu.cpp:43:17: error: 'struct decodePNG(const at::Tensor&)::Reader' has no member named 'ptr'
Jun 30 18:53:59          reader->ptr += bytes;
Jun 30 18:53:59                  ^
Jun 30 18:53:59 error: command 'gcc' failed with exit status 1
```
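The `reader.ptr` errors in the log belong to a small in-memory reader that `readpng_cpu.cpp` uses with libpng: it starts 8 bytes into the buffer (past the PNG signature) and, on each read, copies the requested bytes and advances the pointer. The Python class below is a hypothetical analogue of that pattern, not the torchvision code; as a design choice for the sketch, it also verifies the signature and refuses to read past the end of the buffer.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG file signature


class MemoryPngReader:
    """Hypothetical Python analogue of the Reader struct in readpng_cpu.cpp."""

    def __init__(self, data: bytes):
        if data[:len(PNG_SIGNATURE)] != PNG_SIGNATURE:
            raise ValueError("buffer does not start with a PNG signature")
        self._data = memoryview(data)
        # Mirrors `reader.ptr = png_const_bytep(datap) + 8;`: skip the signature.
        self.ptr = len(PNG_SIGNATURE)

    def read(self, nbytes: int) -> bytes:
        """Copy `nbytes` from the current position and advance, like the read callback."""
        end = self.ptr + nbytes
        if end > len(self._data):
            raise EOFError("read past the end of the PNG buffer")
        chunk = self._data[self.ptr:end].tobytes()
        self.ptr = end
        return chunk
```

For example, after the signature, the first two 4-byte reads of a PNG stream return the length and the type (`IHDR`) of its first chunk.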

I'm going to hotfix it by pinning doc push to an older version of PyTorch, but this will need to get fixed eventually.

@fmassa (Member) commented Jun 30, 2020

@ezyang, can you point me to where the doc push job is defined? This issue arises because the build image might have an older version of libpng. We should guard against this case (I thought we did, but it looks like we missed something).
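The errors in the log (the `setjmp.h` guard in `pngconf.h` and the missing `png_const_bytep` type) suggest the build image shipped an older libpng header. Below is a sketch of the kind of guard mentioned here. It assumes the version can be queried through `libpng-config --version` and treats 1.6 as the minimum (an assumption based on the `libpng16` references in the commit list); the helper names are hypothetical.

```python
import subprocess
from typing import Optional, Tuple

# Assumption: treat libpng 1.6 as the minimum supported version.
MIN_LIBPNG: Tuple[int, ...] = (1, 6, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse '1.6.37' (or '1.6.37git') into (1, 6, 37), stopping at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def libpng_is_recent(version_text: str, minimum: Tuple[int, ...] = MIN_LIBPNG) -> bool:
    """True if the parsed version is at least `minimum` (missing parts count as 0)."""
    version = parse_version(version_text)
    version += (0,) * (len(minimum) - len(version))
    return version >= minimum


def query_libpng_version() -> Optional[str]:
    """Ask libpng-config for the installed version; None if it is unavailable."""
    try:
        result = subprocess.run(["libpng-config", "--version"],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()
```

A build script could then skip the image extension, or fail with a clear message, when `query_libpng_version()` returns None or `libpng_is_recent(...)` is False, instead of failing deep inside `pngconf.h`.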

@ezyang (Contributor) commented Jun 30, 2020

pytorch/pytorch@9ac0feb

@fmassa (Member) commented Jun 30, 2020

@andfoy, the errors pointed out by @peterjc123 are due to the latest CI changes, which were merged just before yours in #2328.

@andfoy (Contributor Author) commented Jun 30, 2020

@fmassa @ezyang, I'll revert this PR, rebase it, and open it again with the doc fixes as well.

fmassa pushed a commit that referenced this pull request Jul 1, 2020
@andfoy (Contributor Author) commented Jul 1, 2020

> This PR has broken the doc push job on PyTorch main repo:

@ezyang, is it possible to get access to the full log of that build?

@andfoy mentioned this pull request on Jul 1, 2020
de-vri-es pushed a commit to fizyr-forks/torchvision that referenced this pull request Aug 4, 2020
…ch#2301)

* Add libpng requirement into conda recipe

* Try to install libjpeg-turbo

* Add PNG reading capabilities

* Remove newline

* Add image extension to compilation instructions

* Include png functions as part of the main library

* Update CMakeLists

* Detect if building on conda-build

* Debug

* More debug messages

* Print globbed libraries

* Print globbed libraries

* Point to correct PNG path

* Remove libJPEG preventively

* Debug extension loading

* Link libpng explicitly

* Link with PNG

* Add PNG reading capabilities

* Add libpng requirement into conda recipe

* Try to install libjpeg-turbo

* Remove newline

* Add image extension to compilation instructions

* Include png functions as part of the main library

* Update CMakeLists

* Detect if building on conda-build

* Debug

* More debug messages

* Print globbed libraries

* Print globbed libraries

* Point to correct PNG path

* Remove libJPEG preventively

* Debug extension loading

* Link libpng explicitly

* Link with PNG

* Install libpng on conda-based wheel distributions

* Add -y flag

* Add -y flag to yum

* Locate LibPNG on windows conda

* Remove empty else

* Copy libpng16.so

* Copy dylib on Mac

* Improve check on Windows

* Try to install ninja using conda on windows

* Use libpng on Windows

* Package lib on windows wheel

* Point library to the correct place

* Include binaries as part of wheel

* Copy libpng.so on linux

* Look for png.h on Windows when using conda-build

* Do not skip png tests on Mac/Win

* Restore libjpeg-turbo

* Install jpeg-turbo on wheel distributions

* Install libjpeg-turbo from conda-forge on wheel distributions

* Do not pull av on conda-build

* Add pillow disclaimer

* Vendors libjpeg-turbo 2.0.4

* Merge JPEG work

* Remove submodules

* Regenerate circle config

* Fix style issues

* Fix C++ style issues

* More style corrections

* Add JPEG-turbo to linking libraries

* More style corrections

* More style corrections

* More style corrections

* Install libjpeg-turbo-devel

* Install libturbo-jpeg on typing pipeline

* Update Circle template

* Windows and Unix turbojpeg have the same linking name

* Install turbojpeg-devel instead of libjpeg-turbo

* Copy TurboJPEG binaries to wheel

* Move test image

* Move back test image

* Update JPEG test path

* Remove dot from extension

* Move image functions to extension

* Use stdout arg in subprocess

* Disable image extension if libpng or turbojpeg are not found

* Append libpng stdout

* Prevent list appending on lists

* Minor path correction

* Minor error correction

* Add linking flags

* Style issues correction

* Address minor review corrections

* Refactor library search

* Restore access index

* Fix JPEG tests

* Update libpng version in Travis

* Add -y flag

* Remove dot

* Update libpng using apt

* Check libpng version

* Change libturbojpeg binary

* Update import

* Change call

* Restore av in conda recipe

* Minor error correction

* Remove unused comment in travis.yml

* Update README

* Fix missing links

* Remove fixes for 16.04

Co-authored-by: Ryad ZENINE
de-vri-es pushed a commit to fizyr-forks/torchvision that referenced this pull request Aug 4, 2020

Successfully merging this pull request may close these issues.

Package libjpeg and libpng within torchvision
6 participants