Builds without TensorFlow #147

stweil · 2020-08-14T20:38:00Z

Pre-build packages of the different version of TF don't exist for every platform (for example not with Python 3.8, not for non-Intel hosts). A typical error message which terminates make all looks like this:

ERROR: No matching distribution found for tensorflow-gpu==1.15.* (from ocrd-keraslm==0.3.2)

I think it would be good to support builds on such platforms, too, by skipping problematic modules, without requiring hacker methods like make all -k. Possible solutions:

make all could detect whether required packages are available and skip modules which need unavailable packages.
make all could support a macro to disable groups of modules (like for example all modules which depend on TF1). It is already possible to define the desired modules, but that requires much knowledge about the different modules.

The first one of these solution could be really user friendly, especially when it also shows an informative message.

The text was updated successfully, but these errors were encountered:

bertsky · 2020-08-14T22:07:37Z

I wouldn't call make all -k a hack. It's exactly what you are asking for: keep building despite being unable to install some modules.

make all could detect whether required packages are available and skip modules which need unavailable packages.

That's just make all -k.

make all could support a macro to disable groups of modules (like for example all modules which depend on TF1). It is already possible to define the desired modules, but that requires much knowledge about the different modules.

Yes, but modules keep changing, and new ones get added. For some extra level of representation, someone would have to aggregate and keep managing all this extra knowledge.

stweil · 2020-08-15T06:15:02Z

I wouldn't call make all -k a hack. It's exactly what you are asking for: keep building despite being unable to install some modules.

make all -k is not the right solution, because it would waste build time for known problems and hide unknown problems.

Yes, but modules keep changing, and new ones get added. For some extra level of representation, someone would have to aggregate and keep managing all this extra knowledge.

We already aggregate the knowledge about modules which depend on TF.

bertsky · 2020-08-15T09:22:49Z

make all -k is not the right solution, because it would waste build time for known problems and hide unknown problems.

Good point.

We already aggregate the knowledge about modules which depend on TF.

True.

kba · 2020-08-15T11:03:39Z

make all could detect whether required packages are available and skip modules which need unavailable packages.

While I disagree with the idea in general (tensorflow should provide packages for Python 3.8 and different architectures! that cannot be so difficult, they have huge corporate backers!) but I see the need for a workaround and a solution that adresses this problem (as you propose) is preferable to brute-forcing with make -k.

stweil · 2020-08-17T14:37:19Z

tensorflow should provide packages for Python 3.8 and different architectures

"We don't add new versions of Python for old releases. If you want to use TF 1.15, you have to use Python 3.7 or lower. If you want to use Python 3.8 you have to use TF 2.2 or newer." (see source). So even TF 2.1 won't be supported with Python 3.7.

Unless all processors work with the latest TF, too, we'll have such problems.

bertsky · 2020-08-17T14:55:08Z

tensorflow should provide packages for Python 3.8 and different architectures

"We don't add new versions of Python for old releases. If you want to use TF 1.15, you have to use Python 3.7 or lower. If you want to use Python 3.8 you have to use TF 2.2 or newer." (see source). So even TF 2.1 won't be supported with Python 3.7.

Unless all processors work with the latest TF, too, we'll have such problems.

Wow. TF just keeps scaring users away on an epic scale. But why hasn't this seen more downvotes? (@stweil thanks for sharing!)

bertsky · 2020-08-17T18:36:46Z

* `make all` could support a macro to disable groups of modules (like for example all modules which depend on TF1). It is already possible to define the desired modules, but that requires much knowledge about the different modules.
make all could detect whether required packages are available and skip modules which need unavailable packages.
While I disagree with the idea in general (tensorflow should provide packages for Python 3.8 and different architectures! that cannot be so difficult, they have huge corporate backers!) but I see the need for a workaround and a solution that adresses this problem (as you propose) is preferable to brute-forcing with make -k.

So, do we

make this automatic and narrow – checking for Python 3.8 only and automatically skipping all TF modules –, or
manual and agnostic – offering a pre-defined (say) NO_TENSORFLOW so users can make all NO_TENSORFLOW=1?

stweil · 2020-08-17T19:41:46Z

In the light of the statement cited above and to cover not only Python 3.8 environments but also non-Intel hosts, I think we need separate checks for each sub-env. Ideally a build with Python 3.8 on Intel compatible hardware would then build processors which uses TF 2.2 or newer, but skip those depending on older TF versions. On ARM, all processors which depend on TF would be skipped unless there is a package source which provides TF packages optimized for the hardware. NVIDIA for example provides TF packages for their Jetson developer boards (of course again not all required versions of TF).

bertsky · 2020-08-17T20:09:15Z

In the light of the statement cited above and to cover not only Python 3.8 environments but also non-Intel hosts, I think we need separate checks for each sub-env. Ideally a build with Python 3.8 on Intel compatible hardware would then build processors which uses TF 2.2 or newer, but skip those depending on older TF versions. On ARM, all processors which depend on TF would be skipped unless there is a package source which provides TF packages optimized for the hardware. NVIDIA for example provides TF packages for their Jetson developer boards (of course again not all required versions of TF).

Ok, this all sounds like 1 – automatic, specific checks that disable modules which have no chance. Checking Python or TF version is simple, but checking for Intel-compatible hardware can be quite intricate (as can be seen in autoconf's host checking)...

stweil · 2020-08-17T20:46:00Z

The test is much simpler: just try to install one of the required Python packages (keras, tensorboard, tensorflow, ...).

stweil · 2020-08-17T20:48:27Z

Side note: I just noticed that Homebrew on macOS updated from Python 3.7 to 3.8 recently.
Normally I'd say :-)
With OCR-D I feel more like :-(

That increases the importance of this issue.

bertsky · 2020-08-18T09:31:50Z

The test is much simpler: just try to install one of the required Python packages (keras, tensorboard, tensorflow, ...).

You mean in the Makefile, bypassing the module's setup.py or Makefile? That would go against encapsulation. We'd have to double check the ocrd_all Makefile every time the module gets updated. Also, this would not actually save any installation time over the simple make all -k approach.

stweil · 2020-08-18T09:41:33Z

Yes, I mean in the Makefile. A sub-env for TensorFlow 1 like the current headless-tf1 can try to install a Python module for TensorFlow 1 without anything going more against encapsulation than we already have to do because of the known package conflicts. Trying to install a single module is much faster than installing lots of other Python modules from the different requirement lists before failing.

bertsky · 2020-08-18T09:57:38Z

A sub-env for TensorFlow 1 like the current headless-tf1 can try to install a Python module for TensorFlow 1 without anything going more against encapsulation than we already have to do because of the known package conflicts.

Agreed.

Trying to install a single module is much faster than installing lots of other Python modules from the different requirement lists before failing.

I don't think that difference will be noticable though. Torch, TF and Keras are the largest among the dependencies usually.

So our recipes would look like this:

$(call multirule,$(OCRD_COR_ASV_ANN)): cor-asv-ann
ifeq (0,$(MAKELEVEL))
	$(MAKE) -B -o $< $(notdir $(OCRD_COR_ASV_ANN))
	$(call delegate_venv,$(OCRD_COR_ASV_ANN))
$(OCRD_COR_ASV_ANN): VIRTUAL_ENV := $(SUB_VENV)/headless-tf1
else
	. $(ACTIVATE_VENV) && $(PIP) install tensorflow==1.15.2 opencv-python-headless
	$(pip_install)
endif
endif

But then we still need to run with make -k. How do you want to back out of the error?

stweil · 2020-08-18T10:02:36Z

our recipes would look like this

No, that would still try installing TF1 several times. The ideal solution would try it once if VIRTUAL_ENV points to headless-tf1, set a macro, and use that macro to enable / disable all parts which depend on TF1.

bertsky · 2020-08-18T10:13:34Z

our recipes would look like this

No, that would still try installing TF1 several times.

No it would not. Remember, we use multirule above, so this will be tried once per module (which can be nearly equated with once per venv).

The ideal solution would try it once if VIRTUAL_ENV points to headless-tf1, set a macro, and use that macro to enable / disable all parts which depend on TF1.

Huh? How do you set a macro dependent on the result of a recipe? Makefiles are executed in 2 passes, variable expansion and dependency tree comes before rule application.

You probably meant introducing another conditional into the rules (with a shell call). But that would not change how often the attempt is made. (It's definitely always tried next time make is invoked.) And you still need to back out of creating the delegating shell script if the sub-make failed.

kba · 2020-08-18T10:17:40Z

No, that would still try installing TF1 several times. The ideal solution would try it once if VIRTUAL_ENV points to headless-tf1, set a macro, and use that macro to enable / disable all parts which depend on TF1.

That sounds reasonable though I cannot quite imagine how that would work but am open to see a proof-of-concept. I would still see the merit in a NO_TENSORFLOW option for testing&development btw.

So our recipes would look like this:

$(call multirule,$(OCRD_COR_ASV_ANN)): cor-asv-ann
ifeq (0,$(MAKELEVEL))
	$(MAKE) -B -o $< $(notdir $(OCRD_COR_ASV_ANN))
	$(call delegate_venv,$(OCRD_COR_ASV_ANN))
$(OCRD_COR_ASV_ANN): VIRTUAL_ENV := $(SUB_VENV)/headless-tf1
else
	. $(ACTIVATE_VENV) && $(PIP) install tensorflow==1.15.2 opencv-python-headless
	$(pip_install)
endif
endif

But then we still need to run with make -k. How do you want to back out of the error?

Indeed, what to do next (honestly asking, finding it hard to grasp the consequences of these changes)?

stweil · 2022-03-24T16:02:02Z

That sounds reasonable though I cannot quite imagine how that would work but am open to see a proof-of-concept. I would still see the merit in a NO_TENSORFLOW option for testing&development btw.

That proof-of-concept is now realized in PR #295.

stweil · 2022-03-24T16:06:18Z

I just read the Dependabot alerts for cor-asv-ann. It lists 218 (!) alerts for tensorflow-gpu with 3 of them marked critical and 60 of them marked high. That's a nightmare. And most alerts are only fixed in tensorflow>2.5.2.

cneud · 2022-03-24T21:54:20Z

Since the situation with modules still requiring tf1 is indeed a nightmare, could we "soft" migrate those modules to tf2 using tf.compat.v1 and thoroughly test that for regressions?

bertsky · 2022-03-24T22:31:44Z

I just read the Dependabot alerts for cor-asv-ann. It lists 218 (!) alerts for tensorflow-gpu with 3 of them marked critical and 60 of them marked high. That's a nightmare. And most alerts are only fixed in tensorflow>2.5.2.

which is why these alerts are useless.

Since the situation with modules still requiring tf1 is indeed a nightmare, could we "soft" migrate those modules to tf2 using tf.compat.v1 and thoroughly test that for regressions?

@cneud, finding regressions would be the hard thing to do in my case. cor-asv-ann / ocrd_keraslm / ocrd_segment do not have regression tests. (And numerical differences could be benign or not.) What's more, I would almost certainly have to repeat parts of the training process – a lot of work.

Also, are you sure it's just that first step of replacing TF1 API with compat.v1? In the migration guide the state:

Run the automated script to convert your TF1.x API usage to tf.compat.v1.

Remove old tf.contrib.layers and replace them with TF Slim symbols. Also check TF Addons for other tf.contrib symbols.

Rewrite your TF1.x model forward passes to run in TF2 with eager execution enabled.

Validate the accuracy and numerical correctness of your migrated code.

Upgrade your training, evaluation and model saving code to TF2 equivalents.

(Optional) Migrate your TF2-compatible tf.compat.v1 APIs including TF Slim usage to idiomatic TF2 APIs.

It's still not entirely clear to me under which circumstances the other steps apply...

cneud · 2022-03-24T22:52:13Z

It's still not entirely clear to me under which circumstances the other steps apply

Indeed, this is mainly what I also wonder about - how can we better assess the effort for migration to tf2? I suppose we have to make a start with migrating to tf2 at some point (also for @qurator-spk), so what could be initial baby steps on the way. The issues with tf1 builds and Python 3.8 will probably also become worse over time. Maybe we can make a common effort between devs of the modules that require tf1, or plan that in for some time in the foreseeable future.

bertsky · 2022-03-24T23:00:32Z

Agreed! So in the end we'll just have to start trying – and back every step of the way up by some tests at least.

BTW, for 3.8 we have a solution already (see #289, which uses nvidia-tensorflow prebuilds). But 3.9+ is certainly never going to be supported by anyone. And it also became apparent recently that more and more dependencies of TF1 now start conflicting with other packages:

ocrd_all/Makefile

Lines 634 to 636 in 63d13e4

    
           # - preempt conflict over numpy between scikit-image and tensorflow 
        
           # - preempt conflict over numpy between tifffile and tensorflow (and allow py36) 
        
           . $(ACTIVATE_VENV) && $(SEMPIP) $(PIP) install imageio==2.14.1 "tifffile<2022"

cneud · 2022-03-24T23:19:07Z

we'll just have to start trying – and back every step of the way up

Exactly, those conflicts and workarounds require discussion, fixes, testing as well and thereby more and more time too. Maybe trying out with most simple means can help weighing the cost of a migration vs this better.

stweil mentioned this issue Aug 17, 2020

Builds with missing TF create unusable "executables" #152

Closed

stweil added the enhancement New feature or request label Aug 17, 2020

This was referenced Aug 20, 2020

include browse-ocrd submodule / executable #156

Open

Parallel build terminates with unexpected message when wget and curl are missing #157

Open

Tensorflow CPU vs GPU #68

Closed

bertsky mentioned this issue Jan 27, 2021

ocrd_all native installation fails due to tensorflow #235

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Builds without TensorFlow #147

Builds without TensorFlow #147

stweil commented Aug 14, 2020 •

edited

Loading

bertsky commented Aug 14, 2020

stweil commented Aug 15, 2020

bertsky commented Aug 15, 2020

kba commented Aug 15, 2020

stweil commented Aug 17, 2020 •

edited

Loading

bertsky commented Aug 17, 2020

bertsky commented Aug 17, 2020

stweil commented Aug 17, 2020

bertsky commented Aug 17, 2020

stweil commented Aug 17, 2020

stweil commented Aug 17, 2020 •

edited

Loading

bertsky commented Aug 18, 2020

stweil commented Aug 18, 2020

bertsky commented Aug 18, 2020

stweil commented Aug 18, 2020 •

edited

Loading

bertsky commented Aug 18, 2020

kba commented Aug 18, 2020

stweil commented Mar 24, 2022

stweil commented Mar 24, 2022

cneud commented Mar 24, 2022

bertsky commented Mar 24, 2022

cneud commented Mar 24, 2022

bertsky commented Mar 24, 2022

cneud commented Mar 24, 2022

Builds without TensorFlow #147

Builds without TensorFlow #147

Comments

stweil commented Aug 14, 2020 • edited Loading

bertsky commented Aug 14, 2020

stweil commented Aug 15, 2020

bertsky commented Aug 15, 2020

kba commented Aug 15, 2020

stweil commented Aug 17, 2020 • edited Loading

bertsky commented Aug 17, 2020

bertsky commented Aug 17, 2020

stweil commented Aug 17, 2020

bertsky commented Aug 17, 2020

stweil commented Aug 17, 2020

stweil commented Aug 17, 2020 • edited Loading

bertsky commented Aug 18, 2020

stweil commented Aug 18, 2020

bertsky commented Aug 18, 2020

stweil commented Aug 18, 2020 • edited Loading

bertsky commented Aug 18, 2020

kba commented Aug 18, 2020

stweil commented Mar 24, 2022

stweil commented Mar 24, 2022

cneud commented Mar 24, 2022

bertsky commented Mar 24, 2022

cneud commented Mar 24, 2022

bertsky commented Mar 24, 2022

cneud commented Mar 24, 2022

stweil commented Aug 14, 2020 •

edited

Loading

stweil commented Aug 17, 2020 •

edited

Loading

stweil commented Aug 17, 2020 •

edited

Loading

stweil commented Aug 18, 2020 •

edited

Loading