Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[build bug] Modifying gumbo source doesn't cause rebuilds #2718

Closed
stevecheckoway opened this issue Dec 8, 2022 · 14 comments · Fixed by #3220
Closed

[build bug] Modifying gumbo source doesn't cause rebuilds #2718

stevecheckoway opened this issue Dec 8, 2022 · 14 comments · Fixed by #3220
Labels
state/needs-triage Inbox for non-installation-related bug reports or help requests

Comments

@stevecheckoway
Copy link
Contributor

Please describe the bug

Modifying the gumbo source files doesn't cause rake compile:nokogiri to rebuild the extension.

Help us reproduce what you're seeing

  1. Compile the extension with rake compile:nokogiri
  2. Touch a gumbo source file: touch gumbo-parser/src/util.h
  3. Recompile the extension with rake compile:nokogiri
$ bundle exec rake compile:nokogiri
/usr/bin/make install target_prefix=
install -c -p -m 755 nokogiri.bundle /Users/steve/programming/nokogiri/lib/nokogiri
cp tmp/arm64-darwin21/nokogiri/3.1.2/nokogiri.bundle tmp/arm64-darwin21/stage/lib/nokogiri/nokogiri.bundle
$ touch gumbo-parser/src/util.h
$ bundle exec rake compile:nokogiri
cp gumbo-parser/src/util.h tmp/arm64-darwin21/stage/gumbo-parser/src/util.h
/usr/bin/make install target_prefix=
install -c -p -m 755 nokogiri.bundle /Users/steve/programming/nokogiri/lib/nokogiri
cp tmp/arm64-darwin21/nokogiri/3.1.2/nokogiri.bundle tmp/arm64-darwin21/stage/lib/nokogiri/nokogiri.bundle

Note that the extension was already compiled for the first invocation so nothing needed to be done. In the second compilation, it also thinks there's nothing to do. (The same holds true if you actually modify the file, not merely touch it, but touching is traditionally sufficient for build systems.)

Expected behavior

I expect the gumbo library to be recompiled (or at least the parts of it that were changed) and the nokogiri.bundle to be rebuilt.

Environment

  • I'm building the main branch.
  • ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [arm64-darwin21]
  • macOS 13.0.1
@stevecheckoway stevecheckoway added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Dec 8, 2022
@stevecheckoway stevecheckoway changed the title [bug] Modifying gumbo source doesn't cause rebuilds [build bug] Modifying gumbo source doesn't cause rebuilds Dec 8, 2022
@flavorjones
Copy link
Member

Thanks for opening this issue up, this has been mildly annoying me, too.

It's a side effect of how I ended up implementing the libgumbo build, as a separate library and not part of the extension -- using the packaged_tarball strategy.

Maybe that was a mistake, especially given your notes about link-time optimization in #2331, and I should have picked packaged_source instead. I'll play with it a bit and see if I can convince myself to do it this way.

@stevecheckoway
Copy link
Contributor Author

This is definitely not a priority. I just noticed it and thought I'd make a note of it.

@stevecheckoway
Copy link
Contributor Author

stevecheckoway commented Jun 6, 2024

(Edit: I cleaned up this comment a bit since I posted it as I was rushing out the door without checking that it actually made any sense.)

I'm running into this again so I'd like to spend a little bit of time looking at how difficult it would be to work around this. The build process is a little mysterious to me currently.

When I run rake compile from a clean repo, the Rake::ExtensionTask will run extconf.rb which will build libgumbo.

If I touch one of the gumbo source files and rerun rake compile, something will have noticed the modified file and copy it into the tmp directory the extension is built from but extconf.rb is not run again.

If I change the source_pattern to include the gumbo-parser directories,

    ext.source_pattern = "{.,../../gumbo-parser/{src,test}}/*.{c,cc,cpp,h}"

and rerun rake compile, then extconf.rb will be run again but libgumbo won't be built. Rerunning rake compile again causes extconf.rb to run and again no libgumbo will be built.

So I think we need to make two changes here:

  1. Change the source pattern to consider the gumbo source directories.
  2. Change extconf.rb to rebuild libgumbo if any of the source files have changed.

I was surprised to see that touching the files causes them to be copied to tmp but the extension task isn't run. Which piece of code is performing the file copying? Could that be modified to remove whatever causes miniportile to think libgumbo is installed and thus force a rebuild?

@stevecheckoway
Copy link
Contributor Author

I'm investigating adding a minimal amount of autoconf/automake to get gumbo to build normally. That should remove the bespoke Makefile system I constructed but more importantly, will hopefully allow miniportile to compile this correctly using source_directory. (It currently fails because it expects configure to have produced a Makefile in the work directory but that hasn't actually happened.)

@flavorjones
Copy link
Member

flavorjones commented Jun 6, 2024

@stevecheckoway This is an unfortunate situation and I'm sorry I haven't fixed it before now. The easiest situation would be to refactor the extconf.rb to allow us to re-run the libgumbo recipe on demand. If you want to inject automake I won't stop you, but that feels like a bigger change to me.

Edit: I'm going to spike on the extconf.rb refactor now.

@flavorjones
Copy link
Member

Hmm, refactoring the extconf didn't fix this the way I'd hoped. Going to park it until tomorrow and try again with fresh eyes.

@flavorjones
Copy link
Member

As a workaround, you can run rake clean compile test which will recompile libgumbo and nokogiri's extension files (but not libxml2/libxslt). There is a better workaround I found last year deleting specific files in tmp but I need to rediscover it.

@flavorjones
Copy link
Member

flavorjones commented Jun 7, 2024

@stevecheckoway OK - pushed a branch that I think does what you want

Branch is flavorjones-fix-gumbo-dev-builds which basically gives you a flag to go back to how nokogumbo used to do it (as "packaged source").

How to use it:

bundle exec rake clean clobber # clean slate
bundle exec rake compile test -- --gumbo-dev
# change file
bundle exec rake compile test -- --gumbo-dev

If you change a source file, rake compile will do what you want without re-running the extconf.rb.

LMK how it works for you.

@stevecheckoway
Copy link
Contributor Author

Getting gumbo to build using autoconf/automake was surprisingly easy. I was able to integrate it with mini_portile in a very simple manner:

libgumbo_recipe = process_recipe("libgumbo", "1.0.0-nokogiri", static_p, cross_build_p, false) do |recipe|
  recipe.source_directory = File.join(PACKAGE_ROOT_DIR, "gumbo-parser")
end

I made rake gumbo:test build and run the gumbo tests.

The only hitch: It didn't solve the underlying problem in that modifying a file in gumbo-parser/src does not lead to a rebuild even though extconf.rb was definitely running. And I think I figured out why. In retrospect, it's a little obvious.

I believe that the intended result of extconf.rb is to produce a Makefile that it can use to build/rebuild the extension. So of course the usual course of things is for extconf.rb to not be run again unless it changes. The part that was confusing me was that extconf.rb is definitely being run again. But that's because it gets called by the clean-ports target that gets added to the Makefile:

all: clean-ports
clean-ports: $(DLLIB)
	-$(Q)$(RUBY) $(srcdir)/extconf.rb --clean --enable-static

One way to deal with this is to modify that to something like this.

  File.open("Makefile", "at") do |mk|
    mk.print(<<~EOF)

      .PHONY: rebuild-libgumbo

      $(TARGET_SO): rebuild-libgumbo
      rebuild-libgumbo:
      \t-$(Q)$(MAKE) -C tmp/#{libgumbo_recipe.host}/ports/libgumbo/1.0.0-nokogiri/libgumbo-1.0.0-nokogiri install
    EOF
  end

This approach works. Presumably, it would mostly work for the non-autotools build. The problem it would have is the current Makefile does not have any dependency information.

I see two paths forward

  1. Take the modify the Makefile approach outlined above and make it compile using the current gumbo-parser/src/Makefile
  2. Build gumbo with the autotools and also use the Makefile modification approach.

Path 1's advantage is smallest change required to get 90% of the functionality. I see no disadvantages relative to the situation today.

Path 2's advantages are it would unify how rake compile and rake gumbo:test compile libgumbo (although they'd still be built twice) and dependency tracking for rebuilds would work correctly. The disadvantage is that this introduces autotools to Nokogiri and we'd have to decide what gets checked into the repo. Normally, configure, aclocal.m4, and the various Makefile.ins would not get checked in. But this raises the barrier to entry to compiling Nokogiri from the GitHub repo.

Having written this all out, I'll try out path 1 tomorrow and we can decide later if autotools are worth it.

@stevecheckoway
Copy link
Contributor Author

@stevecheckoway OK - pushed a branch that I think does what you want

Branch is flavorjones-fix-gumbo-dev-builds which basically gives you a flag to go back to how nokogumbo used to do it (as "packaged source").

How to use it:

bundle exec rake clean clobber # clean slate
bundle exec rake compile test -- --gumbo-dev
# change file
bundle exec rake compile test -- --gumbo-dev

If you change a source file, rake compile will do what you want without re-running the extconf.rb.

LMK how it works for you.

Ah great. I'll test that out tomorrow.

@stevecheckoway
Copy link
Contributor Author

@flavorjones, that branch works great!

Since I've already gone through the exercise of autotoolizing libgumbo, I figured I'd push the branch in case you want to take a look. main...stevecheckoway:nokogiri:autotools-gumbo

I added one improvement: rake gumbo:test no longer builds a second version of libgumbo. This would speed up CI a bit.

@flavorjones
Copy link
Member

OK - let me create a PR for my hack first, if that works and unblocks you. I'll take a look at the autotools branch as soon as I can, but a first glance looks great!

@flavorjones
Copy link
Member

See #3220

@flavorjones
Copy link
Member

Aaaaand I pushed a slightly-tweaked version of your autotools branch at #3221

flavorjones added a commit that referenced this issue Jun 7, 2024
**What problem is this PR intended to solve?**

Enable a faster development feedback loop when working on libgumbo. See
notes in CONTRIBUTING.md for how to use it.

Closes #2718
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
state/needs-triage Inbox for non-installation-related bug reports or help requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants