bf40e76 disrupts remote control on Mac build #1386

markjfine · 2024-10-20T15:57:42Z

I understand this was mostly a graphics update that doesn't touch anything related to remote control. Nevertheless, something is breaking remote control output: GQRX acts on external commands but doesn't respond reliably when queried. Correct operation is restored when bf40e76 is reverted via git.


argilo · 2024-10-20T15:59:36Z

Are you able to grab a stack trace in a debugger?

argilo · 2024-10-20T16:01:09Z

Oh wait, I thought it was a crash but I guess it's just misbehaving.

argilo · 2024-10-20T17:13:32Z

I don't immediately see what connection that commit could have to the remote control, so some investigation will be needed.

markjfine · 2024-10-20T17:31:57Z

Agreed. I'll keep trying to track down what's going on with a bit more debugging. There may be a Qt graphics processing delay (prioritising trace and waterfall over everything else) that requires me to increase the socket blocking wait time after each command in Frequency Browser. It's already set to 50000 µs (50 ms), which should be more than enough; any longer and the app starts to feel sluggish.
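Frequency Browser's source isn't part of this thread, so the following is only a sketch, assuming a client that talks to Gqrx over a plain POSIX TCP socket. It shows what a 50000 µs blocking receive timeout looks like; `setReceiveTimeout` is a hypothetical helper name, not code from either project.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: apply a receive timeout (in microseconds) to a
// remote-control socket, so a blocking recv() gives up after that long
// instead of waiting indefinitely while the server's UI thread is busy.
bool setReceiveTimeout(int fd, long usec) {
    assert(usec >= 0);
    timeval tv{};
    tv.tv_sec = usec / 1000000;   // whole seconds
    tv.tv_usec = usec % 1000000;  // remaining microseconds (< 1 s)
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == 0;
}
```

When the timeout expires, recv() returns -1 with errno set to EAGAIN or EWOULDBLOCK, which the client can treat as "no reply yet" rather than as a hard failure. The kernel may round the stored value, so reading it back with getsockopt() can give a slightly different number.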

argilo · 2024-10-20T17:45:23Z

I believe graphics and networking are both processed on the UI thread (in an event loop), so changes to the plotter could affect the timing of remote control commands.

Expecting a remote control response within a short time frame may not be a great idea. Things like FFT size changes can hold up the UI thread for a while.
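To make the event-loop point concrete, here is a toy model (not Gqrx code; it ignores details such as how Qt coalesces repaint requests): one UI thread drains a FIFO of events with fixed costs, so a remote-control query is answered only after everything queued ahead of it. `Event` and `completionTimeMs` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <string>

// One unit of work on the UI thread, with an assumed cost in milliseconds.
struct Event {
    std::string name;
    int costMs;
};

// Returns the time (ms, from when draining starts) at which the first event
// named `target` finishes, or -1 if it is not in the queue. The loop is
// single-threaded, so each event blocks everything behind it.
int completionTimeMs(const std::deque<Event>& queue, const std::string& target) {
    assert(!target.empty());
    int clock = 0;
    for (const Event& e : queue) {
        clock += e.costMs;  // the loop is busy for the whole event
        if (e.name == target) return clock;
    }
    return -1;
}
```

For example, with two hypothetical 30 ms paints queued ahead of a 1 ms frequency query, the reply is ready only after 61 ms, which is already past a 50 ms client receive timeout. At 25 fps a frame has a 40 ms budget, so any paint that regularly costs more than that keeps the loop saturated.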

argilo · 2024-10-20T17:58:04Z

That said, I would expect the performance improvements in #1383 to result in the UI thread spending less time on drawing, not more.

markjfine · 2024-10-20T20:15:51Z

> I would expect the performance improvements in #1383 to result in the UI thread spending less time on drawing, not more.

Normally, I'd tend to agree. But there's a massive hit on responsiveness when bf40e76 is applied.

I typically run GQRX with an FFT size of 16384 and a rate of 25 fps. This has never been an issue for control timing.

However, now whenever I tune to something or change the bandwidth or mode remotely, a marker over the Rate label reads '>>> Rate' in white on a red background for a while. This still happens even when I drop down to 1 fps. Previously it was a short blink that I hardly noticed, but now it is much more pronounced.

Moreover, no communication seems to happen until the red warning goes back to normal, which suggests some kind of processing delay. This could explain why an initial tuning, mode, or bandwidth command works, but any follow-up query of meter, frequency, mode, or bandwidth seems to fail until '>>> Rate' reverts to just 'Rate'.

Frequency Browser has a Sync button that re-queries everything, in case the user changes something manually in GQRX and wants to re-sync the app (usually for station database lookup). It recovers from the failed queries described above once the Rate label returns to normal, but pressing it every time gets cumbersome.
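One way to automate what the Sync button does by hand is to retry a timed-out query a few times before giving up. This is a minimal sketch, assuming a hypothetical transport callable that sends one Gqrx remote-control command (such as `f` to read the frequency) and returns `std::nullopt` when the receive timeout expires; `QueryFn` and `queryWithRetry` are illustrative names, not part of Gqrx or Frequency Browser.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Hypothetical transport: sends one command and returns the reply,
// or std::nullopt if no reply arrived before the socket timeout.
using QueryFn = std::function<std::optional<std::string>(const std::string&)>;

// Retry a query up to maxAttempts times, pausing between attempts, instead of
// treating the first timeout as a failure. Returns the first reply received.
std::optional<std::string> queryWithRetry(const QueryFn& send,
                                          const std::string& cmd,
                                          int maxAttempts,
                                          std::chrono::milliseconds pause) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (auto reply = send(cmd)) return reply;
        if (attempt + 1 < maxAttempts) std::this_thread::sleep_for(pause);
    }
    return std::nullopt;
}
```

With a short per-attempt timeout and a handful of retries, the total wait stretches only while the UI thread is actually busy, rather than raising the timeout for every query.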

Just for reference, this is on a MacBook M1 Pro (2021), 16" Retina XDR display, 16-core GPU, 16 GB memory, running Sequoia 15.0.1.

I'm going to keep examining @yuzawa-san's changes to see what could be causing this.

yuzawa-san · 2024-10-20T20:49:27Z

I have no idea what could have caused this. I personally do not use the remote, but if you let me know the exact steps you took, I could try to reproduce it. Is it OK when the remote is not used?

argilo · 2024-10-20T21:05:33Z

If you run in a debugger and press Ctrl+C during a stall, the backtrace should reveal the culprit.

markjfine · 2024-10-20T21:50:30Z

> I have no idea what could have caused this. I personally do not use the remote, but if you let me know the exact steps you took, I could try to reproduce it. Is it OK when the remote is not used?

To reproduce, I first ran 'git pull origin master -t', then built. I tested and noticed the remote communications problem. I tried increasing the socket wait time from 50000 µs to 400000 µs, which was clunky and still erratic. Taking the Rate down from 25 fps to 2 helped a little, but it was still erratic. I then did a 'git revert bf40e76 -n', rebuilt, and set the parameters back to the way they were, and the problem went away.

Actually, the '>>> Rate' blip is still more pronounced than before even without using Frequency Browser. You can tune manually with Frequency Browser closed and it still appears. To me, the '>>> Rate' indicator seems like a symptom of processing getting bogged down.

Other FFT Settings I'm using (if it helps) are:
WF Span: Auto
Window: Blackman-Harris
Plot Mode: Avg, green, Fill
Plot Scale: dBFS
Averaging: Slightly higher than mid-way, just Band Plan checked.
Split Plot: Half-way
Plot and WF locked
WF Mode: Sync, Gqrx
Freq Zoom: off.

I also noticed you were doing most of your work in the FM band, whereas I'm using shortwave. There's almost certainly a lot more active drawing to do across a couple of MHz of SW than in the FM band. That said, I've looked at your code, and the only thing that looks like it could radically affect things is the conversion in some areas from discrete pen graphics in Plotter.cpp, but that's just supposition on my part.

yuzawa-san · 2024-10-20T22:04:33Z

OK, I'm at a bit of a loss right now. I looked over the PR, and basically every optimization in there should have made things better, since in a lot of places it's simply doing less work. Of the optimizations, I found the fill improved the most; it used by far the most CPU before, so I'm surprised the latest changes would be worse. Maybe disable the fill and see if that improves things; at least that may be a lead.

In my original PR, I generated the flame graphs using https://formulae.brew.sh/formula/flamegraph and ran the following once the application was up and running (feel free to tune the parameters):

```shell
# Sample gqrx user stacks at 1 kHz for 15 s, then fold them and render an SVG flame graph.
TS=$(date +"%Y-%m-%d_%H.%M.%S")
sudo dtrace -x ustackframes=256 -n 'profile-1ms /execname == "gqrx"/ { @[ustack()] = count(); } tick-15s { exit(0); }' \
  | tee "/tmp/${TS}-gqrx_sample.txt" \
  | stackcollapse.pl \
  | flamegraph.pl \
  > "/tmp/${TS}-gqrx_sample.svg"
```

Maybe something unique to your use case will pop out there.

markjfine · 2024-10-20T23:14:04Z

Hah... I cherry-picked the PR back in, then turned Fill off, and it's now working properly again.
Interesting. So now we know where to look.

markjfine · 2024-10-20T23:26:46Z

I also ran flamegraph with both GQRX and Frequency Browser running.

First with Fill off: [flame graph image not included]

Then with Fill engaged: [flame graph image not included]

I need to read more about what I'm looking at, but if I had to guess, Fill is eating a whole CPU core on this box.

yuzawa-san · 2024-10-21T00:24:34Z

Very interesting; we got really lucky with that hunch. I found that the fill (before the PR) was actually done in a shared thread pool (see the attached flame graph screenshot; the middle section shows QThreadPoolThread::run), so most of the fills in the UI event loop were quite short, since the "work" was offloaded to those other threads. That may explain why the event loop appears more clogged now: the work moved into the loop itself, so it shows up as a slowdown there. It's unclear how we should proceed from here. The prior implementation was quite CPU-intensive but did not jam up the event loop, while the new one uses less CPU but does appear to jam up the loop. As far as I can see, the remote actions should live in that event loop, since they are essentially actions on the UI.

A compromise could be to do the expensive drawing near the "top", where the variance is, but draw the "bottom", where there is no variance, as a solid fillRect (which should be faster). My hunch is that when the avg/max line has a lot of variance, the polygon rasterizer has to break it into a lot of small non-sequential writes. Hey, at least the solid color composition from that PR is working. I'll dig in over the next week.
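The proposed compromise can be sketched as geometry alone, with no Qt dependency. This assumes the trace is a list of screen-space points sorted by x, with y growing downward as in QPainter coordinates; `Pt`, `Rect`, `SplitFill`, and `splitFill` are hypothetical names, not Plotter.cpp code. Everything below the trace's lowest on-screen point is filled in every column, so it can be one solid rectangle; only the band between the trace and that line needs the polygon.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Pt { double x, y; };          // screen point, y grows downward
struct Rect { double x, y, w, h; };  // same layout as fillRect() arguments

struct SplitFill {
    Rect solid;               // uniform band below the lowest trace point
    std::vector<Pt> polygon;  // trace plus a closing edge along the band's top
};

// Split the area between a trace and plotBottom into a cheap solid rectangle
// and a (usually much shorter) polygon that carries all of the variance.
SplitFill splitFill(const std::vector<Pt>& trace, double plotBottom) {
    assert(trace.size() >= 2);
    // Largest y = lowest point on screen; below it, every column is filled.
    double floorY = std::max_element(trace.begin(), trace.end(),
                        [](const Pt& a, const Pt& b) { return a.y < b.y; })->y;
    floorY = std::min(floorY, plotBottom);
    double left = trace.front().x, right = trace.back().x;

    SplitFill out;
    out.solid = {left, floorY, right - left, plotBottom - floorY};
    out.polygon = trace;                     // the "top" with the variance
    out.polygon.push_back({right, floorY});  // close along y = floorY
    out.polygon.push_back({left, floorY});
    return out;
}
```

In a Qt plotter, `solid` would presumably map to a QPainter::fillRect() call and `polygon` to QPainter::drawPolygon(). Whether this is actually faster hasn't been measured here: if the trace has one deep dip near the plot bottom, the solid band shrinks and little is saved, and with antialiasing the shared edge at `floorY` may need care to avoid a visible seam.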

argilo added the bug label Oct 20, 2024

argilo mentioned this issue Oct 20, 2024 in Plotter and waterfall performance enhancements #1383 (Merged)
