bf40e76 disrupts remote control on Mac build #1386

Open
markjfine opened this issue Oct 20, 2024 · 14 comments
@markjfine

I understand bf40e76 was mostly a graphics update that doesn't really touch anything to do with remote control. Nevertheless, something is disrupting remote control output: GQRX acts on external commands but doesn't respond reliably when queried. Correct operation is restored when bf40e76 is reverted via git.

@argilo
Member

argilo commented Oct 20, 2024

Are you able to grab a stack trace in a debugger?

@argilo
Member

argilo commented Oct 20, 2024

Oh wait, I thought it was a crash but I guess it's just misbehaving.

argilo added the bug label Oct 20, 2024
@argilo
Member

argilo commented Oct 20, 2024

I don't immediately see what connection that commit could have to the remote control, so some investigation will be needed.

@markjfine
Author

Agree. I'll continue trying to track down what's going on by doing a bit more debugging. It's possible that there may be a Qt graphics processing delay (prioritising trace and waterfall over everything else) that requires me to increase the socket blocking request wait time following a command in Frequency Browser. It's already set to 50000 uSec, but that should be more than enough, otherwise it starts to get sluggish.

@argilo
Member

argilo commented Oct 20, 2024

I believe graphics and networking are both processed on the UI thread (in an event loop) so changes to the plotter could affect timing of remote control commands.

Expecting a remote control response within a short time frame may not be a great idea. Things like FFT size changes can hold up the UI thread for a while.
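
As a rough illustration of that advice, a client can wait against an overall deadline rather than a fixed short delay, and retry if the reply is late. This is a minimal Python sketch, not Frequency Browser's code: the function names `query` and `query_with_retry`, the 2-second timeout, and the retry count are all assumptions chosen for the example.

```python
import socket
import time


def query(host, port, command, timeout_s=2.0):
    """Send one newline-terminated command and return the first reply line.

    Waits up to timeout_s in total instead of assuming a fixed short delay.
    """
    deadline = time.monotonic() + timeout_s
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(command.encode("ascii") + b"\n")
        buf = b""
        while b"\n" not in buf:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"no reply to {command!r} within {timeout_s} s")
            sock.settimeout(remaining)
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection")
            buf += chunk
    return buf.split(b"\n", 1)[0].decode("ascii").strip()


def query_with_retry(host, port, command, attempts=3, timeout_s=2.0):
    """Retry a query a few times; a busy event loop may answer late."""
    last_error = None
    for _ in range(attempts):
        try:
            return query(host, port, command, timeout_s)
        except OSError as err:  # covers socket timeouts and refused connections
            last_error = err
    raise last_error
```

Assuming Gqrx's remote control is listening on its usual port 7356 and accepts the rigctl-style `f` (get frequency) command, a call might look like `query_with_retry("127.0.0.1", 7356, "f")`. Check the port and command set against your own setup.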

@argilo
Member

argilo commented Oct 20, 2024

That said, I would expect the performance improvements in #1383 to result in the UI thread spending less time on drawing, not more.

@markjfine
Author

> I would expect the performance improvements in #1383 to result in the UI thread spending less time on drawing, not more.

Normally, I'd tend to agree. But there's a massive hit on responsiveness when bf40e76 is applied.

I've been typically running GQRX with an FFT Size of 16384 and Rate of 25 fps. This has never been an issue with respect to control timing.

However, now whenever I tune to something, or change the bandwidth or mode remotely, I've noticed there's a marker over the Rate label that says '>>> Rate' in white on red background for a certain length of time. I can drop down to 1 fps and this will still occur. Previously this was a short blink and I hardly even noticed it but now it is a lot more pronounced.

Moreover, what I've noticed is that no comms seem to occur until the red warning goes back to normal, indicating some kind of processing delay. This could explain why an initial tuning, mode, or bandwidth command works, but any follow-up query of meter, freq, mode, and bandwidth seems to fail until '>>> Rate' reverts back to just 'Rate'.

I have a Sync button on Frequency Browser that will re-query everything, just in case the user changes something manually on GQRX and wants to re-synch the app (usually for station database lookup). This works fine for when the above query fails. It can be used after the Rate label returns to normal, but is kind of cumbersome after a while.

Just for reference, this is on a MacBook M1 Pro (2021), 16" Retina XDR display, 16-core GPU, 16 GB memory, running Sequoia 15.0.1.

Going to continue to examine @yuzawa-san's changes to see what could be causing this.

@yuzawa-san
Contributor

i have no idea what could have caused this. i personally do not use the remote, but if you let me know the exact steps you took i could try to reproduce. is it ok when the remote is not used?

@argilo
Member

argilo commented Oct 20, 2024

If you run in a debugger and press Ctrl+C during a stall, the backtrace should reveal the culprit.

@markjfine
Author

> i have no idea what could have caused this. i personally do not use the remote, but if you let me know the exact steps you took i could try to reproduce. is it ok when the remote is not used?

The steps I took to reproduce were first to 'git pull origin master -t', then build. Tested and noticed the remote communications problem. Attempted to increase the socket wait windows, which I took from 50000 uSec to 400000 uSec; that was clunky and still erratic. Taking the Rate down from 25 fps to 2 helped a little but was still erratic. Did a 'git revert bf40e76 -n' and rebuilt, set the parameters back to the way they were, and the problem went away.

Actually, the '>>> Rate' blip is still more pronounced than before even without using Frequency Browser. You can manually tune with Frequency Browser killed and it still appears. I should note that, to me, the '>>> Rate' thing seems like a symptom of processing getting bogged down.

Other FFT Settings I'm using (if it helps) are:
WF Span: Auto
Window: Blackman-Harris
Plot Mode: Avg, green, Fill
Plot Scale: dBFS
Averaging: Slightly higher than mid-way, just Band Plan checked.
Split Plot: Half-way
Plot and WF locked
WF Mode: Sync, Gqrx
Freq Zoom: off.

Also noticed you were doing most of your work in the FM band, whereas I'm using Shortwave. Guaranteed there's a lot more active drawing to do in a couple of MHz of SW than in the FM band. That said, I've looked at your code and the only thing that looks like it could radically affect things is the conversion in some areas from discrete pen graphics in Plotter.cpp, but that's just supposition on my part.

@yuzawa-san
Contributor

ok i'm at a bit of a loss right now. i looked over the pr and basically every optimization in there should have made things better, since in a lot of places it's simply doing physically less stuff. i found that of the optimizations the fill was most improved. that by far used the most CPU prior, so i'm shocked that the latest changes would be worse. maybe disable the fill and see if that improves stuff, at least that may be a lead.

in my original pr, i generated the flame graphs. i used the https://formulae.brew.sh/formula/flamegraph
and ran this once the application was up and running, feel free to tune the params if needed:

sudo dtrace -x ustackframes=256 -n 'profile-1ms /execname == "gqrx"/ { @[ustack()] = count(); } tick-15s { exit(0); }' \
| tee /tmp/$(date +"%Y-%m-%d_%H.%M.%S")-ProcessName_sample.txt \
| stackcollapse.pl \
| flamegraph.pl \
> /tmp/$(date +"%Y-%m-%d_%H.%M.%S")-WindowServer_sample.txt.svg

maybe something would pop out there which is unique to your use case.

@markjfine
Author

Hah... I cherry-picked the pr back in, then flipped Fill off and it's now working properly again.
Interesting. So now we know where to look.

@markjfine
Author

Also ran flamegraph with both GQRX and Frequency Browser running.

First with Fill off:
[flame graph image: 2024-10-20_19 18 03-WindowServer_sample txt]

Then with Fill engaged:
[flame graph image: 2024-10-20_19 21 39-WindowServer_sample txt]

I need to read more about what I'm looking at, but if I had to guess it looks like Fill is eating a cpu on this box.
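
One way to put numbers on that guess is to compare the folded-stack text that `stackcollapse.pl` emits (one `frame;frame;frame count` line per unique stack) between the two runs. The Python sketch below totals the samples whose stack mentions a given frame. The helper name `samples_by_frame` is made up for this example, and the frame names in the usage note are placeholders, not values read from the attached graphs.

```python
def samples_by_frame(folded_lines, needle):
    """Return (matching_samples, total_samples) for folded-stack lines.

    Each line looks like 'outer;inner;leaf 42', as written by stackcollapse.pl.
    A stack matches when any of its frames contains `needle`.
    """
    matched = 0
    total = 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated field on the line.
        stack, _, count = line.rpartition(" ")
        n = int(count)
        total += n
        if any(needle in frame for frame in stack.split(";")):
            matched += n
    return matched, total
```

For example, `samples_by_frame(open("folded.txt"), "QThreadPoolThread::run")` would report how many samples landed in thread-pool workers, assuming the folded output had been saved to a file named `folded.txt`. Running it on a Fill-off and a Fill-on capture gives two ratios to compare.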

@yuzawa-san
Contributor

very interesting, we got real lucky on that hunch. i found that the fill (before) was actually done in a shared thread pool (see attached flame graph screenshot, middle section, see QThreadPoolThread::run), so most of the fills in the ui event loop were quite short since the "work" was offloaded to those other threads. this may explain why the event loop appears more clogged now: since the work was moved, it may make it appear there is a slowdown in that ui loop.

unclear how we should proceed from here. the prior implementation was quite cpu intensive but did not jam up the event loop, but the new one does appear to jam up the loop while using less cpu. the remote actions should live in that event loop, being that they are essentially actions on the ui as far as i see it. a compromise could be to do the expensive drawing near the "top" where the variance is, but draw the "bottom" where there is no variance as a solid fillRect (which should be faster). my hunch is that when the avg/max line has a lot of variance, the polygon rasterizer has to break it into a lot of little nonsequential writes. hey, at least the solid color composition from that PR is working. i'll dig in over the next week.
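
The difference between work done inside an event loop and work handed to a thread pool can be shown with a toy model. This Python sketch uses `asyncio` as a stand-in for the Qt event loop; it is not Gqrx code, and `slow_fill`, `reply_latency`, and the 0.2-second duration are invented for illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fill(seconds):
    """Stand-in for an expensive drawing call; blocks the calling thread."""
    time.sleep(seconds)


async def reply_latency(offload, work_s=0.2):
    """Return how long a pending 'remote control reply' waits behind one fill."""
    loop = asyncio.get_running_loop()
    start = time.monotonic()

    async def reply():
        # Represents answering a query; it can only run when the loop is free.
        return time.monotonic() - start

    reply_task = asyncio.create_task(reply())
    if offload:
        # Old behaviour in this model: the fill runs on a worker thread.
        with ThreadPoolExecutor(max_workers=1) as pool:
            fill = loop.run_in_executor(pool, slow_fill, work_s)
            latency = await reply_task
            await fill
    else:
        # New behaviour in this model: the fill blocks the event loop itself.
        slow_fill(work_s)
        latency = await reply_task
    return latency
```

Calling `asyncio.run(reply_latency(offload=False))` should report a latency of roughly the fill duration, while `offload=True` should report a latency near zero even though the same amount of work is done. That is consistent with the pattern described above: less total CPU, but a busier loop.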

[flame graph image: 2024-10-02_19 37 32-WindowServer_sample txt]
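
The proposed top/bottom compromise can be sketched as plain geometry. The Python function below splits the area under a trace into a polygon for the varying band and one rectangle for the always-filled part. It is only an illustration of the idea under assumed screen coordinates (row 0 at the top), not the Gqrx plotter code, and `split_fill` is a hypothetical name.

```python
def split_fill(ys, height):
    """Split the area under a trace into a polygon and a solid rectangle.

    ys[x] is the trace's pixel row at column x, with row 0 at the top and
    row `height` at the bottom edge of the plot (screen coordinates).
    Returns (polygon_points, (x, y, w, h)).
    """
    if not ys:
        raise ValueError("empty trace")
    width = len(ys)
    floor = max(ys)  # lowest point the trace reaches on screen
    # Every column is filled below `floor`, so that part is one rectangle.
    rect = (0, floor, width, height - floor)
    # Only the band between the trace and `floor` needs polygon rasterizing.
    polygon = [(x, y) for x, y in enumerate(ys)]
    polygon += [(width - 1, floor), (0, floor)]
    return polygon, rect
```

In a Qt painter the two pieces would presumably map to one `drawPolygon` call and one `fillRect` call. The saving depends on the trace: a single deep dip pushes `floor` toward the bottom and shrinks the rectangle, so a real implementation might need a smarter split than one global maximum.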

3 participants