Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor some SPI transactions #988

Merged
merged 5 commits into from
Aug 2, 2024
Merged

Refactor some SPI transactions #988

merged 5 commits into from
Aug 2, 2024

Conversation

2bndy5
Copy link
Member

@2bndy5 2bndy5 commented Jul 15, 2024

Refactor 1-byte SPI commands

I came across this tactic when writing the rust implementation in rf24-rs.

By telling read_register() to read 0 bytes from a register that is actually a reserved SPI command byte, we don't need to use a special form of write_register(uint8_t, uint8_t). The same transaction still occurs with less code executed to do it.

This only affects get_status(), flush_tx(), flush_rx(), and reUseTX().
Luckily, those SPI commands already have 0x20 bit asserted, so there's no need to mask the command byte with W_REGISTER as is done in write_register().
This needs testing on the various supported platforms to ensure a 1-byte buffer behaves as expected.

Removed mnemonics bit-wise operations

  • R_REGISTER is defined as 0. So, reg | 0 always results in reg. We don't need to compute this at runtime.

  • REGISTER_MASK is defined as 0x1F. All register offsets are all under 0x1F. Any register offset masked with REGISTER_MASK results in the same register offset value, so we don't need to compute this at runtime.

    Note, all register offsets used are a known number. There is no public API that allows users to read a register offset larger than 0x1F.

Additional SPI refactoring

  • When writing an ACK payload to TX FIFO, static payload size is not applicable. So, bypass the additional checks in write_payload() about ensuring static payload size is satisfied; instead let writeAckPayload() use write_register() directly.

    Note, the W_ACK_PAYLOAD command already has 0x20 bit (W_REGISTER) asserted.

I came across this tactic when writing the rust implementation.

By telling `read_register()` to read 0 bytes from a register that is actually a reserved SPI command byte, we don't need to use a special form of `write_register(uint8_t, uint8_t)`. The same transaction still occurs with less code executed to do it.

This only affects `get_status()`, `flush_tx()`, and `flush_rx()`.
Luckily, those SPI commands already have 0x20 bit asserted, so there's no need to mask the command byte with `W_REGISTER` as is done in `write_register()`.
This needs testing on the various supported platforms to ensure a 1-byte buffer behaves as expected.
Copy link
Contributor

github-actions bot commented Jul 15, 2024

Memory usage change @ 0829b87

Board flash % RAM for global variables %
arduino:avr:nano ❔ -16 - +2 -0.05 - +0.01 0 - 0 0.0 - 0.0
arduino:samd:mkrzero 💚 -104 - -28 -0.04 - -0.01 0 - 0 0.0 - 0.0
Click for full report table
Board examples/GettingStarted
flash
% examples/GettingStarted
RAM for global variables
% examples/AcknowledgementPayloads
flash
% examples/AcknowledgementPayloads
RAM for global variables
% examples/ManualAcknowledgements
flash
% examples/ManualAcknowledgements
RAM for global variables
% examples/StreamingData
flash
% examples/StreamingData
RAM for global variables
% examples/MulticeiverDemo
flash
% examples/MulticeiverDemo
RAM for global variables
% examples/InterruptConfigure
flash
% examples/InterruptConfigure
RAM for global variables
% examples/scanner
flash
% examples/scanner
RAM for global variables
% examples/encodeRadioDetails
flash
% examples/encodeRadioDetails
RAM for global variables
%
arduino:avr:nano 2 0.01 0 0.0 -16 -0.05 0 0.0 -2 -0.01 0 0.0 2 0.01 0 0.0 2 0.01 0 0.0 -8 -0.03 0 0.0 -16 -0.05 0 0.0 -12 -0.04 0 0.0
arduino:samd:mkrzero -64 -0.02 0 0.0 -60 -0.02 0 0.0 -64 -0.02 0 0.0 -60 -0.02 0 0.0 -28 -0.01 0 0.0 -68 -0.03 0 0.0 -64 -0.02 0 0.0 -104 -0.04 0 0.0
Click for full report CSV
Board,examples/GettingStarted<br>flash,%,examples/GettingStarted<br>RAM for global variables,%,examples/AcknowledgementPayloads<br>flash,%,examples/AcknowledgementPayloads<br>RAM for global variables,%,examples/ManualAcknowledgements<br>flash,%,examples/ManualAcknowledgements<br>RAM for global variables,%,examples/StreamingData<br>flash,%,examples/StreamingData<br>RAM for global variables,%,examples/MulticeiverDemo<br>flash,%,examples/MulticeiverDemo<br>RAM for global variables,%,examples/InterruptConfigure<br>flash,%,examples/InterruptConfigure<br>RAM for global variables,%,examples/scanner<br>flash,%,examples/scanner<br>RAM for global variables,%,examples/encodeRadioDetails<br>flash,%,examples/encodeRadioDetails<br>RAM for global variables,%
arduino:avr:nano,2,0.01,0,0.0,-16,-0.05,0,0.0,-2,-0.01,0,0.0,2,0.01,0,0.0,2,0.01,0,0.0,-8,-0.03,0,0.0,-16,-0.05,0,0.0,-12,-0.04,0,0.0
arduino:samd:mkrzero,-64,-0.02,0,0.0,-60,-0.02,0,0.0,-64,-0.02,0,0.0,-60,-0.02,0,0.0,-28,-0.01,0,0.0,-68,-0.03,0,0.0,-64,-0.02,0,0.0,-104,-0.04,0,0.0

@2bndy5
Copy link
Member Author

2bndy5 commented Jul 15, 2024

Hmm, flash size on ATmega328P increased by 2 bytes. Its probably because a pointer data type is slightly larger than a single byte. I still expect speed-ups though because write_register(uint8_t, uint8_t) doesn't have to check if it is a 1-byte vs 2-byte transaction anymore. Maybe the speed-up would be negligible since all 2-byte transactions are used to configure the radio, clear the status flags, or read FIFO_STATUS/RPD registers.

I could try using read_register(uint8_t) instead with a special check like

// only the following 1-byte SPI commands have 0xE0 bits asserted
// FLUSH_TX, FLUSH_RX, RF24_NOP, REUSE_TX_PL
if ((reg & 0xE0) != 0xE0) {
    result = _SPI.transfer(0xff);
}

@2bndy5 2bndy5 force-pushed the refactor-spi-cmds branch from 9c0ba58 to ba12195 Compare July 15, 2024 11:18
@2bndy5
Copy link
Member Author

2bndy5 commented Jul 15, 2024

I could try using read_register(uint8_t) instead with a special check

I don't particularly like this idea. It basically goes back to having a specialized ***_register() that's tailored to 1 byte transactions. Additionally, we'd have to explicitly ignore the returned result.


While there is a 2 byte increase in flash on ATmega328p. there is also a not-insignificant decrease in flash on ATSAMD21. I think this change is still worth it considering coretx-M chips are a much more desirable purchase now-a-days; the price is almost the same as the older ATmega328 but with a lot more benefits (like extra RAM and native USB support).

@2bndy5
Copy link
Member Author

2bndy5 commented Jul 15, 2024

I'd also like to refactor write_payload() into startFastWrite()'s definition because there would be no special need to have another private SPI write function. Both W_TX_PAYLOAD and W_TX_PAYLOAD_NO_ACK SPI commands already have the 0x20 (W_REGISTER) bit asserted. I already changed the writeAckPayload() to use write_register() directly (see 7abc73d).

Although, I'm afraid such a refactor might have an undesirable change in compile size. At this point, I'm only concerned with possible speed-ups without increasing the compile size (preferably reducing it).

@TMRh20
Copy link
Member

TMRh20 commented Jul 16, 2024

I would add my opinion here, but I don't have one yet. I need to look over the code etc and understand what you are doing. 😄

I haven't looked at the RF24 code base for quite some time now, so gimme some time to review here.

@2bndy5
Copy link
Member Author

2bndy5 commented Jul 16, 2024

I'm not adamant about these changes. I just opened the PR as a draft to see compile size reports. Compile size aside, I believe these changes should speed up the SPI transactions a little.

@2bndy5 2bndy5 changed the title Refactor 1-byte SPI commands Refactor some SPI transactions Jul 18, 2024
@2bndy5
Copy link
Member Author

2bndy5 commented Jul 18, 2024

Testing looks good on Linux.

I'm seeing a noticeable speed-up during the streamingData example. The ackPayload example seems to be running about the same speed. Its hard to measure because the speed-up may be a matter nanoseconds (dependent on CPU cycles).

@2bndy5 2bndy5 marked this pull request as ready for review July 18, 2024 04:48
@TMRh20
Copy link
Member

TMRh20 commented Jul 20, 2024

This is just weird. How you figured out to use a read instead of a write is beyond me. It seems to work too, so I am again very impressed.

@2bndy5
Copy link
Member Author

2bndy5 commented Jul 20, 2024

It was just from taking a fresh look. Its kinda fun to write something from scratch, especially in a new/different language.

When I first joined up with you here, I just finished writing the pure python implementation, so I brought my "improvement" ideas with me then. Now that I wrote a pure rust implementation, I have some work (related to these changes) to do in the CirPy lib...

The "read" and "write" terms are more of a human perception when every transaction over the SPI bus is full duplex (both read and write). I lucked out with the 0x20 bit being asserted in the 1-byte SPI commands.

@TMRh20
Copy link
Member

TMRh20 commented Jul 20, 2024

The "read" and "write" terms are more of a human perception when every transaction over the SPI bus is full duplex (both read and write).

Yes, but one needs a complete understanding of SPI and RF24 behaviors along with putting two-and-two together to pull something like this off!

@2bndy5

This comment was marked as resolved.

2bndy5 added 4 commits July 20, 2024 02:53
`R_REGISTER` is defined as 0.
`reg | 0` always results in `reg`
`REGISTER_MASK` is defined as 0x1F.
All register offsets are all under 0x1F.
Any register offset masked with REGISTER_MASK results in the same register offset value, so we don't need to compute this at runtime.

Note: all register offsets used are a known number. There is no public API that allows users to read a register offset larger than 0x1F.
When writing an ACK payload to TX FIFO, static payload size is not applicable.
So, bypass the additional checks in `write_payload()` about ensuring static payload size is satisfied.

Note, the `W_ACK_PAYLOAD` command already has 0x20 bit (`W_REGISTER`) asserted.
@2bndy5 2bndy5 force-pushed the refactor-spi-cmds branch from 7abc73d to 0829b87 Compare July 20, 2024 09:53
@TMRh20
Copy link
Member

TMRh20 commented Jul 27, 2024

Finally had some time to test this more, and my testing shows a speed-up of at least 4-6 KB/s transferring data from a Nano to a Due at 1MBPS. That's pretty significant!

@2bndy5
Copy link
Member Author

2bndy5 commented Jul 27, 2024

I want to check that this still works with the Pico SDK. I doubt there will be any problems, but it doesn't hurt to verify anyway.

@2bndy5
Copy link
Member Author

2bndy5 commented Aug 2, 2024

This works fine with Pico SDK. Seen as how I've tested Linux and PicoSDK (and Arduino-pico core for good measure), are there any objections to merging this?

@TMRh20
Copy link
Member

TMRh20 commented Aug 2, 2024 via email

@2bndy5 2bndy5 merged commit 1acf968 into master Aug 2, 2024
80 checks passed
@2bndy5 2bndy5 deleted the refactor-spi-cmds branch August 2, 2024 09:21
@2bndy5
Copy link
Member Author

2bndy5 commented Oct 6, 2024

We should probably publish a new release to get these optimizations out into the Arduino world. This idea also beckons a release crusade because of the change SERIAL_DEBUG -> RF24_DEBUG (and corresponding changes up the RF24 stack).

@TMRh20
Copy link
Member

TMRh20 commented Oct 6, 2024

Sounds good to me!

In looking at it, there are a lot of changes all the way through the stack that probably should have been pushed out sooner.

I haven't done much coding through the summertime though... been slacking lol

@2bndy5
Copy link
Member Author

2bndy5 commented Oct 6, 2024

It is a bit difficult to remember what changes are unreleased; we currently need to look at the git history since the last tag.

If the immanent release crusade goes well, then I could set up a CI workflow to output unreleased changes in a job summary (like what I did here for a different project).

The unreleased changes are not going to be kept in the CHANGELOGs because that would be rather cumbersome on local git clones.

@TMRh20
Copy link
Member

TMRh20 commented Oct 6, 2024

Hmm, well Its not that hard to do a compare as in v1.4.9...master
but automating things further is pretty nice.

@2bndy5
Copy link
Member Author

2bndy5 commented Oct 6, 2024

It is a little annoying to get to that page. Usually I have to type it manually or first compare a release and change the comparison once the page loads. I'm sure there's a git command too...

@2bndy5
Copy link
Member Author

2bndy5 commented Oct 6, 2024

Release crusade completed

There's going to be some mis-categorization on behalf of git-cliff because we're not using conventional commit messages. For example, in RF24Ethernet, I noticed
image
But it should really be categorized as "Added". This because the merge commit (nRF24/RF24Ethernet@9b78743 from PR nRF24/RF24Ethernet#49) included the word "Remove" in the commit message's body:

  • Add non-blocking connect MQTT example
  • Remove unused variable test
  • Put timer calculations outside loops
  • Add new example file to docs

I'm still not sure why the word "Add" (or the PR label image) didn't cause it to be categorized properly into "Added". 🤷🏼‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants