Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teensy 3.0 #1

Open
apmorton opened this issue Jun 7, 2013 · 9 comments
Open

Teensy 3.0 #1

apmorton opened this issue Jun 7, 2013 · 9 comments

Comments

@apmorton
Copy link

apmorton commented Jun 7, 2013

Hello,

I have spent some time making this work with teensy 3.0, but my efforts were mostly specific to my application, so I pretty much implemented only what I needed.

There are however some pointers I can give you from what I found.

It looks to me like what you do with pins_arduino_compile_time.h is already accomplished with the teensy 3.0, but I am not sure, so in my implementation I simply used digitalWrite for the latch pin. The clock speed is so much higher on the teensy 3.0 I don't think it matters much.

The most important part is of course the interrupt handler, which is mostly identical other than the add_one_pin_to_byte macro, which is as follows.

#define add_one_pin_to_byte(sendInt, thres, ptr) { \
    unsigned char val = *ptr; \
    asm volatile("cmp %0, %1" :: "r" (thres), "r" (val) :); \
    asm volatile("rrx %0, %1" : "=r" (sendInt) : "r" (sendInt) :); \
}

the ARM RRX function works on a 32bit register, so you must pass it an unsigned int for sendInt, and then shift it right 24bits after you add all the bits to send.

for the interrupt timer you use IntervalTimer

m_timer = new IntervalTimer();
period = 1000000.0 / (ledFrequency * maxBrightness);
m_timer.begin(myHandlerFunction, period);

with 3 registers at 100hz pwm frequency with 255 maximum brightness, these are my results.

Load of interrupt: 0.3640691
Clock cycles per interrupt: 685.31
Interrupt frequency: 25500.00 Hz
PWM frequency: 99.61 Hz

This is without the chips actually connected (by boards are being fabricated now), so I suspect much of that is waiting for SPI to timeout in the read loop?

If I don't actually send the values over SPI it drops to Load of interrupt: 0.1692283

If you would like to see my code I will gladly post it somewhere, but it is teensy 3.0 specific.

@elcojacobs
Copy link
Owner

The most important bit of code is the 2 assembly lines, which are highly architecture specific.
On AVR, the compare result is put into the carry. If you do a shift over carry next, you can shift that 1 bit into a byte. Do it 8 times and you have one full byte of compare results.

You would have to check the ARM architecture to see whether this trick works on the teensy as well.

Another optimization I do is that I overlap SPI output with calculating the values for the next byte. The interrupt load on AVR for your setup is 0.27 (according to the calculator on my website). You are pretty close, so maybe the SPI speed is the limiting factor.

I look forward to hearing whether your code efforts produce blinking lights! If it works, commit it on GitHub. If you fork my repo and work from there, preferably with compile time switches between the platforms, I can pull in your results and give you credit.

@apmorton
Copy link
Author

apmorton commented Jun 7, 2013

According to what I put in the calculator, it should be 0.12. 3 registers, 255 max, 100hz, 48mhz

I should have mentioned that I tested my code, sort of.

Instead of having it write to SPI, i have it store in an extern unsigned char, defined in the main sketch, and then print that in binary over serial when it changes.

with all 8 channels set at increments of 30 it goes something like this.

ShiftPWM.SetOne(0, 30);
ShiftPWM.SetOne(1, 60);
ShiftPWM.SetOne(2, 90);
...
11111111
11111110
11111100
11111000
11110000
11100000
11000000
10000000
0

and then repeats.

So the two instructions in ASM seem to be working properly, which is good. CMP + RRX do the same thing on arm as CP + ROR do on avr, with the difference being RRX operates on a 32bit register where ROR operates on an 8bit register, so you just have to shift it down after you put all the bits in.

My code is using the SPI module, so that probably attributes some of the extra load, as SPI becomes completely serial, waiting until the send finishes before calculating the next byte, although I only see this being a real issue with very long chains of registers.

I will see about contributing my code back with compile time switches, but it could get pretty messy the way its structured now.

It might be best to move all the arch specific things into precompiler defines and then into arm.h and avr.h, and then conditionally include those in ShiftPWM.h or something.

I am not entirely sure where to get the information for pins_arduino_compile_time.h, as it doesn't seem to line up with pins_arduino.h, but maybe I haven't looked long enough. For me I was using SPI anyway, so using digitalWrite on only the latch pin didn't seem like a huge deal.

perhaps you could take a look at the teensy 3.0 hardware definitions from teensyduino and construct the entries for pins_arduino_compile_time.h, and I will test them?

@elcojacobs
Copy link
Owner

I made that file myself, because Arduino normally looks up the port and bit
in a program memory struct.
DigitalWrite takes 53 clock cycles, while a bitset on a known port takes 2.

That file was a way to make the pin numbers known at compile time instead
of looking them up at runtime.
Maybe the digitalWriteFast library can replace it.

I have not worked on ShiftPWM for quite a while, because BrewPi takes up
all my time.

Only the latch pin as digitalWrite() is not that bad, but I also have a non
SPI version, which has a lot more pin writes. With my hack, the non SPI
version takes 13 clock cycles per pin (5 for SPI).

It could be that Teensy is smarter about this, I have not looked into it.

Elco

On 7 June 2013 16:05, Juvenal1228 [email protected] wrote:

According to what I put in the calculator, it should be 0.12. 3 registers,
255 max, 100hz, 48mhz

I should have mentioned that I tested my code, sort of.

Instead of having it write to SPI, i have it store in an extern unsigned
char, defined in the main sketch, and then print that in binary over serial
when it changes.

with all 8 channels set at increments of 30 it goes something like this.

ShiftPWM.SetOne(0, 30);
ShiftPWM.SetOne(1, 60);
ShiftPWM.SetOne(2, 90);
...

11111111
11111110
11111100
11111000
11110000
11100000
11000000
10000000
0

and then repeats.

So the two instructions in ASM seem to be working properly, which is good.
CMP + RRX do the same thing on arm as CP + ROR do on avr, with the
difference being RRX operates on a 32bit register where ROR operates on an
8bit register, so you just have to shift it down after you put all the bits
in.

My code is using the SPI module, so that probably attributes some of the
extra load, as SPI becomes completely serial, waiting until the send
finishes before calculating the next byte, although I only see this being a
real issue with very long chains of registers.

I will see about contributing my code back with compile time switches, but
it could get pretty messy the way its structured now.

It might be best to move all the arch specific things into precompiler
defines and then into arm.h and avr.h, and then conditionally include those
in ShiftPWM.h or something.

I am not entirely sure where to get the information for
pins_arduino_compile_time.h, as it doesn't seem to line up with
pins_arduino.h, but maybe I haven't looked long enough. For me I was
using SPI anyway, so using digitalWrite on only the latch pin didn't seem
like a huge deal.

perhaps you could take a look at the teensy 3.0 hardware definitions from
teensyduino and construct the entries for pins_arduino_compile_time.h,
and I will test them?


Reply to this email directly or view it on GitHubhttps://github.com//issues/1#issuecomment-19108799
.

@apmorton
Copy link
Author

apmorton commented Jun 7, 2013

I just looked into it, and digitalWriteFast is included for teensy 3.0 as part of the standard library installed by teensyduino.

couldnt you also just cache the return of digitalPinToBitMask and digitalPinToPort for each port when starting up?

I guess this would be slightly more clock cycles when accessing, by a few.

I will try to throw together a PR for teensy 3.0 this weekend.

@elcojacobs
Copy link
Owner

The point is knowing the registers at compile time, because the compiler can then use a bitSet instruction instead of using the mask and port register.

@apmorton
Copy link
Author

It is likely I will not have the time to integrate my solution into ShiftPWM, so I have decided to just release my crap as-is, and hopefully someone with enough time will be able to integrate it.

https://github.com/Juvenal1228/T3ShiftPWM

For my application, all of the logic is in my own code, so I didn't really implement any of the convenience functions your library has to set RGB values etc.

my code just operates directly on m_values

@PaulStoffregen
Copy link

If anyone's still watching this old thread, I recently ported ShiftPWM to Teensy 3.0 & 3.1.

https://github.com/PaulStoffregen/ShiftPWM

@apmorton
Copy link
Author

@PaulStoffregen I only looked at your code quickly, so forgive me if I missed something.

It doesn't look like you implemented add_one_pin_to_byte for ARM, which means your implementation cannot use the hardware SPI method.

This is the macro which works with the teensy arm assembly, taken from my T3ShiftPWM repo.

#define add_one_pin_to_byte(sendInt, thres, ptr) { \
    unsigned char val = *ptr; \
    asm volatile("cmp %0, %1" :: "r" (thres), "r" (val) :); \
    asm volatile("rrx %0, %1" : "=r" (sendInt) : "r" (sendInt) :); \
}

https://github.com/apmorton/T3ShiftPWM/blob/master/T3ShiftPWM.h#L14

The performance difference for my application was definitely noticeable between the two methods.

@PaulStoffregen
Copy link

I'm currently working on an API for SPI transactions. Details can be found on these 2 conversations:

http://forum.pjrc.com/threads/25582-SPI-sharing-between-interrupts-and-main-program/page2

https://groups.google.com/a/arduino.cc/forum/#!topic/developers/TuZLfjeZjDI

SPI transaction support is particularly relevant for libraries like ShiftPWM, which use the hardware SPI port from interrupt context. If a non-interrupt library like Ethernet or SD is using the SPI port at the moment ShiftPWM gets an interrupt, ShiftPWM will end up using the SPI port while that other library still has its chip select active and some other device listening. That's a very bad result.

This same problem happens on all Arduino boards. It's not unique to Teensy.

I'm working on a revised SPI library to solve this problem. Details are in those lengthy conversations.

At the moment, I'm intentionally stalling work on SPI-based libraries, until this SPI transaction stuff is working.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants