Handle CAN disconnect (detach device from bus) on J1939 #443
If no receiver on the CAN bus ACKs a CAN frame, the Linux CAN framework will not send a local echo for that particular packet (SKB). The Linux J1939 stack uses this "echo" to control the order of J1939 packets on the bus and, at the same time, to communicate packet errors and progress to user space over the socket error queue. And here is one example of how to process an error queue message: Similar examples can be found here: |
Thanks! As I understand it, with recvmsg (and the error queue) I can determine that the CAN bus was detached. How can I determine that the CAN bus was reconnected? In my experience with the J1939 driver, if the delay between bus detach and attach is more than ~10 s, sendto fails with errno=ENOTRECOVERABLE after the bus is attached again. If the bus is detached in the middle of normal program operation, sendto fails with errno=EHOSTUNREACH. |
Hm, looks like a bug in the driver. We have two places where the socket is set to an error state, but no place to recover it: @marckleinebudde, @hartkopp, @yope, any ideas on how to handle this better? |
Maybe we need a corresponding |
As a counterpart for j1939_sk_netdev_event_netdown: yes. For j1939_sk_send_loop_abort it is a bit more difficult. It will not put the socket into an error state if the error queue is activated. But there is still the problem of how to clear a socket error if no error queue is activated. Using ioctl? |
@in-text are you able to detect a re-attached bus by using the error queue? |
Is it okay if I use the j1939cat app to test bus attach/detach events, or do I need to write specific code for this test? |
j1939cat is designed to abort on errors. You will need to write your own code. Please note that there is no attach/detach "event", just send errors. |
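Since the bus state can only be inferred from send errors, a test program might start by sorting `sendto()` results into coarse categories. The sketch below is a guess at such a helper based only on the errno values reported earlier in this thread (EHOSTUNREACH while the bus is detached, ENOTRECOVERABLE after a longer detach). The enum and function names are hypothetical, and whether a socket in the ENOTRECOVERABLE state can be reused at all is exactly the open question discussed above.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical categories for this sketch; not part of any J1939 API. */
enum bus_hint {
	BUS_OK,            /* sendto() succeeded */
	BUS_UNREACHABLE,   /* EHOSTUNREACH: reported while the bus was detached */
	BUS_SOCKET_FAILED, /* ENOTRECOVERABLE: reported after a long detach */
	BUS_OTHER_ERROR,   /* anything else, e.g. EAGAIN on a non-blocking socket */
};

/* Classify the result of a sendto() call.
 * 'ret' is the return value, 'err' the errno captured right after the call. */
enum bus_hint bus_hint_from_send(ssize_t ret, int err)
{
	if (ret >= 0)
		return BUS_OK;

	switch (err) {
	case EHOSTUNREACH:
		return BUS_UNREACHABLE;
	case ENOTRECOVERABLE:
		return BUS_SOCKET_FAILED;
	default:
		return BUS_OTHER_ERROR;
	}
}
```

A caller could keep retrying on `BUS_UNREACHABLE` and treat the first `BUS_OK` after it as "bus is back". Under the behavior reported in this thread, `BUS_SOCKET_FAILED` may instead require closing and reopening the socket; that is an assumption, not something confirmed here.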
I also have a similar problem with the J1939 stack when using the PDU2 format and segmentation. I'm building a Linux application which simulates the J1939 ECUs of a real vehicle, and one of those ECUs sends cyclic messages using the PDU2 format. Those messages are longer than 8 bytes, so they need to be segmented. If there's another device on the bus, this works without problems. However, if there's no other device and the cyclic messages continue to be sent, then after a short time the socket seems to enter some kind of erroneous state. Should another device then be switched on, the socket will be unable to send other
Here's how it looks when I send data with another device active; everything works as expected:
Here is the received CAN data when I switch on my receiving device after the simulation has been running
After switching on the other device I would expect to see the same results as in the first case, but the socket becomes
Here is the corresponding dmesg output:
I don't really know how to work around this problem, as I don't see any way to make sure that there is another device on
I've observed this behavior on kernel 6.8.0 as well as 5.10.72. |
Hi @xile273, did you try using the error queue? It will send you all packet notifications and prevent the socket from blocking. |
Hi and thank you for your help! Edit: |
Hm.. I tried to use the isobusfs client and server with the following modification, but was not able to reproduce the issue with the stalled socket:
I did the following test: The kernel is provided by Ubuntu: After the server and client had started, I detached both interfaces from each other. The server side started printing messages: After both interfaces were reattached, the server continued to work as expected. Can you please test whether you can reproduce the same issue with cangen? Will it be able to continue sending packets after you detach and reattach the remote MCU? |
Thank you for your quick responses! I'm out of office and don't have access to my equipment but I will try as soon as possible. I'm sorry I also probably wasn't that clear about the error. I've only seen it happen when using segmented
Like that I can 100% reliably reproduce the error. I'll test your code and send you my reproducer as soon as possible though. |
Hi, I've done some more testing and the modified isobusfs behaves the same on my system. But I think I now know what causes the behavior. It only happens when sending those segmented broadcast messages at an interval which is too short to actually send the whole data before the next send is triggered, and with no other CAN device attached to the bus. This means I'm probably also to blame for misusing the J1939 stack... See this little test program: It sends a 34-byte message at an interval of 50 milliseconds, and on my systems I get this output:
After the call to close, it hangs and no further messages are observed.
When running this program with a receiver attached from the start, it works as expected and the socket closes after
But realistically speaking, this program can probably be considered erroneous code on the userspace side... I've also written a version which prints out received error queue messages, but it behaves very similarly. I've done my tests with kernels 6.5.0 and 6.8.0, with a Kvaser USBcan and a Kvaser Leaf Light v2 CAN interface. |
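The timing mismatch in this comment can be estimated with a little arithmetic. The sketch below assumes the segmented broadcast uses the J1939 transport protocol's BAM mode, where each TP.DT frame carries 7 data bytes and consecutive frames are spaced by at least 50 ms (the minimum SAE J1939-21 allows, as far as I know; the spacing the Linux stack actually applies is an assumption here). The function names are hypothetical.

```c
#include <stddef.h>

/* Data bytes carried by one TP.DT frame (byte 0 is the sequence number). */
enum { TP_DT_PAYLOAD = 7 };

/* Number of TP.DT frames needed for a message of 'len' bytes. */
size_t tp_packet_count(size_t len)
{
	return (len + TP_DT_PAYLOAD - 1) / TP_DT_PAYLOAD;
}

/* Lower bound for one BAM transfer in milliseconds, assuming 'gap_ms'
 * elapses before each TP.DT frame (including the first one after the
 * BAM announcement). Bus transmission time itself is ignored. */
unsigned long bam_min_duration_ms(size_t len, unsigned long gap_ms)
{
	return (unsigned long)tp_packet_count(len) * gap_ms;
}
```

Under these assumptions, a 34-byte message needs `tp_packet_count(34)` = 5 data frames and at least `bam_min_duration_ms(34, 50)` = 250 ms per transfer, so triggering a new send every 50 ms queues sessions roughly five times faster than they can complete.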
OK, thanks. Now I can reproduce it. By the way, it is possible to reproduce it with the following commands too:
The only problem is to find some time for debugging :( ... |
@xile273, can you please test the following patch: |
Thank you, I hope you still had some time to enjoy your weekend...
I did not think of that, that would have been easier...
Your patch works for me and fixes the issue, thank you! |
@xile273 , thank you! |
Hello!
How can I determine, in userspace, that a device has been physically detached from the CAN bus?