Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: Release interaction with Call #234

Open
RealKeyboardWarrior opened this issue Dec 16, 2021 · 5 comments
Open

Q: Release interaction with Call #234

RealKeyboardWarrior opened this issue Dec 16, 2021 · 5 comments

Comments

@RealKeyboardWarrior
Copy link

RealKeyboardWarrior commented Dec 16, 2021

  • Is it possible to release a session while a call is in progress?
    There are mutexes to prevent simultaneously reading and writing within a single session:

    trezord-go/core/core.go

    Lines 629 to 656 in 106e5e9

    func (c *Core) readWriteDev(
    body []byte,
    acquired *session,
    mode CallMode,
    ) ([]byte, error) {
    if mode == CallModeRead {
    if len(body) != 0 {
    return nil, errors.New("non-empty body on read mode")
    }
    c.log.Log("skipping write")
    } else {
    acquired.writeMutex.Lock()
    err := c.writeDev(body, acquired.dev)
    acquired.writeMutex.Unlock()
    if err != nil {
    return nil, err
    }
    }
    if mode == CallModeWrite {
    c.log.Log("skipping read")
    return []byte{0}, nil
    }
    acquired.readMutex.Lock()
    defer acquired.readMutex.Unlock()
    return c.readDev(acquired.dev)
    }

I do not see the Release function acquiring locks for these mutexes so it seems like we'd be able to release a session whilst it's busy reading and writing? Perhaps I'm missing some libusb magic locking mechanisms here.

trezord-go/core/core.go

Lines 354 to 372 in 106e5e9

func (c *Core) release(
ssid string,
disconnected bool,
debug bool,
) error {
c.log.Log(fmt.Sprintf("session %s", ssid))
s := c.sessions(debug)
v, ok := s.Load(ssid)
if !ok {
c.log.Log("session not found")
return ErrSessionNotFound
}
s.Delete(ssid)
acquired := v.(*session)
c.log.Log("bus close")
err := acquired.dev.Close(disconnected)
return err
}

There are locks on the session maps, but those only apply to adding and removing values within the map.

  • Is that a desirable property?
    If it's not a desirable and deliberate property then I'm willing to contribute a fix.
@prusnak
Copy link
Member

prusnak commented Dec 16, 2021

Maybe @matejcik and @szymonlesisz can comment on this? Is this desirable to be able to call Release while a Call is still running?

@RealKeyboardWarrior
Copy link
Author

RealKeyboardWarrior commented Dec 16, 2021

Hi,

The readMutex is being acquired here through finishReadQueue:

trezord-go/usb/libusb.go

Lines 435 to 456 in 106e5e9

func (d *LibUSBDevice) Close(disconnected bool) error {
d.mw.Log("storing d.closed")
atomic.StoreInt32(&d.closed, 1)
if d.cancel {
// libusb close does NOT cancel transfers on close
// => we are using our own function that we added to libusb/sync.c
// this "unblocks" Interrupt_Transfer in readWrite
d.mw.Log("canceling previous transfers")
lowlevel.Cancel_Sync_Transfers_On_Device(d.dev)
// reading recently disconnected device sometimes causes weird issues
// => if we *know* it is disconnected, don't finish read queue
//
// Finishing read queue is not necessary when we don't allow cancelling
// (since when we don't allow cancelling, we don't allow session stealing)
if !disconnected {
d.mw.Log("finishing read queue")
d.finishReadQueue(d.debug)
}
}

But it seems to try interrupting the sending without acquiring a mutex.

@matejcik
Copy link

I wouldn't rely on any libusb magic, it's generally not cross-OS reliable.

Releasing while a call is in progress seems like asking for trouble, but an usecase I can envision goes like this:
App A is waiting for user interaction on Trezor, or got otherwise stuck, and it's not visible to the user, so it's not apparent what they should do.
App B is sitting at the "Another app is using Trezor screen, steal focus?" and wants to be able to shut out App A.

If we disallow releasing while a call is in progress, the user can only resolve this by reconnecting the Trezor.

Also how would this interact with using Post/Read instead of Call ?

@RealKeyboardWarrior
Copy link
Author

I've been trying to simulate two applications attempting to race the bridge at the same time, one is racing to GetXPUB for 44'/0'/0' and another is for GetXPUB 44'/0'/1', and it attempts to detect a case where an unexpected XPub message is returned, indicating that there might be stale data being sent back to us.

It seems that the bridge doesn't handle multiple applications well when they are aggressively racing to acquire a handle to the device. If two applications are aggressively calling the bridge they will return either session not found or wrong previous session. Neither application will be able to use the device if one of them is retrying instantly.

The bridge version that I'm using is 2.0.27 downloaded from https://suite.trezor.io/web/bridge/, on Debian 10.

Additionally to the behavior above, I've managed to produce two strange messages that might indicate funky stuff going on.

Use before acquiring or after releasing

It seems like I've managed to produce a scenario where the bridge attempts to use the device before having claimed the device, or perhaps after it had released its claim.

Dec 17 11:45:09 dev kernel: usb 1-3: usbfs: process 8090 (trezord) did not claim interface 0 before use

Bigger log:

Dec 17 11:45:07 dev trezord[8090]: POST /call/468
Dec 17 11:45:07 dev trezord[8090]: POST /acquire/1/468
Dec 17 11:45:07 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /call/468 HTTP/1.1" 400 46
Dec 17 11:45:07 dev trezord[8090]: POST /acquire/1/null
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /acquire/1/468 HTTP/1.1" 200 18
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /acquire/1/null HTTP/1.1" 400 35
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:07 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:08 dev trezord[8090]: POST /call/469
Dec 17 11:45:08 dev trezord[8090]: POST /listen
Dec 17 11:45:08 dev trezord[8090]: POST /acquire/1/null
Dec 17 11:45:08 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /acquire/1/null HTTP/1.1" 400 35
Dec 17 11:45:08 dev trezord[8090]: POST /listen
Dec 17 11:45:08 dev trezord[8090]: POST /acquire/1/469
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /acquire/1/469 HTTP/1.1" 200 18
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:09 dev kernel: usb 1-3: usbfs: process 8090 (trezord) did not claim interface 0 before use
Dec 17 11:45:09 dev trezord[8090]: POST /listen
Dec 17 11:45:09 dev trezord[8090]: POST /call/470
Dec 17 11:45:09 dev trezord[8090]: POST /listen
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:08 +0100] "POST /call/469 HTTP/1.1" 400 26
Dec 17 11:45:09 dev trezord[8090]: POST /acquire/1/470
Dec 17 11:45:09 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /call/470 HTTP/1.1" 400 46
Dec 17 11:45:09 dev trezord[8090]: POST /acquire/1/470
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /acquire/1/470 HTTP/1.1" 400 35
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /acquire/1/470 HTTP/1.1" 200 18
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:09 +0100] "POST /listen HTTP/1.1" 200 95
Dec 17 11:45:10 dev trezord[8090]: POST /call/471
Dec 17 11:45:10 dev trezord[8090]: POST /acquire/1/470
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /acquire/1/470 HTTP/1.1" 400 35
Dec 17 11:45:10 dev trezord[8090]: POST /listen
Dec 17 11:45:10 dev trezord[8090]: POST /listen
Dec 17 11:45:10 dev trezord[8090]: POST /acquire/1/471
Dec 17 11:45:10 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /call/471 HTTP/1.1" 400 46
Dec 17 11:45:10 dev trezord[8090]: POST /acquire/1/471
Dec 17 11:45:11 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /acquire/1/471 HTTP/1.1" 200 18
Dec 17 11:45:11 dev trezord[8090]: 127.0.0.1 - - [17/Dec/2021:11:45:10 +0100] "POST /acquire/1/471 HTTP/1.1" 400 35

Strange disconnect war

Application 1 racing to get 44'/0'/0', the first few attempts work fine when the other application isn't racing.
This might have happened at the same time as the previous issue above but I can't know because I forgot to get a detailed dump.

BIP39 passphrase: 
44'/0'/0': attempt 0
44'/0'/0': attempt 1
44'/0'/0': attempt 2
44'/0'/0': attempt 3
44'/0'/0': attempt 4
44'/0'/0': attempt 5
Error: session not found
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
44'/0'/0': attempt 31
44'/0'/0': attempt 32
44'/0'/0': attempt 33

Application 2 racing to get 44'/0'/1'

connection detected, setting descriptor:
{
 path: '1',
 session: '15',
 debugSession: null,
 product: 21441,
 vendor: 4617,
 debug: false
}
undefined
Error: session not found
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: session not found
Error: wrong previous session
Error: session not found
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session
Error: device disconnected during action
Error: wrong previous session
Error: wrong previous session

Lots and lots of USB resetting going on when fighting over sessions

Lots and lots of resetting of the USB device..

Dec 17 11:00:00 dev kernel: usbcore: registered new interface driver usbfs
Dec 17 11:00:00 dev kernel: usbcore: registered new interface driver hub
Dec 17 11:00:00 dev kernel: usbcore: registered new device driver usb
Dec 17 11:00:00 dev kernel: usb usb1: New USB device found, idVendor=1d6b, idProduct=0001, bcdDevice= 4.19
Dec 17 11:00:00 dev kernel: usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
Dec 17 11:00:00 dev kernel: usb usb1: Product: OHCI PCI host controller
Dec 17 11:00:00 dev kernel: usb usb1: Manufacturer: Linux 4.19.0-17-amd64 ohci_hcd
Dec 17 11:00:00 dev kernel: usb usb1: SerialNumber: 0000:00:06.0
Dec 17 11:00:00 dev kernel: usb 1-1: new full-speed USB device number 2 using ohci-pci
Dec 17 11:00:00 dev kernel: usb 1-1: New USB device found, idVendor=80ee, idProduct=0021, bcdDevice= 1.00
Dec 17 11:00:00 dev kernel: usb 1-1: New USB device strings: Mfr=1, Product=3, SerialNumber=0
Dec 17 11:00:00 dev kernel: usb 1-1: Product: USB Tablet
Dec 17 11:00:00 dev kernel: usb 1-1: Manufacturer: VirtualBox
Dec 17 11:00:00 dev kernel: usbcore: registered new interface driver usbhid
Dec 17 11:00:00 dev kernel: usbhid: USB HID core driver
Dec 17 11:00:00 dev kernel: input: VirtualBox USB Tablet as /devices/pci0000:00/0000:00:06.0/usb1/1-1/1-1:1.0/0003:80EE:0021.0001/input/input6
Dec 17 11:00:00 dev kernel: hid-generic 0003:80EE:0021.0001: input,hidraw0: USB HID v1.10 Mouse [VirtualBox USB Tablet] on usb-0000:00:06.0-1/input0
Dec 17 11:00:00 dev kernel: usb 1-2: new full-speed USB device number 3 using ohci-pci
Dec 17 11:00:00 dev kernel: usb 1-2: New USB device found, idVendor=27c6, idProduct=533c, bcdDevice= 1.00
Dec 17 11:00:00 dev kernel: usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec 17 11:00:00 dev kernel: usb 1-2: Product: FingerPrint
Dec 17 11:00:00 dev kernel: usb 1-2: Manufacturer: Goodix
Dec 17 11:00:00 dev mtp-probe[419]: checking bus 1, device 2: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-1"
Dec 17 11:00:00 dev mtp-probe[412]: checking bus 1, device 3: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-2"
Dec 17 11:03:39 dev kernel: usb 1-3: new full-speed USB device number 4 using ohci-pci
Dec 17 11:03:40 dev kernel: usb 1-3: New USB device found, idVendor=1209, idProduct=53c1, bcdDevice= 2.00
Dec 17 11:03:40 dev kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Dec 17 11:03:40 dev kernel: usb 1-3: Product: TREZOR
Dec 17 11:03:40 dev kernel: usb 1-3: Manufacturer: SatoshiLabs
Dec 17 11:03:40 dev kernel: usb 1-3: SerialNumber: 8F552B0716DC9AA5F89A8CA4
Dec 17 11:03:40 dev kernel: hid-generic 0003:1209:53C1.0002: hiddev0,hidraw1: USB HID v1.11 Device [SatoshiLabs TREZOR] on usb-0000:00:06.0-3/input1
Dec 17 11:03:40 dev mtp-probe[4069]: checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-3"
Dec 17 11:03:40 dev mtp-probe[4080]: checking bus 1, device 4: "/sys/devices/pci0000:00/0000:00:06.0/usb1/1-3"
Dec 17 11:03:40 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:47 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:49 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:50 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:03:52 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
...
Dec 17 11:45:05 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:06 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:07 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:08 dev kernel: usb 1-3: reset full-speed USB device number 4 using ohci-pci
Dec 17 11:45:09 dev kernel: usb 1-3: usbfs: process 8090 (trezord) did not claim interface 0 before use

@matejcik
Copy link

Resetting is expected, bridge needs to reset the device at acquire time. I don't exactly remember why :) but some things go bad when we omit the reset.

It seems that the bridge doesn't handle multiple applications well when they are aggressively racing to acquire a handle to the device. If two applications are aggressively calling the bridge they will return either session not found or wrong previous session. Neither application will be able to use the device if one of them is retrying instantly.

This seems like expected behavior? Bridge is not exactly trying to multiplex access to device -- that doesn't work well with the device itself, we'd have to put Initialize management into the Bridge itself and generally be much more strict about things.
The racey apps are "misbehaving" -- when you encounter "wrong previous session" or "session not found", you're supposed to throw a modal like "the device is being used elsewhere. click to use in this app.", and /listen for the other app releasing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants