memfs.exe, processor.exe and file Explorer, task manager block #1119

Open · 5 tasks done
Jasperkkz opened this issue Nov 11, 2022 · 7 comments
Jasperkkz commented Nov 11, 2022

Environment

  • Windows version: Windows Server 2016 (64-bit)
  • Processor architecture: Amd64
  • Dokany version: v2.0.5.1000
  • Library type (Dokany/FUSE): Dokany

Check List

  • I checked my issue doesn't exist yet
  • My issue is valid with mirror default sample and not specific to my user-mode driver implementation
  • I can always reproduce the issue with the provided description below.
  • I have updated Dokany to the latest version and have rebooted my computer afterwards.
  • I tested one of the latest snapshots from the AppVeyor CI

Description

I used memfs.exe to create a disk and mount it in a folder, then started multiple processor.exe processes in that folder. The DLLs that processor.exe depends on are on this disk.
I then let them work on parallel file-processing tasks. In addition, multiple processor.exe processes can each start another process at the same time, reading the same PE file.
In this case, processor.exe and memfs.exe block during the run, and even File Explorer and Task Manager get stuck. The problem is intermittent.

Logs

The log was incomplete; I found that the blocked threads could not output a complete log.

Liryna (Member) commented Nov 11, 2022

Hi @lylin108, if logs cannot be provided, could you attach a debugger and see where the worker threads in memfs are?
You might also want to install a debug version of dokan driver and enable kernel logs with WinDbg https://github.com/dokan-dev/dokany/wiki/How-to-Debug-Dokan#logs-on-debug-build

Jasperkkz (Author) commented

I switched to version 2.0.6.1 of the debug driver and turned on debug mode. Then I opened DbgView and found abnormal blocking in the log.

20519512	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe4814922e800 at DokanCompleteCreate:1310. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	
20519513	882.48413086	Stuck trying to lock "\DosDevices\E:\MountPoint" (ffffe4814926b1b0 with ERESOURCE ffffe4814926b260) in thread ffffe4814920f800 at DokanFreeFCB:196. Current exclusive owner is thread ffffe481536d3800 with outermost lock at DokanFreeFCB:196.	
20519514	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe4814a049800 at DokanDispatchCreate:719. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	
20519515	882.48413086	Stuck trying to lock "\DosDevices\E:\MountPoint" (ffffe4814926b1b0 with ERESOURCE ffffe4814926b260) in thread ffffe48153611080 at DokanGetFCB:118. Current exclusive owner is thread ffffe481536d3800 with outermost lock at DokanFreeFCB:196.	
20519516	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe48149c3e040 at DokanCompleteCreate:1310. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	
20519517	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe4814aa1c080 at DokanDispatchCreate:719. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	
20519518	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe481536d3800 at DokanFreeFCB:197. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	
20519519	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe4814aff7080 at DokanFilterCallbackAcquireForCreateSection:152. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	
20519520	882.48413086	Stuck trying to lock "\DosDevices\E:\MountPoint" (ffffe4814926b1b0 with ERESOURCE ffffe4814926b260) in thread ffffe48149227800 at DokanGetFCB:118. Current exclusive owner is thread ffffe481536d3800 with outermost lock at DokanFreeFCB:196.	
20519521	882.48413086	Stuck trying to lock "\DosDevices\E:\MountPoint" (ffffe4814926b1b0 with ERESOURCE ffffe4814926b260) in thread ffffe48149e40540 at DokanFreeFCB:196. Current exclusive owner is thread ffffe481536d3800 with outermost lock at DokanFreeFCB:196.	
20519522	882.48413086	Stuck trying to lock "Temp\\dynamicLibrary.dll" (ffffe48149e90c10 with ERESOURCE ffffe4814aee5680) in thread ffffe4814926b800 at DokanCompleteCleanup:160. Current exclusive owner is thread ffffe4814744a080 with outermost lock at DokanFilterCallbackAcquireForCreateSection:152.	

E:\MountPoint is the mount point of the disk.
In this case, I am starting multiple identical child processes in parallel, all of which depend on the same DLL.
I found my processes stuck in the middle of creating child processes. The call stack of the thread that called DokanFilterCallbackAcquireForCreateSection is as follows:
[image]
The other child processes are blocked accessing the DLL; one of their call stacks:
[image]
The call stacks for memfs.exe are as follows:
[image]
[image]
All threads are blocked on the DokanFCB lock.

Liryna (Member) commented Nov 14, 2022

Thank you @lylin108 for the great debugging!
I believe DokanFilterCallbackAcquireForCreateSection is the source of the deadlock. The lock inside it is held for the duration of a whole filesystem request, while for other request types the lock is not held while the userland filesystem processes them.
Usually in this case, the request (thread ffffe4814744a080, NtCreateSection) was sent to memfs and is on hold for some reason. It would be great to debug further and find where this pending request is and which type it is.
You can dump the IRP type in WinDbg, but to see where it is in dokan you will have to look at the driver and userland logs.

What could happen is that all memfs threads are blocked in the kernel and therefore none can handle the request that holds the DokanFilterCallbackAcquireForCreateSection lock.

Jasperkkz (Author) commented Nov 18, 2022

I analyzed the deadlock above and found that in the DokanFreeFCB function, it locks the vcb and then locks the fcb.
In my log, thread ffffe481536d3800 holds the vcb resource. The dump file below shows that it is blocked waiting for the fcb resource.

The call stack looks like this.

image

Looking at the source code, I find that line 196 locks the vcb resource and line 197 locks the fcb resource.

line 196: DokanVCBLockRW(Vcb);
line 197: DokanFCBLockRW(Fcb);

From the dump and the log, I think a three-way thread race may cause the dokan deadlock. Call the three threads T1, T2, and T3:

  • T1 calls MiCreateImageFileMap on File1 and locks File1's fcb at DokanFilterCallbackAcquireForCreateSection:152, then gets stuck in KeWaitForSingleObject waiting for the IRP to complete.
  • T2 responds to a ring-3 call of CloseHandle(File1) and stops between lines 196 and 197, because File1's fcb is held by T1; meanwhile, T2 holds the vcb lock.
  • T3 is dokan's event thread or TimeoutThread; it gets stuck at DokanCompleteCreate -> DokanFreeFCB because the vcb is held by T2, so T3 cannot complete File1's IRP, which leaves T1 waiting forever.

So I suspect that because the FCB resource stays locked in the DokanFilterCallbackAcquireForCreateSection function, the memfs.exe process's call into DokanDispatchCompletion blocks waiting for the VCB resource.
So I tried to change the blocking here.

The fix acquires the vcb and fcb locks together, so a thread can no longer get stuck between locking the vcb and locking the fcb.

do {
  DokanVCBLockRW(Vcb);
  BOOLEAN LockAcquired = DokanFCBLockRW_INTIMES(Fcb);
  if (LockAcquired) {
    break;
  }
  // Could not get the fcb lock in time: release the vcb so other threads
  // can make progress, then wait a little before retrying.
  DokanVCBUnlock(Vcb);
  LARGE_INTEGER wait = RtlConvertLongToLargeInteger(
      -DOKAN_RESOURCE_LOCK_DEBUG_INTERVAL_MSEC_WAIT * 10);
  KeDelayExecutionThread(KernelMode, TRUE, &wait);
} while (TRUE);

The DokanFCBLockRW_INTIMES macro is defined as follows:

#define DokanFCBLockRW_INTIMES(fcb)                                          \
  DokanResourceLockIninMaxTimeWithDebugInfo(                                 \
      TRUE, (fcb)->AdvancedFCBHeader.Resource, &(fcb)->ResourceDebugInfo,    \
      &(fcb)->Vcb->ResourceLogger, DokanCallSiteID, &(fcb)->FileName,        \
      (fcb))

BOOLEAN DokanResourceLockIninMaxTimeWithDebugInfo(
    __in BOOLEAN Writable,
    __in PERESOURCE Resource,
    __in PDokanResourceDebugInfo DebugInfo,
    __in PDOKAN_LOGGER Logger,
    __in const char *Site,
    __in const UNICODE_STRING *ObjectName,
    __in const void *ObjectPointer) {
  // The wait is in 100ns units. Negative means "from now" as opposed to an
  // absolute wake up time.
  LARGE_INTEGER wait = RtlConvertLongToLargeInteger(
      -DOKAN_RESOURCE_LOCK_DEBUG_INTERVAL_MSEC * 10);
  LARGE_INTEGER lastWarnTime = {0};
  LARGE_INTEGER systemTime = {0};
  BOOLEAN warned = FALSE;
  BOOLEAN resultLockAcquired = FALSE;

  ULONG curTryLockTimes = 0;

  for (;;) {
    KeEnterCriticalRegion();
    if (Writable) {
      resultLockAcquired = ExAcquireResourceExclusiveLite(Resource, FALSE);
    } else {
      resultLockAcquired = ExAcquireResourceSharedLite(Resource, FALSE);
    }

    ++curTryLockTimes;

    if (resultLockAcquired) {
      break;
    }

    if (curTryLockTimes > Max_Try_lock_times) {
      KeLeaveCriticalRegion();
      break;
    }

    KeLeaveCriticalRegion();
    KeQuerySystemTime(&systemTime);
    if (lastWarnTime.QuadPart == 0) {
      lastWarnTime = systemTime;
    } else if ((systemTime.QuadPart - lastWarnTime.QuadPart) / 10 >=
               DOKAN_RESOURCE_LOCK_WARNING_MSEC) {
      DokanLockWarn(Resource, DebugInfo, Logger, Site, ObjectName,
                    ObjectPointer);
      warned = TRUE;
      lastWarnTime = systemTime;
    }
    KeDelayExecutionThread(KernelMode, TRUE, &wait);
  }

  if (resultLockAcquired) {
    if (ExIsResourceAcquiredExclusiveLite(Resource)) {
      if (DebugInfo->ExclusiveLockCount == 0) {
        DebugInfo->ExclusiveLockSite = Site;
        DebugInfo->ExclusiveOwnerThread = KeGetCurrentThread();
      }
      // Note that we may need this increment even for a non-writable request,
      // since any recursive acquire of an exclusive lock is exclusive.
      ++DebugInfo->ExclusiveLockCount;
    }
    if (warned) {
      DokanLockNotifyResolved(Resource, Logger);
    }
  }

  return resultLockAcquired;
}

Then I gave it a try. From the log, I found that releasing the vcb halfway does ease the deadlock. But after a long test run, dokan still deadlocked; the log shows it may still be stuck on vcb and fcb lock competition.

I tested in an environment where 32 identical processes start concurrently. The files that the processes need to load are also in the working directory of the memory file system, and the processes exit and restart very frequently.

Liryna (Member) commented Nov 18, 2022

> So I suspect that because the FCB resource stays locked in the DokanFilterCallbackAcquireForCreateSection function, the memfs.exe process's call into DokanDispatchCompletion blocks waiting for the VCB resource.

Indeed. That's why I believe we should look at getting the request held by T1 in DokanFilterCallbackAcquireForCreateSection processed, so that T1 releases its lock and T2 and T3 can proceed.

> What could happen is that all memfs threads are blocked in the kernel and therefore none can handle the request that holds the DokanFilterCallbackAcquireForCreateSection lock.

Unfortunately, I believe DokanFCBLockRW_INTIMES will not help, as it still keeps the threads in the kernel and therefore keeps blocking the T1 request from being handled.
The solution would be to somehow delay the completion of the T2 / T3 requests so that both threads can come back to memfs, handle the T1 request, and only afterwards unstack the previous T2 / T3 requests and complete them.

Or maybe, as I suggested in #1119 (comment), find the type of the T1 request and possibly reduce the locking on it.

Liryna (Member) commented Dec 11, 2022

@lylin108 Any update on this? Do not hesitate to share if you gather new info.

Jasperkkz (Author) commented

Sorry for the late reply. I tried to track the IRP around FsRtlGetFileSize (this API is where the resource-holding thread is blocked), but the blocked thread's call into the Dokan driver cannot be found in the log. It looks like the thread is stuck without calling the driver.
I tried to print out all IRPs in the dokan driver's IRP queue, but I haven't had time to work on it recently. I'll try to print it out this week.
