memfs.exe, processor.exe and file Explorer, task manager block #1119
Comments
Hi @lylin108, if logs cannot be provided, could you attach a debugger and see where the worker threads in memfs are?
Thank you @lylin108 for the great debugging! What could happen is that all memfs threads are locked in the kernel and therefore none can handle the |
I analyzed the deadlock above and found that the DokanFreeFCB function locks the vcb and then locks the fcb. The call stack looks like this. Looking at the source code, line 196 locks the vcb resource and line 197 locks the fcb resource.

line 196: DokanVCBLockRW(Vcb);

From the dump and the log, I think contention among three threads may cause the Dokan deadlock. Call the three threads T1, T2 and T3.

T1 calls MiCreateImageFileMap on File1 and locks File1's fcb at DokanFilterCallbackAcquireForCreateSection:152. T1 is then stuck in KeWaitForSingleObject, waiting for an IRP to complete.

T2 responds to a user-mode (r3) call of CloseHandle(File1) and is executing between line 196 and line 197: it cannot take File1's fcb because T1 holds it, and meanwhile T2 itself holds the vcb lock.

T3 is Dokan's event thread or TimeoutThread. It is stuck at DokanCompleteCreate->DokanFreeFCB because the vcb is held by T2, so T3 cannot complete the IRP for File1, and T1 waits forever.

So I suspect that locking the FCB resource in DokanFilterCallbackAcquireForCreateSection causes the memfs.exe process to block on the VCB resource when it calls DokanDispatchCompletion.

The fix is to acquire the vcb and fcb locks together, so that a thread never gets stuck between lockVcb and lockFcb.

do
{
  DokanVCBLockRW(Vcb);
  BOOLEAN LockAcquired = DokanFCBLockRW_INTIMES(Fcb);
  if (!LockAcquired) {
    DokanVCBUnlock(Vcb);
    LARGE_INTEGER wait = RtlConvertLongToLargeInteger(
        -DOKAN_RESOURCE_LOCK_DEBUG_INTERVAL_MSEC_WAIT * 10);
    KeDelayExecutionThread(KernelMode, TRUE, &wait);
  } else {
    break;
  }
} while (TRUE);

The DokanFCBLockRW_INTIMES macro is defined as follows:

#define DokanFCBLockRW_INTIMES(fcb) \
  DokanResourceLockIninMaxTimeWithDebugInfo( \
      TRUE, (fcb)->AdvancedFCBHeader.Resource, &(fcb)->ResourceDebugInfo, \
      &(fcb)->Vcb->ResourceLogger, DokanCallSiteID, &(fcb)->FileName, \
      (fcb));

BOOLEAN DokanResourceLockIninMaxTimeWithDebugInfo(
    __in BOOLEAN Writable,
    __in PERESOURCE Resource,
    __in PDokanResourceDebugInfo DebugInfo,
    __in PDOKAN_LOGGER Logger,
    __in const char *Site,
    __in const UNICODE_STRING *ObjectName,
    __in const void *ObjectPointer) {
  // The wait is in 100ns units. Negative means "from now" as opposed to an
  // absolute wake up time.
  LARGE_INTEGER wait = RtlConvertLongToLargeInteger(
      -DOKAN_RESOURCE_LOCK_DEBUG_INTERVAL_MSEC * 10);
  LARGE_INTEGER lastWarnTime = {0};
  LARGE_INTEGER systemTime = {0};
  BOOLEAN warned = FALSE;
  BOOLEAN resultLockAcquired = FALSE;
  ULONG curTryLockTimes = 0;
  for (;;) {
    KeEnterCriticalRegion();
    if (Writable) {
      resultLockAcquired = ExAcquireResourceExclusiveLite(Resource, FALSE);
    } else {
      resultLockAcquired = ExAcquireResourceSharedLite(Resource, FALSE);
    }
    ++curTryLockTimes;
    if (resultLockAcquired) {
      break;
    }
    if ((curTryLockTimes > Max_Try_lock_times)) {
      KeLeaveCriticalRegion();
      break;
    }
    KeLeaveCriticalRegion();
    KeQuerySystemTime(&systemTime);
    if (lastWarnTime.QuadPart == 0) {
      lastWarnTime = systemTime;
    } else if ((systemTime.QuadPart - lastWarnTime.QuadPart) / 10 >=
               DOKAN_RESOURCE_LOCK_WARNING_MSEC) {
      DokanLockWarn(Resource, DebugInfo, Logger, Site, ObjectName,
                    ObjectPointer);
      warned = TRUE;
      lastWarnTime = systemTime;
    }
    KeDelayExecutionThread(KernelMode, TRUE, &wait);
  }
  if (resultLockAcquired) {
    if (ExIsResourceAcquiredExclusiveLite(Resource)) {
      if (DebugInfo->ExclusiveLockCount == 0) {
        DebugInfo->ExclusiveLockSite = Site;
        DebugInfo->ExclusiveOwnerThread = KeGetCurrentThread();
      }
      // Note that we may need this increment even for a non-writable request,
      // since any recursive acquire of an exclusive lock is exclusive.
      ++DebugInfo->ExclusiveLockCount;
    }
    if (warned) {
      DokanLockNotifyResolved(Resource, Logger);
    }
  }
  return resultLockAcquired;
}

Then I gave it a try. From the log, I found that releasing the vcb halfway eases the deadlock. But after a long test run, Dokan still deadlocked. The log shows it may still be stuck on vcb and fcb lock contention. I tested in an environment where 32 identical processes are started concurrently. The files that the processes need to load are also in the working directory on the memory file system, and the processes exit and start very frequently.
Indeed, that's why I believe we should look to have T1 holding
Unfortunately, I believe Or maybe as I suggested in #1119 (comment), find what is the type of T1 request and possibly reduce the locking on it. |
@lylin108 Any update on this? Do not hesitate to share if you gather new info.
Sorry for the late reply. I tried to track the IRP of "FsRtlGetFileSize" (this API function is where the resource-holding thread is blocked), but no call into the Dokan driver from the blocked thread can be found in the log. It looks like the thread is stuck without calling the driver.
Description
I used memfs.exe to create a disk and mounted it in a folder, then started multiple processor.exe processes in that folder. The DLLs that processor.exe depends on are on this disk.
The processes then work on parallel file-processing tasks. In addition, multiple processor.exe processes can each start another process at the same time, reading the same PE file.
In this case, processor.exe and memfs.exe block during the run, and even File Explorer and Task Manager get stuck. The problem occurs only occasionally.
Logs
The log was incomplete. I found that the blocked thread could not output the complete log.