Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix problem with RemoveDirectory #129

Closed
wants to merge 1 commit into from

Conversation

sapek
Copy link

@sapek sapek commented Jul 31, 2019

Description

RemoveDirectory API exhibits a strange behavior when removing a
directory that is open. The directory is deleted and the API returns
success, however calling RemoveDirectory on the parent directory
immediately afterwards fails with The directory is not empty. error.
It appears that the directory deletion is completed asynchronously. This
behaviour is not documented AFAIK but is easy to reproduce:

  1. Create an empty directory tree c:\1\2

  2. Open c:\1\2 in File Explorer

  3. Run the following program:

             RemoveDirectory("c:\\1\\2")
             RemoveDirectory("c:\\1")
    

The second RemoveDirectory fails with The directory is not empty. even
though the directory c:\1\2 has been removed. Adding a short delay
between RemoveDirectory calls makes the second call succeed.
There is no principled way to determine right delay. Experimentally even 20 us
was sufficient on my machine so 100 us seemed like a conservative number
that still isn't too long.

Motivation and Context

This problem is most apparent in System.removeDirectoryRecursive function which traverses a directory tree deleting files and then empty directories. If a directory is open by some process on
the system (e.g. anti-virus software) the function fails with The directory is not empty. error while leaving an empty directory on the system. Various Haskell projects have been plagued by low-frequency occurrence of this problem, e.g.:

haskell/cabal#3123
haskell/cabal#3126
Euterpea/Euterpea2#18

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • I have updated the documentation accordingly.
  • I have not added a new Haskell dependency.
  • I have included a changelog entry.
  • I have not modified the version of the package in Win32.cabal.

RemoveDirectory API exhibits a strange behaviour when removing a
directory that is open. The directory is deleted and the API returns
success, however calling RemoveDirectory for the parent directory
immediately afterwards fails with "The directory is not empty." error.
It appears that the directory deletion is completed asynchronously. This
behaviour is not documented AFAIK but is easy to reproduce:

        1. Create an empty directory tree c:\1\2
        2. Open c:\1\2 in File Explorer
        3. Run the following program:

                RemoveDirectory("c:\\1\\2");
                RemoveDirectory("c:\\1");

The second RemoveDirectory fails with "The directory is not empty." even
though the directory c:\1\2 has been removed. Adding a short delay
between RemoveDirectory calls makes the second call succeed.
@Mistuke
Copy link
Contributor

Mistuke commented Aug 3, 2019

Hi, thanks for the pull request,

I don't however think this is the right place for this. I believe such a change should be in a higher level library such as directory like haskell/directory#96 rather than Win32 as Win32 is supposed to be low level platform bindings to the APIs.

Also failIfFalseWithRetry_ already does a threadDelay between retries with each retry taking increasingly longer. The current implementation here looks like it will make operations that you know will never succeed much slower, e.g. removeDirectory will now always take at least 100ms and when the path doesn't exist it will take up to a couple of seconds.

@sapek
Copy link
Author

sapek commented Aug 3, 2019

I first made the fix in System.Directory however I realized that Win32 might be a better place for a few reasons:

  • Even thought removeDirectoryRecursive is most common place where this problem manifests itself, any other code that deletes child directory followed by deleting parent directory will hit the issue.
  • removeDirectoryRecursive implementation is platform independent and the fix/workaround is Windows specific. The only reasonable place to fix this in System.Directory would be in their wrapper of Win32.RemoveDirectory.
  • Win32 already deals with delays for Windows API calls while System.Directory doesn't have any code like that.
  • The problem this fixes is not a higher level operation like retying when a file is locked. The problem is with Win32 API which perform an operation and returns success but the effect of the operation is not visible immediately after the API returns.

About delays:

  • The delay added here is 100 microseconds, no 100 milliseconds. And as I said, 100 us is conservative based on my experiments, so we could cut it in half.
  • I agree that adding an unconditional delay is incredibly ugly, however I think thare is no other solution for this Windows problem.
  • The delay in failIfFalseWithRetry_ doesn't help here because the API return success.

@Mistuke
Copy link
Contributor

Mistuke commented Aug 6, 2019

removeDirectoryRecursive implementation is platform independent and the fix/workaround is Windows specific. The only reasonable place to fix this in System.Directory would be in their wrapper of Win32.RemoveDirectory.

There is no such thing in directory. everything in directory abstracts over platform specific features. Which is why directory exists. removeDirectoryRecursive eventually calls removePathInternal which is where it calls Win32.RemoveDirectory and why it changes the namespace of paths on Windows.

Win32's entire purpose is to be a building block for other libraries and GHC. It is supposed to be a thin wrapper around the Windows APIs. I do not believe Win32 is the right place for this as it violates this expectation. For the very few exceptions we have made here we have named the APIs something else to avoid confusion.

Win32 already deals with delays for Windows API calls while System.Directory doesn't have any code like that.

As the issue I have pointed to above indicates the directory maintainer understands that such code belongs in directory as it deals with platform specific issues. And even there he would like it not to be the default mode of operation.

The problem this fixes is not a higher level operation like retying when a file is locked. The problem is with Win32 API which perform an operation and returns success but the effect of the operation is not visible immediately after the API returns.

This is the documented behavior of the API. MSDN states

The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed.

It is by design, as such Win32 should reflect the same behavior as MSDN mentions as this is how the API should work.

And lastly Win32 is often used in places to not introduce a dependency on GHC's IO manager. The implementation of threadDelay closely ties it to the I/O manager as thread delays are handled by the timer manager in the I/O manager. This makes RemoveDirectory not applicable in these scenarios.

For these reasons I won't be accepting these change in Win32. I believe the best place for this is in directory.

@sjakobi
Copy link
Member

sjakobi commented Nov 17, 2021

I believe the best place for this is in directory.

I've raised the issue in haskell/directory#129.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants