During the past few days I have been tracking down an issue in the Ubuntu One client tests that, when run on Windows, would use up all the threads that the Python process could have. As you can imagine, finding out why there are deadlocks is quite hard, especially when I thought that the code was thread safe. Guess what? It wasn't.

The bug I had in the code was related to the way in which ReadDirectoryChangesW works. This function can be executed in two different ways:

Synchronous

ReadDirectoryChangesW can be executed in sync mode by NOT providing an OVERLAPPED structure to perform the IO operations, for example:

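import win32con
import win32file

def _watcherThread(self, dn, dh, changes):
        flags = win32con.FILE_NOTIFY_CHANGE_FILE_NAME
        while 1:
            print("waiting %s" % (dh,))
            # No OVERLAPPED argument: this call blocks until a change arrives.
            new_changes = win32file.ReadDirectoryChangesW(dh,
                                                          8192,
                                                          False, #sub-tree
                                                          flags)
            print("got %r" % (new_changes,))
            # Use a separate name so the caller's list is not shadowed.
            changes.extend(new_changes)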

The above example has the following two problems:

  • ReadDirectoryChangesW without an OVERLAPPED blocks indefinitely.
  • If another thread attempts to close the handle while ReadDirectoryChangesW is waiting on it, the CloseHandle() call blocks as well (this has nothing to do with the GIL, which is correctly managed).

I got bitten in the ass by the second item, which broke my tests in two different ways: it left threads blocked, and it left a handle in use, so the rest of the tests could not remove the tmp directories that the blocked threads were still holding on to.
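The way a blocked watcher thread survives a test's tearDown can be shown in miniature without touching Win32 at all. The sketch below is only an analogy: a threading.Event stands in for the blocking ReadDirectoryChangesW call, and the names blocking_watcher and release are made up for the illustration.

```python
import threading

def blocking_watcher(release, seen):
    # Stand-in for a synchronous ReadDirectoryChangesW call: it does not
    # return until something happens, and it has no timeout.
    release.wait()
    seen.append("woke up")

release = threading.Event()  # hypothetical stand-in for "a change arrived"
seen = []
worker = threading.Thread(target=blocking_watcher, args=(release, seen))
worker.daemon = True         # do not keep the interpreter alive if it hangs
worker.start()

# A tearDown that joins with a timeout does come back, but the thread is
# still parked inside the blocking call and still holds its resources.
worker.join(0.2)
still_blocked = worker.is_alive()

# Only the awaited event lets the thread finish.
release.set()
worker.join(2)
finished = not worker.is_alive()
```

In the real tests nothing plays the role of release.set(), so every test leaves one more blocked thread behind.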

Asynchronous

In order to use the async version of the function we just have to pass an OVERLAPPED structure; this way the IO operations will not block and we will also be able to close the handle from a different thread.

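import pywintypes
import win32con
import win32event
import win32file

def _watcherThreadOverlapped(self, dn, dh, changes):
        flags = win32con.FILE_NOTIFY_CHANGE_FILE_NAME
        buf = win32file.AllocateReadBuffer(8192)
        overlapped = pywintypes.OVERLAPPED()
        # Auto-reset event, initially unsignalled; set when the read completes.
        overlapped.hEvent = win32event.CreateEvent(None, 0, 0, None)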
        while 1:
            win32file.ReadDirectoryChangesW(dh,
                                            buf,
                                            False, #sub-tree
                                            flags,
                                            overlapped)
            # Wait for our event, or for 5 seconds.
            rc = win32event.WaitForSingleObject(overlapped.hEvent, 5000)
            if rc == win32event.WAIT_OBJECT_0:
                # got some data!  Must use GetOverlappedResult to find out
                # how much is valid!  0 generally means the handle has
                # been closed.  Blocking is OK here, as the event has
                # already been set.
                nbytes = win32file.GetOverlappedResult(dh, overlapped, True)
                if nbytes:
                    bits = win32file.FILE_NOTIFY_INFORMATION(buf, nbytes)
                    changes.extend(bits)
                else:
                    # This is "normal" exit - our 'tearDown' closes the
                    # handle.
                    # print "looks like dir handle was closed!"
                    return
            else:
                print("ERROR: Watcher thread timed-out!")
                return # kill the thread!
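The entries that end up in changes are (action, filename) tuples, where the action is one of the FILE_ACTION_* codes from the Windows headers. A small helper to make them readable could look like the sketch below; describe_changes and the sample data are hypothetical, only the numeric codes come from Windows.

```python
# Action codes reported by ReadDirectoryChangesW (values from winnt.h).
FILE_ACTIONS = {
    1: "added",
    2: "removed",
    3: "modified",
    4: "renamed (old name)",
    5: "renamed (new name)",
}

def describe_changes(changes):
    """Turn (action, filename) tuples into readable strings."""
    described = []
    for action, filename in changes:
        label = FILE_ACTIONS.get(action, "unknown action %d" % action)
        described.append("%s: %s" % (filename, label))
    return described

# Hypothetical sample in the shape FILE_NOTIFY_INFORMATION returns.
sample = [(1, "new.txt"), (4, "old.txt"), (5, "renamed.txt")]
described = describe_changes(sample)
```

A rename shows up as two consecutive entries, one with the old name and one with the new name.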

Using the ReadDirectoryChangesW function in this way solves the issues found in the sync version, and the only extra overhead is that you need to understand how to deal with Win32 event objects (CreateEvent, WaitForSingleObject, GetOverlappedResult), which is not that hard after you have worked with them for a little while.
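One detail the snippets above leave out is where the dh handle comes from. For the overlapped call to work, the directory has to be opened with FILE_FLAG_OVERLAPPED, and opening a directory at all requires FILE_FLAG_BACKUP_SEMANTICS. The following is a minimal sketch of how that could be done; directory_flags and open_directory are names I made up, and the share mode is an assumption rather than what the Ubuntu One code uses.

```python
# Win32 constants, written out so the flag logic can be read (and run)
# without pywin32; the same values ship with pywin32's constant modules
# (win32con, ntsecuritycon).
FILE_LIST_DIRECTORY = 0x0001
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
FILE_SHARE_DELETE = 0x00000004
OPEN_EXISTING = 3
FILE_FLAG_BACKUP_SEMANTICS = 0x02000000  # required to open a directory
FILE_FLAG_OVERLAPPED = 0x40000000        # required for async IO

def directory_flags(overlapped):
    """Flags for CreateFile when opening a directory to watch."""
    flags = FILE_FLAG_BACKUP_SEMANTICS
    if overlapped:
        flags |= FILE_FLAG_OVERLAPPED
    return flags

def open_directory(path, overlapped=True):
    """Open `path` for ReadDirectoryChangesW. Windows and pywin32 only."""
    import win32file  # imported lazily: only available on Windows
    return win32file.CreateFile(
        path,
        FILE_LIST_DIRECTORY,
        # Assumed share mode: let other code read, write and delete entries.
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        None,                        # default security attributes
        OPEN_EXISTING,
        directory_flags(overlapped),
        None)                        # no template file
```

The handle returned by open_directory(path) is what gets passed as dh to the watcher thread, and closing it from tearDown is what makes the watcher exit normally.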

I leave this here for people that might find the same issue and for me to remember how much my ass hurt.
