Canonical Voices

Posts tagged with 'windows'

mandel

Recently a very interesting bug was reported against Ubuntu One on Windows. Apparently we try to sync a number of system folders that are present on Windows 7 for backward compatibility.

The problem

The actual problem in the code is that we are using os.listdir. While os.listdir in Python does return system folders (at the end of the day, they are there), os.walk does not. For example, let's imagine that we have the following scenario:

Documents
    My Pictures (System folder)
    My Videos (System folder)
    Random dir
    Random Text.txt

If we run os.listdir we would have the following:

>>> import os
>>> os.listdir('Documents')
['My Pictures', 'My Videos', 'Random dir', 'Random Text.txt']

While if we use os.walk we have:

>>> import os
>>> path, dirs, files = next(os.walk('Documents'))
>>> print dirs
['Random dir']
>>> print files
['Random Text.txt']

The fix is very simple: filter the result of os.listdir using the following function:

import win32file
 
INVALID_FILE_ATTRIBUTES = -1
 
 
def is_system_path(path):
    """Return whether the path is a system path."""
    attrs = win32file.GetFileAttributesW(path)
    if attrs == INVALID_FILE_ATTRIBUTES:
        return False
    return win32file.FILE_ATTRIBUTE_SYSTEM & attrs ==\
        win32file.FILE_ATTRIBUTE_SYSTEM
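A quick way to sanity-check the filter without a Windows box is to stub the attribute lookup. In this sketch only the FILE_ATTRIBUTE_SYSTEM value (0x4, from the Windows SDK) is real; the fake attribute table stands in for win32file.GetFileAttributesW:

```python
import os

FILE_ATTRIBUTE_SYSTEM = 0x4  # value from the Windows SDK

# Stand-in for win32file.GetFileAttributesW so the sketch runs anywhere;
# on Windows you would call the real function instead.
FAKE_ATTRS = {
    'My Pictures': FILE_ATTRIBUTE_SYSTEM,
    'My Videos': FILE_ATTRIBUTE_SYSTEM,
}


def get_attributes(path):
    return FAKE_ATTRS.get(os.path.basename(path), 0)


def is_system_path(path):
    """Return whether the path carries the SYSTEM attribute."""
    return bool(get_attributes(path) & FILE_ATTRIBUTE_SYSTEM)


entries = ['My Pictures', 'My Videos', 'Random dir', 'Random Text.txt']
visible = [e for e in entries
           if not is_system_path(os.path.join('Documents', e))]
print(visible)  # → ['Random dir', 'Random Text.txt']
```

With the real win32file call in place of the stub, the same list comprehension gives an os.walk-like listing from os.listdir.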

File system events

An interesting question to ask after the above is: how does ReadDirectoryChangesW work with system directories? Well, thankfully it works correctly. What does that mean? It means the following:

  • Changes in system folders do not get notified.
  • A move from a watched directory to a system folder is not a MOVE_FROM, MOVE_TO pair but a FILE_DELETED.

The above means that if you have a system folder in a watched path you do not need to worry, since the events will work correctly, which is very, very good news.

mandel

Two really good developers, Alecu and Diego, have discovered a very interesting bug in the os.path.expanduser function in Python. If you have a user on your Windows machine with a name that uses Japanese characters like “??????” you will have the following in your system:

  • The Windows Shell will show the path correctly, that is: “C:\Users\??????”
  • cmd.exe will show: “C:\Users\??????”
  • All the env variables will be wrong, which means they will be similar to the info shown in cmd.exe

The above is clearly a problem, especially since the implementation of os.path.expanduser on Windows is:

def expanduser(path):
    """Expand ~ and ~user constructs.
 
    If user or $HOME is unknown, do nothing."""
    if path[:1] != '~':
        return path
    i, n = 1, len(path)
    while i < n and path[i] not in '/\\':
        i = i + 1
 
    if 'HOME' in os.environ:
        userhome = os.environ['HOME']
    elif 'USERPROFILE' in os.environ:
        userhome = os.environ['USERPROFILE']
    elif not 'HOMEPATH' in os.environ:
        return path
    else:
        try:
            drive = os.environ['HOMEDRIVE']
        except KeyError:
            drive = ''
        userhome = join(drive, os.environ['HOMEPATH'])
 
    if i != 1: #~user
        userhome = join(dirname(userhome), path[1:i])
 
    return userhome + path[i:]
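To make the dependence on environment variables explicit, here is a trimmed, pure-Python rendition of the lookup chain above, driven by a plain dict instead of os.environ (the function name and sample values are mine, not from the stdlib):

```python
import ntpath  # Windows path semantics, importable on any platform


def expanduser_from(env, path):
    """Mirror the lookup chain of ntpath.expanduser using a dict."""
    if not path.startswith('~'):
        return path
    i = 1
    while i < len(path) and path[i] not in '/\\':
        i += 1
    # same precedence as the stdlib code: HOME, then USERPROFILE,
    # then HOMEDRIVE + HOMEPATH
    if 'HOME' in env:
        userhome = env['HOME']
    elif 'USERPROFILE' in env:
        userhome = env['USERPROFILE']
    elif 'HOMEPATH' not in env:
        return path
    else:
        userhome = ntpath.join(env.get('HOMEDRIVE', ''), env['HOMEPATH'])
    if i != 1:  # ~user form
        userhome = ntpath.join(ntpath.dirname(userhome), path[1:i])
    return userhome + path[i:]


# whatever value the environment holds is returned verbatim, so a
# wrongly-encoded USERPROFILE propagates straight into the result
print(expanduser_from({'USERPROFILE': 'C:\\Users\\mandel'}, '~/Music'))
```

Since the function never consults the real account database, a broken environment variable is all it takes for the bug described above to surface.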

For the time being my proposed fix for Ubuntu One is to do the following:

import ctypes
from ctypes import windll, wintypes
 
class GUID(ctypes.Structure):
    _fields_ = [
         ('Data1', wintypes.DWORD),
         ('Data2', wintypes.WORD),
         ('Data3', wintypes.WORD),
         ('Data4', wintypes.BYTE * 8)
    ]
    def __init__(self, l, w1, w2, b1, b2, b3, b4, b5, b6, b7, b8):
        """Create a new GUID."""
        self.Data1 = l
        self.Data2 = w1
        self.Data3 = w2
        self.Data4[:] = (b1, b2, b3, b4, b5, b6, b7, b8)
 
    def __repr__(self):
        b1, b2, b3, b4, b5, b6, b7, b8 = self.Data4
        return 'GUID(%x-%x-%x-%x%x%x%x%x%x%x%x)' % (
                   self.Data1, self.Data2, self.Data3, b1, b2, b3, b4, b5, b6, b7, b8)
 
# constants to be used according to the version on shell32
CSIDL_PROFILE = 40
FOLDERID_Profile = GUID(0x5E6C858F, 0x0E22, 0x4760, 0x9A, 0xFE, 0xEA, 0x33, 0x17, 0xB6, 0x71, 0x73)
 
def expand_user():
    # get the function that we can find from Vista up, not the one in XP
    get_folder_path = getattr(windll.shell32, 'SHGetKnownFolderPath', None)
 
    if get_folder_path is not None:
        # ok, we can use the new function which is recommended by the msdn
        ptr = ctypes.c_wchar_p()
        get_folder_path(ctypes.byref(FOLDERID_Profile), 0, 0, ctypes.byref(ptr))
        return ptr.value
    else:
        # use the deprecated one found in XP and onwards for
        # compatibility reasons
        get_folder_path = getattr(windll.shell32, 'SHGetSpecialFolderPathW', None)
        buf = ctypes.create_unicode_buffer(300)
        get_folder_path(None, buf, CSIDL_PROFILE, False)
        return buf.value

The above code ensures that we only use SHGetSpecialFolderPathW when SHGetKnownFolderPath is not available on the system. The reasoning is that SHGetSpecialFolderPathW is deprecated and new applications are encouraged to use SHGetKnownFolderPath.

A much better solution is to patch ntpath.py so that it does something like what I propose for Ubuntu One. Does anyone know if this is fixed in Python 3? Shall I propose a fix?

PS: For reference, I got the GUID value from here.
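As a sanity check, the GUID layout can be exercised on any platform. This sketch uses fixed-width ctypes types (instead of wintypes, so it runs outside Windows) and formats the structure back into the canonical registry string; the helper name is mine:

```python
import ctypes


class GUID(ctypes.Structure):
    # fixed-width equivalents of DWORD/WORD/BYTE so this runs anywhere
    _fields_ = [('Data1', ctypes.c_uint32),
                ('Data2', ctypes.c_uint16),
                ('Data3', ctypes.c_uint16),
                ('Data4', ctypes.c_uint8 * 8)]


def guid_str(g):
    """Format a GUID in the canonical {XXXXXXXX-XXXX-...} form."""
    b = list(g.Data4)
    return '{%08X-%04X-%04X-%02X%02X-%s}' % (
        g.Data1, g.Data2, g.Data3, b[0], b[1],
        ''.join('%02X' % x for x in b[2:]))


profile = GUID(0x5E6C858F, 0x0E22, 0x4760,
               (ctypes.c_uint8 * 8)(0x9A, 0xFE, 0xEA, 0x33,
                                    0x17, 0xB6, 0x71, 0x73))
print(guid_str(profile))  # → {5E6C858F-0E22-4760-9AFE-EA3317B67173}
```

The printed string matches the documented FOLDERID_Profile value, which confirms the field order used in the ctypes structure.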

mandel

On Ubuntu One we use BitRock to create the packages for Windows. One of the new features I’m working on is to check if there are updates every so often so that the user gets notified. This code is not needed on Ubuntu because the Update Manager takes care of that, but when you work in an inferior OS…

Generate the auto-update.exe

In order to check for updates we use the auto-update.exe wizard generated by BitRock. Generating the wizard is very straightforward: first, as with most of the BitRock stuff, we write the XML that configures the generated .exe.

<autoUpdateProject>
    <fullName>Ubuntu One</fullName>
    <shortName>ubuntuone</shortName>
    <vendor>Canonical</vendor>
    <version>201</version>
    <singleInstanceCheck>1</singleInstanceCheck>
    <requireInstallationByRootUser>0</requireInstallationByRootUser>
    <requestedExecutionLevel>asInvoker</requestedExecutionLevel>
</autoUpdateProject>

There is just a single thing worth mentioning about the above XML: requireInstallationByRootUser is set to false because we use the generated .exe to check whether updates are present, and we do not want the user to have to be root for that; it does not make sense. Once you have the above or similar XML you can execute:

"{$bitrock_installation$}\autoupdate\bin\customize.exe" build ubuntuone_autoupdate.xml windows

Which generates the .exe (the output is in ~\Documents\AutoUpdate\output).

Putting it together with Twisted

The following code provides an example of a couple of functions that can be used by the application, first to check for an update, and then to perform the actual update.

import os
import sys
 
# Avoid pylint error on Linux
# pylint: disable=F0401
import win32api
# pylint: enable=F0401
 
from twisted.internet import defer
from twisted.internet.utils import getProcessValue
 
AUTOUPDATE_EXE_NAME = 'autoupdate-windows.exe'
 
def _get_update_path():
    """Return the path in which the autoupdate command is found."""
    if hasattr(sys, "frozen"):
        exec_path = os.path.abspath(sys.executable)
    else:
        exec_path = os.path.dirname(__file__)
    folder = os.path.dirname(exec_path)
    update_path = os.path.join(folder, AUTOUPDATE_EXE_NAME)
    if os.path.exists(update_path):
        return update_path
    return None
 
 
@defer.inlineCallbacks
def are_updates_present(logger):
    """Return if there are updates for Ubuntu One."""
    update_path = _get_update_path()
    logger.debug('Update path %s', update_path)
    if update_path is not None:
        # if there is an update present we will get 0, and another
        # number otherwise
        retcode = yield getProcessValue(update_path, args=('--mode',
            'unattended'), path=os.path.dirname(update_path))
        logger.debug('Return code %s', retcode)
        if retcode == 0:
            logger.debug('Returning True')
            defer.returnValue(True)
    logger.debug('Returning False')
    defer.returnValue(False)
 
 
def perform_update():
    """Spawn the autoupdate process and call the stop function."""
    update_path = _get_update_path()
    if update_path is not None:
        # let's call the updater with the commands that are required
        win32api.ShellExecute(None, 'runas',
            update_path,
            '--unattendedmodeui none', '', 0)

With the above you should be able to easily update the installation of your frozen python app on Windows when using BitRock.
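The same exit-code convention can be exercised without twisted. Here is a minimal blocking sketch using subprocess; the `--mode unattended` flag and the 0-means-update convention are as described above, while the function name is mine:

```python
import subprocess


def updates_present(update_exe):
    """Return True when the autoupdate wizard reports an available update.

    The wizard exits with 0 when an update is present and with another
    number otherwise, so we just inspect the return code.
    """
    try:
        retcode = subprocess.call([update_exe, '--mode', 'unattended'])
    except OSError:
        # the wizard is missing; treat that as "no update"
        return False
    return retcode == 0
```

A blocking call like this belongs in a thread; the twisted version above gets that for free via getProcessValue, which is why it is the one we actually use.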

mandel

Following my last post regarding how to list all installed applications using Python, here is the code that you will need to remove an installed MSI from a Windows machine using Python.

from ctypes import windll

ERROR_SUCCESS = 0


class MsiException(Exception):
    """Raised when there are issues with the msi actions."""
 
 
def uninstall_product(uid):
    """Will remove the old beta from the users machine."""
    # we use the msi lib to be able to uninstall apps
    property_name = u'LocalPackage'
    uninstall_path = get_property_for_product(uid, property_name)
    if uninstall_path is not None:
        # lets remove the package.
        command_line = u'REMOVE=ALL'
        result = windll.msi.MsiInstallProductW(uninstall_path, command_line)
        if result != ERROR_SUCCESS:
            raise MsiException('Could not remove product %s' % uid)

The missing functions can be found in the last post about the topic.

mandel

One of the features that I really like in Ubuntu One is the ability to have read-only shares, which allow me to share files with some of my friends without giving them the chance to change my files. In order to support that in a more explicit way on Windows we needed to be able to change the ACEs of a file's ACL to stop the user from changing the files. In reality there is no need to change the ACEs, since the server will ensure that the files are not changed, but as with Python, it is better to be explicit than implicit.

Our solution has the following details:

  • The file system is not using FAT.
  • We assume that the average user does not usually change the ACEs of a file.
  • If the user changes the ACEs he does not add any deny ACE.
  • We want to keep the already present ACEs.

The idea is very simple: we add an ACE for the path that removes the user's write rights, so that he cannot edit/rename/delete a file and can only list the directories. The full code is the following:

import os
import stat

from win32api import GetUserName
from win32security import (
    ACL_REVISION_DS, DACL_SECURITY_INFORMATION, GetFileSecurity,
    LookupAccountName, SetFileSecurity)
from ntsecuritycon import (
    CONTAINER_INHERIT_ACE, OBJECT_INHERIT_ACE, FILE_ALL_ACCESS,
    FILE_APPEND_DATA, FILE_GENERIC_WRITE, FILE_WRITE_DATA)

USER_SID = LookupAccountName("", GetUserName())[0]
 
def _add_deny_ace(path, rights):
    """Remove rights from a path for the given groups."""
    if not os.path.exists(path):
        raise WindowsError('Path %s could not be found.' % path)
 
    if rights is not None:
        security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
        dacl = security_descriptor.GetSecurityDescriptorDacl()
        # set the attributes of the group only if not null
        dacl.AddAccessDeniedAceEx(ACL_REVISION_DS,
                CONTAINER_INHERIT_ACE | OBJECT_INHERIT_ACE, rights,
                USER_SID)
        security_descriptor.SetSecurityDescriptorDacl(1, dacl, 0)
        SetFileSecurity(path, DACL_SECURITY_INFORMATION, security_descriptor)
 
 
def _remove_deny_ace(path):
    """Remove the deny ace for the given groups."""
    if not os.path.exists(path):
        raise WindowsError('Path %s could not be found.' % path)
    security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
    dacl = security_descriptor.GetSecurityDescriptorDacl()
    # if we delete an ace in the acl the index is outdated and we have
    # to ensure that we do not screw it up. We keep the number of deleted
    # items to update accordingly the index.
    num_delete = 0
    for index in range(0, dacl.GetAceCount()):
        ace = dacl.GetAce(index - num_delete)
        # check if the ace is for the user and its type is 1, which means
        # it is a deny ace that we added; let's remove it
        if USER_SID == ace[2] and ace[0][0] == 1:
            dacl.DeleteAce(index - num_delete)
            num_delete += 1
    security_descriptor.SetSecurityDescriptorDacl(1, dacl, 0)
    SetFileSecurity(path, DACL_SECURITY_INFORMATION, security_descriptor)
 
 
def set_no_rights(path):
    """Set the rights for 'path' to be none.
 
    Set the groups to be empty which will remove all the rights of the file.
 
    """
    os.chmod(path, 0o000)
    rights = FILE_ALL_ACCESS
    _add_deny_ace(path, rights)
 
 
def set_file_readonly(path):
    """Change path permissions to readonly in a file."""
    # we use the win32 api because chmod just sets the readonly flag and
    # we want to have more control over the permissions
    rights = FILE_WRITE_DATA | FILE_APPEND_DATA | FILE_GENERIC_WRITE
    # the above equals more or less to 0444
    _add_deny_ace(path, rights)
 
 
def set_file_readwrite(path):
    """Change path permissions to readwrite in a file."""
    # the above equals more or less to 0774
    _remove_deny_ace(path)
    os.chmod(path, stat.S_IWRITE)
 
 
def set_dir_readonly(path):
    """Change path permissions to readonly in a dir."""
    rights = FILE_WRITE_DATA | FILE_APPEND_DATA
 
    # the above equals more or less to 0444
    _add_deny_ace(path, rights)
 
 
def set_dir_readwrite(path):
    """Change path permissions to readwrite in a dir.
 
    Helper that receives a windows path.
 
    """
    # the above equals more or less to 0774
    _remove_deny_ace(path)
    # remove the read only flag
    os.chmod(path, stat.S_IWRITE)

Adding the Deny ACE

The idea of the code is very simple: we add a Deny ACE to the path so that the user cannot write to it. The Deny ACE is different for a file and a directory, since we want the user to be able to list the contents of a directory.

def _add_deny_ace(path, rights):
    """Remove rights from a path for the given groups."""
    if not os.path.exists(path):
        raise WindowsError('Path %s could not be found.' % path)
 
    if rights is not None:
        security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
        dacl = security_descriptor.GetSecurityDescriptorDacl()
        # set the attributes of the group only if not null
        dacl.AddAccessDeniedAceEx(ACL_REVISION_DS,
                CONTAINER_INHERIT_ACE | OBJECT_INHERIT_ACE, rights,
                USER_SID)
        security_descriptor.SetSecurityDescriptorDacl(1, dacl, 0)
        SetFileSecurity(path, DACL_SECURITY_INFORMATION, security_descriptor)

Remove the Deny ACE

Very similar to the above but doing the opposite: let's remove the Deny ACEs present for the current user. Notice that we store how many ACEs we have removed; the reason is simple: once we delete an ACE the following indexes are no longer valid, so we calculate the correct one by knowing how many we have removed.

def _remove_deny_ace(path):
    """Remove the deny ace for the given groups."""
    if not os.path.exists(path):
        raise WindowsError('Path %s could not be found.' % path)
    security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
    dacl = security_descriptor.GetSecurityDescriptorDacl()
    # if we delete an ace in the acl the index is outdated and we have
    # to ensure that we do not screw it up. We keep the number of deleted
    # items to update accordingly the index.
    num_delete = 0
    for index in range(0, dacl.GetAceCount()):
        ace = dacl.GetAce(index - num_delete)
        # check if the ace is for the user and its type is 1, which means
        # it is a deny ace that we added; let's remove it
        if USER_SID == ace[2] and ace[0][0] == 1:
            dacl.DeleteAce(index - num_delete)
            num_delete += 1
    security_descriptor.SetSecurityDescriptorDacl(1, dacl, 0)
    SetFileSecurity(path, DACL_SECURITY_INFORMATION, security_descriptor)
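The index bookkeeping is the subtle part, and it can be tried out on a plain Python list with no win32 involved (function and variable names here are mine):

```python
def delete_matching(items, predicate):
    """Delete matching items in place, compensating indexes as we go.

    Mirrors _remove_deny_ace: the count is taken once up front, and every
    deletion shifts the remaining items down, so we subtract the number
    of items deleted so far from the loop index.
    """
    num_delete = 0
    for index in range(len(items)):
        item = items[index - num_delete]
        if predicate(item):
            del items[index - num_delete]
            num_delete += 1
    return items


aces = ['allow', 'deny', 'deny', 'allow', 'deny']
print(delete_matching(aces, lambda ace: ace == 'deny'))  # → ['allow', 'allow']
```

Without the `num_delete` compensation, the loop above would skip the second of two adjacent matching entries, which is exactly the bug the ACL code avoids.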

Implement access

Our access implementation takes the added Deny ACEs into account, ensuring that we do not look only at the flags.

def access(path):
    """Return if the path is at least readable."""
    # lets consider the access on an illegal path to be a special case
    # since that will only occur in the case where the user created the path
    # for a file to be readable it has to be readable either by the user or
    # by the everyone group
    # XXX: ENOPARSE ^ (nessita)
    if not os.path.exists(path):
        return False
    security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
    dacl = security_descriptor.GetSecurityDescriptorDacl()
    for index in range(0, dacl.GetAceCount()):
        # if a deny ACE for the current user covers read access, the
        # path is not readable
        ace = dacl.GetAce(index)
        if USER_SID == ace[2] and ace[0][0] == 1:
            # check which access is denied
            if ace[1] | FILE_GENERIC_READ == ace[1] or\
               ace[1] | FILE_ALL_ACCESS == ace[1]:
                return False
    return True
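The bitmask test reads backwards at first glance: `denied | wanted == denied` holds exactly when every bit of `wanted` is already set in the denied mask. A pure-Python illustration (the flag values are made up for the example, not the SDK ones):

```python
# made-up single-bit flags, for illustration only
READ_DATA = 0x1
WRITE_DATA = 0x2
APPEND_DATA = 0x4


def mask_denies(denied, wanted):
    # denied | wanted == denied  <=>  all bits of wanted are in denied
    return denied | wanted == denied


denied = READ_DATA | WRITE_DATA
print(mask_denies(denied, READ_DATA))    # → True
print(mask_denies(denied, APPEND_DATA))  # → False
```

An equivalent and perhaps more common spelling is `denied & wanted == wanted`; both say "the deny mask covers every right we asked about".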

Implement can_write

The following code is similar to access but checks if we have a readonly file.

def can_write(path):
    """Return if the path is writable."""
    # lets consider the access on an illegal path to be a special case
    # since that will only occur in the case where the user created the path
    # for a file to be readable it has to be readable either by the user or
    # by the everyone group
    # XXX: ENOPARSE ^ (nessita)
    if not os.path.exists(path):
        return False
    security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
    dacl = security_descriptor.GetSecurityDescriptorDacl()
    for index in range(0, dacl.GetAceCount()):
        # if a deny ACE for the current user covers write access, the
        # path is not writable
        ace = dacl.GetAce(index)
        if USER_SID == ace[2] and ace[0][0] == 1:
            if ace[1] | FILE_GENERIC_WRITE == ace[1] or\
               ace[1] | FILE_WRITE_DATA == ace[1] or\
               ace[1] | FILE_APPEND_DATA == ace[1] or\
               ace[1] | FILE_ALL_ACCESS == ace[1]:
                # check which access is denied
                return False
    return True

And that is about it, I hope it helps other projects :D

mandel

Last week was probably one of the best coding sprints I have had since I started working at Canonical, I’m serious! I had the luck to pair program with alecu on the FilesystemMonitor that we use in Ubuntu One on Windows. The implementation has improved so much that I wanted to blog about it and show it as an example of how to hook the ReadDirectoryChangesW call into twisted so that you can process the events using twisted, which is bloody cool.

We have reduced the implementation of the Watch and WatchManager to match our needs and trimmed the API we provide, since we do not use all of the API provided by pyinotify. The Watch implementation is as follows:

class Watch(object):
    """Implement the same functions as pyinotify.Watch."""
 
    def __init__(self, watch_descriptor, path, mask, auto_add, processor,
        buf_size=8192):
        super(Watch, self).__init__()
        self.log = logging.getLogger('ubuntuone.SyncDaemon.platform.windows.' +
            'filesystem_notifications.Watch')
        self.log.setLevel(TRACE)
        self._processor = processor
        self._buf_size = buf_size
        self._wait_stop = CreateEvent(None, 0, 0, None)
        self._overlapped = OVERLAPPED()
        self._overlapped.hEvent = CreateEvent(None, 0, 0, None)
        self._watching = False
        self._descriptor = watch_descriptor
        self._auto_add = auto_add
        self._ignore_paths = []
        self._cookie = None
        self._source_pathname = None
        self._process_thread = None
        # remember the subdirs we have so that when we have a delete we can
        # check if it was a remove
        self._subdirs = []
        # ensure that we work with an abspath and that we can deal with
        # long paths over 260 chars.
        if not path.endswith(os.path.sep):
            path += os.path.sep
        self._path = os.path.abspath(path)
        self._mask = mask
        # this deferred is fired when the watch has started monitoring
        # a directory from a thread
        self._watch_started_deferred = defer.Deferred()
 
    @is_valid_windows_path(path_indexes=[1])
    def _path_is_dir(self, path):
        """Check if the path is a dir and update the local subdir list."""
        self.log.debug('Testing if path %r is a dir', path)
        is_dir = False
        if os.path.exists(path):
            is_dir = os.path.isdir(path)
        else:
            self.log.debug('Path "%s" was deleted subdirs are %s.',
                path, self._subdirs)
            # we removed the path, we look in the internal list
            if path in self._subdirs:
                is_dir = True
                self._subdirs.remove(path)
        if is_dir:
            self.log.debug('Adding %s to subdirs %s', path, self._subdirs)
            self._subdirs.append(path)
        return is_dir
 
    def _process_events(self, events):
        """Process the events from the queue."""
        # do not do it if we stop watching and the events are empty
        if not self._watching:
            return
 
        # we transform the events to be the same as the one in pyinotify
        # and then use the proc_fun
        for action, file_name in events:
            if any([file_name.startswith(path)
                        for path in self._ignore_paths]):
                continue
            # map the windows events to the pyinotify ones, this is dirty
            # but makes the multiplatform better, linux was first :P
            syncdaemon_path = get_syncdaemon_valid_path(
                                        os.path.join(self._path, file_name))
            is_dir = self._path_is_dir(os.path.join(self._path, file_name))
            if is_dir:
                self._subdirs.append(file_name)
            mask = WINDOWS_ACTIONS[action]
            head, tail = os.path.split(file_name)
            if is_dir:
                mask |= IN_ISDIR
            event_raw_data = {
                'wd': self._descriptor,
                'dir': is_dir,
                'mask': mask,
                'name': tail,
                'path': '.'}
            # by the way in which the win api fires the events we know for
            # sure that no move events will be added in the wrong order, this
            # is kind of hacky, I don't like it too much
            if WINDOWS_ACTIONS[action] == IN_MOVED_FROM:
                self._cookie = str(uuid4())
                self._source_pathname = tail
                event_raw_data['cookie'] = self._cookie
            if WINDOWS_ACTIONS[action] == IN_MOVED_TO:
                event_raw_data['src_pathname'] = self._source_pathname
                event_raw_data['cookie'] = self._cookie
            event = Event(event_raw_data)
            # FIXME: event deduces the pathname wrong and we need to manually
            # set it
            event.pathname = syncdaemon_path
            # add the event only if we do not have an exclude filter or
            # the exclude filter returns False, that is, the event will not
            # be excluded
            self.log.debug('Event is %s.', event)
            self._processor(event)
 
    def _call_deferred(self, f, *args):
        """Execute the deferred call avoiding possible race conditions."""
        if not self._watch_started_deferred.called:
            f(*args)
 
    def _watch(self):
        """Watch a path that is a directory."""
        # we are going to be using ReadDirectoryChangesW, which requires
        # a directory handle and the mask to be used.
        handle = CreateFile(
            self._path,
            FILE_LIST_DIRECTORY,
            FILE_SHARE_READ | FILE_SHARE_WRITE,
            None,
            OPEN_EXISTING,
            FILE_FLAG_BACKUP_SEMANTICS | FILE_FLAG_OVERLAPPED,
            None)
        self.log.debug('Watching path %s.', self._path)
        while True:
            # important information to know about the parameters:
            # param 1: the handle to the dir
            # param 2: the size to be used in the kernel to store events
            # that might be lost while the call is being performed. This
            # is complicated to fine tune, since if you create lots of
            # watches you might use too much memory and make your OS BSOD
            buf = AllocateReadBuffer(self._buf_size)
            try:
                ReadDirectoryChangesW(
                    handle,
                    buf,
                    self._auto_add,
                    self._mask,
                    self._overlapped,
                )
                reactor.callFromThread(self._call_deferred,
                    self._watch_started_deferred.callback, True)
            except error:
                # the handle is invalid, this may occur if we decided to
                # stop watching before we go in the loop, lets get out of it
                reactor.callFromThread(self._call_deferred,
                    self._watch_started_deferred.errback, error)
                break
            # wait for an event and ensure that we either stop or read the
            # data
            rc = WaitForMultipleObjects((self._wait_stop,
                                         self._overlapped.hEvent),
                                         0, INFINITE)
            if rc == WAIT_OBJECT_0:
                # Stop event
                break
            # if we continue, it means that we got some data, lets read it
            data = GetOverlappedResult(handle, self._overlapped, True)
            # let's read the data and store it in the results
            events = FILE_NOTIFY_INFORMATION(buf, data)
            self.log.debug('Events from ReadDirectoryChangesW are %s', events)
            reactor.callFromThread(self._process_events, events)
 
        CloseHandle(handle)
 
    @is_valid_windows_path(path_indexes=[1])
    def ignore_path(self, path):
        """Add the path of the events to ignore."""
        if not path.endswith(os.path.sep):
            path += os.path.sep
        if path.startswith(self._path):
            path = path[len(self._path):]
            self._ignore_paths.append(path)
 
    @is_valid_windows_path(path_indexes=[1])
    def remove_ignored_path(self, path):
        """Reaccept path."""
        if not path.endswith(os.path.sep):
            path += os.path.sep
        if path.startswith(self._path):
            path = path[len(self._path):]
            if path in self._ignore_paths:
                self._ignore_paths.remove(path)
 
    def start_watching(self):
        """Tell the watch to start processing events."""
        for current_child in os.listdir(self._path):
            full_child_path = os.path.join(self._path, current_child)
            if os.path.isdir(full_child_path):
                self._subdirs.append(full_child_path)
        # start to diff threads, one to watch the path, the other to
        # process the events.
        self.log.debug('Start watching path.')
        self._watching = True
        reactor.callInThread(self._watch)
        return self._watch_started_deferred
 
    def stop_watching(self):
        """Tell the watch to stop processing events."""
        self.log.info('Stop watching %s', self._path)
        SetEvent(self._wait_stop)
        self._watching = False
        self._subdirs = []
 
    def update(self, mask, auto_add=False):
        """Update the info used by the watcher."""
        self.log.debug('update(%s, %s)', mask, auto_add)
        self._mask = mask
        self._auto_add = auto_add
 
    @property
    def path(self):
        """Return the path watched."""
        return self._path
 
    @property
    def auto_add(self):
        return self._auto_add

The important details of this implementation are the following:

Use a deferred to notify that the watch started.

During our tests we noticed that the start watch function was slow, which meant that between the point when we started watching the directory and the point when the thread actually started we would be losing events. The function now returns a deferred that is fired when ReadDirectoryChangesW has been called, which ensures that no events will be lost. The interesting parts are the following:

Define the deferred:

        # this deferred is fired when the watch has started monitoring
        # a directory from a thread
        self._watch_started_deferred = defer.Deferred()

Call the deferred when we have successfully started watching:

            buf = AllocateReadBuffer(self._buf_size)
            try:
                ReadDirectoryChangesW(
                    handle,
                    buf,
                    self._auto_add,
                    self._mask,
                    self._overlapped,
                )
                reactor.callFromThread(self._call_deferred,
                    self._watch_started_deferred.callback, True)

Call it when we do have an error:

            except error:
                # the handle is invalid, this may occur if we decided to
                # stop watching before we go in the loop, lets get out of it
                reactor.callFromThread(self._call_deferred,
                    self._watch_started_deferred.errback, error)
                break

Threading and firing the reactor.

There is an interesting detail to take care of in this code. We have to ensure that the deferred is not called more than once; to do that you have to callFromThread a function that fires the deferred only when it has not already been fired, like this:

    def _call_deferred(self, f, *args):
        """Execute the deferred call avoiding possible race conditions."""
        if not self._watch_started_deferred.called:
            f(*args)

If instead of the above you write the code below, you will have a race condition in which the deferred is called more than once.

            buf = AllocateReadBuffer(self._buf_size)
            try:
                ReadDirectoryChangesW(
                    handle,
                    buf,
                    self._auto_add,
                    self._mask,
                    self._overlapped,
                )
                if not self._watch_started_deferred.called:
                    reactor.callFromThread(self._watch_started_deferred.callback, True)
            except error:
                # the handle is invalid, this may occur if we decided to
                # stop watching before we go in the loop, lets get out of it
                if not self._watch_started_deferred.called:
                    reactor.callFromThread(self._watch_started_deferred.errback, error)
                break
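
The race is easier to see outside twisted. Here is a minimal stand-in (a hypothetical FireOnce class, with a plain lock replacing the serialization that reactor.callFromThread provides in the real code) showing that whichever of callback/errback arrives first wins and the second is dropped:

```python
import threading

class FireOnce(object):
    """Minimal stand-in for the deferred guard: whichever of
    callback/errback arrives first wins, the other is dropped."""

    def __init__(self):
        self._lock = threading.Lock()
        self.called = False
        self.result = None

    def _fire(self, value):
        # twisted serializes this via reactor.callFromThread; a plain
        # lock gives the same "check then fire" atomicity here
        with self._lock:
            if self.called:
                return
            self.called = True
            self.result = value

    def callback(self, value):
        self._fire(('ok', value))

    def errback(self, error):
        self._fire(('error', error))

d = FireOnce()
d.callback(True)
d.errback(RuntimeError('too late'))  # ignored, already fired
print(d.result)  # ('ok', True)
```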

Execute the processing of events in the reactor main thread.

Alecu has bloody great ideas way too often, and this is one of his. The processing of the events is queued to be executed in the twisted reactor main thread, which reduces the number of threads we use and ensures that the events are processed in the correct order.

            # if we continue, it means that we got some data, lets read it
            data = GetOverlappedResult(handle, self._overlapped, True)
            # let's read the data and store it in the results
            events = FILE_NOTIFY_INFORMATION(buf, data)
            self.log.debug('Events from ReadDirectoryChangesW are %s', events)
            reactor.callFromThread(self._process_events, events)
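
To tie it together, the start-up handshake itself can be sketched with nothing but the stdlib. This is a toy model, not the Ubuntu One code: threading.Event stands in for the deferred, and all the names (Watcher, start_watching) are made up:

```python
import threading

class Watcher(object):
    """Toy model: the caller blocks until the watcher thread is
    really running before any events can be produced."""

    def __init__(self):
        # stands in for the deferred fired from the watching thread
        self._started = threading.Event()

    def _watch(self):
        # in the real code ReadDirectoryChangesW is issued here, and
        # only then is the deferred's callback scheduled on the reactor
        self._started.set()
        # ... loop reading and queueing changes ...

    def start_watching(self):
        threading.Thread(target=self._watch).start()
        # wait for the thread to actually be watching, so no event
        # produced after this call returns can be lost
        self._started.wait(timeout=5)
        return self._started.is_set()

w = Watcher()
assert w.start_watching()  # True once the thread is up and watching
```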

Just for this, the flight to Buenos Aires was well worth it!!! Anyone who wants to see the full code, feel free to look at ubuntuone.platform.windows in ubuntuone.

Read more
mandel

Pywin32 is a very cool project that allows you to access the Win32 API without having to go through ctypes and deal with all the crazy parameters that COM is famous for. Unfortunately it sometimes has issues which you only face a few times in your life.

In this case I found a bug where GetFileSecurity does not use the GetFileSecurityW function but the W-less version. For those who don't have to deal with these terrible details, the W usually means that the function knows how to deal with wide (UTF-16) strings (backward compatibility can be a problem sometimes). I have reported the bug, but for those who are in a hurry here is the patch:

diff -r 7dce71d174a9 win32/src/win32security.i
--- a/win32/src/win32security.i	Sat Jun 18 10:16:06 2011 -0400
+++ b/win32/src/win32security.i	Mon Jun 20 14:15:27 2011 +0200
@@ -2108,7 +2108,7 @@
  if (!PyWinObject_AsTCHAR(obFname, &fname))
   goto done;
 
-	if (GetFileSecurity(fname, info, psd, dwSize, &dwSize)) {
+	if (GetFileSecurityW(fname, info, psd, dwSize, &dwSize)) {
   PyErr_SetString(PyExc_RuntimeError, "Can't query for SECURITY_DESCRIPTOR size info?");
   goto done;
  }
@@ -2117,7 +2117,7 @@
   PyErr_SetString(PyExc_MemoryError, "allocating SECURITY_DESCRIPTOR");
   goto done;
  }
- if (!GetFileSecurity(fname, info, psd, dwSize, &dwSize)) {
+ if (!GetFileSecurityW(fname, info, psd, dwSize, &dwSize)) {
   PyWin_SetAPIError("GetFileSecurity");
   goto done;
  }
@@ -2153,7 +2153,7 @@
  PSECURITY_DESCRIPTOR psd;
  if (!PyWinObject_AsSECURITY_DESCRIPTOR(obsd, &psd))
   goto done;
-	if (!SetFileSecurity(fname, info, psd)) {
+	if (!SetFileSecurityW(fname, info, psd)) {
   PyWin_SetAPIError("SetFileSecurity");
   goto done;
  }

Read more
mandel

At the moment some of the tests (and I cannot point out which ones) of ubuntuone-client fail when they are run on Windows. The reason for this is the way in which we get the notifications out of the file system and the way the tests are written. Before I blame the OS or the tests, let me explain a number of facts about the Windows filesystem and the possible ways to interact with it.

To be able to get file system changes from the OS the Win32 API provides the following:

SHChangeNotifyRegister

This function was broken until Vista, when it was fixed. Unfortunately AFAIK we also support Windows XP, which means that we cannot trust this function. On top of that, taking this path means that we can have a performance issue: because the function is built on top of Windows messages, if too many changes occur the sync daemon would start receiving roll-up messages that just state that something changed, and it would be up to the sync daemon to decide what really happened. Therefore we can all agree that this is a no-no, right?

FindFirstChangeNotification

This is a really easy function to use, based on ReadDirectoryChangesW (I think it is a simple wrapper around it), that lets you know that something changed but gives no information about what changed. Because it is based on ReadDirectoryChangesW it suffers from the same issues.

ReadDirectoryChangesW

This is by far the most common way to get change notifications from the system. Now, in theory there are two possible cases which can go wrong that would affect the events raised by this function:

  1. There are too many events, the buffer gets overloaded and we start losing events. A simple way to solve this issue is to process the events in a different thread ASAP so that we can keep reading the changes.
  2. We use the sync version of the function, which means that we could have the following issues:
    • Blue screen of death because we used too much memory from the kernel space.
    • We cannot close the handles used to watch the changes in the directories. This leaves the threads blocked.
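
The fix suggested in point 1 (drain the OS buffer in one thread, do the slow processing in another) can be sketched with stdlib pieces; the event source here is a plain list standing in for ReadDirectoryChangesW results, and every name is made up:

```python
import queue
import threading

def reader(source_events, q):
    # drain the OS buffer as fast as possible so it cannot overflow;
    # source_events stands in for ReadDirectoryChangesW results
    for event in source_events:
        q.put(event)
    q.put(None)  # sentinel: watching stopped

def processor(q, handled):
    # the slow work happens here, off the reading thread
    while True:
        event = q.get()
        if event is None:
            break
        handled.append(event)

q = queue.Queue()
handled = []
events = [('created', 'a.txt'), ('modified', 'a.txt')]
t1 = threading.Thread(target=reader, args=(events, q))
t2 = threading.Thread(target=processor, args=(q, handled))
t1.start(); t2.start(); t1.join(); t2.join()
print(handled)  # [('created', 'a.txt'), ('modified', 'a.txt')]
```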

As I mentioned, this is the theory, and therefore it makes perfect sense to choose this option as the way to get notified of changes until… you hit a great little feature of Windows called write-behind caching. The idea of write-behind caching is the following:

When you attempt to write a new file on your HD, Windows does not directly modify the HD. Instead it makes a note of the fact that your intention is to write on disk and saves your changes in memory. Isn't that smart?

Well, that lovely feature comes enabled by default AFAIK from XP onwards. Any smart person would wonder how that interacts with FindFirstChangeNotification/ReadDirectoryChangesW; well, after some work here is what I have managed to find out:

The IO Manager (internal to the kernel) is queueing up disk-write requests in an internal buffer, and the actual changes are not physically committed until some condition is met, which I believe is tied to the "write-behind caching" feature. The problem appears to be that the user-space callback via FileSystemWatcher/ReadDirectoryChanges does not occur when disk-write requests are inserted into the queue, but rather when they are leaving the queue and being physically committed to disk. From what I have been able to gather through observation, the lifetime of a queue is based on:

  1. Whether more writes are being inserted in the queue.
  2. Whether another app requests a read from an item in the queue.

This means that when using FileSystemWatcher/ReadDirectoryChanges the events are fired only when the changes are actually committed, and for a user-space program this follows a non-deterministic process (insert Spanish swearing here). A way to work around this issue is to use the FlushFileBuffers function on the volume, which does need admin rights, yeah!
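
FlushFileBuffers itself is Windows-only and works on a whole handle, but the per-file equivalent is available cross-platform as os.fsync (on Windows it ends up in _commit, which flushes the file's buffers). A tiny illustration of forcing a write out of the cache:

```python
import os
import tempfile

# write a file and force the OS to commit it to disk rather than
# leaving it sitting in the write-behind cache
fd, path = tempfile.mkstemp()
os.write(fd, b'data that must hit the disk')
os.fsync(fd)  # per-file flush; FlushFileBuffers on a volume is the big hammer
os.close(fd)
with open(path, 'rb') as f:
    data = f.read()
os.remove(path)
print(data)  # b'data that must hit the disk'
```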

Change Journal records

Well, this allows tracking the changes that have been committed on an NTFS system (which means we have no support for FAT). This technique keeps track of the changes using an update sequence number in an interesting manner. At first look, although parsing the data is hard, this solution seems to be very similar to the one used by pyinotify, and therefore someone will say: hey, let's just tell twisted to do a select on that file and read the changes. Well, no, it is not that easy; files do not provide the functionality used for select, just sockets (http://msdn.microsoft.com/en-us/library/aa363803%28VS.85%29.aspx) /me jumps of happiness

File system filter

Well, this is an easy one to summarize: you have to write a driver-like piece of code. That means C, COM and being able to crash the entire system with a nice blue screen (although I can change the color to aubergine before we crash).

Conclusion

At this point I hope I have convinced a few of you that ReadDirectoryChangesW is the best option to take, but you might be wondering why I mentioned the write-behind caching feature. Well, here comes my complaint towards the tests. We do use the real file system notifications for testing, and the trial test cases do have a timeout! Those two facts plus the lovely write-behind caching feature mean that the tests on Windows fail just because the bloody events are not raised until they leave the IO manager's queue.

Read more
mandel

During the past few days I have been trying to track down an issue in the Ubuntu One client tests when run on Windows that would use all the threads that the python process could have. As you can imagine, finding out why there are deadlocks is quite hard, especially when I thought that the code was thread safe. Guess what? It wasn't.

The bug I had in the code was related to the way in which ReadDirectoryChangesW works. This function has two different ways to be executed:

Synchronous

ReadDirectoryChangesW can be executed in sync mode by NOT providing an OVERLAPPED structure to perform the IO operations, for example:

def _watcherThread(self, dn, dh, changes):
        flags = win32con.FILE_NOTIFY_CHANGE_FILE_NAME
        while 1:
            try:
                print "waiting", dh
                results = win32file.ReadDirectoryChangesW(dh,
                                                          8192,
                                                          False,
                                                          flags)
                print "got", results
            except:
                raise
            changes.extend(results)

The above example has the following two problems:

  • ReadDirectoryChangesW without an OVERLAPPED blocks infinitely.
  • If another thread attempts to close the handle while ReadDirectoryChangesW is waiting on it, the CloseHandle() method blocks (which has nothing to do with the GIL – it is correctly managed)

I got bitten in the ass by the second item, which broke my tests in two different ways, since it left threads blocked and a handle in use, so that the rest of the tests could not remove the tmp directories that were being used by the blocked threads.

Asynchronous

In order to be able to use the async version of the function we just have to use an OVERLAPPED structure; this way the IO operations will not block and we will also be able to close the handle from a different thread.

def _watcherThreadOverlapped(self, dn, dh, changes):
        flags = win32con.FILE_NOTIFY_CHANGE_FILE_NAME
        buf = win32file.AllocateReadBuffer(8192)
        overlapped = pywintypes.OVERLAPPED()
        overlapped.hEvent = win32event.CreateEvent(None, 0, 0, None)
        while 1:
            win32file.ReadDirectoryChangesW(dh,
                                            buf,
                                            False, #sub-tree
                                            flags,
                                            overlapped)
            # Wait for our event, or for 5 seconds.
            rc = win32event.WaitForSingleObject(overlapped.hEvent, 5000)
            if rc == win32event.WAIT_OBJECT_0:
                # got some data!  Must use GetOverlappedResult to find out
                # how much is valid!  0 generally means the handle has
                # been closed.  Blocking is OK here, as the event has
                # already been set.
                nbytes = win32file.GetOverlappedResult(dh, overlapped, True)
                if nbytes:
                    bits = win32file.FILE_NOTIFY_INFORMATION(buf, nbytes)
                    changes.extend(bits)
                else:
                    # This is "normal" exit - our 'tearDown' closes the
                    # handle.
                    # print "looks like dir handle was closed!"
                    return
            else:
                print "ERROR: Watcher thread timed-out!"
                return # kill the thread!

Using ReadDirectoryChangesW in this way does solve all the issues that are found in the sync version, and the only extra overhead added is that you need to understand how to deal with overlapped IO and its events, which is not that hard after you have worked with it for a little while.

I leave this here for people that might find the same issue and for me to remember how much my ass hurt.

References

Read more
mandel

One of the things we wanted to achieve for the Windows port of Ubuntu One was to deploy .exe files to users' systems rather than requiring them to have python and all the different dependencies installed on their machines. There are different reasons why we wanted to do this, but this post is not about that. The goal of this post is to explain what to do when you are using py2exe and you depend on a package such as lazr.restfulclient.

Why lazr.restfulclient?

There are different reasons why I’m using lazr.restfulclient as an example:

  • It is a dependency we do have in Ubuntu One, and therefore I have already done the work with it.
  • It uses two features of setuptools that do not play well with py2exe:
    • It uses namespaced packages.
    • It uses pkg_resources to load resources used by the client.

Working around the use of namespaced packages

This is actually a fairly easy thing to solve, and it is well documented in the py2exe wiki; nevertheless I'd like to show it in this post so that the inclusion of lazr.restfulclient is complete.

The main issue with namespaced packages is that you have to tell the module finder from py2exe where to find those packages, which in our example are lazr.authentication, lazr.restfulclient and lazr.uri. A way to do that is the following:

import lazr
try:
    import py2exe.mf as modulefinder
except ImportError:
    import modulefinder
 
for p in lazr.__path__:
    modulefinder.AddPackagePath(__name__, p)

Adding the lazr resources

This is a more problematic issue to solve, since we have to work around a limitation in py2exe. lazr.restfulclient tries to load a resource from the py2exe library.zip, but as the zipfile is reserved for compiled files, the module fails. In py2exe there is no way to state that those resource files have to be copied into library.zip, which means that an error is raised at runtime when trying to use the lib, but not at build time.

The best way (if not the only one) to solve this is to extend the py2exe command to copy the resource files to the folders that are zipped before they are embedded; that way pkg_resources will be able to load the files with no problems.

import os
import glob
import lazr.restfulclient
from py2exe.build_exe import py2exe as build_exe
 
class LazrMediaCollector(build_exe):
    """Extension that copies lazr missing data."""
 
    def copy_extensions(self, extensions):
        """Copy the missing extensions."""
        build_exe.copy_extensions(self, extensions)
 
        # Create the media subdir where the
        # Python files are collected.
        media = os.path.join('lazr', 'restfulclient')
        full = os.path.join(self.collect_dir, media)
        if not os.path.exists(full):
            self.mkpath(full)
 
        # Copy the media files to the collection dir.
        # Also add the copied file to the list of compiled
        # files so it will be included in zipfile.
        for f in glob.glob(lazr.restfulclient.__path__[0] + '/*.txt'):
            name = os.path.basename(f)
            self.copy_file(f, os.path.join(full, name))
            self.compiled_files.append(os.path.join(media, name))

In order to use the above command class to perform the compilation you simply have to tell setuptools which command class to use:

cmdclass = {'py2exe' : LazrMediaCollector}

With the above done, you can use the usual 'python setup.py py2exe'. Now, the question for the Internet: can this be done with PyInstaller?

Read more
mandel

Before I introduce the code, let me say that this is not a 100% exact implementation of the interfaces that can be found in pyinotify, but an implementation of a subset that matches my needs. The main idea of this post is to give an example of the implementation of such a library for Windows, trying to reuse the code that can be found in pyinotify.

Once I have excused myself, let's get into the code. First of all, there are a number of classes from pyinotify that we can use in our code. That subset of classes is the code below, which I grabbed from pyinotify's git:

#!/usr/bin/env python
 
# pyinotify.py - python interface to inotify
# Copyright (c) 2010 Sebastien Martini <seb@dbzteam.org>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
"""Platform agnostic code grabed from pyinotify."""
import logging
import os
 
COMPATIBILITY_MODE = False
 
 
class RawOutputFormat:
    """
    Format string representations.
    """
    def __init__(self, format=None):
        self.format = format or {}
 
    def simple(self, s, attribute):
        if not isinstance(s, str):
            s = str(s)
        return (self.format.get(attribute, '') + s +
                self.format.get('normal', ''))
 
    def punctuation(self, s):
        """Punctuation color."""
        return self.simple(s, 'normal')
 
    def field_value(self, s):
        """Field value color."""
        return self.simple(s, 'purple')
 
    def field_name(self, s):
        """Field name color."""
        return self.simple(s, 'blue')
 
    def class_name(self, s):
        """Class name color."""
        return self.format.get('red', '') + self.simple(s, 'bold')
 
output_format = RawOutputFormat()
 
 
class EventsCodes:
    """
    Set of codes corresponding to each kind of events.
    Some of these flags are used to communicate with inotify, whereas
    the others are sent to userspace by inotify notifying some events.
 
    @cvar IN_ACCESS: File was accessed.
    @type IN_ACCESS: int
    @cvar IN_MODIFY: File was modified.
    @type IN_MODIFY: int
    @cvar IN_ATTRIB: Metadata changed.
    @type IN_ATTRIB: int
    @cvar IN_CLOSE_WRITE: Writtable file was closed.
    @type IN_CLOSE_WRITE: int
    @cvar IN_CLOSE_NOWRITE: Unwrittable file closed.
    @type IN_CLOSE_NOWRITE: int
    @cvar IN_OPEN: File was opened.
    @type IN_OPEN: int
    @cvar IN_MOVED_FROM: File was moved from X.
    @type IN_MOVED_FROM: int
    @cvar IN_MOVED_TO: File was moved to Y.
    @type IN_MOVED_TO: int
    @cvar IN_CREATE: Subfile was created.
    @type IN_CREATE: int
    @cvar IN_DELETE: Subfile was deleted.
    @type IN_DELETE: int
    @cvar IN_DELETE_SELF: Self (watched item itself) was deleted.
    @type IN_DELETE_SELF: int
    @cvar IN_MOVE_SELF: Self (watched item itself) was moved.
    @type IN_MOVE_SELF: int
    @cvar IN_UNMOUNT: Backing fs was unmounted.
    @type IN_UNMOUNT: int
    @cvar IN_Q_OVERFLOW: Event queued overflowed.
    @type IN_Q_OVERFLOW: int
    @cvar IN_IGNORED: File was ignored.
    @type IN_IGNORED: int
    @cvar IN_ONLYDIR: only watch the path if it is a directory (new
                      in kernel 2.6.15).
    @type IN_ONLYDIR: int
    @cvar IN_DONT_FOLLOW: don't follow a symlink (new in kernel 2.6.15).
                          IN_ONLYDIR we can make sure that we don't watch
                          the target of symlinks.
    @type IN_DONT_FOLLOW: int
    @cvar IN_MASK_ADD: add to the mask of an already existing watch (new
                       in kernel 2.6.14).
    @type IN_MASK_ADD: int
    @cvar IN_ISDIR: Event occurred against dir.
    @type IN_ISDIR: int
    @cvar IN_ONESHOT: Only send event once.
    @type IN_ONESHOT: int
    @cvar ALL_EVENTS: Alias for considering all of the events.
    @type ALL_EVENTS: int
    """
 
    # The idea here is 'configuration-as-code' - this way, we get
    # our nice class constants, but we also get nice human-friendly text
    # mappings to do lookups against as well, for free:
    FLAG_COLLECTIONS = {'OP_FLAGS': {
        'IN_ACCESS'        : 0x00000001,  # File was accessed
        'IN_MODIFY'        : 0x00000002,  # File was modified
        'IN_ATTRIB'        : 0x00000004,  # Metadata changed
        'IN_CLOSE_WRITE'   : 0x00000008,  # Writable file was closed
        'IN_CLOSE_NOWRITE' : 0x00000010,  # Unwritable file closed
        'IN_OPEN'          : 0x00000020,  # File was opened
        'IN_MOVED_FROM'    : 0x00000040,  # File was moved from X
        'IN_MOVED_TO'      : 0x00000080,  # File was moved to Y
        'IN_CREATE'        : 0x00000100,  # Subfile was created
        'IN_DELETE'        : 0x00000200,  # Subfile was deleted
        'IN_DELETE_SELF'   : 0x00000400,  # Self (watched item itself)
                                          # was deleted
        'IN_MOVE_SELF'     : 0x00000800,  # Self(watched item itself) was moved
        },
                        'EVENT_FLAGS': {
        'IN_UNMOUNT'       : 0x00002000,  # Backing fs was unmounted
        'IN_Q_OVERFLOW'    : 0x00004000,  # Event queued overflowed
        'IN_IGNORED'       : 0x00008000,  # File was ignored
        },
                        'SPECIAL_FLAGS': {
        'IN_ONLYDIR'       : 0x01000000,  # only watch the path if it is a
                                          # directory
        'IN_DONT_FOLLOW'   : 0x02000000,  # don't follow a symlink
        'IN_MASK_ADD'      : 0x20000000,  # add to the mask of an already
                                          # existing watch
        'IN_ISDIR'         : 0x40000000,  # event occurred against dir
        'IN_ONESHOT'       : 0x80000000,  # only send event once
        },
                        }
 
    def maskname(mask):
        """
        Returns the event name associated to mask. IN_ISDIR is appended to
        the result when appropriate. Note: only one event is returned, because
        only one event can be raised at a given time.
 
        @param mask: mask.
        @type mask: int
        @return: event name.
        @rtype: str
        """
        ms = mask
        name = '%s'
        if mask & IN_ISDIR:
            ms = mask - IN_ISDIR
            name = '%s|IN_ISDIR'
        return name % EventsCodes.ALL_VALUES[ms]
 
    maskname = staticmethod(maskname)
 
 
# So let's now turn the configuration into code
EventsCodes.ALL_FLAGS = {}
EventsCodes.ALL_VALUES = {}
for flagc, valc in EventsCodes.FLAG_COLLECTIONS.items():
    # Make the collections' members directly accessible through the
    # class dictionary
    setattr(EventsCodes, flagc, valc)
 
    # Collect all the flags under a common umbrella
    EventsCodes.ALL_FLAGS.update(valc)
 
    # Make the individual masks accessible as 'constants' at globals() scope
    # and masknames accessible by values.
    for name, val in valc.items():
        globals()[name] = val
        EventsCodes.ALL_VALUES[val] = name
 
 
# all 'normal' events
ALL_EVENTS = reduce(lambda x, y: x | y, EventsCodes.OP_FLAGS.values())
EventsCodes.ALL_FLAGS['ALL_EVENTS'] = ALL_EVENTS
EventsCodes.ALL_VALUES[ALL_EVENTS] = 'ALL_EVENTS'
 
 
class _Event:
    """
    Event structure, represent events raised by the system. This
    is the base class and should be subclassed.
 
    """
    def __init__(self, dict_):
        """
        Attach attributes (contained in dict_) to self.
 
        @param dict_: Set of attributes.
        @type dict_: dictionary
        """
        for tpl in dict_.items():
            setattr(self, *tpl)
 
    def __repr__(self):
        """
        @return: Generic event string representation.
        @rtype: str
        """
        s = ''
        for attr, value in sorted(self.__dict__.items(), key=lambda x: x[0]):
            if attr.startswith('_'):
                continue
            if attr == 'mask':
                value = hex(getattr(self, attr))
            elif isinstance(value, basestring) and not value:
                value = "''"
            s += ' %s%s%s' % (output_format.field_name(attr),
                              output_format.punctuation('='),
                              output_format.field_value(value))
 
        s = '%s%s%s %s' % (output_format.punctuation('<'),
                           output_format.class_name(self.__class__.__name__),
                           s,
                           output_format.punctuation('>'))
        return s
 
    def __str__(self):
        return repr(self)
 
 
class _RawEvent(_Event):
    """
    Raw event, it contains only the informations provided by the system.
    It doesn't infer anything.
    """
    def __init__(self, wd, mask, cookie, name):
        """
        @param wd: Watch Descriptor.
        @type wd: int
        @param mask: Bitmask of events.
        @type mask: int
        @param cookie: Cookie.
        @type cookie: int
        @param name: Basename of the file or directory against which the
                     event was raised in case where the watched directory
                     is the parent directory. None if the event was raised
                     on the watched item itself.
        @type name: string or None
        """
        # Use this variable to cache the result of str(self), this object
        # is immutable.
        self._str = None
        # name: remove trailing '\0'
        d = {'wd': wd,
             'mask': mask,
             'cookie': cookie,
             'name': name.rstrip('\0')}
        _Event.__init__(self, d)
        logging.debug(str(self))
 
    def __str__(self):
        if self._str is None:
            self._str = _Event.__str__(self)
        return self._str
 
 
class Event(_Event):
    """
    This class contains all the useful informations about the observed
    event. However, the presence of each field is not guaranteed and
    depends on the type of event. In effect, some fields are irrelevant
    for some kind of event (for example 'cookie' is meaningless for
    IN_CREATE whereas it is mandatory for IN_MOVE_TO).
 
    The possible fields are:
      - wd (int): Watch Descriptor.
      - mask (int): Mask.
      - maskname (str): Readable event name.
      - path (str): path of the file or directory being watched.
      - name (str): Basename of the file or directory against which the
              event was raised in case where the watched directory
              is the parent directory. None if the event was raised
              on the watched item itself. This field is always provided
              even if the string is ''.
      - pathname (str): Concatenation of 'path' and 'name'.
      - src_pathname (str): Only present for IN_MOVED_TO events and only in
              the case where IN_MOVED_FROM events are watched too. Holds the
              source pathname from where pathname was moved from.
      - cookie (int): Cookie.
      - dir (bool): True if the event was raised against a directory.
 
    """
    def __init__(self, raw):
        """
        Concretely, this is the raw event plus inferred infos.
        """
        _Event.__init__(self, raw)
        self.maskname = EventsCodes.maskname(self.mask)
        if COMPATIBILITY_MODE:
            self.event_name = self.maskname
        try:
            if self.name:
                self.pathname = os.path.abspath(os.path.join(self.path,
                                                             self.name))
            else:
                self.pathname = os.path.abspath(self.path)
        except AttributeError, err:
            # Usually it is not an error some events are perfectly valids
            # despite the lack of these attributes.
            logging.debug(err)
 
 
class _ProcessEvent:
    """
    Abstract processing event class.
    """
    def __call__(self, event):
        """
        To behave like a functor the object must be callable.
        This method is a dispatch method. Its lookup order is:
          1. process_MASKNAME method
          2. process_FAMILY_NAME method
          3. otherwise calls process_default
 
        @param event: Event to be processed.
        @type event: Event object
        @return: By convention when used from the ProcessEvent class:
                 - Returning False or None (default value) means keep on
                 executing next chained functors (see chain.py example).
                 - Returning True instead means do not execute next
                   processing functions.
        @rtype: bool
        @raise ProcessEventError: Event object undispatchable,
                                  unknown event.
        """
        stripped_mask = event.mask - (event.mask & IN_ISDIR)
        maskname = EventsCodes.ALL_VALUES.get(stripped_mask)
        if maskname is None:
            raise ProcessEventError("Unknown mask 0x%08x" % stripped_mask)
 
        # 1- look for process_MASKNAME
        meth = getattr(self, 'process_' + maskname, None)
        if meth is not None:
            return meth(event)
        # 2- look for process_FAMILY_NAME
        meth = getattr(self, 'process_IN_' + maskname.split('_')[1], None)
        if meth is not None:
            return meth(event)
        # 3- default call method process_default
        return self.process_default(event)
 
    def __repr__(self):
        return '<%s>' % self.__class__.__name__
 
 
class ProcessEvent(_ProcessEvent):
    """
    Process events objects, can be specialized via subclassing, thus its
    behavior can be overriden:
 
    Note: you should not override __init__ in your subclass instead define
    a my_init() method, this method will be called automatically from the
    constructor of this class with its optionals parameters.
 
      1. Provide specialized individual methods, e.g. process_IN_DELETE for
         processing a precise type of event (e.g. IN_DELETE in this case).
      2. Or/and provide methods for processing events by 'family', e.g.
         process_IN_CLOSE method will process both IN_CLOSE_WRITE and
         IN_CLOSE_NOWRITE events (if process_IN_CLOSE_WRITE and
         process_IN_CLOSE_NOWRITE aren't defined though).
      3. Or/and override process_default for catching and processing all
         the remaining types of events.
    """
    pevent = None
 
    def __init__(self, pevent=None, **kargs):
        """
        Enable chaining of ProcessEvent instances.
 
        @param pevent: Optional callable object, will be called on event
                       processing (before self).
        @type pevent: callable
        @param kargs: This constructor is implemented as a template method
                      delegating its optionals keyworded arguments to the
                      method my_init().
        @type kargs: dict
        """
        self.pevent = pevent
        self.my_init(**kargs)
 
    def my_init(self, **kargs):
        """
        This method is called from ProcessEvent.__init__(). This method is
        empty here and must be redefined to be useful. In effect, if you
        need to specifically initialize your subclass' instance then you
        just have to override this method in your subclass. Then all the
        keyworded arguments passed to ProcessEvent.__init__() will be
        transmitted as parameters to this method. Beware you MUST pass
        keyword arguments though.
 
        @param kargs: optional delegated arguments from __init__().
        @type kargs: dict
        """
        pass
 
    def __call__(self, event):
        stop_chaining = False
        if self.pevent is not None:
            # By default methods return None so we set as guideline
            # that methods asking for stop chaining must explicitly
            # return non None or non False values, otherwise the default
            # behavior will be to accept chain call to the corresponding
            # local method.
            stop_chaining = self.pevent(event)
        if not stop_chaining:
            return _ProcessEvent.__call__(self, event)
 
    def nested_pevent(self):
        return self.pevent
 
    def process_IN_Q_OVERFLOW(self, event):
        """
        By default this method only reports warning messages; you can
        override it by subclassing ProcessEvent and implementing your own
        process_IN_Q_OVERFLOW method. The actions you can take on receiving
        this event is either to update the variable max_queued_events in order
        to handle more simultaneous events or to modify your code in order to
        accomplish a better filtering diminishing the number of raised events.
        Because this method is defined, IN_Q_OVERFLOW will never get
        transmitted as arguments to process_default calls.
 
        @param event: IN_Q_OVERFLOW event.
        @type event: dict
        """
        log.warning('Event queue overflowed.')
 
    def process_default(self, event):
        """
        Default processing event method. By default does nothing. Subclass
        ProcessEvent and redefine this method in order to modify its behavior.
 
        @param event: Event to be processed. Can be of any type of events but
                      IN_Q_OVERFLOW events (see method process_IN_Q_OVERFLOW).
        @type event: Event instance
        """
        pass
 
 
class PrintAllEvents(ProcessEvent):
    """
    Dummy class used to print events strings representations. For instance this
    class is used from command line to print all received events to stdout.
    """
    def my_init(self, out=None):
        """
        @param out: Where events will be written.
        @type out: Object providing a valid file object interface.
        """
        if out is None:
            out = sys.stdout
        self._out = out
 
    def process_default(self, event):
        """
        Writes event string representation to file object provided to
        my_init().
 
        @param event: Event to be processed. Can be of any type of events but
                      IN_Q_OVERFLOW events (see method process_IN_Q_OVERFLOW).
        @type event: Event instance
        """
        self._out.write(str(event))
        self._out.write('\n')
        self._out.flush()
 
 
class WatchManagerError(Exception):
    """
    WatchManager Exception. Raised on error encountered on watches
    operations.
 
    """
    def __init__(self, msg, wmd):
        """
        @param msg: Exception string's description.
        @type msg: string
        @param wmd: This dictionary contains the wd assigned to paths of the
                    same call for which watches were successfully added.
        @type wmd: dict
        """
        self.wmd = wmd
        Exception.__init__(self, msg)

Unfortunately we need to implement the code that talks to the Win32 API in order to retrieve the file system events. In my design this is done by the Watch class, which looks like this:

# Author: Manuel de la Pena <manuel@canonical.com>
#
# Copyright 2011 Canonical Ltd.
#
# This program is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License version 3, as published
# by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranties of
# MERCHANTABILITY, SATISFACTORY QUALITY, or FITNESS FOR A PARTICULAR
# PURPOSE.  See the GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program.  If not, see <http://www.gnu.org/licenses/>.
"""File notifications on windows."""
 
import logging
import os
import re
 
import winerror
 
from Queue import Queue, Empty
from threading import Thread
from uuid import uuid4
from twisted.internet import task, reactor
from win32con import (
    FILE_SHARE_READ,
    FILE_SHARE_WRITE,
    FILE_FLAG_BACKUP_SEMANTICS,
    FILE_NOTIFY_CHANGE_FILE_NAME,
    FILE_NOTIFY_CHANGE_DIR_NAME,
    FILE_NOTIFY_CHANGE_ATTRIBUTES,
    FILE_NOTIFY_CHANGE_SIZE,
    FILE_NOTIFY_CHANGE_LAST_WRITE,
    FILE_NOTIFY_CHANGE_SECURITY,
    OPEN_EXISTING
)
from win32file import CreateFile, ReadDirectoryChangesW
from ubuntuone.platform.windows.pyinotify import (
    Event,
    WatchManagerError,
    ProcessEvent,
    PrintAllEvents,
    IN_OPEN,
    IN_CLOSE_NOWRITE,
    IN_CLOSE_WRITE,
    IN_CREATE,
    IN_ISDIR,
    IN_DELETE,
    IN_MOVED_FROM,
    IN_MOVED_TO,
    IN_MODIFY,
    IN_IGNORED
)
from ubuntuone.syncdaemon.filesystem_notifications import (
    GeneralINotifyProcessor
)
from ubuntuone.platform.windows.os_helper import (
    LONG_PATH_PREFIX,
    abspath,
    listdir
)
 
# constants found in the msdn documentation:
# http://msdn.microsoft.com/en-us/library/ff538834(v=vs.85).aspx
FILE_LIST_DIRECTORY = 0x0001
FILE_NOTIFY_CHANGE_LAST_ACCESS = 0x00000020
FILE_NOTIFY_CHANGE_CREATION = 0x00000040
 
# a map between the few events that we have on windows and those
# found in pyinotify
WINDOWS_ACTIONS = {
  1: IN_CREATE,
  2: IN_DELETE,
  3: IN_MODIFY,
  4: IN_MOVED_FROM,
  5: IN_MOVED_TO
}
 
# quickly translate the event and its is_dir state to our standard events
NAME_TRANSLATIONS = {
    IN_OPEN: 'FS_FILE_OPEN',
    IN_CLOSE_NOWRITE: 'FS_FILE_CLOSE_NOWRITE',
    IN_CLOSE_WRITE: 'FS_FILE_CLOSE_WRITE',
    IN_CREATE: 'FS_FILE_CREATE',
    IN_CREATE | IN_ISDIR: 'FS_DIR_CREATE',
    IN_DELETE: 'FS_FILE_DELETE',
    IN_DELETE | IN_ISDIR: 'FS_DIR_DELETE',
    IN_MOVED_FROM: 'FS_FILE_DELETE',
    IN_MOVED_FROM | IN_ISDIR: 'FS_DIR_DELETE',
    IN_MOVED_TO: 'FS_FILE_CREATE',
    IN_MOVED_TO | IN_ISDIR: 'FS_DIR_CREATE',
}
 
# the default mask to be used in the watches added by the FilesystemMonitor
# class
FILESYSTEM_MONITOR_MASK = FILE_NOTIFY_CHANGE_FILE_NAME | \
    FILE_NOTIFY_CHANGE_DIR_NAME | \
    FILE_NOTIFY_CHANGE_ATTRIBUTES | \
    FILE_NOTIFY_CHANGE_SIZE | \
    FILE_NOTIFY_CHANGE_LAST_WRITE | \
    FILE_NOTIFY_CHANGE_SECURITY | \
    FILE_NOTIFY_CHANGE_LAST_ACCESS
 
 
# The implementation of the code that is provided as the pyinotify
# substitute
class Watch(object):
    """Implement the same functions as pyinotify.Watch."""
 
    def __init__(self, watch_descriptor, path, mask, auto_add,
        events_queue=None, exclude_filter=None, proc_fun=None):
        super(Watch, self).__init__()
        self.log = logging.getLogger('ubuntuone.platform.windows.' +
            'filesystem_notifications.Watch')
        self._watching = False
        self._descriptor = watch_descriptor
        self._auto_add = auto_add
        self.exclude_filter = None
        self._proc_fun = proc_fun
        self._cookie = None
        self._source_pathname = None
        # remember the subdirs we have so that when we have a delete we can
        # check if it was a remove
        self._subdirs = []
        # ensure that we work with an abspath and that we can deal with
        # long paths over 260 chars.
        self._path = os.path.abspath(path)
        if not self._path.startswith(LONG_PATH_PREFIX):
            self._path = LONG_PATH_PREFIX + self._path
        self._mask = mask
        # let's make the queue as big as possible
        self._raw_events_queue = Queue()
        if not events_queue:
            events_queue = Queue()
        self.events_queue = events_queue
 
    def _path_is_dir(self, path):
        """"Check if the path is a dir and update the local subdir list."""
        self.log.debug('Testing if path "%s" is a dir', path)
        is_dir = False
        if os.path.exists(path):
            is_dir = os.path.isdir(path)
        else:
            self.log.debug('Path "%s" was deleted; subdirs are %s.',
                path, self._subdirs)
            # we removed the path, we look in the internal list
            if path in self._subdirs:
                is_dir = True
                self._subdirs.remove(path)
        if is_dir:
            self.log.debug('Adding %s to subdirs %s', path, self._subdirs)
            self._subdirs.append(path)
        return is_dir
 
    def _process_events(self):
        """Process the events form the queue."""
        # we transform the events to be the same as the one in pyinotify
        # and then use the proc_fun
        while self._watching or not self._raw_events_queue.empty():
            file_name, action = self._raw_events_queue.get()
            # map the windows events to the pyinotify ones; this is dirty but
            # makes the multiplatform code better, linux was first :P
            is_dir = self._path_is_dir(file_name)
            mask = WINDOWS_ACTIONS[action]
            head, tail = os.path.split(file_name)
            if is_dir:
                mask |= IN_ISDIR
            event_raw_data = {
                'wd': self._descriptor,
                'dir': is_dir,
                'mask': mask,
                'name': tail,
                'path': head.replace(self.path, '.')
            }
            # by the way in which the win api fires the events we know for
            # sure that no move events will be added in the wrong order; this
            # is kind of hacky, I don't like it too much
            if WINDOWS_ACTIONS[action] == IN_MOVED_FROM:
                self._cookie = str(uuid4())
                self._source_pathname = tail
                event_raw_data['cookie'] = self._cookie
            if WINDOWS_ACTIONS[action] == IN_MOVED_TO:
                event_raw_data['src_pathname'] = self._source_pathname
                event_raw_data['cookie'] = self._cookie
            event = Event(event_raw_data)
            # FIXME: event deduces the pathname wrong and we need to set it
            # manually
            event.pathname = file_name
            # add the event only if we do not have an exclude filter or
            # the exclude filter returns False, that is, the event will not
            # be excluded
            if not self.exclude_filter or not self.exclude_filter(event):
                self.log.debug('Adding event %s to queue.', event)
                self.events_queue.put(event)
 
    def _watch(self):
        """Watch a path that is a directory."""
        # we are going to be using ReadDirectoryChangesW, which requires
        # a directory handle and the mask to be used.
        handle = CreateFile(
            self._path,
            FILE_LIST_DIRECTORY,
            FILE_SHARE_READ | FILE_SHARE_WRITE,
            None,
            OPEN_EXISTING,
            FILE_FLAG_BACKUP_SEMANTICS,
            None
        )
        self.log.debug('Watching path %s.', self._path)
        while self._watching:
            # important information to know about the parameters:
            # param 1: the handle to the dir
            # param 2: the size to be used in the kernel to store events
            # that might be lost while the call is being performed. This
            # is complicated to fine-tune, since if you create lots of
            # watchers you might use too much memory and make your OS BSOD
            results = ReadDirectoryChangesW(
                handle,
                1024,
                self._auto_add,
                self._mask,
                None,
                None
            )
            # add the events to the queue so that they can be processed no
            # matter the speed.
            for action, file_name in results:
                full_filename = os.path.join(self._path, file_name)
                self._raw_events_queue.put((full_filename, action))
                self.log.debug('Added %s to raw events queue.',
                    (full_filename, action))
 
    def start_watching(self):
        """Tell the watch to start processing events."""
        # remember the current child dirs in the path
        for current_child in listdir(self._path):
            full_child_path = os.path.join(self._path, current_child)
            if os.path.isdir(full_child_path):
                self._subdirs.append(full_child_path)
        # start two threads: one to watch the path, the other to
        # process the events.
        self.log.debug('Start watching path.')
        self._watching = True
        watch_thread = Thread(target=self._watch,
            name='Watch(%s)' % self._path)
        process_thread = Thread(target=self._process_events,
            name='Process(%s)' % self._path)
        process_thread.start()
        watch_thread.start()
 
    def stop_watching(self):
        """Tell the watch to stop processing events."""
        self._watching = False
        self._subdirs = []
 
    def update(self, mask, proc_fun=None, auto_add=False):
        """Update the info used by the watcher."""
        self.log.debug('update(%s, %s, %s)', mask, proc_fun, auto_add)
        self._mask = mask
        self._proc_fun = proc_fun
        self._auto_add = auto_add
 
    @property
    def path(self):
        """Return the patch watched."""
        return self._path
 
    @property
    def auto_add(self):
        return self._auto_add
 
    @property
    def proc_fun(self):
        return self._proc_fun
 
 
class WatchManager(object):
    """Implement the same functions as pyinotify.WatchManager."""
 
    def __init__(self, exclude_filter=lambda path: False):
        """Init the manager to keep trak of the different watches."""
        super(WatchManager, self).__init__()
        self.log = logging.getLogger('ubuntuone.platform.windows.'
            + 'filesystem_notifications.WatchManager')
        self._wdm = {}
        self._wd_count = 0
        self._exclude_filter = exclude_filter
        self._events_queue = Queue()
        self._ignored_paths = []
 
    def stop(self):
        """Close the manager and stop all watches."""
        self.log.debug('Stopping watches.')
        for current_wd in self._wdm:
            self._wdm[current_wd].stop_watching()
            self.log.debug('Watch for %s stopped.', self._wdm[current_wd].path)
 
    def get_watch(self, wd):
        """Return the watch with the given descriptor."""
        return self._wdm[wd]
 
    def del_watch(self, wd):
        """Delete the watch with the given descriptor."""
        try:
            watch = self._wdm[wd]
            watch.stop_watching()
            del self._wdm[wd]
            self.log.debug('Watch %s removed.', wd)
        except KeyError, e:
            logging.error(str(e))
 
    def _add_single_watch(self, path, mask, proc_fun=None, auto_add=False,
        quiet=True, exclude_filter=None):
        self.log.debug('add_single_watch(%s, %s, %s, %s, %s, %s)', path, mask,
            proc_fun, auto_add, quiet, exclude_filter)
        self._wdm[self._wd_count] = Watch(self._wd_count, path, mask,
            auto_add, events_queue=self._events_queue,
            exclude_filter=exclude_filter, proc_fun=proc_fun)
        self._wdm[self._wd_count].start_watching()
        self._wd_count += 1
        self.log.debug('Watch count increased to %s', self._wd_count)
 
    def add_watch(self, path, mask, proc_fun=None, auto_add=False,
        quiet=True, exclude_filter=None):
        if hasattr(path, '__iter__'):
            self.log.debug('Added collection of watches.')
            # we are dealing with a collection of paths
            for current_path in path:
                if not self.get_wd(current_path):
                    self._add_single_watch(current_path, mask, proc_fun,
                        auto_add, quiet, exclude_filter)
        elif not self.get_wd(path):
            self.log.debug('Adding single watch.')
            self._add_single_watch(path, mask, proc_fun, auto_add,
                quiet, exclude_filter)
 
    def update_watch(self, wd, mask=None, proc_fun=None, rec=False,
                     auto_add=False, quiet=True):
        try:
            watch = self._wdm[wd]
            watch.stop_watching()
            self.log.debug('Stopped watch on %s for update.', watch.path)
            # update the data and restart watching
            auto_add = auto_add or rec
            watch.update(mask, proc_fun=proc_fun, auto_add=auto_add)
            # only start the watcher again if the mask was given, otherwise
            # we are not watching and therefore do not care
            if mask:
                watch.start_watching()
        except KeyError, e:
            self.log.error(str(e))
            if not quiet:
                raise WatchManagerError('Watch %s was not found' % wd, {})
 
    def get_wd(self, path):
        """Return the watcher that is used to watch the given path."""
        for current_wd in self._wdm:
            if self._wdm[current_wd].path in path and \
                self._wdm[current_wd].auto_add:
                return current_wd
 
    def get_path(self, wd):
        """Return the path watched by the watch with the given wd."""
        watch = self._wdm.get(wd)
        if watch:
            return watch.path
 
    def rm_watch(self, wd, rec=False, quiet=True):
        """Remove the the watch with the given wd."""
        try:
            watch = self._wdm[wd]
            watch.stop_watching()
            del self._wdm[wd]
        except KeyError, err:
            self.log.error(str(err))
            if not quiet:
                raise WatchManagerError('Watch %s was not found' % wd, {})
 
    def rm_path(self, path):
        """Remove a watch to the given path."""
        # it would be very tricky to remove a subpath from a watcher that is
        # looking at changes in ther kids. To make it simpler and less error
        # prone (and even better performant since we use less threads) we will
        # add a filter to the events in the watcher so that the events from
        # that child are not received :)
        def ignore_path(event):
            """Ignore an event if it has a given path."""
            for ignored_path in self._ignored_paths:
                if ignored_path in event.pathname:
                    return True
            return False
 
        wd = self.get_wd(path)
        if wd:
            if self._wdm[wd].path == path:
                self.log.debug('Removing watch for path "%s"', path)
                self.rm_watch(wd)
            else:
                self.log.debug('Adding exclude filter for "%s"', path)
                # we have a watch that contains the path as a child path
                if path not in self._ignored_paths:
                    self._ignored_paths.append(path)
                # FIXME: this assumes that we do not have another filter
                # function, which in our use case is correct, but what if we
                # move this to other projects eventually? Maybe using the
                # manager exclude_filter is better
                if not self._wdm[wd].exclude_filter:
                    self._wdm[wd].exclude_filter = ignore_path
 
    @property
    def watches(self):
        """Return a reference to the dictionary that contains the watches."""
        return self._wdm
 
    @property
    def events_queue(self):
        """Return the queue with the events that the manager contains."""
        return self._events_queue
 
 
class Notifier(object):
    """
    Read notifications, process events. Inspired by the pyinotify.Notifier
    """
 
    def __init__(self, watch_manager, default_proc_fun=None, read_freq=0,
                 threshold=10, timeout=-1):
        """Init to process event according to the given timeout & threshold."""
        super(Notifier, self).__init__()
        self.log = logging.getLogger('ubuntuone.platform.windows.'
            + 'filesystem_notifications.Notifier')
        # Watch Manager instance
        self._watch_manager = watch_manager
        # Default processing method
        self._default_proc_fun = default_proc_fun
        if default_proc_fun is None:
            self._default_proc_fun = PrintAllEvents()
        # Loop parameters
        self._read_freq = read_freq
        self._threshold = threshold
        self._timeout = timeout
 
    def proc_fun(self):
        return self._default_proc_fun
 
    def process_events(self):
        """
        Process the event given the threshold and the timeout.
        """
        self.log.debug('Processing events with threshold: %s and timeout: %s',
            self._threshold, self._timeout)
        # we will process an amount of events equal to the threshold of
        # the notifier and will block for the amount given by the timeout
        processed_events = 0
        while processed_events < self._threshold:
            try:
                raw_event = None
                if not self._timeout or self._timeout < 0:
                    raw_event = self._watch_manager.events_queue.get(
                        block=False)
                else:
                    raw_event = self._watch_manager.events_queue.get(
                        timeout=self._timeout)
                watch = self._watch_manager.get_watch(raw_event.wd)
                if watch is None:
                    # Not really sure how we ended up here, nor how we should
                    # handle these types of events and if it is appropriate to
                    # completely skip them (like we are doing here).
                    self.log.warning('Unable to retrieve Watch object '
                        + 'associated to %s', raw_event)
                    processed_events += 1
                    continue
                if watch and watch.proc_fun:
                    self.log.debug('Executing proc_fun from watch.')
                    watch.proc_fun(raw_event)  # user processings
                else:
                    self.log.debug('Executing default_proc_fun')
                    self._default_proc_fun(raw_event)
                processed_events += 1
            except Empty:
                # increase the number of processed events, and continue
                processed_events += 1
                continue
 
    def stop(self):
        """Stop processing events and the watch manager."""
        self._watch_manager.stop()

While one of the threads retrieves the events from the file system, the second one processes them so that they are exposed as pyinotify events. I have done it this way because I did not want to deal with OVERLAPPED structures for async operations in Win32, and because I wanted to use pyinotify events so that anyone with pyinotify experience can look at the output and easily understand it. I really like this approach because it allowed me to reuse a fair amount of the logic that we had in the Ubuntu client, and to approach the port in a very TDD way, since the tests I've used are the same ones found on Ubuntu :)
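The two-queue, two-thread design above can be sketched without any Win32 calls. Everything in this snippet (the class name, the tiny action table, feeding raw events from a list instead of from ReadDirectoryChangesW) is hypothetical and only illustrates the producer/consumer split:

```python
import threading
from queue import Queue  # the 'Queue' module in the Python 2 code above

# hypothetical translation table, mirroring WINDOWS_ACTIONS above
ACTION_NAMES = {1: 'IN_CREATE', 2: 'IN_DELETE', 3: 'IN_MODIFY'}


class TwoThreadWatch(object):
    """Sketch of the Watch design: one thread produces raw (path, action)
    tuples, a second thread translates them into pyinotify-style events."""

    def __init__(self):
        self._raw_events_queue = Queue()  # filled by the 'watch' thread
        self.events_queue = Queue()       # read later by the Notifier

    def _watch(self, raw_events):
        # stand-in for the ReadDirectoryChangesW loop
        for raw in raw_events:
            self._raw_events_queue.put(raw)
        self._raw_events_queue.put(None)  # sentinel: stop processing

    def _process_events(self):
        # translate raw actions into event names, preserving order
        while True:
            item = self._raw_events_queue.get()
            if item is None:
                return
            path, action = item
            self.events_queue.put((path, ACTION_NAMES[action]))

    def run(self, raw_events):
        """Run both threads to completion and drain the events queue."""
        producer = threading.Thread(target=self._watch, args=(raw_events,))
        consumer = threading.Thread(target=self._process_events)
        consumer.start()
        producer.start()
        producer.join()
        consumer.join()
        events = []
        while not self.events_queue.empty():
            events.append(self.events_queue.get())
        return events
```

In the real Watch class the producer loop blocks inside ReadDirectoryChangesW instead of iterating a list, and shutdown is signalled by flipping self._watching rather than with a sentinel, but the flow of events between the two queues is the same.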

Read more
mandel

Yet again Windows has presented me with a challenge when trying to work with its file system, this time in the form of lock files. The Ubuntu One client on Linux uses pyinotify to listen to file system events; this, for example, allows the daemon to update your files when a new version has been created, without direct intervention from the user.

Although Windows does not have pyinotify (for obvious reasons), a developer who wants to perform such directory monitoring can rely on the ReadDirectoryChangesW function. This function provides similar behavior, but unfortunately the information it returns is limited compared with pyinotify. On one hand, there are fewer events you can listen for on Windows (IN_OPEN and IN_CLOSE, for example, are not present); on the other, it provides very little detail, giving back just 5 possible actions. That is, while on Windows you can filter on:

  • FILE_NOTIFY_CHANGE_FILE_NAME
  • FILE_NOTIFY_CHANGE_DIR_NAME
  • FILE_NOTIFY_CHANGE_ATTRIBUTES
  • FILE_NOTIFY_CHANGE_SIZE
  • FILE_NOTIFY_CHANGE_LAST_WRITE
  • FILE_NOTIFY_CHANGE_LAST_ACCESS
  • FILE_NOTIFY_CHANGE_CREATION
  • FILE_NOTIFY_CHANGE_SECURITY

You will only get back 5 values, which are integers representing the action that was performed. Yesterday I decided to see whether it was possible to query the Windows Object Manager for the file HANDLEs currently in use, which would return the open files. My idea was to write such a function and then poll (ouch!) to find out when a file was opened or closed. The result of such an attempt is the following:

import os
import struct
 
import winerror
import win32file
import win32con
 
from ctypes import *
from ctypes.wintypes import *
from Queue import Queue
from threading import Thread
from win32api import GetCurrentProcess, OpenProcess, DuplicateHandle
from win32api import error as ApiError
from win32con import (
    FILE_SHARE_READ,
    FILE_SHARE_WRITE,
    OPEN_EXISTING,
    FILE_FLAG_BACKUP_SEMANTICS,
    FILE_NOTIFY_CHANGE_FILE_NAME,
    FILE_NOTIFY_CHANGE_DIR_NAME,
    FILE_NOTIFY_CHANGE_ATTRIBUTES,
    FILE_NOTIFY_CHANGE_SIZE,
    FILE_NOTIFY_CHANGE_LAST_WRITE,
    FILE_NOTIFY_CHANGE_SECURITY,
    DUPLICATE_SAME_ACCESS
)
from win32event import WaitForSingleObject, WAIT_TIMEOUT, WAIT_ABANDONED
from win32event import error as EventError
from win32file import CreateFile, ReadDirectoryChangesW, CloseHandle
from win32file import error as FileError
 
# from ubuntuone.platform.windows.os_helper import LONG_PATH_PREFIX, abspath
 
LONG_PATH_PREFIX = '\\\\?\\'
# constants found in the msdn documentation:
# http://msdn.microsoft.com/en-us/library/ff538834(v=vs.85).aspx
FILE_LIST_DIRECTORY = 0x0001
FILE_NOTIFY_CHANGE_LAST_ACCESS = 0x00000020
FILE_NOTIFY_CHANGE_CREATION = 0x00000040
 
# XXX: the following code is some kind of hack that allows us to get the
# opened files in a system. The technique uses an undocumented API from
# Windows NT that is internal to MS and might change in the future,
# breaking our code :(
UCHAR = c_ubyte
PVOID = c_void_p
 
ntdll = windll.ntdll
 
SystemHandleInformation = 16
STATUS_INFO_LENGTH_MISMATCH = 0xC0000004
STATUS_BUFFER_OVERFLOW = 0x80000005L
STATUS_INVALID_HANDLE = 0xC0000008L
STATUS_BUFFER_TOO_SMALL = 0xC0000023L
STATUS_SUCCESS = 0
 
CURRENT_PROCESS = GetCurrentProcess ()
DEVICE_DRIVES = {}
for d in "abcdefghijklmnopqrstuvwxyz":
    try:
        DEVICE_DRIVES[win32file.QueryDosDevice (d + ":").strip ("\x00").lower ()] = d + ":"
    except FileError, (errno, errctx, errmsg):
        if errno == 2:
            pass
        else:
          raise
 
class x_file_handles(Exception):
    pass
 
def signed_to_unsigned(signed):
    unsigned, = struct.unpack ("L", struct.pack ("l", signed))
    return unsigned
 
class SYSTEM_HANDLE_TABLE_ENTRY_INFO(Structure):
    """Represent the SYSTEM_HANDLE_TABLE_ENTRY_INFO on ntdll."""
    _fields_ = [
        ("UniqueProcessId", USHORT),
        ("CreatorBackTraceIndex", USHORT),
        ("ObjectTypeIndex", UCHAR),
        ("HandleAttributes", UCHAR),
        ("HandleValue", USHORT),
        ("Object", PVOID),
        ("GrantedAccess", ULONG),
    ]
 
class SYSTEM_HANDLE_INFORMATION(Structure):
    """Represent the SYSTEM_HANDLE_INFORMATION on ntdll."""
    _fields_ = [
        ("NumberOfHandles", ULONG),
        ("Handles", SYSTEM_HANDLE_TABLE_ENTRY_INFO * 1),
    ]
 
class LSA_UNICODE_STRING(Structure):
    """Represent the LSA_UNICODE_STRING on ntdll."""
    _fields_ = [
        ("Length", USHORT),
        ("MaximumLength", USHORT),
        ("Buffer", LPWSTR),
    ]
 
class PUBLIC_OBJECT_TYPE_INFORMATION(Structure):
    """Represent the PUBLIC_OBJECT_TYPE_INFORMATION on ntdll."""
    _fields_ = [
        ("Name", LSA_UNICODE_STRING),
        ("Reserved", ULONG * 22),
    ]
 
class OBJECT_NAME_INFORMATION (Structure):
    """Represent the OBJECT_NAME_INFORMATION on ntdll."""
    _fields_ = [
        ("Name", LSA_UNICODE_STRING),
    ]
 
class IO_STATUS_BLOCK_UNION (Union):
    """Represent the IO_STATUS_BLOCK_UNION on ntdll."""
    _fields_ = [
        ("Status", LONG),
        ("Pointer", PVOID),
    ]
 
class IO_STATUS_BLOCK (Structure):
    """Represent the IO_STATUS_BLOCK on ntdll."""
    _anonymous_ = ("u",)
    _fields_ = [
        ("u", IO_STATUS_BLOCK_UNION),
        ("Information", POINTER (ULONG)),
    ]
 
class FILE_NAME_INFORMATION (Structure):
    """Represent the on FILE_NAME_INFORMATION ntdll."""
    filename_size = 4096
    _fields_ = [
        ("FilenameLength", ULONG),
        ("FileName", WCHAR * filename_size),
    ]
 
def get_handles():
    """Return all the processes handles in the system atm."""
    system_handle_information = SYSTEM_HANDLE_INFORMATION()
    size = DWORD (sizeof (system_handle_information))
    while True:
        result = ntdll.NtQuerySystemInformation(
            SystemHandleInformation,
            byref(system_handle_information),
            size,
            byref(size)
        )
        result = signed_to_unsigned(result)
        if result == STATUS_SUCCESS:
            break
        elif result == STATUS_INFO_LENGTH_MISMATCH:
            size = DWORD(size.value * 4)
            resize(system_handle_information, size.value)
        else:
            raise x_file_handles("NtQuerySystemInformation", hex(result))
 
    pHandles = cast(
        system_handle_information.Handles,
        POINTER(SYSTEM_HANDLE_TABLE_ENTRY_INFO * \
                system_handle_information.NumberOfHandles)
    )
    for handle in pHandles.contents:
        yield handle.UniqueProcessId, handle.HandleValue
 
def get_process_handle (pid, handle):
    """Get a handle for the process with the given pid."""
    try:
        hProcess = OpenProcess(win32con.PROCESS_DUP_HANDLE, 0, pid)
        return DuplicateHandle(hProcess, handle, CURRENT_PROCESS,
            0, 0, DUPLICATE_SAME_ACCESS)
    except ApiError,(errno, errctx, errmsg):
        if errno in (
              winerror.ERROR_ACCESS_DENIED,
              winerror.ERROR_INVALID_PARAMETER,
              winerror.ERROR_INVALID_HANDLE,
              winerror.ERROR_NOT_SUPPORTED
        ):
            return None
        else:
            raise
 
 
def get_type_info (handle):
    """Get the handle type information."""
    public_object_type_information = PUBLIC_OBJECT_TYPE_INFORMATION()
    size = DWORD(sizeof(public_object_type_information))
    while True:
        result = signed_to_unsigned(
            ntdll.NtQueryObject(
                handle, 2, byref(public_object_type_information), size, None))
        if result == STATUS_SUCCESS:
            return public_object_type_information.Name.Buffer
        elif result == STATUS_INFO_LENGTH_MISMATCH:
            size = DWORD(size.value * 4)
            resize(public_object_type_information, size.value)
        elif result == STATUS_INVALID_HANDLE:
            return None
        else:
            raise x_file_handles("NtQueryObject.2", hex (result))
 
 
def get_name_info (handle):
    """Get the handle name information."""
    object_name_information = OBJECT_NAME_INFORMATION()
    size = DWORD(sizeof(object_name_information))
    while True:
        result = signed_to_unsigned(
            ntdll.NtQueryObject(handle, 1, byref(object_name_information),
            size, None))
        if result == STATUS_SUCCESS:
            return object_name_information.Name.Buffer
        elif result in (STATUS_BUFFER_OVERFLOW, STATUS_BUFFER_TOO_SMALL,
                        STATUS_INFO_LENGTH_MISMATCH):
            size = DWORD(size.value * 4)
            resize(object_name_information, size.value)
        else:
            return None
 
 
def filepath_from_devicepath (devicepath):
    """Return a file path from a device path."""
    if devicepath is None:
        return None
    devicepath = devicepath.lower()
    for device, drive in DEVICE_DRIVES.items():
        if devicepath.startswith(device):
            return drive + devicepath[len(device):]
    return devicepath
 
def get_real_path(path):
    """Return the real path, avoiding issues with Libraries in Windows 7."""
    assert os.path.isdir(path)
    handle = CreateFile(
        path,
        FILE_LIST_DIRECTORY,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        None,
        OPEN_EXISTING,
        FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    name = get_name_info(int(handle))
    CloseHandle(handle)
    return filepath_from_devicepath(name)
 
def get_open_file_handles():
    """Return all the open file handles."""
    result = set()
    this_pid = os.getpid()
    for pid, handle in get_handles():
        if pid == this_pid:
            continue
        duplicate = get_process_handle(pid, handle)
        if duplicate is None:
            continue
        # get the type and name info of the duplicated handle, which,
        # unlike the handle value from the other process, is valid here
        handle_type = get_type_info(int(duplicate))
        name = get_name_info(int(duplicate))
        # add the handle to the result only if it is a file
        if handle_type == 'File' and name:
            # the name info is the device path to the object; we need
            # to convert it to a file path and then test that it exists
            file_path = filepath_from_devicepath(name)
            if os.path.exists(file_path):
                result.add(file_path)
    return result
 
def get_open_file_handles_under_directory(directory):
    """get the open files under a directory."""
    result = set()
    all_handles = get_open_file_handles()
    # to avoid issues with Libraries on Windows 7 and later, we will
    # have to get the real path
    directory = get_real_path(os.path.abspath(directory))
    if not directory.endswith(os.path.sep):
        directory += os.path.sep
    for file_path in all_handles:
        if file_path.startswith(directory):
            result.add(file_path)
    return result

The above code uses undocumented functions from ntdll which I suppose Microsoft does not want me to use. And while it works, the solution does not scale, since querying the Object Manager is very expensive and can rocket your CPU usage if performed repeatedly. Nevertheless, the code works correctly and could be used to write tools similar to those written by Sysinternals.
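The device-path translation step can be exercised on its own. The sketch below is mine, not part of the original code: the DEVICE_DRIVES mapping is a hypothetical stand-in (on a real system it would be built per drive letter, e.g. with win32file.QueryDosDevice):

```python
# Hypothetical mapping from NT device names to drive letters; on a real
# system this would be built by querying each drive letter's device name.
DEVICE_DRIVES = {r'\device\harddiskvolume1': 'c:'}


def filepath_from_devicepath(devicepath):
    """Translate an NT device path into a drive-letter file path."""
    if devicepath is None:
        return None
    devicepath = devicepath.lower()
    for device, drive in DEVICE_DRIVES.items():
        if devicepath.startswith(device):
            return drive + devicepath[len(device):]
    # unknown device: return the lowered device path unchanged
    return devicepath
```

With that mapping, a path such as `\Device\HarddiskVolume1\Users\a.txt` comes back as `c:\users\a.txt`.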

I hope someone will find a use for the code, in my case it is code that I’ll have to throw away :(

Read more
mandel

In the last post I explained how to set the security attributes of a file on Windows. What naturally follows such a post is explaining how to implement the os.access method so that it takes those settings into account, because the default Python implementation ignores them. Let's first define when a user has read access in our use case:

A user has read access if the user's SID has read access or the SID of the ‘Everyone’ group has read access.

The above also includes any combined configuration like rw or rx. In order to do this we have to understand how Windows NT sets the security of a file. On Windows NT the security of a file is set using a bitmask of type DWORD, which can be compared to a 32-bit unsigned long in ANSI C, and this is as far as the normal things go; let's continue with the bizarre Windows implementation. For some reason I cannot understand, rather than going with the more intuitive solution of using one bit per right, the Windows developers decided to use a combination of bits per right. For example, to set the read flag 5 bits have to be set, for the write flag 6 bits are used, and for execute 4 bits. To make matters worse, the bitmasks overlap; that is, if we remove the read flag we will also be removing a bit of the execute mask, and there is no documentation to be found about the different masks that are used…

Thankfully for us the cfengine project has already had to go through this process, and discovered by trial and error the exact bits that provide the read rights. That magic number is:

0xFFFFFFF6

Therefore we can easily AND this flag with an existing right to remove the read flag. The number also means that the only important bits are bits 0 and 3, which, when both set, mean that the read flag was granted. To make matters more complicated, the ‘Full Access’ right does not use those bits. In order to know if a user has Full Access we have to look at bit 28, which, if set, represents the ‘Full Access’ flag.
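As a quick sanity check of the arithmetic — this snippet is mine, interpreting "bits 0 and 3" as the least significant bits that 0xFFFFFFF6 clears, and assuming the usual winnt.h value of FILE_GENERIC_READ:

```python
FILE_GENERIC_READ = 0x120089   # assumed winnt.h value, for illustration only
REMOVE_READ_MASK = 0xFFFFFFF6  # the cfengine magic number

# ANDing an access mask with the magic number clears bits 0 and 3
# (mask 0b1001 == 0x9), which together make up the read flag, and
# leaves every other bit of the access mask alone
without_read = FILE_GENERIC_READ & REMOVE_READ_MASK
```

Here `without_read` ends up as 0x120080: the same rights minus the two read bits.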

So, to summarize: to know if a user has the read flag we first look at bit 28 to test for the ‘Full Access’ flag; if ‘Full Access’ was not granted we look at bits 0 and 3, and when both of them are set the user has the read flag. Easy, right? ;) Now to the practical example: the code below does exactly what I just explained using Python and the win32api and win32security modules.

from win32api import GetUserName

from win32security import (
    LookupAccountSid,
    GetFileSecurity,
    DACL_SECURITY_INFORMATION
)

EVERYONE_GROUP = 'Everyone'
 
def _int_to_bin(n):
    """Convert an int to a bin string of 32 bits."""
    return "".join([str((n >> y) & 1) for y in range(32-1, -1, -1)])
 
def _has_read_mask(number):
    """Return if the read flag is present."""
    # get the bin representation of the mask
    binary = _int_to_bin(number)
    # there is actually no documentation of this in MSDN, but if bit 28
    # is set the mask has full access; more info can be found here:
    # http://www.iu.hio.no/cfengine/docs/cfengine-NT/node47.html
    if binary[28] == '1':
        return True
    # there is no documentation in MSDN about this, but if bits 0 and 3 are
    # set we have the read flag; more info can be found here:
    # http://www.iu.hio.no/cfengine/docs/cfengine-NT/node47.html
    return binary[0] == '1' and binary[3] == '1'
 
def access(path):
    """Return if the path is at least readable."""
    # for a file to be readable it has to be readable either by the user or
    # by the everyone group
    security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
    dacl = security_descriptor.GetSecurityDescriptorDacl()
    sids = []
    for index in range(0, dacl.GetAceCount()):
        # add the sid of the ace if its access mask includes the read flag
        ace = dacl.GetAce(index)
        if _has_read_mask(ace[1]):
            sids.append(ace[2])
    accounts = [LookupAccountSid('',x)[0] for x in sids]
    return GetUserName() in accounts or EVERYONE_GROUP in accounts

When I wrote this my brain was in a WTF state, so I'm sure the horrible _int_to_bin function can be exchanged for the bin built-in function from Python. If you fancy doing it I would greatly appreciate it; I cannot take this any longer ;)

Read more
mandel

While working on making the Ubuntu One code more multiplatform I found myself having to write some code that would set the attributes of a file on Windows. Ideally os.chmod would do the trick, but of course this is Windows, and it is not fully supported. According to the Python documentation:

Note: Although Windows supports chmod(), you can only set the file’s read-only flag with it (via the stat.S_IWRITE and stat.S_IREAD constants or a corresponding integer value). All other bits are ignored.

Grrrreat… To solve this issue I have written a small function that allows setting the attributes of a file by using the win32api and win32security modules. This only partially solves the issue, since 0444 and the other Unix modes cannot be perfectly mapped to the Windows world. In my code I have made the assumption that using the groups ‘Everyone’, ‘Administrators’ and the user name would be close enough for our use cases.
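To see what the documented behaviour amounts to in practice, here is a minimal, self-contained sketch of my own using only the stat constants the documentation quote mentions; it runs on any platform and simply sets an owner-read-only mode:

```python
import os
import stat
import tempfile


def set_readonly_via_chmod(path):
    # on Windows this only toggles the read-only flag; S_IREAD and
    # S_IWRITE are the only bits os.chmod pays attention to there
    os.chmod(path, stat.S_IREAD)


# demonstrate on a throwaway temp file
fd, path = tempfile.mkstemp()
os.close(fd)
set_readonly_via_chmod(path)
mode = stat.S_IMODE(os.stat(path).st_mode)
# restore the write bit so the file can be removed, then clean up
os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
os.remove(path)
```

After the call, `mode` holds only the owner-read bit: anything richer (per-group ACLs) needs the win32 APIs below.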

Here is the code in case anyone has to go through this:

from win32api import GetUserName

from win32security import (
    LookupAccountName,
    GetFileSecurity,
    SetFileSecurity,
    ACL,
    DACL_SECURITY_INFORMATION,
    ACL_REVISION
)
from ntsecuritycon import FILE_GENERIC_READ
 
EVERYONE_GROUP = 'Everyone'
ADMINISTRATORS_GROUP = 'Administrators'
 
def _get_group_sid(group_name):
    """Return the SID for a group with the given name."""
    return LookupAccountName('', group_name)[0]
 
 
def _set_file_attributes(path, groups):
    """Set file attributes using the wind32api."""
    security_descriptor = GetFileSecurity(path, DACL_SECURITY_INFORMATION)
    dacl = ACL()
    for group_name in groups:
        # set the attributes of the group only if not null
        if groups[group_name]:
            group_sid = _get_group_sid(group_name)
            dacl.AddAccessAllowedAce(ACL_REVISION, groups[group_name],
                group_sid)
    # the dacl has all the info of the different groups passed in the parameters
    security_descriptor.SetSecurityDescriptorDacl(1, dacl, 0)
    SetFileSecurity(path, DACL_SECURITY_INFORMATION, security_descriptor)
 
def set_file_readonly(path):
    """Change path permissions to readonly in a file."""
    # we use the win32 api because chmod just sets the readonly flag and
    # we want to have more control over the permissions
    groups = {}
    groups[EVERYONE_GROUP] = FILE_GENERIC_READ
    groups[ADMINISTRATORS_GROUP] = FILE_GENERIC_READ
    groups[GetUserName()] = FILE_GENERIC_READ
    # the above equals more or less to 0444
    _set_file_attributes(path, groups)

For those who might want to remove read access from a group: just leave that group out of the groups parameter, which will remove it from the security descriptor.

Read more
mandel

At the moment I am sprinting in Argentina trying to make the Ubuntu One port to Windows better by adding support for the sync daemon used on Linux. While the rest of the guys are focused on accommodating the current code to my “multiplatform” requirements, I'm working on getting a number of missing parts to work on Windows. One of these parts is the lack of a network manager on Windows.

One of the things we need to know, to continuously sync your files on Windows, is when your network connection comes up or dies. As usual, this is far easier on Linux than on Windows. To get this event you have to implement the ISensNetwork COM interface, which will allow your object to register for network status changes. Due to the absolute lack of examples on the net (or how bad Google is getting ;) ) I've decided to share the code I managed to get working:

"""Implementation of ISensNetwork in Python."""
 
import logging
import logging.handlers
 
import pythoncom
 
from win32com.server.policy import DesignatedWrapPolicy
from win32com.client import Dispatch
 
# set the logging to store the data in the ubuntuone folder
handler = logging.handlers.RotatingFileHandler('network_manager.log', 
                    maxBytes=400, backupCount=5)
service_logger = logging.getLogger('NetworkManager')
service_logger.setLevel(logging.DEBUG)
service_logger.addHandler(handler)
 
## from EventSys.h
PROGID_EventSystem = "EventSystem.EventSystem"
PROGID_EventSubscription = "EventSystem.EventSubscription"
 
# SENS values for the events; these tuples contain the uuid of the
# event, the name of the event, as well as the name of the method
# in the ISensNetwork interface that will be executed for the event.
 
SUBSCRIPTION_NETALIVE = ('{cd1dcbd6-a14d-4823-a0d2-8473afde360f}',
                         'UbuntuOne Network Alive',
                         'ConnectionMade')
 
SUBSCRIPTION_NETALIVE_NOQOC = ('{a82f0e80-1305-400c-ba56-375ae04264a1}',
                               'UbuntuOne Net Alive No Info',
                               'ConnectionMadeNoQOCInfo')
 
SUBSCRIPTION_NETLOST = ('{45233130-b6c3-44fb-a6af-487c47cee611}',
                        'UbuntuOne Network Lost',
                        'ConnectionLost')
 
SUBSCRIPTION_REACH = ('{4c6b2afa-3235-4185-8558-57a7a922ac7b}',
                       'UbuntuOne Network Reach',
                       'ConnectionMade')
 
SUBSCRIPTION_REACH_NOQOC = ('{db62fa23-4c3e-47a3-aef2-b843016177cf}',
                            'UbuntuOne Network Reach No Info',
                            'ConnectionMadeNoQOCInfo')
 
SUBSCRIPTION_REACH_NOQOC2 = ('{d4d8097a-60c6-440d-a6da-918b619ae4b7}',
                             'UbuntuOne Network Reach No Info 2',
                             'ConnectionMadeNoQOCInfo')
 
SUBSCRIPTIONS = [SUBSCRIPTION_NETALIVE,
                 SUBSCRIPTION_NETALIVE_NOQOC,
                 SUBSCRIPTION_NETLOST,
                 SUBSCRIPTION_REACH,
                 SUBSCRIPTION_REACH_NOQOC,
                 SUBSCRIPTION_REACH_NOQOC2 ]
 
SENSGUID_EVENTCLASS_NETWORK = '{d5978620-5b9f-11d1-8dd2-00aa004abd5e}'
SENSGUID_PUBLISHER = "{5fee1bd6-5b9b-11d1-8dd2-00aa004abd5e}"
 
# uuid of the implemented com interface
IID_ISesNetwork = '{d597bab1-5b9f-11d1-8dd2-00aa004abd5e}'
 
class NetworkManager(DesignatedWrapPolicy):
    """Implement ISesNetwork to know about the network status."""
 
    _com_interfaces_ = [IID_ISesNetwork]
    _public_methods_ = ['ConnectionMade',
                        'ConnectionMadeNoQOCInfo', 
                        'ConnectionLost']
    _reg_clsid_ = '{41B032DA-86B5-4907-A7F7-958E59333010}' 
    _reg_progid_ = "UbuntuOne.NetworkManager"
 
    def __init__(self, connected_cb, disconnected_cb):
        self._wrap_(self)
        self.connected_cb = connected_cb 
        self.disconnected_cb = disconnected_cb
 
    def ConnectionMade(self, *args):
        """Tell that the connection is up again."""
        service_logger.info('Connection was made.')
        self.connected_cb()
 
    def ConnectionMadeNoQOCInfo(self, *args):
        """Tell that the connection is up again."""
        service_logger.info('Connection was made no info.')
        self.connected_cb()
 
    def ConnectionLost(self, *args):
        """Tell the connection was lost."""
        service_logger.info('Connection was lost.')
        self.disconnected_cb() 
 
    def register(self):
        """Register to listen to network events."""
        # call the CoInitialize to allow the registration to run in an other
        # thread
        pythoncom.CoInitialize()
        # interface to be used by com
        manager_interface = pythoncom.WrapObject(self)
        event_system = Dispatch(PROGID_EventSystem)
        # register to listen to each of the events to make sure that
        # the code will work on all platforms.
        for current_event in SUBSCRIPTIONS:
            # create an event subscription and add it to the event
            # service
            event_subscription = Dispatch(PROGID_EventSubscription)
            event_subscription.EventClassId = SENSGUID_EVENTCLASS_NETWORK
            event_subscription.PublisherID = SENSGUID_PUBLISHER
            event_subscription.SubscriptionID = current_event[0]
            event_subscription.SubscriptionName = current_event[1]
            event_subscription.MethodName = current_event[2]
            event_subscription.SubscriberInterface = manager_interface
            event_subscription.PerUser = True
            # store the event
            try:
                event_system.Store(PROGID_EventSubscription, 
                                   event_subscription)
            except pythoncom.com_error as e:
                service_logger.error(
                    'Error registering to event %s', current_event[1])
 
        pythoncom.PumpMessages()
 
if __name__ == '__main__':
    from threading import Thread
    def connected():
        print 'Connected'
 
    def disconnected():
        print 'Disconnected'
 
    manager = NetworkManager(connected, disconnected)
    p = Thread(target=manager.register)
    p.start()

The above code represents a NetworkManager class that will execute a callback according to the event raised by the SENS subsystem. It is important to note that with the above code the ‘Connected’ event will be fired three times, since we registered for three different connect events, while a single ‘Disconnected’ event will be fired. The way to fix this would be to register for just one event according to the Windows version you are running on, but since we do not care in the Ubuntu One sync daemon, well, I left it there so everyone can see it :)
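If the duplicate firings do bother you, a thin wrapper can collapse them before they reach the real callbacks. A sketch of mine (the class and method names are not part of the original code):

```python
class DeduplicatedCallbacks(object):
    """Forward connection callbacks only when the state actually changes."""

    def __init__(self, connected_cb, disconnected_cb):
        self._connected = None  # unknown until the first event arrives
        self._connected_cb = connected_cb
        self._disconnected_cb = disconnected_cb

    def connected(self):
        # several SENS connect subscriptions may fire; only report once
        if self._connected is not True:
            self._connected = True
            self._connected_cb()

    def disconnected(self):
        if self._connected is not False:
            self._connected = False
            self._disconnected_cb()
```

Passing `dedup.connected` and `dedup.disconnected` to NetworkManager would then yield a single ‘Connected’ no matter how many subscriptions fire.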

Read more
mandel

Sometimes on Linux we take DBus for granted. On the Ubuntu One Windows port we have had to deal with the fact that DBus on Windows is not that great, and therefore we had to write our own IPC between the Python code and the C# code. To solve the IPC we have done the following:

Listen to a named pipe from C#

The approach we have followed here is pretty simple: we create a thread pool whose threads each create a named pipe. The reason for using a thread pool is to avoid the situation in which a single thread is dealing with all the messages from Python while a very chatty Python developer sits on the other end. The C# code is very straightforward:

/*
 * Copyright 2010 Canonical Ltd.
 * 
 * This file is part of UbuntuOne on Windows.
 * 
 * UbuntuOne on Windows is free software: you can redistribute it and/or modify		
 * it under the terms of the GNU Lesser General Public License version 		
 * as published by the Free Software Foundation.		
 *		
 * Ubuntu One on Windows is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the	
 * GNU Lesser General Public License for more details.	
 *
 * You should have received a copy of the GNU Lesser General Public License	
 * along with UbuntuOne for Windows.  If not, see <http://www.gnu.org/licenses/>.
 * 
 * Authors: Manuel de la Peña <manuel.delapena@canonical.com>
 */
using System;
using System.IO;
using System.IO.Pipes;
using System.Threading;
using log4net;
 
namespace Canonical.UbuntuOne.ProcessDispatcher
{
    /// <summary>
    /// This object represents a listener that will be waiting for messages
    /// from the python code and will perform an operation for each message
    /// that has been received.
    /// </summary>
    internal class PipeListener : IPipeListener
    {
        #region Helper struct
 
        /// <summary>
        /// Private structure used to pass the start of the listener to the 
        /// different listening threads.
        /// </summary>
        private struct PipeListenerState
        {
            #region Variables
 
            private readonly string _namedPipe;
            private readonly Action<object> _callback;
 
            #endregion
 
            #region Properties
 
            /// <summary>
            /// Gets the named pipe to which the thread should listen.
            /// </summary>
            public string NamedPipe { get { return _namedPipe; } }
 
            /// <summary>
            /// Gets the callback that the listening pipe should execute.
            /// </summary>
            public Action<object> Callback { get { return _callback; } }
 
            #endregion
 
            public PipeListenerState(string namedPipe, Action<object> callback)
            {
                _namedPipe = namedPipe;
                _callback = callback;
            }
        }
 
        #endregion
 
        #region Variables
 
        private readonly object _loggerLock = new object();
        private ILog _logger;
        private bool _isListening;
        private readonly object _isListeningLock = new object();
 
        #endregion
 
        #region Properties
 
        /// <summary>
        /// Gets the logger to used with the object.
        /// </summary>
        internal ILog Logger
        {
            get
            {
                if (_logger == null)
                {
                    lock (_loggerLock)
                    {
                        _logger = LogManager.GetLogger(typeof(PipeListener));
                    }
                }
                return _logger;
            }
            set
            {
                _logger = value;
            }
        }
 
        /// <summary>
        /// Gets if the pipe listener is indeed listening to the pipe.
        /// </summary>
        public bool IsListening
        {
            get { return _isListening; }
            private set
            {
                // we have to lock to ensure that the threads do not screw each
                // other up, this makes a small step of the processing to be sync :(
                lock (_isListeningLock)
                {
                    _isListening = value;
                }
            }
        }
 
        /// <summary>
        /// Gets and sets the number of threads that will be used to listen to the 
        /// pipe. Each thread will listen for connections and will dispatch the
        /// messages whenever they are done.
        /// </summary>
        public int NumberOfThreads { get; set; }
 
        /// <summary>
        /// Gets and sets the pipe stream factory that know how to generate the streamers used for the communication.
        /// </summary>
        public IPipeStreamerFactory PipeStreamerFactory { get; set; }
 
        /// <summary>
        /// Gets and sets the action that will be performed with the message of that 
        /// is received by the pipe listener.
        /// </summary>
        public IMessageProcessor MessageProcessor { get; set; }
 
        #endregion
 
        #region Helpers
 
        /// <summary>
        /// Helper method that is used in another thread that will be listening to the possible events from 
        /// the pipe.
        /// </summary>
        private void Listen(object state)
        {
            var namedPipeState = (PipeListenerState)state;
 
            try
            {
                var threadNumber = Thread.CurrentThread.ManagedThreadId;
                // starts the named pipe since in theory it should not be present, if there is 
                // a pipe already present we have an issue.
                using (var pipeServer = new NamedPipeServerStream(namedPipeState.NamedPipe, PipeDirection.InOut, NumberOfThreads,PipeTransmissionMode.Message,PipeOptions.Asynchronous))
                {
                    Logger.DebugFormat("Thread {0} listening to pipe {1}", threadNumber, namedPipeState.NamedPipe);
                    // we wait until the python code connects to the pipe, we do not block the 
                    // rest of the app because we are in another thread.
                    pipeServer.WaitForConnection();
 
                    Logger.DebugFormat("Got client connection in thread {0}", threadNumber);
                    try
                    {
                        // create a streamer that know the protocol
                        var streamer = PipeStreamerFactory.Create();
                        // Read the request from the client. 
                        var message = streamer.Read(pipeServer);
                        Logger.DebugFormat("Message received to thread {0} is {1}", threadNumber, message);
 
                        // execute the action that has to occur with the message
                        namedPipeState.Callback(message);
                    }
                    // Catch the IOException that is raised if the pipe is broken
                    // or disconnected.
                    catch (IOException e)
                    {
                        Logger.DebugFormat("Error in thread {0} when reading pipe {1}", threadNumber, e.Message);
                    }
 
                }
                // if we are still listening, we will queue a new work item to keep
                // listening; otherwise no more threads will be added. Of course, as long
                // as each thread queues at most one new work item, we will have no issues
                // with the pipe server since it has been disposed
                if (IsListening)
                {
                    ThreadPool.QueueUserWorkItem(Listen, namedPipeState);
                }
            }
            catch (PlatformNotSupportedException e)
            {
                // are we running on an OS that does not have pipes (Mono on some os)
                Logger.InfoFormat("Cannot listen to pipe {0}", namedPipeState.NamedPipe);
            }
            catch (IOException e)
            {
                // there are too many servers listening to this pipe.
                Logger.InfoFormat("There are too many servers listening to {0}", namedPipeState.NamedPipe);
            }
        }
 
        #endregion
 
        /// <summary>
        /// Starts listening to the different pipe messages and will perform the appropriate
        /// action when a message is received.
        /// </summary>
        /// <param name="namedPipe">The name of the pipe to listen to.</param>
        public void StartListening(string namedPipe)
        {
            if (NumberOfThreads < 1)
            {
                throw new PipeListenerException(
                    "The number of threads to use to listen to the pipe must be at least one.");
            }
            IsListening = true;
            // we will be using a thread pool that will allow to have the different threads listening to 
            // the messages of the pipes. There could be issues if the devel provided far to many threads
            // to listen to the pipe since the number of pipe servers is limited.
            for (var currentThreaCount = 0; currentThreaCount < NumberOfThreads; currentThreaCount++)
            {
                // we add an new thread to listen
                ThreadPool.QueueUserWorkItem(Listen, new PipeListenerState(namedPipe, MessageProcessor.ProcessMessage));
            }
 
        }
 
        /// <summary>
        /// Stops listening to the different pipe messages. All the thread that are listening already will 
        /// be forced to stop.
        /// </summary>
        public void StopListening()
        {
            IsListening = false;
        }
 
    }
}

Sending messages from python

Once the pipe server is listening on the .Net side we simply have to use the CallNamedPipe method to send messages to .Net. In my case I have used JSON as a quick and dirty protocol; ideally you should do something smarter like protocol buffers.

    # call the pipe with the message
    try:
        data = win32pipe.CallNamedPipe(pipe_name,
            data_json, len(data_json), 0)
    except Exception, e:
        print "Error: C# client is not listening!! %s" % e.message
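For completeness, the data_json buffer used above would be produced with the standard json module. The message shape here is a made-up example, not the actual Ubuntu One protocol:

```python
import json

# hypothetical message; any JSON-serialisable dict would do
message = {'command': 'file_created',
           'path': r'C:\Users\mandel\Documents\foo.txt'}
data_json = json.dumps(message)

# the streamer on the C# end of the pipe just has to parse it back
decoded = json.loads(data_json)
```

The same round trip is what the IPipeStreamerFactory streamer performs on the C# side when it reads the request from the client.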

Read more
mandel

It is not a secret that I love Spring.Net; it just makes the development of big applications a pleasure. During the port of Ubuntu One to Windows I have been using the framework to initialise the WCF service that gives other .Net applications the ability to communicate with Ubuntu One. Yes, this is our DBus alternative!

The idea behind using WCF is to allow other applications to use the different features that Ubuntu One provides; the very first application we would like to see use this is Banshee on Windows (I have to start looking into that, but I have too much to do right now). In order to provide this functionality we use named pipes for the communication. There are two reasons for this:

  • For an application to host a WCF service, any binding besides the named pipe binding requires special permissions. This is clearly a no-no for a user application like Ubuntu One.
  • Named pipes are damned efficient!!! Named pipes on Windows live at the kernel level, cool :)

Initially I thought of hosting the WCF services as a Windows service, why not?!?! Once I had this feature implemented, I realized the following: it turns out that while impersonation does propagate across threads, this is not the case for processes. This is a major pain in the ass. The main reason this is a problem is that if an application is executed in a different user space, the env variables it sees are those of the user executing the code. This means that things like your user's roaming app dir cannot be used, plus other security issues.

After realizing that the WCF services could not be hosted in a Windows service, I moved to write a workaround that does the following:

  1. Configure the WCF services to use named pipes only for the current user.
  2. Start a console application that will host the WCF services.
  3. Start the different WCF clients for Ubuntu One (currently this is our client app, but it could be your own!).

Although the definition of the solution is simple, we had to work around the fact that up ’til now all our WCF services were defined through configuration and injected by the Spring.Net IoC. Usually you can change the location of your app domain configuration by using the following code:

AppDomain.CurrentDomain.SetData("APP_CONFIG_FILE", "c:\\ohad.config");

In theory, with the above code you can redirect the configuration to a new file, and if you then use, for example:

System.Configuration.ConfigurationSettings.AppSettings["my_setting"]

you will be able to get the value from your new configuration. Unfortunately, the Spring.Net IoC uses the ConfigurationManager class, which ignores that setting… Now what?

Well, re-writing all the code to not use the Spring.Net IoC was not an option, because it would mean redoing a lot of work and moving from an application where dependencies are injected to one where we have to manually initialise all the different objects. After some careful thought, I moved to use a small CLR detail I knew of to make the AppDomain that executes our code use the user's configuration. The trick is the following: use one AppDomain to start the application. This is a dummy AppDomain that does not execute any code at all but launches a second AppDomain whose configuration is the correct one and which executes the actual code.

In case I did not make any sense, here is some example code:

using System;
using Canonical.UbuntuOne.Common.Container;
using Canonical.UbuntuOne.Common.Utils;
using log4net;
 
namespace Canonical.UbuntuOne.ProcessDispatcher
{
 
    static class Program
    {
        private static readonly ILog _logger = LogManager.GetLogger(typeof(Program));
        private static readonly ConfigurationLocator _configLocator = new ConfigurationLocator();
 
        /// <summary>
        /// This method starts the service.
        /// </summary>
        static void Main()
        {
 
            _logger.Debug("Redirecting configuration");
 
            // Setup information for the new appdomain.
            var setup = new AppDomainSetup
            {
                ConfigurationFile = _configLocator.GetCurrentUserDaemonConfiguration()
            };
 
            // Create the new appdomain with the new config.
            var executionAppDomain = AppDomain.CreateDomain("ServicesAppDomain",
                AppDomain.CurrentDomain.Evidence, setup);
 
            // Run the services inside the new appdomain, which uses the redirected config.
            executionAppDomain.DoCallBack(() =>
            {
                _logger.Debug("Starting services.");
                // use the IoC to get the implementation of the SyncDaemon service, the IoC will take care of 
                // setting the object correctly.
                ObjectsContainer.Initialize(new SpringContainer());
                var syncDaemonWindowsService = 
                     ObjectsContainer.GetImplementationOf<SyncDaemonWindowsService>();
                // To run more than one service you have to add them here
                syncDaemonWindowsService.Start();
                while (true) ; // busy-wait to keep the appdomain (and its services) alive
            });
 
        }
    }
}

Well I hope this helps someone else :D

Read more
mandel

In the process of porting Ubuntu One to Windows we took a number of decisions that we expect will make people's lives easier when installing it.

Python

One of the most important decisions that had to be taken was what to do with Python. Most of the code (probably all) of the sync daemon has been written in Python, and reusing that code is a plus.

In this situation we could have chosen two different paths:

  • Dependency: Add the presence of a Python runtime as a dependency. That is, either add a bootstrapper that installs Python, install it in its normal location through a Python installer, or ask the user to do it for us.
  • Embedded: Use py2exe or PyInstaller to distribute the Python binaries so that we do not “require” Python to be there.

Both options have their pros and cons, but we decided on the second one for the following reasons:

  • A user could change the Python runtime and break Ubuntu One.
  • More than one runtime could be there.
  • Is a normal user really interested in having Python on their machine?

Unfortunately, so far I have not managed to use PyInstaller, which I’ve been told is smarter than py2exe in terms of creating smaller binary packages (anyone with experience with that is more than welcome to help me).
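
For reference, the embedded approach boils down to a small packaging script. The following is a minimal py2exe setup.py sketch, not our actual build script; the entry-point name syncdaemon.py and the chosen options are illustrative assumptions:

```python
# Minimal py2exe packaging sketch (hypothetical entry point).
# Run on Windows with: python setup.py py2exe
from distutils.core import setup
import py2exe  # noqa: F401  (importing registers the "py2exe" command)

setup(
    # "syncdaemon.py" is an assumed name for the daemon's entry script.
    console=["syncdaemon.py"],
    options={"py2exe": {
        "bundle_files": 1,   # bundle everything into the executable
        "compressed": True,  # compress the library archive
    }},
    zipfile=None,            # no separate library.zip next to the exe
)
```

This is a packaging configuration fragment; it only does something useful when invoked with the py2exe command on a Windows machine with py2exe installed.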

But Python is not THAT heavy

Indeed, Python is not that heavy and should not make the msi huge. But of course Ubuntu One has a number of dependencies beyond Python alone:

  • Ubuntu SSO: In the last iteration of Ubuntu One the service started using Ubuntu SSO. This code has been rewritten in C# and in the near future will be exposed by a WCF service so that things such as the Ubuntu One Music Store could be used on Windows (Banshee sounds like a great app to offer on Windows).
  • Gnome keyring: This is actually a dependency of Ubuntu SSO, but it has to be taken into account. We needed a place where we could store secrets. I have implemented a small lib that stores secrets in the Current User registry key using the DPAPI so that the secrets are safe. Later in the cycle I’ll add a WCF service that will allow other apps to store secrets there, and I might even add an abstraction layer so that it uses the GnomeKeyring code done by NDesk when on Linux.
  • Dependency injection: I initially started using WPF, but I do not deny the option of using GTK# in the future, or at least having the option. For that I have used the DI of Spring.Net so that contributors can add their own views. You could even customize your Ubuntu One by modifying the DI to point to a dll that provides a new UI.
  • PDBs are present: Currently PDB files are present and the code is compiled in DEBUG mode; this does make a difference in the size.

At the end of the day, all this adds up, making the msi quite big. Currently I’m focused on porting all the required code; in the next cycle I’ll make sure you have a nice and small package :D

Read more
mandel

As some of you may know, I am the person currently working on the port of Ubuntu One to Windows. Recently a colleague asked me how to install Ubuntu One in its current alpha state on Windows, and I thought that posting the instructions for the adventurous would be a nice thing to do (I might get someone interested in helping me too ;) ).

Setting the build environment

Because the .msi that we generate does not have the digital signature of Canonical, we do not distribute it yet. But you shall not worry, since all the code is open source and you are more than able to compile, test and create the installer by yourself. To do that you have to set up your environment (I should create an msi or script for this…). In order to set everything up, follow these steps:

  1. Install Python 2.6 for Windows and the extensions (installer)
  2. Install py2exe
  3. Patch py2exe with this
  4. Add the following implementation of XDG for windows.
  5. Install these Python packages:

    • twisted
    • oauth
    • ubuntuone-storage-protocol

    It is important that if you use easy_install to install the packages, you use the -Z option so that the dependencies are not installed as eggs. py2exe cannot work with eggs, and by using the -Z option the eggs will be automatically extracted for you.
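
Since py2exe chokes on zipped eggs, a quick way to check how a package was installed is to look at where Python loads it from. The helper below is a hypothetical sanity check I am sketching here, not part of the Ubuntu One codebase:

```python
import importlib


def is_zip_installed(module_name):
    """Return True if the module loads from a zipped egg or zip archive,
    which py2exe cannot bundle."""
    mod = importlib.import_module(module_name)
    path = getattr(mod, "__file__", "") or ""
    return ".egg" in path or ".zip" in path


# The stdlib is installed flat on a normal interpreter, so this reports False:
print(is_zip_installed("os"))
```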

Creating the .msi

As usual, everything in the build process has been automated. To automate the process we have used Nant, which will build, test and create the .msi to be used.

A list of the commands can be found in the Ubuntu wiki; nevertheless, here they are:

bootstrapper
Creates a bootstrapper that will install on the user's machine the Ubuntu One solution plus a number of useful applications (Tomboy + GtkSharp).
build
Compiles the different projects that are part of the solution.
clean
Removes the different results from the last compilation/task.
installer
Compiles the application using the build task, runs the unit tests using the tests task, and creates an msi installer that can be used to install Ubuntu One on the user's machine (do not confuse this with the bootstrapper; it only installs Ubuntu One).
tests
Compiles the solution using the build task and runs the different unit tests. The output of the tests can be found in the test-results dir.

In order to build the msi you will have to execute the following from the root of the directory:

tools\Nant\bin\nant.exe installer

Once you have done that, you will find in the install directory the msi that you can use to install the app on your machine.

Installing

Well it is an .msi, so double click ;)

Using

As I mentioned, this is an alpha, a very, very early alpha, and that means there is some work to get it running. Currently the most blocking issue is the fact that we do not have an implementation of Ubuntu SSO on Windows, and therefore we cannot retrieve the OAuth tokens required by Ubuntu One. Here are the instructions to get them:

1. Get credentials from Linux

The first step is to get your credentials from a machine that you have already paired. To do so, launch seahorse (the image might help).

Once you have opened seahorse you should be able to find an UbuntuOne token (if not, you will need to pair your machine to Ubuntu One). Right click on it and select the Properties option, which should open a dialog like the following:

At this point simply click on the + sign and select “Show password” so that you can copy and paste the OAuth tokens.

2. Set your OAuth tokens in Windows

Currently the OAuth tokens on Windows are read from an env variable. To be able to start syncing on your Windows machine you will have to set the env variable with the tokens you just retrieved from your Linux box. This example uses Windows XP, but it should be the same in other Windows versions.

To access the env vars in Windows XP, right click on “My Computer” and select “Properties”:

This will launch the System Properties dialog. Select the “Advanced” tab, where you will find the “Environment Variables” option:

Once the “Environment Variables” dialog is launched you will have to create a new env variable in the “User variables” section:

The data to be used is the following:

Variable Name
UbuntuOne
Variable value
Your OAuth token from Linux.
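
Once the variable is in place, the sync daemon can pick it up at startup. Here is a minimal sketch of reading it from Python, assuming the variable name UbuntuOne from above (the token value below is a placeholder, not a real token):

```python
import os

# Placeholder value standing in for the token copied from seahorse.
os.environ["UbuntuOne"] = "oauth-token-copied-from-seahorse"

# The daemon side would read the variable like this:
token = os.environ.get("UbuntuOne")
if token is None:
    raise RuntimeError("UbuntuOne environment variable is not set")
print("Got OAuth token of length %d" % len(token))
```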
Sync
If you did not restart your machine after the installer, do it now. At the next boot you will see the following:

Not all the actions of the menu are there yet, but you can certainly use the “Synchronize Now” option.

How can I help

Well, the easiest way to help is to file bugs; secondly, join #ubuntuone on freenode and look for mandel (me :D ) and I will happily explain the C# code as well as the Python code and the work we have to do. This is not an easy project, so do not get scared by the amount of code done so far or by where to start; I’m here for that.

Happy syncing!

Read more
mandel

I have been working for about 2 months now, and after releasing our internal alpha I have found a very interesting bug. In our port to Windows we have decided to try to make your life as nice as possible in an environment as rough as Windows, and to do so we allow our port to auto-update, so it always delivers the latest bug fixes.

Of course, to ensure that we are updating your system with the correct data, we always perform a checksum of the msi. While our msi is uploaded to S3 using Python, the code that downloads it is C#. Here are the different pieces of code that calculate the checksum:

    import hashlib

    fd = open(filename, 'r')
    md5_hash = hashlib.md5(fd.read()).hexdigest()

private static string Checksum(AlgorithmsEnum algorithm, string filePath)
{
    HashAlgorithm hash;
    switch (algorithm)
    {
        case AlgorithmsEnum.SHA1:
            hash = new SHA1Managed();
            break;
        case AlgorithmsEnum.SHA256:
            hash = new SHA256Managed();
            break;
        case AlgorithmsEnum.SHA384:
            hash = new SHA384Managed();
            break;
        case AlgorithmsEnum.SHA512:
            hash = new SHA512Managed();
            break;
        default:
            hash = new MD5CryptoServiceProvider();
            break;
    }
    var checksum = "";
    using (var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        // Hash the raw bytes of the file.
        var hashBytes = hash.ComputeHash(stream);
        for (var i = 0; i < hashBytes.Length; i++)
        {
            checksum += hashBytes[i].ToString("x2").ToLower();
        }
    }
    return checksum;
}

Believe it or not, the hash returned by each piece of code was different. WTF!!!! After a ridiculous amount of time looking at it, I managed to spot the issue. If you are using Python on Windows, unless you use the b option when opening a file, Python will convert all the CRLF to LF, making the hashes differ. How to fix it? Simply open the file this way:

fd = open(filename,'rb')

Go and figure…
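
To see the difference concretely, here is a small self-contained demonstration of the fix: hashing in binary mode always matches the byte-for-byte digest that the C# FileStream computes, whereas text mode on Windows would translate the CRLFs first and yield a different hash:

```python
import hashlib
import os
import tempfile

# A payload with Windows line endings.
data = b"first line\r\nsecond line\r\n"

# Write the raw bytes to a temp file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

# 'rb' reads the raw bytes with no newline translation,
# so the digest matches what the C# side computes.
with open(path, "rb") as f:
    digest = hashlib.md5(f.read()).hexdigest()

assert digest == hashlib.md5(data).hexdigest()
print(digest)
os.remove(path)
```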

Read more