Iterator Methods and System.IO

June 4, 2013 - C#, Programming

One of the more interesting omissions from .NET so far has been that much of the System.IO namespace has not been updated to take advantage of newer language and CLR features. One of the more prominent features that I can think of is iterator methods.

An iterator method, also known by other names in other languages (generators in Python, for example), is essentially a construct that allows a method to yield both a result value and flow control back to its caller. For example:

will give us this output:

For comparison, the equivalent VB.NET Code:

Of course this is a rather simple example. The real power of this language feature comes when it is exploited to save both processing power and programming time.
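To make the hand-off of control concrete, here is a minimal sketch in C#. The names `IteratorDemo`, `CountTo`, and `Trace` are made up for illustration and are not taken from the listing above; the log simply records the order in which the producer and the consumer run.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorDemo
{
    // Iterator method: each "yield return" hands one value (and control)
    // back to the caller, then resumes here when the next value is requested.
    public static IEnumerable<int> CountTo(int limit, List<string> log)
    {
        for (int i = 1; i <= limit; i++)
        {
            log.Add("produce " + i);
            yield return i;
        }
    }

    // Consumes the iterator and returns the interleaved log of both sides.
    public static List<string> Trace(int limit)
    {
        var log = new List<string>();
        foreach (int value in CountTo(limit, log))
        {
            log.Add("consume " + value);
        }
        return log;
    }
}
```

Calling `IteratorDemo.Trace(2)` should produce the entries `produce 1`, `consume 1`, `produce 2`, `consume 2`, in that order: the body of `CountTo` only runs as far as the next `yield return` each time the `foreach` asks for another value.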

Dealing with files is almost inevitable for desktop software, although Windows Phone and RT applications have access to various alternate storage techniques (in fact, to my understanding, direct file access is forbidden there for security reasons). Much of the IO namespace is a historical artifact from before the introduction of iterator methods. As a result, many of its methods are more monolithic. For example, if you want to find a certain file in a directory of 100,000 files, the array-returning methods of the DirectoryInfo class, such as GetFiles(), return every file in that directory as a 100,000 item array, which you then search. Ideally, you would be able to search through the directory one file at a time, stopping the iteration when you find the file you need. But with those methods, even if the file you wanted happened to be the first item in the array, you still get back a 100,000 item array containing every item in that folder.
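The difference in work done can be sketched without touching the file system at all. In the following C# sketch, `FakeListing` is a stand-in for a directory listing that produces hypothetical file names, and the `Produced` counter records how many names were generated; none of these names come from System.IO.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySearchDemo
{
    // How many names the fake listing has generated so far.
    public static int Produced;

    // Stand-in for a directory listing: yields made-up names one at a time.
    public static IEnumerable<string> FakeListing(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return "file" + i + ".dat";
        }
    }

    // Monolithic style: build the whole array first, then search it.
    public static string FindEager(string wanted, int count)
    {
        string[] all = FakeListing(count).ToArray();
        return Array.Find(all, name => name == wanted);
    }

    // Iterator style: stop enumerating as soon as the match turns up.
    public static string FindLazy(string wanted, int count)
    {
        foreach (string name in FakeListing(count))
        {
            if (name == wanted) return name;
        }
        return null;
    }
}
```

Searching for `file0.dat` among 100,000 entries with `FindEager` generates all 100,000 names before the search even starts, while `FindLazy` generates exactly one.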

Of course, the nice thing about .NET is that it provides a wealth of language features that you can exploit for your own nefarious completely positive and happy rainbow purposes. This is one good example.

Creating an Iterator for Files

In order to create a file iterator, we are going to have to use P/Invoke and the file API provided by Windows, which comes in the form of the FindFirstFile, FindNextFile, and FindClose API functions. The first step is to get the proper P/Invoke declarations for these functions into a C# class. An absolutely invaluable resource for this is P/Invoke. In order to go for the best organization, I opted to put these declarations in a separate class, which I called “FSAPI” because I didn’t think calling it Timothy or Pauline seemed particularly appropriate:
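A sketch of what such a declarations class could look like is below. The function signatures and the WIN32_FIND_DATA layout follow the documented Win32 prototypes, but the arrangement of the class and, in particular, the fields of `FileFindData` are assumptions, since only the names appear in the text.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

// Managed mirror of the Win32 WIN32_FIND_DATAW structure.
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
public struct WIN32_FIND_DATA
{
    public FileAttributes dwFileAttributes;
    public FILETIME ftCreationTime;
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] // MAX_PATH
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string cAlternateFileName;
}

// Assumed shape: carries the full path alongside the raw find data.
// It is never passed to the Win32 API itself.
public struct FileFindData
{
    public string FullPath;
    public WIN32_FIND_DATA FindData;
}

public static class FSAPI
{
    // Value FindFirstFile returns on failure (INVALID_HANDLE_VALUE).
    public static readonly IntPtr InvalidHandle = new IntPtr(-1);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool FindClose(IntPtr hFindFile);
}
```

Declaring the functions with `CharSet.Unicode` binds to the wide-character (`W`) variants, which is why the structure uses the same character set.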

With the declarations in place, we only need to use them. The FileFindData struct is not actually used by the P/Invoked API; instead, we intend to use it to pass more information back to the caller. In particular, we want recursive capability, and the WIN32_FIND_DATA structure doesn’t lend itself well to that: the cFileName field is very likely to be too short for many full paths, and it’s not exactly good form to start smacking about with structures like this. It’s sort of like going to a Jewish or Islamic festival and throwing ham at everyone; it’s just not good form.

One of the complications of finding files recursively is that the file mask you pass to the Find*File API applies to everything it finds; if you specify a mask, it will not find directories unless those directories happen to match that mask. To deal with this, we will pass in * as the mask to get everything, and then filter the results ourselves using a regular expression. “Oh no, not that,” you say. It’s not really changing much, however; the standard file mask syntax can be considered a limited subset of a regular expression, so we should be able to get by with a few replacements:
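One way to do the replacements is sketched below. `MaskMatcher` and `MaskToRegex` are placeholder names, and the translation is deliberately simplified: it treats `*` as "any run of characters" and `?` as "any single character" and does not try to reproduce the quirks of Windows mask matching (for instance, it will not treat `*.*` as matching a name with no dot in it).

```csharp
using System;
using System.Text.RegularExpressions;

public static class MaskMatcher
{
    // Turns a file mask such as "*.txt" into an anchored regular expression.
    public static string MaskToRegex(string mask)
    {
        // Escape every regex metacharacter first (so "." stays a literal dot),
        // then turn the two escaped wildcards back into regex equivalents.
        string pattern = Regex.Escape(mask)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".");
        return "^" + pattern + "$";
    }

    // True if fileName (a bare name, not a full path) matches the mask.
    // Matching ignores case, as file names on Windows normally do.
    public static bool FitsMask(string fileName, string mask)
    {
        return Regex.IsMatch(fileName, MaskToRegex(mask), RegexOptions.IgnoreCase);
    }
}
```

Escaping before substituting matters: without it, the dot in `*.txt` would match any character and `notes_txt` would slip through.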

With that replacement in place, we should be able to use FitsMask() to determine if a File result matches a given file mask. In fact, we could replace the basic logic with a delegate method for testing both directories as well as files to see if they should be yielded, but we can add that after we get the basic method completed.

The core logic is in retrieving Files and Directories from a single Directory, using an Iterator method, naturally:
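A rough sketch of such a single-directory iterator follows. It is Windows-only, the names `DirectoryScanner` and `EnumerateEntries` are placeholders, and the declarations are repeated in compact form only so that the block compiles on its own. Skipping the `.` and `..` entries here is a choice made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

public static class DirectoryScanner
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime, ftLastAccessTime, ftLastWriteTime;
        public uint nFileSizeHigh, nFileSizeLow, dwReserved0, dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)] public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // Yields every entry (files and directories) directly inside "directory".
    // Nothing is read from disk until the caller starts enumerating.
    public static IEnumerable<WIN32_FIND_DATA> EnumerateEntries(string directory)
    {
        WIN32_FIND_DATA data;
        // "*" as the mask so that directories are returned as well as files.
        IntPtr handle = FindFirstFile(directory.TrimEnd('\\') + "\\*", out data);
        if (handle == InvalidHandle) yield break; // missing or unreadable directory

        try
        {
            do
            {
                // Skip the "." and ".." pseudo-entries.
                if (data.cFileName != "." && data.cFileName != "..")
                    yield return data;
            } while (FindNextFile(handle, out data));
        }
        finally
        {
            // Also runs when the caller abandons the loop early,
            // because foreach disposes the enumerator.
            FindClose(handle);
        }
    }
}
```

C# allows `yield return` inside a `try` block as long as it has only a `finally` clause and no `catch`, which is what lets the handle cleanup live next to the loop.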

Some notes: this just uses the FindFile API Functions, iterates and yield returns each result, and closes the Find Handle if necessary in a finally Block. We will build the extra functionality on top of this.

The logic here is basically to go through all the entries in the given folder. If it finds a directory and the recursive argument is true, it recursively calls itself on that subdirectory; if it finds a file, it yields a FileFindData structure to the caller. However, we can improve this by abstracting the decisions about whether to recurse into a directory and whether a file matches the filter into delegate parameters:
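The delegate-driven version might be sketched as follows. To keep the block self-contained, the single-directory lister is itself passed in as a delegate (`listDirectory`), which is a choice made for this sketch and not something the text prescribes; the fields of `FileFindData` and all parameter names are likewise assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed shape of the result type; only its name appears in the text.
public struct FileFindData
{
    public string FullPath;
    public FileAttributes Attributes;
}

public static class RecursiveFind
{
    // listDirectory: yields the immediate entries of one directory, for
    //                example an iterator built on FindFirstFile/FindNextFile.
    // fileFilter:    decides whether a file is yielded to the caller.
    // recurseFilter: decides whether a subdirectory is descended into.
    public static IEnumerable<FileFindData> FindFiles(
        string directory,
        Func<string, IEnumerable<FileFindData>> listDirectory,
        Predicate<FileFindData> fileFilter,
        Predicate<FileFindData> recurseFilter)
    {
        foreach (FileFindData entry in listDirectory(directory))
        {
            bool isDirectory = (entry.Attributes & FileAttributes.Directory) != 0;
            if (!isDirectory)
            {
                if (fileFilter(entry)) yield return entry;
            }
            else if (recurseFilter(entry))
            {
                // Re-yield everything found further down the tree.
                foreach (FileFindData nested in
                         FindFiles(entry.FullPath, listDirectory, fileFilter, recurseFilter))
                {
                    yield return nested;
                }
            }
        }
    }
}
```

A plain recursive search then becomes a matter of passing `d => true` as the recursion filter and a mask test as the file filter, while a non-recursive one passes `d => false`.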

And there you have it! This could be expanded to provide extension methods on DirectoryInfo that return directories and files if necessary. One idea might be to change the delegate to accept the actual WIN32_FIND_DATA structure for filtering purposes, making it easier to filter on more of the file’s various properties.
