One of the more interesting omissions from .NET so far is that many of the methods in the System.IO namespace have not been updated to take advantage of newer language and CLR features. The most prominent of those features that I can think of is iterator methods.
An iterator method, known by other names in other languages (generators in Python, for example), is essentially a construct that lets a method hand a result value and flow control back to its caller, then resume where it left off when the caller asks for the next value. For example:
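    using System;
    using System.Collections.Generic;

    public static class Program
    {
        // Lazily yields start, start+step, ... while the value is below end.
        public static IEnumerable<int> getRange(int start, int end, int step)
        {
            for (int i = start; i < end; i += step)
                yield return i;
        }

        public static void Main(String[] args)
        {
            foreach (int iterate in getRange(5, 50, 10))
                Console.WriteLine(iterate);
        }
    }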
will give us this output:
    5
    15
    25
    35
    45
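For comparison, the equivalent VB.NET code:
    Imports System
    Imports System.Collections.Generic

    Public Class Program
        ' End and Step are reserved words in VB, hence the p-prefixed parameter names.
        Public Shared Iterator Function getRange(pStart As Integer, pEnd As Integer, pStep As Integer) As IEnumerable(Of Integer)
            ' For...To is inclusive, so stop at pEnd - 1 to match the C# loop (i < end).
            For I As Integer = pStart To pEnd - 1 Step pStep
                Yield I
            Next I
        End Function

        Public Shared Sub Main(Args As String())
            For Each iterate As Integer In getRange(5, 50, 10)
                Console.WriteLine(iterate)
            Next
        End Sub
    End Class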
Of course this is a rather simple example. The real power of this language feature comes when it is exploited to save both processing power and programming time.
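To make the "save processing power" point concrete, here is a minimal sketch of how an iterator only does as much work as the caller asks for. The `LazyDemo` class, `CountedRange`, and `FirstAbove` are hypothetical names invented for this illustration; the counter exists only so we can observe how far the iterator body ran.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many values the iterator body has actually produced.
    public static int Produced = 0;

    // Hypothetical helper: yields 0, 1, 2, ... below limit, counting each value as it is produced.
    public static IEnumerable<int> CountedRange(int limit)
    {
        for (int i = 0; i < limit; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Returns the first value greater than threshold. First() stops pulling
    // values as soon as the predicate matches, so the loop above never runs to limit.
    public static int FirstAbove(int threshold, int limit)
    {
        Produced = 0;
        return CountedRange(limit).First(v => v > threshold);
    }
}
```

Calling `LazyDemo.FirstAbove(2, 1000000)` returns 3 after producing only four values (0 through 3); the remaining 999,996 iterations never happen. A method that filled a list first would have done all of them.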
Dealing with files is almost inevitable for desktop software. Windows Phone and RT applications have access to various alternate storage techniques (in fact, to my understanding, direct file access is forbidden there for security reasons), but desktop software works with the file system directly. This highlights an interesting historical artifact of the System.IO namespace: much of it dates from before the introduction of iterator methods, and as a result many of its methods are rather monolithic. For example, if you want to find a certain file in a directory of 100,000 files using DirectoryInfo.GetFiles, the call returns every file in that directory as a 100,000-item array, which you then search. Ideally, you would search through the directory one file at a time and stop iterating when you find the file you need. But with GetFiles, even if the file you want happens to be the first item, you still pay for building the entire array. (.NET 4 did add lazy counterparts such as Directory.EnumerateFiles, but rolling our own shows how the technique works and gives us access to the raw find data.)
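As a rough illustration of the difference, the sketch below contrasts the eager and lazy styles using only managed APIs. It assumes .NET Framework 4 or later, where `Directory.EnumerateFiles` exists; the `FileSearchDemo` class and its method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileSearchDemo
{
    // Eager: GetFiles builds the complete array of paths before the search even starts.
    public static string FindEager(string directory, string fileName)
    {
        string[] all = Directory.GetFiles(directory);
        return all.FirstOrDefault(p => Path.GetFileName(p) == fileName);
    }

    // Lazy: EnumerateFiles hands back paths as they are found, and
    // FirstOrDefault stops enumerating at the first match.
    public static string FindLazy(string directory, string fileName)
    {
        return Directory.EnumerateFiles(directory)
                        .FirstOrDefault(p => Path.GetFileName(p) == fileName);
    }
}
```

Both methods return the same path (or null when nothing matches); the difference is only in how much of the directory is read before the answer comes back.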
Of course, the nice thing about .NET is that it provides a wealth of language features that you can exploit for your own nefarious (er, completely positive and happy rainbow) purposes. This is one good example.
Creating an Iterator for Files
In order to create a file iterator, we are going to have to use P/Invoke and call the file API provided by Windows, which comes in the form of the FindFirstFile, FindNextFile, and FindClose functions. The first step is to get the proper P/Invoke declarations for these functions into a C# class. An absolutely invaluable resource for this is P/Invoke.
To keep things organized, I opted to put these declarations in a separate class, which I called “FSAPI” because I didn’t think calling it Timothy or Pauline seemed particularly appropriate:
    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

    internal class FSAPI
    {
        public const int MAX_PATH = 260;
        public const int MAX_ALTERNATE = 14;

        private struct twointsonestruct
        {
            uint High;
            uint Low;
        }

        // Not part of the Windows API: pairs the raw find data with the full path of the entry.
        public struct FileFindData
        {
            public WIN32_FIND_DATA APIStructure;
            public String FullFileName;
        }

        [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
        public struct WIN32_FIND_DATA
        {
            public FileAttributes dwFileAttributes;
            public FILETIME ftCreationTime;
            public FILETIME ftLastAccessTime;
            public FILETIME ftLastWriteTime;
            public uint nFileSizeHigh; // changed all to uint, otherwise you run into unexpected overflow
            public uint nFileSizeLow;
            public uint dwReserved0;
            public uint dwReserved1;
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
            public string cFileName;
            [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
            public string cAlternate;

            // nFileSizeHigh holds the upper 32 bits of the 64-bit size, nFileSizeLow the lower 32.
            public long FileSize
            {
                get { return ((long)nFileSizeHigh << 32) | nFileSizeLow; }
            }

            public String Filename
            {
                get { return cFileName.Replace("\0", "").Trim(); }
            }
        }

        // CharSet.Unicode selects the wide (W) entry points and matches the struct layout above.
        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        public static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

        [DllImport("kernel32.dll", SetLastError = true)]
        public static extern bool FindClose(IntPtr hFindFile);
    }
With the declarations in place, we only need to use them. The FileFindData struct is not actually used by the P/Invoked API; instead, we use it to pass more information back to the caller. In particular, we want recursive capability, and the WIN32_FIND_DATA structure doesn’t lend itself well to that: the cFileName field is very likely to be too short for many full paths, and stuffing our own data into an API structure is a bad idea anyway. It’s sort of like going to a Jewish or Islamic festival and throwing ham at everyone; it’s just not good form.
One wrinkle with finding files recursively is that the file mask you pass into the Find*File API applies to everything it finds; if you specify a mask, it will not return directories unless those directories happen to match the mask too. To deal with this, we pass in * as the mask to get every entry, and then filter the results ourselves using a regular expression. “Oh no, not that,” you say. It’s not really changing much, however; the standard file mask syntax can be considered a limited subset of regular expressions, so we should be able to get by with a few replacements:
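Going back to the declarations for a moment: two kinds of fields in WIN32_FIND_DATA need a little arithmetic before they are useful, namely the size (split into high and low 32-bit halves) and the FILETIME timestamps (also split into two halves, counting 100-nanosecond intervals since January 1, 1601 UTC). The sketch below shows that arithmetic in isolation. `FindDataMath` and its method names are hypothetical helpers written for this illustration, taking plain `uint` halves rather than the structure itself.

```csharp
using System;

public static class FindDataMath
{
    // The high half is the upper 32 bits of the 64-bit file size.
    public static long CombineSize(uint high, uint low)
    {
        return ((long)high << 32) | low;
    }

    // Joins the two FILETIME halves and converts the result to a UTC DateTime.
    // DateTime.FromFileTimeUtc throws ArgumentOutOfRangeException for values it cannot represent.
    public static DateTime ToUtc(uint highDateTime, uint lowDateTime)
    {
        long fileTime = ((long)highDateTime << 32) | lowDateTime;
        return DateTime.FromFileTimeUtc(fileTime);
    }
}
```

For example, `FindDataMath.CombineSize(1, 0)` is 4294967296 (exactly 4 GiB), which is the smallest size that needs the high half at all.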
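    // Requires: using System.Text.RegularExpressions;
    private static bool FitsMask(string sFileName, string sFileMask)
    {
        // Escape everything first, then turn the escaped wildcards back into their regex equivalents.
        string pattern = "^" + Regex.Escape(sFileMask).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
        // Anchored so that "*.txt" does not match "notes.txt.bak"; Windows file names are case-insensitive.
        return Regex.IsMatch(sFileName, pattern, RegexOptions.IgnoreCase);
    }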
With that replacement in place, we should be able to use FitsMask() to determine whether a file result matches a given file mask. In fact, we could replace this basic logic with a delegate method that tests both directories and files to see if they should be yielded, but we can add that after we get the basic method completed.
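To see what the mask translation does in practice, here is a small self-contained sketch. It is a stricter variant of the idea rather than a drop-in copy: it assumes the whole file name must match (so the pattern is anchored), escapes any other regex metacharacters in the mask, and ignores case. The `MaskDemo` class name is made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MaskDemo
{
    // Translates a file mask such as "*.txt" into an anchored regular expression.
    public static string MaskToPattern(string mask)
    {
        // Regex.Escape turns "*" into "\*" and "?" into "\?", so we swap those
        // escaped forms for the regex constructs with the same meaning.
        return "^" + Regex.Escape(mask).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
    }

    // True when the whole file name matches the mask, ignoring case.
    public static bool FitsMask(string fileName, string mask)
    {
        return Regex.IsMatch(fileName, MaskToPattern(mask), RegexOptions.IgnoreCase);
    }
}
```

With this version, `MaskDemo.FitsMask("notes.txt", "*.txt")` is true while `MaskDemo.FitsMask("notes.txt.bak", "*.txt")` is false, and a name containing parentheses such as "file(1).txt" is matched literally by the mask "file(1).*" instead of being read as a regex group.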
The core logic is in retrieving Files and Directories from a single Directory, using an Iterator method, naturally:
    public static IEnumerable<FSAPI.WIN32_FIND_DATA> FindFiles(String sMask)
    {
        // FindFirstFile reports failure with INVALID_HANDLE_VALUE (-1), not a zero handle.
        IntPtr invalidHandle = new IntPtr(-1);
        FSAPI.WIN32_FIND_DATA fdata;
        IntPtr fHandle = invalidHandle;
        try
        {
            fHandle = FSAPI.FindFirstFile(sMask, out fdata);
            if (fHandle == invalidHandle) yield break;
            do
            {
                yield return fdata;
            } while (FSAPI.FindNextFile(fHandle, out fdata));
        }
        finally
        {
            // Runs when iteration completes or when the caller stops early and disposes the enumerator.
            if (fHandle != invalidHandle) FSAPI.FindClose(fHandle);
        }
    }
Some notes: this just calls the FindFirstFile and FindNextFile API functions, yield returns each result, and closes the find handle, if one was opened, in a finally block. We will build the extra functionality on top of this:
    // Requires: using System.IO; using System.Linq;
    public static IEnumerable<FSAPI.FileFindData> FindFiles(String sDirectory, String sMask, bool recursive)
    {
        String[] ExcludeFolders = new String[] { ".", ".." };
        // Always search with "*" so that directories are returned too; sMask is applied to files below.
        String usemask = Path.Combine(sDirectory, "*");
        foreach (var iterate in FindFiles(usemask))
        {
            if (((FileAttributes)iterate.dwFileAttributes).HasFlag(FileAttributes.Directory))
            {
                if (!ExcludeFolders.Contains(iterate.Filename) && recursive)
                {
                    // recursive call; the results already carry their full path, so pass them through as-is.
                    String subdir = Path.Combine(sDirectory, iterate.Filename);
                    foreach (var subiterate in FindFiles(subdir, sMask, true))
                        yield return subiterate;
                }
            }
            else if (FitsMask(iterate.Filename, sMask))
            {
                FSAPI.FileFindData ffd = new FSAPI.FileFindData();
                ffd.APIStructure = iterate;
                // Build the full path here, once, instead of re-combining it at every recursion level.
                ffd.FullFileName = Path.Combine(sDirectory, iterate.Filename);
                yield return ffd;
            }
        }
    }
The logic here is to go through every entry in the given folder. If an entry is a directory and the recursive argument is true, the method calls itself on that subdirectory; if it’s a file that fits the mask, it yields a FileFindData structure to the caller. However, we can improve this by abstracting the decisions about whether to recurse into a directory and whether a file passes the filter into delegate parameters:
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;

    class FSEnumerator
    {
        private static bool FitsMask(string sFileName, string sFileMask)
        {
            // Escape the mask, restore the wildcards as regex, and anchor so the whole name must match.
            string pattern = "^" + Regex.Escape(sFileMask).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";
            return Regex.IsMatch(sFileName, pattern, RegexOptions.IgnoreCase);
        }

        /// <summary>
        /// delegate method used for testing if a Directory should be recursed into, or a file yielded.
        /// </summary>
        /// <param name="Name">Name of the entry. Does not include full path.</param>
        /// <param name="FullPath">Full path to this entry.</param>
        /// <returns>true to indicate that this Filter passes. False otherwise.</returns>
        public delegate bool EntryTestRoutine(String Name, String FullPath);

        private static readonly String[] ExcludeFolders = new String[] { ".", ".." };

        //default EntryTestRoutine implementation for Directory Recursion.
        private static bool RecursionTest(String dName, String FullPath)
        {
            return !ExcludeFolders.Contains(dName);
        }

        public static IEnumerable<FSAPI.FileFindData> FindFiles(String sDirectory, String sMask, bool recurse)
        {
            // Honor the recurse flag: when it is false, no directory passes the recursion test.
            EntryTestRoutine recurseTest = RecursionTest;
            if (!recurse) recurseTest = (dn, fp) => false;
            return FindFiles(sDirectory, recurseTest, (fn, fp) => FitsMask(fn, sMask));
        }

        public static IEnumerable<FSAPI.FileFindData> FindFiles(String sDirectory, EntryTestRoutine recurseTest, EntryTestRoutine filterTest)
        {
            if (recurseTest == null) recurseTest = RecursionTest;
            if (filterTest == null) filterTest = (fn, fp) => true;
            String usemask = Path.Combine(sDirectory, "*");
            foreach (var iterate in FindFiles(usemask))
            {
                String fullpath = Path.Combine(sDirectory, iterate.Filename);
                if (((FileAttributes)iterate.dwFileAttributes).HasFlag(FileAttributes.Directory))
                {
                    // Skip "." and ".." unconditionally so a custom recurseTest cannot loop forever.
                    if (ExcludeFolders.Contains(iterate.Filename)) continue;
                    if (recurseTest(iterate.Filename, fullpath))
                    {
                        //recursive call; the results already carry their full path.
                        foreach (var subiterate in FindFiles(fullpath, recurseTest, filterTest))
                            yield return subiterate;
                    }
                }
                else if (filterTest(iterate.Filename, fullpath))
                {
                    FSAPI.FileFindData ffd = new FSAPI.FileFindData();
                    ffd.APIStructure = iterate;
                    ffd.FullFileName = fullpath;
                    yield return ffd;
                }
            }
        }

        public static IEnumerable<FSAPI.WIN32_FIND_DATA> FindFiles(String sMask)
        {
            // FindFirstFile reports failure with INVALID_HANDLE_VALUE (-1), not a zero handle.
            IntPtr invalidHandle = new IntPtr(-1);
            FSAPI.WIN32_FIND_DATA fdata;
            IntPtr fHandle = invalidHandle;
            try
            {
                fHandle = FSAPI.FindFirstFile(sMask, out fdata);
                if (fHandle == invalidHandle) yield break;
                do
                {
                    yield return fdata;
                } while (FSAPI.FindNextFile(fHandle, out fdata));
            }
            finally
            {
                if (fHandle != invalidHandle) FSAPI.FindClose(fHandle);
            }
        }

        //FileSystem Enumerator
        public static IEnumerable<FSAPI.WIN32_FIND_DATA> FindFiles(String sDirectory, String sMask)
        {
            return FindFiles(Path.Combine(sDirectory, sMask));
        }
    }
And there you have it! This could be expanded to provide extension methods on DirectoryInfo that return directories and files if necessary. One idea might be to change the delegate to accept the actual WIN32_FIND_DATA structure, making it easier to filter on more of the file’s properties.
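As a rough idea of what such an extension method could look like, here is a minimal sketch. To keep it self-contained, it swaps the P/Invoke enumerator for the managed `EnumerateFiles` and `EnumerateDirectories` methods (assuming .NET 4 or later) and uses `Func` delegates over `FileInfo` and `DirectoryInfo` instead of the EntryTestRoutine delegate. The class name and the `FindFiles` extension are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DirectoryInfoExtensions
{
    // Hypothetical extension method: lazily walks a directory tree.
    // recurseTest decides whether a subdirectory is entered (null means always);
    // filterTest decides whether a file is yielded (null means always).
    public static IEnumerable<FileInfo> FindFiles(
        this DirectoryInfo root,
        Func<DirectoryInfo, bool> recurseTest,
        Func<FileInfo, bool> filterTest)
    {
        foreach (FileInfo file in root.EnumerateFiles())
        {
            if (filterTest == null || filterTest(file))
                yield return file;
        }

        foreach (DirectoryInfo dir in root.EnumerateDirectories())
        {
            if (recurseTest != null && !recurseTest(dir))
                continue;

            // Recurse lazily; nothing below this directory is read until the caller asks for it.
            foreach (FileInfo nested in dir.FindFiles(recurseTest, filterTest))
                yield return nested;
        }
    }
}
```

Because the delegates receive the full `FileInfo` and `DirectoryInfo` objects, a caller can filter on size, dates, or attributes, for example `new DirectoryInfo(path).FindFiles(d => d.Name != "bin", f => f.Length > 1024)`. The same shape would apply if the delegates took the WIN32_FIND_DATA structure instead.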