Embarassingly, I have come back to the FileIterator for other reasons only to discover, much to my horror, that it doesn’t work! It seems as if the work queue is empty far too fast or there are any number of other issues with the program that I’ve been unable to identify. It is unclear what the cause of the problem is- I suspect I will be “forced” to rewrite it entirely. If anybody sees what might be causing it please let me know so I can try to address it!
In a previous post, I looked at using the Win32 API through P/Invoke to directly call the Win32 FindFirstFile, FindNextFile, and FindClose() methods to perform a File search and retrieve those results using an Enumerator Method.
Here, we will look at what we can do to create a fully asynchronous search; in terms of allowing the results to process as the search continues, as well as perform that evaluation in a manner that best uses the available processing by processing multiple elements at the same time.
The first thing is that this won’t be an implementation of an interface; this has a large disadvantage in terms of Usability, but a wrapper could be created to provide IEnumerable<?> capabilities, so it won’t be something that should stop us.
The methodology I’ve thought of for this particular implementation is to use A background worker. When a search is started, the class will spin off that background worker, then proceed to perform it’s search, calling the appropriate filter delegate, and then if the resulting file is determined to “match” then it is added to a ConcurrentQueue. That ConcurrentQueue is examined intently by the BackgroundWorker, which waits until the Queue is Empty and the search is complete before it allows itself to exit. This is to protect from the eventuality where the actual Search logic that adds items to the Queue isn’t able to keep up with the BackgroundWorker Dequeue-ing elements.
We want to allow extensibility and custom filtering for both determining if the File counts as a “match” as well as whether a Directory should be recursed into. Appropriate overloads can be added that provide their own definitions of the more complex delegates, but we want to aim for a core level of functionality on which to build that simpler syntax.
The delegates
While we could feasibly use Predicate<> types as the parameters to the constructor of our search class for filtering, we will instead create specific delegates so we can more fully document the parameters and purpose of the delegates in XML documentation.
Since we will be dealing with the Win32 Find API, the appropriate declarations are a must. The required declarations are the struct, WIN32_FIND_DATA, the constants MAX_PATH and MAX_ALTERNATE, the ERROR_NO_MORE_FILES flag, as well as the FindFirstFile,FindNextFile, and FindClose API functions. We place these in the conventional NativeMethods class.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
public class NativeMethods { public const int MAX_PATH = 260; public const int MAX_ALTERNATE = 14; public static readonly IntPtr ERROR_NO_MORE_FILES = new IntPtr(18); [DllImport("kernel32.dll", CharSet = CharSet.Auto)] public static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData); [DllImport("kernel32.dll", CharSet = CharSet.Auto)] public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData); [DllImport("kernel32.dll")] public static extern bool FindClose(IntPtr hFindFile); [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)] public struct WIN32_FIND_DATA { public FileAttributes dwFileAttributes; public FILETIME ftCreationTime; public FILETIME ftLastAccessTime; public FILETIME ftLastWriteTime; public uint nFileSizeHigh; //changed all to uint, otherwise you run into unexpected overflow public uint nFileSizeLow; //| public uint dwReserved0; //| public uint dwReserved1; //v [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)] public string cFileName; [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)] public string cAlternate; public long FileSize { get { long b = nFileSizeLow; b = b << 32; b = b | (uint) nFileSizeHigh; return b; } } public String Filename { get { return cFileName.Replace("\0", "").Trim(); } } } } |
This also adds a few helpers. The actually search will be given two delegates, and we want to be able to pass in this information. So we will wrap the WIN32_FIND_DATA into a “FileSearchResult” class:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
public class FileSearchResult { [StructLayout(LayoutKind.Explicit)] struct LongToInt { [FieldOffset(0)] public long Long64; [FieldOffset(0)] public int LeftInt32; [FieldOffset(4)] public int RightInt32; } private NativeMethods.WIN32_FIND_DATA _FindData; private String _FullPath; public NativeMethods.WIN32_FIND_DATA FindData { get { return _FindData; } } public String FullPath { get { return _FullPath; } } private DateTime DateTimeFromFileTime(FILETIME source) { var str = new LongToInt() { LeftInt32 = source.dwLowDateTime, RightInt32 = source.dwHighDateTime }; return DateTime.FromFileTime(str.Long64); } public DateTime DateCreated { get { return DateTimeFromFileTime(_FindData.ftCreationTime); } } public DateTime DateAccessed { get { return DateTimeFromFileTime(_FindData.ftLastAccessTime); } } public DateTime DateModified { get { return DateTimeFromFileTime(_FindData.ftLastWriteTime); } } public FileAttributes Attributes { get { return _FindData.dwFileAttributes; } } public String FileName { get { return _FindData.cFileName.Trim(); } } public FileSearchResult(NativeMethods.WIN32_FIND_DATA pFindData, String pFullPath) { _FindData = pFindData; _FullPath = pFullPath; } } |
The purpose of this class is simple: First, we want to avoid instantiating a FileSystemInfo object if possible. The delegate method may not need that extra information, so we won’t create it ourselves. Instead, we provide easier wrappers around the WIN32_FIND_DATA, which includes wrapping the FILETIME structures and exposing them as the ‘everyday’ DateTime class type, which is far more familiar.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 |
public class FileFinder { /// <summary > /// Method called with search results for processing. /// </summary> /// <param name="foundData">Data of found item.</param> public delegate void FoundRoutine(FileSearchResult foundData); /// <summary> /// SearchFilter predicate function, used to determine if a found result should invoke the FoundRoutine. /// </summary> /// <param name="foundData">Data result of the file that was found.</param> /// <returns>true to indicate that this item passes the filter; false to reject it. </returns> public delegate bool SearchFilter(FileSearchResult foundData); private static readonly int MaxSpunThreads = 5; private readonly List<FileFinder> ChildSearchers = new List<FileFinder>(); private readonly SearchFilter Filter; private readonly SearchFilter RecurseTest; //FileFinder class implements a queue-based algorithm that //can call back asynchronously with each element as it is retrieved. private readonly String SearchMask; private readonly String SearchString; private readonly ConcurrentQueue<FileSearchResult> WorkQueue = new ConcurrentQueue<FileSearchResult>(); private readonly FoundRoutine performAction; private Action<FileFinder> SearchComplete; private bool _Finished; private bool _SearchComplete; private BackgroundWorker bw; public FileFinder(String sDir, String sMask, SearchFilter pFilter, FoundRoutine act, bool recurse) : this(sDir, sMask, pFilter, act, recurse ? ((data) => true) : (SearchFilter) null) { } public FileFinder(String sDir, String sMask, SearchFilter pFilter, FoundRoutine act, SearchFilter RecursionTest) { if (sDir == null) throw new ArgumentNullException("sDir"); if (sMask == null) throw new ArgumentNullException("sMask"); if (act == null) throw new ArgumentNullException("act"); SearchMask = sMask; String fullpath = Path.Combine(sDir, sMask); RecurseTest = RecursionTest; SearchString = fullpath; Filter = pFilter; performAction = act; } public bool Finished { get { return _Finished; } } public static IEnumerable<FileSearchResult> Enumerate(String sDir, String sMask, SearchFilter pFilter, bool recursive) { return Enumerate(sDir, sMask, pFilter, recursive ? ((a) => true) : (SearchFilter) null); } public static IEnumerable<FileSearchResult> Enumerate(String sDir, String sMask, SearchFilter pFilter, SearchFilter RecursionTest) { var buffer = new ConcurrentQueue<FileSearchResult>(); var ff = new FileFinder(sDir, sMask, pFilter, buffer.Enqueue, RecursionTest); //start a thread, call ff.Start() from within the thread to prevent us from being blocked. var WorkThread = new Thread(ff.Start); WorkThread.Start(); bool dequeueSuccess = false; //variables used to store if we dequeued an items successfully. do { FileSearchResult currentitem = null; if (dequeueSuccess = buffer.TryDequeue(out currentitem)) { yield return currentitem; } } while (dequeueSuccess || !ff.Finished); //loop while we are getting items or the Searcher is still working. } public void Cancel() { foreach (FileFinder iterate in ChildSearchers) { iterate.Cancel(); } if (bw != null) bw.CancelAsync(); _Finished = _SearchComplete = true; } public void Start() { _SearchComplete = false; _Finished = false; String usepath = SearchString.Substring(0, SearchString.Length - Path.GetFileName(SearchString).Length); if (bw == null) bw = new BackgroundWorker(); //connect up the event handler for the background worker. bw.DoWork += bw_DoWork; bw.RunWorkerAsync(); //if we are running a recursive search, create another Finder to search for directories. if (RecurseTest != null) { var DirFinder = new FileFinder(usepath, "*", (dat) => { return (dat.Attributes & FileAttributes.Directory) == FileAttributes.Directory && dat.FileName != "." && dat.FileName != ".." && RecurseTest(dat); //find directories, but we don't want . or .. recursing into those folders is asking for trouble. }, (data) => { //with each directory, create a new FileFinder that searches that directory //in the same way that we are. //we will add it to our ChildSearchers list and then get it to remove itself when it is finished. //this requires locking around the List<T>. var TargetSearcher = new FileFinder(data.FullPath, SearchMask, Filter, performAction, RecurseTest); lock (ChildSearchers) { ChildSearchers.Add(TargetSearcher); }; TargetSearcher.SearchComplete = (a) => { lock(ChildSearchers){ChildSearchers.Remove(a);} }; TargetSearcher.Start(); }, false); DirFinder.Start(); } //find all entries and add them to the work queue. NativeMethods.WIN32_FIND_DATA retrieved; IntPtr findHandle = NativeMethods.FindFirstFile(SearchString, out retrieved); //search as long as the findhandle is not a null pointer and is not the error value indicating we found all the files. while (findHandle != IntPtr.Zero && findHandle != NativeMethods.ERROR_NO_MORE_FILES) { String fullpath = Path.Combine(usepath, retrieved.cFileName); if (Filter == null || Filter(new FileSearchResult(retrieved, fullpath))) WorkQueue.Enqueue(new FileSearchResult(retrieved, fullpath)); //find next! if (!NativeMethods.FindNextFile(findHandle, out retrieved)) break; //if FindNextFile returns 0, break. } if (findHandle != IntPtr.Zero && findHandle != NativeMethods.ERROR_NO_MORE_FILES) { NativeMethods.FindClose(findHandle); } _SearchComplete = true; } private void bw_DoWork(object sender, DoWorkEventArgs e) { var SpanThreads = new HashSet<Thread>(); //keep track of the Threads we've spun so far. int spuncount = 0; //number of spun threads. do { if (!WorkQueue.IsEmpty && spuncount < MaxSpunThreads) { //if it's not empty, and we haven't hit the spun thread limit, we need to spin one out. FileSearchResult popped = null; if (!bw.CancellationPending && WorkQueue.TryDequeue(out popped)) { //if the dequeue was successful, spin off a new thread to perform action. if (performAction != null) { var SpinIt = new Thread(() => { performAction(popped); lock (SpanThreads) { SpanThreads.Remove(Thread.CurrentThread); } ; }); lock (SpanThreads) { SpanThreads.Add(SpinIt); } ; SpinIt.Start(); } } } //keep going as long as the search is not complete, the workqueue isn't empty, or there //are threads still running. lock (SpanThreads) { spuncount = SpanThreads.Count; } } while (!_SearchComplete || !WorkQueue.IsEmpty || spuncount > 0); _Finished = true; if (SearchComplete != null) SearchComplete(this); } } |
And there it is. It even has the IEnumerable
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
class Program { static void FileFound(FileSearchResult founddata) { Console.WriteLine("Found:" + founddata.FullPath); Console.WriteLine("Created:" + founddata.DateCreated.ToString()); } static void Main(string[] args) { foreach (var iterate in FileFinder.Enumerate("L:\\web", "*.png", null, (a) => true)) { FileFound(iterate); } Console.ReadKey(); } } |
As it is there are some improvements I can think of. One would involve using Multicast Delegates rather than the basic Delegates, which is a fancy way of saying “use events fool!”. But at the same time that would probably fit more appropriately as a subclass which provides it’s own delegate implementations that instead call into events it declares, or something to that effect.
The full source to this Project can be found Here in a small github repo I created for it. Who knows, maybe it will mutate into a File Search API to Rival “BCFile”, but for .NET.
Have something to say about this post? Comment!