A File Iterator that actually works | BASeCamp Programming Blog

As it turns out, the Asynchronous File Iterator I wrote about way back in June 2013 has something of an issue, the big one being that it doesn’t actually work properly! Serves me right for not testing it on real data. I discovered this when I returned to it to try to change it to use C# 6.0 features, and found that it didn’t work as is (if it ever did).

I also noticed that it was, to be honest, poorly designed. I think I may have rushed through it to get it into a blog post without fully considering the design decisions. This became apparent as I watched the search results trickle in while there were over 4 threads running. The problem was that every directory search would get a new thread, which included, of course, the subdirectory searches started by a subdirectory search.

I decided to return to the drawing board and try to redesign it in a more sane manner. Since this is a fairly straightforward program/implementation it seemed easier for me to do it this way than figure out how to fix the original, particularly since the original appeared to have been rushed to meet some imagined posting deadline.

The main purpose of the asynchronous search is to try to increase performance over a fully synchronous search. With most drives, reading directory information from multiple locations at once won’t significantly hurt performance, particularly with an SSD. For the moment I’ve decided on two threads maximum. Due to the issues with the original, I’ve also decided on a limitation: threads will only be started for the directories in the initial search directory, and any search in a directory lower down (higher up?) the directory tree will instead be performed synchronously. Naturally, our friend FindFirstFile() and co. make an appearance. My implementation doesn’t pass the mask to FindFirstFile; instead it searches for "*" so that a single directory traversal grabs directories as well as files, and the files are then filtered against the provided mask using the Like functionality I described in a previous blog post. (Interestingly, I wrote about it twice: in the linked post, as well as here. It’s pretty obvious you’ve written a lot of content when you find you wrote about things twice without realizing it.)
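
To make the "enumerate everything, filter afterwards" idea concrete, here is a minimal sketch of matching a file name against a mask once it has already been enumerated. The `WildcardMask` class and its `Matches` method are hypothetical stand-ins for the Like extension from the earlier post, not that implementation; this version simply translates `*` and `?` into a regular expression.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the Like extension method; not the original implementation.
public static class WildcardMask
{
    // Returns true when fileName matches a mask in which
    // '*' matches any run of characters and '?' matches exactly one.
    public static bool Matches(string fileName, string mask)
    {
        // Escape the whole mask, then turn the escaped wildcards back into regex equivalents.
        string pattern = "^" + Regex.Escape(mask).Replace("\\*", ".*").Replace("\\?", ".") + "$";
        return Regex.IsMatch(fileName, pattern, RegexOptions.IgnoreCase);
    }
}

public static class WildcardMaskDemo
{
    public static void Main()
    {
        Console.WriteLine(WildcardMask.Matches("report.TXT", "*.txt"));    // True
        Console.WriteLine(WildcardMask.Matches("report.txt", "rep?rt.*")); // True
        Console.WriteLine(WildcardMask.Matches("report.txt", "*.cs"));     // False
    }
}
```

Note that this is a plain wildcard match and does not try to reproduce the quirks of Win32 mask handling, such as `*.*` matching names that have no extension.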

In my mind, the idea behind the class is to be a “lower level” class that will be consumed by a more advanced Search class, with the advanced search class supporting things like Advanced Filters and perhaps Actions that can be taken on each result. The more “low-level” class would expose two events: one fired when a search completes, and one fired when an item is found.

The “old” class I discussed in the previous post used a design that managed state via fields in the class. In my redesign I’ll “cheat” and make sure the actual search function itself is synchronous, and provide asynchronous behaviour via a wrapper. I should mention that I actually use the ‘Thread’ model, rather than an async routine. I admit this is because I’ve yet to fully grok async. The fact that my work requires .NET Framework 4 and thus we cannot use the async stuff (meaning I don’t get any experience with it through my work) doesn’t help. On the other hand, this does mean this code will work in older versions of C#. At any rate, since we will be dealing with events, I suppose we ought to define them first:

public class AsyncFileFindCompleteEventArgs : EventArgs
{
    public enum CompletionCauseEnum
    {
        Complete_Success,
        Complete_Cancelled
    }
    private CompletionCauseEnum _completionCause = CompletionCauseEnum.Complete_Success;
    public CompletionCauseEnum CompletionCause { get { return _completionCause; } set { _completionCause = value; } }
    public AsyncFileFindCompleteEventArgs(CompletionCauseEnum CompletionType)
    {
        _completionCause = CompletionType;
    }
    //fired when a FileFind operation completes.
}
public class AsyncFileFoundEventArgs : EventArgs
{
    private FileSearchResult _Result = null;
    public FileSearchResult Result { get { return _Result; } }
    public AsyncFileFoundEventArgs(FileSearchResult result)
    {
        _Result = result;
    }
}

The Completion event will contain a property to indicate “how” it completed. Alternatively, it could use distinct events, or even use an event hierarchy (e.g. an AsyncFileFindCompletion event being an abstract base class with an “AsyncFileFindCancelled” subclass and an “AsyncFileFindFinished” subclass). Cancelling is one consideration that is pretty important: we want to be able to cancel the search at any time by calling a Cancel method. Calling the Cancel method should guarantee that, after you call it, you will not receive any events from files being found from that search.
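
The event hierarchy alternative could look something like the following. This is only a sketch of the idea and not code from the project: the class names are adapted from the ones floated in the paragraph above (with an EventArgs suffix added), and the `Describe` helper is made up for illustration.

```csharp
using System;

// Abstract base: a handler receives this type and inspects the concrete subclass.
public abstract class AsyncFileFindCompletionEventArgs : EventArgs
{
}

// Would be raised when the search ran to the end of the directory tree.
public sealed class AsyncFileFindFinishedEventArgs : AsyncFileFindCompletionEventArgs
{
}

// Would be raised when Cancel() stopped the search early.
public sealed class AsyncFileFindCancelledEventArgs : AsyncFileFindCompletionEventArgs
{
}

public static class CompletionDemo
{
    // Hypothetical helper showing how a subscriber could tell the two cases apart.
    public static string Describe(AsyncFileFindCompletionEventArgs e)
    {
        return e is AsyncFileFindCancelledEventArgs ? "cancelled" : "finished";
    }

    public static void Main()
    {
        EventHandler<AsyncFileFindCompletionEventArgs> handler =
            (sender, e) => Console.WriteLine(Describe(e));
        handler(null, new AsyncFileFindFinishedEventArgs());  // prints "finished"
        handler(null, new AsyncFileFindCancelledEventArgs()); // prints "cancelled"
    }
}
```

The enum property is less ceremony for two outcomes; the hierarchy would mostly pay off if a cancelled or finished result later needed to carry its own extra data.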

Similarly, since this will be asynchronous and the caller will be given control right away, we’ll want to have some sort of re-entrancy protection. The Start method will detect if a search is already in progress and, if so, raise this exception:

public class SearchAlreadyInProgressException : Exception
{
    public SearchAlreadyInProgressException(SerializationInfo info,StreamingContext context):base(info,context)
    {

    }
    public SearchAlreadyInProgressException():base("AsyncFileFinder Search Already started")
    {

    }
}
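
One thing to watch with a check-then-set guard is that two threads calling Start at nearly the same moment could both pass the check. A minimal sketch of an atomic version using `Interlocked.CompareExchange` follows. `GuardedSearcher`, its `_searching` field, and the use of `InvalidOperationException` in place of `SearchAlreadyInProgressException` are stand-ins so the sample compiles on its own; none of them come from the project.

```csharp
using System;
using System.Threading;

public class GuardedSearcher
{
    // 0 = idle, 1 = searching. An int is used because Interlocked has no bool overload.
    private int _searching = 0;

    public bool IsSearching
    {
        // CompareExchange(ref x, 0, 0) never changes the value; it just reads it atomically.
        get { return Interlocked.CompareExchange(ref _searching, 0, 0) == 1; }
    }

    public void Start(Action work)
    {
        // Atomically flip 0 -> 1. If the previous value was not 0, a search is already running.
        if (Interlocked.CompareExchange(ref _searching, 1, 0) != 0)
            throw new InvalidOperationException("Search already started");

        Thread worker = new Thread(() =>
        {
            try { work(); }
            finally { Interlocked.Exchange(ref _searching, 0); } // always return to idle
        });
        worker.Start();
    }
}

public static class GuardDemo
{
    public static void Main()
    {
        GuardedSearcher searcher = new GuardedSearcher();
        ManualResetEvent release = new ManualResetEvent(false);
        searcher.Start(() => release.WaitOne()); // first "search" blocks until released
        try
        {
            searcher.Start(() => { });           // second call is rejected
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);       // prints "Search already started"
        }
        release.Set();
    }
}
```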

And, the main class itself:

public class AsyncFileFinder
{
    public delegate bool FilterDelegate(FileSearchResult fsearch);

    private bool _Cancelled = false;
    private bool isChild = false;
    private String _SearchDirectory = "";
    private String _SearchMask = "*";

     
    private FilterDelegate FileFilter = null;
    private FilterDelegate DirectoryRecursionFilter = null;
    private Thread SearchThread = null;
    private bool _IsSearching = false;
    private ConcurrentQueue<FileSearchResult> FoundElements = new ConcurrentQueue<FileSearchResult>();

    public event EventHandler<AsyncFileFindCompleteEventArgs> AsyncFileFindComplete;
    public event EventHandler<AsyncFileFoundEventArgs> AsyncFileFound;
    public String SearchDirectory
    {
        get { return _SearchDirectory; }
        set { _SearchDirectory = value; }
    }

    public String SearchMask
    {
        get { return _SearchMask; }
        set { _SearchMask = value; }
    }

    public void Cancel()
    {
        lock (ChildDirectorySearchers)
        {
            foreach (var iterate in ChildDirectorySearchers)
            {
                iterate.Cancel();
            }
        }
        _IsSearching = false;
        _Cancelled = true;
    }

    private void FireAsyncFileFound(AsyncFileFoundEventArgs e)
    {
        lock (this)
        {
            var copied = AsyncFileFound;
            if (copied != null) copied(this, e);
        }
    }

    private void FireAsyncFileFindComplete(AsyncFileFindCompleteEventArgs.CompletionCauseEnum completionCauseEnum)
    {
        var copied = AsyncFileFindComplete;
        if (copied != null) copied(this, new AsyncFileFindCompleteEventArgs(completionCauseEnum));
    }

    public void FireAsyncFileFindComplete()
    {
        FireAsyncFileFindComplete(AsyncFileFindCompleteEventArgs.CompletionCauseEnum.Complete_Success);
    }

    public bool IsSearching
    {
        get { return _IsSearching; }
    }

    public bool HasResults()
    {
        return !FoundElements.IsEmpty;
    }

    public FileSearchResult GetNextResult()
    {
        FileSearchResult ResultItem = null;
        while (!FoundElements.TryDequeue(out ResultItem))
        {
        }
        return ResultItem;
    }

    public AsyncFileFinder(String pSearchDirectory, String pSearchMask, FilterDelegate pFileFilter = null, FilterDelegate pDirectoryRecursionFilter = null,
        bool pIsChild = false)
    {
        _SearchDirectory = pSearchDirectory;
        _SearchMask = pSearchMask;
        FileFilter = pFileFilter;
        DirectoryRecursionFilter = pDirectoryRecursionFilter;
        isChild = pIsChild;
    }

    public void Start()
    {
        //let's default our Filters if they aren't provided.
        if (FileFilter == null) FileFilter = ((s) => true);
        if (DirectoryRecursionFilter == null) DirectoryRecursionFilter = ((s) => true);
        if (IsSearching) throw new SearchAlreadyInProgressException();
        //alright, we want to do this asynchronously, so start StartSync on another thread.
        _IsSearching = true;
        SearchThread = new Thread(StartSync);
        SearchThread.Start();
    }

    private bool FitsMask(string fileName, string fileMask)
    {
        return fileName.Like(fileMask);
    }

    private List<AsyncFileFinder> ChildDirectorySearchers = new List<AsyncFileFinder>();
    private int MaxChildren = 2;

    public void StartSync()
    {
        ChildDirectorySearchers = new List<AsyncFileFinder>();
        Debug.Print("StartSync Called, Searching in " + _SearchDirectory + " For Mask " + _SearchMask);
        String sSearch = Path.Combine(_SearchDirectory, "*");
        Queue<String> Directories = new Queue<string>();
        //Task: 
        //First, Search our folder for matching files and add them to the queue of results.
        Debug.Print("Searching for files in folder");
        NativeMethods.WIN32_FIND_DATA FindData;
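        IntPtr fHandle = NativeMethods.FindFirstFile(sSearch, out FindData);
        //FindFirstFile signals failure with INVALID_HANDLE_VALUE (-1) rather than a null handle.
        if (fHandle == new IntPtr(-1)) fHandle = IntPtr.Zero;
        while (fHandle != IntPtr.Zero)
        {
            if (_Cancelled)
            {
                //close the find handle before bailing out so it doesn't leak.
                NativeMethods.FindClose(fHandle);
                FireAsyncFileFindComplete(AsyncFileFindCompleteEventArgs.CompletionCauseEnum.Complete_Cancelled);
                return;
            }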
            //if the result is a Directory, add it to the list of result directories if it passes the recursion test.
            if ((FindData.dwFileAttributes & FileAttributes.Directory) == FileAttributes.Directory)
            {
                if (FindData.Filename != "." && FindData.Filename != "..")
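                    //build the path from the search directory itself; sSearch still has the "*" wildcard appended.
                    if (DirectoryRecursionFilter(new FileSearchResult(FindData, Path.Combine(_SearchDirectory, FindData.Filename))))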
                    {
                        Debug.Print("Found Directory:" + FindData.Filename + " Adding to Directory Queue.");
                        Directories.Enqueue(FindData.Filename);
                    }
            }
            else if (FindData.Filename.Length > 0)
            {
                //make sure it matches the given mask.
                if (FitsMask(FindData.Filename, _SearchMask))
                {
                    FileSearchResult fsr = new FileSearchResult(FindData, Path.Combine(_SearchDirectory, FindData.Filename));
                    if (FileFilter(fsr) && !_Cancelled)
                    {
                        Debug.Print("Found File " + fsr.FullPath + " Raising Found event.");
                        FireAsyncFileFound(new AsyncFileFoundEventArgs(fsr));
                    }
                }
            }
            FindData = new NativeMethods.WIN32_FIND_DATA();
            if (!NativeMethods.FindNextFile(fHandle, out FindData))
            {
                Debug.Print("FindNextFile returned False, closing handle...");
                NativeMethods.FindClose(fHandle);
                fHandle = IntPtr.Zero;
            }
        }


        //find all directories in the search folder which also satisfy the Recursion test.
        //Construct a new AsyncFileFinder to search within that folder with the same Mask and delegates for each one.
        //Allow MaxChildren to run at once. When a running filefinder raises its complete event, remove it from the List, and start up one of the ones that have not been run.
        //if isChild is true, we won't actually multithread this task at all.

        Debug.Print("File Search completed. Starting search of " + Directories.Count + " directories found in folder " + _SearchDirectory);
        while (Directories.Count > 0 || ChildDirectorySearchers.Count > 0)
        {
            if (_Cancelled)
            {
                break;
            }
            while (ChildDirectorySearchers.Count >= MaxChildren) Thread.Sleep(5);
            //add enough AsyncFileFinders to the ChildDirectorySearchers bag to hit the MaxChildren limit.

            if (Directories.Count == 0)
            {
                Debug.Print("No directories left. Waiting for Child Search instances to complete.");
                Thread.Sleep(5);
                continue;
            }
            Debug.Print("There are " + ChildDirectorySearchers.Count + " Searchers active. Starting more.");
            String startchilddir = Directories.Dequeue();
            startchilddir = Path.Combine(_SearchDirectory, startchilddir);
            AsyncFileFinder ChildSearcher = new AsyncFileFinder(startchilddir, _SearchMask, FileFilter, DirectoryRecursionFilter, true);
            ChildSearcher.AsyncFileFound += (senderchild, foundevent) =>
            {
                AsyncFileFinder source = senderchild as AsyncFileFinder;

                if (!_Cancelled) FireAsyncFileFound(foundevent);
            };
            ChildSearcher.AsyncFileFindComplete += (ob, ev) =>
            {
                AsyncFileFinder ChildSearch = (AsyncFileFinder) ob;
                lock (ChildDirectorySearchers)
                {
                    Debug.Print("Child Searcher " + ChildSearch.SearchDirectory + " issued a completion event, removing from list.");
                    ChildDirectorySearchers.Remove(ChildSearch);
                }
            };

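            //completion handlers remove entries from this list on other threads, so take the same lock when adding.
            lock (ChildDirectorySearchers)
            {
                ChildDirectorySearchers.Add(ChildSearcher);
            }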

            if (!isChild)
            {
                Debug.Print("Starting sub-search asynchronously");
                ChildSearcher.Start();
            }
            else
            {
                Debug.Print("Starting sub-search synchronously");
                ChildSearcher.StartSync();
            }
        }
        Debug.Print("Exited Main Search Loop: Queue:" + Directories.Count + " Child Searchers:" + ChildDirectorySearchers.Count);


        _IsSearching = false;
        FireAsyncFileFindComplete
            (_Cancelled ? AsyncFileFindCompleteEventArgs.CompletionCauseEnum.Complete_Cancelled : AsyncFileFindCompleteEventArgs.CompletionCauseEnum.Complete_Success);
    }
}
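
The loop above polls with `Thread.Sleep(5)` until a child slot frees up. As an alternative sketch, and not what the class above does, a `SemaphoreSlim` (available from .NET Framework 4) can cap the number of top-level worker threads while everything below the first level is walked synchronously. To stay self-contained this uses `Directory.GetFiles` and `Directory.GetDirectories` instead of FindFirstFile, and `ThrottledWalker` and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public class ThrottledWalker
{
    private readonly SemaphoreSlim _slots;
    private readonly Action<string> _onFound;

    public ThrottledWalker(int maxThreads, Action<string> onFound)
    {
        _slots = new SemaphoreSlim(maxThreads, maxThreads);
        _onFound = onFound; // may be invoked from several threads at once
    }

    // Reports files in root on the calling thread, then hands each first-level
    // subdirectory to its own thread, with at most maxThreads running at a time.
    public void Walk(string root)
    {
        foreach (string file in Directory.GetFiles(root)) _onFound(file);

        List<Thread> workers = new List<Thread>();
        foreach (string dir in Directory.GetDirectories(root))
        {
            _slots.Wait();          // blocks while maxThreads workers are busy
            string captured = dir;  // copy for the closure (needed before C# 5)
            Thread worker = new Thread(() =>
            {
                try { WalkSync(captured); }
                finally { _slots.Release(); }
            });
            workers.Add(worker);
            worker.Start();
        }
        foreach (Thread started in workers) started.Join();
    }

    // Plain recursive walk; never starts threads.
    private void WalkSync(string dir)
    {
        try
        {
            foreach (string file in Directory.GetFiles(dir)) _onFound(file);
            foreach (string sub in Directory.GetDirectories(dir)) WalkSync(sub);
        }
        catch (UnauthorizedAccessException)
        {
            // skip directories we are not allowed to read
        }
        catch (IOException)
        {
            // skip directories that vanished or could not be read mid-walk
        }
    }
}

public static class WalkerDemo
{
    public static void Main()
    {
        int count = 0;
        ThrottledWalker walker = new ThrottledWalker(2, path => Interlocked.Increment(ref count));
        walker.Walk(Path.GetTempPath());
        Console.WriteLine("Found " + count + " files");
    }
}
```

Blocking on the semaphore avoids waking up every few milliseconds just to check a count, at the cost of making cancellation something that would have to be layered on separately.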

OK, so it ended up a bit longer than I originally expected. The Debug statements use the Debug class I discussed previously. I’ve plonked the whole thing into a new FileIterator repository. This was totally not because I couldn’t figure out how to change the existing AsyncFileIterator repository to accept my changes; that was just a coincidence.

Now that it actually works, I’ll be able to properly revise it to take advantage of C# 5.0 and C# 6.0 features and discuss those improvements to the project in a later post.
