Asynchronous, Queue-based File Iterators
30 Jun 2013 @ 7:11 AM


Embarrassingly, I came back to the FileIterator for other reasons only to discover, much to my horror, that it doesn’t work! It seems the work queue empties far too quickly, or there are other issues with the program that I’ve been unable to identify. The cause of the problem is unclear; I suspect I will be “forced” to rewrite it entirely. If anybody sees what might be causing it, please let me know so I can try to address it!

In a previous post, I looked at using P/Invoke to call the Win32 FindFirstFile, FindNextFile, and FindClose functions directly to perform a file search and return the results through an Enumerator Method.

Here, we will look at what we can do to create a fully asynchronous search: one that lets results be processed while the search is still running, and that makes the best use of the available processors by evaluating multiple elements at the same time.

The first thing to note is that this won’t be an implementation of an interface. That is a large disadvantage in terms of usability, but a wrapper could be created to provide IEnumerable<T> capabilities, so it isn’t something that should stop us.

The methodology I’ve thought of for this particular implementation is to use a background worker. When a search is started, the class will spin off that background worker and then proceed to perform its search, calling the appropriate filter delegate for each result; if the resulting file is determined to “match”, it is added to a ConcurrentQueue. That ConcurrentQueue is examined intently by the BackgroundWorker, which only allows itself to exit once the Queue is empty and the search is complete. This protects against the case where the Search logic that adds items to the Queue isn’t able to keep up with the BackgroundWorker dequeuing them, leaving the Queue momentarily empty before the search has finished.
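A minimal sketch of that hand-off, under some loud assumptions: it uses a Task in place of the BackgroundWorker, plain strings in place of real search results, and the names QueueSearchSketch, Run, produce and handle are all made up for illustration. It is not the code from the repo.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class QueueSearchSketch
{
    // 'produce' stands in for the file search: it calls the supplied
    // Action once per matching item. 'handle' is the per-result callback.
    public static void Run(Action<Action<string>> produce, Action<string> handle)
    {
        var queue = new ConcurrentQueue<string>();
        int searchDone = 0; // 0 = still searching, 1 = search finished

        Task consumer = Task.Run(() =>
        {
            while (true)
            {
                // Read the flag BEFORE trying to dequeue (see the note below).
                bool done = Volatile.Read(ref searchDone) == 1;
                string item;
                if (queue.TryDequeue(out item))
                    handle(item);
                else if (done)
                    break;           // empty, and nothing more can arrive
                else
                    Thread.Sleep(1); // only temporarily empty; wait for more
            }
        });

        try { produce(queue.Enqueue); }
        finally { Volatile.Write(ref searchDone, 1); }
        consumer.Wait();
    }
}
```

The order of the two checks matters in this sketch. If the worker tested the "search complete" flag only after a failed dequeue, the search could enqueue its last items and set the flag in between, and the worker would exit with items still in the queue. Polling with Thread.Sleep is crude but keeps the example short.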

We want to allow extensibility and custom filtering, both for determining whether a File counts as a “match” and for deciding whether a Directory should be recursed into. Overloads that supply their own definitions of the more complex delegates can be added later, but we want to aim for a core level of functionality on which to build that simpler syntax.

The delegates

While we could feasibly use Predicate<> types as the parameters to the constructor of our search class for filtering, we will instead create specific delegates so we can more fully document the parameters and purpose of the delegates in XML documentation.
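As a rough illustration of what documented delegates buy us over a bare Predicate<T>, here is a sketch. The delegate names, and the members of the stand-in FileSearchResult class, are assumptions made for this example rather than the actual declarations.

```csharp
using System;

/// <summary>Hypothetical stand-in for the search result type.</summary>
public sealed class FileSearchResult
{
    public string FullPath;   // assumed member
    public bool IsDirectory;  // assumed member
    public long Size;         // assumed member
}

/// <summary>Decides whether a file found by the search counts as a match.</summary>
/// <param name="result">Details of the file that was just found.</param>
/// <returns>true to report the file to the caller; false to skip it.</returns>
public delegate bool FileMatchFilter(FileSearchResult result);

/// <summary>Decides whether the search should descend into a directory.</summary>
/// <param name="directory">Details of the directory that was just found.</param>
/// <returns>true to search inside the directory; false to skip it.</returns>
public delegate bool DirectoryRecurseFilter(FileSearchResult directory);
```

A caller can still pass a lambda, for example `FileMatchFilter bigFiles = r => r.Size > 1024;`, but IntelliSense now shows what the parameter and return value mean.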

Since we will be dealing with the Win32 Find API, the appropriate declarations are a must. The required declarations are the WIN32_FIND_DATA struct, the constants MAX_PATH and MAX_ALTERNATE, the ERROR_NO_MORE_FILES error code, and the FindFirstFile, FindNextFile, and FindClose API functions. We place these in the conventional NativeMethods class.
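For reference, these declarations are commonly written along the following lines. This is a sketch of the usual Unicode form, not necessarily identical to what the project uses; INVALID_HANDLE_VALUE is an extra convenience added here.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes; // FILETIME

internal static class NativeMethods
{
    public const int MAX_PATH = 260;
    public const int MAX_ALTERNATE = 14;
    public const int ERROR_NO_MORE_FILES = 18;
    // Returned by FindFirstFile on failure (added here for convenience).
    public static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool FindClose(IntPtr hFindFile);
}
```

With SetLastError = true, Marshal.GetLastWin32Error() can be compared against ERROR_NO_MORE_FILES after FindNextFile returns false to tell a normal end of listing from a real failure.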

This also adds a few helpers. The actual search will be given two delegates, and we want to be able to pass this information to them, so we will wrap the WIN32_FIND_DATA in a “FileSearchResult” class:

The purpose of this class is simple: First, we want to avoid instantiating a FileSystemInfo object if possible. The delegate method may not need that extra information, so we won’t create it ourselves. Instead, we provide easier wrappers around the WIN32_FIND_DATA, which includes wrapping the FILETIME structures and exposing them as the ‘everyday’ DateTime class type, which is far more familiar.
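The FILETIME wrapping mentioned above boils down to stitching two 32-bit halves back into one 64-bit value. A small sketch of that conversion follows; the class and method names are hypothetical, and the real FileSearchResult may expose this differently.

```csharp
using System;
using System.Runtime.InteropServices.ComTypes; // FILETIME

public static class FileTimeConversion
{
    // A FILETIME is a 64-bit count of 100-nanosecond intervals since
    // 1601-01-01 UTC, split into a high and a low 32-bit half.
    public static DateTime ToDateTimeUtc(FILETIME ft)
    {
        // Cast the low half to uint so it is not sign-extended.
        long ticks = ((long)ft.dwHighDateTime << 32) | (uint)ft.dwLowDateTime;
        return DateTime.FromFileTimeUtc(ticks);
    }

    // The file size arrives the same way, as nFileSizeHigh/nFileSizeLow.
    public static long ToFileSize(uint high, uint low)
    {
        return ((long)high << 32) | low;
    }
}
```

Call ToLocalTime() on the result if local time is wanted instead of UTC.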

And there it is. It even has the IEnumerable implementation, as a static method. Here’s a short program that uses it:

As it is, there are some improvements I can think of. One would involve using Multicast Delegates rather than the basic Delegates, which is a fancy way of saying “use events, fool!”. But at the same time, that would probably fit more appropriately as a subclass which provides its own delegate implementations that instead call into events it declares, or something to that effect.

The full source to this Project can be found here, in a small GitHub repo I created for it. Who knows, maybe it will mutate into a File Search API to rival “BCFile”, but for .NET.

Posted By: BC_Programming
Last Edit: 21 May 2015 @ 11:02 PM

Categories: C#, Programming
Iterator Methods and System.IO
04 Jun 2013 @ 11:45 PM

One of the more interesting omissions from .NET so far has been that the System.IO namespace has not been updated to take advantage of new Language and CLR features. One of the more prominent features that I can think of would be Enumerator methods.

An Iterator method (known by other names in other languages, such as generators in Python) is essentially a construct that allows a method to yield both a result value and flow control back to its caller. For example:

will give us this output:
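A minimal sketch of such an iterator method, together with the interleaved output it produces. The names IteratorDemo, CountTo and Run are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorDemo
{
    // Each 'yield return' hands one value to the caller and pauses here
    // until the caller asks for the next one.
    public static IEnumerable<int> CountTo(int limit)
    {
        for (int i = 1; i <= limit; i++)
        {
            Console.WriteLine("yielding " + i);
            yield return i;
        }
    }

    public static void Run()
    {
        foreach (int value in CountTo(3))
            Console.WriteLine("received " + value);
        // Prints:
        //   yielding 1
        //   received 1
        //   yielding 2
        //   received 2
        //   yielding 3
        //   received 3
    }
}
```

The two kinds of lines alternate because control bounces between the loop in CountTo and the foreach in Run; nothing is computed ahead of the caller asking for it.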

For comparison, the equivalent VB.NET Code:

Of course this is a rather simple example. The real power of this language feature comes when it is exploited to save both processing power and programming time.

Dealing with files is almost inevitable for desktop software, even though Windows Phone and RT Applications have access to various alternate storage techniques (in fact, actual file access is forbidden there for security reasons, to my understanding). The System.IO namespace carries an interesting historical artifact here: it dates from before the introduction of Iterator Methods, and as a result many of its methods are more monolithic. For example, if you want to find a certain file in a directory of 100,000 files, you need to call a method such as DirectoryInfo.GetFiles, which returns every file in that directory as a 100,000 item array, and then search that array. Ideally, you would be able to search through the directory one file at a time, stopping the iteration when you find the file you need. But as it stands, even if the file you wanted happened to be the first item in the array, you still get back a 100,000 item array containing every item in that folder.

Of course, the nice thing about .NET is that it provides a wealth of language features that you can exploit for your own nefarious completely positive and happy rainbow purposes. This is one good example.

Creating an Iterator for Files

In order to create a File Iterator, we are going to have to use P/Invoke to call the File API provided by Windows, which comes in the form of the FindFirstFile, FindNextFile, and FindClose API functions. The first step is to get the proper P/Invoke declarations for these functions into a C# class. An absolutely invaluable resource for this is P/Invoke. In order to go for the best organization, I opted to put these declarations in a separate class, which I called “FSAPI” because I didn’t think calling it Timothy or Pauline seemed particularly appropriate:

With the declarations in place, we only need to use them. The FileFindData struct is not actually used by the P/Invoked API; instead, we intend to use it to pass more information back to the caller. In particular, we want to have recursive capability, and the WIN32_FIND_DATA structure doesn’t lend itself well to that, since the cFileName field is very likely to be too short for many full paths. It’s also not exactly good form to start smacking about with structures like this; it’s sort of like going to a Jewish or Islamic festival and throwing ham at everyone.

One catch with finding files recursively is that the file mask you pass into the Find*File API applies to everything it finds; if you specify a mask, it will not find directories unless those directories happen to match that mask. To deal with this, we will pass in * as the mask to get all entries, and then filter the results ourselves using a Regular Expression. “Oh no, not that,” you say. It’s not really changing much, however; the standard file mask syntax can be considered a limited subset of a Regular Expression, so we should be able to get by with a few replacements:
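One way those replacements might look, as a sketch: escape the mask so that characters like . and + are taken literally, then turn the two wildcards back into their regex equivalents. The FitsMask signature shown here (file name first, mask second) and the case-insensitive matching are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileMask
{
    // Translates a file mask such as "*.txt" or "data??.csv" into an anchored regex.
    public static Regex ToRegex(string mask)
    {
        string pattern = "^"
            + Regex.Escape(mask)
                   .Replace(@"\*", ".*")  // * : any run of characters
                   .Replace(@"\?", ".")   // ? : exactly one character
            + "$";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    // True if fileName (a bare name, not a full path) matches the mask.
    public static bool FitsMask(string fileName, string mask)
    {
        return ToRegex(mask).IsMatch(fileName);
    }
}
```

This does not reproduce every quirk of the native matcher. For instance, Windows treats *.* as matching files that have no extension, whereas this translation requires a literal dot in the name.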

With that replacement in place, we should be able to use FitsMask() to determine if a File result matches a given file mask. In fact, we could replace the basic logic with a delegate method for testing both directories as well as files to see if they should be yielded, but we can add that after we get the basic method completed.

The core logic is in retrieving Files and Directories from a single Directory, using an Iterator method, naturally:
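A sketch of what that single-directory iterator can look like. The class and method names are hypothetical, the declarations are repeated so the block stands alone, and the choices to skip the "." and ".." entries and to yield nothing on failure are assumptions of this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes; // FILETIME

public static class DirectoryLister
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA
    {
        public uint dwFileAttributes;
        public FILETIME ftCreationTime, ftLastAccessTime, ftLastWriteTime;
        public uint nFileSizeHigh, nFileSizeLow, dwReserved0, dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)] public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATA data);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // Lazily yields every entry (files and directories) directly inside 'directory'.
    public static IEnumerable<WIN32_FIND_DATA> Enumerate(string directory)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(Path.Combine(directory, "*"), out data);
        if (handle == InvalidHandle)
            yield break; // missing or inaccessible directory: yield nothing

        try
        {
            do
            {
                // Skip the self and parent pseudo-entries.
                if (data.cFileName != "." && data.cFileName != "..")
                    yield return data;
            } while (FindNextFile(handle, out data)); // false: no more entries, or an error
        }
        finally
        {
            // Runs even if the caller stops enumerating early and disposes the enumerator.
            FindClose(handle);
        }
    }
}
```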

Some notes: this just uses the Find*File API functions, iterates, yield returns each result, and closes the find handle if necessary in a finally block. We will build the extra functionality on top of this.

The logic here is basically to go through all the files in the given folder; if it finds a directory and the recursive argument is true, it will recursively call itself on that subdirectory. If it’s a file, it will yield a FileFindData structure to the caller. However, we can improve this by abstracting both decisions, whether to recurse into a directory and whether a file matches the filter, into delegate parameters:
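The shape of that delegate-driven recursion can be sketched independently of the Win32 calls. In the sketch below the single-directory listing is itself passed in as a delegate so the example stands alone; the Entry class is a hypothetical stand-in for FileFindData, and all names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the FileFindData structure.
public sealed class Entry
{
    public string FullPath;
    public bool IsDirectory;
}

public static class RecursiveSearch
{
    // list:          returns the immediate children of one directory
    // fileFilter:    true if a file should be yielded to the caller
    // recurseFilter: true if a directory should be searched
    public static IEnumerable<Entry> Find(
        string root,
        Func<string, IEnumerable<Entry>> list,
        Predicate<Entry> fileFilter,
        Predicate<Entry> recurseFilter)
    {
        foreach (Entry entry in list(root))
        {
            if (entry.IsDirectory)
            {
                if (!recurseFilter(entry))
                    continue; // caller asked us not to descend here

                // Re-yield everything found further down the tree.
                foreach (Entry nested in Find(entry.FullPath, list, fileFilter, recurseFilter))
                    yield return nested;
            }
            else if (fileFilter(entry))
            {
                yield return entry;
            }
        }
    }
}
```

Because the whole chain is lazy, a caller that breaks out of its foreach after the first hit never lists the directories that come later.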

And there you have it! This could be expanded to provide Extension Methods on DirectoryInfo that return Directories and Files if necessary. One idea might be to change the delegate to accept the actual WIN32_FIND_DATA structure for filtering purposes, making it easier to filter on more of the File’s various properties.

Posted By: BC_Programming
Last Edit: 04 Jun 2013 @ 11:48 PM

Categories: C#, Programming
