Most application frameworks and languages provide access to the command-line parameters passed to your application. Generally they are handed to you as either a single string or an array of strings. What you do not get automatically is the functionality to parse out switches.
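In .NET, for instance, both forms are available. The snippet below is only an illustration: `CommandLineForms` and `Describe` are names made up for this example, while `Environment.CommandLine` and `Environment.GetCommandLineArgs()` are the framework's own members.

```csharp
using System;

// Hypothetical helper, named for this example only.
public static class CommandLineForms
{
    // Returns the raw command line and the pre-split arguments side by side.
    public static string Describe()
    {
        // The whole command line as one string, program name included.
        string raw = Environment.CommandLine;

        // The same information split into an array; element 0 is the program name.
        string[] parts = Environment.GetCommandLineArgs();

        return "raw: " + raw + Environment.NewLine +
               "split: " + string.Join(" | ", parts);
    }
}
```

Neither form tells you which pieces are switches and which are plain arguments; that part is left to the application.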
Command-line parameters used to be the only way to communicate with a program. Fundamentally, the command line was your UI into the program. Different platforms took different approaches. Unix-like systems typically take the “invasive” route: the shell replaces wildcards and then passes the resulting command line to the application. This means that you don’t have to do any expansion of wildcards yourself (shell expansion, as it is known), but you have to account for the fact that your command line could include a lot of files. Windows leaves the wildcard alone and hands it to the program as typed. It’s a trade-off, really. Either way, I figured that for the purposes of this library we could stick to the platform behaviour: if the program is run with a wildcard, you’ll see the wildcard on Windows, but it will have been expanded if you run the same program on Linux. It might be worth adding an option to “auto-expand” wildcards, just for consistency’s sake, but that seems like a post for another day.
Either way, most applications also include flags and switches. This is more of a de facto standard that has cropped up; there is no hard and fast rulebook about what flags and switches are or how you are supposed to pass arguments, which can cause no end of confusion when it comes to reading application documentation. .NET just gives you the string (or the array) and leaves it up to you to decide how to interpret it. Some languages, such as Python, ship with a library that parses the command line for you. C# doesn’t come with such a class. So let’s make one!
First we need to determine what exactly can exist in a command line. My method allows for two things: Switches, and arguments. A Switch can include an argument, separated from the switch with a colon. For example:
```
someprogram.exe /switch:argument /sw:"file 1.txt" "filename.txt" /doall
```
In this case, we have three switches: switch, sw, and doall. The first two include an argument. My “syntax” allows for quotes in the arguments of switches as well as in the “loose” arguments. We will evidently need a class to represent and parse Arguments, and another one for Switches. The parsing can be done sequentially. Although it’s not a recommended best practice, I chose to use by-reference parameters in the class constructors. In order to keep things generic and accessible, both Switches and Arguments will derive from a CommandLineElement abstract class, which will force each derived class to implement ToString(). The ArgumentItem class will be used for parsing both “loose” arguments and arguments found after a switch.
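The base class itself is not listed in this post, so the following is only a guess at its shape. In C#, one way to force derived classes to supply ToString() is to re-declare it as `abstract override`. `LiteralElement` is a hypothetical derived class that exists purely to show the contract.

```csharp
using System;

// Assumed shape of the base class; the post does not show its source.
public abstract class CommandLineElement
{
    // Re-declaring ToString() as abstract forces every derived class to override it.
    public abstract override string ToString();
}

// Hypothetical derived class, only here to show the contract in action.
public sealed class LiteralElement : CommandLineElement
{
    private readonly string _text;

    public LiteralElement(string text)
    {
        _text = text;
    }

    // Without this override the class would not compile.
    public override string ToString()
    {
        return _text;
    }
}
```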
Arguments
Arguments are simple: if the character at the starting position is a quote, we look for the next quote that isn’t doubled up. Otherwise, we look for the next delimiter (a space or a forward slash in this implementation) or the end of the string. Each argument only needs the actual argument value.
```csharp
using System;
using System.Linq;

/// <summary>
/// Represents an Argument. This includes arguments that exist bare on the command line as well as arguments used with a given switch.
/// </summary>
public class ArgumentItem : CommandLineElement
{
    public static readonly ArgumentItem Empty = new ArgumentItem();
    private String _Argument;

    protected internal ArgumentItem()
    {
        _Argument = "";
    }

    /// <summary>
    /// Construct an Instance from the given string, assuming the start of an Argument element at the given position.
    /// </summary>
    /// <param name="strParse">Command Line to parse.</param>
    /// <param name="Position">Location to start. This variable will be updated with the ending position of the argument that was discovered upon return.</param>
    public ArgumentItem(String strParse, ref int Position)
    {
        int sloc = Position;
        //skip leading whitespace without running off the end of the string.
        while (sloc < strParse.Length && char.IsWhiteSpace(strParse[sloc])) sloc++;
        int startpos = sloc; //the argument proper starts after the whitespace.
        if (sloc < strParse.Length && strParse[sloc] == '"')
        {
            sloc++;
            while (sloc < strParse.Length)
            {
                bool isQuote = strParse[sloc] == '"';
                //a doubled quote ("") is an escaped quote inside the argument.
                bool doublequote = isQuote && sloc + 1 < strParse.Length && strParse[sloc + 1] == '"';
                //if we found a quote and it's not a double quote, the argument ends here.
                if (isQuote && !doublequote)
                {
                    sloc++;
                    break;
                }
                if (doublequote) sloc++; //add an extra spot for the dual quote.
                sloc++;
            }
        }
        else
        {
            sloc = strParse.IndexOfAny(new char[] { '/', ' ' }, sloc);
            //no delimiter found: the argument runs to the end of the string.
            if (sloc < 0) sloc = strParse.Length;
        }
        _Argument = strParse.Substring(startpos, sloc - startpos);
        Position = sloc;
    }

    /// <summary>
    /// returns the Argument this Object represents. This will include quotation marks if they were used in the originally parsed string.
    /// </summary>
    public String Argument
    {
        get { return _Argument; }
    }

    /// <summary>
    /// implicitly converts an ArgumentItem to a String.
    /// </summary>
    /// <param name="value">ArgumentItem to implicitly convert.</param>
    /// <returns>the result from calling Chomp() on the given instance.</returns>
    public static implicit operator String(ArgumentItem value)
    {
        return value.Chomp();
    }

    /// <summary>
    /// returns the Argument value. If it starts with and ends with quotation marks, they will be removed.
    /// </summary>
    /// <returns>the Argument value without its surrounding quotation marks.</returns>
    public String Chomp()
    {
        //the length check keeps a lone quote character from producing a negative length.
        if (_Argument.Length >= 2 && _Argument.StartsWith("\"") && _Argument.EndsWith("\""))
            return _Argument.Substring(1, _Argument.Length - 2);
        else
            return _Argument;
    }

    public override string ToString()
    {
        //work from the unquoted value so that existing quotes are not wrapped a second time.
        String value = Chomp();
        if (value.Any(Char.IsWhiteSpace))
            return "\"" + value + "\"";
        else
            return value;
    }
}
```
The constructor is where the important stuff happens. The by-reference parameter defines the starting position, and the constructor updates it before returning so that it points at the character after the argument. The class also defines a static implicit conversion to String.
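To see why the by-reference position is convenient, here is a stripped-down sketch of the same cursor pattern. `CursorDemo`, `ReadToken` and `ReadAll` are hypothetical names, and `ReadToken` is a much simpler stand-in for the ArgumentItem constructor: it only splits on whitespace and knows nothing about quotes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; not part of the library described in the post.
public static class CursorDemo
{
    // Reads one whitespace-delimited token starting at pos and
    // advances pos to the character just past that token.
    public static string ReadToken(string text, ref int pos)
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
        int start = pos;
        while (pos < text.Length && !char.IsWhiteSpace(text[pos])) pos++;
        return text.Substring(start, pos - start);
    }

    // Walks the whole string, letting each call move the shared cursor.
    public static List<string> ReadAll(string text)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            string token = ReadToken(text, ref pos);
            // Trailing whitespace yields an empty token, which is skipped.
            if (token.Length > 0) tokens.Add(token);
        }
        return tokens;
    }
}
```

The caller never has to work out how long each element was; it simply hands the same position variable to whichever parser comes next.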
Now that we have the Argument class, we can define the Switch class. The actual syntax of switches often depends on the application but also seems to depend on the platform. For example, Linux tools favour a single hyphen for single-letter flags and double hyphens for multi-character flags (switches are also called flags), and the forward slash is not generally used there as a switch or flag indicator. Windows platforms prefer the forward slash but generally allow for single hyphens as well. We aim to support all three syntaxes, so that the client application does not have to worry about which one was used. We also add support for arguments; a switch can be specified as such:
```
someprogram /d:argument.exe
```
The element after the colon will be parsed as an argument and attached to the switch itself. But enough waffling- on to the Switch:
```csharp
using System;
using System.Linq;

public class Switch : CommandLineElement
{
    //"--" is listed before "-" so that the longer preceder is matched first.
    internal static String[] SwitchPreceders = new string[] { "--", "-", "/" };
    private ArgumentItem _Argument = ArgumentItem.Empty;
    private String _SwitchValue;

    public Switch(String strParse, ref int StartLocation)
    {
        //skip leading whitespace, staying inside the string.
        while (StartLocation < strParse.Length && char.IsWhiteSpace(strParse[StartLocation]))
            StartLocation++;
        var sLoc = StartLocation;
        var retrieved = SwitchPreceders.FirstOrDefault((s) =>
            !(sLoc + s.Length > strParse.Length) &&
            strParse.Substring(sLoc, s.Length).Equals(s, StringComparison.OrdinalIgnoreCase));
        if (retrieved == null)
        {
            throw new ArgumentException("Passed String " + strParse +
                " Does not have a switch preceder at position " + StartLocation);
        }
        sLoc += retrieved.Length; //we don't want the switch preceder itself.
        //now we need to determine where the Switch ends. colon or space seems reasonable.
        //If a colon, the next entity will be an argument.
        var NextSpace = strParse.IndexOfAny(new char[] { ' ', '\t', '/', ':' }, sLoc);
        //no terminator found: the switch name runs to the end of the string.
        if (NextSpace < 0) NextSpace = strParse.Length;
        _SwitchValue = strParse.Substring(sLoc, NextSpace - sLoc);
        //if the char at NextSpace is a Colon...
        if (NextSpace < strParse.Length && strParse[NextSpace] == ':')
        {
            //interpret what follows as an argument.
            NextSpace++;
            _Argument = new ArgumentItem(strParse, ref NextSpace);
        }
        StartLocation = NextSpace;
    }

    public String SwitchValue
    {
        get { return _SwitchValue; }
    }

    public String Argument
    {
        get { return _Argument; }
    }

    //Determines whether a switch preceder starts at the given location.
    public static bool SwitchAtPos(String strParse, int Location)
    {
        var retrieved = SwitchPreceders.FirstOrDefault((s) =>
            !(Location + s.Length > strParse.Length) &&
            strParse.Substring(Location, s.Length).Equals(s, StringComparison.OrdinalIgnoreCase));
        return retrieved != null;
    }

    public override string ToString()
    {
        //only append the colon when there is an argument to show.
        return "/" + _SwitchValue + (HasArgument() ? ":" + _Argument.ToString() : "");
    }

    public bool HasArgument()
    {
        //true when an argument was actually parsed for this switch.
        return _Argument.Argument.Length > 0;
    }
}
```
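One detail worth calling out is the order of the entries in SwitchPreceders. Because the first match wins, the two-character "--" has to be tried before "-", otherwise a switch such as --verbose would be read as a single-hyphen switch whose name starts with a hyphen. The sketch below isolates that lookup. `PrefixDemo` and `PrefixAt` are hypothetical names, and the comparison is simplified to an ordinal match.

```csharp
using System;
using System.Linq;

// Hypothetical demo class isolating the prefix lookup.
public static class PrefixDemo
{
    // Same candidates as SwitchPreceders, longest first.
    private static readonly string[] Prefixes = { "--", "-", "/" };

    // Returns the first prefix found at position pos, or null if none matches.
    public static string PrefixAt(string text, int pos)
    {
        return Prefixes.FirstOrDefault(p =>
            pos + p.Length <= text.Length &&
            text.Substring(pos, p.Length) == p);
    }
}
```

With this ordering, `PrefixAt("--verbose", 0)` yields "--" while `PrefixAt("-v", 0)` yields "-", and a plain file name yields null.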
With the basic parsing logic completed, we need to consider how we want this to be used. The best way is to think about how we would like the calling code to look:
```csharp
CmdParser cp = new CmdParser();
if (cp.HasSwitch("f") && cp["f"].HasArgument())
    _usefilename = cp["f"].Argument;
```
Some basic take-aways from this. First, the core parser object needs to provide an indexer. In the above example, it accesses switches by name. Other possibilities include using direct numeric indexes to refer to any argument, much like you would access elements in the framework-provided args[] String array. Another possibility is to have the Argument of a switch auto-populate, rather than be null, when accessed:
```csharp
_usefilename = cp.HasSwitch("f") ? cp["f"].Argument : _usefilename;
```
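To make that calling pattern concrete, here is one way such a parser object could be put together. This is not the CmdParser used above, whose source is not shown in this post. `SimpleCmdParser` and `SwitchInfo` are hypothetical, simplified stand-ins: the sketch assumes the arguments are already split (as in `Main(string[] args)`) and stores switch arguments as plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the Switch class.
public sealed class SwitchInfo
{
    public string Name { get; private set; }
    public string Argument { get; private set; }

    public SwitchInfo(string name, string argument)
    {
        Name = name;
        Argument = argument ?? "";
    }

    public bool HasArgument() { return Argument.Length > 0; }
}

// Hypothetical parser facade exposing HasSwitch and a string indexer.
public sealed class SimpleCmdParser
{
    private static readonly string[] Prefixes = { "--", "-", "/" };
    private readonly Dictionary<string, SwitchInfo> _switches =
        new Dictionary<string, SwitchInfo>(StringComparer.OrdinalIgnoreCase);
    private readonly List<string> _loose = new List<string>();

    // Assumes pre-split arguments rather than the raw command-line string.
    public SimpleCmdParser(IEnumerable<string> args)
    {
        foreach (string arg in args)
        {
            string prefix = Array.Find(Prefixes,
                p => arg.StartsWith(p, StringComparison.Ordinal));
            if (prefix == null)
            {
                _loose.Add(arg); // no switch preceder: a "loose" argument.
                continue;
            }

            string body = arg.Substring(prefix.Length);
            int colon = body.IndexOf(':');
            string name = colon < 0 ? body : body.Substring(0, colon);
            string value = colon < 0 ? "" : body.Substring(colon + 1).Trim('"');
            _switches[name] = new SwitchInfo(name, value);
        }
    }

    public bool HasSwitch(string name) { return _switches.ContainsKey(name); }

    // Returns an argument-less placeholder instead of null for unknown switches.
    public SwitchInfo this[string name]
    {
        get
        {
            SwitchInfo found;
            return _switches.TryGetValue(name, out found)
                ? found
                : new SwitchInfo(name, "");
        }
    }

    public IList<string> LooseArguments { get { return _loose; } }
}
```

Returning a placeholder from the indexer means `cp["f"].Argument` is safe to read even when the switch was never passed. One caveat of treating "/" as a preceder in this sketch is that an absolute Unix path given as a loose argument would be mistaken for a switch.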