This is part of a occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are Articles written regarding Python,C#, Java and VB6 (merged for some reason),Scala,F# & Ruby,Perl, Delphi, PHP,C++,Haskell,D, VB.NET, QuickBASIC, and Rust
In order to compare various languages, I will be implementing a anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the C# entrant into that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach of looking at every word and comparing it to every other word is a dismal one. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap with that key. A psuedocode representation:
- Loop through all words. For each word:
- create a new word by sorting the letters of the word.
- use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
- When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
Implemented in C#, the described algorithm might look like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
using System; using System.Collections.Generic; using System.Linq; using System.Runtime.InteropServices; using System.Text; using System.IO; namespace anagramsx { class Program { private static IEnumerable<String> readlines(String filename) { return new StreamReader(filename).ReadToEnd().ToLower().Split('\n'); } static String sortchars(String sumstring) { List<char> sortme = sumstring.ToList(); sortme.Sort((a, b) => a.CompareTo(b)); return new string(sortme.ToArray()); } static void Main(string[] args) { const string readfile = "D:\\dict.txt"; Dictionary<String, List<String>> anagrams = new Dictionary<string, List<string>>(); DateTime starttime = DateTime.Now; foreach (String line in readlines(readfile)) { String sorted = sortchars(line); if (!anagrams.ContainsKey(sorted)) anagrams.Add(sorted, new List<string>()); anagrams[sorted].Add(line); } Console.WriteLine("Time:" + (DateTime.Now-starttime).TotalSeconds.ToString() + " Seconds."); } } } |
One of the contributing factors to the amount of code here is the creation of a method to sort characters in a string as well as availing the use of foreach() with a stream contributes to this. Much of the size (compared to Python) can be attributed to the different typing of the language, too.
In Visual Studio’s default debug mode, all optimizations are disabled and the IDE hooks a lot of the activity of the program itself. My initial run took over 20 seconds to complete. I disabled the Visual Studio hosting process and activated the Release build, as well as changing the target platform from x86 to “Any CPU”. This reduced the time from 20 seconds to around 10 seconds; changing the implementation of the iterator method to read the entire file at once rather than one line at a time reduced it to ~0.8 seconds.
This might actually provide some insight into why so many detractors seem to have such an easy time thinking C# is slow. They will usually start up Visual Studio, make a quick test program, run it, and find it very slow, and then write off the language as slow, almost dismissively. This, without realizing that the language itself is not slow, but debugging a program tends to put a damper on things; what with the lack of optimizations in order to allow for better RTTI for the debugger.
Have something to say about this post? Comment!