This is part of a occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are Articles written regarding Python,C#, Java and VB6 (merged for some reason),Scala,F# & Ruby,Perl, Delphi, PHP,C++,Haskell,D, VB.NET, QuickBASIC, and Rust
In order to compare various languages, I will be implementing a anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the VB.NET entrant into that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach of looking at every word and comparing it to every other word is a dismal one. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap with that key. A psuedocode representation:
- Loop through all words. For each word:
- create a new word by sorting the letters of the word.
- use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
- When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
The Visual Basic.NET implementation I came up with looks like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
Imports System.IO Imports System.Text Imports System.Collections.Generic Module anagrams 'Anagrams Application 'Written By: Michael Burgwin for http://bc-programming/blogs 'entry into Anagrams Performance/language comparisons. Public Function SortChars(sumstring As String) As String Dim sortme = sumstring.ToList() Dim sorted = From f In sumstring Order By Char.ToLower(f) Return New String(sorted.ToArray()) End Function Sub Main(Args As String()) Dim Anagrams As New Dictionary(Of String, List(Of String)) Dim usefilename As String = "D:\dict.txt" Dim StartTime = Now Using fDictionary As New StreamReader(New FileStream(usefilename, FileMode.Open)) Do While Not fDictionary.EndOfStream Dim LineRead = fDictionary.ReadLine().ToLower() 'get sorted version. Dim SortedVersion = SortChars(LineRead) If Not Anagrams.ContainsKey(SortedVersion) Then Anagrams.Add(SortedVersion, New List(Of String)) End If Anagrams(SortedVersion).Add(LineRead) Loop End Using Dim Duration = Now - StartTime Console.WriteLine("Calculation Time:" & Duration.ToString()) Console.ReadKey() End Sub End Module |
Performance? Compiling this with vbc /optimize /debug- resulted in an average run-time of around 1.3 seconds. That’s a bit over twice as slow as C#. So, some C# pundits might go, “Ha! Well that proves C# is faster than C#, doesn’t it, Case closed we can all go home”
Not Exactly.
The slowdown, I suspect, is from the use of Linq:
1 2 3 4 5 |
Public Function SortChars(sumstring As String) As String Dim sortme = sumstring.ToList() Dim sorted = From f In sumstring Order By f Return New String(sorted.ToArray()) End Function |
This is different from the C# implementation of the same method, which uses the Sort() method of List
1 2 3 4 5 |
Public Function SortChars(sumString As String) As String Dim sortme = sumString.ToList() sortme.Sort(Function(ca, cb) ca.CompareTo(cb)) Return New String(sortme.ToArray()) End Function |
It now runs in 1.1 seconds. But it turns out this is partly due to a bug in the C# implementation, since it didn’t account for mixed case comparisons. The VB.NET implementation I have lowercases each string input. the C# version uses an iterator method; even changing the source to take account of casing, it still runs in 0.7 seconds, so we have some ground to make up. Let’s start by re-implementing the iterator method from the C# version. Here is the new implementation. it doesn’t actually add an iterator Function because the function in C# just returned an IEnumerable anyway, so we just have a similar function and use that, but it gives the same effect:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
Option Strict On Option Explicit On Imports System.IO Imports System.Text Imports System.Collections.Generic Module anagrams 'Anagrams Application 'Written By: Michael Burgwin for http://bc-programming/blogs 'entry into Anagrams Performance/language comparisons. Private Function readlines(Filename As String) As IEnumerable(Of String) Dim nl As Char = CChar(Environment.NewLine) Return New StreamReader(Filename).ReadToEnd().ToLower().Split(nl) End Function Public Function SortChars(sumString As String) As String Dim sortme = sumString.ToList() sortme.Sort(Function(ca, cb) ca.CompareTo(cb)) Return New String(sortme.ToArray()) End Function Sub Main(Args As String()) Dim Anagrams As New Dictionary(Of String, List(Of String)) Dim usefilename As String = "D:\dict.txt" Dim StartTime = Now For Each LineRead In readlines(usefilename) 'get sorted version. Dim SortedVersion = SortChars(LineRead) If Not Anagrams.ContainsKey(SortedVersion) Then Anagrams.Add(SortedVersion, New List(Of String)) End If Anagrams(SortedVersion).Add(LineRead) Next Dim Duration = Now - StartTime Console.WriteLine("Calculation Time:" & Duration.ToString()) Console.ReadKey() End Sub End Module |
The Execution Time of this version hovers around .62 seconds- .2 seconds faster than the C# implementation, which is fairly surprising. Obviously, this doesn’t mean that VB.NET is faster than C# either; just means that this particular implementation is faster. It also Highlights something I did not expect, which is that the language used really does affect speed; not because one language is innately faster, but because you write code differently in different languages and they compile to different IL in many cases. I was expecting nearly zero difference between C# and VB.NET, which is why I didn’t originally create a VB.NET implementation- and yet here we have it performing .2 seconds faster than the C# version with what appears to be equivalent code.
Have something to say about this post? Comment!