This is part of a occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are Articles written regarding Python,C#, Java and VB6 (merged for some reason),Scala,F# & Ruby,Perl, Delphi, PHP,C++,Haskell,D, VB.NET, QuickBASIC, and Rust
In order to compare various languages, I have been implementing a anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the Java entrant into that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach of looking at every word and comparing it to every other word is a dismal one. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap with that key. A psuedocode representation:
- Loop through all words. For each word:
- create a new word by sorting the letters of the word.
- use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
- When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
Java
Here is the Java implementation I came up with:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
import java.util.*; import java.io.*; public class anagram { public static void sortchars(char[] sortvalues) { for(int x = 0;x<sortvalues.length;x++){ for(int y=x+1;y<sortvalues.length;y++){ if(sortvalues[x] < sortvalues[y]){ char temp = sortvalues[x]; sortvalues[x]=sortvalues[y]; sortvalues[y] = temp; } } } } public static String sortstring(String sortstring){ char[] stringchars = new char[sortstring.length()]; sortstring.getChars(0,sortstring.length(),stringchars,0); sortchars(stringchars); return new String(stringchars); } public static void main(String[] args) { final String filename = "D:\\dict.txt"; //HashMap we use to store and sort out anagrams. try { HashMap<String,List<String>> anagrams = new HashMap<String,List<String>>(); long starttime = System.currentTimeMillis(); Scanner sc = new Scanner(new File(filename)); while(sc.hasNext()){ String readline = sc.nextLine(); String sorted = sortstring(readline); if(!anagrams.containsKey(sorted)) anagrams.put(sorted, new LinkedList<String>()); List<String> addto = anagrams.get(sorted); addto.add(readline); } long endtime = System.currentTimeMillis(); System.out.println(endtime-starttime); } catch(FileNotFoundException exx){ System.out.println("File not found."); } } } |
This implementation averaged around 1800 milliseconds, or 1.8 seconds. I assume it would be on par with the C# implementation if I had used something better than Bubblesort to sort the characters. It’s a shame that you have to implement your own sort, I started writing a Merge sort but then I realized that is exactly what my point was. I wanted to create a program to look for anagrams, and I ended up having to write my own primitives to do so, and concentrate on sorting an array, which should be built-in IMO.
I won’t deny it- I really do not like Java at all. I find it obtuse and ham-fisted, like trying to write a program while wearing gloves. The Language has various design decisions that impair how you can implement certain constructs, and force you to shoe-horn your solutions into not just Object Oriented solutions, but ‘impaired’ Object oriented solutions. It is not a language I enjoy working in, but I prefer it to the other entry in this post- VB6.
Visual Basic 6
Visual Basic 6 is, for lack of a better word, old and pretty crap these days. I used it for far too long as I described in some previous postings on the subject. Anyway, this is the implementation I came up with for it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
Option Explicit Sub ShellSort(A() As String, Optional ByVal LB As Long = 0, Optional ByVal UB As Long = 0) Dim N As Long Dim H As Long Dim I As Long Dim j As Long Dim T As String If LB = 0 And UB = 0 Then LB = LBound(A) UB = UBound(A) End If ' sort array[lb..ub] ' compute largest increment N = UB - LB + 1 H = 1 If (N < 14) Then H = 1 Else Do While H < N H = 3 * H + 1 Loop H = H \ 3 H = H \ 3 End If Do While H > 0 ' sort by insertion in increments of h For I = LB + H To UB T = A(I) For j = I - H To LB Step -H If A(j) <= T Then Exit For A(j + H) = A(j) Next j A(j + H) = T Next I H = H \ 3 Loop End Sub Public Function SortString(ByVal intext As String) As String Dim Result() As String Dim returnval As String ReDim Result(Len(intext) - 1) Dim I As Long For I = 0 To Len(intext) - 1 Result(I) = Mid$(intext, I + 1, 1) Next I ShellSort Result For I = 0 To UBound(Result) returnval = returnval + Result(I) Next I SortString = returnval End Function Public Sub Test() Dim fnum As Long Dim AllText As String Dim splitresult() As String Dim starttime As Single, endtime As Single starttime = Timer Dim anagrams As Dictionary Set anagrams = New Dictionary fnum = FreeFile() Open "D:\dict.txt" For Input As #fnum AllText = Input$(LOF(fnum), fnum) Close #fnum splitresult = Split(AllText, vbLf) AllText = "" Dim iterate As String Dim I As Long For I = 0 To UBound(splitresult) - 1 iterate = splitresult(I) Dim Key As String Key = SortString(iterate) If Not anagrams.Exists(Key) Then anagrams.Add Key, New Collection End If anagrams.Item(Key).Add iterate Next I endtime = Timer() MsgBox "Time:" + Str(endtime - starttime) End Sub |
This implementation took a dismal 9 seconds. This was with the compiler set to optimize for small code as well as with “Favour Pentium Pro” selected in the project compile options dialog. Fast code was pretty much the same speed, for what it’s worth. My guess as to why this code is such a poor performer comes from several reasons. First, it relies heavily on late binding; the collection is inherently late-bound, and those collections are placed in a Dictionary which is also late bound. This implementation even uses a ShellSort rather than a Bubblesort like the Java implementation, but still performs far slower. (I happened to have a ShellSort routine lying around in my VB projects directory so repurposed it). Another reason might be the way I chose to acquire input, which is through a fell swoop using Input$ to read ther entire text of the file, and then use the built-in Split() function to explode the resulting string to the desired array.
Update: March 5, 2013 I tried to improve the Java implementation by swapping the insertion/bubble sort with the following:
1 2 3 4 5 |
public static String sortstringnew(String sortstring){ char[] chars = sortstring.toCharArray(); Arrays.sort(chars); return new String(chars); } |
This actually resulted in a reduced speed. according to the javaDoc, Arrays.sort() uses a modified QuickSort. This explains the issue; Bubblesort/Insertion sort will be faster with fewer than 10 or so elements, which would explain why my original implementation is faster. I still think I can make the Java implementation faster, though.
Have something to say about this post? Comment!
One thought on “Programming Language Performance, Part IV: Java and VB6”
I almost never comment, but i did some searching and wound up here BASeCamp Programming Blog
? Blog Archive
Programming Language Performance, Part IV: Java and VB6 ? BASeCamp Programming Blog.
And I do have 2 questions for you if you tend not to mind.
Could it be just me or does it appear like some of these remarks look like they are left by brain dead visitors?
😛 And, if you are writing at additional places, I would like to follow everything fresh you have to post.
Could you make a list of all of all your shared
sites like your twitter feed, Facebook page or linkedin
profile?