In order to compare various languages, I will be implementing a anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the Scala entrant into that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach of looking at every word and comparing it to every other word is a dismal one. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap with that key. A psuedocode representation:
- Loop through all words. For each word:
- create a new word by sorting the letters of the word.
- use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
- When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
Scala, in brief, is a functional programming language designed to run on the Java Virtual Machine. The Java implementation of this algorithm was around 1.8 seconds, total time. (for those with a short memory). I was curious how Scala would stack up. This is the code I ended up with:
def ssort(xs:List[Char]):List[Char] =
if (xs.isEmpty) Nil
def insert(x:Char,xs : List[Char]):List[Char] =
if(xs.isEmpty || x <= xs.head) x :: xs
else xs.head :: insert(x,xs.tail)
val starttime = java.lang.System.currentTimeMillis()
val dictfile = "D:\\dict.txt"
var anagrams = new scala.collection.mutable.HashMap[String,List[String]]
for(line <- Source.fromFile(dictfile).getLines())
var charray = line.toCharArray().toList
var sorted = java.lang.String.valueOf(ssort(charray).toArray)
//map contains x checks if key exists.
if (!(anagrams contains sorted))
anagrams+= (sorted -> List(line));
var grabbed = anagrams(sorted);
var appended = line :: grabbed
anagrams(sorted) = appended;
val executionspeed= java.lang.System.currentTimeMillis()-starttime;
println("Execution took " + executionspeed);
I ended up having to write my own sort routine, but was able to trim the fat and freak around with some of the List manipulation Scala provides. This implementation averages around 1.4 seconds total time. I must admit I was surprised, since it is ‘generally accepted’ that Scala is slower than Java. I guess this once more drives in my point. First- I’m hardly an expert Scala programmer. In fact, this is my first real attempt to do anything with it at all, aside from install it. (I should note that “Programming in Scala, Second Edition” was a valuable resource in my efforts to get this working.). I’ve worked far more with Java, and yet a quick Scala program has outperformed it.
To be fair, once again I think the sort functionality has been the deciding factor; the Java implementation used a naive bubble sort; the Scala implementation uses the cons operator (which prepends a list), which is likely slightly faster, in addition to the immutable state of Lists in Scala. This does however touch on one of my points; Impressions on the speed of a language often have very little bearing on either how fast they are, or how easy they are to use. One of the common compliants I can find about Scala versus Java is that Scala is slower; this non-trivial example, written by somebody with no actual Scala experience (that is, me) outperformed my Java implementation, and it did so with less code, as well.