This is part of a occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are Articles written regarding Python,C#, Java and VB6 (merged for some reason),Scala,F# & Ruby,Perl, Delphi, PHP,C++,Haskell,D, VB.NET, QuickBASIC, and Rust
In order to compare various languages, I will be implementing a anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the python entrant into that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach of looking at every word and comparing it to every other word is a dismal one. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap with that key. A psuedocode representation:
- Loop through all words. For each word:
- create a new word by sorting the letters of the word.
- use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
- When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
Implemented in python, the solution is surprisingly short:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
import time #create empty Hashmap. anag_dict = {} #start the timer startc = time.clock() #iterate through all the lines in the file. with open("D:\dict.txt", "rt") as words_fp: for line in words_fp: #lowercase the word. word = line.lower() #create the key by sorting the letters in the word. key = "".join(sorted(word)) #if the given key is not present, initialize that key with a new empty list. if not key in anag_dict: anag_dict[key] = [] #add this word to the appropriate list. anag_dict[key].append(word) #print the execution time. endc = time.clock() print("time:", endc-startc) |
this solution is nice and short (as I noted) because of the first-class dictionary and list support in Python, as well as it’s dynamic typing. The resulting program was run without the shown comments, to prevent parsing overhead (if applicable). over the course of several dozen runs, the speed of this solution averages around 1.2 seconds on my machine, running it via Python 3.2.3. The Python language provides a number of powerful constructs that often employ functional programming elements into traditional imperative language design concepts. I often hear the claim that Python is “Object Oriented” but have never really figured out why that is said. For example, the “join” function used here is not an object method, and the program itself does not belong to an object. Instead I typify Python as one language among many that provides the tools for a programmer to use any number of programming styles, without being forced to use a specific one. This is something that should be noted and even lauded, not the fact that it happens to support Object Oriented programming.
Have something to say about this post? Comment!