Programming Language Performance Part VII: Perl

This is part of a occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are Articles written regarding Python,C#, Java and VB6 (merged for some reason),Scala,F# & Ruby,Perl, Delphi, PHP,C++,Haskell,D, VB.NET, QuickBASIC, and Rust

In order to compare various languages, I have been implementing a anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the Perl entrant into that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach of looking at every word and comparing it to every other word is a dismal one. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap with that key. A psuedocode representation:

Loop through all words. For each word:
1. create a new word by sorting the letters of the word.
2. use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.

The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.

Here is the Perl implementation I came up with:

#!/usr/bin/perl -w
use strict;
use Time::HiRes;
my %words = ();

my $start = Time::HiRes::gettimeofday();

while(<>) {
chomp;

$_ = lc $_;
my $windex = pack "C*",
   sort unpack "C*", $_;
if (exists $words{$windex}) {
        next if $words{$windex} =~ /\b$_\b/;
        $words{$windex} .= " ";
    }
    $words{$windex} .= $_;
}
print Time::HiRes::gettimeofday() - $start;

#!/usr/bin/perl -w

use strict;

use Time::HiRes;

my %words = ();

my $start = Time::HiRes::gettimeofday();

while(<>) {

chomp;

$_ = lc $_;

my $windex = pack "C*",

sort unpack "C*", $_;

if (exists $words{$windex}) {

next if $words{$windex} =~ /\b$_\b/;

$words{$windex} .= " ";

}

$words{$windex} .= $_;

}

print Time::HiRes::gettimeofday() - $start;

Perl is often described as a write-only language. This can be found in the languages first-class treatment of regular expressions, which themselves are easily considered write-only. Reading the above as it is, I have trouble deciphering much of what is happening. Of course, the version I wrote did include comments, but it’s interesting to note that what Perl has in brevity it lacks in readability in a lot of cases. This script runs in 3.1 seconds average time. Unlike the other examples, this one requires the use of a different command line “perl anagrams.pl < D:\dict.txt" in my case, since the <> will operate on the default input which is of course stdin, so we just pump the dictionary file into it.

Perl gets is expressiveness from it’s rather copious use of pretty much every character on a standard US keyboard for some form of syntax. One of the more curious contingencies of the language was the concept of various contexts; Scalar context, List context, etc. I personally feel that the idea feels a bit hackish but at the same time, this is the same language from which PHP inherited it’s (silly IMO) feature whereby “5” + “5” is 10 rather than “55”, so many such shortcuts are designed in because they have some benefit. Additionally, a lot of the languages flexibility as well as the fervor of it’s devotees comes from the many “slogans” that have come to represent it, usually coined by the languages original creator, Larry Wall. For example, one such slogan is “There is more than one way to do it” Which, more than describing that different algorithms can be used for a given problem, detail that a single algorithm could be implemented in various ways within Perl, as well as having different ways to perform the otherwise trivial operations, such as accessing and changing a value in a hash.

Have something to say about this post? Comment!

Tags: Anagrams Perl Functional Hashmap Scripting

BASeCamp Programming Blog

Multi-lingual when he wants to be.

Programming Language Performance Part VII: Perl