27 Jan 2013 @ 10:00 AM 

This is part of an occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are articles written regarding Python, C#, Java and VB6 (merged for some reason), Scala, F# & Ruby, Perl, Delphi, PHP, C++, Haskell, D, VB.NET, and even QuickBASIC.

Haskell is one of those languages that, to paraphrase Rodney Dangerfield, can't get any respect. I mean, sure, it gets a lot of use in academia and by some people, but by and large it is ignored in favour of other languages such as C++, Java, and C#.

One of the reasons Haskell has difficulty finding use as a mainstream language is that it leans heavily on the Functional Programming paradigm. We saw a bit of this in the F# example. Haskell, however, abandons almost all Imperative stepping-stones; while F# has various imperative constructs and is essentially a multi-paradigm language as a result, Haskell provides very little that could be used to fake imperative styles, and what it does provide is tedious enough to discourage its use.

Like the other examples, this example implements an algorithm to find all the anagrams in a dictionary list of words. The Haskell Implementation:
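A minimal sketch of the shape of it, assuming the dictionary lives in a file named dict.txt (the filename is my assumption):

    import Data.List (sort)
    import qualified Data.Map as Map

    main :: IO ()
    main = do
        contents <- readFile "dict.txt"
        -- the word list; primed to avoid clashing with Prelude's words
        let words' = lines contents
        -- key each word by its sorted letters; (++) appends words that share a key
        let anagramMap = Map.fromListWith (++) (map (\x -> (sort x, [x])) words')
        -- keep only the groups holding more than one word: the anagrams
        let anagrams = Map.filter ((> 1) . length) anagramMap
        mapM_ print (Map.elems anagrams)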

(I’ll be surprised if the above is syntax highlighted through the plugin I use…)

This is the part where I have to admit ignorance. I did write this, but at this point I have absolutely no clue how it works. This is arguably because I tend to think in terms of imperative programming styles rather than functional ones, so some of this is just gobbledygook to me. That is particularly unhelpful given that I'm supposed to be explaining how it works. Most of the work is done in the last few lines, which sort each entry and place it in a Map appropriately:
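In the sketch above, that work is this line:

    -- build the Map: key = the word's letters sorted, value = a list of words
    let anagramMap = Map.fromListWith (++) (map (\x -> (sort x, [x])) words')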

This line takes each element of words' and creates a map; each element of the sequence is used as x in the lambda, which creates a map entry where the key is the sorted equivalent of x (the word) and the value is a list containing the word. Elements that map to the same key have their lists appended together- that is the job of the (++) handed to fromListWith. The work to properly "condense" everything down to actual anagrams is done in the following line, which filters the resulting map.
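That filter, from the sketch:

    -- keep only the entries whose list contains more than one word
    let anagrams = Map.filter ((> 1) . length) anagramMap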

One of the other things Haskell gets some flak for is speed; that is, a supposed lack of it. This implementation executes in 3.1 seconds, which is about equal with the Perl implementation. So some of the speed concerns might be justified: Haskell is in fact compiled, not interpreted, so merely being comparable in speed to an interpreted language is a bit of a worry. On the other hand, it's still quite a bit faster than a few of the implementations, and since I don't really know Haskell, an "expert" Haskell programmer could almost certainly come up with a blazing fast implementation. The language itself is terse and very expressive; the code that actually runs here is shorter than the current shortest implementation, Ruby. It's worth noting that the shortest implementations are all very expressive languages, which is unlikely to be a coincidence.

Posted By: BC_Programming
Last Edit: 03 Mar 2013 @ 11:23 AM

 03 Jan 2013 @ 10:06 PM 

This is part of an occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are articles written regarding Python, C#, Java and VB6 (merged for some reason), Scala, F# & Ruby, Perl, Delphi, PHP, C++, Haskell, D, VB.NET, and even QuickBASIC.

Somehow, while creating the various anagram implementations for the previous entries in this series, I neglected to create one for PHP. This was primarily because I hadn't really even thought about it, in much the same way that I didn't create a JavaScript version of the script. Nonetheless, it seems like a worthy endeavour; PHP is a very popular and well-used language, and there is no reason not to see how it stacks up against other languages in this sort of situation.

One of the main tasks in creating a PHP implementation is getting an environment set up. If I were to implement the script and run it on my webhost, the results would not be comparable to the others; the PHP has to be interpreted on my machine for the results to mean anything. PHP is designed more as a web-based language, as opposed to a local scripting environment; by this I mean it is prevalent- and almost universally used- on the server side, whereas other languages like Python, Perl, etc. find a lot of use client-side as well. Because I hadn't ever actually run PHP client-side, I had to do a little searching, though the search was relatively short. Since I plan to run these tests on my Windows machine, I needed the PHP interpreter for Windows. What I didn't need, however, was Apache; and while I have IIS installed and could probably run PHP that way, I am concerned that the web server components might add some overhead to the running of the script and generally make the timing "off" in comparison to the other languages I used. As I write this I am installing the file from here, which ought to give me the tools to write a PHP equivalent of the Anagrams Algorithm; this entry (evidently) describes the PHP entrant into that discussion.

A naive anagram search program will typically have dismal performance, because the "obvious" brute force approach of comparing every word to every other word is, well, dismal. The more accepted algorithm is to use a Dictionary or Hashmap that maps the sorted letters of each word to the list of words sharing them; 'mead' and 'made' both sort to 'adem', so both will be present in the list stored under that key. A pseudocode representation:

  1. Loop through all words. For each word:
    1. create a new word by sorting the letters of the word.
    2. use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
  2. When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.

The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.

PHP is one of those languages that have a lot of foibles and historical issues maintained in the name of compatibility. In some ways it reminds me of the story of Visual Basic 6: after several language iterations, it still holds to its roots. So on the one hand you have very old code that works as well in the latest version as it used to, and on the other you have some tiny incompatibilities that might require changes to certain kinds of code. One of the most popular PHP-detracting articles on the web today is PHP: A Fractal of Bad Design, which makes some good points, but also ignores the fact that PHP was really never designed- it 'evolved'. It was never designed as a programming language so much as a way to get what needed to be done done in the fastest possible way and with the fewest headaches. The inevitable result is what we have now, which has some crusty deposits, but those crusty deposits are what make PHP so useful.

As far as the anagrams concept goes, I figure the best way to accomplish the goal is to use associative arrays. PHP has these built in, and they are arguably an extremely awesome feature that is frequently overlooked. This is the resulting implementation:
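Roughly, the shape of the first attempt- assuming the word list is in a file named dict.txt, read a line at a time:

    <?php
    $anagrams = array();
    $handle = fopen("dict.txt", "r");
    while (($word = fgets($handle)) !== false) {
        // build the key by sorting the word's letters
        $word = trim($word);
        $letters = str_split($word);
        sort($letters);
        $sorted = implode("", $letters);
        if (!isset($anagrams[$sorted])) {
            $anagrams[$sorted] = array();
        }
        // array_merge re-creates the merged array on every call
        $anagrams[$sorted] = array_merge($anagrams[$sorted], array($word));
    }
    fclose($handle);
    ?>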

A run of this took 116 seconds. Now, PHP being slower than many other languages is not, by itself, amazing. However, this is slower than every other implementation by a factor of ten, so I think it is only fair to PHP to see if perhaps I did something wrong in the above implementation.

I am fairly certain that the issue is with my use of array_merge, since it re-creates the array every time, which can be costly given that the dictionary has over 200K words. I changed it to use array_push:
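Something along these lines, with the result check mentioned below added at the bottom:

    <?php
    $anagrams = array();
    $handle = fopen("dict.txt", "r");
    while (($word = fgets($handle)) !== false) {
        $word = trim($word);
        $letters = str_split($word);
        sort($letters);
        $sorted = implode("", $letters);
        if (!isset($anagrams[$sorted])) {
            $anagrams[$sorted] = array();
        }
        // array_push appends in place rather than rebuilding the array
        array_push($anagrams[$sorted], $word);
    }
    fclose($handle);
    // sanity check: count the keys whose lists actually hold anagrams
    $groups = 0;
    foreach ($anagrams as $list) {
        if (count($list) > 1) $groups++;
    }
    echo $groups . " anagram groups found\n";
    ?>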

I also "fixed" a minor issue from before. We are basically working with a top-level associative array that uses the sorted words as its keys, each of which stores another array of the words that sort to that key- words which would all be anagrams of one another, naturally.
I also added a check at the bottom to make sure that I was getting some sort of result. This one clocked in at 114 seconds, which is still rather dismal. At this point I'm still convinced I'm just not using the language "properly", but it is also possible that PHP really does have this issue with speed. That said, I could probably find a Dictionary/HashTable class implementation online that would work better than a nested associative array. Another possibility is that I am simply using a slower PHP implementation. I'm using the installation from "php-5.3.20-nts-Win32-VC9-x86.msi", which is accessible from here; I went with the "non-thread-safe" version since that is usually faster. It's also possible that running this on a Linux machine would yield faster results, since that is PHP's 'native' environment. I also considered the stumbling blocks I hit with other languages, particularly Java and C#, which only reached maximum performance once I disabled debug mode. With that in mind, I made tweaks to the php.ini file that came with the PHP interpreter in the interest of increasing speed; I wasn't able to make any significant gains, though I did get it down to 109 seconds.

I think a fair assessment here is that the file I/O is being done only a tiny bit at a time. This was one of the issues I had with some of my other implementations as well; reading a line at a time creates and allocates more memory (possibly, depending on how the specific PHP implementation handles adding elements). So it makes sense to read the file in one fell swoop and then iterate over those elements in memory. With that in mind, I made some changes:
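Something like this, reading the whole dictionary up front with file():

    <?php
    // read the whole file in one call, one word per element
    $words = file("dict.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $anagrams = array();
    foreach ($words as $word) {
        $letters = str_split($word);
        sort($letters);
        $sorted = implode("", $letters);
        if (!isset($anagrams[$sorted])) {
            $anagrams[$sorted] = array();
        }
        $anagrams[$sorted][] = $word; // shorthand append
    }
    ?>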

Which still didn't fix the speed problem (113 seconds this time). I did notice that when I had error display on, I got a lot of notices- as the loop ran- about undefined indices when setting $anagrams[$sorted]. So if I had to hazard a guess, I'd say the speed issue is a combination of a very long loop (200K words) and the fact that it runs entirely interpreted. The language itself is a bit messy, though it's certainly not as bad as Perl as far as special punctuation marks go. Of course, with the loss of that special punctuation ($ for scalars, @ for arrays, and % for hashmaps in Perl, where PHP keeps only $ and lacks the multiple contexts) we get a simplified language that is a bit easier to work with cognitively but lacks some power and capability. Nonetheless, it is still a very workable language, and most of its capability can be leveraged by using existing frameworks and code; PEAR libraries can be found for pretty much every purpose. I hazard to wonder how it can be considered scalable, given the speed issues I've encountered, but I'm still quite certain I'm simply using it incorrectly.

I have to say I was surprised that I remembered the foreach syntax- foreach($array as $value)- off the top of my head. Not because it's complex or esoteric, but because I have been messing with C#, Java, and other languages a lot in the meantime, many of which use very slight variations: Java uses its for loop with a special syntax, for(value : array); C# uses a foreach loop, foreach(value in array); Scala uses for(value <- array); and Python doesn't technically have a standard for loop- everything is really a foreach: for value in array:. F# uses a similar syntax, replacing the colon with its own equivalent keyword: for value in array do. I was convinced when I used as that I was remembering some other language's implementation and would have to look it up, but to my surprise I remembered it correctly- probably just luck, to be fair. Even though C# is my "main" language for desktop development, I do in fact find myself using the Java for syntax for foreach loops too (or even trying to use for(value in array) and wondering why the IDE yells at me on that line until I facepalm that I used the wrong keyword).

One might think this particular example proves PHP is "slow", but I disagree. I think it just shows that PHP is not well suited to iterating over hundreds of thousands of elements. But given its particular domain- web pages- you shouldn't be presenting that much information to the user anyway, so that sort of workload falls outside the scope of the language's intended use cases.

In the works: Part XI, Haskell. I had a nice long one almost finished, but I appear to have forgotten to save it when I restarted my PC, or something along those lines, so I've had to rewrite it. Thankfully I still have the actual Haskell code, which is arguably the most intensive part of each entry in the series.

Posted By: BC_Programming
Last Edit: 27 Jan 2013 @ 09:31 AM

 22 Sep 2012 @ 12:10 AM 

This is part of an occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are articles written regarding Python, C#, Java and VB6 (merged for some reason), Scala, F# & Ruby, Perl, Delphi, PHP, C++, Haskell, D, VB.NET, and even QuickBASIC.

Much like my previous posts on the subject of programming language polarization, this one comes as a response to what I feel are flawed perspectives on programming languages. In particular, it comes in response to claims that C# is a "beginners" language; the poster also claimed to have tested older languages. This is doubtful, however, because the claim was backed up with nothing reproducible- no source code, no timings. Not to mention the claim that "Even OLD Basic loops and functions are faster than C#", which seems hard to prove since "OLD" Basic (BASICA and GW-BASIC) didn't have functions (with the exception of DEF FN statements, which aren't quite the same idea and are more similar to macros). Additionally, empty loops and functions aren't typical constituents of a program; a test showing that some specific structure is faster in Language X than in Language Y is focussing on a small area of a much larger picture.

The purpose of this short series of posts is to provide a relatively simple, but not entirely trivial, algorithm, implement it in several languages, and then compare performance results as well as, of course, the resulting code itself. The point is to provide actual results. Many claim that "Language X is faster than Language Y", but as described above they typically focus far too narrowly; for example, I have seen claims that loops are faster in C++ than in, say, C#. The actual test code was not provided, of course, so it's impossible to show what they were really measuring (hint: it wasn't loops). More importantly, languages are designed for solving problems, not for looping or executing simple if instructions repeatedly.

I will not, of course, claim this to be unbiased. I feel that the performance of a language is not nearly as important as that language's succinctness and expressiveness, because those two qualities contribute more to code quality, code quality is directly proportional to how maintainable a codebase is, and the maintainability of a codebase should be a more important concern than how many milliseconds a contrived example takes to execute. Many of the "swings" being taken are by C/C++ users, usually claiming that garbage-collected languages aren't "real programming" or that without determinate memory management all programs will be really slow and unusable. Perhaps ironically, while C/C++ users are making these claims, creating contrived tests and scenarios where garbage collection fails, those using those "not real" programming languages are writing programs and getting work done. It's arguable whether the C/C++ users who make these attacks do so because they feel that newer garbage-collected languages are a true detriment to programming, or whether they- consciously or not- realize that these languages relieve a burden of responsibility from the programmer, and that in making programming more accessible, it becomes far less of an elite skillset; therefore, they need to trivialize the abilities of those using the new technologies in order to restore their own egos. This is highly speculative, of course, but from what I've seen, a lot of those who push C/C++ and criticize various managed frameworks and virtual machines really don't have the first clue about that which they complain about. Many of their arguments are so easily refuted as to be silly.

This series of posts is not attempting to discredit C/C++; in fact I’m doubtful either one of those languages will show up in the tests at all; rather, it is designed as a comparison of the same thing implemented in different languages. I have tried to do so with as little bias as possible but I must admit not being familiar with many of the languages I have used for these tests, so my lack of ability may impair the performance of some implementations. Corrections or improvements to these implementations are both welcome and encouraged.

I hold the opinion that, overall, language performance only matters in some very specific cases. For example, determinate memory management is most useful when you have a small, finite amount of memory; if you are writing a program to run on a small embedded system with a limited feature set and only a few hundred K of memory, it makes sense to use a language like C. In fact, the 640K limitation of DOS could almost single-handedly be pointed to as the reason that most earlier garbage collection implementations did not get critical acclaim. The languages' architecture and design were usually very powerful, and they did have garbage collectors that tried to be as conservative as possible (or used XMS for their heaps), but there was still the issue that DOS itself only had 640K to deal with. Additionally, programmers had become used to allocating and freeing all the memory they used.

This resulted in a powerful backlash against the implementation and use of garbage-collected languages- or any language that claimed to be "easy to use". This is because programmers are, to be quite honest, egotistical douchebags when it comes to programming. They like to think of themselves as "better" than the unwashed masses who cannot program or have no desire to learn; it makes them feel smarter. When C was gaining popularity, Assembly programmers cried out that it wasn't "real" programming. Eventually, however, even the last Assembly die-hards acknowledged that C had a very important role to play. With the introduction of languages like Pascal and Modula, C programmers cried out that using Pascal wasn't "real" programming. Of course, without actually defining what "real programming" was, they were just being, for lack of a better word, whiny bitches. What they didn't like wasn't that the language wasn't similar to C, but rather that the language was easier to use, which trivialized the effort they had put into learning C. In order to prove to themselves that they made the correct choice in learning C, they research and find myriad flaws in the language in question, declare it to be "beyond repair", and fall back on C again.

As time progressed, C++ started to come to the fore. C++ got exactly the same criticisms as C; mostly ethereal claims based on little actual evidence, and usually saying something that indicated that using C++ wasn’t “real programming” or various other detrimental factors.

C++ is popular now, and other languages- such as Java and C#- have borne a lot of criticism from people who learned C++. The issue is not whether C++ is the better language at all; what it boils down to is that C++ programmers are trying to convince themselves that they made the right choice in learning C++, or the right choice in not learning Java or C#. I find this sort of logic somewhat silly, since there is nothing mutually exclusive about programming languages.

I suppose an important term to define is what it means for a language to be "powerful". In my definition, the power of a programming language comes from its expressiveness: how easy it is to represent abstract concepts in said language. Take even a relatively basic concept such as an array. In C, there truly is no such thing as an array; what you are able to do is use the subscript operator for pointer arithmetic as well as for declarations and allocations, but there is no actual "array" object that you access. You cannot determine the length of an array, for example, because, fundamentally, there is no array- just a block of bytes that you happen to be able to access using subscripts. Compare this to higher-level languages, where arrays are first-class citizens with a large variety of capabilities and methods. The C implementation is of course a lot faster, because you are really just doing pointer math and there is no extra data to keep track of; however, the latter is more powerful because it takes the burden of tracking that data off the hands of the programmer. A C program might declare an array, add values to the array, sort the array, and display the array as separate operations; a higher-level language such as Ruby can do all of this in a single statement. The Ruby solution, while many tens of lines shorter, will often be slower because of Ruby's interpreted nature. By the same token, while a C++ implementation will be faster and longer than the Ruby one, a hand-tuned assembly variant will be faster and longer still. I guess the question is why C/C++ purists think the performance difference between C++ and higher-level languages is so important, but the same gains accessible via hand-tuned assembly are not. I think it boils down to a simple fact: many C/C++ purists do not know assembly, so advocating its use would be like a bricklayer encouraging the use of wood siding.

Given the above, the power of languages differs, and the fact is, C# is a more powerful language than C++. A lot of C#-like behaviour- things like iterators, lambda expressions, etcetera- can either be mocked up using C++ templates or macros, or is part of the C++11 specification, but C# has had many of these features for a while now, and they are first-class citizens. Many C++ purists point to the fact that C# compiles to IL code- as opposed to native code- as if this were some sort of Achilles heel; much the same is said of Java bytecode. What such arguments fail to take into consideration is the wealth of information and metadata that IL code carries. No C++ program can dynamically construct class instances from some other C++ program because, fundamentally, a compiled C++ program is just flat code; there are no classes, objects, instances, or methods. With C# and Java, the class and method information remains intact: a C# program can run another C# program directly in its own process, or construct and use objects from some other C# program, and the same is very much true of Java. Introspection and RTTI (Run-Time Type Information) are very powerful language features which can, at best, only be emulated in C++ code using strict coding practices and templates.
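As a rough illustration (the jar path and class name here are hypothetical), a few lines of Java can load a class out of some other program's jar at runtime, construct an instance, and enumerate its methods- none of which a flat native binary can offer:

    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;

    public class IntrospectionDemo {
        public static void main(String[] args) throws Exception {
            // load classes from another program's compiled jar
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { new URL("file:///path/to/OtherProgram.jar") });
            // construct an instance of one of its classes by name
            Class<?> cls = loader.loadClass("com.example.SomeClass");
            Object instance = cls.getDeclaredConstructor().newInstance();
            System.out.println("Constructed: " + instance);
            // the type metadata survives compilation intact
            for (Method m : cls.getMethods()) {
                System.out.println(m.getName());
            }
        }
    }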

Whether C++ is faster than C# is a point of debate, but whether it is faster to develop in C# or C++ is a no-brainer: C++ takes a significant investment of programmer time at each step. Design takes longer because C++'s implementation of OOP is incomplete and fractured, and differences in the implementation of templates between compilers can make code compile and run fine on one compiler, compile and crash on another, and not even compile on yet another. This can be traced back to the complexity of the language standard, which makes parsers nearly impossible to write and complicates static analysis. This brings us, yet again, back to introspection: if all type information and structure data are preserved, static analysis becomes a lot easier, if still far from trivial.

Essentially, an argument that language A is faster than language B only matters if language A and language B are equally easy for the programmer to use. Arguing that C++ is faster than C# seems reasonable to people, but those same people jump to defend C++ against the equally solid arguments that C is better than C++, or that Assembly is better than C. It used to be impossible to write and publish a game with reasonable performance without learning assembly. Over time, it became possible to write those programs in C, and later in higher-level languages. As computers have gained more memory, faster and more numerous processors and cores, and faster hard drives, micromanaging those resources within your program like a nanny becomes a game of diminishing returns. Even today, many AAA games are no longer written entirely in C++ or C, instead opting for Python or another scripting language for some of their scripting. Not every piece of code needs to run as fast as humanly possible; in fact, some accepted "fastest" algorithms have worst cases that are slower than the accepted O(n^2) "slow" algorithms- with certain array configurations and sizes, a quicksort can run slower than a bubble sort. It all depends on the data and what is being done to it. Benchmarking how fast empty loops run in different languages tells you nothing about which language is faster, and as argued above, the end-result performance is no more important than the time required to build that end product.

That said, one could argue that I am making this argument to skirt the "obvious" fact that C# is slower than, say, Python, as evidenced by the above. I decided to give this a try with a nontrivial example: a program to find all anagrams in a given list of words. An anagram, for those not in the know, is a word formable by rearranging the letters of another word.

To those who leap before they look, performance may seem dismal; the brute force approach of comparing every word to every other word is dismal at best, performance-wise. The accepted algorithm is to use a Dictionary structure, using the sorted letters of each word as a key into that dictionary, and essentially adding each word to the list stored under its key. A pseudocode representation:

  1. Loop through all words. For each word:
    1. create a new word by sorting the letters of the word.
    2. use the sorted word as a key into a dictionary/hashmap structure, which indexes Lists. Add this word (the normal unsorted one) to that list.
  2. When complete, anagrams are stored in the hashmap structure indexed by their sorted version. Each value is a list, and those that contain more than one element are word anagrams of one another.

The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.

Posted By: BC_Programming
Last Edit: 15 Apr 2013 @ 04:26 AM

