In order to compare various languages, I will be implementing an anagram search program in a variety of languages and testing and analyzing the resulting code. This entry describes the Python entrant in that discussion. A naive anagram search program will typically have dismal performance, because the “obvious” brute force approach- looking at every word and comparing it to every other word- scales terribly. The more accepted algorithm is to use a Dictionary or Hashmap, and map lists of words to their sorted versions, so ‘mead’ and ‘made’ both sort to ‘adem’ and will be present in a list in the Hashmap under that key. A pseudocode representation:
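The pseudocode block itself appears to have been lost in the conversion of this page; a minimal Python-flavoured sketch of the described approach (function and variable names are mine, not the original's):

```python
# Group words by their sorted letters: anagrams share the same key.
def group_anagrams(words):
    groups = {}  # sorted letters -> list of words
    for word in words:
        key = "".join(sorted(word))  # 'mead' and 'made' both map to 'adem'
        groups.setdefault(key, []).append(word)
    # Keys with more than one word are the anagram sets.
    return [group for group in groups.values() if len(group) > 1]
```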
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
Implemented in Python, the solution is surprisingly short:
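The listing itself seems to have been lost when the page was converted; a minimal reconstruction along the lines the post describes- reading a word list, grouping by sorted letters, and printing the groups- where the filename handling and output format are my assumptions, not the original's:

```python
import sys
from collections import defaultdict

def find_anagrams(path):
    """Return lists of words from the file that are anagrams of each other."""
    groups = defaultdict(list)
    with open(path) as f:
        for line in f:
            word = line.strip()
            if word:
                # Anagrams share the same sorted-letter key.
                groups["".join(sorted(word))].append(word)
    return [g for g in groups.values() if len(g) > 1]

if __name__ == "__main__" and len(sys.argv) > 1:
    for group in find_anagrams(sys.argv[1]):
        print(" ".join(group))
```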
This solution is nice and short (as I noted) because of the first-class dictionary and list support in Python, as well as its dynamic typing. The resulting program was run without the shown comments, to prevent parsing overhead (if applicable). Over the course of several dozen runs, the speed of this solution averages around 1.2 seconds on my machine, running it via Python 3.2.3. The Python language provides a number of powerful constructs that often fold functional programming elements into traditional imperative language design. I often hear the claim that Python is “Object Oriented” but have never really figured out why that is said. For example, the “join” function used here is not an object method, and the program itself does not belong to an object. Instead I typify Python as one language among many that provides the tools for a programmer to use any number of programming styles, without being forced into a specific one. This is what should be noted and even lauded, not the fact that it happens to support Object Oriented programming.
Much like my previous posts on the subject of programming language polarization, this one comes as a response to what I feel are flawed perspectives on programming languages. In particular, it comes in response to claims that C# is a “beginners” language; the poster also claimed to have tested older languages as well. This is doubtful, however, because that claim was backed up with nothing reproducible, such as source code and timings. Not to mention the assertion that “Even OLD Basic loops and functions are faster than C#”, which seems hard to prove since “OLD” Basic (BASICA and GW-BASIC) didn’t have loops or functions (with the exception of DEF FN statements, which aren’t quite the same idea and are more similar to macros). Additionally, empty loops and functions aren’t typical constituents of a program. That is, a test showing that some specific structure is faster in Language X than in Language Y is focusing on a small area of a much larger picture.
The purpose of this short series of posts is to provide a relatively simple, but not entirely trivial, algorithm; I will implement it in several languages, and then compare performance results as well as, of course, the resulting code itself. The purpose of this experiment is to provide actual results. Many claim that “Language X is faster than Language Y”, but as described above they typically focus far too narrowly; for example, I have seen claims that loops are faster in C++ than in, say, C#. The actual test code was not provided, of course, so it’s impossible to show what they were really measuring. (Hint: it wasn’t loops.) More importantly, languages are designed for solving problems, not for looping or simple if instructions executed repeatedly.
I will not, of course, claim this to be unbiased. I feel that the performance of a language is not nearly as important as that language’s succinctness and expressiveness; those two qualities contribute more to code quality, code quality is directly proportional to how maintainable a codebase is, and the maintainability of a codebase should be a more important concern than how many milliseconds a contrived example takes to execute. Many of the “swings” being taken are by C/C++ users, usually claiming that garbage collected languages aren’t “real programming” or that without determinate memory management all programs will be really slow and unusable. Perhaps ironically, while C/C++ users are making these claims, creating their contrived tests and scenarios where Garbage Collection fails, those using those “not real” programming languages are writing programs and getting work done. It’s arguable whether the C/C++ users who make these attacks do so because they feel that newer garbage collected languages are a true detriment to programming, or whether they- either consciously or not- realize that these languages relieve a burden of responsibility from the programmer, and that in making programming more accessible, it becomes far less of an elite skillset. Therefore, they need to attempt to trivialize the abilities of those using these new technologies in order to restore their own egos. This is highly speculative, of course, but from what I’ve seen, a lot of those who push C/C++ and criticize various managed frameworks and Virtual Machines really don’t have the first clue about what they’re complaining about. Many of their arguments are so easily refuted as to be silly.
This series of posts is not attempting to discredit C/C++; in fact I’m doubtful either one of those languages will show up in the tests at all; rather, it is designed as a comparison of the same thing implemented in different languages. I have tried to do so with as little bias as possible but I must admit not being familiar with many of the languages I have used for these tests, so my lack of ability may impair the performance of some implementations. Corrections or improvements to these implementations are both welcome and encouraged.
I hold the opinion that, overall, language performance only matters in some very specific cases. For example, determinate memory management is most useful when you have a small, finite amount of memory; if you are writing a program to run on a small embedded system with a limited feature set and only a few hundred K of memory, it makes sense to use a language like C. In fact, the 640K limitation of DOS could almost single-handedly be pointed to as the reason that most earlier Garbage Collection implementations did not get critical acclaim. The languages’ architecture and design were usually very powerful, and they did have garbage collectors that tried to be as conservative as possible (or used XMS for their heaps), but there was still the issue that DOS itself only had 640K to deal with. Additionally, programmers had become used to allocating and freeing all the memory they used.
This resulted in a powerful backlash to the implementation and use of Garbage collected languages- or even any language that claimed to be “easy to use”. This is because programmers are, to be quite honest, egotistical douchebags when it comes to programming. They like to think of themselves as “better” than the unwashed masses that cannot program or have no desire to learn to do so; it makes them feel smarter. When C was gaining popularity, Assembly programmers cried out that it wasn’t “real” programming. Eventually, however, even the last Assembly die-hards acknowledged that C had a very important role to play in programming. With the introduction of languages like Pascal and Modula, C programmers cried out that using Pascal wasn’t “real” programming. Of course, without actually defining what “real programming” was, they were just being, for lack of a better word, whiny bitches. What they didn’t like wasn’t that the language wasn’t similar to C, but rather that the language was easier to use. This trivialized the effort they had put into learning C. In order to prove to themselves that they made the correct choice in learning C, they researched and found myriad flaws in the language in question, declared it “beyond repair”, and fell back on C again.
As time progressed, C++ started to come to the fore. C++ got exactly the same criticisms as C; mostly ethereal claims based on little actual evidence, and usually saying something that indicated that using C++ wasn’t “real programming” or various other detrimental factors.
C++ is popular now, and other languages- such as Java and C#- have borne a lot of criticism from people who learned C++. The issue is not whether C++ is the better language at all; what it boils down to is that the C++ programmers are trying to convince themselves that they made the right choice learning C++, or that they made the right choice not learning Java or C#. I find this sort of logic somewhat silly, since there is nothing mutually exclusive about programming languages.
I suppose an important term to define is what it means for a language to be “powerful”. In my definition, the power of a programming language comes from its expressiveness: how easy it is to represent abstract concepts in said language. Take even a relatively basic concept such as an array. In C, there truly is no such thing as an array; what you are able to do is use the subscript operator for pointer arithmetic, as well as for declarations and allocations, but there is no actual “array” object that you access. You cannot determine the length of an array, for example, because, fundamentally, there is no array- just a block of bytes that you happen to be able to access using subscripts. Compare this to higher level languages, where arrays are first-class citizens with a large variety of capabilities and methods. The C implementation is of course a lot faster, because you are really just doing pointer math, and there is no extra data to keep track of. However, the latter is more powerful because it takes the burden of tracking that data off the hands of the programmer. A C program might declare an array, add values to the array, sort the array, and display the array as separate operations. A higher level language such as Ruby, however, can do this in a single statement. The Ruby solution, while many tens of lines shorter, will often be slower, because of Ruby’s interpreted nature. However, while a C++ implementation will be faster and longer, a hand-tuned assembly variant will be faster and longer still. I guess the question is why C/C++ purists seem to think that the performance difference between C++ and higher level languages is so important, but the same gains accessible via hand-tuned assembly are not. I think it boils down to a simple fact- many C/C++ purists do not know assembly. Therefore advocating its use would be like a bricklayer encouraging the use of wood siding.
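The contrast is easy to see in miniature; the post mentions Ruby, but the same single-statement style holds in the Python used earlier (a sketch, not a benchmark):

```python
# C would need separate declare / fill / sort / display steps, and could not
# even ask the array its own length. Here, one statement does the lot:
values = [5, 3, 1, 4, 2]          # the list knows its own length: len(values) == 5
print(", ".join(map(str, sorted(values))))  # sort and display in a single statement
```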
Given the above, the power of languages differs, and the fact is, C# is a more powerful language than C++. A lot of C#-like behaviour- things like iterators, lambda expressions, etcetera- can either be mocked up using C++ templates or macros, or are part of the C++11 specification, but C# has had many of these features for a while now, and they are first-class citizens. Many C++ purists point to the fact that C# compiles to IL code- as opposed to native code- as if this were some sort of Achilles heel. Much the same is said of Java bytecode. What such arguments fail to take into consideration is the wealth of information and metadata that IL code carries. No C++ program can dynamically construct class instances from some other C++ program, because, fundamentally, a C++ program, once compiled, is just flat code; there are no classes or objects or instances or methods. With C# and Java, the class and function information remains intact. A C# program could run another C# program directly in its own process, or construct and use objects from some other C# program. This is very much the same for Java. Introspection and RTTI (Run-Time Type Information) are very powerful language features which can, at best, only be emulated in C++ code using strict coding practices and templates.
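Python happens to make the same point about metadata: a program can load another program's code and construct its classes by name at runtime, with no compile-time knowledge of them- something flat native code cannot offer. A small sketch (the module and class names here are invented for illustration):

```python
import types

# Pretend this source text came from some other program entirely.
other_source = """
class Greeter:
    def __init__(self, name):
        self.name = name
    def greet(self):
        return "hello, " + self.name
"""

# Load it as a module, then discover and instantiate the class by name.
other = types.ModuleType("other_program")
exec(other_source, other.__dict__)

cls = getattr(other, "Greeter")   # found via introspection, not a compile-time reference
instance = cls("world")
print(instance.greet())  # hello, world
```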
Whether C++ is faster than C# is a point of debate; but whether it is faster to develop in C# or C++ is a no-brainer: C++ takes a significant investment of programmer time at each step. Design takes longer because C++’s implementation of OOP is incomplete and fractured; differences in the implementation of templates between compilers can make code compile and run fine on one compiler, compile and crash on another, and not even compile on yet another. This can be traced back to the complex standard of the language, which can make parsers nearly impossible to write and complicates static analysis. This brings us, yet again, back to introspection; if all types and data of a structure are preserved, static analysis becomes a lot easier, if still far from trivial.
Essentially, an argument that language A is faster than language B only matters if language A and language B are equally easy to use by the programmer. Arguing that C++ is faster than C# seems reasonable to people, but those same people jump to defend C++ against the equally solid argument that C is better than C++, or that Assembly is better than C. It used to be impossible to write and publish a game with reasonable performance without learning Assembly. Over time, it became possible to write those programs in C, and later, in higher level languages. As computers have gained more memory, faster and more processors and cores, and faster hard drives, micromanaging those resources within your program like a nanny becomes a game of diminishing returns. Even today many AAA games are no longer entirely written in C++ or C, instead opting for Python or another scripting language for some of the scripting. Not every piece of code ever written needs to run as fast as humanly possible; in fact, some accepted “fastest” algorithms have worst cases that can be slower than an accepted “slow” O(n²) algorithm; for example, with certain array configurations and sizes, a QuickSort can run slower than a bubble sort. It all depends on the data and what is being done to it. Benchmarking how fast empty loops run in different languages tells you nothing about which language is faster, and as argued above, the end-result performance is no more important than the time required to make that end product.
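The QuickSort claim is easy to demonstrate: with a naive first-element pivot, already-sorted input degrades QuickSort to O(n²)- the same order as bubble sort- because every split is maximally unbalanced. A sketch that counts comparisons rather than wall time:

```python
def quicksort_comparisons(items):
    """Naive first-element-pivot quicksort; returns the comparison count."""
    count = 0
    def qs(seq):
        nonlocal count
        if len(seq) <= 1:
            return seq
        pivot, rest = seq[0], seq[1:]
        count += len(rest)  # pivot is compared against every remaining element
        left = [x for x in rest if x < pivot]
        right = [x for x in rest if x >= pivot]
        return qs(left) + [pivot] + qs(right)
    qs(items)
    return count

# On sorted input every pivot is the minimum, so each split puts everything
# on one side: n(n-1)/2 comparisons, exactly the bubble-sort worst case.
print(quicksort_comparisons(list(range(10))))  # 45
```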
That said, one could argue that I was making this argument to skirt the “obvious” fact that C# is slower than, say, Python, as evidenced by the above. I decided to give this a try with a nontrivial example. In this case, a program to find all anagrams in a given list of words. An anagram, for those not in the know, is a word formable by rearranging the letters of another word.
To those who don’t look before they leap, performance may prove dismal; the brute force approach of looking at every word and comparing it to every other word is dismal at best, performance-wise. The accepted algorithm is to use a Dictionary structure, use the sorted letters of a word as a key into that dictionary, and essentially add each word to the list under its key. A pseudocode representation:
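The pseudocode block seems to have been dropped when the post was converted; a minimal Python-flavoured sketch of the described algorithm (names are illustrative):

```python
# Words that are anagrams of each other share the same sorted-letter key.
def find_anagram_sets(words):
    table = {}
    for word in words:
        table.setdefault("".join(sorted(word)), []).append(word)
    # Any key holding more than one word is a set of anagrams.
    return [group for group in table.values() if len(group) > 1]
```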
The basic idea is to have a program that isn’t too simple, but also not something too complicated. I’ll also admit that I chose a problem that could easily leverage the functional constructs of many languages. Each entry in this series will cover an implementation in a different language and an analysis of such.
Call me old fashioned, or possibly slow, but for some reason I never seem to be using the latest version of a piece of software. Until recently I was doing all my .NET work with Visual Studio 2008; this was because VS2010, bless its heart, felt sluggish to me.
With the pending release of Visual Studio 2012, which as I write this is available for a free download as a Release Candidate, I decided I’d bite the bullet and start switching. This was also because I wanted to dip into XNA, and as near as I could tell the latest version only worked in conjunction with VS2010. I had to reinstall Resharper to get proper VS2010 support, since I had installed Resharper before I installed VS2010, and after applying my own preferences to both Visual Studio as well as Resharper, I was able to get back into coding. (Am I the only person who hates the preference IDEs have for automatically completing parentheses and braces and such? I always find myself typing the ending parenthesis, ending up with doubles, so I delete the extra ones, then I forget where I was in the nesting; and if you get used to that behaviour, suddenly you find yourself not typing ending parentheses in plain-text editors. You can’t win! I’m not a big fan of that sort of autocomplete; actually, I don’t really like any form of autocomplete, but that sounds like material for another post altogether.)
The end result is BCDodgerX, which is available on my main downloads page. It is essentially a rewrite of BCDodger, with an unimaginative X added to the end that means pretty much nothing.
Overall, VS2010 is actually quite good. Call it a belated review; I almost purposely fall several versions behind for some reason. I cannot say I’m overly fond of the use of 3-D Acceleration within a desktop application, but at the same time all the Controls still have the Windows Look and Feel (which is my main beef with Java’s Swing libraries, which have a look and feel all their own), and the desktop itself is accelerated with Aero anyway so I suppose it’s only a natural progression. (Besides, I don’t play games very often and this 9800GT should get some use…).
The tricky question now is when I should start migrating my VS2008 projects to 2010, and whether I should switch to the latest framework. I can switch to VS2010 without using the latest framework, of course, but I wonder what benefits I will see? One day I’m sure I’ll just say “screw it” and open say, BASeBlock in VS2010 and jump in; I’m falling behind, after all (What with the aforementioned release of 2012 on the horizon). And VS2010 is definitely an improvement both tool and functionality wise over 2008, so there isn’t really a good reason not to switch now. No doubt I’ll keep making excuses for myself. Oh well.
At first, I thought I hated XNA; but now I know that what I actually hate is 3D programming. I imagine this is mostly because I got extremely rusty at it; additionally, I had never truly done 3-D programming, at least in the context of a game. My experience at that point was pretty much limited to the addition of 3-D graphic capabilities to a graphing application that I wrote (and never posted on my site because it hasn’t worked in ages, is old, and uses libraries/classes/modules I have updated that are no longer source compatible etc.). Of course that didn’t have constantly changing meshes, used DirectX7, and it was shortly after I had finished that feature that I abandoned the project, for whatever reason. I had never dealt with 3-D in a gaming capacity.
The purpose of XNA is to attempt to simplify the task of creating games- both 3D and 2D, for Windows as well as XBox 360. And it definitely does this; however you can really only simplify it so much, particularly when dealing with 3D Programming. My first quick XNA program was basically just to create a bunch of cubes stacked on one another. This is a very common theme given the popularity of games like Minecraft, but my goal was to eventually create a sorta 3-D version of Breakout (or, rather, BASeBlock 3D).
I was able to get the blocks visible, after a lot of cajoling and doing the work on paper (visualizing 3-D space and coordinates is not my forte). But it ran at 10fps! This was because I was adding every single block’s vertices to the VertexBuffer; for a set of blocks in a “standard” arrangement of around 1920 blocks (which is probably a number that would make the 2-D version go around 10fps, to be absolutely fair here), that is over 11520 faces, each of which actually consists of a triangle list of 6 vertices (I tried a triangle fan, but XNA doesn’t seem to support them- oh well), meaning that I was loading the VertexBuffer with over 69120 texture-mapped vertices. That’s a lot to process. The big issue here is Hidden Surface Removal; obviously, if we had a cube of blocks like this, we don’t need to add the vertices of blocks that aren’t visible. I’ll admit this is the part where I sort of gave up on that project for the time being; it would involve quite a bit of matrix math to determine which faces were visible on each block, which ones needed to be added, etc. based on the camera position, and I like to understand what I’m doing. I, quite honestly, don’t have a good grasp of how exactly matrices are used in 3-D math, or dot products (at least in 3-D), and I prefer not to fly blind. So I’ve been reading a few 3-D programming books that cover all the basics; one book I believe goes through the creation of a full 3-D rasterization engine and has a lot of in-depth coverage of the mathematics required; this, paired with concepts from Michael Abrash’s “Graphics Programming Black Book”, should give me the tools to properly determine which blocks and faces should be added or omitted.
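For a grid of axis-aligned cubes, at least, the camera-independent part of the problem doesn't need matrix math at all: a face only needs vertices if no neighbouring cube covers it. A small sketch of that neighbour test (written in Python for brevity; the grid layout and sizes are invented):

```python
def visible_faces(occupied):
    """Count cube faces not covered by a neighbouring cube in a 3-D grid.

    `occupied` is a set of (x, y, z) integer cube positions.
    """
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for (x, y, z) in occupied:
        for dx, dy, dz in neighbours:
            if (x + dx, y + dy, z + dz) not in occupied:
                count += 1  # this face is exposed; only it needs vertices
    return count

# A solid 4x4x4 block of 64 cubes exposes only its surface:
solid = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
print(visible_faces(solid))  # 96 faces instead of 64 * 6 = 384
```

Back-face culling against the camera would trim this further, but even the neighbour test alone discards the bulk of the vertex data.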
Anyway, scrapping that project for the time being, I decided to make something 2-D; but since I was more or less trying to learn some of the XNA basics, I didn’t want too much of the concepts of the game itself getting in the way, so I chose something simple- I just re-implemented BCDodger. I added features, and it runs much better this way, but the core concept is the same.
XNA is quite powerful- I have no doubt about that. Most of my issues with it are minor. One example is that XACT doesn’t seem to support anything other than WAV files, which is a bit of a pain (this is why BCDodgerX’s installer is over twice the size of BASeBlock’s, despite having far less content). Another minor peeve is that there is no real way to draw lines, or geometry; everything has to be a sprite. You can fake lines by stretching a 1×1 pixel as needed, but that just feels hacky to me. On the other hand, it’s probably pretty easy to wrap some of that up into a class or set of classes to handle “vector” drawing, so it’s probably just me being used to GDI+’s lush set of 2-D graphics capabilities. Another big problem I had was with keyboard input- that is, getting text entry “properly”, without constant repeats and so forth. Normally, you would check if a key was down in Update(), and act accordingly. This didn’t work for text input for whatever reason, and when it did, it was constrained to certain characters. I ended up overriding the Window Procedure and handling the Key events myself to get at key input data as needed, then hooked those to actual events and managed the input that way.
Overall, I have to conclude that XNA is actually quite good. There are some intrinsic drawbacks- for example it isn’t cross platform (to Linux or OSX), and the aforementioned basic problems I had, which were probably just me adjusting my mindset. It’s obviously easier than using Managed DirectX yourself, or DirectX directly (if you’ll pardon the alliteration), and it is designed for Rapid creation of Games. With the exception of the High-Score entry (Which took a bit for me to get implemented properly) BCDodgerX was a single evening of work.
It is a frequent point of debate in many web communities that contain programmers- or back-seat programmers, as it were- that such and such language is better than another, or that “if it was written in this language it would be faster”. A lot of the time this comes from people calling themselves “real programmers”, or at least putting on the air that “real” programmers don’t use managed languages like Java or C#; instead they seem to think that “real” programmers use C. And arguably, any programmer should at least be able to read C. However, a good programmer is defined not by what language they use, but by their product. A program that works is a program that works, regardless of the language it was written in.
Anyway, the idea that C- or C++ is superior to Java- or, more generally, that any language is better than another for speed- comes with the implicit assumption that there is some limit. Because you know what can probably create faster code than C++? C. And faster than that? Hand-tuned assembly. Saying C++ is superior is basically saying that it’s worth it to dump the language-based advantages that Java has over C++ for the speed improvement, but somehow it isn’t worth it to make nearly the same amount of gains by switching to C, or by switching from C to Assembly.
Thing is, unless you choose Assembly language, there will always be a language that can make your program faster. But the thing is that we use programming languages to try to abstract away the details. Instead of writing a series of direct instructions to the Floating Point Unit, placing values on the FPU and CPU stacks to perform operations in Assembly, we simply use C or a higher-level language and give it an infix expression in the form we are used to. Can you sometimes make such code run faster in Assembly? Probably, if you take advantage of U and V pipelining, make sure to reduce wait states and memory accesses, and so forth.
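What the compiler does with that infix expression is essentially what the FPU stack does by hand; a toy postfix evaluator makes the correspondence concrete (a sketch of the general idea, not how any particular compiler works):

```python
import operator

def eval_postfix(tokens):
    """Evaluate postfix tokens with an explicit stack, FPU-style."""
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    stack = []
    for tok in tokens:
        if tok in ops:
            b, a = stack.pop(), stack.pop()   # pop the two operands...
            stack.append(ops[tok](a, b))      # ...push the result back
        else:
            stack.append(float(tok))          # push a literal onto the stack
    return stack.pop()

# The infix expression (2 + 3) * 4 becomes the postfix sequence "2 3 + 4 *":
print(eval_postfix("2 3 + 4 *".split()))  # 20.0
```

The higher-level language lets you write the infix form and forget the stack choreography entirely.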
The bigger question is, “Is it worth it?” And largely, no, it isn’t worth it. In fact, it very seldom is.
Another point is that the primary thing you are doing with a programming language now is interfacing with libraries. C++, C, and Assembly don’t make the libraries run any faster. On Windows, if you allocate memory- whether by way of creating objects in Java or C#, using new in C++, or using malloc, alloc, or whatever in C- it all boils down to OS-specific functions; in Windows’ case, all of those eventually become calls to LocalAlloc (or GlobalAlloc). But whether you make that call from Java or C doesn’t make that function execute faster.
Sure, you can argue that Java or C# probably has more overhead; from new Object() to the actual allocation there is probably some garbage collection housekeeping and allocation of the various fields. But the fact is that in C you will usually be doing that housekeeping yourself anyway; depending on the nature of the memory allocation and what it is for, you will probably be making a lot of calls to malloc() and free(), and every single call is a tango with the evil poltergeist known as memory allocation problems. Accidentally forget to free a block of memory and then reassign the value being used to store the pointer? Leak. Accidentally call free twice on the same block of memory? Double-free. All that extra code adds up, and while I don’t think it quite equals the time penalties associated with Java- which might accrue to about a tenth of a second over a constant year of execution- it certainly takes a toll on anybody trying to work on the code: having to remember “I have to allocate that structure on 16-byte boundaries, and I need to make sure all the char arrays are packed before I call this function,” and so on.
And even then, you could easily eke out a similar performance gain over a C or C++ implementation by completely retooling the entire program in Assembly.
For any number of projects written in Java or C#- particularly games- you can usually find a number of posts on forums calling for, or at least implying, that the game should be re-written in C or C++ “for speed”. But why stop at C or C++? Why not call for the entire thing to be re-written in Assembly? Because it’s not worth the effort. But the thing is, it’s not worth the effort to rewrite it in C or C++, either; by the time any such rewrite is completed, computers will have gained enough speed to make the improvement moot. The reason that Assembly language isn’t used is because it is no longer necessary.
Programs used to pretty much have to be written in Assembly to be reasonably fast. QDOS/MS-DOS was originally coded in Assembly language. Same with every early Operating System. But those Operating Systems were dead simple by comparison. Now, C is the main language used for Operating Systems. Did C code get faster? Not really, at least not in comparison to hand-tuned Assembly. But the fact is that if writing twice as much code could make your code 10% faster, it was only worth it if that 10% speed difference actually mattered. With the 386, it did, and could often mean your program showing a chart a full second before your competition’s. Now, your program running 10% faster can often mean that it shows a chart imperceptibly faster, which is hardly justification for tripling the amount of code, making it difficult to maintain with low-level constructs, and discarding numerous useful abstractions.
That last word sort of touches on what a programming language truly is- just a set of abstractions. Let’s take a simple language construct that exists in nearly every language- the simple if. In Assembly, the equivalent is to use a compare instruction, and then use a jump instruction (for example, JNE- Jump if Not Equal) to jump to a specific address. Most Assemblers also add features that don’t directly translate to machine code, such as macros, making some of this a little easier. Your typical one-line C if statement might translate into quite a few Assembly lines to perform all the needed compares and jumps, and writing them by hand certainly takes more work. Can you make that if run faster? Well, probably, if you are a skilled Assembly programmer. But most people aren’t, and even one who is would only do this in a time-critical inner loop.
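Python makes the compare-and-jump shape easy to see: its own bytecode compiles an if into a comparison instruction followed by a conditional jump, much like the CMP/JNE pair described above (the exact jump opcode name varies between CPython versions):

```python
import dis
import io

def check(x):
    if x > 10:
        return "big"
    return "small"

# Capture the disassembly of the function.
buf = io.StringIO()
dis.dis(check, file=buf)
listing = buf.getvalue()

# The one-line if became a compare instruction plus a conditional jump.
print("COMPARE_OP" in listing)  # True on CPython
```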
Nowadays, inner loop speed is not as critical as it used to be. While an Assembly rewrite of a critical section of code might make that code 200% faster on a 386, that doesn’t mean it would have the same effect on a modern machine, because modern machines have multiple processors, some of which need to access the cache concurrently; there are also numerous pipelining issues to consider. For the most part, optimizing code with Assembly on a modern processor is not a feasible solution.
It used to be that Assembly had to be used for everything. If you used C, you were a loser and your code was too slow for general use. Then it was that C was used for everything. If you used C++, you were a loser and your code was too slow for general use. Now, it feels like C++ is used “for everything”. If you use anything else, your code is too slow for general use.
But the fact is that the people sticking to these older- if arguably faster- languages are sticking to the same languages that have made possible almost every single security vulnerability in every modern Operating System. C code just begs to be buffer-overflowed, and the simplest oversight can topple a company’s entire network, pass along trade secrets, or otherwise be a gigantic pain in the ass. In today’s networked world, speed is not something that should be required of programming languages, both because of its myriad trade-offs- such as costing more programmer time and anguish- and because you can always buy a faster computer later. Obviously, as long as the algorithms used are sound, you aren’t going to get gigantic gains in your implementations just by switching to another language. Sometimes it can even make things slower. Managed languages are a good idea for application code- and games- simply because they are for applications. Games don’t need to be close to the hardware because, like a wife after a long marriage, the hardware hasn’t even been touched in years. All “hardware” interaction is done today either through OpenGL or DirectX, which themselves delegate through software-based drivers anyway.
Computer time is cheap. Programmer time is not.
AMD, like Cyrix, VIA, and NexGen, were all clone makers; they made pin-compatible processors for PCs. Their primary advantage was their lower cost. The AMD K5, designed to compete with the Pentium, was a Socket 5 processor just like the Pentium. The idea was that, as a lower-cost alternative, their processors could be used in machines instead of Intel's. AMD, specifically, excelled at integer operations, doing them a lot faster than the equivalent Intel processor. So in some cases the AMD processor was not only cheaper but also a better choice, if it was for use in applications that did a lot of integer arithmetic. NexGen was working on a new processor to compete with the Pentium; AMD bought the company, and the in-development NexGen design became the basis of the K6- so the K5 and K6 are completely different processors, designed by what were essentially two different companies. The K6-2 that followed added a set of 3-D extensions, 3DNow! (much like MMX), alongside MMX itself, and its floating point performance no longer sucked ass, and was very nearly comparable to Intel's offering.
Over time, all the other clone vendors died or were purchased; VIA, to my recollection, bought Cyrix, made a few processors (the VIA Samuel C3 being the only one I distinctly recall), and then killed their processor division entirely, focusing on their motherboards and embedded solutions. AMD became the only competitor to Intel that had any “weight”. Also, as their processors became equal to Intel's both in performance and price, they started being made using different pin designs. I believe this was originally because the socket or slot for some Pentium processor was patented, so AMD couldn't make a compatible equivalent; at the very least, the Pentium itself was named the Pentium- rather than the 586- in order to prevent other vendors from using the same name.
One interesting thing about the Pentium is that it was the first CISC-instruction-set processor to be considered superscalar; its pipeline architecture allows it, in many cases, to execute two instructions per clock cycle. The Pentium came in three revisions; the earliest versions didn't have things like MMX, and in many cases had the infamous FPU division issue (Intel Errata #23). The second generation came in higher clock speeds (90MHz, 100MHz, and 133MHz, as opposed to the 60 and 66MHz of the original incarnation), as well as any number of improvements, such as a smaller die size and an on-chip APIC. It didn't have MMX; that was the third revision, which came in even higher clock speeds, FSB/Clock: 66/166, 66/200, 66/233, 66/266 (mobile only for the last one). The third revision had MMX, a smaller still die size, lower voltage requirements, and a 16KB write-back cache (compared to the earlier versions' 8KB). The interesting thing about some boards of that era, particularly those designed for slot CPUs, is that a lot of them actually had two processor slots. Usually the second one was labelled “for testing only”, but you could literally plug in another processor and have dual processors. The only downside is that you pretty much required Windows NT to use them (9x doesn't support multiple cores or processors). Heck, that wouldn't even work with XP Home, which only supports a single physical processor.
AMD's lower-cost offerings impacted Intel's market, so Intel came up with their own low-cost alternatives. This isn't too surprising given they'd been doing that for years with the 386SX and 486SX: the 386SX was a slower variant of the 386DX, whereas the 486SX was a 486DX with its FPU disabled. Installing the companion “co-processor”, the 487SX, was actually installing a 486DX, which then took over all system operations from the SX. In addition, they created lower-cost upgrade capabilities for the 486, since the K5 was almost at feature parity with the Pentium (and better in some ways, with six five-stage instruction pipelines rather than two). To compete with this they created the Pentium “Overdrive” chip, which would be installed in a 486DX board and take over all operations from the installed 486DX. Naturally, since it was on a 486 board, some operations would still be slow, particularly bus transfers and DMA, but it sped up processor-intensive tasks considerably. Later, with the K6 and K6-2 eating into their Pentium II market share, they came up with another low-cost segment: the Celeron.
sidebar:*technically, the first Intel 6th generation processor was not the Pentium II, but rather the Pentium Pro*
Of important note is that the first Celerons were not Pentium processors but rather Pentium II processors; it took a generation for Intel to catch on to AMD's low-cost niche tactic and come up with a response in the Celeron. The Celeron was typically a slotted processor, at least all that I've seen are. The basic difference is that it had no L2 cache at all (later revisions had 128KB on-die, compared to the Pentium II's standard 512KB). Ironically, the Celeron usually performed much worse than the K6 and K6-2 it was designed to compete with, not to mention the awkwardness of the slotted processor design. Even so, and particularly through partnerships with retail computer manufacturers, Intel was able to squeeze Celeron machines into the market (the “Barbie” and “Hot Wheels” machines from mid-to-late 1998 are a good example of this, since they sported Celeron processors). The Celeron brand lives on, and it is still a lower-cost alternative to Intel's other offerings, and almost never a wise choice for a desktop machine. Many users are wooed by the higher clock speed, but with so little cache, the clock speed barely compensates.
The 6th generation gave us the above Pentium IIs, Celerons, and K6-2s; the seventh gave us the Pentium 4 and the Athlon. Wait, what about the Pentium III and the K6-III? Well, they aren't 7th-gen processors, since they are based on the same core as the sixth-gen chips (for Intel, the Pentium Pro's P6 core; for AMD, the K6).
The original Pentium III was practically a Pentium II with SSE (sometimes called “MMX2”) and a higher clock speed. An interesting sidebar is that the P6 chips from Intel (Pentium Pro, Pentium II, and Pentium III) are only fully utilized by NT versions of Windows, since the micro-ops that the CISC instructions are decoded into are optimized for 32-bit code. Windows 9x executes a good half of the time in 16-bit mode (for compatibility with older software, mostly), so you don't get the biggest improvement with it.
Intel failed miserably in their first attempt at a consumer-appealing 64-bit architecture. The Itanium was 64-bit, but its execution of 32-bit code had to be fully emulated. It found some uses in business and servers, but its limited ability with 32-bit code hindered its adoption in the consumer sector.
AMD created its own 64-bit processors, but made 64-bit just another “mode” of the processor. In this way, 32-bit code could be run natively, with no emulation needed. Intel followed suit with their own extensions implementing the same instruction set as AMD's, making the two compatible.
I'm not nearly as familiar with their history after around the Pentium III/Athlon XP area.
The two are practically the same now. They offer consumers a choice, but at the same time that choice is practically useless. The fact is that we've pretty much hit the architectural limit that different die configurations can give us, and we cannot easily shrink the process further without invoking the dangers of quantum tunnelling. The best option for the future is to add more processor cores and, even more important, to have software that is better able to extract the best performance from those cores.

My opinion is that the big problem right now is not the hardware, or the software, but rather the programming languages that are dominant in the industry today, largely C/C++. What is needed is the adoption of one of the myriad languages that have built-in support for concurrent execution; for example, some languages are able to compile a simple for loop in a way that lets it execute on separate cores. This approach is particularly powerful in a stateless environment, such as a functional language, and to that end many functional languages include built-in concurrency support. What makes this particularly interesting is that most programmers think of “concurrency” and immediately think of threads; but threading is only one of the ways that concurrency can be achieved, and it is one of the least powerful as well. Erlang, for example, takes the approach of sending messages between processes instead of having different threads. Since Erlang is a functional language, most of its constructs are largely stateless, as opposed to most imperative languages, which are typically state-heavy. It is the abundance of state in our standard programming languages that is causing the difficulties we are seeing with concurrency, not the cores or the implementations thereof. Consider for a moment that most of the benchmarking tools being used to compare processors are themselves written in C/C++.
In order to trust the performance results, you have to trust that the code is making the best use of the available hardware. But the fact is that imperative, stateful programming abhors concurrency: threads deadlock, and you have data-synchronization issues and race conditions to deal with. So, while processor benchmarks might state that a Bulldozer is “worse” than another CPU, I submit that that result is as much a testament to weaknesses in the program and in the stateful imperative programming paradigm, at least with regards to its use in concurrent solutions. This is why I have never put faith in benchmarks; any weakness being shown could easily be an oversight or problem with the software being used to test. If a benchmark tool only uses two threads, how can you trust its result when it runs on 6 cores? And even if it were to use more, you're still placing your trust in how the program was written. And while one could argue that the test will show how a lot of current software and games run on a given system, it doesn't test the actual potential of that system; a game written to take advantage of 6 cores would scream compared to the same game running on fewer. At this point, concurrency is the answer to improving system speed, and in order to properly leverage concurrency we don't just need more cores; we need software and programming languages that provide built-in support for concurrency constructs. C/C++ simply does not offer this, and while I'm sure a library could be written that does, there are already loads of languages that provide built-in concurrency support in any number of different ways, whether through C# and .NET's parallel constructs (the Task Parallel Library) or through the ability of functional languages to make assumptions because the code is primarily stateless and thus easier to parallelize.
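To make the idea of a “parallel for” concrete, here is a small sketch in Python using the standard library's concurrent.futures module (Python chosen for brevity; the function name and inputs are my own invention, not from any benchmark discussed above). Because each call is independent and shares no state, the runtime is free to spread the iterations across cores- exactly the property the stateless languages discussed above exploit automatically:

```python
from concurrent.futures import ProcessPoolExecutor

# A CPU-bound function applied independently to each input;
# no shared state, so iterations can safely run on separate cores.
def count_divisors(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

if __name__ == "__main__":
    numbers = [100_000, 200_000, 300_000, 400_000]
    # map() farms each call out to a pool of worker processes,
    # much like a built-in parallel "for" construct in other languages.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(count_divisors, numbers))
    print(results)  # → [36, 42, 72, 48]
```

Note that nothing about count_divisors had to change to run in parallel; that is the payoff of keeping the loop body stateless.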
Personally I don't have a preference for either. I used a K6-2 for nearly 5 years, a Pentium 4 for about a year, and am now using an Intel Q8200 and a laptop with an Intel T3200 (I think). Maybe my next build will be AMD, I don't know. Either way, I'm not going to base any of my choices on how a given system performed with a piece of software. The heart of the matter is that I never trust software. I don't even trust software I wrote half the time. Software is a loose thread on a sock: if you pull out the thread, the sock is going to fall down regardless of how well formed the ankle is, and you cannot declare “this ankle sucks, because my sock keeps falling down”, just as you cannot say “this hardware sucks, because this piece of software says so”.
And now, for something completely different- a post that actually is about programming. A short overview of anagrams and a breakdown of a common algorithm that is used for discovering them.
An anagram is one word that becomes another word when you rearrange the letters; for example, “deposit” and “posited” are anagrams of one another, as are “dare” and “dear”. The goal of the algorithm is, given a list of words, to find out which words are anagrams of one another. This sounds fairly simple: one could conceivably create a function that determines whether two words are anagrams of one another (by looping through the characters, comparing lengths, etc.) and then use that function on every single possible pairing of the words in a dictionary.
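As a point of reference, the brute-force approach just described might look like this- a Python sketch, with function and variable names of my own choosing:

```python
from collections import Counter

def is_anagram(a, b):
    # Two words are anagrams if they use exactly the same letters
    # with the same multiplicities.
    return Counter(a) == Counter(b)

def find_anagrams_brute_force(words):
    # Compare every word against every other word: O(n^2) calls
    # to is_anagram, which is what makes this approach intractable.
    pairs = []
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if is_anagram(words[i], words[j]):
                pairs.append((words[i], words[j]))
    return pairs

print(find_anagrams_brute_force(["dare", "dear", "mead", "made"]))
# → [('dare', 'dear'), ('mead', 'made')]
```

It works, but the quadratic number of comparisons is exactly the problem the next paragraphs quantify.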
This approach is simple-minded and will take an intractably long time for any real number of words. For example, Linux systems generally have a word dictionary located at /usr/share/dict/words, not unlike the one you can download here; in this case, the dictionary contains 235,923 different words. Simply iterating through every word and comparing it to every other word will take 235,923 squared calls to the aforementioned “isAnagram” function; that’s over 55,659,661,929 separate comparisons! Even if the isAnagram function only took a millisecond to execute, that’s still over 55,659,661 seconds, or nearly two years. Clearly this is not a good solution.
One way problems like these can be solved is to realize that the best approach is to change the problem into something that will “sort itself out”, so to speak. Rather than searching through every word and comparing it to every other word to find a match, one would be better off finding out what the words that are anagrams have in common, and how we can use that to our advantage. In the case of anagrams, what they have in common is rather simple: they have the same letters. The question is, how can we leverage this in a way that makes it easier to find groups of words that are anagrams of one another?
If we were able to create a unique value that all anagrams share that can be used to correlate them, we can create a data structure and only iterate through the list of words a single time. This value is the sorted letters of the word.
For example, both “posited” and “deposit”, when their letters are sorted, produce “deiopst”; in fact, it seems quite logical to conclude that all words that are anagrams of each other will produce the same sorted sequence. This makes the entire thing almost crystal clear.
Instead of comparing every word to every other word in a brute-force approach, we can instead use a Dictionary indexed by the sorted strings, where each entry contains a list of the words found that “sort” to that set of letters. For each word, we sort the letters and then look in the Dictionary; if there is an existing entry for that sorted set, we add the word to the list that is already there; otherwise, we add a new entry. Rather straightforward.
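The steps above can be sketched in a few lines- shown here in Python for brevity (the C# version later in this post does the same with a Dictionary of lists; the names here are my own):

```python
from collections import defaultdict

def group_anagrams(words):
    # Map each sorted letter sequence to the words that produce it.
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(word))   # 'deposit' and 'posited' -> 'deiopst'
        groups[key].append(word)      # a single pass over the word list
    # Keys with more than one word are the anagram groups.
    return [g for g in groups.values() if len(g) > 1]

print(group_anagrams(["deposit", "posited", "dare", "dear", "cat"]))
# → [['deposit', 'posited'], ['dare', 'dear']]
```

One linear pass plus a cheap per-word sort replaces the billions of pairwise comparisons from the brute-force estimate.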
This is in fact a very old algorithm, conceived as early as the 60s. The original concept was merely to bring together those words that were anagrams; the list was sorted based on the sorted letter set of each word, meaning that words whose component letters sorted the same would be contiguous in the list.
Here is the implementation of the Program in C#:
Here, I leverage Generics, and also nest Generic Classes within one another; note the Dictionary&lt;string, List&lt;string&gt;&gt;, which maps each sorted letter sequence to the list of words that produce it.
Companies and businesses want to prosper. At the core, being a successful business means making sizable revenue; but when that starts to falter, how do businesses determine the cause? They generally measure something: for a programming firm, this may be lines of code written; for a restaurant, perhaps the time it takes for a customer to be seated; and so forth. And on the surface, it makes sense.
The problem is, it only makes sense in theory. Why? Because both methods are trying to tack a quantitative statement onto something that cannot be measured in this way; additionally, they forget the human factor.
Let’s look at a hypothetical situation: Programmer Joe. Programmer Joe has worked at a software startup since its early days, and enjoys his job. However, one day the CEO calls them all into the office to discuss declining revenue. Realistically, the revenue is declining because a competitor has just released a product that directly competes with the startup’s flagship software package. However, management believes that the loose leash they’ve been keeping the programmers on is to blame, so one of the idea men determines that they should implement a management technique known as metrics. In this case they start with something basic: each programmer’s performance will be judged directly by the number of lines of code they write.
Programmer Joe continues his work as normal; he notices some of his co-workers checking in code that re-rolls standard library routines, and cuts out excess code like this. One day he’s called into the Project Manager’s office.
“Joe… I don’t know quite how to explain this- maybe you can. Somehow, you’ve managed to accumulate a *negative* lines-per-day metric… any idea how that is possible?”
“Well, I’ve noticed a lot of redundant code being checked in, and trimmed it down. The code still works the same way- it’s just faster; also, I’ve managed to fix half the bugs on the list just by doing that.”
The Project Manager may, in this case, take this particular information to the higher levels. They decide that the metric is flawed- what they need to measure is lines of code written minus the number of bugs introduced.
A feature request arrives from a client near the end of the day, at 4:30PM. The Project Manager asks if Joe can look over the request and maybe stay late to try to get some of those features implemented. So Joe spends the next three hours creating a prototype for the new feature, and once he has the main functionality down he checks the code in and goes home. The next day, Joe continues his work on this new feature. However, he is once again called into the Project Manager’s office.
The Project manager hands Joe a large stack of papers.
“you know what those are, Joe?”
“Those are the bug reports filed on code you’ve checked in recently.”
“All the code I’ve been working on recently has been a prototype… I haven’t yet gotten it fully integrated into the rest of the system.”
I think you can see where I was going there. Basically, having a metric that measures small foibles in the entire process is doomed to cause hiccups within that process and even bring business to a standstill. In this case, rather than worrying about the number of lines of code written or the number of bugs introduced, it might be better to focus on fixing the bugs and adding features; the number of lines of code in a project does not directly correspond to the quality of that project as a whole, and in some cases can even correspond inversely.
In a strange coincidence, a local coffee shop that Programmer Joe visits on his way to work had noticeably changed. When Joe used to go through their drive-thru to grab his morning coffee and a box of donuts for his colleagues, the staff were friendly and often asked him how things were going. Recently, however, Joe has felt like a piece of scrap metal on an assembly line. Once he got to the pay window, he would often hear audible grumbles of discontent as he simply reached into his pocket for his wallet. His donuts have often been obviously just thrown in the box with little care, too. Joe couldn’t understand it, since these were the very same people that would serve him before. Eventually, Joe decided to go elsewhere for his morning coffee.
The previous paragraph is not an entire fabrication; in fact, I work at a location that does just this. They time every single drive-thru customer’s time at the window, and it’s treated as the single most important measurement of performance in the entire store. I’d even go so far as to say it seems to be used to directly reflect customer satisfaction. However, this simply is not the case. The franchise places a recommended limit of 42 seconds of waiting at the window- a reasonable time frame, depending on the volume of customers. However, at my location it has now been decreed that we shall not take longer than 30 seconds or, from what I hear from others, we will be verbally “abused” about it.
In any case, a little backstory is probably in order. Lately it would appear that revenue has been dwindling; there are far fewer customers than I remember coming in, at all times of the day. Additionally, various seemingly pedantic rules have been placed on our release of such seemingly trite things as butters and napkins. It’s fair to assume that they obviously want to bring business back up again- and that is certainly something anybody in that position would try to do.
However, with that said, they are going about it wrong. In the retail and customer service industry- where almost all revenue comes from consumers who purchase your product by coming to your establishment- the single most important thing for business is customer satisfaction. No exceptions. Interestingly enough, since this started, I’ve gotten dozens of complaints from customers who visit regularly about the shoddy way they were treated while going through the drive-thru at some other time during the day. This certainly is not the fault of the workers at the time, since, as I said, they are essentially being “forced” to try to reduce times to under 30 seconds. But when you get customers- the main source of revenue for the store- complaining about something that is the result of a policy change introduced to try to increase revenue, that is pretty solid evidence that the technique has failed miserably.
The problem is not that those in charge fail to realize that customer satisfaction is the single most important thing to the store’s success- they know it as well as anybody else- but rather that they have tried to assign a single metric to measure this particular quality, and the two do not correspond. I certainly won’t argue that customers prefer fast service- it certainly is on their list of hopes when they enter a drive-thru to be out as fast as possible- but having their orders (literally) thrown into their car for the sake of speed of service doesn’t please the customer. They aren’t going to think, “well, golly, they threw the sandwich right into the passenger seat and refused to carry on a quick little bit of small-talk while I waited, but damn, it took only 30 seconds, so I am going to say I am satisfied”. I’m sorry, but this does not happen. As long as a customer is not waiting an exceedingly long time for their purchase, they tend not to notice the wait at all. Think of it this way: there is the old adage that you can have fast service, good service, or right service, but you can only choose two. I put forth that, for many consumers, when one of these is omitted and the other two are done in a way that exceeds their expectations, they are likely to be satisfied. For example, if a customer is greeted with courteous fervor, and perhaps a friendly conversation ensues in addition to their order being made perfect, they are less likely to notice that they had to wait in line for a few minutes. If they had to wait that same amount of time and were treated brashly, they are more likely to take offense. In fact, the single most important thing to almost all customers is not the time they wait, or even that their order is made perfectly to their specifications, but to be treated as people, and not as some mindless consumer whose particular interests and concerns are of no importance.
The sad fact is using this particular measurement metric encourages the workers to do the latter.
The thing is, there are even further flaws in the mechanism. First, the device doesn’t even work half the time, so times come out skewed and sometimes two vehicles get counted in the same interval. Additionally, since the time measured is the entire time the customer is at the window, this is not simply a measure of how efficiently the workers get the product to the customer but also a test of how quickly the customer can pay. If a customer needs to dig for change for 10 extra seconds after the employees have finished their entire order and simply await payment, why are those 10 seconds attributed to the employees? “Because we have no way of knowing”, the people who read the recorded times may say. The issue here is that if there is one fact you don’t know about what the measurement indicates, then there are certainly more, and it could even be put forth that the measurement is completely meaningless. A customer could be at the window for 10 seconds and still not be pleased, be it because of the shoddy service received while being squeezed through in a tiny window of time, or because they got their order mixed up with another’s, or some other error instigated by the fact that inhuman demands are made of the people there. Conversely, a person could wait an entire minute at the window and still be perfectly pleased with their service. Using window time as some sort of barometer by which to judge the performance of a business is downright ridiculous, and this only multiplies when one considers that no such time-based constraints are placed on those customers who decide to enter the establishment for their service.
And while the latter example has certainly acquired far more of my attention for personal reasons, it is no more or less ludicrous than the software company’s implementation; performance metrics have been tried in nearly every industry, and in every industry they have failed to provide the hoped-for results. The key here is that the measure is not strictly of customer satisfaction or even of revenue, but, more directly, of the value of the business.
Value is a perception, and must be communicated and measured to be perceived as value. Value measurement is the process by which management decides on operational performance measures that will enable them to secure the owners’ return on investment. Value measures must be aligned with the strategy positioning the business. Measurement includes both lead factors and lag factors. Lead factors are performance measures that proactively indicate whether objectives will be achieved; they allow management to receive warning signs early. Lag factors measure how successful management was in terms of creating value; financial reports are the ultimate lag measures of success. When a business’s lead measures indicate that it is performing well but its lag factors show the contrary, it is time to review the lead measures.
The value measurement process determines the behaviour of the business and aligns the behaviour of human capital with the value expectations of all stakeholders e.g. owners, management, customers, employees and partners. Value should be measured and reported on from every stakeholder’s perspective. A true balanced scorecard will drive performance improvements for all five stakeholder dimensions. This all applies to ANY business, regardless of industry.
Flipping quickly back to software, an excellent overview of the problems present with Performance metrics as used in that industry can be found here: http://discuss.joelonsoftware.com/default.asp?biz.5.304155.19
In my previous entry, I discussed and provided code that would allow for the detection of Triple-Clicks on a window. By extending the provided architecture, it is relatively trivial to allow any number of consecutive clicks; for example, detecting pentuple clicks.
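The original entry’s Windows-specific code isn’t reproduced here, but the timing logic that generalizes triple-clicks to any click count can be sketched in a language-neutral way- here in Python, with class and method names of my own invention:

```python
class MultiClickDetector:
    # Counts consecutive clicks that each arrive within `interval`
    # seconds of the previous one; the count resets otherwise.
    def __init__(self, interval=0.5):
        self.interval = interval
        self.last_time = None
        self.count = 0

    def click(self, now):
        # `now` is a timestamp in seconds (e.g. from time.monotonic()).
        if self.last_time is not None and now - self.last_time <= self.interval:
            self.count += 1
        else:
            self.count = 1
        self.last_time = now
        return self.count  # 2 == double-click, 3 == triple, 5 == pentuple...

d = MultiClickDetector(interval=0.5)
for t in (0.0, 0.3, 0.6, 0.9, 1.2):
    n = d.click(t)
print(n)  # → 5, a pentuple click
```

Feeding the detector each mouse-down timestamp is all that is needed; checking the returned count against any N replaces the hard-coded triple-click case.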
When entering the domain of programming, one finds a plethora of programming languages to choose from; and there are more being introduced nearly every day. Each language was designed for a purpose. From Assembly language, designed simply as a way to write machine code easier, to C, designed to be as “close to the machine” as possible without actually becoming a low-level language, and Perl, designed for text processing.
The interesting thing? They don’t have to be used as designed. Perl, while created by Larry Wall to make text processing of reports easier, has been applied effectively in all sorts of problem domains. Visual Basic can be used to write StdCall DLLs. Scripting languages, once relegated to creating silly animations to annoy users visiting web pages, have found use both on the desktop and in server-side web processing.
As the Information Technology industry has aged, so too have the techniques used for programming; new techniques are tried all the time. Previously, the widely held winning formula was called “structured programming” because, rather than using gotos or conditional jumps, the language itself contained constructs for “structuring” code, a method that allows the code to be more readable and organized. It also embraced a top-down approach, whereby the “function” is created before the routine that requires it. Original “pioneers” of this stratagem are numerous, but the recognized “original breeds” are Pascal and C. Pascal was created as a teaching tool; by enforcing a number of things widely held as good programming practice as part of the language, it “conditioned” new programmers to keep these habits when they moved to other languages. As Pascal evolved, however, it was made more useful, and more applicable to the creation of real applications. C, on the other hand, was designed to allow programmers easy access to a wide variety of functions, both of the language as well as the “bare metal”. This “bare metal” approach was facilitated with “naked” functions as well as inline assembly.
The “Structured Programming” concept dominated the academic circles, and was implemented as a standard by programming firms the world over. However- there was a pattern to how it was used, and this pattern, once recognized, was used to develop the “next leap”.
Program source code that was “structured” had its many components separated into various modules and procedures. Each module would often hold the structure definitions and the procedures to handle them. For example, a program designed to graph functions might have a module named, perhaps, modFunctions, depending on the naming convention. This module would define the “structure”- or in-memory representation- of the various properties of a function: the expression, range, etc. The routines in the module would be used to manipulate, draw, set, and retrieve the various items stored in the structure.
Eventually, this was migrated into the structure itself- that is, a structure might be defined to hold both the data relevant to the function as well as pointers to routines that manipulated it. The common convention was that the first parameter to these routines would be a pointer to the structure being acted on.
This type of programming was first implemented in C, and it worked great. Essentially, it combined functions and the data they manipulated into a single entity: the resulting structure. Oftentimes the only standalone routine required was a function that returned a pointer to the structure, after initializing the various function pointers; this became known as a constructor.
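A minimal sketch of that pre-object pattern- emulated here in Python for brevity (the historical versions were C structs holding function pointers), with names of my own choosing:

```python
# Routines that take the "structure" as their first parameter,
# mirroring the convention described above.
def counter_increment(self, amount=1):
    self["count"] += amount

def counter_value(self):
    return self["count"]

def new_counter(start=0):
    # The "constructor": build the structure, initialize its data
    # fields, and wire up its function pointers.
    return {
        "count": start,
        "increment": counter_increment,
        "value": counter_value,
    }

c = new_counter(10)
c["increment"](c, 5)  # the structure is passed as the first argument
print(c["value"](c))  # → 15
```

The bookkeeping is visible here- every call must pass the structure explicitly- which is exactly the tedium that object-oriented languages later hid behind method syntax.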
The problem with this technique was simply that there was a lot of work involved in using it- initializing pointers and other “bookkeeping” tasks. The concept, however, was solid. Languages sprang up that implemented this technique internally, and exposed only those parts of the “object”, as it became known, that needed to be coded or used. The concept spread like wildfire. Languages that once conformed to the “structured” paradigm were adapted and given object-based features. Pascal variants known as “Object Pascal”, which more fully implemented the loosely object-like syntax of Pascal itself, were created. C was extended into C++, implementing nearly all of the Object Oriented concepts, such as inheritance, interfaces, and so forth. BASIC, in the form of Visual Basic, became object-based and later, with VB4, loosely object-oriented. Entire languages sprang up on the premise that a “purely” object-oriented language would be the best choice. An early language of this form- or more precisely, the first widely popular language of this form- was Java.
With Java, everything was an object. Even a simple console application comprised at least one object. At this point, the code that created the objects, the very “template” from which they were instantiated, became known as a “class”; it was not until a class was instantiated that it became an object.
While Java, with its pure object-oriented approach, yielded cleaner code than a structured or half-object-based language, it also had shortfalls. Object-oriented programming was an excellent concept, but it could not be applied to everything, and oftentimes a short, quick little program was all that was needed. Java was, however, well suited to internet applications, since it was also a byte-code-based interpreted language.
Now that the object-oriented approach has matured and entire frameworks, such as .NET, have been built on the concept, one cannot help but wonder whether the idea is to make things easier for the programmer or for the user. Just because a language makes it easier to program, or a certain conceptual method makes it trivial to do something, doesn't mean the user's experience with the application is any better. What if a user requests a feature not implemented by the run-time? Sure, there are “native” method calls in almost all such languages, but is this not, in some sense, an admission of defeat? That the object-oriented approach is not flawless? Or perhaps it is something milder: simply a smaller admission of failure in the object-oriented design set forth by the creator of the framework. But the fact remains: the framework does not make implementing that feature easier. Is that not what the framework is for, however?
So the ultimate question is: are we any better off now, with class frameworks and libraries all around us, than we were when we had basic input/output and generic function libraries and were left to fend for ourselves with all sorts of mundane programming tasks that we now take for granted?