Language Performance Part XII: C++ | BASeCamp Programming Blog

This is part of a occasionally updated series on various programming languages. It should not be interpreted as a benchmark, but rather as a casual look at various programming languages and how they might be used by somebody for a practical purpose.
Currently, there are Articles written regarding Python,C#, Java and VB6 (merged for some reason),Scala,F# & Ruby,Perl, Delphi, PHP,C++,Haskell,D, VB.NET, QuickBASIC, and Rust

Though I didn’t plan on it, somehow I got around to actually implementing the C++ variant of the anagrams program I have been re-implementing in various languages to try to compare their linquistic and performance attributes.

The algorithm is relatively simple. it seeks to take an input file, a list of words, and determine which words are anagrams of one another. This is done by sorting each word (by it’s letters) and using that sorted word as an index into Hashmap or Hashtable or equivalent structure, adding the original word to a list stored in that structure. When the program is complete, each entry in the hashmap that has more than one item contains words that are anagrams of one another.

The C++ program I created that did this is this:

#include "stdafx.h"
#include <string>
#include <algorithm>
#include <vector>
#include <hash_map>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

//function for sorting a word.
string sortword(string word){
	string sortedword = word;
	sort(sortedword.begin(),sortedword.end());
	return sortedword;
}
int _tmain(int argc, _TCHAR* argv[])
{
	//hash_map<string, int> yeah;
	hash_map<string,vector<string>> WordHash;

	string line;
	ifstream myfile;
	clock_t startTime = clock();
	myfile.open("D:\\dict.txt");
	cout << "processing..." << endl;
	while(myfile.good()){

		getline(myfile,line);
		string sortedword = sortword(line);
		if(WordHash.count(sortedword)==0){
			vector<string> createme;
			WordHash[sortedword]=createme;
		}
		WordHash[sortedword].push_back(line);

	}
	cout << double( clock() - startTime ) / (double)CLOCKS_PER_SEC<< " seconds." << endl;
	cout << "Finished." << endl;
	cout << "processed " << WordHash.size() << " entries.";
	int test;
	cin >> test;

}

#include "stdafx.h"

#include <string>

#include <algorithm>

#include <vector>

#include <hash_map>

#include <iostream>

#include <fstream>

#include <ctime>

using namespace std;

//function for sorting a word.

string sortword(string word){

string sortedword = word;

sort(sortedword.begin(),sortedword.end());

return sortedword;

}

int _tmain(int argc, _TCHAR* argv[])

{

//hash_map<string, int> yeah;

hash_map<string,vector<string>> WordHash;

string line;

ifstream myfile;

clock_t startTime = clock();

myfile.open("D:\\dict.txt");

cout << "processing..." << endl;

while(myfile.good()){

getline(myfile,line);

string sortedword = sortword(line);

if(WordHash.count(sortedword)==0){

vector<string> createme;

WordHash[sortedword]=createme;

}

WordHash[sortedword].push_back(line);

}

cout << double( clock() - startTime ) / (double)CLOCKS_PER_SEC<< " seconds." << endl;

cout << "Finished." << endl;

cout << "processed " << WordHash.size() << " entries.";

int test;

cin >> test;

}

The execution time of this little bugger? around half a second, on average. This puts it on par with F#; but given it is C++, you would expect it to have the blazing speed that people often attribute C++ with; Too often you see people saying “Language X is too slow; they should rewrite this in C++”. Now, is this implementation of the algorithm as I’ve done it in C++ the fastest possible one? No, certainly not. But I think the code maintenance consideration is a big one. It makes sense to strive for the fastest possible program, but when you are only getting tiny gains from a complete rewrite and have to start really digging into the implementation to make things faster, I think you’ve passed the point where it can be particularly useful.

In this specific case, the RAII that I use appears to have a rather large effect on performance. the inner conditional if, for example, constructs a vector, and then tears it down when it goes out of scope. the cost of the delete each iteration ends up with a total time of around 18% of the entire program run. An optimization here might be to use a dynamically allocated vector. Another option might be to read the documentation: the entire if block is actually redundant, since the HashMap will actually construct the vector itself as part of it’s templature. As a result, the new code is:

#include "stdafx.h"
#include <string>
#include <algorithm>
#include <vector>
#include <hash_map>
#include <iostream>
#include <fstream>
#include <ctime>
using namespace std;

//function for sorting a word.
string sortword(string word){
	string sortedword = word;
	sort(sortedword.begin(),sortedword.end());
	return sortedword;
}
int _tmain(int argc, _TCHAR* argv[])
{
	//hash_map<string, int> yeah;
	hash_map<string,vector<string>> WordHash;

	string line;
	ifstream myfile;
	myfile.open("D:\\dict.txt");
	cout << "processing..." << endl;
	clock_t startTime = clock();
	while(myfile.good()){
		getline(myfile,line);
		string sortedword = sortword(line);
		WordHash[sortedword].push_back(line);
	}
	cout << double( clock() - startTime ) / (double)CLOCKS_PER_SEC<< " seconds." << endl;
	cout << "Finished." << endl;
	cout << "processed " << WordHash.size() << " entries.";
	int test;
	cin >> test;

}

#include "stdafx.h"

#include <string>

#include <algorithm>

#include <vector>

#include <hash_map>

#include <iostream>

#include <fstream>

#include <ctime>

using namespace std;

//function for sorting a word.

string sortword(string word){

string sortedword = word;

sort(sortedword.begin(),sortedword.end());

return sortedword;

}

int _tmain(int argc, _TCHAR* argv[])

{

//hash_map<string, int> yeah;

hash_map<string,vector<string>> WordHash;

string line;

ifstream myfile;

myfile.open("D:\\dict.txt");

cout << "processing..." << endl;

clock_t startTime = clock();

while(myfile.good()){

getline(myfile,line);

string sortedword = sortword(line);

WordHash[sortedword].push_back(line);

}

cout << double( clock() - startTime ) / (double)CLOCKS_PER_SEC<< " seconds." << endl;

cout << "Finished." << endl;

cout << "processed " << WordHash.size() << " entries.";

int test;

cin >> test;

}

The new execution time averages around 0.4 seconds- so we saved about 18% of the execution time, as one would expect. And the resulting code is much shorter, which cannot be overstated.

But still- this difference does not really convince me of C++ being the “fastest” language. Surely you can write faster code in C++ if you know exactly what you are doing, but that isn’t usually needed. In this case, we are dealing with over 200 thousand words, and truthfully very few of the programs I’ve presented on this issue really present scalability problems; even Ruby’s ten seconds could be mitigated if necessary using any number of caching techniques. Of course the downside here is that such improvements require more code, which means more interpretation, which means it could very well end up right back where it started anyway.

Have something to say about this post? Comment!