Roman Numerals

April 29, 2021 - Programming

I recently stumbled upon, this page discussing the topic of Roman Numerals.

What alarmed me was the amount of code presented there for solving the problem. The reason I found it alarming is because I have a rather small, 20~ or so line function in Visual Basic “Classic” that accomplishes the same task. The difference seems to be that I solved the problem differently. (Also, I didn’t have support for the ‘overline’ Numerals, though it was fairly easy to add those in.)

The way my algorithm works is that it goes through each place value, starting from the ones place, and determines the value that fits for that place. A text string of the Roman numeral characters and an offset into that string is altered as the iteration proceeds.

First we define the Roman Numeral digits. Since we want each one to be one character, we use placeholders here for the overline values- (SQPYRZ map to overline IVXLCD respectively). The algorithm itself takes each digit, decides what to add to the result string, prepending a value to the start of the existing result. The selected value is chosen based on an offset that starts at 1, increments by 2 for each place value in the decimal value, and gets an added 1 added on if it is above 6. (This is to address the overline I, which is only really used in combination and not independently.

After the result is built, the placeholder characters used for the overline roman numerals are replaced- In this case I have the value with a backtick in front of them. (there is a unicode code point that can also be used, though in this specific implementation it doesn’t work so well due to the limitations of “Classic” Visual Basic.

When I initially revisited this ancient routine, it actually did not support overline at all. I thought perhaps that is why the linked implementation is so long in the tooth. But, I also noticed that for values that use an overline, the value I received as a result was actually the same as the correct result, and merely lacking the overline digits. I just had to add the additional values to the string array- but of course we needed to know they were different and thus the “placeholder” characters.

After finally grasping how the algorithm accomplished it’s task, I set about writing a C# implementation. It uses the same algorithm, though I did simplify some of it thanks to the added tooling C# provides, such as a local function and Enumerable.Repeat() to condense some of the cases together

A nice, fairly short little function. Instead of direct string manipulation the C# version uses a StringBuilder and a helper local function for prepending to said builder. (Though realistically how large most Roman numerals will be makes it a bit of premature optimization to use a StringBuilder) The idea behind AsciiReplacements is that an enumerated value could be passed in to the function to indicate whether to use Ascii replacements, or unicode replacements that use the combining overline code point to create the overline characters. I believe right now the value will crash beyond a certain number of digits, since I will exceed the length of the digit string. This is actually somewhat C# specific- the Visual Basic implementation got away with it as the mid() function used there will return an empty string for values beyond the length of the input string, which results in incorrect values lacking the appropriate higher place values (eg. what I witnessed with values that otherwise would require overline roman numerals), but in the C# implementation, Substring() will in fact throw an exception instead. The linked implementation, I believe, addresses this. I think this could be handled by dealing with values that are 4 million or larger in a special way. I think the other implementation already addresses that.

Now onto a more stylistic disagreement, and something that bothers me a bit about the linked code sample- The gigantic identifier names. Now, Clean Code proposes that we not be afraid of long names, which may be the philosophy that was employed here. But having a simple comparison between the parameters wrapped as a bool function named “CanInputBeWrittenAsSmallerNumeralInFrontOfLargerNumeral” still doesn’t seem very “clean” to me, and that it is possibly a misapplication of the rules described in The book regarding descriptive function names. It feels like in this case, The use of longer names is used to try to avoid comments; but that is not my understanding of the central ruleset being proposed by the book. It says we shouldn’t be afraid of using a long name; it does not say, “use long names to avoid commenting your code”.

Which isn’t to suggest my implementation is better. If anything it has the opposite problem; Many of the variables are a single character- What is I? What is N? Shouldn’t Temp be called Result? There is room for improvement in the opposite direction in mine- I mostly left the artifacts of the original Visual Basic implementation which had some questionable variable naming.

Have something to say about this post? Comment!