Algorithms and specifications
Feb 19, 2020
In this article I want to talk about 2 topics in which I think that most programmers tend to fail: Algorithms and Specifications.
I’m going to use a real experience, which happened some time ago:
For 4 years now I have maintained a multi-currency node (Bitcoin, Bitcoin Cash and Litecoin) called Knuth (a.k.a. Bitprim). In November 2017 Bitcoin Cash made its first protocol change after its birth in August of the same year. My job at that time was to update the code of our node to support the protocol changes. From that moment I want to write this article, but … for one or several reasons I did not do it at that moment, I am doing it now.
The most important change was in the Difficulty Adjustment Algorithm, from now DAA.
Here the description of the algorithm.
I do not want to go into detail about the concept of difficulty or the DAA. For this you can refer to: Difficulty.
What interests me are points 2 and 3 of the description of the DAA:
Both point to the footnote [2]
:
The specification of the algorithm points to its implementation, in a function called GetSuitableBlock. Here the code:
What the algorithm does is basically create a sequence of 3 elements (array), order it from least to greatest and return the second element.
The complexity in time of this algorithm is:
- Best case: 0 swaps, 3 comparisons
- Worst case: 2 swaps, 3 comparisons
- Average case: 7/6 swaps, 3 comparisons; assuming a uniform distribution of the input data.
Now, look again at the algorithm. An array is being created (using the input data), then sort it up and return the middle element. This is a known algorithm and is called median, in particular, median of 3 elements.
The median is a selection algorithm. Unlike the sorting (inplace) algorithms, the selection algorithms should not mutate the input data, but return one of the elements.
Here is a sketch of the median of 3 algorithm, in C++
:
Or if you prefer the inline version of the algorithm:
I leave the analysis of the code for the reader, for the lazy: what the algorithm does is simply select the middle element between a
, b
and c
, pretending that the 3 were sorted in ascending order. It does this without mutating or reordering the input data.
The time complexity of median_3
is:
- Best case: 0 swaps, 2 comparisons
- Worst case: 0 swaps, 3 comparisons
- Average case: 0 swaps, 8/3 comparisons; assuming a uniform distribution of the input data.
Now, we could use our new algorithm in the original GetSuitableBlock
function:
Much shorter and understandable, right?.
Before continuing, we have to fix something: we do not know if the Natural Ordering specified in the CBlockIndex
class is given by the block’s timestamp (nTime
attribute).
We need a version of median_3
that accepts a form of comparison specified by the user: we need you to accept a strict weak ordering relation (for more information see here).
Now, we can correctly implement GetSuitableBlockNewVersion
, comparing by nTime
:
We have one last problem to solve. Let’s make a small test of the original algorithm and the new one:
The code above prints:
What we are trying to prove with the previous code is the stability of both algorithms. Our median_3
algorithm is stable which means that the relative order of the equivalent elements is preserved (for more information see here).
To prove it with data, we will use the previous example, in which we have the following input data for our algorithms:
Where the first element of each pair is nHeight
, the identifier of the block, and the second element is the timestamp called nTime
.
Note that the nTime
of the first 2 elements is the same.
If we sort the previous sequence by nTime
using a stable ordering algorithm, such as Merge sort we would have something like this:
Note that the middle element is the one with nHeight = 1
. This indicates that our algorithm behaved in a stable manner but not the original algorithm used in the Bitcoin Cash DAA.
In my first implementation of DAA in the Bitprim node I used a code similar to median_3
which was also stable, since I had not verified the code of the specification, I had mistakenly assumed that it was also stable.
Then this caused runtime errors of our node on a difficulty change. It did not always happen, but there was a particular case in which we could detect it. After several hours of debugging I could detect that the problem was that the algorithm used by me was not compatible with the “specified” in DAA.
Therefore, I had to “correct” my algorithm to make it non-stable in the same way as that of the specification.
Actually, if I remember correctly, the first version of the DAA specification did not mention the GetSuitableBlock
code, but said that the median of 3 elements was calculated. Since the implementation of the median was “incorrect” they had to adapt the specification to be consistent with the code.
Keep in mind, that once the code of a Bitcoin node (or any cryptocurrency) is in operation, a modification in its behavior introduces incompatibilities with previous versions and produces the so-called forks. So once the code is running, it’s about not changing it. For this reason it is why the specification had to be adapted instead of correcting the code.
Before finishing, let’s make a comparison of both algorithms, GetSuitableBlock
vs. median_3
:
median_3
does not make any swap,GetSuitableBlock
can make between 0, 7/6 or 2 swaps, unnecessarily. (Efficiency)GetSuitableBlock
creates an array, unnecessarily. (Efficiency)median_3
performs 2, 8/3 or 3 comparisons,GetSuitableBlock
always performs 3 comparisons. (Efficiency)median_3
is stable,GetSuitableBlock
is not.median_3
is what anyone expects from an algorithm that calculates the median of 3 elements. (Correctness)
And now, to conclude, some conclusions:
The author of the DAA specification could have chosen a known and “standard” algorithm, but he did not. And perhaps the worst of all is that the specification refers to the code. The code must never be specification. The code must be created from a specification. So if a specification refers to code, there is no such specification.
Bye!
Share