Fibonacci: You're Also Doing It Wrong
Posted
Earlier today, I saw C++ Weekly Ep 13 Fibonacci: You're Doing It Wrong and in my opinion the suggestion made there to use Binet's formula to calculate arbitrary elements of the Fibonacci sequence is dangerous. And as somebody is wrong on the internet I decided to write this short blog post on the topic.
At the first glance, using Binet's formula seems like a great suggestion. It is a closed form solution. It has complexity, i.e., fast. It is easy to implement. And more.
However, as usual when floating point is involved, danger lurks around every corner.
Although the complexity seems attractive, it is only if
pow
is Which it is for basic datatypes, but in other cases, such as arbitrary precision floats it is far more likely to be Alsopow
is really, really slow. We'll show later how slow it really is.Although the implementation seems simple, it is easy to get wrong. In fact the version shown in the video is wrong.
constexpr int fib(const int i) { const static auto sqrt_5 = std::sqrt(5); if(i == 0) return 0; if(i == 1) return 1; return static_cast<int>((std::pow(1 + sqrt_5, i) - std::pow(1 - sqrt_5, i)) / (std::pow(2, i) * sqrt_5)); }
This version is wrong for values of
i
as small as 10:0: 0 0 true 1: 1 1 true 2: 1 1 true 3: 2 2 true 4: 3 3 true 5: 5 5 true 6: 8 8 true 7: 13 13 true 8: 21 21 true 9: 34 34 true 10: 55 54 false 11: 89 89 true 12: 144 143 false 13: 233 232 false 14: 377 377 true 15: 610 610 true 16: 987 986 false 17: 1597 1596 false 18: 2584 2584 true 19: 4181 4181 true 20: 6765 6764 false 21: 10946 10945 false 22: 17711 17710 false 23: 28657 28656 false 24: 46368 46367 false 25: 75025 75025 true 26: 121393 121392 false 27: 196418 196418 true 28: 317811 317811 true 29: 514229 514228 false 30: 832040 832039 false 31: 1346269 1346268 false 32: 2178309 2178309 true 33: 3524578 3524577 false 34: 5702887 5702886 false 35: 9227465 9227465 true 36: 14930352 14930351 false 37: 24157817 24157816 false 38: 39088169 39088168 false 39: 63245986 63245985 false 40: 102334155 102334154 false 41: 165580141 165580140 false 42: 267914296 267914295 false 43: 433494437 433494437 true 44: 701408733 701408732 false 45: 1134903170 1134903169 false 46: 1836311903 1836311903 true
The first column is the index, the second column is the correct value, the third one is computed with Binet's formula and the last shows if the values match. What happened? Rounding errors happened. Compounded by the truncation due to the cast to
int
. This can be alleviated by usingstd::round
before casting resulting in:0: 0 0 true 1: 1 1 true 2: 1 1 true 3: 2 2 true 4: 3 3 true 5: 5 5 true 6: 8 8 true 7: 13 13 true 8: 21 21 true 9: 34 34 true 10: 55 55 true 11: 89 89 true 12: 144 144 true 13: 233 233 true 14: 377 377 true 15: 610 610 true 16: 987 987 true 17: 1597 1597 true 18: 2584 2584 true 19: 4181 4181 true 20: 6765 6765 true 21: 10946 10946 true 22: 17711 17711 true 23: 28657 28657 true 24: 46368 46368 true 25: 75025 75025 true 26: 121393 121393 true 27: 196418 196418 true 28: 317811 317811 true 29: 514229 514229 true 30: 832040 832040 true 31: 1346269 1346269 true 32: 2178309 2178309 true 33: 3524578 3524578 true 34: 5702887 5702887 true 35: 9227465 9227465 true 36: 14930352 14930352 true 37: 24157817 24157817 true 38: 39088169 39088169 true 39: 63245986 63245986 true 40: 102334155 102334155 true 41: 165580141 165580141 true 42: 267914296 267914296 true 43: 433494437 433494437 true 44: 701408733 701408733 true 45: 1134903170 1134903170 true 46: 1836311903 1836311903 true
Problem solved, let's go home. Except we didn't solve the problem. We only solved it for 32 bit integers. For 64 bit integers all elements of the series beyond the 75th are wrong.
47: 2971215073 2971215073 true 48: 4807526976 4807526976 true 49: 7778742049 7778742049 true 50: 12586269025 12586269025 true 51: 20365011074 20365011074 true 52: 32951280099 32951280099 true 53: 53316291173 53316291173 true 54: 86267571272 86267571272 true 55: 139583862445 139583862445 true 56: 225851433717 225851433717 true 57: 365435296162 365435296162 true 58: 591286729879 591286729879 true 59: 956722026041 956722026041 true 60: 1548008755920 1548008755920 true 61: 2504730781961 2504730781961 true 62: 4052739537881 4052739537881 true 63: 6557470319842 6557470319842 true 64: 10610209857723 10610209857723 true 65: 17167680177565 17167680177565 true 66: 27777890035288 27777890035288 true 67: 44945570212853 44945570212853 true 68: 72723460248141 72723460248141 true 69: 117669030460994 117669030460994 true 70: 190392490709135 190392490709135 true 71: 308061521170129 308061521170129 true 72: 498454011879264 498454011879264 true 73: 806515533049393 806515533049393 true 74: 1304969544928657 1304969544928657 true 75: 2111485077978050 2111485077978050 true 76: 3416454622906707 3416454622906706 false 77: 5527939700884757 5527939700884755 false 78: 8944394323791464 8944394323791463 false 79: 14472334024676221 14472334024676218 false 80: 23416728348467685 23416728348467676 false 81: 37889062373143906 37889062373143896 false 82: 61305790721611591 61305790721611584 false 83: 99194853094755497 99194853094755488 false 84: 160500643816367088 160500643816367040 false 85: 259695496911122585 259695496911122528 false 86: 420196140727489673 420196140727489600 false 87: 679891637638612258 679891637638612096 false 88: 1100087778366101931 1100087778366101632 false 89: 1779979416004714189 1779979416004713728 false 90: 2880067194370816120 2880067194370815488 false 91: 4660046610375530309 4660046610375529472 false 92: 7540113804746346429 7540113804746344448 false
This is much more difficult to solve as double precision floating point numbers with their 53 bit (1 bit implied) mantissa simply cannot represent all 64 bit integer values.
Although GCC allows it,
std::pow
andstd::round
cannot legally be used in aconstexpr
function.
So what's the alternative? Exponentiation by squaring. This nifty algorithm allows us to compute powers (both of integers and matrices) in time, where is the exponent. What use is computing the power of a matrix in this context? Simply put, the recursion in the Fibonacci sequence can be represented by
Therefore, matrix exponentiation results directly in computation of Fibonacci numbers. Here is a quick implementation:
template<typename Element>
struct mat2
{
Element elements[2][2];
};
template<typename Element>
constexpr mat2<Element> operator * (mat2<Element> const & a_, mat2<Element> const & b_)
{
auto const & a = a_.elements;
auto const & b = b_.elements;
return mat2<Element>{{
{a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]},
{a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]}
}};
}
template<typename Element>
constexpr mat2<Element> mat_pow(mat2<Element> m, unsigned n)
{
if(!n)
return mat2<Element>{{{1,0},{0,1}}};
--n;
auto r = m;
while(n)
{
if(n & 1)
r = r * m;
m = m * m;
n >>= 1;
}
return r;
}
template<typename Integer>
constexpr Integer fib(unsigned n)
{
constexpr auto a = mat2<Integer>{{{1,1},{1,0}}};
auto an = mat_pow(a, n);
return an.elements[1][0];
}
Not only is this implementation exact for any integer type and constexpr
-legal (in C++14), it is also significantly faster than the floating point implementation using Binet's formula. On my computer which has an i7-4790k running at 4 GHz, using MinGW-w64 G++ 5.2 and -O3 optimization the computation of all Fibonacci numbers for between 0 and 92 (inclusive) takes 1.28 μs while the floating point version takes 11 μs. Almost a factor of 10. How come an algorithm is faster than an algorithm? The elusive constant factor. And the constant factor of std::pow
is typically quite large.
Postscript: But it works for me!
So you read this article, and tried it yourself and the Binet version works just fine even without the call to std::round
? You are probably using an x86 (32 bit) compiler. Most compilers default to allowing more precise operations on floating point values and the x86 FPU's default type is an 80-bit floating point type. Therefore, you are probably working with a higher precision type. It will still fail for 64-bit results though. To replicate this behavior on a 64 bit compiler (which by default uses SSE instructions instead of the FPU as they are usually faster), you can force the use of 80-bit floats by using long double
instead of double
: const static auto sqrt_5 = std::sqrt(5.0l);
With this change the version without rounding works for up to 86 and the one with rounding works up to 84 (didn't I mention floating point has many pitfalls?).
Postscript 2: The naive version is the fastest on my compiler!
In one of the comments on the old site, a user pointed out that with Clang 3.8, the naive version was fastest. This was due to the fact, that the compiler performed partial loop unrolling, as can be seen here.
So yeah, unsurprisingly, even can be faster than or for such a small