## Fibonacci: You're Also Doing It Wrong

Ear­li­er to­day, I saw C++ Week­ly Ep 13 Fi­bonac­ci: You're Do­ing It Wrong and in my opin­ion the sug­ges­tion made there to use Bi­net's for­mu­la to cal­cu­late ar­bi­trary el­e­ments of the Fi­bonac­ci se­quence is dan­ger­ous. And as some­body is wrong on the in­ter­net I de­cid­ed to write this short blog post on the top­ic.

At the first glance, us­ing Bi­net's for­mu­la seems like a great sug­ges­tion. It is a closed form so­lu­tion. It has $\mathcal{O}(1)$ com­plex­i­ty, i.e., fast. It is easy to im­ple­ment. And more.

How­ev­er, as usu­al when float­ing point is in­volved, dan­ger lurks around ev­ery cor­ner.

1. Al­though the $\mathcal{O}(1)$ com­plex­i­ty seems at­trac­tive, it is on­ly $\mathcal{O}(1)$ if pow is $\mathcal{O}(1).$ Which it is for ba­sic datatypes, but in oth­er cas­es, such as ar­bi­trary pre­ci­sion floats it is far more like­ly to be $\mathcal{O}(\log n).$ Al­so pow is re­al­ly, re­al­ly slow. We'll show lat­er how slow it re­al­ly is.

2. Al­though the im­ple­men­ta­tion seems sim­ple, it is easy to get wrong. In fact the ver­sion shown in the video is wrong.

constexpr int fib(const int i)
{
const static auto sqrt_5 = std::sqrt(5);
if(i == 0) return 0;
if(i == 1) return 1;

return static_cast<int>((std::pow(1 + sqrt_5, i) - std::pow(1 - sqrt_5, i)) / (std::pow(2, i) * sqrt_5));
}


This ver­sion is wrong for val­ues of i as small as 10:

 0:                   0                   0 true
1:                   1                   1 true
2:                   1                   1 true
3:                   2                   2 true
4:                   3                   3 true
5:                   5                   5 true
6:                   8                   8 true
7:                  13                  13 true
8:                  21                  21 true
9:                  34                  34 true
10:                  55                  54 false
11:                  89                  89 true
12:                 144                 143 false
13:                 233                 232 false
14:                 377                 377 true
15:                 610                 610 true
16:                 987                 986 false
17:                1597                1596 false
18:                2584                2584 true
19:                4181                4181 true
20:                6765                6764 false
21:               10946               10945 false
22:               17711               17710 false
23:               28657               28656 false
24:               46368               46367 false
25:               75025               75025 true
26:              121393              121392 false
27:              196418              196418 true
28:              317811              317811 true
29:              514229              514228 false
30:              832040              832039 false
31:             1346269             1346268 false
32:             2178309             2178309 true
33:             3524578             3524577 false
34:             5702887             5702886 false
35:             9227465             9227465 true
36:            14930352            14930351 false
37:            24157817            24157816 false
38:            39088169            39088168 false
39:            63245986            63245985 false
40:           102334155           102334154 false
41:           165580141           165580140 false
42:           267914296           267914295 false
43:           433494437           433494437 true
44:           701408733           701408732 false
45:          1134903170          1134903169 false
46:          1836311903          1836311903 true


The first col­umn is the in­dex, the sec­ond col­umn is the cor­rect val­ue, the third one is com­put­ed with Bi­net's for­mu­la and the last shows if the val­ues match. What hap­pened? Round­ing er­rors hap­pened. Com­pound­ed by the trun­ca­tion due to the cast to int. This can be al­le­vi­at­ed by us­ing std::round be­fore cast­ing re­sult­ing in:

 0:                   0                   0 true
1:                   1                   1 true
2:                   1                   1 true
3:                   2                   2 true
4:                   3                   3 true
5:                   5                   5 true
6:                   8                   8 true
7:                  13                  13 true
8:                  21                  21 true
9:                  34                  34 true
10:                  55                  55 true
11:                  89                  89 true
12:                 144                 144 true
13:                 233                 233 true
14:                 377                 377 true
15:                 610                 610 true
16:                 987                 987 true
17:                1597                1597 true
18:                2584                2584 true
19:                4181                4181 true
20:                6765                6765 true
21:               10946               10946 true
22:               17711               17711 true
23:               28657               28657 true
24:               46368               46368 true
25:               75025               75025 true
26:              121393              121393 true
27:              196418              196418 true
28:              317811              317811 true
29:              514229              514229 true
30:              832040              832040 true
31:             1346269             1346269 true
32:             2178309             2178309 true
33:             3524578             3524578 true
34:             5702887             5702887 true
35:             9227465             9227465 true
36:            14930352            14930352 true
37:            24157817            24157817 true
38:            39088169            39088169 true
39:            63245986            63245986 true
40:           102334155           102334155 true
41:           165580141           165580141 true
42:           267914296           267914296 true
43:           433494437           433494437 true
44:           701408733           701408733 true
45:          1134903170          1134903170 true
46:          1836311903          1836311903 true


Prob­lem solved, let's go home. Ex­cept we didn't solve the prob­lem. We on­ly solved it for 32 bit in­te­gers. For 64 bit in­te­gers all el­e­ments of the se­ries be­yond the 75th are wrong.

47:          2971215073          2971215073 true
48:          4807526976          4807526976 true
49:          7778742049          7778742049 true
50:         12586269025         12586269025 true
51:         20365011074         20365011074 true
52:         32951280099         32951280099 true
53:         53316291173         53316291173 true
54:         86267571272         86267571272 true
55:        139583862445        139583862445 true
56:        225851433717        225851433717 true
57:        365435296162        365435296162 true
58:        591286729879        591286729879 true
59:        956722026041        956722026041 true
60:       1548008755920       1548008755920 true
61:       2504730781961       2504730781961 true
62:       4052739537881       4052739537881 true
63:       6557470319842       6557470319842 true
64:      10610209857723      10610209857723 true
65:      17167680177565      17167680177565 true
66:      27777890035288      27777890035288 true
67:      44945570212853      44945570212853 true
68:      72723460248141      72723460248141 true
69:     117669030460994     117669030460994 true
70:     190392490709135     190392490709135 true
71:     308061521170129     308061521170129 true
72:     498454011879264     498454011879264 true
73:     806515533049393     806515533049393 true
74:    1304969544928657    1304969544928657 true
75:    2111485077978050    2111485077978050 true
76:    3416454622906707    3416454622906706 false
77:    5527939700884757    5527939700884755 false
78:    8944394323791464    8944394323791463 false
79:   14472334024676221   14472334024676218 false
80:   23416728348467685   23416728348467676 false
81:   37889062373143906   37889062373143896 false
82:   61305790721611591   61305790721611584 false
83:   99194853094755497   99194853094755488 false
84:  160500643816367088  160500643816367040 false
85:  259695496911122585  259695496911122528 false
86:  420196140727489673  420196140727489600 false
87:  679891637638612258  679891637638612096 false
88: 1100087778366101931 1100087778366101632 false
89: 1779979416004714189 1779979416004713728 false
90: 2880067194370816120 2880067194370815488 false
91: 4660046610375530309 4660046610375529472 false
92: 7540113804746346429 7540113804746344448 false


This is much more dif­fi­cult to solve as dou­ble pre­ci­sion float­ing point num­bers with their 53 bit (1 bit im­plied) man­tis­sa sim­ply can­not rep­re­sent all 64 bit in­te­ger val­ues.

3. Al­though GCC al­lows it, std::pow and std::round can­not legal­ly be used in a constexpr func­tion.

So what's the al­ter­na­tive? Ex­po­nen­ti­a­tion by squar­ing. This nifty al­go­rithm al­lows us to com­pute pow­ers (both of in­te­gers and ma­tri­ces) in $\mathcal{O}(\log n)$ time, where $n$ is the ex­po­nent. What use is com­put­ing the pow­er of a ma­trix in this con­text? Sim­ply put, the re­cur­sion in the Fi­bonac­ci se­quence can be rep­re­sent­ed by

$\left(\begin{array}{cc}1 & 1 \\1 & 0 \end{array}\right)^n\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}F_{n+1}\\F_n\end{array}\right)$

There­fore, $\mathcal{O}(\log n)$ ma­trix ex­po­nen­ti­a­tion re­sults di­rect­ly in $\mathcal{O}(\log n)$ com­pu­ta­tion of Fi­bonac­ci num­bers. Here is a quick im­ple­men­ta­tion:

template<typename Element>
struct mat2
{
Element elements;
};

template<typename Element>
constexpr mat2<Element> operator * (mat2<Element> const & a_, mat2<Element> const & b_)
{
auto const & a = a_.elements;
auto const & b = b_.elements;
return mat2<Element>{{
{a * b + a * b, a * b + a * b},
{a * b + a * b, a * b + a * b}
}};
}

template<typename Element>
constexpr mat2<Element> mat_pow(mat2<Element> m, unsigned n)
{
if(!n)
return mat2<Element>{{{1,0},{0,1}}};
--n;

auto r = m;
while(n)
{
if(n & 1)
r = r * m;
m = m * m;
n >>= 1;
}
return r;
}

template<typename Integer>
constexpr Integer fib(unsigned n)
{
constexpr auto a = mat2<Integer>{{{1,1},{1,0}}};
auto an = mat_pow(a, n);
return an.elements;
}


Not on­ly is this im­ple­men­ta­tion ex­act for any in­te­ger type and constexpr-le­gal (in C++14), it is al­so sig­nif­i­cant­ly faster than the float­ing point im­ple­men­ta­tion us­ing Bi­net's for­mu­la. On my com­put­er which has an i7-4790k run­ning at 4 GHz, us­ing MinGW-w64 G++ 5.2 and -O3 op­ti­miza­tion the com­pu­ta­tion of all Fi­bonac­ci num­bers for $n$ be­tween 0 and 92 (in­clu­sive) takes 1.28 μs while the float­ing point ver­sion takes 11 μs. Al­most a fac­tor of 10. How come an $\mathcal{O}(\log n)$ al­go­rithm is faster than an $\mathcal{O}(1)$ al­go­rithm? The elu­sive con­stant fac­tor. And the con­stant fac­tor of std::pow is typ­i­cal­ly quite large.

### Post­script: But it works for me!

So you read this ar­ti­cle, and tried it your­self and the Bi­net ver­sion works just fine even with­out the call to std::round? You are prob­a­bly us­ing an x86 (32 bit) com­pil­er. Most com­pil­ers de­fault to al­low­ing more pre­cise op­er­a­tions on float­ing point val­ues and the x86 FPU's de­fault type is an 80-bit float­ing point type. There­fore, you are prob­a­bly work­ing with a high­er pre­ci­sion type. It will still fail for 64-bit re­sults though. To repli­cate this be­hav­ior on a 64 bit com­pil­er (which by de­fault us­es SSE in­struc­tions in­stead of the FPU as they are usu­al­ly faster), you can force the use of 80-bit floats by us­ing long double in­stead of double: const static auto sqrt_5 = std::sqrt(5.0l); With this change the ver­sion with­out round­ing works for $n$ up to 86 and the one with round­ing works up to 84 (didn't I men­tion float­ing point has many pit­falls?).

### Post­script 2: The naive ver­sion is the fastest on my com­pil­er!

In one of the com­ments on the old site, a us­er point­ed out that with Clang 3.8, the naive ver­sion was fastest. This was due to the fact, that the com­pil­er per­formed par­tial loop un­rolling, as can be seen here.

So yeah, un­sur­pris­ing­ly, even $\mathcal{O}(n)$ can be faster than $\mathcal{O}(\log n)$ or $\mathcal{O}(1)$ for such a small $n.$