Fibonacci: You're Also Doing It Wrong

Ear­li­er to­day, I saw C++ Week­ly Ep 13 Fi­bonac­ci: You're Do­ing It Wrong and in my opin­ion the sug­ges­tion made there to use Bi­net's for­mu­la to cal­cu­late ar­bi­trary el­e­ments of the Fi­bonac­ci se­quence is dan­ger­ous. And as some­body is wrong on the in­ter­net I de­cid­ed to write this short blog post on the top­ic.

At the first glance, us­ing Bi­net's for­mu­la seems like a great sug­ges­tion. It is a closed form so­lu­tion. It has O(1)\mathcal{O}(1) com­plex­i­ty, i.e., fast. It is easy to im­ple­ment. And more.

How­ev­er, as usu­al when float­ing point is in­volved, dan­ger lurks around ev­ery cor­ner.

  1. Al­though the O(1)\mathcal{O}(1) com­plex­i­ty seems at­trac­tive, it is on­ly O(1)\mathcal{O}(1) if pow is O(1).\mathcal{O}(1). Which it is for ba­sic datatypes, but in oth­er cas­es, such as ar­bi­trary pre­ci­sion floats it is far more like­ly to be O(logn).\mathcal{O}(\log n). Al­so pow is re­al­ly, re­al­ly slow. We'll show lat­er how slow it re­al­ly is.

  2. Al­though the im­ple­men­ta­tion seems sim­ple, it is easy to get wrong. In fact the ver­sion shown in the video is wrong.

    constexpr int fib(const int i)
    {
        const static auto sqrt_5 = std::sqrt(5);
    
        if(i == 0) return 0;
        if(i == 1) return 1;
    
        return static_cast<int>((std::pow(1 + sqrt_5, i) - std::pow(1 - sqrt_5, i)) / (std::pow(2, i) * sqrt_5));
    }
    

    This ver­sion is wrong for val­ues of i as small as 10:

     0:                   0                   0 true
     1:                   1                   1 true
     2:                   1                   1 true
     3:                   2                   2 true
     4:                   3                   3 true
     5:                   5                   5 true
     6:                   8                   8 true
     7:                  13                  13 true
     8:                  21                  21 true
     9:                  34                  34 true
    10:                  55                  54 false
    11:                  89                  89 true
    12:                 144                 143 false
    13:                 233                 232 false
    14:                 377                 377 true
    15:                 610                 610 true
    16:                 987                 986 false
    17:                1597                1596 false
    18:                2584                2584 true
    19:                4181                4181 true
    20:                6765                6764 false
    21:               10946               10945 false
    22:               17711               17710 false
    23:               28657               28656 false
    24:               46368               46367 false
    25:               75025               75025 true
    26:              121393              121392 false
    27:              196418              196418 true
    28:              317811              317811 true
    29:              514229              514228 false
    30:              832040              832039 false
    31:             1346269             1346268 false
    32:             2178309             2178309 true
    33:             3524578             3524577 false
    34:             5702887             5702886 false
    35:             9227465             9227465 true
    36:            14930352            14930351 false
    37:            24157817            24157816 false
    38:            39088169            39088168 false
    39:            63245986            63245985 false
    40:           102334155           102334154 false
    41:           165580141           165580140 false
    42:           267914296           267914295 false
    43:           433494437           433494437 true
    44:           701408733           701408732 false
    45:          1134903170          1134903169 false
    46:          1836311903          1836311903 true
    

    The first col­umn is the in­dex, the sec­ond col­umn is the cor­rect val­ue, the third one is com­put­ed with Bi­net's for­mu­la and the last shows if the val­ues match. What hap­pened? Round­ing er­rors hap­pened. Com­pound­ed by the trun­ca­tion due to the cast to int. This can be al­le­vi­at­ed by us­ing std::round be­fore cast­ing re­sult­ing in:

     0:                   0                   0 true
     1:                   1                   1 true
     2:                   1                   1 true
     3:                   2                   2 true
     4:                   3                   3 true
     5:                   5                   5 true
     6:                   8                   8 true
     7:                  13                  13 true
     8:                  21                  21 true
     9:                  34                  34 true
    10:                  55                  55 true
    11:                  89                  89 true
    12:                 144                 144 true
    13:                 233                 233 true
    14:                 377                 377 true
    15:                 610                 610 true
    16:                 987                 987 true
    17:                1597                1597 true
    18:                2584                2584 true
    19:                4181                4181 true
    20:                6765                6765 true
    21:               10946               10946 true
    22:               17711               17711 true
    23:               28657               28657 true
    24:               46368               46368 true
    25:               75025               75025 true
    26:              121393              121393 true
    27:              196418              196418 true
    28:              317811              317811 true
    29:              514229              514229 true
    30:              832040              832040 true
    31:             1346269             1346269 true
    32:             2178309             2178309 true
    33:             3524578             3524578 true
    34:             5702887             5702887 true
    35:             9227465             9227465 true
    36:            14930352            14930352 true
    37:            24157817            24157817 true
    38:            39088169            39088169 true
    39:            63245986            63245986 true
    40:           102334155           102334155 true
    41:           165580141           165580141 true
    42:           267914296           267914296 true
    43:           433494437           433494437 true
    44:           701408733           701408733 true
    45:          1134903170          1134903170 true
    46:          1836311903          1836311903 true
    

    Prob­lem solved, let's go home. Ex­cept we didn't solve the prob­lem. We on­ly solved it for 32 bit in­te­gers. For 64 bit in­te­gers all el­e­ments of the se­ries be­yond the 75th are wrong.

    47:          2971215073          2971215073 true
    48:          4807526976          4807526976 true
    49:          7778742049          7778742049 true
    50:         12586269025         12586269025 true
    51:         20365011074         20365011074 true
    52:         32951280099         32951280099 true
    53:         53316291173         53316291173 true
    54:         86267571272         86267571272 true
    55:        139583862445        139583862445 true
    56:        225851433717        225851433717 true
    57:        365435296162        365435296162 true
    58:        591286729879        591286729879 true
    59:        956722026041        956722026041 true
    60:       1548008755920       1548008755920 true
    61:       2504730781961       2504730781961 true
    62:       4052739537881       4052739537881 true
    63:       6557470319842       6557470319842 true
    64:      10610209857723      10610209857723 true
    65:      17167680177565      17167680177565 true
    66:      27777890035288      27777890035288 true
    67:      44945570212853      44945570212853 true
    68:      72723460248141      72723460248141 true
    69:     117669030460994     117669030460994 true
    70:     190392490709135     190392490709135 true
    71:     308061521170129     308061521170129 true
    72:     498454011879264     498454011879264 true
    73:     806515533049393     806515533049393 true
    74:    1304969544928657    1304969544928657 true
    75:    2111485077978050    2111485077978050 true
    76:    3416454622906707    3416454622906706 false
    77:    5527939700884757    5527939700884755 false
    78:    8944394323791464    8944394323791463 false
    79:   14472334024676221   14472334024676218 false
    80:   23416728348467685   23416728348467676 false
    81:   37889062373143906   37889062373143896 false
    82:   61305790721611591   61305790721611584 false
    83:   99194853094755497   99194853094755488 false
    84:  160500643816367088  160500643816367040 false
    85:  259695496911122585  259695496911122528 false
    86:  420196140727489673  420196140727489600 false
    87:  679891637638612258  679891637638612096 false
    88: 1100087778366101931 1100087778366101632 false
    89: 1779979416004714189 1779979416004713728 false
    90: 2880067194370816120 2880067194370815488 false
    91: 4660046610375530309 4660046610375529472 false
    92: 7540113804746346429 7540113804746344448 false
    

    This is much more dif­fi­cult to solve as dou­ble pre­ci­sion float­ing point num­bers with their 53 bit (1 bit im­plied) man­tis­sa sim­ply can­not rep­re­sent all 64 bit in­te­ger val­ues.

  3. Al­though GCC al­lows it, std::pow and std::round can­not legal­ly be used in a constexpr func­tion.

So what's the al­ter­na­tive? Ex­po­nen­ti­a­tion by squar­ing. This nifty al­go­rithm al­lows us to com­pute pow­ers (both of in­te­gers and ma­tri­ces) in O(logn)\mathcal{O}(\log n) time, where nn is the ex­po­nent. What use is com­put­ing the pow­er of a ma­trix in this con­text? Sim­ply put, the re­cur­sion in the Fi­bonac­ci se­quence can be rep­re­sent­ed by

(1110)n(10)=(Fn+1Fn)\left(\begin{array}{cc}1 & 1 \\1 & 0 \end{array}\right)^n\left(\begin{array}{c}1\\0\end{array}\right)=\left(\begin{array}{c}F_{n+1}\\F_n\end{array}\right)

There­fore, O(logn)\mathcal{O}(\log n) ma­trix ex­po­nen­ti­a­tion re­sults di­rect­ly in O(logn)\mathcal{O}(\log n) com­pu­ta­tion of Fi­bonac­ci num­bers. Here is a quick im­ple­men­ta­tion:

template<typename Element>
struct mat2
{
    Element elements[2][2];
};

template<typename Element>
constexpr mat2<Element> operator * (mat2<Element> const & a_, mat2<Element> const & b_)
{
    auto const & a = a_.elements;
    auto const & b = b_.elements;
    return mat2<Element>{{
        {a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]},
        {a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]}
    }};
}

template<typename Element>
constexpr mat2<Element> mat_pow(mat2<Element> m, unsigned n)
{
    if(!n)
        return mat2<Element>{{{1,0},{0,1}}};
    --n;

    auto r = m;
    while(n)
    {
        if(n & 1)
            r = r * m;
        m = m * m;
        n >>= 1;
    }
    return r;
}

template<typename Integer>
constexpr Integer fib(unsigned n)
{
    constexpr auto a = mat2<Integer>{{{1,1},{1,0}}};
    auto an = mat_pow(a, n);
    return an.elements[1][0];
}

Not on­ly is this im­ple­men­ta­tion ex­act for any in­te­ger type and constexpr-le­gal (in C++14), it is al­so sig­nif­i­cant­ly faster than the float­ing point im­ple­men­ta­tion us­ing Bi­net's for­mu­la. On my com­put­er which has an i7-4790k run­ning at 4 GHz, us­ing MinGW-w64 G++ 5.2 and -O3 op­ti­miza­tion the com­pu­ta­tion of all Fi­bonac­ci num­bers for nn be­tween 0 and 92 (in­clu­sive) takes 1.28 μs while the float­ing point ver­sion takes 11 μs. Al­most a fac­tor of 10. How come an O(logn)\mathcal{O}(\log n) al­go­rithm is faster than an O(1)\mathcal{O}(1) al­go­rithm? The elu­sive con­stant fac­tor. And the con­stant fac­tor of std::pow is typ­i­cal­ly quite large.

Post­script: But it works for me!

So you read this ar­ti­cle, and tried it your­self and the Bi­net ver­sion works just fine even with­out the call to std::round? You are prob­a­bly us­ing an x86 (32 bit) com­pil­er. Most com­pil­ers de­fault to al­low­ing more pre­cise op­er­a­tions on float­ing point val­ues and the x86 FPU's de­fault type is an 80-bit float­ing point type. There­fore, you are prob­a­bly work­ing with a high­er pre­ci­sion type. It will still fail for 64-bit re­sults though. To repli­cate this be­hav­ior on a 64 bit com­pil­er (which by de­fault us­es SSE in­struc­tions in­stead of the FPU as they are usu­al­ly faster), you can force the use of 80-bit floats by us­ing long double in­stead of double: const static auto sqrt_5 = std::sqrt(5.0l); With this change the ver­sion with­out round­ing works for nn up to 86 and the one with round­ing works up to 84 (didn't I men­tion float­ing point has many pit­falls?).

Post­script 2: The naive ver­sion is fastest on my com­pil­er!

In one of the com­ments on the old site, a us­er point­ed out that with Clang 3.8, the naive ver­sion was fast­ed. This was due to the fact, that the com­pil­er per­formed par­tial loop un­rolling, as can be seen here.

So yeah, un­sur­pris­ing­ly, even O(n)\mathcal{O}(n) can be faster than O(logn)\mathcal{O}(\log n) or O(1)\mathcal{O}(1) for such a small n.n.