Floating-Point Pitfalls¶

For an automation programme, you will spend a lot of time with double and float values: sensor readings, control signals, time integrals, transforms. Floating-point numbers are extremely useful, but they have sharp edges that catch every beginner at least once.

This page is the short list of behaviours that will surprise you and how to handle them.

Floating-point is not exact¶

This is the headline. Floats and doubles cannot represent most decimal fractions exactly. They store the closest number their binary representation can hold.

The canonical demo:

double a = 0.1;
double b = 0.2;
double c = a + b;

std::cout << c << "\n";        // 0.3
std::cout << (c == 0.3) << "\n"; // 0   (false!)

0.1, 0.2, and 0.3 are all rounded when stored. 0.1 + 0.2 lands close to, but not exactly on, the stored representation of 0.3. The difference is around 5 × 10⁻¹⁷, invisible when printed but very visible when compared with ==.

This is not a C++ quirk; it is how IEEE 754 floating-point works in every language. Python, Java, JavaScript, MATLAB: same numbers, same behaviour.
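The word "most" matters: a fraction whose denominator is a power of two is stored exactly, so arithmetic on such values can come out exact. A minimal sketch, using a hypothetical helper `sumsExactly` that is introduced here for illustration only:

```cpp
// Hypothetical helper for illustration: true when a + b compares equal to
// expected after all three values have been rounded to double.
bool sumsExactly(double a, double b, double expected) {
    return a + b == expected;
}
```

`sumsExactly(0.5, 0.25, 0.75)` is true because 1/2, 1/4 and 3/4 all have exact binary representations, while `sumsExactly(0.1, 0.2, 0.3)` is false. This is a reason to understand the rule, not a licence to use == on computed values.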

Never compare floats with `==`¶

The most common consequence of the above:

if (computedValue == 0.3) { /* almost certainly never runs */ }
if (sensorReading == 0.0) { /* probably wrong */ }

Two safer patterns:

Compare with a tolerance¶

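#include <cmath>  // std::abs for floating-point arguments

// True when a and b differ by less than an absolute tolerance.
bool approximatelyEqual(double a, double b, double tolerance = 1e-9) {
    return std::abs(a - b) < tolerance;
}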

if (approximatelyEqual(computedValue, 0.3)) { /* ... */ }

The right tolerance depends on the size of the numbers and how they were computed. For sensor readings calibrated to two decimal places, 1e-3 might be appropriate; for tightly converged numerics, 1e-12. Pick deliberately.
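When the magnitudes vary widely, one common approach is to scale the tolerance with the values being compared. The following is a sketch under stated assumptions: the name `nearlyEqual` and both default tolerances are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Sketch: a and b count as equal when they differ by no more than relTol times
// the larger magnitude, or by no more than absTol (which covers values near zero,
// where a purely relative test would be too strict).
// The default tolerances are illustrative assumptions, not recommendations.
bool nearlyEqual(double a, double b, double relTol = 1e-9, double absTol = 1e-12) {
    const double diff = std::abs(a - b);
    const double scale = std::max(std::abs(a), std::abs(b));
    return diff <= std::max(absTol, relTol * scale);
}
```

With these defaults, `nearlyEqual(0.1 + 0.2, 0.3)` is true and `nearlyEqual(1.0, 1.001)` is false. If either argument is NaN the function returns false, because every comparison involving NaN is false.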

Use ranges instead of exact targets¶

if (temperature > 79.95 && temperature < 80.05) {
    // "at 80", but a range, not a point
}

Whenever you find yourself comparing a measured value for exact equality, ask whether the question really wants "near this value." It almost always does.

Sums lose precision¶

Adding many floats accumulates rounding error. The classic pitfall: add 0.1 ten million times and you would expect exactly 1,000,000, but that is not what you get:

#include <iostream>
#include <iomanip>

int main() {
    double total = 0.0;
    for (int i = 0; i < 10'000'000; ++i) {
        total += 0.1;
    }
    std::cout << total << "\n";                       // 1e+06  (looks exact — it isn't)
    std::cout << std::setprecision(17) << total << "\n"; // 999999.99983897537
}

At std::cout's default precision the sum prints as 1e+06, which looks perfect. You only see the drift once you ask for full precision: the computed total is 999999.99983897537, off by about 0.00016. The error in each addition is tiny, but ten million of them add up. For sums of millions of samples, consider:

Use double, not float. double has roughly 15-16 decimal digits of precision; float has 6-7.
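Use std::accumulate with care: it accumulates in the type of its initial value, so pass 0.0 rather than 0, otherwise the running sum is truncated to int at every step. If accuracy matters more than speed, look up Kahan summation.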
Where you can, count in integers and convert to a float only at the end.

For control loops that integrate over time, drift is something to watch for over long runs.

`NaN`, infinity, and division by zero¶

Floating-point has special values that integer arithmetic does not:

double inf  = 1.0 / 0.0;      // +infinity
double ninf = -1.0 / 0.0;     // -infinity
double nan  = 0.0 / 0.0;      // NaN, "not a number"
double nan2 = std::sqrt(-1.0); // NaN

Unlike integer division by zero (which is undefined behaviour and may crash), floating-point division by zero does not crash on any platform you will use: it produces infinity or NaN, and the program keeps running. (Strictly, the C++ standard itself leaves floating-point division by zero undefined; it is the IEEE 754 standard that defines the infinity/NaN result, and every compiler and CPU in this course follows IEEE 754 — so in practice you can rely on it.)

That sounds harmless until you propagate a NaN through your math:

double x = std::sqrt(-1.0);    // NaN
double y = x + 1.0;             // NaN
double z = std::sin(y);          // NaN
if (z < 1.0) { /* ... */ }       // false! every ordered comparison with NaN is false

NaN poisons every expression it touches, and comparisons with it behave strangely: every ordered comparison (<, >, <=, >=) and == is false, and even nan == nan is false. The one that surprises people is !=: nan != nan is true, and so is nan != anything. That inversion (!= true while == false) is the standard trick to detect a NaN by hand: x != x is true only when x is NaN. Because a NaN raises no error and simply makes comparisons fail, a sensor pipeline that starts taking the wrong branch or producing zeros without any error message is a reason to suspect a NaN.

To check explicitly:

#include <cmath>

if (std::isnan(x)) { /* handle the bad value */ }
if (std::isinf(x)) { /* handle the infinity */ }
if (std::isfinite(x)) { /* x is a regular, usable number */ }
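One way to apply these checks is to guard values at the boundary of a pipeline, before they can propagate. This is a minimal sketch: `sanitizeReading` and its fallback parameter are hypothetical names, and it assumes the code is not built with -ffast-math, under which compilers may assume that NaN and infinity never occur.

```cpp
#include <cmath>

// Hypothetical guard for a control loop: pass finite readings through and
// replace NaN or infinity with a caller-supplied fallback (for example, the
// last known good value).
double sanitizeReading(double raw, double fallback) {
    return std::isfinite(raw) ? raw : fallback;
}
```

Whether substituting a fallback is the right response, rather than raising an alarm or stopping the loop, depends on the application.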

Integer division catches people too¶

Not strictly a floating-point issue, but related and very common:

int    a = 10 / 3;        // 3, fractional part discarded
double b = 10 / 3;        // also 3.0!, division happens in int, then converted
double c = 10.0 / 3;      // 3.333…, at least one operand is a double

If you want a floating-point result, make sure at least one operand is double (or float). Writing the literal as 10.0 is the simplest way.
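When both operands are integer variables rather than literals, there is no literal to rewrite, so a cast does the same job. A small sketch with a hypothetical `average` helper:

```cpp
// Sketch: average of an integer total over an integer count.
// Casting one operand to double forces the division to happen in floating point.
// Assumes count != 0; the caller is expected to check.
double average(int total, int count) {
    return static_cast<double>(total) / count;
}
```

`average(10, 4)` returns 2.5, whereas the plain integer expression `10 / 4` evaluates to 2.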

`float` vs `double`¶

Default to double. Use float only when you have a specific reason: typically memory-constrained embedded code where you have lots of values to store, or when interfacing with a library (graphics, ML) that uses float.

	`float`	`double`
Size	4 bytes	8 bytes
Significant digits	~6-7	~15-16
Speed	Often the same on modern CPUs	Often the same on modern CPUs

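On a microcontroller, float operations may be much faster than double. If the FPU supports only single precision, float runs in hardware while the compiler emulates double in software; if there is no FPU at all, both are emulated and double costs more. If you target such a platform, check the datasheet and benchmark.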
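The difference in significant digits shows up quickly when a float is used as a counter. The sketch below uses a hypothetical `countInFloat` function and relies on the 24-bit significand of an IEEE 754 single-precision float.

```cpp
// Sketch: count upwards in a float by adding 1.0f repeatedly.
// A float has a 24-bit significand, so above 2^24 = 16,777,216 not every
// integer is representable and adding 1.0f no longer changes the value.
float countInFloat(int steps) {
    float counter = 0.0f;
    for (int i = 0; i < steps; ++i) {
        counter += 1.0f;
    }
    return counter;
}
```

`countInFloat(1000)` returns 1000, but `countInFloat(20000000)` returns 16777216: the counter stalls once it reaches 2^24. A double has a 53-bit significand, so the same loop in double would not stall until far larger counts.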

Measuring time: prefer integers¶

For timing in control loops, use std::chrono. The standard clocks count ticks in integer types, so durations are exact and free of floating-point drift.

#include <chrono>

auto start = std::chrono::steady_clock::now();
// ... do work ...
auto elapsed = std::chrono::steady_clock::now() - start;
auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();

Resist the temptation to track simulation time by repeatedly executing t += dt on a double. Over a long run, the rounding error of every addition accumulates in t. Use an integer step count and multiply when you need a time value.

When you have to print a float¶

std::cout formats doubles with a default precision of 6 significant digits, which is often misleading:

double x = 0.1 + 0.2;
std::cout << x << "\n";                                       // 0.3, misleading
std::cout << std::setprecision(17) << x << "\n";              // 0.30000000000000004

For debugging precision issues, set a high precision explicitly. For user-facing output, std::fixed << std::setprecision(2) gives a fixed two-decimal-places display.
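Both styles can be wrapped in small helpers so that the stream state of std::cout is left untouched. This is a sketch only: `formatFixed` and `formatForDebug` are hypothetical names, and the debug variant uses std::numeric_limits<double>::max_digits10 (17 for an IEEE 754 double) instead of a hard-coded precision.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Sketch: fixed-point text with a chosen number of decimals, for user-facing output.
std::string formatFixed(double value, int decimals) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    return out.str();
}

// Sketch: enough significant digits to tell any two distinct doubles apart,
// for debugging precision issues.
std::string formatForDebug(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}
```

`formatFixed(3.14159, 2)` yields "3.14", and `formatForDebug(0.1 + 0.2)` yields "0.30000000000000004".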

Summary¶

Floats and doubles are approximations of decimal numbers. They are not exact.
Never use == to compare floats. Use a tolerance or a range.
Long sums accumulate error. Use double (not float) and consider Kahan summation for high-precision sums.
Division by zero does not crash for floats; under IEEE 754 (which all course platforms use) it produces infinity or NaN.
NaN poisons every expression it touches. Every ordered comparison and == against it is false — even nan == nan — but != is true, so x != x detects a NaN.
Default to double. Use float only with reason.
For timing, use std::chrono (integer-based).