Explainer: Numbers
Numbers – as in maths, not Apple’s spreadsheet – were there at the dawn of computing, and have played a major part in hardware, system software and apps ever since. This article explains some of the numeric types used by your Mac, and how they can catch you out.
Numbers in computing fall into two broad classes: those represented exactly, which are mainly integers, and those normally approximated, including most floating point numbers.
Integers
These are the simplest to represent in binary and hexadecimal, and those that play the fewest tricks. They come in several varieties, determined by their size in bytes, and whether they can be negative rather than only positive. Some of us still remember when the standard integer was represented in just eight bits. The largest unsigned integer is then 1111 1111 in binary, or FF in hexadecimal, that’s 255 in regular decimal notation. If one of those bits is given over to handling negative values, in the two’s complement form used by modern processors, they can only lie between -128 and +127.
Integers soon grew to 16 bits, then 32, and now the standard length of 64 bits, offering a range of numbers beyond our comprehension, and beyond the needs of even the largest distributed file systems.
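If you want to check those limits for yourself, here’s a minimal Swift sketch using the standard library’s fixed-width integer types:
// Ranges of common fixed-width integer types (Int is 64-bit on modern Macs)
print(UInt8.max)                       // 255, or FF in hexadecimal
print(Int8.min, Int8.max)              // -128 127, in two's complement
print(UInt16.max)                      // 65535
print(Int32.max)                       // 2147483647
print(Int64.max)                       // 9223372036854775807
print(String(UInt8.max, radix: 16))    // "ff"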
Most problems that arise with integers come from one of five causes:
- the order of bytes, which can be ‘big-endian’ or ‘little-endian’ according to processor type and setting;
- conversion between different lengths;
- whether signed or unsigned;
- overflow, in which the result of an operation such as adding or multiplying two integers requires a number larger than the maximum for their length;
- arithmetic operations that are undefined, such as division by zero.
Together, these can result in quite complex errors. For example, read with the wrong byte order and then misinterpreted as a signed integer, the 32-bit unsigned integer for 65,535 (0000 FFFF) can become FFFF 0000, which in two’s complement is -65,536.
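A small Swift sketch reproduces that double misreading; byteSwapped and Int32(bitPattern:) here stand in for what would normally happen by accident:
let original: UInt32 = 65_535              // 0000 FFFF
let swapped = original.byteSwapped         // FFFF 0000, read with the wrong byte order
let misread = Int32(bitPattern: swapped)   // then misinterpreted as a signed integer
print(String(swapped, radix: 16))          // "ffff0000"
print(misread)                             // -65536 in two's complement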
Floating point numbers
Integers are fine for counting integral objects such as people and file sizes, but in the real world most things have to be measured in floating point or decimal numbers like 3.14159. In maths, those numbers come from a continuous range that has to include extremely large positive and negative values, and many very close to zero. They’re most familiar to us from engineering or scientific notation, which expresses them as a number from 1.0 to almost 10.0, multiplied by a power of ten, e.g. 1.68301 × 10⁻⁶, which is just above zero at 0.00000168301.
The most widely used form of floating point number in macOS is the Double, which uses 64 bits to encode a number on similar principles to engineering/scientific notation, except that the powers used aren’t decimal but binary, making them more difficult to read and understand. In decimal notation, with the radix 10, 0.00000168301 has a significand of 1.68301 and an exponent of -6, making it 1.68301 × 10⁻⁶. As a computer Double, the radix is 2 (binary), so it has a significand of 1.76476389376 and an exponent of -20, making it 1.76476389376 × 2⁻²⁰.
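Swift exposes both parts of a Double directly, so a brief sketch can confirm those figures:
let x = 0.00000168301
print(x.exponent)       // -20, the power of 2
print(x.significand)    // approximately 1.76476389376
// reassembling significand × 2^exponent recovers the original Double
print(Double(sign: .plus, exponent: x.exponent, significand: x.significand))   // 1.68301e-06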
Some Doubles are exact expressions of the number they’re trying to represent. An obvious example is 1.0, represented as 1.0 × 2⁰, but even fairly simple numbers like 71.3927 become confusing, with a representation of 1.1155109375 × 2⁶. To convert between regular decimal floating point and 32- and 64-bit floating point numbers, and their hex representations, my free Mints has a Floating Point Explorer window. This is explained here.
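The raw 64 bits behind any Double can also be inspected in code; this brief Swift sketch is a rough stand-in for what Mints displays:
let exact: Double = 1.0
let approx: Double = 71.3927
print(String(exact.bitPattern, radix: 16))    // "3ff0000000000000": 1.0 is stored exactly
print(String(approx.bitPattern, radix: 16))   // the 64-bit pattern of the nearest Double to 71.3927
print(0.1 + 0.2 == 0.3)                       // false: all three decimals are only approximated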
Unlike mathematical numbers, there’s only a finite number of different Doubles, and their distribution is far from even. The same Double representing 71.39270000000000 also represents 71.39270000000001, and all the numbers in between them, for all but one of which it is only an approximation. Around those numbers, there are roughly 70 trillion different floating point numbers for each unit (1.0) step along the number line. They become more dense around zero, and less dense at the extreme ends of the number line. As Doubles become larger in absolute value (disregarding their sign), so they become less precise in absolute but not relative terms.
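Those spacings can be read straight from Swift, where every Double has a ulp, the gap between it and its nearest neighbours; a brief sketch:
let x = 71.3927
print(x.ulp)          // about 1.42e-14, the gap to the next representable Double
print(1.0 / x.ulp)    // about 7e13, roughly 70 trillion Doubles per unit step here
print(x.nextUp - x)   // the same gap, reached via the next Double up
print(Double.greatestFiniteMagnitude.ulp)   // an enormous gap at the far end of the line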
Errors
Because they’re only approximations, Doubles suffer several problems that can adversely affect calculating with them. These include rounding and cancellation errors.
Rounding errors occur because Doubles have a fixed length, so the last place has to be rounded up or down to give the best approximation to the real number. The standard for floating point (IEEE 754) specifies no fewer than five different rounding rules that determine whether a Double is rounded up or down. Although the relative errors from rounding should be small, they can accumulate over a long series of calculations to the point where they affect overall accuracy.
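A familiar sketch of that accumulation in Swift: 0.1 has no exact binary representation, so each addition is rounded and the residue builds up:
var sum = 0.0
for _ in 1...10 { sum += 0.1 }    // add the rounded binary version of 0.1 ten times
print(sum == 1.0)                 // false
print(sum - 1.0)                  // a tiny residue of accumulated rounding error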
Cancellation errors can be very large, even as the result of a single operation. The term refers to potentially highly inaccurate results from subtracting numbers that are very close in value. When almost all the digits of the result are lost, these errors can be catastrophic, and may cause the order of calculations to determine the result.
These can be illustrated by two simple calculations, each of which should return a result of exactly 0.0:
((10000000.001 - 10000000.000) - 0.001) * 1.0e8
and
(10000000.001 - (10000000.000 + 0.001)) * 1.0e8
Yet using Swift Doubles, the first returns the incorrect result of 0.016391277311150754, while the second returns 0.0 as expected.
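You can confirm this for yourself with a couple of lines of Swift:
let first = ((10000000.001 - 10000000.000) - 0.001) * 1.0e8
let second = (10000000.001 - (10000000.000 + 0.001)) * 1.0e8
print(first)     // 0.016391277311150754: the subtraction has cancelled almost every digit
print(second)    // 0.0: reordering the operations changes the answer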
With a whole IEEE standard to themselves, floating point numbers have grown their own subdivision of errors and non-errors. The most commonly encountered of these is the NaN, Not a Number, which used to puzzle those ploughing through spreadsheets when a formula attempted a heinous crime such as division by zero. The joy of NaNs is their propagation: once a NaN creeps into a calculation, it’s likely to turn the whole thing NaN. Then there are two different signed zeroes, +0 and -0, or if you really want a choice, why not have an unsigned zero too, and then decide whether you want all three to be equal or not.
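A brief Swift sketch shows both behaviours, NaN propagation and the equality of the two signed zeroes:
let zero = 0.0
let notANumber = zero / zero        // an invalid operation yields NaN
print(notANumber.isNaN)             // true
print(notANumber == notANumber)     // false: a NaN never compares equal, even to itself
print((notANumber + 1.0).isNaN)     // true: once in, it propagates through the calculation
print(-0.0 == 0.0)                  // true, although their bit patterns differ
print((-0.0).sign, (0.0).sign)      // minus plus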
Others
Some systems also support extended precision beyond Doubles. One of the advances brought by the first widely used maths coprocessor, Intel’s 8087, was the availability of 80-bit Extended calculations. Although valuable for some purposes, mixing precisions generally leads to further strange errors that can prove hard to trace. macOS tries to avoid those, and ARM processors don’t have any Extended features, so they have to be implemented in additional libraries for those who need them.
Most recently, to accommodate AI using neural networks, smaller floating point numbers have become popular. bfloat16 numbers use only 16 bits of storage, but cover the same range as 32-bit floating point numbers with reduced precision. These promise huge gains in speed by allowing arithmetic instructions to work on twice as many numbers at once, and are supported in the CPUs of Apple’s M2 and later chips, and in GPUs.
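Swift has no native bfloat16 type, but as a bfloat16 is in effect the top 16 bits of a 32-bit float (the same 8-bit exponent, with only 7 explicit bits of significand), this rough sketch shows how much precision is given up simply by truncating a Float’s bit pattern; real conversions round rather than truncate, and the helper here is purely illustrative:
func bfloat16Truncated(_ x: Float) -> Float {
    // keep the sign, the 8 exponent bits and the top 7 fraction bits; zero the rest
    Float(bitPattern: x.bitPattern & 0xFFFF_0000)
}
print(Float.pi)                      // 3.1415927
print(bfloat16Truncated(Float.pi))   // 3.140625: the range survives, much of the precision doesn't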
You will occasionally come across other numeric formats, including fixed point and arbitrary precision. These don’t normally have any direct support in general-purpose processors, but are implemented in libraries, making them considerably slower and less portable. And then there are arrays of numbers in vectors and matrices, complex numbers, and everything else that mathematicians have devised. There is no end.
Further reading
Start with Jean-Michel Muller et al (2018), Handbook of Floating-Point Arithmetic, 2nd ed, Birkhäuser, ISBN 978 3 319 76525 9. Then progress to Peter Kornerup and David W Matula (2010), Finite Precision Number Systems and Arithmetic, Cambridge UP, ISBN 978 0 521 76135 2. Complete the basics with Jean-Michel Muller (2016), Elementary Functions: Algorithms and Implementation, 3rd ed, Birkhäuser, ISBN 978 1 4899 7981 0. You can then progress to matrices, for which there is a huge literature.
