Java uses IEEE (Institute of Electrical and Electronics Engineers) Standard 754 to store real numbers. Knowledge of the standard may no longer be crucial for programmers, since the language abstracts this detail away, but it is placed here for completeness and as a reference in case the need arises to parse raw data and the result is not as expected.
Related Articles:
From: "The Java Virtual Machine Specification" at
http://java.sun.com/docs/books/vmspec/2nd-edition/html/Concepts.doc.html#33377
From: "The Java Language Specification" at
http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#230798
From: "The Java API"
http://java.sun.com/j2se/1.4/docs/api/java/lang/Float.html
IEEE 754:
The IEEE 754 Standard uses the 1-plus form of the normalized binary fraction (rounded). The fraction part is called the mantissa.
1-plus normalized scientific notation in base two is then
N = ± (1.b_{1}b_{2}b_{3}b_{4}...)_{2} × 2^{E}, where the exponent E may be positive or negative.
The leading 1 is understood to be there and is not recorded.
The Java primitive data type float is 4 bytes, or 32 bits:
1 bit Sign | 8 bit exponent | 23 bit mantissa |
While the double is 8 bytes, or 64 bits, formatted as follows:
1 bit Sign | 11 bit exponent | 52 bit mantissa |
Sign: 0 → positive, 1 → negative.
Exponent: excess-127 format for float, excess-1023 format for double.
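Mantissa: normalized 1-plus fraction with the 1 to the left of the radix point not recorded, float: b_{1}b_{2}b_{3}b_{4}…b_{23}, double: b_{1}b_{2}b_{3}b_{4}…b_{52}. This value is rounded based on the bits that are not recorded: under the default round-to-nearest mode, a 1 in the next bit (b_{24} for float, b_{53} for double) generally increments the least significant recorded bit. In an exact tie, where that 1 is followed only by zeros, the value is rounded so that the least significant recorded bit ends up 0.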
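The three fields can be pulled out of a float with a few shifts and masks. The following is a minimal sketch; the class name FloatFields and its method names are made up for illustration, while Float.floatToIntBits is the standard library call that exposes the raw 32 bits.

```java
public class FloatFields {
    /** Sign bit: 0 for positive, 1 for negative. */
    static int sign(float x) {
        return Float.floatToIntBits(x) >>> 31;
    }

    /** Raw 8-bit exponent field, still in excess-127 form. */
    static int exponentField(float x) {
        return (Float.floatToIntBits(x) >>> 23) & 0xFF;
    }

    /** The 23 recorded mantissa bits b1..b23 (the leading 1 is not stored). */
    static int mantissaField(float x) {
        return Float.floatToIntBits(x) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float x = -6.5f; // -1.101 (base 2) x 2^2
        System.out.println("sign     = " + sign(x));                      // 1
        System.out.println("exponent = " + exponentField(x)
                + " (unbiased " + (exponentField(x) - 127) + ")");      // 129 (unbiased 2)
        System.out.println("mantissa = 0x"
                + Integer.toHexString(mantissaField(x)));                // 0x500000
    }
}
```

Subtracting 127 from the exponent field recovers the true exponent; the same idea applies to double with Double.doubleToLongBits, an 11-bit field, and an excess of 1023.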
The Java Float:
The largest positive finite float literal is 3.40282347e+38f.
0111 1111 0111 1111 1111 1111 1111 1111
The smallest positive finite nonzero literal of type float is 1.40129846e-45f.
0000 0000 0000 0000 0000 0000 0000 0001
The largest positive finite double literal is 1.79769313486231570e+308.
The smallest positive finite nonzero literal of type double is 4.94065645841246544e-324.
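These limits are available as the constants MAX_VALUE and MIN_VALUE of the classes Float and Double, so the bit patterns can be checked directly. A small sketch follows; the class FloatExtremes and its helper bits are illustrative names, not part of the Java API.

```java
public class FloatExtremes {
    /** Returns the 32-bit pattern of x as a string of 0s and 1s, left-padded with zeros. */
    static String bits(float x) {
        String s = Integer.toBinaryString(Float.floatToIntBits(x));
        return String.format("%32s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits(Float.MAX_VALUE)); // exponent field 11111110, mantissa all ones
        System.out.println(bits(Float.MIN_VALUE)); // only the last mantissa bit set

        // The literals quoted in the text denote exactly these constants.
        System.out.println(3.40282347e+38f == Float.MAX_VALUE);
        System.out.println(1.79769313486231570e+308 == Double.MAX_VALUE);
        System.out.println(4.94065645841246544e-324 == Double.MIN_VALUE);
    }
}
```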
A compile-time error occurs if a nonzero floating-point literal is too large, so that on rounded conversion to its internal representation it becomes an IEEE 754 infinity. A program can represent infinities without producing a compile-time error by using constant expressions such as 1f/0f or -1d/0d or by using the predefined constants POSITIVE_INFINITY and NEGATIVE_INFINITY of the classes Float and Double.
A compile-time error occurs if a nonzero floating-point literal is too small, so that, on rounded conversion to its internal representation, it becomes a zero. A compile-time error does not occur if a nonzero floating-point literal has a small value that, on rounded conversion to its internal representation, becomes a nonzero denormalized number.
When the exponent field is all zeros, the mantissa is interpreted to be denormalized.
Not-a-Number, NaN, occurs from 0/0 or ∞/∞.
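The special values can be produced at run time without any exception being thrown. The sketch below assumes that the two NaN cases meant here are zero divided by zero and infinity divided by infinity; the class and method names are illustrative only.

```java
public class SpecialValues {
    /** 1f/0f evaluated at run time: positive infinity, not an exception. */
    static float oneOverZero() {
        float zero = 0f;
        return 1f / zero;
    }

    /** 0/0 in float arithmetic: NaN. */
    static float zeroOverZero() {
        float zero = 0f;
        return zero / zero;
    }

    /** Infinity divided by infinity (assumed second case): also NaN. */
    static float infinityOverInfinity() {
        return Float.POSITIVE_INFINITY / oneOverZero();
    }

    public static void main(String[] args) {
        System.out.println(oneOverZero() == Float.POSITIVE_INFINITY);
        System.out.println(-1d / 0d == Double.NEGATIVE_INFINITY);
        System.out.println(Float.isNaN(zeroOverZero()));
        System.out.println(Float.isNaN(infinityOverInfinity()));

        float nan = zeroOverZero();
        System.out.println(nan == nan); // false: NaN compares unequal to everything, itself included
    }
}
```

Because NaN never compares equal to anything, Float.isNaN is the reliable way to test for it.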
Possible Float Representations | Exponent (E) | Mantissa (fraction part, f) | Evaluation |
00000000000000000000000000000000 | E_{min} - 1 (-127) | f = 0 | +0 |
10000000000000000000000000000000 | E_{min} - 1 (-127) | f = 0 | -0 |
*00000000f_{1}f_{2}…f_{23} | E_{min} - 1 (-127) | f ≠ 0 (each f_{i} is 0 or 1, but not all 0) | ±0.f × 2^{E_{min}} (denormalized) |
*eeeeeeee*********************** | -126 ≤ E ≤ 127 | any | ±1.f × 2^{E} |
*11111111f_{1}f_{2}…f_{23} | E_{max} + 1 (128) | f ≠ 0 (each f_{i} is 0 or 1, but not all 0) | NaN |
01111111100000000000000000000000 | E_{max} + 1 (128) | f = 0 | +∞ |
11111111100000000000000000000000 | E_{max} + 1 (128) | f = 0 | -∞ |
(* denotes a bit that may be 0 or 1; eeeeeeee denotes an exponent field that is neither all zeros nor all ones.)
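The rows of this table translate into a short classification routine that looks only at the exponent and mantissa fields. This is a sketch under the field layout described above; FloatClassifier and classify are hypothetical names.

```java
public class FloatClassifier {
    /** Names the kind of value encoded by x, using only its exponent and mantissa fields. */
    static String classify(float x) {
        int bits = Float.floatToRawIntBits(x); // raw bits, NaN payload left untouched
        int e = (bits >>> 23) & 0xFF;          // 8-bit exponent field
        int f = bits & 0x7FFFFF;               // 23-bit mantissa field
        if (e == 0) {
            return f == 0 ? "zero" : "denormalized";
        }
        if (e == 0xFF) {
            return f == 0 ? "infinity" : "NaN";
        }
        return "normalized";
    }

    public static void main(String[] args) {
        float[] samples = { 0f, -0f, Float.MIN_VALUE, 1f, Float.MAX_VALUE,
                            Float.NEGATIVE_INFINITY, Float.NaN };
        for (float s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```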
Where do all the floating point numbers fall?
The maximum number of distinct values that can be represented with 32 bits is 2^{32} whether the format is unsigned integer, two's complement integer, or IEEE 754 single precision rounded. Interestingly, almost half of these bit patterns, 2^{31} - 2^{24} + 2 of them, are Java floats (IEEE 754 single precision rounded) in [-1 to +1]; the rest of the number line, i.e. (-∞ to -1) union (1 to +∞), holds only slightly more finite values, 2^{31} - 2.
[-1 to +1] | 2^{31} - 2^{24} + 2 distinct bit patterns (+0 and -0 counted separately) |
10111111100000000000000000000000 (-1.0) | |
... | |
10111111000000000000000000000001 (-5.0000006e-1) | |
10111111000000000000000000000000 (-0.5) | |
... | |
10000000000000000000000000000000 (-0) | |
00000000000000000000000000000000 (+0) | |
... | |
00111111000000000000000000000000 (0.5) | |
... | |
00111111100000000000000000000000 (1.0) | |
(-∞ to -1) union (1 to +∞) | 2^{31} - 2 distinct finite values |
Exponent field all ones | 2 infinities plus 2 × (2^{23} - 1) non-distinct representations for NaN |
*11111111f_{1}f_{2}…f_{23}, f ≠ 0 (NaN) | |
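The counts can be checked from the bit patterns themselves, because non-negative floats sort in the same order as their bit patterns read as integers. The following sketch tallies each group; FloatCensus and its method names are hypothetical, and +0 and -0 are counted as separate patterns.

```java
public class FloatCensus {
    /** Bit patterns whose value lies in [-1, +1], counting +0 and -0 separately. */
    static long insideUnitInterval() {
        long zeroToOne = (long) Float.floatToIntBits(1.0f) + 1; // patterns 0x00000000..0x3F800000
        return 2 * zeroToOne;                                   // same count on the negative side
    }

    /** Finite bit patterns whose magnitude is greater than 1. */
    static long finiteOutside() {
        long oneToMax = (long) Float.floatToIntBits(Float.MAX_VALUE)
                      - Float.floatToIntBits(1.0f);             // patterns 0x3F800001..0x7F7FFFFF
        return 2 * oneToMax;
    }

    /** NaN bit patterns: exponent field all ones, nonzero mantissa, either sign. */
    static long nanPatterns() {
        return 2 * ((1L << 23) - 1);
    }

    public static void main(String[] args) {
        long infinities = 2;
        System.out.println("in [-1, +1]    : " + insideUnitInterval()); // 2130706434
        System.out.println("finite outside : " + finiteOutside());      // 2147483646
        System.out.println("NaN patterns   : " + nanPatterns());        // 16777214
        System.out.println("total          : "
                + (insideUnitInterval() + finiteOutside() + nanPatterns() + infinities)); // 4294967296
    }
}
```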
Graphically, the spacing between distinct values (ε) doubles with each power of two away from zero. For the number 2^{20} this spacing is 0.0625 to the left and 0.125 to the right. To see this, consider the right side of 2^{20}, which contains (2^{21} - 2^{20}) / 0.125 = 8388608 = 2^{23} distinct values. For the number 2^{40} this spacing is 65,536 to the left and 131,072 to the right. Each real number is assigned the distinct representation that is closest to it. At run time, values too large to round to the largest finite value are assigned positive infinity. Similarly, values too far from zero in the negative direction to round to the furthest finite negative number are assigned negative infinity.
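The spacing on either side of a value can be measured with Math.nextUp and Math.nextDown (the float overload of nextDown requires Java 8 or later). The sketch below uses the illustrative names FloatSpacing, gapLeft and gapRight.

```java
public class FloatSpacing {
    /** Distance from x to the next larger float. */
    static float gapRight(float x) {
        return Math.nextUp(x) - x;
    }

    /** Distance from x to the next smaller float. */
    static float gapLeft(float x) {
        return x - Math.nextDown(x);
    }

    public static void main(String[] args) {
        float twoTo20 = (float) (1 << 20);
        float twoTo40 = (float) (1L << 40);
        System.out.println(gapLeft(twoTo20) + " | " + gapRight(twoTo20)); // 0.0625 | 0.125
        System.out.println(gapLeft(twoTo40) + " | " + gapRight(twoTo40)); // 65536.0 | 131072.0

        // A double too large for float becomes infinity when narrowed at run time.
        double tooBig = 1e39;
        System.out.println((float) tooBig == Float.POSITIVE_INFINITY);
        System.out.println((float) -tooBig == Float.NEGATIVE_INFINITY);
    }
}
```

Math.ulp(x) reports the same quantity as gapRight for positive finite x.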
Try it! Type in 25431.1234 and click on the Parse Input button. Then try 25431.1230 or 25431.1239; they all have the same representation as a Java float. Here 25431.1234 falls between 16,384 (2^{14}) and 32,768 (2^{15}). Of the 23 mantissa bits, 14 are used for the integer part at this magnitude, which leaves 9 bits for the fractional part, so the least significant bit increments in steps of 2^{-9} ≈ .002 = ε. All values between 25431.1221 and 25431.1240 have the same floating point representation, 46C6AE3F.
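The same experiment can be run without the applet. In this sketch, SameFloat and hex are illustrative names; the helper prints the raw bits of a float in hexadecimal.

```java
public class SameFloat {
    /** Hexadecimal form of the 32-bit pattern of x. */
    static String hex(float x) {
        return Integer.toHexString(Float.floatToIntBits(x)).toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(hex(25431.1234f)); // 46C6AE3F
        System.out.println(hex(25431.1230f)); // 46C6AE3F
        System.out.println(hex(25431.1239f)); // 46C6AE3F
        System.out.println(hex(25431.1250f)); // 46C6AE40, the next representable value
        System.out.println(Math.ulp(25431.1234f) == 1.0f / 512); // spacing is 2^-9
    }
}
```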
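Conversion Example 1: Write the number 1234.0 as a Java float.
In binary, 1234 = 10011010010_{2} = 1.0011010010_{2} × 2^{10}, so the sign bit is 0, the exponent field is 10 + 127 = 137 = 10001001_{2}, and the mantissa is 0011010010 padded with zeros to 23 bits.
1 bit Sign | 8 bit exponent | 23 bit mantissa |
0 | 100 0100 1 | 001 1010 0100 0000 0000 0000 |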
Here we have too many 0's and 1's to read comfortably so group this Java word into 4 digit binary segments and convert to Hex by "lookup".
0100 0100 1001 1010 0100 0000 0000 0000
Hex: 4 4 9 A 4 0 0 0
Decimal: 1234
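The hand conversion can be mirrored in code for small positive whole numbers. The sketch below is not how the JVM converts values; ManualEncode and encode are hypothetical names, and the method assumes 0 < n < 2^{24} so that no rounding is needed.

```java
public class ManualEncode {
    /**
     * Builds the float bit pattern of a positive whole number by hand.
     * Assumes 0 < n < 2^24, so every bit of n fits in the mantissa without rounding.
     */
    static int encode(int n) {
        int e = 31 - Integer.numberOfLeadingZeros(n); // n = 1.xxx (base 2) x 2^e
        int fraction = n - (1 << e);                  // drop the unrecorded leading 1
        int mantissa = fraction << (23 - e);          // left-align in the 23-bit field
        int exponent = e + 127;                       // excess-127
        return (exponent << 23) | mantissa;           // sign bit stays 0
    }

    public static void main(String[] args) {
        int byHand = encode(1234);
        System.out.println(Integer.toHexString(byHand));            // 449a4000
        System.out.println(byHand == Float.floatToIntBits(1234.0f)); // true
        System.out.println(Float.intBitsToFloat(byHand));           // 1234.0
    }
}
```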
Conversion Example 2: Write the number 25,431.1234 as a Java float.
1 bit Sign | 8 bit exponent | 23 bit mantissa |
0 | 10001101 | 1000 1101 0101 1100 0111 111 |
0100 0110 1100 0110 1010 1110 0011 1111
Hex: 4 6 C 6 A E 3 F
Decimal: 25,431.123 (note the loss of precision).
Conversion Example 3: 25431.1234 - 0.01234 = 25431.11106. Carry out this same calculation using IEEE 754 single precision floating point machine numbers.
1.10001101010111000111111 × 2^{14} - 1.10010100010110110110110 × 2^{-7} =
1.10001101010111000111111 × 2^{14} - 0.00000000000000000000110 × 2^{14} =
1.10001101010111000111001 × 2^{14}
0100 0110 1100 0110 1010 1110 0011 1001 = 25431.111
Hex: 4 6 C 6 A E 3 9
Decimal: 25431.111 (note the further loss of precision).
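The subtraction can be reproduced in Java to compare against the hand calculation. In this sketch, FloatSubtract and difference are illustrative names; the double version is included only to show what is lost in single precision.

```java
public class FloatSubtract {
    /** The subtraction from the example, carried out in single precision. */
    static float difference() {
        float a = 25431.1234f;
        float b = 0.01234f;
        return a - b;
    }

    public static void main(String[] args) {
        float result = difference();
        System.out.println(Integer.toHexString(Float.floatToIntBits(result))); // 46c6ae39
        System.out.println(result);

        // The same subtraction in double precision keeps many more digits.
        System.out.println(25431.1234 - 0.01234);
    }
}
```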