Understanding Floating Point Representation
©2002 Sridhar Narayan
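Say you use 4-4 fixed point representation: 4 bits to the left of the binary
point and 4 bits to the right. Can you represent 0.03125, i.e. 1/32? How about
32.5?
Not as it stands, since 1/32 needs five fraction bits and 32.5 needs six
integer bits. You can, however, if you are willing to use different scale
factors for different numbers. Thus, 0.03125 would be represented as 000.00001
using the available 8 bits, and 32.5 would be represented as 100000.10 using
8 bits.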
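As a rough illustration of why the split of the 8 bits matters, the check below tests whether a value fits exactly in an unsigned fixed point format. The helper name `fits_fixed_point` and its parameters are made up for this sketch, and sign handling is ignored.

```python
from fractions import Fraction

def fits_fixed_point(value, int_bits, frac_bits):
    """Return True if value is exactly representable as an unsigned
    fixed point number with int_bits integer bits and frac_bits fraction bits.
    (Hypothetical helper, for illustration only.)"""
    # Multiplying by 2**frac_bits moves the binary point to the far right.
    scaled = Fraction(value) * 2**frac_bits
    # The result must be a whole number that fits in the available bits.
    return scaled.denominator == 1 and 0 <= scaled < 2**(int_bits + frac_bits)

# The fixed 4-4 split holds neither value.
print(fits_fixed_point(0.03125, 4, 4))  # False: needs 5 fraction bits
print(fits_fixed_point(32.5, 4, 4))     # False: needs 6 integer bits

# Different splits of the same 8 bits hold each one.
print(fits_fixed_point(0.03125, 3, 5))  # True: 000.00001
print(fits_fixed_point(32.5, 6, 2))     # True: 100000.10
```

Choosing the split per number is the same as choosing a scale factor per number, which is the idea floating point builds on.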
To generate the floating point representation for a number:
- Start with the binary representation for the number. Thus 0.03125
(decimal) is 0.00001 in binary.
- Render the number into a standard (normal) form, a process called
normalization. The normal form for IEEE representation is 1.bbbbbb where the
b's represent bits, 0 or 1. Thus 0.00001 is normalized to 1.0 by shifting
the binary point right 5 places. The scale factor for the number is therefore
2^-5. In the same manner 32.5, that is 100000.10, would be normalized to
1.0000010 by shifting the binary point left 5 places, with a scale factor
of 2^5.
- Store a bit corresponding to the sign of the number, the bits to
the right of the binary point in the normalized representation (known as the
mantissa), and the exponent of the scale factor, in adjacent fields of a
32-bit location. The leading 1 of the normal form is not stored, since it is
always present. IEEE uses the following format for the 32 bits:
- Left most bit = sign bit, 1 = negative, 0 = positive
- Next eight bits = exponent. The value 127 is added to the exponent
before it is stored. This is called a bias. Thus, exponent values
in the range -126 to +127 are stored as positive integers in the range 1
to 254. Stored exponent values of 0 and 255 have special interpretations
(0 for zero and very small unnormalized values, 255 for infinity and NaN).
- Last 23 bits = mantissa
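The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a full IEEE encoder: the names `encode_single` and `to_hex` are hypothetical, only nonzero finite values with an exponent in the normal range are handled, and any mantissa bits beyond 23 are truncated rather than rounded.

```python
import math

def encode_single(x):
    """Return (sign, stored_exponent, mantissa) for a normal 32-bit value.

    Illustrative helper only (hypothetical name). Zero, infinities and NaN
    are rejected, and extra mantissa bits are truncated, not rounded.
    """
    if x == 0 or not math.isfinite(x):
        raise ValueError("only nonzero finite values are handled in this sketch")
    sign = 1 if x < 0 else 0
    # math.frexp gives abs(x) = m * 2**e with 0.5 <= m < 1.
    # Doubling m yields the 1.bbbb normal form, with exponent e - 1.
    m, e = math.frexp(abs(x))
    significand, exponent = m * 2, e - 1
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normal range -126..+127")
    stored_exponent = exponent + 127            # apply the bias
    mantissa = int((significand - 1) * 2**23)   # the 23 bits right of the point
    return sign, stored_exponent, mantissa

def to_hex(sign, stored_exponent, mantissa):
    """Pack the three fields into one 32-bit word, shown as 8 hex digits."""
    word = (sign << 31) | (stored_exponent << 23) | mantissa
    return format(word, "08X")

print(encode_single(0.03125))           # (0, 122, 0)
print(to_hex(*encode_single(0.03125)))  # 3D000000
print(to_hex(*encode_single(32.5)))     # 42020000
```

Using `math.frexp` avoids shifting the binary point by hand; it returns a fraction in the range 0.5 to 1, so one extra doubling is needed to reach the 1.bbbbbb form.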
So 0.03125 is represented in IEEE notation in a 32-bit field as:
Sign = 0, Exponent = -5+127 = 122 (01111010 in binary), Mantissa = 0000000...0
Put it all together as 0 01111010 00000000...0, which of course (in hex) is
3D000000.
You can use the data representation applet to verify this.
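If the applet is not at hand, one possible cross-check is Python's standard `struct` module, which can pack a value as a 32-bit IEEE float. The helper names `float_to_hex` and `split_fields` below are assumptions made for this sketch.

```python
import struct

def float_to_hex(x):
    """Return the 32-bit IEEE pattern of x as 8 hex digits."""
    # '>f' packs x as a big-endian 32-bit IEEE float.
    return struct.pack(">f", x).hex().upper()

def split_fields(x):
    """Return (sign, stored_exponent, mantissa) read from the packed bits."""
    # Reinterpret the same 4 bytes as an unsigned 32-bit integer.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                      # leftmost bit
    stored_exponent = (word >> 23) & 0xFF  # next eight bits
    mantissa = word & 0x7FFFFF             # last 23 bits
    return sign, stored_exponent, mantissa

print(float_to_hex(0.03125))   # 3D000000
print(split_fields(0.03125))   # (0, 122, 0)
```

The stored exponent of 122 and the all-zero mantissa match the values worked out by hand.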