Understanding Floating Point Representation

©2002 Sridhar Narayan

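Suppose you use a 4-4 fixed point representation: 8 bits, with 4 to the left of the binary point and 4 to the right. Can you represent 0.03125, i.e. 1/32? How about 32.5?

Not with the binary point fixed in that position: the smallest nonzero fraction is 0.0001 (1/16), and four integer bits cannot reach 32. You can if you are willing to use different scale factors for different numbers, that is, to let the binary point move. Thus, 0.03125 would be represented as 000.00001 using the available 8 bits, and 32.5 would be represented as 100000.10 using 8 bits.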
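The effect of moving the binary point can be sketched in a few lines of Python. This is only an illustration: the helper name to_fixed and its parameters are hypothetical, and it assumes unsigned 8-bit patterns.

```python
from fractions import Fraction

def to_fixed(value, frac_bits, total_bits=8):
    """Return the unsigned bit pattern for value using frac_bits fraction
    bits, or None if value does not fit exactly in total_bits."""
    # Scaling by 2**frac_bits must yield a whole number that fits in the field.
    scaled = Fraction(value) * 2**frac_bits
    if scaled.denominator != 1 or not 0 <= scaled < 2**total_bits:
        return None
    bits = format(int(scaled), f"0{total_bits}b")
    int_bits = total_bits - frac_bits
    return bits[:int_bits] + "." + bits[int_bits:]

print(to_fixed(0.03125, 4))  # None: 1/32 needs 5 fraction bits
print(to_fixed(32.5, 4))     # None: 32 needs 6 integer bits
print(to_fixed(0.03125, 5))  # 000.00001
print(to_fixed(32.5, 2))     # 100000.10
```

The same 8 bits cover both numbers only because the position of the binary point (the scale factor) is chosen per number, which is the idea floating point builds on.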

To generate the floating point representation for a number:
  1. Start with the binary representation for the number. Thus 0.03125 is 0.00001.
  2. Render the number into a standard (normal) form, a process called normalization. The normal form for IEEE representation is 1.bbbbbb where the b's represent bits, 0 or 1. Thus 0.00001 is normalized to 1.0 by shifting the binary point right 5 places. The scale factor for the number is therefore 2^-5. In the same manner 32.5, that is 100000.10, would be normalized to 1.0000010 by shifting the binary point left 5 places, giving a scale factor of 2^5.
  3. Store a bit corresponding to the sign of the number, the bits to the right of the binary point in the normalized representation (known as the mantissa), and the exponent of the scale factor, in adjacent bits in a 32-bit location. The leading 1 of the normal form is not stored, since every normalized number has it. IEEE uses the following format for the 32 bits:
    1. Leftmost bit = sign bit, 1 = negative, 0 = positive
    2. Next eight bits = exponent. The value 127 is added to the exponent before it is stored. This is called a bias. Thus, exponent values in the range -126 to +127 are stored as positive integers in the range 1 to 254. Stored exponent values of 0 and 255 have special interpretations.
    3. Last 23 bits = mantissa
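
The three steps above can be sketched in Python as follows. This is a minimal illustration, not a full IEEE implementation: the function name encode_float32 is hypothetical, it handles only finite nonzero values in the normalized range, and it truncates mantissa bits beyond the 23rd rather than rounding.

```python
import math

BIAS = 127          # added to the true exponent before storing
MANTISSA_BITS = 23  # bits kept to the right of the binary point

def encode_float32(x):
    """Return the 32-bit pattern for x as an int (normalized values only)."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles only finite, nonzero values")
    sign = 1 if x < 0 else 0
    x = abs(x)
    # Normalize to the form 1.bbbb... by shifting the binary point,
    # counting the shifts as the exponent of the scale factor.
    exponent = 0
    while x >= 2.0:
        x /= 2.0
        exponent += 1
    while x < 1.0:
        x *= 2.0
        exponent -= 1
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normalized range")
    # Drop the leading 1 and keep 23 fraction bits (extra bits are truncated).
    mantissa = int((x - 1.0) * 2**MANTISSA_BITS)
    # Pack sign, biased exponent, and mantissa into adjacent bits.
    return (sign << 31) | ((exponent + BIAS) << MANTISSA_BITS) | mantissa

print(format(encode_float32(0.03125), "08X"))  # 3D000000
print(format(encode_float32(32.5), "08X"))     # 42020000
```
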
So 0.03125 is represented in IEEE notation in a 32-bit field as follows:

Sign = 0, Exponent = -5 + 127 = 122 = 01111010 in binary, Mantissa = 000...0 (23 zeros)
Putting it all together gives 0 01111010 00000000000000000000000, which in hex is 3D000000.

You can use the data representation applet to verify this.
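
If the applet is not at hand, the same check can be sketched with Python's standard struct module, which packs a value as an IEEE 32-bit float. The helper name float32_fields is hypothetical; it simply splits the packed word into the three fields described above.

```python
import struct

def float32_fields(x):
    """Return (word, sign, stored_exponent, mantissa) for x as a 32-bit float."""
    # Pack as a big-endian IEEE single, then reread the same 4 bytes as an int.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                     # leftmost bit
    stored_exponent = (word >> 23) & 0xFF  # next eight bits, bias included
    mantissa = word & 0x7FFFFF             # last 23 bits
    return word, sign, stored_exponent, mantissa

word, sign, stored_exponent, mantissa = float32_fields(0.03125)
print(format(word, "08X"), sign, stored_exponent, mantissa)  # 3D000000 0 122 0
```

Subtracting the bias from the stored exponent (122 - 127) recovers the scale factor exponent of -5.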