Amu_Ke_Fundye
Computer Arithmetic
Fixed Point Representation
Because of computer hardware limitations, everything, including the sign of a number, has to be represented by 0s and 1s. So for a positive number the leftmost bit (the sign bit) is always 0, and for a negative number the sign bit is 1.
Floating Point Representation
A floating point number is represented using two parts. The first is called the mantissa (m) and the other the exponent (e). Thus, in a number system with base r, a floating point number with mantissa m and exponent e is represented as m × r^e.
The value of m may be a fraction or an integer. Thus, the number (2.25)₁₀ can be represented as 0.225 × 10^1.
Here, m = 0.225, e = 1 and r = 10.
For an n-bit register, the MSB is the sign bit and the remaining (n – 1) bits hold the magnitude.
So, the largest positive number that can be stored is 2^(n–1) – 1 and the lowest negative number is –(2^(n–1) – 1).
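The sign-magnitude range above is easy to compute directly. A minimal Python sketch (the function name is illustrative, not standard):

```python
# Largest positive and most negative values representable in an
# n-bit sign-magnitude register: 1 sign bit, (n - 1) magnitude bits.
def sign_magnitude_range(n):
    largest = 2 ** (n - 1) - 1
    return -largest, largest

# For an 8-bit register: -(2^7 - 1) = -127 up to 2^7 - 1 = 127.
print(sign_magnitude_range(8))   # (-127, 127)
print(sign_magnitude_range(16))  # (-32767, 32767)
```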
Actual Number Finding Technique
The exponent is always stored as a positive number, in biased form (a biased number is also called an excess number): the bias is added to the actual exponent of the given number before it is stored. The actual number can be recovered from the contents of the registers using the following formula:
Actual number = (–1)^S × (1 + m) × 2^(e – Bias)
where
S = sign bit
m = mantissa value of the register
e = exponent value of the register
Bias = bias number; if n bits are used to represent the exponent, then
Bias = 2^(n–1) – 1
Range of the actual exponent = –(2^(n–1) – 1) to 2^(n–1).
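The decoding formula above can be checked with a small Python sketch (the helper names `bias` and `decode` are illustrative, not standard):

```python
# Decode a floating point register from its fields: sign bit s,
# stored (biased) exponent e, and fractional mantissa m, using
#   actual = (-1)^s * (1 + m) * 2^(e - Bias),  Bias = 2^(n-1) - 1
def bias(n_exp_bits):
    return 2 ** (n_exp_bits - 1) - 1

def decode(s, e, m, n_exp_bits=8):
    return (-1) ** s * (1 + m) * 2 ** (e - bias(n_exp_bits))

# With an 8-bit exponent the bias is 127, as in IEEE single precision.
print(bias(8))              # 127
# s = 1, e = 126, m = 0.5 gives (-1) * 1.5 * 2^(126 - 127) = -0.75.
print(decode(1, 126, 0.5))  # -0.75
```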
IEEE Floating Point Representation
It provides a 32-bit format for single-precision values, and a 64-bit format for double-precision values.
The double-precision format contains a mantissa field that is more than twice as long as the mantissa field of the single-precision format, permitting greater accuracy.
The mantissa field assumes an implicit leading bit of 1, and the exponent field adopts the excess system with a bias value of 127 for the single-precision format and a bias of 1023 for the double-precision format.
Bit patterns are reserved for special values such as zero, infinity, NaN (not-a-number) and denormalized values.
Ranges of Normalized numbers using single precision
A normalized number is represented in the format:
(–1)^S × M × 2^E, where 1.0 ≤ M < 2.0 and –126 ≤ E ≤ 127.
The smallest positive number is 1.0 × 2^–126, which is equivalent to about 1.2 × 10^–38.
The largest positive number is (2 – 2^–23) × 2^127, minutely less than 2 × 2^127 = 2^128, which is equivalent to about 3.4 × 10^38.
The range for positive normalized numbers in this format is therefore about 1.2 × 10^–38 to 3.4 × 10^38.
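These extremes can be reconstructed in Python from their well-known bit patterns, 0x00800000 (smallest normalized) and 0x7F7FFFFF (largest finite), using the standard library `struct` module:

```python
import struct

# 0x00800000: sign 0, exponent 0000 0001 (E = 1 - 127 = -126), mantissa 0.
smallest = struct.unpack('>f', (0x00800000).to_bytes(4, 'big'))[0]
# 0x7F7FFFFF: sign 0, exponent 1111 1110 (E = 254 - 127 = 127), mantissa all 1s.
largest = struct.unpack('>f', (0x7F7FFFFF).to_bytes(4, 'big'))[0]

print(smallest)  # 1.1754943508222875e-38  (about 1.2 x 10^-38)
print(largest)   # 3.4028234663852886e+38  (about 3.4 x 10^38)
```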
Normalization using Single Precision Floating Point Representation
Step 1: Determine the sign bit. Save this for later.
Step 2: Convert the absolute value of the number to normalized form.
Step 3: Determine the eight–bit exponent field.
Step 4: Determine the 23-bit significand. There are shortcuts here.
Step 5: Arrange the fields in order.
Step 6: Rearrange the bits, grouping by fours from the left.
Step 7: Write the number as eight hexadecimal digits.
Example: The Negative Number – 0.750
Step 1: The number is negative. The sign bit is S = 1.
Step 2: 0.750 = 1.5 × 0.50 = 1.5 × 2^–1. The exponent is P = –1.
Step 3: P + 127 = – 1 + 127 = 126. As an eight–bit number, this is 0111 1110.
Step 4: Convert 1.5 to binary: 1.5 = 1 + ½ = 1.1₂. The significand is 1000 … 0.
To get the significand, drop the leading “1.” from the number.
Note that we do not write the significand out to its full 23 bits, but only place a few zeroes after the last 1 in the string.
Step 5: Arrange the bits: Sign | Exponent | Significand
Sign Exponent Significand
1 0111 1110 1000 … 00
Step 6: Rearrange the bits
1011 1111 0100 0000 … etc.
Step 7: Write as 0xBF40. Extend to eight hex digits: 0xBF40 0000.
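The seven steps above can be verified in Python with the standard library `struct` module, which packs a value into its IEEE 754 single-precision bit pattern:

```python
import struct

# Pack -0.750 as a big-endian single-precision float, then reinterpret
# the 4 bytes as an unsigned 32-bit integer to see the bit pattern.
bits = struct.unpack('>I', struct.pack('>f', -0.750))[0]
print(f'0x{bits:08X}')  # 0xBF400000

# Field breakdown matches the worked example:
print(bits >> 31)                   # sign        -> 1
print((bits >> 23) & 0xFF)          # exponent    -> 126 (0111 1110)
print(bits & 0x7FFFFF == 1 << 22)   # significand -> 1000...0, so True
```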
Regards
Amrut Jagdish Gupta