Questions about 64-bit stuff
Here's my question based on what I understand from the book I'm reading. Hopefully, someone can understand my rough idea:
1) Say we have 64 bits. Each bit is either 0 or 1, so 64 bits give only 64 spots to store 0s or 1s. But 10^308 is a very long (and huge) number: we need 309 digits to write it down, i.e., 10...000. How can a computer possibly store this number? I'm totally lost here.
2) The "uint64" data type means the computer uses 64 bits to store such a number. The maximum integer of this type is 2^64 - 1. Compare that to "double", which also uses 64 bits but can store fractions. Yet the largest number it can store is about 1.79x10^308. 10^308 is a very, very large number in comparison to 2^64. How can this be? I mean, why don't we just throw away (literally throw away) "uint64", since "double" uses the same amount of memory and can store even larger numbers?
Unless I'm crazy here, or misunderstand something. Someone please help explain.
Thanks.
Accepted Answer
Jan
on 27 Jul 2015
You can write down 10^308 with even fewer than 64 bits: 6 bytes are enough (1 per character; it does not matter here that MATLAB uses 2 bytes per char):
'1', '0', '^', '3', '0', '8'
But of course you have limited accuracy with 6 characters. You can also represent 11^309, but not 10.1^308. A similar effect occurs in the double format: you have 1 bit for the sign, 11 bits for the exponent and 52 bits for the mantissa. This way you get about 16 significant digits and numbers up to about 1.8x10^308. But you cannot store, e.g., 18 valid digits in such a number, due to the limited precision.
In uint64 you can store integers up to 2^64-1 exactly. The greater range of the double format is paid for with the limited precision of the mantissa. So you can see a double as 1 sign bit + a uint52 + an 11-bit exponent.
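To make the "1 sign bit + uint52 + 11-bit exponent" picture concrete, here is a minimal sketch in Python rather than MATLAB (Python's float is the same 64-bit IEEE 754 double on common platforms). The helper names `double_fields` and `rebuild` are made up for this illustration, and `rebuild` only handles normal numbers, not zero, subnormals, inf or NaN.

```python
import struct

def double_fields(x: float):
    """Split a 64-bit IEEE 754 double into its (sign, exponent, mantissa) bit fields."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)   # 52 bits, the fraction after an implicit leading 1
    return sign, exponent, mantissa

def rebuild(sign: int, exponent: int, mantissa: int) -> float:
    """Recombine the fields of a normal double (assumption: not 0, subnormal, inf or NaN)."""
    return (-1.0) ** sign * (1 + mantissa / 2**52) * 2.0 ** (exponent - 1023)

print(double_fields(1.0))              # (0, 1023, 0)
print(double_fields(-2.5))             # (1, 1024, 1125899906842624), i.e. mantissa 2**50
print(rebuild(*double_fields(0.1)) == 0.1)
```

The exponent field is what buys the huge range: changing it by one doubles or halves the value, while the 52 mantissa bits stay the same width no matter how large the number is.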
More Answers (3)
Image Analyst
on 27 Jul 2015
4 Comments
Steven Lord
on 27 Jul 2015
You might be interested in section 7 of the introduction chapter of Cleve's Numerical Computing with MATLAB.
The main assumption you're making that is not correct is that double precision numbers are equally spaced (in terms of absolute difference) throughout the range covered by double precision. This IS valid for uint64 (the uniform spacing is 1) but is NOT valid for double.
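The uneven spacing can be checked directly. The following is a small sketch in Python (whose float is an IEEE 754 double on common platforms), assuming Python 3.9 or later for `math.ulp`, which returns the gap between a double and the next larger one in magnitude. The sample values are arbitrary choices for illustration.

```python
import math

# Gap between each sample value and the next representable double.
samples = (1.0, 1e3, 2.0**53, 1e100, 1e300)
gaps = {x: math.ulp(x) for x in samples}

for x, gap in gaps.items():
    print(f"x = {x:.3e}   gap to next double = {gap:.3e}")
```

Near 1 the gap is 2^-52, at 2^53 it is already 2, and near 10^300 neighboring doubles are astronomically far apart. A uint64, by contrast, has a gap of exactly 1 everywhere in its range.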
Muthu Annamalai
on 27 Jul 2015
@Huy Truong - you just answered the question, "What is the difference between floating point and fixed point numbers ?"
Dynamic range
Stephen23
on 27 Jul 2015
Edited: Stephen23
on 28 Jul 2015
The core difference is this:
- floating point classes (e.g. double and single) split their total number of bits into three groups: the main part encodes the digits (or fraction), a smaller part encodes the magnitude, and one bit encodes the sign.
- integer classes only encode the digits, and possibly the sign.
This means floating point numbers encode a value a bit like this:
X * ZZZZZZZZZZZZZZZ * 2^YYYYY
where the X is the sign bit, the Z's are the digits, and the Y's are the exponent (multiplier) bits. The advantage of doing this is it is possible to encode a reasonably large range of magnitudes (the range of 2^YYYYY) with the same precision (how many Z digits there are). Note it is not possible to represent all integers within that range!
An integer can be much simpler:
XZZZZZZZZZZZZZZZZZZZZ
Why do we not "throw away" the integer classes? They encode exact integer values right up to their limits (note that there are more Z digits for the same number of bits), so their memory usage can be much more efficient. And because many operations can be applied directly to the bits themselves, integer operations can be faster.
2 Comments
James Tursa
on 27 Jul 2015
Edited: James Tursa
on 27 Jul 2015
"... adding two double has a more complicated mechanism happening inside computer than adding two pure integers?"
Yes. When adding two doubles, the code has to check for special bit patterns first (NaN, inf, denormalized). If they are present, then special code to determine the result must be used. If normal bit patterns are present, then you need to handle the difference in exponents for the two numbers to get the mantissas to effectively "line-up" for the addition. Then you need to account for a possible difference in signs (one positive and the other negative). And the result might overflow into an inf pattern, or underflow into a denormalized pattern.
When adding two integer bit patterns, the bits are already "lined up", since there are no exponents to worry about. So a simple algorithm to add the bits works. And if the 2's complement bit format is used (which is typical in modern computers), the exact same algorithm works for positive and negative operands. Overflow/underflow can be detected by examining the register overflow bit, and detection also depends on the signed/unsigned status of the operands. But overall this can be much less work than adding doubles (although micro-code for adding doubles is still pretty fast).
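As a rough illustration of the integer case, here is a sketch in Python that imitates a 64-bit register by masking to the low 64 bits. The names `MASK64`, `add_bits`, `to_bits` and `from_bits` are hypothetical helpers for this example, not anything a CPU or MATLAB exposes. The point is that one and the same addition serves unsigned and 2's complement signed operands.

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits, like a 64-bit register

def add_bits(a_bits: int, b_bits: int) -> int:
    """Add two 64-bit patterns, discarding any carry out of bit 63."""
    return (a_bits + b_bits) & MASK64

def to_bits(value: int) -> int:
    """2's complement bit pattern of a signed value in [-2**63, 2**63)."""
    return value & MASK64

def from_bits(bits: int) -> int:
    """Interpret a 64-bit pattern as a signed 2's complement value."""
    return bits - (1 << 64) if bits >> 63 else bits

print(add_bits(7, 8))                                # unsigned: 15
print(from_bits(add_bits(to_bits(-5), to_bits(3))))  # signed: -2
print(add_bits(MASK64, 1))                           # unsigned wrap-around: 0

# For contrast: lining up the exponents of two doubles can drop low bits.
print(1e16 + 1.0 == 1e16)                            # True
```

The last line shows the alignment step described above at work: the 1.0 falls below the smallest bit that a double near 10^16 can hold, so the sum rounds back to 1e16.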
Walter Roberson
on 27 Jul 2015
What you are missing is that a 64 bit double cannot represent every number in the range up to 10^308. 64 bit doubles can only precisely represent some values in that range.
The smallest positive integer that a 64 bit double in IEEE 754 format cannot represent properly is 2^53 + 1.
Numbers represented in double are restricted to about 16 significant digits. Once values exceed 2^53 (about 9x10^15), the distance between adjacent representable numbers becomes larger than 1.
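The 2^53 + 1 boundary is easy to verify. This is a short sketch in Python, where `int` is exact at any size and `float` is an IEEE 754 double on common platforms; the variable names are just for illustration.

```python
limit = 2**53        # 9007199254740992
uint64_max = 2**64 - 1

print(float(limit) == limit)            # True: 2^53 is representable
print(float(limit + 1) == limit + 1)    # False: 2^53 + 1 is not
print(int(float(limit + 1)))            # 9007199254740992, rounded to a neighbor
print(float(limit + 2) == limit + 2)    # True: the next representable integer

# The largest uint64 value is not a double either; it rounds up to 2^64.
print(float(uint64_max) == 2.0**64)     # True
```

So a double covers a far wider range than uint64, but above 2^53 it can no longer tell neighboring integers apart, which is exactly what uint64 still does right up to 2^64 - 1.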