
Difference Between Float And Double

Many beginner programmers and Computer Science students ask the same questions about the fundamentals of the field. Most introductory courses start with the number systems used in modern computers: binary, decimal, octal and hexadecimal. From there they move on to computer number formats, which are the internal representations of numeric values in computers, calculators and other digital devices. These values are stored as groups of bits.
Because computers represent data as sets of binary digits (combinations of 1s and 0s; for example, 1111 represents 15 in the decimal system), it makes sense to teach the different number formats used to represent a wide range of values, as they are the basic building blocks of any numeric computation. Once the number system has been covered in the classroom (often poorly), students move on to the different formats within the same family (i.e., floating-point arithmetic), each with its own precision and range, and they have to learn the nuances between them. Two of the most commonly used data types are Float and Double. While they serve the same need (floating-point arithmetic), they differ considerably in their internal representation and in their effect on the calculations in a program. Unfortunately, many programmers miss the nuances between the Float and Double data types and end up using them where they shouldn’t be used, which ultimately causes miscalculations in other parts of the program.
In this article, I am going to explain the difference between Float and Double, with code examples in the C programming language. Let’s get started!

Float vs Double… What’s the deal?

Float and Double are data types used for floating-point arithmetic. Think of the decimal numbers you calculate with in mathematics class, such as 20.123, 16.23 or 10.2. They are not whole numbers (e.g., 2, 5, 15), so the binary representation has to account for the fractional part, and such numbers cannot be represented in a plain binary integer format. The main difference between Float and Double is that the former is a single-precision (32-bit) floating-point data type, while the latter is a double-precision (64-bit) floating-point data type. Double is called “double” because it is basically a double-precision version of Float. In practice, a Float holds about 7 significant decimal digits and a Double about 15 to 16, so if you work with very large values or long chains of calculations, the inaccuracies will be much smaller in Double and you won’t lose as much precision.
This is easier to show with code examples. The following C program adds the same small value 738 times, once in Float and once in Double:


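#include <stdio.h>

int main(void) {
    /* Single precision: add 1/82 to itself 738 times (exact answer: 9). */
    float num1 = 1.f / 82;
    float num2 = 0;
    for (int i = 0; i < 738; ++i)
        num2 += num1;
    printf("%.7g\n", num2);

    /* Double precision: the same computation. */
    double num3 = 1.0 / 82;
    double num4 = 0;
    for (int i = 0; i < 738; ++i)
        num4 += num3;
    printf("%.15g\n", num4);

    getchar(); /* keep the console window open until Enter is pressed */
    return 0;
}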

It prints the following:
9.000031
8.99999999999983

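The exact answer is 9 (738 / 82). Here, you can see that the difference in precision between Float and Double gives two different answers: the Float result is off by about 0.00003, while the Double result is off by only about 0.0000000000002. Neither is exact, but Double is far more accurate.
The following is an example using the sqrt() function in C: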

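#include <stdio.h>
#include <math.h>

int main(void) {
    /* sqrt() computes in double; storing the result in a float rounds it. */
    float num1 = sqrt(2382719676512365.1230112312312312);
    double num2 = sqrt(2382719676512365.1230112312312312);
    printf("%f\n", num1);
    printf("%f\n", num2);
    getchar(); /* wait for Enter before exiting */
    return 0;
}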

It gives the following output:
48813108.000000
48813109.678778
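Here, you can see that the answer in Double has better precision. The Float result is off by more than 1 because, at this magnitude, adjacent Float values are 4 apart.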
All in all, it is better to use Double for floating-point arithmetic, as several standard math functions in C operate on Double and modern computers are extremely fast and efficient at Double floating-point calculations. This reduces the need for Float to a few cases. One is when you need to store a lot of floating-point numbers (think of large arrays with millions of elements), where Float halves the memory footprint. Another is when you are targeting a system that has no, or only slow, hardware support for double-precision floating point, as is the case for many GPUs, low-powered devices and certain platforms (e.g., ARM Cortex-M4, whose optional FPU is single-precision only). Additionally, certain GPUs and CPUs process Float more efficiently, for example in vector and matrix calculations, so you might need to check the hardware manual or documentation to decide which type to use on a particular machine.
There is rarely a reason to use Float instead of Double in code targeting modern computers. The extra precision in Double reduces, but does not eliminate, the chance of rounding errors or other imprecision that can cause problems in other parts of the program. Many math functions and operators convert their operands to Double and return Double, so there is no need to cast the results back to Float, which could lose precision. For a detailed analysis of floating-point arithmetic, I highly recommend reading this article (http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html).

Summary

So… in a nutshell:
Places where you should use Float:
  • You are targeting hardware where single precision is faster than double precision.
  • Your application stores or processes very large amounts of floating-point data, such as arrays with millions of elements, and memory or bandwidth is the bottleneck.
  • You are doing very low-level optimization. For instance, you are using special CPU instructions (e.g., SSE, SSE2, AVX) that operate on multiple numbers at a time.

Conclusion

In this article I have highlighted the difference between Float and Double and where each one should be used. Arguably, it’s better to default to Double in most places, especially if you are targeting modern computers, where Double floating-point arithmetic is highly unlikely to cause an efficiency problem. If you have any questions, you can ask in the comment section below.
