Microcontroller Embedded C Programming Lecture 63 | Single vs double precision

Post author: FastBitLab
Post published: July 7, 2022
Post category: Blog

Single vs double precision

In this article, let's explore the IEEE-754 floating-point formats used to store real numbers. IEEE-754 is a standard for representing and manipulating floating-point quantities. Virtually all modern computer systems and microcontroller toolchains follow it.

Figure 1. The IEEE-754 floating-point standard

For example, consider the number shown in Figure 1. It is a very large number, so how do you store it in memory? Storing its exact binary equivalent would consume a lot of memory. That is why the standard says not to store the number as it is.

Instead, the number is approximated, and only the required information is stored: the Sign, the Exponent, and the Mantissa (also called the significand), as shown in Figure 2.

Figure 2. The IEEE-754 floating-point standard
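For reference, here is how the three fields combine into a value for normalized numbers. The formula does not cover zero, subnormals, infinities, or NaN. Here s is the sign bit, f is the stored mantissa (fraction) bits, and e is the stored exponent. The bias is 127 in single precision and 1023 in double precision. The worked line uses -2.5, an arbitrary value chosen only for illustration.

```latex
\begin{aligned}
x    &= (-1)^{s} \times (1.f)_2 \times 2^{\,e-\text{bias}} \\
-2.5 &= (-1)^{1} \times (1.01)_2 \times 2^{1}
        \;\Rightarrow\; s = 1,\quad e = 1 + 127 = 128,\quad f = 0100\ldots0_2
\end{aligned}
```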

In this format, the number is approximated, and that approximation is what gets saved in memory.

Now the next question is, how many bits are required to store all these things?

This lecture covers two formats. The first is the single-precision format, which uses 32 bits to store the Significand, the Exponent, and the Sign.

Figure 3. Single precision representation

In this format, 23 bits are given to the Significand, 8 bits to the Exponent, and 1 bit to the Sign. This is what we call the single-precision representation.

There is also one more representation, called double precision, which uses 64 bits. Double precision gives a closer approximation. Because more bits are available for the significand, results computed in double precision are more accurate than those computed in single precision.

Figure 4. Double precision representation

As you can see in Figure 4, 52 bits are used to store the significand, 11 bits to store the Exponent, and 1 bit to store the Sign. The trade-off is that double precision consumes twice the memory of single-precision storage.

Figure 5. The IEEE-754 floating-point standard

Consider the number shown in Figure 5. It has an integer part, a decimal point, and a fractional part.

Let's see how single-precision and double-precision storage are used in 'C' programming. For that, you have to use special data types. If we want to store this number in memory, we cannot use integer data types such as int, char, and long. Storing the value in one of them keeps only the integer part, so the fractional part is lost.

Instead, use the data types designed to represent such numbers: float and double. On most platforms, float uses the 32-bit single-precision format and double uses the 64-bit double-precision format.

Format specifier for float and double data types

Now, let's explore the format specifiers for the float and double data types, which are used when reading (input) or printing (output) decimal numbers in a 'C' program.

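Use %lf to read a double variable with scanf(). To print a double with printf(), %f is enough (%lf is also accepted since C99).
Use %f to read a float variable with scanf(). With printf(), %f prints a float as well, because float arguments are promoted to double.
Use %e or %le for real numbers in scientific notation. With scanf(), %e reads a float and %le reads a double. With printf(), %e prints either type (%le is also accepted since C99).
All constants with a decimal point, such as 3.14, are treated as double by default by the compiler. Add an f or F suffix (3.14f) to make a float constant.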

