Numbers and Types (#3)
* Numerical Representations

* Primitive Data Types

* Cousins to Sheep

* Three to Eight for Byte
nguy8tri authored Dec 22, 2022
1 parent eba0815 commit ba9beac
Showing 4 changed files with 350 additions and 0 deletions.
Binary file added Chapters/Images/ASCII_Table.png
Binary file added Chapters/Images/Memory_Layout.png
162 changes: 162 additions & 0 deletions Chapters/Numerical_Representations.md
@@ -0,0 +1,162 @@
# Numerical Representation

## Introduction to Numbers
___
Numbers are a point of fascination in every culture. Sayings like "third time's the charm", "four is an unlucky number", or "thirteen is cursed" are things most of us have heard before.

The title of this chapter may therefore sound silly, but let's pause for a moment and ask ourselves a question: why do we represent numbers the way we do? To be specific, let's look at the following number:

<div style="text-align: center", size="4">

5,973

</div>

What does each of these digits mean, though? Let's write out the full breakdown of the number:<br />

<div style="text-align: center">

$$5\times10^3+9\times10^2+7\times10^1+3\times10^0$$

</div>

So let's think about the system we have today. We have 10 possible digits, ranging from 0 to 9; each "place" is a power of 10; and the powers increase by 1 from one place to the next. This system is called *decimal* for its use of 10.

This seems like an obvious mathematical fact, but why? Have you ever questioned this format? Specifically, why do we use 10 for all of these?

The answer lies in the fact that humans have 10 fingers. From the Chinese to the Arabs to the Romans, ancient civilizations overwhelmingly counted this way because we have 10 fingers, toes, or *digits*. Because of this, many areas of our lives rely on this simple fact, from money to politics to culture.

## Binary
___

However, what if we could break this system? In other words, what if we all had fewer or more than 10 digits? Or what if we met aliens with a different number of digits? How would they count?

Let's suppose that a new species, *alans machinicus*, has 2 fingers. We can therefore assume their system is in "base 2", colloquially known as *binary*. Assuming they also use Arabic numerals, their numbers will look like this:

<div style="text-align: center", size="4">

10011010

</div>

Now let's break this down. Since they only have 2 fingers, they only have 2 possible digits, for which we have chosen 0 and 1. Their "places" will be powers of 2, and the powers still increase sequentially, so the breakdown is:

<div style="text-align: center">

$$1\times2^7+0\times2^6+0\times2^5+1\times2^4+1\times2^3+0\times2^2+1\times2^1+0\times2^0$$

</div>

If you want this number in our terms, just compute the expression above, and you'll find that it is the number

<div style="text-align: center", size="4">

154

</div>

To avoid confusion, it is common to write these numbers with a subscript giving the base:

<div style="text-align: center", size="4">

Number<sub>base</sub>

</div>

In this case:

<div style="text-align: center", size="4">

10011010<sub>2</sub> = 154<sub>10</sub>

</div>
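As a small sketch in C++ (assuming we hold the binary digits in a string; this is just an illustration, not part of the chapter's exercises), the same breakdown can be computed by walking the digits left to right, multiplying the running total by 2 at each place:

```cpp
#include <iostream>
#include <string>

int main() {
    std::string bits = "10011010";  // the alans machinicus number from above
    int value = 0;
    for (char c : bits) {
        // Each step shifts the running total one place (one power of 2) to the
        // left, then adds the current digit (0 or 1).
        value = value * 2 + (c - '0');
    }
    std::cout << value << "\n";  // prints 154
    return 0;
}
```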

Ignore this if you don't want to nerd out, but the *alans machinicus* would actually expand the above number as:

<div style="text-align: center">

$$1_2\times10_2 ^{111_2}+0_2\times10_2^{110_2}+0_2\times10_2^{101_2}+1_2\times10_2^{100_2}+1_2\times10_2^{11_2}+0_2\times10_2^{10_2}+1_2\times10_2^{1_2}+0_2\times10_2^{0_2}$$

</div>


## Hexadecimal
___

In Latin, "Hex" means 6 and "decimal" is a reference to 10, so 6 + 10 is 16, so we'll be working with base 16 now. Let's say that the a new mutation arises in the population where people have 16 fingers (8 on each hand), and scientists name them a new species, *Ritchius Strostrupicus*. A millenium later, they form their own culture, customs, and number system. Based on our conversation from earlier, if they say the phrase


<div style="text-align: center", size="4">

"I have 40<sub>16</sub> cousins" (That's alot)

</div>

we would interpret it as

<div style="text-align: center", size="4">

"I have 64<sub>10</sub> cousins" (That's even more)

</div>

An astute person, however, would notice that hexadecimal has 16 possible digits, more than the ten digit symbols (0-9) we normally have available. Therefore, we substitute letters for the extra values, so the possible digits are 0123456789ABCDEF: 0 to F, or 0 to 15 in value. The "places" are powers of 16, and the powers increase sequentially. Therefore, the number

<div style="text-align: center", size="4">

F94A23C5<sub>16</sub>

</div>

can be represented as

<div style="text-align: center">

$$15\times16^7+9\times16^6+4\times16^5+10\times16^4+2\times16^3+3\times16^2+12\times16^1+5\times16^0$$

</div>

or

<div style="text-align: center">

F94A23C5<sub>16</sub> = 4,182,385,605<sub>10</sub>

</div>

## Numerical System Generalization
___

Let's say we have a number in base N. The possible digits range from 0 to N - 1, the "places" are powers of N, and the powers increase sequentially.
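As a concrete sketch of this rule (the helper name `toDecimal` is just for illustration, not a library function), here is one way to evaluate a string of base-N digits in C++, generalizing the earlier binary sketch:

```cpp
#include <cctype>
#include <iostream>
#include <string>

// Evaluate a string of base-N digits (digits beyond 9 are the letters A-F,
// as in hexadecimal) and return the value in our familiar decimal terms.
long long toDecimal(const std::string& digits, int base) {
    long long value = 0;
    for (char c : digits) {
        unsigned char uc = static_cast<unsigned char>(c);
        int digit = std::isdigit(uc) ? uc - '0' : std::toupper(uc) - 'A' + 10;
        value = value * base + digit;  // shift one place left, then add
    }
    return value;
}

int main() {
    std::cout << toDecimal("10011010", 2) << "\n";   // 154
    std::cout << toDecimal("F94A23C5", 16) << "\n";  // 4182385605
    return 0;
}
```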

One last thing: you'll run into binary and hexadecimal (and decimal, obviously) when working with code at one point or another, so within code they earn a special notation. For binary, it is

<div style="text-align: center">

0bNumber

</div>

in decimal

<div style="text-align: center">

0dNumber

</div>

and in hexadecimal

<div style="text-align: center">

0xNumber

</div>

As a final example, let's represent the number 9,096 in all the systems we've learned, in this format!

<div style="text-align: center">

0b10001110001000 = 0d9096 = 0x2388

</div>
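Many languages accept these prefixes directly in source code. A minimal C++ sketch (note that standard C++ has `0b` and `0x` literals, but no `0d` prefix; a plain number is already decimal):

```cpp
#include <iostream>

int main() {
    int binary  = 0b10001110001000;  // binary literal (available since C++14)
    int decimal = 9096;              // plain literals are already decimal
    int hex     = 0x2388;            // hexadecimal literal
    // All three spellings name the same value, so this prints 1 (true).
    std::cout << (binary == decimal && decimal == hex) << "\n";
    return 0;
}
```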
188 changes: 188 additions & 0 deletions Chapters/Primitive_Data_Types.md
@@ -0,0 +1,188 @@
# Primitive Data Types


## Introduction
___
Here, we discuss the characteristics of each primitive type, including its memory size, how it is stored, and more.

----------------
Problem 1:

You are an electrical engineer working for Leibniz Microelectronics. You are tasked with making a system that will be used by computers to represent integers. For simplicity, you create a large scale model of 8 lightbulbs in a row (In other words, an array of lights). They are modifiable in any aspect (color, brightness, etc.), and you can represent the integer in whatever base you wish. What base would be the easiest to use?

Answer: Binary. If I choose 0 for off and 1 for on, it is easy for both you and a computer to read the number in binary. However, if we chose base 16, we would have to pick many colors or brightness levels to represent each digit, and it's hard even for a human to read off such a number.

----------------

As illustrated, using binary is the easiest because it uses the minimum possible number of light states (on or off) and is also the easiest to read. In other words, binary allows values to be cleanly discretized.

Binary has some special terminology for you to be aware of:

<br/>

Table 1 - Binary Sizes
Term|Definition
:---:|---
Bit|One Binary Digit
Byte|Eight Binary Digits

<br/>

Generally, the sizes of primitives vary depending on your system. However, they typically follow these proportions:

<br/>

Table 2 - Common Type Sizes
Type|Size (Bytes)
:---:|---
Boolean (`bool`) | 1
Character (`char`) | 2 (1 in C and C++)
Integer (`int`) | 4
Float (`float`) | 4
Double (`double`) | 8

<br/>
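You can check the sizes on your own machine with `sizeof`. A minimal sketch (the exact numbers depend on your compiler and platform, and note that in C++ `sizeof(char)` is always 1):

```cpp
#include <iostream>

int main() {
    std::cout << "bool:   " << sizeof(bool)   << " byte(s)\n";
    std::cout << "char:   " << sizeof(char)   << " byte(s)\n";
    std::cout << "int:    " << sizeof(int)    << " byte(s)\n";
    std::cout << "float:  " << sizeof(float)  << " byte(s)\n";
    std::cout << "double: " << sizeof(double) << " byte(s)\n";
    return 0;
}
```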

## Endianness
___
In Western cultures, we take it for granted that we read from left to right. However, other scripts, such as Arabic and Hebrew, are read and written from right to left. That raises the question: which way should computers read information (bytes)? Thus, we introduce the following terms:

<br/>
Table 3 - Endianness

Term|Definition
:---:|---
Most Significant Byte (MSB) | The byte with the greatest place value (the "big end" of the data)
Least Significant Byte (LSB) | The byte with the smallest place value (the "little end" of the data)
Little Endian | Bytes are stored starting from the little end, so the LSB comes first (at the lowest memory address) and the MSB comes last
Big Endian | Bytes are stored starting from the big end, so the MSB comes first (at the lowest memory address) and the LSB comes last

<br/>
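A short, hedged sketch of how you might detect your own machine's endianness in C++: store a known multi-byte value and inspect the byte at the lowest address (most common desktop CPUs, such as x86, will report little endian):

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    std::uint32_t value = 0x01020304;    // MSB is 0x01, LSB is 0x04
    std::uint8_t firstByte = 0;
    std::memcpy(&firstByte, &value, 1);  // copy the byte at the lowest address
    if (firstByte == 0x04) {
        std::cout << "little endian (LSB is stored first)\n";
    } else {
        std::cout << "big endian (MSB is stored first)\n";
    }
    return 0;
}
```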


## Integers
___
Integers are a bit more complicated than you might think. A positive integer is represented just as you'd expect, using the normal conversion discussed in **Numerical Representation**. If an integer is declared in code such that it only supports non-negative numbers, it is called *unsigned*.

But what about negative integers, or *signed* integers? The Most Significant Bit (MSBit) is used to indicate the sign of the number (1 for negative, 0 for positive). However, it is in the computer's interest to be able to add binary numbers together directly, and unfortunately, flipping the MSBit alone won't cut it.

----------------
Example 1:

Let's say we're using a 4-bit integer. It's really 3 bits of magnitude, since one bit is used for the sign. Suppose we want to subtract 3 from 4 in binary. There are two ways of doing this. We can simply subtract them

<div style="text-align: center">

$$0100_2-0011_2 = 0001_2 = 1_{10}$$


</div>

However, if we add 4 and -3 (naively writing -3 as a sign bit followed by the magnitude, 1011<sub>2</sub>), watch what happens:

<div style="text-align: center">

$$0100_2+1011_2 = 1111_2 = -7_{10}$$


</div>

You will notice the result is wrong: we got -7 instead of 1.

----------------

Therefore, we need to change the representation of negative numbers beyond just flipping the sign bit. *Two's Complement* is a system in which ordinary binary addition gives mathematically correct results. The simple rule to negate a number b<sub>2</sub> is:

1. Invert every bit of b<sub>2</sub> (bitwise NOT)
2. Add 1

In C-style bit-operator notation, this is:

<div style="text-align: center">

$$\sim b_2 + 1$$


</div>

This works regardless of whether b<sub>2</sub> is positive or negative.
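A small C++ sketch of the rule, using the bitwise NOT operator `~` to flip the bits and then adding 1 (this also ties back to Example 1, where 4 + (-3) should give 1):

```cpp
#include <cstdint>
#include <iostream>

int main() {
    std::int8_t four  = 4;
    std::int8_t three = 3;
    // Two's-complement negation: flip every bit, then add 1.
    std::int8_t minusThree = ~three + 1;
    std::cout << static_cast<int>(minusThree) << "\n";         // -3
    std::cout << static_cast<int>(four + minusThree) << "\n";  // 1, so 4 + (-3) adds correctly
    return 0;
}
```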

In C++, you can represent integers in many different ways. They can be *unsigned* (they are *signed* by default), and you can also make them different widths (8, 16, 32, or 64 bits). For a width N, the syntax is of the form:

<div style="text-align: center">

`intN_t` (*signed*)


`uintN_t` (*unsigned*)

</div>

For any size N, the range of numbers an integer can represent in decimal is:

<div style="text-align: center">

*Signed*: -2<sup>N - 1</sup> to 2<sup>N - 1</sup> - 1

*Unsigned*: 0 to 2<sup>N</sup> - 1

</div>
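A brief sketch that puts the two ideas together, printing the ranges of a few fixed-width types with `std::numeric_limits`:

```cpp
#include <cstdint>
#include <iostream>
#include <limits>

int main() {
    // Signed 8-bit: -2^7 to 2^7 - 1
    std::cout << static_cast<int>(std::numeric_limits<std::int8_t>::min()) << " to "
              << static_cast<int>(std::numeric_limits<std::int8_t>::max()) << "\n";   // -128 to 127
    // Unsigned 8-bit: 0 to 2^8 - 1
    std::cout << static_cast<int>(std::numeric_limits<std::uint8_t>::min()) << " to "
              << static_cast<int>(std::numeric_limits<std::uint8_t>::max()) << "\n";  // 0 to 255
    // Signed 32-bit: -2^31 to 2^31 - 1
    std::cout << std::numeric_limits<std::int32_t>::min() << " to "
              << std::numeric_limits<std::int32_t>::max() << "\n";  // -2147483648 to 2147483647
    return 0;
}
```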

## Floats and Doubles
___
Floats are actually quite different. First, they are always signed. Second, the way they are represented is very different. We need an equivalent of the integer case for fractional numbers, so take the decimal number

<div style="text-align: center">

15.72
$$1\times10^1+5\times10^0+7\times10^{-1}+2\times10^{-2}$$


</div>

As you can see, just as the "places" can increase in value, they can also decrease, and you can imagine the same happening in binary.

----------------
Problem 2:

Represent the fraction 1/3 in binary using an 8-bit float (one sign bit, one whole-number bit, and six fractional bits). Note: The sign bit should be first.

Answer: You can't, but you can get close. We can choose the first (sign) and second (whole-number) bits to be 0. The remaining bits are determined as follows (a code sketch of this greedy procedure appears after the problem):
- Bit 3 (halves place, 2<sup>-1</sup>): If we turn this on, our number is at least 1/2, so this must be off
- Bit 4 (quarters place, 2<sup>-2</sup>): This should be on, since it guarantees a number of at least 1/4 without exceeding 1/3
- Bit 5 (eighths place, 2<sup>-3</sup>): This should be off, since 1/4 + 1/8 is greater than 1/3
- Bit 6 (sixteenths place, 2<sup>-4</sup>): This should be on, since 1/4 + 1/16 is less than 1/3
- Bit 7 (thirty-seconds place, 2<sup>-5</sup>): This should be off, since 1/4 + 1/16 + 1/32 is more than 1/3
- Bit 8 (sixty-fourths place, 2<sup>-6</sup>): This should be on, since 1/4 + 1/16 + 1/64 is less than 1/3

This results in the binary number 0.010101<sub>2</sub>, which is 0.328125<sub>10</sub>: close to, but not exactly, 1/3.

----------------
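Here is the promised sketch of the greedy procedure from Problem 2: each fractional bit is kept only if adding its value keeps the running sum at or below 1/3 (the variable names are just for illustration):

```cpp
#include <iostream>

int main() {
    double target = 1.0 / 3.0;
    double sum = 0.0;
    std::cout << "0.";
    // Six fractional bits, worth 1/2, 1/4, 1/8, 1/16, 1/32, and 1/64.
    for (int i = 1; i <= 6; ++i) {
        double place = 1.0 / (1 << i);
        if (sum + place <= target) {  // keep the bit only if we stay at or below 1/3
            sum += place;
            std::cout << 1;
        } else {
            std::cout << 0;
        }
    }
    std::cout << "\n" << sum << "\n";  // prints 0.010101 and then 0.328125
    return 0;
}
```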

This is the weakness of a float/double: it can never accurately represent all possible non-integers. Any such representation will carry some degree of error, so the best we can do is minimize it. Luckily, the Institute of Electrical and Electronics Engineers (IEEE) found a way to do so, which is this:

- The first bit shall always be dedicated to the **sign** S<sub>2</sub>
- The next few bits shall be dedicated to an **exponent** E<sub>2</sub>
- The remaining bits shall be dedicated to the **mantissa** or **significand** M<sub>2</sub>. It is to be interpreted as part of the fractional number (1.M)<sub>2</sub>
- An offset O<sub>2</sub> is sometimes employed, which is typically the maximum possible number of a signed integer with the same bit size as the exponent.

This is arranged such that the resulting float/double number is:

<div style="text-align: center">

$$(-1)^{S_2}\times2^{E_2 - O_2}\times(1.M)_2$$

</div>

This guarantees very good accuracy for floats/doubles, but you'll sometimes see a very small error, which is why a computation you expect to produce exactly 3.0 might print as 2.9999999999.
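A tiny sketch that makes this rounding error visible (the exact digits printed can vary by platform, but on typical IEEE-754 hardware you would see something like the comments below):

```cpp
#include <iomanip>
#include <iostream>

int main() {
    double sum = 0.1 + 0.2;  // neither 0.1 nor 0.2 is exactly representable in binary
    std::cout << std::setprecision(17) << sum << "\n";          // typically 0.30000000000000004
    std::cout << (sum == 0.3 ? "equal" : "not equal") << "\n";  // typically "not equal"
    return 0;
}
```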


## Characters
___
These have a very simple representation, as they are essentially small unsigned integers. A mapping of char to integer values is shown in the following ASCII table:
![](Images/ASCII_Table.png)
<div style="text-align: center"> Figure 1 - ASCII Table</div>

## Booleans
___
Booleans are quite simple: 0 for false and 1 for true. You'll notice, though, that we use an entire byte for one as opposed to a single bit. This is because a byte is the smallest unit of memory a computer can address individually, so even though a Boolean only needs one bit, it still occupies at least a whole byte.
