Hash Map Lesson: Make use of terms more consistent
softy-dev committed Dec 1, 2024
1 parent 0124936 commit 1d8798c
Showing 1 changed file with 11 additions and 11 deletions.
22 changes: 11 additions & 11 deletions javascript/computer_science/hash_map_data_structure.md
@@ -86,7 +86,7 @@ You might be thinking, wouldn't it just be better to save the whole name as a ha

### Buckets

Buckets are storage that we need to store our elements. Simply, it's an array. For a specific key, we decide which bucket to use for storage through our hash function. The hash function returns a number that serves as the index of the array at which we store this specific key value pair. Let's say we wanted to store a person's full name as a key "Fred" with a value of "Smith":
Buckets are storage that we need to store our elements. We can consider each index of an array to have a bucket. For a specific key, we decide which bucket to use for storage through our hash function. The hash function returns a number that serves as the index of the array at which we store this specific key value pair. Let's say we wanted to store a person's full name as a key "Fred" with a value of "Smith":

1. Pass "Fred" into the hash function to get the hash code which is `385`.
1. Find the bucket at index `385`.
@@ -98,20 +98,20 @@ This is an oversimplified explanation; we'll discuss more internal mechanics lat

Now if we wanted to get a value using a key:

1. To retrieve the value, we hash the key and calculate its bucket number.
1. To retrieve the value, we hash the key and calculate the index of its bucket.
1. If the bucket is not empty, then we go to that bucket.
1. Now we compare if the node's key is the same key that was used for the retrieval.
1. If it is, then we can return the node's value. Otherwise, we return `null`.

Maybe you are wondering, why are we comparing the keys if we already found the index of that bucket? Remember, a hash code is just the location. Different keys might generate the same hash code. We need to make sure the key is the same by comparing both keys that are inside the bucket.
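The retrieval steps above can be sketched roughly as follows. This is only an illustration, not the lesson's implementation: the character-code-sum `hash` function, the `set` and `get` helpers, the plain `{ key, value }` node shape, and the use of `%` to fit the hash code into the array are all assumptions, and collisions are not handled.

```javascript
// Hypothetical hash function (assumption): sums the character codes of the key.
function hash(key) {
  let hashCode = 0;
  for (let i = 0; i < key.length; i++) {
    hashCode += key.charCodeAt(i);
  }
  return hashCode;
}

// Stores one node per bucket; a later key with the same index overwrites it.
function set(buckets, key, value) {
  const index = hash(key) % buckets.length;
  buckets[index] = { key, value };
}

function get(buckets, key) {
  const index = hash(key) % buckets.length;
  const node = buckets[index];
  // The bucket may be empty, or it may hold a different key that
  // happens to land on the same index, so the keys are compared.
  if (node !== undefined && node.key === key) {
    return node.value;
  }
  return null;
}

const buckets = new Array(16);
set(buckets, "Fred", "Smith");

const found = get(buckets, "Fred");   // "Smith"
const missing = get(buckets, "Derf"); // null
```

With this particular hash function, `"Fred"` and `"Derf"` both produce the hash code `385`, so they land in the same bucket. Only the key comparison stops `get` from returning `"Smith"` for `"Derf"`.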

This is it, making this will result in a hash table with `has`, `set` and `get`.
This is it, making this will result in a hash map with `has`, `set` and `get`.

#### Insertion order is not maintained

A hash map does not guarantee insertion order when you iterate over it. The translation of hash codes to indexes does not follow a linear progression from the first to the last index. Instead, it is more unpredictable, irrespective of the order in which items are inserted. That means if you are to retrieve the array of keys and values to iterate over them, then they will not be in order of when you inserted them.

Some libraries implement hash tables with insertion order in mind such as JavaScript's own `Map`. For the coming project however we will be implementing an unordered hash table.
Some libraries implement hash maps with insertion order in mind such as JavaScript's own `Map`. For the coming project however we will be implementing an unordered hash map.
Example: if we insert the values `Mao`, `Zach`, `Xari` in this order, we may get back `["Zach", "Mao", "Xari"]` when we call an iterator.

If iterating over a hash map frequently is your goal, then this data structure is not the right choice for the job, a simple array would be better.
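As a rough illustration of the difference, the sketch below inserts the same three names into JavaScript's built-in `Map` and into a bare buckets array. The character-code-sum `hash` function and the 16-bucket array are assumptions made for this example only.

```javascript
// Built-in Map: keys come back in insertion order.
const ordered = new Map();
ordered.set("Mao", 1);
ordered.set("Zach", 2);
ordered.set("Xari", 3);
const mapOrder = [...ordered.keys()]; // ["Mao", "Zach", "Xari"]

// Hypothetical hash function (assumption): sums the character codes of the key.
function hash(key) {
  let hashCode = 0;
  for (let i = 0; i < key.length; i++) {
    hashCode += key.charCodeAt(i);
  }
  return hashCode;
}

// Bare buckets array: each name is placed at the index derived from its hash.
const buckets = new Array(16);
for (const name of ["Mao", "Zach", "Xari"]) {
  buckets[hash(name) % buckets.length] = name;
}

// Walking the array from index 0 upward visits names in index order,
// not in insertion order. filter skips the empty slots.
const bucketOrder = buckets.filter((name) => name !== undefined);
// ["Xari", "Zach", "Mao"] (indexes 4, 6 and 13)
```

The order that comes out of the buckets array depends entirely on the hash function and the array size, so a different hash function would produce a different order.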
@@ -152,9 +152,9 @@ Up until now, our hash map is a one-dimensional data structure. What if each `No

You probably understand by this point why we must write a good hashing function which eliminates as many collisions as possible. Most likely you will not be writing your own hash functions, as most languages have it built in, but understanding how hash functions work is important.

### Growth of a hash table
### Growth of a hash map

Let's talk about the growth of our buckets. We don't have infinite memory, so we can't have infinite number of buckets. We need to start somewhere, but starting too big is also a waste of memory if we're only going to have a hash map with a single value in it. So to deal with this issue, we should start with a small array for our buckets. We'll use an array of size `16`.
Let's talk about our number of buckets. We don't have infinite memory, so we can't have an infinite amount of them. We need to start somewhere, but starting too big is also a waste of memory if we're only going to have a hash map with a single value in it. So to deal with this issue, we should start with a small array for our buckets. We'll use an array of size `16`.

<div class="lesson-note lesson-note--tip" markdown="1">

@@ -168,17 +168,17 @@ For example, if we are to find the bucket where the value `"Manon"` will land, t

As we continue to add nodes into our buckets, collisions get more and more likely. Eventually, however, there will be more nodes than there are buckets, which guarantees a collision (check the additional resources section for an explanation of this fact if you're curious).

Remember we don't want collisions. In a perfect world each bucket will either have 0 or 1 node only, so we grow our buckets to have more chance that our nodes will spread and not stack up in the same buckets. To grow our buckets, we create a new buckets list that is double the size of the old buckets list, then we copy all nodes over to the new buckets.
Remember we don't want collisions. In a perfect world, each bucket will either have 0 or 1 node only, so we grow our buckets array to have more chance that our nodes will spread and not stack up in the same buckets. To grow our array, we create a new one that is double its size and then copy all existing nodes over to the buckets of this new array, hashing their keys again.
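A minimal sketch of the growth step described above, under stated assumptions: the character-code-sum `hash` function, the `grow` helper, and the `{ key, value }` node shape are hypothetical, and each bucket holds at most one node (no collision handling).

```javascript
// Hypothetical hash function (assumption): sums the character codes of the key.
function hash(key) {
  let hashCode = 0;
  for (let i = 0; i < key.length; i++) {
    hashCode += key.charCodeAt(i);
  }
  return hashCode;
}

// Returns a new buckets array of double the size, with every existing
// node placed again according to the new array length.
function grow(oldBuckets) {
  const newBuckets = new Array(oldBuckets.length * 2);
  for (const node of oldBuckets) {
    if (node === undefined) continue; // skip empty buckets
    const index = hash(node.key) % newBuckets.length;
    newBuckets[index] = node;
  }
  return newBuckets;
}

const buckets = new Array(16);
for (const key of ["Mao", "Zach", "Xari"]) {
  buckets[hash(key) % buckets.length] = { key, value: key.length };
}

const grown = grow(buckets);
// "Mao" has hash code 285: index 13 with 16 buckets, index 29 with 32.
```

The nodes cannot be copied to the same indexes they had before, because the index depends on the array length. That is why the keys are hashed again during the copy.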

#### When do we know that it's time to grow our buckets size?
#### When do we know that it's time to grow our buckets array?

To deal with this, our hash map class needs to keep track of two new fields, the `capacity` and the `load factor`.

- The `capacity` is the total number of buckets we currently have.

- The `load factor` is a number that we assign our hash map to at the start. It's the factor that will determine when it is a good time to grow our buckets. Hash map implementations across various languages use a load factor between `0.75` and `1`.
- The `load factor` is a number that we assign our hash map to at the start. It's the factor that will determine when it is a good time to grow our buckets array. Hash map implementations across various languages use a load factor between `0.75` and `1`.

The product of these two numbers gives us a number, and we know it's time to grow when there are more entries in the hash map than that number. For example, if there are `16` buckets, and the load factor is `0.8`, then we need to grow the buckets when there are more than `16 * 0.8 = 12.8` entries - which happens on the 13th entry. Setting it too low will consume too much memory by having too many empty buckets, while setting it too high will allow our buckets to have many collisions before we grow them.
The product of these two numbers gives us a number, and we know it's time to grow when there are more entries in the hash map than that number. For example, if there are `16` buckets, and the load factor is `0.8`, then we need to grow the buckets array when there are more than `16 * 0.8 = 12.8` entries - which happens on the 13th entry. Setting it too low will consume too much memory by having too many empty buckets, while setting it too high will allow our buckets to have many collisions before we resize the array.
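The threshold check in this example can be written out as a short sketch. The `shouldGrow` helper and its parameter names are hypothetical and chosen only for illustration.

```javascript
// Returns true once the number of entries exceeds capacity * loadFactor.
function shouldGrow(entryCount, capacity, loadFactor) {
  return entryCount > capacity * loadFactor;
}

const capacity = 16;
const loadFactor = 0.8;

const atTwelve = shouldGrow(12, capacity, loadFactor);   // false: 12 is not more than 12.8
const atThirteen = shouldGrow(13, capacity, loadFactor); // true: 13 is more than 12.8
```

A hash map would typically run a check like this as part of adding an entry, and double the buckets array when it returns `true`.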

### Computation complexity

@@ -208,7 +208,7 @@ The following questions are an opportunity to reflect on key topics in this less
- [What does it mean to hash?](#what-is-a-hash-code)
- [What are buckets?](#buckets)
- [What is a collision?](#collisions)
- [When is it a good time to grow our table?](#when-do-we-know-that-its-time-to-grow-our-buckets-size)
- [When is it a good time to grow our buckets array?](#when-do-we-know-that-its-time-to-grow-our-buckets-array)

### Additional resources
