Hash_table_analysis

Introduction

In this project I have studied how a hash-table based on the chain method works.
The main point of study is certain hash functions and programme performance depending on hash functions, table sizes and various loads.

Input data: L.N. Tolstoy's text 'War and Peace' (eng. version)
Stress Testing: Search for each word in the list 5 times (element of Hash Table)
Hash table capacity : 6900
List capacity: 200

To compile:

Main version with optimization:

nasm -f elf64 crc32_optimization.s -o crc32_optimization.o

g++ crc32_optimization.o main.cpp List/List_functions.cpp phash_table_func.cpp phash_func.cpp -mavx2 -O2 -o name_file

Version without optimization:

g++ main.cpp ../List/List_functions.cpp phash_table_func.cpp phash_func.cpp -o name_file

Also you can compile with flag -O2 for optimization.

Task_1

Learning hash functions

I had 6 hash functions to consider:

always return 1
value of the first ASCII character
string length
sum of the ASCII characters in word
ROL
CRC32

First of all, we need to determine the number of collisions depending on our hash functions,
then take the speed of the programme and the reasons for their changes.For clarity, here are the charts for each hash function,
which show the distribution of list lengths depending on the hash table element:

All compilation took place without any flags.

1. hash always returns 1

Opinion:

The running time of the programme with this hash function is 6,287 seconds.
This is really a lot, most likely due to the sheer number of collisions.
The large number of collisions caused our list to expand forever (realloc).
But the biggest burden is checking the words in the list,
since the length of the list is huge, so the check takes a very long time.

2. hash returns value of the first ASCII character

Opinion:

The running time of a programme with this hash function is 0,673 seconds. The reasons for the load and the speed of the programme are the same as in the first case. However, there are fewer collisions, so the speed is higher.

3. hash returns length of string

Opinion:

Obviously, word lengths are limited, so there are plenty of collisions. The running time of the programme: 1,725 seconds

4. hash returns sum of ASCII characters

Opinion:

This hash function gives the least amount of collisions, so it works faster than the previous ones. Running time: 0,099 seconds.
Interesting deviations can be seen in the graph. I have checked it and indeed these values correspond to reality.
Our test contains words consisting of more than 32 characters, and the ASCII value of a character is really that big (more than 255)

5. ROL hash function

This feature is more interesting than the ones before it.
To understand how this hash function works, here is a screenshot of its implementation.

size_t hash_rol(char* str)
{
    assert (str != nullptr);

    int hash = *str;
    str++;

    while (*str != '\0')
    {
        hash = (hash << 1) ^ *str;

        str++;
    }

    return hash;
}

Opinion:

This function gives a very small number of collisions. It works very quickly,
running time: 0,63 seconds
There are so few matches that the list does not have to expand and the search is very fast,
this hash function can be called "good" in our situation.

6. CRC32 hash function

Finally, CRC32 is the best hash function on out list.

Implementation:

size_t hash_CRC32(char* str)
{
    assert(str != nullptr);

    unsigned int crc = 0xFFFFFFFFUL;

    while (*str != '\0')
    {
        crc = CRCTable[(crc ^ *(str)) & 0xFF] ^ (crc >> 8);
        str++;
    }

    return crc;
}

Opinion:

This hash function is interesting, it gives the least amount of collisions, which will result in a big performance gain on more data.
Especially when the stress testing is to find items in a list. However, in our case the program performs slightly less than the ROL hash function.
This is due to the timing of the hash function itself.
Running time: 0,64 seconds.
ROL hash function works faster than CRC32.
But I will work with CRC32 as in our case lesser number of matches is much more useful than the hash algorithm itself.

Conclusion for the Task 1:

HASH FUNCTION	TIME
return 1	6,287 seconds
first ASCII	0,673 seconds
length of str	1,725 seconds
sum of ASCII	0,099 seconds
ROL	0,063 seconds
CRC32	0,064 seconds

As you can see, the program with ROL is slightly faster, but the variance of ROL is greater than that of CRC32.
If we increase the amount of input data,
the performance of the program with CRC32 will be much better,
because the cost of finding an element in the list is much higher than the hash calculation.
And CRC32 has obviously less elements in one list
It is possible to optimise the hash function and make it faster than ROL .Next we will do all the work with the CRC32 function

Obviously the running time of the hash table is directly related to the hash function and the size of the table and list

Task_2

Optimization

Run the original programme via the vallgrind tool - callgrind

Based on this result (the self column), we can see that the function with the highest load is "phash_table_find_el".
And the hash function itself accordingly

For clarity, the results of compiler optimisation with the `-O2` flag are as follows

The original `phash_table_find_el` version

struct Plist* phash_table_find_el(Phash_table* hash_table, char* word, int* item_num_of_list)
{
    size_t offset = hash_table->hash_func(word) % hash_table->capacity;

    int flag = 0;

    for (int i = 1; (hash_table->hash_list[offset].size >= i) && (flag == 0) ; i++)
    {
        if (strncmp(word, hash_table->hash_list[offset].data[i].value, strlen(word) + 1) == 0)
        {
            flag = i;
        }
    }

    *item_num_of_list = flag;

    return hash_table->hash_list + offset;
}

I used optimization using AVX2 to compare strings

struct Plist* phash_table_find_el(Phash_table* hash_table, char* word, int* item_num_of_list)
{
    size_t offset = hash_table->hash_func(word) % hash_table->capacity;

    __m128i string = _mm_loadu_si128 ((__m128i *)word);
    __m128i listStr {};

    int flag = 0;

    for (int i = 1; (hash_table->hash_list[offset].size >= i) && (flag == 0) ; i++)
    {
        listStr = _mm_loadu_si128 ((__m128i *)hash_table->hash_list[offset].data[i].value);

        int cmp = _mm_cmpestri (string, strlen (word) + 1, listStr, hash_table->hash_list[offset].data[i].len_str + 1, _SIDD_CMP_EQUAL_EACH | _SIDD_CMP_EQUAL_ORDERED | _SIDD_UBYTE_OPS); // | _SIDD_NEGATIVE_POLARITY);

        if (cmp == 0) 
            flag = i;
    }

    *item_num_of_list = flag;

    return hash_table->hash_list + offset;
}

This optimization gave a relatively good increase in the performance of the program, below is a table with time values and callgrind results.

I use flag `-O2` because AVX2 doesn't work correctly without this flag.

As you can see, our function has accelerated by 3 million units self. which is an excellent result,
but if we talk about more down to earth values, here is a table of the running times. You can find all of the proufs in the ./analytics/screenshots/ folder

WITHOUT OPTIMIZATION	0,064 seconds
WITH -O2	0,043 seconds
WITH -O2 + AVX2	0,038 seconds

Total increase of 13,16%

Now let's rewrite our hash function into assembler

Original version `hash_CRC32`

size_t hash_CRC32(char* str)
{
    assert(str != nullptr);

    unsigned int crc = 0xFFFFFFFFUL;

    while (*str != '\0')
    {
        crc = CRCTable[(crc ^ *(str)) & 0xFF] ^ (crc >> 8);
        str++;
    }

    return crc;
}

After optimization:

section .text

global crc32_asm


crc32_asm:
    mov eax, 0FFFFFFFFh

.loop:
    cmp byte [rdi], 00h
    je .exit

    movzx r10d, byte [rdi]

    crc32 eax, r10b

    inc rdi
    jmp .loop

.exit:
    not eax
    ret

We got an increase of 10.2 million units .self

WITHOUT OPTIMIZATION	0,064 seconds
WITH -O2	0,043 seconds
WITH -O2 + AVX2	0,038 seconds
WITH -O2 + AVX2 + ASM	0,036 seconds

Total increase of 5.55% compared to last optimization

As for the other functions, we may notice a large value in strlen, fill_words_text, plist_constructor,
we can omit them because these functions are only used once when filling the table.
There is no repetition of them, so there will be no improvement in optimization when the table is filled permanently
This optimisation is therefore sufficient to improve performance. The results of the stress test run times are shown below.
Finally, I reduce the number of function calls and various cosmetic changes

This is a study paper and there were 3 optimisations to try

I decided to rewrite strchr in asm

section .text

global strchr_asm


strchr_asm:

mov rax, rsi
    
.loop:
    cmp byte [rdi], al
    je .exit

    cmp byte [rdi], 00h
    je .exitnull

    inc rdi

    jmp .loop

.exit:
    mov rax, rdi 
    ret 

.exitnull:
    mov rax, 0
    ret

WITHOUT OPTIMIZATION	0,064 seconds
WITH -O2	0,043 seconds
WITH -O2 + AVX2	0,038 seconds
WITH -O2 + AVX2 + ASM	0,036 seconds
ALL OPTIMIZATION	0,035 seconds

Total increase of 2,8%

This gain is very small, so this optimisation does not make further sense.
But for training purposes I wanted to show that there is no need to write unnecessary optimizations without a real gain.

Total:

HASH FUNCTION	TIME
return 1	6,287 seconds
first ASCII	0,673 seconds
length of str	1,725 seconds
sum of ASCII	0,099 seconds
ROL	0,063 seconds
CRC32	0,064 seconds

CRC32 is the most accurate function with the lowest correlation.

WITHOUT OPTIMIZATION	0,064 seconds
WITH -O2	0,043 seconds
WITH -O2 + AVX2	0,038 seconds
WITH -O2 + AVX2 + ASM	0,036 seconds

Name		Name	Last commit message	Last commit date
Latest commit History 128 Commits
List		List
NO_OPTIMIZATION		NO_OPTIMIZATION
analytics		analytics
hash_statistics		hash_statistics
README.md		README.md
War_and_Peace.txt		War_and_Peace.txt
callgrind_manual_WIN10.txt		callgrind_manual_WIN10.txt
crc32_optimization.s		crc32_optimization.s
main.cpp		main.cpp
phash_func.cpp		phash_func.cpp
phash_table.h		phash_table.h
phash_table_func.cpp		phash_table_func.cpp

shaazmik/Hash_table_analysis

Folders and files

Latest commit

History

Repository files navigation

Hash_table_analysis

Table of contents

Introduction

To compile:

Task_1

Learning hash functions

All compilation took place without any flags.

1. hash always returns 1

Opinion:

2. hash returns value of the first ASCII character

Opinion:

3. hash returns length of string

Opinion:

4. hash returns sum of ASCII characters

Opinion:

5. ROL hash function

Opinion:

6. CRC32 hash function

Opinion:

Conclusion for the Task 1:

Obviously the running time of the hash table is directly related to the hash function and the size of the table and list

Task_2

Optimization

Run the original programme via the vallgrind tool - callgrind

For clarity, the results of compiler optimisation with the -O2 flag are as follows

The original phash_table_find_el version

I used optimization using AVX2 to compare strings

This optimization gave a relatively good increase in the performance of the program, below is a table with time values and callgrind results.

I use flag -O2 because AVX2 doesn't work correctly without this flag.

Total increase of 13,16%

Now let's rewrite our hash function into assembler

Original version hash_CRC32

After optimization:

We got an increase of 10.2 million units .self

Total increase of 5.55% compared to last optimization

This is a study paper and there were 3 optimisations to try

I decided to rewrite strchr in asm

Total increase of 2,8%

Total:

CRC32 is the most accurate function with the lowest correlation.

All prufs can be checked in ./analytics/screenshots/

All tables (excel) can be checked in ./hash_statistics/

From this work we can conclude:

the speed of the hash table is directly dependent on the size of the lists and the hash table and hash functions.

If the hash table and list are sized correctly, there is not much need to optimise the program.

In our case Hash_table capacity = 6900, List capacity = 20.

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

For clarity, the results of compiler optimisation with the `-O2` flag are as follows

The original `phash_table_find_el` version

I use flag `-O2` because AVX2 doesn't work correctly without this flag.

Original version `hash_CRC32`

All prufs can be checked in `./analytics/screenshots/`

All tables (excel) can be checked in `./hash_statistics/`

Packages