Hash Table
Last updated
Last updated
A hash table is a type of data structure that stores pairs of key-value.
The key is sent to a hash function that performs arithmetic operations on it.
The result (commonly called the hash value or hashing) is the index of the key-value pair in the hash table
Key: unique integer that is used for indexing the values.
Value: data that are associated with keys.
A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash).
The hash value is representative of the original string of characters, but is normally smaller than the original.
Hashing is done for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string.
Hashing is also used in encryption.
This term is also known as a hashing algorithm or message digest function.
A collision occurs when two keys get mapped to the same index.
There are several ways of handling collisions.
If a pair is hashed to a slot which is already occupied, it searches linearly for the next free slot in the table.
The hash table will be an array of linked lists.
All keys mapping to the same index will be stored as linked list nodes at that index.
The size of the hash table can be increased in order to spread the hash entries further apart.
A threshold value signifies the percentage of the hash-table that needs to be occupied before resizing.
A hash table with a threshold of 0.6 would resize when 60% of the space is occupied.
As a convention, the size of the hash-table is doubled. This can be memory intensive.
Hash tables have incredibly quick lookups, but remember that we need a reliable collision solution; normally, we don't need to worry about this because our language in the computer beneath the hood takes care of it for us.
It allows us to respond quickly, and depending on the type of hash table, such as maps in JavaScript, we can have flexible keys instead of an array with 0 1 2 3 only numbered indexes.
The disadvantage of hash tables is that they are unordered.
It's difficult to go through everything in an orderly fashion.
Furthermore, it has slow key iteration.
That is, if I want to retrieve all the keys from a hash table, I have to navigate the entire memory space.
Fast lookups^
Fast inserts
Flexible Key
Good collision resolution needed
Unordered
Slow key iteration
We've noticed a few differences between hash tables and arrays.
When it comes to looking for items, hash tables are usually faster.
In arrays, you must loop over all items before finding what you are looking for, while with a hash table, you go directly to the item's location.
Inserting an item in Hash tables is also faster because you simply hash the key and insert it.
In arrays shifting the items is important first before inserting another one.
Important Note: When choosing data structures for specific tasks, you must be extremely cautious, especially if they have the potential to harm the performance of your product.
Having an O(n) lookup complexity for functionality that must be real-time and relies on a large amount of data could make your product worthless.
Even if you feel that the correct decisions have been taken, it is always vital to verify that this is accurate and that your users have a positive experience with your product.
Operation | Average | Worst |
---|---|---|
Search
O(1)
O(n)
Insertion
O(1)
O(n)
Deletion
O(1)
O(n)
Space
O(n)
O(n)