- a heap is a data structure that gives you the smallest (or the largest) element in constant time, and supports adding an element or removing the smallest (or the largest) element in logarithmic time.
- a heap is like a binary tree with these properties:
  - it must have all of its nodes in a specific order, and
  - its shape must be complete (every level of the tree is completely filled, except possibly the last one, which is always filled from left to right).
  - in a max heap, every node must be greater than or equal to its children, so the root holds the largest value.
- a min heap is the opposite: every node is smaller than or equal to its children. duplicate values are allowed.
- heaps can be represented with linked lists, queues (arrays), or binary trees.
- in the case of an array heap with one-based indexing (root at index 1), for a node at index `i` in a heap of `n` elements:
  - the parent node index is `i // 2`
  - the left child index is `2 * i`
  - the right child index is `2 * i + 1`
  - the node is a leaf when `i > n // 2`
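as a minimal sketch of the index arithmetic above, assuming a one-based layout where index 0 is left unused (the helper names and the sample list are hypothetical, not part of any library):

```python
def parent(i: int) -> int:
    """index of the parent of node i (one-based)."""
    return i // 2


def left_child(i: int) -> int:
    """index of the left child of node i (one-based)."""
    return 2 * i


def right_child(i: int) -> int:
    """index of the right child of node i (one-based)."""
    return 2 * i + 1


def is_leaf(i: int, n: int) -> bool:
    """a node is a leaf when it has no left child inside a heap of n elements."""
    return i > n // 2


# a small max heap; index 0 is a placeholder so the root sits at index 1
heap = [None, 10, 8, 9, 4, 5]
n = len(heap) - 1

# every non-root node should be <= its parent in a max heap
heap_property_holds = all(heap[parent(i)] >= heap[i] for i in range(2, n + 1))
```

with a zero-based list the formulas shift by one: the children of `i` are `2 * i + 1` and `2 * i + 2`, and the parent is `(i - 1) // 2`.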
- common applications of heaps are sorting (heap sort), getting the top-k elements, and finding the kth element.
- in python you can use `heapq.heapify(array)` with `heapq.heappush(array, value)` and `heapq.heappop(array)`.
class Heap:
    """a max heap stored in a zero-based list."""

    def __init__(self):
        self.heap = []

    def heapify(self, n, i):
        # sift the node at index i down within the first n elements
        largest = i
        left_children = 2 * i + 1
        right_children = 2 * i + 2

        if left_children < n and self.heap[largest] < self.heap[left_children]:
            largest = left_children
        if right_children < n and self.heap[largest] < self.heap[right_children]:
            largest = right_children

        if largest != i:
            self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
            self.heapify(n, largest)

    def insert(self, num):
        self.heap.append(num)
        # measure the size after appending so the new element is included
        size = len(self.heap)
        for i in range((size // 2) - 1, -1, -1):
            self.heapify(size, i)

    def delete_node(self, num):
        # nothing to do if the value is not in the heap
        if num not in self.heap:
            return
        size = len(self.heap)
        i = self.heap.index(num)
        # move the node to the end, drop it, then rebuild the heap
        self.heap[i], self.heap[size - 1] = self.heap[size - 1], self.heap[i]
        self.heap.pop()
        for i in range((len(self.heap) // 2) - 1, -1, -1):
            self.heapify(len(self.heap), i)
- it's cheaper to heapify an array of data (`O(N)`) than to create an empty heap and insert each element one by one (`O(N log(N))`).
  - heapify means treating the array as a complete binary tree and then comparing each node with its children (and swapping when necessary).
  - in a min heap, each parent node is simply exchanged with its smallest child when needed; only the `N/2` parent nodes have to be visited, and the leaves are left out.
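the two ways of building a heap can be compared with a small sketch. it relies on python's `heapq` (a zero-based min heap); the function names `build_by_heapify`, `build_by_pushes`, and `is_min_heap` and the sample data are made up for illustration:

```python
import heapq


def build_by_heapify(values):
    """copy the values and heapify them in one bottom-up pass: O(N)."""
    heap = list(values)
    heapq.heapify(heap)
    return heap


def build_by_pushes(values):
    """start from an empty heap and push one value at a time: O(N log N)."""
    heap = []
    for value in values:
        heapq.heappush(heap, value)
    return heap


def is_min_heap(heap):
    """check that every node is >= its parent (zero-based indexing)."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))


data = [9, 4, 7, 1, 8, 2, 6, 3, 5]
heap_a = build_by_heapify(data)
heap_b = build_by_pushes(data)
```

both results are valid min heaps holding the same values, although the internal order of the two lists may differ.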
- python's built-in heap differs from the standard textbook implementation of a heap in two ways:
  - firstly, it uses zero-based indexing, so it stores the root node at index zero (a one-based implementation keeps the root at index 1 and can use index zero for something else, such as the size of the heap).
  - secondly, the built-in module has traditionally not offered a direct way to create a max heap. the usual workaround is to negate each value when inserting it into the heap, and to negate it again when removing it from the heap.
import heapq

min_heap = [3, 1, 2]
heapq.heapify(min_heap)

# a max heap is emulated by storing negated values
max_heap = [-x for x in min_heap]
heapq.heapify(max_heap)

heapq.heappush(min_heap, 5)
heapq.heappush(max_heap, -5)    # pushes 5 onto the max heap

min_elem = min_heap[0]
max_elem = -1 * max_heap[0]     # negate again to recover the value

heapq.heappop(min_heap)
heapq.heappop(max_heap)

size_min_heap = len(min_heap)
size_max_heap = len(max_heap)
- a priority queue is an abstract data type with the following properties:
  - every item has a priority (usually an integer).
  - an item with a high priority is dequeued before an item with a low priority.
  - two items with an equal priority are dequeued based on their order in the queue.
- priority queues can be implemented with a stack, queue, or linked list data structure, although a heap is the usual choice.
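the three properties above can be sketched on top of `heapq`. this is a heap-backed illustration under stated assumptions, not a standard library class: the name `PriorityQueue` and its methods are hypothetical, priorities are negated so that higher values come out first, and an insertion counter breaks ties in arrival order.

```python
import heapq
import itertools


class PriorityQueue:
    """a heap-backed priority queue: highest priority first, ties in arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # increasing tie-breaker

    def enqueue(self, item, priority):
        # heapq is a min heap, so the priority is negated to pop the highest first;
        # the counter makes equal priorities come out in insertion order
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def dequeue(self):
        if not self._heap:
            raise IndexError("dequeue from an empty priority queue")
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


pq = PriorityQueue()
pq.enqueue("low", 1)
pq.enqueue("high, first in", 5)
pq.enqueue("high, second in", 5)

order = [pq.dequeue() for _ in range(len(pq))]
# order is ["high, first in", "high, second in", "low"]
```

storing the counter in the tuple also means the items themselves never get compared, so they do not need to be orderable.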
- a min heap is a complete binary tree where each node is smaller than or equal to its children (the root is the min element).
- `insert`:
  - insert the element at the bottom, in the next free position (the leftmost empty spot on the last level).
  - then compare this node to each parent, exchanging them until the tree's properties are correct.
- `extract_min`:
  - first, remove/return the top and then replace the tree's top with its latest element (the bottom-most rightmost).
  - then bubble down, swapping it with the smaller of its children until the min-heap is properly restored.
  - there is no need for order between right and left, so this operation only follows one path from the root to a leaf and takes `O(log N)` runtime.
- the code below is an example of an ad-hoc min heap class in python (it uses one-based indexing and a fixed capacity).
class MinHeap:
    def __init__(self, size):
        self.heapsize = size                # maximum number of elements
        self.minheap = [0] * (size + 1)     # index 0 is unused (one-based)
        self.realsize = 0                   # current number of elements

    def add(self, element):
        if self.realsize + 1 > self.heapsize:
            print("Too many elements!")
            return False

        self.realsize += 1
        self.minheap[self.realsize] = element

        # bubble the new element up while it is smaller than its parent
        index = self.realsize
        parent = index // 2
        while index > 1 and self.minheap[index] < self.minheap[parent]:
            self.minheap[parent], self.minheap[index] = self.minheap[index], self.minheap[parent]
            index = parent
            parent = index // 2

    def peek(self):
        return self.minheap[1]

    def pop(self):
        if self.realsize < 1:
            print("Heap is empty.")
            return False

        remove_element = self.minheap[1]
        self.minheap[1] = self.minheap[self.realsize]
        self.realsize -= 1

        # bubble the new root down, always swapping with the smaller child
        index = 1
        while index <= self.realsize // 2:
            left_children = index * 2
            right_children = (index * 2) + 1
            smallest = left_children
            # the right child only exists if it is within the current size
            if right_children <= self.realsize and \
                    self.minheap[right_children] < self.minheap[left_children]:
                smallest = right_children
            if self.minheap[index] > self.minheap[smallest]:
                self.minheap[smallest], self.minheap[index] = self.minheap[index], self.minheap[smallest]
                index = smallest
            else:
                break
        return remove_element

    def size(self):
        return self.realsize

    def __str__(self):
        return str(self.minheap[1 : self.realsize + 1])
- a max heap is a complete binary tree where each node is larger than or equal to its children (the root is the max element).
- `insert`:
  - insert the element at the bottom, in the leftmost free position.
  - then compare the node to each parent, exchanging them until the tree's properties are correct.
- `extract_max`:
  - remove/return the top and then replace the tree's top with its bottom rightmost element.
  - then bubble down, swapping the node with its larger child until the max-heap property is restored.
- the code below is an example of a max heap class built in python:
class MaxHeap:
    def __init__(self, heapsize):
        self.heapsize = heapsize                # maximum number of elements
        self.maxheap = [0] * (heapsize + 1)     # index 0 is unused (one-based)
        self.realsize = 0                       # current number of elements

    def add(self, element):
        self.realsize += 1
        if self.realsize > self.heapsize:
            print("Too many elements!")
            self.realsize -= 1
            return False

        self.maxheap[self.realsize] = element

        # bubble the new element up while it is larger than its parent
        index = self.realsize
        parent = index // 2
        while index > 1 and self.maxheap[index] > self.maxheap[parent]:
            self.maxheap[parent], self.maxheap[index] = self.maxheap[index], self.maxheap[parent]
            index = parent
            parent = index // 2

    def peek(self):
        return self.maxheap[1]

    def pop(self):
        if self.realsize < 1:
            print("Heap is empty.")
            return False

        remove_element = self.maxheap[1]
        self.maxheap[1] = self.maxheap[self.realsize]
        self.realsize -= 1

        # bubble the new root down, always swapping with the larger child
        index = 1
        while index <= self.realsize // 2:
            left_children = index * 2
            right_children = (index * 2) + 1
            largest = left_children
            # the right child only exists if it is within the current size
            if right_children <= self.realsize and \
                    self.maxheap[right_children] > self.maxheap[left_children]:
                largest = right_children
            if self.maxheap[index] < self.maxheap[largest]:
                self.maxheap[largest], self.maxheap[index] = self.maxheap[index], self.maxheap[largest]
                index = largest
            else:
                break
        return remove_element

    def size(self):
        return self.realsize

    def __str__(self):
        return str(self.maxheap[1 : self.realsize + 1])
- the core concept of heap sort is to construct a heap from the input and then repeatedly remove the min/max element to sort the array.
- in-place heap sort combines two steps:
  - building a heap from the unsorted array through a "bottom-up heapification" process, and
  - using the heap to sort the input array.
- heapsort traditionally uses a max-heap to sort the array, although a min-heap also works.
- this is not a stable sort.
def heap_sort(array) -> list:
    def max_heapify(heap_size, index):
        # sift array[index] down within the first heap_size elements
        left, right = 2 * index + 1, 2 * index + 2
        largest = index
        if left < heap_size and array[left] > array[largest]:
            largest = left
        if right < heap_size and array[right] > array[largest]:
            largest = right
        if largest != index:
            array[index], array[largest] = array[largest], array[index]
            max_heapify(heap_size, largest)

    # build a max heap bottom-up, starting from the last parent
    for i in range(len(array) // 2 - 1, -1, -1):
        max_heapify(len(array), i)

    # move the current max to the end, then restore the heap on the rest
    for i in range(len(array) - 1, 0, -1):
        array[i], array[0] = array[0], array[i]
        max_heapify(i, 0)

    return array
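heap sort with a min-heap can be sketched with `heapq`. this is only an illustration: the function name `heap_sort_with_min_heap` is made up, and unlike an in-place max-heap sort it builds a new list instead of sorting the input in place.

```python
import heapq


def heap_sort_with_min_heap(values):
    """return a new sorted list by heapifying a copy and popping the min repeatedly."""
    heap = list(values)      # copy, so the caller's list is left untouched
    heapq.heapify(heap)      # O(N) bottom-up construction
    # each pop returns the current smallest element in O(log N)
    return [heapq.heappop(heap) for _ in range(len(heap))]


unsorted = [5, 3, 8, 1, 9, 2, 8]
result = heap_sort_with_min_heap(unsorted)
```

the trade-off is `O(N)` extra space for the copy in exchange for much shorter code.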
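import heapq

def compare_two_tops(array) -> int:
    # negate the values so that heapq's min heap behaves like a max heap
    for i in range(len(array)):
        array[i] *= -1
    heapq.heapify(array)

    # repeatedly take the two largest values and, if they differ,
    # push back their difference
    while len(array) > 1:
        val1 = heapq.heappop(array)
        val2 = heapq.heappop(array)
        if val1 != val2:
            heapq.heappush(array, val1 - val2)

    # return the last remaining value, or 0 if nothing is left
    if array:
        return -heapq.heappop(array)
    return 0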
- given an array of `intervals[i] = [start_i, end_i]`, return the minimum number of groups needed so that no two intervals in the same group overlap:
import heapq

def non_overlapping_intervals(intervals):
    if not intervals:
        return 0

    # min heap with the end of the last interval in each group
    result = []
    intervals.sort(key=lambda x: x[0])
    heapq.heappush(result, intervals[0][-1])

    for interval in intervals[1:]:
        # reuse the group that ends earliest if it is already free
        if result[0] <= interval[0]:
            heapq.heappop(result)
        heapq.heappush(result, interval[1])

    return len(result)
import heapq
from collections import Counter

def top_k_frequent_values(nums, k):
    if k == len(nums):
        return nums
    # hashmap element: frequency
    counter = Counter(nums)
    return heapq.nlargest(k, counter.keys(), key=counter.get)
import heapq

class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        self.heap = nums
        heapq.heapify(self.heap)
        # keep only the k largest values; the root is then the kth largest
        while len(self.heap) > k:
            heapq.heappop(self.heap)

    def add(self, val: int) -> int:
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)
        return self.heap[0]
- given an `n x n` matrix where each of the rows and columns is sorted in ascending order, return the `kth` smallest element in the matrix.
import heapq

def kth_smallest(matrix, k) -> int:
    # seed the heap with the first element of each of the first k rows
    min_heap = []
    for row in range(min(k, len(matrix))):
        min_heap.append((matrix[row][0], row, 0))
    heapq.heapify(min_heap)

    # pop k times; after each pop, push the next element from the same row
    while k:
        element, row, col = heapq.heappop(min_heap)
        if col < len(matrix) - 1:
            heapq.heappush(min_heap, (matrix[row][col + 1], row, col + 1))
        k -= 1

    return element