Class outline:
- Data structures literals review
- Bubble Sort
- Topological Sort
- Graph search: breadth-first search (BFS) and depth-first search (DFS)
>>> numbers = [1, 2, 3, 1]
>>> numbers[0]
1
>>> color_codes = {
... 'red': '#ff0000',
... 'orange': '#ffa500',
... 'yellow': '#ffff00'
... }
>>> color_codes['red']
'#ff0000'
>>> fruits = {'apple', 'banana'}
>>> fruits[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing
>>> 'apple' in fruits
True
- Warm Up: "Is Sorted"
>>> def is_sorted(l):
... max_comparison = len(l) - 1
...
... for index in range(max_comparison):
... if l[index] > l[index + 1]:
... return False
... return True
>>> is_sorted(numbers)
False
>>> is_sorted([])
True
1. Bubble sort
This is an inefficient but easy to implement sorting algorithm.
Starting at the beginning of the list, we compare the first 2 elements. If they are out of order, swap them. When we get to the end of the list, if something was swapped, the list might not be sorted yet, so we repeat. Once we make a pass through the list without anything being swapped, we are done.
def bubble_sort(l):
max_index = len(l)
while True:
swapped = False
for index in range(max_index):
if l[index] < l[index + 1]:
l[index], l[index + 1] = l[index + 1], l[index]
swapped = True
if not swapped:
break
Our implementation of Bubble Sort mutates the input list - by the end of the algorithm the list is sorted. The unsorted list is no longer accessible!
We can copy the list before we sort it, but in practice the sorted
builtin,
which we should really be using, returns a copy of the list.
Finds an ordering that satisfies dependencies. These are relatively common in real-world applications!
Let's pretend we're a laundry startup. Here's a dependency graph in English:
- Before you return the clothes, wash and dry them.
- Before you wash the clothes, add the detergent.
- Before you dry the clothes, wash them.
and in Python:
graph = {
'return': {'wash', 'dry'},
'wash': {'detergent'},
'detergent': set(), # Note that {} would be an empty dict, not a set
'dry': {'wash'}
}
The algorithm we'll implement, Kahn's algorithm, is efficient, yet easy to understand.
- Find anything without dependencies
- Until there are no more nodes without dependencies:
- Add a node from this set to the solution
- Remove any dependencies from the node from the graph
In the case of laundry, we start by noticing 'detergent'
can be the first in our solution as it has no dependencies.
Then, we remove 'detergent'
from any dependencies because it can now be considered complete. Our updated graph looks like this:
solution = ['detergent']
graph = {
'return': {'wash', 'dry'},
'wash': set(),
'detergent': set(),
'dry': {'wash'}
}
At this point, 'wash'
has no dependencies and it is part of the solution.
solution = ['detergent', 'wash']
graph = {
'return': {'dry'},
'wash': set(),
'detergent': set(),
'dry': set()
}
If we run out of nodes that have no dependencies, we raise an error as there is no way to satisfy the dependencies.
>>> def toposort(g):
... l = []
... s = {n for n in g if not g[n]
... while s:
... n = s.pop()
... l.append(n)
... for m in {m for m in g if n in g[m]}:
... g[m].remove(n)
... if not g[m]:
... s.add(m)
... if any(g.values())
... raise ValueError('No solution')
... return l
Again, this mutates the graph, so make a copy before calling this implementation.
There's no builtin for this algorithm, but you can find toposort on the Python Package Index (pypi).
Let's use the US state borders as an example graph this time. We can get the bordering states from https://state.1keydata.com/bordering-states-list.php.
>>> borders = """<paste>"""
>>> def build_graph():
... graph = {}
... lines = borders.split('\n')
... for line in lines:
... state, edges = line.split('\t')
... graph[state] = {edge for edge in edges.split(', ') if '(water border)' not in edge}
... return graph
>>> g = build_graph()
>>> def dfs(start, g):
... queue = [start]
... seen = []
... while queue:
... node = queue.pop()
...
... if node not in seen:
... seen.append(node)
... queue += g[node] - set(seen)
... return seen
>>>
>>> def bfs(start, g):
... queue = [start]
... seen = []
... while queue:
... node = queue.pop(0)
...
... if node not in seen:
... seen.append(node)
... queue += g[node] - set(seen)
... return seen
>>> bfs('California', g))
Also see: http://eddmann.com/posts/depth-first-search-and-breadth-first-search-in-python/
Bonus: networkx
Provides a Graph class and just about every graph algorithm imaginable.
You can even make networkx draw your graph:
>>> import matplotlib.pyplot as plt
>>> import networkx as nx
>>> G = nx.Graph(g)
>>> nx.draw(G, with_labels=True)
>>> plt.show()