Graph Traversal Algorithms β How to Invent BFS, DFS & Dijkstra from Scratch
Instead of memorizing BFS, DFS, and Dijkstra, learn the thought process that leads you to invent them. Follow the reasoning from naive exploration to each algorithm's key insight, building the mindset to tackle novel problems.
This article is not about memorizing algorithm steps from a textbook.
If you didn't know these algorithms existed, how would you think your way to inventing them when faced with the problem? That's what we'll explore β the reasoning process that leads to BFS, DFS, and Dijkstra.
Algorithms didn't fall from the sky. Someone faced a problem, analyzed its structure, and reasoned their way to a solution. If you understand that reasoning, you won't need to memorize anything β and you'll be equipped to invent new algorithms for novel problems.
Start With the Problem
Every algorithm begins with a problem definition. Before thinking about solutions, the most important step is getting crystal clear on what you're trying to solve.
The Fundamental Question: "Visit Every Place"
Imagine you're dropped into an unfamiliar city. You have no map, but you can see which roads lead away from your current location. How do you visit every location without missing any?
This is the root question behind graph traversal.
Let's abstract the city. Intersections become vertices, roads become edges. The collection of vertices and edges is a graph.
Graph G = (V, E)
V: set of vertices {A, B, C, D}
E: set of edges {(A,B), (A,C), (B,C), (B,D), (C,D)}A surprising number of real-world problems can be modeled as graphs:
- Road networks: intersections are vertices, roads are edges
- Social networks: users are vertices, friendships are edges
- The Web: pages are vertices, hyperlinks are edges
- Dependencies: packages are vertices, dependencies are edges
When edges have direction, we get a directed graph (one-way streets). When edges have costs, we get a weighted graph (distances, travel times). But the core structure is the same.
Graph Representation: Build the "Who Are My Neighbors?" Machinery
Before thinking about algorithms, we need to decide how to represent a graph in code. Here too, we reason backwards from what operations we need.
The most frequent operation in traversal is "list all neighbors of a given vertex." We choose a representation that makes this fast.
Graph Representation β Adjacency List vs Matrix
Adjacency List
Space: O(V + E)
Edge lookup: O(degree)
β Best for sparse graphs
Adjacency Matrix
| A | B | C | D | |
|---|---|---|---|---|
| A | 0 | 1 | 1 | 0 |
| B | 1 | 0 | 1 | 1 |
| C | 1 | 1 | 0 | 1 |
| D | 0 | 1 | 1 | 0 |
Space: O(VΒ²)
Edge lookup: O(1)
β Best for dense graphs
Adjacency List β "Who are this person's friends?"
Store a list of neighbors for each vertex. Natural, and ideal for sparse graphs (few edges).
graph = {
'A': ['B', 'C'],
'B': ['A', 'C', 'D'],
'C': ['A', 'B', 'D'],
'D': ['B', 'C']
}Space: . Neighbor enumeration: . This is the best choice for most traversal algorithms.
Adjacency Matrix β "Are these two people friends?"
Build a table β set 1 where an edge exists. Edge lookup is , but costs space.
# A B C D
matrix = [
[0, 1, 1, 0], # A
[1, 0, 1, 1], # B
[1, 1, 0, 1], # C
[0, 1, 1, 0], # D
]Thought Experiment 1: The Most Naive Traversal
Now we have a graph. How do we visit every vertex?
The First Wall: "I Keep Revisiting the Same Place"
The obvious approach is "if you can go to a neighbor, go." But unlike trees, graphs can have cycles. AβBβCβAβBββ¦ β you loop forever.
First insight: "To avoid revisiting vertices, I need a visited record."
visited = set() # Without this, we loop foreverThis visited set is the most fundamental building block shared by every graph traversal algorithm. It seems obvious, but it's a direct consequence of analyzing the problem structure β graphs can have cycles.
The Next Question: "When I Have Multiple Unvisited Neighbors, Which Do I Visit First?"
You're at vertex A with unvisited neighbors B and C. Visit B first, or C first? This is where two contrasting strategies diverge.
The BFS Insight β "Before Going Far, Check Everything Nearby"
Reframe the Problem: "Where's the Nearest Coffee Shop?"
If you're in an unfamiliar city looking for the nearest convenience store, what would you do?
You wouldn't charge down a single street forever. You'd check all blocks immediately around you first, then expand to the next ring, then the next β spreading outward in concentric circles.
This is the fundamental insight behind BFS (Breadth-First Search):
"If I check nearby places first, then whenever I find something, I'm guaranteed it's the closest one."
The Design Core: Use a "Waiting Line"
How do we implement "process nearby vertices first"? This is the key design question.
Think through this scenario. You're at starting vertex A:
- Add A's neighbors B and C to a "to-do list"
- Process B. Add B's neighbors D, E to the "to-do list"
- Now, instead of processing D or E next, process C first
Why C before D and E? Because C was added to the "to-do list" before D and E. C is near the start (distance 1), while D and E are farther (distance 2). If we want to process nearby vertices first, we need first-in-first-out β a queue (FIFO).
This realization is the essence of BFS. "Process nearby first" β "earlier discovery = closer distance" β "first in, first out" β "use a queue."
Why a Queue Guarantees Shortest Paths
The FIFO property ensures that all vertices at distance are processed before any vertex at distance β this level-by-level ordering is guaranteed.
This is clear by induction:
- The source (distance 0) is processed first β
- When a distance- vertex is processed, its neighbors (distance ) are enqueued
- FIFO ensures all distance- vertices are processed before distance- vertices
- If vertex is first discovered at distance , a shorter path would mean was already discovered at an earlier level β contradiction
Formally, the invariant (where discovered ) always holds.
The Algorithm
Translating this reasoning into code:
from collections import deque
def bfs(graph, start):
visited = set() # Insight 1: don't revisit vertices
queue = deque([start]) # Insight 2: queue ensures "nearest first"
visited.add(start)
order = []
while queue:
vertex = queue.popleft() # First in, first out (FIFO)
order.append(vertex)
for neighbor in graph[vertex]:
if neighbor not in visited:
visited.add(neighbor) # Mark visited when enqueuing
queue.append(neighbor)
return orderImportant detail: visited.add(neighbor) happens at enqueue time, not dequeue time. Why? If we wait until dequeue, and both B and C are neighbors of D, then D gets enqueued twice β once when processing B and once when processing C. "Mark as discovered the moment you find it" prevents double-enqueuing.
Step Through It
Watch BFS in the visualizer below. Pay attention to:
- The queue always contains vertices in distance order
- All distance-1 vertices are processed before any distance-2 vertex
- This is the "ripple" pattern
Graph Traversal Visualizer β BFS vs DFS
Compare how BFS and DFS traverse the same graph. Press play or step manually.
Starting BFS (Breadth-First Search) from vertex A. Using a queue to explore vertices level by level β like ripples spreading outward from where a stone hits water.
BFS Complexity
- Time: β each vertex enters the queue once, each edge is examined once
- Space: β queue and visited set
The DFS Insight β "Go All the Way Down One Path First"
A Different Problem: "Is There an Exit?"
Now let's change the question. Not "what's nearest?" but "Does this maze have an exit? Can I explore every room?"
When stuck in a maze, humans instinctively:
- At a fork, pick one path and walk as far as possible
- Hit a dead end? Backtrack to the last fork
- Try a different path, again going as far as possible
- Repeat until every path is exhausted
We don't care about "shortest." We want to exhaustively explore everything β that's DFS (Depth-First Search).
The Design Core: Recursion as a Natural Structure
The elegance of DFS is that this strategy maps perfectly onto recursion.
When at vertex , "explore all of 's neighbors" means: for each neighbor , "explore all of 's neighbors." This is the same problem, recursively defined:
explore(v):
mark v as visited
for each neighbor u of v:
if u is unvisited, call explore(u)This recursive structure naturally produces "go deep, backtrack at dead ends." When explore(u) returns, control flows back to explore(v) β that's the backtracking.
Why a "Stack"? β The Essence of Recursion
If we implement DFS without recursion, we see why the data structure is a stack.
In BFS, we put "vertices to process later" into a queue. In DFS, we also store "vertices to process later," but we want to process the most recently added one first β the deepest one. "Last in, first out" β that's a stack (LIFO).
The difference between BFS and DFS, in code, is a single line:
BFS: queue.popleft() β first added (nearest) processed first
DFS: stack.pop() β last added (deepest) processed firstSwapping one data structure fundamentally changes the traversal behavior. That's the beauty of algorithm design.
The Algorithm (Recursive)
def dfs(graph, start, visited=None):
if visited is None:
visited = set()
visited.add(start)
print(start) # Process vertex
for neighbor in graph[start]:
if neighbor not in visited:
dfs(graph, neighbor, visited) # Recurse = go deeper
# Returning here = backtracking
return visitedThe Algorithm (Iterative with Stack)
When recursion depth is a concern:
def dfs_iterative(graph, start):
visited = set()
stack = [start]
order = []
while stack:
vertex = stack.pop() # Last in, first out (LIFO)
if vertex in visited:
continue
visited.add(vertex)
order.append(vertex)
for neighbor in reversed(graph[vertex]):
if neighbor not in visited:
stack.append(neighbor)
return orderStep Through It
Switch the visualizer to DFS mode above. Compare with BFS and notice:
- The call stack grows deeper and deeper in one direction
- Backtracking occurs at dead ends
- The visit order is completely different from BFS
- There's no "nearest first" guarantee
Discovery and Finish Times β DFS's Bonus Information
DFS's recursive structure yields a powerful byproduct that BFS lacks: discovery time and finish time for each vertex.
time = 0
def dfs_with_timestamps(graph, vertex, visited, discovery, finish):
global time
time += 1
discovery[vertex] = time # Discovery time
visited.add(vertex)
for neighbor in graph[vertex]:
if neighbor not in visited:
dfs_with_timestamps(graph, neighbor, visited, discovery, finish)
time += 1
finish[vertex] = time # Finish timeWhy is this useful? The reverse of finish times gives topological sort (dependency ordering). If there's an edge , then finishes before (we complete 's exploration before returning to ). Reversing puts before β the correct order when " depends on ."
Build systems use exactly this: if "libA depends on libB" and "libB depends on libC," the build order C β B β A is naturally derived from DFS finish times.
BFS and DFS Have the Same Complexity
- Time:
- Space:
This makes sense β both process each vertex once and each edge once. The difference is only in the processing order. And that order difference changes which problems each can solve.
Comparing BFS and DFS Side by Side
BFS vs DFS Comparison
| Aspect | BFS | DFS |
|---|---|---|
| Data Structure | Queue (FIFO) | Stack / Recursion (LIFO) |
| Traversal Order | Level by level (breadth first) | As deep as possible first |
| Shortest Path | β Guaranteed (unweighted) | β Not guaranteed |
| Memory | O(V) β can be large for wide graphs | O(V) β proportional to depth |
| Time Complexity | O(V + E) | O(V + E) |
| Typical Uses | Shortest path, level traversal, network analysis | Cycle detection, topological sort, maze generation |
| Intuition | Ripples on water π | Exploring a maze π |
Choosing Between Them β Read the Problem Structure
BFS when "distance" matters:
- Shortest path (unweighted graphs)
- All nodes within hops
- Bipartiteness testing (can we 2-color the graph?)
DFS when "structure" matters:
- Cycle detection (did we encounter a visited vertex during exploration?)
- Topological sort (reverse finish times)
- Connected/strongly connected components
- Backtracking (puzzles, constraint satisfaction)
At its core: "spread outward layer by layer" β BFS. "Follow one path to the end" β DFS.
Thought Experiment 2: Hitting BFS's Limitation
We now know BFS solves shortest paths in unweighted graphs.
But what about this problem:
"What's the fastest route from Tokyo to Osaka?"
Each road has a different travel time (weight). The expressway takes 4 hours, the mountain route takes 5, the bullet train takes 2.5β¦
If we apply BFS directly, what happens? BFS minimizes the number of edges (hops), not the total weight. A 2-edge route can be shorter than a 1-edge route (e.g., TokyoβNagoyaβOsaka = 4h vs. TokyoβOsaka direct = 5h).
BFS's limitation: When edge costs aren't uniform, it can't guarantee shortest distance.
This leads to the natural question: "Can we extend BFS to handle weights?"
The Dijkstra Insight β Adapting BFS for Weights
Step 1: Re-analyze What BFS Was Guaranteeing
Let's revisit why BFS works:
- Finalize distance-0 vertices β just the source
- Finalize distance-1 vertices β all source neighbors
- Finalize distance-2 vertices β neighbors of distance-1 vertices
- β¦
The queue's FIFO property guaranteed "finalize smallest distances first."
In weighted graphs, "being a neighbor" doesn't mean "being close." A direct edge (weight 5) can be longer than a 2-edge path (weight 2+1=3).
Step 2: "What If I Sort the Queue by Distance?"
A natural idea emerges: sort the queue by distance instead of insertion order. Replace the FIFO queue with a priority queue.
- BFS: queue (FIFO) β "discovered earlier" = fewer edges
- Dijkstra: priority queue β "smallest total distance" = least total weight
Replacing BFS's queue with a priority queue β this is Dijkstra's core insight.
Step 3: "The Minimum-Distance Vertex Is Safe to Finalize"
But why is this correct? Can we really finalize the vertex at the front of the priority queue?
This is Dijkstra's deepest insight:
The vertex at the front of the priority queue cannot be reached by any shorter path. Why? All unprocessed vertices have distance β₯ that of , and since edge weights are non-negative, routing through them cannot reduce the distance.
This logic only holds when edge weights are non-negative. If negative edges exist, a detour through a "farther" vertex could reduce the total distance, breaking the argument. (For negative edges, use the Bellman-Ford algorithm.)
Step 4: The "Relaxation" Operation
When we finalize vertex and examine a neighbor , we ask: "Is going through shorter than what we currently know for ? If so, update." This is called relaxation.
Why "relaxation"? We start with distance estimates of and gradually relax (reduce) them as we discover better paths. We're tightening an upper bound.
Putting the Reasoning Into Code
import heapq
def dijkstra(graph, start):
# graph: { node: [(neighbor, weight), ...] }
dist = {node: float('inf') for node in graph} # Start at infinity
prev = {node: None for node in graph}
dist[start] = 0 # Source is 0
pq = [(0, start)] # Priority queue (replaces BFS's queue)
while pq:
d, u = heapq.heappop(pq) # Extract minimum distance (replaces popleft)
if d > dist[u]: # Skip stale entries
continue
for v, weight in graph[u]:
new_dist = dist[u] + weight
if new_dist < dist[v]: # Relaxation: better path found
dist[v] = new_dist
prev[v] = u
heapq.heappush(pq, (new_dist, v))
return dist, prevCompare this with the BFS code. The structure is nearly identical. What changed:
- Queue β priority queue (extract by distance, not insertion order)
visitedβdisttable (track distances, not just visited/unvisited)- Relaxation check added (
new_dist < dist[v])
Step Through It
In the visualizer below, pay close attention to:
- Moments when a previously set distance gets improved through a different path (that's relaxation)
- The minimum-distance vertex is always the one selected and finalized
- The
prevcolumn ultimately forms the shortest path tree
Dijkstra's Algorithm Visualizer
Step through Dijkstra's algorithm on a weighted directed graph.
Start Dijkstra's algorithm from vertex A. Initialize distance to A as 0, all others as β. Insert A(dist=0) into the priority queue.
Dijkstra's Complexity
Depends on the data structure:
- Binary heap: β most common
- Fibonacci heap: β theoretically fastest but complex to implement
- Array (naive loop): β actually efficient for dense graphs
Tracing the Evolution of the Three Algorithms
Let's recap the reasoning journey:
Starting point: "I want to visit every vertex in a graph without missing any"
Insight 1 β visited: Cycles cause infinite loops, so we need a visited record
The fork β "In what order?":
- "Nearby places first" β FIFO queue β BFS
- "One path all the way to the end" β LIFO stack / recursion β DFS
Evolution β "What if edges have weights?":
- Replace BFS's queue with a priority queue β Dijkstra
- Key insight: "the minimum-distance vertex is safe to finalize" + relaxation
Each algorithm was born from the limitations of the previous one. Analyzing problem structure and identifying "what's missing" leads to the next algorithm's insight.
Applications: How the Insights Transfer
Graph Traversal Applications
GPS navigation, game AI pathfinding
BFS / Dijkstra / A*Deadlock detection, dependency cycle checking
DFSBuild systems, compilation order
DFSSocial network clusters, network partitioning
BFS / DFSNetwork design, minimum-cost connectivity
BFS / DFS + Kruskal/PrimMatching problems, scheduling
BFSTopological Sort β A Byproduct of DFS
When a build system has "libA depends on libB" and "libB depends on libC," determining build order is topological sort.
Reasoning from DFS: "The reverse of DFS finish times gives the correct order." If there's an edge , then finishes before in DFS. Reversing puts first β the correct dependency order.
def topological_sort(graph):
visited = set()
result = []
def dfs(node):
visited.add(node)
for neighbor in graph[node]:
if neighbor not in visited:
dfs(neighbor)
result.append(node) # Add on completion
for node in graph:
if node not in visited:
dfs(node)
return result[::-1] # Reverse = topological orderBipartiteness Testing β Exploiting BFS's Layer Structure
"Can we 2-color a graph so no adjacent vertices share a color?" Use BFS's level-by-level processing β alternate colors between even and odd levels:
def is_bipartite(graph):
color = {}
for start in graph:
if start in color:
continue
queue = deque([start])
color[start] = 0
while queue:
node = queue.popleft()
for neighbor in graph[node]:
if neighbor not in color:
color[neighbor] = 1 - color[node]
queue.append(neighbor)
elif color[neighbor] == color[node]:
return False # Same-color neighbors = not bipartite
return TrueComplexity Summary
Complexity Comparison
| Algorithm | Time | Space | Note |
|---|---|---|---|
| BFS | O(V + E) | O(V) | Processes each vertex and edge once |
| DFS | O(V + E) | O(V) | Recursion depth up to V |
| Dijkstra (binary heap) | O((V + E) log V) | O(V) | Heap operations cost log V |
| Dijkstra (Fibonacci heap) | O(V log V + E) | O(V) | Theoretically fastest but complex |
| Bellman-Ford | O(VE) | O(V) | Handles negative edges; V-1 relaxation rounds |
Conclusion β The Ability to Reinvent Algorithms
Here's the thinking process we traced throughout this article:
- Define the problem: What exactly are we trying to solve?
- Try the naive approach: Start with the simplest possible method and analyze what goes wrong
- Identify the bottleneck: What specific aspect of the naive approach fails? (Cycles β visited. Order β data structure choice)
- Solve with data structures: Queue vs. stack vs. priority queue β the right data structure implements the right strategy
- Verify correctness: Why does this work? Under what conditions does it fail? (Dijkstra and negative edges)
BFS, DFS, and Dijkstra are not magic that fell from the sky. They're logical consequences of reasoning: "process nearby first β queue," "go deep first β stack," "account for weights β priority queue."
Master this way of thinking, and when you encounter a problem not in any textbook, you'll be able to follow the same process to "invent" your own algorithm.