Graph Traversal Algorithms — How to Invent BFS, DFS & Dijkstra from Scratch

This article is not about memorizing algorithm steps from a textbook.

If you didn't know these algorithms existed, how would you think your way to inventing them when faced with the problem? That's what we'll explore — the reasoning process that leads to BFS, DFS, and Dijkstra.

Algorithms didn't fall from the sky. Someone faced a problem, analyzed its structure, and reasoned their way to a solution. If you understand that reasoning, you won't need to memorize anything — and you'll be equipped to invent new algorithms for novel problems.

Start With the Problem

Every algorithm begins with a problem definition. Before thinking about solutions, the most important step is getting crystal clear on what you're trying to solve.

The Fundamental Question: "Visit Every Place"

Imagine you're dropped into an unfamiliar city. You have no map, but you can see which roads lead away from your current location. How do you visit every location without missing any?

This is the root question behind graph traversal.

Let's abstract the city. Intersections become vertices, roads become edges. The collection of vertices and edges is a graph.

Graph G = (V, E)
  V: set of vertices {A, B, C, D}
  E: set of edges    {(A,B), (A,C), (B,C), (B,D), (C,D)}

A surprising number of real-world problems can be modeled as graphs:

Road networks: intersections are vertices, roads are edges
Social networks: users are vertices, friendships are edges
The Web: pages are vertices, hyperlinks are edges
Dependencies: packages are vertices, dependencies are edges

When edges have direction, we get a directed graph (one-way streets). When edges have costs, we get a weighted graph (distances, travel times). But the core structure is the same.

Graph Representation: Build the "Who Are My Neighbors?" Machinery

Before thinking about algorithms, we need to decide how to represent a graph in code. Here too, we reason backwards from what operations we need.

The most frequent operation in traversal is "list all neighbors of a given vertex." We choose a representation that makes this fast.

Graph Representation — Adjacency List vs Matrix

Adjacency List

A→

B→

ACD

C→

ABD

D→

Space: O(V + E)

Edge lookup: O(degree)

✅ Best for sparse graphs

Adjacency Matrix

	A	B	C	D
A	0	1	1	0
B	1	0	1	1
C	1	1	0	1
D	0	1	1	0

Space: O(V²)

Edge lookup: O(1)

✅ Best for dense graphs

Adjacency List — "Who are this person's friends?"

Store a list of neighbors for each vertex. Natural, and ideal for sparse graphs (few edges).

graph = {
    'A': ['B', 'C'],
    'B': ['A', 'C', 'D'],
    'C': ['A', 'B', 'D'],
    'D': ['B', 'C']
}

Space: $O(V + E)$ . Neighbor enumeration: $O(\text{degree})$ . This is the best choice for most traversal algorithms.

Adjacency Matrix — "Are these two people friends?"

Build a $V \times V$ table — set 1 where an edge exists. Edge lookup is $O(1)$ , but costs $O(V^2)$ space.

#     A  B  C  D
matrix = [
    [0, 1, 1, 0],  # A
    [1, 0, 1, 1],  # B
    [1, 1, 0, 1],  # C
    [0, 1, 1, 0],  # D
]

Thought Experiment 1: The Most Naive Traversal

Now we have a graph. How do we visit every vertex?

The First Wall: "I Keep Revisiting the Same Place"

The obvious approach is "if you can go to a neighbor, go." But unlike trees, graphs can have cycles. A→B→C→A→B→… — you loop forever.

First insight: "To avoid revisiting vertices, I need a visited record."

visited = set()  # Without this, we loop forever

This visited set is the most fundamental building block shared by every graph traversal algorithm. It seems obvious, but it's a direct consequence of analyzing the problem structure — graphs can have cycles.

The Next Question: "When I Have Multiple Unvisited Neighbors, Which Do I Visit First?"

You're at vertex A with unvisited neighbors B and C. Visit B first, or C first? This is where two contrasting strategies diverge.

The BFS Insight — "Before Going Far, Check Everything Nearby"

Reframe the Problem: "Where's the Nearest Coffee Shop?"

If you're in an unfamiliar city looking for the nearest convenience store, what would you do?

You wouldn't charge down a single street forever. You'd check all blocks immediately around you first, then expand to the next ring, then the next — spreading outward in concentric circles.

This is the fundamental insight behind BFS (Breadth-First Search):

"If I check nearby places first, then whenever I find something, I'm guaranteed it's the closest one."

The Design Core: Use a "Waiting Line"

How do we implement "process nearby vertices first"? This is the key design question.

Think through this scenario. You're at starting vertex A:

Add A's neighbors B and C to a "to-do list"
Process B. Add B's neighbors D, E to the "to-do list"
Now, instead of processing D or E next, process C first

Why C before D and E? Because C was added to the "to-do list" before D and E. C is near the start (distance 1), while D and E are farther (distance 2). If we want to process nearby vertices first, we need first-in-first-out — a queue (FIFO).

This realization is the essence of BFS. "Process nearby first" → "earlier discovery = closer distance" → "first in, first out" → "use a queue."

Why a Queue Guarantees Shortest Paths

The FIFO property ensures that all vertices at distance $d$ are processed before any vertex at distance $d+1$ — this level-by-level ordering is guaranteed.

This is clear by induction:

The source (distance 0) is processed first ✓
When a distance- $d$ vertex is processed, its neighbors (distance $d+1$ ) are enqueued
FIFO ensures all distance- $d$ vertices are processed before distance- $d+1$ vertices
If vertex $v$ is first discovered at distance $d$ , a shorter path would mean $v$ was already discovered at an earlier level — contradiction

Formally, the invariant $d(v) \leq d(u) + 1$ (where $u$ discovered $v$ ) always holds.

The Algorithm

Translating this reasoning into code:

from collections import deque
 
def bfs(graph, start):
    visited = set()        # Insight 1: don't revisit vertices
    queue = deque([start]) # Insight 2: queue ensures "nearest first"
    visited.add(start)
    order = []
 
    while queue:
        vertex = queue.popleft()  # First in, first out (FIFO)
        order.append(vertex)
 
        for neighbor in graph[vertex]:
            if neighbor not in visited:
                visited.add(neighbor)   # Mark visited when enqueuing
                queue.append(neighbor)
 
    return order

Important detail: visited.add(neighbor) happens at enqueue time, not dequeue time. Why? If we wait until dequeue, and both B and C are neighbors of D, then D gets enqueued twice — once when processing B and once when processing C. "Mark as discovered the moment you find it" prevents double-enqueuing.

Step Through It

Watch BFS in the visualizer below. Pay attention to:

The queue always contains vertices in distance order
All distance-1 vertices are processed before any distance-2 vertex
This is the "ripple" pattern

Graph Traversal Visualizer — BFS vs DFS

Compare how BFS and DFS traverse the same graph. Press play or step manually.

UnvisitedIn QueueCurrentVisited

Queue (FIFO)

(empty)

Visit Order

(none yet)

Explanation

Starting BFS (Breadth-First Search) from vertex A. Using a queue to explore vertices level by level — like ripples spreading outward from where a stone hits water.

Step 0/131 / 14

BFS Complexity

Time: $O(V + E)$ — each vertex enters the queue once, each edge is examined once
Space: $O(V)$ — queue and visited set

The DFS Insight — "Go All the Way Down One Path First"

A Different Problem: "Is There an Exit?"

Now let's change the question. Not "what's nearest?" but "Does this maze have an exit? Can I explore every room?"

When stuck in a maze, humans instinctively:

At a fork, pick one path and walk as far as possible
Hit a dead end? Backtrack to the last fork
Try a different path, again going as far as possible
Repeat until every path is exhausted

We don't care about "shortest." We want to exhaustively explore everything — that's DFS (Depth-First Search).

The Design Core: Recursion as a Natural Structure

The elegance of DFS is that this strategy maps perfectly onto recursion.

When at vertex $v$ , "explore all of $v$ 's neighbors" means: for each neighbor $u$ , "explore all of $u$ 's neighbors." This is the same problem, recursively defined:

explore(v):
  mark v as visited
  for each neighbor u of v:
    if u is unvisited, call explore(u)

This recursive structure naturally produces "go deep, backtrack at dead ends." When explore(u) returns, control flows back to explore(v) — that's the backtracking.

Why a "Stack"? — The Essence of Recursion

If we implement DFS without recursion, we see why the data structure is a stack.

In BFS, we put "vertices to process later" into a queue. In DFS, we also store "vertices to process later," but we want to process the most recently added one first — the deepest one. "Last in, first out" — that's a stack (LIFO).

The difference between BFS and DFS, in code, is a single line:

BFS: queue.popleft()  → first added (nearest) processed first
DFS: stack.pop()      → last added (deepest) processed first

Swapping one data structure fundamentally changes the traversal behavior. That's the beauty of algorithm design.

The Algorithm (Recursive)

def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
 
    visited.add(start)
    print(start)  # Process vertex
 
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)  # Recurse = go deeper
 
    # Returning here = backtracking
    return visited

The Algorithm (Iterative with Stack)

When recursion depth is a concern:

def dfs_iterative(graph, start):
    visited = set()
    stack = [start]
    order = []
 
    while stack:
        vertex = stack.pop()  # Last in, first out (LIFO)
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
 
        for neighbor in reversed(graph[vertex]):
            if neighbor not in visited:
                stack.append(neighbor)
 
    return order

Step Through It

Switch the visualizer to DFS mode above. Compare with BFS and notice:

The call stack grows deeper and deeper in one direction
Backtracking occurs at dead ends
The visit order is completely different from BFS
There's no "nearest first" guarantee

Discovery and Finish Times — DFS's Bonus Information

DFS's recursive structure yields a powerful byproduct that BFS lacks: discovery time and finish time for each vertex.

time = 0
 
def dfs_with_timestamps(graph, vertex, visited, discovery, finish):
    global time
    time += 1
    discovery[vertex] = time  # Discovery time
    visited.add(vertex)
 
    for neighbor in graph[vertex]:
        if neighbor not in visited:
            dfs_with_timestamps(graph, neighbor, visited, discovery, finish)
 
    time += 1
    finish[vertex] = time  # Finish time

Why is this useful? The reverse of finish times gives topological sort (dependency ordering). If there's an edge $u \to v$ , then $v$ finishes before $u$ (we complete $v$ 's exploration before returning to $u$ ). Reversing puts $u$ before $v$ — the correct order when " $v$ depends on $u$ ."

Build systems use exactly this: if "libA depends on libB" and "libB depends on libC," the build order C → B → A is naturally derived from DFS finish times.

BFS and DFS Have the Same Complexity

Time: $O(V + E)$
Space: $O(V)$

This makes sense — both process each vertex once and each edge once. The difference is only in the processing order. And that order difference changes which problems each can solve.

Comparing BFS and DFS Side by Side

BFS vs DFS Comparison

Aspect	BFS	DFS
Data Structure	Queue (FIFO)	Stack / Recursion (LIFO)
Traversal Order	Level by level (breadth first)	As deep as possible first
Shortest Path	✅ Guaranteed (unweighted)	❌ Not guaranteed
Memory	O(V) — can be large for wide graphs	O(V) — proportional to depth
Time Complexity	O(V + E)	O(V + E)
Typical Uses	Shortest path, level traversal, network analysis	Cycle detection, topological sort, maze generation
Intuition	Ripples on water 🌊	Exploring a maze 🏃

Choosing Between Them — Read the Problem Structure

BFS when "distance" matters:

Shortest path (unweighted graphs)
All nodes within $k$ hops
Bipartiteness testing (can we 2-color the graph?)

DFS when "structure" matters:

Cycle detection (did we encounter a visited vertex during exploration?)
Topological sort (reverse finish times)
Connected/strongly connected components
Backtracking (puzzles, constraint satisfaction)

At its core: "spread outward layer by layer" → BFS. "Follow one path to the end" → DFS.

Thought Experiment 2: Hitting BFS's Limitation

We now know BFS solves shortest paths in unweighted graphs.

But what about this problem:

"What's the fastest route from Tokyo to Osaka?"

Each road has a different travel time (weight). The expressway takes 4 hours, the mountain route takes 5, the bullet train takes 2.5…

If we apply BFS directly, what happens? BFS minimizes the number of edges (hops), not the total weight. A 2-edge route can be shorter than a 1-edge route (e.g., Tokyo→Nagoya→Osaka = 4h vs. Tokyo→Osaka direct = 5h).

BFS's limitation: When edge costs aren't uniform, it can't guarantee shortest distance.

This leads to the natural question: "Can we extend BFS to handle weights?"

The Dijkstra Insight — Adapting BFS for Weights

Step 1: Re-analyze What BFS Was Guaranteeing

Let's revisit why BFS works:

Finalize distance-0 vertices → just the source
Finalize distance-1 vertices → all source neighbors
Finalize distance-2 vertices → neighbors of distance-1 vertices
…

The queue's FIFO property guaranteed "finalize smallest distances first."

In weighted graphs, "being a neighbor" doesn't mean "being close." A direct edge (weight 5) can be longer than a 2-edge path (weight 2+1=3).

Step 2: "What If I Sort the Queue by Distance?"

A natural idea emerges: sort the queue by distance instead of insertion order. Replace the FIFO queue with a priority queue.

BFS: queue (FIFO) → "discovered earlier" = fewer edges
Dijkstra: priority queue → "smallest total distance" = least total weight

Replacing BFS's queue with a priority queue — this is Dijkstra's core insight.

Step 3: "The Minimum-Distance Vertex Is Safe to Finalize"

But why is this correct? Can we really finalize the vertex $u$ at the front of the priority queue?

This is Dijkstra's deepest insight:

The vertex $u$ at the front of the priority queue cannot be reached by any shorter path. Why? All unprocessed vertices have distance ≥ that of $u$ , and since edge weights are non-negative, routing through them cannot reduce the distance.

This logic only holds when edge weights are non-negative. If negative edges exist, a detour through a "farther" vertex could reduce the total distance, breaking the argument. (For negative edges, use the Bellman-Ford algorithm.)

Step 4: The "Relaxation" Operation

When we finalize vertex $u$ and examine a neighbor $v$ , we ask: "Is going through $u$ shorter than what we currently know for $v$ ? If so, update." This is called relaxation.

Why "relaxation"? We start with distance estimates of $\infty$ and gradually relax (reduce) them as we discover better paths. We're tightening an upper bound.

Putting the Reasoning Into Code

import heapq
 
def dijkstra(graph, start):
    # graph: { node: [(neighbor, weight), ...] }
    dist = {node: float('inf') for node in graph}  # Start at infinity
    prev = {node: None for node in graph}
    dist[start] = 0                                 # Source is 0
    pq = [(0, start)]  # Priority queue (replaces BFS's queue)
 
    while pq:
        d, u = heapq.heappop(pq)  # Extract minimum distance (replaces popleft)
 
        if d > dist[u]:  # Skip stale entries
            continue
 
        for v, weight in graph[u]:
            new_dist = dist[u] + weight
            if new_dist < dist[v]:  # Relaxation: better path found
                dist[v] = new_dist
                prev[v] = u
                heapq.heappush(pq, (new_dist, v))
 
    return dist, prev

Compare this with the BFS code. The structure is nearly identical. What changed:

Queue → priority queue (extract by distance, not insertion order)
visited → dist table (track distances, not just visited/unvisited)
Relaxation check added (new_dist < dist[v])

Step Through It

In the visualizer below, pay close attention to:

Moments when a previously set distance gets improved through a different path (that's relaxation)
The minimum-distance vertex is always the one selected and finalized
The prev column ultimately forms the shortest path tree

Dijkstra's Algorithm Visualizer

Step through Dijkstra's algorithm on a weighted directed graph.

Distance Table

A0−

B∞−

C∞−

D∞−

E∞−

F∞−

Priority Queue

A(0)

← Extract minimum distance first

Explanation

Start Dijkstra's algorithm from vertex A. Initialize distance to A as 0, all others as ∞. Insert A(dist=0) into the priority queue.

Step 0/121 / 13

Dijkstra's Complexity

Depends on the data structure:

Binary heap: $O((V + E) \log V)$ — most common
Fibonacci heap: $O(V \log V + E)$ — theoretically fastest but complex to implement
Array (naive loop): $O(V^2)$ — actually efficient for dense graphs

Tracing the Evolution of the Three Algorithms

Let's recap the reasoning journey:

Starting point: "I want to visit every vertex in a graph without missing any"

Insight 1 — visited: Cycles cause infinite loops, so we need a visited record

The fork — "In what order?":

"Nearby places first" → FIFO queue → BFS
"One path all the way to the end" → LIFO stack / recursion → DFS

Evolution — "What if edges have weights?":

Replace BFS's queue with a priority queue → Dijkstra
Key insight: "the minimum-distance vertex is safe to finalize" + relaxation

Each algorithm was born from the limitations of the previous one. Analyzing problem structure and identifying "what's missing" leads to the next algorithm's insight.

Applications: How the Insights Transfer

Graph Traversal Applications

🗺️Shortest Path

GPS navigation, game AI pathfinding

BFS / Dijkstra / A*

🔄Cycle Detection

Deadlock detection, dependency cycle checking

DFS

📦Topological Sort

Build systems, compilation order

DFS

🌐Connected Components

Social network clusters, network partitioning

BFS / DFS

🌳Spanning Tree

Network design, minimum-cost connectivity

BFS / DFS + Kruskal/Prim

🧩Bipartiteness Testing

Matching problems, scheduling

BFS

Topological Sort — A Byproduct of DFS

When a build system has "libA depends on libB" and "libB depends on libC," determining build order is topological sort.

Reasoning from DFS: "The reverse of DFS finish times gives the correct order." If there's an edge $u \to v$ , then $v$ finishes before $u$ in DFS. Reversing puts $u$ first — the correct dependency order.

def topological_sort(graph):
    visited = set()
    result = []
 
    def dfs(node):
        visited.add(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                dfs(neighbor)
        result.append(node)  # Add on completion
 
    for node in graph:
        if node not in visited:
            dfs(node)
 
    return result[::-1]  # Reverse = topological order

Bipartiteness Testing — Exploiting BFS's Layer Structure

"Can we 2-color a graph so no adjacent vertices share a color?" Use BFS's level-by-level processing — alternate colors between even and odd levels:

def is_bipartite(graph):
    color = {}
    for start in graph:
        if start in color:
            continue
        queue = deque([start])
        color[start] = 0
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    queue.append(neighbor)
                elif color[neighbor] == color[node]:
                    return False  # Same-color neighbors = not bipartite
    return True

Complexity Summary

Complexity Comparison

Algorithm	Time	Space	Note
BFS	O(V + E)	O(V)	Processes each vertex and edge once
DFS	O(V + E)	O(V)	Recursion depth up to V
Dijkstra (binary heap)	O((V + E) log V)	O(V)	Heap operations cost log V
Dijkstra (Fibonacci heap)	O(V log V + E)	O(V)	Theoretically fastest but complex
Bellman-Ford	O(VE)	O(V)	Handles negative edges; V-1 relaxation rounds

Conclusion — The Ability to Reinvent Algorithms

Here's the thinking process we traced throughout this article:

Define the problem: What exactly are we trying to solve?
Try the naive approach: Start with the simplest possible method and analyze what goes wrong
Identify the bottleneck: What specific aspect of the naive approach fails? (Cycles → visited. Order → data structure choice)
Solve with data structures: Queue vs. stack vs. priority queue — the right data structure implements the right strategy
Verify correctness: Why does this work? Under what conditions does it fail? (Dijkstra and negative edges)

BFS, DFS, and Dijkstra are not magic that fell from the sky. They're logical consequences of reasoning: "process nearby first → queue," "go deep first → stack," "account for weights → priority queue."

Master this way of thinking, and when you encounter a problem not in any textbook, you'll be able to follow the same process to "invent" your own algorithm.