I just spent half a day trying to wrap my brain around how to detect cycles within linked lists, as well as how to find the first node of the cycle. After perusing all the Stack Overflow and Wikipedia entries, I had a pretty good mechanical understanding of the most elegant solution, which is known as Floyd’s Cycle-Finding Algorithm, or more informally, the tortoise and the hare algorithm.
The problem with my mechanical understanding was that it’s a classic case of surface-level learning, the kind of sloppy, rote-memorization cramming that somehow powered me through UCI to a 3.2 GPA but left me utterly uninterested and disengaged from my craft. I’d perform well if an interviewer asked me that question tomorrow, but I’d feel like a charlatan if I only parroted an answer back at him. Just because I flipped through some old tome and found an incantation that allowed me to cast a black magic wizard spell doesn’t make me Harry Potter.
I wanted an intuitive understanding of the solution. If I have that, I will carry it for the rest of my life.
In the name of that pursuit, I sat down and turned that problem over and over through my head until I found an elegant way to visualize and understand the reason behind the math. Since it’s an approach I wish I’d thought of six hours ago, I want to share it.
First off, let’s assume everyone knows what a linked list is. I’m talking about your standard, garden-variety singly linked list: it has a head pointer but no tail pointer, made up of nodes that contain a pointer to the next node in the list (except for the last node, which points to null).
A linked list with a cycle in it has one or more nodes that point to a previous node in the list. If you traverse the node starting from the head, you’ll get caught in the cycle loop.
Did you ever play Super Mario Bros. 1? On the Level 8 and 9 dungeons, if you didn’t go the right path, you would cycle backwards until you did. It’s sort of like that.
The Floyd Cycle Detection Algorithm works by using two pointers, one slow and one fast. We’ll call them the tortoise and the hare, respectively. They start at the first node.
The hare travels 2 nodes per move, and the tortoise travels 1 node per move.
And so on. If the hare pointer meets the tortoise pointer, you’ve got yourself a cycle:
To visualize why this works, let’s go back to our Mario analogy. Imagine you could play a co-op two-player mode of those cycle-containing dungeons. If Luigi always moved slower than Mario, and Mario keeps hitting the cycle part of the dungeon, Mario will at some point pass Luigi.
But why do we choose to go 2 nodes per move? Couldn’t you go 3 nodes per move? Well, yes, you could, but it’s not important how fast the hare goes. The hare could go light speed for all we care, but it can only pass the tortoise once the tortoise reaches the cycle.
There are some interesting properties about this algorithm:
- The hare is always twice as many steps ahead of the tortoise (if an interviewer asks you how to find the middle of a linked list, have the hare run off the end of your list and the tortoise will be at exactly the middle)
- The hare can never “skip” over the tortoise without them first sharing the same node. It seems like it might because it moves in chunks of two, but remember that for each step, the hare only gains one node on the tortoise.
Now that we can detect cycles, the next step is figuring out where the cycle starts and how to fix the cycle. This is where it gets tricky.
All of the Stack Overflow answers talk about the tortoise being Xi, the hare being X2i, and lambda (the length of the cycle) and mu (the number of steps to get to the cycle).
Let’s toss that all aside and agree upon this: the hare has always taken twice as many steps as the tortoise, right? Right.
So the tortoise and the hare meet at a particular node for the first time. The tortoise has entered the cycle but hasn’t done a full loop. The hare, depending on the size of the list and the cycle, may have done the loop only once, or it could have done the loop a million times. However, no matter how many times the poor hare has been spun through the cycle, we can guarantee that it has traveled twice as many steps as the tortoise has.
Again, that’s very important. It means that at that meeting point node, if the tortoise has taken i steps, and if the tortoise takes another i steps (i + i = 2i), it will end up at that same node. If it’s a big loop, it’ll only have to loop once, but if it’s a tight loop, it might go through it a lot of times, just like the hare had to do. But no matter what, we can guarantee that it’ll end up on the meeting node after i steps.
So here’s the heart and soul of Floyd’s Cycle Detection Algorithm: if we cloned the tortoise and placed one at the head of the list and one at the meeting point node, if we had them take i steps, we would guarantee that they would reach the meeting point node at the same time. Why? Because the cloned tortoise starts at position i and takes i steps, which makes a total distance of 2i.
But think about it: they wouldn’t just meet at the meeting point node, they would meet for all of the nodes at the beginning of the cycle until they reached the meeting point node. In fact, it means that the very first node that they meet is the very first node of the cycle.
Once we know the first node of the cycle, fixing the cycle is trivial: plop down a pointer to this first node, and use another pointer to traverse the rest of the cycle until you reach the node who points at the first node.
You had me until this bit:
“But think about it: they wouldn’t just meet at the meeting point node, they would meet for all of the nodes at the beginning of the cycle until they reached the meeting point node.”
Can you expand on why that is the case? It makes sense that they would meet at the meeting point node because the cloned tortoise goes
i
steps and the original tortoise goes2i
steps, thus replicating what the original tortoise and the hare did, respectively. But why does it then follow that the cloned tortoise and the original tortoise would meet at the start of the cycle?One intuitive reason that comes to mind: the original tortoise and the cloned tortoise move at the same speed, so if they _didn’t_ meet at the beginning of the cycle, then they would _never_ meet. So by proving that they _do_ meet at some point, then we implicitly show that they must meet at the start of the cycle, whereupon they move in lockstep until the
2i
meeting point eventually arrives.We know that if you start at the beginning of the list and take
i
steps that you will end up at the meeting point, and we know that if you start at the meeting point and goi
steps that you will end up at the meeting point again. Another way to state that last part is that if you start at the beginning and go2i
steps that you will end up at the meeting point.Now, imagine you take two tortoises and start them from the meeting point. Visualize them both taking one step backwards at a time, except one goes to the beginning of the list and the other one goes through the cycle. Notice that their paths diverge at the beginning of the cycle, and notice that if you keep stepping backwards until one reaches the beginning of the list, the other one ends up at the meeting point.
The breakthrough I had was being able to visualize this, and that’s when it made sense intuitively.