Martin Handford can spend weeks creating a single *Where’s Waldo* puzzle hiding a tiny red and white striped character wearing Lennon glasses and a bobble hat among an ocean of cartoon figures that are immersed in amusing activities. Finding Waldo is the puzzle’s objective, so hiding him well, perhaps, is even more challenging. Martin once said, “As I work my way through a picture, I add Wally when I come to what I feel is a good place to hide him.” Aware of this, Ben Blatt from *Slate* magazine wondered if it’s possible “to master *Where’s Waldo* by mapping Handford’s patterns?” Ben devised a simple trick to speed up a Waldo search. In a sense, it’s the same observation that allowed Jon McLoone to write an algorithm that can beat a human in a Rock-Paper-Scissors game. As Jon puts it, “we can rely on the fact that humans are not very good at being random.”

Readers with a sharp eye can see that the set of points in the two images below is the same, so the images are just different visualizations of the same data. The data shows the 68 spots where Waldo hides in the special edition of seven classic Waldo books. Big thanks go to Ben Blatt, who sat for three hours in a Barnes & Noble bookstore with a measuring tape painstakingly constructing this fabulous dataset. Ben’s study had a follow up by Randal Olson from Michigan State University, who suggested a different approach based on the density of Waldo spots and a shortest path algorithm. In this post, I will go over both of these approaches and suggest alternatives—these are the final results:

#### Getting the Data

I start with getting coordinates for Waldo’s hideouts from all seven classic puzzles. Ben Blatt indicated where they are on a single image. After importing the data, you will see a schematic of a dual-page of an open puzzle book with all hideouts marked with red spots. Each spot has text inside labeling the book and the page of the puzzle it belongs to. The text is not important to us and, moreover, needs to be removed in order to obtain the coordinates.

With Wolfram Language image processing tools, it’s easy to find the coordinates of the red spots. Some spots overlap, so we need robust segmentation to account for each spot separately. There are different approaches, and the code below is just one of them. I’ve arranged it so you can see the effects, step-by-step, of applications of image processing functions. Binarize is used to reduce complex RGB image data to a simple matrix of 0s and 1s (binary image). Then ColorNegate followed by FillingTransform heal the small cracks inside the spots due to text labels. Erosion with DiskMatrix diminish and separate overlapping spots. Finally, SelectComponents picks only the spots, and filters out the other white pixels by specifying the rough number of pixels in a spot.

Getting spots’ centers in the image coordinate system is done with ComponentMeasurements. We also verify that we have exactly 68 coordinate pairs:

The precision of our routine is illustrated by placing blue circles at the coordinates we found—they all encompass the hideouts pretty accurately, even the overlapping ones:

Note a sizable textbox in the top-left corner, which is always present in the original puzzles too, deforming the data a bit. I will transition from the image coordinate system to the actual book one. A single Waldo puzzle takes a double-page spread measuring 20 by 12.5 inches:

#### Ben Blatt’s Approach

Quoting Ben: “53 percent of the time Waldo is hiding within one of two 1.5-inch tall bands, one starting three inches from the bottom of the page and another one starting seven inches from the bottom, stretching across the spread…. The probability of any two 1.5-inch bands containing at least 50 percent of all Waldo’s is remarkably slim, less than 0.3 percent. In other words, these findings aren’t a coincidence.” This is a keen observation because it is not obvious from just looking at the scattered dataset. I can illustrate by defining a geometric region equal to a single 1.5-inch strip (note the handy new ImplicitRegion function):

To see which bands contain the most Waldos, I scan the double-page spread vertically in small steps, counting the percentage of points contained in the band. The percentage is shown in the left plot while the corresponding band position is shown in the right one. Clearly there are sharp peaks at 3 and 7 inches, which would amount to almost 50% if two identical bands were placed at the maxima.

Widening the bands or adding more of them would raise the percentage, but also add more area to cover during the search, so it would increase the search time too. The total percentage from two bands at the maxima amounts to about 47%, while Ben claimed 53%, which I attribute to errors in quite approximate data acquisition:

#### Musing on Randy Olson’s Approach

Randy had a deeper statistical and algorithmic take on the Waldo data. First he visualized the density of points with a smooth function. The higher the density, the greater the probability is of finding Waldo in the surrounding area. The density plot has a quite irregular pattern, very different from Ben’s two simple bands. The irregular pattern is more precise as a description, but harder to remember as a strategy. Then, Randy “decided to approach this problem as a traveling salesman problem: We need to check every possible location that Waldo could be at while taking as little time as possible. That means we need to cover as much ground as possible without any backtracking.”

I can get a similar plot of the density points right away with SmoothDensityHistogram. To that, I’ll add two more things. The first is a shortest tour through all the points (i.e., a traveling salesman problem solution) found with FindShortestTour:

It lists the order of consecutive locations to visit so the length of the tour is minimal (the first number is the tour’s length). This “tour” is a closed loop and is different from Randy’s open “path” from A to B. If we could remember the tour exactly, then we could always follow it and find Waldo quickly. But it’s too complex, so to better comprehend its shape, we can smooth it out with a spline curve:

If you compare the visualization above with the original plain scattered plot of points, you may get the feeling that there are more points in the scattered plot. This is an optical illusion, which, I suspect, is related to the different dimensionality of these two visualizations. Points along a curve are perceived as a one-dimensional object covering much less area than a scatter plot of points perceived as two-dimensional and covering almost all the area.

Such visualizations help to give a first impression of the data, the overall distribution, and patterns:

- Background density is a visual guide to remembering where finding Waldo is more probable.
- The line through the points is a guide to the optimal trajectory when you scan the puzzle with your eyes.

But these two must be related somehow, right? And indeed, if we make the probability density background “more precise” or “local and sharp,” it will sort of wrap around our tour (see image below). SmoothDensityHistogram is based on SmoothKernelDistribution, which is pretty easy to grasp. It is based on the formula

where kernel *k*(*x*) can be a function of our choice and *h* defines effective “sharpness” or “locality” of the resulting density function. You see below that by making our distribution sharper by decreasing *h*, we localize the density more around the shortest tour. I have also chosen a different kernel “Cosine” function and changed several minor visual settings (for the tour, spline, and density) to show flexible options for tuning the overall appearance to anyone’s liking.

By getting the spline from the tour and widening it, we could get closer to the pattern of the density (the regions of thick-enough spline would overlap where the density is higher). And vice versa, by localizing the density kernel, we can get the density to envelope the spline and the tour. It is quite marvelous to see one mathematical structure mimic another!

Well, while nice looking, this image is not terribly useful unless you have a good visual memory. Let’s think of potential improvements. Currently I have a closed loop for the shortest tour, and this means going once from the left to right page and then returning back. Could I find a one-way shortest path that traverses once from left to right? This is what Randy Olson did with a genetic algorithm, and what I will do with a modified FindShortestTour that *artificially sets the distance between the start and end points to zero*. Thus, FindShortestTour is forced to draw a link between the start and end and then traverse all the rest of the points. Finally, the fake zero-length start-end link is removed, and I get a disconnected start-to-end point path.

I will also get rid of the single outlier point in the top-left corner and sort points in order of increasing horizontal coordinate. The outlier does not influence much, and even when it’s the solution, it can be easily checked because the area above the textbox is small.

Next I will find a shortest path from the bottom-left to bottom-right corners. And again, I will choose different visual options to provide an alternative appearance.

Randy used a similar but different path to devise a less-detailed schematic trajectory and instructions for eye-scanning the puzzle.

#### The Grand Unification

When you have a dual-page book spread, isn’t it natural to think in terms of pages? People habitually examine pages separately. Looking at the very first density plot I made, one can see that there’s a big density maximum on the left and another one stretched toward the right. Perhaps it’s worth looking at the same analysis done for each page independently. Let’s split our points into two sets belonging to different pages:

We walk through the same steps, slightly changing visual details:

And indeed, a simple pattern emerges. Can you see the following two characters on the left and right pages correspondingly: a slanted capital L or math angle “∠” and Cyrillic “И”?

I would suggest the following strategy:

- Avoid the very top and very bottom of both pages.
- Avoid the top-left corner of the left page.
- Start from the left page and walk a capital slanted “L” with your eyes, starting with its bottom and ending up crossing diagonally from the bottom left to top right.
- Continue on the right page and walk the capital Cyrillic “И”, starting at the top left and ending at the bottom right via the И-zigzag.
- If Waldo is not found, then visit areas in steps 1) and 2).

These two approaches of density and shortest path make a full turn and lead me back to the original simpler band idea. Those who prefer something simpler might notice that in the density plot above, there are two maxima on each page along the diagonals mirroring each other across the vertical page split. There are quite a few points along the diagonals, and we could attempt to build two bands of the following geometry:

I compare this region to the original *Slate* design using RegionMeasure. We see that my pattern is 6% larger in area, but it also covers 7.4% more of Waldo’s positions than Blatt’s region, and it provides a simpler and shorter one-way scanning of the puzzle instead of walking from left to right twice along two bands.

So, concerning the strategy, imagine two squares inscribed in each page and walk their diagonals, scanning bands about 2.8 inches thick. This strategy not only tells you where to look for Waldo first, it tells you where to look if you failed to find him: along the other two diagonals. This can be seen from our split-page density plot above and my final visualization below showing diagonal bands covering all four maxima:

#### For the Die-Hard Shortest-Path Buffs

While FindShortestTour can work with large datasets, the way we adapted FindShortestTour (which by default returns a “loop” path) to a shortest disconnected path between two given points is a bit dodgy. The traveling salesman problem is NP-complete, thus, it is possible that the worst-case running time for any algorithm solving it grows superpolynomially (even exponentially) with the number of points. This is why FindShortestTour uses shortcuts to find an approximate solution. For a sufficiently large number of points, it’s possible for FindShortestTour to not actually give the shortest tour. Because the algorithm takes a shortcut, it’s not even guaranteed to find the zero-distance edge that I artificially constructed. Hence, the algorithm could return an approximate shortest path not passing though the zero-distance edge, and thus not going from the given start to the given end points. Our dataset is small, so it worked fine. But what to do in general for a larger set of points? We could force FindShortestTour to always work properly by considering all Waldo positions as vertices of a Graph.

In general, FindShortestTour on a set of points in *n*-dimensional space essentially finds the shortest tour in a CompleteGraph. Now if I want such a path with the additional constraint of start and end points, I can augment the graph with an extra vertex that joins the given start and end points in question. The distance to the extra vertex does not really matter. *FindShortestTour will be forced to pass through the section start-extra-end points.* This works with any connected graph, not necessarily a complete graph, as shown below:

After finding the tour on the augmented graph, I remove the extra vertex and its edges, and so the shortest tour disconnects and goes from the start to end points.

I will build a symmetric WeightedAdjacencyMatrix by computing the EuclideanDistance between every two points. Then I’ll simply build a WeightedAdjacencyGraph based on the matrix and add an extra vertex with fake unit distances. Before I start, here is a side note to show how patterns of data can appear even in such an abstract object as WeightedAdjacencyMatrix. Old and new (sorted) orders of points are visually drastically different, while the matrix in both cases has the same properties and would generate the same graph.

On the left, points are unsorted, and on the right, points are sorted according to the shortest tour. The brighter colors are pairs of most remote points. Now let’s define the whole function that will return and visualize a shortest tour based on a network approach:

Applying this to the Waldo dataset, we get a beautiful short path visual through a complete graph:

By the way, the latest Wolfram Language release has the state-of-the-art FindShortestTour algorithm. To illustrate, I’ll remind you of a famous art form when a drawing is made of single line with no interruption or a sculpture from a single wire piece. The oldest examples I know are simultaneously the largest ones—mysterious Nazca Lines from Peru that could measure hundreds of feet in span. Here is, for instance, an aerial image of a 310-foot-long hummingbird (Nazca Lines cannot be perceived from the ground—they are that large). Adjacent to it is a one-line drawing by Pablo Picasso.

Picasso’s art is more complex, but it’s pretty easy to guess that humans will have a harder and harder time drawing a subject as the line gets longer and more intricate. But as it turns out, this is exactly where FindShortestTour excels. Here is a portrait computed by FindShortestTour spanning a single line through more than 30,000 points to redraw the *Mona Lisa*. The idea is quite simple: an image is reduced to only two colors with ColorQuantize, and then FindShortestTour draws a single line through the coordinates of only dark pixels.

So why isn’t it simply a chaos of lines rather than a recognizable image? Because in a shortest tour, connecting closer neighbors is often less costly. Thus, in the *Mona Lisa* one-liner, there are no segments spanning across large white spaces.

#### Concluding Remarks

My gratitude goes to Martin Handford, Ben Blatt, and Randy Olson, who inspired me to explore the wonderful Waldo patterns, and also to many creative people at Wolfram Research who helped me along the way. Both Ben and Randy claimed to verify the efficiency of their strategies by actually solving the puzzles and measuring the time. I do not have the books, but hope to do this in future. I also believe we would need many people for consistent benchmarks, especially accounting for different ways people perceive visually and work through riddles. And most of all, I am certain we should not take this seriously.

We did attest that humans are bad at being random. But that is not a bad thing. One who is good at exhibiting patterns is likely apt at detecting them. I would speculate further and say that perhaps a sense of deviation from randomness is at the root of our intelligence and perception of beauty.

Download this post as a Wolfram notebook.