Boggle is a classic word search game played on a 4 x 4 grid of letter dice. The winner is determined by how many words the players have found and by the length of those words.
My family and I play this game quite often, and the inspiration for this project came from an interest in knowing all the possible words on a given board.
So I decided to write a program that automates this process. To make the program convenient to use, the input is just an image of the board.
Subproblems to solve
There are two main tasks:
- Extract the boggle board letters from an image
- Find all the words the letters form
This post concentrates on the first task: extracting the letters and their locations from an image. Once the letters and their locations are known, the latter problem of finding all the words can be solved with a simple algorithm that searches through the board and matches strings against a dictionary. For this there exist multiple pre-made solutions on GitHub (see e.g. here).
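The word search part can be sketched as a depth-first search from every die, pruned with a set of word prefixes. The board and the tiny dictionary below are made up for illustration:

```python
def find_words(board, words):
    """Find all dictionary words that can be formed on the board by
    chains of adjacent dice, each die used at most once per word."""
    # Every prefix of every word; used to prune dead-end paths early.
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    n, found = len(board), set()

    def dfs(r, c, path, visited):
        path += board[r][c]
        if path not in prefixes:
            return                      # no word starts like this
        if path in words:
            found.add(path)
        for dr in (-1, 0, 1):           # visit the 8 neighbouring dice
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n and 0 <= cc < n and (rr, cc) not in visited:
                    dfs(rr, cc, path, visited | {(rr, cc)})

    for r in range(n):                  # start a search from every die
        for c in range(n):
            dfs(r, c, "", {(r, c)})
    return found

board = ["CATS", "OILS", "DOGE", "XYZQ"]
print(sorted(find_words(board, {"CAT", "CATS", "OIL", "DOG", "MOON"})))
# → ['CAT', 'CATS', 'DOG', 'OIL']
```

A real solver would load a full dictionary file and score the words by length, but the search itself stays this simple.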
Before we can start extracting the letters from an image, we need to be able to locate the board. I first thought of rendering Boggle board images with a computer and then using an artificial neural network (ANN) to learn to locate the board. However, I quickly came to the conclusion that making the renderer realistic enough to generalize to real-world images would take too much time. I decided to take a few photos of the board, use traditional image processing techniques to locate it, and use an ANN only for letter classification.
My training data set would be eight images of the game board with all the different die faces visible.
Locating the board
My first task is to locate the board. Once the board can be located, it is easy to extract the individual letter images from it due to the symmetry of the board. The last step is then to train a classifier to classify the letters on the die faces.
I decided to use traditional image processing techniques for this task. Their benefit is that they work in situations where little data is available, like in my case. They are also often quite efficient to run. The idea is to apply a series of transformations to the original image that make the board recognizable to the computer. The downside is that you need to build the algorithm manually (i.e. not that much machine learning).
Related to my task, I found a blog post describing how to locate Sudoku boards in an image. That relied on finding the straight lines of the Sudoku board. It didn't, however, directly apply to my task, as the Boggle board is an actual 3D object that stands out from the surface it lies on. After some experimentation I arrived at the following approach:
- Assume an image of a Boggle board taken from above (possibly with some noise like other objects, e.g. a hand).
- Assume dice have a solid background (white in my case). Threshold the image to find all whitish blobs in it.
- Do outlier detection to determine which of the white blobs are actually dice. Remove the others.
- Dilate the blobs until there is just one blob.
- This blob should now basically be the board. Find its minimal enclosing rectangle.
- Do a perspective transform to get the image of the board in an upright position.
- Extract each die from the upright board image.
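The thresholding and dilation steps can be sketched roughly as follows. This is a simplified illustration on a synthetic image using NumPy and SciPy; the rotated minimal enclosing rectangle and the perspective transform are omitted here and a plain axis-aligned bounding box is used instead:

```python
import numpy as np
from scipy import ndimage

# Synthetic stand-in for a photo: 16 bright "dice" on a dark background.
img = np.zeros((200, 200), dtype=np.uint8)
for r in range(4):
    for c in range(4):
        y, x = 20 + r * 45, 20 + c * 45
        img[y:y + 30, x:x + 30] = 230

# Threshold to find all whitish blobs.
mask = img > 200

# Dilate the blobs until only a single blob remains.
labels, n = ndimage.label(mask)
while n > 1:
    mask = ndimage.binary_dilation(mask, iterations=2)
    labels, n = ndimage.label(mask)

# The remaining blob is (roughly) the board; take its bounding box.
ys, xs = np.nonzero(mask)
board_box = (ys.min(), xs.min(), ys.max(), xs.max())
print(n, board_box)
```

On a real photo the dice are not axis-aligned, which is why the actual pipeline uses a minimal enclosing rectangle and a perspective transform instead of a simple bounding box.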
Below is an example of the results after each of the above steps:
The outlier detection in step three worked surprisingly well. I added it after noticing that the thresholding alone was too unreliable: the image basically had to contain just the Boggle board and not much else. With the outlier detection, however, a hand, a sheet of paper, or even a reflection from the table didn't confuse the program. I used three features for the outlier detection: the size of the blob, the location of the blob, and the width of the blob's minimum bounding rectangle in proportion to its height.
After extracting the features, the program locates the 24 largest white blobs in the image and assumes that 16 of them are dice (as there are 16 dice). The remaining eight blobs are "outliers" to be removed, e.g. a white sheet of paper next to the game board. Because the blobs corresponding to the 16 dice should have very similar feature values, the outlier detection algorithm should be able to label the other white blobs as outliers. The outlier detection algorithm I used was the EllipticEnvelope from Python's scikit-learn.
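A minimal sketch of this step on made-up blob features -- the feature values below are invented for illustration, only the EllipticEnvelope usage reflects the actual approach:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
# Features per blob: [area, x, y, width/height ratio].
dice = np.column_stack([
    rng.normal(900, 30, 16),    # dice have similar areas...
    rng.uniform(100, 300, 16),  # ...lie within the board area...
    rng.uniform(100, 300, 16),
    rng.normal(1.0, 0.05, 16),  # ...and are roughly square
])
other = np.column_stack([
    rng.normal(5000, 500, 8),   # e.g. a sheet of paper: large...
    rng.uniform(0, 400, 8),
    rng.uniform(0, 400, 8),
    rng.normal(2.0, 0.3, 8),    # ...and elongated
])
X = np.vstack([dice, other])

# 8 of the 24 largest blobs are expected to be outliers.
detector = EllipticEnvelope(contamination=8 / 24, random_state=0)
labels = detector.fit_predict(X)  # +1 = die, -1 = outlier
```

Because the 16 dice form one tight cluster in feature space, the robust covariance estimate fits that cluster and everything far from it gets the -1 label.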
Below is another illustration of the process for another image:
This process worked surprisingly well in practice. You can make it fail with bad lighting conditions because of the white thresholding, but in normal circumstances it was fairly robust. For the actual implementation I used NumPy and other Python libraries.
Generating a training dataset
As the location of the board is now known, the remaining task is to classify the letters on the dice. I decided to use an artificial neural network (ANN) classifier for this task. I first tried to find ready-made letter datasets, and found some, but they were for the English alphabet. That meant that I would in any case have to add the Ö and Ä letters to the dataset, as well as the combined FG and BC letter dice from the Finnish Boggle version.
Consequently, I decided to use six of the eight previously taken board images to generate more training images by applying random distortions. Adding distortions simulates the noise and variation present in real-world images and makes the classification model robust against them, i.e. it prevents overfitting to the original training data. The six game board images each contained 16 dice, which gave me an initial set of 6 * 16 = 96 die images covering all the different sides of the dice in the game.
First I labeled the die images by hand. To increase the size of my dataset, I also took a blurred and a sharpened version of the six original images, which brought my dataset to 96 * 3 = 288 labeled images of Boggle letters.
I decided to convert the die images to black and white 64 x 64 pixel images for classification. In this case the colour information does not carry anything essential with respect to classifying the letters. By removing the colours I could substantially reduce the dimensionality of the data while retaining all the crucial information, which helps make the classification robust, as the model does not need to learn how to handle the colours. After this I made a function that adds random noise (salt and pepper), shifts, and rotations to the images.
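Such a distortion function could look roughly like this. This is a sketch using SciPy; the parameter ranges are illustrative guesses, not the ones I actually used:

```python
import numpy as np
from scipy import ndimage

def distort(img, rng):
    """Randomly rotate, shift, and add salt-and-pepper noise to a
    64 x 64 black-and-white letter image."""
    # Small random rotation (in degrees), filling new pixels with black.
    out = ndimage.rotate(img, rng.uniform(-10, 10), reshape=False, cval=0)
    # Small random shift in both directions.
    out = ndimage.shift(out, rng.integers(-4, 5, size=2), cval=0)
    # Salt-and-pepper noise: flip a small fraction of pixels to black or white.
    noise = rng.random(out.shape)
    out[noise < 0.01] = 0
    out[noise > 0.99] = 255
    return out

rng = np.random.default_rng(0)
letter = np.zeros((64, 64), dtype=float)
letter[16:48, 28:36] = 255            # a crude "I" as a stand-in letter
augmented = distort(letter, rng)
```

Calling `distort` repeatedly on the same 288 source images then yields an effectively unlimited stream of distinct training images.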
Below you can see some of the original 288 black and white letter images on the left hand side and the result of the random noising on the right hand side.
I would use these randomly noised images to train my classifier. Because the distribution of the letters was not uniform (e.g. there were many more A's than, say, Y's), I sampled them with weights so that each letter had a uniform probability of appearing in a random sample.
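The weighted sampling can be done for example with inverse-frequency weights; the label distribution below is made up:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical skewed label distribution (many A's, few Y's).
labels = np.array(['A'] * 40 + ['E'] * 30 + ['S'] * 20 + ['Y'] * 6)

# Weight each image by the inverse frequency of its letter, so that
# every letter is equally likely to appear in a random sample.
letters, counts = np.unique(labels, return_counts=True)
freq = dict(zip(letters, counts))
weights = np.array([1.0 / freq[letter] for letter in labels])
weights /= weights.sum()

sample = rng.choice(labels, size=1000, p=weights)
# Each of the four letters now appears with probability ~0.25.
```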
I was now ready to train the ANN with an essentially unlimited supply of training letter images, and hoped that the above steps would be enough for the classifier to generalize to unseen images, given the tiny initial set of training images.
Training the ANN
I used Keras with the TensorFlow backend to implement the ANN model. I decided to start with a basic one-hidden-layer setup with ReLU activation and a categorical cross-entropy loss. Because the training data are rather heavily preprocessed black and white images of letters, I anticipated that this ANN structure would already yield decent performance.
Below is the Keras code of the ANN model with a hidden layer size of 192.
import keras
from keras.layers import Dense, Activation

model = keras.models.Sequential([
    Dense(192, input_shape=(64*64,), activation='relu'),
    Dense(22),
    Activation('softmax')
])
opt = keras.optimizers.SGD(lr=0.01)
# Pass the optimizer object (not the string 'sgd') so that lr=0.01 takes effect.
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
I used random batches of 1000 noised letters to train the classifier. Using the original 288-letter training set (without random noising) as a validation set, I reached over 80% accuracy after just some tens of thousands of iterations. After a few hundred thousand more I reached 100% accuracy, i.e. my model had at least nicely (over)fitted to the original training dataset. This is of course not yet any guarantee of its generalization to unseen images, but it shows that the training process works.
I then took a look at the ANN hidden layer weights, which seemed to have formed rather reasonable-looking shapes for detecting letters in different 90 degree rotations. The image on the right shows the first 20 of the 192 hidden layer node weight maps, arranged with four maps on each row.
I then tried the classifier on a batch of 1000 randomly noised images. The accuracy was roughly 98%, i.e. around 20 misclassified letters per batch. For the Boggle solver to be practical, the accuracy would have to be higher than that: if even one letter on the board is misclassified, the program will find wrong words. It would be frustrating to have to repeatedly retake images because a few letters were misclassified.
I decided to let it train on 32M letters overnight (using CPU only) and see if the accuracy would improve. The next morning the performance was mostly 100% for batches of 1000 randomly noised letter images. Since the performance could not really be meaningfully improved, I decided not to develop the model further with e.g. convolutional layers.
Examples of misclassifications
On rare occasions a single letter was misclassified in the batches of 1000 noised images. I wasn't too worried about it, because the random shifts and noising can sometimes generate extreme outcomes. To be sure, I nevertheless checked what was going on. Below are a couple of examples of misclassified randomly noised images.
The first one was classified as "S" while it should have been an "M", and the second one was classified as "A" but should have been an "Ä". You can't blame the ANN for the second one though: the random shift pushed the dots of the "Ä" outside the image. The first one is also rather improbable in practice, since the letter is so close to the edge.
Testing the solver
I had left two images out of the training set to serve as a test set. That gave me a total of 2 * 16 = 32 letter images for the test set -- not too many, but they would have to do.
Evaluating the model with the 32 unseen letter images gave 100% accuracy with a very small loss, meaning that the model considered the classifications rather certain. This was encouraging, but due to the minuscule size of the test set, the model would really need to be tested in practice.
When testing the whole process from image processing to letter classification for the first time while playing the game, it turned out that the lighting conditions in the original eight images were too similar (they were all taken on a sunny day), and the thresholding in the image processing phase failed quite often. This meant that the program could not reliably locate the board or do the black and white transform on unseen images. I had to tune the thresholding process a bit to make it more robust. After a few iterations and switching to adaptive thresholds, I got it working well with images taken in different lighting conditions.
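The idea of adaptive thresholding is to compare each pixel against a local mean instead of one global cutoff, so that uneven lighting across the image does not break the mask. A minimal sketch of the idea (not my actual implementation; the window size and offset below are illustrative):

```python
import numpy as np
from scipy import ndimage

def adaptive_threshold(img, window=75, offset=10):
    """Mark pixels that are clearly brighter than their local surroundings."""
    local_mean = ndimage.uniform_filter(img.astype(float), size=window)
    return img > local_mean + offset

# A gradient-lit image: the dark-side "die" (~78) is dimmer than the
# bright-side background (~83+), so no single global threshold works.
x = np.linspace(0, 100, 200)
img = np.tile(x, (200, 1))            # uneven illumination
img[50:80, 20:50] += 60               # a bright "die" in the dark half
img[120:150, 150:180] += 60           # a bright "die" in the bright half

mask = adaptive_threshold(img)        # finds both dice
```

Both dice end up in the mask because each one is bright relative to its own neighbourhood, regardless of the overall illumination level.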
On the other hand, the ANN classifier did not undergo any changes and performed surprisingly well with the black and white images right from the beginning. The problem setup for the classifier was simple enough, and the letter noising sufficient, to allow it to generalize to unseen letter images. We have since successfully used and enjoyed it in our Boggle games!