Sudoku Code Golf

We run regular coding competitions at HackerRank. Recently we had a CodeSprint for the public and a Hacker Olympics for college students. The challenge was: To create a Sudoku solver that could print a solution to any randomly-generated 9×9 Sudoku problem. Instead of trying to create a solver with an optimal run-time, the goal was to create one with the shortest possible source code that could still solve the problem within the given amount of time. In short, it was another Code Golf Contest.

The top Hackathon hackers from each school won an all-expenses paid trip to San Francisco to hack with other students. The winners were asked if they could share their code and approach with others, and many graciously agreed. This post will discuss a couple of different approaches to the problem in different languages.

C++ : Constantin Dumitrascu
Constantin Dumitrascu lives in Romania, where the codesprint contest ran from 9pm to 9am.  Initially, he thought he would go to bed rather than solving the problem, but he found the challenge so interesting he ended up coding away the night. Constantin submitted his solution in at 6am and made it to the top 10 in the contest, even with C++ code. He chose to code in C++ since he was familiar with it and so he could make use of its macros. There were two components to solving the problem: First to find a working solution in regular (but short) code, and then to work on minifying it to make it much shorter.

Before discussing the algorithm, I’ll quickly review the rules of sudoku. The goal of the game is to fill in a grid with digits so that every column, row, and 3×3 sub-grid contains all 9 digits. This means that each digit has to be unique among those 3 areas, there can be no “conflict” between it and an identical digit in its “zones”.

The Code – Long Version
Constantin’s solution involved a brute-force/backtracking algorithm which fills in digits in each cell until it is able to complete the grid. For each cell, the algorithm checks each digit for conflicts with the relevant cells until a digit fits. If every digit has conflicts, the algorithm backtracks to a previous cell and tries a different digit there. This backtracking lets the code finish within the allotted time and ensures that once the grid is completed, the solution is valid. (See also Wikipedia for a similar solution.)

This is the initial code:

The Loops in C++ can appear somewhat messy, so you can just focus on the key part inside the loops. This is definition of each variable:

  • G[i][q] – The current cell being checked. The code goes through every cell in the sudoku array by increasing i and q.
  • d – A digit, which will go from 1-9 until it is able to fit in the grid.
  • q – A digit which goes from 1-9 to compare d with the relevant numbers in the rows and columns.

The code first compares the current cell with the relevant cells that could conflict with it. If it finds a digit that fits, it recursively continues. Otherwise, it marks the spot unfilled and returns to the previous cell (which called it) to try a different digit there.

The Short Version
The next step was to minify the code. It was Constantin’s first time doing code golf, but he managed to use a number of effective techniques. He did basic things like changing all identifiers to single characters and replacing many of the spaces with newlines, which didn’t count for the character count in this contest. The next step was to remove return statements where they weren’t needed, i.e in the main function. Since the code guarantees the Sudoku would be solvable, there was no need to check for unsolvable problems, so that code could be removed too.

The more complicated step was using macros to shorten the code. Macros in C act as find-and-replace statements that run before the code is compiled. The key to shortening the code is to find any repeated characters of code that justify writing a “#define” statement to replace them with. For example, for-loops can be shortened by writing:

#define F(i) for(int i=­1;++i<9;)
An additional macro can be used to shorten all similar for-loops, so one could end up with F(j) and F(d) as shortcuts too. To shorten it even further, notice that F(i) and F(j) appear together three times in the code. A new macro P can be defined to replace those repeated loops:

#define P F(i){F(j)
By using a few more macros, the entire code ends up being only 304 characters, excluding newlines. This is the shortened code:

Python – Andy Zhao
Andy started his solution in C++, but when he realized it was too verbose, he  switched to Python. Interestingly, his code wasn’t able to successfully complete the solutions in time until he changed his code to check the array in backwards order. I’m not sure why that happened.

His basic approach was similar to Constantin’s, but there were a couple of differences. One improvement was using a single array to store the entire Sudoku puzzle instead of a double array. This made it much simpler to iterate through the array without requiring double loops and indices. Even the comparison equation (to check for conflicts) ended up being simpler since one variable was enough to get a value from the array.

Below is the function which solves the problem, with a small improvement that Andy added after the contest. The function keeps a single array n to store all 81 values of the sudoku grid. It starts by passing 80 into the function s.

This code is noticeably shorter than the comparable C++ solution. This is partially because Python is more concise than C code, though this becomes less noticeable once all the code is minified. Python contains some useful features for concise code, such as the ‘any’ function, which can check if anything in an iterable set is true.  The range() function is also able to express an iteration in a more concise manner.

To further minify the code, Andy took a number of steps, similar to Constantin’s. In Python it is even easier to use “find-and-replace”, since you can just assign a substitution by using =. While there isn’t as much repetition in the Python code as in the C code, the code can still be shortened with a few substitutions:
r=range
t=r(9)
a=r(81)

After a few more minifying steps, Andy got the code down to under 300 characters. A shortened version of the code can be viewed on Ideone. After the deadline, Andy realized a ‘hack’ that could shorten the code much further. Since the site didn’t count a newline as a character, one could use the number of continuous newlines as a compressed code to represent any other character. While this would have been a bit of a cheap trick, it could have shortened the code enough to come in first place.

Python – Cyphase
The first-place overall winner goes by the name Cyphase. His winning submission was only 218 characters, including newlines:

Cyphase wrote a full post describing his solution, which you should read to get the full details. While Cyphase used similar backtracking search as the other submissions, his code was able to achieve a very short length due to a number of tricks he used. Instead of storing the Sudoku grid in an array, he used a string. Instead of using standard comparisons to check for conflicts, he used bit manipulation. For example, he used x/9^y/9 to compare cells which are in the same row of the Sudoku grid. (Note that he later doubled the numbers so that he could make the I/O code more compact.) After he had figured out all the comparisons, he was able to efficiently combine them using multiplication:

(x-y)%9*(x/9^y/9)*(x/27^y/27|x%9/3^y%9/3)

Since he’s using bit manipulation, any conflict will result in a 0 value, which will cause the whole product to return 0. Notice how much shorter bit comparison is than the more direct comparisons that Andy used:

(i/9==x/9 or i%9==x%9 or i/27==x/27 and i%9/3==x%9/3)

Once the function can check for conflicts, it can recursively call itself with the numbers that don’t conflict. It tries out different possible values of c by placing them in the string at the current spot with Python’s slice notation:

f(s[:x]+c+s[x+1:])

This solution uses manages to do perform a full backtracking sudoku search in only a few lines of code! While Python code is known for being readable, this code shows that it can sometimes be extremely concise instead.

Haskell – Aleksander Balicki
While the Python and C++ solutions used similar approaches, it is also worth looking at the functional way of doing things. This is the Aleksander’s Haskell way to solve Sudoku Code Golf:
This code can also be viewed on Ideone. Since I’m not so familiar with Haskell, I will quote Aleksander’s explanation of the code:

First the input is transformed from String into a [(Int, Int), Int] list, which holds the (x, y) coordinates and the number that is in that place. That’s what the “i” function does. Then we iterate over solutions in the form of (Int, Int) -> [Int], i.e. a function that maps the coordinates to the possible values in the solution, and the function “m” cuts all the elements that would generate a collision – same coordinate or the same subsquare. The main function prints the first solution, line by line.

There are many different ways to do well at Sudoku code Golf! While the C++ started off messy, it was able to use macros to shorten itself considerably. Andy’s Python solution expressed a similar algorithms a bit more concisely in Python, while Cyphase was able to express the whole solution in only a couple lines by using bit manipulation. Finally there was Aleksander’s solution, which used a functional approach.

If you’d like to write your own backtracking search algorithm, you can enter Level 4 on HackerRank by writing your own AI to play Reversi. Also, our next Hackathon will have more challenges and bigger prizes, so sign up now to participate!

Please share this article:

3 Replies to “Sudoku Code Golf”

  1. Pingback: شرا

Leave a Reply

Your email address will not be published. Required fields are marked *