Yet Another Sudoku Puzzle Solver Using AWK

by Bill Duncan, VE3IED on January 13, 2010

Photo Courtesy: jimray

We have seen in a previous awk introduction article that awk can be an effective tool for everything from small one-liners up through some interesting applications. There are certainly more complex languages at our disposal if a situation calls for it; perl and python come to mind. Applications requiring networking support, database access, user interfaces, binary data or more extensive library support and complexity are best left to other languages with better support in these areas.

Nevertheless, awk is a superb language for testing algorithms and applications with some complexity, especially where the problem can be broken into chunks which can streamed as part of a pipe. It’s an ideal tool for augmenting the features of shell programming as it is ubiquitous; found in some form on almost all Unix/Linux/BSD systems. Many problems dealing with text, log lines or symbol tables are handily solved or at the very least prototyped with awk along with the other tools found on Unix/Linux systems.

While awk lends itself well to operating on input a line at a time, processing and then sending some output for each line, it may also be used in a more traditional batching-style application where it reads all the input, processes and then sends the processed output onward.

Yet Another Sudoku Puzzle Solver – YASPS for Awk

The application I chose to use as an example is “yet another sudoku puzzle solver”. I must confess at the outset that I have never sat down to solve one of these puzzles myself, but sketched this out over a few days while commuting on a train and watching other people work on them. It was far more fun I think than actually doing any of the puzzles..

Download YASPS Program for Awk: solve.awk

The input format I’ve chosen is one which is easy to parse in awk and is fairly traditional in the Unix environment. Blank lines and those starting with a hash (#) character are ignored making it easy to insert comments. Extra spaces may be used between columns for readability. An example is shown in the following figure:

# I forget where I got this..
# this is one of the hardest ones I've found for this algorithm, although
# after transposing the matrix it can be solved in a fraction of the time

2 0 0  6 7 0  0 0 0
0 0 6  0 0 0  2 0 1
4 0 0  0 0 0  8 0 0

5 0 0  0 0 9  3 0 0
0 3 0  0 0 0  0 5 0
0 0 2  8 0 0  0 0 7

0 0 1  0 0 0  0 0 4
7 0 8  0 0 0  6 0 0
0 0 0  0 5 3  0 0 8

There is almost no error checking in this program, but it would be easy to add in front or as part of a wrapper script. I’ll leave that as an exercise for the reader.

This program uses a very simple depth-first recursive backtracking algorithm with up-front and ongoing elimination of invalid entries. Awk may not have the expressive power for representing complex data that perl or other languages have, but with care, many moderate sized problems and data sets can be used. This algorithm may not be the best one around, but it is certainly fast enough for most problems and is easy to implement.

With any problem, representing the data effectively makes the task of designing a program much easier. I have represented the complete state of the puzzle in a matrix called “master”. This is hardly used for much except keeping a record of what is where and for doing the final output.

The main workhorse variables are three other arrays. Intuitively we know from the recursive trial and error method we’ve chosen that we will need to check the validity of trial numbers quite often. To accelerate that process we maintain associative arrays for storing the current state for the rows, columns and each region (which, although not technically correct, we’ll call a “quadrant”). These are the arrays R, C and Q. (Note that awk is case sensitive.)

Sometimes it helps when trying to factor out common computations from nested for-loops or recursive function calls to pre-compute values which are used often. I had tried this with the “regmap” matrix which would store a quadrant number given the row, column values but I found the time savings in this case to be ambiguous. I’ve left it commented out, but your mileage may vary and the technique is often very useful.

Recursive algorithms are often best designed and therefore described in a top-down manner. The top function in this program is called “search()” and is called from the “END” pattern after the problem data has been read into the arrays.

At a high level, search() starts with the supplied row and column parameters and looks for the next empty space to check. If there aren’t any, the problem has been solved and it returns with the solution. If there is an empty space (represented by zero), it tests available numbers (with the inuse() function, for numbers not in use in that row, column or quadrant) by inserting a number into the arrays using mark() and calling itself recursively. If the recursive search() function returns a zero it means that it has failed, so the mark() function is again called to un-mark the trial number. It then loops around until the possibilities are exhausted or the search() call returns success.

The beauty of many recursive algorithms is the inherent elegance and simplicity of the solutions. While they are sometimes not the fastest algorithms, they are often “fast enough” and easier to design. This program solves most puzzles in less than a few seconds. One thing I noticed while trying this program on different puzzles is that if a problem took a longer period of time to solve (in tens of seconds), simply transposing the matrix would often give the same solution in a fraction of a second. With today’s multi-core CPUs, this suggests one possibility for speeding it up: Write a wrapper script which starts several instances of the program with different transpositions of the matrix. An example is shown with the previous puzzle shown above and the transposed version in the following figure where the transposed problem was solved four times faster.

marion:~/sudoku$ time awk -f solve.awk THE-HARDEST-SO-FAR.dat 

# Searches=134214

  2 8 3  6 7 1  9 4 5
  9 7 6  5 4 8  2 3 1
  4 1 5  3 9 2  8 7 6

  5 6 7  4 1 9  3 8 2
  8 3 4  2 6 7  1 5 9
  1 9 2  8 3 5  4 6 7

  3 2 1  7 8 6  5 9 4
  7 5 8  9 2 4  6 1 3
  6 4 9  1 5 3  7 2 8

real    0m10.009s
user    0m9.889s
sys     0m0.024s

marion:~/sudoku$ time awk -f solve.awk /tmp/transposed.dat

# Searches=32253

  8 3 4  7 9 2  6 1 5
  2 1 9  6 5 8  7 3 4
  7 6 5  4 1 3  8 2 9

  3 4 6  5 7 9  2 8 1
  5 2 8  3 6 1  9 4 7
  1 9 7  8 2 4  3 5 6

  9 8 1  2 4 7  5 6 3
  4 5 2  9 3 6  1 7 8
  6 7 3  1 8 5  4 9 2

real    0m2.487s
user    0m2.456s
sys     0m0.008s

When something even faster is required, it can often be accomplished by translating the algorithm into another language with a more direct representation of the data sets. I did a translation of this program to C once with an interesting twist on the data indexing. This version probably executes a few orders of magnitude faster, largely due to the way that the data is represented. Probably we’ll release “Yet Another Sudoku Puzzle Solver Using C” as an another article later.

I believe that awk deserves a place in everyone’s toolkit. Its simplicity relative to other languages is perhaps seen as a weakness, but I see it as one of its strengths. The language can be learned in an afternoon and used without resorting to reference books for solving many day-to-day problems. I use it on a regular basis straight from the command line, right through to implementing things such as compilers for DSLs (Domain Specific Languages).

Recommended Awk Books

  • The AWK Programming Language by Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The original book by the authors of the language have some excellent examples of moderately complex programs and is still my favorite book on awk. Published by Addison-Wesley, 1988. ISBN 020107981X.
  • More Programming Pearls: Confessions of a Coder by Jon Bentley, AT&T Bell Labs, Murray Hill, NJ. Another book for using awk as a prototyping tool for algorithms may be found in More Pearls. Excellent reading.  Year of Publication: 1988. ISBN:0201118890

Linux Sysadmin Course Linux provides several powerful administrative tools and utilities which will help you to manage your systems effectively. If you don’t know what these tools are and how to use them, you could be spending lot of time trying to perform even the basic administrative tasks. The focus of this course is to help you understand system administration tools, which will help you to become an effective Linux system administrator.
Get the Linux Sysadmin Course Now!

If you enjoyed this article, you might also like..

  1. 50 Linux Sysadmin Tutorials
  2. 50 Most Frequently Used Linux Commands (With Examples)
  3. Top 25 Best Linux Performance Monitoring and Debugging Tools
  4. Mommy, I found it! – 15 Practical Linux Find Command Examples
  5. Linux 101 Hacks 2nd Edition eBook Linux 101 Hacks Book

Bash 101 Hacks Book Sed and Awk 101 Hacks Book Nagios Core 3 Book Vim 101 Hacks Book

{ 11 comments… read them below or add one }

1 Przemek January 13, 2010 at 3:36 am

Nice article!

2 Anonymous January 15, 2010 at 8:56 am

thanks for this – it sparked my interest in learning more about awk. very well-written. keep ‘em coming!

3 bill duncan January 16, 2010 at 11:42 am

Thanks for the feedback! Appreciate the comments..


4 Jim Lemke January 16, 2010 at 10:03 pm

Awk is great for reports but I haven’t tried computing any algorithms with it! Interesting article.

5 CRB January 17, 2010 at 7:32 am

Great article … it’s amazing what can be done with awk in short order. It’s refreshing to see an awk article that incorporates both the text manipulating power of awk and its ability to compute algorithms. Too many awk articles focus just on text file manipulation, nice job!

6 bill duncan January 17, 2010 at 4:45 pm

Thanks everyone for your comments!

I had put a teaser in the article suggesting a better/faster way to index the data in C. Anyone with some ideas on this?

7 Milos January 20, 2010 at 7:03 pm

Exchange line 100 of the original script from:
return 1
to this:
return 0
and change line 143 of the original script from:
to this
# dump()

and voila, you will get all solutions, not just the first one.
You can test it on this:

# multiple solutions
3 0 8 6 0 9 0 1 4
1 0 4 3 0 8 0 6 9
6 0 9 4 0 1 0 3 8

7 4 3 1 8 5 6 9 2
8 6 2 9 4 7 3 5 1
9 1 5 2 3 6 8 4 7

4 8 7 5 9 3 1 2 6
2 3 6 8 1 4 9 7 5
5 9 1 7 6 2 4 8 3

Have a nice evening

8 eMancu January 24, 2010 at 12:40 pm

I cant believe the power of Awk… I was refused to change my grep for awk, but now… I’ll try awk!

9 bill duncan January 24, 2010 at 2:23 pm

Thanks for all your comments. There are some more interesting awk applications coming, so stay tuned!

10 alfredo ricciotti February 6, 2010 at 6:27 am

Thank for the article, it is a nice way to spur awk interest.

11 bill duncan February 6, 2010 at 5:05 pm

Thanks Alfredo,
Mssion accomplished I guess; that’s what I was trying to do.. ;-)
Stay tuned, as there are some follow-up articles coming.

Leave a Comment

Previous post:

Next post: