Problem 6.3 Optimize the Sudoku Solver

The basic recursive Sudoku solver checks available numbers for each empty cell by iterating over all values from \( 1 \) to \( 9 \) and validating them against the row, column, and 3×3 sub-grid. This process can be optimized using the FastSet class to store and manipulate allowed numbers efficiently.

How to use FastSet?

Instead of looping over all possible numbers for a given cell, we can maintain three arrays of FastSet objects: row[9], col[9], and box[9], where each array contains 9 FastSet instances. Here, row[i] tracks numbers used in row i, col[j] tracks numbers used in column j, and box[b] tracks numbers used in the 3×3 sub-grid indexed by b. Each FastSet efficiently stores which numbers (1-9) are already placed in its corresponding region.

At first glance, this seems like a lot of extra storage, but we are using only 243 bits, which means 31 bytes, or in other words, fewer than 8 integers!

Operations

Instead of iterating through numbers 1 to 9, searching for a valid one by performing separate validity checks, we can do this quickly using FastSet.

Finding available numbers quickly in a single operation

To determine the valid numbers for an empty cell at \( (i, j) \), we compute: \[ \text{allowed} = \sim(\text{row}[i] \cup \text{col}[j] \cup \text{box}[b]) \] using bitwise operations. This represents all numbers that are still available for placement.

Extracting the smallest of the available numbers in a single operation

Instead of iterating through numbers 1 to 9, we can extract the smallest available number efficiently using a bitwise operation, such as counting the position of the lowest set bit. To do this most efficiently, see how the built-in function __builtin_ctz() works. (Note: __builtin_ctz() is a compiler built-in function which runs in O(1) time because it is mapped directly to a single CPU instruction on most modern processors.)

Placement and removal

When placing a number, update the corresponding FastSet objects using insert(). When backtracking, remove the number using remove().

Overall benefits

Using FastSet allows us to:

Reduce unnecessary iterations over digits 1-9 and comparisons across rows, columns, and grids.
Improve cache efficiency with compact storage.
Optimize code by using bitwise operators, which avoid branching (unlike comparison operators), preventing CPU stalls and minimizing pipeline flushes at the assembly level.

This results in a significantly faster recursive solver without changing the fundamental backtracking logic.