Skip to content

Linear Bounded Automata enumeration program

Pietro Ribeiro Pepe edited this page Jun 26, 2024 · 7 revisions

The enumeration program was written in C++.

Dependencies

It uses libtbb and C++17 for_each for parallelization.

To install libtbb:

sudo apt install libtbb-dev

The program can be compiled with g++:

g++ -O3 -std=c++17 -o [EXECUTABLE_FILE] [PATH_TO_CPP] -ltbb

Usage

The program contains a series of define directives. The reason is that we optimized the simulation procedured using BITSET, that require to have its size known at compilation time. So if you desire to change the number of states, or tape size, you will need to adjust in the source file, and recompile it. These directives are:

  • N_STATES: number of states for LBA
  • OUTPUT_CAP: maximum tape size to consider (this means only strings with up to this size will be produced)
  • STATE_BITS: amount of bits required to codify N_STATES (should BE ceiling of log base 2 of N_STATES)
  • POS_BITS: amount of bits required to codify OUTPUT_CAP (should BE ceiling of log base 2 of OUTPUT_CAP)

Once it was compiled, as "LBA_ENUM", for instance, it can be executed with:

./LBA_ENUM

Root transitions will be enumerated and paralellized across CPU cores. Each parallel call will generate a file with its partial results. The program creates a dist directory in the same path where the executable resides, and the partial results files will be within.

You can also run in sharded mode, simply by providing an integer argument:

./LBA_ENUM [SHARD_NUMBER]

For more information on the root transitions and sharding, see the sections below

On enumeration, parallelization, and sharding

Each pair of (state, symbol) will output a transition (state, symbol, direction), (where direction is direction to move tape head). We recursively go through each of these pairs and enumerate their possible transitions.

If we have $n$ states and $2$ symbols, each pair will have $B(n)$ choices: $$B(n) = 4n+2$$ This also means $B(n)$ is the branching factor of the enumeration tree (See ref)

Parallelization

We list the possible transitions for (n, 1), and send each of them to a CPU core, where it does the recursion starting from (n, 0) down to (0, 0). Each of these initial transition calls will output a file with the information of strings produced, in the dist folder.

Sharding

Sharding here means "breaking the whole enumeration in shards: sub-enumeration".

Instead of parallelizing over all $B(n)$ possible initial transitions, we go a further depth: select one of these transitions, and paralellize over all $B(n)$ possible second transitions. Naturally the program require as argument the index of the shard, that should be in $[0,B(n)-1]$.

This allow for smaller subtasks that we can run separately and even distribute among different computers.

For instance, consider that the $5$ states the whole computation would take days. Since $B(5) = 22$, we have 22 shards, and can compute each of them separately, in a different program call, and they would take a few hours ($\frac{1}{22}$ of the time).

Each shard outputs a collection of files (one for each transition), they will be in directory dist/shard[i]/

This sharding can be extend further so we can call two levels depth (so $22^2$ shards and then each one would take $\frac{1}{22^2}$ of the time). This would require some adjustments and can be relevant when attempting to compute more states.

More technical/conceptual information

To deeper understand the foundation of the method, refer to this document

Clone this wiki locally