This repository contains the companion code of the paper "Engineering a Lightweight Suffix Array Construction Algorithm" by Giovanni Manzini and Paolo Ferragina, first published in the Proceedings of the European Symposium on Algorithms, 2002 and later, in extended format, in Algorithmica 40:33–50, 2004. The paper won the 2023 ESA Test of Time Award since it was (quoting the award committee): "a significant step towards the efficient solution of an importat practical problem".
The algorithm computes the Suffix Array, and possibly the Burrows-Wheeler Transform, of an input text of size at most
For an in-depth evaluation of the availble suffix array construction algorithms see the paper
SACABench: Benchmarking Suffix Array Construction and the companion repository.
As we stated above, the algorithm is no longer the state of the art, however the code should compile on any linux/unix system. To try it just clone/download and then make
. This should produce the following executable (without any no error or warining):
-
sa32
: computes the suffix array of an input file and store the result using 4 bytes per entry -
ds
: computes the suffix array, and optionally the BWT of an input file and store the result using$\lceil\log n\rceil$ bits per entry. Allows to change all the internal parameters of the algorithm -
bwt
: computes the suffix array and then the BWT saving only the latter. The BWT is stored using one byte x entry following by 4 bytes storing the EOS position -
unbwt
: takes as input a file produced bybwt
and recover the original input file
If for some reason you need to use this algorithm in your project look at the sa32.c
file which shows the simplest usage using the default parameters.
The repository contains also the implementations of the LCP array construction algorithms described in the paper "Two Space Saving Tricks for Linear Time LCP Array Computation" published in the Proceedings of the Scandinavian Workshop on Algorithm Theory, 2004 (files bwt_aux.c
and lcp_aux.c
). The executable testlcp
computes the suffix array using Deep-Shallow suffix sorting and then the LCP array using the algorithms described in the paper (called lcp9
, lcp13
, lcp9125
and lcp6
).