Linear-size CDAWG: new repetition-aware indexing and grammar compression

May 27, 2017 Β· Declared Dead Β· πŸ› SPIRE

πŸ‘» CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Takuya Takagi, Keisuke Goto, Yuta Fujishige, Shunsuke Inenaga, Hiroki Arimura arXiv ID 1705.09779 Category cs.DS: Data Structures & Algorithms Citations 23 Venue SPIRE Last Checked 3 months ago
Abstract
In this paper, we propose a novel approach to combine \emph{compact directed acyclic word graphs} (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with $O(\tilde e_T \log n)$ bits of space allowing for $O(\log n)$-time random and $O(1)$-time sequential accesses to edge labels, and $O(m \log Οƒ+ occ)$-time pattern matching. Here, $\tilde e_T$ is the number of all extensions of maximal repeats in $T$, $n$ and $m$ are respectively the lengths of the text $T$ and a given pattern, $Οƒ$ is the alphabet size, and $occ$ is the number of occurrences of the pattern in $T$. The repetitiveness measure $\tilde e_T$ is known to be much smaller than the text length $n$ for highly repetitive text. For constant alphabets, our L-CDAWGs achieve $O(m + occ)$ pattern matching time with $O(e_T^r \log n)$ bits of space, which improves the pattern matching time of Belazzougui et al.'s run-length BWT-CDAWGs by a factor of $\log \log n$, with the same space complexity. Here, $e_T^r$ is the number of right extensions of maximal repeats in $T$. As a byproduct, our result gives a way of constructing an SLP of size $O(\tilde e_T)$ for a given text $T$ in $O(n + \tilde e_T \log Οƒ)$ time.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

πŸ“œ Similar Papers

In the same crypt β€” Data Structures & Algorithms

Died the same way β€” πŸ‘» Ghosted