Document Spanners for Extracting Incomplete Information: Expressiveness and Complexity

July 04, 2017 ยท Declared Dead ยท ๐Ÿ› ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

๐Ÿ‘ป CAUSE OF DEATH: Ghosted
No code link whatsoever

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Francisco Maturana, Cristian Riveros, Domagoj Vrgoฤ arXiv ID 1707.00827 Category cs.DB: Databases Citations 44 Venue ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems Last Checked 3 months ago
Abstract
Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with semistructured data, all language proposals introduced so far are designed to output relations, thus making them incapable of handling incomplete information. To remedy the situation, we propose to extend information extraction languages with the ability to use mappings, thus allowing us to work with documents which have missing or optional parts. Using this approach, we simplify the semantics of regex formulas and extraction rules, two previously defined methods for extracting information, extend them with the ability to handle incomplete data, and study how they compare in terms of expressive power. We also study computational properties of these languages, focusing on the query enumeration problem, as well as satisfiability and containment.
Community shame:
Not yet rated
Community Contributions

Found the code? Know the venue? Think something is wrong? Let us know!

๐Ÿ“œ Similar Papers

In the same crypt โ€” Databases

R.I.P. ๐Ÿ‘ป Ghosted

Datasheets for Datasets

Timnit Gebru, Jamie Morgenstern, ... (+5 more)

cs.DB ๐Ÿ› CACM ๐Ÿ“š 2.6K cites 8 years ago

Died the same way โ€” ๐Ÿ‘ป Ghosted