SparkGOR: A unified framework for genomic data analysis

August 31, 2020 · Declared Dead · 🏛 arXiv.org

💀 CAUSE OF DEATH: 404 Not Found
Code link is broken/dead
Authors: Sigmar K. Stefánsson, Hákon Guðbjartsson
arXiv ID: 2009.00061
Category: cs.DB (Databases)
Citations: 1
Venue: arXiv.org
Repository: https://github.com/gorpipe
Last Checked: 2 months ago
Abstract
Motivation: Our goal was to combine the capabilities of Spark and GOR into a single computing framework for use in the analysis of large-scale genome data.

Results: We have created a relational query engine that unites SparkSQL and GORpipe into a single declarative query framework. This has been achieved by allowing embedding of SQL expressions into the high-level relational statement syntax in GOR and by supporting virtual relations and nested GORpipe expressions within SQL. Furthermore, we have built drivers that let Spark and GOR each use their preferred file format, Parquet and GORZ respectively, and introduced APIs to allow the use of GOR with Spark dataframes.

Availability: The SparkGOR version of the GORpipe software is open-source and freely available at https://gorpipe-website.now.sh and https://github.com/gorpipe.

📜 Similar Papers

In the same crypt - Databases

R.I.P. 👻 Ghosted

Datasheets for Datasets

Timnit Gebru, Jamie Morgenstern, ... (+5 more)

cs.DB · 🏛 CACM · 📚 2.6K cites · 8 years ago

Died the same way - 💀 404 Not Found