R.I.P.
👻
Ghosted
Biological Reasoning-Informed Regression for Interpretable Regulatory DNA Activity Prediction
June 06, 2026 · Grace Period · KDD 2026 AI4Sciences Track
Authors
Yi Duan, Zhao Yang, Jiwei Zhu, Ying Ba, Chuan Cao, Bing Su
arXiv ID
2606.08147
Category
q-bio.GN
Cross-listed
cs.LG
Citations
0
Venue
KDD 2026 AI4Sciences Track
Abstract
DNA cis-regulatory elements (CREs) such as enhancers control gene expression levels. Accurately predicting regulatory activity from DNA sequences is valuable but challenging, as it requires understanding complex biological regulatory processes. Existing methods typically regress activity scores from sequences in a black-box manner, limiting both interpretability and regression performance. Meanwhile, large language models (LLMs) benefit from explicit reasoning processes, yet directly applying LLMs to raw DNA sequences performs poorly. In this paper, we bridge this gap by introducing R3LM, a framework that teaches LLMs reasoning-informed regression on regulatory DNA through structured biological knowledge. Specifically, we design a biologically grounded data format that structures DNA's regulatory information for improved LLM understanding, and construct CRE-ReasonBench, the first dataset that associates DNA sequences and activity scores with mechanistic reasoning traces. Through two-stage training that first teaches LLMs to reason over structured biological information and then performs regression, R3LM achieves state-of-the-art performance on enhancer prediction across three cell types, outperforming both LLMs with raw sequence input and specialized DNA models while providing interpretable mechanistic explanations. We expect R3LM to serve as an interpretable reward model that can effectively assist biologists in CRE design. Code is available at https://github.com/DuanYi516/R3LM.
Similar Papers
In the same crypt - q-bio.GN
R.I.P.
👻
Ghosted
Accurate Genomic Prediction Of Human Height
R.I.P.
👻
Ghosted
Synergistic Drug Combination Prediction by Integrating Multi-omics Data in Deep Learning Models
Old Age
GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping
R.I.P.
👻
Ghosted
Tasks, Techniques, and Tools for Genomic Data Visualization
Old Age