R.I.P.
๐ป
Ghosted
Are Abstracts Enough for Hypothesis Generation?
April 13, 2018 ยท Declared Dead ยท ๐ 2018 IEEE International Conference on Big Data (Big Data)
Authors
Justin Sybrandt, Angelo Carrabba, Alexander Herzog, Ilya Safro
arXiv ID
1804.05942
Category
cs.IR: Information Retrieval
Cross-listed
cs.DL
Citations
25
Venue
2018 IEEE International Conference on Big Data (Big Data)
Repository
https://github.com/jsybran/moliere
Last Checked
1 month ago
Abstract
The potential for automatic hypothesis generation (HG) systems to improve research productivity keeps pace with the growing set of publicly available scientific information. But as data becomes easier to acquire, we must understand the effect different textual data sources have on our resulting hypotheses. Are abstracts enough for HG, or does it need full-text papers? How many papers does an HG system need to make valuable predictions? How sensitive is a general-purpose HG system to hyperparameter values or input quality? What effect does corpus size and document length have on HG results? To answer these questions we train multiple versions of knowledge network-based HG system, Moliere, on varying corpora in order to compare challenges and trade offs in terms of result quality and computational requirements. Moliere generalizes main principles of similar knowledge network-based HG systems and reinforces them with topic modeling components. The corpora include the abstract and full-text versions of PubMed Central, as well as iterative halves of MEDLINE, which allows us to compare the effect document length and count has on the results. We find that, quantitatively, corpora with a higher median document length result in marginally higher quality results, yet require substantially longer to process. However, qualitatively, full-length papers introduce a significant number of intruder terms to the resulting topics, which decreases human interpretability. Additionally, we find that the effect of document length is greater than that of document count, even if both sets contain only paper abstracts. Reproducibility: Our code and data are available at github.com/jsybran/moliere, and bit.ly/2GxghpM respectively.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Information Retrieval
R.I.P.
๐ป
Ghosted
LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
R.I.P.
๐ป
Ghosted
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
๐
๐
Old Age
Neural Graph Collaborative Filtering
R.I.P.
๐ป
Ghosted
Self-Attentive Sequential Recommendation
R.I.P.
๐ป
Ghosted
DeepFM: A Factorization-Machine based Neural Network for CTR Prediction
Died the same way โ ๐ 404 Not Found
R.I.P.
๐
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
๐
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
๐
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
๐
404 Not Found