R.I.P.
Ghosted
Designing an Evaluation Framework for Large Language Models in Astronomy Research
May 30, 2024 · Entered Twilight · arXiv.org
Repo contents: .gitignore, LICENSE, README.md, pyproject.toml, requirements.txt, src
Authors
John F. Wu, Alina Hyk, Kiera McCormick, Christine Ye, Simone Astarita, Elina Baral, Jo Ciuca, Jesse Cranney, Anjalie Field, Kartheik Iyer, Philipp Koehn, Jenn Kotler, Sandor Kruk, Michelle Ntampaka, Charles O'Neill, Joshua E. G. Peek, Sanjib Sharma, Mikaeel Yunus
arXiv ID
2405.20389
Category
astro-ph.IM
Cross-listed
cs.AI, cs.HC, cs.IR
Citations
3
Venue
arXiv.org
Repository
https://github.com/jsalt2024-evaluating-llms-for-astronomy/astro-arxiv-bot
⭐ 2
Last Checked
1 month ago
Abstract
Large Language Models (LLMs) are shifting how scientific research is done. It is imperative to understand how researchers interact with these models and how scientific sub-communities like astronomy might benefit from them. However, there is currently no standard for evaluating the use of LLMs in astronomy. Therefore, we present the experimental design for an evaluation study on how astronomy researchers interact with LLMs. We deploy a Slack chatbot that can answer queries from users via Retrieval-Augmented Generation (RAG); these responses are grounded in astronomy papers from arXiv. We record and anonymize user questions and chatbot answers, user upvotes and downvotes to LLM responses, user feedback to the LLM, and retrieved documents and similarity scores with the query. Our data collection method will enable future dynamic evaluations of LLM tools for astronomy.
Similar Papers
In the same crypt: astro-ph.IM
R.I.P.
Ghosted
Deep Neural Networks to Enable Real-time Multimessenger Astrophysics
Old Age
Star-galaxy Classification Using Deep Convolutional Neural Networks
R.I.P.
Ghosted
CosmoGAN: creating high-fidelity weak lensing convergence maps using Generative Adversarial Networks
R.I.P.
Ghosted
Non-negative Matrix Factorization: Robust Extraction of Extended Structures
R.I.P.
404 Not Found