Old Age
A Comprehensive Evaluation of Large Language Models on Legal Judgment Prediction
October 18, 2023 · Entered Twilight · Conference on Empirical Methods in Natural Language Processing
Repo contents: .gitignore, compress.sh, config, download_data.sh, lm_eval, main.py, readme.md, requirements.txt, resources, run.sh, scripts, tests
Authors
Ruihao Shui, Yixin Cao, Xiang Wang, Tat-Seng Chua
arXiv ID
2310.11761
Category
cs.CL: Computation & Language
Citations
48
Venue
Conference on Empirical Methods in Natural Language Processing
Repository
https://github.com/srhthu/LM-CompEval-Legal
⭐ 12
Last Checked
1 month ago
Abstract
Large language models (LLMs) have demonstrated great potential for domain-specific applications, such as the legal domain. However, recent disputes over GPT-4's law evaluation raise questions about their performance in real-world legal tasks. To systematically investigate their competency in the law, we design practical baseline solutions based on LLMs and test them on the task of legal judgment prediction. In our solutions, LLMs can work alone to answer open questions or coordinate with an information retrieval (IR) system to learn from similar cases or solve simplified multi-choice questions. We show that similar cases and multi-choice options, namely label candidates, included in prompts can help LLMs recall domain knowledge that is critical for expert legal reasoning. We additionally present an intriguing paradox wherein an IR system alone surpasses the performance of LLM+IR, because weaker LLMs gain little from a powerful IR system; in such cases, the role of the LLM becomes redundant. Our evaluation pipeline can be easily extended to other tasks to facilitate evaluations in other domains. Code is available at https://github.com/srhthu/LM-CompEval-Legal
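The abstract describes three prompting settings for legal judgment prediction: a free-form open question, prompts augmented with similar cases retrieved by an IR system, and simplified multi-choice prompts built from label candidates. Below is a minimal Python sketch of how such prompts could be assembled; every name in it (Case, build_prompt, predict, the stub LLM) is an illustrative assumption, not the actual API of the linked repository.

# Minimal sketch of the prompting setups described in the abstract.
# All identifiers below are illustrative assumptions, not the repository's API.

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    facts: str   # fact description of a legal case
    charge: str  # gold judgment label, e.g. the charge name


def build_prompt(
    query_facts: str,
    similar_cases: Optional[List[Case]] = None,
    label_candidates: Optional[List[str]] = None,
) -> str:
    """Compose a judgment-prediction prompt in one of the settings the
    paper describes: open question, +similar cases retrieved by an IR
    system, and/or a simplified multi-choice question."""
    parts = []
    if similar_cases:  # demonstrations retrieved by the IR system
        for case in similar_cases:
            parts.append(f"Case facts: {case.facts}\nCharge: {case.charge}")
    parts.append(f"Case facts: {query_facts}")
    if label_candidates:  # multi-choice options (label candidates)
        parts.append("Choose the charge from: " + "; ".join(label_candidates))
    parts.append("Charge:")
    return "\n\n".join(parts)


def predict(llm: Callable[[str], str], prompt: str) -> str:
    """Query any text-in/text-out LLM callable and return its answer."""
    return llm(prompt).strip()


if __name__ == "__main__":
    # Toy usage with a stub "LLM" so the sketch runs end to end.
    stub_llm = lambda prompt: "theft"
    demos = [Case(facts="The defendant took a bicycle ...", charge="theft")]
    prompt = build_prompt(
        "The defendant entered a shop and left without paying ...",
        similar_cases=demos,
        label_candidates=["theft", "fraud", "robbery"],
    )
    print(predict(stub_llm, prompt))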
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
Similar Papers
In the same crypt · Computation & Language
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding · R.I.P. · 👻 Ghosted
Language Models are Few-Shot Learners · R.I.P. · 👻 Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach · R.I.P. · 👻 Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension · R.I.P. · 👻 Ghosted