Assessing LLMs for High Stakes Applications

Shannon K. Gallagher, Jasmine Ratchford, Tyler Brooks, Bryan P. Brown, Eric Heim, William R. Nichols, Scott McMillan, Swati Rallapalli, Carol J. Smith, Nathan M. VanHoudnos, Nick Winski, Andrew O. Mellinger

International Conference on Software Engineering (2024)

Abstract
Large Language Models (LLMs) promise strategic benefit for numerous application domains. The current state-of-the-art in LLMs, however, lacks the trust, security, and reliability needed for high stakes applications. To address this, our work investigated the challenges of developing, deploying, and assessing LLMs within a specific high stakes application: intelligence reporting workflows. We identified the following challenges that need to be addressed before LLMs can be used in high stakes applications: (1) challenges with unverified data and data leakage, (2) challenges with fine-tuning and inference at scale, and (3) challenges in reproducibility and assessment of LLMs. We argue that researchers should prioritize test and assessment metrics, as better metrics will yield insights that further improve these LLMs.
Keywords
Large language models, TEVV, metrics, scaling, HCI, trust