Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
arxiv(2024)
Abstract
Large language model systems face important security risks from maliciously
crafted messages that aim to overwrite the system's original instructions or
leak private data. To study this problem, we organized a capture-the-flag
competition at IEEE SaTML 2024, where the flag is a secret string in the LLM
system prompt. The competition was organized in two phases. In the first phase,
teams developed defenses to prevent the model from leaking the secret. During
the second phase, teams were challenged to extract the secrets hidden for
defenses proposed by the other teams. This report summarizes the main insights
from the competition. Notably, we found that all defenses were bypassed at
least once, highlighting the difficulty of designing a successful defense and
the necessity for additional research to protect LLM systems. To foster future
research in this direction, we compiled a dataset with over 137k multi-turn
attack chats and open-sourced the platform.
MoreTranslated text
AI Read Science
Must-Reading Tree
Example
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined