BackdoorAlign: Mitigating Fine-tuning Based Jailbreak Attack with Backdoor Enhanced Safety Alignment
NeurIPS 2024
Keywords
Fine-tuning based Jailbreak Attack, Backdoor Attack, Safety Alignment for Large Language Models