Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
arXiv (2024)
Abstract
Researchers have been studying approaches to steer the behavior of Large
Language Models (LLMs) and build personalized LLMs tailored for various
applications. While fine-tuning seems to be a direct solution, it requires
substantial computational resources and may significantly affect the utility of
the original LLM. Recent endeavors have introduced more lightweight strategies,
focusing on extracting "steering vectors" to guide the model's output toward
desired behaviors by adjusting activations within specific layers of the LLM's
transformer architecture. However, such steering vectors are directly extracted
from the activations of human preference data and thus often lead to suboptimal
results and occasional failures, especially in alignment-related scenarios.
This work proposes an innovative approach that could produce more effective
steering vectors through bi-directional preference optimization. Our method is
designed to allow steering vectors to directly influence the generation
probability of contrastive human preference data pairs, thereby offering a more
precise representation of the target behavior. By carefully adjusting the
direction and magnitude of the steering vector, we enable personalized control
over the desired behavior across a spectrum of intensities. Extensive
experimentation across various open-ended generation tasks, particularly
focusing on steering AI personas, has validated the efficacy of our approach.
Moreover, we comprehensively investigate critical alignment-concerning
scenarios, such as managing truthfulness, mitigating hallucination, and
addressing jailbreaking attacks. Remarkably, our method demonstrates
outstanding steering effectiveness across these scenarios as well. Furthermore, we
showcase the transferability of our steering vectors across different
models/LoRAs and highlight the synergistic benefits of applying multiple
vectors simultaneously.
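The activation-steering idea the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the array shapes, the `apply_steering` function name, and the signed multiplier `alpha` are all assumptions chosen for the example. The core idea is that a steering vector v is added to a layer's hidden activations, and the sign and magnitude of its coefficient control the direction and intensity of the behavioral shift.

```python
import numpy as np

def apply_steering(h, v, alpha):
    """Shift hidden activations h (seq_len x d_model) along the steering
    vector v (d_model,) with signed strength alpha."""
    return h + alpha * v

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))      # toy activations: 4 tokens, d_model = 8
v = rng.normal(size=8)
v = v / np.linalg.norm(v)        # unit-norm steering direction

steered_pos = apply_steering(h, v, alpha=2.0)    # steer toward the behavior
steered_neg = apply_steering(h, v, alpha=-2.0)   # steer away (bi-directional)

# Each token's shift projected onto v equals alpha, since v has unit norm.
print(np.allclose((steered_pos - h) @ v, 2.0))   # True
```

In practice such a vector would be added inside a chosen transformer layer during the forward pass (e.g. via a hook); the sketch above only shows the arithmetic of the signed, scaled shift.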