MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov
ICML 2025 (2025)
