Pyramid Dynamic Inference: Encouraging Faster Inference Via Early Exit Boosting

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)(2023)

Abstract
Transformer-based models achieve state-of-the-art results on several natural language understanding tasks. However, their deployment comes at the cost of a larger footprint and higher inference latency, limiting their adoption in real-time applications. Early exit strategies speed up inference by routing a subset of samples out at the earlier layers of the model; exiting early, however, reduces model accuracy. To optimize the trade-off between accuracy and latency, we propose Pyramid Dynamic Inference (PDI), a scheme that encourages fast inference by boosting the performance of early exit heads. PDI enables more confident early inference by injecting stronger classifiers at earlier layers, and it avoids a significant increase in model footprint by gradually shrinking the classifiers as the semantic capacity of the deeper transformer layers increases. Experimental results show that PDI outperforms the baselines on both accuracy and latency on the GLUE benchmark.
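The idea described above can be sketched in code. The following is a minimal, hypothetical illustration of pyramid-shaped early exiting, not the paper's implementation: each layer has an exit head, earlier layers get larger (MLP) heads while deeper layers get plain linear heads, and inference stops at the first head whose softmax confidence clears a threshold. All sizes, weights, and the threshold are assumed for illustration.

```python
import math
import random

random.seed(0)

NUM_LAYERS = 4
HIDDEN = 8
NUM_CLASSES = 3
THRESHOLD = 0.9  # assumed confidence threshold, not from the paper

# Pyramid of exit heads: earlier layers get wider 2-layer MLP heads,
# deeper layers shrink to a plain linear head (0 = linear) -- sizes are made up.
HEAD_HIDDEN = [16, 8, 4, 0]

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(z):
    mx = max(z)
    exps = [math.exp(x - mx) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

# Random stand-in weights for the transformer layers and the exit heads.
layers = [rand_matrix(HIDDEN, HIDDEN) for _ in range(NUM_LAYERS)]
heads = []
for h in HEAD_HIDDEN:
    if h > 0:
        heads.append((rand_matrix(h, HIDDEN), rand_matrix(NUM_CLASSES, h)))
    else:
        heads.append((None, rand_matrix(NUM_CLASSES, HIDDEN)))

def head_logits(head, x):
    w1, w2 = head
    if w1 is not None:
        x = [max(0.0, v) for v in matvec(w1, x)]  # ReLU hidden layer of the MLP head
    return matvec(w2, x)

def pdi_infer(x):
    """Run layers in order; exit at the first head that is confident enough."""
    for i, layer in enumerate(layers):
        x = [math.tanh(v) for v in matvec(layer, x)]
        probs = softmax(head_logits(heads[i], x))
        if max(probs) >= THRESHOLD or i == NUM_LAYERS - 1:
            return i, probs.index(max(probs))

exit_layer, label = pdi_infer([random.gauss(0, 1) for _ in range(HIDDEN)])
print(exit_layer, label)
```

Easy samples clear the threshold at a cheap early head and skip the remaining layers; hard samples fall through to the final layer, which always returns a prediction.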
Keywords
Pyramid Dynamic Inference, early exit, transformer-based models, inference latency, model footprint, natural language understanding