Self-tuning Intel Restricted Transactional Memory

Parallel Computing(2015)

引用 14|浏览48
暂无评分
摘要
Transactional Memory is now supported in hardware in Intel processors (called RTM).The performance of RTM varies greatly depending on the workload and run-time usage.We provide a solution by relying on lightweight reinforcement learning techniques.We integrate our solution in GCC compiler, achieving transparency to programmers.We obtain quasi-optimal performance whereas static approaches are up to 10 × worse. The Transactional Memory (TM) paradigm aims at simplifying the development of concurrent applications by means of the familiar abstraction of atomic transaction. After a decade of intense research, hardware implementations of TM have recently entered the domain of mainstream computing thanks to Intel's decision to integrate TM support, codenamed RTM (Reduced Transactional Memory), in their last generation of processors.In this work we shed light on a relevant issue with great impact on the performance of Intel's RTM: the correct tuning of the logic that regulates how to cope with failed hardware transactions. We show that the optimal tuning of this policy is strongly workload dependent, and that the relative difference in performance among the various possible configurations can be remarkable (up to 10 × slow-downs).We address this issue by introducing a simple and effective approach that aims to identify the optimal RTM configuration at run-time via lightweight reinforcement learning techniques. The proposed technique requires no off-line sampling of the application, and can be applied to optimize both the cases in which a single global lock or a software TM implementation is used as fall-back synchronization mechanism.We propose and evaluate different designs for the proposed self-tuning mechanisms, which we integrated with GCC in order to achieve full transparency for the programmers. Our experimental study, based on standard TM benchmarks, demonstrates average gains of 60% over any static approach while remaining within 5% from the performance of manually identified optimal configurations.
更多
查看译文
关键词
Hardware Transactional Memory,Performance,Self-tuning
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要