Integrated Programmable-Array accelerator to design heterogeneous ultra-low power manycore architectures

semanticscholar(2022)

Abstract
Context: Today's technological advances make it possible to produce very complex multi-core architectures containing hundreds of processors. However, programming infrastructure is not evolving at the pace demanded by these advances and by market pressure. New techniques, architectures and tools are still needed to help designers implement complex applications efficiently on sophisticated platforms and make full use of the underlying hardware. Moreover, to reconcile ever-increasing performance requirements with an extremely tight energy budget, systems are moving towards heterogeneous architectures as the main design paradigm, and designers rely on hardware accelerators. The objectives of this work are: 1) to explore heterogeneous many-core architectures integrating reconfigurable hardware accelerators [5]; 2) to develop associated programming models that address the growing complexity of application development. The proposed approach will allow programmers to easily deploy applications on dynamically reconfigurable heterogeneous many-core architectures. To this end, an OpenMP-based programming model and a many-core architecture model integrating a CGRA (Coarse-Grained Reconfigurable Array) [1] [2] [3] [4] [5] will be defined. An automated design flow and a HW/SW module for dynamic reconfiguration of the CGRA will be provided. The results will be validated on a virtual platform and on a hardware prototype, using signal processing and image processing applications. The main building block of the targeted many-core architectures is a multi-core cluster built around tightly coupled shared memories. Notable examples are the STHORM architecture from ST, Kalray MPPA, Plurality HAL, Adapteva Epiphany, and GPUs such as Nvidia's Fermi. This type of cluster combines low-latency communication and high bandwidth among a limited number of cores (typically 16).
Replicating clusters and interconnecting them hierarchically through a network-on-chip makes it possible to scale to a large number of cores. For example, the STHORM architecture has 4 clusters and 69 processors, and the Kalray MPPA architecture has 16 clusters and 256 processors. The benefits of hardware acceleration of critical kernels for a given application domain are well known. It is therefore necessary to study how these clusters can evolve towards heterogeneity. The two key elements of the proposed work are the integration of a programmable accelerator coupled to the multi-core cluster and a shared-memory communication scheme. Several years of multi-core programming have produced many parallel applications based on abstract, standard and portable programming models (e.g. OpenMP, OpenCL). It is therefore important to investigate new approaches to hardware acceleration that are consistent with multi-core programming models. From a programming point of view, just as "threads" are a good abstraction of the processor in most parallel programming models, the hardware accelerators here will be abstracted as hardware tasks, an important step towards simplifying application development for multi-core architectures with hardware accelerators. A third key element in the novelty of the proposed approach is the integration of reconfigurable accelerators into the tightly coupled shared-memory cluster. Using a CGRA will make it possible to reconfigure an accelerator to meet the needs of an application, combining several hardware tasks at a low reconfiguration cost.
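As a rough illustration of the "hardware task" abstraction described above, the following C sketch treats each accelerator kernel as a task launched through standard OpenMP task constructs, with data exchanged through a shared buffer. It is a minimal sketch under stated assumptions, not the programming model defined in this work: the `hw_task_t` descriptor, the `run_tasks` helper and the two kernels are hypothetical names, and the kernels are plain software stand-ins for what would be CGRA configurations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: reads n samples from in, writes n to out. */
typedef void (*hw_kernel_fn)(const int *in, int *out, size_t n);

/* Hypothetical "hardware task" descriptor (illustrative, not from the paper). */
typedef struct {
    const char  *name;    /* label standing in for a CGRA configuration */
    hw_kernel_fn kernel;  /* software stand-in for the accelerated kernel */
} hw_task_t;

/* Stand-in kernel: multiply every sample by 2. */
static void scale2(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = 2 * in[i];
}

/* Stand-in kernel: add 1 to every sample. */
static void add1(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] + 1;
}

/* Run two hardware tasks on disjoint halves of a shared buffer.
 * With OpenMP enabled, each kernel becomes an OpenMP task; without it,
 * the pragmas are ignored and the kernels run one after the other. */
static void run_tasks(const hw_task_t *a, const hw_task_t *b,
                      const int *in, int *out, size_t n) {
    assert(a && b && in && out);
    size_t half = n / 2;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task
        a->kernel(in, out, half);
        #pragma omp task
        b->kernel(in + half, out + half, n - half);
        #pragma omp taskwait
    }
}
```

Calling `run_tasks` with descriptors for `scale2` and `add1` on a four-element buffer processes the first two elements with one kernel and the last two with the other. In a real system the function pointer would presumably be replaced by a request to load a CGRA configuration, but that mechanism is not specified in the abstract.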