L2 Cache Performance Analysis and Optimizations for Processing HDF5 Data on Multi-core Nodes

Parallel and Distributed Processing with Applications (2012)

Abstract
It is important to design and develop scientific middleware libraries that harness the opportunities presented by the emerging multi-core processors available in grid and cloud environments. Scientific middleware libraries that do not adhere or adapt to this programming paradigm can suffer severe performance limitations when executing on emerging multi-core processors. In this paper, we focus on the utilization of a critical shared resource on chip multiprocessors (CMPs): the L2 cache. The way an application schedules and assigns processing work to each thread determines the access pattern of the shared L2 cache, which may either amplify or mitigate the effects of memory latency on a multi-core processor. Therefore, while processing scientific datasets such as HDF5, it is essential to conduct fine-grained analysis of cache utilization in order to make informed processing and scheduling decisions in multi-threaded programming. Using the TAU toolkit for performance feedback from dual- and quad-core machines, we analyze and recommend methods for effective scheduling of threads on multi-core nodes to improve the performance of scientific applications processing HDF5 data. We discuss the benefits of L2 Cache-Affinity and L2 Balanced-Set based scheduling algorithms for improving L2 cache performance and, in effect, overall execution time.
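The paper's L2 Cache-Affinity and L2 Balanced-Set scheduling algorithms are not reproduced here; the C sketch below is only a minimal illustration of the cache-affinity idea the abstract describes: each worker thread is pinned to a fixed core and assigned a contiguous row block of a 2-D HDF5 dataset, so the data a thread reads tends to stay resident in the L2 cache of the core it runs on. The file name data.h5, the dataset path /dset, the four-thread core assignment, and the summation placeholder are all illustrative assumptions, and the sketch assumes Linux pthread affinity calls and an HDF5 library built with thread safety enabled.

/*
 * Hypothetical sketch (not the paper's code): L2 cache-affinity scheduling
 * of HDF5 processing.  Each worker thread is pinned to one core and reads
 * and reduces its own contiguous row block of a 2-D dataset, so the block
 * it touches tends to stay resident in the L2 cache of the core it runs on.
 * Assumes Linux, pthreads, an HDF5 library built with --enable-threadsafe
 * (its global lock serializes concurrent H5* calls), and a file "data.h5"
 * holding a double dataset "/dset".
 * Build: gcc -O2 sketch.c -lhdf5 -lpthread -o sketch
 */
#define _GNU_SOURCE
#include <hdf5.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4                 /* e.g. one thread per core of a quad-core node */

typedef struct {
    int     core;                  /* core this thread is pinned to           */
    hsize_t row_start, row_count;  /* row block of the dataset it processes   */
    hsize_t cols;
    double  partial_sum;           /* stand-in for the real per-block work    */
} task_t;

static void *worker(void *arg)
{
    task_t *t = (task_t *)arg;

    /* Pin the thread so its working set stays in this core's L2 cache. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(t->core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Read only this thread's hyperslab of the dataset. */
    hid_t file   = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "/dset", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);
    hsize_t start[2] = { t->row_start, 0 };
    hsize_t count[2] = { t->row_count, t->cols };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    hid_t   mspace = H5Screate_simple(2, count, NULL);
    double *buf    = malloc(count[0] * count[1] * sizeof(double));
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);

    /* Placeholder computation: reduce the block while it is still cache-hot. */
    double sum = 0.0;
    for (hsize_t i = 0; i < count[0] * count[1]; i++)
        sum += buf[i];
    t->partial_sum = sum;

    free(buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return NULL;
}

int main(void)
{
    /* Query the dataset extent once, then split the rows across the threads. */
    hid_t file  = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset  = H5Dopen2(file, "/dset", H5P_DEFAULT);
    hid_t space = H5Dget_space(dset);
    hsize_t dims[2];
    H5Sget_simple_extent_dims(space, dims, NULL);
    H5Sclose(space);
    H5Dclose(dset);
    H5Fclose(file);

    pthread_t tid[NTHREADS];
    task_t    task[NTHREADS];
    hsize_t   rows_per_thread = dims[0] / NTHREADS;

    for (int i = 0; i < NTHREADS; i++) {
        task[i].core      = i;     /* static thread-to-core assignment */
        task[i].row_start = (hsize_t)i * rows_per_thread;
        task[i].row_count = (i == NTHREADS - 1) ? dims[0] - task[i].row_start
                                                : rows_per_thread;
        task[i].cols      = dims[1];
        pthread_create(&tid[i], NULL, worker, &task[i]);
    }

    double total = 0.0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(tid[i], NULL);
        total += task[i].partial_sum;
    }
    printf("total = %f\n", total);
    return 0;
}

A Balanced-Set variant would, roughly speaking, size each thread's block so that the combined working set of the threads sharing an L2 cache fits within that cache; the static NTHREADS, the even row split, and the sequential core numbering above are simplifications for illustration only.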
Key words
L2 cache-affinity, multi-core nodes, L2 cache performance, processing HDF5 data, L2 cache performance analysis, scientific applications, L2 balanced-set, cache utilization, performance feedback, L2 cache, scientific middleware libraries, multi-core processors, multi-threaded programming, programming, HDF5, programming paradigm, hardware, middleware, memory latency, instruction sets, optimization, hierarchical data format