PQC-AMX: Accelerating Saber and FrodoKEM on the Apple M1 and M3 SoCs.

Décio Luiz Gazzoni Filho, Guilherme Brandão, Gora Adj, Arwa Alblooshi, Isaac A. Canales-Martínez, Jorge Chávez-Saab, Julio López

IEEE Symposium on Computer Arithmetic (2023)

Abstract
As CPU performance gains can no longer keep pace with the dramatic growth of the past few decades, CPU architects are turning to domain-specific architectures to accelerate certain tasks. A recent trend is the introduction of matrix-multiplication accelerators to CPUs by manufacturers such as IBM, Intel and ARM, some of which have yet to launch commercially. Apple's systems-on-chip (SoCs) for its mobile phones, tablets and personal computers include a proprietary, undocumented, CPU-coupled matrix-multiplication coprocessor called AMX. We leverage AMX to accelerate the post-quantum lattice-based cryptosystems Saber and FrodoKEM, and benchmark their performance on the Apple M1 and M3 SoCs. We propose a variant of the Toeplitz Matrix-Vector Product algorithm for polynomial multiplication, which sets new speed records for Saber using AMX, improving on the current state of the art by up to 20% for the main KEM operations and by 152% for matrix-vector multiplication of polynomials. We also set new FrodoKEM speed records using AMX, gaining up to 21% for the main KEM operations and 124% for matrix multiplication (with further improvements for 4×-batching) over our optimized NEON implementation, which is also introduced here and already improves upon the previous state of the art for ARMv8 CPUs.
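To illustrate why a Toeplitz Matrix-Vector Product maps polynomial multiplication onto a matrix-multiplication unit, the following minimal Python sketch multiplies two polynomials modulo x^n + 1 (the ring shape used by Saber) by building a Toeplitz matrix from one operand and applying it to the coefficient vector of the other. This is only the textbook formulation, not the variant proposed in the paper, and it does not use AMX; the function names `negacyclic_toeplitz`, `tmvp_mul` and `schoolbook_negacyclic` are hypothetical, and the small n = 4 is chosen for readability (Saber itself uses n = 256 and q = 2^13).

```python
def negacyclic_toeplitz(a, q):
    """Build the n x n Toeplitz matrix T with T[i][j] = a[i-j] for i >= j
    and T[i][j] = -a[n+i-j] for i < j, so that T @ b equals the
    coefficients of a(x) * b(x) mod (x^n + 1, q)."""
    n = len(a)
    return [
        [a[i - j] % q if i >= j else (-a[n + i - j]) % q for j in range(n)]
        for i in range(n)
    ]


def tmvp_mul(a, b, q):
    """Polynomial product mod (x^n + 1, q) as a matrix-vector product."""
    n = len(a)
    T = negacyclic_toeplitz(a, q)
    return [sum(T[i][j] * b[j] for j in range(n)) % q for i in range(n)]


def schoolbook_negacyclic(a, b, q):
    """Reference: schoolbook product with x^n replaced by -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^(n + r) wraps around to -x^r in this ring
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    q = 2 ** 13  # Saber's modulus; n = 4 here is a toy size
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    # Both formulations should agree coefficient by coefficient.
    assert tmvp_mul(a, b, q) == schoolbook_negacyclic(a, b, q)
    print(tmvp_mul(a, b, q))
```

Because every entry of the matrix depends only on i - j, the matrix is fully determined by the n coefficients of one operand, which is what makes this formulation a natural fit for hardware that computes matrix or outer products.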
Key words
Post-quantum cryptography, AMX, ARM, NEON, FrodoKEM, Saber