Least Squares on GPUs in Multiple Double Precision

Abstract:

Graphics Processing Units (GPUs) are well suited to offset the cost overhead caused by multiple double precision. Code generated by the CAMPARY software is applied to accelerate the solution of linear systems in the least squares sense in double double, quad double, and octo double precision. Thanks to the high Compute to Global Memory Access (CGMA) ratios of multiple double arithmetic, teraflop performance is already attained when running the double double Householder QR on 1,024-by-1,024 matrices on the NVIDIA P100 and V100 GPUs. In doubling the precision from double double to quad double, and from quad double to octo double, the observed cost overhead factors are lower than the factors predicted by the arithmetic operation counts.
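
To illustrate why multiple double arithmetic has a high CGMA ratio, the following is a minimal sketch of double double addition in Python, built on the standard error-free transformations (Knuth's two-sum and the quick two-sum). It is not the code generated by CAMPARY; the function names `two_sum`, `quick_two_sum`, and `dd_add` and the representation of a double double as a `(hi, lo)` tuple are assumptions made for this illustration. In this particular variant, one addition reads four doubles and performs 20 double precision operations, whereas an ordinary double addition performs one operation on two doubles.

```python
def two_sum(a, b):
    """Error-free sum (6 flops): returns (s, e) with s = fl(a + b)
    and a + b = s + e exactly, barring overflow."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Error-free sum (3 flops), valid only when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Add two double doubles given as (hi, lo) tuples.

    Flop count for this variant: 6 + 6 + 1 + 3 + 1 + 3 = 20.
    """
    s, e = two_sum(x[0], y[0])   # sum of the high parts, with its error
    t, f = two_sum(x[1], y[1])   # sum of the low parts, with its error
    e += t                       # fold the low sum into the error term
    s, e = quick_two_sum(s, e)   # renormalize
    e += f                       # fold in the remaining error
    return quick_two_sum(s, e)   # final renormalization to (hi, lo)
```

As a usage example, `dd_add((1.0, 0.0), (2.0**-80, 0.0))` keeps the small term in the low part of the result, while the plain double sum `1.0 + 2.0**-80` rounds back to `1.0`.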

Online AriC Seminar (ENS Lyon, France), 18 November 2021

slides of the talk