A multiple double is a sequence of nonoverlapping doubles. By exploiting hardware arithmetic, multiple double arithmetic multiplies the working precision. To compensate for the cost overhead of multiple double arithmetic, we use the tensor cores of the NVIDIA A100 graphics processing unit. Specialized for matrix multiplications, tensor cores support only elementary, noncomposite floating-point operations. The renormalization of the results of multiple double arithmetic operations into nonoverlapping doubles cannot be performed by tensor cores, because renormalization involves branching.
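To make the notion of nonoverlapping doubles concrete, here is a minimal sketch of the smallest case, a double double stored as a pair (hi, lo). It uses the well-known error-free two-sum transformation; the abstract does not say which algorithms the authors use, so the function names and the simplified addition below are illustrative assumptions only.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free addition: s is the rounded sum and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Renormalize a pair with |a| >= |b| into a nonoverlapping (hi, lo)."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(x, y):
    """Simplified double double addition (illustrative, not the authors' code)."""
    s, e = two_sum(x[0], y[0])   # exact sum of the high doubles
    e += x[1] + y[1]             # fold in the low doubles
    return quick_two_sum(s, e)   # renormalize the result


if __name__ == "__main__":
    s, e = two_sum(0.1, 0.2)
    # The rounding error of 0.1 + 0.2 is recovered exactly in e.
    print(Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2))
```

The renormalization step at the end of `dd_add` is the kind of composite operation that, according to the abstract, cannot be delegated to tensor cores.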
We relax the renormalization of multiple doubles into nonoverlapping doubles, widening the gaps between the doubles with trailing zero bits. Data staging algorithms arrange the convolutions of low with high doubles into inner products for execution on tensor cores. The renormalizations are handled by the streaming multiprocessors. Experiments demonstrate the correctness and performance of this approach.
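The following sketch illustrates one way in which trailing zero bits can let plain inner products be exact. It is an assumption made for illustration, not the scheme of the talk: the 14-bit part width, the four parts, and the names `split` and `convolve` are all hypothetical. Each double is cut into parts with few significant bits, so the products of parts and the short sums of those products fit in a double without rounding and without any branching.

```python
import math
from fractions import Fraction

BITS = 14    # hypothetical part width: 4 * 14 = 56 bits covers a 53-bit mantissa
PARTS = 4    # hypothetical number of parts per double


def split(x):
    """Cut x into PARTS doubles, each with at most BITS significant bits."""
    m, e = math.frexp(x)           # x == m * 2**e with 0.5 <= |m| < 1
    parts = []
    for _ in range(PARTS):
        m *= 2.0 ** BITS           # exact scaling by a power of two
        q = math.trunc(m)          # next BITS bits as an integer
        m -= q                     # exact remainder, |m| < 1
        e -= BITS
        parts.append(math.ldexp(float(q), e))
    return parts


def convolve(xs, ys):
    """Convolution of the parts: entry k is the inner product over i + j == k."""
    n = len(xs)
    out = []
    for k in range(2 * n - 1):
        acc = 0.0
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += xs[i] * ys[k - i]   # exact: 28-bit products, at most 4 terms
        out.append(acc)
    return out


if __name__ == "__main__":
    x, y = 3.141592653589793, -0.1
    cs = convolve(split(x), split(y))
    # The unrounded product x * y is recovered exactly from the convolution.
    print(sum(Fraction(c) for c in cs) == Fraction(x) * Fraction(y))
```

In this sketch every entry of the convolution is a plain inner product, the kind of operation that matrix multiplication hardware performs. Turning the entries back into nonoverlapping doubles is a separate renormalization step, which is omitted here.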
This is joint work with Howard Chen.
SIAM PP 2026, 4 March 2026, Berlin, Germany.