Because of large degrees, the Jacobian matrix may contain extreme values that require extended precision, which leads to significant overhead. This overhead of multiprecision arithmetic is our main motivation for developing a massively parallel algorithm. To allow for overdetermined linear systems, we solve the systems in the least squares sense, computing the QR decomposition of the matrix with the modified Gram-Schmidt algorithm. We describe our implementation of the modified Gram-Schmidt orthogonalization method for GPUs, using double double and quad double arithmetic. Our experimental results on the NVIDIA C2050 and K20C show that the achieved speedups are high enough to compensate for the overhead of one extra level of precision.
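As an illustration only, the sketch below shows the two ingredients named in the abstract in their simplest sequential form: an error-free sum of two doubles (the basic building block of double double arithmetic) and a least squares solve through a modified Gram-Schmidt QR decomposition. It is not the GPU implementation described here. It runs in ordinary double precision on the CPU, assumes a matrix of full column rank, and all function names (`two_sum`, `mgs_qr`, `least_squares`) are hypothetical.

```python
import math

def two_sum(a, b):
    """Error-free sum: returns (s, e) with s = fl(a + b) and a + b = s + e exactly.
    A double double number keeps such a (high, low) pair of doubles."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def mgs_qr(cols):
    """Modified Gram-Schmidt on a list of column vectors (full column rank assumed).
    Returns Q as a list of orthonormal columns and R as an n-by-n upper triangular matrix."""
    n = len(cols)
    V = [list(c) for c in cols]          # working copy of the columns
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        R[k][k] = math.sqrt(sum(x * x for x in V[k]))
        q = [x / R[k][k] for x in V[k]]  # normalize the k-th column
        Q.append(q)
        # Immediately remove the q-component from all remaining columns;
        # this update order is what distinguishes the modified variant.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * vi for qi, vi in zip(q, V[j]))
            V[j] = [vi - R[k][j] * qi for qi, vi in zip(q, V[j])]
    return Q, R

def least_squares(cols, b):
    """Solve min ||A x - b||_2, with A given by its columns, via A = Q R."""
    Q, R = mgs_qr(cols)
    n = len(cols)
    y = [sum(qi * bi for qi, bi in zip(q, b)) for q in Q]  # y = Q^T b
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # back substitution on R x = y
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x
```

For example, `least_squares([[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]], [1.0, 3.0, 5.0])` fits a line through three points that lie exactly on y = 1 + 2t, so the result is close to `[1.0, 2.0]`. In a double double version, each scalar would be stored as a pair of doubles and each sum and product above would be replaced by its error-compensated counterpart.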
This is joint work with Genady Yoffe.
Keywords. double double arithmetic, general purpose graphics processing unit (GPU), massively parallel algorithm, modified Gram-Schmidt method, orthogonalization, quad double arithmetic, quality up.
The 27th Parallel and Distributed Processing Symposium (IPDPS-13), the 14th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC-13), 20-24 May 2013, Boston, Massachusetts.