Householder vectors, tall-skinny QR, parallel algorithms, Householder representation, Householder reconstruction algorithm, TSQR algorithm, CAQR algorithm
The present paper addresses the fact that the output of the TSQR algorithm is not in Householder form. This motivated a new version of TSQR that keeps its many benefits, such as communication efficiency and lower computational cost. In addition, this algorithm is considered numerically stable, as many experiments confirm.
This paper also describes several parallel algorithms based on the Householder representation and presents alternatives to the Householder reconstruction algorithm, which, however, suffer from numerical stability issues.
[...] Before introducing the CAQR algorithm, it is worth presenting the algorithm that applies the orthogonal factor along a binary reduction tree in order to update the trailing matrix. For better computation and communication costs, it is advisable to form only an implicit representation of the Householder matrices rather than an explicit orthogonal factor.
Butterfly CAQR
As mentioned in the previous paragraph, CAQR relies on a TSQR reduction tree to update the trailing matrix, but this tree shape causes load imbalance, which makes butterfly CAQR appear more efficient than CAQR. [...]
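The recommendation to keep only an implicit representation of the Householder matrices can be illustrated with a minimal single-process NumPy sketch. It assumes the standard compact WY form Q = I - Y T Y^T (the convention of LAPACK's larft routine) and shows that the trailing-matrix update Q^T C needs only the m x b matrix Y and the b x b matrix T, never the explicit m x m orthogonal factor. The helpers householder_qr, form_t and apply_qt are hypothetical names, not functions from the paper, and nothing here is parallel.

```python
import numpy as np


def householder_qr(A):
    """Householder QR with LAPACK-style reflectors H_j = I - tau_j v_j v_j^T.

    Returns the unit lower trapezoidal matrix Y of Householder vectors,
    the scalars tau, and the upper triangular factor R.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    Y = np.zeros((m, n))
    tau = np.zeros(n)
    for j in range(n):
        x = A[j:, j]
        Y[j, j] = 1.0
        if np.linalg.norm(x[1:]) == 0.0:
            continue  # nothing to annihilate: identity reflector, tau = 0
        beta = -np.copysign(np.linalg.norm(x), x[0])  # sign choice avoids cancellation
        v = x / (x[0] - beta)
        v[0] = 1.0
        tau[j] = (beta - x[0]) / beta
        Y[j:, j] = v
        # Apply H_j to the remaining columns (column j becomes beta * e_1).
        A[j:, j:] -= tau[j] * np.outer(v, v @ A[j:, j:])
    return Y, tau, np.triu(A[:n, :])


def form_t(Y, tau):
    """Upper triangular T such that H_1 H_2 ... H_n = I - Y T Y^T (forward, columnwise)."""
    n = tau.size
    T = np.zeros((n, n))
    for j in range(n):
        T[:j, j] = -tau[j] * (T[:j, :j] @ (Y[:, :j].T @ Y[:, j]))
        T[j, j] = tau[j]
    return T


def apply_qt(Y, T, C):
    """Trailing-matrix update Q^T C = C - Y T^T (Y^T C), without forming Q."""
    return C - Y @ (T.T @ (Y.T @ C))
```

For a 200 x 8 panel and a 200 x 5 trailing block, apply_qt touches only Y (200 x 8) and T (8 x 8), whereas forming Q explicitly would require a 200 x 200 matrix; both give the same Q^T C up to rounding.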
[...]
- The algorithm's purpose is to find an upper triangular R, obtained by reducing the p remaining n×n triangles. The orthogonal factor, in turn, is represented as a tree of sets of Householder vectors. An analysis of the sparsity structure shows that each set of Householder vectors is just an identity matrix stacked on top of a triangle.
- An analysis was conducted to determine the costs of parallel TSQR in terms of computation, interprocessor communication, and memory bandwidth.
[...]
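The binary-tree TSQR summarized above can be sketched in NumPy under a few assumptions: the p row blocks are processed sequentially in one process (standing in for p processors), every block has at least n rows, and the tree of Householder vector sets is stored as a plain Python list of levels. The function names are hypothetical and this is not the paper's implementation.

```python
import numpy as np


def householder_qr(A):
    """Householder QR; returns unit lower trapezoidal Y, scalars tau, and R."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Y = np.zeros((m, n))
    tau = np.zeros(n)
    for j in range(n):
        x = A[j:, j]
        Y[j, j] = 1.0
        if np.linalg.norm(x[1:]) == 0.0:
            continue  # identity reflector
        beta = -np.copysign(np.linalg.norm(x), x[0])
        v = x / (x[0] - beta)
        v[0] = 1.0
        tau[j] = (beta - x[0]) / beta
        Y[j:, j] = v
        A[j:, j:] -= tau[j] * np.outer(v, v @ A[j:, j:])
    return Y, tau, np.triu(A[:n, :])


def tsqr(A, p):
    """Binary-tree TSQR over p row blocks, simulated sequentially.

    Returns the final n x n triangle R and the tree of Householder vector sets:
    tree[0] holds the leaves' (Y, tau), tree[k] the k-th reduction level.
    Assumes every row block has at least n rows.
    """
    blocks = np.array_split(np.asarray(A, dtype=float), p, axis=0)
    leaves = [householder_qr(B) for B in blocks]   # local QR on each "processor"
    tree = [[(Y, tau) for Y, tau, _ in leaves]]
    Rs = [R for _, _, R in leaves]                  # the p remaining n x n triangles
    while len(Rs) > 1:
        level, next_Rs = [], []
        for i in range(0, len(Rs), 2):
            if i + 1 == len(Rs):                    # odd count: pass the triangle up
                next_Rs.append(Rs[i])
                continue
            # Factor two stacked triangles; their Householder vectors are [I; triangle].
            Y, tau, R = householder_qr(np.vstack([Rs[i], Rs[i + 1]]))
            level.append((Y, tau))
            next_Rs.append(R)
        tree.append(level)
        Rs = next_Rs
    return Rs[0], tree
```

Factoring a 400 x 6 matrix with p = 4, for instance, yields an R that matches NumPy's R factor up to the signs of its rows, and at every internal node Y[:n] is the identity while Y[n:] is upper triangular, which is the sparsity structure described above.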
[...]
- The Householder vectors are computed using a modified LU algorithm. Its main purpose is to guarantee sign agreement between the columns of the orthogonal factor and the rows of R. One of its features is that it computes the upper triangular matrix T (of the compact WY representation) from the upper triangular LU factor U rather than directly from the Householder vectors, which is why its costs, of every type considered, are lower than those of Householder QR. TSQR with Householder reconstruction also has an optimized version that further reduces computation and communication costs.
[...]
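The modified-LU step can be illustrated with a minimal NumPy sketch. It rests on assumptions not spelled out in the text above: the signs are chosen on the fly as S_kk = -sgn(pivot), so that every pivot has magnitude at least 1 and no row pivoting is needed, and T is recovered from U through T = -U S Y1^{-T}, where Y1 is the top n x n block of Y. That relation follows from requiring the first n columns of I - Y T Y^T to equal Q S. The function name householder_reconstruction is hypothetical.

```python
import numpy as np


def householder_reconstruction(Q):
    """Recover (Y, T, s) from an m x n matrix Q with orthonormal columns.

    Factors Q - S = Y U by LU without pivoting, where S = diag(s) is chosen
    during elimination. Then I - Y T Y^T is orthogonal, its first n columns
    equal Q @ diag(s), and diag(s) @ R is the matching triangular factor.
    """
    A = np.array(Q, dtype=float)
    m, n = A.shape
    s = np.zeros(n)
    for k in range(n):
        s[k] = -1.0 if A[k, k] >= 0 else 1.0   # S_kk = -sgn(current pivot)
        A[k, k] -= s[k]                         # now |pivot| = |old pivot| + 1 >= 1
        A[k + 1:, k] /= A[k, k]                 # column k of the unit lower factor
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    Y = np.tril(A, -1)
    Y[:n] += np.eye(n)                          # unit lower trapezoidal L = Y
    U = np.triu(A[:n])
    # T = -U S Y1^{-T}, obtained by solving Y1 T^T = -(U S)^T.
    T = np.linalg.solve(Y[:n], -(U * s).T).T
    return Y, T, s
```

Given any Q with orthonormal columns and A = QR (for example the explicit Q produced by TSQR), the returned Y and T give a Householder representation of Q·diag(s), and diag(s)·R is the corresponding R; the sign vector s is what keeps the columns of the orthogonal factor consistent with the rows of R.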
[...]
Cholesky-QR decomposition
The simplicity of this algorithm gives it several advantages, such as low communication cost: it is based on matrix multiplication, and it can be performed locally on a single node. Its main disadvantage is poor numerical stability. In this context, the present paper proposes applying the Cholesky QR algorithm twice (Cholesky QR2), which can significantly improve numerical stability. The drawback of this algorithm is that the resulting orthogonal factor is not in Householder representation, but this can be remedied with another algorithm, CholQR2-HR, which is considered the best algorithm in terms of running time. [...]
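A single-process sketch of Cholesky QR and Cholesky QR2, assuming the textbook formulation: form the Gram matrix W = A^T A, factor W = R^T R, and set Q = A R^{-1}. In a row-distributed setting, forming W would be a local product on each node followed by one reduction, after which the small n x n Cholesky factorization can run on a single node; here everything runs in one process, and the function names are hypothetical.

```python
import numpy as np


def cholesky_qr(A):
    """One pass of Cholesky QR: R from the Gram matrix, then Q = A R^{-1}."""
    G = A.T @ A                        # Gram matrix; the only global reduction when rows are distributed
    R = np.linalg.cholesky(G).T        # NumPy returns lower L with G = L L^T, so R = L^T
    Q = np.linalg.solve(R.T, A.T).T    # Q = A R^{-1}, via R^T Q^T = A^T
    return Q, R


def cholesky_qr2(A):
    """Cholesky QR2: run Cholesky QR again on Q1 and combine R = R2 R1."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1
```

On a matrix with condition number around 1e5, the loss of orthogonality ||Q^T Q - I|| after one pass grows roughly like kappa^2 times machine epsilon, while the second pass brings it back close to machine precision. The Cholesky factorization itself can break down once kappa(A) exceeds roughly 1e8 in double precision. Either way, Q comes out as an explicit matrix rather than in Householder form, which is the gap CholQR2-HR addresses.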
[...] Therefore, it is important to replace the binary tree with a butterfly communication network. This algorithm achieves lower costs than CAQR in terms of horizontal and vertical communication as well as memory bandwidth. The last part of the paper is practical and presents numerical results for all the approaches studied. [...]
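The difference between the binary reduction tree and the butterfly network can already be seen in the communication schedule. The minimal sketch below assumes p is a power of two and ignores message sizes and the QR work itself; it only lists which ranks communicate at each of the log2(p) levels. In the binary tree only the receiving rank of each pair continues, so the number of busy ranks halves at every level; in the butterfly, rank r exchanges with r XOR 2^level and, under the assumption that both partners then factor the same stacked pair redundantly, all p ranks stay busy and all end with the result. The function names are hypothetical.

```python
def _check_power_of_two(p):
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")


def binary_tree_levels(p):
    """Per level, the (receiver, sender) pairs of a binary reduction tree."""
    _check_power_of_two(p)
    levels, step = [], 1
    while step < p:
        # Only ranks that are multiples of 2*step still hold a triangle and receive.
        levels.append([(r, r + step) for r in range(0, p, 2 * step)])
        step *= 2
    return levels


def butterfly_levels(p):
    """Per level, the exchange pairs of a butterfly network: rank r talks to r XOR step."""
    _check_power_of_two(p)
    levels, step = [], 1
    while step < p:
        levels.append([(r, r ^ step) for r in range(p) if r < (r ^ step)])
        step *= 2
    return levels


def busy_ranks(levels, both_partners_work):
    """Number of ranks doing QR work at each level."""
    factor = 2 if both_partners_work else 1
    return [factor * len(pairs) for pairs in levels]
```

For p = 8, for example, the binary tree keeps 4, 2 and 1 ranks busy at its three levels, while the butterfly keeps all 8 busy at every level, which is the load-balance argument in favor of butterfly CAQR.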
Content verified by our reading committee.