## A Divide-and-Conquer Approach to the Singular Value Decomposition

**Abstract:**

The tremendous expansion of the Internet has made efficient searching and information retrieval an area of growing importance. Traditional search methods have drawbacks that may result in documents being returned that are not relevant or in documents that are relevant being overlooked. Latent Semantic Indexing (LSI) is an alternative method of information retrieval that attempts to avoid these problems. This method involves creating a term-by-document matrix to represent the documents and their keywords. Queries are projected into this matrix using the matrix factorization method known as the singular value decomposition (SVD). The SVD provides a means of data compression, allowing reduction of the term space while preserving the most important characteristics of the original matrix. Because calculating the SVD of a matrix is an expensive process that needs to be repeated often in LSI, speeding up this SVD calculation is of particular importance.

Calculating the SVD of a matrix involves two main steps: first the matrix is bidiagonalized, and then the SVD is formed from this bidiagonalized matrix. Traditionally, the SVD calculation is performed using the Golub-Kahan algorithm. Unfortunately, for a bidiagonal m x n matrix, this algorithm takes O(n^{3}) time [8], making it infeasible for the extremely large term-by-document matrices used in LSI. This thesis presents a divide-and-conquer approach to calculating the SVD that is asymptotically faster than the Golub-Kahan method, resulting in a significant improvement in the cost of computation. Results are presented comparing the CPU times of the two methods, both implemented and tested in the Matlab problem solving environment .

**Author:** Jane E. B. Tougas

**Advisor:** Raymond J. Spiteri

**Download:** jtougas_bcs_thesis