Home > Publications > Efficient Singular Value Decomposition for Information Retrieval

Efficient Singular Value Decomposition for Information Retrieval

Abstract:

The rapid growth of digital libraries and other electronic databases has caused an increasing need for efficient information retrieval techniques. Latent Semantic Indexing (LSI) is one of several recent information retrieval techniques that is based on a vector space model in which documents in the database and user-specified queries are represented by a sparse term-by-document matrix. The Singular Value Decomposition (SVD) of the term-by-document matrix allows us to take advantage of the higher-order structure in the association of terms and documents by clustering terms and/or documents together, thus reducing the inaccuracies that arise with problems such as polysemy and synonymy. However, the SVD calculation, which breaks down into a bidiagonalization step and an iterative step, is generally very expensive. Although the efficiency of the iterative step cannot be improved, the efficiency of the bidiagonalization can be improved over conventional bidiagonalization methods by means of the Lanczos algorithm because the matrix is sparse. For the examples in this thesis , we find there is an order of magnitude improvement. It is also common for a user to add (update) or remove (downdate) terms or documents to or from a existing query where the SVD of the original term-by-document matrix is already known. In this thesis, we also give one-way and two-way chasing schemes to efficiently remove a document from a search. The chasing schemes modify the previous SVD to obtain the downdated SVD at a significantly lower cost than recomputing the SVD from scratch. The two-way chasing scheme requires 50 %fewer plane rotations than the one-way chasing scheme.

Author: Sarah MacKinnon-Cormier

Advisor: Raymond J. Spiteri

Download: sarahm_bsc_thesis