Background: Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results: MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegen...
William A. Stokes, Benjamin S. Glick