Given a collection of contigs and mate-pairs, the Contig Scaffolding Problem is to order and orient the contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic for this problem, called the greedy path-merging algorithm. The method was originally developed as a key component of the compartmentalized assembly strategy devised at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly from early whole-genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project.

Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems--computations on discrete structures; J.3 [Life and Medical Sciences]: biology and genetics
General Terms: Algorithms, Experimentation
Additional Key Words and Phrases: Genome assembly
Daniel H. Huson, Knut Reinert, Eugene W. Myers
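To make the scaffolding objective concrete, the following Python sketch scores a candidate scaffold by counting how many mate-pairs it satisfies. It is not the greedy path-merging algorithm itself. The data layout (a hypothetical `MatePair` record, and a `layout` dict mapping each contig name to a start coordinate and an orientation of +1 or -1) and the consistency rule (the two reads face each other and their separation lies within mean +/- k standard deviations, with k = 3 by default) are illustrative assumptions, not definitions taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatePair:
    # Hypothetical representation: each read sits on a contig at a 0-based
    # offset (in that contig's own coordinates) on strand +1 or -1.
    contig_a: str
    pos_a: int
    strand_a: int
    contig_b: str
    pos_b: int
    strand_b: int
    mean: float  # expected insert size
    sd: float    # standard deviation of the insert size


def to_global(layout, lengths, contig, pos, strand):
    """Map a read's contig offset and strand to scaffold coordinates."""
    start, orient = layout[contig]
    if orient == 1:
        return start + pos, strand
    # Reversed contig: offsets are mirrored and the strand flips.
    return start + lengths[contig] - 1 - pos, -strand


def is_consistent(mp, layout, lengths, k=3.0):
    """Assumed rule: the leftmost read is on the forward strand, the other
    on the reverse strand, and their distance is within mean +/- k*sd."""
    xa, sa = to_global(layout, lengths, mp.contig_a, mp.pos_a, mp.strand_a)
    xb, sb = to_global(layout, lengths, mp.contig_b, mp.pos_b, mp.strand_b)
    if xa > xb:  # order the two reads left to right
        xa, sa, xb, sb = xb, sb, xa, sa
    if not (sa == 1 and sb == -1):
        return False
    return abs((xb - xa) - mp.mean) <= k * mp.sd


def count_consistent(mate_pairs, layout, lengths, k=3.0):
    """Objective to maximize: number of mate-pairs the layout satisfies."""
    return sum(is_consistent(mp, layout, lengths, k) for mp in mate_pairs)


lengths = {"A": 1000, "B": 1000}
mp = MatePair("A", 800, 1, "B", 200, -1, mean=500.0, sd=50.0)
print(count_consistent([mp], {"A": (0, 1), "B": (1100, 1)}, lengths))   # 1
print(count_consistent([mp], {"A": (0, 1), "B": (1100, -1)}, lengths))  # 0
```

Flipping contig B breaks the mate-pair because both reads then point in the same direction; reversing the whole scaffold (both contigs flipped and their order swapped) leaves it satisfied, since a scaffold and its reverse complement are equivalent.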