Comparative Sequence Analysis and Multiple Sequence Alignment

With so many related genomes now being sequenced, comparative sequence analysis has emerged as one of the most important areas of modern biology. At its heart, comparative sequence analysis is about predicting all the homologies (i.e., correspondences) among the genomes under consideration, in order to understand what is shared among the species and what is unique to each. The fundamental method of comparative sequence analysis is multiple sequence alignment, whose goal is to line up all and only the homologous regions from each genome.

We have developed methods for assessing the accuracy of multiple sequence alignments [2,3,5] and have applied those methods to perform a careful assessment of four whole-genome multiple alignment tools [1]. We have also developed methods for identifying those regions of multiple sequence alignments that show extreme similarity among the species [4].
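As a toy illustration of what "regions that show extreme similarity" can mean, the sketch below treats a multiple alignment as equal-length rows with "-" marking gaps, and reports maximal runs of columns in which every species carries the same base. This is only a simplified stand-in for intuition, not the algorithm of [4]; the function name conserved_runs, the min_len threshold, and the toy sequences are all assumptions made up for this example.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap symbol for this toy example


def conserved_runs(alignment: List[str], min_len: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges in which every row
    has the same non-gap character, keeping runs of at least min_len columns."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")

    runs: List[Tuple[int, int]] = []
    start = None  # first column of the run currently being extended, if any
    for col in range(width):
        column = {row[col] for row in alignment}
        identical = len(column) == 1 and GAP not in column
        if identical and start is None:
            start = col
        elif not identical and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and width - start >= min_len:
        runs.append((start, width))
    return runs


# Hypothetical three-species alignment (made-up sequences).
toy_alignment = [
    "ACGTTGCA-ACGGT",
    "ACGTAGCATACGGT",
    "ACGT-GCATACGCT",
]
print(conserved_runs(toy_alignment))  # [(0, 4), (5, 8), (9, 12)]
```

Requiring perfect identity in every column is the strictest possible criterion; a realistic analysis would need to tolerate some mismatches and account for how closely related the species are.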

References:

  1. X. Chen, M. Tompa, "Comparative assessment of methods for aligning multiple genome sequences", Nat. Biotechnol., vol. 28 (2010) 567-72. PubMed 20495551.
  2. A. Prakash, M. Tompa, "Assessing the discordance of multiple sequence alignments", IEEE/ACM Trans Comput Biol Bioinform, vol. 6 (2009) 542-51. PubMed 19875854.
  3. A. Prakash, M. Tompa, "Measuring the accuracy of genome-size multiple alignments", Genome Biol., vol. 8 (2007) R124. PubMed 17594489.
  4. H. Tseng, M. Tompa, "Algorithms for locating extremely conserved elements in multiple sequence alignments", BMC Bioinformatics, vol. 10 (2009) 432. PubMed 20021665.
  5. A. Wang, W. Ruzzo, M. Tompa, "How accurately is ncRNA aligned within whole-genome multiple alignments?", BMC Bioinformatics, vol. 8 (2007) 417. PubMed 17963514.