TY - JOUR
T1 - SynerClust
T2 - A highly scalable, synteny-aware orthologue clustering tool
AU - Georgescu, Christophe
AU - Manson McGuire, Abigail L.
AU - Griggs, Alexander D.
AU - Desjardins, Christopher A.
AU - Pironti, Alejandro
AU - Wapinski, Ilan
AU - Abeel, Thomas
AU - Haas, Brian J.
AU - Earl, Ashlee M.
PY - 2018
Y1 - 2018
N2 - Accurate orthologue identification is a vital component of bacterial comparative genomic studies, but many popular sequence-similarity-based approaches do not scale well to the large numbers of genomes that are now generated routinely. Furthermore, most approaches do not take gene synteny into account, which is useful information for disentangling paralogues. Here, we present SynerClust, a user-friendly synteny-aware tool based on synergy that can process thousands of genomes. SynerClust was designed to analyse genomes with high levels of local synteny, particularly prokaryotes, which have operon structure. SynerClust’s run-time is optimized by selecting cluster representatives at each node in the phylogeny; thus, avoiding the need for exhaustive pairwise similarity searches. In benchmarking against Roary, Hieranoid2, PanX and Reciprocal Best Hit, SynerClust was able to more completely identify sets of core genes for datasets that included diverse strains, while using substantially less memory, and with scalability comparable to the fastest tools. Due to its scalability, ease of installation and use, and suitability for a variety of computing environments, orthogroup clustering using SynerClust will enable many large-scale prokaryotic comparative genomics efforts.
AB - Accurate orthologue identification is a vital component of bacterial comparative genomic studies, but many popular sequence-similarity-based approaches do not scale well to the large numbers of genomes that are now generated routinely. Furthermore, most approaches do not take gene synteny into account, which is useful information for disentangling paralogues. Here, we present SynerClust, a user-friendly synteny-aware tool based on synergy that can process thousands of genomes. SynerClust was designed to analyse genomes with high levels of local synteny, particularly prokaryotes, which have operon structure. SynerClust’s run-time is optimized by selecting cluster representatives at each node in the phylogeny; thus, avoiding the need for exhaustive pairwise similarity searches. In benchmarking against Roary, Hieranoid2, PanX and Reciprocal Best Hit, SynerClust was able to more completely identify sets of core genes for datasets that included diverse strains, while using substantially less memory, and with scalability comparable to the fastest tools. Due to its scalability, ease of installation and use, and suitability for a variety of computing environments, orthogroup clustering using SynerClust will enable many large-scale prokaryotic comparative genomics efforts.
KW - orthologues
KW - orthogroup clustering
KW - synteny
KW - comparative genomics
U2 - 10.1099/mgen.0.000231
DO - 10.1099/mgen.0.000231
M3 - Article
VL - 4
SP - 1
EP - 7
JO - Microbial Genomics
JF - Microbial Genomics
SN - 2057-5858
IS - 11
ER -