Monday, February 20, 2023

Unexpected Topology in Gubbins Newick File: An Analysis of Branch Arrangement with Respect to SNPs

I generated a dummy alignment file and ran Gubbins on it to remove recombination. Here is what I observed:

  • SNPs are called relative to the bases in the first sequence.

  • Gaps and Ns do not contribute to SNPs.

  • The first sequence is treated as the reference, so changing it can change the output. The effect ranges from slight to significant, depending on how similar the sequences are.
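To make these observations concrete, here is a minimal Python sketch of the behaviour I saw. It is not Gubbins' actual implementation, just an illustration under my own assumptions: the function name, the toy alignment, and the choice to skip a position when either the reference or the sample has a gap or N are all hypothetical.

```python
def snps_against_reference(sequences):
    """Compare every sequence with the first one and list the differences.

    `sequences` maps sample name -> aligned sequence (all the same length).
    Positions where either base is a gap or N are skipped (assumption).
    Returns {sample: [(1-based position, reference base, sample base), ...]}.
    """
    names = list(sequences)
    reference = sequences[names[0]].upper()
    ignored = {"-", "N"}
    result = {}
    for name in names[1:]:
        sample = sequences[name].upper()
        differences = []
        for position, (ref_base, base) in enumerate(zip(reference, sample), start=1):
            if ref_base in ignored or base in ignored:
                continue  # gaps and Ns do not count as SNPs
            if ref_base != base:
                differences.append((position, ref_base, base))
        result[name] = differences
    return result


# Toy alignment; the first entry plays the role of the reference.
alignment = {
    "ref":     "ACGTACGTAC",
    "sampleA": "ACGTACGTAC",  # identical to the reference
    "sampleB": "ACCTACGNAC",  # one real change, one N
    "sampleC": "A-GTACGTAT",  # one gap, one real change
}
snps = snps_against_reference(alignment)
```

Because everything is measured against the first entry, reordering the alignment so that a different sample comes first changes which positions are reported, which matches the third observation.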

I also found a problem with the Newick file that Gubbins produced for a different dataset. Branches without SNPs should have been placed next to each other, but they were arranged differently, which resulted in a different topology.


For instance, in this example, all the samples without SNPs should have been grouped together, but the Gubbins tree (Newick) did not group them.







To fix this, I took the postGubbins.filtered_polymorphic_sites.fasta file and built a tree from it with FastTree, which produced the correct tree shown below.
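For anyone who wants to script the FastTree step, here is a rough Python sketch. I am assuming the executable is called FastTree on your PATH (some installs name it fasttree) and that the input is a nucleotide alignment, hence the -nt flag. The optional -gtr switch and the helper names are my own additions for illustration, not necessarily the settings behind the tree in this post.

```python
import subprocess


def build_fasttree_command(alignment_path, binary="FastTree", use_gtr=False):
    """Build the FastTree command line for a nucleotide alignment."""
    command = [binary, "-nt"]
    if use_gtr:
        command.append("-gtr")  # optional GTR model instead of the default
    command.append(alignment_path)
    return command


def run_fasttree(alignment_path, tree_path, binary="FastTree", use_gtr=False):
    """Run FastTree and write the Newick tree it prints on stdout to tree_path."""
    command = build_fasttree_command(alignment_path, binary, use_gtr)
    with open(tree_path, "w") as tree_file:
        subprocess.run(command, stdout=tree_file, check=True)
    return tree_path
```

A call would look like run_fasttree("postGubbins.filtered_polymorphic_sites.fasta", "fasttree.nwk"), where the output file name is just a placeholder.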




Hope that helps!