This issue is already reported in gubbins github page for older versions, where it runs successfully for a set of fasta files and not for other set. Could not find much documentation to trace and resolve this error. This is the step which is throwing me the error:
gubbins -r -v seqnew.fa.gaps.vcf -a 100 -b 10000 -f test_gubbins/tmp/seqnew.fa -t seqnew.fa.iteration_1 -m 3 seqnew.fa.gaps.snp_sites.aln
Failed while running Gubbins. Please ensure you have enough free memory
After the parsnp alignment, i used the parsnp.fasta (seqnew.fasta here), to run the following steps. By breaking the alignment into shorter chunks I could successfully run gubbins. After the gubbins is run for all chunks, I combined them and plotted a phylogenetic tree. Commands I used are:
1) Formatting the alignment fasta file
$ fasta_formatter -i seqnew.fa -o seqnew.oneline.fa -w 0 ##convert multiple fasta file to single line fasta file |
2) Split the alignment file
$ perl splitAlignment.pl -rw-rw-r-- 1 ttsh ttsh 7.7M Nov 9 11:00 seqnew.oneline.fa_0 -rw-rw-r-- 1 ttsh ttsh 7.7M Nov 9 11:00 seqnew.oneline.fa_1 -rw-rw-r-- 1 ttsh ttsh 7.7M Nov 9 11:00 seqnew.oneline.fa_2 -rw-rw-r-- 1 ttsh ttsh 7.7M Nov 9 11:00 seqnew.oneline.fa_3 -rw-rw-r-- 1 ttsh ttsh 7.7M Nov 9 11:00 seqnew.oneline.fa_4 -rw-rw-r-- 1 ttsh ttsh 7.7M Nov 9 11:00 seqnew.oneline.fa_5 -rw-rw-r-- 1 ttsh ttsh 2.8M Nov 9 11:00 seqnew.oneline.fa_6
$ for d in {0..6}; do time nohup run_gubbins.py --prefix postGubbins_ecloacae_$d --thread 12 --verbose --tree_builder fasttree seqnew.oneline.fa_$d >>gubbins.log 2>&1; done
4) Convert the alignment file into one line fasta format using fastx_toolkit
$ for d in {0..6}; do fasta_formatter -i postGubbins_ecloacae_$d.filtered_polymorphic_sites.fasta -o postGubbins_ecloacae_$d.filtered_polymorphic_sites.fasta_fastx -w 0; done
-rw-rw-r-- 1 ttsh ttsh 1.2M Nov 9 11:31 postGubbins_ecloacae_0.filtered_polymorphic_sites.fasta_fastx
-rw-rw-r-- 1 ttsh ttsh 1.3M Nov 9 11:31 postGubbins_ecloacae_1.filtered_polymorphic_sites.fasta_fastx -rw-rw-r-- 1 ttsh ttsh 1.5M Nov 9 11:31 postGubbins_ecloacae_2.filtered_polymorphic_sites.fasta_fastx -rw-rw-r-- 1 ttsh ttsh 1.3M Nov 9 11:31 postGubbins_ecloacae_3.filtered_polymorphic_sites.fasta_fastx -rw-rw-r-- 1 ttsh ttsh 1.3M Nov 9 11:31 postGubbins_ecloacae_4.filtered_polymorphic_sites.fasta_fastx -rw-rw-r-- 1 ttsh ttsh 1.2M Nov 9 11:31 postGubbins_ecloacae_5.filtered_polymorphic_sites.fasta_fastx -rw-rw-r-- 1 ttsh ttsh 397K Nov 9 11:31 postGubbins_ecloacae_6.filtered_polymorphic_sites.fasta_fastx
5) Combine the alignment files
$ perl combinedAlignedfasta.pl
combinedAlignedfasta.pl can be found at this github page.
6) You can skip the 7th step by converting aligned fasta file to newick or .tre file using phylogenetic tree inferring softwares like fasttree or RaxML.
$ FastTree -nt postGubbins_seqnew_ecloacae.filtered_polymorphic_sites.fasta > postGubbins_seqnew_ecloacae.filtered_polymorphic_sites.tre
$ qiime (enter into qiime environment) $ qiime > make_phylogeny.py -i postGubbins_seqnew_ecloacae.filtered_polymorphic_sites.fasta -o postGubbins_seqnew_ecloacae.filtered_polymorphic_sites.tre
$ while IFS= read -r aline; do filename=`echo "$aline" | cut -f1 -d " " `; phyloID=`echo "$aline" | cut -f2 -d " " `; sed -i "s/$filename/$phyloID/" postGubbins_seqnew_ecloacae.filtered_polymorphic_sites_Final_Parsed.nwk ; echo "$aline"; done < ~/Datta/7International_NDM/International_NDM_Proj_Filenames_PhyloIDs.txt
$ perl script.pl PhyloIDs.txt postGubbins_seqnew_ecloacae.filtered_polymorphic_sites.tre >postGubbinsChunk.filtered_polymorphic_sitesv2.tre
Note: *" " in the above one liner of cut command are tabs
updated on June 29th, 2018
Code: hilite.me, lang: CSS, CSS: border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;, Style: Perldoc
Code: hilite.me, lang: CSS, CSS: border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;, Style: Perldoc
Tested this gubbins_in_chunks pipeline with a dataset that has complete Gubbins result.
ReplyDeleteThe gubbins_in_chunks result captures ~96% of the polymorphic sites captured by the full result.
Thank you Datta! :)
After parsnp I didn't find any .fasta files. There are parsnpAligner.ini parsnpAligner.log parsnp.ggr parsnp.tree parsnp.xmfa . How did you generated the .fasta file? I tried to convert the .xmfa file to .fasta but that file is not compatible for gubbins.
ReplyDeleteI converted the xmfa to fasta file using sequence scripts available here:
Deletehttps://github.com/kjolley/seq_scripts/blob/master/xmfa2fasta.pl
The output of this script is fed to gubbins