A search for solutions: June 2022

Tuesday, June 28, 2022

Contigs with no read coverage (zero reads mapped) after mapping using Bowtie2

I recently mapped reads using Bowtie2 against viral genome segments to find the read counts for each of the segments. To my surprise, I found contigs with no reads mapped to them.

That is when I learned that Bowtie2 uses end-to-end read alignment by default (see here). In end-to-end mode, every base of the read must take part in the alignment. Many assemblers, however, do not build contigs from whole reads: they break the reads into k-mers and stitch overlapping k-mers together into contigs. In this case, I used the Megahit metagenome assembler, which by default uses k-mer sizes of 21, 41, 61, 81 and 99 (Reference). So aligning reads in end-to-end mode sometimes does not work: reads that match a contig only partially may fail to align, which can leave a contig with a zero read count (no read coverage).

In the figure below, the PB2 and NP segments (colored red) have no reads mapped in end-to-end mode, even though the assembler created contigs for them. This most likely happened because the assembler builds contigs from k-mers rather than from whole reads. This is when changing the setting from end-to-end to local makes more sense.
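To spot contigs like PB2 and NP quickly, the per-contig read counts can be pulled straight from the SAM output. The sketch below is a minimal illustration, not the method used for the figure. It assumes plain-text SAM (as written with Bowtie2's -S option), counts only primary mapped records (FLAG bits 0x4, 0x100 and 0x800 unset), and counts each mate of a pair separately. Contigs listed in the @SQ header lines that receive no reads are reported with 0.

```python
from collections import Counter

# FLAG bits defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def reads_per_contig(sam_lines):
    """Count primary mapped reads per contig in SAM text.

    Contigs declared in @SQ header lines that receive no reads are
    reported with a count of 0, so zero-coverage contigs stand out.
    """
    contigs = []
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header fields look like SN:<name> and LN:<length>
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        contigs.append(field[3:])
            continue
        cols = line.split("\t")
        flag, rname = int(cols[1]), cols[2]
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # skip unmapped and non-primary alignment records
        counts[rname] += 1
    # Counter returns 0 for contigs that never appeared in a record
    return {name: counts[name] for name in contigs}
```

For example, with open("sample.sam") as fh: print(reads_per_contig(fh)), where sample.sam is a hypothetical file name. For sorted, indexed BAM files, samtools idxstats also reports mapped-read counts per reference sequence.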

In local mode, if some bases at either end of a read do not participate in the alignment, they are omitted ("soft trimmed" or "soft clipped") from the beginning or the end. That is how PB2 and NP now have read counts. For the other segments, the read counts also appeared to increase.
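In the SAM output, soft-clipped bases show up as S operations at the start or end of the CIGAR string (for example, 5S40M5S means 5 bases were clipped at each end). A small helper like the one below, a sketch that assumes standard SAM CIGAR strings, can show how much of each read local mode actually clipped.

```python
import re

# One CIGAR operation: a length followed by an operation letter
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return (bases clipped at the start, bases clipped at the end).

    An unavailable CIGAR ("*") has no operations and gives (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can only be the outermost operations; drop them
    # so that soft clips (S) are found at the ends of the list.
    ops = [(n, op) for n, op in ops if op != "H"]
    start = ops[0][0] if ops and ops[0][1] == "S" else 0
    end = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return start, end
```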

So, in this particular case, local alignment of the reads with Bowtie2 makes more sense.
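Switching modes is a single option on the Bowtie2 command line. The sketch below assumes paired-end reads and an index already built with bowtie2-build. The index name, file names and thread count are hypothetical. It only builds the argument list, which can then be passed to subprocess.run(cmd, check=True).

```python
def bowtie2_command(index, r1, r2, out_sam, local=True, threads=4):
    """Build a paired-end Bowtie2 command as an argument list.

    --end-to-end is Bowtie2's default mode; --local lets Bowtie2
    soft clip read ends that do not fit the alignment.
    """
    mode = "--local" if local else "--end-to-end"
    return [
        "bowtie2", mode,
        "-p", str(threads),   # number of alignment threads
        "-x", index,          # basename of the bowtie2-build index
        "-1", r1, "-2", r2,   # paired FASTQ files
        "-S", out_sam,        # SAM output file
    ]


if __name__ == "__main__":
    # Hypothetical file names, used only to print the command line
    print(" ".join(bowtie2_command("segments", "R1.fq.gz", "R2.fq.gz", "local.sam")))
```

Running the same reads once with local=False and once with local=True gives two SAM files whose per-contig counts can be compared directly.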

For more discussion, see here:

Tuesday, June 21, 2022

Downloading viral genome database for viral read classification - Metagenomics

Kraken2 Metagenomic Virus Database - redirects to Globus, which does not offer an option to download
Default kraken2 command:

kraken2-build --download-library viral --db $DBNAME

Results in the following error (adding --use-ftp did not help):
rsync: getaddrinfo: ftp.ncbi.nlm.nih.gov 873: Name or service not known
rsync error: error in socket IO (code 10) at clientserver.c(127) [Receiver=3.1.3]
Error downloading assembly summary file for viral, exiting.
Tried changing code in specific scripts based on this error thread - still no luck! :(
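The "Name or service not known" message comes from getaddrinfo: the machine could not resolve the host name ftp.ncbi.nlm.nih.gov at all, before any rsync transfer started. One possible cause (an assumption, not confirmed here) is a server or compute node without working DNS or outbound internet access. A quick check in Python:

```python
import socket


def can_resolve(host, port=873):
    """Return True if `host` resolves for `port` (873 is rsync's port)."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Same failure class as rsync's "Name or service not known"
        return False
```

If can_resolve("ftp.ncbi.nlm.nih.gov") returns False, the problem lies in the network or DNS setup rather than in the Kraken2 scripts.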

A Python script that helps with updating the Kraken databases:

Error: FileNotFoundError: [Errno 2] No such file or directory: 'assembly_summary_refseq.txt'
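The path in the error is relative, which suggests the script expected assembly_summary_refseq.txt in the current working directory. One possible workaround, sketched below, is to fetch the file manually first. This assumes the machine can reach NCBI over HTTPS (urllib honours the usual https_proxy environment variable), and the URLs follow NCBI's public genomes directory layout as assumed here, so check them before relying on them.

```python
import os
import urllib.request

# Assumed HTTPS mirror of the NCBI genomes FTP tree
NCBI_GENOMES = "https://ftp.ncbi.nlm.nih.gov/genomes"


def summary_url(library=None):
    """URL of an NCBI assembly summary file.

    library=None    -> RefSeq-wide assembly_summary_refseq.txt
    library="viral" -> viral-only refseq/viral/assembly_summary.txt
    """
    if library is None:
        return f"{NCBI_GENOMES}/ASSEMBLY_REPORTS/assembly_summary_refseq.txt"
    return f"{NCBI_GENOMES}/refseq/{library}/assembly_summary.txt"


def fetch(url, dest_dir="."):
    """Download `url` into dest_dir, keeping its file name; return the path."""
    dest = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, dest)
    return dest
```

For example, fetch(summary_url()) would place assembly_summary_refseq.txt in the directory the script is run from.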

Downloaded the NCBI Viral database from here (not tested yet), but the FASTA sequences need to be converted to the Kraken2 database format.
Downloaded the Viral RefSeq database from a PeerJ paper - worked!
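For a custom database, the Kraken2 manual describes FASTA headers that carry the taxonomy ID directly, in the form >sequence16|kraken:taxid|32630 (the manual's own example). Otherwise, the accession numbers must be resolvable through NCBI's accession-to-taxid maps. Below is a minimal, untested sketch of the header rewrite. acc2taxid is a hypothetical dictionary you would fill from your own accession-to-taxid table. The rewritten file would then go through kraken2-build --download-taxonomy, --add-to-library and --build, as described in the manual.

```python
def add_kraken_taxids(fasta_lines, acc2taxid):
    """Yield FASTA lines with '|kraken:taxid|<taxid>' added to each header.

    acc2taxid (hypothetical input) maps an accession, i.e. the first word
    of the header without '>', to its NCBI taxonomy ID. Records with an
    unknown accession are skipped, since Kraken2 could not place them.
    """
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            acc, _, desc = line[1:].partition(" ")
            taxid = acc2taxid.get(acc)
            keep = taxid is not None
            if keep:
                # rstrip() drops the trailing space when there is no description
                yield f">{acc}|kraken:taxid|{taxid} {desc}".rstrip()
        elif keep:
            yield line  # sequence line of a record we are keeping
```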

Pre-compiled databases
Looks comprehensive! Size is big (6.6 GB).
Could classify the viral reads using the kraken2 command.
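Running kraken2 with the --report option writes a per-taxon summary. For example: kraken2 --db $DBNAME --report sample.kreport --output sample.kraken reads.fq, where the file names are hypothetical. The sketch below assumes the standard six-column report (run without --report-minimizer-data) and lists the species-level taxa (rank code S) with the most reads.

```python
def top_species(report_lines, n=10):
    """Return up to n (name, taxid, clade_reads, percent) species tuples.

    Standard Kraken2 report columns, tab-separated: percentage, reads in
    the clade, reads assigned directly, rank code, NCBI taxid, indented name.
    """
    species = []
    for line in report_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue  # skip blank or malformed lines
        pct, clade, _direct, rank, taxid, name = cols[:6]
        if rank == "S":
            species.append((name.strip(), int(taxid), int(clade), float(pct)))
    # Most abundant species (by clade read count) first
    species.sort(key=lambda rec: rec[2], reverse=True)
    return species[:n]
```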

Thursday, June 16, 2022

Resolved: How to load an HTML page on GitHub as a normal HTML page we see in the browser?

GitHub is all about code, and there is no automatic way to load and view a .html file uploaded to our GitHub repo. So, clicking on the .html document only displays its HTML source code.

A simple way to overcome this problem is to prepend the following prefix to the actual link.

For example,

My actual HTML file is located at:

https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.html

To load it as an HTML page, we should prepend:

https://htmlpreview.github.io/?

Finally, my link in the browser should look like this:

https://htmlpreview.github.io/?https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.html
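If you do this often, the prefixing is easy to script. Here is a trivial sketch (the function name is my own):

```python
PREVIEW_PREFIX = "https://htmlpreview.github.io/?"


def preview_url(github_html_url):
    """Prepend the htmlpreview prefix to a GitHub link to an .html file."""
    if github_html_url.startswith(PREVIEW_PREFIX):
        return github_html_url  # already wrapped; do not prefix twice
    return PREVIEW_PREFIX + github_html_url
```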