Thursday, November 19, 2020

Regions with average quality BELOW this will be trimmed

BBDuk does not do naive quality trimming; instead, quality trimming is done using the Phred algorithm.

This is nicely explained by BBDuk's author, Brian Bushnell, in the following thread:

 http://seqanswers.com/forums/showthread.php?t=42776

Imagine a read with this quality profile:

40, 40, 40, 40, 2, 2, 2, 2, 40, 2

With a trimming threshold of Q10, the Phred algorithm would trim the last 6 bases, because their average quality (calculated by summing the error probabilities, not by averaging the Q-scores) is 2.79, which is below 10. Trimming regions with average quality below a threshold gives the optimal result in terms of the ratio of retained bases to the expected number of errors.
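The arithmetic above can be checked with a short Python sketch. The function names are mine, and `best_segment` follows one common formulation of phred-style trimming (often called the modified Mott algorithm); I am assuming it behaves like BBDuk on this example, not claiming it is BBDuk's actual implementation.

```python
import math

def error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def average_quality(quals):
    """Average quality obtained by averaging error probabilities, not Q-scores."""
    mean_err = sum(error_prob(q) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)

def best_segment(quals, trimq):
    """Return (start, end) of the segment maximizing sum(cutoff_error - base_error).

    Bases outside the returned half-open interval would be trimmed.
    This is an illustrative sketch, not BBDuk's code.
    """
    cutoff = error_prob(trimq)
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += cutoff - error_prob(q)
        if run_sum <= 0:
            # The running segment is no longer worth keeping; restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best

quals = [40, 40, 40, 40, 2, 2, 2, 2, 40, 2]
print(round(average_quality(quals[-6:]), 2))  # 2.79
print(best_segment(quals, trimq=10))          # (0, 4): keep the first 4 bases
```

Note that the lone Q40 base near the end is trimmed as well: it cannot make up for the errors expected from the Q2 bases around it.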

In this example, I ran the following command to trim at Q30:

bbduk.sh -Xmx6g in1=$R1.fastq in2=$R2.fastq out1=$R1\_bbmap_adaptertrimmed_Q30.fastq out2=$R2\_bbmap_adaptertrimmed_Q30.fastq ref=bbmap/resources/adapters.fa ktrim=r k=23 mink=11 hdist=1 qtrim=rl trimq=30 minavgquality=30

This does not mean that the output reads will contain no bases with a quality score below Q30. Because the trimming is Phred-based rather than naive, a base below Q30 can be retained when the region around it still averages Q30 or better. Thus, we still end up with some bases below Q30, probably in a minor portion of our FASTQ data.
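To see how large that minor portion is, one can count the bases below Q30 in the trimmed output. The following is a minimal sketch, assuming a plain 4-line-per-record FASTQ with Phred+33 encoding; the function name and the demo record are made up for illustration.

```python
import io

def fraction_below(handle, threshold=30, offset=33):
    """Fraction of bases with quality below `threshold` in a 4-line FASTQ stream."""
    low = total = 0
    for line_no, line in enumerate(handle):
        if line_no % 4 == 3:  # every fourth line holds the quality string
            for ch in line.rstrip():
                total += 1
                if ord(ch) - offset < threshold:
                    low += 1
    return low / total if total else 0.0

# Tiny made-up record: 'I' is Q40 and ':' is Q25 under Phred+33.
demo = io.StringIO("@read1\nACGTACG\n+\nIII:III\n")
print(fraction_below(demo))  # 1 of 7 bases is below Q30
```

For a real file, pass an open file handle instead of the `io.StringIO` demo, for example one opened on the `out1` file written by the command above.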