Thursday, September 15, 2022

Installing Seurat Package in Rstudio

While installing  seurat, I had to install several dependencies in linux system. Keeping a log of them.

install.packages('Seurat')

Had an issue with curl:

ERROR: configuration failed for package ‘curl’

* removing ‘/home/ramadatta/R/x86_64-pc-linux-gnu-library/4.1/curl’

Warning in install.packages :

  installation of package ‘curl’ had non-zero exit status

sudo apt-get install libcurl4-openssl-dev

configure: error: geos-config not found or not executable.

ERROR: configuration failed for package ‘rgeos’

* removing ‘/home/ramadatta/R/x86_64-pc-linux-gnu-library/4.1/rgeos’

 sudo apt install libgeos-dev

 install.packages('Seurat')

ERROR: configuration failed for package ‘nloptr’
* removing ‘/home/ramadatta/R/x86_64-pc-linux-gnu-library/4.1/nloptr’
Warning in install.packages :
  installation of package ‘nloptr’ had non-zero exit status

sudo apt-get install libnlopt-dev

ERROR: configuration failed for package ‘xml2’
* removing ‘/home/ramadatta/R/x86_64-pc-linux-gnu-library/4.1/xml2’
Warning in install.packages :
  installation of package ‘xml2’ had non-zero exit status

sudo apt-get install libxml2-dev 

install.packages("xml2")

ERROR: configuration failed for package ‘hdf5r’
* removing ‘/home/ramadatta/R/x86_64-pc-linux-gnu-library/4.1/hdf5r’

sudo apt-get install libhdf5-dev

---------------------------

The below are not necessary for Seurat but I needed to run for other packages

ERROR: configuration failed for package ‘systemfonts’
* removing ‘/home/ramadatta/R/x86_64-pc-linux-gnu-library/4.1/systemfonts’

sudo apt -y install libfontconfig1-dev

Thursday, September 8, 2022

Bash - partially resolved - cannot create temp file for here-document: no space left on device


Deleted some of the files from /var/log first

Then docker was not used for sometime, so deleted all the dockers images which reclaimed some space!!

1
2
3
4
5
cd /var/log
du -sh *

sudo du -ahx / | sort -rh | head -20
sudo docker image prune --all

Wednesday, August 31, 2022

Useful conda commands list

List the conda environments

conda info --envs

Create a conda environment

conda create --name PlasClass

Activate a conda environment

conda activate PlasClass

Install a package in conda environment

conda install -c bioconda plasclass

Install multiple packages of specific version while creating a conda environment

conda create -n PlasClass python=3.7 scikit-learn=0.21.3 plasclass

Install specific version of package in a conda environment

conda install spades=3.9.0

Search for the package versions available to install in a conda environment

conda search spades

Remove an environment

conda env remove -n PlasClass
Export 
conda env export > singlecell_env.yml
install using yml file
working with mamba  fast installations
useful tips


Monday, August 8, 2022

Difference between PCA and PCoA

 PCA and PCoA are really similar. In fact, PCA is just a type of PCoA that uses euclidean distances! So we could say:

Type of Ordination: - MDS - CCA - PCoA - PCoA of Jaccard distances - PCoA of Bray-Curtis dissimilarities - PCoA of Euclidian distances (this is also called PCA) - PCoA of UniFrac distances
source: https://forum.qiime2.org/t/pca-vs-pcoa-which-is-the-appropriate-one-for-microbiome-data/5974/2

Tuesday, July 19, 2022

Calculate the frequency for a range of nucleotides aligned to reference from a BAM file

I wanted to find out the frequency for a range of nucleotides from multiple reads aligned to reference sequence from a BAM file. 

As you see from figure below, I want to find the frequency of nucleotides from position 13 to 15 and give their count.


For this, I use samtools mpileup command with a few bash commands as below.

samtools mpileup snps.bam -Q 0 -r CY116646.1:13-15 \| # Extract aligned region
sed 's/\^\]//g' \|  # "^]" signifies a base that was at the start of a read 
awk '{print $5}' \| # Read bases
sed -e 's/./&,/g' -e 's/,$//g' >infile && # temp file
ruby -F, -lane 'BEGIN{$a=[]};$a.concat([$F]);END{$a.transpose.each{|x|puts x.join(",")}}' infile \| # pass temp file to ruby oneliner
sed 's/,//g' \| #remove delimiter
sort \| 
uniq -c # Count


This results in:
[mpileup] 1 samples in 1 input files
     10 GGT
      1 GTT


EDIT 1:

But this would not work if the mpileup bases are not of same length.

This could be achived using a R library package. All we need is input bam file and coordinates of the sequence.

1
2
3
library(GenomicAlignments)    
stacked <- stackStringsFromBam("snps.bam", param = GRanges(CY116646.1:13-15))
write.table(stacked, Aligned_Reads_at_Region_Of_Interest.txt, quote = FALSE, col.names = F, row.names = F)


If you have multiple positions, then you can run the following R script:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
  fileName <- "regions.list"
  conn <- file(fileName, open = "r")
  linn <- readLines(conn)

  for (i in 1:length(linn)) {
    stacked <- stackStringsFromBam("snps.bam", param = GRanges(linn[i]))
    write.table(
      stacked,
      paste0("Region", linn[i], ".txt"),
      quote = FALSE,
      col.names = F,
      row.names = F
    )
  }
  close(conn)

Some reference posts:

https://www.biostars.org/p/259690/

Sunday, July 17, 2022

Useful Blast Commands and one-liners

 # Blast command with additional details such as query length and subject length

blastn -db /storage/data/Storage/nt/nt/nt -query query.fa -outfmt '6 qseqid sseqid pident nident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen' -out query.fa_vs_nt_with_stitle_blastresults.txt -num_threads 48

# To get title of subject, For example: Enterobacter_aerogenes_strain_G7,_complete_genome 

blastn -db /storage/data/Storage/nt/nt/nt -query query.fa -outfmt '6 qseqid sseqid pident nident length mismatch gapopen qstart qend sstart send evalue bitscore stitle qlen slen' -out query.fa_vs_nt_with_stitle_blastresults.txt -num_threads 48

# Tophits

$ awk '!x[$1]++' query.fa_vs_nt_with_stitle_blastresults.txt

# Tophits with plasmid assignment

$ awk '!x[$1]++' query.fa_vs_nt_with_stitle_blastresults.txt | sed 's/ /_/g' | awk '{print $0 "\t" $5*100/$(NF-1)}' | column -t | grep 'plasmid'

Thursday, July 14, 2022

Running MLST tool on individual contigs in fasta vs whole genome assembly fasta

This is more of a kind of observation.

When MLST tool was used to assign species and ST on assembled contigs as a whole file the assignment was ecoli species and ST 448. 

But when assembled file is separated into multiple contigs, only one contig was assigned with species "senterica", without ST assignment. The rest of the contigs were not assigned by any species and ST. 

It appears, running all the contigs together in a file gives MLST tool more confidence than running the tool on each single contig. 




Tuesday, June 28, 2022

Contigs with no read coverage (zero reads mapped) after mapping using Bowtie2

I recently mapped reads using Bowtie2 against viral genome segments to find the read counts for each of the segments. To my surprise, I found contigs with no reads mapped on them.

That is when I came to know that bowtie2 used END to END read alignment using default settings (see here). In reality, the contigs from many assemblers are further broken down into kmers and are stitched together. In this case, I used the Megahit metagenome assembler, which uses the default sequence of 21, 41, 61, 81 and 99 kmers (Reference). So, aligning reads using the END to END parameter settings may not work sometimes resulting in zero reads count or no read coverage for the particular contig. 

In the figure down, we see the PB2 and NP (colored read) using END to END has no reads mapped but the contig is created by the assembler. This more likely seems to be happened because the assemblers use kmer approach to create contigs. This is when, changing the setting from END to END to LOCAL makes more sense. 

In the local alignment when some of the bases at the ends of the read do not participate, they are omitted (or "soft trimmed" or "soft clipped") from the beginning or from the end. That's how we see PB2 and NP have now read counts. For other segments, the read counts seemed to increase. 

So, in this particular case of alignment, local alignment of the reads using bowtie2 makes more sense.



For more discussion, see here:

Tuesday, June 21, 2022

Downloading viral genome database for viral read classification - Metagenomics

  1.  Kraken2 Metagenomic Virus Database - redirects to globus - does not have an option to download

  2. Default kraken2 command:
    • kraken2-build --download-library viral --db $DBNAME

      Results in the error - adding --use-ftp did not help
      rsync: getaddrinfo: ftp.ncbi.nlm.nih.gov 873: Name or service not known
      rsync error: error in socket IO (code 10) at clientserver.c(127) [Receiver=3.1.3]
      Error downloading assembly summary file for viral, exiting.


    • Tried changing code in specific scripts based on this error thread - still no luck! :(
    • Tried changing code in specific scripts based on this error thread - still no luck! :(

  3. A python script that helps with updating the kraken databases :
    • error - FileNotFoundError: [Errno 2] No such file or directory: 'assembly_summary_refseq.txt'

  4. Downloaded NCBI Viral database from here - have not tested - but fasta sqeuences needed to converted to kraken2 database format.

  5. Downloaded Viral RefSeq database from a PeerJ paper - worked!
    • Pre-compiled databases
    • Looks comprehensive! Size is big (6.6 gb)
    • Could classify the viral reads using kraken2 command

Thursday, June 16, 2022

Resolved-How to load an HTML page on Github as a normal HTML page we see in browser?

Github is all about code and there is no automatic way to load and view a .html file uploaded in our github repo. So, only the html code is displayed when clicked on the .html document.

A simple way to overcome this problem is to prepend the following before the actual link.

For example, 

my actual html file is located at:

https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.html

to load it as html file, we should prepend :

https://htmlpreview.github.io/?

Finally, my link in the browser should look like this:

https://htmlpreview.github.io/?https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.html