Thursday, December 21, 2023

Tip: Single Cell Analysis in python - Doublet Detection notes

Had some issues, so I ran the following commands at the R prompt, inside the conda environment, started from the Linux terminal:

remotes::install_github("davismcc/scater")

BiocManager::install("scDblFinder")

Wednesday, December 6, 2023

Tip: the item to be deployed exceeds the maximum deployment size

When deploying an R Shiny app, I kept getting a pop-up saying that I was exceeding the maximum deployment size, which was around 1.3 GB.

But my files, all combined, came to only about 320 MB.

I then realized that the .RData file was around 1.1 GB. Once I identified and removed this file, the deployment size was brought within the acceptable limits, resolving the issue.
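A hidden file like .RData is easy to miss in a file browser. Here is a minimal sketch for listing the largest files under the app directory, hidden ones included. The function name largest_files and the default of five results are my own choices for illustration, not part of any Shiny tooling.

```python
import os

def largest_files(root, top=5):
    """Return the `top` largest files under `root` as (size_bytes, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # os.walk also yields dotfiles such as .RData
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so broken links do not raise errors
            sizes.append((os.path.getsize(path), path))
    return sorted(sizes, reverse=True)[:top]

if __name__ == "__main__":
    # Run from the app directory before deploying.
    for size, path in largest_files("."):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

Running this in the app folder before deploying should show right away whether one file dominates the bundle.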

Thursday, November 23, 2023

Cutoffs and Thresholds for single nuclei and single cell RNAseq data

paper : https://www.biorxiv.org/content/biorxiv/early/2021/06/25/2021.06.25.449944.full.pdf

Tuesday, November 14, 2023

Data Downloads with SFTP Commands

sftp> lpwd

Local working directory: /home/user/

sftp> lcd /home/user/Data/

sftp> lpwd

Local working directory: /home/user/Data/

In a regular shell (not the sftp prompt), generate the get commands:

for d in abc8425 abc9829 abc8424; do var=$(echo "$d" | sed 's/abc/Sample_ABC/g'); echo "get -R $var"; done

get -R Sample_ABC8425

get -R Sample_ABC9829

get -R Sample_ABC8424

Paste the above get -R commands into the sftp prompt to download the data to the server.
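The same list of commands can also be generated in Python, which is handy when the sample IDs already live in a script. This is only a sketch: the function name sftp_get_commands and its prefix arguments are made up for illustration.

```python
def sftp_get_commands(sample_ids, old_prefix="abc", new_prefix="Sample_ABC"):
    """Build one recursive sftp 'get' command per sample ID."""
    commands = []
    for sample_id in sample_ids:
        # Mirror sed 's/abc/Sample_ABC/g': replace every occurrence of the prefix.
        remote_dir = sample_id.replace(old_prefix, new_prefix)
        commands.append(f"get -R {remote_dir}")
    return commands

print("\n".join(sftp_get_commands(["abc8425", "abc9829", "abc8424"])))
```

The printed lines can be pasted into the sftp prompt exactly like the output of the shell loop.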

Monday, November 13, 2023

Resolved: Error: object ‘LayerData<-’ is not exported by 'namespace:SeuratObject'

Signac gives me :

Error: object ‘LayerData<-’ is not exported by 'namespace:SeuratObject'

Execution halted

I updated my SeuratObject (4.1.3 -> 5.0.0) from CRAN, which resolved it.

Notes on snapatac2

When I ran:

data = snap.pp.import_data(

fragment_file,

chrom_sizes=snap.genome.hg38,

#file="2_filtered_data/ATAC/atac_test.h5ad", # Optional

sorted_by_barcode=False,)

------------------------

With fragment_file set to either fragments.tsv.gz or fragments.tsv, I got the error:

PanicException: called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "Unknown frame descriptor" }

Instead, I used fragments.srt.bed.gz, generated with:

time sort -k4,4 fragments.tsv | gzip - > fragments.srt.bed.gz

which resolved the issue.
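For reference, the sort-and-gzip step can be sketched in Python as well. This assumes a tab-separated fragments file with the barcode in column 4, and it keeps any '#' header lines at the top, which the plain sort command does not do. The function name is hypothetical, and the whole file is read into memory, so it only suits files that fit in RAM.

```python
import gzip

def sort_fragments_by_barcode(in_path, out_path):
    """Sort a plain-text fragments file by barcode (column 4) and gzip the result."""
    with open(in_path, "rt") as handle:
        # Drop empty lines and make sure every kept line ends with a newline.
        lines = [line.rstrip("\n") + "\n" for line in handle if line.strip()]
    # Keep any '#' header lines at the top; sort only the fragment records.
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if not line.startswith("#")]
    records.sort(key=lambda line: line.rstrip("\n").split("\t")[3])
    with gzip.open(out_path, "wt") as out:
        out.writelines(header + records)
```

Usage would be sort_fragments_by_barcode("fragments.tsv", "fragments.srt.bed.gz"); for very large files the shell sort is the better tool.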

Tuesday, October 24, 2023

Issues with running scArches

The scarches command below:

scarches_model = sca.models.SCVI.load_query_data( adata=adata_to_map_augmented, reference_model="./reference_model", freeze_dropout=True)

gave the following error:

RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver

Built on Thu_Nov_18_09:45:30_PST_2021

Cuda compilation tools, release 11.5, V11.5.119

Build cuda_11.5.r11.5/compiler.30672275_0

import torch

print(torch.version.cuda)

12.1

So, I ran the following:

pip install light-the-torch
ltt install torch torchvision

It looks like my CUDA toolkit (11.5) and the CUDA version PyTorch was built with (12.1) are not the same. Updating the NVIDIA drivers without a backup is scary because it may cause further issues, so I decided not to delve into CUDA for now.
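The number in the driver error (11040) is the CUDA version that the installed driver supports, written in CUDA's integer encoding of 1000 × major + 10 × minor, so it means 11.4. A small sketch to decode it and compare it with the CUDA version PyTorch reports; the function names are made up for illustration, and the comparison is deliberately simple (it ignores CUDA's minor-version compatibility rules).

```python
def decode_cuda_version(encoded):
    """Decode CUDA's integer version (1000 * major + 10 * minor) to (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10

def driver_supports(driver_encoded, build_version):
    """Rough check: is the driver's CUDA version at least `build_version` (e.g. '12.1')?"""
    major, minor = (int(part) for part in build_version.split(".")[:2])
    # Assumption: a plain tuple comparison; real compatibility rules are more nuanced.
    return decode_cuda_version(driver_encoded) >= (major, minor)

print(decode_cuda_version(11040))        # (11, 4)
print(driver_supports(11040, "12.1"))    # False
```

With the values from this post, the driver (11.4) is older than the PyTorch build (12.1), which matches the error message.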

But again I got this error:

RuntimeError: Detected that PyTorch and torchvision were compiled with different CUDA major versions. PyTorch has CUDA Version=12.1 and torchvision has CUDA Version=11.8. Please reinstall the torchvision that matches your PyTorch install.

I did the following:

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

This time again another error:

undefined symbol when importing torchaudio with pytorch...

The following command resolved the above error:

pip install -U torch torchaudio --no-cache-dir

Instead of the GPU, I relied on the CPU:

scarches_model = sca.models.SCVI.load_query_data( adata=adata_to_map_augmented, reference_model="./reference_model", freeze_dropout=True, use_gpu=False )

Now, I have only a warning as below:

WARNING:jax._src.xla_bridge:An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.

Monday, October 2, 2023

Blank Screen Cursor at Top Left Ubuntu 22.04 LTS

I switched to the Nouveau driver and could no longer log in.

I could not log in to a TTY using CTRL+ALT+[F1 to F7] either; the blank screen still persisted.

Reboot to recovery mode:

- Shift did not work at all

- ESC took me to GRUB mode, but nothing worked there (I think I pressed ESC for too long!)

- Pressing ESC until I saw the logo (not the Ubuntu one) on the monitor did work:

https://support.starlabs.systems/kb/guides/using-recovery-mode

https://linuxconfig.org/how-to-uninstall-the-nvidia-drivers-on-ubuntu-20-04-focal-fossa-linux

Then I pressed Enter at the option "Press Enter for maintenance" and ran the following:

nvidia-smi

ubuntu-drivers autoinstall

apt-get update

ubuntu-drivers devices

apt install nvidia-driver-535

apt-get purge 'nvidia*'

sudo ubuntu-drivers autoinstall

sudo reboot

This still did not result in a correct configuration. I then changed the driver from 535 to 470, and it started working!

Tuesday, September 26, 2023

Notes on adata object

mdata["GEX"].X = mdata["GEX"].layers['counts'].copy()

sc.pp.normalize_total(mdata["GEX"], target_sum=1e4)

sc.pp.log1p(mdata["GEX"])

sc.pp.highly_variable_genes(mdata["GEX"], n_top_genes=2000, batch_key='batch')

sc.pp.neighbors(mdata["GEX"])

WARNING: You’re trying to run this on 13953 dimensions of `.X`, if you really want this, set `use_rep='X'`.

         Falling back to preprocessing with `sc.pp.pca` and default params.

In the above, I did not explicitly calculate the PCA. So scanpy is letting me know that it is calculating the PCA itself and then using that to calculate the neighbors. This makes sense: having ~13K genes does not mean we capture more information, and such high-dimensional data is difficult to deal with, even computationally. So doing a PCA makes sense here.
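To see what the reduction does, here is a minimal NumPy sketch of PCA on a random matrix. It is not scanpy's implementation; the matrix size, the 50 components and the name pca_reduce are assumptions for illustration only. In scanpy itself, the explicit route is to call sc.pp.pca before sc.pp.neighbors.

```python
import numpy as np

def pca_reduce(X, n_comps=50):
    """Project the rows of X onto their top `n_comps` principal components."""
    X_centered = X - X.mean(axis=0)
    # Economy-size SVD: rows of Vt are the principal axes, ordered by variance.
    _U, _S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_comps].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))   # hypothetical: 200 cells x 1000 genes
X_pca = pca_reduce(X, n_comps=50)
print(X_pca.shape)                 # (200, 50)
```

The neighbor search then works on 50 columns per cell instead of thousands, which is both faster and less noisy.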

Wednesday, August 30, 2023

My conda environments for single cell analysis

Environment 1 - Automatic annotation_celltypist (successful)

conda create --name annotation_celltypist

conda activate annotation_celltypist

conda install -c conda-forge mamba

mamba install -c conda-forge scanpy python-igraph leidenalg
pip install jupyter

mamba install -c bioconda -c conda-forge celltypist

pip install -U scarches

pip install urllib - was automatically installed via scarches I think

pip install "pandas<2.0.0"

=========================================

Environment 2 - scanpy (successful)

conda create --name sc_manual_annotation
conda activate sc_manual_annotation
conda install -c conda-forge mamba
mamba install -c conda-forge scanpy
pip install jupyter

=========================================

Environment 3 - counts_to_clustering + nichenet (successful)

conda create --name singlecell
conda activate singlecell
conda install -c conda-forge mamba
mamba install -c conda-forge scanpy python-igraph leidenalg
mamba install -c conda-forge altair matplotlib numpy pandas seaborn
pip install jupyter
pip install anndata2ri
pip install sccoda

In linux:
sudo apt install libcurl4-openssl-dev
sudo apt-get install libmkl-rt
sudo apt install libfontconfig1-dev
sudo apt-get install libcairo2-dev
sudo apt-get install libharfbuzz-dev libfribidi-dev
sudo apt-get install libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev
sudo apt install cmake

In RStudio:
install.packages("SoupX")
install.packages("systemfonts", dependencies = TRUE)
remotes::install_github("davismcc/scater", dependencies = TRUE)
install.packages("Cairo")
install.packages("textshaping")
install.packages("ragg")
install.packages("ggrastr")
BiocManager::install("scDblFinder")
BiocManager::install("BiocParallel")
BiocManager::install("scry")
install.packages("ggpubr")
library("devtools"); install_github("lme4/lme4",dependencies=TRUE)
BiocManager::install("ComplexHeatmap")
devtools::install_github("saeyslab/nichenetr")
install.packages("tidyverse")

## Advice to self: don't install R packages from a Jupyter notebook; install them in RStudio.

=========================================

Environment 4 - CITE-Seq data analysis (in Apple M2 Macbook) - successful

brew install --cask mambaforge

mamba create -n scvi_mamba

mamba activate scvi_mamba

mamba install -y -c conda-forge python=3.9 scanpy python-igraph leidenalg altair matplotlib numpy pandas seaborn scvi-tools muon

pip install jupyter

pip install --user scikit-misc

Issue 1: Sometimes mamba activate does not work on Apple M2 MacBooks and prints only the following lines when trying to activate an environment.

% mamba activate scvi_mamba

Run 'mamba init' to be able to run mamba activate/deactivate

and start a new shell session. Or use conda to activate/deactivate.

Then, copy the following block from the .bash_profile file into the .zshrc file:

# >>> conda initialize >>>

# !! Contents within this block are managed by 'conda init' !!

......

# <<< conda initialize <<<

Now

source ~/.zshrc

Issue 2:

NotImplementedError: The operator 'aten::lgamma.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Solution: In the terminal run this before opening Jupyter notebook

% export PYTORCH_ENABLE_MPS_FALLBACK=1

Else, in the Jupyter notebook:

%env PYTORCH_ENABLE_MPS_FALLBACK=1

In my case, I aborted my notebook and ran the command in the terminal, which then resolved the issue.
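Another option is to set the variable from Python itself, in the first cell of the notebook. This is a sketch under the assumption that the variable has to be in place before torch is imported for the first time, so it belongs at the very top.

```python
import os

# Assumption: this must run before `import torch`, so the setting is
# already in place when PyTorch loads.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

print(os.environ["PYTORCH_ENABLE_MPS_FALLBACK"])
```

If torch was already imported in the running kernel, restart the kernel first and then run this cell.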

Environment 5 - RNA velocity - successful

conda create -n scvelo_env python=3.9
conda activate scvelo_env
conda install numpy scipy cython numba matplotlib scikit-learn h5py click
pip install notebook
pip install scanpy
pip install leidenalg
pip install scvelo
pip install pandas==1.1.5 
pip install numpy==1.21.1

pip install git+https://github.com/theislab/cellrank.git@main

pip install pandas==1.1.5

pip install numpy==1.21.1

pip install rpy2

cellrank==2.0.0 scanpy==1.9.3 anndata==0.9.2 numpy==1.24.4 numba==0.57.1 scipy==1.10.1 pandas==2.0.3 pygpcca==1.0.4 scikit-learn==1.1.3 statsmodels==0.14.0 scvelo==0.3.0 pygam==0.8.0 matplotlib==3.7.1 seaborn==0.12.2
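A version line like the one above is useful to keep next to each environment. Here is a minimal sketch for producing one with the standard library; the helper name version_line and the package list are my own choices, and this is not necessarily how the line above was produced.

```python
from importlib.metadata import PackageNotFoundError, version

def version_line(packages):
    """Return 'name==version' for each package, or 'name==not installed'."""
    parts = []
    for name in packages:
        try:
            parts.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            parts.append(f"{name}==not installed")
    return " ".join(parts)

print(version_line(["cellrank", "scanpy", "anndata", "numpy", "pandas", "scvelo"]))
```

Run it inside the activated environment so that it reports the versions that environment actually uses.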

Environment 6 - Differential Gene Expression - successful

conda create -n deg

conda activate deg

conda install -c conda-forge scanpy python-igraph leidenalg
pip install jupyter

pip install diffxpy

Monday, August 28, 2023

Installing Singularity version: 3.6.3 in Ubuntu Linux

I was facing the following error when installing singularity:

checking: host Go compiler (at least version 1.20)... not found! mconfig: could not complete configuration

Running the following commands in the terminal resolved the problem and installed Singularity version 3.6.3.

Check this thread for more information: https://github.com/apptainer/singularity/issues/5099#issuecomment-1286798317

sudo apt-get update && \
sudo apt-get install -y build-essential \
libseccomp-dev pkg-config squashfs-tools cryptsetup

sudo rm -r /usr/local/go

export VERSION=1.13.15 OS=linux ARCH=amd64  # change this as you need

wget -O /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz https://dl.google.com/go/go${VERSION}.${OS}-${ARCH}.tar.gz && \
sudo tar -C /usr/local -xzf /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz

echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
source ~/.bashrc

curl -sfL https://install.goreleaser.com/github.com/golangci/golangci-lint.sh |
sh -s -- -b $(go env GOPATH)/bin v1.21.0

mkdir -p ${GOPATH}/src/github.com/sylabs && \
cd ${GOPATH}/src/github.com/sylabs && \
git clone https://github.com/sylabs/singularity.git && \
cd singularity

git checkout v3.6.3

cd ${GOPATH}/src/github.com/sylabs/singularity && \
./mconfig && \
cd ./builddir && \
make && \
sudo make install

singularity version

Wednesday, July 19, 2023

How to run Jupyter Notebooks on remote server

Very nice article indeed!

https://medium.com/@apbetahouse45/how-to-run-jupyter-notebooks-on-remote-server-part-1-ssh-a2be0232c533

Sunday, May 7, 2023

MINIFS function in excel

=MINIFS(Sheet2!F:F,Sheet2!M:M,A2)

Sheet2!F:F : the dates column, because I want the minimum date from a list of dates

Sheet2!M:M : the column that is searched for my key

A2 : the key; for the rows where column M matches it, I want the minimum date from column F

Column F      Column M
22/11/2011    NA
27/11/2011    NA
13/12/2021    Cluster_18
18/09/2013    Cluster_18

Final result:

Cluster_18 18/09/2013

The MINIFS function returns the minimum value among cells specified by a given set of conditions or criteria.
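The same lookup can be sketched in Python, using the four example rows above. The function name minifs is just for illustration; unlike Excel, which returns 0 when nothing matches, this sketch returns None.

```python
from datetime import datetime

def minifs(min_range, criteria_range, criterion):
    """Smallest value in min_range whose paired criteria_range entry equals criterion."""
    matches = [value for value, key in zip(min_range, criteria_range) if key == criterion]
    return min(matches) if matches else None

dates = [datetime.strptime(d, "%d/%m/%Y").date()
         for d in ["22/11/2011", "27/11/2011", "13/12/2021", "18/09/2013"]]
keys = ["NA", "NA", "Cluster_18", "Cluster_18"]

print(minifs(dates, keys, "Cluster_18").strftime("%d/%m/%Y"))  # 18/09/2013
```

Parsing the strings into real dates matters here: comparing "13/12/2021" and "18/09/2013" as text would give the wrong minimum.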

Sunday, April 9, 2023

How to Fix a VirtualBox Aborted Error with Blank Screen in Ubuntu

Have you ever encountered an issue where your Windows virtual machine just suddenly aborted and left you with a blank screen? This can be a frustrating experience, especially if you were in the middle of an important task or project. Luckily, there's a simple solution to this problem: installing the VirtualBox Extension Pack.

VirtualBox is a free and open-source virtualization software that allows you to run different operating systems on your computer. However, in order for VirtualBox to emulate the VM's devices correctly, it needs the same version of the VirtualBox Extension Pack to be installed. This is why you might have encountered an aborted VM with a blank screen – the extension pack was missing.

To fix this issue, you'll need to follow a few simple steps:

Step 1: Check your VirtualBox version

Open the VirtualBox Manager and go to Help > About VirtualBox. This will show you the version of VirtualBox that's currently installed on your system. Make note of this version number, as you'll need it for the next step.

Step 2: Download and install the VirtualBox Extension Pack from the VirtualBox website, based on your Ubuntu OS version.

Now that you know your VirtualBox version, you need to download the same version of the VirtualBox Extension Pack. You can find the download link for the extension pack on the VirtualBox website. Make sure to download the correct version that matches your VirtualBox installation.

Once the download is complete, double-click the .deb file to start the installation process in Ubuntu. You'll need to accept the license agreement and follow the prompts to complete the installation.

And that's it! Once the VirtualBox Extension Pack is installed, and you restart the VM, your Windows virtual machine should no longer abort with a blank screen. You can now continue working on your projects and tasks without any interruptions.

In conclusion, VirtualBox is a powerful virtualization tool that can greatly improve your productivity by allowing you to run multiple operating systems on a single computer. However, it's important to ensure that you have the correct version of the VirtualBox Extension Pack installed to avoid any issues with your virtual machines. By following the simple steps outlined above, you can quickly and easily fix any aborted VMs with a blank screen.

Wednesday, April 5, 2023

qpois function in R

The qpois function is the quantile function of the Poisson distribution: qpois(p, lambda) returns the smallest number of events k such that the probability of seeing k or fewer events in a given time interval, given a mean of lambda events, is at least p. With p = 0.95, it essentially tells us the upper limit of events that are likely to occur within that time interval.

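To make the qpois definition concrete, here is a small pure-Python sketch of a Poisson quantile: it accumulates the Poisson probabilities until the cumulative probability reaches p. The function name poisson_quantile is hypothetical, and the sketch assumes 0 <= p < 1.

```python
import math

def poisson_quantile(p, lam):
    """Smallest k with P(X <= k) >= p for X ~ Poisson(lam), for 0 <= p < 1."""
    k = 0
    pmf = math.exp(-lam)   # P(X = 0)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k     # P(X = k) computed from P(X = k - 1)
        cdf += pmf
    return k

print(poisson_quantile(0.95, 3))  # 6
```

With a mean of 3 events, the cumulative probability is about 0.916 at 5 events and about 0.966 at 6, so the 95% quantile is 6, the same value that qpois(0.95, 3) gives in R.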
Monday, March 27, 2023

Resolved-Java version 11 is required to run InterProScan

$ ./interproscan.sh -i test_all_appl.fasta -f tsv

Java version 11 is required to run InterProScan.

Detected version 1.8.0_92

Please install the correct version.

$ conda create -n openjdk_11.0.1 openjdk=11.0.1

$ conda activate openjdk_11.0.1

time /data01/Databases/interproscan/interproscan-5.60-92.0/interproscan.sh -t n -i filename.fasta -f tsv -dp -cpu 64

$ conda deactivate

Monday, February 20, 2023

Unexpected Topology in Gubbins Newick File: An Analysis of Branch Arrangement with Respect to SNPs

I generated a dummy file and utilized Gubbins to eliminate recombination. Here are the observations I made:

- SNPs are identified based on the bases in the first sequence.
- Gaps or Ns do not contribute to SNPs.
- The initial sequence is designated as the reference sequence, and any modifications made to it can result in notable changes in the output. These changes may range from slight to significant depending on the similarity between the sequences.
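The observations above can be mimicked with a toy function that calls SNP sites relative to the first sequence and skips gaps and Ns. This is only an illustration of the behaviour I observed, not Gubbins's actual implementation; the function name and the tiny alignment are made up.

```python
def snp_sites(sequences, ignore="-N"):
    """Return 0-based alignment columns where any sequence differs from the first one."""
    reference = sequences[0].upper()
    sites = []
    for pos, ref_base in enumerate(reference):
        if ref_base in ignore:
            continue  # assumption: a gap or N in the reference defines no SNP
        for seq in sequences[1:]:
            base = seq[pos].upper()
            if base not in ignore and base != ref_base:
                sites.append(pos)
                break
    return sites

alignment = ["ACGTACGT",   # first sequence acts as the reference
             "ACGTACGA",   # real difference at the last column
             "AC-TNCGT"]   # gap and N are not counted
print(snp_sites(alignment))  # [7]
```

Swapping which sequence comes first changes the reference, and with it the list of sites, which is the sensitivity described in the last point.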

Furthermore, I detected an issue with the newick file in a distinct dataset when employing Gubbins. Specifically, the branches that lack SNPs should have been positioned adjacently, but were instead organized differently, resulting in a distinct topology.

For instance, in the given example, all the samples without SNPs should have been grouped together, but Gubbins tree (newick) failed to do so.

To achieve this, I utilized the postGubbins.filtered_polymorphic_sites.fasta file and produced a tree using fasttree, resulting in the accurate output illustrated below:

Hope that helps!