Monday, October 21, 2024

SCENIC - GRN prediction time

Memory: 256 GB RAM
!pyscenic grn {loom_path} {tfs_path} -o $outpath_adj --num_workers 56
  • Sending large graph of size 144.10 MiB took less than 3 minutes
  • Sending large graph of size 512.77 MiB took ~8-9 minutes
  • Sending large graph of size 114.02 MiB took ~4 minutes
  • Sending large graph of size 2.26 GiB took ~56 minutes
  • Sending large graph of size 185.87 GiB took less than 2 minutes
  • Sending large graph of size 1.00 GiB took ~7-8 minutes
  • Sending large graph of size 4.48 GiB took ~50-60 minutes

Tuesday, May 14, 2024

Troubleshooting Module Import Issues in Jupyter Notebook

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[8], line 5
      1 import warnings
      3 warnings.filterwarnings("ignore")
----> 5 import pyscenic
      6 import loompy as lp
      7 import scanpy as sc

ModuleNotFoundError: No module named 'pyscenic'



If you run a package (pyscenic in my case) in a Jupyter notebook and imports from a specific conda environment fail, make sure the notebook is using the kernel that belongs to that environment.

Steps to Resolve the Issue:

1. Identify the Current Jupyter-Lab Path:

$ which jupyter-lab
/home/ramadatta/.local/bin/jupyter-lab

2. Install Jupyter in Your Conda Environment (with the pyscenic environment activated):

$ conda install jupyter

3. Verify the Jupyter-Lab Path Again:

$ which jupyter-lab
/home/ramadatta/anaconda3/envs/pyscenic/bin/jupyter-lab


The path for jupyter-lab has changed from the local installation to the one inside the conda environment. Previously, Jupyter was launched from the local installation, which could not see the packages in the environment; that is why the import failed even though pyscenic was installed there. After installing Jupyter in the conda environment, the path should point to the environment instead of the local path.
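The same check can be made from inside the notebook itself. The following is a minimal sketch, not part of the original troubleshooting; the helper name describe_kernel is made up for illustration. It reports which Python interpreter the running kernel uses and whether the package can be found from it:

```python
import importlib.util
import sys


def describe_kernel(package="pyscenic"):
    """Return the kernel's interpreter path and whether `package` is importable.

    `describe_kernel` is a hypothetical helper name, used only for this sketch.
    """
    return {
        "interpreter": sys.executable,
        "package_found": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    # Run this in a notebook cell; the interpreter path should sit inside
    # the conda environment (e.g. .../envs/pyscenic/bin/python).
    print(describe_kernel())
```

If the interpreter path points outside the environment, the notebook is still running on a kernel from another installation.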

Tuesday, February 20, 2024

How to Access SMB Shares on Linux: A Step-by-Step Guide

If you're working in a mixed environment where you need to access shared folders (also known as SMB shares) from a Windows machine on your Linux system, you'll find this guide invaluable. SMB, or Server Message Block, is a protocol for sharing files, printers, serial ports, and communications abstractions such as named pipes and mail slots between computers. In this tutorial, we'll walk through the process of accessing an SMB share on a Linux machine, step by step.

Step 1: Installing Required Packages

Before we can mount and access SMB shares, we need to ensure our Linux system has the necessary tools. The cifs-utils package provides utilities for mounting SMB/CIFS shares on a Linux system. To install this package, open a terminal and enter the following command:

sudo apt install cifs-utils


This command works for Debian-based distributions like Ubuntu. If you're using a different distribution, you might need to use a different package manager, such as dnf or yum for Fedora/RHEL or zypper for openSUSE.

Step 2: Creating a Local Mount Directory

Next, we'll create a local directory where the SMB share will be mounted. This is akin to assigning a drive letter to a network share in Windows. While you can choose any location for this directory, a common practice is to use the /mnt directory as a base. Let's create a subdirectory under /mnt for our SMB share. You can replace smb with any name that is meaningful to you. Run the following command:

sudo mkdir /mnt/smb


Step 3: Mounting the SMB Share

Now, we're ready to mount the SMB share to the directory we created in the previous step. To do this, use the following syntax:

sudo mount -t cifs -o username=user,uid=1000,gid=1000,file_mode=0777,dir_mode=0777 //smb-server-address/share-name /mnt/smb

Here's a breakdown of the command:
  • -t cifs: Specifies the type of file system. CIFS is a version of the SMB protocol.
  • -o username=user: Specifies the username required to access the SMB share. Replace user with the actual username.
    • uid=1000: This sets the user ID of the owner for the files. You can find your user ID by running id -u [your_username]. If you omit [your_username], it will give you the ID of the current user. Setting this to your user's ID makes the mounted share behave as if it's owned by you, allowing for easier manipulation of files.
    • gid=1000: Similar to uid, this sets the group ID for the files. Use id -g [your_username] to find the ID of your primary group. Often, your primary group ID is the same as your user ID.
    • file_mode=0777 and dir_mode=0777: These options set the permissions for files and directories, respectively. 0777 grants read, write, and execute permissions to everyone. Adjust these values based on your security requirements.
  • //smb-server-address/share-name: This is the path to the SMB share. Replace smb-server-address with the IP address or hostname of the SMB server and share-name with the name of the shared folder.
  • /mnt/smb: The local mount point directory we created earlier.

For example, if the username is smbusername, the IP address of the SMB server is 10.0.1.123, and the name of the shared folder is Seq-Data, the command would look like this:

sudo mount -t cifs -o username=smbusername,uid=1000,gid=1000,file_mode=0777,dir_mode=0777 //10.0.1.123/Seq-Data /mnt/smb


After entering this command, you will be prompted to enter the password for the specified user account. Once authenticated, the SMB share will be accessible from the local mount point directory (/mnt/smb in our example).
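Because the option string must not contain spaces and the uid/gid values differ between machines, it can help to assemble the command programmatically. The following is a minimal sketch under stated assumptions: build_mount_command is a hypothetical helper, it only builds and prints the command (it does not run it), and it fills uid and gid from the current user rather than hard-coding 1000.

```python
import os
import shlex


def build_mount_command(username, server, share, mount_point="/mnt/smb",
                        file_mode="0777", dir_mode="0777"):
    """Assemble the `mount -t cifs` command as an argument list.

    Hypothetical helper for illustration; uid/gid come from the current user.
    """
    options = ",".join([
        f"username={username}",
        f"uid={os.getuid()}",
        f"gid={os.getgid()}",
        f"file_mode={file_mode}",
        f"dir_mode={dir_mode}",
    ])
    return ["sudo", "mount", "-t", "cifs", "-o", options,
            f"//{server}/{share}", mount_point]


if __name__ == "__main__":
    # Example values taken from the walkthrough in this post.
    cmd = build_mount_command("smbusername", "10.0.1.123", "Seq-Data")
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review before pasting it into a terminal.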

And there you have it! You can now access files and directories on the SMB share directly from your Linux system as if they were part of your local file system. This method provides a seamless way to integrate Windows shares into your Linux environment, facilitating file sharing and collaboration across different operating systems.

Thursday, February 15, 2024

Resolved - TypeError: Can't implicitly convert non-string objects to strings

When trying to save my adata object:

adata_query_final.write_h5ad("anndata_objects/3_pcls_adata_with_scArches_labels_20240215.h5ad")

I received the following error:

TypeError: Can't implicitly convert non-string objects to strings

Error raised while writing key 'ann_level_2_transfer_uncert' of <class 'h5py._hl.group.Group'> to /

These columns in obs are supposed to be float, but they contained a mix of value types, which is why the object could not be saved:

       'ann_level_1_transfer_uncert', 'ann_level_2_transfer_uncert',
       'ann_level_3_transfer_uncert', 'ann_level_4_transfer_uncert',
       'ann_level_5_transfer_uncert',

I checked the types of the values in a column using the command below:


print(adata_query_final.obs['ann_level_1_transfer_uncert'].apply(type).value_counts())

ann_level_1_transfer_uncert
<class 'numpy.float64'>    11075
<class 'int'>               1754
<class 'float'>               78
Name: count, dtype: int64

I converted all the columns to float with the commands below:

adata_query_final.obs['ann_level_1_transfer_uncert'] = adata_query_final.obs['ann_level_1_transfer_uncert'].astype(float)
adata_query_final.obs['ann_level_2_transfer_uncert'] = adata_query_final.obs['ann_level_2_transfer_uncert'].astype(float)
adata_query_final.obs['ann_level_3_transfer_uncert'] = adata_query_final.obs['ann_level_3_transfer_uncert'].astype(float)
adata_query_final.obs['ann_level_4_transfer_uncert'] = adata_query_final.obs['ann_level_4_transfer_uncert'].astype(float)
adata_query_final.obs['ann_level_5_transfer_uncert'] = adata_query_final.obs['ann_level_5_transfer_uncert'].astype(float)

Now I can save the adata without any issue:

adata_query_final.write_h5ad("anndata_objects/3_pcls_adata_with_scArches_labels_20240215.h5ad")
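The five conversions can also be written as a loop. The sketch below reproduces the situation on a toy pandas DataFrame standing in for adata.obs (the values are made up, and coerce_to_float is a hypothetical helper name): an object column holding numpy.float64, int and float values is diagnosed and then cast to float64.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.obs with mixed-type object columns (made-up values).
obs = pd.DataFrame({
    f"ann_level_{i}_transfer_uncert": pd.Series(
        [np.float64(0.25), 0, 0.5], dtype=object)
    for i in range(1, 6)
})


def coerce_to_float(df, columns):
    """Cast each listed column to float64 in place and return the frame."""
    for col in columns:
        df[col] = df[col].astype(float)
    return df


uncert_cols = [c for c in obs.columns if c.endswith("_transfer_uncert")]

# Diagnose: count the Python types found in one column.
print(obs[uncert_cols[0]].apply(type).value_counts())

# Fix: convert every uncertainty column, then confirm the dtypes.
coerce_to_float(obs, uncert_cols)
print(obs.dtypes)
```

With a real object, the same loop would run over adata_query_final.obs instead of the toy frame.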


Monday, January 15, 2024

Resolved - R Shiny - Navigation containers expect a collection of

An error has occurred!

Navigation containers expect a collection of `bslib::nav_panel()`/`shiny::tabPanel()`s and/or `bslib::nav_menu()`/`shiny::navbarMenu()`s. Consider using `header` or `footer` if you wish to place content above (or below) every panel's contents.


Solution:

My current shiny version was 1.7.X, which gave me the above error. 

Downgrading shiny to version 1.6 solved the problem. I removed version 1.7.X from my packages and installed 1.6.0 from the R console using the command below:

remotes::install_version("shiny", "1.6.0")

When you deploy your Shiny app to a server or cloud service (e.g., shinyapps.io), the server environment typically uses the version of Shiny that's installed in that environment.

You don't explicitly specify the Shiny version when deploying; instead, the server environment determines which version of Shiny to use based on its package setup.

Friday, January 12, 2024

Resolved - Souporcell installation issues

 $ bash run_Donor_Deconvolution.sh
checking modules
imports done
checking bam for expected tags
[W::hts_idx_load2] The index file is older than the data file: /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/1_raw_data/human/FC_12h/bamfile/possorted_genome_bam.bam.bai
checking fasta
restarting pipeline in existing directory /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution/
running souporcell clustering
/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell/target/release/souporcell -k 4 -a /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//alt.mtx -r /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//ref.mtx --restarts 100 -b /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/1_raw_data/human/FC_12h/count_matrices/filtered_feature_bc_matrix/barcodes.tsv --min_ref 10 --min_alt 10 --threads 56
running souporcell doublet detection
running co inference of ambient RNA and cluster genotypes
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_ca32e407e94c33afe8f72cdc7357f09f NOW.
Traceback (most recent call last):
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/consensus.py", line 158, in <module>
    sm = pystan.StanModel(model_code=cell_genotype_consensus)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/site-packages/pystan/model.py", line 384, in __init__
    self.module = load_module(self.module_name, lib_dir)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/site-packages/pystan/model.py", line 50, in load_module
    return __import__(module_name)
ImportError: /tmp/pystan_rk7jnbr7/stanfit4anon_model_ca32e407e94c33afe8f72cdc7357f09f_6509297500926240442.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
Traceback (most recent call last):
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell_pipeline.py", line 600, in <module>
    consensus(args, ref_mtx, alt_mtx, doublet_file)
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell_pipeline.py", line 550, in consensus
    "--output_dir",args.out_dir,"--soup_out", args.out_dir + "/ambient_rna.txt", "--vcf_out", args.out_dir + "/cluster_genotypes.vcf", "--vcf", final_vcf])
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/ramadatta/sw/souporcell/hardinstall/souporcell/consensus.py', '-c', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//clusters.tsv', '-a', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//alt.mtx', '-r', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//ref.mtx', '-p', '2', '--output_dir', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution/', '--soup_out', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//ambient_rna.txt', '--vcf_out', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//cluster_genotypes.vcf', '--vcf', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//souporcell_merged_sorted_vcf.vcf.gz']' returned non-zero exit status 1.

$ mamba install -c conda-forge c-compiler
$ mamba install -c conda-forge cxx-compiler
$ bash run_Donor_Deconvolution.sh

checking modules
imports done
checking bam for expected tags
[W::hts_idx_load2] The index file is older than the data file: /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/1_raw_data/human/FC_12h/bamfile/possorted_genome_bam.bam.bai
checking fasta
restarting pipeline in existing directory /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution/
running co inference of ambient RNA and cluster genotypes
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_ca32e407e94c33afe8f72cdc7357f09f NOW.
Traceback (most recent call last):
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/unixccompiler.py", line 197, in link
    self.spawn(linker + ld_args)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
    % (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'g++' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/consensus.py", line 158, in <module>
    sm = pystan.StanModel(model_code=cell_genotype_consensus)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/site-packages/pystan/model.py", line 378, in __init__
    build_extension.run()
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/command/build_ext.py", line 558, in build_extension
    target_lang=language)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/ccompiler.py", line 717, in link_shared_object
    extra_preargs, extra_postargs, build_temp, target_lang)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/distutils/unixccompiler.py", line 199, in link
    raise LinkError(msg)
distutils.errors.LinkError: command 'g++' failed with exit status 1
Traceback (most recent call last):
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell_pipeline.py", line 600, in <module>
    consensus(args, ref_mtx, alt_mtx, doublet_file)
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell_pipeline.py", line 550, in consensus
    "--output_dir",args.out_dir,"--soup_out", args.out_dir + "/ambient_rna.txt", "--vcf_out", args.out_dir + "/cluster_genotypes.vcf", "--vcf", final_vcf])
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/ramadatta/sw/souporcell/hardinstall/souporcell/consensus.py', '-c', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//clusters.tsv', '-a', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//alt.mtx', '-r', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//ref.mtx', '-p', '2', '--output_dir', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution/', '--soup_out', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//ambient_rna.txt', '--vcf_out', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//cluster_genotypes.vcf', '--vcf', '/home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution//souporcell_merged_sorted_vcf.vcf.gz']' returned non-zero exit status 1.

real    0m38,518s
user    0m39,721s
sys    0m8,153s

$ sudo apt-get install gcc

$ sudo apt-get install build-essential

$ sudo apt-get install python-dev

$ sudo apt-get install python3-dev

No success!


$ ls ~/anaconda3/envs/souporcell/bin/ | grep clang
(no output)


$ clang --version
Command 'clang' not found, but can be installed with:
sudo apt install clang

distutils.errors.CompileError: command 'clang_osx' failed with exit status 1

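conda deactivate

conda activate souporcell

After reactivating the environment, the pipeline started working, probably because the newly installed conda compilers are only picked up when the environment is activated: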

$ bash run_Donor_Deconvolution.sh
checking modules
imports done
checking bam for expected tags
[W::hts_idx_load2] The index file is older than the data file: /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/1_raw_data/human/FC_12h/bamfile/possorted_genome_bam.bam.bai
checking fasta
restarting pipeline in existing directory /home/ramadatta/Analysis/1_Schiller_Lab/Projects/1_scGenomics_hPCLS/3_donor_deconvolution/
running co inference of ambient RNA and cluster genotypes
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_ca32e407e94c33afe8f72cdc7357f09f NOW.
95804 excluded for potential RNA editing
1090 doublets excluded from genotype and ambient RNA estimation
924 not used for soup calculation due to possible RNA edit
Initial log joint probability = -13978.7
    Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
       3      -13973.8    0.00109013     0.0953795      0.9474      0.9474        4   
Optimization terminated normally:
  Convergence detected: relative gradient magnitude is below tolerance

====================================

freebayes -f /home/ramadatta/sw/souporcell/refdata-cellranger-GRCh38-3.0.0/fasta/genome.fa -iXu -C 2 -q 20 -n 3 -E 1 -m 30 --min-coverage 20 --pooled-continuous --skip-coverage 100000 -r 1:0-55352692
Traceback (most recent call last):
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell_pipeline.py", line 585, in <module>
    final_vcf = freebayes(args, bam, fasta)
  File "/home/ramadatta/sw/souporcell/hardinstall/souporcell/souporcell_pipeline.py", line 455, in freebayes
    p = subprocess.Popen(cmd, stdout = filehandle, stderr = errhandle)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/home/ramadatta/anaconda3/envs/souporcell/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'freebayes': 'freebayes'

This FileNotFoundError means the freebayes executable was not on the PATH. Finally, I installed freebayes, which worked without issue.
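Missing executables like this can be caught before starting a long pipeline run. The following is a minimal sketch using only the Python standard library; missing_tools is a hypothetical helper, and the tool list contains just the two executables that appear in the errors in this post, not the full set of souporcell dependencies.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # Only the executables named in the errors in this post; this is not
    # a complete list of what the pipeline needs.
    required = ["freebayes", "g++"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```

Running it inside the activated environment shows what the pipeline's subprocess calls will be able to find.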

Thursday, January 4, 2024

Resolved - Issues with running Monocle 3

I first converted my adata to RDS format using sceasy, since the anndata object cannot be used directly in R, and specifically not in monocle 3.

Issue 1: 

cds <- as.cell_data_set(sce)
Error in UseMethod(generic = "as.cell_data_set", object = x) : 
  no applicable method for 'as.cell_data_set' applied to an object of class "c('SingleCellExperiment', 'RangedSummarizedExperiment', 'SummarizedExperiment', 'RectangularData', 'Vector', 'Annotated', 'vector_OR_Vector')"

Solution : 

This was because my sce object was: 

> class(sce)
[1] "SingleCellExperiment"
attr(,"package")
[1] "SingleCellExperiment"


When I converted the sce object to a Seurat object, this error was resolved:

> class(sce)
[1] "Seurat"
attr(,"package")
[1] "SeuratObject"

Now, cds object looks like this:


> class(cds)
[1] "cell_data_set"
attr(,"package")
[1] "monocle3"

Issue 2: 

cds <- preprocess_cds(cds, num_dim = 100)
Error in names(sf) <- colnames(SingleCellExperiment::counts(cds)) : 
  attempt to set an attribute on NULL

Solution : source 

> ## Calculate size factors using built-in function in monocle3
> cds <- estimate_size_factors(cds)

Wednesday, January 3, 2024

Q&A on single-cell data analysis

Question 1:

I have an adata object that does not have layers with spliced and unspliced counts. Is it even possible to use one of the kernels from cellrank2 without this information to find cell fate transitions? If yes, could you kindly advise what could be used as an alternative to the spliced/unspliced count layers?

Yes, we can use pseudotime instead of RNA velocity to calculate cell fates and transitions.

Question 2:

In cases where I am not sure of the root cell type, is it possible to use cellrank2?

Yes. One option is to pick a candidate root cell at the extreme of a diffusion-map component, for example with the following command:
adata.obsm['X_diffmap'][:, 3].argmax()
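To make explicit what this expression returns, here is a minimal sketch on a toy NumPy array standing in for adata.obsm['X_diffmap'] (random values, purely illustrative; root_candidate is a hypothetical helper name). It returns the row index of the cell with the largest value on the chosen diffusion component:

```python
import numpy as np

# Toy stand-in for adata.obsm['X_diffmap']: 6 cells x 5 diffusion components.
rng = np.random.default_rng(0)
diffmap = rng.normal(size=(6, 5))


def root_candidate(embedding, component=3):
    """Return the row index of the cell with the largest value on `component`."""
    return int(embedding[:, component].argmax())


root_ix = root_candidate(diffmap)
print(root_ix)
```

In scanpy, such an index is typically stored in adata.uns['iroot'] before computing diffusion pseudotime. Which component to use, and whether the maximum or the minimum is the better extreme, depends on the dataset and is worth checking against known marker genes.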

Single cell analysis - My short summaries collection

1. In single-cell RNA sequencing, mitochondrial (mt), ribosomal (ribo), and hemoglobin (Hb) gene expression levels serve as quality indicators:

  • mt Genes: High levels indicate cell stress or damage. (Should be less than 5-10%)
  • ribo Genes: Elevated levels suggest RNA isolation or library preparation issues. (no standard cutoff)
  • Hb Genes: High expression in non-blood cells flags potential red blood cell contamination. (maybe 50% in RBC but no standard cutoff)
2. The choice between RNA velocity and pseudotime depends on the specific research context and data quality, with RNA velocity requiring data on both spliced and unspliced transcripts, while pseudotime focuses on overall gene expression changes.
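For the mitochondrial check in item 1, the percentage is simply the share of a cell's counts that comes from mitochondrial genes. The following is a minimal pure-Python sketch with made-up counts for a single cell; percent_counts is a hypothetical helper, and the "MT-" prefix assumes human gene symbols.

```python
def percent_counts(gene_counts, prefix="MT-"):
    """Percentage of a cell's total counts from genes starting with `prefix`."""
    total = sum(gene_counts.values())
    if total == 0:
        return 0.0
    matched = sum(n for gene, n in gene_counts.items()
                  if gene.startswith(prefix))
    return 100.0 * matched / total


if __name__ == "__main__":
    # Made-up counts for one cell, purely for illustration.
    cell = {"MT-CO1": 30, "MT-ND1": 20, "ACTB": 400, "GAPDH": 550}
    pct_mt = percent_counts(cell)
    print(f"{pct_mt:.1f}% mitochondrial counts")
```

The same function can be pointed at other gene families by changing the prefix, although ribosomal and hemoglobin genes usually need more than one prefix or a pattern.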

Thursday, December 21, 2023

Tip: Single Cell Analysis in python - Doublet Detection notes

I had some issues, so I ran the following commands at the R prompt inside the conda environment (started from the Linux terminal):

remotes::install_github("davismcc/scater")

BiocManager::install("scDblFinder")