Tuesday, September 26, 2023

Notes on adata object

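import scanpy as sc

# Restore the raw counts to .X before normalizing
mdata["GEX"].X = mdata["GEX"].layers['counts'].copy()

# Scale each cell to 10,000 total counts, then log-transform
sc.pp.normalize_total(mdata["GEX"], target_sum=1e4)

sc.pp.log1p(mdata["GEX"])

# Flag the top 2,000 highly variable genes, selected within each batch
sc.pp.highly_variable_genes(mdata["GEX"], n_top_genes=2000, batch_key='batch')

# Build the neighbor graph (no PCA has been computed at this point)
sc.pp.neighbors(mdata["GEX"])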

WARNING: You’re trying to run this on 13953 dimensions of `.X`, if you really want this, set `use_rep='X'`.

         Falling back to preprocessing with `sc.pp.pca` and default params.


In the code above, I did not explicitly compute a PCA, so scanpy is letting me know that it is running `sc.pp.pca` with default parameters and then using the result to compute the neighbors. This makes sense: having ~13K genes does not mean we capture more information, and very high-dimensional data is difficult to work with, even computationally. So doing a PCA first makes sense here.
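
To make the idea concrete, here is a minimal NumPy sketch of "reduce with PCA, then look for neighbors in the reduced space". It is not scanpy's implementation, only an illustration. The helper names `pca_reduce` and `knn_indices`, the toy matrix size, and the choice of 50 components and 15 neighbors are all assumptions made for the example.

```python
import numpy as np

def pca_reduce(X, n_comps):
    """Project the rows of X (cells) onto the top n_comps principal components."""
    Xc = X - X.mean(axis=0)                      # center each column (gene)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]          # same as Xc @ Vt[:n_comps].T

def knn_indices(Z, k):
    """Return, for each row of Z, the indices of its k nearest rows (Euclidean)."""
    sq = (Z ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                  # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

# Toy data standing in for a log-normalized expression matrix (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))                  # 200 cells x 1000 genes

Z = pca_reduce(X, n_comps=50)                     # 200 cells x 50 components
nbrs = knn_indices(Z, k=15)                       # 15 neighbors per cell
print(Z.shape, nbrs.shape)
```

The neighbor search now works on 50 columns instead of 1000, which is the same trade scanpy makes when it falls back to PCA. In scanpy itself, the explicit version would be to call `sc.pp.pca` first and then `sc.pp.neighbors`, so the warning does not appear.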