Description
Thank you for the great work on SLICE. I'm currently applying getEntropy() to a Seurat object with ~9,000 cells and ~30,000 genes. As suggested, I'm attempting to use a gene similarity matrix (km) as input instead of precomputed clusters.
However, computing a full gene-gene similarity matrix (e.g., kappa or Jaccard) over 30,000 genes is infeasible for me in both memory and time: a dense 30,000 x 30,000 matrix has 900M entries, which is roughly 7.2 GB as doubles. I have a few questions:
What is the most efficient way to compute the similarity matrix (km) in this context?
Is there a recommended method for approximating kappa similarity (e.g., sparse binary matrices, nearest neighbors)?
Can SLICE work with a partial or sparsified matrix (e.g., top-N neighbors per gene)?
Is it possible to directly use HVGs (highly variable genes) to reduce the matrix size without losing biological meaning?
Would it be acceptable to use Jaccard similarity instead of kappa?
I am currently using proxyC::simil() on the binarized expression matrix, which is fast, but I am not sure the result matches what SLICE expects for km.
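To make clear what I am trying to compute, here is a minimal NumPy sketch of pairwise kappa on a binarized genes x cells matrix. It assumes "kappa" means Cohen's kappa between two genes' on/off vectors across cells, which may not be exactly what SLICE does internally. The function name `kappa_matrix` is my own and is not part of SLICE or proxyC. The output is dense, so this is only practical after restricting to a subset of genes such as HVGs.

```python
import numpy as np

def kappa_matrix(binary):
    """Pairwise Cohen's kappa between rows of a genes x cells 0/1 matrix.

    Assumption: each gene is treated as a binary rater over cells
    (1 = expressed, 0 = not expressed). Hypothetical helper, not SLICE code.
    """
    b = np.asarray(binary, dtype=np.float64)
    n_cells = b.shape[1]

    # Cells where both genes are on, for every gene pair at once.
    both_on = b @ b.T
    # Number of cells in which each gene is on.
    on = b.sum(axis=1)
    # Cells where both genes are off: n - s_i - s_j + both_on.
    both_off = n_cells - on[:, None] - on[None, :] + both_on

    # Observed agreement and agreement expected by chance.
    p_obs = (both_on + both_off) / n_cells
    p_on = on / n_cells
    p_exp = np.outer(p_on, p_on) + np.outer(1.0 - p_on, 1.0 - p_on)

    with np.errstate(divide="ignore", invalid="ignore"):
        kappa = (p_obs - p_exp) / (1.0 - p_exp)
    # Pairs of constant genes give 0/0; set those entries to 0 by convention.
    kappa[~np.isfinite(kappa)] = 0.0
    return kappa

# Toy example: 4 genes x 4 cells.
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
])
km = kappa_matrix(toy)
```

With around 2,000 HVGs the result is a 2,000 x 2,000 matrix (4M entries), which is easy to hold in memory. The same counts could be obtained from a sparse matrix product if the binarized input is kept sparse.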
I’d appreciate any advice or best practices you could share for scaling SLICE to large single-cell datasets.
Best regards,
Francesco