Fast Diffusion Optimal Transport for Manifold-of-Manifold Embeddings


High-throughput biomedical data are now generated massively in parallel across many conditions or patients. However, few systematic methods exist for organizing a large collection of datasets, rather than individual data points, and for gaining insight from that organization. Here we propose a manifold-based Wasserstein distance to learn and embed the manifold of samples. Our method, based on graph diffusions, is up to 50x faster than commonly used entropy-regularized optimal transport algorithms. We apply it to organize single-cell datasets arising from CRISPR perturbations.
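
The abstract does not spell out the algorithm, so the following is only a rough sketch of how samples could be compared through graph diffusion. It assumes every sample is a distribution over one shared set of points, smooths each distribution at dyadically growing diffusion scales, and takes a weighted L1 difference between the results. The function names, the Gaussian kernel, the bandwidth `sigma`, the number of scales `n_scales`, and the scale weighting `alpha` are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def diffusion_operator(points, sigma=1.0):
    """Row-stochastic Markov matrix built from a Gaussian kernel (assumed choice)."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return kernel / kernel.sum(axis=1, keepdims=True)


def multiscale_diffusion_distances(points, masses, sigma=1.0, n_scales=4, alpha=0.5):
    """Pairwise distances between samples.

    points: (n_points, dim) array shared by all samples.
    masses: (n_points, n_samples) array; each column is one sample's
            distribution over the points.
    """
    masses = np.asarray(masses, dtype=float)
    step = diffusion_operator(points, sigma)
    current = masses
    features = []
    for k in range(n_scales):
        # Diffuse every distribution; P.T moves mass and preserves its total.
        diffused = step.T @ current
        # Keep the detail lost between consecutive scales, weighted by scale.
        features.append(2.0 ** (k * alpha) * (current - diffused))
        current = diffused
        # Squaring the operator doubles the number of diffusion steps.
        step = step @ step
    # Append the coarsest smoothed distributions.
    features.append(2.0 ** (n_scales * alpha) * current)
    embedding = np.concatenate(features, axis=0)
    # L1 distance between the multiscale embeddings of each pair of samples.
    diffs = embedding[:, :, None] - embedding[:, None, :]
    return np.abs(diffs).sum(axis=0)
```

Under these assumptions, the returned sample-by-sample matrix could be passed to any distance-based embedding method to lay out the samples. The cost is dominated by matrix products with the diffusion operator rather than by a per-pair iterative solver.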

NeurIPS Workshop on Learning Meaningful Representations of Life.