High-throughput biomedical data are now generated massively in parallel across many conditions or patients. However, few systematic methods exist for organizing a large collection of datasets, as opposed to individual data points, or for gaining insight from that organization. Here we propose a manifold-based Wasserstein distance to learn and embed the manifold of samples. Our method, based on graph diffusions, is up to 50x faster than commonly used entropy-regularized algorithms. We apply it to organize single-cell datasets arising from CRISPR perturbations.
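
As a rough illustration of how graph diffusions can stand in for an explicit optimal-transport solve, the following Python sketch compares two distributions on a small graph by diffusing each over dyadic time scales and summing weighted L1 differences. It is a minimal sketch under stated assumptions, not the implementation evaluated here: the lazy random walk, the toy path graph, the function names (`lazy_walk`, `diffuse`, `multiscale_embedding`, `diffusion_distance`), and the parameters `n_scales` and `alpha` are all hypothetical choices made for illustration.

```python
def lazy_walk(adj):
    """Row-stochastic lazy random walk P = (I + D^-1 A) / 2.

    `adj` is a list of neighbor lists; every node is assumed to have
    at least one neighbor (assumption for this sketch).
    """
    n = len(adj)
    P = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        P[i][i] = 0.5
        for j in nbrs:
            P[i][j] += 0.5 / len(nbrs)
    return P


def diffuse(P, mu, steps):
    """Apply mu <- mu P `steps` times, spreading mass along edges."""
    n = len(mu)
    for _ in range(steps):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu


def multiscale_embedding(P, mu, n_scales=3, alpha=0.5):
    """Map a distribution to a vector of weighted multiscale differences.

    Block k holds w_k * (mu diffused 2^(k+1) steps - mu diffused 2^k steps),
    with w_k = 2^(-(n_scales - k - 1) * alpha); the final block is the
    coarsest diffused distribution itself. The weighting is an
    illustrative assumption.
    """
    feats = []
    prev = diffuse(P, mu, 1)              # diffusion time 2^0
    for k in range(n_scales):
        nxt = diffuse(P, prev, 2 ** k)    # total diffusion time 2^(k+1)
        w = 2.0 ** (-(n_scales - k - 1) * alpha)
        feats.extend(w * (b - a) for a, b in zip(prev, nxt))
        prev = nxt
    feats.extend(prev)
    return feats


def diffusion_distance(P, mu, nu, n_scales=3, alpha=0.5):
    """L1 distance between the multiscale embeddings of mu and nu."""
    e_mu = multiscale_embedding(P, mu, n_scales, alpha)
    e_nu = multiscale_embedding(P, nu, n_scales, alpha)
    return sum(abs(a - b) for a, b in zip(e_mu, e_nu))


def point_mass(i, n):
    """Distribution with all mass on node i."""
    return [1.0 if j == i else 0.0 for j in range(n)]


# Toy example: a path graph 0-1-2-3-4-5 (hypothetical data).
adj = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
P = lazy_walk(adj)
near = diffusion_distance(P, point_mass(0, 6), point_mass(1, 6))
far = diffusion_distance(P, point_mass(0, 6), point_mass(5, 6))
# Mass on adjacent nodes should come out closer than mass on opposite ends.
print(near < far)
```

Because each distribution is mapped to a vector once, comparing many datasets in this sketch reduces to L1 distances between vectors rather than one transport problem per pair.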