Flow matching for generative modelling in bioinformatics and computational biology

Abstract

Numerous problems in bioinformatics and computational biology can be framed as learning a mapping from one state of a biological system to another relevant state, or as exploring novel data points across biologically constrained spaces. However, manually deriving such mappings (for example, to transform cells in a diseased state back into a healthy state, or to extrapolate from existing datasets to create new data) is often non-trivial and can require substantial domain expertise and resources. Fortunately, the field of generative artificial intelligence (AI) has introduced a training paradigm known as (conditional) flow matching, which has emerged as a promising solution to this problem, with broad applicability in computer vision, natural language processing, and the physical and life sciences. Flow matching is a powerful, principled and data-driven framework for efficiently learning a mapping between arbitrary pairs of high-dimensional data distributions, which makes it well suited to problems in molecular and cell biology. In this Review, we characterize the theoretical foundations of flow matching and its applications in biomolecular modelling of small molecules, proteins, DNA/RNA, and their interactions, as well as its uses in single- and multi-cellular modelling for cell phenotyping and imaging, each contributing towards the development of an AI-based virtual cell. Finally, we highlight open-source flow-matching methods and discuss future directions in flow-based generative modelling for bioinformatics and computational biology.
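To make the idea of learning a mapping between two distributions concrete, here is a minimal sketch of one common conditional flow matching variant; it is an illustration, not code from the Review. Source and target samples are paired independently, a time t is drawn uniformly from [0, 1], a point x_t is placed on the straight line between each pair, and a vector field v(t, x) is regressed onto the conditional velocity x1 - x0. New samples are then generated by integrating dx/dt = v(t, x) from t = 0 to t = 1. Assumptions: the function names (cfm_training_pairs, features, fit_vector_field, sample) are hypothetical, a least-squares fit on polynomial features stands in for the neural network used in practice, and the one-dimensional Gaussian data are a toy example.

```python
import numpy as np


def cfm_training_pairs(x0, x1, rng, sigma=0.0):
    """Build (t, x_t, u_t) regression triples for conditional flow matching.

    Assumes an independent coupling of source/target samples and a
    straight-line conditional path x_t = (1 - t) x0 + t x1 (+ sigma * noise).
    """
    n = x0.shape[0]
    t = rng.uniform(0.0, 1.0, size=(n, 1))
    x_t = (1.0 - t) * x0 + t * x1 + sigma * rng.standard_normal(x0.shape)
    u_t = x1 - x0  # conditional target velocity along the straight path
    return t, x_t, u_t


def features(t, x, degree=4):
    """Hypothetical polynomial basis standing in for a neural network v_theta(t, x)."""
    powers = t ** np.arange(degree + 2)            # 1, t, ..., t^(degree+1)
    cross = [x * t**k for k in range(degree + 1)]  # x, x*t, ..., x*t^degree
    return np.concatenate([powers, *cross], axis=1)


def fit_vector_field(t, x_t, u_t):
    """Minimise the empirical CFM loss mean ||v(t, x_t) - u_t||^2 by least squares."""
    W, *_ = np.linalg.lstsq(features(t, x_t), u_t, rcond=None)
    return W


def sample(W, x0, steps=100):
    """Push source samples through the learned ODE dx/dt = v(t, x) with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * features(t, x) @ W
    return x


# Toy example: map N(0, 1) onto N(4, 0.5^2) in one dimension.
rng = np.random.default_rng(0)
n = 20_000
x0 = rng.standard_normal((n, 1))
x1 = 4.0 + 0.5 * rng.standard_normal((n, 1))
W = fit_vector_field(*cfm_training_pairs(x0, x1, rng))
samples = sample(W, rng.standard_normal((n, 1)))
print(samples.mean(), samples.std())  # expected to be roughly 4.0 and 0.5
```

Because this stand-in model is linear in x, it can only carry Gaussians to Gaussians; in practice v is a neural network trained by stochastic gradient descent on the same regression loss, which is what lets flow matching handle the high-dimensional, non-Gaussian data described above.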

Publication
Nature Machine Intelligence, 1-18
Alex Tong
Principal Investigator

I work on improving flow models with applications to cells and proteins.