Large-scale feature learning of chromosome conformation for genome modeling

Date:

Presentation at Argonne National Laboratory. The slides can be found here.

The exponential growth in the quantity and quality of biological data in recent years, coupled with rapid advances in machine learning, has laid the groundwork for major breakthroughs in computational modeling of cellular behavior. At the forefront of this new wave of cell modeling is the development of biological foundation models. These models have proven powerful on DNA sequence, single-cell RNA-seq, and protein sequence data, but their development on epigenomic data remains largely unexplored. Genome architecture in particular is important for modeling cellular processes such as gene regulation and cell replication, and so warrants dedicated model development. To address this gap, we introduce a foundation model for chromosome conformation (Hi-C) data that learns information-rich, context-dependent representations of locus-level genome structure. In this talk, I will discuss the challenges of building such a model, highlight its capabilities, and outline areas for improvement and future work.