MIT researchers have developed a new approach that uses generative artificial intelligence to determine 3D genome structures, enabling rapid analysis of thousands of structures in just minutes.
Every cell in the human body contains the same genetic sequence, yet each cell expresses only a subset of those genes. This cell-specific gene expression is partly determined by the three-dimensional structure of the genetic material, which controls the accessibility of each gene.
MIT chemists have developed a new approach to determine these 3D genome structures using generative artificial intelligence. Their technique can predict thousands of structures in just minutes, making it much speedier than existing experimental methods for analyzing the structures.
The Massachusetts Institute of Technology (MIT) is a private research university located in Cambridge, Massachusetts.
Founded in 1861, MIT is known for its academic programs in science, technology, engineering, and mathematics (STEM).
The institute has produced many notable alumni, including 98 Nobel laureates and 34 Marshall Scholars.
MIT's campus spans over 168 acres and features a mix of modern and historic buildings.
The university is also home to numerous research centers and institutes, making it one of the world's leading institutions for innovation and discovery.
From Sequence to Structure
Inside the cell nucleus, DNA and proteins form a complex called chromatin, which has several levels of organization. Long strands of DNA wind around proteins called histones, giving rise to a structure somewhat like beads on a string. Chemical tags known as epigenetic modifications can be attached to DNA at specific locations, affecting the folding of the chromatin and the accessibility of nearby genes.
Chromatin is a complex of DNA, histone proteins, and non-histone proteins found in the nucleus of eukaryotic cells. DNA carries the genetic instructions used in the development and function of all living organisms.
It plays a crucial role in packaging DNA into a compact structure, allowing for cell division and gene expression.
Chromatin consists of nucleosomes, which are units of DNA wrapped around histone octamers.
The structure and function of chromatin are essential for regulating gene activity, DNA replication, and repair.
Research has shown that changes in chromatin structure can lead to various diseases, including cancer.
Over the past 20 years, scientists have developed experimental techniques for determining chromatin structures. One widely used technique, known as Hi-C, links together neighboring DNA strands in the cell’s nucleus, allowing researchers to determine which segments are located near each other by shredding the DNA into many tiny pieces and sequencing it. However, this method is labor-intensive and can take about a week to generate data from one cell.
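To make the idea of "segments located near each other" concrete, the sketch below turns a set of 3D coordinates into a contact map, the kind of proximity table that Hi-C data is usually summarized as. It is not a Hi-C processing pipeline: the coordinates, the `cutoff` value, and the function names are illustrative assumptions.

```python
import math

def pairwise_distances(coords):
    """Return the full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(coords, cutoff):
    """Mark segment pairs closer than `cutoff` as contacts (1), all others as 0."""
    return [[1 if d < cutoff else 0 for d in row]
            for row in pairwise_distances(coords)]

# Hypothetical toy chain of four segments; the last one folds back
# toward the first, so they touch despite being far apart in sequence.
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
toy_contacts = contact_map(toy_coords, cutoff=1.2)

for row in toy_contacts:
    print(row)
```

In this toy chain, segments 0 and 3 are in contact even though they sit at opposite ends of the sequence, which is the kind of long-range folding information a contact map captures.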
The ChromoGen Model
To overcome these limitations, MIT chemists developed a model that takes advantage of recent advances in generative AI to create a fast, accurate way to predict chromatin structures in single cells. The AI model they designed analyzes DNA sequences and predicts the chromatin structures those sequences might produce in a cell.
The ChromoGen model has two components: a deep learning model that “reads” the genome and analyzes the information encoded in the underlying DNA sequence and chromatin accessibility data, and a generative AI model that predicts physically accurate chromatin conformations. When integrated, these components effectively capture sequence-structure relationships and generate many possible structures for each sequence.
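The two-stage shape described here, encoding a region once and then sampling many candidate structures from that encoding, can be illustrated with a toy sketch. Nothing below reflects ChromoGen's actual architecture: `encode_region`, `sample_conformation`, and `generate_ensemble` are hypothetical stand-ins, the "embedding" is just two summary statistics, and the "generator" is a seeded random walk rather than a trained model.

```python
import random

def encode_region(sequence, accessibility):
    """Stand-in for the 'reading' component: summarize a region as two numbers."""
    gc_fraction = sum(base in "GC" for base in sequence) / len(sequence)
    mean_access = sum(accessibility) / len(accessibility)
    return (gc_fraction, mean_access)

def sample_conformation(embedding, n_beads, rng):
    """Stand-in for the generative component: draw one random 3D chain.

    The step size shrinks as the embedding values grow. This is an arbitrary
    toy rule, not a learned or physical relationship.
    """
    step = 1.0 / (1.0 + sum(embedding))
    position = (0.0, 0.0, 0.0)
    chain = [position]
    for _ in range(n_beads - 1):
        position = tuple(p + rng.gauss(0.0, step) for p in position)
        chain.append(position)
    return chain

def generate_ensemble(sequence, accessibility, n_structures, n_beads=8, seed=0):
    """Encode the region once, then sample several candidate conformations."""
    rng = random.Random(seed)
    embedding = encode_region(sequence, accessibility)
    return [sample_conformation(embedding, n_beads, rng) for _ in range(n_structures)]

# Hypothetical inputs: a short sequence and made-up accessibility values.
ensemble = generate_ensemble("ACGTGGCCATAT", [0.2, 0.8, 0.5], n_structures=5)
print(len(ensemble), "structures of", len(ensemble[0]), "beads each")
```

The point of the sketch is the division of labor: the encoding step runs once per region, while the sampling step can be repeated as often as needed to produce an ensemble of different structures for the same sequence.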
Rapid Analysis
Once trained, the ChromoGen model can generate predictions on a much faster timescale than Hi-C or other experimental techniques. “Whereas you might spend six months running experiments to get a few dozen structures in a given cell type, you can generate a thousand structures in a particular region with our model in 20 minutes on just one GPU,” says Greg Schuette, lead author of the paper.
The researchers used their model to generate structure predictions for more than 2,000 DNA sequences and compared them to experimentally determined structures. They found that the structures generated by the model were the same as or very similar to those seen in the experimental data.
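The article does not say which similarity measure the researchers used. One common way to compare two 3D structures is to correlate their pairwise distance maps, which ignores where the structure sits in space. The sketch below shows that idea on made-up coordinates; the function names and data are assumptions for illustration only.

```python
import math

def distance_map(coords):
    """Flatten the upper triangle of the pairwise distance matrix into a list."""
    n = len(coords)
    return [math.dist(coords[i], coords[j])
            for i in range(n) for j in range(i + 1, n)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def structure_similarity(predicted, reference):
    """Correlate the distance maps of two structures with the same number of beads."""
    return pearson(distance_map(predicted), distance_map(reference))

# Hypothetical reference structure and a copy translated in space.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in reference]
similarity = structure_similarity(shifted, reference)
print(round(similarity, 6))
```

Because the comparison uses only distances between beads, moving a structure to a different position leaves the score unchanged, so the translated copy scores as essentially identical to the reference.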
Potential Applications
This new approach has many potential applications, including analyzing how chromatin structures differ between cell types and how those differences affect their function. The model could also be used to explore different chromatin states that can exist within a single cell and how those changes affect gene expression.
Furthermore, this technique could help shed light on how mutations in a particular DNA sequence change the chromatin conformation, which could explain how such mutations may cause disease.
The researchers have made all of their data and the model available to others who wish to use them. The research was funded by the National Institutes of Health.