3 Questions: On biology and medicine’s “data revolution”

MIT News

By: Jane Halpern | Elvira Forte | Department of Electrical Engineering and Computer Science

September 3^rd 2025 at 1:15 am

Caroline Uhler is an Andrew (1956) and Erna Viterbi Professor of Engineering at MIT; a professor of electrical engineering and computer science in the Institute for Data, Science, and Society (IDSS); and director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard, where she is also a core institute and scientific leadership team member.

Uhler is interested in all the methods by which scientists can uncover causality in biological systems, ranging from causal discovery on observed variables to causal feature learning and representation learning. In this interview, she discusses machine learning in biology, areas that are ripe for problem-solving, and cutting-edge research coming out of the Schmidt Center.

Q: The Eric and Wendy Schmidt Center has four distinct areas of focus structured around four natural levels of biological organization: proteins, cells, tissues, and organisms. What, within the current landscape of machine learning, makes now the right time to work on these specific problem classes?

A: Biology and medicine are currently undergoing a “data revolution.” The availability of large-scale, diverse datasets — ranging from genomics and multi-omics to high-resolution imaging and electronic health records — makes this an opportune time. Inexpensive and accurate DNA sequencing is a reality, advanced molecular imaging has become routine, and single cell genomics is allowing the profiling of millions of cells. These innovations — and the massive datasets they produce — have brought us to the threshold of a new era in biology, one where we will be able to move beyond characterizing the units of life (such as all proteins, genes, and cell types) to understanding the `programs of life’, such as the logic of gene circuits and cell-cell communication that underlies tissue patterning and the molecular mechanisms that underlie the genotype-phenotype map.

At the same time, in the past decade, machine learning has seen remarkable progress with models like BERT, GPT-3, and ChatGPT demonstrating advanced capabilities in text understanding and generation, while vision transformers and multimodal models like CLIP have achieved human-level performance in image-related tasks. These breakthroughs provide powerful architectural blueprints and training strategies that can be adapted to biological data. For instance, transformers can model genomic sequences similar to language, and vision models can analyze medical and microscopy images.

Importantly, biology is poised to be not just a beneficiary of machine learning, but also a significant source of inspiration for new ML research. Much like agriculture and breeding spurred modern statistics, biology has the potential to inspire new and perhaps even more profound avenues of ML research. Unlike fields such as recommender systems and internet advertising, where there are no natural laws to discover and predictive accuracy is the ultimate measure of value, in biology, phenomena are physically interpretable, and causal mechanisms are the ultimate goal. Additionally, biology boasts genetic and chemical tools that enable perturbational screens on an unparalleled scale compared to other fields. These combined features make biology uniquely suited to both benefit greatly from ML and serve as a profound wellspring of inspiration for it.

Q: Taking a somewhat different tack, what problems in biology are still really resistant to our current tool set? Are there areas, perhaps specific challenges in disease or in wellness, which you feel are ripe for problem-solving?

A: Machine learning has demonstrated remarkable success in predictive tasks across domains such as image classification, natural language processing, and clinical risk modeling. However, in the biological sciences, predictive accuracy is often insufficient. The fundamental questions in these fields are inherently causal: How does a perturbation to a specific gene or pathway affect downstream cellular processes? What is the mechanism by which an intervention leads to a phenotypic change? Traditional machine learning models, which are primarily optimized for capturing statistical associations in observational data, often fail to answer such interventional queries.There is a strong need for biology and medicine to also inspire new foundational developments in machine learning.

The field is now equipped with high-throughput perturbation technologies — such as pooled CRISPR screens, single-cell transcriptomics, and spatial profiling — that generate rich datasets under systematic interventions. These data modalities naturally call for the development of models that go beyond pattern recognition to support causal inference, active experimental design, and representation learning in settings with complex, structured latent variables. From a mathematical perspective, this requires tackling core questions of identifiability, sample efficiency, and the integration of combinatorial, geometric, and probabilistic tools. I believe that addressing these challenges will not only unlock new insights into the mechanisms of cellular systems, but also push the theoretical boundaries of machine learning.

With respect to foundation models, a consensus in the field is that we are still far from creating a holistic foundation model for biology across scales, similar to what ChatGPT represents in the language domain — a sort of digital organism capable of simulating all biological phenomena. While new foundation models emerge almost weekly, these models have thus far been specialized for a specific scale and question, and focus on one or a few modalities.

Significant progress has been made in predicting protein structures from their sequences. This success has highlighted the importance of iterative machine learning challenges, such as CASP (critical assessment of structure prediction), which have been instrumental in benchmarking state-of-the-art algorithms for protein structure prediction and driving their improvement.

The Schmidt Center is organizing challenges to increase awareness in the ML field and make progress in the development of methods to solve causal prediction problems that are so critical for the biomedical sciences. With the increasing availability of single-gene perturbation data at the single-cell level, I believe predicting the effect of single or combinatorial perturbations, and which perturbations could drive a desired phenotype, are solvable problems. With our Cell Perturbation Prediction Challenge (CPPC), we aim to provide the means to objectively test and benchmark algorithms for predicting the effect of new perturbations.

Another area where the field has made remarkable strides is disease diagnostic and patient triage. Machine learning algorithms can integrate different sources of patient information (data modalities), generate missing modalities, identify patterns that may be difficult for us to detect, and help stratify patients based on their disease risk. While we must remain cautious about potential biases in model predictions, the danger of models learning shortcuts instead of true correlations, and the risk of automation bias in clinical decision-making, I believe this is an area where machine learning is already having a significant impact.

Q: Let’s talk about some of the headlines coming out of the Schmidt Center recently. What current research do you think people should be particularly excited about, and why?

A: In collaboration with Dr. Fei Chen at the Broad Institute, we have recently developed a method for the prediction of unseen proteins’ subcellular location, called PUPS. Many existing methods can only make predictions based on the specific protein and cell data on which they were trained. PUPS, however, combines a protein language model with an image in-painting model to utilize both protein sequences and cellular images. We demonstrate that the protein sequence input enables generalization to unseen proteins, and the cellular image input captures single-cell variability, enabling cell-type-specific predictions. The model learns how relevant each amino acid residue is for the predicted sub-cellular localization, and it can predict changes in localization due to mutations in the protein sequences. Since proteins’ function is strictly related to their subcellular localization, our predictions could provide insights into potential mechanisms of disease. In the future, we aim to extend this method to predict the localization of multiple proteins in a cell and possibly understand protein-protein interactions.

Together with Professor G.V. Shivashankar, a long-time collaborator at ETH Zürich, we have previously shown how simple images of cells stained with fluorescent DNA-intercalating dyes to label the chromatin can yield a lot of information about the state and fate of a cell in health and disease, when combined with machine learning algorithms. Recently, we have furthered this observation and proved the deep link between chromatin organization and gene regulation by developing Image2Reg, a method that enables the prediction of unseen genetically or chemically perturbed genes from chromatin images. Image2Reg utilizes convolutional neural networks to learn an informative representation of the chromatin images of perturbed cells. It also employs a graph convolutional network to create a gene embedding that captures the regulatory effects of genes based on protein-protein interaction data, integrated with cell-type-specific transcriptomic data. Finally, it learns a map between the resulting physical and biochemical representation of cells, allowing us to predict the perturbed gene modules based on chromatin images.

Furthermore, we recently finalized the development of a method for predicting the outcomes of unseen combinatorial gene perturbations and identifying the types of interactions occurring between the perturbed genes. MORPH can guide the design of the most informative perturbations for lab-in-a-loop experiments. Furthermore, the attention-based framework provably enables our method to identify causal relations among the genes, providing insights into the underlying gene regulatory programs. Finally, thanks to its modular structure, we can apply MORPH to perturbation data measured in various modalities, including not only transcriptomics, but also imaging. We are very excited about the potential of this method to enable the efficient exploration of the perturbation space to advance our understanding of cellular programs by bridging causal theory to important applications, with implications for both basic research and therapeutic applications.

“The current landscape of machine learning presents a unique opportunity to address problems across different levels of biological organization, from proteins to organisms, due to a data revolution in biology and significant advancements in AI,” says Caroline Uhler.

New gift expands mental illness studies at Poitras Center for Psychiatric Disorders Research

MIT News

By: Julie Pryor | McGovern Institute for Brain Research

September 3^rd 2025 at 12:50 am

One in every eight people — 970 million globally — live with mental illness, according to the World Health Organization, with depression and anxiety being the most common mental health conditions worldwide. Existing therapies for complex psychiatric disorders like depression, anxiety, and schizophrenia have limitations, and federal funding to address these shortcomings is growing increasingly uncertain.

Patricia and James Poitras ’63 have committed $8 million to the Poitras Center for Psychiatric Disorders Research to launch pioneering research initiatives aimed at uncovering the brain basis of major mental illness and accelerating the development of novel treatments.

“Federal funding rarely supports the kind of bold, early-stage research that has the potential to transform our understanding of psychiatric illness. Pat and I want to help fill that gap — giving researchers the freedom to follow their most promising leads, even when the path forward isn’t guaranteed,” says James Poitras, who is chair of the McGovern Institute for Brain Research board.

Their latest gift builds upon their legacy of philanthropic support for psychiatric disorders research at MIT, which now exceeds $46 million.

“With deep gratitude for Jim and Pat’s visionary support, we are eager to launch a bold set of studies aimed at unraveling the neural and cognitive underpinnings of major mental illnesses,” says Professor Robert Desimone, director of the McGovern Institute, home to the Poitras Center. “Together, these projects represent a powerful step toward transforming how we understand and treat mental illness.”

A legacy of support

Soon after joining the McGovern Institute Leadership Board in 2006, the Poitrases made a $20 million commitment to establish the Poitras Center for Psychiatric Disorders Research at MIT. The center’s goal, to improve human health by addressing the root causes of complex psychiatric disorders, is deeply personal to them both.

“We had decided many years ago that our philanthropic efforts would be directed towards psychiatric research. We could not have imagined then that this perfect synergy between research at MIT’s McGovern Institute and our own philanthropic goals would develop,” recalls Patricia.

The center supports research at the McGovern Institute and collaborative projects with institutions such as the Broad Institute of MIT and Harvard, McLean Hospital, Mass General Brigham, and other clinical research centers. Since its establishment in 2007, the center has enabled advances in psychiatric research including the development of a machine learning “risk calculator” for bipolar disorder, the use of brain imaging to predict treatment outcomes for anxiety, and studies demonstrating that mindfulness can improve mental health in adolescents.

For the past decade, the Poitrases have also fueled breakthroughs in the lab of McGovern investigator and MIT Professor Feng Zhang, backing the invention of powerful CRISPR systems and other molecular tools that are transforming biology and medicine. Their support has enabled the Zhang team to engineer new delivery vehicles for gene therapy, including vehicles capable of carrying genetic payloads that were once out of reach. The lab has also advanced innovative RNA-guided gene engineering tools such as NovaIscB, published in Nature Biotechnology in May 2025. These revolutionary genome editing and delivery technologies hold promise for the next generation of therapies needed for serious psychiatric illness.

In addition to fueling research in the center, the Poitras family has gifted two endowed professorships — the James and Patricia Poitras Professor of Neuroscience at MIT, currently held by Feng Zhang, and the James W. (1963) and Patricia T. Poitras Professor of Brain and Cognitive Sciences at MIT, held by Guoping Feng — and an annual postdoctoral fellowship at the McGovern Institute.

New initiatives at the Poitras Center

The Poitras family’s latest commitment to the Poitras Center will launch an ambitious set of new projects that bring together neuroscientists, clinicians, and computational experts to probe underpinnings of complex psychiatric disorders including schizophrenia, anxiety, and depression. These efforts reflect the center’s core mission: to speed scientific discovery and therapeutic innovation in the field of psychiatric brain disorders research.

McGovern cognitive neuroscientists Evelina Fedorenko PhD ’07, an associate professor, and Nancy Kanwisher ’80, PhD ’86, the Walter A. Rosenblith Professor of Cognitive Neuroscience — in collaboration with psychiatrist Ann Shinn of McLean Hospital — will explore how altered inner speech and reasoning contribute to the symptoms of schizophrenia. They will collect functional MRI data from individuals diagnosed with schizophrenia and matched controls as they perform reasoning tasks. The goal is to identify the brain activity patterns that underlie impaired reasoning in schizophrenia, a core cognitive disruption in the disorder.

A complementary line of investigation will focus on the role of inner speech — the “voice in our head” that shapes thought and self-awareness. The team will conduct a large-scale online behavioral study of neurotypical individuals to analyze how inner speech characteristics correlate with schizophrenia-spectrum traits. This will be followed by neuroimaging work comparing brain architecture among individuals with strong or weak inner voices and people with schizophrenia, with the aim of discovering neural markers linked to self-talk and disrupted cognition.

A different project led by McGovern neuroscientist and MIT Associate Professor Mark Harnett and 2024–2026 Poitras Center Postdoctoral Fellow Cynthia Rais focuses on how ketamine — an increasingly used antidepressant — alters brain circuits to produce rapid and sustained improvements in mood. Despite its clinical success, ketamine’s mechanisms of action remain poorly understood. The Harnett lab is using sophisticated tools to track how ketamine affects synaptic communication and large-scale brain network dynamics, particularly in models of treatment-resistant depression. By mapping these changes at both the cellular and systems levels, the team hopes to reveal how ketamine lifts mood so quickly — and inform the development of safer, longer-lasting antidepressants.

Guoping Feng is leveraging a new animal model of depression to uncover the brain circuits that drive major depressive disorder. The new animal model provides a powerful system for studying the intricacies of mood regulation. Feng’s team is using state-of-the-art molecular tools to identify the specific genes and cell types involved in this circuit, with the goal of developing targeted treatments that can fine-tune these emotional pathways.

“This is one of the most promising models we have for understanding depression at a mechanistic level,” says Feng, who is also associate director of the McGovern Institute. “It gives us a clear target for future therapies.”

Another novel approach to treating mood disorders comes from the lab of James DiCarlo, the Peter de Florez Professor of Neuroscience at MIT, who is exploring the brain’s visual-emotional interface as a therapeutic tool for anxiety. The amygdala, a key emotional center in the brain, is heavily influenced by visual input. DiCarlo’s lab is using advanced computational models to design visual scenes that may subtly shift emotional processing in the brain — essentially using sight to regulate mood. Unlike traditional therapies, this strategy could offer a noninvasive, drug-free option for individuals suffering from anxiety.

Together, these projects exemplify the kind of interdisciplinary, high-impact research that the Poitras Center was established to support.

“Mental illness affects not just individuals, but entire families who often struggle in silence and uncertainty,” adds Patricia Poitras. “Our hope is that Poitras Center scientists will continue to make important advancements and spark novel treatments for complex mental health disorders and, most of all, give families living with these conditions a renewed sense of hope for the future.”

Patricia (right) and James Poitras ’63 (center) with Guoping Feng, the the James W. (1963) and Patricia T. Poitras Professor of Brain and Cognitive Sciences at MIT.