Eye on AI: Applying Artificial Intelligence to Drive Cancer Research, Part 1

Molecular cloning, next-generation sequencing, CRISPR—time and again, we have seen how technological advances revolutionize cancer research, empowering scientists to dive further into the depths of cancer biology than was previously even conceivable. With each new technology, our understanding of cancer’s complexities has grown, but even more questions and opportunities have emerged. There is no doubt that continued innovation is needed to propel cancer research further, and artificial intelligence (AI) has shown immense promise to do exactly that. 

With its superhuman ability to sift through tremendous amounts of data and recognize complex patterns, AI is allowing researchers to dig deeper into the fundamentals of cancer biology, uncover innovative therapeutic approaches, and enhance patient care.  

During two recent events—the AACR Special Conference on Artificial Intelligence and Machine Learning and an AACR webinar on AI and Machine Learning in IO—researchers highlighted how AI is showing potential to drive advances across the cancer research spectrum. 

In the first installment of this two-part post, we will examine some applications for discovery science and translational cancer research shared during these presentations, including AI’s ability to characterize the mutational landscapes of tumors, identify neoantigens and novel drug targets, and aid drug discovery. Stay tuned for part two to learn about the innovative clinical AI applications that researchers are exploring to improve cancer diagnosis and treatment. 

AI Applications for Discovery Science: Enabling a Deeper Understanding of Cancer 

It is well established that genetic alterations drive cancer development, and studies conducted decades ago uncovered key oncogenic drivers, including mutant p53, BCR-ABL fusions, and mutant KRAS, among many others. 

But most of these landmark studies examined each alteration in isolation, explained John-William Sidhom, MD, PhD, of Weill Cornell Medicine, during a presentation at the AACR Special Conference on Artificial Intelligence and Machine Learning.  

“Cancer doesn’t behave according to one mutation,” he continued. “It evolves through a network of mutations that work together to give you some phenotype of the cancer. We’re missing the patterns in these complex genomic signatures because our current tools were never built to see them.” 

To better understand the relationships among the many mutations found in each patient’s cancer and the biological impacts of the intervariant relationships, Sidhom turned to AI. 

“The question was, ‘What if we had a ChatGPT for the cancer genome—one that could reason through all these mutations in a given patient?’” he said. 

To this end, Sidhom and colleagues developed a large language model that examined sequencing data from thousands of tumors in The Cancer Genome Atlas (TCGA) and AACR Project GENIE databases to learn the mutational landscape of each individual tumor. The model analyzed variants both locally—discovering individual variants and examining their surrounding DNA sequence—and globally—understanding which other variants co-occurred within that tumor.  

Large language models are typically built on neural network components called transformers, which operate on text that has been converted into numerical values a computer can interpret. For the model’s first transformer, the input “text” was DNA sequences, and the output was the identification of a variant and its local DNA context, such as the presence of nearby enhancer or promoter regions. This output served as the input for the second transformer, which examined each variant within the context of all the other variants within the tumor and provided a mutational fingerprint for each tumor.
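To make the two-stage idea more concrete, here is a minimal toy sketch in Python. It is not the model Sidhom described: the first stage is replaced by simple base frequencies in the DNA window around each variant, and the second stage by a single round of dot-product self-attention across all variants in a tumor. Every function name and the example sequences are assumptions made up for illustration.

```python
import math

BASES = "ACGT"


def encode_local(context: str) -> list[float]:
    """Stage 1 stand-in: base frequencies of the DNA window around one variant."""
    return [context.count(base) / len(context) for base in BASES]


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors: list[list[float]]) -> list[list[float]]:
    """Stage 2 stand-in: each variant is re-expressed as a weighted mix of all variants."""
    mixed = []
    for query in vectors:
        weights = softmax([sum(q * k for q, k in zip(query, key)) for key in vectors])
        mixed.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(query))]
        )
    return mixed


def tumor_fingerprint(contexts: list[str]) -> list[float]:
    """Average the context-aware variant vectors into one vector per tumor."""
    contextual = attend([encode_local(c) for c in contexts])
    dim = len(contextual[0])
    return [sum(v[i] for v in contextual) / len(contextual) for i in range(dim)]


# Hypothetical DNA windows around three variants found in one tumor.
fingerprint = tumor_fingerprint(["ACGTAC", "GGGTCA", "TTTACG"])
print(fingerprint)
```

The point of the sketch is only the data flow: local sequence context is encoded per variant first, and a second pass lets every variant "see" the others before a tumor-level summary is produced.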

Using only the tumor sequencing data provided, the model was able to classify variants as pathogenic, benign, or of uncertain significance. This demonstrated that “the model has learned something biologically salient about the function of the variants just by essentially reading the cancer genome,” Sidhom noted.  

Unlike existing research methods, the model does not require the user to provide information about known variants, which means it has the potential to discover previously unknown pathogenic variants, including ones that are patient-specific, Sidhom explained. 

“It gets you out of a paradigm of having to know a priori what variants I want to look at and what co-occurrences I need to study,” he said. 

As a form of validation, Sidhom showed that the model’s discovery of pathogenic variants in colorectal cancer was consistent with the Vogelstein model of colorectal carcinogenesis, with the AI model not only accurately identifying alterations in APC, KRAS, and TP53 as drivers of colorectal cancer but also determining the correct order in which these occur during carcinogenesis.  

Sidhom suggested that the AI model could help researchers extract important biological insights from tumor sequencing data, such as novel mutational signatures and mutational dependencies, that could ultimately influence treatment. Underscoring this point, he demonstrated that the model was able to use the information it gleaned from tumor DNA to predict responses in an ex vivo drug screen. This could potentially aid the design of rational drug combinations or predict treatment responses in a clinical setting. 

“This is hopefully a peek into the future of cancer treatment,” Sidhom said. “We have a [cancer] genome, we sequence it, we use models to understand the variants, and then we can understand what drugs would be best tailored to that [cancer].” 

Translational AI Applications: Uncovering New Therapeutic Approaches for Cancer 

Fundamental discoveries, like the understanding of somatic variants discussed above, pave the way to the exploration of new therapeutic strategies. This step of the cancer research pipeline—where researchers attempt to translate biological insights into effective cancer treatments—is also benefiting from AI, as highlighted by several presentations during the AACR webinar on AI and Machine Learning in IO. 

Finding new precision medicine targets 

One area where AI has flexed its muscles is in identifying new therapeutic targets by combing through large sets of genetic and phenotypic data.

“We’re now in an era where we’ve generated vast amounts of multimodal data,” said Arnav Mehta, MD, PhD, of Stanford University, during the webinar. “There’s been a series of recent studies where leveraging biologically informed neural network architecture has allowed interpretability of these multimodal features and [improved] our ability to predict genes, pathways, and processes that may be influential in response and survival.”  

As one example, Mehta pointed to research led by Eliezer Van Allen, MD, of Dana-Farber Cancer Institute, that used AI to analyze sequencing data from more than 3,000 prostate tumors to characterize the genes and pathways associated with disease progression and treatment resistance. With this approach, the researchers found that the protein MDM4 was associated with resistance to androgen deprivation therapy and hypothesized that it may represent a therapeutic vulnerability. Follow-up experiments in the lab demonstrated that MDM4 inhibition slowed the proliferation of prostate cancer cells, suggesting that this approach may be effective against certain treatment-resistant prostate cancers. 

Identifying high-quality personalized neoantigens 

In another presentation, Benjamin Greenbaum, PhD, of Memorial Sloan Kettering Cancer Center (MSKCC), discussed how AI has also helped researchers identify tumor neoantigens against which therapeutic cancer vaccines can be designed. He explained that AI tools can assess the quality of neoantigens by determining which neoantigens are most likely to be recognized as “non-self” by immune cells. AI can also predict the impact of targeting a particular neoantigen on cancer fitness—information that can help researchers narrow down which neoantigens are worth targeting. 

This approach was utilized in the development of autogene cevumeran, an investigational personalized vaccine to treat pancreatic cancer, said Greenbaum. Using sequencing data from each patient’s pancreatic tumor, Greenbaum and other researchers led by Vinod Balachandran, MD, also of MSKCC, used AI tools to identify somatic mutations, infer the immunogenicity of each mutation by drawing on information about known microbial-derived immunogenic peptides, and predict how well T cells would distinguish the mutant from the wild-type peptide.  

This information was then integrated into a model to estimate how targeting each neoantigen would impact the cancer cell population, providing researchers with a list of candidate targets for each patient’s unique tumor. In a clinical trial, this AI-guided personalized vaccine approach induced neoantigen-specific immune responses in eight patients with pancreatic cancer, who have experienced significantly longer recurrence-free survival than patients who did not have responses. These promising results underscore the power of AI to identify effective targets for cancer treatment. 
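The ranking step described above can be pictured with a small Python sketch. This is an illustrative simplification, not the published method: it assumes each candidate already carries two hypothetical scores between 0 and 1 (resemblance to known microbial immunogenic peptides, and how well T cells could tell the mutant from the wild-type peptide) and simply multiplies them. The class, field names, peptide labels, and numbers are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Neoantigen:
    peptide: str                # hypothetical label, not a real sequence
    foreign_similarity: float   # assumed 0..1: resemblance to known immunogenic microbial peptides
    self_discrimination: float  # assumed 0..1: how distinguishable mutant is from wild type


def quality(neo: Neoantigen) -> float:
    """Toy quality score: a candidate must do well on both criteria to rank highly."""
    return neo.foreign_similarity * neo.self_discrimination


def rank_candidates(neos: list[Neoantigen], top_k: int = 2) -> list[Neoantigen]:
    """Return the top_k candidates, highest toy quality first."""
    return sorted(neos, key=quality, reverse=True)[:top_k]


# Made-up candidates for one patient's tumor.
candidates = [
    Neoantigen("PEP_A", foreign_similarity=0.9, self_discrimination=0.2),
    Neoantigen("PEP_B", foreign_similarity=0.6, self_discrimination=0.7),
    Neoantigen("PEP_C", foreign_similarity=0.3, self_discrimination=0.9),
]
for neo in rank_candidates(candidates):
    print(neo.peptide, round(quality(neo), 2))
```

Multiplying the two scores is one simple design choice: a peptide that looks very foreign but is nearly indistinguishable from its wild-type counterpart still ranks low.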

Accelerating drug discovery  

Once targets are identified, researchers must design drugs that selectively go after these targets. Traditionally, this has required a long and costly process of solving the protein structure of the drug target, tediously mining chemical libraries for compounds that may bind to it, testing the compounds, and optimizing lead compounds for enhanced binding. AI, however, has the potential to shave years off the process.

AI tools like AlphaFold can accelerate target structure prediction by using decades’ worth of protein structure data housed in the Protein Data Bank, and new AI models can use “machine learning-directed evolution” to design optimized drug compounds, said Mehta.

Machine learning-directed evolution models work by iteratively testing and improving drug designs in silico until a compound with the desired properties is achieved. Mehta noted that this method has been particularly effective for antibody design, where an optimized antibody can be designed within five rounds of evolution. He also shared how AI can automate other elements of the drug discovery pipeline, including predicting a drug’s toxicology and pharmacokinetics. 
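The iterative test-and-improve loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not any tool Mehta referenced: the "learned" fitness predictor is replaced by a stand-in that counts matches to an arbitrary target string, candidates are single-residue mutants, and the default of five rounds merely mirrors the figure mentioned above. All names, sequences, and parameters are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq: str, target: str) -> int:
    """Stand-in for a trained predictor: number of positions matching the target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rng: random.Random) -> str:
    """Propose a variant by replacing one randomly chosen residue."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def directed_evolution(
    start: str, target: str, rounds: int = 5, pool_size: int = 200, seed: int = 0
) -> tuple[str, list[int]]:
    """Each round: score a pool of mutants in silico and keep the best one."""
    rng = random.Random(seed)
    best = start
    history = [toy_score(best, target)]
    for _ in range(rounds):
        pool = [mutate(best, rng) for _ in range(pool_size)]
        pool.append(best)  # keep the parent so the score can never get worse
        best = max(pool, key=lambda s: toy_score(s, target))
        history.append(toy_score(best, target))
    return best, history


# Arbitrary illustrative strings, not real antibody sequences.
best, history = directed_evolution("AAAAAAAAAA", "MKTAYIAKQR")
print(best, history)
```

In a real pipeline, the scoring function would be a model trained on experimental data, and the top candidates from each round would typically be confirmed in the lab rather than trusted outright.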

“There is an exciting frontier as we emerge on new immuno-oncology targets … where we can design molecules to the specifications we want,” said Mehta. “The advent of these tools has really taken a hold in drug discovery efforts, where we are now leveraging many of these tools to develop best-in-class and first-in-class molecules.” 


To learn about more AI applications across the cancer research spectrum, check out the special series on Driving Cancer Discoveries with Computational Research, Data Science, and Machine Learning/AI in the AACR journal Cancer Research. The series includes additional examples of discovery science and translational research applications of AI, including: