Eye on AI: Agentic AI Brings Bots to the Bench
Artificially intelligent agents, commonly known as agentic artificial intelligence (AI) or AI agents, are one of the latest developments in the rapidly accelerating field of AI, and they are known for their distinctive capability to do, rather than simply talk or summarize. If the AI chatbot of two years ago could make you a grocery list, the AI agent of today can (with the right setup and permissions) make the list, order the groceries, and have them delivered to your door. More pressing than fully automated food delivery, however, is the question of accelerating scientific progress. Can AI agents do cancer research in their own right?
Many AI-centric research teams currently believe that AI agents, with the right training and resources, can rise to the level of “co-scientists”: autonomous companions that can think through scientific questions and go about solving them alongside researchers. Recently, an agentic AI spring has sprung in the world of biomedical research, with several research groups testing agentic co-scientist models. The still-nascent agents, if they can deliver on their potential, may be able to speed up cancer research and shorten the timeline from ideas to therapies for patients.
Jure Leskovec, PhD, a professor at Stanford University, helped usher in the ongoing boom in biomedical research agents by developing Biomni, which he presented at the AACR Annual Meeting 2026. As the first generalist biomedical research agent, Biomni distinguishes itself based on its comprehensive environment, Leskovec says—differentiating it from more basic, all-purpose agents like Claude Code. Leskovec and colleagues integrated multiple databases, software packages, and modeling tools into Biomni’s environment so that it can work on behalf of scientists by completing tasks they’d otherwise do themselves.
“Getting AI-built apps to work, hopping between multiple programs—to me, that’s just automated suffering. We want to be doing science! So why bother using AI to code yourself research tools when good tools already exist?” Leskovec pointed out. “That’s why, when we built Biomni, we very intentionally gave it a virtual environment with dozens of research tools. We don’t want to reinvent the wheel; we want to speed up progress.”
Biomni’s Beginnings
Biomni started as a project of the Leskovec lab and has since spun out into a commercial enterprise, funded in part by the AI-heavy venture capital firm Andreessen Horowitz. What inspired Leskovec to create his academic automaton? Leskovec said he could feel that it was the wave of the future.
As Leskovec described during his Annual Meeting presentation, Biomni works to adapt to scientists’ workstyle and areas of investigation, and over time, the agent is designed to refine its ability to assist in generating hypotheses and research questions that emerge from the trajectory of a scientist’s work. Despite Biomni’s ability to adapt to individual researchers, Leskovec does not think this predisposes the agent to sycophancy, the tendency to give uniformly positive feedback, even to dubious ideas and requests, that has plagued some AI models.
“In the end, the goal of Biomni is to solve tasks in the best way possible. So I do not think it would over-personalize in this sense. For example, I don’t think Biomni would look at a scientist who tends to do a lot of single-cell sequencing and draw the conclusion, ‘okay, we’re always going to do single-cell, regardless of whether we need to,’” Leskovec said. “The agent is goal-driven. It is told to answer questions and complete tasks, which requires the agent to think through the steps that either it needs to take or its human collaborators need to take in order to do science efficiently and effectively.”
There are, according to its creators, still limitations to Biomni. In the preprint that introduced Biomni to the scientific community, Leskovec and his co-authors acknowledged certain shortcomings in their agent’s capabilities, including nonexhaustive integrations of biomedical research tools, possible underweighting of foundational papers and ideas, and limited capacity for “deep biological thinking.” The open-access GitHub page reflects issues highlighted by users as well, ranging from banal problems with PDF fonts to deficits in Biomni’s “memory” across different sessions.
Even so, Leskovec remains optimistic about the future of agent-assisted research. To him, some of the biggest opportunities lie in strengthening the global scientific community’s collective access to data, scientific literature, best practices, and research know-how.
“To do cutting-edge science, you need a critical mass of human expertise. And I think there are very few places on the planet that have that critical mass. Agents like Biomni democratize that expertise by giving users access essentially wherever they are,” he said. “It would reduce the bottlenecks to progress humongously if you don’t have experts clustered at what is, in the grand scheme, a small set of institutions. Biomni can channel that knowledge into a workable form in labs across the world.”
Co-scientist Agents Abound
Scientists throughout the AI space clearly share Leskovec’s optimism, with a new crop of research agents making their debut. One group of researchers from Google DeepMind has simply decided to call their co-scientist AI agent “Co-Scientist.” Powered by Google’s AI system Gemini, which competes with the likes of Anthropic’s Claude and OpenAI’s ChatGPT, Co-Scientist uses multiple AI agents to independently execute tasks like data analysis, literature review, or hypothesis generation. According to its designers, the agent sharpens its scientific reasoning by arguing with itself in an “idea tournament.”
In their paper, the Google researchers successfully induced Co-Scientist to hypothesize how existing approved drugs could be repurposed for treating acute myeloid leukemia (AML). Co-Scientist’s reasoning led the team to conduct in vitro testing of drugs it identified as candidates. To validate Co-Scientist’s work, a panel of 30 experts assessed the agent’s proposed hypotheses and list of drugs that could be repurposed to treat AML. Following the experts’ selection of Co-Scientist’s most promising hypotheses, wet-lab experiments confirmed anti-AML activity of 3 out of the 5 tested drugs. The validation, the team wrote, amounts to a demonstration of the agent’s capacity to meaningfully contribute to knowledge creation.
In a paper published on the same day as Google’s Co-Scientist manuscript, the AI startup FutureHouse introduced Robin, their more anthropomorphically named, small-c co-scientist agent. In a similar vein to Co-Scientist, Robin orchestrates a series of agents to “generate hypotheses, propose experiments, interpret experimental results, and generate updated hypotheses, achieving a semi-autonomous approach to scientific discovery.”
The authors behind Robin performed a similar test case in which Robin worked through the process of generating hypotheses for dry age-related macular degeneration. Robin’s process began as a general query about the disease state’s causes and iteratively progressed to proposing experiments to test candidate therapies aimed at increasing phagocytosis as the mechanism of action. This process led to in vitro testing of a drug with an identified therapeutic mechanism of action (testing that, once again, relied upon experts’ judgment of which of the AI agents’ hypotheses were most feasible). The experiments found that the Robin-identified drug performed as hypothesized, successfully increasing phagocytosis, and the research team estimated that Robin’s involvement compressed the time needed to arrive at an experiment-ready point from more than 300 hours to less than two hours.
Like Biomni, however, both Co-Scientist and Robin have limitations—including the fundamental nature of the large language model (LLM) technology that underpins agentic capacity. As the authors from Google noted, “imperfect factuality and the potential for hallucinations” are “intrinsic” to the models. The FutureHouse team wrote that Robin is not yet capable of generating executable experimental protocols with the desired level of precision. One of Robin’s subagents, Finch, also requires highly precise prompt engineering from domain experts to ensure accurate data analysis outputs.
It’s difficult to say where the field may go from here, considering the unrelenting speed with which AI technology evolves. But one thing is for sure: many scientists are betting on AI agents’ ability to speed up research. As co-scientist agents progress and more researchers begin to use them, the cancer research community will see how those bets may pay off. In the meantime, scientists with job-security worries can find some solace in the most striking caveat outlined in the Biomni preprint: “No system yet captures the full scope of human biomedical expertise.”
AACR Annual Meeting sessions are available for virtual viewing for all registered attendees through October 2026.
To read more about agentic AI and cancer, check out our story on AI agents being developed for the clinic.


