A First Step for Generative Biological Search: Bringing LLMs to Biology and Medicine

Since our inception, we’ve been pitching a vision of Biological Search and of a Biological Atlas. We’ve built our platform to be able to ingest public and private biological data, to operate flexibly on that data, and to search over the entire corpus. And starting this week, we’ve made the Enable Medicine Platform free to academic accounts, allowing for even more users to upload, analyze, and share their biological data.

Meanwhile, recent advancements in Large Language Models have accelerated change in many industries; in Biology, we see an unprecedented opportunity to make tooling, data, and knowledge drastically more accessible.

Introducing Generative Biological Search (GBS)

Today we’re releasing a first step in Generative Biological Search on our growing Atlas, powered by LLMs. In just the last few weeks, we’ve been able to show that we can use LLMs to:

Simplify performing research on our platform
Discover data across our platform
Provide direct biological insight

This is undoubtedly a first step of many. Beyond the work shared here, this first step serves as an example of how AI, LLMs, and General Intelligence can solve real problems for scientists.

What Can GBS Do Today?

We classify the work we've done into three main categories:

Embedded Features: integrating LLMs into existing platform features to make them easier to use
Semantic Search: allowing users to search across their data using natural language
Enable Medicine Assistant: building a chat-based research assistant to improve the productivity of every researcher using the platform

Embedded Features

Automated Cell Phenotyping

The Unsupervised Clustering extension allows users to run industry-standard algorithms for grouping cells into types. Until now, the user then had to manually label each cell group with its cell type annotation.

We now use LLMs to suggest a label for each cluster. Drawing on the broad knowledge that LLMs are trained on, the feature identifies common cell types and explains why each label was suggested, based on the cluster’s biomarker expression statistics. We are excited to accelerate a crucial task that once required significant time and biological expertise from researchers.
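As a rough illustration of how a cluster's biomarker expression statistics could be turned into a labeling request, here is a minimal sketch. The function names, the prompt wording, and the example marker values are all hypothetical and are not taken from the Enable Medicine Platform; the call to an actual LLM is deliberately left out.

```python
def top_markers(mean_expression, n=3):
    """Return the n biomarkers with the highest mean expression."""
    ranked = sorted(mean_expression.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def build_label_prompt(cluster_id, mean_expression, n=3):
    """Build a text prompt asking for a cell type label plus reasoning.

    Hypothetical prompt format: the real feature's prompt is not public.
    """
    markers = top_markers(mean_expression, n)
    lines = [f"- {m}: {mean_expression[m]:.2f}" for m in markers]
    return (
        f"Cluster {cluster_id} has these top biomarkers (mean expression):\n"
        + "\n".join(lines)
        + "\nSuggest the most likely cell type and explain your reasoning."
    )


# Made-up statistics for one cluster, for illustration only.
example_stats = {"CD3": 2.1, "CD8": 1.7, "CD20": 0.1, "PanCK": 0.05}
prompt = build_label_prompt(4, example_stats)
```

Sending only the top-ranked markers keeps the prompt short, and asking for reasoning alongside the label gives the researcher something to check before accepting the suggestion.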

Similar Studies

One of our overarching goals at Enable Medicine is to build the world’s largest Biological Atlas. With it, users can access their own data, and augment their work by discovering other relevant datasets.

Using LLMs, we are now able to suggest similar studies for any given study. We hope this serves as a way to streamline data discoverability, and to scale the impact and power of scientists’ work.

Explorer Cohorts

The Explorer allows users to visualize comparisons across clinical cohorts. We allow for flexible cohort definitions, enabling powerful analysis but making cohort definition a potentially tedious task.

Users can now define cohorts using natural language. The feature converts the user’s description into the structured cohort definition that the Explorer uses to render its plots, which can significantly reduce the time it takes to analyze your data.
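To make the idea of a structured cohort output concrete, the sketch below validates a JSON cohort definition of the kind an LLM might return and applies it to sample records. The schema (`name`, `filters`, `field`, `op`, `value`) and the example fields are assumptions made for illustration; the Explorer's actual cohort format is not described in this post.

```python
import json
import operator
from dataclasses import dataclass
from typing import Any

# Comparison operators accepted in this hypothetical schema.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


@dataclass
class CohortFilter:
    field: str
    op: str
    value: Any


def parse_cohort(raw):
    """Parse and validate a JSON cohort definition; reject unknown operators."""
    data = json.loads(raw)
    filters = []
    for item in data["filters"]:
        if item["op"] not in OPS:
            raise ValueError(f"unsupported operator: {item['op']}")
        filters.append(CohortFilter(item["field"], item["op"], item["value"]))
    return data["name"], filters


def matches(record, filters):
    """True if the record has every filtered field and satisfies every filter."""
    return all(
        f.field in record and OPS[f.op](record[f.field], f.value)
        for f in filters
    )


# Example output for a description like "responders older than 60".
raw = (
    '{"name": "Responders over 60", "filters": ['
    '{"field": "age", "op": ">", "value": 60}, '
    '{"field": "response", "op": "==", "value": "responder"}]}'
)
name, filters = parse_cohort(raw)
in_cohort = matches({"age": 67, "response": "responder"}, filters)
```

Validating the model's output against a fixed set of operators before it reaches the plotting code means a malformed response fails loudly instead of silently producing a wrong cohort.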


Semantic Search

AI is also playing a key role in the future of biological search on our platform. LLM technology has unlocked full free-text semantic search within our Atlas Search, on all levels of our data hierarchy. This is in addition to the more granular search that we already support.

You can now search for something like “human skin samples” and receive all relevant results, whether they’re classified as “skin”, “epithelial”, or other related terms. Because the labels and metadata that datasets provide are rarely standardized, this opens the door to exciting new cross-dataset research and insight generation.
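One common way to implement this kind of search is to compare embedding vectors by cosine similarity, so that related terms score highly even when the words differ. The post does not say how Atlas Search works internally, so the sketch below is only an assumption-laden illustration: the three-dimensional vectors are hand-written stand-ins for real embeddings, and the threshold is arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, threshold=0.5):
    """Return (label, score) pairs at or above the threshold, best first."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in index.items()]
    hits = [(label, score) for label, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings of sample metadata labels.
index = {
    "skin": [1.0, 0.0, 0.0],
    "epithelial": [0.8, 0.3, 0.1],
    "liver": [0.0, 0.2, 0.9],
}
query = [0.9, 0.1, 0.0]  # stand-in for the embedding of "human skin samples"
results = search(query, index)
labels = [label for label, _ in results]  # ["skin", "epithelial"]
```

Here the "epithelial" label is returned alongside "skin" because its vector points in a similar direction, while "liver" falls below the threshold.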

Enable Medicine Assistant

Finally, we’re excited to release the initial version of our Enable Medicine research assistant powered by LLMs.

This assistant will be deeply integrated throughout the entire Enable Medicine platform, able to assist researchers through their entire research process. Our assistant has:

Awareness of platform capabilities and the ability to move the research process forward
Knowledge about data that the user has access to
Ability to converse naturally and handle various scientific questions about general biology and the Enable Medicine Platform

To ensure that the assistant never acts on an inaccurate or invalid response, it always waits for user confirmation before taking any action. Additionally, to keep the scientific process flexible, we built support for richer interaction patterns: redirecting users throughout the platform, citing sources and linking information, and returning structured answers with multiple suggested next steps for the user.
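The confirm-before-acting pattern described above can be sketched as follows. The `ProposedAction` type and `execute_if_confirmed` function are hypothetical names chosen for this example and do not reflect the assistant's real implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProposedAction:
    """An action the assistant suggests but has not yet performed."""
    description: str
    run: Callable[[], str]


def execute_if_confirmed(action: ProposedAction, confirmed: bool) -> Optional[str]:
    """Run the action only when the user has explicitly confirmed it."""
    if not confirmed:
        return None  # nothing happens without user approval
    return action.run()


action = ProposedAction(
    description="Run unsupervised clustering on the selected region",
    run=lambda: "clustering started",
)
declined = execute_if_confirmed(action, confirmed=False)
approved = execute_if_confirmed(action, confirmed=True)
```

Separating the description of an action from its execution lets the interface show the user exactly what will happen before anything changes on the platform.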

The assistant can now be accessed from any page in the Enable Medicine Portal. In the near future, we’ll continue to improve it, for instance by giving it context about what the user is viewing so that it can target its responses. For example, if a user is viewing a specific region, the assistant may provide details about the region and its associated metadata without prompting.

The Enable Medicine Assistant is equipped with general biological knowledge.

The Enable Medicine Assistant can answer questions about data stored on platform.

The Enable Medicine Assistant can answer specific platform-related questions.

The Enable Medicine Assistant is designed to stay focused on scientific questions and minimize hallucinations.

Limitations

Overall, we’re extremely excited about these new AI-backed features and have already seen how these features can provide true value to scientists. We’re just beginning to see the full impact of this technology, and we expect that tools and models will only continue to grow more powerful and reliable.

Quality

However, LLMs today cannot yet be trusted to make decisions without human supervision, and they can still produce hallucinations or fail to return relevant responses. The huge improvements from GPT-3.5 to GPT-4 have allowed us to feel confident integrating the technology into our platform, but these features still require human verification. As the field continues to improve, we believe that more of these concerns will be resolved, and we’re confident that the path forward involves embracing AI in the research process.

Latency

GPT-4 responses currently carry noticeable latency. This affects the Enable Medicine Assistant most; in this first release, API responses may take 10-30 seconds. We expect this to improve over time.

Try Out Generative Biological Search Today

Generative Biological Search is available on the Enable Medicine Platform starting today! You can access it in all of the above locations by signing up for an Enable Medicine account.

These features are all still experimental, and we’re working hard to improve and expand their capabilities. Beyond today’s launch, we’re planning even more improvements to our platform, from adding more general biological knowledge in search, to better ways to operate on our data and interact with images more dynamically, to easier ways to summarize analyses.

Help Us Build the Future of Research

Imagine having access to assistance throughout your entire workflow, guiding you through the process of research and providing the insight you need along the way. Ask it to summarize your data; to recommend next steps; to build plots and create insights from your findings; and to relate your findings to other studies. More than ever before, we believe we have the tools to chart a concrete roadmap toward this vision.

By continuing to innovate and adapt the latest in AI to science, we can help accelerate the entire field of biological research. On the Enable Medicine Platform, we hope users will be able to conduct novel biological research with increasing ease, increasing collaboration, and increasing support.

Our long-term vision hasn’t changed: a Biological Atlas to enable Biological Search and Open Research. We hope that developing our platform and Atlas will be a community-driven effort. Whether you bring AI and LLM expertise or biological data and research, we are always looking for feedback and ideas, and we even hope to build our own open-source ecosystems. If you are a biologist, scientist, engineer, or anything in between, check out our website, get started with our platform, and reach out to us to collaborate on our vision for the future of science.


Jeff Chang

Software Engineer

Jeff is a software engineer at Enable. He completed his Bachelor’s degree in Computer Science at Harvard. Prior to joining Enable, Jeff worked on data processing and machine learning at Airbnb. In his spare time, he enjoys gardening, reading, and spending too much money on clothes.