Language model powered digital biology

Published in arXiv, 2024

Recommended citation: Pickard, Joshua, et al. "Language model powered digital biology." arXiv preprint arXiv:2409.02864 (2024). https://arxiv.org/pdf/2409.02864

Recent advances in large language models (LLMs) are transforming biology, computer science, and many other research fields, as well as impacting everyday life. While transformer-based technologies are currently being deployed in biology, no available agentic system has been developed to tackle bioinformatics workflows. We present a prototype Bioinformatics Retrieval Augmented Data (BRAD) digital assistant. BRAD is a chatbot and agentic system that integrates a suite of tools to handle bioinformatics tasks, from code execution to online search. We demonstrate its capabilities through (1) improved question answering with retrieval-augmented generation (RAG), (2) the ability to run complex software pipelines, and (3) the ability to organize and distribute tasks in agentic workflows. We use BRAD for automation, performing tasks ranging from gene enrichment and searching the archive to automatic code generation for running biomarker identification pipelines. BRAD is a step toward autonomous, self-driving labs for digital biology.

Download paper here: https://arxiv.org/pdf/2409.02864

Recommended BibTeX entry:

@article{pickard2024language,
  title={Language model powered digital biology},
  author={Pickard, Joshua and Choi, Marc Andrew and Oliven, Natalie and Stansbury, Cooper and Cwycyshyn, Jillian and Galioto, Nicholas and Gorodetsky, Alex and Velasquez, Alvaro and Rajapakse, Indika},
  journal={arXiv preprint arXiv:2409.02864},
  year={2024}
}