AAAI 2025 FOUNDATION MODELS FOR BIOLOGICAL DISCOVERIES
ABOUT
Foundation models (FMs) have transformed natural language understanding and computer vision. In particular, research on LLMs and multimodal LLMs in these two domains is progressing rapidly, and this progress is starting to permeate a broad range of scientific disciplines. In this second offering of our workshop, our focus is on FMs for advancing biological discoveries. Current efforts have shown that FMs are indeed advancing our ability to conduct biological research in silico, formulate interesting hypotheses, and even design novel molecules, but biology remains complex and is ultimately a multi-systems discipline. Biology occurs when molecules come together, governed by underlying physics that drives processes at disparate spatio-temporal scales. These processes are probed in wet laboratories only incompletely, under different conditions, at different granularities, and at different levels of fidelity. This workshop poses and advances the following question: How can we advance FMs to transform biological research? The workshop brings together an interdisciplinary group of researchers at various career stages to nucleate a community around this question.
TOPICS
In addition to the following research themes, we encourage novel contributions from researchers who bring different perspectives on the core focus of the workshop:
- Learning from Incomplete Data of Different Modalities
- Grounding Foundation Models in Knowledge Beyond the Data
- Reconciling Disparate Spatio-temporal Scales and Varying Fidelity in Multimodal Data
- Beyond Prediction: Answering the How and the Why
- Quantifying Confidence of Predictions with Foundation Models
PROGRAM
SUBMISSION
To reflect the disciplinary diversity of the field, we encourage submissions of varying lengths:
- 1-page position papers
- 4-page papers on breaking results, datasets, and benchmarks
- 6-8-page papers on more detailed investigations
- 10-page surveys on topics aligned with the theme of the workshop
Each manuscript should be submitted as a single PDF file, including all content, figures, tables, and references, following the format of AAAI conference papers. Paper submissions need to include author information (reviews are not double-blind). References do not count toward the page limit.
Papers should be submitted at: https://easychair.org/conferences/?conf=fms4bio25.
Concurrent submissions to other journals and conferences are acceptable. Accepted papers will be presented as posters or short talks during the workshop and published on the workshop website at https://llms4science-community.github.io/aaai2025. We encourage authors of accepted papers to submit datasets at https://github.com/LLMs4Science-Community. Selected accepted papers will be presented as contributed talks. By tradition, accepted workshop papers are NOT included in the ACM Digital Library. The authors retain the copyright of their papers. Author enquiries should be directed to llms4science@gmail.com.
IMPORTANT DATES
Following are the key dates for the workshop. All deadlines are “anywhere on earth” (UTC-12).
- Paper submission deadline: December 2, 2024
- Notification of decision: December 13, 2024
- Early AAAI 2025 Registration Deadline: December 19, 2024
- Workshop Day: March 4, 2025
ATTENDANCE
For each accepted paper, at least one author must attend the conference and present their work. Authors of all accepted papers must prepare a final version for publication and a three-minute short video presentation (further details will be provided in the acceptance notification).
SCHEDULE
9:00 - 9:05
Opening Remarks
9:05 - 9:45
Morning Keynote
Payel Das, Principal Research Scientist and Manager, IBM Research AI
9:45 - 10:45 | Session I
20 mins
P1. Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision
Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia Lanman and Vaneet Aggarwal
20 mins
P2. Deciphering enzymatic potential in metagenomic reads through DNA language
Prabakaran Ramakrishnan and Yana Bromberg
20 mins
P3. PathoLM: Identifying pathogenicity from the DNA sequence through the Genome Foundation Model
Sajib Acharjee Dip, Uddip Shuvo, Tran Chau, Haoqiu Song, Petra Choi, Xuan Wang and Liqing Zhang
10:45 - 11:00
Coffee/Tea Break
Light refreshments available near session rooms
11:00 - 11:55 | Session II
20 mins
P4. DeepAge: Harnessing Deep Neural Network for Epigenetic Age Estimation From DNA Methylation Data of Human Blood Samples
Sajib Acharjee Dip, Da Ma and Liqing Zhang
20 mins
P5. Scalable DNA Feature Generation and Transcription Factor Binding Prediction via Deep Surrogate Models
Anowarul Kabir, Toki Tahmid Inan, Kim Rasmussen, Amarda Shehu, Anny Usheva, Alan Bishop, Boian Alexandrov and Manish Bhattarai
15 mins
P6. Efficient High-Throughput DNA Breathing Features Generation Using Jax-EPBD
Toki Tahmid Inan, Anowarul Kabir, Kim Rasmussen, Amarda Shehu, Anny Usheva, Alan Bishop, Boian Alexandrov and Manish Bhattarai
12:00 - 12:30
Invited Talk
Yoseph Barash, Professor of Genetics, University of Pennsylvania
12:30 - 13:45
Lunch
On your own. No sponsored lunch provided
13:45 - 14:25
Afternoon Keynote
Jian Tang, Associate Professor at Mila-Quebec AI Institute and HEC Montreal
14:25 - 15:45 | Session III
20 mins
P7. DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration
Sizhe Liu, Yizhou Lu, Siyu Chen, Xiyang Hu, Jieyu Zhao, Tianfan Fu and Yue Zhao
20 mins
P8. Towards Automated Biological Discovery: LLM-based Multi-Agent AI System for Single-Cell Disease Marker Identification and Discovery
Jieli Zhou, Luting Zhou and Hongyi Xin
20 mins
P9. Transformer-Based Approach for Automated Functional Group Replacement in Chemical Compounds
Bo Pan, Zhiping Zhang, Kevin Spiekermann, Tianchi Chen, Xiang Yu, Liying Zhang and Liang Zhao
15 mins
P10. AbAffinity: A Large Language Model for Predicting Antibody Binding Affinity against SARS-CoV-2
Faisal Bin Ashraf, Animesh Ray and Stefano Lonardi
15:45 - 16:00
Coffee/Tea Break
Light refreshments available near session rooms
16:00 - 16:40
Evening Keynote
Allison Heath, Center for Data Driven Discovery in Biomedicine
16:40 - 17:50 | Session IV
20 mins
P11. Leveraging Multimodal Representations to Predict Protein Melting Temperatures
Daiheng Zhang, Yan Zeng, Xinyu Hong and Jinbo Xu
20 mins
P12. Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation
Aayush Shah and Shankar Jayaratnam
20 mins
P13. Multi-Omic Integration for Breast Cancer Subtype Classification Using Advanced AI Techniques
Sajib Acharjee Dip and Liqing Zhang
15 mins
P14. Foundation Models for AI-Enabled Biological Design
Asher Moldwin and Amarda Shehu
17:50 - 18:00
Closing Remarks
KEYNOTE SPEAKERS

Payel Das
Trusted AI, IBM Thomas J Watson Research

Jian Tang
Montreal Institute for Learning Algorithms (MILA)

Allison Heath
Center for Data Driven Discovery in Biomedicine

Yoseph Barash
University of Pennsylvania
GENERAL CO-CHAIRS

Amarda Shehu
George Mason University
amarda@gmu.edu

Yana Bromberg
Emory University
yana@bromberglab.org

Liang Zhao
Emory University
liang.zhao@emory.edu
STUDENT CO-ORGANIZERS

Weisen Zhao
George Mason University
wzhao9@gmu.edu

Yifei Zhang
Emory University
yifei.zhang2@emory.edu

Ethan Lee
Emory University
ethan.lee@emory.edu

Samuel Blouir
George Mason University

Manpriya Dua
George Mason University

Asher Moldwin
George Mason University