AAAI 2025 FOUNDATION MODELS FOR BIOLOGICAL DISCOVERIES

ABOUT

Foundation models (FMs) have transformed natural language understanding and computer vision. In particular, research on LLMs and multi-modal LLMs in these two domains is progressing rapidly, and this progress is starting to permeate a broad range of scientific disciplines. In this second offering of our workshop, our focus is on FMs for advancing biological discoveries. Current efforts show that FMs are indeed advancing our ability to conduct biological research in silico, formulate interesting hypotheses, and even design novel molecules, but biology remains complex and is ultimately a multi-systems discipline. Biology occurs when molecules come together, governed by underlying physics that drives processes at disparate spatio-temporal scales. These processes are probed in wet laboratories only incompletely, under different conditions, at different granularities, and at different levels of fidelity. This workshop poses and advances the following question: How can we advance FMs to transform biological research? It brings together an interdisciplinary community of researchers at various career stages to nucleate a community that advances this question.

TOPICS

In addition to the following research themes, we encourage novel contributions from researchers that bring different perspectives on the core focus of the workshop:

  • Learning from Incomplete Data of Different Modalities
  • Grounding Foundation Models in Knowledge Beyond the Data
  • Reconciling Disparate Spatio-temporal Scales and Varying Fidelity in Multimodal Data
  • Beyond Prediction: Answering the How and the Why
  • Quantifying Confidence of Predictions with Foundation Models

PROGRAM

SUBMISSION

To reflect the disciplinary diversity of the community, we encourage submissions of varying length:

  • 1-page position papers
  • 4-page papers on breaking results, datasets, and benchmarks
  • 6-8-page papers on more detailed investigations
  • 10-page surveys on topics aligned with the theme of the workshop

Each manuscript should be submitted as a single PDF file, including all content, figures, tables, and references, following the format of AAAI conference papers. Paper submissions need to include author information (reviews are not double-blind). References do not count toward the page limit.

Papers should be submitted at: https://easychair.org/conferences/?conf=fms4bio25.

Concurrent submissions to other journals and conferences are acceptable. Accepted papers will be presented as posters or short talks during the workshop and published on the workshop website at https://llms4science-community.github.io/aaai2025. We encourage authors of accepted papers to submit datasets at https://github.com/LLMs4Science-Community. Selected accepted papers will be presented as contributed talks. As a tradition, accepted workshop papers are NOT included in the ACM Digital Library. The authors retain the copyright of their papers. Author enquiries should be directed to llms4science@gmail.com.

IMPORTANT DATES

The key dates for the workshop are listed below. All deadlines are “anywhere on earth” (UTC-12).

  • Paper submission deadline: December 2, 2024
  • Notification of decision: December 13, 2024
  • Early AAAI 2025 Registration Deadline: December 19, 2024
  • Workshop Day: March 4, 2025

ATTENDANCE

For each accepted paper, at least one author must attend the conference and present their work. Authors of all accepted papers must prepare a final version for publication and a three-minute short video presentation (further details will be provided in the acceptance notification).

SCHEDULE

9:00 - 9:05

Opening Remarks

9:05 - 9:45

Morning Keynote

Payel Das, Principal Research Scientist and Manager, IBM Research AI

9:45 - 10:45 | Session I

20 mins

P1. Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision

Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia Lanman and Vaneet Aggarwal

20 mins

P2. Deciphering enzymatic potential in metagenomic reads through DNA language

Prabakaran Ramakrishnan and Yana Bromberg

20 mins

P3. PathoLM: Identifying pathogenicity from the DNA sequence through the Genome Foundation Model

Sajib Acharjee Dip, Uddip Shuvo, Tran Chau, Haoqiu Song, Petra Choi, Xuan Wang and Liqing Zhang

10:45 - 11:00

Coffee/Tea Break

Light refreshments available near session rooms

11:00 - 11:55 | Session II

20 mins

P4. DeepAge: Harnessing Deep Neural Network for Epigenetic Age Estimation From DNA Methylation Data of Human Blood Samples

Sajib Acharjee Dip, Da Ma and Liqing Zhang

20 mins

P5. Scalable DNA Feature Generation and Transcription Factor Binding Prediction via Deep Surrogate Models

Anowarul Kabir, Toki Tahmid Inan, Kim Rasmussen, Amarda Shehu, Anny Usheva, Alan Bishop, Boian Alexandrov and Manish Bhattarai

15 mins

P6. Efficient High-Throughput DNA Breathing Features Generation Using Jax-EPBD

Toki Tahmid Inan, Anowarul Kabir, Kim Rasmussen, Amarda Shehu, Anny Usheva, Alan Bishop, Boian Alexandrov and Manish Bhattarai

12:00 - 12:30

Invited Talk

Yoseph Barash, Professor of Genetics, University of Pennsylvania

12:30 - 13:45

Lunch

On your own. No sponsored lunch is provided.

13:45 - 14:25

Afternoon Keynote

Jian Tang, Associate Professor at Mila-Quebec AI Institute and HEC Montreal

14:25 - 15:45 | Session III

20 mins

P7. DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration

Sizhe Liu, Yizhou Lu, Siyu Chen, Xiyang Hu, Jieyu Zhao, Tianfan Fu and Yue Zhao

20 mins

P8. Towards Automated Biological Discovery: LLM-based Multi-Agent AI System for Single-Cell Disease Marker Identification and Discovery

Jieli Zhou, Luting Zhou and Hongyi Xin

20 mins

P9. Transformer-Based Approach for Automated Functional Group Replacement in Chemical Compounds

Bo Pan, Zhiping Zhang, Kevin Spiekermann, Tianchi Chen, Xiang Yu, Liying Zhang and Liang Zhao

15 mins

P10. AbAffinity: A Large Language Model for Predicting Antibody Binding Affinity against SARS-CoV-2

Faisal Bin Ashraf, Animesh Ray and Stefano Lonardi

15:45 - 16:00

Coffee/Tea Break

Light refreshments available near session rooms

16:00 - 16:40

Evening Keynote

Allison Heath, Center for Data Driven Discovery in Biomedicine

16:40 - 17:50 | Session IV

20 mins

P11. Leveraging Multimodal Representations to Predict Protein Melting Temperatures

Daiheng Zhang, Yan Zeng, Xinyu Hong and Jinbo Xu

20 mins

P12. Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation

Aayush Shah and Shankar Jayaratnam

20 mins

P13. Multi-Omic Integration for Breast Cancer Subtype Classification Using Advanced AI Techniques

Sajib Acharjee Dip and Liqing Zhang

15 mins

P14. Foundation Models for AI-Enabled Biological Design

Asher Moldwin and Amarda Shehu

17:50 - 18:00

Closing Remarks

KEYNOTE SPEAKERS

Payel Das

Trusted AI, IBM Thomas J Watson Research

Jian Tang

Montreal Institute for Learning Algorithms (MILA)

Allison Heath

Center for Data Driven Discovery in Biomedicine

Yoseph Barash

University of Pennsylvania

GENERAL CO-CHAIRS

Amarda Shehu

George Mason University

amarda@gmu.edu

Yana Bromberg

Emory University

yana@bromberglab.org

Liang Zhao

Emory University

liang.zhao@emory.edu

STUDENT CO-ORGANIZERS

Weisen Zhao

George Mason University

wzhao9@gmu.edu

Yifei Zhang

Emory University

yifei.zhang2@emory.edu

Ethan Lee

Emory University

ethan.lee@emory.edu

Samuel Blouir

George Mason University

Manpriya Dua

George Mason University

Asher Moldwin

George Mason University