AAAI 2024: LLMs4Bio

An open exchange of ideas to spur foundational research and LLM-enabled biological discoveries

About

Rapid advances in large language models (LLMs) provide an unprecedented opportunity to further inquiry across scientific disciplines and domains. Despite remarkable feats in natural language tasks often considered exclusively indicative of human intelligence, the potential of LLMs beyond natural language has yet to be realized. Outstanding challenges in diverse scientific disciplines, such as molecular biology, materials science, climate science, geology, and hydrology, and in domains within them, such as drug discovery, quantum material design, and weather forecasting, necessitate the integration of heterogeneous, multi-modal datasets resulting from diverse physical processes, as well as the injection of deep domain knowledge accumulated over decades of discovery about the processes that govern the natural and biological world. This workshop brings together researchers from various disciplines to formulate problem spaces, datasets, standards, and benchmarks, and to spur innovation on accessible and inclusive LLMs to power the next scientific breakthroughs.

The Call for Papers has been published

Program

Opening Remarks	1:55PM – 2:00PM
Keynote Remarks: Sorin Draghici, NSF CISE:IIS:III Program Director. Molecular Biology Research at the National Science Foundation: An Information and Intelligent Systems Perspective.	2:00PM – 2:20PM
Session I: Genomics	2:20PM – 3:30PM
Invited Talk: Ramana Davuluri	20 mins
P1. Jim Clauwaert, Zahra McVey, Ramneek Gupta, John Prensner and Gerben Menschaert. RIBO-former: applying attention-based neural networks for precise delineation of translated open reading frames using ribosome profiling data.	10 mins
P2. Bernardo de Almeida and Thomas Pierrot. Large Language Models for Genomics.	10 mins
P3. Evan Trop, Chia-Hsiang Kao, Mckinley Polen, Yair Schiff, Bernardo P. de Almeida, Aaron Gokaslan, Thomas Pierrot and Volodymyr Kuleshov. Advancing DNA Language Models: The Genomics Long-Range Benchmark.	7 mins
P4. Sam Boshar, Evan Trop, Bernardo P. de Almeida and Thomas Pierrot. Are Genomic Language Models All You Need? Exploring Genomic Language Models on Protein Downstream Tasks.	7 mins
PANEL with Q&A for presenting authors of papers	15 mins
Session II: Proteomics	3:30PM – 4:15PM
P5. Asher Moldwin, Anowarul Kabir, and Amarda Shehu. A More Informative and Reproducible Remote Homology Evaluation for Protein Language Models.	10 mins
P6. Amitesh Badkul, Li Xie, Shuo Zhang and Lei Xie. Trustworthy protein-ligand binding affinity prediction using large protein language model.	10 mins
P7. Shuo Zhang and Lei Xie. Protein Large Language Model-Powered 3D Ligand Binding Site Prediction from Protein Sequence.	10 mins
PANEL with Q&A for presenting authors of papers	15 mins
Session III: Bio & ChemInformatics	4:15PM – 5:40PM
Invited Talk: Aidong Zhang	20 mins
P8. Amir Shariatmadari and Aidong Zhang. Harnessing the Power of Knowledge Graphs to Enhance LLM Explainability in the BioMedical Domain.	10 mins
P9. Jiayu Chang, Shiyu Wang, Chen Ling, Zhaohui Qin and Liang Zhao. Gene-associated Disease Discovery Powered by Large Language Models.	10 mins
P10. Xiuyuan Hu, Guoqing Liu, Yang Zhao and Hao Zhang. Empirical Evidence for the Fragment level understanding on Drug Molecular Structure of LLMs.	10 mins
P11. Gaetan De Waele, Anneleen D Wieme, Gerben Menschaert, Peter Vandamme and Willem Waegeman. Transformers for MALDI-TOF mass spectra.	7 mins
P12. Xinyang Liu, Bowei Fang, Ping Zhang, Bo Chen and Huaxiu Yao. MolGuide: 2D Molecular Optimization with Preserved Structural Motifs Guidance.	7 mins
PANEL with Q&A for presenting authors of papers	20 mins
Moderated Discussion with all Participants	5:40PM – 6:00PM
Concluding Remarks	6:00PM

Keynote Speakers

Sorin Draghici

Professor, Wayne State University

Aidong Zhang

Thomas M. Linville Professor, University of Virginia

Ramana Davuluri

Professor, Stony Brook University

Organizers

Dr. Amarda Shehu

Professor, Associate Dean of Engineering for AI Innovation and Associate Vice President of Research, George Mason University

Dr. Amarda Shehu is a Professor in the Department of Computer Science at George Mason University, where she is also Associate Dean of Engineering for AI Innovation and Associate Vice President of Research for the Institute of Digital InnovAtion. Shehu obtained her Ph.D. from Rice University in 2008, where she was also an NIH predoctoral fellow in the Nanobiology Program and was dually trained in AI and Molecular Biophysics. Shehu's research bridges foundational AI research and AI in support of progress across scientific disciplines. Her scholarship includes work at the intersection of stochastic optimization, planning, deep learning, optimization for deep learning, bioinformatics, health informatics, applied NLP, and more. She is a 2022 Fellow of the American Institute for Medical and Biological Engineering (AIMBE) and is passionate about bridging scientific communities. To advance transdisciplinary research, she regularly delivers keynotes and invited talks (IEEE BIBM 2022, BIOKDD 2021, ICCABS 2020, etc.) and organizes tutorials and workshops at ACM and IEEE bioinformatics and evolutionary computation conferences, as well as journal collections and special issues (PLoS Comput Biol, Curr Opin Struct Biol, Molecules, Intl J Mol Sci, etc.). Shehu also served as Program Director in the CISE directorate at NSF from 2019 to 2022, during which time she galvanized various sub-communities of CISE, BIO, CHEM, and ENG researchers in transdisciplinary research, education programs, and workshops. She has extensive experience bringing siloed communities together in workshops (Women @ GECCO, CSBW, AAAI Workshops, etc.), with a strong emphasis on integration and leadership for historically minoritized researchers.

amarda@gmu.edu

Dr. Yana Bromberg

Professor, Departments of Computer Science and Biology, Emory University; Hans Fischer Fellow, Institute for Advanced Study, Technical University of Munich

Dr. Yana Bromberg is a Professor in the Departments of Computer Science and Biology at Emory University and a Hans Fischer Fellow at the Institute for Advanced Study at the Technical University of Munich. She is a noted researcher in computational molecular biology. A former NLM Biomedical Informatics doctoral student, she is trained in molecular biology and bioinformatics. She is known for her work in annotation of genome variant-caused functional changes, tracking of microbiome functional shifts due to environmental pressures, and exploration of protein structures at the origins of life. Her seminal SNAP method, the first neural network-based tool for predicting functional impacts of single amino acid substitutions, has enabled novel variant assessment methods, whole genome analyses, and methods for mapping genetic architectures of disease. Bromberg has advanced computational annotation, prediction, and analysis of protein functionality, establishing the library of protein folds at the origins of life. Her exploration of the microbial world has led to advances in understanding molecular functional diversity carried out by individual microbes and emergent functionality in microbial communities. Bromberg is a director of the International Society for Computational Biology and has filled a number of leading roles in the organization of ISMB, the society's annual flagship conference. She consistently delivers keynotes and invited talks at national and international conferences (ISMB, GRC, etc.) and lectures at relevant workshops and courses (STAMPs workshop @MBL, the University of Bologna International Masters Bioinformatics program, etc.).

yana@bromberglab.org

Dr. Liang Zhao

Associate Professor, Department of Computer Science, Emory University

Dr. Liang Zhao is an Associate Professor in the Department of Computer Science at Emory University with extensive experience in spatiotemporal data mining, network modeling, deep learning, interpretable machine learning, and nonconvex and distributed optimization. His work has been published at top conferences and in top journals, including KDD, NeurIPS, ICLR, ICDM, AAAI, IJCAI, WWW, TKDE, CSUR, PIEEE, and TPAMI. He has received the NSF CAREER Award, Amazon Research Award, Meta Research Award, Cisco Faculty Research Award, and Jeffress Trust Award. His paper on deep subgraph anomaly detection won the Best Paper Award at ICDM 2022. His paper on interpretable representation learning was on the Best Paper Shortlist at WWW 2021. He also won Best Paper Runner-up and Best Paper Candidate at ACM SIGSPATIAL 2022. His paper on deep graph transformation and its applications to molecule reaction prediction won the Best Paper Award at ICDM 2019. His recent book "Graph Neural Networks: Foundations, Frontiers, and Applications" has received over 100K accesses on the publisher's SpringerLink website. Its Chinese version has won the Best-Seller Award from Post & Telecom Press. Zhao has disseminated over 30 software tools and numerous datasets.

liang.zhao@emory.edu