W1 |
1/14 |
Course Overview |
- |
PDF |
Instructor |
|
1/16 |
Scientific LLMs: Encoder-Only & Encoder-Decoder |
* SciBERT: A Pretrained Language Model for Scientific Text [EMNLP 2019]
* BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining [Bioinformatics 2020]
* ELECTRAMed: A New Pre-trained Language Representation Model for Biomedical NLP [arXiv 2021]
* SciFive: A Text-to-Text Transformer Model for Biomedical Literature [arXiv 2021]
|
PDF |
Instructor |
W2 |
1/21 |
Campus-Wide Class Cancellation |
|
1/23 |
Scientific LLMs: Decoder-Only |
* Solving Quantitative Reasoning Problems with Language Models [NeurIPS 2022]
* SciInstruct: A Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models [NeurIPS 2024]
* BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains [ACL 2024]
* OceanGPT: A Large Language Model for Ocean Science Tasks [ACL 2024]
|
PDF |
Instructor |
W3 |
1/28 |
Citation Prediction |
* SPECTER: Document-Level Representation Learning using Citation-Informed Transformers [ACL 2020]
* Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings [EMNLP 2022]
* Explaining Relationships between Scientific Documents [ACL 2021]
* SciRepEval: A Multi-Format Benchmark for Scientific Document Representations [EMNLP 2023]
|
PDF |
Instructor |
|
1/30 |
Scientific Question Answering |
* PubMedQA: A Dataset for Biomedical Research Question Answering [EMNLP 2019]
* Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries [WWW 2024]
* MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [ICLR 2024]
|
PDF |
Yichen |
W4 |
2/4 |
Scientific Knowledge Extraction |
* AIONER: All-in-One Scheme-Based Biomedical Named Entity Recognition using Deep Learning [Bioinformatics 2023]
* SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [EMNLP 2024]
* ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision [ACL 2023]
* ActionIE: Action Extraction from Scientific Literature with Programming Languages [ACL 2024]
|
PDF |
Instructor |
|
2/6 |
Scientific Literature Retrieval |
* MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval [Bioinformatics 2023]
* BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers [EMNLP 2024]
* Fact or Fiction: Verifying Scientific Claims [EMNLP 2020]
* Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [EMNLP 2023]
|
PDF |
Instructor |
W5 |
2/11 |
Scientific VLMs: Bioimaging |
* MedCLIP: Contrastive Learning from Unpaired Medical Images and Text [EMNLP 2022]
* A Visual–Language Foundation Model for Pathology Image Analysis using Medical Twitter [Nature Medicine 2023]
* LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [NeurIPS 2023]
* A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [Nature Medicine 2024]
|
PDF |
Instructor |
|
2/13 |
Scientific VLMs: Geometry |
* UniMath: A Foundational and Multimodal Mathematical Reasoner [EMNLP 2023]
* G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [ICLR 2025]
* Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models [EMNLP 2024]
|
PDF |
Shuo |
W6 |
2/18 |
[Guest Lecture] Hanwen Xu (University of Washington): Towards Patient Level Representations for Better Clinical Outcome
* Suggested Reading: A Whole-Slide Foundation Model for Digital Pathology from Real-World Data [Nature 2024]
|
N/A |
Guest Lecturer |
|
2/20 |
Scientific VLMs: Miscellaneous |
* UrbanCLIP: Learning Text-Enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web [WWW 2024]
* BioCLIP: A Vision Foundation Model for the Tree of Life [CVPR 2024]
* MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI [CVPR 2024]
|
PDF |
Hasnat |
|
2/23 |
Project Proposal Due (Sunday) |
W7 |
2/25 |
Protein Language Models |
* Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model [Science 2023]
* Large Language Models Generate Functional Protein Sequences across Diverse Families [Nature Biotechnology 2023]
* ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts [ICML 2023]
* BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations [EMNLP 2023]
|
PDF |
Instructor |
|
2/27 |
DNA/RNA/Single-Cell Language Models |
* DNABERT: Pre-trained Bidirectional Encoder Representations from Transformers Model for DNA-Language in Genome [Bioinformatics 2021]
* A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions [Nature Machine Intelligence 2024]
* scGPT: Toward Building a Foundation Model for Single-Cell Multi-omics using Generative AI [Nature Methods 2024]
|
PDF |
Omnia |
W8 |
3/4 |
Molecule Language Models |
* Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries [EMNLP 2021]
* Translation between Molecules and Natural Language [EMNLP 2022]
* LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset [COLM 2024]
* Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [ICLR 2024]
|
PDF |
Instructor |
|
3/6 |
Urban Language Models |
* SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation [EMNLP 2022]
* GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding [EMNLP 2023]
* UrbanGPT: Spatio-Temporal Large Language Models [KDD 2024]
|
PDF |
Shaohuai |
|
3/7 |
Literature Review Due (Friday) |
W9 |
3/11 |
Spring Break (No Class) |
|
3/13 |
Spring Break (No Class) |
W10 |
3/18 |
[Guest Lecture] Bowen Jin (University of Illinois Urbana-Champaign): Large Language Models on Scientific Text-Attributed Graphs
* Suggested Reading: Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [ACL 2024]
|
PDF |
Guest Lecturer |
|
3/20 |
Language Models with Academic Graphs |
* OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services [KDD 2022]
* LinkBERT: Pretraining Language Models with Document Links [ACL 2022]
* Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification [WWW 2022]
* Investigating Instruction Tuning Large Language Models on Graphs [COLM 2024]
|
PDF |
Instructor |
W11 |
3/25 |
Midterm Project Presentations |
N/A |
Students |
|
3/27 |
Table Language Models |
* TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [ACL 2020]
* TableLlama: Towards Open Large Generalist Models for Tables [NAACL 2024]
* UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers [NAACL 2025]
* Accurate Predictions on Small Data with a Tabular Foundation Model [Nature 2025]
|
PDF |
Instructor |
|
3/30 |
Midterm Report Due (Sunday) |
W12 |
4/1 |
LLMs for Research: Idea Generation |
* ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [NAACL 2025]
* Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas [arXiv 2024]
* Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers [ICLR 2025]
|
PDF |
Hangxiao |
|
4/3 |
LLMs for Research: Content Generation |
* Mapping the Increasing Use of LLMs in Scientific Papers [COLM 2024]
* Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [ICML 2024]
* Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts [arXiv 2023]
|
PDF |
Ethan |
W13 |
4/8 |
[Guest Lecture] Qingyun Wang (University of Illinois Urbana-Champaign): AI4Scientist: Accelerating and Democratizing Scientific Research Lifecycle
* Suggested Reading: SciMON: Scientific Inspiration Machines Optimized for Novelty [ACL 2024]
|
N/A |
Guest Lecturer |
|
4/10 |
LLMs for Research: Reviewing |
* Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis [NEJM AI 2024]
* LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [EMNLP 2024]
* AgentReview: Exploring Peer Review Dynamics with LLM Agents [EMNLP 2024]
|
PDF |
Michael |
W14 |
4/15 |
LLMs for Research: Miscellaneous |
* A Search Engine for Discovery of Scientific Challenges and Directions [AAAI 2022]
* Chain-of-Factors Paper-Reviewer Matching [WWW 2025]
* ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews [ACL 2024]
|
PDF |
Instructor |
|
4/17 |
Scientific Agents |
* Autonomous Chemical Research with Large Language Models [Nature 2023]
* Augmenting Large Language Models with Chemistry Tools [Nature Machine Intelligence 2024]
* Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design [EMNLP 2023]
|
PDF |
Rithik |
W15 |
4/22 |
Final Project Presentations |
N/A |
Students |
|
4/24 |
Final Project Presentations |
N/A |
Students |
W16 |
5/4 |
Final Report Due (Sunday) |