CSCE 689 - Special Topics in NLP for Science (Spring 2025)

Schedule (Subject to Change)

Week Date Topic Papers Slides Presenter
W1 1/14 Course Overview - PDF Instructor
1/16 Scientific LLMs: Encoder-Only & Encoder-Decoder
* SciBERT: A Pretrained Language Model for Scientific Text [EMNLP 2019]
* BioBERT: A Pre-trained Biomedical Language Representation Model for Biomedical Text Mining [Bioinformatics 2020]
* ELECTRAMed: A New Pre-trained Language Representation Model for Biomedical NLP [arXiv 2021]
* SciFive: A Text-to-Text Transformer Model for Biomedical Literature [arXiv 2021]
PDF Instructor
W2 1/21 Campus-Wide Class Cancellation
1/23 Scientific LLMs: Decoder-Only
* Solving Quantitative Reasoning Problems with Language Models [NeurIPS 2022]
* SciInstruct: A Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models [NeurIPS 2024]
* BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains [ACL 2024]
* OceanGPT: A Large Language Model for Ocean Science Tasks [ACL 2024]
PDF Instructor
W3 1/28 Citation Prediction
* SPECTER: Document-Level Representation Learning using Citation-Informed Transformers [ACL 2020]
* Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings [EMNLP 2022]
* Explaining Relationships between Scientific Documents [ACL 2021]
* SciRepEval: A Multi-Format Benchmark for Scientific Document Representations [EMNLP 2023]
PDF Instructor
1/30 Scientific Question Answering
* PubMedQA: A Dataset for Biomedical Research Question Answering [EMNLP 2019]
* Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries [WWW 2024]
* MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models [ICLR 2024]
PDF Yichen
W4 2/4 Scientific Knowledge Extraction
* AIONER: All-in-One Scheme-Based Biomedical Named Entity Recognition using Deep Learning [Bioinformatics 2023]
* SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [EMNLP 2024]
* ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision [ACL 2023]
* ActionIE: Action Extraction from Scientific Literature with Programming Languages [ACL 2024]
PDF Instructor
2/6 Scientific Literature Retrieval
* MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval [Bioinformatics 2023]
* BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers [EMNLP 2024]
* Fact or Fiction: Verifying Scientific Claims [EMNLP 2020]
* Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [EMNLP 2023]
PDF Instructor
W5 2/11 Scientific VLMs: Bioimaging
* MedCLIP: Contrastive Learning from Unpaired Medical Images and Text [EMNLP 2022]
* A Visual–Language Foundation Model for Pathology Image Analysis using Medical Twitter [Nature Medicine 2023]
* LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [NeurIPS 2023]
* A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [Nature Medicine 2024]
PDF Instructor
2/13 Scientific VLMs: Geometry
* UniMath: A Foundational and Multimodal Mathematical Reasoner [EMNLP 2023]
* G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model [ICLR 2025]
* Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models [EMNLP 2024]
PDF Shuo
W6 2/18 [Guest Lecture] Hanwen Xu (University of Washington): Towards Patient Level Representations for Better Clinical Outcome
* Suggested Reading: A Whole-Slide Foundation Model for Digital Pathology from Real-World Data [Nature 2024]
N/A Guest Lecturer
2/20 Scientific VLMs: Miscellaneous
* UrbanCLIP: Learning Text-Enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web [WWW 2024]
* BioCLIP: A Vision Foundation Model for the Tree of Life [CVPR 2024]
* MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI [CVPR 2024]
PDF Hasnat
2/23 Project Proposal Due (Sunday)
W7 2/25 Protein Language Models
* Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model [Science 2023]
* Large Language Models Generate Functional Protein Sequences across Diverse Families [Nature Biotechnology 2023]
* ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts [ICML 2023]
* BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations [EMNLP 2023]
PDF Instructor
2/27 DNA/RNA/Single-Cell Language Models
* DNABERT: Pre-trained Bidirectional Encoder Representations from Transformers Model for DNA-Language in Genome [Bioinformatics 2021]
* A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions [Nature Machine Intelligence 2024]
* scGPT: Towards Building a Foundation Model for Single-Cell Multi-omics using Generative AI [Nature Methods 2024]
PDF Omnia
W8 3/4 Molecule Language Models
* Text2Mol: Cross-Modal Molecule Retrieval with Natural Language Queries [EMNLP 2021]
* Translation between Molecules and Natural Language [EMNLP 2022]
* LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset [COLM 2024]
* Fine-Tuned Language Models Generate Stable Inorganic Materials as Text [ICLR 2024]
PDF Instructor
3/6 Urban Language Models
* SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation [EMNLP 2022]
* GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding [EMNLP 2023]
* UrbanGPT: Spatio-Temporal Large Language Models [KDD 2024]
PDF Shaohuai
3/7 Literature Review Due (Friday)
W9 3/11 Spring Break (No Class)
3/13 Spring Break (No Class)
W10 3/18 [Guest Lecture] Bowen Jin (University of Illinois Urbana-Champaign): Large Language Models on Scientific Text-Attributed Graphs
* Suggested Reading: Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [ACL 2024]
PDF Guest Lecturer
3/20 Language Models with Academic Graphs
* OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services [KDD 2022]
* LinkBERT: Pretraining Language Models with Document Links [ACL 2022]
* Metadata-Induced Contrastive Learning for Zero-Shot Multi-Label Text Classification [WWW 2022]
* Investigating Instruction Tuning Large Language Models on Graphs [COLM 2024]
PDF Instructor
W11 3/25 Midterm Project Presentations N/A Students
3/27 Table Language Models
* TaBERT: Learning Contextual Representations for Natural Language Utterances and Structured Tables [ACL 2020]
* TableLlama: Towards Open Large Generalist Models for Tables [NAACL 2024]
* UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers [NAACL 2025]
* Accurate Predictions on Small Data with a Tabular Foundation Model [Nature 2025]
PDF Instructor
3/30 Midterm Report Due (Sunday)
W12 4/1 LLMs for Research: Idea Generation
* ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [NAACL 2025]
* Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas [arXiv 2024]
* Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers [ICLR 2025]
PDF Hangxiao
4/3 LLMs for Research: Content Generation
* Mapping the Increasing Use of LLMs in Scientific Papers [COLM 2024]
* Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [ICML 2024]
* Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts [arXiv 2023]
PDF Ethan
W13 4/8 [Guest Lecture] Qingyun Wang (University of Illinois Urbana-Champaign): AI4Scientist: Accelerating and Democratizing Scientific Research Lifecycle
* Suggested Reading: SciMON: Scientific Inspiration Machines Optimized for Novelty [ACL 2024]
N/A Guest Lecturer
4/10 LLMs for Research: Reviewing
* Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis [NEJM AI 2024]
* LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [EMNLP 2024]
* AgentReview: Exploring Peer Review Dynamics with LLM Agents [EMNLP 2024]
PDF Michael
W14 4/15 LLMs for Research: Miscellaneous
* A Search Engine for Discovery of Scientific Challenges and Directions [AAAI 2022]
* Chain-of-Factors Paper-Reviewer Matching [WWW 2025]
* ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews [ACL 2024]
PDF Instructor
4/17 Scientific Agents
* Autonomous Chemical Research with Large Language Models [Nature 2023]
* Augmenting Large Language Models with Chemistry Tools [Nature Machine Intelligence 2024]
* Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design [EMNLP 2023]
PDF Rithik
W15 4/22 Final Project Presentations N/A Students
4/24 Final Project Presentations N/A Students
W16 5/4 Final Report Due (Sunday)