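Course Catalog

Full-Day Courses
Title: Tutorial on Deep Learning and Generative AI
Instructors: Haoda Fu, Amgen

Afternoon Half-Day Courses
Title: Statistical methods for time-to-event data subject to truncation
Instructors: Jing Qian, University of Massachusetts Amherst
Title: Introduction of Dynamic Borrowing in Clinical Trials and Regulatory Submission
Instructors: Jerry Li, BMS; Ivan Chan, BMS; Vlad Son, BMS
Title: Statistical Inference in Large Language Models
Instructors: Weijie Su, University of Pennsylvania; Jiancong Xiao, University of Pennsylvania; Xiang Li, University of Pennsylvania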

Full-Day Courses:

Tutorial on Deep Learning and Generative AI:

Instructors: Haoda Fu, Amgen

Target Audience: People with at least master’s-level training in statistics

Prerequisites for participants: Familiarity with linear regression and basic programming

Computer and software requirements: Python, PyTorch

In an era where AI technologies are transforming industries, understanding their foundations and applications is essential for a wide range of professionals. This course is designed to equip statisticians, biostatisticians, researchers, and decision-makers with the mental models needed to navigate and leverage AI effectively. Whether you’re a decision-maker aiming to make informed choices about AI tools, a researcher seeking to integrate AI into your work, or a statistician looking to build advanced models, this course offers a valuable gateway to the world of AI.

Focusing on deep learning and generative AI, participants will gain hands-on experience with PyTorch, learn foundational concepts, and explore state-of-the-art architectures such as CNNs, GNNs, ResNet, U-Net, and transformers. The course also delves into applications of these models in medical imaging and drug discovery, as well as cutting-edge generative AI techniques like GANs, VAEs, DDPM, score-based models, and the mechanics behind large language models (LLMs).

By bridging technical knowledge with practical insights, this course empowers participants to apply AI in healthcare, research, and beyond, making it an indispensable resource for those seeking to understand and shape the future of AI in their fields.

The outline of the short course is as follows.

Why: History of Deep Learning and Generative AI
Build Our First Neural Network Model from Scratch
(Break 1)
Let Us Code Together
Build Our First Deep Learning Model for Computer Vision
(Lunch break)
Sequence Classification Model
Nuts and Bolts for LLM: Sequence-to-Sequence Models
(Break 2)
Generative AI Family
Advanced Topics and Extensions: Generative AI on Smooth Manifolds
Final Thoughts

Short Bio: Dr. Haoda Fu is Head of Exploratory Biostatistics at Amgen. Before that, he was an Associate Vice President and Enterprise Lead for Machine Learning and Artificial Intelligence at Eli Lilly and Company. Dr. Fu is a Fellow of the American Statistical Association (ASA) and of the Institute of Mathematical Statistics (IMS). He is also an adjunct professor of biostatistics at the University of North Carolina at Chapel Hill and the Indiana University School of Medicine. Dr. Fu received his Ph.D. in statistics from the University of Wisconsin-Madison in 2007 and joined Lilly afterward. Since joining Lilly, he has been very active in statistics and data science methodology research. He has more than 100 publications in areas such as Bayesian adaptive design, survival analysis, recurrent event modeling, personalized medicine, indirect and mixed treatment comparison, joint modeling, Bayesian decision making, and rare events analysis. In recent years, his research has focused on machine learning and artificial intelligence. His research has been published in top journals and venues including JASA, JRSS-B, Biometrika, Biometrics, ACM, IEEE, JAMA, and Annals of Internal Medicine. He has taught machine learning and AI topics at large industry conferences and at an FDA workshop. He has served on boards of directors and as program chair and committee chair for statistics organizations such as ICSA, ENAR, and the ASA Biopharmaceutical Section. He is a COPSS Snedecor Award committee member for 2022-2026, and he has served as an associate editor for JASA Theory and Methods since 2023 and for JASA Applications and Case Studies for 2025-2027.

Category: Technology Training


 

Afternoon Half-Day Courses:

Statistical methods for time-to-event data subject to truncation:

Instructors: Jing Qian, University of Massachusetts Amherst

Time: Afternoon Session

Target Audience: students, practitioners or researchers with an interest in understanding statistical analysis of time-to-event data subject to truncation.

Prerequisites for participants: knowledge of statistical inference; basic knowledge of survival analysis.

Computer and software requirements: a laptop with R (and RStudio) installed is recommended

Truncated time-to-event data arise in various fields, including biomedical sciences, public health, epidemiology, and astronomy. Truncation is a form of biased sampling in which the event time is observed only if it falls within a certain interval. This short course reviews statistical methods for time-to-event data subject to left, right, and sequential truncation, exploring both classical and advanced techniques.

The first half introduces classical risk-set adjustment methods for estimating event time distributions and conducting regression analysis with left-truncated data, with or without additional right censoring. Methods for right-truncated data will also be discussed. The assumption of quasi-independence between truncation and event times, which is crucial for the validity of classical methods, will be emphasized, along with hypothesis tests for assessing this assumption.
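
As a rough illustration of the risk-set adjustment described above, the following Python sketch computes a product-limit estimate of the survival function from left-truncated, right-censored data. It is a minimal sketch and not taken from the course material: the function name `truncated_product_limit` and its arguments are hypothetical, and it assumes quasi-independence between truncation and event times and that each subject's entry time does not exceed the observed time. The only change from the usual Kaplan-Meier estimator is that a subject counts as at risk at time t only once it has entered the study (entry <= t <= observed time).

```python
from typing import List, Sequence, Tuple


def truncated_product_limit(
    entry: Sequence[float],
    time: Sequence[float],
    event: Sequence[int],
) -> List[Tuple[float, float]]:
    """Product-limit survival estimate with a truncation-adjusted risk set.

    entry[i]: left-truncation (study entry) time of subject i
    time[i]:  observed time (event or right-censoring time)
    event[i]: 1 if the event was observed, 0 if right-censored
    Returns a list of (event time, estimated survival just after that time).
    """
    if not (len(entry) == len(time) == len(event)):
        raise ValueError("entry, time, and event must have the same length")
    if any(l > x for l, x in zip(entry, time)):
        raise ValueError("each entry time must not exceed the observed time")

    # Distinct observed event times, in increasing order.
    event_times = sorted({x for x, d in zip(time, event) if d})

    surv = 1.0
    curve = []
    for t in event_times:
        # Adjusted risk set: already entered and still under observation at t.
        at_risk = sum(1 for l, x in zip(entry, time) if l <= t <= x)
        # Number of observed events exactly at t.
        deaths = sum(1 for x, d in zip(time, event) if d and x == t)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve
```

For example, with entry times [0, 1, 2, 0, 3], observed times [2, 3, 5, 4, 6], and event indicators [1, 1, 0, 1, 1], the estimate steps down to roughly 0.75, 0.5625, 0.375, and 0 at times 2, 3, 4, and 6. When few subjects have entered early, the risk set is small and the early part of the curve can be unstable.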

The second half covers recent methodological advances for analyzing truncated time-to-event data. Topics include methods for estimation and regression under dependent truncation, sequential truncation in observational cohort studies with complex sampling schemes, and techniques for estimation and regression under sequential truncation. Discussions will be supplemented with real-world data examples. R software will be used to demonstrate the implementation of the techniques.

The outline of the short course is as follows.

Teaching plan for the half-day course “Statistical methods for time-to-event data subject to truncation” (using morning times as an illustration)

8:30-9:30am: Part I: Introduction to time-to-event data subject to truncation, highlighting the difference between censoring and truncation. Classical risk-set adjustment methods for estimating event time distribution with left-truncated time-to-event data, with or without additional right censoring. Estimation of event time distribution with right-truncated time-to-event data.

9:30-10:15am: Part II: Regression analysis with left-truncated and right-censored time-to-event data, including Cox model and accelerated failure time model. Regression analysis with right-truncated data.

10:15-10:30am: 15-minute break

10:30-11:45am: Part III: Hypothesis tests for assessing quasi-independence between truncation and event times. One-sample estimation and regression analysis methods under dependent truncation.

11:45am-12:30pm: Part IV: The concept of sequential truncation in observational cohort studies with complex sampling schemes. Methods for estimating event time distributions and performing regression analysis in the presence of sequential truncation.

Short Bio: Dr. Jing Qian is a Professor of Biostatistics in the Department of Biostatistics and Epidemiology at the University of Massachusetts Amherst, with extensive experience in statistical methodology and its applications to public health and biomedical research. Dr. Qian’s research focuses on the development of statistical methods for survival analysis of biomedical outcomes subject to complex censoring or sampling, biomarker evaluation and risk prediction, and covariates subject to censoring and truncation. His collaborative research spans neurodegenerative diseases such as Alzheimer’s and Parkinson’s diseases, breast cancer epidemiology, and health services research. He has served as the Principal Investigator on multiple NIH-funded grants and has published extensively in leading statistical and biomedical journals.

Dr. Jing Qian is an experienced educator, having taught graduate-level courses on introductory, intermediate, and advanced biostatistical methods for over a decade at the University of Massachusetts Amherst. His teaching portfolio includes introductory and applied biostatistics courses for public health students, such as Introduction to Biostatistics and Intermediate Biostatistics; intermediate-level theory and methods courses for biostatistics graduate students, such as Fundamentals of Probability and Statistical Inference and Topics in Health Data Science; and advanced statistical theory and methods courses for Ph.D. students in biostatistics, such as Applied Statistical Learning and Advanced Statistical Inference. In all of these settings, Dr. Qian emphasizes the challenge and importance of engaging students and fostering active learning in the classroom.

Beyond classroom teaching, Dr. Qian has served as the primary advisor to more than 10 postdoctoral research fellows, doctoral students, and master’s students.

Category: Methodology and application


Introduction of Dynamic Borrowing in Clinical Trials and Regulatory Submission:

Instructors: Jerry Li, BMS; Ivan Chan, BMS; Vlad Son, BMS

Time: Afternoon Session

Target Audience: Statisticians working on clinical trials

Prerequisites for participants: NA

Computer and software requirements: NA

Clinical trials account for a significant portion of drug development in both budget and duration. Randomized clinical trials remain the gold standard, but given the sheer amount of available prior clinical trial data and real-world data/evidence, finding innovative ways to design more efficient clinical trials and to supplement relevant data for regulatory submission has become imperative, which has led to the increasing popularity of dynamic borrowing.

Dynamic borrowing can bring significant benefits, both in expediting drug development and for a company’s portfolio. Specifically, it can overcome challenges when patients are difficult to enroll; reduce the size, duration, and risk of a new trial while ensuring adequate power; deliver operational and cost savings; and boost the power and improve the efficiency of the analysis of a trial with a limited sample size.

This short course will cover the common sources of data for borrowing and introduce the approaches of both frequentist and Bayesian borrowing as well as the regulatory landscape in this space.
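
To make the idea of Bayesian borrowing concrete, here is a minimal Python sketch of a power prior for a binary endpoint with a conjugate Beta prior. It is an illustration under stated assumptions, not the instructors' method: the function names and the numbers in the usage note are hypothetical, and the discount weight `alpha` is fixed by the user. Dynamic borrowing approaches instead let the amount of borrowing depend on how well the historical and current data agree, and this sketch does not implement that step.

```python
from typing import Tuple


def power_prior_posterior(
    y: int,
    n: int,
    y0: int,
    n0: int,
    alpha: float,
    a0: float = 1.0,
    b0: float = 1.0,
) -> Tuple[float, float]:
    """Posterior Beta(a, b) for a response rate under a fixed-weight power prior.

    y, n:    responders and sample size in the current trial
    y0, n0:  responders and sample size in the historical data
    alpha:   discount in [0, 1]; 0 ignores the historical data, 1 pools it fully
    a0, b0:  parameters of the initial Beta prior (assumed Beta(1, 1) by default)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not (0 <= y <= n and 0 <= y0 <= n0):
        raise ValueError("responder counts must lie between 0 and the sample size")
    # The historical likelihood is raised to the power alpha, so the historical
    # data contribute alpha * n0 "effective" subjects to the posterior.
    a = a0 + alpha * y0 + y
    b = b0 + alpha * (n0 - y0) + (n - y)
    return a, b


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)
```

With hypothetical data of 12 responders out of 40 in the current trial and 30 out of 100 in the historical data, `alpha=0.5` gives a Beta(28, 64) posterior, so the historical data count as 50 effective subjects; `alpha=0` recovers the analysis of the current trial alone.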

The outline of the short course is as follows.

The first part of this short course will introduce the rationale and overall benefits of dynamic borrowing and cover the general methods. After the break, the second part will go into more detail on the methods and the regulatory landscape, with possible case studies and a demo of an R Shiny app.

Short Bio: Dr. Jerry Li is currently a Director and TA Lead of Hematology Malignant Myeloid Diseases in Global Biostatistics and Data Sciences (GBDS) at BMS. Jerry leads statistical support for the clinical development of multiple assets, including clinical trial design (such as phase 2/3 seamless designs), interactions with worldwide health authorities, and life cycle management of the assets. Jerry established and co-leads the Dynamic Borrowing Working Group at BMS. He also recently co-organized and co-moderated a whole-day Biostatistics Research and Innovation Network (BRAIN) meeting at BMS dedicated to dynamic borrowing.

Prior to BMS, Jerry worked at the FDA and then at Merck and Daiichi Sankyo. He has held positions of increasing responsibility in multiple therapeutic areas, including oncology, neurosciences, immunology, and infectious disease, and has a track record of successful regulatory approvals.

In addition to dynamic borrowing, Jerry is also interested in dose optimization, phase 2/3 seamless design, statistical modeling of disease-modifying treatment effect, and properties of log-rank test following covariate-adaptive randomization in oncology trials. Jerry received his Ph.D. in statistics from the University of Maryland, College Park.

Dr. Ivan Chan has more than 25 years of experience in the pharmaceutical industry. He is currently a VP and interim Head of Global Biometrics & Data Sciences at Bristol Myers Squibb. Prior to joining BMS, Ivan was VP and Head of Statistical Sciences at AbbVie, leading multiple therapeutic areas. Before that, he spent 21 years at Merck Research Laboratories, where he led the global statistical support for vaccines and early oncology.

Ivan received his B.S. in Statistics from the Chinese University of Hong Kong and his Ph.D. in Biostatistics from the University of Minnesota. He is an elected Fellow of the American Statistical Association (ASA) and an elected Fellow of the Society for Clinical Trials (SCT). Ivan was the 2021 recipient of the Deming Lecturer Award from ASA for his outstanding contributions to vaccine development. He currently serves as Executive Director of the International Society for Biopharmaceutical Statistics and Co-Chair of the Deming Conference on Applied Statistics. Ivan has previously served as the President of the International Chinese Statistical Association and the Program Chair of the ASA Biopharmaceutical Section. He has 90+ publications in statistical and clinical journals.

Dr. Vladimir (Vlad) Son is currently a Director and Ozanimod Lead in Immunology within Global Biostatistics and Data Sciences (GBDS) at BMS. Prior to joining BMS, Vlad worked at Regeneron, contributing to multiple therapeutic areas, including ophthalmology and cardiovascular. He earned his Ph.D. in Statistics from Bowling Green State University.

Vlad serves as co-lead of the Regulatory Subteam within the Dynamic Borrowing Working Group. He is also the GBDS representative on the Policy Evaluation And Regulatory Landscape (PEARL) Council at BMS and an active member of the BMS Estimand Working Group.

 

Category: Methodology


Statistical Inference in Large Language Models:

Instructors: Weijie Su, University of Pennsylvania; Jiancong Xiao, University of Pennsylvania; Xiang Li, University of Pennsylvania

Time: Afternoon Session

Target Audience: PhD students and faculty who are interested in generative AI

Prerequisites for participants: NA

Computer and software requirements: NA

Large Language Models (LLMs) have recently stood out as revolutionary AI tools for processing data in the form of text. However, when harnessing their potential for statistical decision-making, it becomes essential to understand the risks of their outputs. Evaluating the uncertainty and confidence levels associated with LLMs presents both challenges and intriguing opportunities for today’s statisticians. The aim of this half-day short course is to equip statisticians with the skills to integrate inferential concepts into the applications and advancement of LLMs. Course topics include: 1) a brief introduction to the fundamentals of LLMs, tailored for those new to transformers and deep learning; 2) a primer on statistical inference techniques specifically for text data using LLMs; and 3) in-depth exploration of LLM applications in medical domains and the broader data science field. By the end of the course, attendees will possess the skills needed to empower LLMs with statistical inference. While this course promises a deep and enriching dive into the confluence of statistics and advanced AI, no prior knowledge of LLMs is required.

The outline of the short course is as follows.

Take the morning session as an example. The time arrangement would be similar for the afternoon session.

Morning Half-Day Course (8:30 a.m. – 12:30 p.m.)

Session 1: Understanding and Building LLM Foundations 8:30 a.m. – 10:15 a.m. (1 hour 45 minutes)

1. Understand the Evolution and Significance of LLMs – Recognize the evolution and importance of Large Language Models in AI and data processing.

2. Grasp LLM Architectures and Mechanics – Describe core principles and architectures of LLMs, including transformers and attention mechanisms. – Learn how text data is processed, tokenized, and embedded in LLMs.
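
For readers new to the attention mechanism named in Session 1, the following Python sketch shows scaled dot-product attention for a single head, written with plain lists so that it has no dependencies. It is a simplified illustration rather than course material: the function names are hypothetical, and real transformer implementations add learned projection matrices, multiple heads, masking, and batched tensor operations.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
) -> List[Vector]:
    """Return one output vector per query.

    Each output is a weighted average of the value vectors, with weights
    softmax(q . k / sqrt(d_k)) over all keys.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    d_k = len(keys[0])
    value_dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors, one coordinate at a time.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(value_dim)]
        )
    return outputs
```

A query that is equally similar to every key receives equal weights, so its output is the plain average of the value vectors; a query that matches one key much more strongly returns nearly that key's value.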

Break 10:15 a.m. – 10:30 a.m. (15 minutes)

Session 2: Challenges, Applications, and Ethics in LLMs 10:30 a.m. – 12:30 p.m. (2 hours)

1. Identify Challenges in LLMs – Pinpoint common challenges and limitations such as overfitting and bias. – Appreciate the importance of critically evaluating model outputs.

2. Analyze Real-World LLM Applications – Evaluate the use of LLMs in healthcare settings, such as processing clinical notes and predicting patient outcomes. – Identify other applications, including sentiment analysis and recommendation systems.

3. Navigate Ethical and Responsible Use of LLMs – Recognize potential biases in medical text data and broader implications in data science. – Advocate for the fair, ethical, and responsible use of LLMs across various domains.

Short Bio: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Departments of Computer and Information Science and Mathematics at the University of Pennsylvania. He is a co-director of the Penn Research in Machine Learning (PRiML) Center. Prior to joining Penn, he received his Ph.D. in Statistics from Stanford University in 2016 and a bachelor’s degree in Mathematics from Peking University in 2011. His research interests span the statistical foundations of generative AI, privacy-preserving machine learning, high-dimensional statistics, and optimization. He serves as an associate editor of the Journal of Machine Learning Research, Journal of the American Statistical Association, Foundations and Trends in Statistics, and Operations Research, and he is currently guest editing a special issue on Statistics for Large Language Models and Large Language Models for Statistics in Stat. His work has been recognized with several awards, such as the Stanford Anderson Dissertation Award, NSF CAREER Award, Sloan Research Fellowship, IMS Peter Hall Prize, SIAM Early Career Prize in Data Science, ASA Noether Early Career Award, and the ICBS Frontiers of Science Award in Mathematics.

Jiancong Xiao is a postdoctoral researcher at the University of Pennsylvania, working with Professors Qi Long and Weijie Su. He received his Ph.D. from the Chinese University of Hong Kong, Shenzhen, an M.S. from the Chinese University of Hong Kong, and a B.S. from Sun Yat-sen University. His research interests lie in statistical and deep learning theory, with a focus on developing responsible and trustworthy machine learning models. His recent work explores statistical foundations of large language models. His research has been featured at top machine learning conferences, including NeurIPS, COLT, ICML, and ICLR.

Xiang Li is a postdoctoral researcher at the University of Pennsylvania, collaborating with Prof. Qi Long and Prof. Weijie Su. He received his Ph.D. in 2023 and B.S. in 2018 from the School of Mathematical Sciences at Peking University. His research lies at the intersection of statistics, stochastic optimization, and machine learning, with a recent focus on large language models. During his Ph.D., he made significant contributions to federated learning, stochastic approximation, online decision-making, and online statistical inference. His work has been featured at leading machine learning conferences, including ICML, ICLR, and NeurIPS, as well as in top journals such as JMLR and AOS.

Category: Methodology
