Full-Day Courses

Title | Instructors
---|---
Tutorial on Deep Learning and Generative AI | Haoda Fu, Amgen |
Large-Scale Spatial Data Science | Marc Genton, KAUST; Sameh Abdulah, KAUST; Mary Lai Salvana, University of Connecticut |
An introduction to the analysis of incomplete data | Ofer Harel, University of Connecticut |
Morning Half-Day Courses

Title | Instructors
---|---
Merging Data sources: Record Linkage Techniques and Analysis of Linked Datasets | Roee Gutman, Brown University |
Unleashing the power of machine learning and deep learning to accelerate clinical development | Li Wang, AbbVie; Yunzhao Xing, AbbVie; Sheng Zhong, AbbVie |
Bayesian Evidence Synthesis Approaches to Accelerate Rare Disease Drug Development | Satrajit Roychoudhury, Pfizer Inc; Wei Wei, Yale University |
Quantitative decision making for staging up to phase 3 clinical development | Cong Han, Astellas Pharma Global Development, Inc; Yusuke Yamaguchi, Astellas Pharma Global Development, Inc; Annie Wang, Astellas Pharma Global Development, Inc; Yongming Qu, Eli Lilly and Company |
Interface between Regulation and Statistics in Drug Development | Birol Emir, Pfizer Inc; Michael Gaffney, Independent Consultant; Demissie Alemayehu, Pfizer Inc |
A Bootcamp on Git for Project Management with Applications in Data Science | Jun Yan, University of Connecticut |
Afternoon Half-Day Courses

Title | Instructors
---|---
Statistical methods for time-to-event data subject to truncation | Jing Qian, University of Massachusetts Amherst |
Introduction of Dynamic Borrowing in Clinical Trials and Regulatory Submission | Jerry Li, BMS; Ivan Chan, BMS; Inna Perevozskaya, BMS; Hao Sun, BMS |
Comparing Python and R for Data Visualization | Jose Manuel Magallanes, Pontificia Universidad Catolica del Peru |
Win Statistics (Win Ratio, Win Odds, and Net Benefit): Theories and Applications | Gaohong Dong, Sarepta Therapeutics |
Bayesian Machine Learning Methods for Partially Observed Data | Sujit Ghosh, NC State University |
Statistical Inference in Large Language Models | Weijie Su, University of Pennsylvania; Qi Long, University of Pennsylvania; Xiang Li, University of Pennsylvania |
An Outstanding Supervisor: Leading for Motivation, Innovation, and Retention | Claude Petit, Astellas Pharma |
Full-Day Courses:
Tutorial on Deep Learning and Generative AI:
Instructors: Haoda Fu, Amgen
Target Audience: People with at least master's-level statistics training
Prerequisites for participants: Familiarity with linear regression and basic programming
Computer and software requirements: Python, PyTorch
In an era where AI technologies are transforming industries, understanding their foundations and applications is essential for a wide range of professionals. This course is designed to equip statisticians, biostatisticians, researchers, and decision-makers with the mental models needed to navigate and leverage AI effectively. Whether you’re a decision-maker aiming to make informed choices about AI tools, a researcher seeking to integrate AI into your work, or a statistician looking to build advanced models, this course offers a valuable gateway to the world of AI.
Focusing on deep learning and generative AI, participants will gain hands-on experience with PyTorch, learn foundational concepts, and explore state-of-the-art architectures such as CNNs, GNNs, ResNet, U-Net, and transformers. The course also delves into applications of these models in medical imaging and drug discovery, as well as cutting-edge generative AI techniques like GANs, VAEs, DDPM, score-based models, and the mechanics behind large language models (LLMs).
By bridging technical knowledge with practical insights, this course empowers participants to apply AI in healthcare, research, and beyond, making it an indispensable resource for those seeking to understand and shape the future of AI in their fields.
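To give a concrete flavor of the hands-on sessions, here is a minimal, self-contained PyTorch sketch (ours, not taken from the course materials) of the kind of "first neural network" the course builds from scratch; the data are synthetic for illustration.

```python
# A tiny feed-forward regression network trained with gradient descent.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 3)                                    # 100 samples, 3 features
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(100)

model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()              # reset accumulated gradients
    pred = model(X).squeeze(-1)        # forward pass
    loss = loss_fn(pred, y)            # mean squared error
    loss.backward()                    # backpropagation
    optimizer.step()                   # gradient descent update

print(f"final training loss: {loss.item():.4f}")
```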
The following is an outline of the short course.
Why: History of Deep Learning and Generative AI
Build Our First Neural Network Model from Scratch
(Break 1)
Let Us Code Together
Build Our First Deep Learning Model for Computer Vision
(Lunch break)
Sequence Classification Model
Nuts and Bolts for LLM: Sequence-to-Sequence Models
(Break 2)
Generative AI Family
Advanced Topics and Extensions: Generative AI on Smooth Manifolds
Final Thoughts
Short Bio: Dr. Haoda Fu is Head of Exploratory Biostatistics at Amgen; before that, he was an Associate Vice President and Enterprise Lead for Machine Learning and Artificial Intelligence at Eli Lilly and Company. Dr. Fu is a Fellow of the American Statistical Association (ASA) and a Fellow of the Institute of Mathematical Statistics (IMS). He is also an adjunct professor in the Department of Biostatistics at the University of North Carolina at Chapel Hill and at the Indiana University School of Medicine. Dr. Fu received his Ph.D. in statistics from the University of Wisconsin–Madison in 2007 and joined Lilly thereafter. Since joining Lilly, he has been very active in statistics and data science methodology research. He has more than 100 publications in areas such as Bayesian adaptive design, survival analysis, recurrent event modeling, personalized medicine, indirect and mixed treatment comparison, joint modeling, Bayesian decision making, and rare events analysis. In recent years, his research has focused on machine learning and artificial intelligence. His work has been published in top journals including JASA, JRSS-B, Biometrika, Biometrics, ACM and IEEE venues, JAMA, and Annals of Internal Medicine. He has taught machine learning and AI topics at large industry conferences, including an FDA workshop. He has served on boards of directors and as program or committee chair for statistical organizations including ICSA, ENAR, and the ASA Biopharmaceutical Section. He is a member of the COPSS Snedecor Award committee for 2022-2026, has served as an associate editor for JASA Theory and Methods since 2023, and serves as an associate editor for JASA Applications and Case Studies for 2025-2027.
Category: Technology Training
Large-Scale Spatial Data Science:
Instructors: Marc Genton, KAUST; Sameh Abdulah, KAUST; Mary Lai Salvana, University of Connecticut
Target Audience: students, data scientists, geospatial analysts, researchers, and other practitioners in academia and industry
Prerequisites for participants: background in data science, computing, geospatial analysis, or related areas
Computer and software requirements: R
The course, designed for data scientists, geospatial analysts, and researchers, will provide a comprehensive understanding of advanced methods in large-scale geospatial data science. The focus will be on three key topics: large-scale data modeling and prediction; accelerating geospatial data processing with multi- and mixed-precision techniques on modern hardware architectures; and parallelizing R code using the first parallel runtime system package in R. Participants will first explore ExaGeoStatCPP, a parallel framework for high-performance geostatistical computations that enables efficient modeling and prediction of large-scale geospatial datasets within C++ and R environments. The course will then turn to the MPCR package, which provides multi- and mixed-precision support on CPUs and GPUs; attendees will learn how to integrate MPCR functions into their R workflows to optimize performance and precision trade-offs in computational tasks. Participants will also be introduced to RCOMPSs, a new runtime system designed to parallelize R code across HPC systems, and will see how it can accelerate R code execution in high-performance computing environments. Hands-on sessions throughout will provide practical experience in parallelizing computations effectively.
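As a rough illustration of the computational bottleneck these frameworks address, here is a plain NumPy/SciPy sketch (ours; it does not use ExaGeoStatCPP or MPCR) of the exact Gaussian log-likelihood of a spatial field under an exponential (Matérn ν = 1/2) covariance, the O(n³) kernel computation that motivates parallel and mixed-precision tooling.

```python
# Exact Gaussian log-likelihood for a simulated spatial field.
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial import distance_matrix

rng = np.random.default_rng(1)
n = 500
locs = rng.uniform(size=(n, 2))                  # random spatial locations
sigma2, range_ = 1.0, 0.2                        # variance and range parameters

D = distance_matrix(locs, locs)
K = sigma2 * np.exp(-D / range_)                 # exponential covariance matrix
z = np.linalg.cholesky(K + 1e-10 * np.eye(n)) @ rng.standard_normal(n)  # simulate field

def gaussian_loglik(z, K):
    """O(n^3) exact log-likelihood via Cholesky; the bottleneck at large n."""
    c, low = cho_factor(K, lower=True)
    quad = z @ cho_solve((c, low), z)
    logdet = 2.0 * np.sum(np.log(np.diag(c)))
    return -0.5 * (quad + logdet + len(z) * np.log(2 * np.pi))

print(gaussian_loglik(z, K))
```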
The following is an outline of the short course.
1- Overview of Spatial Statistics
– Introduction to spatial statistics, including background and tools for large-scale spatial data manipulation.
Instructor: Marc Genton (1 hour)
2- Introduction to High-Performance Computing (HPC) and Parallel Systems
– Overview of HPC and parallel hardware systems with an introduction to ExaGeoStatCPP and its large-scale geospatial data modeling capabilities using modern parallel systems, including GPUs.
Instructor: Sameh Abdulah (1 hour)
3- Hands-on: Spatial Data Modeling and Prediction with ExaGeoStatCPP in R
– Practical session on spatial data modeling and prediction, focusing on performance and accuracy with large synthetic and real datasets.
Instructors: Sameh Abdulah & Mary Salvana (1 hour)
4- Introduction to Multi-Precision and Mixed-Precision Computing
– Overview of multi-precision and mixed-precision computing, featuring the MPCR R package on CPU and GPU architectures.
Instructor: Mary Salvana (1 hour)
5- Hands-on: Spatial Data with Multi- and Mixed-Precision Computation in R
– Practical session using the MPCR package to process spatial data with multi- and mixed-precision techniques in R.
Instructors: Mary Salvana & Sameh Abdulah (1 hour)
6- Overview of Parallel Processing with RCOMPSs
– Introduction to parallel processing using the RCOMPSs runtime system in R, focusing on task-based parallelism.
Instructor: Sameh Abdulah (1 hour)
7- Hands-on: Developing Task-Based Algorithms for Big Data
– Practical session on building task-based algorithms with examples of parallelizing spatial data analysis for big data applications.
Instructors: Sameh Abdulah & Mary Salvana (1 hour)
Short Bio: Marc Genton is Al-Khawarizmi Distinguished Professor of Statistics at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. He received his Ph.D. in Statistics (1996) from the Swiss Federal Institute of Technology (EPFL), Lausanne. He is a fellow of the American Statistical Association (ASA), the Institute of Mathematical Statistics (IMS), and the American Association for the Advancement of Science (AAAS), and an elected member of the International Statistical Institute (ISI). In 2010, he received the El-Shaarawi Award for excellence from the International Environmetrics Society (TIES) and the Distinguished Achievement Award from the Section on Statistics and the Environment (ENVR) of the ASA. He received an ISI Service Award in 2019 and the Georges Matheron Lectureship Award in 2020 from the International Association for Mathematical Geosciences (IAMG). He led a Gordon Bell Prize finalist team with the ExaGeoStat software at Supercomputing 2022, and again led the team that won the Gordon Bell Prize for Climate Modelling at Supercomputing 2024 with exascale climate emulators. He received the Royal Statistical Society (RSS) 2023 Barnett Award for his outstanding research in environmental statistics and the prestigious 2024 Don Owen Award from the ASA's San Antonio Chapter. His research interests include statistical analysis, flexible modeling, prediction, and uncertainty quantification of spatio-temporal data, with applications in environmental and climate science as well as renewable energies.
Sameh Abdulah obtained his M.S. and Ph.D. degrees from Ohio State University, Columbus, USA, in 2014 and 2016, respectively. Presently, he serves as a research scientist at the Extreme Computing Research Center (ECRC), King Abdullah University of Science and Technology, Saudi Arabia. His research focuses on high-performance computing applications, big data, bitmap indexing, handling large spatial datasets, parallel spatial statistics applications, algorithm-based fault tolerance, and machine learning and data mining algorithms. Sameh was part of the KAUST team nominated for the ACM Gordon Bell Prize in 2022 and of the team that won it in 2024 (climate track) for work on large-scale climate/weather modeling and prediction.
Mary Lai Salvana is an Assistant Professor in Statistics at the University of Connecticut (UConn). Prior to joining UConn, she was a Postdoctoral Fellow in the Department of Mathematics at the University of Houston. She received her B.S. and M.S. degrees in Applied Mathematics from Ateneo de Manila University, Philippines, in 2015 and 2016, respectively, and her Ph.D. degree from the King Abdullah University of Science and Technology (KAUST), Saudi Arabia. Her research interests include extreme and catastrophic events, risks, disasters, space-time statistics, environmental statistics, high-performance computing, and computational statistics.
Category: Methodology, Technology Training
An introduction to the analysis of incomplete data:
Instructors: Ofer Harel, University of Connecticut
Target Audience: Graduate students, practitioners, statisticians, and biostatisticians
Prerequisites for participants: basic statistics
Computer and software requirements: NA
Missing data is a common complication in applied research. Although many practitioners still ignore the missing data problem, numerous books and research articles demonstrate that dealing with it correctly is very important. Biased results and inefficient estimates are just some of the risks of handling incomplete data incorrectly. In this course, we will introduce incomplete-data vocabulary and present problems and solutions related to missing data. We will emphasize practical implementation of the proposed strategies, including discussion of software that implements procedures for incomplete data.
Performance objectives
This one-day course introduces incomplete data vocabulary, assumptions, methods, computing, and software. We include descriptions of how the methods can be implemented in R. In particular, we will illustrate different missing data methodologies and the advantages and disadvantages of their use.
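The course demonstrations use R; as a rough Python analogue (ours, on synthetic data), the sketch below illustrates multiple imputation followed by Rubin's pooling rules, using scikit-learn's IterativeImputer.

```python
# Multiple imputation with Rubin's rules; the analysis is the mean of a column.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
X[rng.random(200) < 0.3, 1] = np.nan          # ~30% missing in column 2

m = 20                                        # number of imputed datasets
estimates, variances = [], []
for seed in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    Xc = imp.fit_transform(X)
    col = Xc[:, 1]
    estimates.append(col.mean())              # complete-data point estimate
    variances.append(col.var(ddof=1) / len(col))  # complete-data variance of the mean

qbar = np.mean(estimates)                     # pooled point estimate
W = np.mean(variances)                        # within-imputation variance
B = np.var(estimates, ddof=1)                 # between-imputation variance
T = W + (1 + 1 / m) * B                       # Rubin's total variance
print(f"pooled mean = {qbar:.3f}, SE = {np.sqrt(T):.3f}")
```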
The following is an outline of the short course.
• Motivation for the importance of incomplete data techniques
• Ad-hoc techniques such as case deletion, weighting, and single imputation techniques.
• Principled techniques such as maximum likelihood, Bayesian and semi-parametric techniques
• Multiple imputation
• Missing data assumptions
• Data exploration
• Data examples using R
• Step-by-step recommendations to the treatment of incomplete data
Time permitting
• Strategies for data analysis with two types of missing values
Short Bio: Ofer Harel, Ph.D., is the Dean of the College of Liberal Arts and Sciences and a professor in the Department of Statistics at the University of Connecticut. Dr. Harel was Interim Dean of the College of Liberal Arts and Sciences (2023-2024), Associate Dean for Research and Graduate Affairs in the College (2021-2023), Director of Graduate Admissions in the Department of Statistics (2016-2021), and a Principal Investigator in the Institute for Collaboration on Health, Intervention, and Policy (InCHIP) at the University of Connecticut (2010-2016). Dr. Harel received his doctorate in statistics in 2003 from the Department of Statistics at the Pennsylvania State University, where he developed his methodological expertise in missing data techniques, diagnostic tests, longitudinal studies, Bayesian methods, sampling techniques, mixture models, latent class analysis, and statistical consulting. He received his post-doctoral training at the University of Washington, Department of Biostatistics, where he worked for the Health Services Research & Development (HSR&D) Center of Excellence, VA Puget Sound Healthcare System, and the National Alzheimer's Coordinating Center (NACC). Dr. Harel has served as a biostatistical consultant nationally and internationally since 1997. Through his collaborative consulting, he has been involved with a variety of research fields including, but not limited to, Alzheimer's disease, diabetes, cancer, nutrition, HIV/AIDS, health disparities, anti-racism, and alcohol and drug abuse prevention. Dr. Harel is a member of the National Academies of Sciences, Engineering, and Medicine's Committee on Applied and Theoretical Statistics. He was a member of the (now restructured) Biostatistical Methods and Research Design (BMRD) Study Section at the National Institutes of Health and was appointed to the Bureau of Labor Statistics Technical Advisory Committee (BLSTAC) at the U.S. Bureau of Labor Statistics, among many other national elected and appointed positions.
Category: Methodology
Morning Half-Day Courses:
Merging Data sources: Record Linkage Techniques and Analysis of Linked Datasets:
Instructors: Roee Gutman, Brown University
Time: Morning Session
Target Audience: Master's- and PhD-level analysts interested in record linkage
Prerequisites for participants: Familiarity with the EM algorithm, some experience with record linkage
Computer and software requirements: Projector
Different systems create vast amounts of personal information. Opportunities to use this information are often missed because it cannot be combined due to privacy concerns. Record linkage methods link data from multiple sources when unique identifiers are unavailable. These techniques harness advanced algorithms and predictive and probabilistic modelling to link records from disparate sources, even when the data suffer from inconsistencies and discrepancies. Recent computational advances have resulted in an explosion of record linkage methods.
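As a small taste of the probabilistic modelling involved, here is a minimal sketch (ours, on synthetic agreement patterns) of the EM algorithm for the Fellegi-Sunter mixture model covered in the outline below.

```python
# Two-class mixture over binary field-agreement patterns: EM estimates the
# match rate p and the per-field m- (match) and u- (non-match) probabilities.
import numpy as np

rng = np.random.default_rng(0)
K, N, true_p = 4, 5000, 0.1                          # fields, pairs, match rate
is_match = rng.random(N) < true_p
m_true, u_true = np.full(K, 0.9), np.full(K, 0.2)
probs = np.where(is_match[:, None], m_true, u_true)
gamma = (rng.random((N, K)) < probs).astype(float)   # simulated agreement patterns

p, m, u = 0.5, np.full(K, 0.8), np.full(K, 0.3)      # initial values
for _ in range(100):
    # E-step: posterior probability that each pair is a match
    lm = p * np.prod(m**gamma * (1 - m) ** (1 - gamma), axis=1)
    lu = (1 - p) * np.prod(u**gamma * (1 - u) ** (1 - gamma), axis=1)
    w = lm / (lm + lu)
    # M-step: update match rate and per-field agreement probabilities
    p = w.mean()
    m = (w[:, None] * gamma).sum(axis=0) / w.sum()
    u = ((1 - w)[:, None] * gamma).sum(axis=0) / (1 - w).sum()

print(f"estimated match rate {p:.3f} (true {true_p})")
```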
The following is an outline of the short course.
The short course will consist of five parts:
First part (1 hour and 45 minutes):
1) General introduction to file linkage
a. Brief history of file linkage
b. Assumptions used in the analysis of linked data sources
c. Comparison of record pairs
2) Linkage Methods
a. Fellegi-Sunter Model Theory – Underlying Probability Model
i. Estimation of Match Probabilities
ii. Expectation-Maximization Algorithm
b. Blocking/Blocking Strategy
c. Error Analysis
d. Bayesian record linkage Methods
Second part (2 hours)
3) Application of record linkage methods
a. Considering semi-identifying information
b. Incorporating relationships between semi-identifying variables
c. Describing methods for identifying true links
d. Implementation of the different linkage methods in statistical packages using real examples
4) Propagating linkage errors.
a. Weighting methods
b. Bayesian methods
c. Imputation methods
d. Implementation of the different analysis methods in statistical packages using real examples
5) Future directions
a. Sensitivity analysis for file linkage assumptions
b. Causal Inference with linked datasets
Short Bio: Dr. Roee Gutman is a Professor in the Department of Biostatistics at Brown University. His areas of expertise are file linkage, causal inference, missing data, Bayesian data analysis, and their application to big data sources. Dr. Gutman has authored multiple papers in which he developed novel methods for analyzing linked data sources and for estimating causal effects from observational studies. These methods were applied to address clinical, epidemiological, and health services and policy questions, especially among the elderly. Dr. Gutman has co-authored over 100 publications, and his work is published in leading statistical and subject-matter journals. For this work, he received the ISPOR Health Economics and Outcomes Research Methodology Award for a paper on estimating causal effects of Meals on Wheels programs on healthcare utilization using linked datasets. Dr. Gutman was also an ASA/NSF/BLS Senior Research Fellow, working with BLS researchers on developing novel record linkage methods.
Dr. Gutman presented a workshop on estimating causal effects with multiple treatments at the International Conference on Health Policy Statistics (ICHPS) 2015. He presented a workshop on analysis of linked datasets at ICHPS 2020 and 2023, and recently at the International Population Data Linkage Network conference. He has been invited to present at multiple conferences, including the JSM and the Patient-Centered Outcomes Research Institute (PCORI) Annual Conference. Dr. Gutman also has extensive classroom teaching experience from his work at Brown University.
Category: Methodology/Applied
Unleashing the power of machine learning and deep learning to accelerate clinical development:
Instructors: Li Wang, AbbVie; Yunzhao Xing, AbbVie; Sheng Zhong, AbbVie
Time: Morning Session
Target Audience: Professionals in statistics, biostatistics, or related disciplines who have an interest in utilizing machine learning, large language models, and computer vision to improve their work
Prerequisites for participants: Basic understanding of statistics and familiarity with programming concepts, such as Python. Prior exposure to machine learning concepts will be beneficial
Computer and software requirements: A laptop with Python 3.x installed, along with standard data science libraries such as NumPy, pandas, scikit-learn, or PyTorch. Participants should also have access to a code editor or notebook environment, such as Jupyter Notebook or Visual Studio Code, to facilitate hands-on exercises during the course.
With the rapid advancement of machine learning (ML) and deep learning (DL) methodology in the last decade, performance on prediction tasks in many computer science fields (e.g., natural language processing) has greatly improved. However, the impact of ML/DL in clinical development has been relatively limited. Hence, this short course aims to motivate and encourage the use of ML/DL in clinical development. The course starts with an overview of the evolution of ML/DL methodology and related key concepts (e.g., back-propagation, hyperparameter tuning). Then the latest developments in image processing and natural language processing are introduced, together with their novel applications in clinical development drawn from our recent projects and submitted papers.
The course materials are divided into three sections: I. General ML/DL methodology, II. Image processing and applications, and III. Natural language processing and applications.
The following is an outline of the short course.
Part I – Machine Learning (ML) and Deep Learning (DL) Basics (45 minutes)
Overview of similarities and differences between traditional statistics and ML/DL.
Introduction to fundamental neural network concepts: Data transformation (text, speech, images into numerical formats like vectors, matrices, and tensors). Key components of neural networks (neurons, weights, bias, activation functions). Explanation of feedforward data flow, loss functions, and backpropagation using numeric examples. Optimization techniques, such as gradient descent, to understand how models learn and improve.
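In the spirit of Part I's numeric examples, here is a minimal NumPy sketch (ours, not course material) of the feedforward/backpropagation flow for a single linear neuron with a squared-error loss.

```python
# One neuron, one training example: forward pass, loss, gradient, update.
import numpy as np

x, y_true = np.array([1.0, 2.0]), 1.5       # one training example
w, b, lr = np.array([0.1, -0.2]), 0.0, 0.1  # weights, bias, learning rate

for step in range(50):
    y_hat = w @ x + b                 # forward pass: prediction
    loss = 0.5 * (y_hat - y_true) ** 2
    grad_out = y_hat - y_true         # dLoss/dy_hat
    w -= lr * grad_out * x            # backpropagate to the weights
    b -= lr * grad_out                # ... and to the bias

print(f"prediction {w @ x + b:.3f}, target {y_true}")
```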
Part II – Deep Convolutional Neural Networks (DCNNs) for Computer Vision (1 hour and 30 minutes including the break)
Provide an understanding of DCNNs’ pivotal role in computer vision, introducing image datasets, core operations, and significant architectures like VGGNet.
Discuss the evolution of object detection frameworks from Faster R-CNN to YOLO
Introduce image segmentation techniques including Mask R-CNN and U-Net. Explore the use of Generative Adversarial Networks for generating realistic data.
Provide a practical session demonstrating U-Net’s application in medical imaging.
Part III – Natural Language Processing (NLP) and Applications (1 hour and 15 minutes)
Show the pipeline workflow of a case study applying NLP to detect adverse drug events using data from the X platform (formerly Twitter)
Cover fundamental NLP concepts, including word embeddings with Word2Vec. Provide a historical overview of language model advancements, including RNNs, LSTMs, and transformers. Address transformer-based LLMs, focusing on their architecture, self-attention, and efficiency improvements.
Apply these concepts to the case study, comparing model performances, followed by a code review
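As a compact illustration of the self-attention mechanism referenced in Part III, here is a NumPy sketch (ours) of scaled dot-product attention with random projection weights.

```python
# Scaled dot-product self-attention over a short token sequence.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); returns an output of the same shape."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((5, d))                      # 5 tokens, d-dim embeddings
out = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)  # (5, 8)
```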
Short Bio: Li Wang, PhD, is currently Senior Director and Head of the Statistical Innovation group at AbbVie. Li leads Design Advisory, which provides strategic and quantitative consulting to Development teams across all therapeutic areas to facilitate innovative thinking and the evaluation of complex innovative designs. Li also co-leads the Development Advanced Analytics capability at AbbVie to drive machine learning and advanced analytics research and application in Development. Prior to this senior leadership role, he led Immunology and Solid Tumor statistical design and strategy discussions and multiple ML, RWE, and Bayesian innovation projects from 2017 to 2019. From 2006 to 2017, he contributed to and subsequently led several NDAs and sNDAs, including the blockbusters Eliquis, Onglyza, and Rinvoq. He is enthusiastic about teaching statistical courses to non-statisticians and about investigating and promoting novel statistical and machine learning methodologies.
He is very active in statistical communities: a board member of ICSA (2023-2025), Chair-elect of the Development Committee of SCT (2024), Chair of the Education Committee of DIA, a member of the Cytel Innovation Advisory Board and the DIA Biostatistics Industry and Regulator Forum planning committee, and past chair of the ICSA Midwest Chapter.
Li received his B.S. in Applied Mathematics from Peking University and his Ph.D. in Statistics from Virginia Tech.
Li taught this short course at the 2024 Regulatory-Industry Statistics Workshop and received very positive feedback from statisticians at both pharmaceutical companies and the FDA.
Dr. Yunzhao Xing is an Associate Director of Statistical Innovation at AbbVie, with a PhD in Materials Science from the University of North Carolina at Chapel Hill and a background in physics. Prior to AbbVie, he served as a senior scientist at Halliburton, focusing on sensor modeling and simulation. Since joining AbbVie in 2018, Yunzhao has led numerous successful projects in machine learning, deep learning, and image processing. His skill set encompasses web scraping, simulation modeling, and interactive web application development, making him a pivotal contributor to AbbVie's Statistical Innovation group. Yunzhao is recognized for his commitment to pushing the boundaries of statistical innovation.
Dr. Sheng Zhong is the Director of Statistics at AbbVie Inc. He received his Ph.D. in Statistics from the University of Chicago. At AbbVie, he has led multiple innovative predictive modeling projects across different fields, such as clinical trial enrollment duration forecasting, virtual controls based on targeted learning in single-arm trials, and predictive clinical safety monitoring based on structured and text data. His recent work has led to multiple publications and manuscripts under review. Before joining AbbVie in 2016, Dr. Zhong worked at a big-data analytics start-up for heavy machine equipment maintenance, where his work led to three US patents.
Category: Methodology and technology training
Bayesian Evidence Synthesis Approaches to Accelerate Rare Disease Drug Development:
Instructors: Satrajit Roychoudhury, Pfizer Inc; Wei Wei, Yale University
Time: Morning Session
Target Audience: Statisticians involved in clinical trials and observational studies
Prerequisites for participants: MS degree in Statistics
Computer and software requirements: NA
Scientists have identified more than 7,000 rare diseases, which affect more than 30 million people in the United States, including children. However, drug development in rare diseases faces many challenges, including small population sizes, heterogeneity of treatment effects, and lack of reproducibility between studies conducted in different regions and populations. As a result, clinical trials studying rare diseases are often underpowered and cannot generate the statistical evidence required for regulatory approval. Innovative statistical methodology can play a crucial role in harnessing data from different sources to fill this gap.
The following is an outline of the short course.
I. Introduction: Motivation and general regulatory framework (15 min)
II. Bayesian Methods for borrowing external evidence (60 min)
a. Overview of available methods: power prior, commensurate prior, rMAP, EXNEX, MEMs
b. Deep dive into Bayesian methods (e.g., EXNEX, MEMs)
c. Assess the extent of borrowing using effective sample size
d. Implementation using R
e. Examples
III. Novel approaches for evaluating the causal evidence of an IND (60 min)
a. Basic assumptions in causal inference
b. Matching and weighting based on propensity scores
c. Propensity score weighted multisource-exchangeability modelling (PW-MEM)
d. Evaluate the operating characteristics of PW-MEM under a variety of assumptions
e. Examples
IV. Transfer statistical learning from external sources to the current trial (60 min)
a. Overview of Bayesian Additive Regression Trees (BART)
b. Incorporating external evidence in drug development using BART
c. Examples
V. Regulatory Considerations (30 mins)
a. Overview of regulatory guidelines: FDA and EMA
b. Examples of successes and failures in regulatory approval
VI. Concluding Remarks and Discussion (15 min)
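To make items II(a) and II(c) above concrete, here is a minimal sketch (ours, with hypothetical trial counts) of a power prior for a response rate, with the prior effective sample size as a measure of the extent of borrowing.

```python
# Power prior for a binomial response rate, with prior effective sample size.
from scipy import stats

x0, n0 = 24, 60          # historical trial: responders / patients (hypothetical)
x, n = 9, 20             # current trial (hypothetical)
a0 = 0.5                 # power prior discounting parameter in [0, 1]

# Beta(1,1) initial prior; the power prior raises the historical likelihood to
# a0, giving posterior Beta(1 + a0*x0 + x, 1 + a0*(n0 - x0) + n - x).
a = 1 + a0 * x0 + x
b = 1 + a0 * (n0 - x0) + (n - x)
ess_prior = 2 + a0 * n0   # effective sample size borrowed before current data
post = stats.beta(a, b)
print(f"prior ESS = {ess_prior:.0f}; P(rate > 0.3 | data) = {1 - post.cdf(0.3):.3f}")
```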
Short Bio: Dr. Satrajit Roychoudhury is an Executive Director and Head of the Statistical Research and Innovation group at Pfizer Inc. He has 17 years of extensive experience working on different phases of clinical trials for drugs and vaccines. His research interests include survival analysis and the use of model-based approaches and Bayesian methods in clinical trials. He served as the industry co-chair for the ASA Biopharmaceutical Section Regulatory-Industry Workshop in 2018 and co-chair for the DIA/FDA Biostatistics Industry and Regulator Forum in 2023. Satrajit is an elected Fellow of the American Statistical Association and a recipient of the Royal Statistical Society (RSS)/Statisticians in the Pharmaceutical Industry (PSI) Statistical Excellence in the Pharmaceutical Industry Award in 2023 and the Young Statistical Scientist Award from the International Indian Statistical Association in 2019.
Wei Wei is an Assistant Professor in the Department of Biostatistics at the Yale School of Public Health. He received his Ph.D. in Biostatistics from the Medical University of South Carolina and joined the Department of Biostatistics at YSPH in 2017 as an associate research scientist. Dr. Wei's research focuses on the development of early-phase clinical trial designs, with particular interest in cancer targeted and immunotherapeutic agents. In addition to his research in cancer clinical trials, his expertise includes statistical genomics, biomarker discovery, and neurophysiology.
Category: Methodology, Application
Quantitative decision making for staging up to phase 3 clinical development:
Instructors: Cong Han, Astellas Pharma Global Development, Inc; Yusuke Yamaguchi, Astellas Pharma Global Development, Inc; Annie Wang, Astellas Pharma Global Development, Inc; Yongming Qu, Eli Lilly and Company
Time: Morning Session
Target Audience: Biostatisticians working on application of statistics in biopharmaceutical research and development
Prerequisites for participants: NA
Computer and software requirements: NA
Quantitative decision making (QDM) in drug development involves pre-specifying a set of Go/No-Go decision criteria and evaluating the operating characteristics of these criteria. A key application of QDM is making decisions about the next stage of development based on available data. While decisions can be based on various success criteria, common scenarios focus on a parameter that characterizes the treatment effect, an estimate thereof, or the probability of technical success. The course will cover the foundations of QDM in clinical development. Topics will include defining a critical success factor (CSF) for a phase 2 proof-of-concept study, using frequentist and Bayesian predictive models to evaluate the CSF, and employing novel estimators to enhance estimation efficiency. The course will also address challenges in rare disease development, where small, often uncontrolled studies form the basis for staging up to phase 3 controlled studies. A closely related issue is that, following QDM, directly using phase 2 study results to plan phase 3 studies is subject to selection bias, potentially leading to over-estimation of the treatment effect. This issue can be further exacerbated by additional endpoint and/or subgroup selection, as well as potential violation of the pre-determined QDM rule in light of seemingly promising post hoc analyses of phase 2 data. The course will also cover theory and methods for addressing selection bias when planning phase 3 based on positive phase 2 results.
The following is an outline of the short course.
The course will consist of four parts:
1. Introduction to QDM Framework and Methodologies. (8:30 – 9:30)
We will begin by introducing the concept of QDM framework for phase 3 stage up and the fundamentals of its methodologies. This includes a discussion on the importance of QDM in drug development, how to define Go/No-Go decision criteria, and how to assess the study operating characteristics based on these criteria. Go/No-Go decision criteria will be defined in two ways: (i) frequentist approaches using point and interval estimation of the treatment effect of interest; (ii) Bayesian approaches using posterior probability and posterior predictive probability of the treatment effect of interest.
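As a simplified illustration of these two flavors of criteria (our sketch, with hypothetical numbers), consider a normal treatment-effect estimate with known standard error.

```python
# (i) Frequentist and (ii) Bayesian Go/No-Go rules on the same phase 2 result.
from scipy import stats

est, se = 0.35, 0.15          # phase 2 effect estimate and standard error (hypothetical)
mcid = 0.20                   # minimal clinically important difference (hypothetical)

# (i) Frequentist: Go if the lower bound of an 80% CI clears the MCID
lcl = est - stats.norm.ppf(0.9) * se
go_freq = lcl > mcid

# (ii) Bayesian: with a flat prior the posterior is N(est, se^2);
#      Go if P(effect > MCID | data) exceeds a pre-specified threshold
p_go = 1 - stats.norm.cdf(mcid, loc=est, scale=se)
go_bayes = p_go > 0.8

print(f"80% CI lower bound {lcl:.3f} -> Go={go_freq}; "
      f"P(effect > MCID) = {p_go:.3f} -> Go={go_bayes}")
```

With these (hypothetical) thresholds, the two rules can disagree on the same data, which is exactly the kind of operating-characteristic trade-off the course examines.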
2. Bayesian QDM in Rare Disease Drug Development. (9:30 – 10:15)
We will explore the implementation of Bayesian QDM in rare disease drug development, where phase 2 PoC studies are often small or even infeasible due to the rarity of target diseases. Bayesian QDM offers a potential solution to the challenge of small sample size by incorporating existing prior information, such as expert opinion, data from previously completed clinical trials, and real-world data. Additionally, the Bayesian framework, which continuously updates knowledge through probabilistic statements, is particularly well-suited for decision analytics within individual trials and across entire drug development programs. We will also discuss Bayesian QDM in scenarios where uncontrolled PoC studies serve as a basis for controlled phase 3 registration studies – a common scenario in cell and gene therapy development.
3. Addressing Selection Bias in QDM. (10:30 – 11:30)
Implementation of QDM (or any implicit decision making) can induce a selection bias when using phase 2 results to plan phase 3 studies. We will discuss approaches to treatment effect estimation following QDM, including direct estimation of treatment effect accounting for selection, estimation and subsequent correction for bias, and adjustment for bias induced by multiple comparisons.
4. Utilizing Prediction Models in QDM. (11:30 – 12:30)
We will share examples of using prediction models in QDM, focusing on: the use of biomarkers in understanding the efficacy and safety in early phase studies, predicting the longer-term outcome from short-term data, and modeling the difference in population in phase 2 to phase 3 studies.
Short Bio: Cong Han, PhD, is an Executive Director, Biostatistics at Astellas Pharma Global Development Inc., where he leads statistical support for cell & gene therapy and ophthalmology programs. He has clinical trial and clinical development experience in academic and pharmaceutical industry settings in a variety of therapeutic areas, including experience in implementing QDM as well as advising on QDM activities.
Yusuke Yamaguchi, PhD, is an Associate Director, Biostatistics at Astellas Pharma Global Development Inc., serving as a lead statistician supporting cell & gene therapy programs. He has published over 40 papers on statistical methodology and clinical trials, including meta-analysis, longitudinal data analysis, and dose-finding trial design.
Annie Wang, PhD, is a Director of Biostatistics at Astellas Pharma Global Development Inc., where she serves as a team lead supporting cell & gene therapy and ophthalmology programs. With over 17 years of pharmaceutical industry experience across multiple therapeutic areas and all stages of drug development, her research expertise includes adaptive design, Bayesian approaches, and dose-response modeling.
Yongming Qu, PhD, is a Vice President at Eli Lilly and Company. He has more than 20 years of pharmaceutical industry experience and has been active in statistical methodology research in areas including surrogate endpoints, modeling and prediction, estimands, missing data imputation, and novel study designs. He has published nearly 100 peer-reviewed articles in statistical and medical journals. He is a Fellow of the American Statistical Association.
Category: Methodology
Interface between Regulation and Statistics in Drug Development:
Instructors: Birol Emir, Pfizer Inc; Michael Gaffney, Independent Consultant; Demissie Alemayehu, Pfizer Inc
Time: Morning Session
Target Audience: This course is particularly aimed at statisticians who are relatively new to the pharmaceutical industry and wish to broaden their knowledge and understanding of the interplay between statistics and regulatory science in drug development.
Prerequisites for participants: The material is mostly written at a level that is accessible to an audience with an intermediate knowledge of statistics.
Computer and software requirements: NA
This course is aimed primarily at statisticians who are relatively new to the pharmaceutical industry and wish to broaden their knowledge and understanding of the interplay between statistics and regulatory science in drug development. The main focus of the course is raising awareness of the intersection of statistics and regulatory affairs, with special emphasis on salient features of traditional and emerging issues and methodologies in the design, conduct, analysis, and reporting of clinical trials and observational studies intended for regulatory purposes. While the course is aimed at statisticians with limited experience in this area, it may also benefit more experienced statisticians who wish to refresh their knowledge of current topics or keep up to date on best practices and regulatory developments. The course consists of four sections, each dedicated to a specific topic in regulatory affairs and statistics. In each case, the topic will be discussed from both statistical and regulatory perspectives. We will highlight current and emerging trends and suggest appropriate best practices. Notably, the course will also cover recent progress in machine learning and Big Data analytics and the prevailing regulatory thinking on the integration of these new techniques in drug development. Issues of processing, analyzing, and reporting multi-dimensional data will be highlighted, with special reference to data integrity, privacy, and confidentiality.
The following is an outline of the short course.
Part 1 introduces basic statistical and regulatory issues with special reference to the role of regulations and guidance documents, and the evolving role of the statistician vis-à-vis the changing regulatory and healthcare landscapes. In Part 2, we will discuss major statistical issues that commonly arise in the course of drug development and regulatory interactions, outlining measures that should be taken to ensure the validity of inferential results that are intended to be the basis for regulatory decision. The discussion will be illustrated with respect to regulatory guidance documents and best practices. Part 3 highlights the role of the statistician in the course of drug development, with special emphasis on the skills required to ensure effective interactions with regulatory and other external bodies. Part 4 addresses trending topics in drug development, with emphasis on the current regulatory thinking and the associated challenges and opportunities.
At the end of the course, we believe that the course participants will have a good understanding of the statistical and regulatory issues that commonly arise in the course of drug development. Notably, prospective attendees of this course will get a thorough appreciation of the current state of the statistical and regulatory sciences in the context of pharmaceutical research. In addition, attendees will be exposed to the behaviors and capabilities that are essential in their interactions with internal stakeholders and external partners, including DMCs and regulatory bodies.
Short Bio: Birol Emir, PhD, is Executive Director and Head of Real-World Evidence (RWE) Statistics at Pfizer Inc. He is a Fellow of the American Statistical Association and has served as Adjunct Professor of Statistics and Lecturer at Columbia University in New York. His primary focuses have been real-world evidence generation, predictive modeling, and genomic data analysis. He has numerous publications in refereed journals; recently, he co-authored Interface Between Regulation and Statistics in Drug Development (Alemayehu, Emir, and Gaffney, 2021, CRC Press) and co-edited a book addressing a gap in health economics and outcomes research (Alemayehu et al., 2017, CRC Press). He has given many invited talks and short courses at statistical and clinical conferences.
Michael Gaffney, PhD, is a retired Vice President, Statistics, at Pfizer. He received his PhD from the New York University School of Environmental Medicine, with a dissertation on the multistage model of cancer induction. Dr. Gaffney spent his 43-year career in pharmaceutical research concentrating on the design and analysis of clinical trials and on regulatory interaction for drug approval and product defense. He has interacted with the FDA, EMA, MHRA, and regulators in Canada and Japan on over 25 distinct regulatory approvals and product issues in many therapeutic areas. Dr. Gaffney has published 40 peer-reviewed articles and has presented at numerous scientific meetings in areas as diverse as modeling cancer induction, variance components, harmonic regression, factor analysis, propensity scores, meta-analysis, large safety trials, and sample size re-estimation. Dr. Gaffney was recently a member of the Council for International Organizations of Medical Sciences (CIOMS) X committee and was a co-author of CIOMS X: Evidence Synthesis and Meta-Analysis for Drug Safety.
Demissie Alemayehu, PhD, is Vice President and Head of the Statistical Research and Data Science Center at Pfizer Inc. He is a Fellow of the American Statistical Association, has published widely, and has served on the editorial boards of major journals, including the Journal of the American Statistical Association and the Journal of Nonparametric Statistics. Additionally, he has been on the faculties of both Columbia University and Western Michigan University. He has co-authored a monograph entitled Patient-Reported Outcomes: Measurement, Implementation and Interpretation and co-edited another, Statistical Topics in Health Economics and Outcome Research, both published by Chapman & Hall/CRC Press.
Category: Methodology and career development
A Bootcamp on Git for Project Management with Applications in Data Science:
Instructors: Jun Yan, University of Connecticut
Time: Morning Session
Target Audience: researchers and practitioners who want to become proficient in reproducible data science
Prerequisites for participants: Data science and writing or programming experience
Computer and software requirements: Git; Quarto; R/Python; LaTeX
This half-day hands-on short course is designed to demystify Git, a foundational tool for version control and collaboration used by virtually all major tech companies—such as Google, Microsoft, and Amazon—to manage complex projects and streamline teamwork. Mastering Git is essential for building robust, reproducible data science workflows. This bootcamp is tailored for statisticians and data scientists, including students eager to expand their technical skills. Participants will gain practical experience in core Git functionalities, such as tracking changes, branching, and collaborative workflows, while also exploring its integration with Quarto for reproducible reporting. Additionally, the course incorporates data ethics as a case study, showcasing how Git facilitates version-controlled discussions on critical data challenges. By the end of the session, attendees will have a solid grasp of Git and actionable insights into reproducible and ethical data science practices, elevating their productivity and collaboration.
The following is an outline of the short course.
Part 1 (1 hour 45 minutes)
+ Introduction to Reproducible Data Science (15 minutes)
– Discuss the importance of reproducibility and collaborative practices.
– Overview of version control systems in data science.
+ Getting Started with Git (30 minutes)
– Introduce Git fundamentals: repositories, commits, branches, and merges.
– Guide participants through installing and configuring Git.
+ Hands-on Git Exercises (30 minutes)
– Practice initializing a repository and making commits.
– Explore branching and merging through interactive examples.
+ Introduction to Quarto (15 minutes)
– Present Quarto as a tool for reproducible reporting and documentation.
– Demonstrate how Quarto integrates with Git for version control.
+ Hands-on Quarto Exercises (15 minutes)
– Create a simple Quarto document.
– Implement version control on the document using Git.
Break (15 minutes)
Part 2 (Remaining Time)
+ Collaborative Project: Co-developing Notes on Data Science Ethics and Communication (1 hour 30 minutes)
– Organize participants into groups to work on a shared Git repository.
– Assign topics related to data science ethics and communication for collaborative writing.
– Facilitate the use of Git workflows: cloning, branching, committing, pushing, and pulling changes.
– Address merge conflicts and demonstrate their resolution.
– Encourage discussions on ethical practices and effective communication strategies in data science.
+ Wrap-up and Q&A (15 minutes)
– Summarize key takeaways from the session.
– Provide additional resources for further learning.
– Open the floor for questions and feedback.
Short Bio: Jun Yan is a Professor in the Department of Statistics at the University of Connecticut and a Research Fellow at the Center for Population Health at UConn Health. He received his PhD in statistics from the University of Wisconsin–Madison in 2003. Before joining UConn in 2007, he was at the University of Iowa for four years. Dr. Yan's methodological research interests include spatial extremes, measurement error, survival analysis, clustered data analysis, and statistical computing, most of which are motivated by his cross-disciplinary collaborations. His application domains are environmental sciences, public health, and sports. In particular, he has worked on statistical methods and applications in the detection and attribution of climate change. With a special interest in making his statistical methods available via open-source software, he and his coauthors have developed and maintain a collection of R packages in the public domain. In 2020, he assumed the editorship of the Journal of Data Science. He is a fellow of the American Statistical Association and the Institute of Mathematical Statistics.
Category: Technology training; Career development
Afternoon Half-Day Courses:
Statistical methods for time-to-event data subject to truncation:
Instructors: Jing Qian, University of Massachusetts Amherst
Time: Afternoon Session
Target Audience: students, practitioners or researchers with an interest in understanding statistical analysis of time-to-event data subject to truncation.
Prerequisites for participants: knowledge of statistical inference; basic knowledge of survival analysis.
Computer and software requirements: a computer/laptop with R (and RStudio) installed is recommended
Truncated time-to-event data arises in various fields, including biomedical sciences, public health, epidemiology, and astronomy. It involves biased sampling where the event time is observed only if it falls within a certain interval. This short course reviews statistical methods for time-to-event data subject to left, right, and sequential truncation, exploring both classical and advanced techniques.
The first half introduces classical risk-set adjustment methods for estimating event time distributions and conducting regression analysis with left-truncated data, with or without additional right censoring. Methods for right-truncated data will also be discussed. The assumption of quasi-independence between truncation and event times, which is crucial for the validity of classical methods, will be emphasized, along with hypothesis tests for assessing this assumption.
The second half covers recent methodological advances for analyzing truncated time-to-event data. Topics include methods for estimation and regression under dependent truncation, sequential truncation in observational cohort studies with complex sampling schemes, and techniques for estimation and regression under sequential truncation. Discussions will be supplemented with real-world data examples. R software will be used to demonstrate the implementation of the techniques.
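As one concrete example of the classical risk-set adjustment, the sketch below (ours, assuming the Python lifelines package; the course demonstrations use R) fits a Kaplan-Meier estimator to synthetic left-truncated data via the entry argument, which keeps subjects out of the risk set before their truncation times.

```python
# Risk-set adjusted Kaplan-Meier for left-truncated event times.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
T = rng.exponential(scale=10, size=2000)        # latent event times
L = rng.uniform(0, 8, size=2000)                # left-truncation (entry) times
keep = T > L                                    # only subjects with T > L are observed
T, L = T[keep], L[keep]
event = np.ones_like(T, dtype=bool)             # no additional right censoring here

kmf = KaplanMeierFitter()
kmf.fit(T, event_observed=event, entry=L)       # risk-set adjustment via `entry`
print(kmf.survival_function_.head())
```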
The following is an outline of the short course.
Teaching plan for the half-day course "Statistical methods for time-to-event data subject to truncation" (using morning times as an illustration):
8:30-9:30am, Part I: Introduction to time-to-event data subject to truncation, highlighting the difference between censoring and truncation. Classical risk-set adjustment methods for estimating event time distribution with left-truncated time-to-event data, with or without additional right censoring. Estimation of event time distribution with right-truncated time-to-event data.
9:30-10:15am: Part II: Regression analysis with left-truncated and right-censored time-to-event data, including Cox model and accelerated failure time model. Regression analysis with right-truncated data.
10:15-10:30am: 15 minutes break
10:30-11:45am: Part III: Hypothesis tests for assessing quasi-independence between truncation and event times. One-sample estimation and regression analysis methods under dependent truncation.
11:45am-12:30pm: Part IV: The concept of sequential truncation in observational cohort studies with complex sampling schemes. Methods for estimating event time distributions and performing regression analysis in the presence of sequential truncation.
Short Bio: Dr. Jing Qian is a Professor of Biostatistics in the Department of Biostatistics and Epidemiology at the University of Massachusetts Amherst, with extensive experience in statistical methodology and its applications to public health and biomedical research. Dr. Qian's research focuses on the development of statistical methods for survival analysis of biomedical outcomes subject to complex censoring or sampling, biomarker evaluation and risk prediction, and covariates subject to censoring and truncation. His collaborative research spans neurodegenerative diseases such as Alzheimer's and Parkinson's diseases, breast cancer epidemiology, and health services research. He has served as the Principal Investigator on multiple NIH-funded grants and has published extensively in leading statistical and biomedical journals.
Dr. Jing Qian is an experienced educator, having taught graduate-level courses on introductory, intermediate, and advanced biostatistical methods for over a decade at the University of Massachusetts Amherst. His teaching portfolio includes introductory and applied biostatistics courses for public health students, such as Introduction to Biostatistics and Intermediate Biostatistics; intermediate-level theory and methods courses for biostatistics graduate students, such as Fundamentals of Probability and Statistical Inference and Topics in Health Data Science; and advanced statistical theory and methods courses for Ph.D. students in biostatistics, such as Applied Statistical Learning and Advanced Statistical Inference. In all of these settings, Dr. Qian emphasizes the challenge and importance of engaging students and fostering active learning in the classroom.
Beyond classroom teaching, Dr. Qian has served as the primary dissertation advisor to more than 10 postdoctoral research fellows, doctoral students, and master’s students.
Category: Methodology and application
Introduction of Dynamic Borrowing in Clinical Trials and Regulatory Submission:
Instructors: Jerry Li, BMS; Ivan Chan, BMS; Inna Perevozskaya, BMS; Hao Sun, BMS
Time: Afternoon Session
Target Audience: Statisticians working on clinical trials
Prerequisites for participants: NA
Computer and software requirements: NA
Clinical trials represent a significant portion of drug development, in both budget and duration. While randomized clinical trials are still the gold standard, given the sheer amount of available prior clinical trial data and real-world data/evidence, finding innovative ways to design more efficient clinical trials and to supplement relevant data for regulatory submission has become imperative and has led to the increasing popularity of dynamic borrowing.
Dynamic borrowing can bring significant benefits to drug development timelines and to a company's portfolio. Specifically, dynamic borrowing can overcome challenges when patients are difficult to enroll; reduce the size, duration, and risk of a new trial while ensuring adequate power; provide substantial operational and cost-saving benefits; and boost the power and improve the efficiency of analysis for a trial with a limited sample size.
This short course will cover the common sources of data for borrowing and introduce the approaches of both frequentist and Bayesian borrowing as well as the regulatory landscape in this space.
The following is an outline of the short course.
The first part of this short course will introduce the rationale and overall benefits of dynamic borrowing, and will also cover the general methods. After the break, the second part will be more specific about the methods, the regulatory landscape, and possible case studies, including a demo of an R Shiny app.
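As a simplified taste of the frequentist side of borrowing (our sketch, with hypothetical counts), a "test-then-pool" rule borrows a historical control arm only when a homogeneity test does not reject.

```python
# Test-then-pool: borrow the historical control only if the two control
# response rates are statistically compatible.
import numpy as np
from scipy import stats

xh, nh = 45, 150      # historical control: responders / patients (hypothetical)
xc, nc = 28, 80       # current control arm (hypothetical)

ph, pc = xh / nh, xc / nc
pooled = (xh + xc) / (nh + nc)
z = (pc - ph) / np.sqrt(pooled * (1 - pooled) * (1 / nh + 1 / nc))
p_value = 2 * (1 - stats.norm.cdf(abs(z)))

if p_value > 0.10:    # pre-specified borrowing threshold (hypothetical)
    rate, n_eff = pooled, nh + nc        # pool: borrow all historical controls
else:
    rate, n_eff = pc, nc                 # discard: no borrowing
print(f"p = {p_value:.3f}; control rate estimate {rate:.3f} on n_eff = {n_eff}")
```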
Short Bio: Dr. Jerry Li is currently a Director and TA Lead of Hematology Malignant Myeloid Diseases in Global Biostatistics and Data Sciences (GBDS) at BMS. Jerry leads statistical support for the clinical development of multiple assets, including clinical trial design (such as phase 2/3 seamless designs), interactions with worldwide health authorities, and life-cycle management of the assets. Jerry established and co-leads the Dynamic Borrowing Working Group at BMS. He also recently co-organized and co-moderated a full-day Biostatistics Research and Innovation Network (BRAIN) meeting at BMS dedicated to the topic of dynamic borrowing.
Prior to BMS, Jerry was at Merck and Daiichi Sankyo, after working at the FDA. He has held positions of increasing responsibility in multiple therapeutic areas, including oncology, neurosciences, immunology, and infectious disease, with a track record of successful regulatory approvals.
In addition to dynamic borrowing, Jerry is also interested in dose optimization, phase 2/3 seamless design, statistical modeling of disease-modifying treatment effect, and properties of log-rank test following covariate-adaptive randomization in oncology trials. Jerry received his Ph.D. in statistics from the University of Maryland, College Park.
Dr. Ivan Chan has more than 25 years of experience in the pharmaceutical industry. He is currently a VP and interim Head of Global Biometrics & Data Sciences at Bristol Myers Squibb. Prior to joining BMS, Ivan was VP and Head of Statistical Sciences at AbbVie leading multiple therapeutic areas. In addition, he spent 21 years previously at Merck Research Laboratories where he led the global statistical support for vaccines and early oncology.
Ivan received his B.S. in Statistics from the Chinese University of Hong Kong and Ph.D. in Biostatistics from the University of Minnesota. He is an elected Fellow of the American Statistical Association (ASA) and an elected Fellow of the Society for Clinical Trials (SCT). Ivan was the 2021 recipient of the Deming Lecturer Award from ASA for his outstanding contributions to vaccine development. He currently serves as Executive Director of the International Society for Biopharmaceutical Statistics and Co-Chair of Deming Conference on Applied Statistics. Ivan has previously served as the President of the International Chinese Statistical Association and the Program Chair of the ASA Biopharmaceutical Section. He has 90+ publications in statistical and clinical journals.
Dr. Inna Perevozskaya is a Fellow of the American Statistical Association and a Senior Director and Senior Biometrics Fellow, Head of Statistical Methodology at BMS, and co-lead of dynamic borrowing working group at BMS.
Dr. Hao Sun is currently a Senior Manager in Global Biostatistics and Data Sciences (GBDS), BMS. Hao received his PhD in Statistics from Iowa State University in 2022. In addition to supporting clinical trials at BMS, Hao co-leads the Methodology and Tools subteam within the Dynamic Borrowing Working Group at BMS and is involved in research on dose optimization. He has successfully mentored several summer interns in dynamic borrowing and dose optimization.
During his PhD, Hao worked on multiple research projects, including developing a high-dimensional mixed graphical model, establishing the consistency of graph reconstruction under complex survey sample designs, developing a design-based BIC for neighbor selection with the group lasso to recover the true neighborhood, and optimizing a survey pseudo-composite likelihood with coordinate gradient descent to estimate edge parameters. His other projects included road change detection, Shiny App development for the National Resources Inventory, and mixture responses for small area estimation.
Category: Methodology
Comparing Python and R for Data Visualization:
Instructors: Jose Manuel Magallanes, Pontificia Universidad Catolica del Peru
Time: Afternoon Session
Target Audience: Basic to Intermediate Python/R users
Prerequisites for participants: Basic R and Python (not necessarily both)
Computer and software requirements: A laptop is useful for completing the exercises, with R and Python installed, or access to Google Colab and RStudio Cloud
The short course will introduce the plotting capabilities of Python and R for tabular, network and geographical data.
The presentation includes some basic rules of thumb for data visualization. Then we will compare ggplot2 in R with Altair in Python. For network data, we will emphasize the capabilities of both Python (networkx and others) and R (mainly igraph) to highlight relevant players, connections, and communities. Finally, we will work with shapefiles, paying special attention to projections and the principles behind choropleth maps using both R (sf) and Python (geopandas).
The session will also make basic use of GitHub for storing and replicating our work.
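To give a flavor of the Altair side of the comparison, here is a minimal sketch of a CAT-NUM bivariate plot (Part 2b of the outline below); the toy data frame is hypothetical, and the encoding style parallels aes() mappings in ggplot2.

```python
# Minimal sketch: a CAT-NUM bivariate plot in Altair (Python),
# analogous to ggplot2's grammar-of-graphics layering in R.
import pandas as pd
import altair as alt

# Toy data (hypothetical)
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C"],
    "value": [4.2, 5.1, 6.3, 5.8, 3.9, 4.4],
})

# Encodings map data fields to visual channels, much like aes() in ggplot2
chart = alt.Chart(df).mark_boxplot().encode(
    x=alt.X("group:N", title="Group"),
    y=alt.Y("value:Q", title="Value"),
)
chart.save("boxplot.html")  # or display inline in a notebook / Colab
```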
The following are the outlines of the short course.
Part 1: Presentation
a. Principles of Data Viz
b. Review of Tools to be used
c. Installations needed
Part 2: Tabular Data Visualization
a. Univariate plot (CATegorical and NUMerical)
b. Bivariate Plot (CAT-CAT / NUM-NUM/ CAT-NUM)
c. Multivariate plots
Part 3: Network Data
a. Formatting data as networks.
b. Computing and visualizing relevant nodes (see the network sketch after this outline)
c. Exploring communities
Part 4: Geographical Data
a. Maps and Projections
b. Choropleths and data discretization (see the map sketch after this outline)
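Network sketch (Part 3): a minimal networkx example, using the built-in karate-club toy graph rather than course data, showing how one might surface relevant players and communities.

```python
# Minimal sketch: relevant nodes and communities with networkx.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
import matplotlib.pyplot as plt

G = nx.karate_club_graph()  # built-in toy graph

# Relevant players: rank nodes by betweenness centrality
centrality = nx.betweenness_centrality(G)
top5 = sorted(centrality, key=centrality.get, reverse=True)[:5]
print("most central nodes:", top5)

# Communities via greedy modularity maximization
communities = greedy_modularity_communities(G)
print("community sizes:", [len(c) for c in communities])

# Size nodes by centrality when drawing
nx.draw_spring(G, node_size=[3000 * centrality[v] for v in G],
               with_labels=True)
plt.savefig("network.png", dpi=150)
```

Map sketch (Part 4): a minimal geopandas example of a discretized choropleth; the shapefile path and the "rate" column are hypothetical stand-ins for whatever data the course uses.

```python
# Minimal sketch: choropleth with data discretization in geopandas.
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

gdf = gpd.read_file("regions.shp")   # hypothetical shapefile
gdf = gdf.to_crs(epsg=6933)          # reproject (equal-area CRS for choropleths)

# Discretize the numeric variable into quantile bins before mapping
gdf["rate_bin"] = pd.qcut(gdf["rate"], q=5)

ax = gdf.plot(column="rate_bin", categorical=True, legend=True,
              cmap="viridis", edgecolor="grey", linewidth=0.3)
ax.set_axis_off()
plt.savefig("choropleth.png", dpi=150)
```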
Short Bio: Professor Magallanes holds two doctoral degrees: one in Computational Social Science from George Mason University and another in Psychology from the Universidad Nacional Mayor de San Marcos (UNMSM). He also holds a Master's degree in Political Science and Public Management from PUCP and a BSc in Computer Science from UNMSM. He has received multidisciplinary training in computational approaches to governance from the University of Michigan (ICPSR), the Australian National University, the National University of Singapore (ISS), the University of Chicago (Argonne NL), Carnegie Mellon University (CASOS), and the Harvard Kennedy School.
Professor Magallanes is a Full Professor in the Department of Social Sciences at the Pontificia Universidad Catolica del Peru (PUCP). He is also a part-time Professor at the Universidad Nacional Mayor de San Marcos (UNMSM) and a Lecturer in the Data Analytics and Computational Social Science (DACSS) program in the School of Public Policy at UMass Amherst. He has been a visiting professor at the University of Washington and a visiting scholar at Duke, the Getulio Vargas Foundation (Brazil), and the Universidad de los Andes (Colombia).
Category: Methodology
Win Statistics (Win Ratio, Win Odds, and Net Benefit): Theories and Applications:
Instructors: Gaohong Dong, Sarepta Therapeutics
Time: Afternoon Session
Target Audience: Statisticians, PhD students, and Statistical researchers in academia, industry, and government
Prerequisites for participants: NA
Computer and software requirements: NA
Over the past decade, the win ratio (Pocock et al. 2012), the win odds (Dong et al. 2019), and the net benefit (Buyse 2010), defined as the ratio, odds, and difference of win proportions, respectively, have been comprehensively studied. The three win statistics hierarchically analyze prioritized multiple outcomes. Compared to the traditional “time to first event” analysis for multiple time-to-event outcomes, win statistics allow the prioritization of multiple outcomes and effectively conduct a “time to worst event” analysis, which can be clinically more meaningful. Moreover, win statistics can incorporate multiple endpoints of the same or mixed data types (e.g., time-to-event, ordinal, etc.), and can handle repeated events, semi-competing risks, and non-proportional hazards.
The win ratio and the stratified win ratio (Dong et al., 2018) have been applied in the design and analysis of Phase III clinical trials and have supported regulatory approvals, such as those of tafamidis and Attruby, respectively. The win odds has also been applied in practice.
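To make the definitions concrete, here is a minimal sketch (not from the course materials) that computes all three win statistics for two prioritized, uncensored outcomes via pairwise comparisons; the patient tuples are hypothetical, and real trials require censoring adjustments such as the IPCW methods covered in Part 2 of the outline below.

```python
# Minimal sketch of the three win statistics for two prioritized outcomes,
# ignoring censoring (real analyses need censoring handling, e.g., IPCW).
import itertools

def compare(t, c):
    """Return 1 if the treatment patient wins, -1 if loses, 0 if tied.
    Outcomes are compared in priority order; larger is better."""
    for a, b in zip(t, c):
        if a > b:
            return 1
        if a < b:
            return -1
    return 0

# Hypothetical data: (months to death, months to first hospitalization)
treatment = [(30, 12), (24, 6), (36, 18), (28, 10)]
control   = [(22, 8), (30, 12), (20, 5), (26, 14)]

wins = losses = ties = 0
for t, c in itertools.product(treatment, control):
    r = compare(t, c)
    if r == 1:
        wins += 1
    elif r == -1:
        losses += 1
    else:
        ties += 1

n = wins + losses + ties
p_w, p_l, p_t = wins / n, losses / n, ties / n
print(f"win ratio   = {p_w / p_l:.2f}")                      # ratio of win proportions
print(f"win odds    = {(p_w + p_t/2) / (p_l + p_t/2):.2f}")  # odds, ties split evenly
print(f"net benefit = {p_w - p_l:.2f}")                      # difference
```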
The following are the outlines of the short course.
Part 1: Introduction and Theoretical Foundations
1. Introduction of win statistics
1.1. Motivation examples and issues of conventional time-to-first-event analyses
1.2. Win ratio, net benefit, and Finkelstein-Schoenfeld test
1.3. Mann-Whitney parameter
1.4. Win odds
2. Point and variance estimators
3. Complement of win statistics
Part 2: Advanced Concepts and Methods
4. Impact of follow-up time and censoring, and IPCW adjustment
4.1. Impact of follow-up time and censoring
4.2. IPCW (inverse-probability-of-censoring weighting) adjustment
4.3. Other adjustments
4.4. Use of win statistics under non-proportional hazards
Break
Part 2: Advanced Concepts and Methods (continued)
5. Stratified win statistics and handling of noncollapsibility
6. Regression analyses
7. Sample size and power calculations
Part 3: Applications and Practical Considerations
8. Applications
8.1. Cardiovascular trials (focusing on the ATTRibute-CM trial)
8.2. COVID-19 trials
8.3. Pediatric benefit-risk
8.4. Evidence synthesis of efficacy outcomes in oncology trials
8.5. Other applications
9. Regulatory perspective of win statistics
10. Software
11. Limitations and advantages of win statistics
12. Summary
Key references:
Finkelstein and Schoenfeld (1999, 2019); Buyse (2010); Pocock et al. (2012); Dong et al. (2016, 2018, 2020a, 2020b, 2020c, 2021, 2023a, 2023b, 2024); Luo et al. (2015); Bebu and Lachin (2016); Oakes (2016); Peng (2020); Brunner, Vandemeulebroecke, and Mütze (2021); Mao et al. (2021, 2022, 2023, 2024); Gasparyan et al. (2021, 2022); Yu and Ganju (2022); Matsouaka (2022); Yang et al. (2022); Cui, Dong, Kuan, and Huang (2023); Seifu et al. (2023); Wang, Zhou, Zhang, Kim et al. (2023); Maurer et al. (2018); Redfors et al. (2020); Lopes et al. (2021); Voors et al. (2022); Romiti et al. (2023); Weatherald et al. (2023); Kondo et al. (2023); Freund et al. (2023); Gregson et al. (2023); Barnhart et al. (2024); Pocock et al. (2024); Gillmore et al. (2024).
Short Bio: Gaohong Dong, PhD, has 20 years of experience in the pharmaceutical industry. He is a Director of Biostatistics at Sarepta Therapeutics. Prior to joining Sarepta, he worked at BeiGene and Novartis, and he also consulted under his own entity, iStats Inc. Gaohong has supported drug development in multiple therapeutic areas, including rare diseases, solid organ transplant, stem-cell transplant, infectious diseases, and oncology, and he is a co-author of many highly cited medical papers in transplantation. Deeply passionate about statistical research, he has published peer-reviewed statistical journal papers and book chapters on Bayesian-frequentist design, adaptive design, missing data imputation, meta-analysis, and composites of prioritized multiple outcomes. In recent years, his research has focused on the win statistics (win ratio, win odds, and net benefit). His research on the stratified win ratio and the win odds has been applied to the design and analysis of clinical trials, including many Phase III studies across multiple disease areas. Notably, the stratified win ratio (Dong et al., 2018) is the primary analysis for the ATTRibute-CM trial, which formed the basis for the FDA approval of Attruby in November 2024. Gaohong has been an Associate Editor of the Journal of Biopharmaceutical Statistics since 2017 and has served in recent years on the Scientific Program Committees of several major statistical conferences, such as the Regulatory-Industry Statistics Workshop (RISW), the ICSA Applied Statistics Symposium, and Statistics in Pharmaceuticals.
Category: Methodology
Bayesian Machine Learning Methods for Partially Observed Data:
Instructors: Sujit Ghosh, NC State University
Time: Afternoon Session
Target Audience: graduate students, postdoctoral fellows, early career researchers
Prerequisites for participants: graduate course in mathematical statistics and probability theory
Computer and software requirements: R and JAGS (both available freely on the web)
This short course offers an in-depth exploration of Bayesian Machine Learning (ML) techniques designed to address challenges in data analysis involving data irregularities (e.g., missing values and censored observations). Drawing on methods from the forthcoming second edition of the book Bayesian Statistical Methods (co-authored by the instructor), the course will feature cutting-edge methodologies tailored for modern data challenges. Classical ML methods often become difficult to use when faced with data irregularities; the Bayesian hierarchical modeling framework provides seamless integration of imputation (via posterior predictive distributions) and parameter estimation for predictive analytics. Key topics include: (i) high-dimensional regression using Bayesian shrinkage priors (e.g., spike-and-slab and horseshoe); (ii) non-parametric regression with shape constraints; (iii) advanced Monte Carlo methods for efficient posterior sampling (e.g., slice sampling, MALA); and (iv) causal inference for partially observed data (optional, time permitting). Through hands-on examples, participants will learn to implement these techniques on diverse datasets, gaining skills to manage and extract insights from complex, irregular, and high-dimensional data. The course is ideal for researchers, data scientists, and statisticians eager to enhance their understanding of Bayesian methods in practice.
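As a toy illustration of that seamless integration, the pure-NumPy sketch below alternates posterior-predictive imputation of missing responses with a conjugate update of the regression coefficients. The known noise variance and simple Gaussian prior are simplifying assumptions of this sketch, far simpler than the spike-and-slab and horseshoe priors the course fits via rjags/rstan.

```python
# Minimal sketch (pure NumPy): Gibbs sampling for Bayesian linear regression
# with missing responses imputed from the posterior predictive distribution.
import numpy as np

rng = np.random.default_rng(42)
n, p, sigma2, tau2 = 100, 5, 1.0, 10.0  # known variances (simplification)

# Simulate a regression, then hide 20% of the responses
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 1.0])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)
miss = rng.random(n) < 0.2
y_work = y.copy()

# Under a N(0, tau2 I) prior, the posterior covariance of beta is fixed
Sigma = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)

beta = np.zeros(p)
draws = []
for _ in range(2000):
    # Imputation step: draw missing y from the posterior predictive given beta
    y_work[miss] = X[miss] @ beta + rng.normal(scale=np.sqrt(sigma2),
                                               size=miss.sum())
    # Parameter step: conjugate Gaussian update given the completed data
    mu = Sigma @ X.T @ y_work / sigma2
    beta = rng.multivariate_normal(mu, Sigma)
    draws.append(beta)

post = np.array(draws[500:])  # discard burn-in
print("posterior means:", post.mean(axis=0).round(2))
```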
The following are the outlines of the short course.
Teaching Plan for Bayesian Machine Learning Methods for Partially Observed Data
Duration: 4 hours (1:30 – 5:30 pm, with a 15-minute break)
Format: Lecture, hands-on coding sessions, and Q&A
Schedule:
Introduction and Overview (25 minutes)
(i) Overview of the course content and objectives.
(ii) Brief introduction to Bayesian approaches in machine learning.
(iii) Tools: R and relevant packages (e.g., rjags, rstan, brms).
1. High-Dimensional Regression with Bayesian Shrinkage Priors (45 minutes)
(i) Introduction to Bayesian shrinkage priors (e.g., spike and slab, horseshoe priors).
(ii) Application to high-dimensional regression problems with missing values.
(iii) Hands-on: Fitting models using R packages (e.g., BAS, BoomSpikeSlab, rjags)
Objective: Identify relevant predictors in datasets with high-dimensional covariates.
2. Posterior Summaries of Simulation Based Methods (45 minutes)
(i) Interpreting and summarizing posterior distributions using MC samples.
(ii) Overview of tools for visualization and interpretation.
(iii) Hands-on: Visualization techniques using various plotting packages
Objective: Communicate insights effectively using Bayesian output.
Break (15 minutes)
3. Non-Parametric Regression Techniques (50 minutes)
(i) Generalized Additive Models and Gaussian Process Regression.
(ii) Applications of non-parametric smoothing with shape constraint.
(iii) Hands-on: Fitting generalized additive models using R packages
Objective: Model nonlinear relationships and smooth trends in complex datasets.
4. Elements of Causal Inference for Partially Observed Data (45 minutes)
(i) Bayesian approaches for causal inference under missingness.
(ii) Handling confounding and partial observability.
(iii) Hands-on: Implementing Bayesian causal models using various R packages
Objective: Address causal questions in datasets with incomplete observations.
Closing Q&A and Wrap-Up (15 minutes)
Summary of key takeaways.
Open forum for questions and feedback.
Resources for further learning, including the upcoming second edition of Bayesian Statistical Methods.
Software and Tools
R Packages:
Modeling: BAS, BNPqte, BoomSpikeSlab, brms, CausalBNPBook, mgcv, rjags, rstan
Visualization: bayesplot, posterior.
Participants are expected to have R and RStudio installed before the session (https://posit.co/download/rstudio-desktop/). This schedule ensures participants gain both conceptual understanding and practical experience, tailored to the constraints of a half-day format.
Short Bio: Dr. Sujit Ghosh is a Professor in the Department of Statistics at NC State University, where he has dedicated over three decades to advancing statistical methodology and applications. His expertise lies in the analysis of biomedical and environmental data, and he has contributed significantly to the field through his teaching, research, and mentorship. Prof. Ghosh has authored over 150 refereed journal articles addressing statistical challenges in the biomedical sciences, environmental studies, econometrics, and engineering. He co-authored the widely acclaimed textbook Bayesian Statistical Methods, first published in 2019, with a second edition scheduled for 2025. A celebrated mentor and educator, he received the D.D. Mason Faculty Award in 2023 and the Cavell Brownie Mentoring Award in 2014; also in 2023, he received the Distinguished Alumni Award from the Department of Statistics at UConn. Renowned for his engaging teaching style, Prof. Ghosh has delivered over 200 invited lectures and seminars at prestigious conferences and institutions worldwide, has offered numerous short courses, and has been a visiting professor at leading global universities. His dedication to advancing statistical science continues to inspire both students and colleagues in the field.
Category: Methodology, Career development
Statistical Inference in Large Language Models:
Instructors: Weijie Su, University of Pennsylvania; Qi Long, University of Pennsylvania; Xiang Li, University of Pennsylvania
Time: Afternoon Session
Target Audience: PhD students and faculty who are interested in generative AI
Prerequisites for participants: NA
Computer and software requirements: NA
Large Language Models (LLMs) have recently stood out as revolutionary AI tools for processing data in the form of text. However, when harnessing their potential for statistical decision-making, it becomes essential to understand the risks of their outputs. Evaluating the uncertainty and confidence levels associated with LLMs presents both challenges and intriguing opportunities for today’s statisticians. The aim of this half-day short course is to equip statisticians with the skills to integrate inferential concepts into the applications and advancement of LLMs. Course topics include: 1) a brief introduction to the fundamentals of LLMs, tailored for those new to transformers and deep learning; 2) a primer on statistical inference techniques specifically for text data using LLMs; and 3) an in-depth exploration of LLM applications in medical domains and the broader data science field. By the end of the course, attendees will possess the skills needed to empower LLMs with statistical inference. While this course promises a deep and enriching dive into the confluence of statistics and advanced AI, no prior knowledge of LLMs is required.
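As one elementary example of attaching a confidence statement to LLM output, the sketch below (not from the course materials) computes a Wilson score interval for an LLM's accuracy on a labeled evaluation set, treating each graded answer as a Bernoulli trial; the counts are hypothetical, and the course's inferential toolkit goes well beyond this.

```python
# Minimal sketch: a frequentist confidence interval for an LLM's accuracy
# on a labeled evaluation set (hypothetical counts).
import math

def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default 95%)."""
    p_hat = correct / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical evaluation: the model answered 172 of 200 questions correctly
lo, hi = wilson_interval(172, 200)
print(f"accuracy = {172/200:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```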
The following are the outlines of the short course.
The schedule below uses a morning session as an example; the time arrangement for an afternoon session would be similar.
Morning Half-Day Course (8:30 a.m. – 12:30 p.m.)
Session 1: Understanding and Building LLM Foundations
8:30 a.m. – 10:15 a.m. (1 hour 45 minutes)
1. Understand the Evolution and Significance of LLMs
– Recognize the evolution and importance of Large Language Models in AI and data processing.
2. Grasp LLM Architectures and Mechanics
– Describe core principles and architectures of LLMs, including transformers and attention mechanisms.
– Learn how text data is processed, tokenized, and embedded in LLMs.
Break
10:15 a.m. – 10:30 a.m. (15 minutes)
Session 2: Challenges, Applications, and Ethics in LLMs
10:30 a.m. – 12:30 p.m. (2 hours)
1. Identify Challenges in LLMs
– Pinpoint common challenges and limitations such as overfitting and bias.
– Appreciate the importance of critically evaluating model outputs.
2. Analyze Real-World LLM Applications
– Evaluate the use of LLMs in healthcare settings, such as processing clinical notes and predicting patient outcomes.
– Identify other applications, including sentiment analysis and recommendation systems.
3. Navigate Ethical and Responsible Use of LLMs
– Recognize potential biases in medical text data and broader implications in data science.
– Advocate for the fair, ethical, and responsible use of LLMs across various domains.
Short Bio: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Departments of Computer and Information Science and Mathematics at the University of Pennsylvania. He is a co-director of the Penn Research in Machine Learning (PRiML) Center. Prior to joining Penn, he received his Ph.D. in Statistics from Stanford University in 2016 and a bachelor’s degree in Mathematics from Peking University in 2011. His research interests span the statistical foundations of generative AI, privacy-preserving machine learning, high-dimensional statistics, and optimization. He serves as an associate editor of the Journal of Machine Learning Research, Journal of the American Statistical Association, Foundations and Trends in Statistics, and Operations Research, and he is currently guest editing a special issue on Statistics for Large Language Models and Large Language Models for Statistics in Stat. His work has been recognized with several awards, such as the Stanford Anderson Dissertation Award, NSF CAREER Award, Sloan Research Fellowship, IMS Peter Hall Prize, SIAM Early Career Prize in Data Science, ASA Noether Early Career Award, and the ICBS Frontiers of Science Award in Mathematics.
Qi Long, PhD, is a Professor of Biostatistics, Computer and Information Science, and Statistics and Data Science at the University of Pennsylvania. The current focus of his research lab is to advance responsible, trustworthy statistical and ML/AI methods for equitable, intelligent medicine, with a particular focus on multi-modal and generative AI such as large language models (LLMs). His methods research has been supported by NIH, PCORI, NSF, and ARPA-H. He is an Executive Editor of Statistical Analysis and Data Mining. He is an elected Fellow of AAAS, ASA, and AMIA.
Xiang Li is a postdoctoral researcher at the University of Pennsylvania, collaborating with Prof. Qi Long and Prof. Weijie Su. He received his Ph.D. in 2023 and B.S. in 2018 from the School of Mathematical Sciences at Peking University. His research lies at the intersection of statistics, stochastic optimization, and machine learning, with a recent focus on large language models. During his Ph.D., he made significant contributions to federated learning, stochastic approximation, online decision-making, and online statistical inference. His work has been featured at leading machine learning conferences, including ICML, ICLR, and NeurIPS, as well as in top journals such as JMLR and AOS.
Category: Methodology
An Outstanding Supervisor: Leading for Motivation, Innovation, and Retention:
Instructors: Claude Petit (Astellas Pharma), in collaboration with the Leadership in Practice Committee (LiPCom) of the Biopharmaceutical Section of the ASA
Time: Afternoon Session
Target Audience: If you lead a team or are considering a supervisory role, this course is for you.
This short course will bring to life the foundational concepts for becoming an outstanding supervisor. Attendees will gain a deeper understanding of the essential leadership competencies that will empower them to grow a mentee or direct report, enabling those individuals, in turn, to reach their full potential; the rewards of this development cascade through the organization. Participants will learn the expectations and behaviors necessary for becoming a supervisor for whom employees want to work, increasing team productivity through an elevated level of engagement. Engagement and fulfillment are achievable when employees feel motivated, are challenged to be the best they can be, and accomplish more than they thought they could. The course will consist of lectures, videos, and interactive panel discussions in which participants hear from seasoned and successful leaders about how they have learned from their experiences and developed tips and tricks for growing their supervisory skill set. Finally, participants will learn how to measure the right outcomes for enabling sustained growth in this dimension. It is said that employees do not leave companies; they leave supervisors. While many other leadership courses advise statisticians, statistical analysts, and data scientists on how to be effective leaders, this course focuses on the critical role supervisors, professors, and advisors play in their employees’ journeys to becoming strong leaders and individuals who propose, drive, and effectively implement innovative ideas and solutions. Strong supervisors model desired employee behaviors, act as sponsors as well as mentors, contribute to their employees’ career satisfaction, support their employees’ work/life balance, and generally retain good employees. If you are currently leading a team, managing a group, or considering a supervisory role, this course will help you be more effective.
This short course is being offered in collaboration with the Leadership in Practice Committee (LiPCom) of the Biopharmaceutical section of the ASA.
Short Bio: Claude Petit earned her PhD in Biostatistics, concurrent with a medical degree, in 1999 from the University of Kremlin Bicêtre (France), where she studied under Prof. Jean Maccario, employing Bayesian methods as applied to clinical trials, specifically the study and treatment of schizophrenia. She has served as an Adjunct Professor in Mathematics & Statistics at the University of Grenoble (1999), the Medical University of Paris (2004), and the Ecole Nationale de la Statistique et de l'Analyse de l'Information (ENSAI), and she was a lecturer at the Yale School of Public Health from 2012 to 2024.
Working in the field of statistics since 1994, Dr. Petit has held positions at Sanofi-Aventis (formerly Rhone Poulenc Rorer), ESCLI (CRO), Laboratoires Servier, and Lincoln (CRO). She joined Boehringer Ingelheim, France, as Biostatistics and Programming Head in 2004. After moving to the US in 2007, she served as Executive Director of Biostatistics and then Vice President of Biostatistics and Data Management at Boehringer Ingelheim until July 2021. Currently VP of Statistical and Real World Data Science at Astellas, Claude leads a global team of talented statisticians and programmers in the US, Europe, Japan, and China.
An eternal learner, she has a passion for leadership, growth, and teaching. In 2021, she became a certified Executive Coach and founded Creating & Coaching Essential Leaders, LLC, to empower one woman at a time.