Katherine Kempfert

Fourth-Year PhD Student in Statistics

University of California Berkeley


I am a fourth-year PhD student in statistics at the University of California Berkeley, where I am advised by Dr. Christopher Paciorek. Broadly, my research involves combining classical time series methods and machine learning approaches for improved prediction, generation, interpretation, and uncertainty quantification. Specifically, I am applying generative long short-term memory (LSTM) neural networks to rainfall time series.

I am grateful to be supported by the Chancellor's Fellowship and the National Science Foundation Graduate Research Fellowship.

Previously, I double majored in mathematics and statistics at the University of Florida. My undergraduate research projects in statistics had applications in areas ranging from computer vision to classical music, as well as public health.

Outside of statistics, my personal interests and hobbies include playing piano, creating art, enjoying the outdoors, and spending time with my cat.


  • Statistical Machine Learning
  • Applied & Computational Statistics
  • Time Series Analysis


  • PhD in Statistics, August 2019 - present

    University of California Berkeley

  • BS in Statistics and Mathematics, May 2019

    University of Florida

Research Experience


Generative Deep Learning for Rainfall Modeling

University of California Berkeley

Oct 2019 – Present Advisor: Dr. Christopher Paciorek
  • Fit generative long short-term memory (LSTM) neural networks on rainfall time series from weather stations across the United States; generated thousands of synthetic time series from these fits; and analyzed the results to detect and characterize climate change
  • Performed simulation studies to draw connections between state-space models and recurrent neural networks

Forecasting Dengue Fever in Brazil with Diverse Data Streams

Los Alamos National Laboratory

Jun 2018 – Aug 2019 Advisors: Dr. Carrie Manore, Dr. Geoffrey Fairchild, Dr. David Osthus, & Dr. Nidhi Parikh
  • Began the project by participating in the 10-week Parallel Computing Summer School and returned the next summer in the Information Systems & Modeling (A-1) research group
  • Forecasted dengue fever with high accuracy and confidence for all 27 states of Brazil using time series variables from heterogeneous data streams (doctors’ offices, weather stations, satellites, and Google Health Trends)
  • Systematically compared predictive performance among variants of SARIMA, vector autoregression, seasonal trend decomposition, and ensembles combining these methods; reached Pearson correlation coefficients (between observed and fitted values) of up to 96.44% for 2-week-ahead forecasting

Predicting Classical Composers with Musical Scores

University of Florida

Aug 2017 – May 2019 Advisor: Dr. Samuel Wong
  • Classified Haydn and Mozart string quartets by composer based on musical scores and set benchmark results that exceed 85% leave-one-out classification accuracy
  • Developed novel features based on the sonata form that can be automatically computed and applied to other tasks in music information retrieval (MIR)
  • Provided model-based interpretations about Haydn and Mozart that could be relevant to musicologists

Nonlinear Dimension Reduction for Gender Classification via Faces

University of North Carolina Wilmington & University of Florida

May 2017 – Feb 2019 Advisors: Dr. Samuel Wong, Dr. Yishi Wang, & Dr. Cuixian Chen
  • Participated in the 10-week Statistical Data Mining & Machine Learning NSF-REU at the University of North Carolina Wilmington, then continued the research project for over two years
  • Developed a novel machine learning pipeline for the large face database Morph-II; classified over 55,000 photographs in Morph-II as picturing either a male or a female; and reached over 95% cross-validated accuracy (competitive with benchmark)
  • Compared the performance of kernel principal component analysis (KPCA), supervised KPCA, and kernel linear discriminant analysis via simulation studies and results on Morph-II


Where Does Haydn End and Mozart Begin? Composer Classification of String Quartets

For centuries, the history and music of Joseph Franz Haydn and Wolfgang Amadeus Mozart have been compared by scholars. Recently, the …

A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification

Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are …

Preliminary Studies on a Large Face Database

We perform preliminary studies on a large longitudinal face database MORPH-II, which is a benchmark dataset in the field of computer …


  • kempfert@berkeley.edu
  • 343 Evans Hall, Department of Statistics, University of California Berkeley, Berkeley, CA 94720

Curriculum Vitae

A recent copy of my CV can be found here.

Last updated September 15, 2022.