Katherine Kempfert

First-Year PhD Student in Statistics

University of California, Berkeley


I am a first-year PhD student in statistics at the University of California, Berkeley. My research interests involve using statistical machine learning methods to address applied problems.

Previously, I double majored in mathematics and statistics at the University of Florida. My undergraduate research projects in statistics had applications in areas ranging from computer vision and classical music to public health.

Outside of statistics, my personal interests and hobbies include music, art, hiking, and spending time with my cats.


Interests

  • Statistical Machine Learning
  • Applied & Computational Statistics
  • High-dimensional data


Education

  • PhD in Statistics, August 2019 – present

    University of California, Berkeley

  • BS in Statistics and Mathematics, May 2019

    University of Florida

Research Experience


Forecasting Dengue Fever in Brazil with Diverse Data Streams

Los Alamos National Laboratory

Jun 2018 – Aug 2019 Advisors: Dr. Carrie Manore, Dr. Geoffrey Fairchild, Dr. David Osthus, & Dr. Nidhi Parikh
  • Began the project by participating in the 10-week Parallel Computing Summer School, then returned the next summer to join the Information Systems & Modeling (A-1) research group
  • Forecasted dengue fever with high accuracy and confidence for all 27 states of Brazil using time series variables from diverse data streams (doctors’ offices, weather stations, satellites, and Google Health Trends)
  • Systematically compared predictive performance among variants of SARIMA, vector autoregression, seasonal trend decomposition, and ensembles combining these methods; reached Pearson correlation coefficients (between observed and fitted values) of up to 0.9644 for 2-week-ahead forecasting

Predicting Classical Composers with Musical Scores

University of Florida

Aug 2017 – May 2019 Advisor: Dr. Samuel Wong
  • Classified Haydn and Mozart string quartets by composer based on musical scores and set benchmark results exceeding 85% leave-one-out classification accuracy
  • Developed novel, musically sophisticated features that can be calculated from musical scores and applied to other music classification tasks
  • Generated insights of interest to musicologists and historians through statistical interpretation of results (via feature selection and estimated coefficients in the final logistic regression model)

Nonlinear Dimension Reduction for Gender Classification via Faces

University of North Carolina Wilmington & University of Florida

May 2017 – Feb 2019 Advisors: Dr. Samuel Wong, Dr. Yishi Wang, & Dr. Cuixian Chen
  • Participated in the 10-week Statistical Data Mining & Machine Learning NSF-REU at the University of North Carolina Wilmington, then continued the research project for over two years
  • Developed a novel machine learning pipeline for the large face database Morph-II; classified over 55,000 photographs in Morph-II as picturing either a male or a female; and reached over 95% cross-validated accuracy (competitive with benchmark)
  • Compared the performance of kernel principal component analysis (KPCA), supervised KPCA, and kernel linear discriminant analysis via simulation studies and results on Morph-II

Publications & Submitted Works

A Comparison Study on Nonlinear Dimension Reduction Methods with Kernel Variations: Visualization, Optimization and Classification

Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are …

Where Does Haydn End and Mozart Begin? Composer Classification of String Quartets

For humans and machines, perceiving differences between string quartets by Joseph Haydn and Wolfgang Amadeus Mozart has been a …

Preliminary Studies on a Large Face Database

We perform preliminary studies on a large longitudinal face database MORPH-II, which is a benchmark dataset in the field of computer …


Contact

  • kempfert@berkeley.edu
  • 343 Evans Hall, Department of Statistics, University of California, Berkeley, Berkeley, CA 94720