My research interests are in scalable Markov chain Monte Carlo (MCMC) algorithms. A common problem when fitting complex models is overfitting, and Bayesian methods provide a natural solution to this. But MCMC, one of the most popular methods for fitting Bayesian models, does not scale well with dataset size. As datasets have grown, the need to improve the scalability of MCMC has become clear.

## Jack Baker

I'm currently a third-year PhD student at the STOR-i Doctoral Training Centre at Lancaster University. My PhD is in collaboration with the University of Washington. I work on scalable Markov chain Monte Carlo methods under the supervision of Professor Paul Fearnhead and Dr Christopher Nemeth at Lancaster University, and Professor Emily Fox at the University of Washington.

### Research

#### Publications

- J. Baker, P. Fearnhead, E. B. Fox and C. Nemeth (2018), Large-Scale Stochastic Sampling from the Probability Simplex. Submitted. ArXiv. Code
- J. Baker, P. Fearnhead, E. B. Fox and C. Nemeth (2017), sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo. Journal of Statistical Software. Accepted. ArXiv. Code
- J. Baker, P. Fearnhead, E. B. Fox and C. Nemeth (2017), Control Variates for Stochastic Gradient MCMC. Statistics and Computing. Accepted. ArXiv. Code

#### Software

#### Unpublished Technical Reports

- Comparison of Markov Chain Monte Carlo Algorithms for Large Datasets Article.

#### Talks, Tutorials and Awards

**Talks**

Warwick University (2018). Invited Speaker. Algorithms and Computation Group.

Afternoon on Bayesian Computation 2018: Control Variates for Stochastic Gradient MCMC

University of Edinburgh (2017). Invited Speaker.

RSC 2016: A Comparison of MCMC for Big Data

Afternoon on Bayesian Computation 2016: A Comparison of MCMC for Big Data

**Grants**

STOR-i Research Fund, research visit to the University of Washington, Summer 2017

Travel and accommodation grant, data science consulting week, Alan Turing Institute, Spring 2017

Young researchers travel grant, MCMSki, Spring 2016

**Posters**

ISBA 2018: Control Variates for Stochastic Gradient MCMC

BayesComp 2018: Control Variates for Stochastic Gradient MCMC

i-like Workshop 2016: A Comparison of MCMC for Big Data

MCMSki 2016: A Comparison of MCMC for Big Data

**Tutorials**

Lancaster University Mathematics Summer School

School Impact Sessions: Lancaster Royal Grammar School

STORC: Linux Terminal Tools for the STOR-i Computing Group.

STORC: An Overview of Coding Tools for the STOR-i Computing Group.

CSML: A Tutorial on STAN for Lancaster University CSML Group.

STORC: A Tutorial on using Git for Version Control for the STOR-i Computing Group.

STORC: An Overview of Useful Linux Command Line Tools for the STOR-i Computing Group.

#### Masters Research Projects

**Lancaster University:**

*Computational statistics for big data*

**Lancaster University:**

*Exact approximate MCMC*

**Lancaster University:**

*Artificial neural networks and deep learning*

### About Me

#### Academic & Technical Background

I'm a second-year PhD student at the STOR-i CDT at Lancaster University. I completed my master's degree in statistics and OR with distinction as part of the CDT. There I did a number of projects on Markov chain Monte Carlo, and that's where my interest in these methods really began. During the summer of the first year of my PhD I took a data science internship at Improve Digital in Amsterdam.

I completed my undergraduate degree at the University of Edinburgh, where I was awarded a First Class Honours BSc in Mathematics and Statistics. During my time at Edinburgh, I had two summer internships: one at the Met Office, where I modelled the performance and usage of the MASS storage system, which stores the Met Office's climate data, and another at Motorola Solutions, where I developed an alternative compression algorithm in C for the data logs on Motorola's wireless products.

Email: j.baker1 [at] lancaster.ac.uk

### STOR-i

#### About STOR-i

I am currently enrolled in STOR-i, a Doctoral Training Centre based at Lancaster University. It offers a four-year PhD programme that focuses on research at the interface between Statistics, Operational Research and industry.

The programme is developed and delivered with industrial partners, and offers the chance to carry out research in close collaboration with industry. I joined STOR-i in its first intake after it was awarded a new round of funding by the EPSRC. The funding will allow 60 new students to join STOR-i over the next five years.