About me

I’m a data science professional and open-source developer with expertise in healthcare, economics and statistical inference. I have experience with data science projects in sectors such as e-commerce, education & healthcare. 👛📚🩺 I have a quantitative background, having studied MPhil Population Health Sciences (Health Data Science stream) at the University of Cambridge and BSc Economics at the LSE. 🎓

I’m an evangelist for data - I like speaking about data (see talks below), writing about data projects and sharing resources about data! ⚗

I have experience working in public companies, start-ups and consultancies. I tackle projects from multiple perspectives, enabled by the breadth of my experience: I’ve worked on projects for the public, private & third sectors; developed proprietary and open-source software; studied in both traditional & alternative education; and worked in both full-time employment & contracting.

These are data science packages and apps I’ve developed:

appelpy : Python package for easier regression modelling
obsidiantools : Python package for analysing Obsidian.md knowledge vaults. I gave talks about the package at PyCon UK and PyCon Portugal in 2022. I also developed NLP solutions that automated some of my knowledge management workflows, which I applied to all my MPhil study notes (150k+ word corpus).

Domain knowledge and expertise

Statistical inference: e.g. A/B testing and experimentation; causal inference
Machine learning
Product analytics on a variety of data: e.g. user journey optimisation; B2B data; advertising data; textual data
Business intelligence: developing analytics strategy at enterprise level and supporting analysts’ skills development
Healthcare: e.g. epidemiology, genetics, genomics, public health
Economics and econometrics

Tech stack

Python: analytics & statistics packages (e.g. Pandas, Statsmodels, PyMC3), applied machine learning, package development
R, including R Shiny
Comfortable with the major OSs: my main commercial experience is with Mac, but I use Linux and Windows for personal projects
Business intelligence: Looker; Tableau
Product analytics: Mixpanel
Databases: primarily BigQuery (standard SQL)
Reverse ETL and marketing data activation: Hightouch
App deployment: e.g. Heroku
Statistical software: Stata; SPSS
Jupyter and JupyterLab, on local and virtual machines
Tech for reproducible data science: e.g. Binder; GNU Make

I’ve worked in agile teams that follow practices such as Extreme Programming and Test-driven Development (TDD).

I also have exposure to: Airflow; Docker; MLflow; Kubernetes; Terraform

Talks 🎤

Here is content for my data science talks:

PyCon 2022 talks :: Connecting those thoughts: Personal knowledge management with Python
PyData London Meetup (March 2020 - postponed) :: Publishing Your First Project on PyPI
PyData London 2019 :: On the Path to Causal Inference
- Video & Slides
- A primer on causal inference applied to problems in travel technology. What is causal inference, why is it important and how can we apply it to projects?