About me

I’m a data science professional and open-source developer with expertise in healthcare, economics and statistical inference. I have experience with data science projects in sectors such as e-commerce, education & healthcare. 👛📚🩺 I have a quantitative background, having studied MPhil Population Health Sciences (Health Data Science stream) at the University of Cambridge and BSc Economics at the LSE. 🎓

I’m an evangelist for data - I like speaking about data (see talks below), writing about data projects and sharing resources about data! ⚗

I have experience working in public companies, start-ups and consultancies. I tackle projects from multiple perspectives, enabled by the breadth of my experience: I’ve worked on projects for the public, private & third sectors; developed proprietary and open-source software; experienced both traditional & alternative education; and worked in both full-time employment & contracting.

These are data science packages and apps I’ve developed:

  • appelpy : Python package for easier regression modelling
  • obsidiantools : Python package for analysing Obsidian.md knowledge vaults. I gave talks at PyCon UK and Portugal in 2022 about the package. I also developed NLP solutions that automated some of my knowledge management workflows, which were applied to all my MPhil study notes (150k+ word corpus).

Domain knowledge and expertise

  • Statistical inference: e.g. A/B testing and experimentation; causal inference
  • Machine learning
  • Product analytics on a variety of data: e.g. user journey optimisation; B2B data; advertising data; textual data
  • Business intelligence: developing analytics strategy at enterprise level and supporting analysts’ skills development
  • Healthcare: e.g. epidemiology, genetics, genomics, public health
  • Economics and econometrics

Tech stack

  • Python: analytics & statistics packages (e.g. Pandas, Statsmodels, PyMC3), applied machine learning, package development
  • R, including R Shiny
  • Comfortable with the major OSs: my main commercial experience is with Mac, but I use Linux and Windows for personal projects
  • Business intelligence: Looker; Tableau
  • Product analytics: Mixpanel
  • Databases: primarily BigQuery (standard SQL)
  • Reverse ETL and marketing data activation: Hightouch
  • App deployment: e.g. Heroku
  • Statistical software: Stata; SPSS
  • Jupyter and JupyterLab, on local and virtual machines
  • Tech for reproducible data science: e.g. Binder; GNU Make

I’ve worked in agile teams, with philosophies such as Extreme Programming and Test-driven Development (TDD).

I also have exposure to: Airflow; Docker; MLflow; Kubernetes; Terraform

Talks 🎤

Here is content for my data science talks:

  • PyCon 2022 talks :: Connecting those thoughts: Personal knowledge management with Python
  • PyData London Meetup (March 2020 - postponed) :: Publishing Your First Project on PyPI
  • PyData London 2019 :: On the Path to Causal Inference
    • Video & Slides
    • A primer on causal inference applied to problems in travel technology. What is causal inference, why is it important and how can we apply it to projects?