Probability refresher

The goal of probability theory is to quantify uncertainty. Once an event occurs, the outcome is known (e.g. it rained today or it didn’t). However, before an event happens, probability...

Generalized Linear Models (GLMs)

Generalized Linear Models (GLMs) are a broad class of models that extend linear regression to handle a variety of outcome variables, each potentially having a different distribution and link function....

Logistic regression: theory review

Logistic regression is a model used to predict a binary outcome (dependent variable) based on a set of features (independent variables). For example, it could be applied to customer browsing...

Linear regression: theory review

Linear regression is a model for predicting a continuous outcome (dependent variable) from a set of features (independent variables). For example, you could use it on past home sales data...

Trustworthy online experiments

Some time ago I read the book Trustworthy Online Controlled Experiments: a practical guide to A/B testing by Kohavi, Tang, and Xu. I was pleasantly surprised by how practical it...

Interviewing for US-based DS jobs

Some notes on my experience interviewing for Data Scientist jobs in the US in 2021. For context, I was based in the Netherlands working as a Data Scientist and required...

Statistical power: concept and calculations

The statistical power of a test is the probability of detecting an effect when one exists. The concept may be more familiar in the context of a medical test....

Metric validation for A/B testing

In A/B testing, the t-test is one of the most commonly used tests. However, if the assumptions are not met, the results are not valid. A key assumption we make...

Confidence intervals

The confidence interval quantifies the uncertainty of a sample estimate. When we estimate a population parameter with a sample statistic, it’s unlikely it equals the population value exactly. For example,...

Non-inferiority testing

Non-inferiority tests are just one-sided tests with a margin, but they are quite useful in experimentation. For example, let’s say you’re adding a new feature to your web shop. You’re...

What is a p-value, and how does it relate to error rates?

The p-value might seem simple at first, but the definition tends to confuse people. Formally, the p-value is the probability of obtaining a result at least as extreme as the...

Hypothesis testing: Two sample tests for proportions

This post covers the most commonly used statistical tests for comparing a binary (success/failure) metric in two independent samples. For example, imagine we run an A/B experiment where our primary...

Hypothesis testing: Two sample tests for continuous metrics

This post covers the statistical tests I use most often for comparing a continuous metric in two independent samples. In general, I recommend using Welch’s t-test, and if the assumptions...

Weighted statistics and the t-test

Sometimes the sample data we have doesn’t represent the population well. For example, maybe you run a survey and the response rate is higher for males than females. If the...

Population estimates: one sample standard errors and confidence intervals

Often we calculate point estimates for a population based on a sample of data (e.g. the mean). How confident should we be in those estimates? Well, one quantifiable source of...