Linear regression (theory review)
Linear regression is a model for predicting a continuous outcome (the dependent variable) from a set of features (the independent variables). For example, you could use it on past home sales data...
Trustworthy online experiments
Some time ago I read the book Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing by Kohavi, Tang, and Xu. I was pleasantly surprised by how practical it...
Interviewing for US-based DS jobs
Some notes on my experience interviewing for Data Scientist jobs in the US in 2021. For context, I was based in the Netherlands working as a Data Scientist and required...
Statistical power: concept and calculations
The statistical power of a test is the probability of detecting an effect when one exists. The concept may be more familiar in the context of a medical test....
Metric validation for A/B testing
In A/B testing, the t-test is one of the most commonly used tests. However, if its assumptions are not met, the results are not valid. A key assumption we make...
Confidence intervals
The confidence interval quantifies the uncertainty of a sample estimate. When we estimate a population parameter with a sample statistic, the estimate is unlikely to equal the population value exactly. For example,...
Non-inferiority testing
Non-inferiority tests are just one-sided tests with a margin, but they are quite useful in experimentation. For example, let’s say you’re adding a new feature to your web shop. You’re...
What is a p-value, and how does it relate to error rates?
The p-value might seem simple at first, but the definition tends to confuse people. Formally, the p-value is the probability of obtaining a result at least as extreme as the...
Hypothesis testing: Two sample tests for proportions
This post covers the most commonly used statistical tests for comparing a binary (success/failure) metric in two independent samples. For example, imagine we run an A/B experiment where our primary...
Hypothesis testing: Two sample tests for continuous metrics
This post covers the statistical tests I use most often for comparing a continuous metric in two independent samples. In general, I recommend using Welch’s t-test, and if the assumptions...