The concept of statistical significance is central to planning, executing and evaluating A/B (and multivariate) tests, but at the same time it is the most misunderstood and misused statistical tool in internet marketing, conversion optimization, landing page optimization, and user testing. This is not my first take on the topic, but it is my best attempt to lay it out in as plain English as possible: covering every angle, but without going into math and unnecessary detail. The first part, where I explain the concept, is theory-heavy by necessity, while the second part is more practically oriented, covering how to choose a proper statistical significance level, how to avoid common misinterpretations and misuses, etc.

To explain it properly, we need to take a small step back and get the bigger picture first. In many online marketing / UX activities, we aim to take actions that improve the business bottom line: acquire more visitors, convert more visitors, generate higher revenue per visitor, increase retention and reduce churn, increase repeat orders, etc. However, knowing which actions lead to improvements is not a trivial task. This is where A/B testing comes into play, since an A/B test, a.k.a. online controlled experiment, is the only scientific way to establish a causal link between our (intended) actions and any results we observe. You can always choose to skip the science and go with your hunch, or to use just observational data. Going the scientific way, however, you can estimate the effect of your involvement and basically predict the future (isn't that the coolest thing!).

In an ideal world, we would be all-knowing and there would be no uncertainty, so experimentation would be unnecessary. In the real world of online business, however, there are limitations we need to work with. In A/B testing we are limited by the time, resources and users we are happy to commit to any given test. What we do is measure a sample of the potentially infinitely many future visitors to a site and then use our observations on that sample to predict how visitors will behave in the future. In any such measurement of a finite sample, in which we try to gain knowledge about a whole population and/or to make predictions about future behavior, there is inherent uncertainty in both our measurement and our prediction.

This uncertainty is due to the natural variance in the behavior of the groups we observe. Its presence means that if we split our users into two randomly selected groups, we will observe noticeable differences between the behavior of these two groups, including on Key Performance Indicators such as e-commerce conversion rates, lead conversion rates, bounce rates, etc. Even without doing anything different to one group or the other, they will appear different. If you want to get a good idea of how this variance can play with your mind, just run a couple of A/A tests and observe the results over time. Most of the time you will notice distinctly different performance in the two groups, with the largest difference in the beginning and the smallest towards the end of however long you decide to run the test.

It is precisely this uncertainty, coupled with our desire to have a reliable prediction about the future, that necessitates the use of statistical significance, which is a tool for measuring the level of uncertainty in our data. So, statistical significance is useful in quantifying uncertainty. But how does it help us quantify uncertainty? In order to use it, we design and execute a null-hypothesis statistical test, which is a case of reductio ad absurdum argumentation. In such an argument, we examine what would have happened if something were true, until we reach a contradiction, which allows us to declare it false. First, we choose a variable by which we want to measure our results.
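The A/A effect described above is easy to see for yourself in a quick simulation. The sketch below is purely illustrative and not from the original article: it splits identically behaving visitors into two groups sharing the same true conversion rate (the 5% rate and the checkpoint sizes are arbitrary assumptions) and prints the observed difference at a few points during the "test".

```python
import random

random.seed(1)

TRUE_RATE = 0.05                   # both groups share the same true rate (A/A test)
CHECKPOINTS = [100, 1_000, 10_000] # visitors per group at each look

def conversions(n, rate):
    """Simulate n visitors, each converting independently with probability `rate`."""
    return sum(random.random() < rate for _ in range(n))

for n in CHECKPOINTS:
    rate_a = conversions(n, TRUE_RATE) / n
    rate_b = conversions(n, TRUE_RATE) / n
    print(f"{n:>6} visitors/group: A={rate_a:.3f}  B={rate_b:.3f}  "
          f"diff={abs(rate_a - rate_b):.3f}")
```

Run it a few times with different seeds: the two identical groups typically look quite different early on, and the gap usually narrows as the sample grows, mirroring the "largest difference in the beginning, smallest towards the end" pattern.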
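To make the reductio ad absurdum concrete, here is a minimal sketch of one standard form of such a null-hypothesis test, a two-proportion z-test (my example, not the article's; the visitor and conversion counts are made up for illustration). We assume the null hypothesis that both variants share the same true conversion rate, then ask how improbable the observed difference would be if that assumption held.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Z-statistic for the difference of two conversion rates, under the
    null hypothesis that both variants share the same true rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # best single-rate estimate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def p_value_one_sided(z):
    """P(Z >= z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical results: 500/10,000 conversions for A vs 590/10,000 for B.
z = two_proportion_z(500, 10_000, 590, 10_000)
p = p_value_one_sided(z)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
# A very small p means the observed data would be highly surprising if the
# null were true -- the "contradiction" that lets us reject it.
```

The p-value plays the role of the contradiction: if it falls below the significance level we chose in advance, we declare the null hypothesis false, which is exactly the argument structure described above.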