This app features several simulators, accessible on the top navigation bar, or by clicking below:
Click on the blue text in each simulator to show additional content, including questions and visualizations.
This simulator was supported by Grant Number UL1 TR002377 from the National Center for Advancing Translational Sciences (NCATS). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. This app and the demonstrations herein were developed by Ethan Heinzen, Dr. Tracey Weissgerber and Dr. Stacey Winham, except where other sources are cited.
Do you like what you see here? This app is part of an online course on data visualization and statistical analysis for small sample size studies. To learn more, check out Sample Size Matters: Misconceptions about Graphs and Statistical Analyses in Lab and Clinical Research.
App version 3.1.8
The data distribution is one factor that we consider when deciding which statistical test to use. When we have many observations, it is easy to determine the data distribution. As the sample size decreases, it becomes increasingly difficult or impossible to identify the data distribution. This tool allows you to visually examine how our ability to identify the data distribution changes with sample size.
Change the values of 'n' in the boxes to view samples of different sizes.
Population
In the first tab, you saw how our ability to visually identify the data distribution changes with sample size. One way to statistically determine whether the data distribution is normal is to use a normality test, such as the Shapiro-Wilk or Kolmogorov-Smirnov test. This tool allows you to look at how the percentage of samples that fail a Shapiro-Wilk normality test changes depending on the data distribution and the sample size.
The Shapiro-Wilk normality test is used here. The null hypothesis is that the data is from a normal distribution.
Expected Results
Visualizations 1 and 2 allow you to examine how summary statistics, such as the mean and standard deviation (SD), would change if you repeated the same experiment 100 times. Visualization 3 examines how the cumulative mean changes as your sample size increases. Explore these three visualizations to examine the relationship between sample size and the precision of summary statistics; then use what you've learned to answer the questions below.
The visualizations below show how your summary statistics might change if you repeated the same experiment over and over again. Most of the time we only perform one experiment.
The shaded area here represents the true population SD.
One hundred samples of size N are drawn from the distribution selected in the first dropdown menu, and the sample statistic selected in the second dropdown menu is computed for each sample. The red line represents the population mean or median. Adjust the sample sizes across the top of the table to examine how N impacts these statistics.
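The repeated-sampling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's own code: the normal population with mean 0 and SD 1, the function name `repeat_experiment`, and the seed are all assumptions chosen for the example.

```python
import random
import statistics

def repeat_experiment(n, repeats=100, mu=0.0, sigma=1.0, seed=1):
    """Draw `repeats` samples of size n from Normal(mu, sigma).

    Returns the list of sample means and the list of sample SDs.
    The population parameters are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        sds.append(statistics.stdev(sample))  # sample SD (n - 1 denominator)
    return means, sds

# Compare how widely the sample statistics scatter at different sample sizes.
for n in (5, 20, 100):
    means, sds = repeat_experiment(n)
    print(f"n={n:>3}: sample means range {min(means):.2f} to {max(means):.2f}, "
          f"sample SDs range {min(sds):.2f} to {max(sds):.2f}")
```

Running it for several values of n shows the ranges narrowing as n grows, which is the pattern the visualizations display.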
The terms "Corridor of Stability" and "Seas of Uncertainty" were borrowed from previous papers examining the stability of correlation coefficients (Schönbrodt FD, Perugini M. At what sample size do correlations stabilize? J Res in Personality 2013; 47(5):609-612. doi:10.1016/j.jrp.2013.05.009) and effect sizes (Lakens D, Evers ERK. Sailing From the Seas of Chaos Into the Corridor of Stability: Practical Recommendations to Increase the Informational Value of Studies. Perspectives on Psychological Science 2014; 9(3):278-292. doi:10.1177/1745691614528520).
We encourage users to consult these papers for more information on how sample size affects the uncertainty surrounding different types of estimates.
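To make the idea of a cumulative mean concrete, here is a small sketch that tracks the running mean as observations are added one at a time. The standard normal population, the seed, and the helper name `cumulative_means` are assumptions for illustration only.

```python
import random
from itertools import accumulate

def cumulative_means(values):
    """Mean of the first k values, for k = 1..len(values)."""
    return [total / k for k, total in enumerate(accumulate(values), start=1)]

# Hypothetical data: 200 observations from a population with true mean 0.
rng = random.Random(7)
observations = [rng.gauss(0.0, 1.0) for _ in range(200)]
running = cumulative_means(observations)

for k in (5, 10, 25, 50, 100, 200):
    print(f"after {k:>3} observations: cumulative mean = {running[k - 1]:+.3f}")
```

Early values can swing widely; later values tend to settle near the population mean as more observations accumulate.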
Power is the probability that you will detect a significant effect if your hypothesis, also called the alternative hypothesis, is true. This simulator repeats the same experiment 100 times, under two different scenarios:
The simulator records the p-value for each of the 100 experiments, then creates a histogram showing the distribution of the p-values for each scenario. You can adjust power by changing the number in the power box. The red bar in the top panels shows the percentage of p-values that are less than the significance level for each scenario. The zoomed-in panels show the distribution of p-values between 0 and the significance level.
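A rough version of this simulation can be written with the Python standard library. Several things here are assumptions rather than the app's method: the two scenarios are taken to be "no true difference" and "a true difference of 0.8 SD", the group size is 25, and a z-test with a known SD is substituted for a t-test so that no external library is needed. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean

def z_test_p_value(group_a, group_b, sd):
    """Two-sided p-value for a difference in means, treating the SD as known."""
    standard_error = sd * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_p_values(true_diff, n_per_group, sd=1.0, experiments=100, seed=0):
    """Repeat the same two-group experiment and collect one p-value per run."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(experiments):
        group_a = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        p_values.append(z_test_p_value(group_a, group_b, sd))
    return p_values

def fraction_significant(p_values, alpha=0.05):
    """Share of p-values below the significance level."""
    return sum(p < alpha for p in p_values) / len(p_values)

def zoomed_counts(p_values, alpha=0.05, bins=5):
    """Counts of p-values in equal-width bins between 0 and alpha."""
    counts = [0] * bins
    for p in p_values:
        if p < alpha:
            counts[min(int(p / alpha * bins), bins - 1)] += 1
    return counts

# Assumed scenarios: no true difference vs. a true difference of 0.8 SD.
for label, diff in [("no effect", 0.0), ("true effect", 0.8)]:
    p_values = simulate_p_values(diff, n_per_group=25)
    print(f"{label}: {fraction_significant(p_values):.0%} of p-values < 0.05, "
          f"zoomed counts {zoomed_counts(p_values)}")
```

The `zoomed_counts` helper plays the role of the zoomed-in panels: it bins only the p-values that fall between 0 and the significance level.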
If there is an effect (the alternative hypothesis is true), which p-values are more common: p<0.005 or 0.045
To answer this question using the simulator, show the lower set of graphs that present p-values between 0 and 0.05. Does your answer change if power is 80% vs. 50% vs. 20%?
How would you describe the shape of the p-curve in each of the following scenarios?
Use this simulator to explore different strategies for increasing power. The tool compares two independent groups using a two-sample unpaired t-test. Within each section of the simulator, you'll be able to enter values for group A and group B. As in the previous tab, the simulator performs 100 experiments and creates a histogram of p-values for two different scenarios:
The first section examines the effect of sample size. The other factors that affect power are fixed and cannot be changed; we'll look at these factors in later sections. The second section examines changes in the effect size, or difference between the means. The third section examines changes in variability, or the standard deviation (SD) for each group. The fourth section allows you to adjust the sample sizes, difference between means (effect size) and variability (standard deviations) for each group simultaneously.
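The three levers described above (sample size, difference between means, and SD) can also be compared with a quick calculation. The sketch below uses a normal approximation that treats the SD as known, so it slightly overstates the power of a t-test for small groups; it is not the simulator's method, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two group means.

    Normal approximation: treats the SD as known, so it slightly
    overstates the power of a t-test when n_per_group is small.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected test statistic: the difference divided by its standard error.
    shift = abs(diff) / (sd * math.sqrt(2 / n_per_group))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical baseline: difference 0.8, SD 1.0, 15 per group.
print(f"baseline:          {approx_power(0.8, 1.0, 15):.2f}")
print(f"larger sample:     {approx_power(0.8, 1.0, 30):.2f}")
print(f"larger difference: {approx_power(1.2, 1.0, 15):.2f}")
print(f"smaller SD:        {approx_power(0.8, 0.6, 15):.2f}")
```

Each of the three changes increases the ratio of the difference to its standard error, which is why each one raises power.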
An effect size measures the magnitude of the difference between groups. Examples include the difference between group means, or the difference between group medians. The effect size can be measured as the raw difference, or as a standardized effect size, which scales the difference relative to the variability. An example of a commonly used standardized effect size measure is Cohen's d (d = difference in group means divided by the standard deviation); by convention, d=0.2 standard deviations is considered a small effect, d=0.5 a medium effect, and d=0.8 a large effect.
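As a worked illustration, Cohen's d can be computed from two samples as follows. The text above only says "the standard deviation"; using the pooled SD of the two groups is a common choice and is an assumption here. The data values are made up for the example.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled SD (an assumed choice)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.8, 6.0, 5.5, 5.9]
group_b = [4.2, 4.9, 5.0, 4.4, 4.6]
raw_difference = statistics.mean(group_a) - statistics.mean(group_b)
print(f"raw difference in means: {raw_difference:.2f}")
print(f"Cohen's d:               {cohens_d(group_a, group_b):.2f}")
```

The raw difference is in the units of the measurement, while d expresses the same difference in units of standard deviations.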
This simulator draws samples of size n=25 from two independent groups whose populations differ by a given true effect size (raw difference in means). The first tool allows you to visually assess how the effect size measures the difference between groups; you can adjust the raw difference in means in the boxes labeled 'Effect Size'. The second tool allows you to visually assess how the estimate of the effect size is related to the sample size; you can adjust the standardized difference in means in the boxes labeled 'Effect Size'.
What is effect size?
P-values are impacted by sample size and standard deviation in addition to effect size. This simulator allows you to explore the relationship between p-values and sample size, standard deviation, and effect size. Each figure displays the results of one experiment to compare two independent groups. The p-value of a two-sample independent t-test to compare the means for Group A and Group B is displayed in blue above the graph. The input boxes allow you to control the sample size, standard deviation or effect size. Values of the effect size, standard deviation, and sample size are displayed on the right.
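The dependence of the p-value on all three quantities can be seen in a short calculation. This sketch holds the observed difference fixed at 0.5 and varies the sample size and SD. It uses a normal approximation (SD treated as known) rather than the t-test the simulator reports, and the function name and input values are assumptions for illustration.

```python
import math
from statistics import NormalDist

def approx_p_value(observed_diff, sd, n_per_group):
    """Two-sided p-value for an observed difference in means (normal approximation)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(observed_diff) / standard_error
    return 2 * (1 - NormalDist().cdf(z))

# Same observed difference (0.5), different sample sizes and SDs.
for n, sd in [(10, 1.0), (25, 1.0), (100, 1.0), (25, 0.5), (25, 2.0)]:
    print(f"diff=0.5, SD={sd}, n={n:>3}: p = {approx_p_value(0.5, sd, n):.4f}")
```

The effect size is identical in every row, yet the p-value changes with n and SD.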
Click "Show effects of sample size" below.
Click "Show effects of variability (standard deviation)" below.
Click "Show effects of differences in means (effect size)" below.
After examining these three simulators, why should we avoid assuming that a smaller p-value means that we've found a larger effect?
In basic biomedical science, scientists routinely report p-values; however, they seldom report effect sizes. Why is this a problem? How might we interpret data differently if scientists focused on effect sizes, rather than p-values?
Parameter | Value |
---|---|
Effect Size | 0.5 |
SD | 1.0 |

Parameter | Value |
---|---|
n per group | 25 |
Effect Size | 0.5 |

Parameter | Value |
---|---|
n per group | 25 |
SD | 1.0 |
When a study is underpowered, samples with significant results tend to overestimate effect size ("winner's curse"). This simulator allows you to examine the relationship between effect size, p-value, and sample size. The simulator repeats an experiment 100 times to compare two independent groups. The figure shows the estimates of the effect size (mean difference) plotted against the p-value from a two-sample t-test for each of these 100 experiments. The true effect size (mean difference) is 1.0, denoted by the red line. You can control the sample size for each group using the input boxes. The power of the test is displayed above each graph. You can select the "Show Histograms" option to see the distribution of p-values and effect sizes. Samples in red denote p < 0.05. Below the graph, a table summarizes the samples with p < 0.05 compared to those with p ≥ 0.05.
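The winner's curse can be reproduced with a small simulation. The true mean difference of 1.0 comes from the description above; everything else is an assumption for illustration, including the SD of 2.0, the seed, the function names, and the use of a z-test with known SD in place of the t-test.

```python
import math
import random
from statistics import NormalDist, mean

def simulate_estimates(n_per_group, true_diff=1.0, sd=2.0, experiments=100, seed=3):
    """Return (estimated difference, p-value) for each simulated experiment."""
    rng = random.Random(seed)
    standard_error = sd * math.sqrt(2 / n_per_group)
    results = []
    for _ in range(experiments):
        group_a = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        estimate = mean(group_a) - mean(group_b)
        # z-test with known SD, substituted here for the t-test.
        p_value = 2 * (1 - NormalDist().cdf(abs(estimate) / standard_error))
        results.append((estimate, p_value))
    return results

def summarize(results, alpha=0.05):
    """Mean estimated effect among significant and non-significant experiments."""
    significant = [est for est, p in results if p < alpha]
    other = [est for est, p in results if p >= alpha]
    return (mean(significant) if significant else None,
            mean(other) if other else None)

for n in (10, 50, 200):
    sig_mean, other_mean = summarize(simulate_estimates(n))
    print(f"n={n:>3}: mean estimate if p<0.05 = {sig_mean}, otherwise = {other_mean}")
```

With small groups, an estimate can only reach p < 0.05 if it is well above the true value of 1.0, so the significant subset overestimates the effect. With large groups nearly every experiment is significant and the gap shrinks.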
Publication bias is the tendency for studies that show statistically significant results to be published more easily than those that don't, which leads to overestimates of the effect sizes reported in the literature. In this activity, 20 studies of various sample sizes are simulated, each comparing the means of two groups to investigate the same hypothesis. A standardized effect size is calculated for each study; if it is statistically significant (p < 0.05), the study is "published". If it is not statistically significant, the study is published with the probability specified below. A meta-analysis is a technique used to combine results across studies, and the results are usually visualized using a "forest plot". The forest plot below displays the results from all studies, as well as the combined estimates from the meta-analysis of all the studies (in black) and of only the published studies (in red), to illustrate the effect of publication bias. For more information on how to read this forest plot, please see this video explanation.
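The publication process described above can be sketched as follows. This is not the app's implementation. The assumptions are: a true standardized effect of 0.3, group sizes drawn uniformly between 10 and 100, a large-sample variance formula for d, a normal-approximation p-value, and a fixed-effect (inverse-variance) meta-analysis. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def simulate_study(rng, n, true_d):
    """Simulate one two-group study; return (d, variance of d, p-value)."""
    group_a = [rng.gauss(true_d, 1.0) for _ in range(n)]
    group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)  # equal n
    d = (mean(group_a) - mean(group_b)) / pooled_sd
    var_d = 2 / n + d * d / (4 * n)  # large-sample approximation (assumed)
    p_value = 2 * (1 - NormalDist().cdf(abs(d) / math.sqrt(var_d)))
    return d, var_d, p_value

def pooled_estimate(studies):
    """Fixed-effect meta-analysis: inverse-variance weighted mean of d."""
    weights = [1 / var_d for _, var_d, _ in studies]
    return sum(w * d for w, (d, _, _) in zip(weights, studies)) / sum(weights)

def simulate_literature(n_studies=20, true_d=0.3, publish_prob=0.2, seed=11):
    """Simulate studies, then 'publish' significant ones and a share of the rest."""
    rng = random.Random(seed)
    studies = [simulate_study(rng, rng.randint(10, 100), true_d)
               for _ in range(n_studies)]
    published = [s for s in studies if s[2] < 0.05 or rng.random() < publish_prob]
    return studies, published

studies, published = simulate_literature()
print(f"all {len(studies)} studies: pooled d = {pooled_estimate(studies):.2f}")
if published:
    print(f"{len(published)} published studies: pooled d = {pooled_estimate(published):.2f}")
```

Setting `publish_prob` to 1.0 publishes every study, so the two pooled estimates coincide; lowering it lets the published subset drift away from the estimate based on all studies.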
It's recommended that you reset to the defaults when you're done with each question.
How does sample size differ between:
How does the percent of significant studies differ by sample size?
How does the percent of published studies differ by sample size?
DISCLAIMER: The content on the site is NOT medical advice. Although some content may be provided by medical professionals, users acknowledge that access or use of the content does not create a provider-patient relationship and does not constitute medical advice, treatment, diagnosis or services of any kind. The information is provided for educational purposes only and as such is not a substitute for professional medical attention and treatment by medical professionals. Users are solely responsible and accept all liability resulting from use of the content and any related services or products.