The Absolute Essentials of Sample Size Analysis
Or: You too can be a statistical power guru
College of Health Sciences
Define statistical power analysis
Understand how effect size, alpha level, and sample size affect power
Perform simple power calculations
Use web resources to determine sample size requirements
From the Rubric:
Setting and Sample:
Describes the population from which the sample will be or was drawn.
Describes and defends the sampling method including the sampling frame used.
Describes and defends the sample size.
Describes the eligibility criteria for study participants.
Describes the characteristics of the selected sample.
The wrong thing to write in your proposal (or say at your defense)
“I chose n=15 because that was all I could get.”
“I chose n=25 because I want to just get this done.”
What is an “adequate” sample size?
Many refer to the Central Limit Theorem which tells us that, for sample sizes larger than about 30, the shape of the original distribution doesn’t matter, since the shape of the sampling distribution will approach normality.
Thus, many feel that 30 is an adequate sample size. We will see that this is, in most cases, erroneous.
What is Statistical Power?
Statistical Power is the probability that a given statistical test will detect a real treatment effect
The question of interest is: How large must my sample be to ensure a reasonable likelihood of detecting a difference if it really exists in the population?
Why do we care about statistical power?
If we don’t have a reasonable chance of detecting a real treatment effect, then there probably aren’t compelling reasons to do the study
High statistical power improves the chances that our findings reflect a real effect rather than chance alone
What is true in the Population
                                             Treatment Has No Effect     Treatment Has An Effect
You Determine There is NO Effect             Correct Conclusion          Type II Error
You Determine There is A Treatment Effect    Type I Error (α)            Correct Conclusion (1-β)
Statistical Power Standards
The generally accepted value for power is .80 (80%). This means we must show that, given our sample size, we can expect to find a real treatment effect (or mean difference) 80% of the time. In other words, if I repeat this study 100 times, I should reject the null hypothesis 80 times if there is indeed an effect.
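The "repeat the study 100 times" idea can be checked directly with a small simulation. This is a sketch, not part of the slides: it assumes a two-group design tested with a z-test (known σ = 1), an effect of d = .50, and 64 per group (a common textbook n for 80% power).

```python
import random
from statistics import NormalDist, mean

def simulated_power(d, n, reps=2000, alpha=0.05, seed=1):
    """Estimate power by repeating a two-group study many times.

    Draws two samples of size n from normal populations whose means
    differ by d standard deviations, tests each with a two-tailed
    z-test, and returns the fraction of significant results.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(d, 1) for _ in range(n)]
        # z statistic for a difference of means with known sigma = 1
        z = (mean(treated) - mean(control)) / (2 / n) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

# With d = .50 and 64 per group, roughly 80% of the simulated
# studies reject the null hypothesis.
print(simulated_power(0.5, 64))
```

Changing d or n in the call shows directly how power rises and falls with effect size and sample size.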
Another way to think about Statistical Power
If your statistical test has statistical power of .50 (50%), your test will do no better at finding a real effect than tossing a coin. You are just as likely to guess the outcome by flipping a coin as by spending the time to collect and analyze data
Statistical power is essential. You need to be able to demonstrate that you have the conditions necessary to ensure your likelihood of detecting an effect
We will show the factors that influence power and how you can determine how large of a sample size you need
Statistical Power in Studies
In large grants where potentially millions of dollars are being spent, funders have a vested interest in requiring researchers to demonstrate adequate power
In dissertation research, while there are not millions of dollars at stake, the goal is to conduct original research that contributes to the literature
Sample size considerations are important if you are to demonstrate a contribution to the literature
How Do We Do It?
Most students are not experts; luckily, there are some ways to achieve this without having to buy expensive books
There are three things that influence power in a study: alpha level, effect size, and sample size
We will address each individually
Recall that alpha level (Type I error) is the chance that you will find a significant treatment effect when one doesn’t exist.
Recall also that we traditionally use two values: .05 and .01
When we choose a larger value for alpha, we expand the rejection region. If we expand this area, we provide more opportunities to reject the null hypothesis (correctly). Thus, larger values of alpha result in more power.
Alpha Level (Continued)
Since there is typically no real justification for using larger alpha levels, by convention we use .05 (this means essentially that there is only a 5% chance that we will make a Type I error; that is, incorrectly reject the null hypothesis).
So, let us assume that alpha level is set at .05. The other factors we have control over are effect size and sample size.
The standard definition of effect size is:
effect size = (mean difference) / (standard deviation)
Thus, for the simple two-group design, the effect size (which is just a measure of how large the statistical difference is) is given by the difference between the before- and after-treatment means divided by the average standard deviation.
There are two consequences of the effect size that affect statistical power
Effect Size (Continued)
The larger the mean difference (or the greater the change in the mean before and after the treatment), the more likely you are to detect this difference in the population.
Measures that are more sensitive (i.e., those with less random variability) will enhance your ability to detect an effect.
Cohen’s d is a popular measure of effect size. Its formula (which can also be recovered from the t-statistic) is:
d = (M1 − M2) / SDpooled
Cohen specified the following effect sizes:
Small: d = .20 to .50
Medium: d = .50 to .80
Large: d > .80
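For two groups of roughly equal size, d can be computed directly from summary statistics. A minimal sketch (the means and SDs below are made-up illustration values, not from any study):

```python
from math import sqrt

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d using the pooled SD of two equal-sized groups."""
    sd_pooled = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / sd_pooled

# Hypothetical example: treatment mean 10 (SD 4) vs. control mean 8 (SD 4)
print(round(cohens_d(10, 4, 8, 4), 2))  # -> 0.5, a medium effect
```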
Next Step: How do we find effect sizes?
In your research, you have to make your best guess as to what effect size you expect (Hallahan & Rosenthal, 1996)
Rely on previous research
Rely on what has been found in pilot work
Effect Sizes from Previous Research
Lipsey and Wilson (1993) provide effect sizes for a number of psychological, educational, and behavioral treatments
Lipsey, M.W., & Wilson, D.B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181-1209.
Review the literature in your area to determine the magnitude of effect sizes found. Remember, you can usually do this by extracting means and standard deviations from the article and computing a rough estimate.
In the absence of any other data, Cohen’s d conventions can be used. You must make a sound judgment about the effect size you can realistically expect to obtain.
Most of the time, we assume we will find small or medium effect size
You can choose a number of effect sizes – for example, d = .2, .3, .4, .5 – and determine sample sizes accordingly.
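These per-group sample sizes can also be computed directly using the standard normal approximation for a two-group comparison, n per group ≈ 2·((z_crit + z_power)/d)². A sketch (note that tables based on the exact t distribution will run slightly higher, e.g. 64 rather than 63 per group at d = .50):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample, two-tailed test.

    Normal approximation: n = 2 * ((z_crit + z_power) / d) ** 2.
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

for d in (0.2, 0.3, 0.4, 0.5):
    print(d, n_per_group(d))  # e.g. d = .50 needs about 63 per group
```

Note how halving the effect size roughly quadruples the required n, which is why assuming a small effect is the conservative choice.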
Other Measures of Effect Size
Correlation Coefficients: Square of the Correlation
Multiple Regression: R2
ω² – measure of effect size for analysis of variance (provides values similar to R²)
Small effect: ω² < .06
Medium effect: ω² = .06 to .14
Large effect: ω² > .14
Calculating Sample Size
Tables provided include those for independent and dependent t-tests, correlation, and 3-5 group ANOVA designs
Tables come from: http://fsweb.berry.edu/academic/education/vbissonnette/tables/tables.html
We will assume α = .05, 2-tailed tests (this is standard)
For each test, the effect size is across the top, and the power is listed down the left column. We can focus on Power=.80
Assume alpha=.05, 2-tailed tests
Determine the effect size of interest
Use tables to compute the sample size
I want to know the estimated sample size to compare two treatment programs for juvenile offenders on a measure of delinquency. I find the effect sizes in a number of studies to be: .17, .48, .25, .27. What should my sample size be to ensure 80% power?
I am interested in investigating how adolescents process information using simulated video games. There is not much literature in this area, and the one study I found suggested that those in the primed condition (M=6.7, SD=5.1) were more likely to interpret acts as aggressive than those who were in a control condition (M=3.5, SD=4.8). Using this study, what sample size would I need to perform a similar study of aggression and information processing?
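For this example, the cited study’s own numbers give a rough answer. A sketch using the pooled-SD formula for d and the normal approximation for n (an exact t-based table may differ slightly):

```python
from math import ceil, sqrt
from statistics import NormalDist

m1, sd1 = 6.7, 5.1   # primed condition (from the study cited above)
m2, sd2 = 3.5, 4.8   # control condition

d = (m1 - m2) / sqrt((sd1 ** 2 + sd2 ** 2) / 2)   # pooled-SD Cohen's d

z = NormalDist().inv_cdf
n = ceil(2 * ((z(0.975) + z(0.80)) / d) ** 2)     # alpha=.05 two-tailed, power=.80
print(round(d, 2), n)  # d is about .65; roughly 38 per group
```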
I am interested in examining the relationship between western advertising and beer consumption in rural India. What sample size do I need to compare two towns (one that has had advertising and one that has no advertising)?
I am interested in examining the correlation between the amount of money spent purchasing lottery tickets and the scores on a measure of problem gambling. The literature estimates that I would expect a correlation of about .40. What sample size do I need to detect this correlation?
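One common way to size a correlation study uses the Fisher z transformation of r: n ≈ ((z_crit + z_power)/arctanh(r))² + 3. A sketch (this is an approximation, so published tables may differ by a participant or two):

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate total N to detect correlation r (two-tailed test).

    Based on the Fisher z transformation of r.
    """
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.40))  # about 47 participants
```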
I am proposing a study to examine 3 different exercise conditions and their effect on anxiety: weight lifting, running, and tai chi. In my examination of the literature, the average R2 found in the studies was .10. How many people do I need to have in each group to ensure power = .80?
Many times, students want to do regressions. The reality is that multiple regression and analysis of variance are the same mathematically
We look at R2, the overall explained variance by variables in the model
Cohen suggests the following effect size definitions:
Small: R² < .13
Medium: R² = .13 to .26
Large: R² > .26
The tables for multiple regression are more complicated (but not terribly difficult to use)
Tabachnick & Fidell (2001, p. 117) suggest the following “thumbrule” to compute sample size:
N ≥ (8 / f²) + (m − 1)
Where f² = R² / (1 − R²)
m = number of predictor variables
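The thumbrule is easy to compute directly. A sketch (the R² plugged in below is an illustrative medium-effect value, not a value from the slides):

```python
from math import ceil

def regression_n(r2, m):
    """Tabachnick & Fidell's rule: N >= 8/f^2 + (m - 1), where
    f^2 = R^2 / (1 - R^2) and m is the number of predictors."""
    f2 = r2 / (1 - r2)
    return ceil(8 / f2 + (m - 1))

# e.g. a medium effect of R^2 = .13 with 3 predictors
print(regression_n(0.13, 3))  # -> 56
```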
Example #6 (Multiple Regression)
If I am testing a model that has 3 predictors (anxiety, depression and number of siblings) of a dependent measure of alcohol consumption, and I conservatively estimate a small effect size, what sample size is required for this test?
Web-based Power Calculators
There are some websites that offer statistical power calculators. One example is:
This website offers calculators for a variety of power and sample size problems.
Larger sample sizes are best for enhancing the ability to detect effects; however, sample sizes must be reasonable in size and cost effective
Sensitive measures help to enhance power by ensuring that the variability is small
Sample size determination is an essential component of your study that needs to be addressed. You can use power calculators, power tables from statistical books, or information from here; you just need to be explicit about where the information came from.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Gravetter, F.J., & Wallnau, L.B. (2004). Statistics for the behavioral sciences (6th ed.). Belmont, CA: Thompson-Wadsworth.
Hallahan, M., & Rosenthal, R. (1996). Statistical power: Concepts, procedures, and applications. Behavior Research and Therapy, 34, 489-499.
Lipsey, M.W., & Wilson, D.B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181-1209.
Murphy, K.R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Hillsdale, NJ: Erlbaum.
Rossi, J. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58(5), 646-656.
Tabachnick, B.G., & Fidell, L.S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.