The Absolute Essentials of Sample Size Analysis

Or: You too can be a statistical power guru

College of Health Sciences


Define statistical power analysis
Understand how effect size, alpha level, and sample size affect power
Perform simple power calculations
Use web resources to determine sample size requirements

From the Rubric:

Setting and Sample: Describes the population from which the sample will be or was drawn. Describes and defends the sampling method including the sampling frame used. Describes and defends the sample size. Describes the eligibility criteria for study participants. Describes the characteristics of the selected sample.

The wrong thing to write in your proposal (or say at your defense)

“I chose n=15 because that was all I could get.”
“I chose n=25 because I want to just get this done.”

What is an “adequate” sample size?

Many refer to the Central Limit Theorem, which tells us that, for sample sizes larger than about 30, the shape of the original distribution matters little, since the sampling distribution of the mean will approach normality.
Thus, many feel that 30 is an adequate sample size. We will see that this is, in most cases, erroneous: approximate normality justifies the test, but it says nothing about whether the sample is large enough to detect an effect.

What is Statistical Power?

Statistical power is the probability that a given statistical test will detect a real treatment effect.
The question of interest is: How large must my sample be to ensure a reasonable likelihood of detecting a difference if it really exists in the population?

Why do we care about statistical power?

If we don’t have a reasonable chance of detecting a real treatment effect, then there probably aren’t compelling reasons to do the study.
High statistical power helps improve the chances that our findings are not due to chance alone.

Statistical Decisions

What is true in the population:

                                   Treatment has no effect       Treatment has an effect
You determine there is NO effect   Correct conclusion (1 - α)    Type II error (β)
You determine there is an effect   Type I error (α)              Correct conclusion (1 - β)

Statistical Power Standards

The generally accepted value for power is .80 (80%). This means that we have to show that, given our sample size, we can expect to find a real treatment effect (or mean difference) 80% of the time. In other words, if I repeat this study 100 times, I should reject the null hypothesis about 80 times if there is indeed an effect.
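The "repeat this study many times" idea can be checked by simulation. The sketch below is only an illustration and is not from the original slides: it assumes two groups of 64, a true effect of d = 0.5, a known standard deviation of 1, and a z-test in place of a t-test, all chosen for simplicity. The function name simulated_power is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulated_power(d, n_per_group, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated studies that reject the null hypothesis."""
    rng = random.Random(seed)
    # Two-tailed critical value, e.g. about 1.96 for alpha = .05
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two means when SD = 1
    se = (2 / n_per_group) ** 0.5
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


power = simulated_power(d=0.5, n_per_group=64)
print(f"Share of simulated studies that found the effect: {power:.2f}")
```

Under these assumptions the share should land near .80, which is what "80% power" means in practice.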

Another way to think about Statistical Power

If your statistical test has statistical power of .50 (50%), your test will do no better at finding a real effect than tossing a coin. You are just as likely to guess the outcome by flipping a coin as by spending the time to collect and analyze data.


Statistical power is essential. You need to be able to demonstrate that you have the conditions necessary to ensure your likelihood of detecting an effect.
We will show the factors that influence power and how you can determine how large a sample size you need.

Statistical Power in Studies

In large grants where potentially millions of dollars are being spent, funders have a vested interest in requiring researchers to demonstrate adequate power.
In dissertation research, while there are not millions of dollars at stake, the goal is to conduct original research that contributes to the literature.
Sample size considerations are important if you are to demonstrate a contribution to the literature.

How Do We Do It?

Most students are not experts; luckily, there are some ways to achieve this without having to buy expensive books.
There are three things that influence power in a study:
Alpha level
Effect size
Sample size
We will address each individually.

Alpha Level

Recall that alpha level (Type I error) is the chance that you will find a significant treatment effect when one doesn’t exist.
Recall also that we traditionally use two values: .05 and .01.
When we choose a larger value for alpha, we expand the rejection region. If we expand this area, we provide more opportunities to reject the null hypothesis (correctly). Thus, larger values of alpha result in more power.
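The link between alpha and power can be illustrated with a quick calculation. This is a rough sketch, not part of the original slides: it uses a normal approximation to the two-group comparison, and the values d = 0.5 and 30 per group are assumptions picked for illustration. The function name approx_power is hypothetical.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha):
    """Normal-approximation power of a two-tailed, two-group comparison."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic when the true effect is d
    expected_z = d * (n_per_group / 2) ** 0.5
    # Chance of landing in the upper rejection region; the very small
    # contribution of the opposite tail is ignored.
    return NormalDist().cdf(expected_z - z_crit)


powers = {
    alpha: approx_power(d=0.5, n_per_group=30, alpha=alpha)
    for alpha in (0.01, 0.05, 0.10)
}
for alpha, power in powers.items():
    print(f"alpha = {alpha:.2f}: power is about {power:.2f}")
```

With these assumed inputs, power rises from roughly .26 at alpha = .01 to roughly .49 at .05 and .61 at .10. Note that 30 per group with a medium effect gives only about coin-toss power at alpha = .05.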

Alpha Level (Continued)

Since there is typically no justification for using larger alpha levels, by convention we use .05. This means that, when the null hypothesis is true, there is only a 5% chance that we will make a Type I error (that is, incorrectly reject the null hypothesis). So, let us assume that the alpha level is set at .05. The other factors we have control over are effect size and sample size.

Effect Size

The standard definition of effect size is:
$\text{Effect Size} = \dfrac{\text{Mean Difference}}{\text{Standard Deviation}}$
Thus, for the simple two-group design, the effect size (which is just a measure of how large the statistical difference is) is given by the difference between the before and after treatment means divided by the average standard deviation.
There are two consequences of the effect size that affect statistical power:

Effect Size (Continued)

The larger the mean difference (or the greater the change in the mean before and after the treatment), the more likely you are to detect this difference in the population.
Measures that are more sensitive (i.e., those with less random variability) will enhance your ability to detect an effect.

Cohen’s d

Cohen’s d is a popular measure of effect size. Its exact formula is based on the t-statistic and is calculated as:
$d = \dfrac{M_1 - M_2}{SD}$
Cohen specified the following effect sizes:
Small: d < .50
Medium: d = .50 to .80
Large: d > .80

Next Step: How do we find effect sizes?

In your research, you have to make your best guess as to what effect size you expect (Hallahan & Rosenthal, 1996):
Rely on previous research
Rely on what has been found in pilot work
Cohen’s advice

Effect Sizes from Previous Research

Lipsey and Wilson (1993) provide effect sizes for a number of psychological, educational, and behavioral treatments.
Lipsey, M.W., & Wilson, D.B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181-1209.
Review the literature in your area to determine the magnitude of effect sizes found. Remember, you can usually do this by extracting means and standard deviations from the article and computing a rough estimate.

Using Estimates

In the absence of any other data, Cohen’s d conventions can be used. You must make a sound judgment about the largest effect size you can realistically expect to obtain.
Most of the time, we assume we will find a small or medium effect size.
You can choose a number of effect sizes (for example, d = .2, .3, .4, .5) and determine sample sizes accordingly.
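Trying several candidate effect sizes is easy to script. The sketch below is not from the original slides: it assumes a two-group comparison, alpha = .05 (two-tailed), power = .80, and a normal approximation rather than exact t-based tables. The function name n_per_group is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # n = 2 * (z_alpha + z_power)^2 / d^2, rounded up to a whole person
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


sample_sizes = {d: n_per_group(d) for d in (0.2, 0.3, 0.4, 0.5)}
for d, n in sample_sizes.items():
    print(f"d = {d:.1f}: about {n} per group")
```

Under these assumptions the results are about 393, 175, 99, and 63 per group. Exact t-based tables run slightly higher (for example, 64 per group for d = .5), so treat these as rough planning numbers.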

Other Measures of Effect Size

Correlation coefficients: square of the correlation (r²)
Multiple regression: R²
ω²: measure of effect size for analysis of variance (provides similar values to R²)
Small effect: ω² < .06
Medium effect: ω² = .06 to .14
Large effect: ω² > .14

Calculating Sample Size Using Tables

Tables provided include those for independent and dependent t-tests, correlation, and 3-5 group ANOVA designs.
Tables come from:
We will assume alpha = .05, 2-tailed tests (this is standard).
For each test, the effect size is across the top, and the power is listed down the left column. We can focus on power = .80.


Assume alpha = .05, 2-tailed tests
Determine the effect size of interest
Use tables to compute the sample size

Example #1

I want to know the estimated sample size to compare two treatment programs for juvenile offenders on a measure of delinquency. I find the effect sizes in a number of studies to be: .17, .48, .25, .27. What should my sample size be to ensure 80% power?
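One possible way to work this example is to average the reported effect sizes and plug the result into the usual two-group formula. This is a sketch under assumptions that are not in the original slides: the four effect sizes are treated as Cohen's d values, they are averaged with equal weight, and a normal approximation replaces the printed tables. The function name n_per_group is hypothetical.

```python
import math
from statistics import NormalDist, fmean


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


reported_effects = [0.17, 0.48, 0.25, 0.27]
expected_d = fmean(reported_effects)  # equal-weight average of the studies
required_n = n_per_group(expected_d)
print(f"Average d = {expected_d:.2f}, about {required_n} per group")
```

With these inputs the average effect is about .29, which calls for roughly 184 participants per treatment program. A table lookup at d = .30 would give a somewhat smaller figure.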

Example #2

I am interested in investigating how adolescents process information using simulated video games. There is not much literature in this area, and the one study I found suggested that those in the primed condition (M=6.7, SD=5.1) were more likely to interpret acts as aggressive than those who were in a control condition (M=3.5, SD=4.8). Using this study, what sample size would I need to perform a similar study of aggression and information processing?
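This example can be worked by first turning the reported means and standard deviations into Cohen's d. The sketch below is not from the original slides: it assumes the two groups in the earlier study were about equal in size (so the two variances are simply averaged) and uses a normal approximation for the sample size. The function names cohens_d and n_per_group are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Mean difference divided by the average (pooled) standard deviation."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)  # assumes equal group sizes
    return (mean_1 - mean_2) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-group comparison."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


d = cohens_d(mean_1=6.7, sd_1=5.1, mean_2=3.5, sd_2=4.8)
required_n = n_per_group(d)
print(f"d = {d:.2f}, about {required_n} per group")
```

Under these assumptions d comes out near .65 and the estimate is about 38 per group. Because this rests on a single prior study, a more conservative (smaller) effect size may be worth considering.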

Example #3

I am interested in examining the relationship between western advertising and beer consumption in rural India. What sample size do I need to compare two towns (one that has had advertising and one that has no advertising)?

Example #4

I am interested in examining the correlation between the amount of money spent purchasing lottery tickets and the scores on a measure of problem gambling. The literature estimates that I would expect a correlation of about .40. What sample size do I need to detect this correlation?
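For a correlation, one common approximation uses the Fisher z transformation. The sketch below is not from the original slides and swaps that approximation in for the printed tables; it assumes alpha = .05 (two-tailed) and power = .80. The function name n_for_correlation is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate total sample size to detect a correlation of size r."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    fisher_z = math.atanh(r)  # Fisher z transformation of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


required_n = n_for_correlation(0.40)
print(f"About {required_n} participants for r = .40")
```

With r = .40 this gives about 47 participants. Published tables may differ by a person or two because they use slightly different approximations.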

Example #5

I am proposing a study to examine 3 different exercise conditions and their effect on anxiety: weight lifting, running, and tai chi. In my examination of the literature, the average R² found in the studies was .10. How many people do I need to have in each group to ensure power = .80?
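Many ANOVA power tables are indexed by Cohen's f rather than by R². As a first step, the reported R² can be converted with f = sqrt(R² / (1 - R²)). The short sketch below is not from the original slides and only performs that conversion; the per-group sample size would still be read from an ANOVA table or a power calculator. The function name cohens_f_from_r_squared is hypothetical.

```python
import math


def cohens_f_from_r_squared(r_squared):
    """Convert proportion of variance explained into Cohen's f."""
    return math.sqrt(r_squared / (1 - r_squared))


f = cohens_f_from_r_squared(0.10)
print(f"R-squared = .10 corresponds to f = {f:.2f}")
```

An R² of .10 corresponds to f of about .33, which sits between Cohen's medium (.25) and large (.40) benchmarks for f.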

Multiple Regression

Many times, students want to do regressions. The reality is that multiple regression and analysis of variance are the same mathematically.
We look at R², the overall variance explained by the variables in the model.
Cohen suggests the following effect size definitions:
Small: R² < .13
Medium: R² = .13 to .26
Large: R² > .26

Multiple Regression (Continued)

The tables for multiple regression are more complicated (but not terribly difficult to use).
Tabachnick & Fidell (2001, p. 117) suggest the following “thumbrule” to compute sample size:
$N \geq \dfrac{8}{f^2} + (m - 1)$
where $f^2 = \dfrac{R^2}{1 - R^2}$ and $m$ is the number of predictor variables.

Example #6 (Multiple Regression)

If I am testing a model that has 3 predictors (anxiety, depression and number of siblings) of a dependent measure of alcohol consumption, and I conservatively estimate a small effect size, what sample size is required for this test?
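The rule of thumb N >= (8 / f²) + (m - 1) can be applied directly. The sketch below is not from the original slides. The example does not state a numeric R², so the values used here (R² = .02 for a small effect and R² = .13 for the boundary of a medium effect) are assumptions chosen for illustration. The function name min_n_for_regression is hypothetical.

```python
import math


def min_n_for_regression(r_squared, n_predictors):
    """Minimum N from the rule of thumb N >= 8 / f^2 + (m - 1)."""
    f_squared = r_squared / (1 - r_squared)
    # The small epsilon guards against floating-point noise pushing an
    # exact whole-number result up to the next integer.
    return math.ceil(8 / f_squared + (n_predictors - 1) - 1e-9)


n_small = min_n_for_regression(r_squared=0.02, n_predictors=3)
n_medium = min_n_for_regression(r_squared=0.13, n_predictors=3)
print(f"Assumed R-squared = .02: N of at least {n_small}")
print(f"Assumed R-squared = .13: N of at least {n_medium}")
```

With three predictors, the assumed small effect calls for about 394 participants, while the assumed medium effect calls for about 56. The required sample grows very quickly as the expected R² shrinks.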

Web-based Power Calculators

There are some websites that offer statistical power calculators. One example is:
The website offers options for calculating different problems.


Conclusions

Larger sample sizes are best for enhancing the ability to detect effects; however, sample sizes must be reasonable in size and cost effective.
Sensitive measures help to enhance power by ensuring that the variability is small.

Conclusions (Cont)

Sample size determination is an essential component of your study that needs to be addressed. You can use power calculators, power tables from statistical books, or information from here; you just need to be explicit about where the information came from.


Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Gravetter, F.J., & Wallnau, L.B. (2004). Statistics for the behavioral sciences (6th ed.). Belmont, CA: Thomson-Wadsworth.
Hallahan, M., & Rosenthal, R. (1996). Statistical power: Concepts, procedures, and applications. Behaviour Research and Therapy, 34, 489-499.
Lipsey, M.W., & Wilson, D.B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181-1209.
Murphy, K.R., & Myors, B. (1998). Statistical power analysis: A simple and general model for traditional and modern hypothesis tests. Hillsdale, NJ: Erlbaum.
Rossi, J. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58(5), 646-656.
Tabachnick, B.G., & Fidell, L.S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.