## Hypothesis Testing

Assignment Name: Weekly Summary 7.1

Course Name and Number: Data Analytics CBSC520

# Abstract

Following S. Christian Albright and Wayne L. Winston, this summary covers the main concepts in hypothesis testing: null and alternative hypotheses, one-tailed versus two-tailed tests, types of errors, significance level and rejection region, significance from p-values, hypothesis tests and confidence intervals, hypothesis tests for population parameters, tests for normality, and the chi-square test for independence.

# Introduction

When you make inferences about a population based on sample data, you can perform the analysis in either of two ways. You can proceed as in the previous chapter, where you calculate a point estimate of a population parameter and then form a confidence interval around this point estimate. In this way you bring no preconceived ideas to the analysis but instead let the data speak for themselves in estimating the parameter’s true value. In contrast, an analyst often has a theory, or hypothesis, that he or she would like to test. This hypothesis might be that a new packaging design will produce more sales than the current design, that a new drug will have a higher cure rate for a given disease than any drug currently on the market, that people who smoke cigarettes are more susceptible to heart disease than nonsmokers, and so on. In this case the analyst typically collects sample data and checks whether the data provide enough evidence to support the hypothesis. The hypothesis that the analyst is attempting to prove is called the alternative hypothesis. It is also frequently called the research hypothesis. The opposite of the alternative hypothesis is called the null hypothesis. It usually represents the current thinking or status quo. That is, it is usually the accepted theory that the analyst is trying to disprove. The burden of proof is on the alternative hypothesis.

# Concepts in Hypothesis Testing

Before we plunge into the details of specific hypothesis tests, it is useful to discuss the concepts behind hypothesis testing. There are several concepts and statistical terms involved, all of which lead eventually to the key concept of statistical significance. Example 9.1 provides context for the discussion of these concepts.

The manager of Pepperoni Pizza Restaurant has recently begun experimenting with a new method of baking pizzas. He would like to base the decision whether to switch from the old method to the new method on customer reactions, so he performs an experiment. For 100 randomly selected customers who order a pepperoni pizza for home delivery, he includes both an old-style pizza and a free new-style pizza. He asks the customers to rate the difference between the pizzas on a -10 to +10 scale, where -10 means that they strongly favor the old style, +10 means they strongly favor the new style, and 0 means they are indifferent between the two styles. How might he proceed by using hypothesis testing?

# Null and Alternative Hypotheses

The manager would like to prove that the new method provides better-tasting pizza, so this becomes the alternative hypothesis. The opposite, that the old-style pizzas are at least as good as the new-style pizzas, becomes the null hypothesis. He judges which of these is true based on the mean rating over the entire customer population, labeled μ. If it turns out that μ ≤ 0, the null hypothesis is true. If μ > 0, the alternative hypothesis is true. Usually, the null hypothesis is labeled H0 and the alternative hypothesis is labeled Ha. In our example, they can be specified as H0: μ ≤ 0 and Ha: μ > 0. The null and alternative hypotheses divide all possibilities into two nonoverlapping sets, exactly one of which must be true.

# One-Tailed versus Two-Tailed Tests

The form of the alternative hypothesis can be either one-tailed or two-tailed, depending on what the analyst is trying to prove. The pizza manager’s alternative hypothesis is one-tailed because he is hoping to prove that the customers’ ratings are, on average, greater than 0. The only sample results that will lead to rejection of the null hypothesis are those in one direction, namely, those where the sample mean rating is positive. On the other hand, if the manager sets up his rating scale in the reverse order, so that negative ratings favor the new-style pizza, the test is still one-tailed, but now only negative sample means lead to rejection of the null hypothesis.

In contrast, a two-tailed test is one where results in either of two directions can lead to rejection of the null hypothesis. A slight modification of the pizza example where a two-tailed alternative might be appropriate is the following. Suppose the manager currently uses two methods for producing pepperoni pizzas. He is thinking of discontinuing one of these methods if it appears that customers, on average, favor one method over the other. Therefore, he runs the same experiment as before, but now the hypotheses he tests are H0: μ = 0 versus Ha: μ ≠ 0, where μ is again the mean rating across the customer population. In this case either a large positive sample mean or a large negative sample mean will lead to rejection of the null hypothesis and presumably to discontinuing one of the production methods.

A one-tailed alternative is one that is supported only by evidence in a single direction. A two-tailed alternative is one that is supported by evidence in either of two directions. Once hypotheses are set up, it is easy to detect whether the test is one-tailed or two-tailed: one-tailed alternatives are phrased in terms of “>” or “<”, whereas two-tailed alternatives are phrased in terms of “≠”. In the original pizza example, Ha: μ > 0 is therefore one-tailed.

# Types of Errors

Regardless of whether the manager decides to accept or reject the null hypothesis, it might be the wrong decision. He might incorrectly reject the null hypothesis when it is true, or he might incorrectly accept the null hypothesis when it is false; these are respectively called type I and type II errors. A type I error occurs when you incorrectly reject a null hypothesis that is true. A type II error occurs when you incorrectly accept a null hypothesis that is false. The traditional hypothesis-testing procedure favors caution in terms of rejecting the null hypothesis. Given this rather conservative way of thinking, you are inclined to accept the null hypothesis unless the sample evidence provides strong support for the alternative hypothesis.

# Significance Level and Rejection Region

To decide how strong the evidence in favor of the alternative hypothesis must be to reject the null hypothesis, one approach is to prescribe the probability of a type I error that you are willing to tolerate. This type I error probability is usually denoted by α and is most commonly set equal to 0.05. The value of α is called the significance level of the test. The rejection region is the set of sample data that leads to the rejection of the null hypothesis. The significance level, α, determines the size of the rejection region. Sample results in the rejection region are called statistically significant at the α level. It is important to understand the effect of varying α. If α is small, such as 0.01, the probability of a type I error is small, and a lot of sample evidence in favor of the alternative hypothesis is required before the null hypothesis can be rejected. When α is larger, such as 0.10, the rejection region is larger, and it is easier to reject the null hypothesis.
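As a rough illustration of how α sets the size of the rejection region, the sketch below computes the right-tail cutoff for a one-tailed test. It assumes a standard normal reference distribution (a large-sample approximation chosen here for simplicity), and the function name `critical_value` is hypothetical.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Right-tail cutoff: reject H0 when the standardized sample mean exceeds it.

    Assumes a standard normal reference distribution (large-sample approximation).
    """
    return NormalDist().inv_cdf(1 - alpha)

# A larger alpha gives a smaller cutoff, i.e. a larger rejection region.
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: reject H0 if z > {critical_value(alpha):.3f}")
```

For small samples a t-based cutoff would be somewhat larger than the normal one used here.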

# Significance from p-values

A second approach is to avoid the use of a significance level and instead simply report how significant the sample evidence is. This approach is currently more popular. It is done by means of a p-value. The p-value is the probability of seeing a random sample at least as extreme as the observed sample, given that the null hypothesis is true. The smaller the p-value, the more evidence there is in favor of the alternative hypothesis. Sample evidence is statistically significant at the α level only if the p-value is less than α. The advantage of the p-value approach is that you don’t have to choose a significance level α ahead of time, and p-values are included in virtually all statistical software output.
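The link between a test statistic and its p-value can be sketched as follows. This is a minimal example that assumes the test statistic is approximately standard normal (reasonable for large samples); the function name `p_value` and the value 2.1 are made up for illustration.

```python
from statistics import NormalDist

def p_value(z, tail="right"):
    """Probability of a result at least as extreme as z when H0 is true.

    Assumes z is (approximately) standard normal under the null hypothesis.
    """
    std_normal = NormalDist()
    if tail == "right":   # Ha: mu > mu0
        return 1 - std_normal.cdf(z)
    if tail == "left":    # Ha: mu < mu0
        return std_normal.cdf(z)
    if tail == "two":     # Ha: mu != mu0
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left', or 'two'")

alpha = 0.05
p = p_value(2.1, tail="right")  # hypothetical standardized sample mean
print(f"p-value = {p:.4f}; significant at alpha = {alpha}: {p < alpha}")
```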

# Type II Errors and Power

A type II error occurs when the alternative hypothesis is true but there isn’t enough evidence in the sample to reject the null hypothesis. This type of error is traditionally considered less important than a type I error, but it can lead to serious consequences in real situations. For example, in medical trials on a proposed new cancer drug, a type II error occurs if the new drug is superior to existing drugs, but experimental evidence is not sufficiently conclusive to warrant marketing the new drug. For patients suffering from cancer, this is obviously a serious error. As stated previously, the alternative hypothesis is typically the hypothesis a researcher wants to prove. If it is in fact true, the researcher wants to be able to reject the null hypothesis and hence avoid a type II error. The probability of doing so is called the power of the test; that is, the power is one minus the probability of a type II error, or equivalently the probability of rejecting the null hypothesis when the alternative hypothesis is true. There are several ways to achieve high power, the most obvious of which is to increase sample size. By sampling more members of the population, you are better able to see whether the alternative is true and hence avoid a type II error if the alternative is indeed true. As with confidence intervals, there are formulas that specify the sample size required to achieve a certain power for a given set of hypotheses. They are not pursued in this summary, but you should be aware that they exist.
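To see how sample size drives power, here is a minimal sketch for a right-tailed test of H0: μ ≤ μ0. It assumes the population standard deviation is known and the sample mean is normally distributed (a z test rather than the t test used elsewhere in this summary); the function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed(true_mean, sigma, n, mu0=0.0, alpha=0.05):
    """Power of a right-tailed z test: P(reject H0) when the mean is true_mean.

    Assumes a known standard deviation sigma and a normal sampling distribution.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How many standard errors the true mean lies above the null value.
    shift = (true_mean - mu0) * sqrt(n) / sigma
    return 1 - std_normal.cdf(z_crit - shift)

# Power grows with n when the alternative is true (made-up mean 0.5, sigma 4).
for n in (25, 100, 400):
    print(n, round(power_one_tailed(true_mean=0.5, sigma=4.0, n=n), 3))
```

When the true mean equals μ0, the function returns α, which matches the definition of the type I error probability.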

# Hypothesis Tests and Confidence Intervals

The results of hypothesis tests are often accompanied by confidence intervals. This provides two complementary ways to interpret the data. There is also a more formal connection between the two, at least for two-tailed tests. When using a confidence interval to perform a two-tailed hypothesis test, reject the null hypothesis if and only if the hypothesized value does not lie inside a confidence interval for the parameter.
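The equivalence between a two-tailed test and a confidence interval can be checked numerically. The sketch below uses a normal approximation for both the interval and the p-value (an assumption made here; a t-based version would follow the same logic), and the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_decisions(sample_mean, s, n, mu0, alpha=0.05):
    """Return (reject via confidence interval, reject via p-value).

    Assumes a large sample, so the normal distribution approximates the t.
    """
    std_normal = NormalDist()
    se = s / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    lower, upper = sample_mean - z_crit * se, sample_mean + z_crit * se
    reject_by_ci = not (lower <= mu0 <= upper)

    z = (sample_mean - mu0) / se
    p = 2 * (1 - std_normal.cdf(abs(z)))
    reject_by_p = p < alpha
    return reject_by_ci, reject_by_p

# Both routes give the same decision for these made-up summary statistics.
print(two_tailed_decisions(sample_mean=1.2, s=4.0, n=100, mu0=0.0))
print(two_tailed_decisions(sample_mean=0.5, s=4.0, n=100, mu0=0.0))
```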

# Practical versus Statistical Significance

Statistically significant results are those that produce sufficiently small p-values. In other words, statistically significant results are those that provide strong evidence in support of the alternative hypothesis. Such results are not necessarily significant in terms of importance. They might be significant only in the statistical sense. With large sample sizes, even a very small difference can be statistically significant without being practically significant. By contrast, with small samples, results may not be statistically significant even if they would be of practical significance.

# Hypothesis Tests for a Population Mean

As with confidence intervals, the key to the analysis is the sampling distribution of the sample mean. If you subtract the true mean from the sample mean and divide the difference by the standard error, the result has a t distribution with n – 1 degrees of freedom. In a hypothesis-testing context, the true mean to use is the null hypothesis value, specifically, the borderline value between the null and alternative hypotheses. This value is usually labeled μ0. To run the test, referred to as the t test for a population mean, you calculate the test statistic as shown below:

$$t\text{-value} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$
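The t statistic for a population mean can be computed in a few lines. This is a minimal sketch using made-up ratings in the spirit of the pizza example; the function name `one_sample_t` is hypothetical, and the p-value would then be read from a t distribution with n – 1 degrees of freedom in statistical software.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t-value, degrees of freedom) for a test of the mean against mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # s / sqrt(n), with sample std dev s
    return (mean(sample) - mu0) / standard_error, n - 1

ratings = [3, -1, 4, 0, 2, 5, -2, 1, 3, 2]  # made-up ratings on the -10..+10 scale
t_value, df = one_sample_t(ratings, mu0=0)
print(f"t-value = {t_value:.3f} with {df} degrees of freedom")
```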

# Hypothesis Tests for Other Parameters

Just as we developed confidence intervals for a variety of parameters, we can develop hypothesis tests for other parameters. In each case, the sample data are used to calculate a test statistic that has a well-known sampling distribution. Then a corresponding p-value measures the support for the alternative hypothesis.

# Hypothesis Tests for a Population Proportion

To test a population proportion p, recall that the sample proportion has a sampling distribution that is approximately normal when the sample size is reasonably large. Specifically, the distribution of the standardized value

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

is approximately normal with mean 0 and standard deviation 1.

This leads to the following z test for a population proportion. Let p0 be the borderline value of p between the null and alternative hypotheses. Then p0 is substituted for p to obtain the test statistic below:

$$z\text{-value} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
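A small worked example of the z test for a proportion, assuming a right-tailed alternative Ha: p > p0. The counts are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Return (z-value, right-tailed p-value) for H0: p <= p0 versus Ha: p > p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
    return z, 1 - NormalDist().cdf(z)

# Made-up data: 60 of 100 sampled customers prefer the new design.
z, p = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z-value = {z:.3f}, p-value = {p:.4f}")
```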

# Hypothesis Tests for Differences between Population Means

The comparison problem, where the difference between two population means is tested, is one of the most important problems analyzed with statistical methods. The form of the analysis depends on whether the two samples are independent or paired. If the samples are paired, then the test is referred to as the t test for difference between means from paired samples.

$$t\text{-value} = \frac{\bar{D} - D_0}{s_D/\sqrt{n}}$$
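For paired samples the test reduces to a one-sample t test on the pairwise differences. The sketch below illustrates this with invented scores; `paired_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second, d0=0.0):
    """Return (t-value, degrees of freedom) for the mean of second - first."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    return (mean(diffs) - d0) / (stdev(diffs) / sqrt(n)), n - 1

old_scores = [5, 6, 7, 4, 6]  # made-up ratings under the old method
new_scores = [7, 7, 9, 5, 9]  # made-up ratings from the same customers, new method
t_value, df = paired_t(old_scores, new_scores)
print(f"t-value = {t_value:.3f} with {df} degrees of freedom")
```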

If the samples are independent, the test is referred to as the t test for difference between means from independent samples. Test statistic for independent samples test of difference between means:

$$t\text{-value} = \frac{(\bar{X}_1 - \bar{X}_2) - D_0}{s_p\sqrt{1/n_1 + 1/n_2}}$$

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$
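The pooled standard deviation and the resulting t statistic can be sketched as follows, assuming equal population variances as the pooled formula requires. The data and the function name `pooled_t` are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2, d0=0.0):
    """Return (t-value, degrees of freedom) for independent samples, equal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    s_p = sqrt(pooled_var)
    t = ((mean(sample1) - mean(sample2)) - d0) / (s_p * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

group_a = [12, 15, 11, 14, 13]  # made-up measurements
group_b = [16, 14, 18, 17, 15]
t_value, df = pooled_t(group_a, group_b)
print(f"t-value = {t_value:.3f} with {df} degrees of freedom")
```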

# Hypothesis Test for Equal Population Variances

The two-sample procedure for a difference between population means depends on whether population variances are equal. Therefore, it is natural to test first for equal variances. This test is referred to as the F test for equality of two variances. The test statistic for this test is the ratio of sample variances:

$$F\text{-value} = \frac{s_1^2}{s_2^2}$$

The null hypothesis is that this ratio is 1 (equal variances), whereas the alternative is that it is not 1 (unequal variances). If the population variances are equal, this test statistic has an F distribution with n1 – 1 and n2 – 1 degrees of freedom.
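Computing the F statistic itself is straightforward, as this short sketch with invented data shows. The helper name is hypothetical, and the p-value would come from an F distribution in statistical software, which is not reproduced here.

```python
from statistics import variance

def f_statistic(sample1, sample2):
    """Return (F-value, numerator df, denominator df) for the ratio of sample variances."""
    f_value = variance(sample1) / variance(sample2)
    return f_value, len(sample1) - 1, len(sample2) - 1

f_value, df1, df2 = f_statistic([2, 4, 6, 9, 3], [5, 6, 5, 7, 6])  # made-up samples
print(f"F-value = {f_value:.3f} with ({df1}, {df2}) degrees of freedom")
```

A value far from 1 in either direction is evidence against equal variances.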

# Hypothesis Tests for Differences between Population Proportions

One of the most common uses of hypothesis testing is to test whether two population proportions are equal. The following z test for difference between proportions can then be used. As usual, the test on the difference between the two values requires a standard error. Standard error for difference between sample proportions:

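$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}_c(1-\hat{p}_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Here $\hat{p}_c$ is the pooled sample proportion obtained by combining the two samples.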

Resulting test statistic for difference between proportions:

$$z\text{-value} = \frac{\hat{p}_1 - \hat{p}_2}{SE(\hat{p}_1 - \hat{p}_2)}$$
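The two-proportion z test can be sketched as below, using the pooled proportion from both samples in the standard error and a two-tailed alternative. Counts and the function name are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z-value, two-tailed p-value) for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled proportion across both samples
    se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Made-up data: 60 of 100 respond in group 1, 50 of 100 in group 2.
z, p = two_proportion_z_test(60, 100, 50, 100)
print(f"z-value = {z:.3f}, p-value = {p:.4f}")
```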

# Tests for Normality

Many statistical procedures assume that population data are normally distributed. The tests that allow you to test this assumption are called tests for normality. The first test is called a chi-square goodness-of-fit test. A histogram of the sample data is compared to the expected bell-shaped histogram that would be observed if the data were normally distributed with the same mean and standard deviation as in the sample. If the two histograms are sufficiently similar, the null hypothesis of normality is accepted. The goodness-of-fit measure in the equation below is used as a test statistic.

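$$\text{Chi-square value} = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Here $O_i$ is the observed count in histogram category $i$ and $E_i$ is the count expected in that category under normality.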
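The goodness-of-fit comparison described above can be sketched as follows: observed counts in each histogram category are compared with the counts expected under a normal distribution with the sample's mean and standard deviation. The category boundaries, data, and function name are all assumptions chosen for illustration.

```python
from statistics import NormalDist, mean, stdev

def chi_square_normality(data, edges):
    """Sum of (observed - expected)^2 / expected over the histogram categories.

    Categories are (-inf, e1], (e1, e2], ..., (ek, +inf) for the given edges.
    """
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))  # normal with the sample's mean and sd
    bounds = [float("-inf")] + list(edges) + [float("inf")]
    chi2 = 0.0
    for low, high in zip(bounds, bounds[1:]):
        observed = sum(1 for x in data if low < x <= high)
        expected = n * (fitted.cdf(high) - fitted.cdf(low))
        chi2 += (observed - expected) ** 2 / expected
    return chi2

sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3, 2, 4]  # made-up data
print(f"chi-square value = {chi_square_normality(sample, edges=[2, 3, 4]):.3f}")
```

A small value means the two histograms are similar; large values count as evidence against normality.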

# Chi-Square Test for Independence

The chi-square test for independence is used in situations where a population is categorized in two different ways. For example, people might be characterized by their smoking habits and their drinking habits. The question then is whether these two attributes are independent in a probabilistic sense. They are independent if information on a person’s drinking habits is of no use in predicting the person’s smoking habits (and vice versa). The null hypothesis for this test is that the two attributes are independent. This test is based on the counts in a contingency (or cross-tabs) table. It tests whether the row variable is probabilistically independent of the column variable.
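A minimal sketch of the test on a contingency table: expected counts under independence are (row total × column total) / grand total, and the statistic sums the squared deviations over all cells. The table and function names are hypothetical, and the p-value helper shown is valid only for 2×2 tables (1 degree of freedom).

```python
from math import erfc, sqrt

def chi_square_independence(table):
    """Return (chi-square value, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (len(table[0]) - 1)

def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square with 1 degree of freedom (2x2 tables only)."""
    return erfc(sqrt(chi2 / 2))

# Made-up counts: rows = smoker / nonsmoker, columns = drinker / nondrinker.
counts = [[30, 20], [20, 30]]
chi2, df = chi_square_independence(counts)
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value_df1(chi2):.4f}")
```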

# References

S. Christian Albright/Wayne L. Winston (2017). Business Analytics – Data Analysis and Decision Making. Cengage Learning

ISBN: 978-1-305-94754-2
