# CBSC520.17

Weekly Summary 6.1

This week we covered two chapters: one on Sampling and Sampling Distribution and the other on Confidence Interval Estimation. This summary reviews the main concepts from each chapter.

In Chapter 7, Sampling and Sampling Distribution, we saw how different types of sampling schemes are used and how samples drawn from a population let us infer the behavior of that population. Before we get into sampling, we review the terms it uses. A population is the entire set of items to be studied, as in a census. A frame is a list of all items in the population. The members of the population that can be selected for a sample are called sampling units. A probability sample is a sample in which the sampling units are chosen randomly; we saw this kind of selection using the random function in Excel. In judgmental sampling, by contrast, the sampling units are chosen according to the judgment of the person doing the sampling (Albright & Winston, 2017).

Sampling schemes cost the company money; some are cheap and some are costly, but the costly ones generally give more accurate information. The simplest scheme is simple random sampling. Here the sample size is n, and every member of the population has the same probability of being picked. Simple random samples are easy to analyze because their statistical properties are simple. However, this scheme is not used much in the real world, precisely because every member has the same chance of being picked. Consider picking candidates for personal interviews: the selected candidates can be spread out over a large geographical region, which makes the sample expensive to collect (Albright & Winston, 2017).

We saw different methods of choosing random samples. One method uses the =RAND() function in Excel; we solved problem P07_07 in Lab 6, where we picked random samples in Excel. Another method is systematic sampling. Here the population is divided into as many blocks as the sample size, so each block holds (population size)/(sample size) members. A random number k between 1 and the block size is chosen, and the k-th member of each block is selected. A third method is stratified sampling. In this method, subpopulations within the population are identified; these subpopulations are called strata. A simple random sample is then selected from each stratum. Next is cluster sampling, in which the population is divided into clusters and a random sample of clusters is selected. Examples of clusters are cities or city blocks. We also saw multistage sampling schemes, which are like cluster sampling but carried out in several stages. Using the geography example, a number n of locations is selected for the sample in the first stage. From those locations, random city blocks or other areas are selected in the second stage. Systematic sampling is then done in the next stage (Albright & Winston, 2017).

Fig. 1. Random Sampling Solution, (Source: Lab 6).
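The sampling schemes described above can also be sketched outside Excel. The following is a minimal Python illustration, not the Lab 6 solution: the function names, the frame of 100 numbered units, and the two strata are all hypothetical, and block positions are counted from 0 rather than 1.

```python
import random

def simple_random_sample(population, n, rng):
    """Simple random sampling: every member has the same chance of being picked."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Systematic sampling: random position k in the first block, then the same position in every block."""
    block = len(population) // n      # block size = population size / sample size
    k = rng.randrange(block)          # random 0-based position within a block
    return [population[k + i * block] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: a simple random sample from each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

rng = random.Random(42)               # fixed seed so the sketch is repeatable
population = list(range(1, 101))      # hypothetical frame of 100 numbered units

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strata = {"north": population[:50], "south": population[50:]}  # hypothetical strata
strat = stratified_sample(strata, 5, rng)
```

In the systematic sample, consecutive selections are always exactly one block (10 units) apart, which is what distinguishes it from the simple random sample.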

Next is estimation, which is the reason we sample: we use a sample to estimate properties of the population. There are two sources of error in estimation: sampling error and non-sampling error. Sampling error arises because a sample is only part of the population, so properties estimated from the sample generally differ from those of the entire population. Non-sampling error can have several causes: some of the selected people did not take the survey, responses were not truthful, questions were poorly worded, and so on (Albright & Winston, 2017).

We then learned a few terms that are important in sampling. One is the point estimate: a single value, computed from a random sample, that approximates a population parameter. The point estimate tends to be more accurate when the sample size is large (Britannica, 2016).

Another term is sampling error, also called estimation error: the difference between the point estimate (the best guess) and the true value of the population parameter being estimated. Next is the sampling distribution: the distribution of the point estimates obtained from all possible samples of a given size (Albright & Winston, 2017).

Next is the confidence interval: “A Confidence Interval is a range of values we are fairly sure our true value lies in. It is based on Mean and Standard Deviation.” (“Confidence Interval”, n.d.). We use the Z table to find the z-value that corresponds to a chosen confidence level.

Next we learned the Central Limit Theorem (CLT). This theorem states that the sampling distribution of the sample mean is approximately normal when the sample size is large, and the approximation improves as the sample size increases. In other words, when you sum or average the values in a random sample from almost any type of distribution, the result is approximately normally distributed if the sample size is large (Albright & Winston, 2017).
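A small simulation can make the CLT concrete. The sketch below is an illustration under assumed settings, not something from the textbook: it draws samples of size 50 from a strongly skewed exponential population (mean 1, standard deviation 1) and checks that the sample means behave roughly like a normal distribution with mean 1 and standard deviation 1/sqrt(50).

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_means(n, reps):
    """Return `reps` sample means, each computed from n draws of an exponential(1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 50
means = sample_means(n=n, reps=2000)

center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to 1 / sqrt(50), about 0.141

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this share is about 95%.
standard_error = 1 / n ** 0.5
within = sum(abs(m - 1) <= 1.96 * standard_error for m in means) / len(means)
```

Even though single exponential draws are far from normal, the share of sample means inside the 1.96-standard-error band comes out near 95%, as the theorem suggests.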

The next chapter we learned is Confidence Interval Estimation. “Statistical inference is the process through which inferences about a population are made based on certain statistics calculated from a sample of data drawn from that population” (“Statistical Inference”, 2012).

Statistical inferences are of two types. Confidence interval estimation produces a point estimate and an interval formed by adding and subtracting a margin around that point estimate. Hypothesis testing determines whether the data support a particular hypothesis (Albright & Winston, 2017).

Most confidence intervals have the form:

$$\text{point estimate} \pm \text{multiple} \times \text{standard error}$$

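The central limit theorem can be expressed through the standardized quantity:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

Here $\bar{X}$ is the sample mean, $\mu$ is the population mean, $\sigma$ is the population standard deviation, and n is the sample size. Z is the standardized quantity, which has a normal distribution with mean 0 and standard deviation 1 (Albright & Winston, 2017). We also have a table for finding a Z value.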

Most of the time we do not have the population data, so the population standard deviation is unknown. We therefore substitute the sample standard deviation into the formula for Z. Once this change is made, the standardized quantity is no longer normally distributed; it follows a new sampling distribution called the t-distribution (Albright & Winston, 2017).

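The formula for the t-value is:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

where $\bar{X}$ is the sample mean, $\mu$ is the population mean, s is the sample standard deviation, and n is the sample size. This t-value is the number of standard errors by which the sample mean differs from the population mean (Albright & Winston, 2017).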

The t-distribution curve is similar to the normal distribution curve. Figure 2 below shows t-distribution curves:

Fig. 2. t-distribution, (Source: Albright & Winston, 2017).

The figure above shows that the t-distribution is bell-shaped and centered at 0. The degrees of freedom determine how spread out the curve is. The degrees of freedom are determined by the sample size n and equal n - 1. The smaller the sample size, the smaller the degrees of freedom and the more spread out the curve. The larger the sample size, the larger the degrees of freedom and the less spread out the curve. The figure shows two curves: one with a small number of degrees of freedom, 5, and one with a large number, 30 (Albright & Winston, 2017).
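The effect of the degrees of freedom on the spread can be checked with a rough simulation. The Python below is a sketch under assumed settings (a standard normal population, 4000 repetitions, and a cutoff of 1.96), with a hypothetical helper name. It computes t-values from many small samples and measures how often they fall beyond the cutoff.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def tail_fraction(n, reps=4000, cutoff=1.96):
    """Share of simulated t-values with |t| > cutoff, for samples of size n from N(0, 1)."""
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # t = (sample mean - population mean) / (s / sqrt(n)), with population mean 0
        t = statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > cutoff:
            count += 1
    return count / reps

tail_df5 = tail_fraction(n=6)     # 5 degrees of freedom
tail_df30 = tail_fraction(n=31)   # 30 degrees of freedom
```

For a normal curve about 5% of values lie beyond 1.96. The simulated share is noticeably larger with 5 degrees of freedom than with 30, which matches the wider curve in the figure.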

We learned confidence intervals for the mean and for the standard deviation, and we solved confidence interval problems in Lab 6. First is the confidence interval for the mean. To find it, we first select a confidence level, usually 90%, 95%, or 99%. We then use the sampling distribution of the point estimate to find the appropriate multiple of the standard error (SE). This multiple of the SE is subtracted from and added to the point estimate to obtain the lower and upper limits of the confidence interval.

Figure 3 below is an example of the solution from Lab 6, where we found the confidence interval for the mean and then for the standard deviation:

Fig. 3. Confidence Interval of the Mean and the Standard Deviation, (Source: Lab 6).
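The steps for the confidence interval for the mean can be sketched in a few lines of Python. This is only an illustration, not the Lab 6 workbook: the data values and the function name are made up, and the t-multiple is passed in as a number read from a t table (2.262 for 95% confidence with 9 degrees of freedom) because the Python standard library has no t-distribution function.

```python
import math
import statistics

def mean_confidence_interval(sample, t_multiple):
    """Point estimate plus or minus t_multiple times the standard error of the mean."""
    n = len(sample)
    xbar = statistics.fmean(sample)                 # point estimate
    se = statistics.stdev(sample) / math.sqrt(n)    # standard error = s / sqrt(n)
    return xbar - t_multiple * se, xbar + t_multiple * se

# Hypothetical sample of 10 measurements, so there are 9 degrees of freedom.
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 12.0]
lower, upper = mean_confidence_interval(data, t_multiple=2.262)
```

A higher confidence level means a larger t-multiple and therefore a wider interval around the same sample mean.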

Next is the confidence interval for a proportion. Surveys are a good example: they are typically used to estimate a proportion, so a confidence interval for the population proportion is needed. To build it, we need the point estimate, the standard error of this point estimate, and a multiple that depends on the confidence level (Albright & Winston, 2017).

“The sample proportion is calculated directly from the sample data with a COUNTIF function. Note the formula for the z multiple in row 6. The argument for the NORM.S.INV function is 95% + 5%/2, or 97.5%” (Albright & Winston, 2017).

Figure 4 below shows how to find the confidence interval for a proportion in Excel:

Fig. 4. Confidence Interval for Proportion, (Source: Albright & Winston, 2017).
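The same calculation can be sketched in Python. This assumes the usual large-sample interval, the sample proportion plus or minus a z-multiple times sqrt(p(1 - p)/n), with made-up survey counts and a hypothetical function name. `NormalDist().inv_cdf` plays the role that NORM.S.INV plays in Excel.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                                          # point estimate
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)    # 0.975 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)                        # standard error
    return p_hat - z * se, p_hat + z * se

# Hypothetical survey: 120 of 400 respondents answered "yes".
lower, upper = proportion_confidence_interval(successes=120, n=400)
```

The argument passed to `inv_cdf` is 95% + 5%/2 = 97.5%, the same value quoted above for the Excel formula.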

Next is the confidence interval for the standard deviation. To find the CI for the SD, we first find chi-square distribution values. The sampling distribution of the sample standard deviation s is not symmetric. It is right-skewed and is based on the chi-square distribution (Albright & Winston, 2017). We found the confidence interval for the SD in Lab 6. Figure 3 above shows how to calculate the CI for the standard deviation in Excel. The formulas are also shown in Figure 5 below.

Fig. 5. Confidence Interval for the Standard Deviation, (Source: Albright & Winston, 2017).
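For reference, the standard chi-square interval for a standard deviation can be written out as follows. This is the general textbook form rather than a transcription of the figure. It assumes a normally distributed population, and the notation $\chi^2_{p,\,n-1}$ (the value with area p to its left under the chi-square curve with n - 1 degrees of freedom) is introduced here for illustration.

```latex
% Assumes a normal population; s is the sample standard deviation,
% n the sample size, and 1 - \alpha the confidence level.
\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
\sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}}
\;\le\; \sigma \;\le\;
\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}}
```

Because the chi-square distribution is right-skewed, the two limits are not the same distance from s, unlike the intervals for the mean or a proportion.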

Next is the confidence interval for the difference between means, an important and widely used statistical tool for comparing two population means (Albright & Winston, 2017). Figure 6 below shows its calculation in Excel using formulas:

Fig. 6. Confidence Interval for the difference between means, (Source: Albright & Winston, 2017).
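One common form of this interval is sketched below. It assumes two independent samples whose populations have equal variances, so a pooled standard deviation $s_p$ is used. Whether the figure uses exactly this version is an assumption, and the symbols are introduced here for illustration.

```latex
% Assumption: independent samples with equal population variances (pooled estimate).
% \bar{X}_i, s_i, n_i are the mean, standard deviation, and size of sample i;
% the t-multiple has n_1 + n_2 - 2 degrees of freedom.
s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}},
\qquad
\bar{X}_1 - \bar{X}_2 \;\pm\; t_{n_1+n_2-2} \times s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}
```

If the resulting interval does not contain 0, the data suggest that the two population means differ.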

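Next we learned the confidence interval for the difference between proportions. This analysis is similar to the confidence interval for the difference between means above, and it is used to compare two population proportions. The formula for the confidence interval for the difference between two population proportions is given by:

$$\hat{p}_1 - \hat{p}_2 \pm z\text{-multiple} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

where $\hat{p}_1$ and $\hat{p}_2$ are the two sample proportions and $n_1$ and $n_2$ are the two sample sizes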

(Albright & Winston, 2017).
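As a rough numerical sketch of this interval, the Python below uses the usual large-sample form: the difference between the two sample proportions, plus or minus a z-multiple times the standard error sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2). The counts and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for the difference p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)           # z-multiple
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)      # standard error of the difference
    diff = p1 - p2                                               # point estimate
    return diff - z * se, diff + z * se

# Hypothetical surveys: 120 of 400 "yes" in group 1, 90 of 400 "yes" in group 2.
lower, upper = diff_proportions_ci(x1=120, n1=400, x2=90, n2=400)
```

With these made-up counts the whole interval lies above 0, which would suggest that the first population proportion is the larger one.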

Next we learned how to choose the sample size. This is important because the sample size determines how narrow the confidence interval is. For a given confidence level, a larger sample gives a narrower confidence interval, and a higher confidence level requires a larger sample to keep the interval equally narrow. The sample size should therefore be chosen so that the confidence interval is as narrow as we need it to be (Albright & Winston, 2017).

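Sample Size Selection for the Estimation of the Mean: The right sample size for the estimation of the mean follows from the half-length B of the confidence interval for the mean:

$$B = t\text{-multiple} \times \frac{s}{\sqrt{n}}$$

where s is the sample standard deviation and n is the sample size.

However, since the sample size is selected before the sample is taken, s is not available yet. So s is replaced by an estimate of the population standard deviation, $\sigma_{\text{est}}$, and the t-multiple is replaced by the z-multiple. Solving for n results in the formula below:

$$n = \left(\frac{z\text{-multiple} \times \sigma_{\text{est}}}{B}\right)^2$$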

(Albright & Winston, 2017).
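The sample size calculation can be sketched in Python as follows. It assumes the formula n = (z-multiple × estimated standard deviation / half-length)², rounded up to a whole number. The estimated standard deviation of 15 and the half-lengths are made-up inputs, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma_est, half_length, confidence=0.95):
    """Smallest n whose confidence interval half-length is at most `half_length`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z-multiple, about 1.96 for 95%
    return math.ceil((z * sigma_est / half_length) ** 2)

n_required = sample_size_for_mean(sigma_est=15.0, half_length=2.0)
n_tighter = sample_size_for_mean(sigma_est=15.0, half_length=1.0)
```

Halving the desired half-length roughly quadruples the required sample size, which is why very narrow intervals are expensive to obtain.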

Thank you.

References

Albright, S. C., & Winston, W. L. (2017). Business analytics: Data analysis & decision making. Mason, OH: Cengage Learning.

Britannica, T. (2016, April 22). Point estimation. Retrieved April 14, 2019, from https://www.britannica.com/science/point-estimation

Confidence Interval. (n.d.). Retrieved April 14, 2019, from https://www.mathsisfun.com/data/confidence-interval.html

Statistical Inference. (2012). Retrieved April 14, 2019, from https://www.sciencedirect.com/topics/neuroscience/statistical-inference
