Week 2 Discussion

1.    Discuss the advantages of constructing a relative frequency distribution as opposed to a frequency distribution.

A frequency distribution gives the number of observations (counts) that fall in each category. The frequencies can be any whole number from 0 upward. A relative frequency distribution, on the other hand, gives the proportion of observations that fall in each category. Each proportion is between 0 and 1, and the proportions sum to 1. Because proportions do not depend on the total number of observations, a relative frequency distribution makes it easier to see each category's share of the data and to compare data sets of different sizes.
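As a small illustration of the difference, the sketch below builds both distributions from the same list. The survey responses and the function names are made up for this example and are not part of the original question.

```python
from collections import Counter

# Hypothetical survey responses (made-up data for illustration only).
responses = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]


def frequency_distribution(values):
    """Count how many observations fall in each category."""
    return dict(Counter(values))


def relative_frequency_distribution(values):
    """Divide each category count by the total number of observations."""
    total = len(values)
    return {category: count / total for category, count in Counter(values).items()}


freq = frequency_distribution(responses)
rel_freq = relative_frequency_distribution(responses)

for category in sorted(freq):
    print(f"{category}: count={freq[category]}, proportion={rel_freq[category]:.2f}")
```

The counts depend on how many responses were collected, while the proportions always sum to 1, so they can be compared across samples of different sizes.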

2.    What are the characteristics of a data set that would lead you to construct a bar chart?

A bar chart is a graph that displays categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. Bar charts are useful for showing data in nominal and ordinal categories. Nominal data are classified by qualitative labels with no inherent order, such as birth county or college subject. Ordinal data are similar, but the categories can also be ranked.

3.    What are the characteristics of a data set that would lead you to construct a pie chart?

Pie charts are generally used to display percentage or proportional information, with the percentage represented by each class shown next to the corresponding slice. They are best suited to about six or fewer classes. With more classes, it is hard for the eye to distinguish the relative sizes of the sectors, and the chart becomes difficult to interpret.

4.    Discuss the differences in data that would lead you to construct a line chart as opposed to a scatter plot.

Line charts show continuous data over time against a common scale, and are therefore suitable for showing trends in data recorded at equal intervals. In a line chart, the category data (typically time periods) are evenly spaced along the horizontal axis, and the values are plotted on an evenly spaced scale along the vertical axis.

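Scatter plots are widely used to display and compare numerical values, such as scientific, statistical, and engineering data. They are useful for showing the relationship between two numerical variables, with each pair of values plotted as a single point of xy coordinates. In short, a line chart fits when one variable is time or another ordered sequence, while a scatter plot fits when both variables are numerical and the interest is in how they relate.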

5.    Consider the following questions concerning the sample variance:

a.     Is it possible for a variance to be negative? Explain.

Variance is the average squared deviation from the mean. 

To calculate the variance, find the difference between each number in the data set and the mean of the data set. Then square each of the differences. Then average the squared differences; for a sample variance, their sum is divided by n − 1 rather than n. Because the squared deviations are all positive numbers or zero, their sum cannot be negative, and dividing by a positive number keeps the result non-negative. Therefore, the variance cannot be negative.
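The steps above can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual n − 1 divisor for a sample; the data values and the function name `sample_variance` are made up for the example.

```python
import statistics


def sample_variance(data):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(data) / n
    # Every squared deviation is >= 0, so their sum cannot be negative.
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)


data = [4, 8, 6, 5, 7]    # hypothetical sample
constant = [3, 3, 3, 3]   # hypothetical sample in which every value is equal

print(sample_variance(data))       # 2.5
print(sample_variance(constant))   # 0.0
print(statistics.variance(data))   # standard-library result for comparison
```

Since every term in the sum is a square, no choice of data can make the result negative, and the result is zero only when every value equals the mean.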

b.   What is the smallest value a variance can be? Under what conditions does the variance equal this smallest value?

The smallest value a variance can take is exactly zero. This happens when all the numbers in the data set are the same: all the deviations from the mean are zero, so all the squared deviations are zero and the variance is also zero.

If there are at least two numbers in a data set which are not equal, variance must be greater than zero.

c.    Under what conditions is the sample variance smaller than the corresponding sample standard deviation?

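The sample variance is numerically smaller than the standard deviation when the variance is greater than 0 and less than 1, because the square root of a number between 0 and 1 is larger than the number itself. However, comparing the sizes of the variance and the standard deviation is not meaningful, because they are measured in different units: the variance is in squared units of the data.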

For example, if the variance is s² = 0.25, then the standard deviation is s = √0.25 = 0.5, which is numerically larger than the variance.
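The numerical comparison can be checked for a few values. The sketch below is only an illustration; the helper name `compare` and the chosen variances are assumptions made for the example.

```python
import math


def compare(variance):
    """Return the standard deviation and whether the variance is numerically smaller."""
    sd = math.sqrt(variance)
    return sd, variance < sd


# Hypothetical variances: zero, between 0 and 1, exactly 1, and above 1.
for s2 in [0.0, 0.25, 1.0, 4.0]:
    sd, smaller = compare(s2)
    print(f"variance={s2}, sd={sd}, variance < sd: {smaller}")
```

Only the value strictly between 0 and 1 gives a variance that is numerically smaller than its standard deviation; at 0 and at 1 the two are equal.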

6.    For a continuous variable that has a bell-shaped distribution, determine the percentiles associated with the endpoints of the intervals specified in the Empirical Rule.

The Empirical Rule, also referred to as the three-sigma rule or the 68-95-99.7 rule, is a statistical rule which states that for a normal distribution, almost all data fall within three standard deviations of the mean. Broken down, the rule says that about 68% of the data fall within one standard deviation of the mean (µ ± σ), about 95% within two standard deviations (µ ± 2σ), and about 99.7% within three standard deviations (µ ± 3σ).

Because a bell-shaped distribution is symmetric, the percentage outside each interval is split equally between the two tails, which gives the percentiles of the endpoints:

  • µ − σ and µ + σ: about the 16th and 84th percentiles
  • µ − 2σ and µ + 2σ: about the 2.5th and 97.5th percentiles
  • µ − 3σ and µ + 3σ: about the 0.15th and 99.85th percentiles

7.    Since the standard deviation of a set of data requires more effort to compute than the range does, what advantages does the standard deviation have when discussing the spread in a set of data?

Advantages of Standard Deviation over Range:

1. The standard deviation uses all the data points in a data set, unlike the range, which uses only the extreme values (the maximum and minimum of the data set).

2. The standard deviation is less affected by a single outlier than the range, which is determined entirely by the two extreme values. It is still not fully resistant to outliers, because every deviation is squared.
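To illustrate the first point, the sketch below compares two made-up samples that share the same minimum and maximum. The data and the helper name `value_range` are assumptions for this example only.

```python
import statistics


def value_range(data):
    """Range: the difference between the largest and smallest values."""
    return max(data) - min(data)


# Two hypothetical samples with the same minimum (0) and maximum (10).
clustered = [0, 5, 5, 5, 5, 5, 10]     # most values sit at the mean
spread_out = [0, 0, 0, 5, 10, 10, 10]  # most values sit at the extremes

for name, sample in [("clustered", clustered), ("spread_out", spread_out)]:
    print(name, "range:", value_range(sample),
          "stdev:", round(statistics.stdev(sample), 3))
```

The range is identical for both samples because it looks only at the two extremes, while the standard deviation is larger for the second sample because it reflects how far every value lies from the mean.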

8.    The mode seems like a very simple measure of the location of a distribution. When would the mode be preferred over the median or the mean?

The mode is preferred over the median or the mean when we are interested in which value occurs most frequently, rather than in the average value of the data set or its middle observation.

For example, suppose the manager of a shoe company wants to know which size of shoe sells the most. In this case, the mode is ideal and is preferred over the mean or the median.
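The shoe example can be sketched with Python's standard `statistics` module. The sales figures below are made up purely for illustration.

```python
import statistics

# Hypothetical shoe sizes sold in one day (made-up data).
sizes_sold = [8, 9, 9, 10, 9, 8, 11, 9, 10, 8]

most_sold = statistics.mode(sizes_sold)       # the size that occurs most often
average_size = statistics.mean(sizes_sold)    # need not be a size that exists
middle_size = statistics.median(sizes_sold)   # the middle of the sorted sales

print("mode:", most_sold)
print("mean:", average_size)
print("median:", middle_size)
```

Here the mean comes out to a value that is not an actual shoe size, whereas the mode directly names the size to stock most heavily.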
