Lab 1

University of the Potomac

CBSC520 Data Analytics

INTRODUCTION

This lab focuses mainly on distinguishing categorical variables, and on telling nominal variables from ordinal ones. It counts the categories of each such variable in the data provided in Chapter 02, imports the spreadsheets to support the calculations, recodes the variables as each question requires, and draws a conclusion from each calculation.

Lab 1

The file P02_03.xlsx contains data from a survey of 399 people regarding a government environmental policy.

Which of the variables in this data set are categorical? Which of these are nominal; which are ordinal?

Gender, State, Age, and Opinion are categorical. Gender and State are nominal; Age and Opinion are ordinal.

For each categorical variable, create a column chart of counts.

For category “Age”

Age Count
Young 87
Middle-aged 218
Elderly 94

For category “Gender”

Gender Count
1 165
2 234

For category “State”

State Count
Arizona 33
California 33
Florida 41
Illinois 38
Michigan 46
Minnesota 31
New York 46
Ohio 44
Texas 48
Virginia 39

For category “Opinion”

Opinion Count
Agree 82
Strongly Agree 86
Neutral 68
Disagree 85
Strongly Disagree 78
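
As a quick consistency check on the four count tables above, each table should add up to the 399 survey respondents. The sketch below simply re-enters the counts reported in this lab; the variable names (`survey_size`, `counts`, `totals`) are illustrative and are not part of the original spreadsheet.

```python
# Counts copied from the tables reported above (not re-derived from P02_03.xlsx).
survey_size = 399

counts = {
    "Age": {"Young": 87, "Middle-aged": 218, "Elderly": 94},
    "Gender": {"1": 165, "2": 234},
    "State": {
        "Arizona": 33, "California": 33, "Florida": 41, "Illinois": 38,
        "Michigan": 46, "Minnesota": 31, "New York": 46, "Ohio": 44,
        "Texas": 48, "Virginia": 39,
    },
    "Opinion": {
        "Agree": 82, "Strongly Agree": 86, "Neutral": 68,
        "Disagree": 85, "Strongly Disagree": 78,
    },
}

# Sum the category counts of each variable.
totals = {variable: sum(table.values()) for variable, table in counts.items()}

for variable, total in totals.items():
    print(variable, total, "matches" if total == survey_size else "does not match")
```

If every variable reports a match, no respondent was dropped or counted twice when the counts were tabulated.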

Recode the data into a new data set, making four transformations: (1) change Gender to list “Male” or “Female”; (2) change Children to list “No children” or “At least one child”; (3) change Salary to be categorical with categories “Less than $40K,” “Between $40K and $70K,” “Between $70K and $100K,” and “Greater than $100K” (where you can treat the breakpoints however you like); and (4) change Opinion to be a numerical code from 1 to 5 for Strongly Disagree to Strongly Agree. Then create a column chart of counts for the new Salary variable.

Recoded data

From the original data, gender “1” is assumed to be “Male” and gender “2” to be “Female”.

From the original data, “0” children is recoded as “No children” and all other values as “At least one child”.

Salary Count
Less than $40K 27
Between $40K and $70K 119
Between $70K and $100K 168
Greater than $100K 85
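
The four recoding rules can be sketched as small functions. This is a minimal illustration, not the spreadsheet method used in the lab. It assumes that gender is stored as 1 or 2 (with 1 taken as male, as assumed above), that salary is a plain dollar amount, and that each salary breakpoint belongs to the higher category (so exactly $40,000 counts as “Between $40K and $70K”). The function names are hypothetical.

```python
# Numerical codes from 1 (Strongly Disagree) to 5 (Strongly Agree).
OPINION_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def recode_gender(code: int) -> str:
    """Assumption from the lab: 1 is male and 2 is female."""
    return {1: "Male", 2: "Female"}[code]


def recode_children(count: int) -> str:
    """Zero children versus one or more."""
    return "No children" if count == 0 else "At least one child"


def recode_salary(salary: float) -> str:
    """Assumption: each breakpoint falls into the higher category."""
    if salary < 40_000:
        return "Less than $40K"
    if salary < 70_000:
        return "Between $40K and $70K"
    if salary < 100_000:
        return "Between $70K and $100K"
    return "Greater than $100K"


def recode_opinion(label: str) -> int:
    """Map an opinion label to its 1-to-5 code."""
    return OPINION_CODES[label]


# The recoded Salary counts reported above should still cover all 399 respondents.
salary_counts = {
    "Less than $40K": 27,
    "Between $40K and $70K": 119,
    "Between $70K and $100K": 168,
    "Greater than $100K": 85,
}
salary_total = sum(salary_counts.values())
```

Treating the breakpoints differently (for example, putting exactly $70,000 in the lower category) would only change the comparison operators, which the question explicitly allows.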

  • The file P02_10.xlsx contains midterm and final exam scores for 96 students in a corporate finance course.

Create a histogram for each of the two sets of exam scores.

What are the mean and median scores on each of these exams?

The mean of the midterm scores is 79.36458333 and the median is 79.

The mean of the final scores is 80.03125 and the median is 79.5.

Explain why the mean and median values are different for these data.

For the midterm, the mean and median differ by about 0.36; for the final, they differ by about 0.53. The mean is the average of all the scores, whereas the median is the middle observation of the sorted scores, so the mean can be pulled away from the median by extreme values. The gap is larger for the final, which suggests that the final scores contain more extreme values.
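
The effect of extreme values on the mean can be illustrated with a small sketch. The scores below are hypothetical and are not taken from P02_10.xlsx; they only show that changing one extreme value moves the mean while leaving the median where it was.

```python
from statistics import mean, median

# Hypothetical scores, NOT taken from P02_10.xlsx.
scores = [70, 75, 79, 83, 88]

# Same scores, but the highest one is replaced by a more extreme value.
scores_with_extreme = [70, 75, 79, 83, 100]

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(scores_with_extreme), median(scores_with_extreme)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

The median stays at the middle score in both lists, while the mean rises with the extreme value, which is the same mechanism that separates the mean and median of the exam scores.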

Based on your previous answers, how would you characterize this group’s performance on the midterm and on the final exam?

The histograms show that more than half of the 96 students scored above 63 on both exams: more than 55 students scored above 63 on the midterm, and 55 students did so on the final. In both histograms, most of the scores fall toward the right-hand side, that is, toward the higher marks.

Create a new column of differences (final exam score minus midterm score). A positive value means the student improved, and a negative value means the student did the opposite. What are the mean and median of the differences? What does a histogram of the differences indicate?

Here is the data after adding the new column “Differences” (Final exam score minus midterm score).

The mean of the differences is 0.66666667 and the median is 2.

In the histogram of the differences, 29 students fall in the range from -0.6 to 4.1, which indicates an improvement on the final exam compared with the midterm.
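
The mean of the differences can be cross-checked against the two exam means reported earlier, because the mean of (final minus midterm) equals the mean of the final scores minus the mean of the midterm scores. The sketch below uses only the figures stated in this lab; the variable names are illustrative.

```python
import math

n_students = 96
midterm_mean = 79.36458333  # as reported above
final_mean = 80.03125       # as reported above

# Mean of the differences equals the difference of the means.
difference_of_means = final_mean - midterm_mean

# Recover the score totals; they must be whole numbers if scores are integers.
midterm_total = round(midterm_mean * n_students)
final_total = round(final_mean * n_students)
mean_difference = (final_total - midterm_total) / n_students

print(round(difference_of_means, 8), round(mean_difference, 8))
print(math.isclose(difference_of_means, mean_difference, abs_tol=1e-8))
```

Both routes agree with the reported mean difference of 0.66666667. The median has no such shortcut: the median of the differences (2) is not the difference of the two medians (0.5).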

  • The Consumer Confidence Index (CCI) attempts to measure people’s feelings about general business conditions, employment opportunities, and their own income prospects. Monthly average values of the CCI are listed in the file P02_20.xlsx.

Create a time series graph of the CCI values.

Have U.S. consumers become more or less confident through time?

Judging from the graph, U.S. consumers have become considerably less confident over time.

How would you explain recent variations in the overall trend of the CCI?

The CCI was at its peak from June 1997 to October 1999, growing gradually over that period. After October 2000 it kept decreasing until April 2003. From then on it started to rise again, but it ended up gradually decreasing, month by month and year after year, until February 2009. It later increased again, though not to the level of June 1997.

  • The file P02_03.xlsx contains data from a survey of 399 people regarding an environmental policy. Use filters for each of the following.

Identify all respondents who are female, middle-aged, and have two children. What is the average salary of these respondents?

Gender “2” is assumed to be female. The average salary is $98,573.

Identify all respondents who are elderly and strongly disagree with the environmental policy. What is the average salary of these respondents?

The average salary is $75,970.

Identify all respondents who strongly agree with the environmental policy. What proportion of these individuals are young?

The total number of respondents with the opinion “Strongly Agree” is 86, of whom 18 are in the “Young” age group. Thus, the proportion is 0.209302326, or 20.93%.

Identify all respondents who are either (1) middle-aged men with at least one child and an annual salary of at least $50,000 or (2) middle-aged women with two or fewer children and an annual salary of at least $30,000. What are the mean and median salaries of the respondents who meet these conditions? What proportion of the respondents who satisfy these conditions agree or strongly agree with the environmental policy?

Of the above options, option “1” was chosen, with gender “1” in the original data assumed to be “men”.

The mean salary for the filtered data is $96,614.

The median salary for the filtered data is $93,443.

The total number of respondents is 60.

There are 13 “Agree” respondents and 13 “Strongly Agree” respondents, for a total of 26. The proportion is 0.433333333, or 43.33%.
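
The filter conditions and the two proportions reported for this problem can be sketched as follows. This is an illustration under stated assumptions, not the Excel filter used in the lab: it assumes gender is coded 1 for men and 2 for women (as assumed above), that age is stored as the label “Middle-aged”, and that salary is a plain dollar amount. The function name `meets_condition` is hypothetical, and it encodes the full either/or condition as worded in the question.

```python
def meets_condition(age: str, gender: int, children: int, salary: float) -> bool:
    """Either/or condition as worded in the question (gender 1 = men, 2 = women)."""
    is_middle_aged = age == "Middle-aged"
    men_rule = gender == 1 and children >= 1 and salary >= 50_000
    women_rule = gender == 2 and children <= 2 and salary >= 30_000
    return is_middle_aged and (men_rule or women_rule)


# Proportions recomputed from the counts reported in this lab.
young_among_strongly_agree = 18 / 86      # young respondents among "Strongly Agree"
agree_among_filtered = (13 + 13) / 60     # "Agree" or "Strongly Agree" among the 60 filtered

print(f"{young_among_strongly_agree:.2%}")
print(f"{agree_among_filtered:.2%}")
```

Applying `meets_condition` to each row of the survey and keeping the rows where it returns `True` would play the same role as the spreadsheet filter.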

Conclusion

The four lab questions above were all about descriptive statistics: extracting the given data, recoding it, and computing proportions, means, and medians, along with building histograms, tables, and charts. The lab was a good exercise for learning and observing descriptive statistics in practice.

Reference

Albright, S. C., & Winston, W. L. (2017). Business Analytics: Data Analysis and Decision Making. Cengage Learning.
