Lab 1
University of the Potomac
CBSC520 Data Analytics
INTRODUCTION
This lab focuses on identifying categorical variables and classifying them as nominal or ordinal. It then counts the values of each categorical variable in the data provided in Chapter 2, imports the spreadsheets to verify the calculations, recodes the variables as each question requires, and draws a conclusion from each calculation.
Lab 1
The file P02_03.xlsx contains data from a survey of 399 people regarding a government environmental policy.
Which of the variables in this data set are categorical? Which of these are nominal; which are ordinal?
Gender, State, Age, and Opinion are categorical. Gender and State are nominal; Age and Opinion are ordinal.
For each categorical variable, create a column chart of counts.
For category “Age”
Age | Count |
---|---|
Young | 87 |
Middle-aged | 218 |
Elderly | 94 |
For category “Gender”
Gender | Count |
---|---|
1 | 165 |
2 | 234 |
For category “State”
State | Count |
---|---|
Arizona | 33 |
California | 33 |
Florida | 41 |
Illinois | 38 |
Michigan | 46 |
Minnesota | 31 |
New York | 46 |
Ohio | 44 |
Texas | 48 |
Virginia | 39 |
For category “Opinion”
Opinion | Count |
---|---|
Agree | 82 |
Strongly Agree | 86 |
Neutral | 68 |
Disagree | 85 |
Strongly Disagree | 78 |
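The count tables above can also be reproduced with a simple tally outside the spreadsheet. The following is a minimal sketch, assuming the survey rows have already been read into a list of dictionaries. The sample rows and the helper name `count_by` are hypothetical and are not taken from P02_03.xlsx; only the Age counts at the end come from the table above.

```python
from collections import Counter


def count_by(rows, column):
    """Return a Counter mapping each category in `column` to its frequency."""
    return Counter(row[column] for row in rows)


# Hypothetical sample rows (NOT the real P02_03.xlsx data).
sample_rows = [
    {"Age": "Young", "Gender": 1, "Opinion": "Agree"},
    {"Age": "Middle-aged", "Gender": 2, "Opinion": "Neutral"},
    {"Age": "Middle-aged", "Gender": 2, "Opinion": "Agree"},
    {"Age": "Elderly", "Gender": 1, "Opinion": "Strongly Disagree"},
]

age_counts = count_by(sample_rows, "Age")

# Sanity check on the reported Age table: the counts should total 399 respondents.
reported_age = {"Young": 87, "Middle-aged": 218, "Elderly": 94}
reported_total = sum(reported_age.values())
```

The same total check applies to the Gender, State, and Opinion tables, since every respondent falls into exactly one category of each variable.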
Recode the data into a new data set, making four transformations: (1) change Gender to list “Male” or “Female”; (2) change Children to list “No children” or “At least one child”; (3) change Salary to be categorical with categories “Less than $40K,” “Between $40K and $70K,” “Between $70K and $100K,” and “Greater than $100K” (where you can treat the breakpoints however you like); and (4) change Opinion to be a numerical code from 1 to 5 for Strongly Disagree to Strongly Agree. Then create a column chart of counts for the new Salary variable.
Recoded data
In the original data, gender “1” is assumed to be “Male” and gender “2” to be “Female”.
In the original data, “0” children is recoded as “No children” and any other value as “At least one child”.
Salary | Count |
---|---|
Less than $40K | 27 |
Between $40K and $70K | 119 |
Between $70K and $100K | 168 |
Greater than $100K | 85 |
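The four recodings can be sketched as small functions. This is only an illustration under stated assumptions: the function names are hypothetical, the gender coding (1 = Male, 2 = Female) is the assumption made in this report, and each salary breakpoint is assigned to the higher category, which is one of several acceptable treatments.

```python
def recode_gender(code):
    # Assumption made in this report: 1 = Male, 2 = Female.
    return {1: "Male", 2: "Female"}[code]


def recode_children(count):
    # Zero children maps to "No children"; any other count to "At least one child".
    return "No children" if count == 0 else "At least one child"


def recode_salary(salary):
    # Assumed breakpoint treatment: each breakpoint belongs to the higher category.
    if salary < 40_000:
        return "Less than $40K"
    if salary < 70_000:
        return "Between $40K and $70K"
    if salary < 100_000:
        return "Between $70K and $100K"
    return "Greater than $100K"


# Numerical codes from 1 (Strongly Disagree) to 5 (Strongly Agree).
OPINION_CODES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}
```

With this convention a salary of exactly $40,000 is placed in “Between $40K and $70K”; a different breakpoint choice would shift the counts near the boundaries slightly.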
- The file P02_10.xlsx contains midterm and final exam scores for 96 students in a corporate finance course.
Create a histogram for each of the two sets of exam scores.
What are the mean and median scores on each of these exams?
Mean of Midterm is 79.36458333, Median of Midterm is 79.
Mean of Final is 80.03125, Median of Final is 79.5.
Explain why the mean and median values are different for these data.
For the midterm, the difference between the mean and the median is about 0.3646. The mean is the average of all the midterm scores, whereas the median is the middle observation of the sorted data, so the mean is more affected by extreme values. For the final, the difference is about 0.5313, which is larger than for the midterm and suggests that the final scores contain more extreme values.
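The effect of an extreme value on the mean, compared with the median, can be seen in a small example. This is a minimal sketch with hypothetical scores (not the P02_10.xlsx data); the last two lines only recompute the gaps from the means and medians reported above.

```python
from statistics import mean, median

# Hypothetical, symmetric scores: the mean and median coincide.
scores = [70, 75, 80, 85, 90]

# Adding one extreme low score pulls the mean down much more than the median.
with_outlier = scores + [20]

mean_shift = mean(scores) - mean(with_outlier)
median_shift = median(scores) - median(with_outlier)

# Gaps between mean and median, recomputed from the values reported above.
gap_midterm = 79.36458333 - 79
gap_final = 80.03125 - 79.5
```

In this sketch the single extreme score moves the mean by 10 points but the median by only 2.5 points.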
Based on your previous answers, how would you characterize this group’s performance on the midterm and on the final exam?
The histograms show that more than half of the 96 students scored above 63: more than 55 students on the midterm and 55 students on the final. In both histograms, most of the scores fall on the right side, toward the higher marks.
Create a new column of differences (final exam score minus midterm score). A positive value means the student improved, and a negative value means the student did the opposite. What are the mean and median of the differences? What does a histogram of the differences indicate?
Here is the data after adding the new column “Differences” (Final exam score minus midterm score).
Mean is 0.66666667, Median is 2.
In the histogram of the differences, 29 students fall in the range from -0.6 to 4.1, which suggests an improvement on the final exam compared with the midterm.
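The mean of the differences always equals the difference of the two means, so the reported mean of 0.66666667 can be checked against the exam means (80.03125 and 79.36458333). The median has no such property, which is why the median difference of 2 need not equal 79.5 minus 79. The sketch below illustrates both points with hypothetical scores for five students (not the P02_10.xlsx data).

```python
from statistics import mean, median

# Hypothetical scores for five students (NOT the real data).
midterm = [60, 72, 85, 90, 78]
final = [65, 70, 88, 91, 84]

# Final exam score minus midterm score, student by student.
differences = [f - m for f, m in zip(final, midterm)]

mean_diff = mean(differences)
diff_of_means = mean(final) - mean(midterm)

median_diff = median(differences)
diff_of_medians = median(final) - median(midterm)

# Check of the reported figure using the exam means given earlier in this report.
reported_check = 80.03125 - 79.36458333
```

Here `mean_diff` and `diff_of_means` agree, while `median_diff` and `diff_of_medians` do not.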
- The Consumer Confidence Index (CCI) attempts to measure people’s feelings about general business conditions, employment opportunities, and their own income prospects. Monthly average values of the CCI are listed in the file P02_20.xlsx.
- a. Create a time series graph of the CCI values.
Average salary is $75,970.
- Have U.S. consumers become more or less confident through time?
- Judging from the graph, U.S. consumers have become much less confident over time.
- How would you explain recent variations in the overall trend of the CCI?
- The Consumer Confidence Index (CCI) was at its peak from June 1997 to October 1999 and was growing gradually during that period. After October 2000, the CCI kept decreasing until April 2003. From then on it started to rise again, but it eventually declined month by month until February 2009. It later increased again, but not to the level of June 1997.
- The file P02_03.xlsx contains data from a survey of 399 people regarding an environmental policy. Use filters for each of the following.
- Identify all respondents who are female, middle-aged, and have two children. What is the average salary of these respondents?
- Gender “2” is assumed to be female; the filtered data are shown here.
- Average salary is $98,573
- Identify all respondents who are elderly and strongly disagree with the environmental policy. What is the average salary of these respondents?
- Identify all respondents who strongly agree with the environmental policy. What proportion of these individuals are young?
The total number of respondents with opinion “Strongly Agree” is 86. Of these, 18 are “Young”. Thus, the proportion is 18/86 = 0.209302326, which is 20.93%.
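The proportion of young respondents among those who strongly agree is a simple ratio of two filtered counts. A minimal sketch of that arithmetic follows; the helper name `proportion` is hypothetical, and the counts 18 and 86 are the ones reported in this lab.

```python
def proportion(part, whole):
    """Return part / whole as a fraction; reject a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


strongly_agree_total = 86   # respondents whose opinion is "Strongly Agree"
strongly_agree_young = 18   # of those, respondents whose age is "Young"

share_young = proportion(strongly_agree_young, strongly_agree_total)
percent_young = round(100 * share_young, 2)
```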
- Identify all respondents who are either
- middle-aged men with at least one child and an annual salary of at least $50,000
- or
- middle-aged women with two or fewer children and an annual salary of at least $30,000.
What are the mean and median salaries of the respondents who meet these conditions? What proportion of the respondents who satisfy these conditions agree or strongly agree with the environmental policy?
From the options above, “1” was selected; gender “1” in the original data is assumed to be “men”, and the filtered data are shown here.
Mean for the above data is $96,614.
Median for the above data is $93,443.
Total respondents are 60.
“Agree” respondents are 13 and “Strongly Agree” respondents are 13, so 26 respondents agree or strongly agree. The proportion is 26/60 = 0.433333333, which is 43.33%.
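The either/or condition in this question can be written as a single predicate and applied as a filter. The sketch below is only an illustration of the condition as the question states it: the rows are hypothetical (not the P02_03.xlsx data), the column names and the function name `meets_conditions` are assumptions, and the gender coding (1 = men, 2 = women) is the assumption made in this report. The last line recomputes the reported proportion of 26 out of 60.

```python
from statistics import mean, median


def meets_conditions(row):
    """Middle-aged men with >= 1 child and salary >= 50,000,
    or middle-aged women with <= 2 children and salary >= 30,000."""
    if row["Age"] != "Middle-aged":
        return False
    is_man = row["Gender"] == 1    # assumed coding: 1 = men
    is_woman = row["Gender"] == 2  # assumed coding: 2 = women
    man_ok = is_man and row["Children"] >= 1 and row["Salary"] >= 50_000
    woman_ok = is_woman and row["Children"] <= 2 and row["Salary"] >= 30_000
    return man_ok or woman_ok


# Hypothetical rows (NOT the real survey data).
rows = [
    {"Age": "Middle-aged", "Gender": 1, "Children": 2, "Salary": 60_000, "Opinion": "Agree"},
    {"Age": "Middle-aged", "Gender": 1, "Children": 0, "Salary": 90_000, "Opinion": "Neutral"},
    {"Age": "Middle-aged", "Gender": 2, "Children": 0, "Salary": 35_000, "Opinion": "Strongly Agree"},
    {"Age": "Middle-aged", "Gender": 2, "Children": 3, "Salary": 80_000, "Opinion": "Agree"},
    {"Age": "Young", "Gender": 1, "Children": 1, "Salary": 70_000, "Opinion": "Disagree"},
    {"Age": "Middle-aged", "Gender": 2, "Children": 2, "Salary": 55_000, "Opinion": "Disagree"},
]

selected = [row for row in rows if meets_conditions(row)]
salaries = [row["Salary"] for row in selected]
mean_salary = mean(salaries)
median_salary = median(salaries)

# Share of the selected respondents who agree or strongly agree.
agreeing = [row for row in selected if row["Opinion"] in ("Agree", "Strongly Agree")]
agree_share = len(agreeing) / len(selected)

# Recomputing the proportion reported in this lab: 26 of 60 respondents.
reported_share = 26 / 60
```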
Conclusion
The four lab questions above were all about descriptive statistics: extracting and recoding the given data, computing proportions, means, and medians, and building histograms, tables, and charts. It was a good exercise for learning and observing descriptive statistics in practice.
Reference
Albright, S. C., & Winston, W. L. (2017). Business Analytics: Data Analysis and Decision Making. Cengage Learning.