Estimating Models Using Dummy Variables

A dummy variable is a dichotomous variable coded so that a categorical variable can be entered into a model that expects a higher level of measurement. It takes on the values 1 and 0: 1 means the case has the attribute (the statement is true), while 0 means it does not. In a regression model, if a variable has k categories, we include only k-1 dummy variables, because any one dummy variable is perfectly collinear with the remaining set of dummies (Angrist, 2001).

Using the General Social Survey dataset, I take hours per day of watching TV as the dependent variable and choose two predictor variables: race of the respondent and respondent’s sex. I then create dummy variables for respondent’s sex, which has two categories: 1 for males and 2 for females. The research question for this analysis is:

“Is there a statistically significant effect on hours per day of watching TV by race of the respondent and respondent’s sex?”

Using SPSS, I first create two dummy variables for the respondent’s sex variable, one for males and one for females. I omit the female dummy from the model so that females serve as the comparison group (Allison, 2002).
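The recoding step can be sketched in Python as well. This is only an illustration of k-1 dummy coding, not the SPSS procedure used in the analysis; the function name `make_dummies` and the five sample values are hypothetical, while the 1 = male, 2 = female coding comes from the text.

```python
def make_dummies(values, reference):
    """Return k-1 dummy columns (lists of 0/1), omitting the reference category."""
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category not found in the data")
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
        if category != reference
    }


# Hypothetical sample of sex codes: 1 = male, 2 = female.
sex = [1, 2, 2, 1, 2]

# Female (code 2) is the omitted comparison group, so one dummy remains.
dummies = make_dummies(sex, reference=2)
male1 = dummies[1]
print(male1)  # [1, 0, 0, 1, 0]
```

With two categories the function returns a single column, which is why only male1 enters the regression.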

The following tables show the output from the analysis, that is, the model summary, the ANOVA and the coefficients table.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .096a   .009       .008                2.577

ANOVA
Model 1       Sum of Squares   df     Mean Square   F       Sig.
Regression    102.885          2      51.443        7.746   .000b
Residual      11064.576        1666   6.641
Total         11167.461        1668

Coefficients
Model 1              B       Std. Error   Beta (standardized)   t        Sig.
(Constant)           2.426   .158                               15.356   .000
male1                .103    .127         .020                  .809     .418
RACE OF RESPONDENT   .379    .098         .094                  3.870    .000

From the coefficients table, the multiple regression model becomes:

HOURS PER DAY OF WATCHING TV = 2.426 + 0.103 (male1) + 0.379 (RACE OF RESPONDENT).

The constant implies that the predicted number of hours per day of watching TV is 2.426 when both predictors equal zero. Holding sex constant, a one-unit increase in the race code is associated with an increase of 0.379 hours per day of watching TV. For the dummy variable, male1 equals 1 if the respondent is male and 0 if the respondent is female. Setting the race term aside, the prediction is therefore 2.426 + 0.103(1) = 2.529 hours for males and 2.426 + 0.103(0) = 2.426 hours for females, so males are predicted to watch 0.103 hours more than females of the same race. This difference is not statistically significant (t = 0.809, p = .418), whereas the race coefficient is (t = 3.870, p < .001). From the analysis, therefore, we find that the race of the respondent has a statistically significant effect on the number of hours per day of watching TV, while the respondent’s sex does not (Angrist, 2001).
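The arithmetic behind these predictions can be checked with a short Python sketch. The coefficients are taken from the coefficients table; the function name `predicted_tv_hours` is hypothetical, and the race codes 1 to 3 used at the end are an assumption, since the report does not state how race is coded.

```python
INTERCEPT = 2.426  # (Constant)
B_MALE = 0.103     # coefficient on the male1 dummy
B_RACE = 0.379     # coefficient on RACE OF RESPONDENT


def predicted_tv_hours(male, race):
    """Predicted hours per day of watching TV from the fitted equation."""
    return INTERCEPT + B_MALE * male + B_RACE * race


# Race term set to zero, as in the comparison of males and females.
male_part = predicted_tv_hours(male=1, race=0)
female_part = predicted_tv_hours(male=0, race=0)
print(round(male_part, 3), round(female_part, 3))  # 2.529 2.426
print(round(male_part - female_part, 3))           # 0.103

# Assumed race codes 1..3 (not stated in the report).
lowest = predicted_tv_hours(male=0, race=1)
highest = predicted_tv_hours(male=1, race=3)
print(round(lowest, 3), round(highest, 3))         # 2.805 3.666
```

Under the assumed coding, the lowest and highest predictions are close to the minimum (2.80) and maximum (3.67) predicted values that SPSS reports in its residual statistics output.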

Multiple regression analysis makes several key assumptions that must hold for the model to be interpreted properly. These assumptions are linearity, independence of errors, homoscedasticity, absence of multicollinearity, absence of undue influence, and normal distribution of errors. Running regression diagnostics helps check whether the model works well for the data. Running the appropriate diagnostics for the model above gives the following output:

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .096a   .009       .008                2.577                        1.974

The table above includes the Durbin-Watson statistic, which provides information about independence of errors. The value of 1.974 is close to 2.0, which indicates little or no first-order correlation between the residuals.
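For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below implements that definition in plain Python; the function name `durbin_watson` and the two short residual series are hypothetical, chosen only to show how the statistic moves away from 2.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their observed order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero")
    return numerator / denominator


# Hypothetical residuals that flip sign every step (negative autocorrelation).
print(durbin_watson([1, -1, 1, -1]))  # 3.0

# Hypothetical residuals that keep their sign in runs (positive autocorrelation).
print(durbin_watson([1, 1, -1, -1]))  # 1.0
```

Values below 2 point toward positive autocorrelation and values above 2 toward negative autocorrelation, which is why 1.974 is read as supporting independence of errors.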

ANOVA
Model 1       Sum of Squares   df     Mean Square   F       Sig.
Regression    102.885          2      51.443        7.746   .000b
Residual      11064.576        1666   6.641
Total         11167.461        1668

From the ANOVA table, we see that the overall model is statistically significant, F(2, 1666) = 7.746, with a significance value reported as .000 (p < .001).
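The entries in the ANOVA and model summary tables are linked by simple formulas, which the following Python sketch recomputes from the reported sums of squares and degrees of freedom. The variable names are my own; only the four input numbers come from the SPSS output.

```python
# Inputs copied from the ANOVA table.
ss_regression = 102.885
ss_residual = 11064.576
df_regression = 2
df_residual = 1666

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - (1 - r_squared) * df_total / df_residual
std_error_of_estimate = ms_residual ** 0.5

print(round(f_statistic, 3))            # 7.746
print(round(r_squared, 3))              # 0.009
print(round(adjusted_r_squared, 3))     # 0.008
print(round(std_error_of_estimate, 3))  # 2.577
```

The recomputed values agree with the SPSS output. They also show that, although the model is statistically significant, the two predictors together account for less than 1% of the variance in hours of TV watched.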

Coefficients
Model 1              B       Std. Error   Beta (standardized)   t        Sig.   Tolerance   VIF
(Constant)           2.426   .158                               15.356   .000
male1                .103    .127         .020                  .809     .418   .999        1.001
RACE OF RESPONDENT   .379    .098         .094                  3.870    .000   .999        1.001

The coefficients table reports the variance inflation factor (VIF), which indicates whether multicollinearity is present. The VIF is 1.001 for both predictor variables. Since this value is far below the commonly used cutoff of 10, we conclude that multicollinearity is not a problem in the model (Angrist, 2001).
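Tolerance and VIF are two views of the same quantity: tolerance is 1 minus the R Square obtained when one predictor is regressed on the other predictors, and VIF is its reciprocal. The sketch below shows the relationship; the function name `tolerance_and_vif` is hypothetical, and the R Square of 0.001 is simply the value implied by the reported tolerance of .999.

```python
def tolerance_and_vif(r_squared_j):
    """Tolerance and VIF for a predictor, given its R Square on the other predictors."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R Square must be in [0, 1)")
    tolerance = 1 - r_squared_j
    return tolerance, 1 / tolerance


# R Square implied by the reported tolerance of .999.
tolerance, vif = tolerance_and_vif(0.001)
print(round(tolerance, 3), round(vif, 3))  # 0.999 1.001

# The cutoff of VIF = 10 corresponds to a predictor that is 90% explained by the others.
_, vif_at_cutoff = tolerance_and_vif(0.9)
print(round(vif_at_cutoff, 3))  # 10.0
```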

Residuals Statistics
                                    Minimum   Maximum   Mean    Std. Deviation   N
Predicted Value                     2.80      3.67      2.98    .248             1669
Std. Predicted Value                -.713     2.752     .000    1.000            1669
Standard Error of Predicted Value   .092      .189      .106    .026             1669
Adjusted Predicted Value            2.78      3.69      2.98    .249             1669
Residual                            -3.665    21.092    .000    2.576            1669
Std. Residual                       -1.422    8.185     .000    .999             1669
Stud. Residual                      -1.426    8.191     .000    1.000            1669
Deleted Residual                    -3.685    21.124    .000    2.581            1669
Stud. Deleted Residual              -1.427    8.358     .001    1.006            1669
Mahal. Distance                     1.132     7.946     1.999   1.847            1669
Cook’s Distance                     .000      .105      .001    .004             1669
Centered Leverage Value             .001      .005      .001    .001             1669

The table above includes Cook’s Distance, which measures undue influence. Cook’s Distance ranges from a minimum of 0.000 to a maximum of 0.105, which is well below 1.0; hence we can assume that no case has undue influence on this model.
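As a rough illustration of what Cook’s Distance measures, it can be written for a single case as the squared (internally) studentized residual divided by the number of estimated coefficients, multiplied by leverage / (1 - leverage). The sketch below uses that textbook form; the function name `cooks_distance` and the example inputs are hypothetical, and the leverage here is the full hat value rather than the centered leverage that SPSS prints.

```python
def cooks_distance(studentized_residual, leverage, n_parameters):
    """Cook's Distance for one case from its studentized residual and hat value."""
    if not 0 <= leverage < 1:
        raise ValueError("leverage must be in [0, 1)")
    if n_parameters < 1:
        raise ValueError("n_parameters must be at least 1")
    return (studentized_residual ** 2 / n_parameters) * leverage / (1 - leverage)


# The model above estimates three coefficients: constant, male1, and race.
N_PARAMETERS = 3

# Hypothetical case: large residual but very low leverage -> small influence.
low_leverage_case = cooks_distance(2.0, 0.01, N_PARAMETERS)

# Hypothetical case: same residual with high leverage -> influence above 1.0.
high_leverage_case = cooks_distance(2.0, 0.5, N_PARAMETERS)

print(low_leverage_case < 1.0, high_leverage_case > 1.0)  # True True
```

The example shows why a large residual alone does not produce a large Cook’s Distance: the case must also have high leverage.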

The histogram of the residuals describes the distribution of errors. The distribution is fairly normal; hence the assumption of normally distributed errors is met, that is, there is no significant deviation from normality (Yuan & Lin, 2006).

The scatter plot of the residuals provides information about homoscedasticity. Since there is no discernible pattern in the spread of the points, the homoscedasticity assumption has been met by the model. The scatter plot also shows that the relationship is linear (Angrist, 2001). Therefore, from the above analysis, all the assumptions were met by the regression model obtained (Allison, 2002).

References

Allison, P. D. (2002). Missing data: Quantitative applications in the social sciences. British Journal of Mathematical and Statistical Psychology, 55(1), 193-196.

Angrist, J. D. (2001). Estimation of limited dependent variable models with dummy endogenous regressors: Simple strategies for empirical practice. Journal of Business & Economic Statistics, 19(1), 2-28.

Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67.



