Multiple Regression Analysis Using PASW Statistics
Tajudeen Rasheed
Week 6 PASW Assignment
Multiple Regression Analysis Using PASW Statistics
Multiple regression is an extension of simple linear regression (Field, 2013). It is used when we want to predict the value of a variable based on the value of two or more other variables (Field, 2013). According to Field (2013) the variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes, the predictor, explanatory or regressor variables) (Field, 2013). Multiple regression allows us to determine the overall fit (variance explained) of the model and the relative contribution of each of the predictors to the total variance explained (Field, 2013).
Section 1
Assumptions for Multiple Regression
- (1) There should be no significant outliers
Outliers are single data points that do not follow the usual pattern of the data. They can distort a multiple regression and reduce the accuracy of its results. Fortunately, when SPSS is used to run a multiple regression, possible outliers are easy to detect (Field, 2013).
- (2) Normality of variables
The normality assumption holds that the dependent variable and each of the independent variables are normally distributed. According to Field (2013), this assumption can be checked graphically, using either a histogram with a superimposed normal distribution curve or a Q-Q plot.
- (3) Missing data
Missing data are values that leave N incomplete. They can have a negative effect on the multiple regression and reduce the accuracy of the results. Fortunately, when PASW is used to run a multiple regression, possible missing data are easy to detect from the descriptive statistics output (Field, 2013).
- (4) No multicollinearity (Variance Inflation Factor, VIF)
The data must not show multicollinearity, which occurs when two or more independent variables are highly correlated with each other. Multicollinearity makes it difficult to determine which independent variable contributes to the variance explained in the dependent variable, and it causes technical problems in calculating the regression model. It also inflates the standard errors, so that some variables appear statistically nonsignificant when they would otherwise be significant (Field, 2013).
- (5) Homogeneity of variance
The data need to show homogeneity of variance, meaning that the variances along the line of best fit remain similar as we move along the regression line (Field, 2013).
- (6) Homogeneity of regression
This assumption is very important. The relationship between the dependent variable and the independent variables is assumed to be linear and to run in the same direction throughout the data (for example, a positive relationship in one group is assumed to be positive in the others). If the relationships displayed in the scatterplots and partial regression plots are not linear, then either run a non-linear regression analysis or “transform” the data using PASW Statistics (Field, 2013).
Section 2
Testing whether the assumptions are met or not
The assumptions of outliers, normality of variables, and multicollinearity will be checked with PASW Statistics to determine whether the data in the Supermodel.sav data set meet them.
Running of assumptions on PASW statistics
- (1) There should be no outliers: this assumption is tested for each variable with PASW Statistics, which produces the boxplots from the syntax below.
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=beauty MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: beauty=col(source(s), name("beauty"))
DATA: id=col(source(s), name("$CASENUM"), unit.category())
GUIDE: axis(dim(2), label("Attractiveness (%)"))
ELEMENT: schema(position(bin.quantile.letter(1*beauty)), label(id))
END GPL.
GGraph
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 14:41:34
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Resources | Processor Time | 00:00:00.50
Resources | Elapsed Time | 00:00:00.49
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=years MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: years=col(source(s), name("years"))
DATA: id=col(source(s), name("$CASENUM"), unit.category())
GUIDE: axis(dim(2), label("Number of Years as a Model"))
ELEMENT: schema(position(bin.quantile.letter(1*years)), label(id))
END GPL.
GGraph
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 14:42:09
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Resources | Processor Time | 00:00:00.58
Resources | Elapsed Time | 00:00:00.55
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=age MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: age=col(source(s), name("age"))
DATA: id=col(source(s), name("$CASENUM"), unit.category())
GUIDE: axis(dim(2), label("Age (Years)"))
ELEMENT: schema(position(bin.quantile.letter(1*age)), label(id))
END GPL.
GGraph
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 14:42:35
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Resources | Processor Time | 00:00:00.53
Resources | Elapsed Time | 00:00:00.51
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=salary MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: salary=col(source(s), name("salary"))
DATA: id=col(source(s), name("$CASENUM"), unit.category())
GUIDE: axis(dim(2), label("Salary per Day (£)"))
ELEMENT: schema(position(bin.quantile.letter(1*salary)), label(id))
END GPL.
GGraph
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 14:42:55
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Resources | Processor Time | 00:00:00.56
Resources | Elapsed Time | 00:00:00.51
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
The boxplot is a graphical display of the data that shows (1) the median, which is the black line in the middle of the box; (2) the middle 50% of scores, which is the shaded box; (3) the top and bottom 25% of scores, which are the lines (whiskers) extending out of the box; (4) the smallest and largest non-outlying scores, which are the horizontal lines at the ends of the whiskers; and (5) outliers. The boxplot distinguishes “mild” outliers from “extreme” outliers. IQR stands for “interquartile range”, the distance spanned by the middle 50% of the scores. Mild outliers are scores more than 1.5*IQR beyond the edge of the box and are indicated by open dots. Extreme outliers are scores more than 3*IQR beyond the edge of the box and are indicated by stars. These benchmarks are conventions, much as p < .05 is a convention. In the boxplots above, the open dots and stars mark the following outlying cases for each variable:
Attractiveness outliers are cases 23, 194, 73, 33, 180, 60, and 67.
Number of years outliers are cases 57, 91, 157, 155, 5, 190, 212, 60, 131, and 18.
Age outliers are cases 57, 155, 91, 190, 5, 32, 224, 157, 114, 60, 18, and 131.
Salary per day outliers are cases 91, 2, 170, 191, 41, 24, 83, and 50 (dots) and cases 5, 135, 155, 198, 116, and 127 (stars). The boxplots display all the outlying cases. In summary, the boxplots flagged 43 outlying scores (30 distinct cases, because several cases are outliers on more than one variable) out of N = 231; therefore, the assumption of no significant outliers is not met.
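The 1.5*IQR and 3*IQR rules and the outlier tally can be sketched in a few lines of Python. This is only an illustration: the `demo` scores are hypothetical, the function names are made up for this example, and the quartiles come from Python's "inclusive" method, which is an assumption and may differ slightly from the hinges PASW uses. The case numbers are copied exactly as listed above, so the totals can be compared with the figure given in the summary.

```python
from statistics import quantiles


def tukey_fences(scores):
    """Return the mild and extreme outlier fences for a list of scores."""
    # Assumption: "inclusive" quartiles; PASW boxplots use Tukey's hinges.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mild": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "extreme": (q1 - 3 * iqr, q3 + 3 * iqr),
    }


def classify(score, scores):
    """Label one score as 'extreme', 'mild', or 'not an outlier'."""
    fences = tukey_fences(scores)
    low, high = fences["extreme"]
    if score < low or score > high:
        return "extreme"
    low, high = fences["mild"]
    if score < low or score > high:
        return "mild"
    return "not an outlier"


# Hypothetical scores, used only to show the rule at work.
demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(classify(40, demo), classify(5, demo))

# Case numbers flagged in the four boxplots, as listed in the text.
flagged = {
    "attractiveness": [23, 194, 73, 33, 180, 60, 67],
    "years": [57, 91, 157, 155, 5, 190, 212, 60, 131, 18],
    "age": [57, 155, 91, 190, 5, 32, 224, 157, 114, 60, 18, 131],
    "salary": [91, 2, 170, 191, 41, 24, 83, 50, 5, 135, 155, 198, 116, 127],
}
total_flags = sum(len(cases) for cases in flagged.values())
distinct_cases = set().union(*flagged.values())
print(total_flags, len(distinct_cases))
```

Counting flags and distinct cases separately matters here because a case such as 60 is an outlier on three different variables.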
- (2) Normality of variables
This assumption is verified graphically with PASW Statistics, using a Q-Q plot for each variable.
PPLOT
/VARIABLES=salary age years beauty
/NOLOG
/NOSTANDARDIZE
/TYPE=Q-Q
/FRACTION=BLOM
/TIES=MEAN
/DIST=NORMAL.
PPlot
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 15:42:27
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Input | Date |
Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing.
Missing Value Handling | Cases Used | For a given sequence or time series variable, cases with missing values are not used in the analysis. Cases with negative or zero values are also not used, if the log transform is requested.
Resources | Processor Time | 00:00:04.25
Resources | Elapsed Time | 00:00:04.02
Use | From | First observation
Use | To | Last observation
Time Series Settings (TSET) | Amount of Output | PRINT = DEFAULT
Time Series Settings (TSET) | Saving New Variables | NEWVAR = CURRENT
Time Series Settings (TSET) | Maximum Number of Lags in Autocorrelation or Partial Autocorrelation Plots | MXAUTO = 16
Time Series Settings (TSET) | Maximum Number of Lags Per Cross-Correlation Plots | MXCROSS = 7
Time Series Settings (TSET) | Maximum Number of New Variables Generated Per Procedure | MXNEWVAR = 60
Time Series Settings (TSET) | Maximum Number of New Cases Per Procedure | MXPREDICT = 1000
Time Series Settings (TSET) | Treatment of User-Missing Values | MISSING = EXCLUDE
Time Series Settings (TSET) | Confidence Interval Percentage Value | CIN = 95
Time Series Settings (TSET) | Tolerance for Entering Variables in Regression Equations | TOLER = .0001
Time Series Settings (TSET) | Maximum Iterative Parameter Change | CNVERGE = .001
Time Series Settings (TSET) | Method of Calculating Std. Errors for Autocorrelations | ACFSE = IND
Time Series Settings (TSET) | Length of Seasonal Period | Unspecified
Time Series Settings (TSET) | Variable Whose Values Label Observations in Plots | Unspecified
Time Series Settings (TSET) | Equations Include | CONSTANT
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Model Description
Item | Value
---|---
Model Name | MOD_1
Series or Sequence 1 | Salary per Day (£)
Series or Sequence 2 | Age (Years)
Series or Sequence 3 | Number of Years as a Model
Series or Sequence 4 | Attractiveness (%)
Transformation | None
Non-Seasonal Differencing | 0
Seasonal Differencing | 0
Length of Seasonal Period | No periodicity
Standardization | Not applied
Distribution: Type | Normal
Distribution: Location | Estimated
Distribution: Scale | Estimated
Fractional Rank Estimation Method | Blom’s
Rank Assigned to Ties | Mean rank of tied values
Case Processing Summary
Item | Salary per Day (£) | Age (Years) | Number of Years as a Model | Attractiveness (%)
---|---|---|---|---
Series or Sequence Length | 231 | 231 | 231 | 231
Number of Missing Values in the Plot: User-Missing | 0 | 0 | 0 | 0
Number of Missing Values in the Plot: System-Missing | 0 | 0 | 0 | 0
Estimated Distribution Parameters
Normal Distribution | Salary per Day (£) | Age (Years) | Number of Years as a Model | Attractiveness (%)
---|---|---|---|---
Location | 11.3385 | 18.0679 | 4.5854 | 75.9447
Scale | 16.02644 | 2.42190 | 1.57865 | 6.77303
[Figures: Normal Q-Q plots of Salary per Day (£), Age (Years), Number of Years as a Model, and Attractiveness (%)]
The Q-Q plots above were used to assess the normality of all four variables. The observed values fall close to the diagonal reference line in every plot, with only minor departures from the line in a few of the plots; therefore, the assumption of normality of variables is met.
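The PPLOT syntax requests Blom's fractional ranks (/FRACTION=BLOM) and mean ranks for ties (/TIES=MEAN). A minimal sketch of how the expected normal scores on a Q-Q plot axis can be derived under those two settings is given below. The data are hypothetical, the function names are invented for this illustration, and the scores are left in standard (z) units rather than rescaled by the estimated location and scale.

```python
from statistics import NormalDist


def mean_ranks(values):
    """Rank the values from 1 to n, giving tied values their mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for value in set(ordered):
        first = ordered.index(value) + 1      # rank of the first tied value
        count = ordered.count(value)          # how many values share it
        rank_of[value] = first + (count - 1) / 2
    return [rank_of[value] for value in values]


def blom_normal_scores(values):
    """Expected standard normal scores from Blom's proportion estimate."""
    n = len(values)
    standard_normal = NormalDist()
    # Blom's formula: (rank - 3/8) / (n + 1/4)
    return [
        standard_normal.inv_cdf((rank - 0.375) / (n + 0.25))
        for rank in mean_ranks(values)
    ]


# Hypothetical scores containing one tie.
demo = [12.0, 15.5, 9.0, 15.5, 11.0]
for observed, expected in sorted(zip(demo, blom_normal_scores(demo))):
    print(f"{observed:6.1f}  {expected:7.3f}")
```

If a variable is normally distributed, plotting the observed values against these expected scores gives points close to a straight line, which is the pattern the Q-Q plots are inspected for.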
- Testing of the multicollinearity assumption using PASW Statistics.
Multicollinearity misleadingly inflates the standard errors, so that some variables appear statistically nonsignificant when they would otherwise be significant. PASW Statistics detects multicollinearity through an inspection of the correlation coefficients and the Tolerance/VIF (Variance Inflation Factor) values; interpreting these values determines whether the data meet or violate this assumption (Field, 2013).
GET
FILE='C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
CORRELATIONS
/VARIABLES=salary age years beauty
/PRINT=ONETAIL NOSIG
/MISSING=PAIRWISE.
Correlations
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 20:44:45
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing.
Missing Value Handling | Cases Used | Statistics for each pair of variables are based on all the cases with valid data for that pair.
Resources | Processor Time | 00:00:00.05
Resources | Elapsed Time | 00:00:00.03
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Variable | Statistic | Salary per Day (£) | Age (Years) | Number of Years as a Model | Attractiveness (%)
---|---|---|---|---|---
Salary per Day (£) | Pearson Correlation | 1 | .397** | .337** | .068
Salary per Day (£) | Sig. (1-tailed) | | .000 | .000 | .152
Salary per Day (£) | N | 231 | 231 | 231 | 231
Age (Years) | Pearson Correlation | .397** | 1 | .955** | .261**
Age (Years) | Sig. (1-tailed) | .000 | | .000 | .000
Age (Years) | N | 231 | 231 | 231 | 231
Number of Years as a Model | Pearson Correlation | .337** | .955** | 1 | .173**
Number of Years as a Model | Sig. (1-tailed) | .000 | .000 | | .004
Number of Years as a Model | N | 231 | 231 | 231 | 231
Attractiveness (%) | Pearson Correlation | .068 | .261** | .173** | 1
Attractiveness (%) | Sig. (1-tailed) | .152 | .000 | .004 |
Attractiveness (%) | N | 231 | 231 | 231 | 231
The multicollinearity assumption was assessed using the Correlations and Excluded Variables outputs. In the Correlations output, the independent variables age and number of years as a model correlate at r = .955, which is higher than the commonly used cutoff of .80. In the Excluded Variables output, the Variance Inflation Factor (VIF) for number of years as a model is 11.311, which exceeds 10, and the tolerance is .088, which is less than 0.1. In conclusion, the assumption of no multicollinearity is not met.
Model | Variable | Beta In | t | Sig. | Partial Correlation | Tolerance | VIF | Minimum Tolerance
---|---|---|---|---|---|---|---|---
1 | Salary per Day (£) | -.043b | -.612 | .541 | -.040 | .842 | 1.188 | .842
1 | Number of Years as a Model | -.856b | -4.130 | .000 | -.264 | .088 | 11.311 | .088
2 | Salary per Day (£) | -.088c | -1.289 | .199 | -.085 | .822 | 1.217 | .082
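The link between the correlation of .955 and the reported tolerance and VIF can be checked by hand. When only one other predictor is involved, tolerance is 1 minus the squared correlation between the two predictors, and VIF is the reciprocal of tolerance. The short sketch below uses the rounded value of r from the Correlations output, so it reproduces the PASW figures only approximately; the variable names are chosen for this illustration.

```python
# Correlation between age and number of years as a model (Correlations output).
r_age_years = 0.955

# With a single other predictor, tolerance = 1 - r squared.
tolerance = 1 - r_age_years ** 2

# VIF is the reciprocal of tolerance.
vif = 1 / tolerance

print(f"tolerance = {tolerance:.3f}")
print(f"VIF       = {vif:.2f}")

# Rule-of-thumb cutoffs used in the text.
exceeds_vif_cutoff = vif > 10
below_tolerance_cutoff = tolerance < 0.1
print(exceeds_vif_cutoff, below_tolerance_cutoff)
```

The computed VIF is close to the reported 11.311; the small difference comes from rounding r to three decimals.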
In summary, of the three assumptions tested, only the normality of variables assumption was met; the outlier and multicollinearity assumptions were not met. In a real research study I would, as the researcher, exclude the outliers from my data. Because this data set was provided for the assignment, I will proceed with the analysis but treat the results with caution, especially regarding their accuracy.
Section 3
Hypotheses
The null and alternative (research) hypotheses for the variables in the data above are stated as follows:
DV: Attractiveness.
IV: Age, salary per day, and years as a model.
β0: The intercept (constant) of the regression model.
The null hypothesis states:
H0: β1 = β2 = β3 = 0. In the population, all the partial regression coefficients are equal to zero.
The alternative hypothesis states:
H1: At least one βj ≠ 0 (j = 1, 2, 3). In the population, at least one of the partial regression coefficients is not equal to zero.
It should be noted that:
- The alpha level is α = .05.
- The prediction will be tested statistically with PASW (SPSS) Version 21.
- PASW (SPSS) assumes that all the statistical assumptions of multiple regression are met. In this case, however, the outlier and multicollinearity assumptions were not met; only the normality assumption was met.
Section 4
PASW syntax for Multiple Regression
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL CHANGE ZPP
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT beauty
/METHOD=FORWARD salary age years
/SCATTERPLOT=(*ZRESID ,*ZPRED)
/RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).
Section 5
PASW outputs for Multiple Regression
Regression
Group | Item | Value
---|---|---
Output Created | | 11-OCT-2014 17:44:47
Input | Data | C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Input | Active Dataset | DataSet1
Input | File Label | File created by MATRIX
Input | Filter |
Input | Weight |
Input | Split File |
Input | N of Rows in Working Data File | 231
Missing Value Handling | Definition of Missing | User-defined missing values are treated as missing.
Missing Value Handling | Cases Used | Statistics are based on cases with no missing values for any variable used.
Resources | Processor Time | 00:00:01.52
Resources | Elapsed Time | 00:00:01.52
Resources | Memory Required | 1956 bytes
Resources | Additional Memory Required for Residual Plots | 896 bytes
[DataSet1] C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Variable | Mean | Std. Deviation | N
---|---|---|---
Attractiveness (%) | 75.9447 | 6.77303 | 231
Salary per Day (£) | 11.3385 | 16.02644 | 231
Age (Years) | 18.0679 | 2.42190 | 231
Number of Years as a Model | 4.5854 | 1.57865 | 231
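Statistic | Variable | Attractiveness (%) | Salary per Day (£) | Age (Years) | Number of Years as a Model
---|---|---|---|---|---
Pearson Correlation | Attractiveness (%) | 1.000 | .068 | .261 | .173
Pearson Correlation | Salary per Day (£) | .068 | 1.000 | .397 | .337
Pearson Correlation | Age (Years) | .261 | .397 | 1.000 | .955
Pearson Correlation | Number of Years as a Model | .173 | .337 | .955 | 1.000
Sig. (1-tailed) | Attractiveness (%) | . | .152 | .000 | .004
Sig. (1-tailed) | Salary per Day (£) | .152 | . | .000 | .000
Sig. (1-tailed) | Age (Years) | .000 | .000 | . | .000
Sig. (1-tailed) | Number of Years as a Model | .004 | .000 | .000 | .
N | Attractiveness (%) | 231 | 231 | 231 | 231
N | Salary per Day (£) | 231 | 231 | 231 | 231
N | Age (Years) | 231 | 231 | 231 | 231
N | Number of Years as a Model | 231 | 231 | 231 | 231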
Model | Variables Entered | Variables Removed | Method
---|---|---|---
1 | Age (Years) | . | Forward (Criterion: Probability-of-F-to-enter <= .050)
2 | Number of Years as a Model | . | Forward (Criterion: Probability-of-F-to-enter <= .050)
The Variables Entered/Removed output shows which variables the forward method entered into the model. Salary per day was never entered, which indicates that it did not meet the entry criterion (probability of F to enter <= .050) and is not a statistically significant predictor. Age and number of years as a model were entered and are therefore the predictors retained from among the independent variables. Hence, they account for variability in the dependent variable (attractiveness).
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change
---|---|---|---|---|---|---|---|---|---
1 | .261a | .068 | .064 | 6.55252 | .068 | 16.740 | 1 | 229 | .000
2 | .365b | .133 | .125 | 6.33427 | .065 | 17.053 | 1 | 228 | .000
The Model Summary output above is used to determine how well the model fits. The R Square Change column shows that adding the second significant variable (number of years as a model) explains a further .065 (6.5%) of the variance, bringing R Square to .133. Together, these two statistically significant variables account for 13.3% of the variability in the dependent variable. Hence, salary per day is not a predictive factor, while the age of a model and the years spent as a model are predictors of attractiveness.
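The Adjusted R Square and R Square Change columns can be reproduced from the R Square values with the standard adjustment formula. The sketch below is a check on the arithmetic rather than a description of PASW's internals; the function name is invented for this example, and the inputs are the rounded R Square values from the Model Summary.

```python
def adjusted_r_square(r_square, n, predictors):
    """Adjusted R square for a model with the given number of predictors."""
    return 1 - (1 - r_square) * (n - 1) / (n - predictors - 1)


n = 231                 # sample size
r2_model_1 = 0.068      # age only
r2_model_2 = 0.133      # age and number of years as a model

adj_model_1 = adjusted_r_square(r2_model_1, n, predictors=1)
adj_model_2 = adjusted_r_square(r2_model_2, n, predictors=2)
r2_change = r2_model_2 - r2_model_1

print(f"Adjusted R square, model 1: {adj_model_1:.3f}")
print(f"Adjusted R square, model 2: {adj_model_2:.3f}")
print(f"R square change:            {r2_change:.3f}")
```

The adjustment shrinks R Square slightly because it penalizes the model for each predictor added.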
Model | Source | Sum of Squares | df | Mean Square | F | Sig.
---|---|---|---|---|---|---
1 | Regression | 718.757 | 1 | 718.757 | 16.740 | .000b
1 | Residual | 9832.248 | 229 | 42.936 | |
1 | Total | 10551.004 | 230 | | |
2 | Regression | 1402.977 | 2 | 701.489 | 17.483 | .000c
2 | Residual | 9148.027 | 228 | 40.123 | |
2 | Total | 10551.004 | 230 | | |
The F-ratio column of the ANOVA output shows that the independent variables statistically significantly predict the dependent variable, F(2, 228) = 17.483, p < .0005; that is, the regression model is a good fit to the data.
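The F-ratio and its degrees of freedom follow directly from the sums of squares in the ANOVA output for Model 2. The sketch below recomputes them, along with R Square and the F Change for adding the second predictor; the variable names are chosen for this illustration. No p-value is computed, because that would need an F distribution function that the Python standard library does not provide.

```python
n = 231                      # sample size
predictors = 2               # age and number of years as a model

ss_regression = 1402.977     # Model 2 regression sum of squares
ss_residual = 9148.027       # Model 2 residual sum of squares
ss_regression_model_1 = 718.757

df_regression = predictors
df_residual = n - predictors - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# R square is the share of the total sum of squares explained by the model.
r_square = ss_regression / (ss_regression + ss_residual)

# F change for adding number of years as a model to the age-only model.
f_change = (ss_regression - ss_regression_model_1) / ms_residual

print(f"F({df_regression}, {df_residual}) = {f_ratio:.3f}")
print(f"R square = {r_square:.3f}")
print(f"F change = {f_change:.3f}")
```

The residual degrees of freedom are n minus the number of predictors minus 1, which is where the 228 in the ANOVA table comes from.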
Model | Term | B | Std. Error | Beta | t | Sig. | 95.0% CI for B: Lower Bound | 95.0% CI for B: Upper Bound | Zero-order | Partial | Part | Tolerance
---|---|---|---|---|---|---|---|---|---|---|---|---
1 | (Constant) | 62.757 | 3.252 | | 19.298 | .000 | 56.349 | 69.164 | | | |
1 | Age (Years) | .730 | .178 | .261 | 4.091 | .000 | .378 | 1.081 | .261 | .261 | .261 | 1.000
2 | (Constant) | 38.288 | 6.707 | | 5.708 | .000 | 25.072 | 51.505 | | | |
2 | Age (Years) | 3.017 | .580 | 1.079 | 5.201 | .000 | 1.874 | 4.160 | .261 | .326 | .321 | .088
2 | Number of Years as a Model | -3.674 | .890 | -.856 | -4.130 | .000 | -5.428 | -1.921 | .173 | -.264 | -.255 | .088
The Coefficients output tests the statistical significance of each independent variable, that is, whether its unstandardized (or standardized) coefficient is equal to 0 (zero) in the population. If p < .05, we conclude that the coefficient is statistically significantly different from 0 (zero). The t-value and corresponding p-value are located in the “t” and “Sig.” columns, respectively. The Sig. column shows that the coefficients of age and number of years as a model are statistically significantly different from 0 (zero); salary per day was not entered into the model (p = .199 in the Excluded Variables output). The intercept (B0) is also statistically significant.
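Each t-value in the Coefficients output is the unstandardized coefficient divided by its standard error. The sketch below recomputes the Model 2 t-values from the rounded B and Std. Error figures, so the results match the output only to about two decimals; the dictionary layout is just a convenience for this example.

```python
# Model 2 unstandardized coefficients and standard errors (Coefficients output).
model_2 = {
    "(Constant)": (38.288, 6.707),
    "Age (Years)": (3.017, 0.580),
    "Number of Years as a Model": (-3.674, 0.890),
}

# t = B / Std. Error for each term in the model.
t_values = {term: b / se for term, (b, se) in model_2.items()}

for term, t in t_values.items():
    print(f"{term:28s} t = {t:7.3f}")
```

A large absolute t-value means the coefficient is large relative to its sampling error, which is what the small values in the Sig. column reflect.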
Model | Variable | Beta In | t | Sig. | Partial Correlation | Tolerance | VIF | Minimum Tolerance
---|---|---|---|---|---|---|---|---
1 | Salary per Day (£) | -.043b | -.612 | .541 | -.040 | .842 | 1.188 | .842
1 | Number of Years as a Model | -.856b | -4.130 | .000 | -.264 | .088 | 11.311 | .088
2 | Salary per Day (£) | -.088c | -1.289 | .199 | -.085 | .822 | 1.217 | .082
Model | Dimension | Eigenvalue | Condition Index | Variance Proportion: (Constant) | Variance Proportion: Age (Years) | Variance Proportion: Number of Years as a Model
---|---|---|---|---|---|---
1 | 1 | 1.991 | 1.000 | .00 | .00 |
1 | 2 | .009 | 15.019 | 1.00 | 1.00 |
2 | 1 | 2.944 | 1.000 | .00 | .00 | .00
2 | 2 | .055 | 7.301 | .03 | .00 | .10
2 | 3 | .001 | 54.019 | .97 | 1.00 | .90
Statistic | Minimum | Maximum | Mean | Std. Deviation | N
---|---|---|---|---|---
Predicted Value | 68.4663 | 82.4314 | 75.9447 | 2.46980 | 231
Residual | -15.09981 | 22.88941 | .00000 | 6.30667 | 231
Std. Predicted Value | -3.028 | 2.626 | .000 | 1.000 | 231
Std. Residual | -2.384 | 3.614 | .000 | .996 | 231
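[Charts: histogram and normal P-P plot of the standardized residuals, and scatterplot of *ZRESID against *ZPRED]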
Section 6
APA Style tables for academic reporting
Table 1
Variable | Mean | Std. Deviation | N
---|---|---|---
Attractiveness (%) | 75.9447 | 6.77303 | 231
Salary per Day (£) | 11.3385 | 16.02644 | 231
Age (Years) | 18.0679 | 2.42190 | 231
Number of Years as a Model | 4.5854 | 1.57865 | 231
Table 2
Statistic | Variable | Attractiveness (%) | Salary per Day (£) | Age (Years) | Number of Years as a Model
---|---|---|---|---|---
Pearson Correlation | Attractiveness (%) | 1.000 | .068 | .261 | .173
Pearson Correlation | Salary per Day (£) | .068 | 1.000 | .397 | .337
Pearson Correlation | Age (Years) | .261 | .397 | 1.000 | .955
Pearson Correlation | Number of Years as a Model | .173 | .337 | .955 | 1.000
Sig. (1-tailed) | Attractiveness (%) | . | .152 | .000 | .004
Sig. (1-tailed) | Salary per Day (£) | .152 | . | .000 | .000
Sig. (1-tailed) | Age (Years) | .000 | .000 | . | .000
Sig. (1-tailed) | Number of Years as a Model | .004 | .000 | .000 | .
Section 7
APA Report for Multiple Regression
A multiple regression analysis was conducted on the Supermodel.sav data set from the Field text to predict the attractiveness of a model from age in years, salary per day, and years as a model. The descriptive statistics in Table 1 show N = 231, with no missing values reported. The independent variables are age, salary per day, and years as a model, and the dependent variable is attractiveness. Three assumptions (outliers, normality of variables, and multicollinearity) were tested with PASW Statistics before the data were analyzed. Only the normality of variables assumption was met; the multicollinearity and outlier assumptions were violated.
The null hypothesis (H0) is rejected because p < .0005, in favor of the alternative hypothesis (H1). The Sig. column shows that the coefficients of the independent variables retained in the model (age and years as a model) are statistically significantly different from 0 (zero), and the overall model is significant, F(2, 228) = 17.483, p < .0005, R2 = .133. That is, there is sufficient evidence to conclude that age and years as a model are predictors of the attractiveness of models. Salary per day was not entered into the model, which shows that it is not statistically significant. Therefore, salary per day is not predictive of the attractiveness of a model.
In conclusion, a multiple regression was run to predict the attractiveness of a model from age, years as a model, and salary per day. Two variables, age and years as a model, statistically significantly predicted attractiveness, F(2, 228) = 17.483, p < .0005, R2 = .133. Only these two independent variables added statistically significantly to the prediction of attractiveness, p < .05.
Section 8
Describe how you would compute the sample size using power =.80, effect size of .50, alpha of .05. Does your analysis support the sample size of the data you ran?
Calculation of Sample Size using G*Power
Steps:
Input parameters
- Test family: F tests.
- Statistical test: Linear multiple regression: Fixed model, R2 deviation from zero.
- Type of power analysis: A priori: Compute required sample size, given α, power, and effect size.
Effect size f2 – 0.50
α err prob – 0.05
Power (1-β err prob) – 0.80
Number of predictors – 3
After entering all these parameters, I clicked Calculate to obtain the output parameters and the graph below:
Output parameters
Noncentrality parameter λ – 13.5000000
Critical F – 3.0279984
Numerator df – 3
Denominator df – 23
Total sample size – 27
Actual power – 0.8182141
Output graph representation
Using G*Power to calculate Sample size = 27
The sample size calculated with G*Power (N = 27) does not match the sample size of the data analyzed (N = 231): the calculated value is only about 12% of the actual sample size used in the multiple regression analysis. In a research study, a larger sample is more dependable for generalizing the results of the study to the wider population (Green & Salkind, 2014). Therefore, the sample used for this analysis was adequate and far above the sample size calculated with G*Power; that is, an adequate number of subjects was used for the study.
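Several of the G*Power output parameters can be checked by hand from the inputs. For this test, the noncentrality parameter is the effect size f2 multiplied by the total sample size, the numerator df equals the number of predictors, and the denominator df is the sample size minus the number of predictors minus 1. The sketch below covers only these three quantities; the function name is invented for this example, and the critical F and actual power are not recomputed because they need a noncentral F distribution that the Python standard library lacks.

```python
def a_priori_summary(f2, total_n, predictors):
    """Noncentrality parameter and degrees of freedom for the overall F test."""
    noncentrality = f2 * total_n
    df_numerator = predictors
    df_denominator = total_n - predictors - 1
    return noncentrality, df_numerator, df_denominator


# Inputs and required sample size reported by G*Power in the text.
noncentrality, df_num, df_den = a_priori_summary(f2=0.50, total_n=27, predictors=3)
print(noncentrality, df_num, df_den)

# How the required sample size compares with the sample actually analyzed.
required_n = 27
actual_n = 231
share_of_actual = required_n / actual_n
print(f"required N is {share_of_actual:.1%} of the actual N")
```

Because the actual sample is several times larger than the required one, the analysis had more than the targeted power to detect an effect of the size entered.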
References
Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). London: Sage.
Green, S. B., & Salkind, N. J. (2014). Using SPSS for Windows and Macintosh: Analyzing and understanding data (7th ed.). Upper Saddle River, NJ: Pearson Education.
Sage Publications. (2013). Andy Field’s Datasets [Data files]. Available from Discovering Statistics Using IBM SPSS Statistics companion website: http://www.sagepub.com/field4e/study/datasets.htm
IBM PASW (formerly SPSS) Statistical Software version 21.
G*Power statistical software 3.1.9.2.