Multiple Regression Analysis Using PASW Statistics


Tajudeen Rasheed

Week 6 PASW Assignment


Multiple regression is an extension of simple linear regression (Field, 2013). It is used to predict the value of one variable from the values of two or more other variables (Field, 2013). According to Field (2013), the variable we want to predict is called the dependent variable (or sometimes the outcome, target, or criterion variable). The variables used to predict the value of the dependent variable are called the independent variables (or sometimes the predictor, explanatory, or regressor variables) (Field, 2013). Multiple regression allows us to determine the overall fit (variance explained) of the model and the relative contribution of each predictor to the total variance explained (Field, 2013).

Section 1

Assumptions for Multiple Regression

(1) There should be no significant outliers

Outliers are single data points that do not follow the usual pattern of the data. They can distort a multiple regression and reduce the accuracy of the results. When multiple regression is run in PASW (SPSS), possible outliers are easy to detect (Field, 2013).

(2) Normality of variables

The normality assumption states that the dependent variable and each of the independent variables are normally distributed. According to Field (2013), this assumption can be checked graphically with a histogram with a normal distribution curve, a Q-Q plot, or a scatterplot.

(3) Missing data

Missing values make the N incomplete and can distort a multiple regression, reducing the accuracy of the results. When multiple regression is run in PASW, possible missing data are easy to detect from the descriptive statistics output (Field, 2013).

(4) No multicollinearity (Variance Inflation Factor, VIF)

The data must not show multicollinearity, which occurs when two or more independent variables are highly correlated with each other. Multicollinearity makes it difficult to tell which independent variable contributes to the variance explained in the dependent variable, and it causes technical problems in calculating the regression model. It also misleadingly inflates the standard errors, so some variables appear statistically non-significant when they would otherwise be significant (Field, 2013).

(5) Homogeneity of variance

The data need to show homogeneity of variance, meaning that the variances along the line of best fit remain similar as we move along the regression line (Field, 2013).

(6) Homogeneity of regression

This assumption is very important. If there is a positive relationship between the outcome (dependent) variable and two or more independent variables in one group, we assume that the relationship is also positive in the other groups. If the relationships displayed in the scatterplots and partial regression plots are not linear, either run a non-linear regression analysis or "transform" the data using PASW Statistics (Field, 2013).

Section 2

Testing whether the assumptions are met or not

The assumptions of outliers, normality of variables, and multicollinearity will be checked with PASW Statistics to find out whether the data in the Supermodel.sav data set meet them.

Running the assumption checks in PASW Statistics

(1) There should be no outliers: this assumption is tested for each variable in PASW Statistics using the boxplots below.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=beauty MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: beauty=col(source(s), name("beauty"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Attractiveness (%)"))
  ELEMENT: schema(position(bin.quantile.letter(1*beauty)), label(id))
END GPL.

GGraph

Output created: 11-OCT-2014 14:41:34
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File: (blank)
N of rows in working data file: 231
Processor time: 00:00:00.50
Elapsed time: 00:00:00.49

[Figure: boxplot of Attractiveness (%), outliers labelled by case number]

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=years MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: years=col(source(s), name("years"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Number of Years as a Model"))
  ELEMENT: schema(position(bin.quantile.letter(1*years)), label(id))
END GPL.

GGraph

Output created: 11-OCT-2014 14:42:09
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File: (blank)
N of rows in working data file: 231
Processor time: 00:00:00.58
Elapsed time: 00:00:00.55

[Figure: boxplot of Number of Years as a Model, outliers labelled by case number]

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=age MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: age=col(source(s), name("age"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Age (Years)"))
  ELEMENT: schema(position(bin.quantile.letter(1*age)), label(id))
END GPL.

GGraph

Output created: 11-OCT-2014 14:42:35
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File: (blank)
N of rows in working data file: 231
Processor time: 00:00:00.53
Elapsed time: 00:00:00.51

[Figure: boxplot of Age (Years), outliers labelled by case number]

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Salary per Day (£)"))
  ELEMENT: schema(position(bin.quantile.letter(1*salary)), label(id))
END GPL.

GGraph

Output created: 11-OCT-2014 14:42:55
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File: (blank)
N of rows in working data file: 231
Processor time: 00:00:00.56
Elapsed time: 00:00:00.51

[Figure: boxplot of Salary per Day (£), outliers labelled by case number]

The boxplot is a graphical display of the data that shows: (1) the median, which is the middle black line; (2) the middle 50% of scores, which is the shaded region (the box); (3) the top and bottom 25% of scores, which are the lines extending out of the shaded region; (4) the smallest and largest non-outlier scores, which are the horizontal lines at the top and bottom of the boxplot; and (5) outliers. The boxplot shows both "mild" and "extreme" outliers. Mild outliers are scores more than 1.5*IQR beyond the edge of the box and are indicated by open dots. IQR stands for interquartile range, which is the range spanned by the middle 50% of the scores. Extreme outliers are scores more than 3*IQR beyond the edge of the box and are indicated by stars. These benchmarks are conventions, in the same way that p < .05 is a convention. In the boxplots above, open dots and stars mark the outlying cases in each variable as follows:

Attractiveness outliers are: 23, 194, 73, 33, 180, 60, and 67.

Number of years as a model outliers are: 57, 91, 157, 155, 5, 190, 212, 60, 131, and 18.

Age outliers are: 57, 155, 91, 190, 5, 32, 224, 157, 114, 60, 18, and 131.

Salary per day outliers are: dots: 91, 2, 170, 191, 41, 24, 83, and 50; stars: 5, 135, 155, 198, 116, and 127. Note that the boxplots display every outlying case. In summary, the lists above contain 43 outlying values across the four variables (N = 231), with some cases flagged on more than one variable; therefore the assumption of no significant outliers is not met.
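The 1.5*IQR and 3*IQR benchmarks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_outliers` and the sample salaries are hypothetical (they are not taken from Supermodel.sav), and the quartiles come from Python's "inclusive" method, which can differ slightly from the hinges PASW uses for its boxplots.

```python
from statistics import quantiles


def classify_outliers(scores):
    """Split scores into mild (> 1.5*IQR) and extreme (> 3*IQR) outliers."""
    # Quartiles by the "inclusive" method (an assumption; PASW uses hinges).
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for x in scores:
        # Distance of x beyond the nearer edge of the box (negative if inside).
        distance = max(q1 - x, x - q3)
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme


# Hypothetical daily salaries, NOT taken from Supermodel.sav.
salaries = [5, 6, 7, 8, 9, 10, 11, 12, 13, 25, 60]
mild, extreme = classify_outliers(salaries)
print("mild:", mild, "extreme:", extreme)
```

With these made-up values the box runs from 7.5 to 12.5 (IQR = 5), so 25 is a mild outlier (an open dot) and 60 is an extreme outlier (a star).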

(2) Normality of variables

This assumption will be verified in PASW Statistics by checking it graphically with Q-Q plots.

PPLOT
  /VARIABLES=salary age years beauty
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.

PPlot

Output created: 11-OCT-2014 15:42:27
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File, Date: (blank)
N of rows in working data file: 231
Definition of missing: User-defined missing values are treated as missing.
Cases used: For a given sequence or time series variable, cases with missing values are not used in the analysis. Cases with negative or zero values are also not used, if the log transform is requested.
Processor time: 00:00:04.25
Elapsed time: 00:00:04.02
Use: from first observation to last observation

Time Series Settings (TSET)
Amount of output: PRINT = DEFAULT
Saving new variables: NEWVAR = CURRENT
Maximum number of lags in autocorrelation or partial autocorrelation plots: MXAUTO = 16
Maximum number of lags per cross-correlation plots: MXCROSS = 7
Maximum number of new variables generated per procedure: MXNEWVAR = 60
Maximum number of new cases per procedure: MXPREDICT = 1000
Treatment of user-missing values: MISSING = EXCLUDE
Confidence interval percentage value: CIN = 95
Tolerance for entering variables in regression equations: TOLER = .0001
Maximum iterative parameter change: CNVERGE = .001
Method of calculating std. errors for autocorrelations: ACFSE = IND
Length of seasonal period: Unspecified
Variable whose values label observations in plots: Unspecified
Equations include: CONSTANT

Model Description
Model name: MOD_1
Series or sequence: 1 Salary per Day (£); 2 Age (Years); 3 Number of Years as a Model; 4 Attractiveness (%)
Transformation: None
Non-seasonal differencing: 0
Seasonal differencing: 0
Length of seasonal period: No periodicity
Standardization: Not applied
Distribution type: Normal (location estimated, scale estimated)
Fractional rank estimation method: Blom's
Rank assigned to ties: Mean rank of tied values

Case Processing Summary
                                    Salary per Day (£)   Age (Years)   Number of Years as a Model   Attractiveness (%)
Series or sequence length           231                  231           231                          231
User-missing values in the plot     0                    0             0                            0
System-missing values in the plot   0                    0             0                            0

Estimated Distribution Parameters (normal distribution)
           Salary per Day (£)   Age (Years)   Number of Years as a Model   Attractiveness (%)
Location   11.3385              18.0679       4.5854                       75.9447
Scale      16.02644             2.42190       1.57865                      6.77303

[Figures: normal Q-Q plots of Salary per Day (£), Age (Years), Number of Years as a Model, and Attractiveness (%)]

The Q-Q plots above were used to assess the normality of each variable. The points fall close to the diagonal line in every plot, with only small departures from the line in a few of them; therefore the assumption of normality of variables is met.
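To make the Q-Q plot less of a black box, the sketch below computes the expected normal values that the observed scores are plotted against, using Blom's fractions (rank - 3/8) / (n + 1/4) as requested by /FRACTION=BLOM. It is an illustrative approximation rather than PASW's own code: the function name `blom_normal_quantiles` is hypothetical, ties are ignored (PASW was asked to use the mean rank for ties), and the location and scale are the values reported for Age (Years) in the output above.

```python
from statistics import NormalDist


def blom_normal_quantiles(n, location, scale):
    """Expected normal values for ranks 1..n using Blom's fractions."""
    dist = NormalDist(mu=location, sigma=scale)
    # Blom's plotting position for rank r out of n cases.
    return [dist.inv_cdf((rank - 3 / 8) / (n + 1 / 4)) for rank in range(1, n + 1)]


# Location and scale reported for Age (Years); N = 231.
expected = blom_normal_quantiles(231, 18.0679, 2.42190)
print(round(expected[0], 2), round(expected[115], 2), round(expected[-1], 2))
```

Plotting the sorted observed ages against `expected` gives the Q-Q plot; the closer the points lie to a straight line, the closer the variable is to a normal distribution. The middle rank (116 of 231) has a fraction of exactly 0.5, so its expected value equals the location estimate.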

(3) Testing the multicollinearity assumption using PASW Statistics

Multicollinearity misleadingly inflates the standard errors, so some variables appear statistically non-significant when they would otherwise be significant. In PASW Statistics, multicollinearity is detected by inspecting the correlation coefficients and the Tolerance/VIF (Variance Inflation Factor) values; interpreting these values determines whether the data meet or violate this assumption (Field, 2013).

GET
  FILE='C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
CORRELATIONS
  /VARIABLES=salary age years beauty
  /PRINT=ONETAIL NOSIG
  /MISSING=PAIRWISE.

Correlations

Output created: 11-OCT-2014 20:44:45
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File: (blank)
N of rows in working data file: 231
Definition of missing: User-defined missing values are treated as missing.
Cases used: Statistics for each pair of variables are based on all the cases with valid data for that pair.
Processor time: 00:00:00.05
Elapsed time: 00:00:00.03

                                                   Salary per Day (£)   Age (Years)   Number of Years as a Model   Attractiveness (%)
Salary per Day (£)           Pearson Correlation   1                    .397**        .337**                       .068
                             Sig. (1-tailed)                            .000          .000                         .152
                             N                     231                  231           231                          231
Age (Years)                  Pearson Correlation   .397**               1             .955**                       .261**
                             Sig. (1-tailed)       .000                               .000                         .000
                             N                     231                  231           231                          231
Number of Years as a Model   Pearson Correlation   .337**               .955**        1                            .173**
                             Sig. (1-tailed)       .000                 .000                                       .004
                             N                     231                  231           231                          231
Attractiveness (%)           Pearson Correlation   .068                 .261**        .173**                       1
                             Sig. (1-tailed)       .152                 .000          .004
                             N                     231                  231           231                          231

The multicollinearity assumption was assessed using the Correlations and Excluded Variables outputs. In the Correlations output, age and number of years as a model correlate at r = .955, which is above the commonly used cut-off of .80. In the Excluded Variables output, the Variance Inflation Factor (VIF) for number of years as a model is 11.31, which exceeds 10, and the tolerance is .088, which is below 0.1. The assumption of no multicollinearity is therefore not met.
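The link between the correlation of .955 and the reported tolerance and VIF can be checked by hand. The sketch below assumes a model with only two predictors (age and years as a model), in which case the R2 from regressing one predictor on the other is simply the squared correlation; the function name `tolerance_and_vif` is hypothetical.

```python
def tolerance_and_vif(r_squared):
    """Tolerance = 1 - R2 of a predictor regressed on the others; VIF = 1 / tolerance."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Correlation between Age (Years) and Number of Years as a Model from the output.
r_age_years = 0.955

# With two predictors, R2 between them is the squared correlation (assumption).
tolerance, vif = tolerance_and_vif(r_age_years ** 2)
print(f"tolerance = {tolerance:.3f}, VIF = {vif:.2f}")
```

The tolerance comes out at .088, matching the output, and the VIF at about 11.4. PASW reports 11.311 because it works from the unrounded correlation rather than the rounded value of .955.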

Excluded Variables (Tolerance, VIF, and Minimum Tolerance are collinearity statistics)

Model                          Beta In   t        Sig.   Partial Correlation   Tolerance   VIF      Minimum Tolerance
1  Salary per Day (£)          -.043b    -.612    .541   -.040                 .842        1.188    .842
   Number of Years as a Model  -.856b    -4.130   .000   -.264                 .088        11.311   .088
2  Salary per Day (£)          -.088c    -1.289   .199   -.085                 .822        1.217    .082

In summary, of the three assumptions tested, only the normality of variables assumption was met; the other two (outliers and multicollinearity) were violated. In a real research study I would exclude the outliers from the data, but because this data set was provided for the assignment I will proceed with the analysis and interpret the results with caution, especially regarding their accuracy.

Section 3

Hypotheses

The null and alternative (research) hypotheses for the variables in the data above are as follows:

DV: Attractiveness.

IV: Age, salary per day, and number of years as a model.

B0: The intercept, tested for statistical significance.

The null hypothesis states that:

H0: β1 = β2 = β3 = 0. In the population, all the partial regression coefficients are equal to zero.

The alternative hypothesis states that:

H1: at least one βj ≠ 0 (j = 1, 2, 3). In the population, at least one of the partial regression coefficients is not equal to zero.

It should be noted that:

- The alpha level is α = .05.

- The prediction will be tested statistically with PASW (SPSS) Version 21.

- PASW (SPSS) assumes that all the statistical assumptions of multiple regression are met. In this case the assumptions of outliers and multicollinearity were not met; only the assumption of normality was met.

Section 4

PASW syntax for Multiple Regression

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL CHANGE ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT beauty
  /METHOD=FORWARD salary age years
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).

Section 5

PASW outputs for Multiple Regression

Regression

Output created: 11-OCT-2014 17:44:47
Data: C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav
Active dataset: DataSet1
File label: File created by MATRIX
Filter, Weight, Split File: (blank)
N of rows in working data file: 231
Definition of missing: User-defined missing values are treated as missing.
Cases used: Statistics are based on cases with no missing values for any variable used.
Processor time: 00:00:01.52
Elapsed time: 00:00:01.52
Memory required: 1956 bytes
Additional memory required for residual plots: 896 bytes

Descriptive Statistics

                             Mean      Std. Deviation   N
Attractiveness (%)           75.9447   6.77303          231
Salary per Day (£)           11.3385   16.02644         231
Age (Years)                  18.0679   2.42190          231
Number of Years as a Model   4.5854    1.57865          231

Correlations

                                                   Attractiveness (%)   Salary per Day (£)   Age (Years)   Number of Years as a Model
Pearson Correlation   Attractiveness (%)           1.000                .068                 .261          .173
                      Salary per Day (£)           .068                 1.000                .397          .337
                      Age (Years)                  .261                 .397                 1.000         .955
                      Number of Years as a Model   .173                 .337                 .955          1.000
Sig. (1-tailed)       Attractiveness (%)           .                    .152                 .000          .004
                      Salary per Day (£)           .152                 .                    .000          .000
                      Age (Years)                  .000                 .000                 .             .000
                      Number of Years as a Model   .004                 .000                 .000          .
N                     Attractiveness (%)           231                  231                  231           231
                      Salary per Day (£)           231                  231                  231           231
                      Age (Years)                  231                  231                  231           231
                      Number of Years as a Model   231                  231                  231           231

Variables Entered/Removed

Model   Variables Entered            Variables Removed   Method
1       Age (Years)                  .                   Forward (Criterion: Probability-of-F-to-enter <= .050)
2       Number of Years as a Model   .                   Forward (Criterion: Probability-of-F-to-enter <= .050)

The Variables Entered/Removed output shows which variables were entered by the forward method. Age entered in model 1 and number of years as a model entered in model 2. Salary per day was never entered, which indicates that it did not meet the entry criterion (probability of F to enter <= .05) and is not a statistically significant predictor. Age and number of years as a model are therefore the predictors retained from the independent variables, and they account for the explained variability in the dependent variable (attractiveness).

Model Summary (the last five columns are change statistics)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .261a   .068       .064                6.55252                      .068              16.740     1     229   .000
2       .365b   .133       .125                6.33427                      .065              17.053     1     228   .000

The Model Summary output above shows how well the model fits. Age alone explains 6.8% of the variance in attractiveness (R2 = .068), and the R Square Change column shows that adding number of years as a model explains a further .065 (6.5%) of the variance, giving R2 = .133 for model 2. These two statistically significant variables together account for 13.3% of the variability in the dependent variable. Hence, salary per day is not a predictive factor, while the age of a model and the years spent as a model are predictors of attractiveness.

ANOVA

Model            Sum of Squares   df    Mean Square   F        Sig.
1  Regression    718.757          1     718.757       16.740   .000b
   Residual      9832.248         229   42.936
   Total         10551.004        230
2  Regression    1402.977         2     701.489       17.483   .000c
   Residual      9148.027         228   40.123
   Total         10551.004        230

The F column of the ANOVA output shows that the independent variables statistically significantly predict the dependent variable, F(2, 228) = 17.483, p < .0005; that is, the regression model is a good fit of the data.
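The fit statistics for model 2 can be reproduced from the sums of squares in the ANOVA table. The sketch below is a hand check, not PASW's procedure; the function name `model_fit` is hypothetical, and the inputs are the model 2 regression and residual sums of squares with N = 231 cases and 2 predictors.

```python
def model_fit(ss_regression, ss_residual, n_cases, n_predictors):
    """Return R2, adjusted R2, F, and its degrees of freedom from sums of squares."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    df1 = n_predictors
    df2 = n_cases - n_predictors - 1
    # Adjusted R2 penalises R2 for the number of predictors in the model.
    adjusted_r2 = 1 - (1 - r2) * (n_cases - 1) / df2
    # F is the ratio of the regression mean square to the residual mean square.
    f_ratio = (ss_regression / df1) / (ss_residual / df2)
    return r2, adjusted_r2, f_ratio, df1, df2


# Model 2 values from the ANOVA table.
r2, adjusted_r2, f_ratio, df1, df2 = model_fit(1402.977, 9148.027, 231, 2)
print(f"R2 = {r2:.3f}, adjusted R2 = {adjusted_r2:.3f}, F({df1}, {df2}) = {f_ratio:.2f}")
```

The results agree with the Model Summary and ANOVA tables: R2 = .133, adjusted R2 = .125, and an F ratio of about 17.48 on 2 and 228 degrees of freedom.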

Coefficients (B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; the 95% confidence interval is for B; Zero-order, Partial, and Part are correlations; Tolerance is a collinearity statistic)

Model                          B        Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound   Zero-order   Partial   Part    Tolerance
1  (Constant)                  62.757   3.252                19.298   .000   56.349        69.164
   Age (Years)                 .730     .178         .261    4.091    .000   .378          1.081         .261         .261      .261    1.000
2  (Constant)                  38.288   6.707                5.708    .000   25.072        51.505
   Age (Years)                 3.017    .580         1.079   5.201    .000   1.874         4.160         .261         .326      .321    .088
   Number of Years as a Model  -3.674   .890         -.856   -4.130   .000   -5.428        -1.921        .173         -.264     -.255   .088

The Coefficients output tests whether each unstandardized (or standardized) coefficient is equal to 0 (zero) in the population. If p < .05, we conclude that the coefficient is statistically significantly different from 0 (zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns, respectively. The Sig. column shows that the coefficients for age and number of years as a model are statistically significantly different from 0 (zero), as is the intercept (B0). Salary per day was not entered into the model, so it appears only in the Excluded Variables output.
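The t-values and standardized coefficients in the Coefficients output follow directly from B, its standard error, and the standard deviations in the Descriptive Statistics table. The sketch below is a hand check under the usual definitions (t = B / SE and Beta = B * SD of the predictor / SD of the outcome); the function names are hypothetical, and small differences from the output are rounding in the reported B and SE.

```python
def t_statistic(b, std_error):
    """t-value testing whether a coefficient differs from zero."""
    return b / std_error


def standardized_beta(b, sd_predictor, sd_outcome):
    """Convert an unstandardized coefficient B into a standardized Beta."""
    return b * sd_predictor / sd_outcome


SD_ATTRACTIVENESS = 6.77303  # Std. Deviation of the dependent variable

# Model 2 coefficients: (B, Std. Error, Std. Deviation of the predictor).
predictors = {
    "Age (Years)": (3.017, 0.580, 2.42190),
    "Number of Years as a Model": (-3.674, 0.890, 1.57865),
}

results = {}
for name, (b, se, sd) in predictors.items():
    results[name] = (t_statistic(b, se), standardized_beta(b, sd, SD_ATTRACTIVENESS))
    print(f"{name}: t = {results[name][0]:.2f}, Beta = {results[name][1]:.3f}")
```

The standardized coefficients come out at 1.079 and -.856, as in the output. A Beta above 1 in absolute value for age is another symptom of the multicollinearity between age and years as a model noted earlier.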

Excluded Variables (Tolerance, VIF, and Minimum Tolerance are collinearity statistics)

Model                          Beta In   t        Sig.   Partial Correlation   Tolerance   VIF      Minimum Tolerance
1  Salary per Day (£)          -.043b    -.612    .541   -.040                 .842        1.188    .842
   Number of Years as a Model  -.856b    -4.130   .000   -.264                 .088        11.311   .088
2  Salary per Day (£)          -.088c    -1.289   .199   -.085                 .822        1.217    .082

Collinearity Diagnostics (the last three columns are variance proportions)

Model   Dimension   Eigenvalue   Condition Index   (Constant)   Age (Years)   Number of Years as a Model
1       1           1.991        1.000             .00          .00
        2           .009         15.019            1.00         1.00
2       1           2.944        1.000             .00          .00           .00
        2           .055         7.301             .03          .00           .10
        3           .001         54.019            .97          1.00          .90

Residuals Statistics

                       Minimum     Maximum    Mean      Std. Deviation   N
Predicted Value        68.4663     82.4314    75.9447   2.46980          231
Residual               -15.09981   22.88941   .00000    6.30667          231
Std. Predicted Value   -3.028      2.626      .000      1.000            231
Std. Residual          -2.384      3.614      .000      .996             231

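Charts

[Figures: histogram of the standardized residuals, normal P-P plot of the standardized residuals, and scatterplot of *ZRESID against *ZPRED, as requested in the REGRESSION syntax]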

Section 6

APA Style tables for academic reporting

Table 1

Descriptive Statistics

                             Mean      Std. Deviation   N
Attractiveness (%)           75.9447   6.77303          231
Salary per Day (£)           11.3385   16.02644         231
Age (Years)                  18.0679   2.42190          231
Number of Years as a Model   4.5854    1.57865          231

Table 2

Correlations

                                                   Attractiveness (%)   Salary per Day (£)   Age (Years)   Number of Years as a Model
Pearson Correlation   Attractiveness (%)           1.000                .068                 .261          .173
                      Salary per Day (£)           .068                 1.000                .397          .337
                      Age (Years)                  .261                 .397                 1.000         .955
                      Number of Years as a Model   .173                 .337                 .955          1.000
Sig. (1-tailed)       Attractiveness (%)           .                    .152                 .000          .004
                      Salary per Day (£)           .152                 .                    .000          .000
                      Age (Years)                  .000                 .000                 .             .000
                      Number of Years as a Model   .004                 .000                 .000          .

Section 7

APA Report for Multiple Regression

A multiple regression analysis was conducted on the Supermodel.sav data set from the Field text to predict the attractiveness of models from age in years, salary per day, and number of years as a model. From the descriptive statistics in Table 1, N = 231 and no missing values were reported. The independent variables are age, salary per day, and years as a model, and the dependent variable is attractiveness. Three assumptions (outliers, normality of variables, and multicollinearity) were tested using PASW Statistics before the data were analysed. Only the assumption of normality of variables was met; the assumptions of multicollinearity and outliers were violated.

The null hypothesis (H0) is rejected because p < .0005, and the alternative hypothesis (H1) is accepted. The Sig. column shows that the retained independent variables (age and years as a model) have coefficients that are statistically significantly different from 0 (zero), F(2, 228) = 17.483, p < .0005, R2 = .133. That is, there is sufficient evidence to conclude that age and years as a model are predictors of the attractiveness of models. Salary per day was not entered into the model, which shows that it is not statistically significant. Therefore, salary per day is not a predictor of the attractiveness of a model.

In conclusion, a multiple regression was run to predict the attractiveness of a model from age, years as a model, and salary per day. Age and years as a model statistically significantly predicted attractiveness, F(2, 228) = 17.483, p < .0005, R2 = .133. Only these two independent variables added statistically significantly to the prediction of attractiveness, p < .05.

Section 8

Describe how you would compute the sample size using power =.80, effect size of .50, alpha of .05. Does your analysis support the sample size of the data you ran?

Calculation of Sample Size using G*Power

Steps:

Input parameters

  1. Test family: F tests.
  2. Statistical test: Linear multiple regression: fixed model, R2 deviation from zero.
  3. Type of power analysis: A priori: compute required sample size, given α, power, and effect size.
  4. Input values:

Effect size f2: 0.50

α err prob: 0.05

Power (1-β err prob): 0.80

Number of predictors: 3

After entering these parameters, I clicked Calculate to obtain the output parameters and the graph below.

Output parameters

Noncentrality parameter λ: 13.5000000

Critical F: 3.0279984

Numerator df: 3

Denominator df: 23

Total sample size: 27

Actual power: 0.8182141

Output graph representation

[Figure: G*Power output graph]

Sample size calculated using G*Power = 27
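Two of the G*Power output parameters are simple arithmetic and can be checked by hand. The sketch below assumes the standard definitions for this test (noncentrality λ = f2 * N, denominator df = N - number of predictors - 1) and Cohen's f2 = R2 / (1 - R2) for converting an observed R2 into an effect size. The function names are hypothetical, and the power itself is not recomputed because that requires the noncentral F distribution.

```python
def noncentrality(f2, n_total):
    """Noncentrality parameter for the regression F test: lambda = f2 * N."""
    return f2 * n_total


def denominator_df(n_total, n_predictors):
    """Residual degrees of freedom: N - k - 1."""
    return n_total - n_predictors - 1


def cohens_f2(r2):
    """Effect size f2 implied by a given R2."""
    return r2 / (1 - r2)


# Values from the G*Power run above: f2 = 0.50, N = 27, 3 predictors.
lam = noncentrality(0.50, 27)
df2 = denominator_df(27, 3)

# Effect size implied by the R2 of .133 observed in the regression output.
observed_f2 = cohens_f2(0.133)
print(f"lambda = {lam}, denominator df = {df2}, observed f2 = {observed_f2:.3f}")
```

The first two values match the G*Power output (λ = 13.5, denominator df = 23). The observed R2 of .133 corresponds to an f2 of about 0.15, which is considerably smaller than the 0.50 assumed in the power calculation.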

Does your sample size support the sample size of the data you ran?

No. The sample size calculated using G*Power (N = 27) does not match the sample size of the data analysed (N = 231); the calculated sample size is only about one ninth of the actual sample size used in the multiple regression analysis. In a research study, a large sample is more dependable for generalizing the results to the larger population (Green & Salkind, 2014). The sample used for this analysis was therefore more than adequate: it is far above the sample size calculated with G*Power, so an adequate number of subjects was used for the study.

References

Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). London: Sage.

Green, S. B., & Salkind, N. J. (2014). Using SPSS for Windows and Macintosh: Analyzing and understanding data (7th ed.). Upper Saddle River, NJ: Pearson Education.

Sage Publications. (2013). Andy Field's Datasets [Data files]. Available from Discovering Statistics Using IBM SPSS Statistics companion website: http://www.sagepub.com/field4e/study/datasets.htm

IBM PASW (formerly SPSS) Statistical Software version 21.

G*Power statistical software 3.1.9.2.



