Multiple Regression Analysis Using PASW Statistics


Tajudeen Rasheed

Week 6 PASW Assignment


Multiple regression is an extension of simple linear regression: it is used when we want to predict the value of a variable from the values of two or more other variables (Field, 2013). The variable we want to predict is called the dependent variable (or the outcome, target or criterion variable), and the variables we use to predict it are called the independent variables (or the predictor, explanatory or regressor variables). Multiple regression allows us to determine the overall fit of the model (the variance explained) and the relative contribution of each predictor to the total variance explained (Field, 2013).
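In symbols, with three predictors (the case analysed below), the multiple regression model can be written as

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$

where Y is the dependent variable, the X's are the independent variables, β0 is the intercept, β1 to β3 are the partial regression coefficients, and ε is the residual error.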

Section 1

Assumptions for Multiple Regression

(1) There should be no significant outliers

Outliers are single data points that do not follow the usual pattern of the data. The problem with outliers is that they can have a negative effect on the multiple regression, reducing the accuracy of the results. Fortunately, when using PASW Statistics to run a multiple regression, it is easy to detect possible outliers (Field, 2013).

(2) Normality of variables

The normality assumption of multiple regression holds that the dependent variable and each of the independent variables are normally distributed. According to Field (2013), this assumption can be verified graphically (with a histogram overlaid with a normal distribution curve, a Q-Q plot, or a scatterplot).
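As a minimal sketch of how this check could be run in PASW Statistics (the variable names salary, age, years and beauty assume the Supermodel.sav data set analysed below):

* Sketch: request descriptive statistics, histograms and normality (Q-Q) plots for each variable.
EXAMINE VARIABLES=salary age years beauty
  /PLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES
  /NOTOTAL.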

(3) Missing data

Missing data are values that make the N count incomplete, and they can have a negative effect on the multiple regression, reducing the accuracy of the results. Fortunately, when using PASW to run a multiple regression, it is easy to detect possible missing data from the descriptive statistics output (Field, 2013).
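A hedged sketch of that screen: comparing each variable's valid N in the output against the number of rows in the file exposes missing values (variable names again assume Supermodel.sav):

* Sketch: the N column of the Descriptive Statistics output reveals variables whose valid case count falls short of the file's row count.
DESCRIPTIVES VARIABLES=salary age years beauty
  /STATISTICS=MEAN STDDEV MIN MAX.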

(4) No multicollinearity (Variance Inflation Factor – VIF)

The data must not show multicollinearity, which occurs when two or more independent variables are highly correlated with each other. This leads to problems in understanding which independent variable contributes to the variance explained in the dependent variable, as well as technical issues in calculating the multiple regression model. Multicollinearity misleadingly inflates the standard errors, and can thus make variables statistically non-significant when they should otherwise be significant (Field, 2013).
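For reference, tolerance and VIF for a predictor j are defined from the R² obtained when predictor j is regressed on the other predictors:

$$\text{Tolerance}_j = 1 - R_j^2, \qquad \text{VIF}_j = \frac{1}{\text{Tolerance}_j}$$

A common rule of thumb flags a VIF above 10 (equivalently, a tolerance below 0.1). For example, the tolerance of .088 reported for years as a model in Section 2 corresponds to VIF ≈ 1/.088 ≈ 11.4, consistent with the 11.311 shown in the output (the small gap is rounding of the displayed tolerance).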

(5) Homogeneity of variance

The data need to show homogeneity of variance, which is where the variances along the line of best fit remain similar as we move along the line of regression (Field, 2013).

(6) Homogeneity of regression (linearity)

This assumption is very important: the relationship between the dependent variable and the independent variables must be linear. If the relationships displayed in the scatterplots and partial regression plots are not linear, then either run a non-linear regression analysis or "transform" the data using PASW Statistics (Field, 2013).
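As an illustration of the "transform" option, a hedged PASW syntax sketch (the log transform of salary here is purely illustrative and is not part of the assignment's analysis):

* Sketch: create a log-transformed copy of a positively skewed variable.
* The +1 offset guards against taking the log of zero.
COMPUTE log_salary = LN(salary + 1).
EXECUTE.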

Section 2

Testing whether the assumptions are met or not

The assumptions of no outliers, normality of variables and no multicollinearity will be checked with PASW Statistics to find out whether the data in the Supermodel.sav data set meet them or not.

Running the assumption checks in PASW Statistics

(1) There should be no outliers: this assumption is tested within the groups by running the syntax below in PASW Statistics to obtain the boxplots.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=beauty MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: beauty=col(source(s), name("beauty"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Attractiveness (%)"))
  ELEMENT: schema(position(bin.quantile.letter(1*beauty)), label(id))
END GPL.

[Boxplot of Attractiveness (%). GGraph Notes output omitted: DataSet1 (Supermodel.sav), N of rows in working data file = 231.]

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=years MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: years=col(source(s), name("years"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Number of Years as a Model"))
  ELEMENT: schema(position(bin.quantile.letter(1*years)), label(id))
END GPL.

[Boxplot of Number of Years as a Model. GGraph Notes output omitted: DataSet1 (Supermodel.sav), N = 231.]

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=age MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: age=col(source(s), name("age"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Age (Years)"))
  ELEMENT: schema(position(bin.quantile.letter(1*age)), label(id))
END GPL.

[Boxplot of Age (Years). GGraph Notes output omitted: DataSet1 (Supermodel.sav), N = 231.]

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Salary per Day (£)"))
  ELEMENT: schema(position(bin.quantile.letter(1*salary)), label(id))
END GPL.

[Boxplot of Salary per Day (£). GGraph Notes output omitted: DataSet1 (Supermodel.sav), N = 231.]

The boxplot is a graphical display of the data that shows: (1) the median, which is the middle black line; (2) the middle 50% of scores, which is the shaded region; (3) the top and bottom 25% of scores, which are the lines extending out of the shaded region; (4) the smallest and largest (non-outlier) scores, which are the horizontal lines at the top/bottom of the boxplot; and (5) outliers. The boxplot shows both "mild" and "extreme" outliers. Mild outliers are scores more than 1.5*IQR from the rest of the scores, and are indicated by open dots; IQR stands for "interquartile range", the range spanned by the middle 50% of the scores. Extreme outliers are scores more than 3*IQR from the rest of the scores, and are indicated by stars. These benchmarks are arbitrarily chosen, much as p < .05 is arbitrarily chosen; the fences are given in formula form below.
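In formula form, with Q1 and Q3 the first and third quartiles and IQR = Q3 − Q1:

$$\text{mild outlier: } x < Q_1 - 1.5\,\text{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\text{IQR}$$

$$\text{extreme outlier: } x < Q_1 - 3\,\text{IQR} \ \text{ or } \ x > Q_3 + 3\,\text{IQR}$$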

For the boxplots above, the open dots and stars mark the outlying cases in each variable as follows:

Attractiveness outliers: cases 23, 194, 73, 33, 180, 60, and 67.

Number-of-years outliers: cases 57, 91, 157, 155, 5, 190, 212, 60, 131, and 18.

Age outliers: cases 57, 155, 91, 190, 5, 32, 224, 157, 114, 60, 18, and 131.

Salary-per-day outliers: dots: cases 91, 2, 170, 191, 41, 24, 83, and 50; stars: cases 5, 135, 155, 198, 116, and 127. It should be noted that the boxplots label all the outlying cases. In summary, this boxplot output identified 42 outliers out of N = 231, so the no-outliers assumption is not met.

(2) Normality of variables: this assumption will be verified in PASW Statistics by checking graphically with a Q-Q plot.

PPLOT
  /VARIABLES=salary age years beauty
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.

[PPLOT Notes and MOD_1 model description omitted: Q-Q plots requested for Salary per Day (£), Age (Years), Number of Years as a Model and Attractiveness (%) against a normal distribution with estimated location and scale, using Blom's proportion-estimation formula and assigning tied values the mean rank; N = 231 per variable with no user- or system-missing values.]
Estimated normal distribution parameters:

Variable                     Location  Scale
Salary per Day (£)           11.3385   16.02644
Age (Years)                  18.0679   2.42190
Number of Years as a Model   4.5854    1.57865
Attractiveness (%)           75.9447   6.77303

[Normal Q-Q plots of Salary per Day (£), Age (Years), Number of Years as a Model and Attractiveness (%).]

The Q-Q plots above were used to assess the normality of all the variables. The observed values fall substantially along the normal line; although in a few of the plots the points do not hug the line exactly, they remain very close to it. Therefore, the assumption of normality of variables is met.

(3) Testing the multicollinearity assumption in PASW Statistics

Multicollinearity misleadingly inflates the standard errors, and can thus make variables statistically non-significant when they should otherwise be significant. PASW Statistics detects multicollinearity through an inspection of correlation coefficients and Tolerance/VIF (Variance Inflation Factor) values; interpreting these correlation coefficients and Tolerance/VIF values determines whether the data meet or violate this assumption (Field, 2013).

GET
  FILE='C:\Users\RASHEED\Downloads\dsus4data\dsus4data\Supermodel.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
CORRELATIONS
  /VARIABLES=salary age years beauty
  /PRINT=ONETAIL NOSIG
  /MISSING=PAIRWISE.

[Correlations Notes output omitted: DataSet1 (Supermodel.sav), N = 231; statistics for each pair of variables based on all cases with valid data for that pair.]

Correlations (Pearson r, with Sig. 1-tailed in parentheses; N = 231 throughout; ** p < .01):

                             Salary per Day (£)  Age (Years)    Years as a Model  Attractiveness (%)
Salary per Day (£)           1                   .397** (.000)  .337** (.000)     .068 (.152)
Age (Years)                  .397** (.000)       1              .955** (.000)     .261** (.000)
Number of Years as a Model   .337** (.000)       .955** (.000)  1                 .173** (.004)
Attractiveness (%)           .068 (.152)         .261** (.000)  .173** (.004)     1

The multicollinearity assumption was assessed using the Correlations and Excluded Variables outputs. On the Correlations output, age and number of years as a model correlate at .955, which is higher than the commonly accepted cut-off of .80. Similarly, on the Excluded Variables output, the Variance Inflation Factor (VIF) for years as a model exceeds 10 at 11.311, and its tolerance is below 0.1 at .088. Conclusively, the multicollinearity assumption is not met.

Excluded Variables:

Model  Variable                    Beta In  t       Sig.  Partial Corr.  Tolerance  VIF     Min. Tolerance
1      Salary per Day (£)          -.043    -.612   .541  -.040          .842       1.188   .842
1      Number of Years as a Model  -.856    -4.130  .000  -.264          .088       11.311  .088
2      Salary per Day (£)          -.088    -1.289  .199  -.085          .822       1.217   .082

In summary, of the three assumptions tested, only the normality-of-variables assumption passed; the other two failed. In a real research study I would, as the researcher, exclude the outliers from my data, but because this data set was provided for the assignment, I will proceed with the analysis while treating the results, and especially their accuracy, with caution.

Section 3

Hypotheses

The statement of null and alternative (research) hypotheses from the variables in the data above are as follows:

DV: Attractiveness.

IV: Age, salary per day, and years as a model.

ß0: The intercept (constant) of the regression model.

The null hypothesis states:

H0: ß1 = ß2 = ß3 = 0: in the population, all the partial regression coefficients equal zero.

The alternate hypothesis states:

H1: at least one ßi ≠ 0: in the population, at least one partial regression coefficient is non-zero.

It should be noted that:

-The alpha level: α = .05.

-In this case, the prediction will be tested statistically with PASW (SPSS) Version 21.

-PASW (SPSS) assumes that all the statistical assumptions of multiple regression are met, but in this case two assumptions (no outliers and no multicollinearity) were not met; only the normality assumption was met.

Section 4

PASW syntax for Multiple Regression

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL CHANGE ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT beauty
  /METHOD=FORWARD salary age years
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).

Section 5

PASW outputs for Multiple Regression

[Regression Notes output omitted: DataSet1 (Supermodel.sav), N = 231; statistics based on cases with no missing values for any variable used.]

Descriptive Statistics:

Variable                    Mean     Std. Deviation  N
Attractiveness (%)          75.9447  6.77303         231
Salary per Day (£)          11.3385  16.02644        231
Age (Years)                 18.0679  2.42190         231
Number of Years as a Model  4.5854   1.57865         231

Correlations (Pearson r, with Sig. 1-tailed in parentheses; N = 231 throughout):

                            Salary per Day (£)  Age (Years)  Years as a Model
Attractiveness (%)          .068 (.152)         .261 (.000)  .173 (.004)
Salary per Day (£)          1.000               .397 (.000)  .337 (.000)
Age (Years)                 .397 (.000)         1.000        .955 (.000)
Number of Years as a Model  .337 (.000)         .955 (.000)  1.000
Variables Entered/Removed:

Model  Variables Entered           Variables Removed  Method
1      Age (Years)                 .                  Forward (Criterion: Probability-of-F-to-enter <= .050)
2      Number of Years as a Model  .                  Forward (Criterion: Probability-of-F-to-enter <= .050)

The Variables Entered/Removed output shows which variables the forward entry method retained. The predictor salary per day was never entered, indicating that it is not statistically significant. Age and number of years as a model, displayed in the output, are therefore the predictors among the independent variables that explain variability in the dependent variable (attractiveness).

Model Summary:

Model  R     R Square  Adj. R Square  Std. Error of the Estimate  R Square Change  F Change  df1  df2  Sig. F Change
1      .261  .068      .064           6.55252                     .068             16.740    1    229  .000
2      .365  .133      .125           6.33427                     .065             17.053    1    228  .000

The Model Summary output above is used to determine how well the model fits. The R Square Change column shows a change of .065 (6.5%) of variance for the second significant variable, and together the two statistically significant variables account for 13.3% of the variability in the dependent variable. Hence salary per day is not a predictive factor, while the age of a model and the years spent as a model are predictors of attractiveness.
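The R Square Change value can be verified directly from the R Square column:

$$\Delta R^2 = R^2_{\text{model 2}} - R^2_{\text{model 1}} = .133 - .068 = .065$$

i.e., adding number of years as a model explains a further 6.5% of the variance beyond age alone.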

ANOVA:

Model          Sum of Squares  df   Mean Square  F       Sig.
1  Regression  718.757         1    718.757      16.740  .000
   Residual    9832.248        229  42.936
   Total       10551.004       230
2  Regression  1402.977        2    701.489      17.483  .000
   Residual    9148.027        228  40.123
   Total       10551.004       230

The F-ratio in the ANOVA output shows that the independent variables statistically significantly predict the dependent variable, F(2, 228) = 17.483, p < .0005; that is, the regression model is a good fit to the data.
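As a quick check, the model 2 F-ratio is the regression mean square divided by the residual mean square from the ANOVA table above:

$$F = \frac{701.489}{40.123} \approx 17.483, \qquad df = (2, 228)$$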

Coefficients:

Model                          B       Std. Error  Beta   t       Sig.  95% CI for B (Lower, Upper)  Zero-order  Partial  Part   Tolerance
1  (Constant)                  62.757  3.252              19.298  .000  (56.349, 69.164)
   Age (Years)                 .730    .178        .261   4.091   .000  (.378, 1.081)                .261        .261     .261   1.000
2  (Constant)                  38.288  6.707              5.708   .000  (25.072, 51.505)
   Age (Years)                 3.017   .580        1.079  5.201   .000  (1.874, 4.160)               .261        .326     .321   .088
   Number of Years as a Model  -3.674  .890        -.856  -4.130  .000  (-5.428, -1.921)             .173        -.264    -.255  .088

The Coefficients output tests, for each independent variable, whether the unstandardized (and standardized) coefficient equals 0 (zero) in the population; if p < 0.05, we conclude that the coefficient is statistically different from zero. The t-value and corresponding p-value are located in the "t" and "Sig." columns, respectively. From the Sig. column, the coefficients of the entered independent variables (all except the excluded salary per day) are statistically significantly different from 0 (zero), and the intercept ß0 is also statistically significant.
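Reading the unstandardized B column for model 2 gives the fitted prediction equation (salary per day does not appear because the forward method never entered it):

$$\widehat{\text{Attractiveness}} = 38.288 + 3.017 \times \text{Age} - 3.674 \times \text{Years as a Model}$$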

Excluded Variables:

Model  Variable                    Beta In  t       Sig.  Partial Corr.  Tolerance  VIF     Min. Tolerance
1      Salary per Day (£)          -.043    -.612   .541  -.040          .842       1.188   .842
1      Number of Years as a Model  -.856    -4.130  .000  -.264          .088       11.311  .088
2      Salary per Day (£)          -.088    -1.289  .199  -.085          .822       1.217   .082
Collinearity Diagnostics:

Model  Dimension  Eigenvalue  Condition Index  Variance Proportions (Constant, Age, Years)
1      1          1.991       1.000            .00, .00, –
1      2          .009        15.019           1.00, 1.00, –
2      1          2.944       1.000            .00, .00, .00
2      2          .055        7.301            .03, .00, .10
2      3          .001        54.019           .97, 1.00, .90
Residuals Statistics:

                      Minimum    Maximum   Mean     Std. Deviation  N
Predicted Value       68.4663    82.4314   75.9447  2.46980         231
Residual              -15.09981  22.88941  .00000   6.30667         231
Std. Predicted Value  -3.028     2.626     .000     1.000           231
Std. Residual         -2.384     3.614     .000     .996            231

[Charts: histogram of the standardized residuals, normal probability plot of the standardized residuals, and scatterplot of *ZRESID against *ZPRED.]

Section 6

APA Style tables for academic reporting

Table 1

Descriptive Statistics

Variable                    Mean     Std. Deviation  N
Attractiveness (%)          75.9447  6.77303         231
Salary per Day (£)          11.3385  16.02644        231
Age (Years)                 18.0679  2.42190         231
Number of Years as a Model  4.5854   1.57865         231

Table 2

Correlations (Pearson r, with Sig. 1-tailed; N = 231)

                              Attractiveness  Salary  Age    Years
Pearson r
  Attractiveness (%)          1.000           .068    .261   .173
  Salary per Day (£)          .068            1.000   .397   .337
  Age (Years)                 .261            .397    1.000  .955
  Number of Years as a Model  .173            .337    .955   1.000
Sig. (1-tailed)
  Attractiveness (%)          .               .152    .000   .004
  Salary per Day (£)          .152            .       .000   .000
  Age (Years)                 .000            .000    .      .000
  Number of Years as a Model  .004            .000    .000   .

Section 7

APA Report for Multiple Regression

A multiple regression analysis was conducted on the Supermodel.sav data set from the Field text to predict the attractiveness of a model from age in years, salary per day and years as a model. From the descriptive statistics in Table 1, N = 231 with no missing values reported. The independent variables are age, salary per day and years as a model, and the dependent variable is attractiveness. Three assumptions were tested before the data were analysed, namely no outliers, normality of variables and no multicollinearity, using PASW Statistics. Only the normality-of-variables assumption was met; the other two assumptions, no multicollinearity and no outliers, were violated.

The null hypothesis (H0) is rejected, as p < .0005, and the alternate hypothesis (H1) is accepted. Looking at the Sig. column, the independent variables entered into the model (age and years as a model) have coefficients that are statistically significantly different from 0 (zero), F(2, 228) = 17.483, p < .0005, R2 = .133, one-tailed. That is, there is sufficient evidence to conclude that age and years as a model are predictors of the attractiveness of models. Salary was removed from the model output, which shows that it is not statistically significant; therefore salary per day is not predictive of a model's attractiveness.

In conclusion, a multiple regression was run to predict the attractiveness of a model from age, years as a model and salary per day. Two variables, age and years as a model, statistically significantly predicted attractiveness, F(2, 228) = 17.483, p < .0005, R2 = .133; only these two independent variables added statistically significantly to the prediction, p < .05.

Section 8

Describe how you would compute the sample size using power =.80, effect size of .50, alpha of .05. Does your analysis support the sample size of the data you ran?

Calculation of Sample Size using G*Power

Steps:

Input parameters

  1. Test family: F tests.
  2. Statistical test: Linear multiple regression: fixed model, R2 deviation from zero.
  3. Type of power analysis: A priori: compute required sample size, given α, power, and effect size.
  4. Effect size f2 = 0.50; α err prob = 0.05; Power (1-β err prob) = 0.80; Number of predictors = 3.

After entering all these parameters, I clicked Calculate to get the output parameters and the graph below.

Output parameters

  Noncentrality parameter λ = 13.5000000
  Critical F = 3.0279984
  Numerator df = 3
  Denominator df = 23
  Total sample size = 27
  Actual power = 0.8182141

[G*Power output graph omitted. Using G*Power, the calculated sample size = 27.]

No, the sample size calculated with G*Power does not match the sample size of the data that were run (N = 231): the calculated sample size is roughly one tenth of the actual N = 231 used in the multiple regression analysis. In a research study, a larger sample is more dependable for generalizing the results to the wider population (Green & Salkind, 2014). The sample size used for this analysis was therefore adequate, being far above the sample size calculated with G*Power; that is, an adequate number of subjects were used for the study.
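The output parameters are internally consistent: G*Power computes the noncentrality parameter as λ = f² × N, and the denominator degrees of freedom as N − k − 1 for k predictors:

$$\lambda = 0.50 \times 27 = 13.5, \qquad df_{\text{denominator}} = 27 - 3 - 1 = 23$$

both matching the reported output.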

References

Field, A. (2013). Discovering statistics using IBM SPSS Statistics (4th ed.). London: Sage.

Green, S. B., & Salkind, N. J. (2014). Using SPSS for Windows and Macintosh: Analyzing and understanding data (7th ed.). Upper Saddle River, NJ: Pearson Education.

Sage Publications. (2013). Andy Field's datasets [Data files]. Available from the Discovering Statistics Using IBM SPSS Statistics companion website: http://www.sagepub.com/field4e/study/datasets.htm

IBM PASW (formerly SPSS) Statistics (Version 21) [Computer software].

G*Power (Version 3.1.9.2) [Computer software].



