Multiple Regression Analysis Using PASW Statistics

Tajudeen Rasheed

Week 6 PASW Assignment

Multiple regression is an extension of simple linear regression (Field, 2013). It is used when we want to predict the value of a variable based on the values of two or more other variables (Field, 2013). According to Field (2013), the variable we want to predict is called the dependent variable (or sometimes the outcome, target, or criterion variable). The variables we use to predict the value of the dependent variable are called the independent variables (or sometimes the predictor, explanatory, or regressor variables) (Field, 2013). Multiple regression allows us to determine the overall fit (variance explained) of the model and the relative contribution of each predictor to the total variance explained (Field, 2013).

Section 1

Assumptions for Multiple Regression

(1) There should be no significant outliers

Outliers are single data points that do not follow the usual pattern of the data. The problem with outliers is that they can have a negative effect on the multiple regression, reducing the accuracy of the results. Fortunately, when using PASW (SPSS) to run multiple regression, it is easy to detect possible outliers (Field, 2013).

(2) Normality of variables

The normality assumption states that the dependent variable and each of the independent variables are normally distributed. According to Field (2013), this assumption can be verified graphically, either with a histogram overlaid with a normal distribution curve or with a Q-Q plot.

(3) Missing data

Missing data are values that leave N incomplete; they can have a negative effect on the multiple regression, reducing the accuracy of the results. Fortunately, when using PASW to run multiple regression, it is easy to detect possible missing data from the descriptive statistics output (Field, 2013).

(4) No multicollinearity (Variance Inflation Factor – VIF)

Data must not show multicollinearity, which occurs when two or more independent variables are highly correlated with each other. This leads to problems with understanding which independent variable contributes to the variance explained in the dependent variable, as well as technical issues in calculating a multiple regression model. Multicollinearity inflates the standard errors of the coefficients, so some variables can appear statistically non-significant when they would otherwise be significant (Field, 2013).

(5) Homogeneity of variance

The data need to show homogeneity of variance, which is where the variances along the line of best fit remain similar as we move along the regression line (Field, 2013).

(6) Homogeneity of regression

This assumption is very important. The relationship between the dependent variable and each of the independent variables, and between the dependent variable and the independent variables collectively, is assumed to be linear. If the relationships displayed in the scatterplots and partial regression plots are not linear, then either run a non-linear regression analysis or “transform” the data using PASW Statistics (Field, 2013).

Section 2

Testing whether the assumptions are met or not

The assumptions of no outliers, normality of variables, and no multicollinearity will be checked using PASW Statistics to find out whether the data in the Supermodel.sav data set meet them.

Running the assumption checks in PASW Statistics

(1) There should be no outliers: this assumption is tested with PASW Statistics by producing the boxplots below.

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=beauty MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: beauty=col(source(s), name("beauty"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Attractiveness (%)"))
  ELEMENT: schema(position(bin.quantile.letter(1*beauty)), label(id))
END GPL.

GGraph

11-OCT-2014 14:41:34

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
Resources	Processor Time	00:00:00.50
	Elapsed Time	00:00:00.49

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=years MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: years=col(source(s), name("years"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Number of Years as a Model"))
  ELEMENT: schema(position(bin.quantile.letter(1*years)), label(id))
END GPL.

GGraph

11-OCT-2014 14:42:09

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
Resources	Processor Time	00:00:00.58
	Elapsed Time	00:00:00.55

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=age MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: age=col(source(s), name("age"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Age (Years)"))
  ELEMENT: schema(position(bin.quantile.letter(1*age)), label(id))
END GPL.

GGraph

11-OCT-2014 14:42:35

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
Resources	Processor Time	00:00:00.53
	Elapsed Time	00:00:00.51

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(2), label("Salary per Day (£)"))
  ELEMENT: schema(position(bin.quantile.letter(1*salary)), label(id))
END GPL.

GGraph

11-OCT-2014 14:42:55

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
Resources	Processor Time	00:00:00.56
	Elapsed Time	00:00:00.51

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

The boxplot is a graphical display of the data that shows: (1) the median, which is the middle black line; (2) the middle 50% of scores, which is the shaded box; (3) the top and bottom 25% of scores, which are the lines (whiskers) extending out of the box; (4) the smallest and largest non-outlier scores, which are the horizontal lines at the ends of the whiskers; and (5) outliers. The boxplot shows both “mild” and “extreme” outliers. Mild outliers are scores more than 1.5*IQR beyond the edge of the box and are indicated by open dots. IQR stands for “interquartile range”, the distance between the 25th and 75th percentiles, which spans the middle 50% of the scores. Extreme outliers are scores more than 3*IQR beyond the edge of the box and are indicated by stars. However, these benchmarks are conventions, similar to how p < .05 is a convention. In the boxplots above, the open dots and stars identify the following outlying cases for each variable:
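The 1.5*IQR and 3*IQR rules described above can be sketched in a few lines of Python, outside PASW. This is only an illustration under stated assumptions: the scores are hypothetical (they are not taken from Supermodel.sav), the function name is made up for this example, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may differ slightly from the hinges PASW uses for its boxplots.

```python
from statistics import quantiles


def classify_outliers(scores):
    """Flag cases as 'mild' or 'extreme' outliers with the 1.5*IQR and 3*IQR rules.

    Returns a dict mapping the 1-based case number to its label.
    """
    # Quartiles of the scores (assumption: "inclusive" interpolation method).
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    flagged = {}
    for case, value in enumerate(scores, start=1):
        # Distance beyond the nearer edge of the box (negative if inside the box).
        distance = max(q1 - value, value - q3)
        if distance > 3 * iqr:
            flagged[case] = "extreme"
        elif distance > 1.5 * iqr:
            flagged[case] = "mild"
    return flagged


# Hypothetical scores for illustration only.
example_scores = [10, 12, 13, 14, 15, 16, 17, 18, 30, 60]
print(classify_outliers(example_scores))
```

With these made-up scores, case 9 lies between 1.5 and 3 IQRs above the box and case 10 lies more than 3 IQRs above it, which mirrors the open-dot and star markers in the boxplots.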

Attractiveness outliers are cases: 23, 194, 73, 33, 180, 60, and 67.

Number of years as a model outliers are cases: 57, 91, 157, 155, 5, 190, 212, 60, 131, and 18.

Age outliers are cases: 57, 155, 91, 190, 5, 32, 224, 157, 114, 60, 18, and 131.

Salary per day outliers are cases: open dots: 91, 2, 170, 191, 41, 24, 83, and 50; stars: 5, 135, 155, 198, 116, and 127. It should be noted that the boxplots display all cases of outliers. In summary, the lists above contain 43 flagged points across the four variables (some cases are flagged on more than one variable) out of N = 231 cases; therefore the assumption of no significant outliers is not met.

(2) Normality of variables

This assumption will be verified using PASW Statistics by checking graphically with Q-Q plots.

PPLOT
  /VARIABLES=salary age years beauty
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.

PPlot

11-OCT-2014 15:42:27

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
	Date
Missing Value Handling	Definition of Missing	User-defined missing values are treated as missing.
	Cases Used	For a given sequence or time series variable, cases with missing values are not used in the analysis. Cases with negative or zero values are also not used, if the log transform is requested.
Resources	Processor Time	00:00:04.25
	Elapsed Time	00:00:04.02
Use	From	First observation
	To	Last observation
Time Series Settings (TSET)	Amount of Output	PRINT = DEFAULT
	Saving New Variables	NEWVAR = CURRENT
	Maximum Number of Lags in Autocorrelation or Partial Autocorrelation Plots	MXAUTO = 16
	Maximum Number of Lags Per Cross-Correlation Plots	MXCROSS = 7
	Maximum Number of New Variables Generated Per Procedure	MXNEWVAR = 60
	Maximum Number of New Cases Per Procedure	MXPREDICT = 1000
	Treatment of User-Missing Values	MISSING = EXCLUDE
	Confidence Interval Percentage Value	CIN = 95
	Tolerance for Entering Variables in Regression Equations	TOLER = .0001
	Maximum Iterative Parameter Change	CNVERGE = .001
	Method of Calculating Std. Errors for Autocorrelations	ACFSE = IND
	Length of Seasonal Period	Unspecified
	Variable Whose Values Label Observations in Plots	Unspecified
	Equations Include	CONSTANT

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

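Model Description
Model Name		MOD_1
Series or Sequence	1	Salary per Day (£)
	2	Age (Years)
	3	Number of Years as a Model
	4	Attractiveness (%)
Transformation		None
Non-Seasonal Differencing		0
Seasonal Differencing		0
Length of Seasonal Period		No periodicity
Standardization		Not applied
Distribution	Type	Normal
	Location	Estimated
	Scale	Estimated
Fractional Rank Estimation Method		Blom’s
Rank Assigned to Ties		Mean rank of tied values

Case Processing Summary
		Salary per Day (£)	Age (Years)	Number of Years as a Model	Attractiveness (%)
Series or Sequence Length		231	231	231	231
Number of Missing Values in the Plot	User-Missing	0	0	0	0
	System-Missing	0	0	0	0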

Estimated Distribution Parameters
		Salary per Day (£)	Age (Years)	Number of Years as a Model	Attractiveness (%)
Normal Distribution	Location	11.3385	18.0679	4.5854	75.9447
	Scale	16.02644	2.42190	1.57865	6.77303

[Normal Q-Q plots of: Salary per Day (£); Age (Years); Number of Years as a Model; Attractiveness (%)]

From the Q-Q plots above, which were used to assess the normality of all the variables, the observed points fall largely along the reference line, although in a few plots some points deviate from it. Therefore, the assumption of normality of variables is treated as met.

Multicollinearity inflates the standard errors, so some variables can appear statistically non-significant when they would otherwise be significant. PASW Statistics is therefore used to detect multicollinearity through inspection of the correlation coefficients and the Tolerance/VIF (Variance Inflation Factor) values; interpreting these values determines whether the data meet or violate this assumption (Field, 2013).

Testing the multicollinearity assumption using PASW Statistics.

GET
  FILE='C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
CORRELATIONS
  /VARIABLES=salary age years beauty
  /PRINT=ONETAIL NOSIG
  /MISSING=PAIRWISE.

Correlations

11-OCT-2014 20:44:45

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
Missing Value Handling	Definition of Missing	User-defined missing values are treated as missing.
	Cases Used	Statistics for each pair of variables are based on all the cases with valid data for that pair.
Resources	Processor Time	00:00:00.05
	Elapsed Time	00:00:00.03

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

		Salary per Day (£)	Age (Years)	Number of Years as a Model	Attractiveness (%)
Salary per Day (£)	Pearson Correlation	1	.397**	.337**	.068
	Sig. (1-tailed)		.000	.000	.152
	N	231	231	231	231
Age (Years)	Pearson Correlation	.397**	1	.955**	.261**
	Sig. (1-tailed)	.000		.000	.000
	N	231	231	231	231
Number of Years as a Model	Pearson Correlation	.337**	.955**	1	.173**
	Sig. (1-tailed)	.000	.000		.004
	N	231	231	231	231
Attractiveness (%)	Pearson Correlation	.068	.261**	.173**	1
	Sig. (1-tailed)	.152	.000	.004
	N	231	231	231	231

The assumption of no multicollinearity was assessed using the Correlations and Excluded Variables outputs. In the Correlations output, age and number of years as a model correlate at r = .955, which is higher than the commonly used cut-off of .80. Similarly, in the Excluded Variables output, the Variance Inflation Factor (VIF) for number of years as a model is 11.311, which exceeds 10, and the tolerance is .088, which is below 0.1. In conclusion, the assumption of no multicollinearity is not met.

Model		Beta In	t	Sig.	Partial Correlation	Collinearity Statistics
						Tolerance	VIF	Minimum Tolerance
1	Salary per Day (£)	-.043b	-.612	.541	-.040	.842	1.188	.842
	Number of Years as a Model	-.856b	-4.130	.000	-.264	.088	11.311	.088
2	Salary per Day (£)	-.088c	-1.289	.199	-.085	.822	1.217	.082
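As a rough cross-check of the tolerance and VIF values, the short Python sketch below (not PASW output) uses the standard relationships tolerance = 1 − R² and VIF = 1 / tolerance. It assumes a model with exactly two predictors, where the R² of one predictor regressed on the other is simply the square of their correlation; the function name is hypothetical.

```python
def tolerance_and_vif(r_between_predictors):
    """Tolerance and VIF for a model with exactly two predictors.

    With two predictors, regressing one on the other gives R^2 = r^2,
    so tolerance = 1 - r^2 and VIF = 1 / tolerance.
    """
    tolerance = 1.0 - r_between_predictors ** 2
    return tolerance, 1.0 / tolerance


# r = .955 between age and number of years as a model (Correlations output).
tol, vif = tolerance_and_vif(0.955)
print(round(tol, 3), round(vif, 2))
```

The result is close to the reported tolerance of .088 and VIF of 11.311; the small difference in VIF arises because the correlation is rounded to three decimals here.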

In summary, of the three assumptions tested, only the normality of variables assumption was met; the other two assumptions (no outliers and no multicollinearity) were violated. In a real research study, as a researcher I would exclude those outliers from my data, but because the data set was provided for this assignment, I will proceed with the analysis with caution, especially regarding the accuracy of the results.

Section 3

Hypotheses

The null and alternative (research) hypotheses for the variables in the data above are as follows:

DV: Attractiveness.

IV: Age, salary per day, and number of years as a model.

B0: The intercept, which is also tested for statistical significance.

The null hypothesis states that:

H0: β1 = β2 = β3 = 0: In the population, all the partial regression coefficients are equal to zero.

The alternative hypothesis states that:

H1: βj ≠ 0 for at least one j (j = 1, 2, 3): In the population, at least one partial regression coefficient is not equal to zero.

It should be noted that;

-The alpha level: α = .05

-The prediction will be tested statistically with PASW (SPSS) Version 21.

-PASW (SPSS) runs the analysis as though all the statistical assumptions of multiple regression were met, but in this case two assumptions (no outliers and no multicollinearity) were not met; only the assumption of normality was met.

Section 4

PASW syntax for Multiple Regression

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL CHANGE ZPP
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT beauty
  /METHOD=FORWARD salary age years
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).

Section 5

PASW outputs for Multiple Regression

Regression

11-OCT-2014 17:44:47

Input	Data	C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav
	Active Dataset	DataSet1
	File Label	File created by MATRIX
	Filter
	Weight
	Split File
	N of Rows in Working Data File	231
Missing Value Handling	Definition of Missing	User-defined missing values are treated as missing.
	Cases Used	Statistics are based on cases with no missing values for any variable used.
Resources	Processor Time	00:00:01.52
	Elapsed Time	00:00:01.52
	Memory Required	1956 bytes
	Additional Memory Required for Residual Plots	896 bytes

[DataSet1] C:UsersRASHEEDDownloadsdsus4datadsus4dataSupermodel.sav

	Mean	Std. Deviation	N
Attractiveness (%)	75.9447	6.77303	231
Salary per Day (£)	11.3385	16.02644	231
Age (Years)	18.0679	2.42190	231
Number of Years as a Model	4.5854	1.57865	231

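		Attractiveness (%)	Salary per Day (£)	Age (Years)	Number of Years as a Model
Pearson Correlation	Attractiveness (%)	1.000	.068	.261	.173
	Salary per Day (£)	.068	1.000	.397	.337
	Age (Years)	.261	.397	1.000	.955
	Number of Years as a Model	.173	.337	.955	1.000
Sig. (1-tailed)	Attractiveness (%)	.	.152	.000	.004
	Salary per Day (£)	.152	.	.000	.000
	Age (Years)	.000	.000	.	.000
	Number of Years as a Model	.004	.000	.000	.
N	Attractiveness (%)	231	231	231	231
	Salary per Day (£)	231	231	231	231
	Age (Years)	231	231	231	231
	Number of Years as a Model	231	231	231	231

Model	Variables Entered		Variables Removed	Method
1	Age (Years)		.	Forward (Criterion: Probability-of-F-to-enter <= .050)
2	Number of Years as a Model		.	Forward (Criterion: Probability-of-F-to-enter <= .050)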

The Variables Entered/Removed output shows which variables were entered under the forward method. Salary per day was not entered into the model, which indicates that it did not meet the entry criterion (probability of F to enter ≤ .050) and is not a statistically significant predictor. Age and number of years as a model were entered, so they are the significant predictors among the independent variables and account for variability in the dependent variable (attractiveness).

Model	R	R Square	Adjusted R Square	Std. Error of the Estimate	Change Statistics
					R Square Change	F Change	df1	df2	Sig. F Change
1	.261a	.068	.064	6.55252	.068	16.740	1	229	.000
2	.365b	.133	.125	6.33427	.065	17.053	1	228	.000

The Model Summary output above is used to determine how well the model fits. The R Square Change column shows that the second predictor entered (number of years as a model) accounts for an additional .065 (6.5%) of the variance, beyond the .068 (6.8%) accounted for by age. Together, these two statistically significant variables account for 13.3% of the variability in the dependent variable (R Square = .133). Hence, salary per day is not a predictive factor, while the age of a model and the number of years spent as a model are predictors of attractiveness.

Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	718.757	1	718.757	16.740	.000b
	Residual	9832.248	229	42.936
	Total	10551.004	230
2	Regression	1402.977	2	701.489	17.483	.000c
	Residual	9148.027	228	40.123
	Total	10551.004	230

The F column of the ANOVA output shows that the independent variables statistically significantly predict the dependent variable, F(2, 228) = 17.483, p < .0005; that is, the regression model is a good fit of the data.
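The link between the ANOVA table and the Model Summary can be checked by hand. The Python sketch below is an illustration outside PASW: it recomputes R Square, Adjusted R Square, the F ratio, and the F Change for Model 2 from the sums of squares reported above, using the usual textbook formulas. The variable names are chosen only for this example.

```python
# Values copied from the ANOVA output above.
ss_reg_1 = 718.757       # regression sum of squares, Model 1 (age only)
ss_reg_2 = 1402.977      # regression sum of squares, Model 2 (age + years)
ss_res_2 = 9148.027      # residual sum of squares, Model 2
ss_total = 10551.004     # total sum of squares
df_reg_2, df_res_2, n = 2, 228, 231

# Proportion of variance explained by Model 2.
r_squared = ss_reg_2 / ss_total

# Adjusted R^2 penalises for the number of predictors (n - k - 1 = 228).
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_res_2

# F ratio = mean square regression / mean square residual.
ms_res_2 = ss_res_2 / df_res_2
f_ratio = (ss_reg_2 / df_reg_2) / ms_res_2

# F change for adding one predictor (number of years as a model) in step 2.
f_change = (ss_reg_2 - ss_reg_1) / ms_res_2

print(round(r_squared, 3), round(adj_r_squared, 3))
print(round(f_ratio, 2), round(f_change, 2))
```

The recomputed values agree with the reported R Square = .133, Adjusted R Square = .125, F = 17.483, and F Change = 17.053 to within rounding, and they confirm that the residual degrees of freedom are 228.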

Model		Unstandardized Coefficients		Standardized Coefficients	t	Sig.	95.0% Confidence Interval for B		Correlations			Collinearity Statistics
		B	Std. Error	Beta			Lower Bound	Upper Bound	Zero-order	Partial	Part	Tolerance
1	(Constant)	62.757	3.252		19.298	.000	56.349	69.164
	Age (Years)	.730	.178	.261	4.091	.000	.378	1.081	.261	.261	.261	1.000
2	(Constant)	38.288	6.707		5.708	.000	25.072	51.505
	Age (Years)	3.017	.580	1.079	5.201	.000	1.874	4.160	.261	.326	.321	.088
	Number of Years as a Model	-3.674	.890	-.856	-4.130	.000	-5.428	-1.921	.173	-.264	-.255	.088

The Coefficients output tests whether each unstandardized (or standardized) coefficient is equal to 0 (zero) in the population. If p < .05, we conclude that the coefficient is statistically significantly different from 0 (zero). The t-value and corresponding p-value are located in the “t” and “Sig.” columns, respectively. From the Sig. column, the coefficients for age and number of years as a model are statistically significantly different from 0 (zero), whereas salary per day did not enter the model. The intercept (B0) in the final model is also statistically significant.
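To make the Model 2 coefficients concrete, the following Python sketch (an illustration outside PASW) writes the fitted equation as a function and evaluates it at the sample means from the descriptive statistics. The function and constant names are hypothetical; the numbers are the rounded B values from the Coefficients output, so results match the output only to within rounding.

```python
# Unstandardized coefficients for Model 2, copied from the Coefficients output.
B0 = 38.288        # (Constant)
B_AGE = 3.017      # Age (Years)
B_YEARS = -3.674   # Number of Years as a Model


def predict_attractiveness(age, years):
    """Predicted attractiveness (%) from the fitted Model 2 equation."""
    return B0 + B_AGE * age + B_YEARS * years


# At the sample means of the predictors, the prediction should be
# close to the sample mean of attractiveness (75.9447).
pred_at_means = predict_attractiveness(18.0679, 4.5854)

# Each t statistic is the coefficient divided by its standard error.
t_age = B_AGE / 0.580
t_years = B_YEARS / 0.890

print(round(pred_at_means, 2), round(t_age, 2), round(t_years, 2))
```

The opposite signs of the two slopes, despite both predictors correlating positively with attractiveness, are consistent with the multicollinearity between age and number of years as a model noted earlier, so the individual coefficients should be interpreted with caution.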

Model		Beta In	t	Sig.	Partial Correlation	Collinearity Statistics
						Tolerance	VIF	Minimum Tolerance
1	Salary per Day (£)	-.043b	-.612	.541	-.040	.842	1.188	.842
	Number of Years as a Model	-.856b	-4.130	.000	-.264	.088	11.311	.088
2	Salary per Day (£)	-.088c	-1.289	.199	-.085	.822	1.217	.082

Model	Dimension	Eigenvalue	Condition Index	Variance Proportions
				(Constant)	Age (Years)	Number of Years as a Model
1	1	1.991	1.000	.00	.00
	2	.009	15.019	1.00	1.00
2	1	2.944	1.000	.00	.00	.00
	2	.055	7.301	.03	.00	.10
	3	.001	54.019	.97	1.00	.90

	Minimum	Maximum	Mean	Std. Deviation	N
Predicted Value	68.4663	82.4314	75.9447	2.46980	231
Residual	-15.09981	22.88941	.00000	6.30667	231
Std. Predicted Value	-3.028	2.626	.000	1.000	231
Std. Residual	-2.384	3.614	.000	.996	231

Charts

Section 6

APA Style tables for academic reporting

Table 1

	Mean	Std. Deviation	N
Attractiveness (%)	75.9447	6.77303	231
Salary per Day (£)	11.3385	16.02644	231
Age (Years)	18.0679	2.42190	231
Number of Years as a Model	4.5854	1.57865	231

Table 2

		Attractiveness (%)	Salary per Day (£)	Age (Years)	Number of Years as a Model
Pearson Correlation	Attractiveness (%)	1.000	.068	.261	.173
	Salary per Day (£)	.068	1.000	.397	.337
	Age (Years)	.261	.397	1.000	.955
	Number of Years as a Model	.173	.337	.955	1.000
Sig. (1-tailed)	Attractiveness (%)	.	.152	.000	.004
	Salary per Day (£)	.152	.	.000	.000
	Age (Years)	.000	.000	.	.000
	Number of Years as a Model	.004	.000	.000	.

Section 7

APA Report for Multiple Regression

A multiple regression analysis was conducted on the data in the Supermodel.sav data set from the Field text to predict the attractiveness of a model from age in years, salary per day, and number of years as a model. From the descriptive statistics in Table 1, N = 231 and no missing values were reported. The independent variables are age, salary per day, and number of years as a model, while the dependent variable is attractiveness. Three assumptions (no outliers, normality of variables, and no multicollinearity) were tested using PASW Statistics before the data were analyzed. Only the assumption of normality of variables was met; the assumptions of no multicollinearity and no outliers were violated.

The null hypothesis (H0) is rejected because p < .0005, which is below α = .05, in favor of the alternative hypothesis (H1). Looking at the Sig. column, the independent variables age and number of years as a model have coefficients that are statistically significantly different from 0 (zero), and the overall model is significant, F(2, 228) = 17.483, p < .0005, R² = .133. That is, there is sufficient evidence to conclude that age and number of years as a model are predictors of the attractiveness of models. Salary per day was not entered into the model, which shows that it is not statistically significant. Therefore, salary per day is not a predictor of the attractiveness of a model.

In conclusion, a multiple regression was run to predict the attractiveness of a model from age, number of years as a model, and salary per day. Age and number of years as a model statistically significantly predicted attractiveness, F(2, 228) = 17.483, p < .0005, R² = .133. Only these two independent variables added statistically significantly to the prediction of attractiveness, p < .05.

Section 8

Describe how you would compute the sample size using power =.80, effect size of .50, alpha of .05. Does your analysis support the sample size of the data you ran?

Calculation of Sample Size using G*Power

Steps:

Input parameter

Test family: F-tests selected.
Statistical test: Linear multiple regression: fixed model, R2 deviation from zero.
Type of power analysis: A priori: Compute required sample size, given α, power, and effect size.

Effect size f2 – 0.50

α err prob – 0.05

Power (1-β err prob) – 0.80

Number of predictors – 3

After entering all these parameters, I clicked Calculate to obtain the output parameters and the graph below:

Output parameter

Noncentrality parameter λ – 13.5000000

Critical F – 3.0279984

Numerator df- 3

Denominator df – 23

Total sample size – 27

Actual power – 0.8182141
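Two of the output parameters above follow directly from the inputs and can be verified by hand. The Python sketch below is an illustration only: it assumes the relationships used for this test, namely noncentrality parameter λ = f² × N, numerator df = number of predictors, and denominator df = N − number of predictors − 1. It does not recompute the critical F or the actual power, which require the noncentral F distribution.

```python
# Inputs entered in G*Power and the total sample size it returned.
f_squared = 0.50      # effect size f^2
n_total = 27          # total sample size reported by G*Power
n_predictors = 3      # number of predictors

# Noncentrality parameter for the F test (assumed: lambda = f^2 * N).
noncentrality = f_squared * n_total

# Degrees of freedom of the F test.
numerator_df = n_predictors
denominator_df = n_total - n_predictors - 1

print(noncentrality, numerator_df, denominator_df)
```

These values match the reported noncentrality parameter (13.5), numerator df (3), and denominator df (23).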

Output graph representation

Sample size calculated using G*Power = 27

The sample size calculated using G*Power (27) does not match the sample size of the data analyzed (N = 231); it is roughly one ninth of the actual sample size used in the multiple regression analysis. In a research study, a larger sample size is more dependable in terms of generalizing the results to the larger population (Green & Salkind, 2014). Therefore, the sample size used for this analysis was adequate and far above the sample size calculated with G*Power; that is, an adequate number of subjects was used for the study.

References

Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). London: Sage.

Green, S. B., & Salkind, N. J. (2014). Using SPSS for Windows and Macintosh: Analyzing and understanding data (7th ed.). Upper Saddle River, NJ: Pearson Education.

Sage Publications. (2013). Andy Field’s Datasets [Data files]. Available from Discovering Statistics Using IBM SPSS Statistics companion website: http://www.sagepub.com/field4e/study/datasets.htm

IBM PASW (formerly SPSS) Statistical Software version 21.

G*Power statistical software 3.1.9.2.
