The purpose of this paper is to discuss four articles about test limitations and some of the errors that can occur in testing. It shows several situations in which researchers tested hypotheses and how they relied on the assumptions of normal and asymptotic distributions. It discusses why some researchers urge caution in the use and interpretation of Cronbach's alpha, and why it is important that assessments be both valid and reliable. This paper also discusses the assessment known as the SAT. Regardless of the type of assessment used, it is important that it be valid and reliable.

A popular test used by many colleges is the SAT. Originally it was known as the Scholastic Aptitude Test; in the early 1990s it became known as the SAT I: Reasoning Test, and today it is simply called the SAT. The purpose of the SAT is to measure college readiness and to predict a student's future academic success, so colleges use it as a predictor of whether a student is ready for college. The test can be retaken, but the student must pay a fee each time. There is no evidence that scores change significantly when a student takes the test more than once (Grissmer, 2000).

Every assessment should be valid, which means that there is evidence that it measures what it claims to measure, and the interpretation of its scores must be appropriate for the intended purpose. Research has supported the SAT as a valid and reliable assessment. Still, errors can occur with any assessment, and the SAT is no different. Possible sources of error include the wording of the questions, computer errors in scoring, and limited English proficiency among test takers. The test consists of 52 multiple-choice questions in reading, 44 multiple-choice questions in writing and language, and 58 math questions, 20 answered without a calculator and 38 with a calculator. Some colleges also use the essay, in which the student reads a passage and analyzes how its author builds a persuasive argument (Miller & Lovler, 2015).

An example of an error would be an administrator who stops one part of the test five minutes early and then lets the next part run five minutes over. Requiring students to rewrite a section of the test in cursive could also be a problem, because many students cannot read or write in cursive. Students who are nervous about taking the test or who are not well rested may also not perform to their ability (Murray & Herrnstein, 1992).

Other sources of error include the students' education level, poor administration of the test, and limitations placed on the test. The articles "Hypothesis testing for coefficient alpha: An SEM approach" and "Starting at the beginning: An introduction to coefficient alpha and internal consistency" explain how important it is to show that a test is both valid and reliable. In hypothesis testing, the significance level, also called alpha, is the probability of making a Type I error; a 95% confidence level corresponds to an alpha of .05 (Maydeu-Olivares, Coffman, Garcia-Forero, & Gallardo-Pujol, 2010).

There are several kinds of hypotheses that can be tested about coefficient alpha. The first is whether coefficient alpha equals a prespecified value. The second involves two statistically independent sample alphas, as may arise when testing the equality of coefficient alpha across groups. The third involves two statistically dependent sample alphas, as may arise when testing the equality of alpha across time or when testing the equality of alpha for two test scores obtained within one sample (Maydeu-Olivares et al., 2010).
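The logic of these tests can be illustrated with a Wald-type z test, assuming that an estimate of alpha and its standard error are already available for each sample (for example, from an SEM analysis like the one the article describes). The Python sketch below is illustrative only: the function names and all numbers are hypothetical, and it is not the authors' own code. Setting the covariance term to zero treats the two alphas as independent; for dependent alphas, that term would have to be the estimated covariance between the two alpha estimates.

```python
import math
from statistics import NormalDist


def _two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def wald_alpha_vs_value(alpha_hat, se, alpha_0):
    """H0: the population alpha equals the prespecified value alpha_0."""
    z = (alpha_hat - alpha_0) / se
    return z, _two_sided_p(z)


def wald_two_alphas(alpha_1, se_1, alpha_2, se_2, cov_12=0.0):
    """H0: two population alphas are equal.

    cov_12 = 0 treats the estimates as independent (e.g., two groups).
    For dependent alphas (e.g., pretest and posttest on the same people),
    cov_12 must be the estimated covariance of the two alpha estimates.
    """
    var_diff = se_1 ** 2 + se_2 ** 2 - 2 * cov_12
    if var_diff <= 0:
        raise ValueError("covariance too large for the given standard errors")
    z = (alpha_1 - alpha_2) / math.sqrt(var_diff)
    return z, _two_sided_p(z)


# Hypothetical estimates and standard errors
print(wald_alpha_vs_value(0.85, 0.02, 0.80))
print(wald_two_alphas(0.85, 0.02, 0.80, 0.03))
print(wald_two_alphas(0.85, 0.02, 0.80, 0.03, cov_12=0.0002))
```

With these hypothetical values, an estimated alpha of .85 with a standard error of .02, tested against a prespecified value of .80, gives a z of about 2.5 and a two-sided p of about .012.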

Some assessments, such as the Intelligibility in Context Scale (ICS), are used to measure how intelligible a person's spoken language is in the context of conversation. The ICS has high internal consistency (Cronbach's alpha = .93) and evidence of construct validity. The comprehension of written-grammar test (CWT) measures the comprehension of written grammar. Its coefficient alpha was .98 for children who are hearing and above .90 for children who are deaf or hard of hearing (DHH). The test also shows four-week test-retest reliability and known-groups validity (Cannon, Hubley, Millhoff, & Mazlouman, 2016).

For assessments such as the ICS and CWT, Cronbach's alpha should be interpreted with caution, because a number of different errors can arise from its use. For example, suppose a person takes the Wechsler Adult Intelligence Scale-III (WAIS-III) and scores 80, but that person started learning English only two years earlier. The score is then not an accurate reflection of the test taker's intelligence, even if the test has a high coefficient alpha (Streiner, 2003).

One myth associated with coefficient alpha is that once it has been determined in one study, the reliability of the scale is known under all circumstances. Reliability is estimated from test-retest scores and from parallel forms of an assessment that test the same ability. The ICS has been translated into more than 60 languages, but it is not stated whether the translated versions were compared with other assessments. Similarly, the CWT is not reported to have been compared with similar assessments to check that it produces the same results, and both assessments were evaluated with small samples (Miller & Lovler, 2015).

Another myth is that bigger is always better. Assessors favor high levels of agreement between raters and scores that remain consistent over time, but a very high alpha can reflect redundancy among questions rather than a better scale. If an assessment has subscales, each subscale should be more homogeneous than the scale as a whole, yet its alpha should not be too high, for the same reason. These myths are important to consider, because neither the ICS nor the CWT is explicitly shown to have dealt with these issues (Streiner, 2003).

Another myth is that alpha measures only the internal consistency of the scale. This is not always true: alpha is also strongly affected by the length of the scale (Streiner, 2003).
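The effect of length can be seen in the standardized form of alpha, which depends only on the number of items k and the average inter-item correlation r: alpha = k × r / (1 + (k − 1) × r). The Python sketch below holds the average correlation fixed at a hypothetical value of .20 and varies only the number of items; the function name is illustrative.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r."""
    if k < 2:
        raise ValueError("alpha requires at least two items")
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical: the same modest average correlation (.20) at four scale lengths
for k in (5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.20), 2))
# prints 5 0.56, 10 0.71, 20 0.83, 40 0.91 (one pair per line)
```

Alpha rises from about .56 to about .91 even though the items are no more closely related to one another, which is why a high alpha on a long scale says little about internal consistency by itself.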

Maydeu-Olivares et al. (2010) describe three cases within a structural equation modeling (SEM) framework. The simplest is testing a hypothesis about a single-sample coefficient alpha; in this case, obtaining a model-free standard error for coefficient alpha is not a complicated procedure. The second case involves two statistically independent sample coefficient alphas, for which the single-sample SEM setup is extended to two populations. The most complicated case involves two statistically dependent sample coefficient alphas, where two test scores computed on the same sample of participants must be considered. This occurs, for example, when the two test scores being compared are alternate forms of the same test, or when they correspond to pretest and posttest administrations of the same test (Maydeu-Olivares et al., 2010).

The authors relate the methods used to perform hypothesis tests to the assumptions of normal and asymptotic distributions. Hypothesis testing for coefficient alpha relies on estimating the sampling variability of coefficient alpha. Initial proposals for estimating the standard error of coefficient alpha were based on model and distributional assumptions. The statistical inferences for coefficient alpha that the authors describe are model free and do not require the assumption that the components that make up the test score are normally distributed (Maydeu-Olivares et al., 2010).
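To illustrate a standard error for alpha that does not assume normally distributed item scores, the Python sketch below uses a nonparametric bootstrap, resampling respondents with replacement. This is a swapped-in technique, not the SEM-based asymptotic method of Maydeu-Olivares et al. (2010); the function names and the small score matrix are hypothetical.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_vars = [statistics.variance(item) for item in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def bootstrap_se_alpha(rows, n_boot=2000, seed=0):
    """Standard error of alpha from resampling respondents with replacement.

    No normality assumption is made about the item scores.
    """
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(cronbach_alpha(resample))
        except ZeroDivisionError:
            # A resample whose total scores are all equal has no total variance.
            continue
    return statistics.stdev(estimates)


# Hypothetical scores: 6 respondents x 3 items
scores = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2], [5, 4, 5], [3, 3, 4]]
print(round(cronbach_alpha(scores), 3))  # 0.918
print(round(bootstrap_se_alpha(scores), 3))
```

With only six respondents the bootstrap standard error will be large; the point of the sketch is the procedure, not the particular number.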

Cronbach's alpha is the most widely used index of the reliability of a scale because it reflects internal consistency; however, its use and interpretation may lead to many errors, and it can be a counterproductive index of reliability if it is not interpreted properly. Cronbach's alpha is a function of the number of items on an assessment, the average covariance between pairs of items, and the variance of the total scores. Alpha is affected by the length of the scale, so a high value does not guarantee internal consistency or unidimensionality, and a long scale can make alpha values too high (Streiner, 2003).
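The three ingredients named above combine into a compact identity: alpha equals the number of items squared, times the average inter-item covariance, divided by the variance of the total scores. The Python sketch below computes alpha this way from an item covariance matrix; the function name and both example matrices are hypothetical. The second matrix, in which the items covary negatively, also shows that alpha can fall below zero.

```python
def alpha_from_covariances(cov):
    """Coefficient alpha from a k x k item covariance matrix (a list of lists).

    Uses alpha = k**2 * mean_off_diagonal_covariance / total_score_variance,
    where the total-score variance equals the sum of every matrix entry.
    """
    k = len(cov)
    total_var = sum(sum(row) for row in cov)
    off_diag = [cov[i][j] for i in range(k) for j in range(k) if i != j]
    mean_cov = sum(off_diag) / len(off_diag)
    return k ** 2 * mean_cov / total_var


# Hypothetical 3-item matrices: unit variances, equal covariances
positive = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
negative = [[1.0, -0.2, -0.2], [-0.2, 1.0, -0.2], [-0.2, -0.2, 1.0]]
print(round(alpha_from_covariances(positive), 2))  # 0.75
print(round(alpha_from_covariances(negative), 2))  # -1.0
```

The identity follows from the usual formula, alpha = k / (k − 1) × (1 − sum of item variances / total variance), because the total variance is the sum of the item variances plus all inter-item covariances.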

In summary, four myths about alpha as an index of reliability are that it is a fixed property of the scale, that it measures only the internal consistency of the scale, that higher values are always better than lower ones, and that it is restricted to the range of zero to one (Streiner, 2003).

In 2006, Pearson Educational Measurement acknowledged a scoring error affecting SAT scores. The company had also made scoring errors on more than 8,000 tests in Minnesota, which prevented some high school seniors from graduating, as well as errors in Washington and Virginia. Pearson Educational Measurement stated that the SAT errors affected 4,000 of the 495,000 students who had taken the test. The cause was excessive moisture, which made the answer sheets expand before they were scanned at the company's large test-processing site in Austin, Texas (Arenson & Henriques, 2006).

Pearson Educational Measurement was sued, and the company stated that it was correcting the problem and making sure that it would not happen again. Some scores were off by as many as 400 points out of 2,400 on the three-part test. Families in Minnesota sued because a student who should have received a score of 700 on the writing section received only 690, and another student also complained about a score. These complaints prompted the College Board to investigate the company (Arenson & Henriques, 2006).

This made students who were preparing to take the SAT very nervous, because it is the most widely used college admissions test. The score helps determine whether a student will be admitted to the college that the student wants to attend. This matters because success in college can shape the lifestyle a student will lead, and a person's career can depend on education level (Arenson & Henriques, 2006).

Every parent wants their child to have the best education available. Many jobs require a college education. It has been in the news recently that some well-known celebrities paid to have their children admitted to some of the best colleges. This was done by having someone else take their child's SAT or by having someone adjust the SAT scores. This was wrong, and the parents knew it was wrong when they did it, but they probably never thought they would be caught. Several celebrities could be facing serious jail time because of what they have done (Wermund & Hefling, 2019).

The parents are not the only ones in trouble, though. The people who took bribes to alter the test scores and those who took the test for the students are also facing serious jail time. The parents may have wanted the best for their children, but this should never have happened for any reason. Some of the children have stated that they did not know what their parents had done. This case is a clear example of how scoring errors can originate with test administrators, some of whom were bribed (Wermund & Hefling, 2019).

The main purpose of any study or test is to be reliable and valid, and all margins of error have to be taken into consideration. In hypothesis testing, a 95% confidence level (.95) corresponds to a significance level (alpha) of .05; in a two-tailed test, this leaves .025 in each tail of a normal distribution as the margin of error. Cronbach's alpha, by contrast, is a reliability coefficient. It is the most widely used index of the reliability of a scale, but its use and interpretation can lead to many different errors (Streiner, 2003).
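The relationship between the confidence level, the significance level, and the two tails of a normal distribution can be checked with Python's standard library. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.95
alpha_level = 1 - confidence   # significance level: probability of a Type I error
tail = alpha_level / 2         # area left in each tail for a two-tailed test
z_crit = NormalDist().inv_cdf(1 - tail)  # cutoff that leaves `tail` above it

print(round(alpha_level, 3), round(tail, 3), round(z_crit, 2))  # 0.05 0.025 1.96
```

The familiar cutoff of 1.96 standard deviations is simply the point beyond which .025 of a standard normal distribution lies on each side.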

Each of the articles discussed above shows how important it is that a test be both valid and reliable. A test must measure what it claims to measure in order to be valid. Possible sources of error include the test administrator, a student who is not focused on the test, the testing environment, language barriers, and the length of the test, and these are only a few. An error can mean that a person is misdiagnosed, placed on the wrong type of medication, or placed in the wrong type of therapy. The person giving the test should be properly trained to administer it. This paper has shown how important it is to avoid errors. It is also important that a test can be given and retested, and that a pilot test is conducted first. A pilot test helps ensure that the test measures what it is intended to measure.
