It is important to interpret the slope of the line in the context of the situation represented by the data. The slope of the line, b, describes how changes in the variables are related. You should NOT use the line to predict the final exam score for a student who earned a grade of 50 on the third exam, because 50 is not within the domain of the x-values in the sample data, which are between 65 and 75. You could use the line to predict the final exam score for a student who earned a grade of 73 on the third exam. If the scatter plot indicates that there is a linear relationship between the variables, then it is reasonable to use a best fit line to make predictions for y given x within the domain of x-values in the sample data, but not necessarily for x-values outside that domain. Remember, it is always important to plot a scatter diagram first. A scatter plot of the data is shown, together with a residuals plot. The owner has data for a 2-year period and chose nine days at random. EXAMPLE:Ī shop owner uses a straight-line regression to estimate the number of ice cream cones that would be sold in a day based on the temperature at noon. By observing the scatter plot of the data, the residuals plot, and the box plot of residuals, together with the linear correlation coefficient, we can usually determine if it is reasonable to conclude that the data are linearly correlated. A box plot of the residuals is also helpful to verify that there are no outliers in the data. It should also show constant error variance, meaning the residuals should not consistently increase (or decrease) as the explanatory variable x increases.Ī residuals plot can be created using StatCrunch or a TI calculator. While a scatter plot of the data should resemble a straight line, a residuals plot should appear random, with no pattern and no outliers. The residuals plot is often shown together with a scatter plot of the data. A residuals plot shows the explanatory variable x on the horizontal axis and the residual for that value on the vertical axis. The difference between these values is called the residual. For each data point used to create the correlation line, a residual y - y can be calculated, where y is the observed value of the response variable and y is the value predicted by the correlation line. Residuals PlotsĪ residuals plot can be used to help determine if a set of ( x, y) data is linearly correlated. r is the correlationĬoefficient, which is discussed in the next section. The slope b can be written as b = r ( s y s x ) b = r ( s y s x ) where s y = the standard deviation of the y values and s x = the standard deviation of the x values. The best fit line always passes through the point ( x ¯, y ¯ ) ( x ¯, y ¯ ). The sample means of the x values and the y values are x ¯ x ¯ and y ¯ y ¯, respectively. Where a = y ¯ − b x ¯ a = y ¯ − b x ¯ and b = Σ ( x − x ¯ ) ( y − y ¯ ) Σ ( x − x ¯ ) 2 b = Σ ( x − x ¯ ) ( y − y ¯ ) Σ ( x − x ¯ ) 2. , 11.įor the example about the third exam scores and the final exam scores for the 11 statistics students, there are 11 data points. Here the point lies above the line and the residual is positive.įor each data point, you can calculate the residuals or errors, y i - ŷ i = ε i for i = 1, 2, 3. In the diagram in Figure 12.10, y 0 – ŷ 0 = ε 0 is the residual for the point shown. If the observed data point lies below the line, the residual is negative, and the line overestimates that actual data value for y. If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. In other words, it measures the vertical distance between the actual data point and the predicted point on the line. The absolute value of a residual measures the vertical distance between the actual value of y and the estimated value of y. It is not an error in the sense of a mistake. The term y 0 – ŷ 0 = ε 0 is called the "error" or residual.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |