- The coefficient of determination is used to determine how much variance two variables share, or how much of the variance in an outcome variable is explained (accounted for) by a set of predictor variables.
- Values can range from 0.00 to 1.00, or 0 to 100%.
- In terms of regression analysis, the coefficient of determination is an overall measure of the accuracy of the regression model.
- In simple linear regression analysis, this coefficient is calculated by squaring r, the correlation coefficient between the two variables.
- In a multiple linear regression analysis, R² is known as the coefficient of multiple determination.
- It helps to describe how well a regression line fits the data (a.k.a. goodness of fit). An R² value of 0 indicates that the regression line explains none of the variability in the data points, and a value of 1 indicates that the regression line fits the data points perfectly.
- By definition, R² is one minus the Sum of Squares of Residuals (SSerror) divided by the Total Sum of Squares (SStotal): R² = 1 - (SSerror / SStotal).
- In the case of a multiple linear regression, R² never decreases when a predictor is added to the model, even when the new predictor is highly correlated with the existing ones (referred to as multicollinearity) and contributes little new information. R² can therefore overstate how well the model explains the outcome.
- For this reason, when a regression model has several predictors, and especially when they are correlated with one another, the Adjusted R Squared (Adjusted Coefficient of Determination) should be interpreted. The Adjusted R² can take on negative values and is always less than or equal to the Coefficient of Determination. Note: Unlike R², the Adjusted R² increases only if an added predictor improves the model more than would be expected by chance; otherwise it decreases.
- Conversely, the Coefficient of Non-Determination is the proportion of variance that is left unexplained, or unaccounted for, between two variables, or in an outcome variable by a set of predictors. It is simply 1 - R².
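
The calculations described in the list above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function names (`fit_simple_ols`, `r_squared`, `pearson_r`, `adjusted_r_squared`) and the small data set are made up for the example, and the adjusted formula used is the common textbook form 1 - (1 - R²)(n - 1)/(n - p - 1), which assumes a model with an intercept, n observations, and p predictors.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y, y_hat):
    """R^2 = 1 - SSerror / SStotal."""
    my = mean(y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_error / ss_total


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def adjusted_r_squared(r2, n, p):
    """Textbook adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_ols(x, y)
y_hat = [intercept + slope * xi for xi in x]

r2 = r_squared(y, y_hat)
r = pearson_r(x, y)
adj_r2 = adjusted_r_squared(r2, n=len(x), p=1)
non_determination = 1 - r2

print(round(r2, 4))                 # 0.6
print(round(r ** 2, 4))             # 0.6 (equals R^2 in simple regression)
print(round(adj_r2, 4))             # 0.4667
print(round(non_determination, 4))  # 0.4
print(round(adjusted_r_squared(0.1, n=5, p=2), 4))  # -0.8 (can be negative)
```

With this data, SSerror is 2.4 and SStotal is 6, so R² is 0.6, which matches the squared correlation coefficient. The last line shows how a weak fit combined with a small sample and two predictors pushes the Adjusted R² below zero.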