Saturday, March 29, 2008

F-Statistic


RSS is the regression sum of squares (the explained variation)
SSE is the sum of squared errors (the unexplained variation)
n is the total number of observations
k is the number of independent variables in the regression equation
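
Putting those pieces together with the degrees of freedom below, the statistic is the standard ANOVA F for a regression:

F = (RSS / k) / (SSE / [n - (k + 1)]) = MSR / MSE

where MSR is the mean regression sum of squares and MSE is the mean squared error.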

k and [n - (k + 1)] are the numerator and denominator degrees of freedom for calculating an F-statistic.

For an unbiased MSE calculation, the number of estimated regression coefficients, that is the k slopes plus the intercept, needs to be subtracted from n to form the degrees of freedom for the denominator.
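
For example, with n = 30 observations and k = 2 independent variables, the regression estimates k + 1 = 3 coefficients (two slopes plus the intercept), so MSE = SSE / (30 - 3) = SSE / 27, and the F-statistic has 2 and 27 degrees of freedom.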

In my understanding, the F-statistic aims to highlight the ratio of explained variation to unexplained variation in the regression equation.

The basic concept is roughly this: assume there is no regression equation for estimating the value of a dependent variable, say Y. We would probably use the arithmetic average, denoted Y-bar, to estimate the value of Y. In other words, in that case all of the variation would be unexplained, meaning you can't really tell which factors cause your prediction to deviate from the actual value.
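
Formally, the total variation around Y-bar splits into an explained piece and an unexplained piece:

SST = sum of (Yi - Y-bar)^2 = RSS + SSE

With no regression equation, RSS = 0 and the whole sum is unexplained; a useful equation shifts variation from the SSE side over to the RSS side.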

So now, some smart ass comes along with an equation that claims to be a better estimation method than the arithmetic average. Ideally, RSS will be large relative to SSE, because the regression coefficients in the equation now partially (perhaps fully) explain the deviation from the actual values.

If the equation performs only as well (or as badly) as the arithmetic-mean method, then the F-statistic would give you a value of 0, because RSS would be zero.
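
To make the mechanics concrete, here is a minimal sketch in Python (the data, seed, and variable names are all illustrative, not from any particular dataset) that fits a small regression by ordinary least squares and computes RSS, SSE, and the F-statistic exactly as described above:

import numpy as np

# Toy data: n = 20 observations, k = 2 independent variables.
# (Illustrative numbers only; any similar dataset would do.)
rng = np.random.default_rng(seed=0)
n, k = 20, 2
X = rng.normal(size=(n, k))
y = 1.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Fit y = b0 + b1*x1 + b2*x2 by ordinary least squares.
X_design = np.column_stack([np.ones(n), X])   # prepend an intercept column
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
y_hat = X_design @ beta

sst = np.sum((y - y.mean()) ** 2)   # total variation around Y-bar
sse = np.sum((y - y_hat) ** 2)      # unexplained variation (sum of squared errors)
rss = sst - sse                     # explained variation (regression sum of squares)

msr = rss / k                       # mean regression sum of squares
mse = sse / (n - (k + 1))           # unbiased mean squared error
f_stat = msr / mse                  # F with k and n - (k + 1) degrees of freedom

print(f"RSS = {rss:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")

On data the equation genuinely explains, RSS dominates SSE and F comes out large; if the regressors explained nothing, RSS would fall toward zero and F along with it.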
