r - Evaluating linear regression (in Microsoft Machine Learning)
I'm playing with linear regression in Azure Machine Learning and evaluating a model.
I'm still a bit unsure what the various evaluation metrics mean and show, so I'd appreciate correction if I'm incorrect.
- Mean Absolute Error: the mean of the residuals (errors).
- Root Mean Squared Error: the std dev of the residuals. I can use it to see how far off the mean/median absolute error is.
- Relative Absolute Error: a percentage value that shows the percentage difference between the relative error and the absolute error. Lower values are better, indicating a lower difference.
- Relative Squared Error: the square of the error relative to the square of the absolute. I'm unsure what this gives me over Relative Absolute Error.
- Coefficient of Determination: an indication of the correlation between the inputs. +1 or -1 indicate perfect correlation, 0 indicates none.
- The histogram shows the frequency of various buckets of error magnitudes. It shows a lot of small errors, with the frequency decreasing as the value of the error increases. Taken along with the poor metrics above, this indicates that there is skew or that outliers have a large influence on the model, making it less accurate.
Are these definitions and assumptions correct?
You are correct on most points. To make sure we are talking in the same terms, a little bit of background:
A linear regression uses data on an outcome variable y and independent variables x1, x2, .. and tries to find the linear combination of x1, x2, .. that best predicts y. Once this "best linear combination" is established, you can assess the quality of the fit (i.e. the quality of the model) in multiple ways. The 6 points you mention are key metrics for the quality of a regression equation.
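To make "best linear combination" concrete, here is a minimal sketch of the simplest case, with one independent variable, where "best" is taken to mean the smallest sum of squared errors. The function name fit_simple_regression and the data are made up for illustration. This is not what Azure Machine Learning runs internally, only the textbook closed-form fit.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

intercept, slope = fit_simple_regression(x, y)
# One predicted value of y per observation.
predictions = [intercept + slope * xi for xi in x]
```

With more than one independent variable the idea is the same, but the weights are usually found with a linear algebra routine rather than by hand.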
Running a regression gives you multiple "ingredients". For example, every observation gets a predicted value of the outcome variable. The difference between the observed value of y and the predicted value is called the residual or error. Residuals can be negative (if y is overestimated) and positive (if y is underestimated). The closer the residuals are to zero, the better. But what is "close"? The metrics you present are supposed to give insight into this.
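The sign convention above can be sketched in a few lines, taking the residual as observed minus predicted. The helper name residuals and the numbers are made up for illustration.

```python
def residuals(observed, predicted):
    """Observed minus predicted, one value per observation."""
    return [obs - pred for obs, pred in zip(observed, predicted)]


# Made-up values, for illustration only.
observed = [10.0, 12.0, 9.0]
predicted = [11.5, 12.0, 7.0]

errors = residuals(observed, predicted)
# errors[0] is negative: 11.5 was predicted for an actual 10.0 (overestimate).
# errors[1] is zero: the prediction was exact.
# errors[2] is positive: 7.0 was predicted for an actual 9.0 (underestimate).
```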
- Mean Absolute Error: takes the absolute value of each residual and then takes the mean of those values.
- Root Mean Square Error: essentially the standard deviation of the residuals, so you can see how large the spread of the residuals is. The residuals are squared, and therefore high residuals count for more than small residuals. A low RMSE is good.
- Relative Absolute Error: the absolute error as a fraction of the real value of the outcome variable y. In your case, the predictions are on average 75% higher/lower than the actual value of y.
- Relative Squared Error: the squared error (residual^2) as a fraction of the real value.
- Coefficient of Determination: almost correct. It ranges between 0 and 1 and can be interpreted as the explanatory power of the independent variables in explaining y. In fact, in your case the independent variables can model 38.15% of the variation in y. Also, if you have only 1 independent variable, the coefficient is equal to the squared correlation coefficient.
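The metrics in the list above can be sketched in plain Python. Treat this as an illustration under assumptions, not as the tool's own code. In particular, the two relative metrics are normalised here by the error of a baseline that always predicts the mean of y. That is one common convention, but the exact formula your tool uses is an assumption on my part and it may differ. The names regression_metrics and pearson_r, and all the numbers, are made up.

```python
import math


def regression_metrics(observed, predicted):
    """Five evaluation metrics computed from observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [o - p for o, p in zip(observed, predicted)]
    abs_error = sum(abs(e) for e in errors)
    sq_error = sum(e * e for e in errors)
    # Baseline: the error made by always predicting the mean of the observed values.
    # ASSUMPTION: the relative metrics are normalised by this baseline.
    base_abs = sum(abs(o - mean_obs) for o in observed)
    base_sq = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "mae": abs_error / n,
        "rmse": math.sqrt(sq_error / n),
        "rae": abs_error / base_abs,
        "rse": sq_error / base_sq,
        "r2": 1 - sq_error / base_sq,
    }


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
metrics = regression_metrics(observed, predicted)

# One independent variable: fit by least squares, then compare R^2 with r^2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]
r2_single = regression_metrics(y, fitted)["r2"]
r_squared = pearson_r(x, y) ** 2
```

For the first made-up data set this gives MAE 1.0, RMSE 1.0, relative absolute error 0.5, relative squared error 0.2 and a coefficient of determination of 0.8. Under this convention the coefficient of determination is simply 1 minus the relative squared error. In the single-variable example, r2_single and r_squared agree up to floating-point rounding. Note also that the RMSE equals the standard deviation of the residuals only when the residuals average to zero, as they do in the first example.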
Root Mean Squared Error and Coefficient of Determination are the important metrics in most situations. To be honest, I've never seen the other metrics being reported.