Tuesday, March 8, 2011

Multiple correlation

From Wikipedia, the free encyclopedia

In statistics, multiple correlation is a linear relationship among more than two variables. It is measured by the coefficient of multiple determination, denoted R², which is a measure of the fit of a linear regression. A regression's R² falls between zero and one (assuming a constant term has been included in the regression); a higher value indicates a stronger relationship among the variables. A value of one indicates that all data points fall exactly on a hyperplane in multidimensional space, and a value of zero indicates no linear relationship at all between the independent variables collectively and the dependent variable.
Unlike the coefficient of determination in a regression involving just two variables, the coefficient of multiple determination is not computationally commutative: a regression of y on x and z will in general have a different R² than will a regression of z on x and y. For example, suppose that in a particular sample the variable z is uncorrelated with both x and y, while x and y are linearly related to each other. Then a regression of z on y and x will yield an R² of zero, while a regression of y on x and z will yield a positive R².
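The asymmetry described above is easy to check numerically. The following sketch (not part of the original article; it uses NumPy and simulated data of my own choosing) builds a sample in which y depends on x while z is independent of both, then compares the two R² values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)   # y is linearly related to x
z = rng.normal(size=n)           # z is independent of both x and y

def r_squared(dep, preds):
    """R^2 from an OLS regression of dep on preds (intercept included)."""
    X = np.column_stack([np.ones(len(dep))] + preds)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    tss = (dep - dep.mean()) @ (dep - dep.mean())
    return 1 - (resid @ resid) / tss

# Regressing y on x and z gives a large R^2; regressing z on x and y
# gives an R^2 near zero, as the text predicts.
print(r_squared(y, [x, z]))
print(r_squared(z, [x, y]))
```

In a finite sample the second value is not exactly zero, only close to it, since sample correlations between independent variables are rarely exactly zero.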

Fundamental equation of multiple regression analysis

The coefficient of multiple determination R² (a scalar) can be computed using the vector c of cross-correlations between the predictor variables and the criterion variable, its transpose c′, and the matrix Rxx of inter-correlations between the predictor variables. The "fundamental equation of multiple regression analysis"[1] is
R² = c′ Rxx⁻¹ c.
The expression on the left side denotes the coefficient of multiple determination. The terms on the right side are the transposed vector c′ of cross-correlations, the inverse of the matrix Rxx of inter-correlations, and the vector c of cross-correlations. Note that if all the predictor variables are uncorrelated, the matrix Rxx is the identity matrix and R² simply equals c′c, the sum of the squared cross-correlations. Otherwise, the inverted matrix of inter-correlations removes the redundant variance that results from the inter-correlations of the predictor variables.
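The fundamental equation can also be verified directly. The sketch below (again not from the original article; the variable names and simulated data are my own) forms the cross-correlation vector c and the inter-correlation matrix Rxx from a sample with correlated predictors, evaluates c′ Rxx⁻¹ c, and compares it with the R² from an ordinary least-squares fit with an intercept:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # predictors are inter-correlated
y = x1 + x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
# c: cross-correlations between each predictor and the criterion
c = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
# Rxx: inter-correlation matrix of the predictors
Rxx = np.corrcoef(X, rowvar=False)
R2_fundamental = c @ np.linalg.inv(Rxx) @ c

# R^2 from an OLS regression of y on x1 and x2 (intercept included)
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta
tss = (y - y.mean()) @ (y - y.mean())
R2_ols = 1 - (resid @ resid) / tss

# The two quantities agree to numerical precision.
print(R2_fundamental, R2_ols)
```

With sample correlations and an intercept in the regression, the identity holds exactly; the inverse of Rxx is what discounts the variance that x1 and x2 share.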
