Why does my mex-function for linear regression return r-squared values above 1?
    3 views (last 30 days)
  
       Show older comments
    
In a mex-file I have a function called ols() that is supposed to calculate a linear regression for two vectors, 'x' and 'y'. Unfortunately, every now and then ols() returns r-squared values that are higher than 1, so at some point there must be an error in my calculations...but I can't figure out where. Any help would be be greatly appreciated.
std::vector<float> ols(const std::vector<float>& x, const std::vector<float>& y){
    if(x.size() != y.size()){ 
        mexErrMsgTxt("Error: The length of vector 'x' does not equal the length of vector 'y'.");
    }   
      float sumx  = 0.0;                     
      float sumx2 = 0.0;                     
      float sumxy = 0.0;                    
      float sumy  = 0.0;                     
      float sumy2 = 0.0;                     
      float n = x.size();
      for (int i=0;i<n;i++){ 
          sumx  += x[i];       
          sumx2 += pow(x[i], 2);  
          sumxy += x[i] * y[i];
          sumy  += y[i];      
          sumy2 += pow(y[i], 2); 
      } 
      float denominator = (n * sumx2 - pow(sumx, 2));
      float slope     = (n * sumxy  -  sumx * sumy) / denominator;
      float intercept = (sumy * sumx2  -  sumx * sumxy) / denominator;
      float r         = (sumxy - sumx * sumy / n) /    
                          sqrt((sumx2 - pow(sumx, 2)/n) *
                          (sumy2 - pow(sumy, 2)/n));
      float rsquared  = pow(r, 2);
      std::vector<float> stats;
      stats.push_back(slope);
      stats.push_back(intercept);
      stats.push_back(rsquared);
      return stats;
  }
0 Comments
Accepted Answer
  James Tursa
      
      
 on 2 Oct 2018
        
      Edited: James Tursa
      
      
 on 2 Oct 2018
  
      I haven't run your code, but just looking at it you are using the worst algorithm numerically with single precision numbers, which is not a good combination. At the very least you should be using a better algorithm, and maybe doing the calculations in double precision. E.g., see this link:
5 Comments
  James Tursa
      
      
 on 2 Oct 2018
				
      Edited: James Tursa
      
      
 on 2 Oct 2018
  
			As a start, maybe try the two pass algorithm since it is much better than the one pass algorithm you are currently using.
More Answers (0)
See Also
Categories
				Find more on Descriptive Statistics in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!