How to evaluate the fitted distributions by comparing quantitative statistics?

Hi,
I'm running a matrix of regressions (say, 10*50 regressions, same original dataset, different approaches).
Besides simply comparing the R-squared of the regressions, I also want to know how the residuals are distributed, since they matter for further investigation. I want the residuals to be tightly concentrated and well behaved, which would mean the fitted curve tracks the original dataset closely (i.e., the probability of large deviations is minimized).
Since there are so many of them, plotting methods (histogram, qqplot, etc.) are not practical. I'm hoping there are statistics that evaluate how good the fitted distributions are, so that I can pick the 'good models' and use them as a good approximation of the original dataset.
Judging by eye, the residuals look t location-scale distributed. I tried coftest, kstest and adtest. However, they all reject the null hypothesis (even when R-squared is 0.96). How can I change the criterion for rejecting the null hypothesis? Furthermore, the p-values are extremely low, and I'm quite confused by this.
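With very many observations, a goodness-of-fit test has enough power to flag tiny departures, so the reject/accept decision says little about which fit is closer. One possible workaround, sketched here under assumptions, is to rank the residual sets by the KS statistic itself (the largest gap between the empirical and the fitted CDF). The residual vectors are synthetic stand-ins and the variable names are hypothetical; the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Sketch: rank residual sets by KS distance to a fitted t location-scale
% distribution instead of using the test's reject/accept decision.
rng(0);                                      % reproducible synthetic data (assumption)
resid = {trnd(5, 1e5, 1), ...                % hypothetical set 1: heavy-tailed, symmetric
         exprnd(1, 1e5, 1) - 1};             % hypothetical set 2: strongly skewed
ksDist = zeros(numel(resid), 1);
for k = 1:numel(resid)
    r  = resid{k};
    pd = fitdist(r, 'tLocationScale');       % estimate location, scale and dof
    [~, ~, ksDist(k)] = kstest(r, 'CDF', pd);  % third output is the KS statistic
end
[~, order] = sort(ksDist);                   % smallest distance first
disp(table(order, ksDist(order), 'VariableNames', {'Set', 'KSDistance'}));
```

The statistic lies between 0 and 1 and does not grow with the sample size the way the test's power does, so it can be compared across the whole grid of regressions. Any cutoff for a "good" fit would still be a judgment call.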
Here is a comparison of some of the distributions:
As you can see, data1 and data2 are well distributed, with the expected shape, while data3 and data4 are not so tightly concentrated. I tried comparing log-likelihoods, but even data4 has a log-likelihood on the order of 1.0e+07. What am I supposed to do?
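A log-likelihood is a sum over observations, so its magnitude grows with the sample size and also depends on the scale of the residuals. A rough sketch of two more comparable quantities follows: the mean log-likelihood per observation, and AIC for comparing candidate distributions on the same residual vector. The data and the candidate list are illustrative assumptions.

```matlab
% Sketch: per-observation log-likelihood and AIC for several candidate fits
% to one residual vector (synthetic, small-scale residuals as an assumption).
rng(1);
r      = 0.01 * trnd(4, 2e5, 1);             % hypothetical residuals
names  = {'Normal', 'tLocationScale', 'Logistic'};
nll    = zeros(numel(names), 1);
nParam = zeros(numel(names), 1);
for k = 1:numel(names)
    pd        = fitdist(r, names{k});
    nll(k)    = negloglik(pd);               % negative log-likelihood of the fit
    nParam(k) = pd.NumParameters;
end
meanLogL = -nll / numel(r);                  % average log-density per observation
aic      = 2 * nll + 2 * nParam;             % lower AIC = preferred candidate
disp(table(names', meanLogL, aic, 'VariableNames', {'Distribution', 'MeanLogL', 'AIC'}));
```

AIC only ranks candidates fitted to the same data. The per-observation value removes the dependence on sample size, but it is only meaningful across residual sets that are measured in the same units.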
Frank on 27 May 2023
Thanks very much for your time!
By reviewing VAR models I have to admit there are similarities in some aspects. In my understanding, a VAR model is for forecasting (predicting yt from previous values), but what I'm focusing on here is the relationship (distribution) among different variables. But you are right: if we ignore the time-series properties, they are much alike (PCA and (ZZ')^-1 seem like quite similar methodologies?). My statistics is getting rusty, so I might need more time to figure out the true difference.
After examining a few tests, the key issue causing the problem is still the huge number of observations. It not only affects the tests but also influences the distfit() function in the first place. As far as I have explored, there is no setting in the distributionFitter tool to restrict the fit to the main area.
I sincerely hope there are some good tools and methodologies out there so I don't have to reinvent the wheel...
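One heuristic for the sample-size issue, offered only as a sketch, is to fit on the full residual vector but run the test on random subsamples of a fixed, moderate size and look at how often it rejects. The subsample size `m`, the number of repetitions `nRep` and the synthetic data are all assumptions, not recommended values.

```matlab
% Sketch: Anderson-Darling rejection rate on fixed-size random subsamples,
% so every model is judged at the same effective sample size.
rng(3);
r  = trnd(5, 2e5, 1) + 0.02 * randn(2e5, 1); % hypothetical residuals, close to t
pd = fitdist(r, 'tLocationScale');           % fit once on all observations
m    = 2000;                                 % subsample size (assumption)
nRep = 200;                                  % number of subsamples (assumption)
reject = false(nRep, 1);
for k = 1:nRep
    idx       = randperm(numel(r), m);       % sample without replacement
    reject(k) = adtest(r(idx), 'Distribution', pd);  % 1 = reject at the 5% level
end
rejectRate = mean(reject);                   % fraction of subsamples rejected
fprintf('Rejection rate at n = %d: %.3f\n', m, rejectRate);
```

A rejection rate near the nominal 5% suggests the fit is adequate at that resolution. The result depends on the chosen `m`, so the same value should be used for every model being compared.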
Anyway, thanks very much!
Ive J on 6 Jun 2023
I'm not familiar with time-series modeling, but the general idea you've mentioned looks like what partial least-squares (PLS) does, and in that sense you don't need to first get PCs and then fit an OLS. Nonetheless, what is your objective with the residuals? Your residuals should be normal; otherwise it's a clear violation of OLS assumptions, and as you've already figured out, qq plots are one way to visually check this assumption. Also, what's wrong with the adjusted R-squared you get from the linear model (check `fitlm`), which is a direct measure (as @the cyclist just mentioned) of goodness-of-fit?
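As a minimal sketch of that suggestion, the adjusted R-squared from `fitlm` can be tabulated for each model next to numeric summaries of the residual shape, which stand in for the qq plots when there are too many models to inspect by eye. The data, the predictor subsets and the variable names below are hypothetical.

```matlab
% Sketch: adjusted R-squared plus residual skewness/kurtosis per model.
rng(2);
n = 5000;
X = randn(n, 3);                              % hypothetical predictors
y = X * [1.5; -2; 0.5] + 0.3 * trnd(5, n, 1); % hypothetical response, heavy-tailed noise
predSets = {1, 1:2, 1:3};                     % hypothetical competing model specifications
nModels  = numel(predSets);
adjR2 = zeros(nModels, 1);
skew  = zeros(nModels, 1);
kurt  = zeros(nModels, 1);
for k = 1:nModels
    mdl      = fitlm(X(:, predSets{k}), y);
    res      = mdl.Residuals.Raw;             % raw residuals of this model
    adjR2(k) = mdl.Rsquared.Adjusted;
    skew(k)  = skewness(res);                 % 0 for a symmetric distribution
    kurt(k)  = kurtosis(res);                 % 3 for a normal distribution
end
disp(table(adjR2, skew, kurt));
```

Skewness far from 0 or kurtosis far from 3 points to non-normal residuals without needing a plot or a p-value.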
