Cross-validating the output of "scatteredInterpolant" to choose the best method (linear, nearest, or natural)

Dear all,
I have precipitation values at 93 scattered stations, and I used "scatteredInterpolant" to interpolate these 93 scattered values onto a regular grid.
[new_lons,new_lats] = ndgrid(44.25:.5:63.75,24.25:.5:39.75); %make the grid for my new lats/lons
This gave me 1280 gridded values. The coordinates of the 93 original stations are not among these 1280 grid points; in other words, I obtained new values at new coordinates.
Now I want to compute R2 and RMSE for the different methods available in scatteredInterpolant (linear, natural, and nearest) to find out which interpolation method works best for my data set.
I think I should use scatteredInterpolant again to interpolate these 1280 gridded values back onto the 93 original station coordinates, and then compute R2 and RMSE between the 93 original values and the 93 back-interpolated values.
So am I right?
Is there any better approach available?
I appreciate any suggestions.
Thank you

Accepted Answer

Bjorn Gustavsson on 5 May 2020
That sounds somewhat sensible to me, but it would primarily test the regular-grid interpolation method rather than the scatteredInterpolant methods. My first idea would be to try a leave-one-out approach instead. If you leave one point out of your 93, you can still create the scatteredInterpolant from the remaining 92; that gives you one test point where you can compare the interpolated value with an actual observation. Then you repeat, leaving out another point each time (preferably not one from the perimeter, I'd guess), to build up some statistics.
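The leave-one-out idea could be sketched roughly as follows. This is only a minimal illustration, not a tested recipe for the real data: the station coordinates and precipitation values are synthetic stand-ins, and the variable names (lon, lat, P, pred, rmse) are assumptions. Extrapolation is switched off with 'none', so a left-out station that falls outside the convex hull of the remaining 92 returns NaN and is skipped in the statistics.

```matlab
% Synthetic stand-in data (assumption): 93 stations with made-up precipitation.
rng(0);                                   % reproducible random numbers
lon = 44 + 20*rand(93,1);                 % hypothetical station longitudes
lat = 24 + 16*rand(93,1);                 % hypothetical station latitudes
P   = 50 + 2*(lon - 44) + 3*(lat - 24) + randn(93,1);  % hypothetical values

methods = {'linear','nearest','natural'};
n    = numel(P);
pred = nan(n, numel(methods));            % one column of predictions per method

for m = 1:numel(methods)
    for k = 1:n
        keep    = true(n,1);
        keep(k) = false;                  % leave station k out
        % 'none' disables extrapolation: a left-out station outside the
        % convex hull of the remaining points yields NaN, not a guess.
        F = scatteredInterpolant(lon(keep), lat(keep), P(keep), methods{m}, 'none');
        pred(k,m) = F(lon(k), lat(k));    % predict at the left-out station
    end
end

% RMSE per method, using only stations that could be interpolated
rmse = zeros(1, numel(methods));
for m = 1:numel(methods)
    ok      = ~isnan(pred(:,m));
    rmse(m) = sqrt(mean((pred(ok,m) - P(ok)).^2));
    fprintf('%-8s RMSE = %.3f (%d test points)\n', methods{m}, rmse(m), nnz(ok));
end
```

With real data, lon, lat, and P would be replaced by the station coordinates and observations, and the method with the lowest RMSE on the left-out stations would be the natural candidate.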
HTH
Bjorn Gustavsson on 5 May 2020
1. Yes, it would be good to do this for all months. It seems to be possible to modify the interpolant F, meaning you should be able to change its values (your precipitation data); that would make it very efficient to loop over all months.
I would calculate all sorts of statistics once the core functionality is running: RMSE, R2, correlation, etc. That part will cost you almost nothing extra.
2. I'd try to do it for all points in the interior. Think about what happens when we exclude a point on the convex hull: that point will now lie outside the convex hull of the remaining points, so the value we get there will be extrapolated by F, not interpolated. And extrapolation is a dicey thing to do. So I'd separate your points into a group of internal points and a group of perimeter points. Then I'd run this LOOCV on all the internal points (scatteredInterpolant built with 92 points, but only excluding points from the internal set). If this takes too long, sure, use a smaller set.
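A rough sketch of both points, under stated assumptions: the data are synthetic (93 hypothetical stations, 12 hypothetical months), convhull is used here as one way to identify the perimeter stations, and R2 is taken as the squared Pearson correlation between predicted and observed values. The Values property of the interpolant is reassigned for each month, so the interpolant is constructed only once per left-out station.

```matlab
% Synthetic stand-in data (assumption): 93 stations, 12 months of made-up values.
rng(1);
lon = 44 + 20*rand(93,1);                 % hypothetical station longitudes
lat = 24 + 16*rand(93,1);                 % hypothetical station latitudes
P   = 50 + 2*(lon - 44) + 3*(lat - 24) + randn(93,12);  % one column per month

n        = numel(lon);
hullIdx  = unique(convhull(lon, lat));    % stations on the convex hull (perimeter)
interior = setdiff((1:n)', hullIdx);      % all remaining stations are internal

nMonths = size(P,2);
pred    = nan(numel(interior), nMonths);

for i = 1:numel(interior)
    k       = interior(i);
    keep    = true(n,1);
    keep(k) = false;                      % drop one internal station
    % Build the interpolant once for this set of 92 stations ...
    F = scatteredInterpolant(lon(keep), lat(keep), P(keep,1), 'natural');
    for mth = 1:nMonths
        F.Values    = P(keep,mth);        % ... and only swap the values per month
        pred(i,mth) = F(lon(k), lat(k));  % predict at the left-out station
    end
end

% Statistics per month over the internal stations
obs  = P(interior,:);
rmse = sqrt(mean((pred - obs).^2, 1));    % RMSE per month
R2   = zeros(1, nMonths);
for mth = 1:nMonths
    c       = corrcoef(pred(:,mth), obs(:,mth));
    R2(mth) = c(1,2)^2;                   % squared Pearson correlation
end
```

Repeating the outer loop for 'linear' and 'nearest' gives per-month RMSE and R2 values that can be compared across the three methods.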

Release

R2020a
