Compare two CDF distributions

MEC on 16 Mar 2023
Commented: Jeff on 16 Mar 2023
I am trying to compare two CDF distributions that are generated from two datasets of elevation. One dataset is observed elevations from a DEM (HeightDis.txt), the other is predicted elevations from a model (ModelHeight.txt). I want to generate a goodness of fit for how well the model is matching observed elevations.
I tried to use kstest2, but it requires vectors as inputs. My two distributions are two-column matrices: the first column is the elevation value, the second column is the probability. The two distributions have different values in both columns. So my question is: how do I convert these two distributions into a format that can be used in kstest2 without compromising the data? I feel that this is an obvious problem, but I have not found a solution.
  3 Comments
MEC on 16 Mar 2023
Apologies. Digital Elevation Model.
Jeff on 16 Mar 2023
Does the model have any free parameters that you are estimating from these observed data?

Answers (1)

Star Strider on 16 Mar 2023
I am not sure that either of those tests would be appropriate for these data.
% Load the two tables: column 1 = elevation, column 2 = probability
A1 = readmatrix('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1326675/ModelHeight.txt');
A2 = readmatrix('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1326680/HeightDis.txt');
% Plot both CDFs on the same axes
figure
plot(A1(:,1), A1(:,2), '.', 'DisplayName','Model')
hold on
plot(A2(:,1), A2(:,2), '.', 'DisplayName','Observed')
hold off
grid
legend('Location','best')
% Estimate each PDF as the numerical derivative of its CDF with respect to elevation
pdf1 = gradient(A1(:,2)) ./ gradient(A1(:,1));
pdf2 = gradient(A2(:,2)) ./ gradient(A2(:,1));
% Plot the estimated PDFs
figure
plot(A1(:,1), pdf1, '.-', 'DisplayName','Model')
hold on
plot(A2(:,1), pdf2, '.-', 'DisplayName','Observed')
hold off
grid
legend('Location','best')
They do not appear to be normally distributed in any event. Assuming that they have the same underlying distribution (whatever it is), perhaps the ranksum test would be appropriate (if these can be considered unpaired data). However, it would have to be applied to the original data, not to the probability distributions.
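As a rough illustration of testing the original data rather than the tabulated distributions, here is a minimal sketch. It assumes the raw elevations are available as two vectors; the names obsElev and modElev and the random placeholder values are hypothetical stand-ins, not the data from this question. Both ranksum and kstest2 need the Statistics and Machine Learning Toolbox, and both accept vectors of different lengths.

```matlab
% Placeholder samples only: replace with the real raw elevation vectors.
rng(0);                              % reproducible placeholders
obsElev = 100 + 15*randn(500, 1);    % hypothetical observed elevations
modElev = 105 + 20*randn(400, 1);    % hypothetical modelled elevations

% Wilcoxon rank-sum test on the two unpaired samples.
[pRank, hRank] = ranksum(obsElev, modElev);

% Two-sample Kolmogorov-Smirnov test on the same vectors.
[hKS, pKS, D] = kstest2(obsElev, modElev);

fprintf('ranksum: p = %.4g, reject = %d\n', pRank, hRank);
fprintf('kstest2: D = %.4f, p = %.4g, reject = %d\n', D, pKS, hKS);
```

Because kstest2 works directly on the raw vectors, this route would also give a KS result that can be compared with another model's KS output.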
  2 Comments
MEC on 16 Mar 2023
Thank you for the comment. I wanted to use a KS test for easy comparison with another model output, which is also a KS test. But your point is a good one. I also wanted to avoid using the "data" that comes out of the model because it would require some more time-intensive coding that I wished to avoid. Plus it seemed this should, in theory, have been an easy thing to do, which clearly it is proving not to be.
Star Strider on 16 Mar 2023
My pleasure!
Based on the PDF plots, the data appear to not be normally distributed, so I doubt that it would be worthwhile to test for that, although if that is part of your analysis, then it could be appropriate to consider. If you want to compare the model to the data to see if the model explains the data, a completely different approach would be required. That the independent variables are not the same definitely complicates any analysis.
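If only the tabulated CDFs are available, one way to deal with the mismatched elevation values is to interpolate both tables onto a common elevation grid and take the largest vertical gap between them, which is the quantity the KS statistic measures. The sketch below is only that, under stated assumptions: the function names ksFromCdfTables and evalCdf are hypothetical, column 2 is assumed to be a cumulative probability between 0 and 1, the CDF is treated as linear between tabulated points, and it is taken to be 0 below and 1 above each table's range. The gap is only evaluated at the tabulated elevations, so it approximates the true maximum.

```matlab
function D = ksFromCdfTables(T1, T2)
% Hypothetical helper: largest vertical gap between two tabulated CDFs.
% T1, T2: two-column matrices [elevation, cumulative probability].
    % interp1 needs distinct sample points; unique also sorts them.
    % 'last' keeps the final row of any repeated elevation.
    [x1, i1] = unique(T1(:,1), 'last');
    [x2, i2] = unique(T2(:,1), 'last');
    F1 = T1(i1, 2);
    F2 = T2(i2, 2);

    xq = union(x1, x2);              % common elevation grid
    G1 = evalCdf(x1, F1, xq);
    G2 = evalCdf(x2, F2, xq);

    D = max(abs(G1 - G2));           % KS-style distance
end

function G = evalCdf(x, F, xq)
% Linear interpolation inside the table (NaN outside by default),
% then 0 below the table's range and 1 above it.
    G = interp1(x, F, xq, 'linear');
    G(xq < x(1))   = 0;
    G(xq > x(end)) = 1;
end
```

With the matrices loaded earlier in this answer, it would be called as D = ksFromCdfTables(A2, A1). This gives a distance only. The p-value that kstest2 reports also depends on the two sample sizes, which the tables do not contain, so no significance level follows from D alone.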

Release

R2021a
