Comparing lists of years for similarity
2 views (last 30 days)
Show older comments
My problem involves calibrating a numerical model which predicts some event which happens or not in each year. It could be economic events, coral bleaching, or many other things. I want to compare the similarity of results from different model versions, or with real-world historical data.
The models are expected to miss quite often, so looking for exact matches won't do. Size of error matters so Wilcoxson rank-sum won't do. The lists will often be different in length, and they could be quite a bit longer than my examples below.
Examples of what is subjectively "good" and "bad".
A = [1968 1972 1991 1993 2001 2010]
B = [1968 1972 1993 2001 2010]
C = [1969 1973 1991 1995 2001 2011]
D = [1950 1960 1991 1993 2001 2050]
E = [1968 1972 1991 1993 2001 2010 2050]
Consider A to be "correct"
B is missing one year entirely, but this is not disastrous.
C has only two matching values, but the others are close, I'd call this better than B.
D has three exact matches, but the others are way off. I'd consider this the worst.
E has five exact matches and one really bad point. Again, not disastrous.
Of course I don't expect an algorithm to match my subjective evaluation all the time. I just want it to take the things I have mentioned into account.
If I were to make up an algorithm off the cuff I'd probably try to for look points with near neighbors in the other list and score their distances root-mean-square style, with some maximum value counted against any points left with no neighbor. This is really crude, and there must be a better way.
Suggestions, please!
0 Comments
Answers (3)
Guillaume
on 14 Dec 2016
It sounds like you need some sort of edit distance calculation. A pure edit distance algorithm would rank B as better than C (1 deletion vs 4 substitutions) but you can weight the deletions more than the substitutions and give different weight to the substitutions by how far they are from the original value.
2 Comments
Guillaume
on 15 Dec 2016
Yes, as I said you can modify the standard algorithm to give different weight to substitutions depending on how far they are from the original value.
The concept of what you are trying to do is definitively one of an edit distance, so I'm sure you can find an algorithm already developed somewhere.
KSSV
on 14 Dec 2016
You can try this ismembertol https://in.mathworks.com/help/matlab/ref/ismembertol.html. You can fix some tolerance limits and find out whether two sets of numbers have any common elements. You can decide your scenarios by setting your tolerance limits.
Image Analyst
on 15 Dec 2016
What about ismember() and/or setdiff()? You don't need ismembertol() if all your numbers (years) are integers. setdiff() tells you what numbers are different between the two vectors, and ismember() tells you what number are the same in the two vectors. Neither one cares about position but I don't think that matters to you - you only care if the number(s) is/are present or not in the array.
0 Comments
See Also
Categories
Find more on Discriminant Analysis in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!