Alternatives to using ismember() for string versus cell array of strings
Back when I was using R2009b, I got into the habit of using ~isempty(find(strcmp(thisstring,cellarrayofstrings))) in place of ismember(). In the case of testing a single string against an array of strings, this was about 50x as fast for my particular test arrays in R2009b.
Revisiting the same optimization question in newer versions (particularly R2015b, though the students use newer versions), I find that it's still about 2x-4x as fast. Since these sorts of tests are often part of parsing the input arguments to a function, these little bits of time add to the overhead (sometimes as much as 20% of total execution time), and I'd like to pare them down where I can.
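For anyone who wants to reproduce this kind of comparison, a minimal timing sketch might look like the following. The list contents and the key are made up for illustration, and timeit() needs R2013b or newer; the absolute numbers and the ratio will depend on the release, the machine, and the test strings.

```matlab
% Hypothetical test data: one key tested against a short list of options
cellarrayofstrings = {'linear','nearest','cubic','spline','pchip','makima','next','previous'};
thisstring = 'pchip';

% The two approaches being compared
f_ismember = @() ismember(thisstring, cellarrayofstrings);
f_strcmp   = @() ~isempty(find(strcmp(thisstring, cellarrayofstrings)));

% Both should agree before the timings mean anything
assert(isequal(f_ismember(), f_strcmp()));

% timeit() calls each handle repeatedly and returns a typical time in seconds
t_ismember = timeit(f_ismember);
t_strcmp   = timeit(f_strcmp);
fprintf('ismember: %.2e s, strcmp: %.2e s, ratio: %.1f\n', ...
    t_ismember, t_strcmp, t_ismember/t_strcmp);
```

With calls this short, timeit() may warn that the function is too fast to measure accurately, so the ratio is best treated as a rough indication.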
I know Jan has CStrAinBP() on the FEX, but at least for a single string versus an array, it's slower than my current approach. While the MEX variant is probably better, I can't practically introduce a dependency on something that a student is going to need privileges or extra know-how to compile. Compiling MEX functions is just not really an option. That probably rules out the ideal solutions.
I've seen some suggestions regarding pre-sorting, though I haven't observed any advantage in doing so. I've seen suggestions to use ismembc, though that requires enough ancillary preparation and output handling that it winds up being slower any way I try to use it. I've tried just making the expression slightly less ugly by doing something like any(strcmp(thisstring,cellarrayofstrings)) or sum(strcmp(thisstring,cellarrayofstrings))>0, though with some sets, those were very slightly slower in older versions.
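To make the variants concrete, here is a small sketch checking that they all give the same answer. The list and test strings are hypothetical. The find(mask,1) form is an extra variation not mentioned above; it only asks for the first match, and whether it is any faster on lists this short is an untested assumption.

```matlab
cellarrayofstrings = {'red','green','blue','cyan','magenta'};  % hypothetical list
thisstring = 'blue';

mask = strcmp(thisstring, cellarrayofstrings);  % logical array, one element per cell

a = ~isempty(find(mask));     % the original workaround
b = any(mask);                % shorter equivalent
c = sum(mask) > 0;            % another equivalent
d = ~isempty(find(mask, 1));  % only looks for the first match

assert(a && b && c && d);

% A string that is not in the list gives false in every variant
mask2 = strcmp('orange', cellarrayofstrings);
assert(~any(mask2) && isempty(find(mask2, 1)) && sum(mask2) == 0);
```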
To clarify, I'm only testing a single string against a cell array of 5-20 strings. I know I'm probably misguided to be chasing a few microseconds, but I'd be glad to know if someone knows a faster or less ugly way than my workaround.
2 Comments
Walter Roberson
on 19 Jul 2020
any(strcmp(thisstring,cellarrayofstrings))
Possibly. In the case that thisstring is a character vector, a cell array containing one character vector, or a scalar string object, the any() is not needed.
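A quick way to check whether the any() can be dropped is to look at the size of what strcmp() returns. The sketch below uses made-up inputs; the output takes the size of the non-scalar argument, so the result is a scalar only when the other argument is a single string as well. Keep in mind that an if statement treats a non-scalar logical as true only when every element is true.

```matlab
% Hypothetical inputs
list = {'alpha','beta','gamma'};

r1 = strcmp('beta', list);      % char vector vs. 1x3 cell array
r2 = strcmp({'beta'}, list);    % 1x1 cell vs. 1x3 cell array (scalar expansion)
r3 = strcmp('beta', {'beta'});  % char vector vs. 1x1 cell array

disp(size(r1));
disp(size(r2));
disp(size(r3));

% With a multi-element list, the result has to be reduced to a scalar
isMember = any(r1);
```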
DGM
on 19 Jul 2020