Convert cell array of strings to unicode quickly

I have an array of approximately 10M strings, and I'm interested in converting each string to its unicode values. Is there a quick, one-line way to convert the whole string array into numeric values? Ideally, I'd love a solution like this:
numeric_matrix = double(string_array);
But of course double (and unicode2native) do not support cell arrays. So my current solution is to loop through the string array:
for ii = 1:length(string_array)
    numeric_matrix(ii,:) = double(string_array{ii});
end
Unfortunately this for-loop solution is very inefficient. It can take upwards of 10 minutes for very large numbers of strings. I tried googling this but didn't see anything better. Is there a simpler, faster way to do this, ideally in one line?

 Accepted Answer

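Try
numeric_array = cellfun(@uint16, string_array, 'UniformOutput', false);
Each string converts to a vector rather than a scalar, so cellfun needs 'UniformOutput', false and returns a cell array with one uint16 vector per string. Try it on a smaller subset first, as I do not know how the timing would compare. Using uint16 should have the advantage of not needing to change the internal representation of the characters.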
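If every string has the same length (which the row assignment in the question's loop implies), another option is to skip the per-cell calls entirely: char applied to a cell array of character vectors builds a character matrix with one row per string, and double then converts the whole matrix in one call. A minimal sketch with made-up sample data; note that char pads shorter rows with spaces (code 32) if the lengths differ, so this is only a drop-in replacement under the equal-length assumption.

```matlab
% Hypothetical sample data; the real string_array would hold ~10M entries.
string_array = {'abc'; 'xyz'; 'foo'};

% char() stacks the cell contents into a 3-by-3 char matrix;
% double() then converts every character code in a single call.
numeric_matrix = double(char(string_array));
```

Whether this beats the loop on 10M strings is worth timing on a subset first, since char has to allocate the full character matrix.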

3 Comments

Thanks a lot. I should have thought of cellfun!
However it's strange - using the for loop is twice as fast as cellfun. This might be because I'm doing an extra operation on the unicodes (multiplying by a vector and summing), but I don't see why that would penalize cellfun and not the for loop.
As far as I understand, MATLAB's native encoding is not Unicode but whatever your system locale specifies, so converting the string to double (or uint16) may not give Unicode values unless your locale is also Unicode. You would have to call native2unicode on the strings to be sure.
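One way to see what a given installation actually returns is to compare the stored character code with bytes produced under an explicitly named encoding. A small sketch, using code 233 (é) as an arbitrary example character; the variable names are only for illustration.

```matlab
c = char(233);                              % the character whose code is 233
stored_code = double(c);                    % numeric value of the stored character
utf8_bytes  = unicode2native(c, 'UTF-8');   % uint8 bytes in an explicitly chosen encoding
```

Here stored_code is a single number, while utf8_bytes holds two bytes, so the result of double is not the same thing as an encoded byte sequence.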
Most likely your cellfun is slower than a loop because you're using an anonymous function to perform your extra operation. Anonymous function calls have a significant overhead in MATLAB.
Thanks. I'm not interested in the unicode values per se. I just wanted a way to turn a string into a (hopefully) unique numeric value. But that's good to know about unicode.
And thanks for mentioning the anonymous function. That's probably what's happening!
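For the use case described here (multiplying each string's codes by a vector and summing to get one number per string), the multiply-and-sum can be written as a single matrix-vector product over the character matrix, which avoids both the loop and the anonymous function. This is only a sketch: the weights vector is hypothetical, since the thread does not say which vector was used, and equal-length strings are assumed.

```matlab
string_array = {'abc'; 'xyz'; 'foo'};   % hypothetical sample data
weights = [1; 1000; 1000000];           % hypothetical weights, one per character position

codes = double(char(string_array));     % N-by-L matrix of character codes
keys  = codes * weights;                % N-by-1: sum over positions of code * weight
```

Whether the keys come out unique depends on the weights. With character codes below 1000, the weights above keep the positions from overlapping, but for longer strings the products can exceed the range in which doubles represent integers exactly.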

Asked: 2 Feb 2016

Commented: 2 Feb 2016
