How to use Unicode numeric values in regexprep?
How can "Häagen-Dasz" be converted to "Haagen-Dasz" using Unicode numeric values? For example,
regexprep('Häagen-Dasz','ä','A')
works fine, but
regexprep('Häagen-Dasz','\x{C4}','a')
does not. Here, the hexadecimal \x{C4} stands for [latin capital letter a] with diaeresis, i.e. [ä].
VBBV
on 28 Mar 2024
I am not sure I understand your question correctly, but read the answer below.
Accepted Answer
inp = 'Häagen-Dasz';
baz = @(v)char(v(1)); % only need the first decomposed character.
out = arrayfun(@(c)baz(py.unicodedata.normalize('NFKD',c)),inp) % remove diacritics.
Read more:
https://docs.python.org/3/library/unicodedata.html
https://stackoverflow.com/questions/16467479/normalizing-unicode
regexprep('Häagen-Dasz','ä','A')
regexprep('Häagen-Dasz','ä','\x{C4}')
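To check which escape a given character actually needs, you can look up its code point directly. The lines below are a minimal sketch using only built-in functions (`double`, `dec2hex`, `hex2dec`, `char`). They assume the script is saved as UTF-8 so the literal `ä` is read correctly, and they rely on MATLAB storing `char` as UTF-16, so `double` returns the code point for characters like these.

```matlab
% Look up the code point of the character that is really in the string
s = 'Häagen-Dasz';
cp = double(s(2))               % 228, the code point of 'ä'
hexCp = dec2hex(cp)             % 'E4', so the pattern should be \x{E4}

% Decode the two escapes to see which letter each one denotes
capital = char(hex2dec('C4'))   % 'Ä': U+00C4 is the CAPITAL letter
small   = char(hex2dec('E4'))   % 'ä': U+00E4 is the small letter
```

This explains why `'\x{C4}'` finds no match in `'Häagen-Dasz'`: the string contains U+00E4, not U+00C4.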
regexprep('Häagen-Dasz','\x{e4}','a')
VBBV
on 28 Mar 2024
The Unicode code point for the small a with diaeresis (ä) is \x{e4}; \x{C4} is the capital Ä.
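If both the small and the capital letter need replacing, `regexprep` also accepts cell arrays of patterns and replacements. Each pattern is paired with the replacement at the same position, and the pairs are applied in order. This is a minimal sketch; the test string is made up for illustration.

```matlab
% Replace small and capital a-with-diaeresis in a single call.
% \x{E4} = 'ä' -> 'a', and \x{C4} = 'Ä' -> 'A' (matching is case-sensitive by default)
s = 'Häagen-Dasz vs HÄAGEN-DASZ';
out = regexprep(s, {'\x{E4}', '\x{C4}'}, {'a', 'A'})
```

Adding more accented letters only requires extending both cell arrays with matching entries.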