Convert file encoding before opening

Peter on 28 Aug 2020
Commented: Walter Roberson on 31 Aug 2020
Is there any way to convert the encoding of a text or csv file in MATLAB? Very often my data files get saved with an inconsistent encoding. I'm not sure exactly what causes this, but certain machines will save them with something other than UTF-8 encoding (such as UTF-8 BOM or UCS-2 LE BOM). MATLAB is not able to interpret the file correctly with most other encodings.
I can change the encoding very easily using Notepad++ before importing the file using MATLAB. The problem is that if I edit and save the file, and then try to reimport, the encoding often reverts. This also happens if I create a new data set and forget to switch the encoding before importing. I'd like to be able to make my import script just convert the file every time, so that I don't get errors if I forget to manually switch the encoding for each file first.
  3 Comments
Peter on 28 Aug 2020
I'm pretty sure that all files have the same encoding throughout. I hadn't considered the possibility that a single file could contain a mix of encodings, so I hadn't thought to check for that. I would be surprised if a file did switch partway through, based on how I'm writing the data, so I would say we can assume each file uses only a single encoding.
I don't remember getting any files encoded in UTF-16, but maybe I did and just forgot...
The two encodings listed above seem to be the most common (besides UTF-8, which always works for MATLAB).
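For those two BOM cases, one possible approach is to read the raw bytes, look at the BOM, and rewrite the file as plain UTF-8 before importing. The following is only a minimal sketch: `convertToUtf8` is a hypothetical helper name (not a built-in), and it assumes that each file uses a single encoding and that any file without a recognized BOM is already UTF-8.

```matlab
function convertToUtf8(inFile, outFile)
% Hypothetical helper (not a MATLAB built-in): rewrite inFile as UTF-8 without a BOM.
% Assumption: the whole file uses one encoding, and files with no BOM are already UTF-8.

    % Read the file as raw bytes so no decoding happens yet
    fid = fopen(inFile, 'r');
    assert(fid ~= -1, 'Could not open %s for reading.', inFile);
    bytes = fread(fid, Inf, '*uint8')';   % row vector of uint8
    fclose(fid);

    if numel(bytes) >= 3 && isequal(bytes(1:3), uint8([239 187 191]))
        % UTF-8 BOM (EF BB BF): the payload is already UTF-8, so only drop the BOM
        utf8Bytes = bytes(4:end);
    elseif numel(bytes) >= 2 && isequal(bytes(1:2), uint8([255 254]))
        % UTF-16LE / UCS-2 LE BOM (FF FE): decode to char, then re-encode as UTF-8
        txt = native2unicode(bytes(3:end), 'UTF-16LE');
        utf8Bytes = unicode2native(txt, 'UTF-8');
    else
        % No recognized BOM: assume UTF-8 and copy the bytes unchanged
        utf8Bytes = bytes;
    end

    % Write the UTF-8 bytes back out
    fid = fopen(outFile, 'w');
    assert(fid ~= -1, 'Could not open %s for writing.', outFile);
    fwrite(fid, utf8Bytes, 'uint8');
    fclose(fid);
end
```

An import script could then call something like `convertToUtf8('data.csv', 'data_utf8.csv')` (file names made up for illustration) and pass the converted file to the usual import function.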
Walter Roberson on 31 Aug 2020
I investigated, and the only way I could find to distinguish between UTF-16LE with BOM and UCS-2 LE with BOM was to look for invalid surrogate pairs. UTF-16 uses surrogate pairs only for code points 0x10000 and above; the surrogates themselves are code units in the range 0xD800 to 0xDFFF, which UCS-2 does not treat as pairs. Is it realistic that your files contain such characters?
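As a rough illustration of that check, the sketch below scans little-endian 16-bit code units for values in the surrogate range and tests whether they are correctly paired. `checkSurrogates` is a hypothetical helper name, and the input is assumed to be the raw bytes of the file (for example from `fread` with `'*uint8'`). If no surrogates are present at all, the data decodes identically as UTF-16LE and as UCS-2 LE, so the distinction does not matter for that file.

```matlab
function [anySurrogate, wellFormed] = checkSurrogates(bytes)
% Hypothetical helper: inspect little-endian 16-bit data for surrogate code units.
%   anySurrogate - true if any code unit lies in 0xD800..0xDFFF
%   wellFormed   - true if every high surrogate is followed by a low surrogate
%                  and every low surrogate is preceded by a high surrogate

    bytes = uint8(bytes(:))';                       % force a uint8 row vector
    if numel(bytes) >= 2 && isequal(bytes(1:2), uint8([255 254]))
        bytes = bytes(3:end);                       % skip the FF FE BOM
    end

    % Combine byte pairs into 16-bit code units (low byte first)
    n = floor(numel(bytes) / 2);
    lo = uint16(bytes(1:2:2*n));
    hi = uint16(bytes(2:2:2*n));
    units = hi * 256 + lo;

    isHigh = units >= hex2dec('D800') & units <= hex2dec('DBFF');
    isLow  = units >= hex2dec('DC00') & units <= hex2dec('DFFF');
    anySurrogate = any(isHigh | isLow);

    % Shifted masks: what follows each unit, and what precedes it
    nextIsLow  = [isLow(2:end), false];
    prevIsHigh = [false, isHigh(1:end-1)];
    wellFormed = all(nextIsLow(isHigh)) && all(prevIsHigh(isLow));
end
```

Surrogates that are present but not well formed would suggest the file is not valid UTF-16, which is the case the comment above describes.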


Answers (0)
