Convert table with different types

Question

Hüseyin Uzun on 21 Jun 2021

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/861465-convert-table-with-different-types

Commented: Hüseyin Uzun on 22 Jun 2021

Data.xlsx

here is a part of my script, when i read the table, the column 2, 3 and 4 is not recognized as string. According to Matlab these columns are cells. How can I change this? The rest of the columns, except for 1, are strings for which my script works. The solutions I have found in the forums, I could not transfer to my problem.

table=readtable('C:\Users\Hüs\Documents\MATLAB\Datalogger\TFAbisJun21.xlsx');
x=table.Timestamp;      %Datum und Uhrzeit auslesen
x=x(end-468:end);
time=datetime(datestr((x)));
%Raum 210
temp210=table.T0;    %Temperatur Raum210 auslesen
temp210=temp210(end-468:end);
temp210=str2double(temp210);
hum210=table.RH0;          %Luftfeuchtigkeit Raum210 auslesen
hum210=hum210(end-468:end);
hum210=str2double(hum210);
...

6 Comments
Show 4 older commentsHide 4 older comments

dpb on 21 Jun 2021

Open in MATLAB Online

I get no such behavior --

>> opt=detectImportOptions('data.xlsx')
opt = 
  SpreadsheetImportOptions with properties:
   Sheet Properties:
                        Sheet: ''
   Replacement Properties:
                  MissingRule: 'fill'
              ImportErrorRule: 'fill'
   Variable Import Properties: Set types by name using setvartype
                VariableNames: {'Timestamp', 'T0', 'RH0' ... and 25 more}
                VariableTypes: {'datetime', 'double', 'double' ... and 25 more}
        SelectedVariableNames: {'Timestamp', 'T0', 'RH0' ... and 25 more}
              VariableOptions: Show all 28 VariableOptions 
	Access VariableOptions sub-properties using setvaropts/getvaropts
           VariableNamingRule: 'modify'
   Range Properties:
                    DataRange: 'A2' (Start Cell)
           VariableNamesRange: 'A1'
                RowNamesRange: ''
           VariableUnitsRange: ''
    VariableDescriptionsRange: '' 
	To display a preview of the table, use preview
>> preview('data.xlsx',opt)
ans =
  8×28 table
         Timestamp           T0     RH0    DEWI    T1    RH1    DEW1     T2     RH2    DEW2     T3     RH3    DEW3     T4     RH4    DEW4     T5     RH5    DEW5     T6     RH6    DEW6     T7     RH7    DEW7      T8         RH8       DEW8  
    ____________________    ____    ___    ____    __    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    _______    _______    _______
    12-Jun-2021 21:20:00    23.4    45     10.8    21    41     7.3       19    44     6.5     20.2    51     9.8     19.7    58     11.2    22.2    51     11.6    26.2    42     12.3    20.3    55       11    {'---'}    {'---'}    {'---'}
    12-Jun-2021 21:30:00    23.4    45     10.8    21    40     6.9     19.2    43     6.3     20.2    51     9.8     19.7    57       11    22.2    50     11.3    26.2    42     12.3    20.4    58     11.9    {'---'}    {'---'}    {'---'}
    12-Jun-2021 21:40:00    23.4    45     10.8    21    40     6.9     19.1    43     6.3     20.1    51     9.7     19.6    56     10.6    22.1    50     11.2    26.1    42     12.2    20.5    55     11.2    {'---'}    {'---'}    {'---'}
    12-Jun-2021 21:50:00    23.3    45     10.7    21    40     6.9       19    43     6.2       20    50     9.3     19.4    56     10.4    22.1    50     11.2    26.1    41     11.9    20.2    58     11.7    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:00:00    23.3    44     10.4    21    40     6.9       19    42     5.8       20    50     9.3     19.7    58     11.2    22.1    50     11.2    26.1    41     11.9    20.7    56     11.6    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:10:00    23.2    44     10.3    21    40     6.9       19    42     5.8     19.9    50     9.2     19.7    57       11    22.1    49     10.9    26.1    41     11.9    20.2    56     11.2    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:20:00    23.2    45     10.6    21    39     6.5     18.9    42     5.7     19.8    50     9.1     19.8    55     10.5      22    49     10.8    26.1    41     11.9    20.6    56     11.5    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:30:00    23.2    44     10.3    21    39     6.5       19    41     5.5     19.8    50     9.1     19.6    53      9.8    22.1    49     10.9    26.1    40     11.5    20.2    54     10.6    {'---'}    {'---'}    {'---'}
>> tData=readtable('data.xlsx');
>> head(tData)
ans =
  8×28 table
         Timestamp           T0     RH0    DEWI    T1    RH1    DEW1     T2     RH2    DEW2     T3     RH3    DEW3     T4     RH4    DEW4     T5     RH5    DEW5     T6     RH6    DEW6     T7     RH7    DEW7      T8         RH8       DEW8  
    ____________________    ____    ___    ____    __    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    _______    _______    _______
    12-Jun-2021 21:20:00    23.4    45     10.8    21    41     7.3       19    44     6.5     20.2    51     9.8     19.7    58     11.2    22.2    51     11.6    26.2    42     12.3    20.3    55       11    {'---'}    {'---'}    {'---'}
    12-Jun-2021 21:30:00    23.4    45     10.8    21    40     6.9     19.2    43     6.3     20.2    51     9.8     19.7    57       11    22.2    50     11.3    26.2    42     12.3    20.4    58     11.9    {'---'}    {'---'}    {'---'}
    12-Jun-2021 21:40:00    23.4    45     10.8    21    40     6.9     19.1    43     6.3     20.1    51     9.7     19.6    56     10.6    22.1    50     11.2    26.1    42     12.2    20.5    55     11.2    {'---'}    {'---'}    {'---'}
    12-Jun-2021 21:50:00    23.3    45     10.7    21    40     6.9       19    43     6.2       20    50     9.3     19.4    56     10.4    22.1    50     11.2    26.1    41     11.9    20.2    58     11.7    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:00:00    23.3    44     10.4    21    40     6.9       19    42     5.8       20    50     9.3     19.7    58     11.2    22.1    50     11.2    26.1    41     11.9    20.7    56     11.6    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:10:00    23.2    44     10.3    21    40     6.9       19    42     5.8     19.9    50     9.2     19.7    57       11    22.1    49     10.9    26.1    41     11.9    20.2    56     11.2    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:20:00    23.2    45     10.6    21    39     6.5     18.9    42     5.7     19.8    50     9.1     19.8    55     10.5      22    49     10.8    26.1    41     11.9    20.6    56     11.5    {'---'}    {'---'}    {'---'}
    12-Jun-2021 22:30:00    23.2    44     10.3    21    39     6.5       19    41     5.5     19.8    50     9.1     19.6    53      9.8    22.1    49     10.9    26.1    40     11.5    20.2    54     10.6    {'---'}    {'---'}    {'---'}
>> 

whether use detectImportOptions or not; it reads everything except the last three columns as numeric; they are missing values with '---' stored so they are rightfully imported as cellstr(). You could leave those blank in the workbook and they would get imported automagically as NaN for missing value but be of class double.

Or you could fix up the import rule for missing in the import options object.

dpb on 22 Jun 2021

Open in MATLAB Online

Again, it works as expected here without anything special being done...

>> tTF=readtable('TFAbisJun21a.xlsx');
>> head(tTF)
ans =
  8×25 table
         Timestamp           T0     RH0    DEW0     T1     RH1    DEW1     T2     RH2    DEW2     T3     RH3    DEW3     T4     RH4    DEW4    T5     RH5    DEW5    T6     RH6    DEW6    T7     RH7    DEW7
    ____________________    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____    ___    ___    ____    ___    ___    ____    ___    ___    ____
    14-Aug-2020 12:00:00    29.9    49     18.1    21.5    46     9.4     17.1    57     8.5     21.7    70       16    19.3    72     14.1    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 12:10:00      30    49     18.2    21.4    46     9.3     16.9    59     8.9     21.9    70     16.2    19.2    72     14.1    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 12:20:00      30    49     18.2    21.5    47     9.7     17.6    58     9.3     21.9    70     16.2    19.3    72     14.1    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 12:30:00    30.1    49     18.2    21.5    47     9.7     16.7    60     8.9     21.9    70     16.2    19.3    72     14.1    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 12:40:00    30.1    49     18.2    21.5    47     9.7     17.5    58     9.2       22    69     16.1    19.3    72     14.1    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 12:50:00    30.2    49     18.3    21.5    47     9.7     16.7    60     8.9     22.1    70     16.4    19.3    73     14.4    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 13:00:00    30.2    49     18.3    21.5    47     9.7     17.4    59     9.3     22.1    70     16.4    19.3    73     14.4    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
    14-Aug-2020 13:10:00    30.3    50     18.8    21.5    47     9.7     17.2    59     9.1     22.2    70     16.5    19.4    73     14.5    NaN    NaN    NaN     NaN    NaN    NaN     NaN    NaN    NaN 
>> winopen('TFAbisJun21a.xlsx');
>> tTF(14:25,14:end)
ans =
  12×12 table
     T4     RH4    DEW4     T5     RH5    DEW5     T6     RH6    DEW6     T7     RH7    DEW7
    ____    ___    ____    ____    ___    ____    ____    ___    ____    ____    ___    ____
    19.4    73     14.5     NaN    NaN     NaN     NaN    NaN     NaN     NaN    NaN     NaN
    19.4    73     14.5     NaN    NaN     NaN     NaN    NaN     NaN     NaN    NaN     NaN
    19.4    73     14.5    31.4     49    19.4    30.7     50    19.1    30.4     51    19.2
    19.4    73     14.5    25.6     50    14.4    29.5     54    19.2    31.8     43    17.7
    19.4    73     14.5    23.2     54    13.4    28.5     56    18.9    22.8     61    14.9
    19.5    74     14.8      23     54    13.2    28.3     57      19    21.6     59    13.3
    19.5    73     14.6    22.5     53    12.5    28.3     54    18.1    21.7     60    13.6
    19.5    72     14.3    22.3     55    12.9    28.1     53    17.7    21.8     63    14.5
    19.4    72     14.2    22.4     53    12.4    28.1     53    17.7    21.6     63    14.3
    19.5    72     14.3    22.5     54    12.8      28     54    17.9    21.5     58    12.9
    19.5    71     14.1    22.5     53    12.5      28     54    17.9    21.7     61    13.9
    19.4    71       14    22.4     54    12.7      28     55    18.2    21.8     64    14.7
>> 

All variables are read as numeric; the last columns are the "---" missing values indicator for the first 15 records or so so they are interpreted as NaN.

As for the rest, there are some missing values scattered around here and there...

>> all(isfinite(tTF{:,2:end}))
ans =
  1×24 logical array
   1   1   1   0   0   0   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
>> 
>> sum(~isfinite(tTF{:,2:end}))
ans =
   0   0   0  38  38  38   0   0   0  33  33  33  23  23  23 291 291 291  35  35  35  22  22  22
>> 

shows you that only the first and third groups of three columns are all finite; all the rest have at least some missing values that are imported as NaN.

>> arrayfun(@(i)find(~isfinite(tTF{:,i}),1),2:24,'UniformOutput',false)
ans =
  1×23 cell array
  Columns 1 through 12
    {0×1 double}    {0×1 double}    {0×1 double}    {[15152]}    {[15152]}    {[15152]}    {0×1 double}    {0×1 double}    {0×1 double}    {[38164]}    {[38164]}    {[38164]}
  Columns 13 through 23
    {[10075]}    {[10075]}    {[10075]}    {[1]}    {[1]}    {[1]}    {[1]}    {[1]}    {[1]}    {[1]}    {[1]}
>> 

shows you the first location in each column that is missing; looking at that location for the first set of three plus the header lines to see which variables they are shows up the following--

>> tTF([1:5 15152+[0:10]],[1 4:7])
ans =
  16×5 table
         Timestamp          DEW0     T1     RH1    DEW1
    ____________________    ____    ____    ___    ____
    14-Aug-2020 12:00:00    18.1    21.5     46    9.4 
    14-Aug-2020 12:10:00    18.2    21.4     46    9.3 
    14-Aug-2020 12:20:00    18.2    21.5     47    9.7 
    14-Aug-2020 12:30:00    18.2    21.5     47    9.7 
    14-Aug-2020 12:40:00    18.2    21.5     47    9.7 
    27-Nov-2020 20:10:00     2.7     NaN    NaN    NaN 
    27-Nov-2020 20:20:00     2.7    20.6     33    3.8 
    27-Nov-2020 20:30:00     2.7    20.5     34    4.1 
    27-Nov-2020 20:40:00     3.3    20.5     34    4.1 
    27-Nov-2020 20:50:00     2.7    20.5     34    4.1 
    27-Nov-2020 21:00:00     2.6    20.5     33    3.7 
    27-Nov-2020 21:10:00     2.6    20.5     32    3.3 
    27-Nov-2020 21:20:00     2.6    20.6     35    4.6 
    27-Nov-2020 21:30:00     2.6    20.6     32    3.4 
    27-Nov-2020 21:40:00     2.6    20.6     34    4.2 
    27-Nov-2020 21:50:00     2.6    20.6     31    2.9 
>> 

For the first case, there's just one set of missing values in the series; but the above sums show there are 37 more locations in the three series overall; although I didn't look to see where they are, specifically.

The data are being read just fine; it's that there are apparently unknown to you and/or unexpectedly, missing values.

You can either remove those, or possibly interpolate to fill in (although the above distinct step change in variable DEW0 makes the latter seem a dubious proposition as it looks like there can be serious discontinuities in the observations).

Just as a comment on style and use of MATLAB, your code makes many copies of the same data in named local variables instead of using the data in the table directly with the variable names that could reduce the amount of code and duplication and make the code general for a particular purpose and just pass the desired variable names to functions.

It is possible to refer to variables in tables by either column position or by variable name as char() variables; look at the section in the documentation on "referencing and accessing data in tables" to see the full illustration of using tables effectively.

Also, it would be better to not bury "magic" numbers like the 468 that is used in subscripting expressions; instead find the location of interest by logic addressing operations so that it can be generalized or, at least, use variables to hold the numeric value if that is not feasible so the value can be changed readily in one location. The difficulty in using constants is that if you remove the missing values, then those locations can change and if they're hard coded as numbers, then that number must change by some unknown number of positions to compensate.

It doesn't appear the problem is in readtable but that you're just not accounting for the nature of the data.

Hüseyin Uzun on 22 Jun 2021

thank you for your detailed answer, again.

This really helped me to understand matlab in generell better. I checked the documentation "referencing and accessing data in tables" and this is really helpful and also helps me working with the table.

Sign in to comment.

Sign in to answer this question.

Convert table with different types

6 Comments
Show 4 older commentsHide 4 older comments

Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Convert table with different types

6 Comments Show 4 older commentsHide 4 older comments

Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

6 Comments
Show 4 older commentsHide 4 older comments