Convert table with different types

3 views (last 30 days)
Hüseyin Uzun
Hüseyin Uzun on 21 Jun 2021
Commented: Hüseyin Uzun on 22 Jun 2021
here is a part of my script, when i read the table, the column 2, 3 and 4 is not recognized as string. According to Matlab these columns are cells. How can I change this? The rest of the columns, except for 1, are strings for which my script works. The solutions I have found in the forums, I could not transfer to my problem.
table=readtable('C:\Users\Hüs\Documents\MATLAB\Datalogger\TFAbisJun21.xlsx');
x=table.Timestamp; %Datum und Uhrzeit auslesen
x=x(end-468:end);
time=datetime(datestr((x)));
%Raum 210
temp210=table.T0; %Temperatur Raum210 auslesen
temp210=temp210(end-468:end);
temp210=str2double(temp210);
hum210=table.RH0; %Luftfeuchtigkeit Raum210 auslesen
hum210=hum210(end-468:end);
hum210=str2double(hum210);
...
  6 Comments
dpb
dpb on 22 Jun 2021
Again, it works as expected here without anything special being done...
>> tTF=readtable('TFAbisJun21a.xlsx');
>> head(tTF)
ans =
8×25 table
Timestamp T0 RH0 DEW0 T1 RH1 DEW1 T2 RH2 DEW2 T3 RH3 DEW3 T4 RH4 DEW4 T5 RH5 DEW5 T6 RH6 DEW6 T7 RH7 DEW7
____________________ ____ ___ ____ ____ ___ ____ ____ ___ ____ ____ ___ ____ ____ ___ ____ ___ ___ ____ ___ ___ ____ ___ ___ ____
14-Aug-2020 12:00:00 29.9 49 18.1 21.5 46 9.4 17.1 57 8.5 21.7 70 16 19.3 72 14.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 12:10:00 30 49 18.2 21.4 46 9.3 16.9 59 8.9 21.9 70 16.2 19.2 72 14.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 12:20:00 30 49 18.2 21.5 47 9.7 17.6 58 9.3 21.9 70 16.2 19.3 72 14.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 12:30:00 30.1 49 18.2 21.5 47 9.7 16.7 60 8.9 21.9 70 16.2 19.3 72 14.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 12:40:00 30.1 49 18.2 21.5 47 9.7 17.5 58 9.2 22 69 16.1 19.3 72 14.1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 12:50:00 30.2 49 18.3 21.5 47 9.7 16.7 60 8.9 22.1 70 16.4 19.3 73 14.4 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 13:00:00 30.2 49 18.3 21.5 47 9.7 17.4 59 9.3 22.1 70 16.4 19.3 73 14.4 NaN NaN NaN NaN NaN NaN NaN NaN NaN
14-Aug-2020 13:10:00 30.3 50 18.8 21.5 47 9.7 17.2 59 9.1 22.2 70 16.5 19.4 73 14.5 NaN NaN NaN NaN NaN NaN NaN NaN NaN
>> winopen('TFAbisJun21a.xlsx');
>> tTF(14:25,14:end)
ans =
12×12 table
T4 RH4 DEW4 T5 RH5 DEW5 T6 RH6 DEW6 T7 RH7 DEW7
____ ___ ____ ____ ___ ____ ____ ___ ____ ____ ___ ____
19.4 73 14.5 NaN NaN NaN NaN NaN NaN NaN NaN NaN
19.4 73 14.5 NaN NaN NaN NaN NaN NaN NaN NaN NaN
19.4 73 14.5 31.4 49 19.4 30.7 50 19.1 30.4 51 19.2
19.4 73 14.5 25.6 50 14.4 29.5 54 19.2 31.8 43 17.7
19.4 73 14.5 23.2 54 13.4 28.5 56 18.9 22.8 61 14.9
19.5 74 14.8 23 54 13.2 28.3 57 19 21.6 59 13.3
19.5 73 14.6 22.5 53 12.5 28.3 54 18.1 21.7 60 13.6
19.5 72 14.3 22.3 55 12.9 28.1 53 17.7 21.8 63 14.5
19.4 72 14.2 22.4 53 12.4 28.1 53 17.7 21.6 63 14.3
19.5 72 14.3 22.5 54 12.8 28 54 17.9 21.5 58 12.9
19.5 71 14.1 22.5 53 12.5 28 54 17.9 21.7 61 13.9
19.4 71 14 22.4 54 12.7 28 55 18.2 21.8 64 14.7
>>
All variables are read as numeric; the last columns are the "---" missing values indicator for the first 15 records or so so they are interpreted as NaN.
As for the rest, there are some missing values scattered around here and there...
>> all(isfinite(tTF{:,2:end}))
ans =
1×24 logical array
1 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> sum(~isfinite(tTF{:,2:end}))
ans =
0 0 0 38 38 38 0 0 0 33 33 33 23 23 23 291 291 291 35 35 35 22 22 22
>>
shows you that only the first and third groups of three columns are all finite; all the rest have at least some missing values that are imported as NaN.
>> arrayfun(@(i)find(~isfinite(tTF{:,i}),1),2:24,'UniformOutput',false)
ans =
1×23 cell array
Columns 1 through 12
{0×1 double} {0×1 double} {0×1 double} {[15152]} {[15152]} {[15152]} {0×1 double} {0×1 double} {0×1 double} {[38164]} {[38164]} {[38164]}
Columns 13 through 23
{[10075]} {[10075]} {[10075]} {[1]} {[1]} {[1]} {[1]} {[1]} {[1]} {[1]} {[1]}
>>
shows you the first location in each column that is missing; looking at that location for the first set of three plus the header lines to see which variables they are shows up the following--
>> tTF([1:5 15152+[0:10]],[1 4:7])
ans =
16×5 table
Timestamp DEW0 T1 RH1 DEW1
____________________ ____ ____ ___ ____
14-Aug-2020 12:00:00 18.1 21.5 46 9.4
14-Aug-2020 12:10:00 18.2 21.4 46 9.3
14-Aug-2020 12:20:00 18.2 21.5 47 9.7
14-Aug-2020 12:30:00 18.2 21.5 47 9.7
14-Aug-2020 12:40:00 18.2 21.5 47 9.7
27-Nov-2020 20:10:00 2.7 NaN NaN NaN
27-Nov-2020 20:20:00 2.7 20.6 33 3.8
27-Nov-2020 20:30:00 2.7 20.5 34 4.1
27-Nov-2020 20:40:00 3.3 20.5 34 4.1
27-Nov-2020 20:50:00 2.7 20.5 34 4.1
27-Nov-2020 21:00:00 2.6 20.5 33 3.7
27-Nov-2020 21:10:00 2.6 20.5 32 3.3
27-Nov-2020 21:20:00 2.6 20.6 35 4.6
27-Nov-2020 21:30:00 2.6 20.6 32 3.4
27-Nov-2020 21:40:00 2.6 20.6 34 4.2
27-Nov-2020 21:50:00 2.6 20.6 31 2.9
>>
For the first case, there's just one set of missing values in the series; but the above sums show there are 37 more locations in the three series overall; although I didn't look to see where they are, specifically.
The data are being read just fine; it's that there are apparently unknown to you and/or unexpectedly, missing values.
You can either remove those, or possibly interpolate to fill in (although the above distinct step change in variable DEW0 makes the latter seem a dubious proposition as it looks like there can be serious discontinuities in the observations).
Just as a comment on style and use of MATLAB, your code makes many copies of the same data in named local variables instead of using the data in the table directly with the variable names that could reduce the amount of code and duplication and make the code general for a particular purpose and just pass the desired variable names to functions.
It is possible to refer to variables in tables by either column position or by variable name as char() variables; look at the section in the documentation on "referencing and accessing data in tables" to see the full illustration of using tables effectively.
Also, it would be better to not bury "magic" numbers like the 468 that is used in subscripting expressions; instead find the location of interest by logic addressing operations so that it can be generalized or, at least, use variables to hold the numeric value if that is not feasible so the value can be changed readily in one location. The difficulty in using constants is that if you remove the missing values, then those locations can change and if they're hard coded as numbers, then that number must change by some unknown number of positions to compensate.
It doesn't appear the problem is in readtable but that you're just not accounting for the nature of the data.
Hüseyin Uzun
Hüseyin Uzun on 22 Jun 2021
thank you for your detailed answer, again.
This really helped me to understand matlab in generell better. I checked the documentation "referencing and accessing data in tables" and this is really helpful and also helps me working with the table.

Sign in to comment.

Answers (0)

Products


Release

R2019b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!