How to ignore special characters and retrieve the data prior to the character

I have 40 years of data. Unfortunately, each text file has the special characters # or * in it, marking the highest or lowest temperature for that specific day and month. My code works (apart from regexp(minT_tbl,'#*','match') and its counterpart). However, the special characters are confusing the program and making the data wrong. Any help would be great!
close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
%delete leap year
LY = (dataAll.Month(:)==2 & dataAll.DD(:)==29);
dataAll(LY,:) = [];
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
%ignore special characters
regexp(minT_tbl,'#*','match')
regexp(maxT_tbl,'#*','match')
% find min
[Tmin,idxMn] = min(minT_tbl{:,3:end},[],2,'omitnan');
Tmin_yr = yrs(idxMn);
% find max
[Tmax,idxMx] = max(maxT_tbl{:,3:end},[],2,'omitnan');
Tmax_yr = yrs(idxMx);
% find low high
[lowTMax,idxMx] = min(maxT_tbl{:,3:end},[],2,'omitnan');
LowTMax_yr = yrs(idxMx);
% find high low
[highlowTMn,idxMn] = max(minT_tbl{:,3:end},[],2,'omitnan');
HighLowT_yr = yrs(idxMn);
% find avg high
AvgTMx = round(mean(table2array(maxT_tbl(:,3:end)),2,'omitnan'));
% find avg low
AvgTMn = round(mean(table2array(minT_tbl(:,3:end)),2,'omitnan'));
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(Tmax,Tmax_yr,AvgTMx,lowTMax,LowTMax_yr,Tmin,Tmin_yr,AvgTMn,highlowTMn,HighLowT_yr)]
tempTbl2 = splitvars(tempTbl)
FID = fopen('Meda 05 Temperature Climatology.txt','w');
report_date = datetime('now','format','yyyy-MM-dd HH:mm'); % mm = minutes (MM would print the month again)
fprintf(FID,'Meda 05 Temperature Climatology at %s \n', report_date);
fprintf(FID,"Month DD Temp Max (°F) Tmax_yr AvgTMax (°F) lowTMax (°F) LowTMax_yr TempMin (°F) TMin_yr AvgTMin (°F) HighlowTMin (°F) HighlowT_yr \n");
fprintf(FID,'%3d %6d %7d %14d %11d %11d %15d %11d %13d %10d %13d %17d \n', tempTbl2{:,1:end}');
fclose(FID);
winopen('Meda 05 Temperature Climatology.txt')
function Tbl = readMonth(filename)
opts = detectImportOptions(filename)
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end}
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day)
end

6 Comments

A couple issues to point out.
  • Because some of your non-leap-year files have info for Feb 29, that date gets (correctly) converted to Mar 1. That means your approach to removing Feb 29 will not catch those dates. You need to check the month and day before the date gets converted to a datetime to avoid this. That means checking in the readMonth function.
  • In your read function, none of the following code is used and can be deleted.
[~,basename] = fileparts(filename)
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end};
  • You can combine all your missing values into a single cell array (added one you missed)
Tbl = standardizeMissing(Tbl,{-99,999,999.9,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
  • The line of code that currently calls splitvars is not actually doing anything and can be removed.
tempTbl2 = splitvars(tempTbl)
  • Your MissingRule is likely not the option you want ('omitvar'). This option means do not import a variable (i.e. an entire column of data) if it contains missing data. Fortunately, any missing values have been replaced with a numeric code (e.g. 999) so they are not treated as missing, and that variable is not omitted. I'd remove this line from your code. I think you want to use the EmptyLineRule instead.
What do you mean by "That means checking in the readMonth function"?
You need to check if the month ends in 29 before you convert Day into a datetime. That conversion happens in readMonth. Since 2/29 gets converted to 3/1 in non-leap years, you need to identify this date before the conversion to datetime.
Normally this wouldn't be an issue, but for some reason your files contain data for Feb 29 even in non-leap years.
I wouldn't know where to begin with that. I'm surprised that my LY code
LY = (dataAll.Month(:) == 2 & dataAll.DD(:) == 29);
dataAll(LY,:) = [];
doesn't already eliminate them.
Test it out. It doesn't eliminate them because the month no longer equals 2 and the day no longer equals 29. They are now 3 and 1.
dataAll = table();
dataAll.Day = datetime(1981,2,29) % Feb 29, 1981, which is a non-leap year
dataAll = table
        Day
    ___________
    01-Mar-1981
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
dataAll = 1×3 table
        Day        Month    DD
    ___________    _____    __
    01-Mar-1981      3      1
% Remove all Feb 29 dates from the table
LY = (dataAll.Month(:)== 2 & dataAll.DD(:) == 29);
dataAll(LY,:) = [ ]
dataAll = 1×3 table
        Day        Month    DD
    ___________    _____    __
    01-Mar-1981      3      1
As you can see, the current LY code did not remove the data.
Well I am stuck then. Because I just tried it different ways and I continue to get the same result.
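One possible way forward, as a minimal sketch: do the Feb 29 check while Day is still a plain number, and only then build the datetime. The names yr, mo and dayNum and the temperature values are made up for illustration; in readMonth the year and month would come from the file name instead.

```matlab
% Hypothetical values standing in for one February file from a non-leap year
yr = 1981;                 % year parsed from the file name (assumed)
mo = 2;                    % month parsed from the file name (assumed)
dayNum = (27:29)';         % numeric day column as read from the file
MaxT = [61; 64; 70];       % made-up temperatures for illustration

Tbl = table(dayNum, MaxT, 'VariableNames', {'Day','MaxT'});

% Drop day 29 of February while Day is still a number,
% i.e. before datetime() can roll it over to March 1
isFeb29 = (mo == 2) & (Tbl.Day == 29);
Tbl(isFeb29,:) = [];

% Only now convert the remaining day numbers to datetime
Tbl.Day = datetime(yr, mo, Tbl.Day);
disp(Tbl)
```

Because the row is removed before the conversion, nothing is left to turn into a March 1 entry.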

 Accepted Answer

The following code replaces any * or # characters in a text file with spaces (note that this replaces the existing file with a new file of the same name):
% read the file
fid = fopen(filename,'r');
str = fread(fid,[1 Inf],'*char');
fclose(fid);
% replace any * or # with a space (empty char vector should also work)
str = regexprep(str,'[*#]',' ');
% write the new file
fid = fopen(filename,'w');
fwrite(fid,str);
fclose(fid);
If you don't mind losing the original files that have the * and/or # characters in them, you can run this code for each of your text files before running your code or you can incorporate this code into your readMonth function.
If you want to preserve the original files, make a separate copy of them first, or modify the above code to write to a different file, e.g.:
% write the new file
[fp,fn,ext] = fileparts(filename);
fid = fopen(fullfile(fp,[fn '_modified' ext]),'w');
fwrite(fid,str);
fclose(fid);
and tell fileDatastore to use the modified files only, e.g.:
Datafiles = fileDatastore("temp_summary*_modified.txt","ReadFcn",@readMonth,"UniformRead",true);
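If writing any files is undesirable, the same replacement could also be applied to text held in memory and parsed from there. The following is only a sketch under assumptions: the file contents are made up, the file is assumed to have exactly one header line and four numeric columns, and sscanf is swapped in for readtable to keep the example short.

```matlab
% Made-up file contents standing in for one temp_summary file
txt = [' Day MaxT MinT AvgT' newline ...
       ' 01 66* 28 47.0'     newline ...
       ' 02 65 29# 47.0'     newline];

% Same replacement as above, applied to the text in memory
clean = regexprep(txt, '[*#]', ' ');

% Skip the single header line, then parse every remaining number
rows = splitlines(strtrim(clean));
vals = sscanf(strjoin(rows(2:end)', newline), '%f', [4 Inf])';

% One row per day, one column per variable
Tbl = array2table(vals, 'VariableNames', {'Day','MaxT','MinT','AvgT'});
disp(Tbl)
```

With real files, txt would come from fileread(filename) inside the read function, so the originals stay untouched.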

20 Comments

@Voss So, I have this idea below. As you can tell, I took what you suggested into account. What I am trying to do is eliminate #, *, and anything that belongs to February 29th. When I run this, I hope to create the modified txt files to be run with the code above. Unfortunately, I do get an error with fopen. Any suggestions?
close all;
clear all;
clc;
FID = dir('temp_summary*.txt');
str = fread(FID,[1 Inf],'*char')
% replace any * or # with a space (empty char vector should also work)
str = regexprep(str,'[*#]',' ');
fclose(FID);
for k = 1:numel(FID)
filename = FID(k).name
T{k,:} = readtable(filename);
fn{k,:} = filename;
nrows = size(T{k},1);
Nr = regexp(filename,'\d*','match');
% Check = eomday(cellfun(@str2double,Nr(3)),2) % Check 'filename' Year For Leap Year
% Nr{3} = '2024'; % Artificially Assign Leap Year To Test Code
Yr = cellfun(@str2double,repmat(Nr(3),nrows,1));
Mo = cellfun(@str2double,repmat(Nr(2),nrows,1));
% T{k}.DateTime = datetime(Yr,Mo,T{k}{:,1});
T{k}.YMD = [Yr Mo T{k}{:,1}]; % Create Variable To Replace ''datetime'
end
T{:}
for k = 1:numel(T) % Loop To Remove 29-Feb
Lv = (T{k}.YMD(:,2) == 2) & (T{k}.YMD(:,3) == 29);
T{k}(Lv,:) = [];
if nnz(Lv) % Optional 'if' Block
fprintf('Feb 29 removed from %s\n', fn{k})
end
end
T{:}
% write the new file
[fp,fn,ext] = fileparts(filename);
fid = fopen(fullfile(fp,[fn '_modified' ext]),'w');
fwrite(fid,str);
fclose(fid);
"Unfortunately, I do get an error with fopen. Any suggestions?"
Show us the complete error message.
This doesn't make sense:
FID = dir('temp_summary*.txt');
str = fread(FID,[1 Inf],'*char')
Don't use the output from dir() as input to fread(). The output from dir() is a struct array, and fread() expects a file identifier (as returned by fopen), exactly as I showed:
fid = fopen(filename,'r');
str = fread(fid,[1 Inf],'*char');
fclose(fid);
If you want to apply the code in my answer to your set of files, something like the following would work. I strongly suggest you make a backup copy of your text files before you run this in case there are any problems, because this will overwrite the existing files with new files of the same names (where the * and # are replaced with spaces):
files = dir('temp_summary*.txt');
filenames = fullfile({files.folder},{files.name});
for ii = 1:numel(filenames)
    % read the file
    fid = fopen(filenames{ii},'r');
    str = fread(fid,[1 Inf],'*char');
    fclose(fid);
    % replace any * or # with a space (empty char vector should also work)
    str = regexprep(str,'[*#]',' ');
    % write the new file
    fid = fopen(filenames{ii},'w');
    fwrite(fid,str);
    fclose(fid);
end
Run that first, and check that the new files are ok (data is correct - same as before - and the * and # are gone).
Once that is done, then let me know and I'll take a look at the separate issue of removing the Feb 29th data.
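The check described above (that no * or # is left) could be automated along these lines. This is a sketch only: the scratch folder and the two demo files are made up so the example runs on its own, and one marker is deliberately left in to show what gets reported. For real data, point dir at the folder that holds the temp_summary files.

```matlab
% Demo setup (hypothetical): two small files in a scratch folder
demoDir = tempname;
mkdir(demoDir);
fid = fopen(fullfile(demoDir,'temp_summary.05.01_1981.txt'),'w');
fprintf(fid,'%s\n',' 01 65 12 38.5');
fclose(fid);
fid = fopen(fullfile(demoDir,'temp_summary.05.02_1981.txt'),'w');
fprintf(fid,'%s\n',' 01 57# 12 34.5');   % marker deliberately left in
fclose(fid);

% Scan every matching file for leftover * or # characters
files = dir(fullfile(demoDir,'temp_summary*.txt'));
filenames = fullfile({files.folder},{files.name});
stillMarked = false(size(filenames));
for ii = 1:numel(filenames)
    txt = fileread(filenames{ii});
    stillMarked(ii) = any(ismember(txt,'*#'));
end

% Report the result
if any(stillMarked)
    fprintf('Markers remain in %s\n', filenames{stillMarked});
else
    fprintf('No * or # found in %d file(s)\n', numel(filenames));
end
```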
I'm a novice in terms of coding.
I checked the files and all # * are gone.
That's OK. Start with small tasks, understand how they work, and build up from there.
OK, since the # and * removal is working, let me take a look at how to remove Feb 29th data. I'll comment again.
I included some files from another question, that appear (based on their names) to be February data. I assume you want to remove Feb 29th data only if the year is not a leap year, and if the year is a leap year, to let the Feb 29th data remain in place. The following code does that.
%
% Since (some of) these files have * and/or #, I'm running the removal code
% here. If you have already done that, you don't need to do it again, but
% it doesn't hurt anything to do it again (nothing will be removed from the files).
%
files = dir('temp_summary*.txt');
filenames = fullfile({files.folder},{files.name});
for ii = 1:numel(filenames)
    % read the file
    fid = fopen(filenames{ii},'r');
    str = fread(fid,[1 Inf],'*char');
    fclose(fid);
    % replace any * or # with a space (empty char vector should also work)
    str = regexprep(str,'[*#]',' ');
    % write the new file
    fid = fopen(filenames{ii},'w');
    fwrite(fid,str);
    fclose(fid);
end
%
% read the files using modified readMonth, defined below
%
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles);
disp(dataAll)
        Day        MaxT    MinT    AvgT
    ___________    ____    ____    _____
    01-Jan-1981      65      12     38.5
    02-Jan-1981      68      28       48
    03-Jan-1981      65      17       41
    04-Jan-1981      57      22     39.5
    05-Jan-1981      46      24       35
    06-Jan-1981      61      18     39.5
    07-Jan-1981      62      25     43.5
    08-Jan-1981      58      12       35
    09-Jan-1981      64      11     37.5
    10-Jan-1981      65      14     39.5
    11-Jan-1981      54      22       38
    12-Jan-1981      58      40       49
    13-Jan-1981      64      27     45.5
    14-Jan-1981      65      19       42
    15-Jan-1981      59      19       39
    16-Jan-1981      62      23     42.5
    17-Jan-1981      61      32     46.5
    18-Jan-1981      65      20     42.5
    19-Jan-1981      64      27     45.5
    20-Jan-1981      70      19     44.5
    21-Jan-1981      66      21     43.5
    22-Jan-1981      64      22       43
    23-Jan-1981      62      33     47.5
    24-Jan-1981      57      17       37
    25-Jan-1981      55      14     34.5
    26-Jan-1981      45      10     27.5
    27-Jan-1981      54      23     38.5
    28-Jan-1981      52      29     40.5
    29-Jan-1981      51      21       36
    30-Jan-1981      49      31       40
    31-Jan-1981      47      24     35.5
    01-Feb-1981      57      12     34.5
    02-Feb-1981      58      19     38.5
    03-Feb-1981      59      17       37
    04-Feb-1981      59      12     35.5
    05-Feb-1981      50      12     31.5
    06-Feb-1981      54      31     42.5
    07-Feb-1981      60      13     36.5
    08-Feb-1981      51      29       40
    09-Feb-1981      52      36       44
    10-Feb-1981      59      24     41.5
    11-Feb-1981      61      36     48.5
    12-Feb-1981      67      28     46.5
    13-Feb-1981      63      21       42
    14-Feb-1981      63      29       46
    15-Feb-1981      70      26       47
    16-Feb-1981      72      29     51.5
    17-Feb-1981      77      35       55
    18-Feb-1981      79      32     54.5
    19-Feb-1981      73      31       52
    20-Feb-1981      60      36       47
    21-Feb-1981      60      31     45.5
    22-Feb-1981      69      30     49.5
    23-Feb-1981      71      20     45.5
    24-Feb-1981      70      30       50
    25-Feb-1981      59      31       45
    26-Feb-1981      51      23       37
    27-Feb-1981      61      18     38.5
    28-Feb-1981      64      21     41.5
    01-Feb-1982      51      32     42.5
    02-Feb-1982      52      24       39
    03-Feb-1982      59      23       40
    04-Feb-1982      45      27       37
    05-Feb-1982      41      16     28.5
    06-Feb-1982      48      10       28
    07-Feb-1982      48       8       27
    08-Feb-1982      52      20       36
    09-Feb-1982      50      19     33.5
    10-Feb-1982      35      30     33.5
    11-Feb-1982      54      31     42.5
    12-Feb-1982      56      27     40.5
    13-Feb-1982      60      27     43.5
    14-Feb-1982      63      35       48
    15-Feb-1982      66      31     47.5
    16-Feb-1982      70      40       56
    17-Feb-1982      68      36       51
    18-Feb-1982      63      41       52
    19-Feb-1982      70      33       51
    20-Feb-1982     NaN     NaN    999.9
    21-Feb-1982     NaN     NaN    999.9
    22-Feb-1982      72      49     60.5
    23-Feb-1982      72      34       54
    24-Feb-1982      69      27       48
    25-Feb-1982      67      32     49.5
    26-Feb-1982      67      31       49
    27-Feb-1982      66      30       48
    28-Feb-1982      68      27     46.5
    01-Mar-1981      60      30       46
    02-Mar-1981      59      27       42
    03-Mar-1981      59      21       39
    04-Mar-1981      56      38       47
    05-Mar-1981      59      23       42
    06-Mar-1981      64      21     42.5
    07-Mar-1981      67      34     50.5
    08-Mar-1981      69      28     47.5
    09-Mar-1981      69      47       57
    10-Mar-1981      64      51     58.5
    11-Mar-1981      69      40     54.5
    12-Mar-1981      70      30       51
    13-Mar-1981      61      40     50.5
    14-Mar-1981      57      43       49
    15-Mar-1981      49      40     44.5
    16-Mar-1981      45      36     39.5
    17-Mar-1981      50      39     43.5
    18-Mar-1981      52      33     42.5
    19-Mar-1981      54      29     40.5
    20-Mar-1981      56      27     40.5
    21-Mar-1981      60      27     42.5
    22-Mar-1981      65      26     45.5
    23-Mar-1981      70      28       49
    24-Mar-1981      69      34     51.5
    25-Mar-1981      60      40       51
    26-Mar-1981      67      32     49.5
    27-Mar-1981      61      37       50
    28-Mar-1981      55      36     45.5
    29-Mar-1981      60      40       49
    30-Mar-1981      66      36       50
    31-Mar-1981      51      32     42.5
    01-Mar-1982      44      35     40.5
    02-Mar-1982      61      31       46
    03-Mar-1982      59      24       42
    04-Mar-1982      50      43     45.5
    05-Mar-1982      51      38     44.5
    06-Mar-1982      55      38     46.5
    07-Mar-1982      66      28       47
    08-Mar-1982      66      32       48
    09-Mar-1982      66      27     47.5
    10-Mar-1982      62      30       47
    11-Mar-1982      66      36       51
    12-Mar-1982      55      35       45
    13-Mar-1982      60      21     41.5
    14-Mar-1982      66      27     46.5
    15-Mar-1982      71      38       53
    16-Mar-1982      63      40     51.5
    17-Mar-1982      64      33     49.5
    18-Mar-1982      61      39       50
    19-Mar-1982      55      34     45.5
    20-Mar-1982      62      28       45
    21-Mar-1982      65      32     49.5
    22-Mar-1982      70      35     52.5
    23-Mar-1982      71      31       52
    24-Mar-1982      75      35       55
    25-Mar-1982      56      36       47
    26-Mar-1982      52      28       40
    27-Mar-1982      67      31       49
    28-Mar-1982      72      33     52.5
    29-Mar-1982      61      40     51.5
    30-Mar-1982      66      30       49
    31-Mar-1982      65      41       54
    01-Mar-1998      66      28       47
    02-Mar-1998      65      29       47
    03-Mar-1998      62      36       49
    04-Mar-1998      63      31       47
    05-Mar-1998      52      36       44
    06-Mar-1998      53      28     40.5
    07-Mar-1998      62      26       44
    08-Mar-1998      65      27       46
    09-Mar-1998      69      27       48
    10-Mar-1998      76      28       52
    11-Mar-1998      74      29     51.5
    12-Mar-1998      62      44       53
    13-Mar-1998      65      43       54
    14-Mar-1998      75      32     53.5
    15-Mar-1998      73      35       54
    16-Mar-1998      73      34     53.5
    17-Mar-1998      64      37     50.5
    18-Mar-1998      69      27       48
    19-Mar-1998      74      34       54
    20-Mar-1998      77      31       54
    21-Mar-1998      76      36       56
    22-Mar-1998      83      37       60
    23-Mar-1998      82      50       66
    24-Mar-1998      64      49     56.5
    25-Mar-1998      60      43     51.5
    26-Mar-1998      54      47     50.5
    27-Mar-1998      52      34       43
    28-Mar-1998      51      34     42.5
    29-Mar-1998      60      29     44.5
    30-Mar-1998      57      31       44
    31-Mar-1998      50      32       41
    01-Mar-1999      78      25     51.5
    02-Mar-1999      77      28     52.5
    03-Mar-1999      70      43     56.5
    04-Mar-1999      64      33     48.5
    05-Mar-1999      70      24       47
    06-Mar-1999      63      33       48
    07-Mar-1999      63      18     40.5
    08-Mar-1999      61      37       49
    09-Mar-1999      63      22     42.5
    10-Mar-1999      57      39       48
    11-Mar-1999      65      35       50
    12-Mar-1999      73      20     46.5
    13-Mar-1999      74      28       51
    14-Mar-1999      73      24     48.5
    15-Mar-1999      67      39       53
    16-Mar-1999      78      25     51.5
    17-Mar-1999      74      28       51
    18-Mar-1999      75      29       52
    19-Mar-1999      73      44     58.5
    20-Mar-1999      67      26     46.5
    21-Mar-1999      74      24       49
    22-Mar-1999      71      40     55.5
    23-Mar-1999      77      28     52.5
    24-Mar-1999      74      36       55
    25-Mar-1999      77      31       54
    26-Mar-1999      79      33       56
    27-Mar-1999      78      33     55.5
    28-Mar-1999      80      27     53.5
    29-Mar-1999      74      56       65
    30-Mar-1999      59      26     42.5
    31-Mar-1999      54      15     34.5
    01-Mar-2000      59      40     49.5
    02-Mar-2000      62      31     46.5
    03-Mar-2000      68      32       50
    04-Mar-2000      70      36       53
    05-Mar-2000      56      36       46
    06-Mar-2000      50      31     40.5
    07-Mar-2000      56      24       40
    08-Mar-2000      50      40       45
    09-Mar-2000      57      32     44.5
    10-Mar-2000      64      28       46
    11-Mar-2000      70      30       50
    12-Mar-2000      73      32     52.5
    13-Mar-2000      76      32       54
    14-Mar-2000      79      34     56.5
    15-Mar-2000      77      36     56.5
    16-Mar-2000      72      40       56
    17-Mar-2000      73      44     58.5
    18-Mar-2000      73      39       56
    19-Mar-2000      76      32       54
    20-Mar-2000      55      39       47
    21-Mar-2000      67      40     53.5
    22-Mar-2000      74      32       53
    23-Mar-2000      71      29       50
    24-Mar-2000      74      32       53
    25-Mar-2000      78      41     59.5
    26-Mar-2000      81      33       57
    27-Mar-2000      68      40       54
    28-Mar-2000      74      33     53.5
    29-Mar-2000      75      33       54
    30-Mar-2000      67      40     53.5
    31-Mar-2000      70      35     52.5
%
% readMonth function now removes Feb 29th data if the year is not a leap year
%
function Tbl = readMonth(filename)
    opts = detectImportOptions(filename);
    opts.ConsecutiveDelimitersRule = 'join';
    opts.MissingRule = 'omitvar';
    opts = setvartype(opts,'double');
    opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
    Tbl = readtable(filename,opts);
    Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
    Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
    [~,basename] = fileparts(filename);
    % use the base file name, not the full file name:
    d = str2double(extract(basename,digitsPattern));
    if ~leapyear(d(3)) && d(2) == 2 % February of a non-leap-year
        Tbl(Tbl.Day == 29,:) = []; % remove the 29th day data, if any
    end
    Tbl.Day = datetime(d(3),d(2),Tbl.Day);
end
I tried running your code and I received an error (see attached)...code is below
close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[Tmin,idxMn] = min(minT_tbl{:,3:end},[],2,'omitnan');
Tmin_yr = yrs(idxMn);
% find max
[Tmax,idxMx] = max(maxT_tbl{:,3:end},[],2,'omitnan');
Tmax_yr = yrs(idxMx);
% find low high
[lowTMax,idxMx] = min(maxT_tbl{:,3:end},[],2,'omitnan');
LowTMax_yr = yrs(idxMx);
% find high low
[highlowTMn,idxMn] = max(minT_tbl{:,3:end},[],2,'omitnan');
HighLowT_yr = yrs(idxMn);
% find avg high
AvgTMx = round(mean(table2array(maxT_tbl(:,3:end)),2,'omitnan'));
% find avg low
AvgTMn = round(mean(table2array(minT_tbl(:,3:end)),2,'omitnan'));
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(Tmax,Tmax_yr,AvgTMx,lowTMax,LowTMax_yr,Tmin,Tmin_yr,AvgTMn,highlowTMn,HighLowT_yr)]
tempTbl2 = splitvars(tempTbl)
FID = fopen('Meda 05 Temperature Climatology.txt','w');
report_date = datetime('now','format','yyyy-MM-dd HH:mm'); % mm = minutes (MM would print the month again)
fprintf(FID,'Meda 05 Temperature Climatology at %s \n', report_date);
fprintf(FID,"Month DD Temp Max (°F) Tmax_yr AvgTMax (°F) lowTMax (°F) LowTMax_yr TempMin (°F) TMin_yr AvgTMin (°F) HighlowTMin (°F) HighlowT_yr \n");
fprintf(FID,'%3d %6d %7d %14d %11d %11d %15d %11d %13d %10d %13d %17d \n', tempTbl2{:,1:end}');
fclose(FID);
winopen('Meda 05 Temperature Climatology.txt')
function Tbl = readMonth(filename)
    opts = detectImportOptions(filename);
    opts.ConsecutiveDelimitersRule = 'join';
    opts.MissingRule = 'omitvar';
    opts = setvartype(opts,'double');
    opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
    Tbl = readtable(filename,opts);
    Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
    Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
    [~,basename] = fileparts(filename);
    % use the base file name, not the full file name:
    d = str2double(extract(basename,digitsPattern));
    if ~leapyear(d(3)) && d(2) == 2 % February of a non-leap-year
        Tbl(Tbl.Day == 29,:) = []; % remove the 29th day data, if any
    end
    Tbl.Day = datetime(d(3),d(2),Tbl.Day);
end
OK, sorry about that. The leapyear function is from the Aerospace Toolbox, which I didn't realize. You can use the attached function instead. Download it and put it in your current MATLAB directory or somewhere else on your path, and then try running the code again.
Unfortunately, I can't download anything on my computer. Can you please post it instead?
function tf = leapyear(y)
    if mod(y,4) % year is not divisible by 4
        tf = false; % it is a common year
    elseif mod(y,100) % year is not divisible by 100
        tf = true; % it is a leap year
    elseif mod(y,400) % year is not divisible by 400
        tf = false; % it is a common year
    else
        tf = true; % it is a leap year
    end
end
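If adding a separate function file is not convenient, a one-line check built on eomday (part of base MATLAB, so no toolbox is needed) might serve the same purpose. This is a sketch of an alternative, not the function posted above; the name isLeap is made up for illustration.

```matlab
% Hypothetical helper: a year is a leap year when its February has 29 days.
% eomday(y,2) returns the last day of February for each year in y.
isLeap = @(y) eomday(y, 2) == 29;

yrsToCheck = [1900 1981 1984 2000];
disp(isLeap(yrsToCheck))
```

Unlike the if/elseif version, this form accepts a whole vector of years at once.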
Since you load the data one month at a time, a slightly simpler approach might be to just check the last row of the file. If it is 29, delete. No other month will end in 29.
Also, I thought you needed to delete the data for 2/29 from all years including leap years. Is that correct? At least that is what you requested from @Star Strider in your question about leap years. I think this answer will only remove non-leap year 2/29s.
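The last-row idea could look roughly like the sketch below. It assumes the check runs inside the read function while Day is still numeric, and that the table holds exactly one month of data. The table contents are made up for illustration.

```matlab
% Hypothetical stand-in for the tail of one February file (any year)
Tbl = table((27:29)', [60; 62; 58], [30; 31; 29], ...
    'VariableNames', {'Day','MaxT','MinT'});

% Only a February file can end on day 29, so checking the last row is enough.
% This removes Feb 29 in leap years and non-leap years alike.
if Tbl.Day(end) == 29
    Tbl(end,:) = [];
end
disp(Tbl)
```

No leap-year test is involved, which matches the requirement of dropping every Feb 29.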
@Cris LaPierre — That’s what I thought, too.
I added a Comment to my Answer, covering that and some other enhancements.
The function that he gave me plus the slight adjustment in the original function allowed all February 29s to be deleted successfully.
@Jonathon Klepatzki — I went back to my code (in the latest Comments that eliminates the # and * characters) and ran it again, saving the derived tables as text files (since the originals were text files) and they were written and read correctly. The ‘YMD’ variable was read as ‘YMD1’, ‘YMD2’, and ‘YMD3’, however all the information was there. (I do not use datetime arrays in my code because of the anomalous treatment of February, however the ‘YMD’ values can be converted to datetime arrays easily enough with the ‘YMD’ variables if leap years are respected and non-leap-years are edited to remove February 29th.) I use a different approach than the code here uses.
The ‘Capture_1.PNG’ information:
imshow(imread('Capture_1.PNG'))
I also do not use the ‘leapyear’ function, whatever it is. I use a completely different approach (that works correctly and without any external functions).
Sorry bud, I was hoping you didn't see the comment I had left. After I went through the files, I found that MATLAB apparently doesn't like them having text of any kind below the four columns; it confuses that text with temperature data or something else. With that said, I deleted the text in question and the code works fine.
I am lost. My code seems to work correctly when I run it, without any other modifications to it or to the tables or files it creates.
Mentioning me using ‘@’ flags me and I look to see what I need to attend to, if anything, since sometimes it’s just a reference.
@Jonathon Klepatzki, you can specify the NumHeaderLines, VariableNamesLine, VariableUnitsLine, VariableDescriptionsLine, and the DataLines import arguments to correctly import a file that has non-data lines between the variable names and data.
However, because you are using a datastore to import your files, the same import options are used to read in all files. Therefore, all files must be formatted the same or you will get errors like the one you saw.
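As a rough illustration of the DataLines argument, the sketch below writes a small made-up file with one header line and one line of trailing text, then imports only the data rows. The file contents, the line range [2 3], and the use of delimitedTextImportOptions (instead of detectImportOptions) are assumptions chosen to keep the example self-contained.

```matlab
% Made-up file: one header line, two data rows, one line of trailing text
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,' Day MaxT MinT AvgT\n 01 66 28 47.0\n 02 65 29 47.0\nStation notes: made-up footer text\n');
fclose(fid);

% Describe the layout explicitly instead of relying on detection
opts = delimitedTextImportOptions('NumVariables',4);
opts.Delimiter = ' ';
opts.ConsecutiveDelimitersRule = 'join';
opts.LeadingDelimitersRule = 'ignore';
opts.VariableNames = {'Day','MaxT','MinT','AvgT'};
opts.VariableTypes = repmat({'double'},1,4);

% Import lines 2 to 3 only, so the header and the footer are never parsed
opts.DataLines = [2 3];

T = readtable(fname,opts);
disp(T)
```

Since months differ in length, a fixed line range would not fit every file; inside a read function the range could be set per file before calling readtable.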

More Answers (3)

Here is one possible solution, to get the data correctly from the data file:
% Open the data file for reading
FID = fopen('temp_summary.05.03_1998.txt', 'r');
% Initialize a cell array to store the cleaned data
C_Lines = {};
% Read the file line by line
N_line = fgetl(FID);
while ischar(N_line)
    % Remove '*' and '#' characters from the line
    C_Line = strrep(N_line, '*', '');
    C_Line = strrep(C_Line, '#', '');
    % Store the cleaned line if it is not empty
    if ~isempty(C_Line)
        C_Lines{end+1} = C_Line;
    end
    % Read the next line
    N_line = fgetl(FID);
end
% Close the file:
fclose(FID);
% Convert the cell array of cleaned lines to a character array:
C_Data = char(C_Lines)
C_Data = 32×49 char array
    ' Day Maximum Temp Minimum Temp Average Temp'
    ' 01 66 28 47.0 '
    ' 02 65 29 47.0 '
    ' 03 62 36 49.0 '
    ' 04 63 31 47.0 '
    ' 05 52 36 44.0 '
    ' 06 53 28 40.5 '
    ' 07 62 26 44.0 '
    ' 08 65 27 46.0 '
    ' 09 69 27 48.0 '
    ' 10 76 28 52.0 '
    ' 11 74 29 51.5 '
    ' 12 62 44 53.0 '
    ' 13 65 43 54.0 '
    ' 14 75 32 53.5 '
    ' 15 73 35 54.0 '
    ' 16 73 34 53.5 '
    ' 17 64 37 50.5 '
    ' 18 69 27 48.0 '
    ' 19 74 34 54.0 '
    ' 20 77 31 54.0 '
    ' 21 76 36 56.0 '
    ' 22 83 37 60.0 '
    ' 23 82 50 66.0 '
    ' 24 64 49 56.5 '
    ' 25 60 43 51.5 '
    ' 26 54 47 50.5 '
    ' 27 52 34 43.0 '
    ' 28 51 34 42.5 '
    ' 29 60 29 44.5 '
    ' 30 57 31 44.0 '
    ' 31 50 32 41.0 '
I think another rather straightforward approach is to treat * and # as delimiters.
I've simplified the read function for readability.
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll = 93×4 table
    Day    MaxT    MinT    AvgT
    ___    ____    ____    ____
     1      66      28      47
     2      65      29      47
     3      62      36      49
     4      63      31      47
     5      52      36      44
     6      53      28    40.5
     7      62      26      44
     8      65      27      46
     9      69      27      48
    10      76      28      52
    11      74      29    51.5
    12      62      44      53
    13      65      43      54
    14      75      32    53.5
    15      73      35      54
    16      73      34    53.5
function Tbl = readMonth(filename)
    Tbl = readtable(filename,"ConsecutiveDelimitersRule","join","ReadVariableNames",false,...
        "Delimiter",{' ','\t','*','#'},"LeadingDelimitersRule",'ignore',...
        'EmptyLineRule','skip');
    Tbl.Properties.VariableNames = {'Day' 'MaxT' 'MinT' 'AvgT'};
end

5 Comments

I tried your straightforward approach and received an error:
"Error using fileDatastore (line 226)
Error using previewFcn @readMonth for file:
....
The VariableNames property must contain one name for each variable in the table"
However, using your suggestion
Tbl = standardizeMissing(Tbl,{-99,999,999.9,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
worked beautifully. In fact, I was surprised to see it corrected the data for the 1st of each month. Sadly, March 1st continues to give me grief with bad data.
The code I shared does not produce this error with any of the 12 files you have shared so far. Can you identify which file is causing this error and share it for us to test with?
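One way to find the offending file might be to read each file on its own with the same readtable call and print how many columns come back; any file that does not yield 4 columns would trigger the VariableNames error. The sketch below is an assumption-laden demo: the scratch folder and both files are made up, and the second file deliberately carries an extra column.

```matlab
% Demo setup (hypothetical): one well-formed file and one with an extra column
demoDir = tempname;
mkdir(demoDir);
fid = fopen(fullfile(demoDir,'temp_summary.05.01_1981.txt'),'w');
fprintf(fid,'%s\n',' 01 65 12 38.5',' 02 68 28 48.0');
fclose(fid);
fid = fopen(fullfile(demoDir,'temp_summary.05.02_1981.txt'),'w');
fprintf(fid,'%s\n',' 01 57 12 34.5 x',' 02 58 19 38.5 y');
fclose(fid);

% Read each file separately and record the number of columns found
files = dir(fullfile(demoDir,'temp_summary*.txt'));
nCols = zeros(numel(files),1);
for ii = 1:numel(files)
    f = fullfile(files(ii).folder, files(ii).name);
    T = readtable(f,"ConsecutiveDelimitersRule","join","ReadVariableNames",false,...
        "Delimiter",{' ','\t','*','#'},"LeadingDelimitersRule",'ignore',...
        'EmptyLineRule','skip');
    nCols(ii) = width(T);
    fprintf('%s: %d column(s)\n', files(ii).name, nCols(ii));
end
```

For the real data, replace demoDir with the folder that holds the temp_summary files and look for any count other than 4.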
Hmm. Works here. Have you shared the full error message (all the red text)?
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll = 124×4 table
Day MaxT MinT AvgT ___ ____ ____ ____ 1 65 12 38.5 2 68 28 48 3 65 17 41 4 57 22 39.5 5 46 24 35 6 61 18 39.5 7 62 25 43.5 8 58 12 35 9 64 11 37.5 10 65 14 39.5 11 54 22 38 12 58 40 49 13 64 27 45.5 14 65 19 42 15 59 19 39 16 62 23 42.5
function Tbl = readMonth(filename)
Tbl = readtable(filename,"ConsecutiveDelimitersRule","join","ReadVariableNames",false,...
"Delimiter",{' ','\t','*','#'},"LeadingDelimitersRule",'ignore',...
'EmptyLineRule','skip');
Tbl.Properties.VariableNames = {'Day' 'MaxT' 'MinT' 'AvgT'};
end
Everything the error message said is provided above.

To answer the original question:
An alternative way to read the files is to use FixedWidthImportOptions together with readtable() https://www.mathworks.com/help/matlab/ref/matlab.io.text.fixedwidthimportoptions.html
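A minimal sketch of that idea follows. The column widths (4, 8, 8, 8) and the demo file are invented so the example is self-contained; the real files would need their actual widths. The TrimNonNumeric variable option is added here as an assumption about how to deal with the * and # markers, since they sit directly next to the numbers.

```matlab
% Made-up fixed-width file: every row is padded to widths 4, 8, 8, 8
fname = [tempname '.txt'];
rowFmt = '%4s%8s%8s%8s\n';
fid = fopen(fname,'w');
fprintf(fid,rowFmt,'Day','MaxT','MinT','AvgT');
fprintf(fid,rowFmt,'01','66*','28','47.0');
fprintf(fid,rowFmt,'02','65','29#','47.0');
fclose(fid);

% Describe the fixed-width layout (widths are assumptions for this demo)
opts = fixedWidthImportOptions('NumVariables',4);
opts.VariableNames  = {'Day','MaxT','MinT','AvgT'};
opts.VariableWidths = [4 8 8 8];
opts.DataLines      = [2 Inf];          % skip the header line
opts = setvartype(opts,'double');

% Strip non-numeric characters such as * and # around each number
opts = setvaropts(opts,'TrimNonNumeric',true);

T = readtable(fname,opts);
disp(T)
```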

Release

R2020b