How to ignore special characters and retrieve the data prior to the character

Question

Jonathon Klepatzki on 6 Feb 2024

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/2078906-how-to-ignore-special-characters-and-retrieve-the-data-prior-to-the-character

Commented: Cris LaPierre on 13 Feb 2024

I have 40 years of data. Unfortunately, each text file contains the special characters # or *, which mark the highest or lowest temperature for that specific day and month. My code works (apart from regexp(minT_tbl,'#*','match') and its counterpart). However, the special characters confuse the import, so the data come out wrong. Any help would be great!

close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
%delete leap year
LY = (dataAll.Month(:)==2 & dataAll.DD(:)==29);
dataAll(LY,:) = [];
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
%ignore special characters
regexp(minT_tbl,'#*','match')
regexp(maxT_tbl,'#*','match')
% find min
[Tmin,idxMn] = min(minT_tbl{:,3:end},[],2,'omitnan');
Tmin_yr = yrs(idxMn);
% find max
[Tmax,idxMx] = max(maxT_tbl{:,3:end},[],2,'omitnan');
Tmax_yr = yrs(idxMx);
% find low high
[lowTMax,idxMx] = min(maxT_tbl{:,3:end},[],2,'omitnan');
LowTMax_yr = yrs(idxMx);
% find high low
[highlowTMn,idxMn] = max(minT_tbl{:,3:end},[],2,'omitnan');
HighLowT_yr = yrs(idxMn);
% find avg high
AvgTMx = round(mean(table2array(maxT_tbl(:,3:end)),2,'omitnan'));
% find avg low
AvgTMn = round(mean(table2array(minT_tbl(:,3:end)),2,'omitnan'));
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(Tmax,Tmax_yr,AvgTMx,lowTMax,LowTMax_yr,Tmin,Tmin_yr,AvgTMn,highlowTMn,HighLowT_yr)]
tempTbl2 = splitvars(tempTbl)
FID = fopen('Meda 05 Temperature Climatology.txt','w');
report_date = datetime('now','Format','yyyy-MM-dd HH:mm');
fprintf(FID,'Meda 05 Temperature Climatology at %s \n', report_date);
fprintf(FID,"Month   DD   Temp Max (°F)  Tmax_yr   AvgTMax (°F)  lowTMax (°F)  LowTMax_yr   TempMin (°F)  TMin_yr   AvgTMin (°F)  HighlowTMin (°F)  HighlowT_yr \n");
fprintf(FID,'%3d %6d %7d %14d %11d %11d %15d %11d %13d %10d %13d %17d \n', tempTbl2{:,1:end}');
fclose(FID);
winopen('Meda 05 Temperature Climatology.txt')
function Tbl = readMonth(filename)
opts = detectImportOptions(filename)
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'})
[~,basename] = fileparts(filename);
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end}
d = str2double(extract(filename,digitsPattern));
Tbl.Day = datetime(d(3),d(2),Tbl.Day)
end

6 Comments

Cris LaPierre on 7 Feb 2024

Edited: Cris LaPierre on 7 Feb 2024

A couple issues to point out.

Because some of your non-leap year files have info for Feb 29, that date gets (correctly) converted to Mar 1. That means your approach to removing Feb 29 will not catch those dates. You need to check month and day before the date gets converted to a datetime to avoid this. That means checking in the readMonth function.

In your read function, none of the following code is used and can be deleted.

[~,basename] = fileparts(filename)
nameparts = regexp(basename, '\.', 'split');
dateparts = regexp(nameparts{end}, '_','split');
year_str = dateparts{end};

You can combine all your missing values into a single cell array (added one you missed)

Tbl = standardizeMissing(Tbl,{-99,999,999.9,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
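As a minimal sketch of what the combined call does, the example below applies it to a small made-up table (the values are illustrative only and are not taken from the data files):

```matlab
% Made-up sample rows; -99, 999 and 999.9 stand in for missing readings
Tbl = table([1;2;3;4],[65;999;70;-99],[12;20;999.9;30], ...
    'VariableNames',{'Day','MaxT','MinT'});

% One call replaces every listed sentinel with NaN in the chosen variables
Tbl = standardizeMissing(Tbl,{-99,999,999.9,'N/A'}, ...
    "DataVariables",{'MaxT','MinT'});
disp(Tbl)
```

The Day column is left untouched because it is not listed under "DataVariables".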

The line of code that currently calls splitvars is not actually doing anything and can be removed.

tempTbl2 = splitvars(tempTbl)

Your MissingRule is likely not the option you want ('omitvar'). This option means do not import a variable (i.e. an entire column of data) if it contains missing data. Fortunately, any missing values have been replaced with a numeric code (e.g. 999) so they are not treated as missing, and that variable is not omitted. I'd remove this line from your code. I think you want to use the EmptyLineRule instead.

Cris LaPierre on 7 Feb 2024

Test it out. It doesn't eliminate them because month no longer equals 2 and day no longer equals 29. They are now 3 and 1.

dataAll = table();
dataAll.Day = datetime(1981,2,29)  % Feb 29, 1981, which is a non-leap year
dataAll = table
        Day    
    ___________

    01-Mar-1981
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
dataAll = 1×3 table
        Day        Month    DD
    ___________    _____    __

    01-Mar-1981      3      1 
% Remove all Feb 29 dates from the table
LY = (dataAll.Month(:)== 2 & dataAll.DD(:) == 29);
dataAll(LY,:) = [ ]
dataAll = 1×3 table
        Day        Month    DD
    ___________    _____    __

    01-Mar-1981      3      1 

As you can see, the current LY code did not remove the data.
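One possible way to avoid the rollover is to filter on the numeric day before any datetime is built. The sketch below is only an illustration: yr, mo and dayNum are hypothetical stand-ins for the values that readMonth parses from the file name and reads from the Day column.

```matlab
% Hypothetical values, as if parsed from the name of a February 1981 file
yr = 1981;
mo = 2;
dayNum = (27:29)';                 % day-of-month column as read from the file

% eomday returns the last valid day of the month for the given year
valid = dayNum <= eomday(yr,mo);
dayNum = dayNum(valid);            % drops day 29 here, because 1981 is not a leap year

dates = datetime(yr,mo,dayNum);    % no rollover into March
disp(dates)
```

Inside a read function the same test would be applied to the table rows before Tbl.Day is converted, so that the out-of-range row never reaches datetime.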

Jonathon Klepatzki on 7 Feb 2024

Well I am stuck then. Because I just tried it different ways and I continue to get the same result.

Answer 1

Voss on 6 Feb 2024

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/2078906-how-to-ignore-special-characters-and-retrieve-the-data-prior-to-the-character#answer_1403981

Edited: Voss on 6 Feb 2024

The following code replaces any * or # characters in a text file with spaces (note that this replaces the existing file with a new file of the same name):

% read the file
fid = fopen(filename,'r');
str = fread(fid,[1 Inf],'*char');
fclose(fid);
% replace any * or # with a space (empty char vector should also work)
str = regexprep(str,'[*#]',' ');
% write the new file
fid = fopen(filename,'w');
fwrite(fid,str);
fclose(fid);

If you don't mind losing the original files that have the * and/or # characters in them, you can run this code for each of your text files before running your code or you can incorporate this code into your readMonth function.

If you want to preserve the original files, make a separate copy of them first, or modify the above code to write to a different file, e.g.:

% write the new file
[fp,fn,ext] = fileparts(filename);
fid = fopen(fullfile(fp,[fn '_modified' ext]),'w');
fwrite(fid,str);
fclose(fid);

and tell fileDatastore to use the modified files only, e.g.:

Datafiles = fileDatastore("temp_summary*_modified.txt","ReadFcn",@readMonth,"UniformRead",true);
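For comparison with the attempt in the question: the pattern '#*' means "zero or more # characters", and regexp works on text rather than on a table, whereas the character class '[*#]' matches a single * or #. If rewriting files is undesirable, the same replacement can be done in memory. The following is a rough sketch under stated assumptions: the sample file, its values, and the single header line are made up for illustration, and the real files may be laid out differently.

```matlab
% Write a tiny sample file (made-up values) so the sketch runs on its own
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n','  Day   Maximum Temp  Minimum Temp   Average Temp');
fprintf(fid,'%s\n','   01        66*           28           47.0');
fprintf(fid,'%s\n','   02        65            29#          47.0');
fclose(fid);

txt = fileread(fname);                       % whole file as one char vector
txt = regexprep(txt,'[*#]',' ');             % character class: a single * or #
lines = splitlines(string(txt));
body = char(join(lines(2:end),newline));     % assumes exactly one header line
vals = reshape(sscanf(body,'%f'),4,[]).';    % four numbers per data row
Tbl = array2table(vals,'VariableNames',{'Day','MaxT','MinT','AvgT'});
delete(fname);
disp(Tbl)
```

The original file is only read, never written, so no backup copy is needed with this variant.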

20 Comments

Jonathon Klepatzki on 8 Feb 2024

Edited: Walter Roberson on 8 Feb 2024

@Voss So, I have this idea below. As you can tell, I took what you suggested into account. What I am trying to do is eliminate # and * and anything that belongs to February 29th. When I run this, I hope to create the modified txt files to be run with the code above. Unfortunately, I do get an error with fopen. Any suggestions?

close all;
clear all;
clc;
FID = dir('temp_summary*.txt');
str = fread(FID,[1 Inf],'*char')
% replace any * or # with a space (empty char vector should also work)
str = regexprep(str,'[*#]',' ');
fclose(FID);
for k = 1:numel(FID)
    filename = FID(k).name
    T{k,:} = readtable(filename);
    fn{k,:} = filename; 
    nrows = size(T{k},1);
    Nr = regexp(filename,'\d*','match');
    % Check = eomday(cellfun(@str2double,Nr(3)),2)                        % Check 'filename' Year For Leap Year
    % Nr{3} = '2024';                                                     % Artificially Assign Leap Year To Test Code
    Yr = cellfun(@str2double,repmat(Nr(3),nrows,1));
    Mo = cellfun(@str2double,repmat(Nr(2),nrows,1));
    % T{k}.DateTime = datetime(Yr,Mo,T{k}{:,1});
    T{k}.YMD = [Yr Mo T{k}{:,1}];                                       % Create  Variable To Replace ''datetime' 
end
T{:}
for k = 1:numel(T)                                                      % Loop To Remove 29-Feb
    Lv = (T{k}.YMD(:,2) == 2) & (T{k}.YMD(:,3) == 29);
    T{k}(Lv,:) = [];
    if nnz(Lv)                                                          % Optional 'if' Block
        fprintf('Feb 29 removed from %s\n', fn{k})
    end
end
T{:}
% write the new file
[fp,fn,ext] = fileparts(filename);
fid = fopen(fullfile(fp,[fn '_modified' ext]),'w');
fwrite(fid,str);
fclose(fid);

Voss on 8 Feb 2024

This doesn't make sense:

FID = dir('temp_summary*.txt');
str = fread(FID,[1 Inf],'*char')

Don't use output from dir() as input to fread(). The output from dir() is a struct array, and fread() expects a file handle (as returned by fopen), exactly as I showed:

fid = fopen(filename,'r');
str = fread(fid,[1 Inf],'*char');
fclose(fid);

If you want to apply the code in my answer to your set of files, something like the following would work. I strongly suggest you make a backup copy of your text files before you run this in case there are any problems, because this will overwrite the existing files with new files of the same names (where the * and # are replaced with spaces):

files = dir('temp_summary*.txt');
filenames = fullfile({files.folder},{files.name});
for ii = 1:numel(filenames)
    
    % read the file
    fid = fopen(filenames{ii},'r');
    str = fread(fid,[1 Inf],'*char');
    fclose(fid);
    
    % replace any * or # with a space (empty char vector should also work)
    str = regexprep(str,'[*#]',' ');
    
    % write the new file
    fid = fopen(filenames{ii},'w');
    fwrite(fid,str);
    fclose(fid);
    
end

Run that first, and check that the new files are ok (data is correct - same as before - and the * and # are gone).

Once that is done, then let me know and I'll take a look at the separate issue of removing the Feb 29th data.
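The check described above could be sketched roughly as follows. Here txt is a made-up stand-in for the data lines of one cleaned file (in practice it would come from something like fileread, with the header removed), and the four-values-per-row layout is an assumption based on the Day/MaxT/MinT/AvgT columns.

```matlab
% Made-up stand-in for the data lines of one cleaned file
txt = sprintf('%s\n', ...
    '   01        66            28           47.0', ...
    '   02        65            29           47.0');

markersLeft = nnz(txt == '*' | txt == '#');  % should be 0 after cleaning
vals = sscanf(txt,'%f');                     % every number in the text
rowsOk = mod(numel(vals),4) == 0;            % four values per data row

fprintf('markers left: %d, complete rows: %d\n', markersLeft, rowsOk);
```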

Voss on 8 Feb 2024

Edited: Voss on 8 Feb 2024

I included some files from another question that appear (based on their names) to be February data. I assume you want to remove Feb 29th data only if the year is not a leap year, and if the year is a leap year, to let the Feb 29th data remain in place. The following code does that.

%
% Since (some of) these files have * and/or #, I'm running the removal code
% here. If you have already done that, you don't need to do it again, but
% it doesn't hurt anything to do it again (nothing will be removed from the files).
%
files = dir('temp_summary*.txt');
filenames = fullfile({files.folder},{files.name});
for ii = 1:numel(filenames)
    
    % read the file
    fid = fopen(filenames{ii},'r');
    str = fread(fid,[1 Inf],'*char');
    fclose(fid);
    
    % replace any * or # with a space (empty char vector should also work)
    str = regexprep(str,'[*#]',' ');
    
    % write the new file
    fid = fopen(filenames{ii},'w');
    fwrite(fid,str);
    fclose(fid);
    
end
%
% read the files using modified readMonth, defined below
% 
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles);
disp(dataAll)
        Day        MaxT    MinT    AvgT 
    ___________    ____    ____    _____

    01-Jan-1981     65      12      38.5
    02-Jan-1981     68      28        48
    03-Jan-1981     65      17        41
    04-Jan-1981     57      22      39.5
    05-Jan-1981     46      24        35
    06-Jan-1981     61      18      39.5
    07-Jan-1981     62      25      43.5
    08-Jan-1981     58      12        35
    09-Jan-1981     64      11      37.5
    10-Jan-1981     65      14      39.5
    11-Jan-1981     54      22        38
    12-Jan-1981     58      40        49
    13-Jan-1981     64      27      45.5
    14-Jan-1981     65      19        42
    15-Jan-1981     59      19        39
    16-Jan-1981     62      23      42.5
    17-Jan-1981     61      32      46.5
    18-Jan-1981     65      20      42.5
    19-Jan-1981     64      27      45.5
    20-Jan-1981     70      19      44.5
    21-Jan-1981     66      21      43.5
    22-Jan-1981     64      22        43
    23-Jan-1981     62      33      47.5
    24-Jan-1981     57      17        37
    25-Jan-1981     55      14      34.5
    26-Jan-1981     45      10      27.5
    27-Jan-1981     54      23      38.5
    28-Jan-1981     52      29      40.5
    29-Jan-1981     51      21        36
    30-Jan-1981     49      31        40
    31-Jan-1981     47      24      35.5
    01-Feb-1981     57      12      34.5
    02-Feb-1981     58      19      38.5
    03-Feb-1981     59      17        37
    04-Feb-1981     59      12      35.5
    05-Feb-1981     50      12      31.5
    06-Feb-1981     54      31      42.5
    07-Feb-1981     60      13      36.5
    08-Feb-1981     51      29        40
    09-Feb-1981     52      36        44
    10-Feb-1981     59      24      41.5
    11-Feb-1981     61      36      48.5
    12-Feb-1981     67      28      46.5
    13-Feb-1981     63      21        42
    14-Feb-1981     63      29        46
    15-Feb-1981     70      26        47
    16-Feb-1981     72      29      51.5
    17-Feb-1981     77      35        55
    18-Feb-1981     79      32      54.5
    19-Feb-1981     73      31        52
    20-Feb-1981     60      36        47
    21-Feb-1981     60      31      45.5
    22-Feb-1981     69      30      49.5
    23-Feb-1981     71      20      45.5
    24-Feb-1981     70      30        50
    25-Feb-1981     59      31        45
    26-Feb-1981     51      23        37
    27-Feb-1981     61      18      38.5
    28-Feb-1981     64      21      41.5
    01-Feb-1982     51      32      42.5
    02-Feb-1982     52      24        39
    03-Feb-1982     59      23        40
    04-Feb-1982     45      27        37
    05-Feb-1982     41      16      28.5
    06-Feb-1982     48      10        28
    07-Feb-1982     48       8        27
    08-Feb-1982     52      20        36
    09-Feb-1982     50      19      33.5
    10-Feb-1982     35      30      33.5
    11-Feb-1982     54      31      42.5
    12-Feb-1982     56      27      40.5
    13-Feb-1982     60      27      43.5
    14-Feb-1982     63      35        48
    15-Feb-1982     66      31      47.5
    16-Feb-1982     70      40        56
    17-Feb-1982     68      36        51
    18-Feb-1982     63      41        52
    19-Feb-1982     70      33        51
    20-Feb-1982    NaN     NaN     999.9
    21-Feb-1982    NaN     NaN     999.9
    22-Feb-1982     72      49      60.5
    23-Feb-1982     72      34        54
    24-Feb-1982     69      27        48
    25-Feb-1982     67      32      49.5
    26-Feb-1982     67      31        49
    27-Feb-1982     66      30        48
    28-Feb-1982     68      27      46.5
    01-Mar-1981     60      30        46
    02-Mar-1981     59      27        42
    03-Mar-1981     59      21        39
    04-Mar-1981     56      38        47
    05-Mar-1981     59      23        42
    06-Mar-1981     64      21      42.5
    07-Mar-1981     67      34      50.5
    08-Mar-1981     69      28      47.5
    09-Mar-1981     69      47        57
    10-Mar-1981     64      51      58.5
    11-Mar-1981     69      40      54.5
    12-Mar-1981     70      30        51
    13-Mar-1981     61      40      50.5
    14-Mar-1981     57      43        49
    15-Mar-1981     49      40      44.5
    16-Mar-1981     45      36      39.5
    17-Mar-1981     50      39      43.5
    18-Mar-1981     52      33      42.5
    19-Mar-1981     54      29      40.5
    20-Mar-1981     56      27      40.5
    21-Mar-1981     60      27      42.5
    22-Mar-1981     65      26      45.5
    23-Mar-1981     70      28        49
    24-Mar-1981     69      34      51.5
    25-Mar-1981     60      40        51
    26-Mar-1981     67      32      49.5
    27-Mar-1981     61      37        50
    28-Mar-1981     55      36      45.5
    29-Mar-1981     60      40        49
    30-Mar-1981     66      36        50
    31-Mar-1981     51      32      42.5
    01-Mar-1982     44      35      40.5
    02-Mar-1982     61      31        46
    03-Mar-1982     59      24        42
    04-Mar-1982     50      43      45.5
    05-Mar-1982     51      38      44.5
    06-Mar-1982     55      38      46.5
    07-Mar-1982     66      28        47
    08-Mar-1982     66      32        48
    09-Mar-1982     66      27      47.5
    10-Mar-1982     62      30        47
    11-Mar-1982     66      36        51
    12-Mar-1982     55      35        45
    13-Mar-1982     60      21      41.5
    14-Mar-1982     66      27      46.5
    15-Mar-1982     71      38        53
    16-Mar-1982     63      40      51.5
    17-Mar-1982     64      33      49.5
    18-Mar-1982     61      39        50
    19-Mar-1982     55      34      45.5
    20-Mar-1982     62      28        45
    21-Mar-1982     65      32      49.5
    22-Mar-1982     70      35      52.5
    23-Mar-1982     71      31        52
    24-Mar-1982     75      35        55
    25-Mar-1982     56      36        47
    26-Mar-1982     52      28        40
    27-Mar-1982     67      31        49
    28-Mar-1982     72      33      52.5
    29-Mar-1982     61      40      51.5
    30-Mar-1982     66      30        49
    31-Mar-1982     65      41        54
    01-Mar-1998     66      28        47
    02-Mar-1998     65      29        47
    03-Mar-1998     62      36        49
    04-Mar-1998     63      31        47
    05-Mar-1998     52      36        44
    06-Mar-1998     53      28      40.5
    07-Mar-1998     62      26        44
    08-Mar-1998     65      27        46
    09-Mar-1998     69      27        48
    10-Mar-1998     76      28        52
    11-Mar-1998     74      29      51.5
    12-Mar-1998     62      44        53
    13-Mar-1998     65      43        54
    14-Mar-1998     75      32      53.5
    15-Mar-1998     73      35        54
    16-Mar-1998     73      34      53.5
    17-Mar-1998     64      37      50.5
    18-Mar-1998     69      27        48
    19-Mar-1998     74      34        54
    20-Mar-1998     77      31        54
    21-Mar-1998     76      36        56
    22-Mar-1998     83      37        60
    23-Mar-1998     82      50        66
    24-Mar-1998     64      49      56.5
    25-Mar-1998     60      43      51.5
    26-Mar-1998     54      47      50.5
    27-Mar-1998     52      34        43
    28-Mar-1998     51      34      42.5
    29-Mar-1998     60      29      44.5
    30-Mar-1998     57      31        44
    31-Mar-1998     50      32        41
    01-Mar-1999     78      25      51.5
    02-Mar-1999     77      28      52.5
    03-Mar-1999     70      43      56.5
    04-Mar-1999     64      33      48.5
    05-Mar-1999     70      24        47
    06-Mar-1999     63      33        48
    07-Mar-1999     63      18      40.5
    08-Mar-1999     61      37        49
    09-Mar-1999     63      22      42.5
    10-Mar-1999     57      39        48
    11-Mar-1999     65      35        50
    12-Mar-1999     73      20      46.5
    13-Mar-1999     74      28        51
    14-Mar-1999     73      24      48.5
    15-Mar-1999     67      39        53
    16-Mar-1999     78      25      51.5
    17-Mar-1999     74      28        51
    18-Mar-1999     75      29        52
    19-Mar-1999     73      44      58.5
    20-Mar-1999     67      26      46.5
    21-Mar-1999     74      24        49
    22-Mar-1999     71      40      55.5
    23-Mar-1999     77      28      52.5
    24-Mar-1999     74      36        55
    25-Mar-1999     77      31        54
    26-Mar-1999     79      33        56
    27-Mar-1999     78      33      55.5
    28-Mar-1999     80      27      53.5
    29-Mar-1999     74      56        65
    30-Mar-1999     59      26      42.5
    31-Mar-1999     54      15      34.5
    01-Mar-2000     59      40      49.5
    02-Mar-2000     62      31      46.5
    03-Mar-2000     68      32        50
    04-Mar-2000     70      36        53
    05-Mar-2000     56      36        46
    06-Mar-2000     50      31      40.5
    07-Mar-2000     56      24        40
    08-Mar-2000     50      40        45
    09-Mar-2000     57      32      44.5
    10-Mar-2000     64      28        46
    11-Mar-2000     70      30        50
    12-Mar-2000     73      32      52.5
    13-Mar-2000     76      32        54
    14-Mar-2000     79      34      56.5
    15-Mar-2000     77      36      56.5
    16-Mar-2000     72      40        56
    17-Mar-2000     73      44      58.5
    18-Mar-2000     73      39        56
    19-Mar-2000     76      32        54
    20-Mar-2000     55      39        47
    21-Mar-2000     67      40      53.5
    22-Mar-2000     74      32        53
    23-Mar-2000     71      29        50
    24-Mar-2000     74      32        53
    25-Mar-2000     78      41      59.5
    26-Mar-2000     81      33        57
    27-Mar-2000     68      40        54
    28-Mar-2000     74      33      53.5
    29-Mar-2000     75      33        54
    30-Mar-2000     67      40      53.5
    31-Mar-2000     70      35      52.5
%
% readMonth function now removes Feb 29th data if the year is not a leap year
%
function Tbl = readMonth(filename)
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
[~,basename] = fileparts(filename);
% use the base file name, not the full file name:
d = str2double(extract(basename,digitsPattern));
if ~leapyear(d(3)) && d(2) == 2 % February of a non-leap-year
    Tbl(Tbl.Day == 29,:) = [];  % remove the 29th day data, if any
end
Tbl.Day = datetime(d(3),d(2),Tbl.Day);
end
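A note on portability: leapyear is, as far as I know, part of Aerospace Toolbox rather than base MATLAB, so the call may fail with an undefined-function error on installations without that toolbox. A minimal base-MATLAB stand-in is sketched below; isLeapYear is a hypothetical name chosen for this example, not a built-in function.

```matlab
% Gregorian rule: divisible by 4, except centuries not divisible by 400
isLeapYear = @(y) (mod(y,4) == 0 & mod(y,100) ~= 0) | mod(y,400) == 0;

testYears = [1981 1900 2000 2024];
disp(isLeapYear(testYears))
```

With this in place, the condition in readMonth could read ~isLeapYear(d(3)) && d(2) == 2. An equivalent base-MATLAB check is eomday(d(3),2) == 28.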

Jonathon Klepatzki on 8 Feb 2024

Edited: Walter Roberson on 8 Feb 2024

Capture_1.PNG

I tried running your code and I received an error (see attached)...code is below

close all;
clear all;
clc;
Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll.Year = year(dataAll.Day);
dataAll.Month = month(dataAll.Day);
dataAll.DD = day(dataAll.Day)
% Unstack variables
minT_tbl = unstack(dataAll,"MinT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
maxT_tbl = unstack(dataAll,"MaxT","Year","GroupingVariables", ["Month","DD"],"VariableNamingRule","preserve")
yrs =str2double(minT_tbl.Properties.VariableNames(3:end))';
% find min
[Tmin,idxMn] = min(minT_tbl{:,3:end},[],2,'omitnan');
Tmin_yr = yrs(idxMn);
% find max
[Tmax,idxMx] = max(maxT_tbl{:,3:end},[],2,'omitnan');
Tmax_yr = yrs(idxMx);
% find low high
[lowTMax,idxMx] = min(maxT_tbl{:,3:end},[],2,'omitnan');
LowTMax_yr = yrs(idxMx);
% find high low
[highlowTMn,idxMn] = max(minT_tbl{:,3:end},[],2,'omitnan');
HighLowT_yr = yrs(idxMn);
% find avg high
AvgTMx = round(mean(table2array(maxT_tbl(:,3:end)),2,'omitnan'));
% find avg low
AvgTMn = round(mean(table2array(minT_tbl(:,3:end)),2,'omitnan'));
% Results
tempTbl = [maxT_tbl(:,["Month","DD"]), table(Tmax,Tmax_yr,AvgTMx,lowTMax,LowTMax_yr,Tmin,Tmin_yr,AvgTMn,highlowTMn,HighLowT_yr)]
tempTbl2 = splitvars(tempTbl)
FID = fopen('Meda 05 Temperature Climatology.txt','w');
report_date = datetime('now','Format','yyyy-MM-dd HH:mm');
fprintf(FID,'Meda 05 Temperature Climatology at %s \n', report_date);
fprintf(FID,"Month   DD   Temp Max (°F)  Tmax_yr   AvgTMax (°F)  lowTMax (°F)  LowTMax_yr   TempMin (°F)  TMin_yr   AvgTMin (°F)  HighlowTMin (°F)  HighlowT_yr \n");
fprintf(FID,'%3d %6d %7d %14d %11d %11d %15d %11d %13d %10d %13d %17d \n', tempTbl2{:,1:end}');
fclose(FID);
winopen('Meda 05 Temperature Climatology.txt')
function Tbl = readMonth(filename)
opts = detectImportOptions(filename);
opts.ConsecutiveDelimitersRule = 'join';
opts.MissingRule = 'omitvar';
opts = setvartype(opts,'double');
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
Tbl = readtable(filename,opts);
Tbl = standardizeMissing(Tbl,{999,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
Tbl = standardizeMissing(Tbl,{-99,'N/A'},"DataVariables",{'MaxT','MinT','AvgT'});
[~,basename] = fileparts(filename);
% use the base file name, not the full file name:
d = str2double(extract(basename,digitsPattern));
if ~leapyear(d(3)) && d(2) == 2 % February of a non-leap-year
    Tbl(Tbl.Day == 29,:) = [];  % remove the 29th day data, if any
end
Tbl.Day = datetime(d(3),d(2),Tbl.Day);
end

Star Strider on 13 Feb 2024

I am lost. My code seems to work correctly when I run it, without any other modifications to it or to the tables or files it creates.

Mentioning me using ‘@’ flags me and I look to see what I need to attend to, if anything, since sometimes it’s just a reference.

Cris LaPierre on 13 Feb 2024

@Jonathon Klepatzki, you can specify the NumHeaderLines, VariableNamesLine, VariableUnitsLine, VariableDescriptionsLine, and the DataLines import arguments to correctly import a file that has non-data lines between the variable names and data.

However, because you are using a datastore to import your files, the same import options are used to read all files. Therefore, all files must be formatted the same way or you will get errors like the one you saw.
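As a rough illustration of setting these import options explicitly, the sketch below builds the options by hand instead of detecting them. The sample file, its made-up units line, and the line numbers are assumptions for the example only; the real files may need different values.

```matlab
% Sample file with a names line, a made-up units line, then data
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'%s\n','Day MaxT MinT AvgT');
fprintf(fid,'%s\n','(-) (F) (F) (F)');
fprintf(fid,'%s\n','01 66 28 47.0');
fprintf(fid,'%s\n','02 65 29 47.0');
fclose(fid);

opts = delimitedTextImportOptions('NumVariables',4);
opts.Delimiter = ' ';
opts.ConsecutiveDelimitersRule = 'join';
opts.LeadingDelimitersRule = 'ignore';
opts.VariableNames = ["Day","MaxT","MinT","AvgT"];
opts = setvartype(opts,'double');
opts.DataLines = [3 Inf];          % data start on line 3, after names and units

T = readtable(fname,opts);
delete(fname);
disp(T)
```

Because the options no longer depend on what is detected in any one file, the same opts object can be reused for every file that shares this layout.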

Answer 2

Sulaymon Eshkabilov on 6 Feb 2024

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/2078906-how-to-ignore-special-characters-and-retrieve-the-data-prior-to-the-character#answer_1403971

temp_summary.05.03_1998.txt

Here is one possible solution, to get the data correctly from the data file:

% Open the data file for reading
FID = fopen('temp_summary.05.03_1998.txt', 'r');
% Initialize a cell array to store the cleaned data
C_Lines = {};
% Read the file line by line
N_line = fgetl(FID);
while ischar(N_line)
    % Remove '*' and '#' characters from the line
    C_Line = strrep(N_line, '*', '');
    C_Line = strrep(C_Line, '#', '');
    
    % Store the cleaned line if it is not empty
    if ~isempty(C_Line)
        C_Lines{end+1} = C_Line;
    end
    % Read the next line
    N_line = fgetl(FID);
end
% Close the file:
fclose(FID);
% Convert the cell array of cleaned lines to a character array:
C_Data = char(C_Lines)
C_Data = 32×49 char array
    '  Day   Maximum Temp  Minimum Temp   Average Temp'
    '   01        66            28           47.0     '
    '   02        65            29           47.0     '
    '   03        62            36           49.0     '
    '   04        63            31           47.0     '
    '   05        52            36           44.0     '
    '   06        53            28           40.5     '
    '   07        62            26          44.0      '
    '   08        65            27           46.0     '
    '   09        69            27           48.0     '
    '   10        76            28           52.0     '
    '   11        74            29           51.5     '
    '   12        62            44           53.0     '
    '   13        65            43           54.0     '
    '   14        75            32           53.5     '
    '   15        73            35           54.0     '
    '   16        73            34           53.5     '
    '   17        64            37           50.5     '
    '   18        69            27           48.0     '
    '   19        74            34           54.0     '
    '   20        77            31           54.0     '
    '   21        76            36           56.0     '
    '   22        83           37           60.0      '
    '   23        82            50           66.0     '
    '   24        64            49           56.5     '
    '   25        60            43           51.5     '
    '   26        54            47           50.5     '
    '   27        52            34           43.0     '
    '   28        51            34           42.5     '
    '   29        60            29           44.5     '
    '   30        57            31           44.0     '
    '   31        50            32           41.0     '
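The result above is still text. One way to turn the cleaned lines into numbers is sketched below; the three-entry C_Lines here is a shortened, hand-typed stand-in for the cell array built by the loop, and the column names are assumptions carried over from the rest of the thread.

```matlab
% Stand-in for the cleaned lines built above (first entry is the header)
C_Lines = {'  Day   Maximum Temp  Minimum Temp   Average Temp', ...
           '   01        66            28           47.0', ...
           '   02        65            29           47.0'};

% Parse every non-header line into a 1-by-4 row of doubles
rows = cellfun(@(s) sscanf(s,'%f').', C_Lines(2:end),'UniformOutput',false);
data = vertcat(rows{:});
T = array2table(data,'VariableNames',{'Day','MaxT','MinT','AvgT'});
disp(T)
```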

Answer 3

Cris LaPierre on 7 Feb 2024

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/2078906-how-to-ignore-special-characters-and-retrieve-the-data-prior-to-the-character#answer_1405271

Edited: Cris LaPierre on 8 Feb 2024

I think another rather straightforward approach is to treat * and # as delimiters.

I've simplified the read function for readability.

Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll = 93×4 table
    Day    MaxT    MinT    AvgT
    ___    ____    ____    ____

     1      66      28       47
     2      65      29       47
     3      62      36       49
     4      63      31       47
     5      52      36       44
     6      53      28     40.5
     7      62      26       44
     8      65      27       46
     9      69      27       48
    10      76      28       52
    11      74      29     51.5
    12      62      44       53
    13      65      43       54
    14      75      32     53.5
    15      73      35       54
    16      73      34     53.5
function Tbl = readMonth(filename)
Tbl = readtable(filename,"ConsecutiveDelimitersRule","join","ReadVariableNames",false,...
    "Delimiter",{' ','\t','*','#'},"LeadingDelimitersRule",'ignore',...
    'EmptyLineRule','skip');
Tbl.Properties.VariableNames = {'Day' 'MaxT' 'MinT' 'AvgT'};
end

5 Comments

Cris LaPierre on 7 Feb 2024

Moved: Cris LaPierre on 8 Feb 2024

Hmm. Works here. Have you shared the full error message (all the red text)?

Datafiles = fileDatastore("temp_summary*.txt","ReadFcn",@readMonth,"UniformRead",true);
dataAll = readall(Datafiles)
dataAll = 124×4 table
    Day    MaxT    MinT    AvgT
    ___    ____    ____    ____

     1      65      12     38.5
     2      68      28       48
     3      65      17       41
     4      57      22     39.5
     5      46      24       35
     6      61      18     39.5
     7      62      25     43.5
     8      58      12       35
     9      64      11     37.5
    10      65      14     39.5
    11      54      22       38
    12      58      40       49
    13      64      27     45.5
    14      65      19       42
    15      59      19       39
    16      62      23     42.5
function Tbl = readMonth(filename)
Tbl = readtable(filename,"ConsecutiveDelimitersRule","join","ReadVariableNames",false,...
    "Delimiter",{' ','\t','*','#'},"LeadingDelimitersRule",'ignore',...
    'EmptyLineRule','skip');
Tbl.Properties.VariableNames = {'Day' 'MaxT' 'MinT' 'AvgT'};
end

Jonathon Klepatzki on 7 Feb 2024

Moved: Cris LaPierre on 8 Feb 2024

Everything that was displayed has been provided.

Answer 4

Walter Roberson on 8 Feb 2024

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/2078906-how-to-ignore-special-characters-and-retrieve-the-data-prior-to-the-character#answer_1405256

To answer the original question:

An alternative way to read the files is to use FixedWidthImportOptions together with readtable() https://www.mathworks.com/help/matlab/ref/matlab.io.text.fixedwidthimportoptions.html
