Operations on variables with specific naming patterns

I am reading in a large number of text files. The files contain numerical values that I want, and text that I do not. Using the matlab.lang.makeValidName command, I am able to save the numbers into arrays with names like:
  • A123
  • B123
  • C123
  • A234
  • B234
  • C234
  • A345
  • B345
  • C345
  • ...
(It's a lot more complex than this in reality. The variable names are a combination of the filename from which the data was read and the name of the values from the file...but let's try to keep it simple in the example! :)
What I am trying to do now is to run calculations on each of the variables with "A" in the name. Using who('-regexp','A') I get a cell array that contains the names of all of the variables in my workspace with "A" in the name, but I can't quite figure out what to do next with that data.
If I wanted to add all of the variables with "A" in the name, what would the proper command be? Likewise, if I wanted to create a much larger matrix [A123 ; A234 ; A345 ...], what would that command be? All of the "A" variables have the same size, so there's nothing to worry about there.
Thanks for the help!
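A minimal sketch of both operations, assuming the arrays are kept as fields of one structure `S` (a hypothetical name) instead of as separate workspace variables, and that every "A" array is a row vector of the same length. The field values below are placeholders.

```matlab
% Hypothetical placeholder data: one field per array, all row vectors of equal length
S.A123 = [1 2 3];
S.B123 = [4 5 6];
S.A234 = [7 8 9];
S.B234 = [1 1 1];
S.A345 = [2 2 2];

names   = fieldnames(S);              % cell array of field names, in creation order
isA     = strncmp(names, 'A', 1);     % true for names that start with "A"
A_names = names(isA);

% Collect the "A" arrays in a cell array using dynamic field references
A_vals = cellfun(@(f) S.(f), A_names, 'UniformOutput', false);

A_stacked = vertcat(A_vals{:});       % the equivalent of [A123; A234; A345]
A_sum     = sum(A_stacked, 1);        % element-wise sum of all "A" arrays
```

Because the names are only looked up through `fieldnames` and `S.(f)`, no `eval` is needed, and the same lines work for any number of fields.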
Comments
James on 21 Jul 2017
Stephen,
Thanks for your input. My "LOAD ALL THE FILES" comment was the easy way of saying "load all of the data in the files, put them into one really big cell, and manipulate them from there" which matches up with your "you should put them into one numeric array if possible" comment.
The files do contain the same variables, but they may be in a different order. Some files may store them in the order ABC, while others generated by a different person may have BAC or CBA. This is why I can't just say "Take every third row in each of the files, and make that Array1."
Dynamically naming structure fields looks helpful. https://blogs.mathworks.com/loren/2005/12/13/use-dynamic-field-references/ in particular looks like something that I could use, and still (presumably) practice good coding and avoid the evil eval command.
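A sketch of matching variables by name rather than by position, assuming each file has already been parsed into a cell array of labels plus a numeric matrix with one row per label (`labels` and `values` are hypothetical names, and the data are made up).

```matlab
% Canonical order in which the variables should be stored
wanted = {'A', 'B', 'C'};

% Hypothetical parse result from one file whose rows came in B, A, C order
labels = {'B'; 'A'; 'C'};
values = [20 21 22; 10 11 12; 30 31 32];

% For each wanted name, find its row in this file (0 if it is missing)
[found, idx] = ismember(wanted, labels);
assert(all(found), 'File is missing one or more expected variables');

ordered = values(idx, :);   % rows now follow the A, B, C order
```

With this, "take row k" always means the same variable, whatever order the file used.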
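A short sketch of the dynamic field reference syntax `S.(name)` described in that post, combined with `matlab.lang.makeValidName` to turn an arbitrary label into a legal field name. The raw label and values are made up for illustration.

```matlab
S = struct();                                   % start with an empty structure

% Hypothetical metadata taken from a file name and a value label
rawName = 'run-123 A';
field   = matlab.lang.makeValidName(rawName);   % convert to a legal field name

S.(field) = [1.5 2.5 3.5];                      % create the field dynamically
x = S.(field);                                  % read it back the same way

% Check that a field exists before using it
hasIt = isfield(S, field);
```

Note that this still stores metadata in names, which is the concern raised in the next comment.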
Guillaume on 21 Jul 2017
In my opinion, dynamically named fields are just as bad as eval. You're still encoding metadata in variable (field) names, and this is not the way you should solve your problem.
There are two orthogonal issues at hand:
  • Parsing of the files, so that whatever order the variables come in, you know what they are
  • Storage of these variables and of their metadata
The first one can be solved any number of ways, with more or less complexity depending on how robust you want your parser to be.
For storage, dynamically named anything is not a good idea. If speed is the focus, then, as Stephen said, the simplest storage is the best: matrices or cell arrays. Otherwise, you could get fancier with maps (unfortunately, not very well implemented in MATLAB) or other containers.
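For comparison, a minimal sketch of the map option using MATLAB's built-in `containers.Map`, with char keys and arbitrary array values. The keys and values are placeholders.

```matlab
% Map from a data-set name (char key) to its numeric array (any value type)
M = containers.Map('KeyType', 'char', 'ValueType', 'any');
M('A123') = [1 2 3];
M('B123') = [4 5 6];
M('A234') = [7 8 9];

allKeys   = keys(M);                        % keys are returned in sorted order
isA       = strncmp(allKeys, 'A', 1);
A_vals    = values(M, allKeys(isA));        % cell array of the "A" arrays
A_stacked = vertcat(A_vals{:});
```

The names here are data (keys), not variable or field names, but the letter and number are still packed into one string that has to be parsed to filter on.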
Certainly when I see that you want to have variables A123, B123, A234, etc, what I read is that you need
container(A, 1, 2, 3) % where A could be categorical
container(B, 1, 2, 3)
container(A, 2, 3, 4)
It is then trivial to get all 'A' variables
container(A, :, :, :)
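One way to realize the `container(A, :, :, :)` idea in runnable MATLAB, assuming a release with `table` (R2013b or later): keep the letter as a categorical column, the numeric part of the old name as another column, and the data as a matrix column with one row per data set. Column names and values are illustrative only.

```matlab
% One row per data set: the metadata lives in columns, not in variable names
Letter = categorical({'A'; 'B'; 'A'; 'B'; 'A'});
ID     = [123; 123; 234; 234; 345];
Data   = [1 2 3; 4 5 6; 7 8 9; 1 1 1; 2 2 2];   % placeholder values
T = table(Letter, ID, Data);

% All "A" rows, the equivalent of container(A, :, :, :)
isA   = T.Letter == 'A';
A_mat = T.Data(isA, :);       % [A123; A234; A345]
A_sum = sum(A_mat, 1);        % element-wise sum of the "A" arrays

% A single data set, e.g. the one formerly called B234
b234 = T.Data(T.Letter == 'B' & T.ID == 234, :);
```

Filtering on any combination of metadata then becomes a logical index instead of a regular expression over variable names.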


Accepted Answer

per isakson on 16 Jul 2017
Edited: per isakson on 29 Jul 2017
It's easier to say that dynamically named variables and eval are not a good idea than to make recommendations about design. Nevertheless, I'll make some comments in the order they come to mind.
  • The requirements on your program will depend on whether you alone will use it a number of times over the next few weeks, or whether more users will use it over a longer period of time.
  • You describe "groups of 60 files", file sizes of "30-40 MB", and 1D arrays of 5, 7, and 12 elements. That makes zillions of 1D arrays to keep track of. Would mistakes be costly?
  • "The files contain numerical values that I want, and text that I do not." The text contains the metadata.
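  • Do you foresee other analyses on these data?
  • Each element of a cell or structure array has an overhead of a little over 100 bytes. That could add up. In the example below, the same 49 doubles take 392 bytes as a matrix but 1176 bytes as a 7x1 cell array, i.e. 112 extra bytes per cell.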
num = magic( 7 );
cac = num2cell( num, 2 );   % split the 7x7 matrix into a 7x1 cell array of rows
whos num cac
  Name      Size            Bytes  Class     Attributes
  cac       7x1              1176  cell
  num       7x7               392  double
cac{3,1}
ans =
    46     6     8    17    26    35    37
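  • I often choose structure arrays over cell arrays because they help me make the code more readable. Field names are more meaningful than numbers.
master_list = {'A123','B123','C123','A234','B234','C234','A345','B345','C345'};
% Placeholder structure with one field per name (illustration only)
S = cell2struct( num2cell( rand(numel(master_list),3), 2 ), master_list, 1 );
isA = not( cellfun( @isempty, regexp(master_list,'^A\d{3}$','match'), 'uni',true ) );
A_list = master_list( isA );
for item = A_list            % item is a 1x1 cell holding one name
    num = S.(item{:});       % dynamic field reference
    % ... operate on num here
end
% A simpler way to find the names that start with "A":
isA = strncmp( master_list, 'A', 1 );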
