Replace String in Table if Substring is Found

22 views (last 30 days)
Mark Panaro
Mark Panaro on 22 Oct 2022
Commented: dpb on 22 Oct 2022
Hi all,
I am attempting to convert a table column filled with text into categorical data. As an example, for I have a category statuette and several different cells read: "Broken statuette", "Terracotta statuette" or "Statuette". I want all of these to read Statuette. How might I approach this problem?
Thanks

Answers (2)

dpb
dpb on 22 Oct 2022
id=["Broken statuette", "Terracotta statuette","Statuette"]; % the variable
valset=[id,"Other Stuff"]; % the values allowed in the category
catnames=[repmat("statuette",1,sum(contains(id,'statuette','IgnoreCase',1))) "stuff"]; % what to display for each value 1:1
id=categorical(id,valset,catnames) % apply to the input array and...
id = 1×3 categorical array
statuette statuette statuette
This is barest/crudest way to illustrate -- the input array is what you have; the valueset is all possible values to allow in the categorical variable (can be more or less than unique(id); missing are <undefined> if in the input array). The catnames are the names to be displayed for each element of valuset on a 1:1 match.
categories(id)
ans = 2×1 cell array
{'statuette'} {'stuff' }
shows the categories for the variable consist of the two values in valueset even though the one wasn't in the input array to convert.

Image Analyst
Image Analyst on 22 Oct 2022
You forgot to attach your table. Here is one way though (untested). Assume you have a table with a column called "Statuette"
for row = 1 : height(t) % t is your table variable
if contains(t.Statuette{row}, 'Statuette', 'IgnoreCase', true)
t.Statuette{row} = 'Statuette';
end
end
If you have any more questions, then attach your data and code to read it in with the paperclip icon after you read this:
  2 Comments
Image Analyst
Image Analyst on 22 Oct 2022
I don't think you can "convert" an existing column from one data class to a different kind. You'd have to add a new column with addvars and then fill it up with the categorical values. More complicated so I just replaced the strings, which was simpler.
dpb
dpb on 22 Oct 2022
The above in an existing table could look like
isS=contains(t.Statuette,'Statuette', 'IgnoreCase', true);
t.Statuette(is)='Statuette';
t.Statuette=categorical(t.Statuette);
Doing the above loses the additional data associated with the condition of the items; that may or may not be something to preserve.
"I don't think you can "convert" an existing column from one data class to a different kind."
Au contraire, good buddy! :)
t=convertvars(t,'Statuette','categorical');
will do the deed, too. convertvars doesn't have the ability to pass the additional parameters that are available with categorical illustrated above, however; it would result in the same applied after the string conversion as above, but if used before, then unique(original variable values) would be the categories.
A very neat thing w/ addvars (and tables in general) is the vartype class that would let you do all variables of class('string') without needing to know the column names/locations.

Sign in to comment.

Categories

Find more on Characters and Strings in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!