How do I compare a string with a #single word?

Hello, I am trying to compare a string with a word. For example, the word ‘retro’ is in the text file, and ‘#retro’ appears in str:
str = 'It was #crazy #retro.'
word = 'retro'
How do I compare str with word, including the hashtag? I tried using
strfind(lower(str), '#line2')
but it gave me an empty vector.
Thank you.
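One likely reason for the empty result: '#line2' is a literal character vector, so strfind searches for the text #line2 itself rather than a # followed by the contents of a variable. A minimal sketch, assuming the word is held in a variable named word as in the example above, builds the search text by concatenation instead:

```matlab
str  = 'It was #crazy #retro.';
word = 'retro';

% '#line2' is searched for literally, so nothing is found.
literalHit = strfind(lower(str), '#line2');

% Concatenate '#' with the contents of the variable instead.
tagHit = strfind(lower(str), ['#' lower(word)]);

disp(isempty(literalHit))   % true: the literal text is not in str
disp(tagHit)                % index of the '#' that starts '#retro'
```

Here tagHit holds the position of the # itself, so the word starts one character later.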

Accepted Answer

Guillaume on 27 Feb 2015
Edited: Guillaume on 28 Feb 2015
All of the solutions proposed so far have the problem that they'll find the hashtag #retro inside the hashtag #retrorocket, which I don't think is wanted.
At the end of the day, a very good parser for strings was invented long ago: it's called a regular expression. Here is a way to get your matches without the need for a loop:
hiplist = {'denim'; 'vinyl'; 'retro'};
teststr = 'the denim #vinyl was #crazy #retro but the #retrorocket went backward';
% build regular expression from hiplist:
regpattern = sprintf('\\<(?<=#)(%s)\\>', strjoin(hiplist, '|'));
matches = regexp(teststr, regpattern, 'match')
The pattern I've built only matches a word from the list when it comes directly after a # and ends at a word boundary, so #retro is found but #retrorocket is not. Because \> is a word boundary rather than a whitespace test, a hashtag followed by a punctuation mark is still detected; requiring surrounding whitespace instead is a fairly small change to the regex if wanted.
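To make the generated pattern easier to inspect, here is a short sketch that prints the expanded expression and the matches for the test string from the answer. The only change is that hiplist is written as a row cell array, which is an assumption made to stay within the documented input shape of strjoin.

```matlab
hiplist = {'denim', 'vinyl', 'retro'};
teststr = 'the denim #vinyl was #crazy #retro but the #retrorocket went backward';

% sprintf turns each '\\' into a single backslash, so the pattern becomes
% \<(?<=#)(denim|vinyl|retro)\>
regpattern = sprintf('\\<(?<=#)(%s)\\>', strjoin(hiplist, '|'));
disp(regpattern)

% 'match' returns the matched text. The lookbehind (?<=#) requires a '#'
% just before the word but does not include it in the match.
matches = regexp(teststr, regpattern, 'match');
disp(matches)
```

The plain word denim is skipped because it has no # in front of it, and #retrorocket is skipped because retro is not followed by the end of a word.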
  8 Comments
Kratos on 28 Feb 2015
Okay, so here is the whole question for my code:
The number of phony hipsters is on the rise, and you don’t want that to encroach on your genuine creativity and frugal means of clothing yourself. So, like any true hipster would, you write a MATLAB function to determine how much of a fake hipster someone is. The function will input the name of a “.txt” file containing the text of a conversation they had and a second “.txt” filename containing words to check for in the conversation. Your function should award one faux-hipster point for each occurrence of the “hip” words in the text conversation.
You should also award one faux-hipster point for every hashtag used in the conversation. A hashtag is defined as the pound (#) followed immediately by any non-space character.
Finally, you are able to have overlapping points. For example, if the word ‘denim’ were in the hip words file, and ‘#denim’ appeared in the conversation file, you would need to award 2 points because it is a hipster word and a hashtag. Finding word matches should NOT be case sensitive (‘scarf’ is the same as ‘SCARF’).
This is what I am supposed to do.
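The two definitions in the assignment text (a hashtag is a # followed immediately by a non-space character, and word matching ignores case) can each be expressed as a one-line regexp call. This is only a sketch of those two rules, not the full function: the conversation text and the word 'scarf' are made-up examples, and counting the word as a plain substring is an assumption.

```matlab
conversation = 'My SCARF is #vintage and so is my #Scarf';   % made-up text

% A hashtag: '#' followed immediately by a non-space character.
% With no output option, regexp returns the start index of each match.
hashtagStarts = regexp(conversation, '#\S');
numHashtags = numel(hashtagStarts);

% Case-insensitive occurrences of one hip word (substring match).
wordStarts = regexp(conversation, 'scarf', 'ignorecase');
numWordHits = numel(wordStarts);

disp([numHashtags numWordHits])
```

Because the two counts are taken independently, #Scarf contributes to both, which is the overlapping-points behaviour the assignment describes.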
Guillaume on 28 Feb 2015
I've given you 99% of the solution in my last answer. You only need to slightly modify one of the regular expressions (just by removing part of it), make it case insensitive (the doc explains how), and add one line to calculate the score, and you're done.
As it is an assignment, I'm not going to help you any further.


More Answers (2)

Image Analyst on 27 Feb 2015
Not sure exactly what you're looking to do, so I'll just offer some possibilities:
str = 'It was #crazy #retro.'
word = 'retro'
hashLocations = str == '#' % Logical vector
hashIndexes = find(hashLocations) % Actual index numbers.
location = strfind(lower(str), '#retro') + 1 % Skips past #
location = strfind(lower(str), word)
In command window:
hashLocations =
0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
hashIndexes =
8 15
location =
16
location =
16
  1 Comment
Kratos on 27 Feb 2015
Okay so here is the question: if the word ‘denim’ were in the hip words file, and ‘#denim’ appeared in the conversation file, you would need to award 2 points because it is a hipster word and a hashtag.
I did it like this
if strfind(lower(line1), line2)
    a = length(strfind(lower(line1), line2));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
elseif strfind(lower(line1), '#')
    a = length(strfind(lower(line1), '#'));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
elseif strfind(lower(line1), )
    a = length(strfind(lower(line1), ));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
else
I am missing the second elseif. I don't know how to compare the hashtag and the word.
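One thing to note about the structure above: in an if/elseif chain only the first branch whose condition is true runs, so a line that contains both the word and a # can never score for both. A minimal sketch of the alternative is to take the two counts independently and add them. The values of line1 and line2 here are made-up examples, and counting every # character is a simplification of the assignment's hashtag rule.

```matlab
line1 = 'It was #Crazy #denim, real denim.';   % made-up conversation line
line2 = 'denim';                                % made-up hip word

% Count every occurrence of the word, ignoring case.
wordPoints = numel(strfind(lower(line1), lower(line2)));

% Count every '#' character, whatever follows it.
hashPoints = numel(strfind(line1, '#'));

% No elseif needed: '#denim' adds to both counts.
points = wordPoints + hashPoints;
disp(points)
```

With these inputs the word is found twice and there are two # characters, so #denim ends up worth two points in total, as the assignment asks.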



Joseph Cheng on 27 Feb 2015
Edited: Joseph Cheng on 27 Feb 2015
You can try something like this, where hiplist is your hipster word dictionary. The loop tests the string for each dictionary word, then checks whether the character just before each hit (index n-1) is a pound sign, and awards points for each one that is.
hiplist = [{'denim'};{'vinyl'};{'retro'}];
teststr = 'the denim #vinyl was #crazy #retro.';
pointsawarded = 0;
for ind = 1:length(hiplist)
    det = strfind(teststr, hiplist{ind});   % start index of every occurrence
    det = det(det > 1);                     % a hit at index 1 has no character before it
    hashed = teststr(det - 1) == '#';       % true where the word directly follows '#'
    pointsawarded = pointsawarded + 2*sum(hashed);
end
disp(teststr)
disp(['got ' num2str(pointsawarded) ' points'])
Oh, and use lower so that the detection isn't case sensitive.
  5 Comments
Joseph Cheng on 27 Feb 2015
Edited: Joseph Cheng on 27 Feb 2015
Good call, I don't often deal with cell arrays that are hard-coded in.
Kratos on 27 Feb 2015
Thank you for the help, but the problem is that I have 14 different words and I have to compare each of them to the same line. Since I go through the same line 14 times, it counts the hashtags every single time. How do I make sure that I count them only once?
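One way to avoid the repeated hashtag count, sketched here under assumptions, is to move it out of the word loop: the number of hashtags does not depend on which word is being tested, so it can be computed once per line. The line and the three-word list below are made-up stand-ins for the real files, and every # is counted regardless of what follows it.

```matlab
line1   = 'the Denim #vinyl was #crazy #retro';   % made-up conversation line
hiplist = {'denim', 'vinyl', 'retro'};            % made-up word list

% Hashtags do not depend on the word being tested, so count them once.
points = numel(strfind(line1, '#'));

% Then add the hits for each word in turn.
for ind = 1:numel(hiplist)
    points = points + numel(strfind(lower(line1), hiplist{ind}));
end
disp(points)
```

Here the three hashtags are counted a single time and each of the three words is found once, so the loop length no longer affects the hashtag score.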

