How to search for multiple words anywhere in a sentence?

I want to search for three words ("battery", "power", "failure"). All three must exist in the sentence, in any order, for the cell to be copied.
I tried:
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:); %save rows which didn't contain
but it finds any cell that contains at least one of the three words.
How can I search for the cells that contain all three words, in any order?

2 Comments

Where is Per Isakson's comment?
Is there any function that does this instead of regexpi?

Answers (3)

The most straightforward way, it seems to me, is to do the regexp search three times, once for each word, and then copy the cells where all three match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match like you have done.
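For what it is worth, an "and" match can be approximated in a single pattern with lookahead assertions, which regexp and regexpi support. The following is only a minimal sketch on a made-up cell array `sentences` (not the asker's data), and it says nothing about whether this is faster than three simple searches:

```matlab
% Hypothetical sample data, for illustration only.
sentences = {'Battery power failure detected', ...
             'power failure only', ...
             'failure of the POWER unit, battery low'};

% Each (?=.*word) is a lookahead evaluated from the start of the string,
% so all three words must occur somewhere, in any order.
pat = '^(?=.*battery)(?=.*power)(?=.*failure)';

% With 'once', regexpi returns one start index (or []) per cell.
hasAll = ~cellfun('isempty', regexpi(sentences, pat, 'once'));
```

`hasAll` is a logical vector with one entry per sentence, so it can be used as a mask in the same way as `idx` in the question.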

2 Comments

Thanks for your idea, but that takes more time.
Thanks to you all...
I took your advice "to do the regexp search three times, once for each word"
and tried this:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];
but I am still wondering: maybe the three words do not exist in the same cell.
I mean, for example, column 126 = battery, 127 = power, 128 = failure.
But overall the code now looks good :)
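That concern is justified: because each step collapses the columns with any(idx,2), a row is kept when each word occurs somewhere in columns 126 to 130, not necessarily in the same cell. One possible way to require all three words in a single cell is to build one logical mask per word and AND the masks before collapsing to rows. A sketch on a small hypothetical `alldata`, with the text in columns 2:3 instead of 126:130:

```matlab
% Hypothetical stand-in for alldata; text columns are 2:3 here, not 126:130.
alldata = {1, 'battery power failure', 'ok'; ...
           2, 'battery',               'power failure'; ...
           3, 42,                      'Failure of battery power'; ...
           4, 'power',                 []};
D = alldata(:, 2:3);
words = {'battery', 'power', 'failure'};

isText = cellfun('isclass', D, 'char');   % only char cells can be searched
idx = isText;                              % start from all text cells
for k = 1:numel(words)
    hit = false(size(D));
    hit(isText) = ~cellfun('isempty', regexpi(D(isText), words{k}, 'once'));
    idx = idx & hit;                       % cell must contain every word so far
end

data    = alldata(any(idx, 2), :);         % rows with a cell holding all three words
Notdata = alldata(~any(idx, 2), :);        % all remaining rows
```

In this sketch row 2 ends up in `Notdata`, because its words are spread over two cells, whereas the sequential filtering above would keep it.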

Try this
sentence_1 = 'abc battery def power ghi failure';
typo_str_1 = 'abc battery def power ghi faiXure';
sentence_2 = 'Battery def power ghi failure.';
typo_str_2 = 'abc Xbattery def power ghi failure';
words = {'battery','power','failure'};
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words );
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words );
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words );
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words );
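Each of is1 to is4 is a 1-by-3 logical vector with one flag per word, so a sentence qualifies only when all of its flags are true. A short sketch of that final step, wrapping the same expression in a helper handle (the name `hasAllWords` is made up here):

```matlab
words = {'battery','power','failure'};

% True only if every word occurs as a whole word, ignoring case.
hasAllWords = @(s) all(cellfun( ...
    @(w) ~isempty(regexpi(s, ['\<', w, '\>'], 'once')), words));

r1 = hasAllWords('abc battery def power ghi failure');   % all three present
r2 = hasAllWords('abc battery def power ghi faiXure');   % "failure" misspelled
r3 = hasAllWords('Battery def power ghi failure.');      % capitalised, trailing dot
r4 = hasAllWords('abc Xbattery def power ghi failure');  % "Xbattery" is not a whole word
```

Only r1 and r3 come out true. The \< and \> anchors are what reject "Xbattery"; dropping them would turn this into a plain substring match.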
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
1 0 0 1 0 0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
function has_all_three = cssm( N )
sentence_1 = 'Abc battery def power ghi failure.';
typo_str_1 = 'Abc battery def power ghi faiXure.';
multistr_1 = 'Abc battery def power ghi battery.';
sentence_2 = 'Battery def failure ghi power jkl.';
typo_str_2 = 'Abc Xbattery def power ghi failure';
multistr_2 = 'Abc power def power ghi power jkl.';
%
test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
%
text_corp = repmat( test_sentences, [N,1] );
tic
cac = regexpi( text_corp, '\<(battery|power|failure)\>', 'match' ); % group the alternation so \< and \> apply to every word
has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
toc
end

12 Comments

Thanks, but that's not what I want.
I have about (57000*6 cells).
"... but that's not what I want"
Then you need to better explain what you want. And also explain why my hint isn't useful to you.
I only need to modify this line:
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
so that it finds the cells that contain all three words in any arrangement,
while keeping the structure of the rest of the code.
Because he wants a magic solution.
No, my friend. I didn't want a magic solution.
I only want to solve this problem
I tried:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
exp = {'battery';...
'failure';...
'power'};
idx(idx)=~cellfun('isempty',regexpi(D(idx),exp,'match')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
but it didn't work, with an error:
??? Error using ==> regexpi
Multiple strings and patterns given to regexpi must have
the same quantity.
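The error arises because, when both the strings and the patterns are cell arrays, regexpi pairs them element by element and therefore expects the same number of each. It does not test every pattern against every string. One way around this, sketched here on hypothetical data, is to loop over the patterns and pass one char pattern at a time:

```matlab
% Hypothetical text cells standing in for D(idx).
strs = {'battery power failure'; 'power only'; 'Failure, Power, Battery'};
patterns = {'battery'; 'failure'; 'power'};

found = true(size(strs));
for k = 1:numel(patterns)
    % A single pattern (char vector) against many strings is allowed.
    found = found & ~cellfun('isempty', regexpi(strs, patterns{k}, 'once'));
end
```

The variable is called `patterns` rather than `exp` in this sketch because `exp` is also the name of a built-in function.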
The task is: "search for three words "Battery, power, failure" the three must exist in the sentence in any order". Is that correct?
"I have about (57000*6 cell)" How is that cell array related to alldata(:,126:130)? So, with one sentence per cell, you have 0.342 million sentences(?). What is an acceptable execution time?
"I only need to modify this line:" You need at least to explain what you expect the line to do! Why should I guess?
"I only want to solve this problem" What problem? Why only? What makes you think that it is even possible to accomplish the task with code along the lines you propose? I don't think it is possible!
btw: should "Xbattery" match "battery"?
Thank you, Per Isakson, for your contribution.
I already use this code, searching for one word, on the whole file (57000*6 cells), and it works.
I now want to search for the three words (as I explained above),
and I mean by modifying only this line (as I mentioned above):
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
I am now asking is it possible to modify the code or not? Or, in other words, is there any function I can use to search for multiple strings instead of regexpi?
Thanks in advance.
"I am now asking is it possible to modify the code or not? " I repeat: I don't think it is possible!
Three words in any order is a tough job for regexp. "to do the regexp search three times, once for each word" is a sound approach and I cannot understand why you dismissed it.
I agree with Per, and I am adding that it is often more efficient to make multiple calls of REGEXP(I) that involve simple patterns, than to make a single call that involves a rather complex pattern.
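Whether that holds for a given data set is easy to measure. The following is a rough timing sketch on a made-up corpus; the lookahead pattern is just one example of a "rather complex pattern", and no particular outcome is implied:

```matlab
% Hypothetical corpus: three sample sentences repeated many times.
base = {'battery power failure'; 'power failure'; 'failure power battery'};
corpus = repmat(base, 1000, 1);
words = {'battery', 'power', 'failure'};

tic                                        % three simple searches
a = true(size(corpus));
for k = 1:numel(words)
    a = a & ~cellfun('isempty', regexpi(corpus, words{k}, 'once'));
end
tSimple = toc;

tic                                        % one lookahead-based pattern
b = ~cellfun('isempty', regexpi(corpus, ...
    '^(?=.*battery)(?=.*power)(?=.*failure)', 'once'));
tComplex = toc;

assert(isequal(a, b))                      % both approaches should agree
fprintf('simple: %.4f s, complex: %.4f s\n', tSimple, tComplex);
```

Timings from a single tic/toc pair are noisy, so repeating the measurement a few times gives a fairer comparison.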
I added a new code to my answer.

This works:
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:);
%2nd word
D2=data(:,126:130);
idx2 = cellfun('isclass',D2,'char');
idx2(idx2)=~cellfun('isempty',regexpi(D2(idx2),'power')) ;
data2 = data(any(idx2,2),:);
Notdata2 = data(~any(idx2,2),:);
%3rd word
D3=data2(:,126:130);
idx3 = cellfun('isclass',D3,'char');
idx3(idx3)=~cellfun('isempty',regexpi(D3(idx3),'failure')) ;
data3 = data2(any(idx3,2),:);
Notdata3 = data2(~any(idx3,2),:);
NotdataALL=[Notdata;Notdata2;Notdata3];

1 Comment

This can be simplified as developed in my answer. I move it below as a comment:
Here is an alternate solution:
keywords = {'battery', 'power', 'failure'} ;
allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
'V_batterypowerfailure', 'I_atterypowerfailure'; ...
'I_batterypowerfailre', 'V_batterypowerfailure'} ;
ids = 1 : numel( allCells ) ;
for k = 1 : numel( keywords )
isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
ids = ids(isFound) ;
end
validCells = allCells(ids) ;
You'll notice that it works on a pool of cells which reduces with the keyword index (as when a keyword is not found, there is no point in testing the others). I started valid entries of the dummy data set with V_ and invalid entries with I_ to simplify the final check.
If you need a case-insensitive solution, replace
strfind( allCells(ids), keywords{k} )
with
regexpi( allCells(ids), keywords{k}, 'once' )
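Put together, the case-insensitive variant of the loop might look like the sketch below. The mixed-case test strings are made up for illustration, following the same V_/I_ naming convention:

```matlab
keywords = {'battery', 'power', 'failure'};
% Hypothetical mixed-case data; V_ marks entries expected to pass, I_ to fail.
allCells = {'V_BatteryPowerFailure', 'I_batterypwerfailure'; ...
            'V_batterypowerFAILURE', 'I_atterypowerfailure'};

ids = 1:numel(allCells);                   % linear indices of candidate cells
for k = 1:numel(keywords)
    % 'once' stops at the first match, which is all that is needed here.
    isFound = ~cellfun('isempty', regexpi(allCells(ids), keywords{k}, 'once'));
    ids = ids(isFound);                    % shrink the pool before the next keyword
end
validCells = allCells(ids);
```

As in the strfind version, `ids` holds linear indices into `allCells`, so `[r, c] = ind2sub(size(allCells), ids)` recovers the rows and columns if they are needed.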

Asked: 19 Sep 2015

Commented: 22 Sep 2015
