# The number of occurrences of each character of one string in another

hiva on 28 Dec 2014
Edited: Luuk van Oosten on 24 Jan 2015
I have a string of more than 100 characters (the FASTA format of a protein sequence, shortened here for simplicity) and I want to find out whether or not it is hydrophobic. So I have to count the occurrences of each of the characters in the set 'A C F I L M P V W Y' (the hydrophobic amino acids) in my FASTA string. Considering how long FASTA strings are, is there an easy way to do that with MATLAB string functions?

Azzi Abdelmalek on 28 Dec 2014
Edited: Azzi Abdelmalek on 28 Dec 2014
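p = {'A' 'C' 'F' 'I' 'L' 'M' 'P' 'V' 'W' 'Y'}';   % hydrophobic residues as a column cell array
out = [p cellfun(@(x) nnz(ismember(str,x)), p, 'UniformOutput', false)]   % column 1: residue, column 2: its count in str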
Stephen23 on 30 Dec 2014
Edited: Stephen23 on 30 Dec 2014
This could be simplified and sped up by using arrayfun instead of cellfun, and removing the ismember:
>> t = 'ACFILMPVWY';
>> arrayfun(@(x)sum(str==x), t)
ans =
6 2 4 6 13 2 7 7 1 7
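To see why the comparison works, here is a minimal sketch on a sequence short enough to count by hand. The 14-residue string is made up for illustration and is not the original poster's sequence.

```matlab
% Hypothetical 14-residue sequence, short enough to check by hand.
str = 'MKVLAAGIVGLCAW';
t   = 'ACFILMPVWY';                         % hydrophobic residue codes

mask   = (str == 'A');                      % logical row vector, true where str is 'A'
nA     = sum(mask);                         % number of 'A' residues
counts = arrayfun(@(x) sum(str == x), t);   % one count per character of t

% Print each residue next to its count.
for k = 1:numel(t)
    fprintf('%c: %d\n', t(k), counts(k));
end
```

The comparison is case-sensitive, so a sequence stored in lowercase would need upper(str) first.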

Peter Perkins on 29 Dec 2014
Another possibility:
>> t = 'ACFILMPVWY';
>> n = hist(double(s),1:90);
>> n(t)
ans =
6 2 4 6 13 2 7 7 1 7
Jan on 30 Dec 2014
This is a histogram problem, so histc is an efficient and direct solution.
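A sketch of what the histc version might look like, assuming an uppercase sequence (character codes 65 to 90). The sequence is a made-up example. Newer MATLAB releases recommend histcounts over histc, so the equivalent call is shown as well.

```matlab
s = 'MKVLAAGIVGLCAW';                       % hypothetical sequence, uppercase assumed
t = 'ACFILMPVWY';                           % hydrophobic residue codes

% histc with integer edges: n(k) is the number of characters with code k.
n      = histc(double(s), 1:90);
counts = n(double(t));                      % pick out the bins for the residues in t

% Same counts with histcounts (R2014b and later), using half-integer bin edges.
n2      = histcounts(double(s), 0.5:1:90.5);
counts2 = n2(double(t));
```

Characters with codes above 90, such as lowercase letters, fall outside the bins and are not counted.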

Luuk van Oosten on 24 Jan 2015
Edited: Luuk van Oosten on 24 Jan 2015
I reckon you are using the Bioinformatics Toolbox. In that case you can probably use:
All = aacount(SEQ)
where SEQ is your amino acid sequence, and using
nr_A = All.A
nr_C = All.C
nr_F = All.F
etc. (you get the idea)
you get the numbers of your hydrophobic residues. Sum these and you have your hydrophobic score. You might want to 'normalize' this number by dividing it by the total number of amino acids in the sequence.
Of course you can write a loop for this and calculate the hydrophobic score for all the sequences in your FASTA file.
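The loop and the normalization could be sketched as follows. This version swaps aacount for plain ismember so that it runs without the toolbox. The cell array seqs and its three sequences are hypothetical stand-ins for whatever was read from the FASTA file.

```matlab
% Hypothetical sequences standing in for the entries of a FASTA file.
seqs = {'MKVLAAGIVGLCAW', 'DEKRNQSTGH', 'ACFILMPVWY'};
t    = 'ACFILMPVWY';                        % hydrophobic residue codes

score = zeros(1, numel(seqs));              % normalized hydrophobic score per sequence
for k = 1:numel(seqs)
    s = seqs{k};
    nHydrophobic = nnz(ismember(s, t));     % residues of s that appear in t
    score(k) = nHydrophobic / numel(s);     % fraction of hydrophobic residues
end
```

Each score lies between 0 and 1. Where to put the cutoff for calling a sequence hydrophobic is a separate decision that this sketch does not make.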

Shoaibur Rahman on 28 Dec 2014
numA = sum(s=='A')
numC = sum(s=='C')
numF = sum(s=='F')
numI = sum(s=='I')
numL = sum(s=='L')
numM = sum(s=='M')
numP = sum(s=='P')
numV = sum(s=='V')
numW = sum(s=='W')
numY = sum(s=='Y')
hiva on 29 Dec 2014
Very simple and elegant. Thanks a lot.
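The ten separate variables in the answer above can also be collected in one struct with dynamic field names. This is only a sketch: the struct name num and the sample sequence are made up for illustration.

```matlab
s = 'MKVLAAGIVGLCAW';                       % hypothetical sequence
t = 'ACFILMPVWY';                           % hydrophobic residue codes

num = struct();
for k = 1:numel(t)
    num.(t(k)) = sum(s == t(k));            % e.g. num.A, num.C, ... via dynamic field names
end
```

After the loop, num.A plays the role of numA, num.C of numC, and so on.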

Stephen23 on 30 Dec 2014
Edited: Stephen23 on 30 Dec 2014
A neat solution using bsxfun:
>> t = 'ACFILMPVWY';
>> sum(bsxfun(@eq,s.',t))
ans =
6 2 4 6 13 2 7 7 1 7
hiva on 30 Dec 2014
Edited: hiva on 30 Dec 2014
Wow!!! Just wonderful. It works pretty well. Thanks a lot.
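For reference, a sketch of how the bsxfun answer works, using a made-up sequence. The column s.' is compared against the row t, giving a numel(s)-by-numel(t) logical matrix whose column sums are the counts. From R2016b on, implicit expansion lets == do the same without bsxfun.

```matlab
s = 'MKVLAAGIVGLCAW';                       % hypothetical sequence
t = 'ACFILMPVWY';                           % hydrophobic residue codes

% Entry (i,j) of the comparison is true when s(i) equals t(j).
counts_bsx = sum(bsxfun(@eq, s.', t), 1);

% Same result with implicit expansion (R2016b and later).
counts_imp = sum(s.' == t, 1);
```

Passing the dimension 1 to sum explicitly keeps the result correct even for a one-character sequence, where the comparison is a row vector and a plain sum would collapse it to a scalar.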