Random amino acid sequence generation with a given amino acid count of a specified sequence
1 view (last 30 days)
Show older comments
I have a sequence
>sp|Q7RTX1|TS1R1_HUMAN Taste receptor type 1 member 1 OS=Homo sapiens OX=9606 GN=TAS1R1 PE=2 SV=1
MLLCTARLVGLQLLISCCWAFACHSTESSPDFTLPGDYLLAGLFPLHSGCLQVRHRPEVT
LCDRSCSFNEHGYHLFQAMRLGVEEINNSTALLPNITLGYQLYDVCSDSANVYATLRVLS
LPGQHHIELQGDLLHYSPTVLAVIGPDSTNRAATTAALLSPFLVPMISYAASSETLSVKR
QYPSFLRTIPNDKYQVETMVLLLQKFGWTWISLVGSSDDYGQLGVQALENQATGQGICIA
FKDIMPFSAQVGDERMQCLMRHLAQAGATVVVVFSSRQLARVFFESVVLTNLTGKVWVAS
EAWALSRHITGVPGIQRIGMVLGVAIQKRAVPGLKAFEEAYARADKKAPRPCHKGSWCSS
NQLCRECQAFMAHTMPKLKAFSMSSAYNAYRAVYAVAHGLHQLLGCASGACSRGRVYPWQ
LLEQIHKVHFLLHKDTVAFNDNRDPLSSYNIIAWDWNGPKWTFTVLGSSTWSPVQLNINE
TKIQWHGKDNQVPKSVCSSDCLEGHQRVVTGFHHCCFECVPCGAGTFLNKSDLYRCQPCG
KEEWAPEGSQTCFPRTVVFLALREHTSWVLLAANTLLLLLLLGTAGLFAWHLDTPVVRSA
GGRLCFLMLGSLAAGSGSLYGFFGEPTRPACLLRQALFALGFTIFLSCLTVRSFQLIIIF
KFSTKVPTFYHAWVQNHGAGLFVMISSAAQLLICLTWLVVWTPLPAREYQRFPHLVMLEC
TETNSLGFILAFLYNGLLSISAFACSYLGKDLPENYNEAKCVTFSLLFNFVSWIAFFTTA
SVYDGKYLPAANMMAGLSSLSSGFGGYFLPKCYVILCRPDLNSTEHFQASIQDYTRRCGS
T
I wish to get a set of 10000 sequences (in fasta format) having identical amino acid counts as in the above sequence.
I could not use properly the randseq function in getting what I need.
Any help would be highly appreciated.
0 Comments
Accepted Answer
Tim DeFreitas
on 8 Jun 2022
Edited: Tim DeFreitas
on 22 Jun 2022
If you want exactly the same amino acid counts, then you want to randomly shuffle the input sequence, which can be done with randperm:
[~, sequences] = fastaread('pf00002.fa');
targetSeq = sequences{1}; % Select specific sequence from FA file.
randomSeqs = cell(1,10000);
for i=1:numel(randomSeqs)
randomSeqs{i} = targetSeq(:, randperm(numel(targetSeq)));
end
0 Comments
More Answers (1)
Sam Chak
on 8 Jun 2022
Hi @Sk. Hassan
I'm no expert in this, but it is possible to generate a sequence of numbers that associate with the Roman alphabet.
Here is a simple script that you can modify to generate a long sequence of characters like the PASTA spaghetti form.
function amino = generateAmino(n)
ASCII_L = 65;
ASCII_U = 90;
C = round((ASCII_U - ASCII_L).*rand(n, 1) + ASCII_L);
amino = char(C');
end
On the Command Window:
S1 = generateAmino(60)
S1 =
'TGNRWYODEGVGUGXJFGPMJVPOXHTTKOCBNTXDOMAIEUINEPHQRTLCGXEVNZCL'
You can then use a for loop to generate the number of sequences that you want.
numOfSeq = 10000;
for i = 1:numOfSeq
S(i,:) = generateAmino(60);
end
S
That's the basic idea. You may need to modify the script to select certain Roman alphabets in the ASCII chart.
0 Comments
See Also
Categories
Find more on Sequence Alignment in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!