How can I find the symbol given the gene id?
ID and name conversion is one of the common tasks in Bioinformatics. In this problem, you will write a function symbol=geneidtosymbol(id,filename) that will return the symbol of a gene, given its GeneID. The GeneID to symbol conversion should be looked up from a file named "gene_info.txt". Each line in this file contains tab-delimited information for a gene. The first line of the file specifies what type of information is available in each column. Download and use the file available from http://sacan.biomed.drexel.edu/ftp/bmes201/final.20123/gene_info.txt (which contains the first 100 lines of the file available from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz).
If filename is not given, use gene_info.txt. If it is given (may be different than gene_info.txt), use the filename provided as input.
Here is what I have so far:
function out = geneidtosymbol(x)
fid=fopen('gene_info.txt','r'); %open file
if fid<0
    fprintf('I am not able to open the pdb file');
    out=[];
    return;
end
symbol=[];
if ~feof(fid)
    line=fgetl(fid);
    str2num(line(3:10)) = x;
    line=strsplit(line);
    symbol=line{3};
end
out = symbol;
2 Comments
Geoff Hayes
on 27 Nov 2014
S - why does your code read only the first line of the file? Don't you get an error on the line str2num(line(3:10)) = x? Please describe what you are attempting with these lines of code.
Accepted Answer
Geoff Hayes
on 29 Nov 2014
S - lookfor is used to search for a keyword in all help entries, not to search for a substring within another string. Your line of code
lookfor(x,line)= id;
is probably generating the error "Undefined function or variable 'id'" because you are using the variable id before it has been defined. Even if it were defined, it is unclear why you are attempting an assignment. What is the intent of this line? To search the file for the gene id, loop over its lines instead, with something like:
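while ~feof(fid)
    % get the next line of the file
    line = fgetl(fid);
    % split on tabs (the leading columns are tax_id, GeneID, Symbol)
    fields = strsplit(line, sprintf('\t'));
    % compare the GeneID column exactly, so that 12 does not match 112;
    % num2str lets x be either a number or a string
    if numel(fields) >= 3 && strcmp(fields{2}, num2str(x))
        % third element is symbol
        symbol = fields{3};
        % since symbol found, exit
        break;
    end
end
% close the file
fclose(fid);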
Note that once the symbol is found we break out of the while loop and then close the file, since we assume there is only one symbol per gene id.
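As a small sketch of why comparing the GeneID column exactly is safer than searching the whole line for a substring: strfind reports a match anywhere in the line, so a short id can be found inside a longer one. The sample line below is made up for illustration and is not taken from gene_info.txt.

```matlab
% Hypothetical tab-delimited line: tax_id, GeneID, Symbol (made-up values)
line = sprintf('9606\t112\tEXAMPLE');

% Substring search: '12' is found inside '112', so this is a false match
substringHit = ~isempty(strfind(line, '12'));     % true

% Exact comparison of the second column avoids the false match
fields = strsplit(line, sprintf('\t'));
exactHitShort = strcmp(fields{2}, '12');          % false
exactHitFull  = strcmp(fields{2}, '112');         % true
```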
Make sure you adjust your code to handle a different data file, as per the instruction "If filename is not given, use gene_info.txt. If it is given (may be different than gene_info.txt), use the filename provided as input." To do that, you will need to add the input parameter filename.
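Putting the pieces together, one possible version of the whole function might look like the sketch below. It is not the only way to do it. It assumes the columns are tab-delimited in the order tax_id, GeneID, Symbol (consistent with the symbol being the third element above), and it uses nargin to fall back to gene_info.txt when no filename is passed.

```matlab
function symbol = geneidtosymbol(id, filename)
% Sketch: look up the Symbol for a GeneID in a tab-delimited gene_info file.
% Assumption: columns are tax_id, GeneID, Symbol, ... in that order.

% Use the default file when filename is not given
if nargin < 2 || isempty(filename)
    filename = 'gene_info.txt';
end

symbol = [];                          % returned as empty if the id is not found
fid = fopen(filename, 'r');
if fid < 0
    fprintf('Unable to open %s\n', filename);
    return;
end

id = num2str(id);                     % accept the id as a number or a string
while ~feof(fid)
    line = fgetl(fid);
    if ~ischar(line)                  % fgetl returns -1 at end of file
        break;
    end
    fields = strsplit(line, sprintf('\t'));
    % The header line has no matching second column, so it is skipped
    if numel(fields) >= 3 && strcmp(fields{2}, id)
        symbol = fields{3};
        break;                        % only one symbol per GeneID is assumed
    end
end
fclose(fid);
```

It can then be called either as geneidtosymbol(id) to read gene_info.txt, or as geneidtosymbol(id, 'some_other_file.txt') to read a different file (the second file name here is only a placeholder).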