Reading in specific column and plotting bar chart

7 views (last 30 days)
I have a text file as:
Heading A
------------------------
Heading B
GA008246-0_B_F_1852967891 X 7117
GA011810-0_B_F_1852968731 14 7380
GA017861-0_B_F_1852970072 22 7749
GA017864-0_T_R_1853027526 22 7751
GA017866-0_T_R_1853027527 22 7753
GA017875-0_B_R_1852970076 22 7755
I want to be able to plot a histogram of the 2nd column under the title Heading B. sometimes there are additonal lines under heading A.
This is what I have so far.
%Read in data file
fid = fopen('c:\myfile.txt','rt');
C = textscan (fid, '%s %s s', 'delimiter', '\t','headerlines', 1)
while (strcmp(C{1}{1}, 'Heading B') == 0)
C = textscan (fid, '%s %s %s', 'delimiter', '\t')
end
fclose(fid);
C{:,2}
But Im picking out one too early item i.e.
ans =
''
'X'
'14'
'22'
'22'
'22'
'22'
once the additional ' ' item is removed, how can I plot a bar chart showing the number of occurances of each of these int he list. i.e. in this example
X = 1 repetition 14 = 1 repetition 22 = 4 repetitions
Tanaks for any help. Jsaon

Accepted Answer

Guillaume
Guillaume on 14 Apr 2015
Edited: Guillaume on 14 Apr 2015
I would use fgetl instead of textscan to find the start of the heading B section, then use textscan to read it.
fid = fopen('c:\myfile.txt','rt');
tline = fgetl(fid);
while ~isnumeric(tline) && ~strcmp(tline, 'Heading B')
tline = fgetl(fid);
end
if isnumeric(tline) %eol reach before Heading B
error('End of file reached prematurely');
end
C = textscan (fid, '%s %s %s', 'delimiter', '\t');
To find the number of repetitions in a column of C, use the third return value of unique together with histc:
[names, ~, position] = unique(C{2})
repetitions = histc(position, 1:numel(names))
%useful for seeing the result:
table(names, repetitions)
  5 Comments
Guillaume
Guillaume on 14 Apr 2015
Oh, sorry I misunderstood. You also need to change the position and numbers of ticks (XTick property)
set(gca, 'XTickLabel', names, 'XTick', 1:numel(names))
should work.

Sign in to comment.

More Answers (1)

Star Strider
Star Strider on 14 Apr 2015
I don’t have your file, but I would change the textscan call to:
C = textscan (fid, '%s %f %f', 'delimiter', '\t','headerlines', 3)
The initial ‘X’ in column #2 will then show up as either '' or NaN, so you can eliminate it by using isempty or isnan, as appropriate.
  2 Comments
Jason
Jason on 14 Apr 2015
Edited: Jason on 14 Apr 2015
The problem is that there are sometimes lines under "Heading A", so the number of lines until I find "Heading B" is variable.
I actually want the X as well as the numbers (its to do with Chromosomes). Its actually this mixture of text and numbers in the cell array that I am finding it hard to plot a bar chart showing the frequency of each string.
I've included the txt file. Thanks
Star Strider
Star Strider on 14 Apr 2015
Edited: Star Strider on 14 Apr 2015
This works for the current file:
fidi = fopen('test1.txt');
C = textscan (fidi, '%s %s %s', 'delimiter', '\t','headerlines', 2);
C2 = C{2};
Ix = cellfun(@isempty,C2);
[C2u,ia,ic] = unique(C2(~Ix));
cnts = hist(ic,length(C2u));
figure(1)
bar(cnts)
xt = get(gca, 'XTick');
set(gca, 'XTick', xt, 'XTickLabel',C2u)
EDIT —
Added plot ...

Sign in to comment.

Categories

Find more on Labels and Annotations in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!