tf weighting in docs

John on 2 Dec 2017

https://nl.mathworks.com/matlabcentral/answers/370566-tf-weighting-in-docs

How do I compute term frequency (the number of times each term occurs in a document) from a Notepad text file that contains multiple documents? Each document starts with an ID tag <P ID=xxx> and ends with the delimiter </P>, and I need separate statistics for each document.
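For a single document, the raw term frequency can be sketched as below. This is a minimal example, not a standard toolbox function: `termFrequency` is a hypothetical helper, and it assumes a "term" is a run of letters or digits compared case-insensitively. It tokenizes with `regexp`, maps each token to an index with `unique`, and counts the indices with `accumarray`.

```matlab
function [terms, tf] = termFrequency(docText)
%TERMFREQUENCY  Raw term counts for one document (hypothetical helper).
%   Assumption: a term is a run of letters/digits, compared case-insensitively.
    tokens = regexp(lower(docText), '[a-z0-9]+', 'match');  % cell array of terms
    if isempty(tokens)
        terms = {};
        tf = [];
        return
    end
    [terms, ~, j] = unique(tokens);   % distinct terms (sorted); j maps each token to its term
    tf = accumarray(j(:), 1)';        % tf(i) = number of occurrences of terms{i}
end
```

For example, `[t, f] = termFrequency('The cat and the hat')` gives the terms `and`, `cat`, `hat`, `the` with counts `1 1 1 2`.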
I can load the text, but my usual approach to assigning document IDs doesn't work: the IDs in the file are not contiguous, so I can't generate them from a loop counter running from 1 to n.
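One way around the non-contiguous IDs is to read them from the tags themselves instead of generating them. The sketch below is a hypothetical helper (`splitDocuments`), assuming each document looks like `<P ID=xxx> ... </P>`, the ID contains no `>`, and documents are not nested. It relies on MATLAB's `regexp` default in which `.` also matches newlines, so one document may span several lines.

```matlab
function [ids, bodies] = splitDocuments(fileText)
%SPLITDOCUMENTS  Extract each <P ID=xxx> ... </P> block (hypothetical helper).
%   fileText is the whole file as one char vector, e.g. from fileread.
%   Assumptions: IDs contain no '>', blocks are not nested, tag case may vary.
    tok = regexp(fileText, '<P ID=([^>]*)>(.*?)</P>', 'tokens', 'ignorecase');
    % Each element of tok is {id, body} for one document, in file order
    ids    = cellfun(@(t) strtrim(t{1}), tok, 'UniformOutput', false);
    bodies = cellfun(@(t) t{2},          tok, 'UniformOutput', false);
end
```

Because `ids{k}` is the ID actually written in the file, statistics for `bodies{k}` can be stored under that ID (for example in a `containers.Map`) regardless of gaps in the numbering.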
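% The notepad file has been loaded into variable C
C = C{1};                              % unwrap the outer 1x1 cell into a cell array of lines
fclose(fid);

% Count the documents: lines that contain the closing delimiter </P>
idx = strfind(C,'</P>');
n = nnz(~cellfun(@isempty, idx));

% Write one opening tag per document to DTags.txt.
% Problem: kk runs 1..n, so the tags get sequential IDs,
% not the non-contiguous IDs that appear in the file.
fileName = 'DTags.txt';
fid = fopen(fileName,'w');             % open once instead of reopening on every pass
for kk = 1:n
    str = ['<p id=',num2str(kk),'>'];
    fprintf(fid,'%s\r\n',str);
end
fclose(fid);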
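A possible fix for the loop above is to take each ID from its opening tag in `C` rather than from `kk`. This is a sketch, not a tested answer from the thread: `writeDocTags` is a hypothetical helper, and it assumes `C` is a cell array of lines, at most one `<P ID=xxx>` tag per line, and that the output format `<p id=xxx>` with `\r\n` line endings should stay as in the original code.

```matlab
function ids = writeDocTags(C, fileName)
%WRITEDOCTAGS  Write the actual document IDs found in C to fileName (hypothetical helper).
%   C is a cell array of text lines. Assumes at most one <P ID=xxx> tag per line.
    tok = regexp(C, '<P ID=([^>]*)>', 'tokens', 'once', 'ignorecase');
    hasTag = ~cellfun(@isempty, tok);              % lines that open a document
    ids = cellfun(@(t) strtrim(t{1}), tok(hasTag), 'UniformOutput', false);

    fid = fopen(fileName, 'w');
    if fid < 0
        error('writeDocTags:open', 'Cannot open %s for writing.', fileName);
    end
    for k = 1:numel(ids)
        fprintf(fid, '<p id=%s>\r\n', ids{k});     % same tag format as the original loop
    end
    fclose(fid);
end
```

Calling `ids = writeDocTags(C, 'DTags.txt')` after `C = C{1}` writes one tag per document with its real ID, so gaps such as 7, 42, 105 are preserved.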