How to structure input data as what I need?

I have a text file with content like this:
35.921 M, 220,000, 48.509 M, 6.349 M, 54.052 M, 806,362, 41.018 M, 4.006 M, 41.564 M, 3.46 M, 40.02 M, 100,000, 39.22 M, 900,000, 36.974 M, 30,000, 36.549 M, 455,500, 35.739 M, 230,000, 33.104 M, 2.866 M, 19.627 M, 0, 18.877 M, 750,000, 19.603 M, 4.379 M, 23.982 M, 0, 10.186 M, 250,000, 10.436 M, 0, 15.414 M, 250,500, 15.465 M, 200,000, 21.571 M, 665,000, 22.236 M, 0, 15.537 M, 250,000, 15.787 M, 0, 21.422 M, 221,004, 21.243 M, 400,000, 30.662 M, 2.375 M, 33.036 M, 0, 39.287 M, 455,000, 39.742 M, 0, 53.141 M, 6.11 M, 59.131 M, 120,000, 23.587 M, 255,000, 23.842 M, 0, 17.043 M, 255,000, 17.298 M, 0, 33.51 M, 1.25 M, 34.75 M, 10,000, 36.408 M, 15,000, 28.248 M, 8.175 M, 32.367 M, 480,000, 31.535 M, 1.312 M, 54.773 M, 2.68 M, 43.936 M, 13.517 M, 58.955 M, 2.54 M, 41.234 M, 20.262 M, 45.222 M, 15,000
I would like to arrange these values into an N-by-4 matrix like this:
35.921 M, 220,000, 48.509 M, 6.349 M
54.052 M, 806,362, 41.018 M, 4.006 M
41.564 M, 3.46 M, 40.02 M, 100,000
How can I do this in MATLAB?
I have attached my file to my question.

2 Comments

@reza: please upload the text file by clicking the paperclip button.
I have attached my file to my question.


Accepted Answer

Stephen23 on 20 Jan 2019
Edited: Stephen23 on 21 Jan 2019
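str = fileread('store.txt');               % read the whole file as one char vector
str = strtrim(str);                        % drop leading/trailing whitespace
C = regexp(str,'\s*,\s*','split');         % split on every comma
[fid,msg] = fopen('store2.txt','wt');      % open the output file in text mode
assert(fid>=3,msg)                         % stop with fopen's message on failure
fprintf(fid,'%s, %s, %s, %s, %s\n', C{:}); % write the tokens, five per line
fclose(fid);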
EDIT: see the comments for further developments of this code.

8 Comments

reza on 21 Jan 2019
Edited: reza on 21 Jan 2019
Hello Stephen, and thank you for your help. I ran your code as a script, but it seems I get a 1*902 matrix C instead of the 226*4 matrix that I need!
Stephen23 on 21 Jan 2019
Edited: Stephen23 on 21 Jan 2019
@reza: ah, I misread your question and thought that you wanted to get a file as the output. If you just want to get an array of those strings/char vectors then you can just reshape C. You will also need to pad/shorten C to ensure that its number of elements is divisible by four.
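A minimal sketch of that reshape, assuming C is a 1-by-N cell array of char vectors like the one returned by regexp. The sample tokens are made up for illustration, and padding with empty char vectors (rather than shortening) is just one possible choice:

```matlab
% Hypothetical tokens standing in for the C returned by regexp(...,'split').
C = {'a','b','c','d','e','f','g','h','i','j'};  % 10 elements, not divisible by 4

ncol = 4;
npad = mod(-numel(C), ncol);   % how many elements the last row is missing
C(end+1:end+npad) = {''};      % pad with empty char vectors (an assumption)

% reshape fills column by column, so build ncol-by-N and transpose.
C4 = reshape(C, ncol, []).';
```

Here C4 is a 3-by-4 cell array whose last row ends with two empty entries. To shorten instead of pad, the trailing mod(numel(C),ncol) elements could be dropped before reshaping.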
Thanks again dear Stephen! But I have 2 questions.
  1. In the C array there is a problem. For example, columns 2 and 3 are 248 and 000, but they should be a single column: 248000. How can I fix that?
  2. In some columns there are M letters that mean million, but I like to change them to thousands. For example the column 1 of C should be converted from 47.059 to the 47059000. How can I do that?
Stephen23 on 21 Jan 2019
Edited: Stephen23 on 21 Jan 2019
"In the C array there is a problem"
Actually the problem is the use of commas in the file. It appears that sometimes you want to treat the comma as the field delimiter and sometimes as a thousands separator. This is, in simple terms, a badly designed file format.
"How can I fix that?"
The best solution is to not use the same character for both the field delimiter and also the thousands separator.
But we might be able to work with this badly designed file...
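One observation that may make this workable, judging only from the sample quoted in the question: the commas that separate fields are followed by a space, while the commas inside numbers are not. A small sketch of the difference, assuming that pattern holds for the whole file (the variable names are illustrative):

```matlab
% One row of the sample from the question: four fields.
line1 = '35.921 M, 220,000, 48.509 M, 6.349 M';

% Splitting on every comma also breaks 220,000 apart.
tokAll = regexp(line1, '\s*,\s*', 'split');

% Assumption: field delimiters are always followed by whitespace and
% thousands separators never are, so split on comma-plus-whitespace only.
tokFld = regexp(line1, ',\s+', 'split');

fprintf('%d tokens vs %d tokens\n', numel(tokAll), numel(tokFld));
```

For this row the first split gives five tokens and the second gives the four intended fields. If the real file ever contains a delimiter without a following space, this rule breaks down.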
"In some columns there are M letters that mean million, but I like to change them to thousands. For example the column 1 of C should be converted from 47.059 to the 47059000"
Your example "from 47.059 to the 47059000" shows M == one million, so it is not clear what "thousands" have to do with anything.
In any case, we can import the file data, converting the 'M' millions symbols to standard E-notation using regexprep, and handling the thousands separators using str2double:
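str = fileread('store.txt');              % read the whole file as one char vector
C = regexp(strtrim(str),',\s+','split');  % split on comma followed by whitespace
C = regexprep(C,'\s*M','e6');             % e.g. '47.059 M' becomes '47.059e6'
M = str2double(C);                        % accepts commas as thousands separators
M = reshape(M,4,[]).';                    % four values per row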
Giving:
>> M
M =
47059000 248000 40475000 6832000
15404000 478714 15277000 605000
9253000 9599000 10751000 8102000
8518000 1115000 9417000 215937
15647000 5987000 18284000 3351000
21848000 1501000 21648000 1701000
30845000 1300000 30663000 1482000
9914000 5474000 9938000 5450000
10775000 250000 10995000 30000
21328000 2027000 22315000 1040000
19588000 1540000 21048000 80000
12554000 418000 11504000 1468000
14980000 1299000 16135000 144008
10878000 502040 11378000 2040
10012000 275000 10287000 0
11992000 500000 11707000 785244
16492000 820000 17056000 256241
19639000 378384 20017000 0
13781000 639609 14161000 260000
31797000 300507 26089000 6009000
18159000 30391 15914000 2275000
21271000 1010000 21501000 780000
... lots of rows here
3601000 385001 3986000 0
3070000 1970000 3090000 1950000
646168 61378 707546 0
1812000 264692 2077000 0
469352 0 469352 0
1312000 0 1312000 0
643448 0 643448 0
967725 40000 1008000 0
551976 0 551976 0
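To check the approach on a small input without the attached file, the same steps can be run on the first three rows quoted in the question. This is only a sketch: the char vector below is typed in by hand from that sample, and it assumes the element count is already a multiple of four:

```matlab
% First three rows of the sample from the question, as one char vector.
str = ['35.921 M, 220,000, 48.509 M, 6.349 M, ', ...
       '54.052 M, 806,362, 41.018 M, 4.006 M, ', ...
       '41.564 M, 3.46 M, 40.02 M, 100,000'];

C = regexp(strtrim(str), ',\s+', 'split');  % 12 fields, thousands commas kept
C = regexprep(C, '\s*M', 'e6');             % '35.921 M' becomes '35.921e6'
M = str2double(C);                          % '220,000' is read as 220000
M = reshape(M, 4, []).';                    % 3-by-4, one input row per matrix row
```

The first row of M should then hold 35921000, 220000, 48509000 and 6349000. Any field that str2double cannot parse would come back as NaN, which makes isnan(M) a quick way to spot unexpected entries.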
My God!!
What a great programmer you are :D
You did it with only a few lines of code! Also, MATLAB is great! I must learn more about MATLAB!
If I could, I would give you +1M votes.
And you are right about the badly structured data file! The problem is that I got the data with Puppeteer, which I don't know a lot about! I asked someone about getting data from a specific site and he helped me with this code: https://gist.github.com/saurabhnarhe/f808b4ad7576e037b741a431ad2cd0ec
And store.txt is the output file of that code! I don't know much about JavaScript, so I decided to do the cleanup with MATLAB (which I know a little better) and created this topic!
Also, a new question just occurred to me: can I do web scraping with MATLAB? Can I do what that JavaScript code does with MATLAB? If so, is it a good idea? Is it easier with MATLAB?
I ask because I am a machine learning student and I need to learn and use MATLAB all the time, but I don't need to learn JavaScript. So if I could replace the above code with MATLAB (web scraping with MATLAB), that would be great: I could spend more time in MATLAB with no need to learn JavaScript too!
Yes, I know, but I wanted to hear advice from an expert programmer :)

Asked: 20 Jan 2019

Edited: 21 Jan 2019
