.mat vs .txt or .csv

Javi on 7 Sep 2017
Commented: Guillaume on 20 Sep 2017
Hi everyone. This is my first question here, so here goes. I have processed a very large set of images and saved some numerical results (particles, centroids, axes, ... whatever), and I wonder which storage option would be best (or just the fastest, the simplest, or the one with the lowest memory consumption; I'm not really sure what to optimize) for future post-processing: 1) store the data in .txt/.csv files, either in a single file (appending columns, possibly of different lengths) or in several files; 2) use cell arrays and store them in a .mat file.
In terms of portability I guess the former is best, but is it also the fastest and the one with the lowest memory consumption?
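A sketch of how option 1 can avoid columns of different lengths, assuming hypothetical per-image results (an image name mapped to a list of particles, each with a centroid and a major-axis length): write a single "long" CSV with one row per particle plus an image column, then regroup the rows when reading. It uses Python's standard library purely for illustration; the file layout, field names, and function names are made up for this example.

```python
import csv
import io

# Hypothetical per-image results: for each image, a list of particles,
# each given as (centroid x, centroid y, major-axis length).
results = {
    "img_001.png": [(10.5, 20.25, 3.0), (40.0, 41.5, 2.5)],
    "img_002.png": [(5.0, 6.0, 1.75)],
    "img_003.png": [],  # no particles found in this image
}


def write_long_csv(results, fh):
    """Write one row per particle, tagged with its image name, instead of
    one column per image (which would give columns of different lengths)."""
    writer = csv.writer(fh)
    writer.writerow(["image", "particle", "cx", "cy", "major_axis"])
    for image, particles in results.items():
        for k, (cx, cy, axis) in enumerate(particles, start=1):
            writer.writerow([image, k, cx, cy, axis])


def read_long_csv(fh):
    """Regroup the long-format rows into {image: [(cx, cy, axis), ...]}."""
    grouped = {}
    for row in csv.DictReader(fh):
        particle = (float(row["cx"]), float(row["cy"]), float(row["major_axis"]))
        grouped.setdefault(row["image"], []).append(particle)
    return grouped


# Round trip through an in-memory buffer; with a real file, open it with
# newline="" as the csv module documentation recommends.
buf = io.StringIO()
write_long_csv(results, buf)
buf.seek(0)
restored = read_long_csv(buf)
print(restored)
```

One consequence of this layout: an image with no particles produces no rows, so if "processed but empty" must be distinguishable from "not processed", keep a separate list of processed images.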
Walter Roberson on 8 Sep 2017
You can use imageDatastore() to handle some of the overhead of managing large numbers of images.
For each image you do some processing and save the results. You then carry out further phases of work, which presumably do not need the original images. For example, if you were doing neural network work, your first phase might have been feature extraction, and your second phase could then involve investigating feature reduction and various parameters of neural network training.
Now, for the sake of reproducibility, you would either make the original images available to researchers or publish where to obtain them.
But if the images themselves are not required for the later phases, you might consider publishing the results of the first-stage processing, so that others could start from there and investigate. As long as the images are available to researchers and you have published exactly how the first stage was done, this would not be strictly necessary, but one could imagine it being convenient.
Now, if that kind of publishing of intermediate data is of significant interest to you, then you need to consider portability, which argues for CSV, XLSX, NetCDF, or HDF5. If you are not planning to do that, efficiency becomes more important, which argues for binary files, .mat files, or databases.
per isakson on 13 Sep 2017
Edited: per isakson on 13 Sep 2017
The first time I read your comment, I understood "I am beginning to standardize all the work done so far" to include more than your own work. However, after a second reading, I believe it refers to your own work.
You'll find a lot if you google for "management scientific data", e.g. this four-minute video on sharing data.
IMO, you should do a small study on "management of scientific data" focused on your scenario: write user stories, run small experiments, find out what the practice is in your field, look for good examples from your colleagues (including those at other universities), and compile a report.
There are many contributors here who work with "scientific data". Maybe a question titled "management of scientific data" would catch their eye, but the question itself must be more specific.
writing, reading, sharing, archiving


Answers (1)

Javi on 20 Sep 2017
Thank you all. I think I will use CSV files; they seem to be the best fit (easy, portable, low memory, ... and universal). Thanks for pointing out imageDatastore(), it seems really helpful. Best regards.
Guillaume on 20 Sep 2017
As far as memory is concerned, a text format is the worst. Converting text to and from numbers is not particularly fast either.
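A rough, self-contained Python sketch of the memory point, using random illustrative data rather than anything from this thread: the same 10,000 doubles take exactly 8 bytes each in binary, while a text encoding that round-trips exactly (Python's repr) typically needs more than twice as many bytes, and reading it back means parsing every number.

```python
import random
import struct

# Illustrative data only: 10,000 random doubles standing in for measurements.
random.seed(0)
values = [random.uniform(-1000.0, 1000.0) for _ in range(10_000)]

# Binary: each double occupies exactly 8 bytes (IEEE 754, little-endian here).
binary = struct.pack(f"<{len(values)}d", *values)

# Text: repr() gives the shortest string that round-trips exactly, which for
# arbitrary doubles is usually 16 to 17 significant digits, plus sign,
# decimal point, and a newline separator.
text = "\n".join(repr(v) for v in values).encode("ascii")

print(len(binary), len(text))

# Both round-trip exactly, but the text path has to parse every number.
assert list(struct.unpack(f"<{len(values)}d", binary)) == values
assert [float(s) for s in text.decode("ascii").split("\n")] == values
```

Writing fewer digits (for example with a "%.6g" format) shrinks the text, but then the values no longer round-trip exactly, which is another trade-off to weigh.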
While the CSV format is portable and easily read by humans, one of its major shortcomings is that it is only suitable for storing one matrix. (Otherwise, you end up having to define your own syntax for the file, and it is no longer portable.)
Whenever I process scientific data, I store not only the result of that processing but also all the inputs and configuration switches that were used to obtain that result, the version of any code that was used, etc., so that when questions are asked about the results a few months or years later, I can always go back and reproduce them exactly as they were generated. All of that I store in a mat file.
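The same idea can be sketched outside MATLAB. The Python example below bundles results with their inputs, configuration, code version, and a timestamp in one JSON file. The function names save_with_provenance/load_with_provenance and all field names are hypothetical, and JSON is just one possible container (portable, but like CSV it is a text format).

```python
import json
import os
import platform
import sys
import tempfile
from datetime import datetime, timezone


def save_with_provenance(path, results, params, code_version):
    """Store results together with what is needed to reproduce them.
    Field names are illustrative, not a standard schema."""
    record = {
        "results": results,
        "params": params,                # inputs and configuration switches
        "code_version": code_version,    # e.g. a git commit hash or release tag
        "python_version": sys.version,
        "platform": platform.platform(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2)


def load_with_provenance(path):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Usage with made-up values, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "run_0001.json")
    save_with_provenance(
        path,
        results={"centroids": [[10.5, 20.25]]},
        params={"threshold": 0.4, "min_area": 12},
        code_version="v1.2.0",
    )
    record = load_with_provenance(path)
```

Keeping the provenance in the same file as the results means the two cannot drift apart when files are copied or archived.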

