use xlread to read csv file with mixed type data is very slow

2 views (last 30 days)
Hi
I have a csv file with mixed data type like below, first column is string, 2nd column is numerical, 3rd column tothe end columns is date string. Total row number is 25000.
I used xlread() shown below to read the two csv files. it tooks me 10 minutes to read two csv file. The first csv file has only 10 columns while 2nd file has 400 columns. I am wondering if it is possible to speed it up using different approaches? Thanks
[num_steamdate,txt_steamdate,raw_steamdate] =xlsread(filename_WellDate);
[num_fracdate,txt_fracdate,raw_fracdate] =xlsread(filename_WellFracDate);

Answers (1)

Cris LaPierre
Cris LaPierre on 27 Aug 2020
I would suggest using readtable instead. It creates a table, which supports mutiple data types. If you need help, consider sharing your data files. I'm not feeling motivated enough to transcribe your screenshot for testing.
  16 Comments
Cris LaPierre
Cris LaPierre on 29 Aug 2020
Ok, it appears you cannot use implicit expansion on datetime arrays. You must make sure both arrays have the same number of rows AND columns. You need to repeat the column of wellfracdate_array to match the width of wellsteamdate_array (391 columns). You can do this with repmat.
filter_bfFrac_c1=and(wellsteamdate_array >= repmat(wellfracdate_array(:,1),[1,size(wellsteamdate_array,2)]),...
wellsteamdate_array < repmat(wellfracdate_array(:,2),[1,size(wellsteamdate_array,2)]));
filter_bfFrac_c2=and(wellsteamdate_array >= repmat(wellfracdate_array(:,3),[1,size(wellsteamdate_array,2)]),...
wellsteamdate_array < repmat(wellfracdate_array(:,4),[1,size(wellsteamdate_array,2)]));
filter_bfFrac_c3=and(wellsteamdate_array >= repmat(wellfracdate_array(:,5),[1,size(wellsteamdate_array,2)]),...
wellsteamdate_array < repmat(wellfracdate_array(:,6),[1,size(wellsteamdate_array,2)]));
filter_bfFrac_c4=and(wellsteamdate_array >= repmat(wellfracdate_array(:,7),[1,size(wellsteamdate_array,2)]),...
wellsteamdate_array < repmat(wellfracdate_array(:,8),[1,size(wellsteamdate_array,2)]));
filter_bfFrac_c5=and(wellsteamdate_array >= repmat(wellfracdate_array(:,9),[1,size(wellsteamdate_array,2)]),...
wellsteamdate_array < repmat(wellfracdate_array(:,10),[1,size(wellsteamdate_array,2)]));
filter_onFrac_c1=(wellsteamdate_array == repmat(wellfracdate_array(:,2),[1,size(wellsteamdate_array,2)]));
filter_onFrac_c2=(wellsteamdate_array == repmat(wellfracdate_array(:,4),[1,size(wellsteamdate_array,2)]));
filter_onFrac_c3=(wellsteamdate_array == repmat(wellfracdate_array(:,6),[1,size(wellsteamdate_array,2)]));
filter_onFrac_c4=(wellsteamdate_array == repmat(wellfracdate_array(:,8),[1,size(wellsteamdate_array,2)]));
filter_onFrac_c5=(wellsteamdate_array == repmat(wellfracdate_array(:,10),[1,size(wellsteamdate_array,2)]));
roudan
roudan on 29 Aug 2020
Edited: roudan on 29 Aug 2020
Wow, Cris, Yes it works finally. Thank you so much, I really appreciate it!! i l got very goof learning on readtable(). thank you!
But going back to use xlsread(), the speed in my computer is the same, but using xlsread() actually leads to cleaner code. i have the other file with same number of row and column but with numberic data, reading it using readcsv is much faster.
Here is my code, you can give it a try for timing check.
[num_steamdate,txt_steamdate,raw_steamdate] =xlsread(filename_WellDate);
[num_fracdate,txt_fracdate,raw_fracdate] =xlsread(filename_WellFracDate);
wellsteamdate_str=raw_steamdate(:,4:end);
wellsteamdate=datenum(wellsteamdate_str(:),'mm/dd/yyyy'); % convert it to 1D column since datenum is expecting column or row not array
wellsteamdate_array=reshape(wellsteamdate,size(wellsteamdate_str)); % convert 1D back to original shape
wellfracdate_str=raw_fracdate(:,4:end);
wellfracdate=datenum(wellfracdate_str(:),'mm/dd/yyyy'); % convert it to 1D column since datenum is expecting column or row not array
wellfracdate_array=reshape(wellfracdate,size(wellfracdate_str));
filter_bfFrac_c1=and(wellsteamdate_array>=wellfracdate_array(:,1), wellsteamdate_array<wellfracdate_array(:,2));
filter_bfFrac_c2=and(wellsteamdate_array>=wellfracdate_array(:,3), wellsteamdate_array<wellfracdate_array(:,4));
filter_bfFrac_c3=and(wellsteamdate_array>=wellfracdate_array(:,5), wellsteamdate_array<wellfracdate_array(:,6));
filter_bfFrac_c4=and(wellsteamdate_array>=wellfracdate_array(:,7), wellsteamdate_array<wellfracdate_array(:,8));
filter_bfFrac_c5=and(wellsteamdate_array>=wellfracdate_array(:,9), wellsteamdate_array<wellfracdate_array(:,10));
filter_onFrac_c1=(wellsteamdate_array==wellfracdate_array(:,2));
filter_onFrac_c2=(wellsteamdate_array==wellfracdate_array(:,4));
filter_onFrac_c3=(wellsteamdate_array==wellfracdate_array(:,6));
filter_onFrac_c4=(wellsteamdate_array==wellfracdate_array(:,8));
filter_onFrac_c5=(wellsteamdate_array==wellfracdate_array(:,10));

Sign in to comment.

Products


Release

R2017a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!