To add more detail, what i have tried in excel is to take the differnece in time points (since each point is identical expect for the loop) to try and show where i need to split the column. This is a looooong process though using excel and then i am trying to maually copy and paste each segment into a new colunm repsectivly.
You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
Parse 1 large column of uneven data into an array of columns by nth rows.
2 views (last 30 days)
Show older comments
Everyone, this may be a very easy question, and as i mentioned in previous work i am a biochemist trying to apply some basic (I think basic) metlab to help expidite data managemnet. I have a text file that is 4 colunms by 136000 rows (ish). (see attached text). I am running a loop where voltage is constnat and current is measured. The time between each measuremnt is identical OTHER then when the loop occurs creating some larger time gap. I want to extract each loop in reshape (neat metlab word i recently leanr haha) the text so that each colunm is one loop for however many loops there are.
So.. the data file is to large and it wont let me attach. I can explain the data in more detail if needed.
7 Comments
Jeffrey Clark
on 10 Nov 2022
@David Probst, please make a small file to attach or at least tell us what each of the four columns contains, you only mention time, voltage and current.
Jeffrey Clark
on 11 Nov 2022
@David Probst, thanks but the file you attached was created as text with only 3 significant digits, this quickly becomes a problem for the time column after line 200 or so (201 x 0.05 needs 4 significant digits and it just keeps getting worse). Please repost with sufficient significant digits for the entire "short" file so we can see what the gaps are supposed to be like.
David Probst
on 14 Nov 2022
The "normal" time step for each point should be ~ 0.01-0.07 seconds, and the loop occurs when the delta in time is about 340 seconds (plus and minus some decimal). so in excel I take the differnee in each time point in a seperate colunm, then do conditional formating to highlight points higher then 1 (which only is the loop wait times). After I manually copy and paste each colunm bteween the highlighted region. Obviously this is a VERY long and VERY VERY ineffiecnt. I apoliges if there is missing inofmration you need, I am a biochemcist trying to learn this basic (I think basic) matlab to help out our research.
Answers (1)
Bjorn Gustavsson
on 11 Nov 2022
If your file contains an equal number of samples for every "loop" and the files contains data from full "loops". Then something like this should work:
n_per_loop = 37; % adjust
idx_time_var = 1; % index to the column with the time-stamps
sz_data = size(datasmaple);
n_loops = floor(sz_data(1)/n_per_loop); % this should get us the number of full loops
% This should extract the time-stamps and put them in an
% [ n_per_loop x n_loops ] array
t_all = reshape(datasmaple(1:(n_per_loop*n_loops),idx_time_var),n_per_loop,n_loops);
for i2 = setdiff(1:sz_data(2),idx_time_var) % Here we loop over the other columns
% Extract those columns reshape them and put them into a cell-array
data_cell_format{i2} = eshape(datasmaple(1:(n_per_loop*n_loops),i2),n_per_loop,n_loops);
end
HTH
13 Comments
David Probst
on 11 Nov 2022
Bjorn,
Thank you for the code, the loops sizes are not the exact same (they are all about 1-3 differente in size) although i think if I run them as the same it should still work, since the change in reuslts would be negibible.
that bing said I am working through your code and get this error:
>> ParseTest
Error using tabular/reshape (line 194)
Undefined function 'reshape' for input arguments of type 'table'.
Error in ParseTest (line 7)
t_all = reshape(datasmaple(1:(n_per_loop*n_loops),idx_time_var),n_per_loop,n_loops);
Im playing with a few things such as ensure the n_per_loop is a number that can be equally divided into the total data set (right now there is some points a the the end that are "leftover" since the loop size is not perfectly divisible in the total number). IDK if thats the issue. Again i apologize if this is not the most helpful. I appreciate the support.
Bjorn Gustavsson
on 11 Nov 2022
Then I would convert the table to an array (but that's much because I've never had any reason to start working with tables). Step 2 would be to create an array with indices of the end of each "loop" (did those correspond to end-of-day?). Perhaps something like this as a first step:
datasample = table2array(datasmaple);
sz_data = size(datasmaple);
t_all = datasample(:,idx_time_var); % lets say the time is in seconds
doy = round(t_all/(24*3600))+1;
udays = sort(unique(doy),'descend');
% Extract those columns reshape them and put them into a cell-array
% now it seems we have to this day-by-day
for iDay = udays
iCurr = find(doy==iDay);
for i2 = 1:sz_data(2) % Here we loop over all columns
data_cell_format{i2}(1:numel(iCurr),iDay) = datasample(iCurr,i2);
end
end
It is most likely not the fastest, but should get the job done...
HTH
David Probst
on 14 Nov 2022
I will try this today and let you know! thank you for the support, I may post some questions (so i can learn some of the detaisl here) if that cool!
David Probst
on 14 Nov 2022
Also, the time goas are not in the way of "end of each day" its more of about every 6000 points +/- 5-10 points due to sampling error of the tool. So one coulnm made need to be about 6000 rows while the other is 6005, and the time jump is 0.5 second for every point, execpt for when the loop occurs which has a longer (~340 second) wiat time.
Bjorn Gustavsson
on 14 Nov 2022
Well the "only" thing to change then is to build the index to the current loop taking that into account. Something like this:
dt_jump = 100; % or whatever jump-condition you can use between loops
idxEnd = [find( diff(t_all) > dt_jump ), numel(t_all)];
idxStart = [1,idxEnd(1:end-1)+1];
Then change the loop to something like this:
for iLoop = 1:numel(idxStart)
iCurr = idxStart(iLoop):idxEnd(iLoop)
for i2 = 1:sz_data(2) % Here we loop over all columns
data_cell_format{i2}(1:numel(iCurr),iLoop) = datasample(iCurr,i2);
end
end
David Probst
on 14 Nov 2022
So to make sure i understand this, i am usnig the following code:
datasample1 = table2array(datasmaple);
sz_data = size(datasmaple);
idx_time_var = 1;
t_all = datasample1(:,idx_time_var); % lets say the time is in seconds
dt_jump = 100; % or whatever jump-condition you can use between loops
idxEnd = [find( diff(t_all) > dt_jump ), numel(t_all)];
idxStart = [1,idxEnd(1:end-1)+1];
for iLoop = 1:numel(idxStart)
iCurr = idxStart(iLoop):idxEnd(iLoop)
for i2 = 1:sz_data(2) % Here we loop over all columns
data_cell_format{i2}(1:numel(iCurr),iLoop) = datasample(iCurr,i2);
end
end
There is an issue at line "idxEnd = [find( diff(t_all) > dt_jump ), numel(t_all)];". I lloked up each definiton, and from what I understnad this is trying to build some index array where the frist colunm is where we find the change in time to be anything larger then 100, and the seocnd function numel(t_all) is to ensure we do this over the entire dataset?
Bjorn Gustavsson
on 14 Nov 2022
Yes, as far as I understand your comment at your question, you have sample-intervalls between 0.01 and 0.07 seconds during the loop and then more than 100 s dwell-time after the end of one loop and the start of the next. If that is a time-step separation that is reliable you should be able to use the time-differences that are larger than 1 s (10? 50?) to identify the end of one and start of the next loop.
David Probst
on 21 Nov 2022
Hey so i am having challenges with implementing the following code:
datasample1 = table2array(datasmaple);
sz_data = size(datasmaple);
idx_time_var = 1;
t_all = datasample1(:,idx_time_var); % lets say the time is in seconds
dt_jump = 5; % or whatever jump-condition you can use between loops
idxEnd = [find(diff(t_all)>dt_jump,numel(t_all))];
idxStart = [1,idxEnd(1:end-1)+1];
for iLoop = 1:numel(idxStart)
iCurr = idxStart(iLoop):idxEnd(iLoop)
for i2 = 1:sz_data(2) % Here we loop over all columns
data_cell_format{i2}(1:numel(iCurr),iLoop) = datasample(iCurr,i2);
end
end
It seems to break down where i bolded (idxStart).
Any advice for suggestions for me to test? I appreciate the support!
David Probst
on 21 Nov 2022
SO i changed the code to the following:
datasample1 = table2array(datasmaple);
sz_data = size(datasmaple);
idx_time_var = 1;
t_all = datasample1(:,idx_time_var); % lets say the time is in seconds
dt_jump = 5; % or whatever jump-condition you can use between loops
idxEnd = [find(diff(t_all)>dt_jump,numel(t_all))];
idxStart = [idxEnd(1:end-1)];
for iLoop = 1:numel(idxStart)
iCurr = idxStart(iLoop):idxEnd(iLoop)
for i2 = 1:sz_data(2) % Here we loop over all columns
data_cell_format{i2}(1:numel(iCurr),iLoop) = datasample(iCurr,i2);
end
end
It wokrs but fills the final arrays with only the first row and colunm with vlaues all other values are 0.
Bjorn Gustavsson
on 22 Nov 2022
It might be easiest if you mock up a very small test data set, with say some 5 data-points between jumps and a coupld of loops, 6-7 perhaps. Then you can step through the algorithm with the debugger line by line and check what the different variables and indices should be and how to correct them. if you for example define a time-variable like this:
t_all = [1:5,11:15,21:25,31:34,41:45]';
You should rather easily be able to make some data, and figure out the indices to the starts and stops of each loop, then check that you get from the code-snippet and how to modify it.
David Probst
on 22 Nov 2022
That is a good idea! thank you! I appreicate the support, i am very very novice to coding.
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)