Data processing for different size matrixes
Hello everyone, I was wondering if someone could guide me on where to start. I have almost 20 data files, each of a different size (e.g., 13000*6, 2000*6, 5000*6, etc.). I want to produce a 3D plot from this data. The first value in each data set is a time, and each data file has a different time interval between readings: some at 0.1 s, some at 0.5 s, some at 1 s, some at 2 s, etc. I tried rounding the times and picking the data at round figures (e.g., 1 s intervals), but this affects the results because the readings are at very small time intervals. I would appreciate it very much if someone could help me get past this step as accurately as possible.
2 Comments
Star Strider
on 12 Apr 2016
‘The first value in each data set is a time ...’
Do you mean the first row, the first column or something else?
You imply that you want to combine them. You might be able to interpolate them to the same times for all of them, depending on the data.
How would you want to plot them?
amberly hadden
on 12 Apr 2016
Edited: amberly hadden
on 13 Apr 2016
Time is the first column and the next columns are parameters, i.e., results of experiments for different locations. I'm interested in producing a 3D model out of this data, as I have time (or depth), lat, lon, parameter. So I need one size for all matrixes to mesh and then produce a model. Thank you Star once again for being so helpful.
Accepted Answer
Star Strider
on 12 Apr 2016
‘So I need one size for all matrixes to mesh and then produce a model.’
I’m still not quite certain where you’re going with your data, but if you want them all to have the same time base (and all time bases begin with zero or some specific number), I would choose the smallest absolute time interval, defined by ‘t(end)-t(1)’ for each data set, define the sampling interval you want (the time difference between any two consecutive samples), and then use the interp1 function to limit them to a common time base without extrapolating.
Read the documentation for the interp1 function, experiment with it on created (not actual) data to understand how it works, then use it on your data.
If all goes well, this will work!
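A minimal sketch of the approach described above, run on created (not actual) data. The three records, their sampling intervals, and the chosen common interval `dt` are all assumptions made for illustration; the common time base is restricted to the range that every record covers, so interp1 never has to extrapolate.

```matlab
% Created (not actual) data: three records sampled at 0.1 s, 0.5 s and 1 s
t1 = (0:0.1:10).';   y1 = sin(t1);
t2 = (0.5:0.5:12).'; y2 = cos(t2);
t3 = (1:1:9).';      y3 = 0.1*t3;
T = {t1, t2, t3};
Y = {y1, y2, y3};

% Common time base limited to the overlap of all records (no extrapolation)
tStart  = max(cellfun(@(t) t(1),   T));   % latest start time
tEnd    = min(cellfun(@(t) t(end), T));   % earliest end time
dt      = 0.5;                            % chosen sampling interval (assumption)
tCommon = (tStart:dt:tEnd).';

% Interpolate every record onto the common time base, one column per record
Yc = zeros(numel(tCommon), numel(T));
for k = 1:numel(T)
    Yc(:,k) = interp1(T{k}, Y{k}, tCommon, 'linear');
end
```

After this, every column of `Yc` has the same number of rows and refers to the same times in `tCommon`, whatever the original record lengths were.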
22 Comments
amberly hadden
on 13 Apr 2016
Perfect, thank you. I'm going to try this.
Star Strider
on 13 Apr 2016
My pleasure!
Let me know if you have any problems with it. I don’t have your data, so all I can do is describe what I would do.
amberly hadden
on 14 Apr 2016
Dear Star... I promise this is my last question for this month. I tried using interp1 but couldn't understand it completely. Then I decided to go with the following scheme: 1) round the depths (column a) for each data set and use nearest neighbour to pick up the data. Unfortunately, my program below only works for two data sets; I cannot add as many data files as I want. Plus, I'm not sure how good the results from it would be.
Mds = load('data1.txt');
Mas = load('data2.txt');
x = round(Mas(:,1:2).*1e3) / 1e3;
y = round(Mds(:,1:2).*1e3) / 1e3;
M1 = [x(:,1),Mas(:,2),Mas(:,3), Mas(:,4)];
M2 = [y(:,1),Mds(:,2),Mds(:,3), Mds(:,4)];
M1LL = M1(:,1:2);
M2LL = M2(:,1:2);
[M1inM2, M2idx] = ismember(M1LL, M2LL, 'rows');
joinedM1M2 = [M1(M1inM2,:), M2(M2idx(M1inM2), 3)];
W11 = [joinedM1M2(:,1) joinedM1M2(:,2) joinedM1M2(:,3)];
W22 = [joinedM1M2(:,1) joinedM1M2(:,2) joinedM1M2(:,4)];
Data file is attached. Thank you in advance.
Star Strider
on 14 Apr 2016
I apologise for the delay. Life interferes.
I interpolated the data by depth, but the depth vector is rather narrow to avoid extrapolating the shorter depth records, and non-overlapping depth records. It is easier to limit the longer depth records to the shortest depth interval, in order to avoid extrapolating and inserting a series of NaN values at the ends. Extrapolating is bad regardless.
There is now one common depth vector, ‘dep_intrp’, with the rest of your ‘xyz’ values interpolated to match those depths in the ‘xyz’ matrix:
[D,S,R] = xlsread('amberly hadden inpolygon.xlsx');
dxyzn = D(~isnan(D));                                               % Remove ‘NaN’ Values
dxyz = reshape(dxyzn, 14, 4, []);
dep_limm = squeeze([min(dxyz(:,1,:)) max(dxyz(:,1,:))]);
dep_intrp = linspace(max(dep_limm(1,:)), min(dep_limm(2,:)), 20);   % Depth Interpolation Vector
for k1 = 1:3
    xyz(:,:,k1) = interp1(squeeze(dxyz(:,1,k1)), dxyz(:,2:end,k1), dep_intrp, 'linear');
end
figure(1)
for k1 = 1:3
    subplot(1,3,k1)
    plot3(xyz(:,1,k1), xyz(:,2,k1),xyz(:,3,k1), '.-b')
    grid on
end
set(gcf, 'Position', [300 400 946 288])
figure(2)
for k1 = 1:3
    subplot(3,1,k1)
    plot(dep_intrp, xyz(:,1,k1), '.-', dep_intrp, xyz(:,2,k1), '.-', dep_intrp, xyz(:,3,k1), '.-')
    grid
    if k1 == 1
        legend('x', 'y', 'z', 'Location','northoutside', 'Orientation','horizontal')
    end
end
xlabel('Depth')
The ‘figure(1)’ plot simply plots ‘x’, ‘y’, and ‘z’. I have no idea what you want to plot, or how you want to plot it. I just did these plots to be sure the interpolation yielded what appear to be decent data.
amberly hadden
on 15 Apr 2016
Thank you so much Star, you are a life saver.... I went through the code and it works really well. I'm just getting one error when I use the real data set, which is 26834*58, with total number of elements dxyzn = 397256*1, D = 26834*49, R = 26835*49 and S = 1*49. I changed dxyz = reshape(dxyzn, 49, 4, []); but it gives the following error: Product of known dimensions, 107340, not divisible into total number of elements, 397256.
Star Strider
on 15 Apr 2016
My pleasure!
The data in your file were 14 rows by 12 columns (after I eliminated the NaN values) that, owing to their format, I reshaped into a (14x4x3) matrix.
I’m not certain what your actual data set is, or how it is organised. I’m having problems analysing the numerical data to attempt to deduce the structure of your file. Your original matrix in ‘D’ has 1314866 elements. The difference between that and ‘dxyzn’ are the number of NaN values. Since ‘dxyzn’ has 397256 elements, divided by 4 (the number of columns in each ‘depth|x|y|z’ matrix segment), you should be able to reshape it to a (99314x4) matrix. I would do this reshape call:
dxyz = reshape(dxyzn, 99314, 4);
The problem then is to split them up into the appropriate number of depth segments. I have to leave that to you, because I have no idea how your data are organised. If you have the same number of rows in each depth segment (as in the file you posted), then you can change the reshape call to do that easily, just as I did with your posted file. Otherwise, you need to create a series of cell arrays. That isn’t difficult from a programming perspective, but does complicate your analysis.
If you can provide a bit more detail as to how your file is organised, I might be able to provide more exact coding for how to use the reshape function with it.
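For the case where the depth segments do not all have the same number of rows, the cell-array route could look roughly like the sketch below. The stacked matrix, the segment lengths in `segLen`, and the number of interpolation points are invented for illustration; with real data the segment lengths would have to come from knowledge of how the file is organised.

```matlab
% Hypothetical stacked data: columns are depth | x | y | z,
% made of three segments with 5, 8 and 6 rows
segLen  = [5 8 6];
depth   = [linspace(0,4,5), linspace(1,6,8), linspace(0.5,5,6)].';
stacked = [depth, 2*depth, depth.^2, -depth];

% Split the rows into one cell per segment; the 4 columns stay together
segs = mat2cell(stacked, segLen, 4);

% Common depth vector restricted to the overlap of all segments
dLo = max(cellfun(@(s) min(s(:,1)), segs));
dHi = min(cellfun(@(s) max(s(:,1)), segs));
depCommon = linspace(dLo, dHi, 10).';

% Interpolate x, y, z of every segment onto the common depths
xyz = zeros(numel(depCommon), 3, numel(segs));
for k = 1:numel(segs)
    xyz(:,:,k) = interp1(segs{k}(:,1), segs{k}(:,2:4), depCommon, 'linear');
end
```

The result has the same layout as the `xyz` array in the earlier code (rows are depths, columns are x, y, z, pages are segments), so the plotting loops would carry over unchanged.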
amberly hadden
on 15 Apr 2016
Thanks Star, I have attached the real data file here. Now I understand this is not an easy step. Unfortunately, the data files not only have different intervals but also different sizes.
Star Strider
on 15 Apr 2016
There are so many NaN values in your data that I can’t reshape the matrix. Some rows are obviously incomplete, but I can’t determine which. You may have to end up editing it manually to create a file that can be processed. I can’t do that.
The best I can do with your present data is:
[D,S,R] = xlsread('amberly hadden test_1.xlsx');
figure(9)
plot(D(:,1), D(:,2:4))
grid
That does produce an interesting plot, but that’s about all.
If you have different numbers of depth data (rows) in your data, you will probably have to segment them manually as well. If you had complete data with no NaN values, this would be straightforward.
Sorry, but that’s the best I can do.
amberly hadden
on 15 Apr 2016
This is so nice of you Star and I appreciate it very much. I will try to solve the problem and will update you if successful :) Cheers
Star Strider
on 15 Apr 2016
As always, my pleasure.
When you get your matrix edited to your satisfaction (only you know how best to do that because you know what it is and what’s important), I’ll be glad to help you interpolate and analyse your data.
Consider using cell arrays, and possibly the mat2cell function if it’s appropriate.
amberly hadden
on 15 Apr 2016
Sounds good Star. I will try this over the weekend and will update you. Have a great weekend :)
Star Strider
on 15 Apr 2016
You, too!
amberly hadden
on 19 Apr 2016
Edited: amberly hadden
on 19 Apr 2016
Hello Star - I spent my whole weekend on this but unfortunately couldn't get what I wanted. Briefly, I have a dataset which I want to invert into a mesh file to create a 3D model. The data are x, y, z and the parameter to model; x and y remain constant within an individual data set but depth varies, and x and y vary between data files. Let's call x and y the location coordinates and z the height. I put all the data files together on top of each other and wrote the following program to output the parameter for each x, y and z value defined at the start of the program. But my output file always gives me zero data. Could you please help me find the error? Thank you
amberly hadden
on 19 Apr 2016
Attaching the code as a .m file.
Star Strider
on 19 Apr 2016
It would help if you formatted your code. Select (highlight) your code, then use the [{}Code] button.
A few observations:
This is likely not the correct way to use the all function:
indices = find(all(:,1) >= emin & all(:,1) <= emax &...
since it returns a logical array here, returning true if the 4th dimension of a 2D array is non-zero. (Note that you are not testing the 4th row or column, but the entire array.) These (0,1) logical values become (0,1) numeric values when you use them in a calculation. (Check the documentation for it.) Also, check to see what your ‘indices’ variable contains.
It is very difficult to follow your code, so I will leave you to troubleshoot it beginning with that find call. You likely need to get that sorted before you go any further.
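One way to check what such a find call selects is to evaluate each comparison separately before combining them. The sketch below assumes the data matrix is stored under a hypothetical name, `Data`, with columns x | y | z | parameter; the sample rows and the cell bounds are made up.

```matlab
% Hypothetical data: columns are x | y | z | parameter
Data = [0.001 0.002 0.05 10;
        0.004 0.001 0.08 20;
        0.007 0.003 0.02 30;
        0.002 0.004 0.15 40];

% Bounds of one cell (illustrative values)
emin = 0;  emax = 0.005;
nmin = 0;  nmax = 0.005;
zmin = 0;  zmax = 0.1;

% One logical column per coordinate, so each can be inspected on its own
inX = Data(:,1) >= emin & Data(:,1) <= emax;
inY = Data(:,2) >= nmin & Data(:,2) <= nmax;
inZ = Data(:,3) >= zmin & Data(:,3) <= zmax;
inCell = inX & inY & inZ;

indices  = find(inCell);            % row numbers of the points in the cell
cellMean = mean(Data(inCell, 4));   % mean parameter value in the cell
```

If `indices` comes back empty for every cell, looking at `inX`, `inY` and `inZ` individually shows which coordinate is excluding the points.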
amberly hadden
on 19 Apr 2016
Sounds good, I will spend time on it. Thank you for your help.
Star Strider
on 19 Apr 2016
As always, my pleasure.
amberly hadden
on 20 Apr 2016
Hello Star - here is the more refined idea (figure in attachment). I have x, y and z values and a parameter to model. I want to break x and y into 0.005 intervals and z into 0.1 m intervals. This will generate very small cells in x, y and z. I'm then trying to use my data file to get values at each 0.005 and 0.1 interval. So, say I have an area and a depth: first I want to divide the area into very small cells and then into depth intervals. Any data which fall in or on the boundary of a cell will be part of that specific cell. That is why I started by breaking x and y into 0.005 intervals and z into 0.1 intervals. I did my best, but the output is again nothing but zeros, unfortunately.
a = xlsread('test.xlsx');
all = a(:,1:4);
%%
% space = 0.005;
x = (min(a(:,1)):0.005:max(a(:,1)));
y = (min(a(:,2)):0.005:max(a(:,2)));
zcells =(min(a(:,3)):0.1:max(a(:,3)));
%%
dimx = length(x) - 1;
dimy = length(y) - 1;
dimz = length(zcells) - 1;
dimmodel = dimx*dimy*dimz;
[X,Y]=meshgrid(x,y);
clear count
clear i
clear space
%% Sort the Data into Cells and out put the mesh file for parameter
percentile = 1.0;
model = zeros(dimmodel,1);
% clc
timet = 0;
if percentile > 0.9 && percentile <= 1.0
    C = 1;
elseif percentile > 1.0 || percentile <= 0.0
    errordlg('The percentile must be between 0 and 1.0!');
    return
else
    C = 0;
end
for i = 1:dimy
    tic;
    disp(['y-index ',num2str(i),' of ',num2str(dimy)]);
    for j = 1:dimx
        for k = 1:dimz
            cellno = dimx*dimz*(i-1)+dimz*(j-1)+k;
            emin = x(j);
            emax = x(j+1);
            nmin = y(i);
            nmax = y(i+1);
            zmin = zcells(k);
            zmax = zcells(k+1);
            % m = 1;
            % while m <= length(all)
            indices = find(all(:,1) >= emin & all(:,1) <= emax &...
                all(:,2) >= nmin & all(:,2) <= nmax &...
                all(:,3) >= zmin & all(:,3) <= zmax);
            % m = m + 1;
            % end
            if ~isempty(indices)
                addition(1,:) = all(indices,4);
                addition = sort(addition);
                start = length(addition)-ceil(length(addition)*percentile)+C;
                if start < 1
                    start = 1;
                end
                useadd = addition(start:length(addition));
                count = length(useadd);
                model(cellno,1) = sum(useadd)/count;
            else
                model(cellno,1) = 0;
            end
            clear addition
            clear useadd
            clear count
        end
    end
    time = toc;
    timet = timet + time;
    clc
    disp(['Time for run: ',num2str(floor(time/60)),' minutes and ',num2str(((time/60)-floor(time/60))*60),' seconds']);
    disp(['Estimated time remaining: ',num2str(floor(time*(dimy-i)/3600)),' hours and ',...
        num2str(((time*(dimy-i)/3600)-floor(time*(dimy-i)/3600))*60),' minutes']);
end
dlmwrite([num2str(dimx),'x',num2str(dimy),'x',num2str(dimz),'_test.txt'],...
    model,'precision',7,'newline','pc');
Star Strider
on 20 Apr 2016
I still cannot follow your code. (Also, please format it. Use the {}Code button at the top of the text entry window.)
1. Please do not do this:
all = a(:,1:4);
The all function is a logical function that is important in doing vector comparisons. You are ‘overshadowing’ it, so any call to it will reference the variable and not the function. The solution is simply to rename it to something that is meaningful in the context of your code. (Calling it ‘Data’ is usually my choice.) The MATLAB Editor makes a global-search-and-replace relatively easy.
2. Again, please check this line to be certain it does what you want:
indices = find(all(:,1) >= emin & all(:,1) <= emax &...
    all(:,2) >= nmin & all(:,2) <= nmax &...
    all(:,3) >= zmin & all(:,3) <= zmax);
Even though you ‘overshadow’ the all function earlier, you might actually need it (or the related any function) here. Experiment by choosing one comparison, looking at the contents of the column, the value you are comparing it to, and the result of that comparison.
For instance:
Comparison_1 = all(:,1) >= emin
Comparison_2 = all(:,1) <= emax
Cmp_1_and_Cmp_2 = Comparison_1 & Comparison_2
Do the same for the others, and examine the results of these operations. Go through all the logic in your find call systematically to see what the results are, and then make the appropriate changes if the results are not what you want.
3. After you get your logic sorted, read through the documentation on the various interpolation functions, since it seems to me that you want to do some sort of interpolation. See particularly the interpolation functions: interp2, interp3, scatteredInterpolant, and choose whatever interpolation function is best for your data.
Regardless, do not extrapolate! Extrapolating will destroy the credibility of your data.
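As a rough illustration of interpolating scattered x, y, z data without extrapolating, the sketch below uses scatteredInterpolant with its extrapolation method set to 'none', which returns NaN for query points outside the convex hull of the samples. The sample points and the parameter function are invented; the cube corners are added only so that the test query is guaranteed to lie inside the sampled region.

```matlab
% Hypothetical scattered samples inside the unit cube, plus its 8 corners
rng(0);
corners = dec2bin(0:7) - '0';          % 8-by-3 matrix of 0/1 coordinates
P = [rand(200, 3); corners];
v = P(:,1) + 2*P(:,2) + 3*P(:,3);      % made-up parameter values

% Linear interpolation; 'none' disables extrapolation
F = scatteredInterpolant(P(:,1), P(:,2), P(:,3), v, 'linear', 'none');

vInside  = F(0.5, 0.5, 0.5);   % query inside the sampled region
vOutside = F(2, 2, 2);         % query outside it: NaN, not an extrapolated value
```

Evaluating `F` on a regular grid built from the 0.005 and 0.1 spacings would then give values only where the data actually support them, with NaN elsewhere.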
amberly hadden
on 20 Apr 2016
Sounds like a lot to work on... Thanks though. I will pause here to get ready for final exams, which run from the 24th until May 11, and will come back to finish this step when I am completely free of all worries :)
Star Strider
on 20 Apr 2016
My pleasure.
It’s really not that much, but it does require that you code those comparisons to do what you want them to do. I have no idea what you want to do, so I can’t offer any specific suggestions.
The first thing I would do would be to rename your ‘all’ matrix to something that does not overshadow the all function, since you may need it. You can do that in the next five minutes.
You really do need to see what your comparisons are doing, and if they are doing what you want them to. This is especially relevant to the find call. You may want to break up the find call into several different find calls, then use the appropriate set functions, such as intersect, to select the indices you want.
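Splitting the find call and combining the pieces with intersect could be sketched as follows. The matrix name `Data`, its contents, and the cell bounds are hypothetical; only the pattern matters.

```matlab
% Hypothetical data: columns are x | y | z | parameter
Data = [0.001 0.002 0.05 10;
        0.004 0.001 0.08 20;
        0.007 0.003 0.02 30;
        0.002 0.004 0.15 40];

% One find call per coordinate (illustrative cell bounds)
ix = find(Data(:,1) >= 0 & Data(:,1) <= 0.005);
iy = find(Data(:,2) >= 0 & Data(:,2) <= 0.005);
iz = find(Data(:,3) >= 0 & Data(:,3) <= 0.1);

% Rows that satisfy all three conditions
indices = intersect(intersect(ix, iy), iz);
```

Displaying `ix`, `iy` and `iz` before the intersect step makes it easy to see which condition is removing rows.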
Best of luck on your finals! I’m certain you will do well!
amberly hadden
on 20 Apr 2016
Thank you Star. I needed this to produce results for a presentation, and if I spend time on it I will not be able to prepare for exams. The surprising thing is that the program I sent you, with the exception of 'all', works for another data set, which means I have some error in defining the intervals, or maybe I'm missing something, perhaps mesh(x,y,z)? I will work on it after exams. Best regards, Amb