Scatter plot with two data sets of uneven values

Hi All,
Is it possible to create a scatter plot using two datasets of uneven length? For example, D1 and D2 both have x values that span 0 to 120, and y values of different parameters (D1 = oxygen, D2 = chlorine). However, D1 consists of 80 data points, and D2 consists of ~20. Moreover, the x values for D1 and D2 do not overlap.
If not, is there a recommended solution to make this easier? The only thing I can think of is to resample the data to a common axis, but that introduces data that are not real.
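For comparison, the resampling idea mentioned above might look something like this minimal sketch. It assumes D1 and D2 are n-by-2 [x, y] matrices; the x values are borrowed from the example given later in this thread and the y values are made up for illustration. interp1 estimates D1's y values at D2's x positions, so the result contains interpolated values rather than measurements, which is exactly the concern raised above.

```matlab
% Hypothetical example data: x values from the example in this thread,
% y values made up purely for illustration.
D1 = [0 20 40 60 80 100 120; 1 2 3 4 5 6 7].';          % [x, oxygen]
D2 = [1 25 42 75 88 90 118; 10 12 13 15 17 17 19].';    % [x, chlorine]

% Linearly interpolate D1's y values at D2's x positions.
% These are estimates, not measured values.
y1AtX2 = interp1(D1(:,1), D1(:,2), D2(:,1), 'linear');

% Two equal-length columns that can be passed to scatter
pairs = [y1AtX2, D2(:,2)];
```

The columns of pairs can then be plotted with scatter(pairs(:,1), pairs(:,2)). By default interp1 returns NaN for any query x outside D1's x range, so those points would simply not appear.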
Thanks!

9 Comments

"Is it possible to create a scatter plot using two datasets of uneven values"
Sure
scatter(1:5,1:5)
hold on
scatter(10:12, 10:12)
"Moreover, the x values for D1 and D2 do not overlap"
Something tells me we're missing a piece of the picture needed to understand the problem.
Hopefully this clarifies things. The x values (and their corresponding y values) for D1 and D2 both span 0 to 120. For example, D1 x = [0 20 40 60 80 100 120] and D2 x = [1 25 42 75 88 90 118]. So both sets of data have x values that fall in the range 0 to 120, but it is my understanding that scatter plot data need to be vectors of the same length. So can I plot the y values from D1 and D2 against each other despite their x values not overlapping?
Are D1 and D2 vectors or matrices? I don't understand whether D1 and D2 represent x values (vectors) or [x,y] values (matrices).
In the example you gave, D1 and D2 are vectors of the same length (7 elements). Where do the y values come in?
Maybe you could show us an actual representation of the variables you're working with.
Apologies, I meant matrices. Here is a screenshot of the data:
As you can see, the x values in this example range from 0 to 110 for both datasets. However, D1 and D2 have different lengths, and the x values do not overlap in any way.
[Screenshot of the D1 and D2 data (image not reproduced)]
That's much clearer. So you've got two sets of data, D1 and D2. Each set consists of an n-by-2 matrix of n [x,y] data points.
The problem still isn't clear, though. Why doesn't one of these solutions solve it?
scatter(D1(:,1),D1(:,2))
hold on
scatter(D2(:,1),D2(:,2))
or
D = [D1; D2];
scatter(D(:,1),D(:,2))
I am trying to plot D1(:,2) vs. D2(:,2).
ahhhhh...... got it now.
OK, how do you expect these values to be paired? Here are some ideas.
  1. y values from dataset2 are paired with the first n values of dataset1, where n is the number of points in dataset2 (this sounds arbitrary to me; I doubt this is what you want).
  2. Each y value from dataset2 is paired with the y value from dataset1 whose x value is closest. Note that this may result in more than one coordinate in dataset2 being paired with the same coordinate in dataset1, which is fine.
  3. Some other rule you have in mind.
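Option 2 can be sketched without pdist2 (which is part of the Statistics and Machine Learning Toolbox) by computing the x gaps directly. This is a hedged sketch that assumes MATLAB R2016b or later for implicit expansion; the x values come from the example earlier in the thread and the y values are hypothetical.

```matlab
% Hypothetical example data: [x, y] rows, y values made up for illustration.
D1 = [0 20 40 60 80 100 120; 1 2 3 4 5 6 7].';
D2 = [1 25 42 75 88 90 118; 10 12 13 15 17 17 19].';

% |x1 - x2| for every combination: rows index D1, columns index D2
% (implicit expansion of a column minus a row).
xGap = abs(D1(:,1) - D2(:,1).');

% For each D2 point, the D1 row with the closest x value.
% On ties, min returns the first (lower) row.
[minGap, nearestRow] = min(xGap, [], 1);

% Paired y values, one D1 value per D2 point (D1 rows may repeat)
pairedY1 = D1(nearestRow, 2);
pairedY2 = D2(:, 2);
```

With these numbers, the D2 points at x = 75, 88 and 90 are all paired with the D1 point at x = 80; x = 90 is equidistant from 80 and 100, and min keeps the first match. The result can be plotted with scatter(pairedY1, pairedY2).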
I expect a fairly linear relationship between the variables. Option 2 sounds reasonable, but it would be best to mitigate any spurious data.
"Option 2 sounds reasonable"
It sounds like this decision hasn't been thought out. The results will not be meaningful unless the pairing is meaningful. There are many ways to pair the two datasets, and each of them will produce a different result with a different interpretation.
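One hypothetical way to limit spurious pairings under option 2 is to discard any pair whose x gap exceeds a tolerance. The sketch below assumes a tolerance maxGap of 5 x units purely for illustration; a sensible value depends on how densely your data are sampled. The data are hypothetical (x values from the example earlier in the thread, made-up y values).

```matlab
% Hypothetical example data: [x, y] rows.
D1 = [0 20 40 60 80 100 120; 1 2 3 4 5 6 7].';
D2 = [1 25 42 75 88 90 118; 10 12 13 15 17 17 19].';

maxGap = 5;   % hypothetical tolerance in x units; choose it from your data

% Nearest D1 x value for each D2 point (implicit expansion, R2016b+)
xGap = abs(D1(:,1) - D2(:,1).');
[minGap, nearestRow] = min(xGap, [], 1);

% Keep only pairs whose x values are within the tolerance
keep = minGap <= maxGap;
pairedY1 = D1(nearestRow(keep), 2);
pairedY2 = D2(keep, 2);
```

With this tolerance, the D2 points at x = 88 and x = 90 are dropped because the nearest D1 x value is 8 and 10 units away. Reporting how many points were discarded alongside the plot makes the choice of tolerance transparent.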


 Accepted Answer

Here's a demo you can follow.
It produces 2 datasets per your description; then it pairs each y value in dataset2 with a y value from dataset1 according to the proximity of the data points (the commented-out line pairs by x values only).
Then it plots the results. The data are random, so don't expect linearity.
% Produce 2 datasets, one longer than the other; x values range from 0:110
dataset1 = [rand(100,1)*110, rand(100,1)];
dataset2 = [rand(50,1)*110, rand(50,1)*10];
% For each row of dataset2, find the row of dataset1
% that is closest to it
D = pdist2(dataset1,dataset2); % distance between each (x,y)
% D = pdist2(dataset1(:,1),dataset2(:,1)); % distance between each (x)
[~, minRow] = min(D);
% Plot results
plot(dataset1(minRow,2), dataset2(:,2),'o')

4 Comments

This worked great, thank you! Given the resolution of Dataset 1, the nearest neighbor method here is more than adequate.
Vince, I'd like to add two things.
1) I added a line to my answer (it's commented-out)
% D = pdist2(dataset1(:,1),dataset2(:,1)); % distance between each (x)
I realized that the original line (the one above it) pairs the full (x,y) coordinates, which might be exactly what you want. The new commented-out line does the pairing based only on the x values, in case that's what you wanted.
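If only the x-only pairing is needed and pdist2 isn't available, interp1 with the 'nearest' method is one toolbox-free alternative: for each query x it returns the y value at the nearest sample x. A minimal sketch, using hypothetical example matrices D1 and D2 in place of dataset1 and dataset2:

```matlab
% Hypothetical example data: [x, y] rows, y values made up for illustration.
D1 = [0 20 40 60 80 100 120; 1 2 3 4 5 6 7].';
D2 = [1 25 42 75 88 90 118; 10 12 13 15 17 17 19].';

% For each D2 x value, take the D1 y value at the nearest D1 x value.
% 'extrap' also covers D2 x values outside D1's x range.
pairedY1 = interp1(D1(:,1), D1(:,2), D2(:,1), 'nearest', 'extrap');
pairedY2 = D2(:, 2);
```

This does not return the distances, so the colour-by-distance idea would still need the gaps computed separately. How a tie is broken (a D2 x value exactly halfway between two D1 x values, like 90 here) is up to interp1, so the result at such points may differ from a min-based search.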
2) To see the distance between the paired values, you can add color that represents distance. This may help confirm that your pairing is reasonable.
D = pdist2(dataset1,dataset2); % distance between each (x,y)
[minDist, minRow] = min(D);
% Plot results
scatter(dataset1(minRow,2), dataset2(:,2),25,minDist,'filled')
cb = colorbar();
ylabel(cb,'nearest neighbor distance')
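Since a fairly linear relationship is expected, one possible next step (not part of the answer above) is to fit a line to the paired values and compute a correlation coefficient. This sketch assumes x-only nearest-neighbour pairing and hypothetical D1/D2 matrices; polyfit and corrcoef are core MATLAB functions.

```matlab
% Hypothetical example data: [x, y] rows, y values made up for illustration.
D1 = [0 20 40 60 80 100 120; 1 2 3 4 5 6 7].';
D2 = [1 25 42 75 88 90 118; 10 12 13 15 17 17 19].';

% x-only nearest-neighbour pairing (implicit expansion, R2016b+)
[~, nearestRow] = min(abs(D1(:,1) - D2(:,1).'), [], 1);
pairedY1 = D1(nearestRow, 2);
pairedY2 = D2(:, 2);

% Least-squares line: p(1) is the slope, p(2) the intercept
p = polyfit(pairedY1, pairedY2, 1);

% Pearson correlation coefficient between the paired values
R = corrcoef(pairedY1, pairedY2);
r = R(1, 2);
```

With these made-up numbers the fitted slope is 1.5, but on real data the fit is only as meaningful as the pairing rule behind it.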
Great modification that strengthens the method. Thank you.
Glad it worked out!
