Create a Boxplot with two variables; one to separate into bins based on the other

I want to create a boxplot from two variables, where one variable is binned and the other is grouped according to those bins. To clarify, say one variable is time and the other is distance travelled. I want to split the time into several bins, group the distance values by the time bin they fall into, and draw one box per bin. How can I achieve this in MATLAB? I also want the whiskers to be at the 5th and 95th percentiles.

Answers (1)

Walter Roberson on 27 Jun 2015
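numbins = 5;
binedges = linspace(min(FirstVariable), max(FirstVariable), numbins+1);
binpositions = binedges(1:end-1);   % left edge of each bin, used as the x position of its box
binedges(end) = inf;                % so the maximum value falls into the last real bin
[~, binnumbers] = histc(FirstVariable, binedges);
usedbins = unique(binnumbers);      % boxplot draws one box per non-empty bin
boxplot(SecondVariable, binnumbers, 'positions', binpositions(usedbins), 'whisker', 0.7193313666);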
The whisker value 0.7193... comes from solving the following for w:
erf((s + 2*s*w)/sqrt(2)) == 9/10
where s = 0.6744897500 is the solution of erf(s/sqrt(2)) == 1/2, that is, the z score for 50% coverage (in agreement with the first entry in the table at Wikipedia). The 9/10 reflects that you want 5% left below the lower whisker and 5% left above the upper whisker, leaving 90% within the whiskers.
The s + 2*s*w comes from the whisker formula q3 + w*(q3-q1), where q1 and q3 are the first and third quartiles. For a normal distribution with unit standard deviation, q3 = s and q1 = -s, with s the 0.674... value noted above, so q3 - q1 = 2*s and the upper whisker limit is s + 2*s*w. The whiskers therefore match the 5th and 95th percentiles only to the extent that the data in each bin are roughly normal.
Calculating the right whisker length was the hardest part of this, which is why I show the work here; using it, you can calculate what whisker length to use if you decide to change your 5th/95th percentile criterion. Others might find it useful as well. And some day I will probably look back at this post to work it out for another question.
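The derivation above can be turned into a few lines so the whisker length can be recomputed for other percentiles. This is only a sketch under the same normality assumption, and it assumes the two percentiles are symmetric about the median; the variable names lowerPct, upperPct and coverage are illustrative, not part of boxplot. For 5 and 95 it gives approximately 0.7193.

```matlab
% Sketch: whisker multiplier w for boxplot such that, for normally
% distributed data, the whisker limits sit at the chosen percentiles.
lowerPct = 5;                            % illustrative names, not boxplot options
upperPct = 95;
coverage = (upperPct - lowerPct) / 100;  % fraction of data inside the whiskers

s = sqrt(2) * erfinv(0.5);       % z score for 50% coverage (the quartiles)
z = sqrt(2) * erfinv(coverage);  % z score for the requested coverage

% Solve s + 2*s*w == z for w
w = (z/s - 1) / 2;
fprintf('whisker = %.10f\n', w);
```

The result can be passed as the 'whisker' value in the boxplot call.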
  4 Comments
Sayantan Sahu on 28 Jun 2015
The bins should all be of uniform width. What if I want the bins to run from 40 to 2700 in steps of 250?
Walter Roberson on 28 Jun 2015
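binedges = 40:250:2700;             % note: the last edge this generates is 2540, not 2700
and carry on with the rest, such as
binpositions = binedges(1:end-1);   % left edge of each bin, used as the x position of its box
binedges(end) = inf;
[~, binnumbers] = histc(FirstVariable, binedges);
inrange = binnumbers > 0;           % histc returns 0 for values below the first edge
usedbins = unique(binnumbers(inrange));
boxplot(SecondVariable(inrange), binnumbers(inrange), 'positions', binpositions(usedbins), 'whisker', 0.7193313666);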
Replacing the last edge with inf is needed because, for histc(), the final bin counts only values that are exactly equal to the last edge. If you had edges at 1 5 9, that would give 3 bins: the second would count values from 5 up to but not including 9, and the third would count only values that are exactly 9. Replacing the final edge with inf to make 1 5 inf causes the second bin to run from 5 upward (excluding infinity itself), so the final value 9 is included in the second bin.
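A small check of that edge behaviour, using the 1 5 9 example; the sample values and the names binsClosed and binsOpen are made up for illustration.

```matlab
% Sketch: how histc assigns bin numbers with and without the inf edge.
values = [1 4 5 8 9];

% Edges 1 5 9: the value 9 ends up alone in a third bin.
[~, binsClosed] = histc(values, [1 5 9]);

% Edges 1 5 inf: the value 9 joins the second bin.
[~, binsOpen] = histc(values, [1 5 inf]);

disp(binsClosed);
disp(binsOpen);
```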
