> How can I automatically adjust the y-axis of those plots so that only the boxplots and the whiskers but not the outliers are considered in determining the y-axis?
In boxchart, outliers are values that are more than 1.5*IQR below the bottom edge or above the top edge of the box, where IQR is the interquartile range. The box edges are the 25th and 75th percentiles (the first and third quartiles) of the data. So the outlier bounds are the 25th percentile minus 1.5*IQR and the 75th percentile plus 1.5*IQR. These are the bounds you'll use to define your y-axis limits. For each box in the boxchart, they are computed as
iqrng = iqr(ydata);                      % interquartile range of one box
lower = quantile(ydata, 0.25)-1.5*iqrng; % lower outlier bound
upper = quantile(ydata, 0.75)+1.5*iqrng; % upper outlier bound
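As a worked example of these formulas, here is a single box built from made-up values (the data are hypothetical). The IQR is computed as the difference of the two quartiles, which is what iqr returns, and the names lowerBound and upperBound are used so the snippet doesn't shadow MATLAB's lower and upper functions.

```matlab
% Hypothetical data for one box; 30 is far above the rest
ydata = [1 2 3 4 5 6 7 30];

q1 = quantile(ydata, 0.25);        % 25th percentile: 2.5
q3 = quantile(ydata, 0.75);        % 75th percentile: 6.5
iqrng = q3 - q1;                   % interquartile range: 4
lowerBound = q1 - 1.5*iqrng;       % -3.5
upperBound = q3 + 1.5*iqrng;       % 12.5

% Logical mask of values boxchart would draw as outliers (only 30 here)
isOutlier = ydata < lowerBound | ydata > upperBound
```

Note that the upper bound (12.5) lies above 7, the largest non-outlier value and therefore where the top whisker ends, so limits taken from the bounds leave some headroom beyond the whiskers.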
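The y-axis limits are then the smallest lower bound and the largest upper bound across all boxes. This is a bit trickier to compute with grouped boxes, because each boxchart object holds several boxes (one per x category), so the quartiles must be computed per category and then combined across all objects.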
Here's a demo that creates a boxchart, computes the minimum and maximum outlier bounds, and sets the y-axis limits to those bounds. Don't miss the last section below on "A note on data visualization".
Create boxchart
The only thing you need from this section is the variable h, which is the handle to your boxchart object.
tbl = readtable('TemperatureData.csv');
monthOrder = {'January','February','March','April','May','June','July', ...
    'August','September','October','November','December'};
tbl.Month = categorical(tbl.Month,monthOrder);
% Inject artificial outliers for the demo: double some random values...
r = unique(randi(height(tbl),1,20));
tbl.TemperatureF(r) = 2*tbl.TemperatureF(r);
% ...and negate some others
w = unique(randi(height(tbl),1,20));
tbl.TemperatureF(w) = -1*tbl.TemperatureF(w);
% Grouped boxchart: h is an array with one BoxChart object per year
h = boxchart(tbl.Month,tbl.TemperatureF,'GroupByColor',tbl.Year);
ylabel('Temperature (F)')
Compute limits based on outlier bounds
Replace h with your boxchart object handle.
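upperbound = [];
lowerbound = [];
for i = 1:numel(h)
    % One group per x category, i.e. per box within this BoxChart object
    groups = findgroups(h(i).XData(:));
    qtile.lower = splitapply(@(x)quantile(x,0.25),h(i).YData(:),groups);
    qtile.upper = splitapply(@(x)quantile(x,0.75),h(i).YData(:),groups);
    iqrng = qtile.upper - qtile.lower;   % IQR of each box (avoids shadowing iqr)
    upperbound = [upperbound; qtile.upper + 1.5*iqrng];
    lowerbound = [lowerbound; qtile.lower - 1.5*iqrng];
end
ybound = [min(lowerbound), max(upperbound)];
ylim(ybound)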
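If you still have the data that was passed to boxchart, a sketch like the following computes the same kind of limits without the chart handle. It assumes that each box corresponds to one combination of x category and color group, as with 'GroupByColor'. The variables x, c, and y are hypothetical stand-ins for tbl.Month, tbl.Year, and tbl.TemperatureF.

```matlab
% Hypothetical data: x category, color group, and value for each point
x = categorical(["Jan";"Jan";"Jan";"Jan";"Jan";"Jan";"Jan";"Jan";"Feb";"Feb";"Feb";"Feb"]);
c = [2015;2015;2015;2015;2016;2016;2016;2016;2015;2015;2015;2015];
y = [1;2;3;4;10;20;30;40;5;5;5;5];

% One group per box: each (x category, color group) pair
groups = findgroups(x, c);
q1 = splitapply(@(v) quantile(v, 0.25), y, groups);
q3 = splitapply(@(v) quantile(v, 0.75), y, groups);
iqrng = q3 - q1;

% Smallest lower bound and largest upper bound across all boxes
ybound = [min(q1 - 1.5*iqrng), max(q3 + 1.5*iqrng)]
```

Grouping by x alone would pool January 2015 and January 2016 into one set of values and give different bounds, which is why both grouping variables are passed to findgroups.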
A note on data visualization
The chart above is misleading because many outliers fall outside the axis limits, so readers may assume they don't exist. There are two ways to improve this so that your visualization more accurately depicts your data.
- Hide the outlier markers using set(h,'MarkerStyle','none'). Note that this only hides the markers; it is not the same as detecting and removing outliers from your data before plotting. You'll still need the limit computation above to update the axis limits.
- Clearly state in your text, for example in the figure caption, that some outliers lie outside the plotted range.
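For the second option, one way to support the caption is to count how many values fall outside the limits. This is a minimal sketch with hypothetical values y and limits ybound; in the demo you would use the ybound computed above and the values plotted in h.

```matlab
% Hypothetical plotted values and y-axis limits (substitute your own ybound).
% With a boxchart handle h, the plotted values could be gathered with:
%   y = [];
%   for i = 1:numel(h), y = [y; h(i).YData(:)]; end
y = [1; 2; 3; 4; 10; 20; 30; 40; -50; 200; 90];
ybound = [-15, 65];

% Count values hidden below and above the visible range (NaNs are not counted)
nBelow = nnz(y < ybound(1));
nAbove = nnz(y > ybound(2));

% Text you could add to a caption or title
note = sprintf("%d of %d values lie outside the plotted y range (%d below, %d above).", ...
    nBelow + nAbove, numel(y), nBelow, nAbove);
disp(note)
```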