ANOVA for linear mixed-effects models produces misleading main effects

Hello,
I'm using fitlme to model my data (followed by anova). I always assumed that the statistics for the main effect of a specific fixed effect reflect the significance of the difference between group means, but now I'm not so sure.
The script below generates simulated data for a model with two fixed effects: one continuous and one categorical with two levels. The data were generated to produce a significant interaction but no main effects (i.e., the means are identical for both groups). However, the model yields a significant main effect of group, simply because the intercepts are different (even though the means are the same). Shifting the data along the x-axis produces markedly different results.
I'm hoping there is still a way to use these functions to measure both main and interaction effects, but I'm not sure what I'm doing wrong or how to fix it.
Any guidance will be much appreciated.
x=0:100;
y_group1=50+randn(1,101)*10;
y_group2=y_group1+x-50;
figure;plot(x,y_group1,x,y_group2);
mean(y_group1)==mean(y_group2)   % group means are equal by construction, since mean(x-50) = 0
% ans =
%
% logical
%
% 1
tab=table([x';x'],[y_group1';y_group2'],[ones(101,1);2*ones(101,1)],'variablenames',{'x','y','group'});
tab.group=categorical(tab.group);
model=fitlme(tab,'y~x*group');
anova(model)
% ANOVA marginal tests: DFMethod = 'Residual'
%
% Term FStat DF1 DF2 pValue
% {'(Intercept)'} 755.21 1 198 1.7114e-69
% {'x' } 0.16396 1 198 0.68597
% {'group' } 388.26 1 198 1.4799e-48
% {'x:group' } 520.27 1 198 2.6278e-57
% Strong effect of group, even though means are equal
tab.x=tab.x-50;
figure;plot(x-50,y_group1,x-50,y_group2);
model=fitlme(tab,'y~x*group');
anova(model)
% ANOVA marginal tests: DFMethod = 'Residual'
%
% Term FStat DF1 DF2 pValue
% {'(Intercept)'} 3052.7 1 198 2.8264e-122
% {'x' } 0.16396 1 198 0.68597
% {'group' } 1.3039e-28 1 198 1
% {'x:group' } 520.27 1 198 2.6278e-57
% Essentially the same data, but the group difference in intercepts is eliminated, and the group effect is gone

Answers (3)

the cyclist
the cyclist on 13 Feb 2024
Edited: the cyclist on 13 Feb 2024
I think it is easier to interpret the estimated model coefficients if you write out all terms for both groups, in the following format:
x = 0:100;
noise = 10*randn(1,101); % using same "noise" for both groups
y_group1 = 50 + 0*x + noise;
y_group2 = 0 + 1*x + noise;
What I see in your generative model is two effects:
  • intercept
  • group * slope
If you consider group 1 to be the reference group, then the effects are measured with respect to it. Therefore,
  • The intercept of the reference group is 50
  • The slope (w.r.t x) of the reference group is 0
  • The intercept effect (of group 2 relative to group 1) is -50
  • The group*slope effect (of group 2 relative to group 1) is 1
I think you kinda fooled yourself by plotting these data over a range where they have the same mean. But that's not actually salient for the model. You could have plotted these data over the range 1000:2000, where they would have radically different means. But the model would be the same.
The effects you specified are well estimated by fitlme:
tab=table([x';x'],[y_group1';y_group2'],[ones(101,1);2*ones(101,1)],'variablenames',{'x','y','group'});
tab.group=categorical(tab.group);
model_lme = fitlme(tab,'y~x*group')
model_lme =
Linear mixed-effects model fit by ML

Model information:
    Number of observations             202
    Fixed effects coefficients           4
    Random effects coefficients          0
    Covariance parameters                1

Formula:
    y ~ 1 + x*group

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    1523.6    1540.2    -756.82          1513.6

Fixed effects coefficients (95% CIs):
    Name                  Estimate     SE          tStat       DF     pValue        Lower        Upper
    {'(Intercept)'}          52.779      2.0257      26.055    198      6.71e-66       48.784       56.774
    {'x'          }       -0.025032    0.034998    -0.71523    198       0.47531    -0.094048     0.043985
    {'group_2'    }             -50      2.8647     -17.454    198    6.4006e-42      -55.649      -44.351
    {'x:group_2'  }               1    0.049495      20.204    198    5.3453e-50       0.9024       1.0976

Random effects covariance parameters (95% CIs):
    Group: Error
    Name             Estimate    Lower     Upper
    {'Res Std'}      10.254      9.3017    11.305
I hope that helps.

Jeff Miller
Jeff Miller on 13 Feb 2024
In addition to what @the cyclist said, here is another way to look at it:
"I always assumed that the stats for the main effect for a specific fixed effect reflect the significance of the difference between means..."
Intuitive as it seems, that's actually not true when models include continuous as well as categorical effects. For such models, the test for a specific categorical fixed effect reflects the significance of the (predicted) group difference when the continuous variable is zero. And, as you can see from your first graph, the groups clearly are different when x is 0, even though they aren't different on average across all of the x values.
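To make that concrete, here is a minimal sketch that reads the fixed-effect estimates from the question's first fit (before x was shifted) and evaluates the predicted group difference at two values of x. It assumes the variable model holds that fit and that the coefficient order is {'(Intercept)','x','group_2','x:group_2'}, as in the fitlme output elsewhere in this thread.
% Sketch: the group_2 coefficient is the predicted (group 2 - group 1) difference at x = 0;
% at other x values the x:group_2 term also contributes.
beta = fixedEffects(model);            % estimates in the order [(Intercept); x; group_2; x:group_2]
diff_at_x0  = beta(3)                  % predicted group difference at x = 0 (about -50 here)
diff_at_x50 = beta(3) + 50*beta(4)     % predicted group difference at x = 50 (about 0 here)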
  2 Comments
Eitan Schechtman
Eitan Schechtman on 14 Feb 2024
I guess my thinking on this is based on classic ANOVAs: those don't care about intercepts, but they do convey whether there's a group difference. Let me ask this, then: if I want to analyze data that have both categorical and continuous variables (and some random effect too), and I want to estimate whether there is a group effect for the categorical variable, how would I go about that?
Say, for example, that the y data from before are scores on test A, x is the score on test B, and the groups are different interventions. How should I analyze the data in a way that would not only show the interaction but also answer the question: is intervention 1 overall better than intervention 2? (Real answer: it's not; it just makes everyone get roughly the same score, around 50.)
How can I estimate the significance of this effect (and not the intercept, which I don't care too much about)?
Thanks again for your guidance!
the cyclist
the cyclist on 14 Feb 2024
Edited: the cyclist on 15 Feb 2024
The model has to fit the hypothesis. Leaving out the intercept term:
x = 0:100;
noise = 10*randn(1,101);
y_group1 = 50 + 0*x + noise;
y_group2 = 0 + 1*x + noise;
tab=table([x';x'],[y_group1';y_group2'],[ones(101,1);2*ones(101,1)],'variablenames',{'x','y','group'});
tab.group=categorical(tab.group);
% No-intercept model, estimating coefficients for group's impact on (1)
% overall level and (2) multiplier on x
model = fitlme(tab,'y ~ -1 + group + x:group')
model =
Linear mixed-effects model fit by ML

Model information:
    Number of observations             202
    Fixed effects coefficients           2
    Random effects coefficients          0
    Covariance parameters                1

Formula:
    y ~ group + x:group

Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    2025.4    2035.3    -1009.7          2019.4

Fixed effects coefficients (95% CIs):
    Name                 Estimate    SE         tStat       DF     pValue        Lower      Upper
    {'group_2'  }        -3.782      7.0836     -0.53391    200    0.594         -17.75     10.186
    {'x:group_2'}         1.0496     0.12239     8.5763     200    2.6875e-15    0.80829     1.291

Random effects covariance parameters (95% CIs):
    Group: Error
    Name             Estimate    Lower     Upper
    {'Res Std'}      35.86       32.528    39.532
anova(model)
ans =
ANOVA marginal tests: DFMethod = 'Residual'

    Term             FStat      DF1    DF2    pValue
    {'group'  }      0.28506    1      200         0.594
    {'x:group'}       73.553    1      200    2.6875e-15
The result is that group has no (significant) effect on overall level, but it does have a significant effect through the x:group interaction, i.e., the group-specific slope.
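If the question is simply whether the group terms improve the model at all, another option is a likelihood-ratio test between nested models fit by ML (the fitlme default). A minimal sketch, reusing the table tab from above; note that this tests group and x:group jointly rather than the main effect alone:
lme_nogrp = fitlme(tab,'y ~ x');          % reduced model: no group terms
lme_full  = fitlme(tab,'y ~ x*group');    % full model: adds group and x:group
compare(lme_nogrp,lme_full)               % likelihood-ratio test of the joint group contribution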

Jeff Miller
Jeff Miller on 14 Feb 2024
One common approach in this situation is to rescore x so as to shift the intercept to a value that you do care about, usually the mean of x. For example,
tab.x2 = tab.x - 50;                   % add a centered copy of x to the table (assuming tab.x is the original 0:100)
model2 = fitlme(tab,'y ~ x2*group');
In model2, the categorical group effect tests whether the groups differ at x2 = 0, that is, at x = 50. Given that the lines are not parallel in your original plot, it is kind of meaningless to ask whether the groups differ overall, ignoring x, because the group difference in y clearly depends on which x you are asking about.
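Equivalently, a minimal sketch that keeps the original, uncentered fit and tests the same question with a contrast via coefTest. It assumes model is the fit of 'y ~ x*group' on the original (unshifted) x, with coefficient order {'(Intercept)','x','group_2','x:group_2'}:
% Test group_2 + 50*(x:group_2) = 0, i.e. the predicted group difference at x = 50
H = [0 0 1 50];                % contrast over [(Intercept), x, group_2, x:group_2]
pValue = coefTest(model, H)    % same question as the group test after centering x at 50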
The situation is analogous to avoiding the interpretation of a main effect when there is a cross-over interaction in a purely categorical ANOVA.
Didn't quite follow your example with tests A and B and the intervention, sorry.
