Discrepancy between ANOVA and fitlme

11 views (last 30 days)
snj on 7 Oct 2015
Edited: snj on 12 Oct 2015
I thought ANOVA and linear models with categorical variables were equivalent when testing for differences across groups. In my case, I have N measurements, say Yi, of some quantity in M<N animals. There are more than one measurement/sample from each animal, to increase the reliability of these measurements. Those measurements can be regarded as independent samples of the same variable from the same animal. Additionally, the animals can be divided into two groups, a control and an affected group, and I wish to test for differences between these two groups. However, each animal can have its own level, which I am not interested in. Thus, I have tried ANOVA with random effects (over animals).
anovan(Yi,{group,animal'},'random',[2],'varnames',{'group','animal'},'nested',[0,0;1 0]);
In my case, I get a P value of about .17. Then I. A linear model with random effects using fitlme like so:
tbl = table(group,animal,Yi,'VariableNames',{'group,'animal',Yi});
mdl = fitlme(tbl,Yi~group+(1|animal)');
This time I get a P value for differences across groups of about .28. One difference is in the reported difference in the denominator degrees of freedom, which was 19.6 in the ANOVA case, and 2502 in the fitlme case (which sounds suspiciously large). In my case, N=2505 and M=21. What is the source of this difference?

Answers (1)

Gautam Pendse
Gautam Pendse on 9 Oct 2015
Hi snj,
I noticed the following three differences in the call to anovan and fitlme:
1. The call to fitlme should use restricted maximum likelihood to be compared to anovan. You can do that by saying 'FitMethod','reml' in the call to fitlme.
2. The random effect specification in fitlme should be (1|animal:group) since animal is nested in group.
3. The call to anova(mdl) should be replaced by anova(mdl,'dfmethod','satterthwaite') since that is the method used by anovan.
For this model, after you make these replacements and if the data is balanced, you should get the same results. I am pasting some code below to illustrate this:
%%Dummy data
% Animals in group 1 = N1 and group 2 = N2.
N1 = 1200;
N2 = 1200;
% Number of animals in each group.
M = 20;
% Random effects for animals in group 1 and group 2.
u1 = randn(M,1);
u2 = randn(M,1);
% Mean response.
mu = 1;
% Effect of group.
beta = 1;
% Data from group 1.
animal1 = repmat((1:20)',60,1);
group1 = ones(N1,1);
Y1 = mu + u1(animal1) + 0.5*randn(N1,1);
% Data from group 2.
animal2 = repmat((1:20)',60,1);
group2 = 2*ones(N2,1);
Y2 = mu + beta + u2(animal2) + 0.5*randn(N2,1);
% All data.
Y = [Y1;Y2];
animal = [animal1;animal2];
group = [group1;group2];
% Put variables in a table.
tbl = table();
tbl.Y = Y;
tbl.animal = categorical(animal);
tbl.group = categorical(group);
% Every combination of group and animal has the same number of
% observations - balanced data.
%%Fit with anovan
[P,T,STATS,TERMS] = anovan(tbl.Y,{tbl.group,tbl.animal},'random',2,'varnames',{'group','animal'},'nested',[0,0;1 0]);
%%Fit with fitlme using REML and (1|animal:group)
% Animal ID 1 represents a different animal in group 1 and 2. So use
% (1|animal:group) instead of (1|animal). Also use restricted maximum
% likelihood by specifying 'FitMethod' equal to 'reml'.
mdl = fitlme(tbl,'Y~group+(1|animal:group)','FitMethod','reml');
% Ask for 'Satterthwaite' degrees of freedom method in the call to anova.
  1 Comment
snj on 10 Oct 2015
Edited: snj on 12 Oct 2015
Hi Gautam.
Thank you very much for your very helpful response, which included a lot of information I wish was in the Matlab documentation! Unfortunately, your suggestions didn't resolve the difference, but perhaps this is because my data are not balanced. But why would that lead to a difference in the two approaches?
Your answer prompted a few other questions: Another thing which I find odd is the need to specify explicitly the nesting in the two calls, because my animal ID numbers are in fact unique, such that no numbers are repeated across groups. I would thus think that the nesting is implicit, but specifying it explicitly does change the results, at least in anovan. Why? Considering the number of degrees of freedom, is there any one method that is the more accepted for a problem like mine ?
Thanks again, SNJ

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!