fitlme different to lmer in R
    3 views (last 30 days)
  
       Show older comments
    
I am trying to illustrate Simpsons Paradox using fitlme:
dat = [
    8.7050   21.0000    1.0000
    7.3362   22.0000    1.0000
    6.2369   23.0000    1.0000
    4.8375   24.0000    1.0000
    4.9990   25.0000    1.0000
    13.5644   31.0000    2.0000
    12.5219   32.0000    2.0000
    12.3329   33.0000    2.0000
    11.4778   34.0000    2.0000
    10.0626   35.0000    2.0000
    18.9573   41.0000    3.0000
    17.9516   42.0000    3.0000
    15.1706   43.0000    3.0000
    16.1398   44.0000    3.0000
    15.1729   45.0000    3.0000
    23.7340   51.0000    4.0000
    23.6630   52.0000    4.0000
    22.4108   53.0000    4.0000
    21.0697   54.0000    4.0000
    20.3200   55.0000    4.0000
    28.0275   61.0000    5.0000
    28.0699   62.0000    5.0000
    27.3365   63.0000    5.0000
    25.8270   64.0000    5.0000
    24.5141   65.0000    5.0000];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'})
figure,plot(tab.Age, tab.y,'o')
%tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')
but I get a positive coefficient for the fixed effect of Age, when I was expecting a negative one:
Linear mixed-effects model fit by REML
Model information:
    Number of observations              25
    Fixed effects coefficients           2
    Random effects coefficients          5
    Covariance parameters                2
Formula:
    y ~ 1 + Age + (1 | Participant)
Model fit statistics:
    AIC       BIC       LogLikelihood    Deviance
    120.67    125.21    -56.334          112.67  
Fixed effects coefficients (95% CIs):
    Name                 Estimate    SE          tStat      DF    pValue        Lower      Upper  
    '(Intercept)'        -4.4658       1.3833    -3.2283    23     0.0037183    -7.3274    -1.6042
    'Age'                0.49496     0.030545     16.204    23    4.4887e-14    0.43177    0.55815
Random effects covariance parameters (95% CIs):
Group: Participant (5 Levels)
    Name1                Name2                Type         Estimate      Lower    Upper
    '(Intercept)'        '(Intercept)'        'std'        2.4099e-16    NaN      NaN  
Group: Error
    Name             Estimate    Lower     Upper
    'Res Std'        2.1706      1.6259    2.898
If I run what I think is equivalent model using "lmer" in R on exactly the same data, I get a negative coefficient for Age, as expected:
summary(lmer('y ~ Age + (1|Participant)', data = tab)) # "tab" is dataframe version of tab above
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ Age + (1 | Participant)
   Data: tab
REML criterion at convergence: 88.2
Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8806 -0.5016  0.1545  0.7367  1.3270 
Random effects:
 Groups      Name        Variance Std.Dev.
 Participant (Intercept) 673.2727 25.9475 
 Residual                  0.4153  0.6445 
Number of obs: 25, groups:  Participant, 5
Fixed effects:
            Estimate Std. Error t value
(Intercept) 65.94094   12.24102   5.387
Age         -1.13831    0.09058 -12.567
Correlation of Fixed Effects:
    (Intr)
Age -0.318
What am I doing wrong with fitlme?
2 Comments
  the cyclist
      
      
 on 25 Jun 2024
				I don't have an answer for you; instead, I'm going to make it even more puzzling.
If I include only the first three participants, I get the sign you expect.
dat = [
    8.7050   21.0000    1.0000
    7.3362   22.0000    1.0000
    6.2369   23.0000    1.0000
    4.8375   24.0000    1.0000
    4.9990   25.0000    1.0000
    13.5644   31.0000    2.0000
    12.5219   32.0000    2.0000
    12.3329   33.0000    2.0000
    11.4778   34.0000    2.0000
    10.0626   35.0000    2.0000
    18.9573   41.0000    3.0000
    17.9516   42.0000    3.0000
    15.1706   43.0000    3.0000
    16.1398   44.0000    3.0000
    15.1729   45.0000    3.0000
    % 23.7340   51.0000    4.0000
    % 23.6630   52.0000    4.0000
    % 22.4108   53.0000    4.0000
    % 21.0697   54.0000    4.0000
    % 20.3200   55.0000    4.0000
    % 28.0275   61.0000    5.0000
    % 28.0699   62.0000    5.0000
    % 27.3365   63.0000    5.0000
    % 25.8270   64.0000    5.0000
    % 24.5141   65.0000    5.0000
    ];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML')
  the cyclist
      
      
 on 25 Jun 2024
				Also, if I increase the size of your dataset (still including participants 4 & 5):
dat = repmat(dat,2,1);
and run the model, I also get the sign you expect.
I wonder if MATLAB is somehow getting stuck in a local minimum solution, on the original dataset.
Answers (1)
  the cyclist
      
      
 on 25 Jun 2024
        
      Edited: the cyclist
      
      
 on 25 Jun 2024
  
      Disclaimer: I don't fully understand the specifics on why you are seeing what you are.
Thinking about my comment about being stuck in a local minimum, I remembered that one can randomize the start. Here, I do that, run your code 500 times, and see what age coefficient I get. Clearly, things are a bit dicy.
The model with the negative age coefficient, is definitely better, with a much lower AIC.
rng default
dat = [
    8.7050   21.0000    1.0000
    7.3362   22.0000    1.0000
    6.2369   23.0000    1.0000
    4.8375   24.0000    1.0000
    4.9990   25.0000    1.0000
    13.5644   31.0000    2.0000
    12.5219   32.0000    2.0000
    12.3329   33.0000    2.0000
    11.4778   34.0000    2.0000
    10.0626   35.0000    2.0000
    18.9573   41.0000    3.0000
    17.9516   42.0000    3.0000
    15.1706   43.0000    3.0000
    16.1398   44.0000    3.0000
    15.1729   45.0000    3.0000
    23.7340   51.0000    4.0000
    23.6630   52.0000    4.0000
    22.4108   53.0000    4.0000
    21.0697   54.0000    4.0000
    20.3200   55.0000    4.0000
    28.0275   61.0000    5.0000
    28.0699   62.0000    5.0000
    27.3365   63.0000    5.0000
    25.8270   64.0000    5.0000
    24.5141   65.0000    5.0000
    ];
tab = array2table(dat,'VariableNames',{'y','Age','Participant'});
figure
plot(tab.Age, tab.y,'o')
NS = 200;
[aic,ageCoefficient] = deal(zeros(NS,1));
for ns = 1:NS
% tab.Participant = nominal(tab.Participant) % doesn't have any effect
m = fitlme(tab,'y ~ Age + (1|Participant)','FitMethod','REML','StartMethod','random');
ageCoefficient(ns) = m.Coefficients.Estimate(2);
aic(ns) = m.ModelCriterion.AIC;
end
figure
histogram(ageCoefficient)
xlabel('Age coefficient')
ylabel('Frequency')
figure
histogram(aic)
xlabel('AIC')
ylabel('Frequency')
3 Comments
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


