Error in generalized linear mixed-effects model with poisson distribution.

17 views (last 30 days)
Dear All
I have daily measurements for four days (day 0 to 3) from patients. I would like to fit a glme with a poisson distribution with the formula "Measurement ~ Day*Group + (1|Patient)".
The code I use is:
mdl = fitglme(tbl,"Measurement ~ Day*Group + (1|Patient)",'Distribution','Poisson')
I receive the following error message:
Error using classreg.regr.lmeutils.StandardGeneralizedLinearMixedModel/validateyRange
For Poisson distribution, the responses must be non-negative, 'Weights' must be positive integers and the product of responses and 'Weights' must be integers.
Error in classreg.regr.lmeutils.StandardGeneralizedLinearMixedModel (line 4193)
validateyRange(sglme,sglme.y,binomialsize,weights,distribution);
Error in GeneralizedLinearMixedModel/fitStandardLMEModel (line 1319)
slme = classreg.regr.lmeutils.StandardGeneralizedLinearMixedModel(X,model.y,Zs,Psi,model.FitMethod,dofit,dostats,args{:});
Error in GeneralizedLinearMixedModel/fitter (line 893)
model.slme = fitStandardLMEModel(model);
Error in classreg.regr.FitObject/doFit (line 94)
model = fitter(model);
Error in GeneralizedLinearMixedModel.fit (line 2419)
model = doFit(model);
Error in fitglme (line 398)
glme = GeneralizedLinearMixedModel.fit(T,formula,varargin{:});
If I fit the same data without the mixed-effects (fitglm) using the formula "Measurement ~ Day*Group", I do not encounter any problems:
mdl = fitglm(tbl,"Measurement ~ Day*Group",'Distribution','Poisson')
I don't understand the reason for this error message and how to fix, so any help would be highly appreciated.
Thank you very much
Kind regards
Kaspar
  4 Comments
the cyclist
the cyclist on 28 Oct 2023
Can you upload the data? You can use the paper clip icon in the INSERT section of the toolbar.
Kaspar Bachmann
Kaspar Bachmann on 28 Oct 2023
Hi, thanks for the reply. I have created some code that simulates data exactly to my own. This makes it easier to manipulate the single components of the data structure. I was able to reproduce the error with the sample code and the models run on the code. I chose rng('default') in order to ensure reproducibility. Thanks for any help / suggestions on why this error appears and where I went wrong.
Best wishes
K
% Simulation of data with poisson distribution
% Simulation is for 100 patients in two groups A and B
% Each patient has four measurements (1 per day)
% There is daily trend added to the data
% The data is sampled from a random poisson distribution
clear
close all
% Default rng for reproducibility
rng('Default')
% Create distribution
pd = makedist('Poisson','lambda',4);
% Create patient index
Patients = categorical(1:100).';
% Create patient groups
Groups = categorical([zeros(50,1);ones(50,1)],[0:1],["A","B"]);
% Create Days 0 - 3
Days = [zeros(100,1);ones(100,1);2.*ones(100,1);3.*ones(100,1)];
% Create daily trend for different groups
DailyReductionA = [1,0.54,0.44,0.39];
DailyReductionB = [1, 1.1, 0.89, 0.85];
%Create empty data vector
Data = [];
% Create for loop to create data for groups A and B
for i =1:4
% Create positive Data from random distribution with two
% different factors and interecept per group
A = 0.238.*random(pd,50,1)+0.3;
B = 0.543.*random(pd,50,1)+0.6;
% Add daily trend
DataA = DailyReductionA(i).*A;
DataB = DailyReductionB(i).*B;
% Store data
Data = [Data;DataA;DataB];
end
% Create table
tbl = table(repmat(Patients,4,1),repmat(Groups,4,1),Data,Days,'VariableNames',["Patients","Group","Data","Days"]);
head(tbl)
Patients Group Data Days ________ _____ _____ ____ 1 A 1.014 0 2 A 1.49 0 3 A 1.014 0 4 A 1.014 0 5 A 0.776 0 6 A 1.49 0 7 A 1.252 0 8 A 1.728 0
% Visualize data
figure
boxchart(tbl.Days,tbl.Data,'GroupByColor',tbl.Group)
xlabel("Days")
ylabel("Data")
xticks(0:3)
xticklabels(["Day 0","Day 1","Day 2","Day 3"])
% Create glm with poisson distribution
mdl1 = fitglm(tbl,"Data ~ Days*Group",'Distribution','Poisson')
mdl1 =
Generalized linear regression model: log(Data) ~ 1 + Group*Days Distribution = Poisson Estimated Coefficients: Estimate SE tStat pValue ________ ________ _______ __________ (Intercept) 0.13701 0.11743 1.1668 0.24331 Group_B 0.90112 0.13724 6.5659 5.1727e-11 Days -0.33424 0.076801 -4.3521 1.3483e-05 Group_B:Days 0.29246 0.086064 3.3982 0.00067839 400 observations, 396 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 249, p-value = 1.1e-53
% Create glme with poisson distribution and random effects for patient to
% account for repeated measurement
mdl2 = fitglme(tbl, "Data ~ Days*Group + (1|Patients)",'Distribution','Poisson')
Error using classreg.regr.lmeutils.StandardGeneralizedLinearMixedModel/validateyRange
For Poisson distribution, the responses must be non-negative, 'Weights' must be positive integers and the product of responses and 'Weights' must be integers.

Error in classreg.regr.lmeutils.StandardGeneralizedLinearMixedModel (line 4193)
validateyRange(sglme,sglme.y,binomialsize,weights,distribution);

Error in GeneralizedLinearMixedModel/fitStandardLMEModel (line 1319)
slme = classreg.regr.lmeutils.StandardGeneralizedLinearMixedModel(X,model.y,Zs,Psi,model.FitMethod,dofit,dostats,args{:});

Error in GeneralizedLinearMixedModel/fitter (line 893)
model.slme = fitStandardLMEModel(model);

Error in classreg.regr.FitObject/doFit (line 94)
model = fitter(model);

Error in GeneralizedLinearMixedModel.fit (line 2419)
model = doFit(model);

Error in fitglme (line 398)
glme = GeneralizedLinearMixedModel.fit(T,formula,varargin{:});

Sign in to comment.

Answers (1)

Jeff Miller
Jeff Miller on 29 Oct 2023
I haven't used Poisson models myself, but I thought they required the values of the dependent variable to be non-negative integers. Your data values are reals. Note that your script runs find if you add (for example) the two extra lines below to make the data values integers:
% Add daily trend
DataA = DailyReductionA(i).*A;
DataB = DailyReductionB(i).*B;
% Added these two lines to make sure the data are integers:
DataA = round(10*DataA);
DataB = round(10*DataB);
% Store data
Data = [Data;DataA;DataB];
According to the data model underlying this type of analysis, I think that the factors, intercepts, and trends, etc influence the lamba parameter values of the different conditions--not the data values as you have them being created in your script. Thus, the final random values generated in the various conditions (with their different lambdas) would all be integers.
This doesn't explain why fitglm works even with non-integer data values; maybe it just doesn't check as thoroughly as fitglme. And fitglme's error message could be improved by adding "integers" after "must be non-negative".
Maybe you shouldn't be using a Poisson-based model if your measured data values are not integers?
  2 Comments
Kaspar Bachmann
Kaspar Bachmann on 29 Oct 2023
Hi
Thanks for the reply, from what I understand the data has to be non-negative, but not integers. If I add weights, then the weights and weights*responses have to be integers. I wonder whether fitglme adds a row of ones with the height of tbl as weights and thus produces this error while fitglm runs without weights? This would explain the error message. Otherwise I see no reason why fitglm should run on the dataset amd fitglme does not; and from what my understanding of poisson regression tells me, a regression with non-integer data should be possible in theory. However, if you had a similar (non-normal) measurement distribution, what approach would you take other than poisson? Is linear regression robust enough? Best wishes and thanks again
Jeff Miller
Jeff Miller on 30 Oct 2023
Sorry, if the integer issue isn't the problem, then I don't have any other idea. (I still think it is pretty diagnostic that fitglme runs find if the data values are converted integers, though. It looks to me like fitglm supports some more generalized theory that allows non-integer values of the "Poisson" DV, but fitglme doesn't.)
I don't know of any general approach that works for non-normal distributions. It depends on what the DV is and why/how it is non-normal. For example, various different transformations (log, sqrt, inverse, etc) can be used to make DVs more normal. Typically the people working with a given DV have adopted some specific approach that seems to work well for that DV.
Sorry I couldn't be more help.

Sign in to comment.

Products


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!