Regression tree and prediction equation
Suppose I have three independent variables A, B and C and a dependent variable T. The variable A is discrete, while B and C are continuous. The output variable T is also continuous. In this situation we need to build a regression tree. How can we generate a prediction equation for such a regression tree in MATLAB?
E.g.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
5 Comments
dpb
on 3 Nov 2022
You can declare a categorical variable in fitlm in the Statistics and Machine Learning Toolbox; see the fitlm documentation for details on model specification. But you don't have sufficient data to fit as given. You've got:
A=categorical([ 50 75 100 125 150 175 ]);
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
>> whos A B C T
Name Size Bytes Class Attributes
A 1x6 664 categorical
B 1x14 112 double
C 1x14 112 double
T 1x10 80 double
>>
You're missing definitions of what the independent variable values are for the specific 10 responses. It doesn't matter what other levels there may be if you don't have any data that lets you include them in the model.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
% Fails as posted: A has 6 elements but B and C have 14, so the columns cannot be concatenated.
tPredictors = [A(:), B(:), C(:)]
trueResponse = T(:)
You have only 6 A values but 14 for the others.
If you do supply all the A values (one A for each T), then you can use the Regression Learner app on the Apps tab of the MATLAB Toolstrip.
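As a quick check before fitting anything, the length mismatch described above can be confirmed programmatically. This is only a sketch using the vectors exactly as posted; the table name tbl and the flag sameLength are illustrative names, not part of the original question.

```matlab
% Vectors exactly as posted in the question
A = [50 75 100 125 150 175];
B = [0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C = [3 4 5 6 7 8 9 10 11 12 13 14 15 16];
T = [1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];

% Every predictor needs exactly one value per response
n = [numel(A) numel(B) numel(C) numel(T)];
sameLength = all(n == n(1));

if sameLength
    % One row per observation (tbl is an illustrative name)
    tbl = table(A(:), B(:), C(:), T(:), 'VariableNames', {'A','B','C','T'});
else
    fprintf('Lengths differ: A=%d, B=%d, C=%d, T=%d\n', n);
end
```

With the posted data this takes the else branch, because the counts are 6, 14, 14 and 10.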
the cyclist
on 4 Nov 2022
As mentioned in previous comments, you need to resolve the fact that you have different length vectors for your variables. But, after you resolve that, it is easy to build a regression tree model in MATLAB, using the fitrtree function.
It's not clear (to me) what you mean by "discrete" for A. Do you prefer to treat that variable as categorical, ordinal, or continuous? (Ordinal is tricky, but the other two are easy to handle.)
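A minimal sketch of the fitrtree route, assuming A is treated as categorical. The data below is synthetic and made up purely for illustration (the posted vectors cannot be used because their lengths differ), and the MinLeafSize value is an arbitrary choice. Note that a regression tree does not produce a single closed-form equation: its prediction is piecewise constant, so the closest thing to a "prediction equation" is the set of if-then split rules, which view can print as text.

```matlab
% Synthetic data for illustration only: one row per observation
rng(0);                                  % make the example reproducible
n = 60;
levels = [50 75 100 125 150 175];        % discrete levels of A from the question
idx = randi(numel(levels), n, 1);
A = levels(idx);  A = A(:);              % force a column vector
B = 0.45 + 1.25*rand(n, 1);              % continuous predictor
C = 3 + 13*rand(n, 1);                   % continuous predictor
T = 1 + 0.005*A + 0.8*B + 0.05*C + 0.05*randn(n, 1);   % made-up response

tbl = table(A, B, C, T);

% Fit a regression tree with A declared as a categorical predictor
tree = fitrtree(tbl, 'T', 'CategoricalPredictors', {'A'}, 'MinLeafSize', 5);

% Print the split rules; each leaf predicts the mean T of its training rows
view(tree, 'Mode', 'text');

% Predict T for a new observation (A must be one of the levels seen in training)
newObs = table(100, 0.9, 7, 'VariableNames', {'A','B','C'});
Tpred = predict(tree, newObs);
```

If A should be treated as continuous instead, dropping the 'CategoricalPredictors' argument is enough.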
Danish Nasir
on 4 Nov 2022
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
tABC=array2table([A;B(1:numel(A));C(1:numel(A));T(1:numel(A))].','VariableNames',{'A','B','C','T'})
mdl=fitlm(tABC,'CategoricalVars',{'A'})
While it runs, the toy dataset is deficient: B and C are almost exact linear functions of A, so only one of the three independent variables is estimable. Observe:
corrcoef(tABC{:,:})
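The collinearity can also be checked directly on the truncated predictors. This is a sketch on the first six values of each posted vector, as in the table above; the names X, R and r are illustrative.

```matlab
A = [50 75 100 125 150 175];
B = [0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C = [3 4 5 6 7 8 9 10 11 12 13 14 15 16];

k = numel(A);                        % only the first 6 rows can be paired up
X = [A(:), B(1:k).', C(1:k).'];      % columns: A, B, C

% Pairwise correlations between the predictors
R = corrcoef(X);

% C(1:6) equals A/25 + 1, so [1, A, C] has rank 2 rather than 3
r = rank([ones(k, 1), X(:, 1), X(:, 3)]);

fprintf('corr(A,B) = %.4f, corr(A,C) = %.4f, rank([1 A C]) = %d\n', ...
        R(1, 2), R(1, 3), r);
```

A rank below the number of columns means the design matrix cannot separate the effects of A and C.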