Regression tree and prediction equation

Suppose I have three independent variables A, B and C and a dependent variable T. A is discrete; B and C are continuous. The output variable T is also continuous. In this situation we need to build a regression tree. How can we generate a prediction equation for such a regression tree in MATLAB?
E.g.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];

5 Comments

You can declare a categorical variable in fitlm in the Statistics and Machine Learning Toolbox; see the fitlm documentation for details on model specification. But you don't have sufficient data to fit the model as given. You've got
A=categorical([ 50 75 100 125 150 175 ]);
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
>> whos A B C T
  Name      Size      Bytes  Class          Attributes
  A         1x6         664  categorical
  B         1x14        112  double
  C         1x14        112  double
  T         1x10         80  double
You're missing the independent variable values that go with each of the 10 responses. It doesn't matter what other levels may exist if you have no data that lets you include them in the model.
Like @dpb said, you need a value of A for every value in T if you're going to fit a model.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
tPredictors = [A(:), B(:), C(:)]
Error using horzcat
Dimensions of arrays being concatenated are not consistent.
trueResponse = T(:)
You have only 6 A values but 14 for the others.
If you do supply all the A values (one A for each T), then you can use the Regression Learner app on the Apps tab of the Tool Ribbon.
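A quick way to catch the length mismatch before concatenating is to compare the element counts first. This is only a sketch on the toy vectors from the question; truncating everything to the shortest vector (the names n, nObs, X and y are mine) is an assumption made so the code runs, not a substitute for actually having one A, B and C per observed T.

```matlab
% Toy vectors from the question (the lengths do not agree)
A = [50 75 100 125 150 175];
B = [0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C = [3 4 5 6 7 8 9 10 11 12 13 14 15 16];
T = [1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];

% Number of elements in each variable
n = [numel(A) numel(B) numel(C) numel(T)];
sameLength = all(n == n(1));     % false for the vectors above

% Assumption for illustration only: keep the first nObs elements of each
nObs = min(n);
X = [A(1:nObs).' B(1:nObs).' C(1:nObs).'];   % nObs-by-3 predictor matrix
y = T(1:nObs).';                              % nObs-by-1 response
```

With real data every row of X should be the A, B and C actually observed for the corresponding T, so no truncation should be needed.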
As mentioned in previous comments, you need to resolve the fact that you have different length vectors for your variables. But, after you resolve that, it is easy to build a regression tree model in MATLAB, using the fitrtree function.
It's not clear (to me) what you mean by "discrete" for A. Do you prefer to treat that variable as categorical, ordinal, or continuous? (Ordinal is tricky, but the other two are easy to handle.)
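To make the three options concrete, here is a small sketch of how the same A could be stored each way. The variable names Acont, Acat and Aord are mine, and how a given fitting function treats an ordinal variable is something to check in that function's documentation rather than assume from this example.

```matlab
A = [50 75 100 125 150 175].';

Acont = A;                                % continuous: leave it as double
Acat  = categorical(A);                   % categorical: unordered levels
Aord  = categorical(A, 'Ordinal', true);  % ordinal: levels carry an order

isOrdCat = isordinal(Acat);               % false
isOrdOrd = isordinal(Aord);               % true
nLevels  = numel(categories(Acat));       % 6 levels, one per distinct value
```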
Yes, the length of each variable should be the same. Say it is 6 for each variable (take the first 6 values). I want to treat A as a categorical variable. When one of the inputs is categorical, we can't use multiple regression and use a regression tree instead. How can I predict T using MATLAB?
As noted above, MATLAB's fitlm knows how to handle categorical variables automagically.
A=[ 50 75 100 125 150 175 ];
B=[ 0.45 0.55 0.75 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1];
C=[3 4 5 6 7 8 9 10 11 12 13 14 15 16 ];
T= [ 1.2 1.8 2.1 2.3 2.5 2.7 2.8 3.1 3.2 3.3];
tABC=array2table([A;B(1:numel(A));C(1:numel(A));T(1:numel(A))].','VariableNames',{'A','B','C','T'})
tABC = 6×4 table
     A      B      C     T
    ___    ____    _    ___
     50    0.45    3    1.2
     75    0.55    4    1.8
    100    0.75    5    2.1
    125     0.8    6    2.3
    150     0.9    7    2.5
    175       1    8    2.7
mdl=fitlm(tABC,'categorical',{'A'})
Warning: Regression design matrix is rank deficient to within machine precision.
mdl =
Linear regression model:
    T ~ 1 + A + B + C

Estimated Coefficients:
                   Estimate    SE    tStat    pValue
                   ________    __    _____    ______
    (Intercept)      0.3       0      Inf      NaN
    A_75             0.3       0      Inf      NaN
    A_100            0.3       0      Inf      NaN
    A_125            0.2       0      Inf      NaN
    A_150            0.1       0      Inf      NaN
    A_175              0       0      NaN      NaN
    B                  0       0      NaN      NaN
    C                0.3       0      Inf      NaN

Number of observations: 6, Error degrees of freedom: 0
R-squared: 1, Adjusted R-Squared: NaN
F-statistic vs. constant model: NaN, p-value = NaN
While it runs, the toy dataset is deficient: the three independent variables are all almost exact linear combinations of the first, so only one of the three is estimable. With 6 observations there are also no error degrees of freedom left. Observe:
corrcoef(tABC{:,:})
ans = 4×4
    1.0000    0.9876    1.0000    0.9694
    0.9876    1.0000    0.9876    0.9770
    1.0000    0.9876    1.0000    0.9694
    0.9694    0.9770    0.9694    1.0000
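The correlation of 1.0000 between A and C can be checked directly: in these six rows C equals A/25 + 1 exactly, so a design matrix containing an intercept, A and C cannot have full column rank. The sketch below treats A as continuous to keep it short, and the names X, r and isExact are mine. With A categorical the situation is even tighter, since the intercept plus five dummy columns already uses up all six observations.

```matlab
A = [50 75 100 125 150 175].';
B = [0.45 0.55 0.75 0.8 0.9 1].';
C = [3 4 5 6 7 8].';

% In the toy data C is an exact linear function of A
isExact = isequal(C, A/25 + 1);

% Design matrix with columns: intercept, A, B, C
X = [ones(6,1) A B C];
r = rank(X);                 % fewer than 4: the C column is redundant
fullRank = (r == size(X,2)); % false
```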

 Accepted Answer

This model is probably nonsense, because of the linear dependencies that @dpb points out. But perhaps your real data will yield a useful model. (Note that I transposed all your variables before putting them in a table.)
A = [ 50 75 100 125 150 175 ]';
Acat = categorical(A);
B = [ 0.45 0.55 0.75 0.8 0.9 1]';
C = [3 4 5 6 7 8]';
T= [ 1.2 1.8 2.1 2.3 2.5 2.7]';
tbl = table(Acat,B,C,T);
mdl=fitrtree(tbl,"T ~ Acat + B + C")
mdl =
  RegressionTree
           PredictorNames: {'Acat'  'B'  'C'}
             ResponseName: 'T'
    CategoricalPredictors: 1
        ResponseTransform: 'none'
          NumObservations: 6
  Properties, Methods
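A regression tree does not give a single closed-form equation the way fitlm does; the closest thing is the set of split rules, which can be printed with the tree's view method. The sketch below is hedged in one respect: setting 'MinParentSize' to 2 is my assumption for this six-row toy set, because as far as I know the default minimum parent size is 10, which would leave a six-row tree with no splits at all. For a 400-row data set the defaults may well be fine.

```matlab
A = [50 75 100 125 150 175].';
Acat = categorical(A);
B = [0.45 0.55 0.75 0.8 0.9 1].';
C = [3 4 5 6 7 8].';
T = [1.2 1.8 2.1 2.3 2.5 2.7].';
tbl = table(Acat, B, C, T);

% Assumption: allow very small nodes to split, only because the toy set is tiny
mdl = fitrtree(tbl, "T ~ Acat + B + C", 'MinParentSize', 2);

view(mdl, 'Mode', 'text')      % prints the if/else split rules
nNodes  = mdl.NumNodes;        % total number of nodes in the tree
cutVars = mdl.CutPredictor;    % predictor used at each node ('' for leaves)
```

Each path from the root to a leaf in the printed rules reads as one "if ... then T = value" statement, which is what stands in for a prediction equation here.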

2 Comments

Yes, the data set for each variable has 400 elements. A (the categorical variable) takes the 6 values mentioned above, repeated throughout. B ranges from 0.5 to 2 and C from 3 to 23. T ranges from 2 to 7.
A=400x1,B=400x1,C=400x1,T=400x1
Now I need a prediction model that can predict T using a regression tree in MATLAB.
The model the way I specified it should do what you want. You can then use that model's predict method to predict T for new values.
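For example, prediction on new rows could look like the sketch below. The new predictor values are made up purely for illustration, and the names newA, newB, newC, newTbl and Tpred are mine. The new table needs the same variable names and types as the training table, so the new A values are made categorical with the training values as the value set.

```matlab
% Training data (six-row toy set from the answer)
A = [50 75 100 125 150 175].';
Acat = categorical(A);
B = [0.45 0.55 0.75 0.8 0.9 1].';
C = [3 4 5 6 7 8].';
T = [1.2 1.8 2.1 2.3 2.5 2.7].';
tbl = table(Acat, B, C, T);
mdl = fitrtree(tbl, "T ~ Acat + B + C");

% Hypothetical new observations (values invented for illustration)
newA = categorical([100; 150], A);   % same level set as the training data
newB = [0.7; 0.95];
newC = [5; 7];
newTbl = table(newA, newB, newC, 'VariableNames', {'Acat','B','C'});

Tpred = predict(mdl, newTbl);        % one predicted T per row of newTbl
```

With your 400-row vectors you would build tbl from those instead and leave the rest unchanged.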

More Answers (0)

Release

R2022a