# Train Classification Ensemble in Parallel

This example shows how to train a classification ensemble in parallel. The model has ten red and ten green base locations, with red and green populations that are normally distributed and centered at the base locations. The objective is to classify points based on their locations. The classification is ambiguous because some base locations of one color lie near base locations of the other color.

Create and plot ten base locations of each color.

```
rng default % For reproducibility
grnpop = mvnrnd([1,0],eye(2),10);
redpop = mvnrnd([0,1],eye(2),10);
plot(grnpop(:,1),grnpop(:,2),'go')
hold on
plot(redpop(:,1),redpop(:,2),'ro')
hold off
```

Create 40,000 points of each color, centered on random base points.

```
N = 40000;
redpts = zeros(N,2);
grnpts = redpts;
for i = 1:N
    grnpts(i,:) = mvnrnd(grnpop(randi(10),:),eye(2)*0.02);
    redpts(i,:) = mvnrnd(redpop(randi(10),:),eye(2)*0.02);
end
figure
plot(grnpts(:,1),grnpts(:,2),'go')
hold on
plot(redpts(:,1),redpts(:,2),'ro')
hold off
```

Combine the points into one data matrix, and create a vector `grp` that labels the class of each point.

```
cdata = [grnpts;redpts];
grp = ones(2*N,1); % Green label 1, red label -1
grp(N+1:2*N) = -1;
```

Fit a bagged classification ensemble to the data. For later comparison with parallel training, fit the ensemble in serial and record the training time.

```
tic
mdl = fitcensemble(cdata,grp,'Method','Bag');
stime = toc
```

```
stime = 12.4671
```

Evaluate the out-of-bag loss for the fitted model.

`myerr = oobLoss(mdl)`
```myerr = 0.0572 ```

Create a bagged classification model in parallel, using a reproducible tree template and parallel substreams. You can create a parallel pool on a cluster or a parallel pool of thread workers on your local machine. To choose the appropriate parallel environment, see Choose Between Thread-Based and Process-Based Environments (Parallel Computing Toolbox).

`parpool`
```
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 8).

ans = 

 ProcessPool with properties: 

            Connected: true
           NumWorkers: 8
                 Busy: false
              Cluster: local
        AttachedFiles: {}
    AutoAddClientPath: true
            FileStore: [1x1 parallel.FileStore]
           ValueStore: [1x1 parallel.ValueStore]
          IdleTimeout: 30 minutes (30 minutes remaining)
          SpmdEnabled: true
```
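As a lighter-weight alternative, recent releases of Parallel Computing Toolbox also support thread-based pools, which share memory with the client and avoid copying the training data to each worker. This sketch assumes your release supports `parpool("Threads")` and that the training functions you use are thread-safe; see the environment-selection documentation linked above to decide which pool type fits your workflow.

```
% Start a pool of thread workers on the local machine instead of
% process workers. Thread workers share memory with the client session.
parpool("Threads")
```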
```
s = RandStream('mrg32k3a');
options = statset("UseParallel",true,"UseSubstreams",true,"Streams",s);
t = templateTree("Reproducible",true);
tic
mdl2 = fitcensemble(cdata,grp,'Method','Bag','Learners',t,'Options',options);
ptime = toc
```

```
ptime = 5.9234
```

On this system, with a pool of eight workers, training in parallel is faster than training in serial.

`speedup = stime/ptime`
```speedup = 2.1047 ```

Evaluate the out-of-bag loss for this model.

`myerr2 = oobLoss(mdl2)`
```myerr2 = 0.0577 ```

The error rate is similar to the rate of the first model.

To demonstrate the reproducibility of the model, reset the random number stream and fit the model again.

```
reset(s);
tic
mdl2 = fitcensemble(cdata,grp,'Method','Bag','Learners',t,'Options',options);
toc
```

```
Elapsed time is 3.446164 seconds.
```

Check that the out-of-bag loss is the same as the previous loss.

`myerr2 = oobLoss(mdl2)`
```myerr2 = 0.0577 ```