# Create Categorical Arrays

This example shows how to create categorical arrays from various types of input data and modify their elements. The `categorical` data type stores values from a finite set of discrete categories. You can create a categorical array from a numeric array, logical array, string array, or cell array of character vectors. The unique values from the input array become the categories of the categorical array. A categorical array provides efficient storage and convenient manipulation of data while also maintaining meaningful names for the values.

By default, the categories of a categorical array do not have a mathematical ordering. For example, the discrete set of pet categories `["dog" "cat" "bird"]` has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering `["bird" "cat" "dog"]`. But you can also create ordinal categorical arrays, in which the categories do have meaningful mathematical orderings. For example, the discrete set of size categories `["small" "medium" "large"]` can have the mathematical ordering of `small < medium < large`. Ordinal categorical arrays enable you to make comparisons between their elements.

### Create Categorical Array from Input Array

To create a categorical array from an input array, use the `categorical` function.

For example, create a string array whose elements are all states from New England. Notice that some of the strings have leading and trailing spaces.

statesNE = ["MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "]

`statesNE = `*1x11 string*
"MA" "ME" " CT" "VT" " ME " "NH" "VT" "MA" "NH" "CT" "RI "

Convert the string array to a categorical array. When you create categorical arrays from string arrays (or cell arrays of character vectors), leading and trailing spaces are removed.

statesNE = categorical(statesNE)

`statesNE = `*1x11 categorical*
MA ME CT VT ME NH VT MA NH CT RI

List the categories of `statesNE` by using the `categories` function. Every element of `statesNE` belongs to one of these categories. Because `statesNE` has six unique states, there are six categories. The categories are listed in alphabetical order because the state abbreviations have no mathematical ordering.

categories(statesNE)

`ans = `*6x1 cell*
{'CT'}
{'MA'}
{'ME'}
{'NH'}
{'RI'}
{'VT'}
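The `categorical` function also accepts the other input types mentioned at the start of this example. The following is a minimal sketch rather than part of the original example. The variable names `grades`, `flags`, and `levels` and the category names `low`, `mid`, and `high` are assumptions chosen for illustration. It converts a numeric array and a logical array, then uses the three-argument form `categorical(A,valueset,catnames)` to give numeric codes readable names.

```matlab
% Numeric input: each unique number becomes a category named by its text form.
grades = categorical([1 2 3 2 1]);
categories(grades)

% Logical input: the two possible categories are named 'false' and 'true'.
flags = categorical([true false true]);
categories(flags)

% Map the numeric codes 1, 2, 3 to hypothetical names low, mid, high.
levels = categorical([1 2 3 2 1],1:3,["low" "mid" "high"]);
categories(levels)
```

When you supply a value set, the categories follow the order of that value set instead of the sorted order of the unique input values.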

### Add and Modify Elements

To add one element to a categorical array, you can assign text that represents a category name. For example, add a state to `statesNE`.

`statesNE(12) = "ME"`

`statesNE = `*1x12 categorical*
MA ME CT VT ME NH VT MA NH CT RI ME

To add or modify multiple elements, you must assign a categorical array.

statesNE(1:3) = categorical(["RI" "VT" "MA"])

`statesNE = `*1x12 categorical*
RI VT MA VT ME NH VT MA NH CT RI ME
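The assignments above all use names that are already categories. As a hedged sketch of what happens otherwise, the following code assigns a name that is not yet a category to a small array. The variable `states` and the value `"NY"` are hypothetical, not part of the original example. For an array that is not ordinal (and therefore not protected by default), the assignment is expected to add the new category.

```matlab
% Illustrative data: a small categorical array that is not ordinal.
states = categorical(["MA" "ME" "CT"]);

% "NY" is not an existing category, so this assignment adds it.
states(4) = "NY";

% List the categories and confirm that NY is now one of them.
categories(states)
iscategory(states,"NY")
```

Ordinal categorical arrays are protected by default, so the same assignment to an ordinal array produces an error instead of adding a category.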

### Add Missing Values as Undefined Elements

You can assign missing values as undefined elements of a categorical array. An undefined categorical value does not belong to any category, similar to `NaN` (Not-a-Number) in numeric arrays.

To assign missing values, use the `missing` function. For example, modify the first element of the categorical array to be a missing value.

statesNE(1) = missing

`statesNE = `*1x12 categorical*
<undefined> VT MA VT ME NH VT MA NH CT RI ME

Assign two missing values at the end of the categorical array.

statesNE(12:13) = [missing missing]

`statesNE = `*1x13 categorical*
<undefined> VT MA VT ME NH VT MA NH CT RI <undefined> <undefined>

If you convert a string array to a categorical array, then missing strings and empty strings become undefined elements in the categorical array. If you convert a numeric array, then `NaN`s become undefined elements. Therefore, assigning missing strings, `""`, `''`, or `NaN`s to elements of a categorical array converts them to undefined categorical values.

`statesNE(2) = ""`

`statesNE = `*1x13 categorical*
<undefined> <undefined> MA VT ME NH VT MA NH CT RI <undefined> <undefined>
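Once an array contains undefined elements, you often need to locate them. The following sketch uses the `isundefined` function on a small array. The variable names `c`, `tf`, `nUndefined`, and `defined` and the sample data are assumptions for illustration only.

```matlab
% Illustrative data: an empty string and a missing value both become undefined.
c = categorical(["MA" "" "CT" missing]);

% Logical mask that is true where an element is undefined.
tf = isundefined(c)

% Count the undefined elements and keep only the defined ones.
nUndefined = nnz(tf)
defined = c(~tf)
```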

### Create Ordinal Categorical Array from String Array

In an ordinal categorical array, the order of the categories defines a mathematical order that enables comparisons. Because of this mathematical order, you can compare elements of an ordinal categorical array using relational operators such as `<` and `>=`. For categorical arrays that are not ordinal, you can test elements only for equality, using `==` and `~=`.

For example, create a string array that contains the sizes of eight objects.

AllSizes = ["medium" "large" "small" "small" "medium" "large" "medium" "small"];

The string array has three unique values: `"large"`, `"medium"`, and `"small"`. A string array has no convenient way to indicate that `small < medium < large`.

Convert the string array to an ordinal categorical array. Define the categories as `small`, `medium`, and `large`, in that order. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.

valueset = ["small" "medium" "large"]; sizeOrd = categorical(AllSizes,valueset,"Ordinal",true)

`sizeOrd = `*1x8 categorical*
medium large small small medium large medium small

The order of the values in the categorical array, `sizeOrd`, remains unchanged.

List the discrete categories in `sizeOrd`. The order of the categories matches their mathematical ordering `small < medium < large`.

categories(sizeOrd)

`ans = `*3x1 cell*
{'small' }
{'medium'}
{'large' }
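To show what the mathematical ordering makes possible, the following sketch rebuilds `sizeOrd` and applies a relational operator, `min`, `max`, and `sort` to it. The variable names `isBiggerThanSmall`, `smallest`, `largest`, and `sorted` are assumptions introduced for illustration.

```matlab
% Rebuild the ordinal array from this section so the sketch is self-contained.
AllSizes = ["medium" "large" "small" "small" "medium" "large" "medium" "small"];
sizeOrd = categorical(AllSizes,["small" "medium" "large"],"Ordinal",true);

% Elementwise comparison against a category name.
isBiggerThanSmall = sizeOrd > "small"

% Smallest and largest elements under the category order.
smallest = min(sizeOrd)
largest = max(sizeOrd)

% Sorting follows the category order, not alphabetical order.
sorted = sort(sizeOrd)
```

Because the category order is `small < medium < large`, the sorted array starts with the `small` elements even though `large` comes first alphabetically.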

### Create Ordinal Categorical Array by Binning Numeric Data

If you have an array with continuous numeric data, specifying numeric ranges as categories can be useful. In such cases, bin the data using the `discretize` function. Assign category names to the bins.

For example, create a vector of 100 random numbers between 0 and 50.

x = rand(100,1)*50

`x = `*100×1*
40.7362
45.2896
6.3493
45.6688
31.6180
4.8770
13.9249
27.3441
47.8753
48.2444
⋮

Use `discretize` to create a categorical array by binning the values of `x`. Put all the values between 0 and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes its left endpoint but not its right endpoint, except the last bin, which includes both endpoints.

catnames = ["small" "medium" "large"]; binnedData = discretize(x,[0 15 35 50],"categorical",catnames)

`binnedData = `*100x1 categorical*
large
large
small
large
medium
small
small
medium
large
large
small
large
large
medium
large
small
medium
large
large
large
medium
small
large
large
medium
large
large
medium
medium
small
⋮
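Because `x` is random, the output above does not make the endpoint rule easy to check. The following sketch bins a few hand-picked values that sit on or next to the bin edges. The variable names `xEdgeCases`, `edges`, and `binnedEdgeCases` and the sample values are assumptions for illustration.

```matlab
% Hand-picked values on or just below each bin edge (illustrative data).
xEdgeCases = [0 14.9 15 34.9 35 50];
edges = [0 15 35 50];
catnames = ["small" "medium" "large"];

% Same call pattern as above, applied to the edge cases.
binnedEdgeCases = discretize(xEdgeCases,edges,"categorical",catnames)

% Confirm that the result is an ordinal categorical array.
isordinal(binnedEdgeCases)
```

The value 15 lands in `medium` and 35 lands in `large` because each bin includes its left endpoint, while 50 lands in `large` because the last bin also includes its right endpoint.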

`binnedData` is an ordinal categorical array with three categories, such that `small < medium < large`.

To display the number of elements in each category, use the `summary` function.

summary(binnedData)

binnedData: 100x1 ordinal categorical
small 30
medium 35
large 35
<undefined> 0
Additional statistics:
Min small
Median medium
Max large
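The `summary` function displays the counts. If you need the counts as a numeric array for further computation, one option is the `countcats` function. The following sketch uses a small deterministic input instead of the random `x`. The variable names `sample`, `binnedSample`, and `counts` and the sample values are assumptions for illustration.

```matlab
% Small deterministic column vector (illustrative data, not the random x above).
sample = [3; 20; 40; 10; 45; 30; 48];
binnedSample = discretize(sample,[0 15 35 50],"categorical",["small" "medium" "large"]);

% Number of elements in each category, in category order.
counts = countcats(binnedSample)

% Pair each count with its category name.
table(categories(binnedSample),counts,'VariableNames',{'Category','Count'})
```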

You can make various kinds of charts of the binned data. For example, make a pie chart of `binnedData`.

pie(binnedData)

### Preallocate Categorical Array

You can preallocate a categorical array of any size by creating an array of `NaN`s and converting it to a categorical array. After you preallocate the array, you can initialize its categories by adding the category names to the array.

For example, create a 2-by-4 array of `NaN`s.

A = NaN(2,4)

`A = `*2×4*
NaN NaN NaN NaN
NaN NaN NaN NaN

Then convert the array of `NaN`s to a categorical array of undefined categorical values.

A = categorical(A)

`A = `*2x4 categorical*
<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>

At this point, `A` has no categories.

categories(A)

ans = 0x0 empty cell array

Add the `small`, `medium`, and `large` categories to `A` by using the `addcats` function.

A = addcats(A,["small" "medium" "large"])

`A = `*2x4 categorical*
<undefined> <undefined> <undefined> <undefined>
<undefined> <undefined> <undefined> <undefined>

While the elements of `A` are still undefined values, the categories of `A` are defined.

categories(A)

`ans = `*3x1 cell*
{'small' }
{'medium'}
{'large' }

Now that `A` has categories, you can assign defined categorical values as elements of `A`.

A(1) = "medium"; A(8) = "small"; A(3:5) = "large"

`A = `*2x4 categorical*
medium large large <undefined>
<undefined> large <undefined> small
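The assignments above use linear indices, which count down each column of `A` before moving to the next column. As a sketch of how those indices map to positions in the 2-by-4 array, the following code converts them to row and column subscripts with `ind2sub`. The variable names `sz`, `idx`, `row`, and `col` are assumptions for illustration.

```matlab
% Size of A and the linear indices used in the assignments above.
sz = [2 4];
idx = [1 3 4 5 8];

% Convert each linear index to its row and column subscripts.
[row,col] = ind2sub(sz,idx)
```

For example, linear index 4 corresponds to row 2, column 2, which is why `large` appears in the second row of the output.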

## See Also

`categorical` | `categories` | `discretize` | `summary` | `addcats` | `missing`