
## Performing Fixed-Point Arithmetic

### Fixed-Point Arithmetic

#### Addition and subtraction

Whenever you add two fixed-point numbers, you may need a carry bit to represent the result correctly. For this reason, when you add two B-bit numbers with the same scaling, the result has one more bit than the operands.

```
a = fi(0.234375,0,4,6);
c = a+a
```
```
c = 

    0.4688

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Unsigned
            WordLength: 5
        FractionLength: 6
```
```
a.bin
```
```
ans = 1111
```
```
c.bin
```
```
ans = 11110
```
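You can reproduce this result on the stored integers alone, which makes the carry bit visible. This is a minimal core-MATLAB sketch, not how `fi` is implemented internally. It assumes both operands share fraction length 6, so their stored integers can be added directly, and the variable names are illustrative.

```matlab
% Illustrative emulation on stored integers (not the fi implementation).
% Stored integer of a = fi(0.234375,0,4,6) is 0.234375 * 2^6 = 15.
fracLen = 6;
aInt = 0.234375 * 2^fracLen;     % 15, binary 1111 (4 bits)

% With equal scaling, the sum is just the sum of the stored integers
cInt = aInt + aInt;              % 30, binary 11110 (5 bits)

% Worst case for two unsigned w-bit operands:
% (2^w - 1) + (2^w - 1) = 2^(w+1) - 2, which needs w + 1 bits
w = 4;
worstCase = 2*(2^w - 1);

disp(dec2bin(aInt))              % 1111
disp(dec2bin(cInt))              % 11110
disp(numel(dec2bin(worstCase)))  % 5
```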

If you add or subtract two numbers with different precision, their radix points must first be aligned. As a result, the word length of the result can exceed the larger operand word length by more than one bit. In the following example, the operands have 16 and 12 bits, and the sum has 18 bits.

```
a = fi(pi,1,16,13);
b = fi(0.1,1,12,14);
c = a + b
```
```
c = 

    3.2416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 18
        FractionLength: 14
```
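The alignment step can be sketched with ordinary integer arithmetic on the stored integers. This core-MATLAB sketch is an emulation rather than the toolbox's implementation. It assumes round-to-nearest quantization of `pi` and `0.1` (neither value is a tie, so the tie rule does not matter here), and the variable names are illustrative.

```matlab
% Illustrative emulation of radix-point alignment (not the fi implementation).
% Operands from the example: a is s16,13 and b is s12,14.
fa = 13;  fb = 14;
aInt = round(pi  * 2^fa);    % stored integer of a
bInt = round(0.1 * 2^fb);    % stored integer of b

% Align the radix points: shift the operand with fewer fraction bits
% left so that both stored integers use the larger fraction length
fc = max(fa, fb);                 % 14
aAligned = aInt * 2^(fc - fa);    % a shifted left by 1 bit
bAligned = bInt * 2^(fc - fb);    % b unchanged

% Once aligned, the sum is an ordinary integer addition
cInt = aAligned + bAligned;
c = cInt / 2^fc                   % approximately 3.2416
```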

#### Multiplication

In general, a full-precision product requires a word length equal to the sum of the word lengths of the operands. In the following example, the word length of the product `c` equals the word length of `a` plus the word length of `b`, and the fraction length of `c` equals the fraction length of `a` plus the fraction length of `b`.

```
a = fi(pi,1,20), b = fi(exp(1),1,16)
```
```
a = 

    3.1416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 20
        FractionLength: 17

b = 

    2.7183

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 13
```
```
c = a*b
```
```
c = 

    8.5397

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 36
        FractionLength: 30
```
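The fraction lengths add because the product of two scaled integers carries both scale factors: (aInt/2^fa)·(bInt/2^fb) = (aInt·bInt)/2^(fa+fb). This core-MATLAB sketch emulates the example on the stored integers. It assumes the best-precision fraction lengths 17 and 13 from the display above, and the variable names are illustrative.

```matlab
% Illustrative emulation of a full-precision product (not the fi implementation).
% a = fi(pi,1,20) has fraction length 17; b = fi(exp(1),1,16) has 13.
fa = 17;  fb = 13;
aInt = round(pi     * 2^fa);   % stored integer of a
bInt = round(exp(1) * 2^fb);   % stored integer of b

% The integer product is exact in double here because it is below 2^53
cInt = aInt * bInt;
fc = fa + fb;                  % fraction lengths add: 30
c = cInt / 2^fc                % approximately 8.5397

% The product fits in a signed 36-bit word (magnitude below 2^35)
fitsIn36 = abs(cInt) < 2^35;
```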

#### Math with other built-in data types

In C, the result of an operation between an integer data type and a double data type is promoted to a double. However, in MATLAB®, the result of an operation between a built-in integer data type and a double data type is an integer. In this respect, the `fi` object behaves like the built-in integer data types in MATLAB.
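A short check of the built-in integer rule that `fi` mirrors, using core MATLAB only (no Fixed-Point Designer functions). The variable names are illustrative.

```matlab
% Built-in integer times double: the result stays an integer
x = int8(3) * 2.6;   % 7.8 is computed, then rounded to the nearest int8
class(x)             % returns 'int8'
disp(x)              % 8

% Two doubles stay double, as in C
y = 3 * 2.6;
class(y)             % returns 'double'
```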

When you add a `fi` and a `double`, the double is cast to a `fi` with the same numerictype as the `fi` input, and the result is a `fi`. When you multiply a `fi` and a `double`, the double is cast to a `fi` with the same word length and signedness as the `fi` and a best-precision fraction length. The result is also a `fi`.

```
a = fi(pi)
```
```
a = 

    3.1416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 13
```
```
b = 0.5 * a
```
```
b = 

    1.5708

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 32
        FractionLength: 28
```

When doing arithmetic between a `fi` and one of the built-in integer data types, `[u]int[8, 16, 32]`, the word length and signedness of the integer are preserved. The result of the operation is a `fi`.

```
a = fi(pi);
b = int8(2) * a
```
```
b = 

    6.2832

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 24
        FractionLength: 13
```

When you perform arithmetic between a `fi` and a logical data type, the logical is treated as an unsigned `fi` object with a value of 0 or 1, word length 1, and fraction length 0. The result of the operation is a `fi` object.

```
a = fi(pi);
b = logical(1);
c = a*b
```
```
c = 

    3.1416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 17
        FractionLength: 13
```

### The fimath Object

`fimath` properties define the rules for performing arithmetic operations on `fi` objects, including math, rounding, and overflow properties. A `fi` object can have a local `fimath` object, or it can use the default `fimath` properties. You can attach a `fimath` object to a `fi` object by using `setfimath`. Alternatively, you can specify `fimath` properties in the `fi` constructor at creation. When a `fi` object has a local `fimath` object instead of using the default properties, its display includes the `fimath` properties. In this example, `a` has the `ProductMode` property specified in the constructor.

```
a = fi(5,1,16,4,'ProductMode','KeepMSB')
```
```
a = 

     5

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 4

        RoundingMethod: Nearest
        OverflowAction: Saturate
           ProductMode: KeepMSB
     ProductWordLength: 32
               SumMode: FullPrecision
```
The `ProductMode` property of `a` is set to `KeepMSB` while the remaining `fimath` properties use the default values.

Note

For more information on the `fimath` object, its properties, and their default values, see fimath Object Properties.

### Bit Growth

The following table shows the bit growth of `fi` objects, `A` and `B`, when their `SumMode` and `ProductMode` properties use the default `fimath` value, `FullPrecision`.

|  | A | B | Sum = A+B | Prod = A*B |
|---|---|---|---|---|
| Format | `fi(vA,s1,w1,f1)` | `fi(vB,s2,w2,f2)` |  |  |
| Sign | `s1` | `s2` | `Ssum = (s1 \|\| s2)` | `Sproduct = (s1 \|\| s2)` |
| Integer bits | `I1 = w1-f1-s1` | `I2 = w2-f2-s2` | `Isum = max(w1-f1, w2-f2) + 1 - Ssum` | `Iproduct = (w1 + w2) - (f1 + f2) - Sproduct` |
| Fraction bits | `f1` | `f2` | `Fsum = max(f1, f2)` | `Fproduct = f1 + f2` |
| Total bits | `w1` | `w2` | `Ssum + Isum + Fsum` | `Sproduct + Iproduct + Fproduct = w1 + w2` |
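The table rules can be written as a few anonymous functions and checked against the earlier examples in this section. This is a core-MATLAB sketch that covers only the default `FullPrecision` mode; the function names such as `sumWord` and `prodWord` are hypothetical helpers, not Fixed-Point Designer API.

```matlab
% Hypothetical helpers encoding the FullPrecision bit-growth table.
% Inputs: signedness s (0 or 1), word length w, fraction length f.
sumSign  = @(s1,s2) double(s1 || s2);
sumFrac  = @(f1,f2) max(f1,f2);
sumInt   = @(s1,w1,f1,s2,w2,f2) max(w1-f1, w2-f2) + 1 - sumSign(s1,s2);
sumWord  = @(s1,w1,f1,s2,w2,f2) sumSign(s1,s2) + ...
    sumInt(s1,w1,f1,s2,w2,f2) + sumFrac(f1,f2);

prodWord = @(w1,w2) w1 + w2;
prodFrac = @(f1,f2) f1 + f2;

% fi(pi,1,16,13) + fi(0.1,1,12,14)
wSum = sumWord(1,16,13, 1,12,14)   % 18
fSum = sumFrac(13,14)              % 14

% fi(pi,1,20) * fi(exp(1),1,16), fraction lengths 17 and 13
wProd = prodWord(20,16)            % 36
fProd = prodFrac(17,13)            % 30

% Unsigned a + a with a = fi(0.234375,0,4,6)
wAA = sumWord(0,4,6, 0,4,6)        % 5

% fi(pi) (s16,13) times int8 (s8,0) and times logical (u1,0)
wInt8 = prodWord(16,8)             % 24
wLogical = prodWord(16,1)          % 17
```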

This example shows how bit growth can occur in a `for`-loop.

```
T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);

x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);

for n = 1:length(x)
    acc = acc + x(n)
end
```
```
acc = 
     1
      s33,0
acc = 
     3
      s34,0
acc = 
     6
      s35,0
```
The word length of `acc` increases with each iteration of the loop. This causes two problems. First, code generation does not allow data types to change in a loop. Second, if the loop is long enough, MATLAB runs out of memory. See Controlling Bit Growth for some strategies to avoid these problems.
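Applying the sum row of the bit-growth table once per iteration predicts the word lengths in the loop output. This core-MATLAB sketch only tracks word lengths; it assumes the same signed, zero-fraction-length types as the loop, and the variable names are illustrative.

```matlab
% Illustrative word-length tracking for acc = acc + x(n) under FullPrecision.
% Both operands are signed with fraction length 0, so each addition
% yields max(wAcc, wX) + 1 bits.
wAcc = 32;     % T.acc = fi([],1,32,0)
wX   = 16;     % T.x   = fi([],1,16,0)
nIter = 3;

wHistory = zeros(1, nIter);
for n = 1:nIter
    wAcc = max(wAcc, wX) + 1;   % one extra bit per iteration
    wHistory(n) = wAcc;
end
disp(wHistory)                  % 33 34 35

% Because wAcc never drops below wX, N iterations give 32 + N bits
nLong = 1e6;
wAfterLong = 32 + nLong;
```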

### Controlling Bit Growth

#### Using fimath

By specifying the `fimath` properties of a `fi` object, you can control the bit growth as operations are performed on the object.

```
F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 8,...
    'SumFractionLength', 0);
a = fi(8,1,8,0, F);
b = fi(3, 1, 8, 0);
c = a+b
```
```
c = 

    11

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 8
        FractionLength: 0

        RoundingMethod: Nearest
        OverflowAction: Saturate
           ProductMode: FullPrecision
               SumMode: SpecifyPrecision
         SumWordLength: 8
     SumFractionLength: 0
         CastBeforeSum: true
```

The `fi` object `a` has a local `fimath` object, `F`, which specifies the word length and fraction length of the sum. Under the default `fimath` settings, the output `c` would have word length 9 and fraction length 0. However, because `a` has a local `fimath` object, the resulting `fi` object has word length 8 and fraction length 0.

You can also use `fimath` properties to control bit growth in a `for`-loop.

```
F = fimath('SumMode', 'SpecifyPrecision','SumWordLength',32,...
    'SumFractionLength',0);
T.acc = fi([],1,32,0,F);
T.x = fi([],1,16,0);

x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);

for n = 1:length(x)
    acc = acc + x(n)
end
```
```
acc = 
     1
      s32,0
acc = 
     3
      s32,0
acc = 
     6
      s32,0
```

Unlike in the earlier loop, where `T.acc` used the default `fimath` properties, the bit growth of `acc` is now restricted, so the word length of `acc` stays at 32.

#### Subscripted Assignment

Another way to control bit growth is by using subscripted assignment. `a(I) = b` assigns the values of `b` into the elements of `a` specified by the subscript vector, `I`, while retaining the `numerictype` of `a`.

```
T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);

x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);

% Assign into acc without changing its type
for n = 1:length(x)
    acc(:) = acc + x(n)
end
```

`acc(:) = acc + x(n)` changes the values at the subscript vector `(:)`, but the `numerictype` of the output `acc` remains the same. Because `acc` is a scalar, you get the same output if you use `(1)` as the subscript vector.

```
acc = zeros(1,1,'like',T.acc);  % reset the accumulator
for n = 1:numel(x)
    acc(1) = acc + x(n)
end
```
```
acc = 
     1
      s32,0
acc = 
     3
      s32,0
acc = 
     6
      s32,0
```

The `numerictype` of `acc` remains the same at each iteration of the `for`-loop.
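For comparison, built-in MATLAB integer arrays follow the same rule for subscripted assignment: the left-hand side keeps its class, and the assigned values are converted into it. This core-MATLAB snippet is an analogy only (built-in integers do not grow in the first place), and the variable names are illustrative.

```matlab
% Analogy with a built-in integer array (no Fixed-Point Designer needed)
y = zeros(1,3,'int8');
y(2) = 3.7;        % converted to int8: rounded to 4
y(1) = 300;        % outside the int8 range: saturates to 127
class(y)           % returns 'int8'
disp(y)            % 127 4 0
```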

Subscripted assignment can also help you control bit growth in a function. In the function `cumulative_sum`, the `numerictype` of `y` does not change, but the values in the elements specified by `n` do.

```
function y = cumulative_sum(x)
% CUMULATIVE_SUM Cumulative sum of elements of a vector.
%
%   For vectors, Y = cumulative_sum(X) is a vector containing the
%   cumulative sum of the elements of X. The type of Y is the type of X.
    y = zeros(size(x),'like',x);
    y(1) = x(1);
    for n = 2:length(x)
        y(n) = y(n-1) + x(n);
    end
end
```
```
y = cumulative_sum(fi([1:10],1,8,0))
```
```
y = 

     1     3     6    10    15    21    28    36    45    55

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 8
        FractionLength: 0
```

Note

For more information on subscripted assignment, see the `subsasgn` function.

#### accumpos and accumneg

Another way to control bit growth is to use the `accumpos` and `accumneg` functions for addition and subtraction. Like subscripted assignment, `accumpos` and `accumneg` preserve the data type of one of their input `fi` objects, and they also let you specify the rounding method and overflow action as input arguments.

For more information on how to implement `accumpos` and `accumneg`, see Avoid Multiword Operations in Generated Code.

### Overflows and Rounding

When performing fixed-point arithmetic, consider the possibility and consequences of overflow. The `fimath` object specifies the overflow and rounding modes used when performing arithmetic operations.

#### Overflows

Overflows can occur when the result of an operation exceeds the maximum or minimum representable value. The `fimath` object has an `OverflowAction` property that offers two ways of handling overflows: saturation and wrap. If you set `OverflowAction` to `Saturate`, overflows saturate to the maximum or minimum value in the range. If you set `OverflowAction` to `Wrap`, overflows wrap using modulo arithmetic if the value is unsigned, or two’s complement wrap if it is signed.
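The two overflow actions can be emulated on the stored integer of an 8-bit word with core MATLAB. This sketch assumes fraction length 0, so the stored integer equals the real-world value. The helper names `saturateS`, `wrapS`, and `wrapU` are hypothetical, not toolbox functions.

```matlab
% Hypothetical helpers emulating OverflowAction for a w-bit word,
% fraction length 0 (stored integer == real-world value).
w = 8;
lo = -2^(w-1);       % -128
hi =  2^(w-1) - 1;   %  127

saturateS = @(v) min(max(v, lo), hi);      % clamp to [lo, hi]
wrapS     = @(v) mod(v - lo, 2^w) + lo;    % two's complement wrap

v = [130 -129 100];
vSat  = saturateS(v)    % 127 -128 100
vWrap = wrapS(v)        % -126 127 100

% Unsigned w-bit wrap is plain modulo 2^w
wrapU = @(v) mod(v, 2^w);
uWrap = wrapU(260)      % 4
```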

For more information on how to detect overflow, see Underflow and Overflow Logging Using fipref.

#### Rounding

There are several factors to consider when choosing a rounding method, including cost, bias, and whether or not there is a possibility of overflow. Fixed-Point Designer™ software offers several different rounding functions to meet the requirements of your design.

| Rounding method | Description | Cost | Bias | Possibility of overflow |
|---|---|---|---|---|
| `ceil` | Rounds to the closest representable number in the direction of positive infinity. | Low | Large positive | Yes |
| `convergent` | Rounds to the closest representable number. In the case of a tie, `convergent` rounds to the nearest even number. This approach is the least-biased rounding method provided by the toolbox. | High | Unbiased | Yes |
| `floor` | Rounds to the closest representable number in the direction of negative infinity, equivalent to two’s complement truncation. | Low | Large negative | No |
| `nearest` | Rounds to the closest representable number. In the case of a tie, `nearest` rounds to the closest representable number in the direction of positive infinity. This rounding method is the default for `fi` object creation and `fi` arithmetic. | Moderate | Small positive | Yes |
| `round` | Rounds to the closest representable number. In the case of a tie, the `round` method rounds positive numbers to the closest representable number in the direction of positive infinity, and negative numbers to the closest representable number in the direction of negative infinity. | High | Small negative for negative samples; unbiased for samples with evenly distributed positive and negative values; small positive for positive samples | Yes |
| `fix` | Rounds to the closest representable number in the direction of zero. | Low | Large positive for negative samples; unbiased for samples with evenly distributed positive and negative values; large negative for positive samples | No |
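The rounding methods in the table can be emulated on doubles by scaling by 2^f, rounding to an integer, and scaling back. This core-MATLAB sketch is an illustration, not the Fixed-Point Designer implementation. Here `nearest` and `convergent` are built from `floor` and `round`, and every input value is a tie so that the methods differ visibly. The variable names are illustrative.

```matlab
% Illustrative emulation of the rounding methods (not the toolbox code).
f = 0;                        % fraction length; 0 rounds to whole numbers
x = [-2.5 -1.5 -0.5 0.5 1.5 2.5];
s = x * 2^f;                  % scale to the stored-integer grid

rCeil    = ceil(s)  / 2^f;            % toward +Inf
rFloor   = floor(s) / 2^f;            % toward -Inf
rFix     = fix(s)   / 2^f;            % toward zero
rRound   = round(s) / 2^f;            % ties away from zero
rNearest = floor(s + 0.5) / 2^f;      % ties toward +Inf

% convergent: ties go to the nearest even integer
r = round(s);
tie = abs(s - fix(s)) == 0.5;
r(tie) = 2 * round(s(tie) / 2);
rConvergent = r / 2^f;

disp([rCeil; rFloor; rFix; rRound; rNearest; rConvergent])
```

Setting `f` to a positive value quantizes to steps of 2^-f instead of whole numbers.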
