## fimath ProductMode and SumMode

### Example Setup

The examples in the sections of this topic show the differences among the four settings of the ProductMode and SumMode properties:

• FullPrecision

• KeepLSB

• KeepMSB

• SpecifyPrecision

To follow along, first set the following preferences and create the fimath object F that the examples use:

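```
p = fipref;
p.NumericTypeDisplay = 'short';   % show data types compactly, for example s8,5
p.FimathDisplay = 'none';         % hide fimath properties when displaying fi objects
p.LoggingMode = 'on';             % log overflows and underflows
F = fimath('OverflowAction','Wrap',...
    'RoundingMethod','Floor',...
    'CastBeforeSum',false);
warning off
format compact
```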

Next, define fi objects a and b. Both have signed 8-bit data types. Because no fraction length is given, it is chosen automatically for each fi object to give the best possible precision:

`a = fi(pi, true, 8)`
```
a =
3.1563
s8,5
```
`b = fi(exp(1), true, 8)`
```
b =
2.7188
s8,5
```
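
The automatic scaling can be checked by inspecting the objects directly. The following is a minimal sketch, assuming a release of the fixed-point toolbox that provides `storedInteger`; the comments show the values implied by the displays above.

```matlab
a = fi(pi, true, 8);        % signed, 8-bit, best-precision scaling
b = fi(exp(1), true, 8);

% Both values need 2 integer bits plus a sign bit, leaving 5 fractional bits.
disp(a.FractionLength)      % 5
disp(storedInteger(a))      % 101, because round(pi * 2^5) = 101
disp(double(a))             % 3.15625, that is 101 / 2^5 (displayed as 3.1563)
disp(storedInteger(b))      % 87, because round(exp(1) * 2^5) = 87
```

The stored integers 101 and 87 are the values that the later sections multiply and add.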

### FullPrecision

Now, set ProductMode and SumMode for a and b to FullPrecision and look at some results:

```
F.ProductMode = 'FullPrecision';
F.SumMode = 'FullPrecision';
a.fimath = F;
b.fimath = F;
a
```
```
a =
3.1563			%011.00101
s8,5
```
`b`
```
b =
2.7188			%010.10111
s8,5
```
`a*b`
```
ans =
8.5811			%001000.1001010011
s16,10
```
`a+b`
```
ans =
5.8750			%0101.11100
s9,5
```

In FullPrecision mode, the product word length grows to the sum of the word lengths of the operands. In this case, each operand has 8 bits, so the product word length is 16 bits. The product fraction length is the sum of the fraction lengths of the operands, in this case 5 + 5 = 10 bits.

The sum word length grows by one most significant bit to accommodate the possibility of a carry bit. The sum fraction length aligns with the fraction lengths of the operands, and all fractional bits are kept for full precision. In this case, both operands have 5 fractional bits, so the sum has 5 fractional bits.
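
These growth rules can be checked programmatically. The sketch below rebuilds a and b so that it runs on its own; the sum rule as written (`max(...) + 1`) assumes that both operands share the same fraction length, as they do here.

```matlab
F = fimath('ProductMode','FullPrecision','SumMode','FullPrecision');
a = fi(pi, true, 8);      a.fimath = F;
b = fi(exp(1), true, 8);  b.fimath = F;

p = a*b;
s = a+b;

% Product: word lengths add, and fraction lengths add.
assert(p.WordLength     == a.WordLength + b.WordLength)          % 16
assert(p.FractionLength == a.FractionLength + b.FractionLength)  % 10

% Sum: one extra most significant bit, fraction length unchanged.
assert(s.WordLength     == max(a.WordLength, b.WordLength) + 1)  % 9
assert(s.FractionLength == a.FractionLength)                     % 5
```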

### KeepLSB

Now, set ProductMode and SumMode for a and b to KeepLSB and look at some results:

```
F.ProductMode = 'KeepLSB';
F.ProductWordLength = 12;
F.SumMode = 'KeepLSB';
F.SumWordLength = 12;
a.fimath = F;
b.fimath = F;
a
```
```
a =
3.1563			%011.00101
s8,5
```
`b`
```
b =
2.7188			%010.10111
s8,5
```
`a*b`
```
ans =
0.5811			%00.1001010011
s12,10
```
`a+b`
```
ans =
5.8750			%0000101.11100
s12,5
```

In KeepLSB mode, you specify the word lengths, and the least significant bits of the results are kept automatically. This mode models the behavior of integer operations in the C language.

The product fraction length is the sum of the fraction lengths of the operands. In this case, each operand has 5 fractional bits, so the product fraction length is 10 bits. In this mode, all 10 fractional bits are kept. Overflow occurs because the full-precision result requires 6 integer bits, and only 2 integer bits remain in the product.

The sum fraction length aligns with the fraction lengths of the operands, and in this mode all least significant bits are kept. In this case, both operands have 5 fractional bits, so the sum has 5 fractional bits. The full-precision result requires 4 integer bits, and 7 integer bits remain in the sum, so no overflow occurs in the sum.
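
The product overflow can be reproduced with ordinary integer arithmetic: with the Wrap overflow action, keeping the 12 least significant bits amounts to reducing the full-precision stored integer modulo 2^12. The sketch below is an illustration under that assumption; `wrapSigned` is a hypothetical helper defined here, not a toolbox function.

```matlab
F = fimath('OverflowAction','Wrap','RoundingMethod','Floor', ...
           'ProductMode','KeepLSB','ProductWordLength',12);
a = fi(pi, true, 8);      a.fimath = F;
b = fi(exp(1), true, 8);  b.fimath = F;

% Full-precision stored integer of the product: 101 * 87 = 8787.
fullInt = double(storedInteger(a)) * double(storedInteger(b));

% Keep the 12 least significant bits and read them as a signed value.
wrapSigned = @(v, wl) mod(v + 2^(wl-1), 2^wl) - 2^(wl-1);  % hypothetical helper
keptInt = wrapSigned(fullInt, 12);     % 595

p = a*b;
assert(double(storedInteger(p)) == keptInt)
assert(double(p) == keptInt / 2^10)    % 0.5810546875, displayed as 0.5811
```

The difference between the full-precision value and the wrapped value is exactly 8, which is the weight of the integer bits that no longer fit.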

### KeepMSB

Now, set ProductMode and SumMode for a and b to KeepMSB and look at some results:

```
F.ProductMode = 'KeepMSB';
F.ProductWordLength = 12;
F.SumMode = 'KeepMSB';
F.SumWordLength = 12;
a.fimath = F;
b.fimath = F;
a
```
```
a =
3.1563			%011.00101
s8,5
```
`b`
```
b =
2.7188			%010.10111
s8,5
```
`a*b`
```
ans =
8.5781			%001000.100101
s12,6
```
`a+b`
```
ans =
5.8750			%0101.11100000
s12,8
```

In KeepMSB mode, you specify the word lengths, and the most significant bits of sum and product results are kept automatically. This mode models the behavior of many DSP devices in which the product and sum are held in double-wide registers, and the programmer chooses to transfer the most significant bits from the registers to memory after each operation.

The full-precision product requires 6 integer bits, and the fraction length of the product is adjusted to accommodate all 6 integer bits in this mode. No overflow occurs. However, the full-precision product requires 10 fractional bits, and only 6 are available. Therefore, precision is lost.

The full-precision sum requires 4 integer bits, and the fraction length of the sum is adjusted to accommodate all 4 integer bits in this mode. The full-precision sum requires only 5 fractional bits; in this case there are 8, so there is no loss of precision.

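The following example shows that in KeepMSB mode the fraction length changes regardless of whether an overflow occurs. The fraction length is set to the amount needed to represent the product when both operands take their maximum possible values. Here each operand spans 18 bits above the binary point (word length 16, fraction length -2), so the full-precision product spans 36 bits; keeping the 16 most significant bits gives a fraction length of -(18 + 18 - 16) = -20. The actual product, 100 × 100 = 10000, is much smaller than the resulting resolution of 2^20, so the result displays as 0.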

```
% Separate variable names keep a, b, and F from the earlier sections intact.
Fm = fimath('SumMode','KeepMSB','ProductMode','KeepMSB',...
    'ProductWordLength',16,'SumWordLength',16);
x = fi(100,1,16,-2,'fimath',Fm);
x*x
```
```
ans =

0

DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 16
FractionLength: -20

RoundingMethod: Nearest
OverflowAction: Saturate
ProductMode: KeepMSB
ProductWordLength: 16
SumMode: KeepMSB
SumWordLength: 16
CastBeforeSum: true
```
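
The fraction lengths in this section are all consistent with one rule: start from the full-precision format and shift the binary point by the difference between the full-precision word length and the specified word length. The sketch below uses plain MATLAB arithmetic, with no fi objects; `keepMsbFL` is a hypothetical helper that encodes this reading of the three results above, not a toolbox function.

```matlab
% Fraction length left after keeping the keptWL most significant bits of a
% full-precision result with word length fullWL and fraction length fullFL.
keepMsbFL = @(fullWL, fullFL, keptWL) fullFL - (fullWL - keptWL);  % hypothetical

assert(keepMsbFL(16, 10, 12) == 6)     % a*b: s16,10 becomes s12,6
assert(keepMsbFL( 9,  5, 12) == 8)     % a+b: s9,5 becomes s12,8
assert(keepMsbFL(32, -4, 16) == -20)   % 16-bit operands with fraction length -2

% With the Floor rounding used for a and b, dropping 4 fractional bits from
% the product truncates the stored integer 8787 to floor(8787 / 2^4) = 549.
keptInt = floor(8787 / 2^4);
assert(keptInt / 2^6 == 8.578125)      % displayed as 8.5781
```

When the specified word length exceeds the full-precision word length, as for the sum, the shift is negative and the result gains fractional bits instead of losing them.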

### SpecifyPrecision

Now, set ProductMode and SumMode for a and b to SpecifyPrecision and look at some results:

```
F.ProductMode = 'SpecifyPrecision';
F.ProductWordLength = 8;
F.ProductFractionLength = 7;
F.SumMode = 'SpecifyPrecision';
F.SumWordLength = 8;
F.SumFractionLength = 7;
a.fimath = F;
b.fimath = F;
a
```
```
a =
3.1563			%011.00101
s8,5
```
`b`
```
b =
2.7188			%010.10111
s8,5
```
`a*b`
```
ans =
0.5781			%0.1001010
s8,7
```
`a+b`
```
ans =
-0.1250			%1.1110000
s8,7
```

In SpecifyPrecision mode, you must specify both word length and fraction length for sums and products. This example unwisely uses fractional formats for the products and sums, with 8-bit word lengths and 7-bit fraction lengths.

The full-precision product requires 6 integer bits, and the example specifies only 1, so the product overflows. The full-precision product requires 10 fractional bits, and the example specifies only 7, so there is precision loss in the product.

The full-precision sum requires 4 integer bits, and the example specifies only 1, so the sum overflows. The full-precision sum requires 5 fractional bits, and the example specifies 7, so there is no loss of precision in the sum.
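
Both displayed values can be reproduced by hand from the full-precision stored integers, assuming the Floor rounding method and Wrap overflow action set at the start of this topic. The sketch below uses plain MATLAB arithmetic; `wrapSigned` is a hypothetical helper defined here, not a toolbox function.

```matlab
% Interpret v modulo 2^wl as a signed two's-complement value.
wrapSigned = @(v, wl) mod(v + 2^(wl-1), 2^wl) - 2^(wl-1);  % hypothetical helper

% Product: the full-precision stored integer is 8787 with 10 fractional bits.
prodInt = floor(8787 / 2^(10-7));      % Floor rounding drops 3 bits: 1098
prodInt = wrapSigned(prodInt, 8);      % wraps to 74
assert(prodInt / 2^7 == 0.578125)      % displayed as 0.5781

% Sum: the full-precision stored integer is 188 with 5 fractional bits.
sumInt = 188 * 2^(7-5);                % two extra fractional bits: 752
sumInt = wrapSigned(sumInt, 8);        % wraps to -16
assert(sumInt / 2^7 == -0.125)
```

The sum loses no fractional information, yet wrapping discards its integer bits and flips its sign, which is why a format with a single integer bit is a poor fit for these operands.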