FAQ: What's the difference between the two Simulink Precision Loss Diagnostics for Parameters and Fixed-Point Constants?

What is the difference between the Detect precision loss diagnostic that applies to fixed-point constants and the Detect precision loss diagnostic for Parameters from Diagnostics -> Data validity?

Accepted Answer

Andy Bartlett on 16 Nov 2023
Edited: Andy Bartlett on 20 Nov 2023
Simulink's detect parameter precision loss diagnostic applies to run-time parameter values, for example a Gain block's gain parameter value or a Saturation block's upper and lower limit parameter values.
The diagnostic is about the precision lost between the value entered on the dialog, in the data type used to enter it, and the value obtained by quantizing that dialog value to the data type used by the run-time parameter in simulation, code generation, etc.
For example, suppose the parameter entered on the dialog was the double precision floating-point approximation of pi
format long
paramDialogValue = pi
paramDialogValue =
3.141592653589793
Now suppose the run-time data type for the parameter is single
paramRunTimeValueSingle = single(paramDialogValue)
paramRunTimeValueSingle = single
3.1415927
quantErrSingle = double(paramRunTimeValueSingle) - double(paramDialogValue)
quantErrSingle =
8.742278012618954e-08
As a second example, suppose the run-time data type is int8.
paramRunTimeValueInt8 = int8(paramDialogValue)
paramRunTimeValueInt8 = int8
3
quantErrInt8 = double(paramRunTimeValueInt8) - double(paramDialogValue)
quantErrInt8 =
-0.141592653589793
As a third example, suppose the run-time data type is a 32-bit fixed-point type
paramRunTimeValueUfix32En30 = fi(paramDialogValue,0,32,30)
paramRunTimeValueUfix32En30 =
3.141592653468251
DataTypeMode: Fixed-point: binary point scaling
Signedness: Unsigned
WordLength: 32
FractionLength: 30
quantErrUfix32En30 = double(paramRunTimeValueUfix32En30) - double(paramDialogValue)
quantErrUfix32En30 =
-1.215418876654439e-10
All three examples of quantizing the run-time parameter pi involve precision loss. The 32-bit fixed-point type has the least precision loss, single is in the middle, and int8 has the most. Simulink's diagnostic for run-time parameter precision loss would apply to all three of these cases.
Simulink's diagnostic for detecting precision loss in fixed-point Net-Scaling constants only applies to fixed-point math involving Slope-Bias scaling.
For example, suppose the ideal net-slope for a fixed-point cast operation was
netSlope = 3/550
netSlope =
0.005454545454545
Now let's assume the fixed-point cast was handled in generated C code as follows
y = (int8_T) ( ((int32_T) uStoredInteger * 11439) >> 21 );
The net-slope is represented in the C code by a multiplication by 11439 followed by an arithmetic shift right of 21 bits.
An arithmetic shift right is mathematically equivalent to division by 2^nbits with the result rounded toward negative infinity (floor).
So effectively the quantized net-slope is
quantizedNetSlope = 11439 / 2^21
quantizedNetSlope =
0.005454540252686
quantErrNetSlope = double(quantizedNetSlope) - double(netSlope)
quantErrNetSlope =
-5.201859908099404e-09
quantRelErrNetSlope = abs(quantErrNetSlope)/double(netSlope)
quantRelErrNetSlope =
9.536743164848907e-07
There is a small difference between the ideal net-slope and the quantized net-slope. Small net-scaling differences like this are very common when slope-bias scaling is used. These small errors do not occur if all the scaling is restricted to binary point scaling, which is the case when all Slopes are exact powers of two and all Biases are zero.
Simulink's diagnostic for detecting precision loss in fixed-point Net-Scaling constants is about these small errors.
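The multiplier in the C code above can be reproduced with a short base MATLAB sketch. It assumes the shift count of 21 is already chosen and that the multiplier is obtained by rounding to nearest. The variable names are illustrative, and the sketch does not claim to show how Simulink actually selects these constants.

```matlab
% Sketch: approximate an ideal net slope by an integer multiply and a right shift.
% Assumptions: the shift count is given, and the multiplier is rounded to nearest.
idealNetSlope  = 3/550;
shiftBits      = 21;                                  % taken from the C code above
multiplier     = round(idealNetSlope * 2^shiftBits);  % integer multiplier
approxNetSlope = multiplier / 2^shiftBits;            % slope the C code effectively applies
relErr = abs(approxNetSlope - idealNetSlope) / idealNetSlope;
fprintf('multiplier = %d, relative error = %.3g\n', multiplier, relErr);
```

The relative error computed this way corresponds to quantRelErrNetSlope above.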
Specific example
A specific example where net-scaling precision loss occurs is a cast from
numerictype(1,16,0.0003,1.65)
to
numerictype(1,8,0.08,1.65)
Deriving stored integer operations for cast using fixed-point scaling equations
Even though the scaling biases of the two signals are not zero, the ideal net-bias for this cast is zero. Zero can be perfectly quantized to zero, so there is no precision loss. In generated code, the zero bias would be completely optimized away, i.e. no code is needed.
The ideal net-slope is the rational 3/800. To avoid the cost of division, this is approximated with an integer multiplication followed by a shift right. This approximation is a net-scaling constant precision loss that the diagnostic would apply to.
In this example, the precision loss due to going from an input scaling of 0.0003 to an output scaling of 0.08 is orders of magnitude bigger than the small precision loss of quantizing 3/800 to 31457 / 2^23.
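The net bias and net slope discussed above follow from the slope-bias relation, real-world value = Slope * stored integer + Bias, written for both the input and the output and solved for the output stored integer. A minimal base MATLAB sketch, with illustrative variable names that are not part of any Simulink API:

```matlab
% Real-world value V = S*Q + B for both the input (u) and the output (y).
% Setting Su*Qu + Bu = Sy*Qy + By and solving for Qy gives
%   Qy = (Su/Sy)*Qu + (Bu - By)/Sy
Su = 0.0003;  Bu = 1.65;     % input  type numerictype(1,16,0.0003,1.65)
Sy = 0.08;    By = 1.65;     % output type numerictype(1,8,0.08,1.65)
netSlope = Su / Sy;          % ideal value is 3/800
netBias  = (Bu - By) / Sy;   % zero because the two biases are equal
fprintf('netSlope = %.17g, netBias = %g\n', netSlope, netBias);
```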
The ideal net slope 3/800 = 0.0037499999999999999 = 0.95999999999999996 * 2^-8.
Quantization of this net slope produces a diagnostic that only mentions the mantissa 0.95999999999999996.
Net scaling quantization caused precision loss. The original value for NetSlope was 0.95999999999999985. The quantized value was 0.959991455078125 with an error of 8.54492187485345E-6.
The full quantization error needs to account for the exponent too.
quantErrorNetSlope = 8.54492187485345E-6 * 2^-8
quantErrorNetSlope =
3.337860107364629e-08
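The mantissa and exponent that appear in the diagnostic can be reproduced in base MATLAB with the two-output form of log2. The sketch below assumes the mantissa is rounded to nearest with 15 fractional bits, an assumption chosen because it matches the quantized value 31457 / 2^23 quoted earlier. The variable names are illustrative.

```matlab
% Sketch: split the ideal net slope into mantissa and exponent, then
% quantize only the mantissa.
idealNetSlope = 3/800;
[mantissa, exponent] = log2(idealNetSlope);  % idealNetSlope = mantissa * 2^exponent, 0.5 <= mantissa < 1
mantissaBits  = 15;                          % assumption: 15 fractional bits for the mantissa
quantMantissa = round(mantissa * 2^mantissaBits) / 2^mantissaBits;
mantissaErr   = mantissa - quantMantissa;    % error in the mantissa only
fullErr       = mantissaErr * 2^exponent;    % error in the net slope itself
fprintf('mantissa = %.17g, exponent = %d\n', mantissa, exponent);
fprintf('quantized mantissa = %.15g, full error = %.6g\n', quantMantissa, fullErr);
```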
When multiplied by the biggest input stored integers, the amplified error is
worstCaseError = quantErrorNetSlope * [-2^15, 2^15-1]
worstCaseError = 1×2
-0.001093749999981 0.001093716621380
In terms of the output scaling of 0.08, the worst case error due to Net Slope quantization is quite small.
errorInOutputBits = worstCaseError / 0.08
errorInOutputBits = 1×2
-0.013671874999766 0.013671457767252
That is not even 1.4% of one output bit.
If we round toward Floor, then ideally we expect an average rounding error of 0.04 (half a bit) and a worst case error of 0.08 (1 bit).
If we round toward Nearest, then ideally we expect an average rounding error of 0.02 (quarter of a bit) and a worst case error of 0.04 (half a bit).
Figure 1: Box and whisker plots showing quantization error statistics for casts. Top whisker is maximum error. Bottom whisker is smallest error. Red line is median. Top and bottom of blue boxes are 75th and 25th percentiles, respectively. Round to Nearest gives half the error of Round to Floor.
Figure 1 compares the error statistics of the fixed-point casts Simulink would provide, including quantization of the net scaling, against a more costly "luxury" cast that uses a very precise representation of the net scaling. For Floor rounding, the box plots for the fixed-point cast and for the cast using the high-precision net slope are indistinguishable. The same holds for Nearest rounding. What dominates the error of the casts is the Slope of the output data type, 0.08, and the type of rounding selected, Floor vs Nearest.
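The comparison in Figure 1 can be approximated with a base MATLAB sketch that sweeps every int16 stored integer. It is only an approximation of what Simulink does: it models the quantized net slope as 31457 / 2^23, models Nearest as floor(x + 0.5), and measures each error against the unrounded ideal output. All variable names are illustrative.

```matlab
% Sketch: cast with the quantized net slope vs. a "luxury" cast with the ideal slope.
u        = (-2^15 : 2^15-1)';     % all input stored integers, held as doubles
idealQ   = 3*u/800;               % ideal output in stored-integer units, before rounding
quantQ   = u*31457 / 2^23;        % same, using the quantized net slope (exact in double)
outSlope = 0.08;                  % output slope, converts output bits to real-world units

% Error in real-world units, measured against the unrounded ideal value.
errFloorIdeal   = outSlope * (floor(idealQ)       - idealQ);
errFloorQuant   = outSlope * (floor(quantQ)       - idealQ);
errNearestIdeal = outSlope * (floor(idealQ + 0.5) - idealQ);
errNearestQuant = outSlope * (floor(quantQ + 0.5) - idealQ);

maxAbs  = @(e) max(abs(e));
meanAbs = @(e) mean(abs(e));
fprintf('Floor,   ideal slope:     mean %.4f, max %.4f\n', meanAbs(errFloorIdeal),   maxAbs(errFloorIdeal));
fprintf('Floor,   quantized slope: mean %.4f, max %.4f\n', meanAbs(errFloorQuant),   maxAbs(errFloorQuant));
fprintf('Nearest, ideal slope:     mean %.4f, max %.4f\n', meanAbs(errNearestIdeal), maxAbs(errNearestIdeal));
fprintf('Nearest, quantized slope: mean %.4f, max %.4f\n', meanAbs(errNearestQuant), maxAbs(errNearestQuant));
```

The ideal output is computed as 3*u/800 rather than (3/800)*u so that inputs whose ideal output is an exact integer are not pushed across a rounding boundary by floating-point error.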
One of the reasons the impact of Net Slope quantization is small is that "Best Precision Scaling" is applied when quantizing the scalar value. This minimizes the quantization error for the number of bits used to represent the net slope. In contrast, depending on how run-time parameters have been configured, they might be scalars, vectors, ... n-d arrays, and they might get scaling that is a great fit or a poor fit for the individual values within the run-time parameter. Best precision scaling is one of the reasons quantization of net slope constants is unlikely to have significant impact on a design.
Let's look at an example showing the huge contrast between best precision scaling and bad scaling.
idealNetScaling = 3/800
idealNetScaling =
0.003750000000000
quantNetSlopeBestPrecisionScaling = fi( idealNetScaling, 1, 16 ) % No scaling means use best
quantNetSlopeBestPrecisionScaling =
0.003749966621399
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 16
FractionLength: 23
relativeErrorNetSlope = abs( (double(quantNetSlopeBestPrecisionScaling) - idealNetScaling))/idealNetScaling
relativeErrorNetSlope =
8.900960286421327e-06
Notice that with best precision scaling, quantization of the Net Slope gives a relative error of a mere 0.00089 percent.
idealRunTimeParam = 3/800
idealRunTimeParam =
0.003750000000000
quantRunTimeParamPoorScaling = fi( idealRunTimeParam, 1, 16, 5)
quantRunTimeParamPoorScaling =
0
DataTypeMode: Fixed-point: binary point scaling
Signedness: Signed
WordLength: 16
FractionLength: 5
relativeErrorRunTimeParam = abs( (double(quantRunTimeParamPoorScaling) - idealRunTimeParam) )/idealRunTimeParam
relativeErrorRunTimeParam =
1
In contrast, the same value as a run-time parameter could get very bad scaling assigned. In this case, the quantization error of the run-time parameter led to a 100% relative error. In other words, in relative terms, all value information was completely lost.
Given the expectation of precision losses in fixed-point designs, Simulink's diagnostic for detecting precision loss in fixed-point Net-Scaling constants is usually not important to investigate directly. It is better to invest in testing the fixed-point design at the system level to make sure system level behavior is within system design tolerances. In the example given in this section, we saw that the slope of the output data type and the rounding mode selected had over 30 times larger impact on accuracy than Net Slope quantization did. Testing to make sure an output slope of 0.08 meets system requirements is much more important than directly investigating a net slope quantization error with a worst case impact of less than 0.0011, which is 70 times smaller than the output slope.
