Method of Real number multiplication operation

Question

MIN HOON on 6 Mar 2023

0
Link

Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/1923665-method-of-real-number-multiplication-operation

Edited: Andy Bartlett on 7 Mar 2023

Accepted Answer: Andy Bartlett

Environment

In simulink,

┌ ─ ─ ─ ─ ─ ─ ─ ─ ┐

│ Value (uint16) │ㅡㅡㅡㅡ [ Gain : 9.8 (fixdt(1,16,0.1,0) ] ㅡㅡㅡㅡㅡ [out]

└ ─ ─ ─ ─ ─ ─ ─ ─ ┘

When I try to generate code in above model, code is generated like below briefly.

(uint16_T) ((((uint32_T)((uint16_T)( (((uint32_T49U)*((uint32_T)Value)) >> 3 )) * ((uint32_T)52429U))) >> 15)

In above equation, I think that we can represent like below.

Value * (49>>3) * (52429>>15)

Additionally, I recognized that the sequantial patterns of binary about 49 and 52429 are respectively same as 98 and 0.1.

So, I think that 49>>3 part represents 98, and 52429>>15 part represents 0.1.

From what i know, even though we represent real number in binary, computer only can understand it like inteager, which is why above equation appear.

However, I couldn't find out any meaning of bitwise operation (>>3 , >>15)

I mean... Why is " >> 3 " generated with 49 instead of 52429?

Please let me know more than what I know...

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Andy Bartlett on 6 Mar 2023

0
Link

Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/1923665-method-of-real-number-multiplication-operation#answer_1186625

Open in MATLAB Online

Quick Answer:

Shifts are Multiply or Divide by a Power of Two

Shift left of M bits is equivalent to multiplication by 2^M when no overflow occurs.

Arithmetic shift right of N bit is equivalent to dividing by 2^N followed by rounding toward floor.

Integer shift left and shift right are fast machine instructions that take just one clock cycle on most modern microcontrollers.

Shifts are also much more efficient than division.

When you see shifts in generated C code for fixed-point math, the shifts usually mean multiply by 2^M or divide by 2^N.

So

( ( ( Value * 49 ) >>3 ) * 52429 ) >>15

is mathematically equivalent to

floor( ( floor( ( Value * 49 ) * 2^-3 ) * 52429 ) * 2^-15 )

which if we ignore the rounding is

Value * ( 49 * 2^-3 * 52429 * 2^-15)

or combining constants show the expected math (with input and output both uint16).

Value * 9.8

Long Answer:

Assumptions

Based on the generated code, I assume

you've configured the gain's output data type to be uint16.
all the fixed-point scaling Biases are zero

Ideal math

The ideal math for a gain in real-world values is

VY = VU * VG

where VG is the gain parameter value.

Replacing each of the variables with the zero-bias fixed-point scaling equation

V = Slope * Q

where Q is the stored integer value, gives

SlopeY * QY = SlopeU * QU * SlopeG * QG

solving for QY gives

QY = QU * QG * (SlopeU * SlopeG / SlopeY)

Using the known values

Based on the given information, we have

SlopeU = 1.0

SlopeY = 1.0

SlopeG = 0.1

QG = 98

Substituting gives

QY = QU * 98 * (1 * 0.1 / 1)

QY = QU * 98 * 0.1

or

QY = QU * 9.8

This is the "ideal math" expressed in terms of the stored integer values.

We expect the generated code to be equivalent to this except for rounding or overflow issues.

Why the math is done in two parts

The math for implementing the gain is effectively done in two parts

QTemp = ( Value * 49 ) >> 3

QY = ( QTemp * 52429 ) >> 15

One reason the math is broken into two parts is because the Gain block is designed to treat the gain parameter as if it could be tuned at some later point to a different stored integer value. So it starts with stored integer value 98, but it might be later tuned to another unsigned 16 bit stored integer values, such as 65001.

So the gain does NOT attempt to optimize the ideal math for

QY = QU * 98 * 0.1

Instead it tries to optimize the generated code for the ideal math

QY = QU * QG * 0.1

This could be implemented as

QY = QU * QG * 52429 * 2^-19

where 52429 * 2^-19 approximates 0.1.

But the pair of integer multiplications Q1 * QG * 52429 requires 48 bits to avoid all posible overflows over the range of tunable values for QG. For the Simulink model's specified production hardware target, the Gain block is trying to avoid going above 32 bit integers.

The operation is split into two parts with each part not exceeding 32 bits needed.

QY = ( QU * QG * 2^-4 ) * 52429 * 2^-15

or conceptually

QTemp = QU * QG * 2^-4

QY = QTemp * 52429 * 2^-15

Code Generation Optimization

Later when code is finally generated, the gain parameter is known to be constant.

Coders can exploit this information provided it doesn't change the defined math behaviors.

Coder see this math to be implemented

QTemp = QU * 98 * 2^-4

QY = QTemp * 52429 * 2^-15

Since 98 in binary has a trailing zero, this

QTemp = QU * 98 * 2^-4

can be replaced with the slightly simpler

QTemp = QU * 49 * 2^-3

Now replacing the multiplications and divisions by powers of two with shifts gives

QTemp = (QU * 49) >> 3

QY = ( QTemp * 52429 ) >> 15

This finishes the explaination of where the pieces of the generated code come from.

Recommendation

Using Slope and Bias scaling can improve accuracy. But to achieve the full accuracy and also get maximally efficient generated code, you want the output type to have scaling that matches the operation. So for gain block multiplication, you want

SlopeY = SlopeU * SlopeG

In your example, you've configured the scaling to not match

1 ~= 1 * 0.1

This causes less efficient code to be generated and removes the accuracy gains in representing 9.8 with Slope 0.1.

Either change the output of the gain to have SlopeY = 0.1.

Or use binary point scaling for the gain parameter. For example, if you use unsigned 16 bit best precision scaling

format long
g = fi( 9.8, 0, 16 )
g = 
   9.800048828125000

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Unsigned
            WordLength: 16
        FractionLength: 12
QG = g.storedInteger
QG = uint16
   40141

Then for cases that don't overflow the output type (uint16),

the worst case rounding error is just 0.8.

And the generated code is just

QY = ( QU * 40141U ) >> 12;

2 Comments
Show NoneHide None

MIN HOON on 7 Mar 2023

@Andy Bartlett

Thank you so much for you kind and detailed answer.

However, I couldn't understand why seperating 2^-19 into 2^-4 and 2^15 is used as method for avoiding exceeding 32bits.

Can you explain it more specific?

Andy Bartlett on 7 Mar 2023

Edited: Andy Bartlett on 7 Mar 2023

N-bit by M-bit Multiplies in C

When multiplying an N bit integer by an M bit integer, the full precision product will need N+M bits.

But N bit by N bit multiplications provided by the C language do NOT give the full precision 2N bit product. They only give the least significant N bits of the product. So the trick in C to get the full precision product is to upcast that the inputs to a data type that has M+N bits, then do multiplication in that bigger type.

C code normally provides two sizes of integer multiplies int * int and and long long * long long. (Note this assumes long is the same size as int or long long.) Usually, the int multiply has a 2X or more speed and code compactness advantage over long long multiply. Multiplications bigger than long long are not native and require much bigger and slower code to implement.

Your example is 16 bit by 16 bit multiplies

In the example you provided, the multiplications are a 16 bit input times a 16 bit input initially producing a 32 bit product. These can both be produced by a lean and fast int32 (or uint32) multiplication.

The first multiplication is easily observed to be 16 bit by 16 bit based on the input and gain parameter types.

The second multiplication is also 16 bit by 16 bit because an EXPLICIT DOWNCAST to uint16 was inserted after the first multiply and the shift right of 3 bits. Hence the second multiply can also be done with a lean and fast int32 (or uint32) multiplication.

Overflow Management is Part of the Strategy

The math that you've configured uint16 * 9.8 being assigned back to uint16 will overflow for most of the possible inputs.

For example, for 6688 <= u <= 65535, the product u * 9.8 is greater than intmax('uint16') == 65535 so overflow is expected.

The implementation, by design, has structured the first stage to overflow a little less than the overall operation. Stage 1 will overflow 10700 <= u. The expected overflows for 6688 <= u < 10700 will happen in the second stage. More importantly, for the "good cases" 0 <= u <= 6687, neither stage will overflow.

Keeping the multiplications lean and keeping the good cases free of overflows is why the 2^-19 was split up as 2^-4 for stage one and 2^-15 for stage 2.

Sign in to comment.

Method of Real number multiplication operation

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments
Show NoneHide None

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Method of Real number multiplication operation

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

2 Comments Show NoneHide None

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

2 Comments
Show NoneHide None