What is the efficient code?
I got the warning message below in Model Advisor.
- The following blocks will invoke net slope computation for multiplication.
- The net slope computation can be implemented by: multiplication-and-shift or integer multiplication and/or division.
- Changing the Configuration Parameters > Optimization > Use division for fixed-point net slope computation setting to On might generate more efficient code.
- For example, net slope computation from fixdt(1, 16, 7/10, 0) to fixdt(1, 16, 1, 0) can be achieved by Qy = (Qu*7)/10 instead of Qu*11469 >> 14.
- To use this option, change the Integer rounding mode parameter of the following blocks to Simplest or to the Configuration Parameters > Hardware Implementation > Production Hardware > Signed integer division rounds to setting.
In this warning message, what does "more efficient code" mean?
Readability, or processing speed?
Walter Roberson on 12 Mar 2023
If you are using fixdt() then you are likely targeting hardware.
If that hardware does not have a division operator, or the division is slow, or the division does not support the data types you need, then use multiplication-and-shift.
If the hardware has a division operator that supports the data types you need, then probably it would be faster to use that hardware.
Note that if you are targeting an FPGA, then leaving out all division can save a notable number of gates. Leaving out all floating-point operations of any kind can save a lot of gates on an FPGA. But if you need to compute with a range of values such that fixed-point becomes awkward, then it might be worth linking in a floating-point core.
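For reference, the two forms quoted in the warning can be written out in C roughly as follows. This is only a sketch: the function names are made up for illustration, it is not the code that the code generator actually emits, and the shift version assumes the compiler performs an arithmetic right shift on negative values (implementation-defined in C, but it is what gcc does).

```c
#include <assert.h>
#include <stdint.h>

/* Division form from the warning: Qy = (Qu * 7) / 10.
 * C integer division truncates toward zero. */
int16_t net_slope_div(int16_t qu)
{
    return (int16_t)(((int32_t)qu * 7) / 10);
}

/* Multiply-and-shift form from the warning: Qy = (Qu * 11469) >> 14.
 * 11469 is 0.7 * 2^14 rounded to nearest.
 * ASSUMPTION: >> on a negative value is an arithmetic shift. */
int16_t net_slope_mul_shift(int16_t qu)
{
    return (int16_t)(((int32_t)qu * 11469) >> 14);
}

/* Hypothetical self-check: where the two forms agree and where they differ. */
void net_slope_demo(void)
{
    assert(net_slope_div(100) == 70);
    assert(net_slope_mul_shift(100) == 70);
    /* 0.7 * 32767 = 22936.9, and the two forms pick different neighbours. */
    assert(net_slope_div(32767) == 22936);
    assert(net_slope_mul_shift(32767) == 22937);
}
```

Note that the two forms are not bit-identical. They can differ by one LSB because 11469/16384 is only an approximation of 7/10 and because division and shift round differently.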
Andy Bartlett on 14 Mar 2023
Edited: Andy Bartlett on 14 Mar 2023
Simple answer: use the shift approach unless the multiplicative constant is really big
In my experience, the multiply by constant followed by a shift will be more efficient than division in most cases.
The exceptions occur when the multiplicative constant is quite big and forces a more cumbersome multiplication to be performed, especially a multiword multiply.
Improvements to the handling of slope-bias casts in recent releases will make the occurrence of big multiplicative constants less likely. Also, slope-bias scaling is most frequently used with types with 16 or fewer bits. Small word lengths make it more likely that the multiplicative constant can be implemented quite efficiently.
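To make the "big constant" point concrete, here is a rough sketch of how the multiplicative constant grows with the shift amount. The helper names are hypothetical, and this is not a description of how the tool actually chooses its constant. It just rounds (num/den) * 2^shift to the nearest integer and counts the bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nearest integer to (num / den) * 2^shift.
 * Assumes den > 0 and shift < 32 so the intermediate fits in 64 bits. */
uint64_t net_slope_constant(uint32_t num, uint32_t den, unsigned shift)
{
    uint64_t scaled = (uint64_t)num << shift;
    return (scaled + den / 2) / den;   /* round to nearest */
}

/* Number of bits needed to hold a non-negative value. */
unsigned bits_needed(uint64_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Hypothetical self-check for the 7/10 slope from the warning. */
void net_slope_constant_demo(void)
{
    /* Shift of 14 gives the constant from the warning, which fits in 14 bits. */
    assert(net_slope_constant(7, 10, 14) == 11469);
    assert(bits_needed(11469) == 14);
    /* Shift of 30 gives a 30-bit constant. */
    assert(net_slope_constant(7, 10, 30) == 751619277);
    assert(bits_needed(751619277) == 30);
}
```

With a 16-bit input, the 14-bit constant keeps the product within 32 bits. The 30-bit constant would need about 46 bits for the product, which is the kind of wider multiply that makes the shift approach less attractive on a 32-bit target.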
As Walter noted, if you have a target that does not have an integer division instruction, such as an ARM Cortex-M0, then you want to avoid division. With no hardware instruction, division will need to be implemented by sequencing many other operations.
But division by a constant value is a special case. Even when an integer division hardware operation is available, it often takes many more clock cycles than other operations. Many compilers have lots of tricks to avoid using a division operation when the denominator is a constant.
Example with Godbolt Compiler Explorer
For gcc targeting Cortex-M4 (which does have an integer division operation) with these compiler options
-O2 -march=armv7e-m -mtune=cortex-m4
the division approach requires more code, but it doesn't actually use a division. The compiler exploited that the denominator was a constant and optimized it away. Instead of division, it used multiply and shifts. Sound familiar?
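As an illustration of that compiler trick, here is a minimal sketch of division by the constant 10 rewritten as a multiply and a shift. It is not the exact instruction sequence gcc produced in the experiment above. It is the well-known reciprocal-multiply technique for the unsigned 32-bit case, and the function names are made up. The signed case needs an extra correction step that is left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Division by a constant as it would be written in the source. */
uint32_t div10_plain(uint32_t x)
{
    return x / 10u;
}

/* Same result without a divide instruction.
 * 0xCCCCCCCD is ceil(2^35 / 10); the 64-bit product is shifted right by 35. */
uint32_t div10_mul_shift(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Hypothetical self-check on a few values, including the edges. */
void div10_demo(void)
{
    assert(div10_mul_shift(0u) == div10_plain(0u));
    assert(div10_mul_shift(9u) == div10_plain(9u));
    assert(div10_mul_shift(10u) == div10_plain(10u));
    assert(div10_mul_shift(UINT32_MAX) == div10_plain(UINT32_MAX));
}
```

So a source-level "divide by 10" can end up as the same kind of multiply-and-shift that the net slope option is choosing between, only with a wider constant.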
With gcc targeting Cortex-M0 with these options
the compiler has "optimized away" both division by a constant and multiplication by a constant. The multiplication version is still a little smaller. I did not count the clock cycles, but I'm guessing the approach that originated as multiply-then-shift is faster than the approach that originated as multiply-then-divide.