Optimize integer multiplication with +/-1


Optimize integer multiplication with +/-1



Is it possible to optimize multiplication of an integer with -1/1 without using any multiplication and conditionals/branches?



Can it be done only with bitwise operations and integer addition?



Edit: The final goal is to optimize a scalar product of two integer vectors, where one of the vectors has only -1/1 values.





Like this: i = ~i + 1;?
– alk
Jul 1 at 9:32


i = ~i + 1;





your suggestion negates the number with 2's completement but where is the condition (1, -1) ?
– Jean-François Fabre
Jul 1 at 9:35






Reading yours comments, I feel I somehow did not got the question.
– alk
Jul 1 at 9:41





give -O3 to compiler, if it doesn't generate the best code then file a bug
– M.M
Jul 1 at 9:57


-O3





Bench your code before trying to optimize. Compilers are probably much smarter than us when it comes to optimizing, and processors have come a long way since times where you needed such micro-optimization. Just check ideone.com/jTsYvL for instance.
– Holt
Jul 1 at 10:25




5 Answers
5



Most modern processors have an ALU with a fast multiplier (meaning it takes about the same time to add two numbers as to multiply them, give or take one CPU clock), so doing anything but for (i=0;i<VectorLength;++i) { p += (x[i] * y[i]) ; } isn't likely to help. However, try a simple if and see if that gives any benefits gained from the CPU's branch prediction:


for (i=0;i<VectorLength;++i) { p += (x[i] * y[i]) ; }


if


for (i=0;i<VectorLength;++i) { p += (y[i]<0) ? -x[i] : x[i] ; }



In any case, if the CPU has fast multiply, doing any trick that involves more than one ALU operation (e.g., negation followed by addition, as in some of the examples given here) will more likely cause loss of performance compared to just one multiplication.





I have checked all methods mentioned here (including branches) and none of them beat straight forward multiplication on my CPU.
– Doe Jensen
Jul 1 at 10:39






This is in complete agreement with what I expected. What CPU are you running on? Anything other than very old Intel CPUs, a low-power micro controller or a low-end RISC CPU will have a fast multiplier, so it is best to stick with multiplication.
– Leo K
Jul 1 at 10:44





I'm running on Intel i5. Although I will need this specific optimization for a low-power GPU, so I will check in the next few days on that.
– Doe Jensen
Jul 1 at 10:53






Then, as you have likely guessed already, you can only get a definitive answer once you test on the target processor.
– Leo K
Jul 1 at 10:58



Yes, for instance, the following function returns a*b, where b is either +1 or -1:


a*b


b


+1


-1


int mul(int a, int b)
{
int c[3] = { -a, 0, +a };

return c[1+b];
}



or if both a and b are restricted to +-1:


a


b


+-1


int mul(int a, int b)
{
int c[5] = { +1, 0, -1, 0, +1 };

return c[a+b+2];
}



Yet another variant without memory access (faster than the ones above):


int mul(int a, int b)
{
return 1 - (signed)( (unsigned)(a+1) ^ (unsigned)(b+1) );
}



This answer works with any signed integer representation (sign-magnitude, ones' complement, two's complement) and does not cause any undefined behaviour.



However, I cannot guarantee that this will be faster than normal multiplication.





Clean and portable! But strictly speaking, mul(INT_MIN, 1) does have undefined behavior on a 2's complement architecture even though the result is perfectly defined.
– chqrlie
Jul 1 at 10:38



mul(INT_MIN, 1)





@chqrlie: Yes, you are right. However, the second variant does not suffer from this.
– DaBler
Jul 1 at 10:40





Yes indeed, for obvious reasons :)
– chqrlie
Jul 1 at 10:41





Cool, but it's much slower than multiplication :)
– Doe Jensen
Jul 1 at 10:42





@DoeJensen: benchmarking these hacks is rather tricky: does the function get inlined in all versions you compare? What compiler and flags do you use?
– chqrlie
Jul 1 at 10:45


int Multiplication(int x, int PlusOrMinusOne)
{
PlusOrMinusOne >>= 1; //becomes 0 or -1
//optionally build 2's complement (invert all bits plus 1)
return (x ^ PlusOrMinusOne) + (PlusOrMinusOne & 1);
}



Here a nice resource for such Bit Twiddling Hacks.





works, but I'm concerned because bit shifting on signed integers is implementation defined or undefined behaviour.
– Jean-François Fabre
Jul 1 at 9:44





Nice. I've benchmarked this and unfortunately it's still not faster than straight forward multiplication :(
– Doe Jensen
Jul 1 at 10:26





You could subtract PlusOrMinusOne, subtracting -1 is of course the same as adding 1
– harold
Jul 1 at 16:28


PlusOrMinusOne



Strictly speaking, no, because C allows for three different integer representations:



There is a proposal to strictly make it two's complement but until that makes it into the standard, you don't know what your negative numbers look like, so you can't really do too many bit hacks with them.



Less strictly speaking, most implementations use two's complement, so it is reasonably safe to use the hacks shown in other answers.





Let's hope this proposal makes it to the C Standard too. Too bad it is submitted for C++ and not C, where it would make sense the same way, except maybe for allowing padding bits on crippled micro-controllers. The Knuth's reference is a gem: 1's complement shall be written ones' complement. The motto Reducing the nerd-snipe potential inherent in C++ is a Good Thing remains wishful thinking I'm afraid.
– chqrlie
Jul 1 at 10:21






I'm afraid DaBler proved you wrong with a portable solution :)
– chqrlie
Jul 1 at 10:40



Assuming 2's compliment for integers:


int i1 = ...; // +1 or -1
int i2 = ...; // +1 or -1
unsigned u1 = i1 + 1; // 0 or 2
unsigned u2 = i2 + 1; // 0 or 2
unsigned u = u1 + u2; // 0 or 2 or 4: 0 and 4 need to become 1, 2 needs to become -1
u = (u & ~4); // 0 is 1 and 2 is -1
int i = u - 1; // -1 is 1 and 1 is -1
i = ~i + 1; // -1 is -1 and 1 is 1 :-)



The same as one-liner:


int i = ~(((i1 + i2 + 2) & ~4) - 1) + 1;



The following is true for i1 and i2 being either +1 or -1:


i1


i2


+1


-1


i == i1 * i2






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

List of Kim Possible characters

Audio Livestreaming with Python & Flask

NSwag: Generate C# Client from multiple Versions of an API