c++ - Tricks compiler uses to compile basic arithmetic operations of 128-bit integer -


i played on godbolt see x86-64 gcc(6.3) compiles following codes:

typedef __int128_t int128_t; typedef __uint128_t uint128_t;  uint128_t mul_to_128(uint64_t x, uint64_t y) {   return uint128_t(x)*uint128_t(y); } uint128_t mul(uint128_t x, uint128_t y) {   return x*y; } uint128_t div(uint128_t x, uint128_t y) {   return x/y; } 

and got:

mul_to_128(unsigned long, unsigned long):         mov     rax, rdi         mul     rsi         ret mul(unsigned __int128, unsigned __int128):         imul    rsi, rdx         mov     rax, rdi         imul    rcx, rdi         mul     rdx         add     rcx, rsi         add     rdx, rcx         ret div(unsigned __int128, unsigned __int128):         sub     rsp, 8         call    __udivti3 //what this???         add     rsp, 8         ret 

3 questions:

  1. the 1st function(cast 64-bit uint 128-bit multiply them) simpler multiplication of 2 128-bit uints(2nd function). basically, 1 multiplication. if multiply 2 maximums of 64-bit uint, overflows out of 64-bit register...how produce 128-bit result 1 64-bit-64-bit multiplication???
  2. i cannot read second result well...my guess break 64-bit number 2 32-bit numbers(says, hi higher 4 bytes , lo lower 4 bytes), , assemble result (hi1*hi2)<<64 + (hi1*lo2)<<32 + (hi2*lo1)<<32+(lo1*lo2). apparently wrong...because uses 3 multiplications (2 of them imul...signed multiplication???why???). can tell me gcc thinking? , optimal?
  3. cannot understand assembly of division...push stack -> call called __udivti3 pop stack...is __udivti3 big?(like table look-up?) , stuff gcc try push before call?

the godbolt link: https://godbolt.org/g/siiam3

you're right multiplying 2 unsigned 64-bit values can produce 128-bit result. funny thing, hardware designers know that, too. <g> multiplying 2 64-bit values produces 128-bit result storing lower half of result in 1 64-bit register , upper half of result in 64-bit register. compiler-writer knows registers used, , when call mul_to_128 results in appropriate registers.

in second example, think of values a1*2^64 + a0 , b1*2^64 + b0 (that is, split each 128-bit value 2 parts, upper 64 bits , lower 64 bits). when multiply a1*b1*2^64*2^64 + a1*b0*2^64 + a0*b1*2^64 + a0*b0. that's assembly code doing. parts of result overflow 128 bits ignored.

in third example,__udivti3 function division. it's not simple, doesn't expanded inline.


Comments

Popular posts from this blog

Command prompt result in label. Python 2.7 -

javascript - How do I use URL parameters to change link href on page? -

amazon web services - AWS Route53 Trying To Get Site To Resolve To www -