c++ - Tricks compiler uses to compile basic arithmetic operations of 128-bit integer -
i played on godbolt see x86-64 gcc(6.3) compiles following codes:
typedef __int128_t int128_t; typedef __uint128_t uint128_t; uint128_t mul_to_128(uint64_t x, uint64_t y) { return uint128_t(x)*uint128_t(y); } uint128_t mul(uint128_t x, uint128_t y) { return x*y; } uint128_t div(uint128_t x, uint128_t y) { return x/y; }
and got:
mul_to_128(unsigned long, unsigned long): mov rax, rdi mul rsi ret mul(unsigned __int128, unsigned __int128): imul rsi, rdx mov rax, rdi imul rcx, rdi mul rdx add rcx, rsi add rdx, rcx ret div(unsigned __int128, unsigned __int128): sub rsp, 8 call __udivti3 //what this??? add rsp, 8 ret
3 questions:
- the 1st function(cast
64-bit
uint128-bit
multiply them) simpler multiplication of 2 128-bit uints(2nd function). basically, 1 multiplication. if multiply 2 maximums of 64-bit uint, overflows out of 64-bit register...how produce 128-bit result 1 64-bit-64-bit multiplication??? - i cannot read second result well...my guess break 64-bit number 2 32-bit numbers(says,
hi
higher 4 bytes ,lo
lower 4 bytes), , assemble result(hi1*hi2)<<64 + (hi1*lo2)<<32 + (hi2*lo1)<<32+(lo1*lo2)
. apparently wrong...because uses 3 multiplications (2 of themimul
...signed multiplication???why???). can tell me gcc thinking? , optimal? - cannot understand assembly of division...push stack -> call called
__udivti3
pop stack...is__udivti3
big?(like table look-up?) , stuff gcc try push before call?
the godbolt link: https://godbolt.org/g/siiam3
you're right multiplying 2 unsigned 64-bit values can produce 128-bit result. funny thing, hardware designers know that, too. <g> multiplying 2 64-bit values produces 128-bit result storing lower half of result in 1 64-bit register , upper half of result in 64-bit register. compiler-writer knows registers used, , when call mul_to_128
results in appropriate registers.
in second example, think of values a1*2^64 + a0
, b1*2^64 + b0
(that is, split each 128-bit value 2 parts, upper 64 bits , lower 64 bits). when multiply a1*b1*2^64*2^64 + a1*b0*2^64 + a0*b1*2^64 + a0*b0
. that's assembly code doing. parts of result overflow 128 bits ignored.
in third example,__udivti3
function division. it's not simple, doesn't expanded inline.
Comments
Post a Comment