c - Strict-Aliasing warnings and tcpdump example code
Simplifying the concept, the strict-aliasing rule states that an object should only be accessed through a pointer of a compatible type or a pointer to char. This way, the compiler can make assumptions about the code and perform optimizations.
Even though its interpretation can raise doubts and discussion, the rule isn't a state secret. So my questions are:
Why do respected organizations, maintained by experienced programmers, publish code that doesn't respect the strict aliasing rule? One example I can give is tcpdump: on its website's tutorial on libpcap there's example code that apparently breaks the strict aliasing rule several times. I've seen this in a lot of other code too, especially code that handles network packets. Did they just get lucky that the compiler didn't break their code, so it went unnoticed? Do they rely on the user compiling with the -fno-strict-aliasing flag? Another possibility: respected programmers (think of Linus Torvalds himself, for example; I've seen snippets on the Linux mailing list that would break with strict aliasing enabled) don't think the optimization gained from strict aliasing compensates for the kind of bad assumptions the compiler can make. Or is it just bad code, and the bad practice is unfortunately ingrained in the programming community?
The other question is about the sniffex.c code from tcpdump: why, when it is compiled with gcc -O5 -Wall -Wextra -Wstrict-aliasing=1 sniffex.c -lpcap on gcc 5.4.0, doesn't the compiler issue any warnings about strict aliasing rules being broken? Is it because it doesn't detect type punning when there is no address-of operator & involved?
I feel bad for bringing this topic up once again (since there are many other questions about it), but though I understand the rule, I can't seem to understand why it is ignored in lots of places.
Edit:
The snippets of the tcpdump example code that apparently break the strict aliasing rule are:
    void got_packet(u_char *args, const struct pcap_pkthdr *header, const u_char *packet)
    {
        ...

        /* declare pointers to packet headers */
        const struct sniff_ethernet *ethernet;  /* the ethernet header [1] */
        const struct sniff_ip *ip;              /* the ip header */
        const struct sniff_tcp *tcp;            /* the tcp header */
        const char *payload;                    /* packet payload */

        ...

        /* define ethernet header */
        ethernet = (struct sniff_ethernet*)(packet);

        /* define/compute ip header offset */
        ip = (struct sniff_ip*)(packet + size_ethernet);
        ...

        /* define/compute tcp header offset */
        tcp = (struct sniff_tcp*)(packet + size_ethernet + size_ip);
        ...

        /* define/compute tcp payload (segment) offset */
        payload = (u_char *)(packet + size_ethernet + size_ip + size_tcp);
        ...
There, they sort of overlay structures that represent different parts of the network packet, to have an easier way to access each of the fields. The bottom line is that the code uses several pointers that don't have the effective type of u_char (the original packet type) to access it, thus, I believe, violating the strict aliasing rule.
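For comparison, here is a minimal sketch of one way to read a header out of a packet buffer without casting the packet pointer to a struct pointer: copy the bytes into a real struct object with memcpy. The struct eth_header and the functions read_eth_header and eth_type_host_order are hypothetical names made up for this illustration, not part of sniffex.c or libpcap, and the sketch assumes the struct has no padding (14 bytes on common ABIs).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified Ethernet header; NOT the struct from sniffex.c.
 * Assumption: the compiler inserts no padding (6 + 6 + 2 = 14 bytes). */
struct eth_header {
    uint8_t  dst[6];      /* destination MAC address */
    uint8_t  src[6];      /* source MAC address */
    uint16_t ether_type;  /* EtherType, stored in network byte order */
};

/*
 * Copy the first bytes of the packet into a real struct eth_header object
 * instead of reinterpreting the packet pointer. Every later access then
 * goes through an object whose effective type really is struct eth_header.
 * Returns 0 on success, -1 if the packet is too short.
 */
int read_eth_header(const unsigned char *packet, size_t len,
                    struct eth_header *out)
{
    if (len < sizeof *out)
        return -1;
    memcpy(out, packet, sizeof *out);
    return 0;
}

/* Convert the EtherType to host byte order, independent of endianness. */
uint16_t eth_type_host_order(const struct eth_header *h)
{
    unsigned char raw[2];
    memcpy(raw, &h->ether_type, sizeof raw);
    return (uint16_t)((raw[0] << 8) | raw[1]);
}
```

The copy costs a few bytes of stack, and compilers typically turn a small fixed-size memcpy into plain loads, so this is usually not slower than the cast-based overlay.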
The strict aliasing rule is controversial.
A bit of background:
Note that "the strict aliasing rule" is not a formal term; it refers to paragraph 6.5/6 regarding effective type and 6.5/7 regarding accessing data through a pointer. The latter paragraph is the actual strict aliasing rule and has been part of C for as long as the language has been standardized, so its existence should not come as a shock to anyone. The text in 6.5/7 is virtually identical all the way from the ANSI C drafts to C11.
However, this section was unclear in C90, because it focused on the type of the pointer used for the "lvalue access", rather than on the type of the data actually stored there. This made situations involving casts to and from void pointers unclear, such as when using memcpy, or when doing various forms of type punning.
In C99 there was an attempt to clarify this by introducing effective type. This didn't change the wording of the strict aliasing rule much; it just made its interpretation clearer. (It still remains one of the hardest parts of the standard to understand.)
The original intent of the rule was to allow compilers to avoid weird worst-case assumptions, such as in this example from the C99 rationale:
    int a;
    void f( double * b )
    {
      a = 1;
      *b = 2.0;
      g(a);
    }
If the compiler can assume that b is not pointing at a, which should be a sensible assumption to make given their wildly different types, then it can optimize the function to
    a = 1;
    *b = 2.0;
    g(1); // micro-optimization, doesn't have to load `a` from memory
So even though the rule has been there for a long time, it wasn't much of a problem until somewhere around C99, when the gcc compiler in particular decided to go haywire and abuse cases where different effective types were used. Here is an example of code that makes perfect sense, yet violates strict aliasing:
    uint32_t u32=0;
    uint16_t* p16 = (uint16_t*)&u32; // grab ms/ls word (endian-dependent)
    *p16 = something;
    if(u32)
      do_stuff();
The above is useful code in all manner of bit-twiddling and hardware-related programming. Most compilers generate what the programmer expects, namely code that changes the ms/ls word of a 32 bit value and then checks whether the function should be called.
However, since the above code is formally undefined behavior because of the strict aliasing violation, compilers like gcc might decide to abuse this and generate code that removes the call to do_stuff() from the machine code, since the compiler may assume that nothing in the code changes u32 from having the value 0.
To dodge this unwanted compiler behavior, the programmer has to go out of their way. Either make u32 volatile so that the compiler is forced to read it - but that blocks all optimizations on the variable and not just the undesired one. Or alternatively come up with some home-brewed union type containing one uint32_t and two uint16_t. Or possibly access u32 byte per byte. Very inconvenient.
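As a rough sketch of two of these workarounds, the code below overwrites the first 16-bit half of a 32-bit value once through a union and once through memcpy. The union word32 and both function names are hypothetical, chosen only for this illustration. It relies on the fact that C99 and later permit reading a union member other than the one last written (the bytes are reinterpreted), and that memcpy copies the object representation byte by byte.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical home-brewed union: one 32-bit word or two 16-bit halves. */
union word32 {
    uint32_t u32;
    uint16_t u16[2];
};

/* Overwrite the lowest-address 16-bit half through the union.
 * Whether that is the ms or ls word depends on endianness. */
uint32_t set_first_half_union(uint32_t value, uint16_t half)
{
    union word32 w;
    w.u32 = value;
    w.u16[0] = half;
    return w.u32;   /* reading the other member reinterprets the bytes */
}

/* Same effect with memcpy: copy the two bytes of `half` over the first
 * two bytes of `value`. No pointer of the wrong type is dereferenced. */
uint32_t set_first_half_memcpy(uint32_t value, uint16_t half)
{
    memcpy(&value, &half, sizeof half);
    return value;
}
```

With either version, a later `if (result)` test has to see the modified value, because the write goes through a path the compiler must treat as touching the 32-bit object.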
Therefore programmers tend to rebel against the strict aliasing rule and write code that relies on the compiler not making incredibly weird optimizations based on strict aliasing. There exist many valid cases where you want to break up a chunk of data into different parts, such as when de-serializing a block of raw data bytes.
For example, if I receive serial data byte-by-byte and store it in an array of uint8_t, and I, the programmer, know that it contains a uint16_t, I should be able to write code like (uint16_t*)array without the compiler making assumptions such as "oh look, this array is never used, let's optimize it away" or other such nonsense.
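Until compilers can be trusted with the cast, a minimal sketch of how such a de-serialization can be written in a way the standard does define is shown below. Both function names are hypothetical, and which one applies depends on an assumption about the wire format: read_u16_native assumes the sender used the same byte order as the receiving machine, while read_u16_le assumes the value arrives least significant byte first.

```c
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value that was stored in the machine's own byte order.
 * memcpy copies the bytes into a real uint16_t object, so no uint16_t
 * pointer ever points into the uint8_t array. This also avoids any
 * alignment problem if buf is not 2-byte aligned. */
uint16_t read_u16_native(const uint8_t *buf)
{
    uint16_t v;
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Read a 16-bit value sent least significant byte first, regardless of
 * the byte order of the receiving machine. */
uint16_t read_u16_le(const uint8_t *buf)
{
    return (uint16_t)(buf[0] | (buf[1] << 8));
}
```

This is the "inconvenient" route the text complains about: correct by the letter of the standard, but more ceremony than a one-line cast.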
Most compilers do not go crazy, but generate the expected code. But they are allowed to go crazy by the standard. And with the growing popularity of gcc in hardware-related programming, this is becoming a serious problem for the embedded industry, where hardware-related programming is an everyday task, rather than some exotic special case.
Overall, the standard committee has repeatedly failed to see this problem.
And of course, a lot of programmers don't know about the strict aliasing rule in the first place, which is also an explanation of why they write code violating it.