commit 3381492d26c2c3677b5f6065e1022c9cfe225c79
Author: Stephan Wiesand
Date:   Thu Jun 12 10:30:48 2014 +0200

    Make OpenAFS 1.6.9

    Update version strings and NEWS for 1.6.9

    Change-Id: I80fe292dd091a26cbec5d5b4a2fab51e4cf4dee6

commit bc8f62fcdfa479023d15125404d1b13b6dfd6dc3
Author: Jeffrey Altman
Date:   Wed Jun 11 19:03:45 2014 -0400

    Revert "viced: Avoid issuing redundant TMAY requests"

    This reverts commit 03a9b481c7f27c462c9d65a756d172e79758b86d.

    Andrew Deason wrote,

    "Briefly, 'host' structures are allocated without clearing all of
    the contents to '0'. Only part of the structure is cleared,
    according to the HOST_TO_ZERO macro. Unfortunately I put the new
    tmay_ fields right below the 'index' field for some reason, so this
    means they aren't zeroed and can contain garbage. This means we can
    easily segfault in the fileserver when we try to access the
    pointers in there.

    "We access uninitialized memory for every 'host' that is allocated.
    So the chance of us corrupting memory is the chance that a
    particular pointer-sized area of memory from 'malloc' is not
    already NULL.

    "That seems pretty likely, but it's not so frequent as to have the
    fileserver effectively 'constantly' crashing at the site that
    noticed. So it has not been a fire drill, but it has been
    noticeable (we heard about it I think yesterday, and got details
    today when it happened again). The noticing incident was a
    segfault, but an abort or sigbus are probably also likely.

    "Of course, the chances of noticing go way up with more clients. I
    expect the chances dramatically increase if you have more than 512
    client hosts hit the box, since the first block of 512 are
    allocated before we really do anything. For the next 512, it seems
    much more likely that 'malloc' will give us back non-zeroed data.
    But this is just theory.

    "With the incident I know about, the crash happened semi-quickly
    after the server started (a few minutes). But it seems likely to
    occur after the server has been up for a long time, if/when you
    cross the next line of 512 hosts.

    "I am also concerned that this can easily be corrupting memory
    without being noticed via a crash (or it takes a while to crash),
    since we are potentially free'ing invalid pointers, or stomping
    over someone else's memory, etc etc."

    Change-Id: I20bd40fc9df69247884099a0623e6db40908b3e8
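
The partial-clear failure that Andrew Deason describes can be modeled in a few lines of C. This is a minimal, hypothetical sketch, not OpenAFS code. The struct layout, every field name other than `index` and the `tmay_` prefix, and this particular `HOST_TO_ZERO` definition (zero every byte that precedes `index`) are assumptions made for illustration. The helper `alloc_dirty_block()` fills memory with `0xAB` to stand in for whatever non-zeroed memory `malloc` might return, so the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the fileserver's 'host' structure.
 * Only 'index' and the tmay_ prefix come from the commit message; the other
 * names are invented for illustration. */
struct host {
    struct host *next;   /* before 'index': cleared on every allocation */
    int refCount;        /* before 'index': cleared on every allocation */
    unsigned int index;  /* table slot number; preserved across reuse */
    void *tmay_pending;  /* new field placed after 'index': never cleared */
};

/* Assumed partial-clear size: the number of bytes that precede 'index'. */
#define HOST_TO_ZERO(h) ((size_t)((char *)&(h)->index - (char *)(h)))

/* Stand-in for malloc() handing back memory that is not already zero. */
static struct host *alloc_dirty_block(size_t n)
{
    struct host *block = malloc(n * sizeof(*block));
    if (block == NULL)
        return NULL;
    memset(block, 0xAB, n * sizeof(*block));  /* simulated leftover garbage */
    for (size_t i = 0; i < n; i++)
        block[i].index = (unsigned int)i;     /* assigned once per block */
    return block;
}

/* Prepare an entry for use: only the prefix before 'index' is zeroed. */
static void reset_entry(struct host *h)
{
    assert(h != NULL);
    memset(h, 0, HOST_TO_ZERO(h));
}

/* True if every byte of an object is zero. Inspects raw bytes rather than
 * loading a possibly garbage pointer value. */
static bool all_bytes_zero(const void *p, size_t len)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < len; i++)
        if (b[i] != 0)
            return false;
    return true;
}
```

In this model, `reset_entry()` leaves `next` and `refCount` zeroed and `index` intact, but `tmay_pending` still holds the `0xAB` pattern, so a later `free()` or dereference would act on a garbage pointer. Two obvious remedies would be to declare such fields before `index`, so they fall inside the cleared prefix, or to clear them explicitly on allocation; the commit above instead reverts the change outright.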