7a8c251901 made this logic hard to follow. After that change, messages would
not be sent to a peer via SendMessages() before the handshake was complete, but
messages could still be sent as a response to an incoming message.
For example, if a peer had not yet sent a verack, we wouldn't notify it about
new blocks, but we would respond to a PING with a PONG.
This change makes the behavior straightforward: until we've received a verack,
never send any message other than version/verack/reject.
The behavior until a VERACK is received has always been undefined; this
change just tightens our policy.
This also makes testing much easier, because we can now connect but not send
version/verack, and anything sent to us is an error.
Prior to this change, all messages were ignored until a VERSION message
was received, and sending anything else first could also incur a ban
score.
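
Roughly, the tightened gate looks like this (an illustrative sketch, not
the actual diff; fSuccessfullyConnected is the flag flipped once the
verack arrives):

    #include <string>

    struct NodeSketch {
        bool fSuccessfullyConnected = false; // set once verack is received
    };

    static bool IsHandshakeMsg(const std::string& cmd)
    {
        return cmd == "version" || cmd == "verack" || cmd == "reject";
    }

    // Before the verack, only handshake traffic may go out; even a PONG
    // in response to a PING is suppressed.
    static bool CanSendTo(const NodeSketch& node, const std::string& cmd)
    {
        return node.fSuccessfullyConnected || IsHandshakeMsg(cmd);
    }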
Since REJECT messages can be sent at any time (including as a response to a bad
VERSION message), make sure to always parse them.
Moving this parsing up keeps it from being caught in the
if (pfrom->nVersion == 0) check below.
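
The reordering, in outline (an illustrative sketch; the real handler also
deserializes the reject payload for logging):

    #include <string>

    struct NodeSketch { int nVersion = 0; };

    bool ProcessMessageSketch(NodeSketch* pfrom, const std::string& strCommand)
    {
        if (strCommand == "reject") {
            // Parsed first: a reject may arrive at any time, even before
            // (or instead of) a valid version message.
            return true;
        }
        if (strCommand == "version") {
            // ... handshake handling; eventually sets pfrom->nVersion ...
            return true;
        }
        if (pfrom->nVersion == 0) {
            // Everything else requires a version message first.
            return false;
        }
        // ... dispatch the remaining message types ...
        return true;
    }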
7a8c251901 made a change to avoid getting into SendMessages() until the
version handshake (VERSION + VERACK) is complete. That was done to avoid
leaking out messages to nodes who could connect, but never bothered sending
us their version/verack.
Unfortunately, the ban tally and possible disconnect are done as part of
SendMessages(). So after 7a8c251901, if a peer managed to do something
bannable before completing the handshake (say send 100 non-version messages
before their version), they wouldn't actually end up getting
disconnected/banned. That's fixed here by checking the banscore as part of
ProcessMessages() in addition to SendMessages().
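
A sketch of the fix with hypothetical helper names (the real code
compares the peer's misbehavior tally against -banscore):

    #include <atomic>

    struct NodeSketch {
        std::atomic<int> nMisbehavior{0};
        std::atomic<bool> fDisconnect{false};
    };

    // Called from ProcessMessages() as well as SendMessages(), so a peer
    // that racks up a bannable score before completing the handshake is
    // still disconnected even though SendMessages() never runs for it.
    void CheckBanScore(NodeSketch& node, int banscore)
    {
        if (node.nMisbehavior >= banscore)
            node.fDisconnect = true; // socket thread drops (and bans) it
    }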
These are (afaik) all long-standing races or concurrent accesses. Going
forward, we can clean these up so that they're not all individual atomic
accesses.
- Reintroduce cs_vRecv to guard receive-specific vars
- Lock vRecv/vSend for CNodeStats
- Make some vars atomic.
- Only set the connection time in CNode's constructor so that it doesn't
  change.
This avoids having some vars set if the version negotiation fails.
Also copy all of the negotiated version data into CNode at the same site.
nVersion and fSuccessfullyConnected are set last, as they are the gates
for the other vars. Make them atomic for that reason.
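
A sketch of that publication ordering, with illustrative field names;
because the gates are atomic (default seq_cst), a reader that observes a
gate set is guaranteed to also see the plain writes made before it:

    #include <atomic>
    #include <string>

    struct NodeSketch {
        // Plain fields written during version negotiation ...
        std::string cleanSubVer;
        int nStartingHeight = -1;
        // ... and the atomic gates that publish them. Readers must see a
        // non-zero nVersion / true fSuccessfullyConnected before touching
        // the fields above.
        std::atomic<int> nVersion{0};
        std::atomic<bool> fSuccessfullyConnected{false};
    };

    void FinishHandshake(NodeSketch& node, int peerVersion,
                         const std::string& subver, int height)
    {
        node.cleanSubVer = subver;
        node.nStartingHeight = height;
        // Gates go last: the atomic stores order the writes above before
        // any reader that loads the gate values.
        node.nVersion = peerVersion;
        node.fSuccessfullyConnected = true;
    }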
This adds a comment to the new logic for setting HB peers based
on block validation (and aligns the code below to reflect the comment).
It's not obvious why we're checking mapBlocksInFlight. Add a comment to
explain.
The old Bitcoin alert system has long since been retired.
(See also: https://bitcoin.org/en/alert/2016-11-01-alert-retirement)
This change causes each node to send a copy of the final alert to any
old peers that it connects with.
The hardcoded alert cancels all other alerts, including other final
alerts.
This forces the message handling thread to make another full
iteration of SendMessages before going back to sleep, ensuring
we announce the new block to all peers first.
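
This is the familiar flag-plus-condition-variable pattern (sketch below;
the names follow the pattern and should be treated as illustrative):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex mutexMsgProc;
    std::condition_variable condMsgProc;
    bool fMsgProcWake = false; // guarded by mutexMsgProc

    // Called when a new block arrives: force the message handler through
    // one more SendMessages() pass before it sleeps again.
    void WakeMessageHandler()
    {
        {
            std::lock_guard<std::mutex> lock(mutexMsgProc);
            fMsgProcWake = true;
        }
        condMsgProc.notify_one();
    }

    // Handler loop: sleep only if nothing has requested another pass.
    void MaybeSleep()
    {
        std::unique_lock<std::mutex> lock(mutexMsgProc);
        condMsgProc.wait_for(lock, std::chrono::milliseconds(100),
                             [] { return fMsgProcWake; });
        fMsgProcWake = false;
    }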
vRecvMsg is now only touched by the socket handler thread.
The accounting vars (nRecvBytes/nLastRecv/mapRecvBytesPerMsgCmd) are also
only used by the socket handler thread, with the exception of queries from
rpc/gui. These accesses are not threadsafe, but they never were. This needs to
be addressed separately.
Also, update the comment describing the data flow.
Similar to the recv flag, but this one indicates whether or not the net's send
buffer is full.
The socket handler checks the send queue when a new message is added and pauses
if necessary, and possibly unpauses after each message is drained from its buffer.
Messages are dumped very quickly from the socket handler to the processor, so
it's the depth of the processing queue that's interesting.
The socket handler checks the process queue's size during the brief message
hand-off and pauses if necessary, and the processor possibly unpauses each time
a message is popped off its queue.
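
A sketch of the two flags with illustrative thresholds (in Bitcoin Core
the actual limits come from -maxsendbuffer and -maxreceivebuffer):

    #include <atomic>
    #include <cstddef>

    const size_t nSendBufferMax = 1000 * 1000;
    const size_t nProcessQueueMax = 1000 * 1000;

    struct NodeSketch {
        std::atomic<size_t> nSendSize{0};         // bytes queued for the wire
        std::atomic<size_t> nProcessQueueSize{0}; // bytes awaiting processing
        std::atomic<bool> fPauseSend{false};
        std::atomic<bool> fPauseRecv{false};
    };

    // Socket handler: after queueing an outgoing message.
    void OnSendQueued(NodeSketch& node, size_t bytes)
    {
        if ((node.nSendSize += bytes) > nSendBufferMax)
            node.fPauseSend = true;  // stop generating messages for this peer
    }

    // Socket handler: after a message is fully written to the socket.
    void OnSendDrained(NodeSketch& node, size_t bytes)
    {
        if ((node.nSendSize -= bytes) < nSendBufferMax)
            node.fPauseSend = false; // resume generating messages
    }

    // Socket handler: during the hand-off to the processing queue.
    void OnHandOff(NodeSketch& node, size_t bytes)
    {
        if ((node.nProcessQueueSize += bytes) > nProcessQueueMax)
            node.fPauseRecv = true;  // stop reading from this peer's socket
    }

    // Processor: after popping a message off its queue.
    void OnProcessed(NodeSketch& node, size_t bytes)
    {
        if ((node.nProcessQueueSize -= bytes) < nProcessQueueMax)
            node.fPauseRecv = false; // resume reading
    }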
In order to sleep accurately, the message handler needs to know if _any_ node
has more processing that it should do before the entire thread sleeps.
Rather than returning a value that represents whether ProcessMessages
encountered a message that should trigger a disconnect, interpret the return
value as whether or not that node has more work to do.
Also, use a global fProcessWake value that can be set by other threads,
which takes precedence (for one cycle) over the messagehandler's decision.
Note that the previous behavior was to only process one message per loop
(except in the case of a bad checksum or invalid header). That was changed in
PR #3180.
The only change here in that regard is that the current node now falls to the
back of the processing queue for the bad checksum/invalid header cases.
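
In outline (a sketch; only fProcessWake is named above, the rest is
illustrative):

    #include <atomic>

    std::atomic<bool> fProcessWake{false}; // set by other threads

    struct NodeSketch { bool fQueueNonEmpty = false; };

    // Now returns "does this node have more work?", not "should we
    // disconnect?". A bad checksum/invalid header also reports more work,
    // but that node has been rotated to the back of the queue.
    bool ProcessMessagesSketch(NodeSketch& node)
    {
        // ... pop and handle at most one message ...
        return node.fQueueNonEmpty;
    }

    bool ShouldSleep(bool fMoreNodeWork)
    {
        // fProcessWake wins for exactly one cycle: exchange() consumes it.
        return !fMoreNodeWork && !fProcessWake.exchange(false);
    }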
There is still a call to ActivateBestChain with cs_main if a peer
requests the block prior to it being validated, but this one is
more narrowly gated, so it should be less of an issue.
Previously addnodes were in competition with outbound connections
for access to the eight outbound slots.
One result of this is that frequently a node with several addnode
configured peers would end up connected to none of them, because
while the addnode loop was in its two minute sleep the automatic
connection logic would fill any free slots with random peers.
This is particularly unwelcome to users trying to maintain links
to specific nodes for fast block relay or other purposes.
Another result is that a group of nine or more nodes that
have addnode configured towards each other can become partitioned
from the public network.
This commit introduces a new limit of eight connections just for
addnode peers which is not subject to any of the other connection
limitations (including maxconnections).
The choice of eight is sufficient so that under no condition would
a user find themselves connected to fewer addnoded peers than
previously. It is also low enough that users who are confused
about the significance of more connections and have gotten too
copy-and-paste happy will not consume more than twice the slot
usage of a typical user.
Any additional load on the network resulting from this will likely
be offset by a reduction in users applying even more wasteful
workarounds for the prior behavior.
The retry delays are reduced to avoid nodes sitting around without
their added peers up, but are still sufficient to prevent overly
aggressive repeated connections. The reduced delays also make
the system much more responsive to the addnode RPC.
Ban-disconnects are also exempted for peers added via addnode since
the outbound addnode logic ignores bans. Previously it would ban
an addnoded peer and then immediately reconnect to it.
A minor change was also made to CSemaphoreGrant so that it is
possible to re-acquire via an object whose grant was moved.
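
A sketch of the grant semantics as a standalone pair of classes
(CSemaphoreGrant and its MoveTo are the real names; this version is
illustrative):

    #include <condition_variable>
    #include <mutex>

    class CountingSemaphore {
        std::mutex m;
        std::condition_variable cv;
        int value;
    public:
        explicit CountingSemaphore(int init) : value(init) {}
        void wait()
        {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [this] { return value > 0; });
            --value;
        }
        void post()
        {
            {
                std::lock_guard<std::mutex> lock(m);
                ++value;
            }
            cv.notify_one();
        }
    };

    class Grant {
        CountingSemaphore* sem = nullptr;
        bool haveGrant = false;
    public:
        void Acquire(CountingSemaphore& s)
        {
            if (haveGrant) return;
            s.wait();
            sem = &s;
            haveGrant = true;
        }
        void Release()
        {
            if (haveGrant) { sem->post(); haveGrant = false; }
        }
        // After MoveTo(), *this holds nothing but remains usable: a later
        // Acquire() works again, which the addnode retry loop relies on.
        void MoveTo(Grant& other)
        {
            other.Release();
            other.sem = sem;
            other.haveGrant = haveGrant;
            haveGrant = false;
        }
        ~Grant() { Release(); }
    };

In the change itself, addnode peers draw their grants from a dedicated
eight-slot semaphore rather than competing for semOutbound.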
In order to do this, we must call ActivateBestChain prior to
responding to getdata requests for blocks which we announced using
compact blocks.
For getheaders responses we don't need code changes, but do note
that we must reset the bestHeaderSent so that the SendMessages call
re-announces the header in question.
While we could do something smarter for getblocks, calling
ActivateBestChain is simple and more obviously correct, rather than
doing something more akin to the getheaders handling.
See also the BIP clarifications at
https://github.com/bitcoin/bips/pull/486
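
In rough outline (illustrative stand-ins; the real change invokes
ActivateBestChain from the getdata path of the message processor):

    struct BlockRequestSketch { bool fBlockConnected = false; };

    // Stand-ins for the real validation and net entry points.
    void ActivateBestChainSketch(BlockRequestSketch& req)
    {
        req.fBlockConnected = true; // connect the block to the active chain
    }
    void SendBlockSketch(const BlockRequestSketch&) { /* serialize + send */ }

    // Serving a getdata for a block we announced via a compact block:
    // connect the block first, so a peer that asks immediately after our
    // announcement never gets silence (or a stale tip) back.
    void ServeGetDataSketch(BlockRequestSketch& req)
    {
        if (!req.fBlockConnected)
            ActivateBestChainSketch(req);
        SendBlockSketch(req);
    }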