Approval Checking Improvements Omnibus (#2480)

* add tracing to approval voting

* notify if session info is not working

* add dispute period to chain specs

* propagate genesis session to parachains runtime

* use `on_genesis_session`

* protect against zero cores in computation

* tweak voting rule to be based off of best and add logs

* genesis configuration should use VRF slots only

* swallow more keystore errors

* add some docs

* make validation-worker args non-optional and update clap

* better tracing for bitfield signing and provisioner

* pass amount of bits in bitfields to inclusion instead of recomputing

* debug -> warn for some logs

* better tracing for availability recovery

* a little av-store tracing

* bridge: forward availability recovery messages

* add missing try_from impl

* some more tracing

* improve approval distribution tracing

* guide: hold onto pending approval messages until NewBlocks

* Hold onto pending approval messages until NewBlocks

* guide: adjust comment

* process all actions for one wakeup at a time

* vec

* fix network bridge test

* replace randomness-collective-flip with Babe

* remove PairNotFound
Robert Habermeier
2021-02-23 14:12:28 -06:00
committed by GitHub
parent 3c4ed7b234
commit 3300b53306
27 changed files with 647 additions and 132 deletions
@@ -42,6 +42,11 @@ Output:
```rust
type BlockScopedCandidate = (Hash, CandidateHash);
enum PendingMessage {
Assignment(IndirectAssignmentCert, CoreIndex),
Approval(IndirectSignedApprovalVote),
}
/// The `State` struct is responsible for tracking the overall state of the subsystem.
///
/// It tracks metadata about our view of the unfinalized chain, which assignments and approvals we have seen, and our peers' views.
@@ -50,6 +55,14 @@ struct State {
blocks_by_number: BTreeMap<BlockNumber, Vec<Hash>>,
blocks: HashMap<Hash, BlockEntry>,
/// Our view updates to our peers can race with `NewBlocks` updates. We store messages received
/// against the directly mentioned blocks in our view in this map until `NewBlocks` is received.
///
/// As long as the parent is already in the `blocks` map and `NewBlocks` messages aren't delayed
/// by more than a block length, this strategy will work well for mitigating the race. This
/// race also typically occurs on local networks.
pending_known: HashMap<Hash, Vec<(PeerId, PendingMessage)>>,
// Peer view data is partially stored here, and partially inline within the `BlockEntry`s
peer_views: HashMap<PeerId, View>,
}
```
@@ -102,6 +115,11 @@ Remove the view under the associated `PeerId` from `State::peer_views`.
Iterate over every `BlockEntry` and remove `PeerId` from it.
#### `NetworkBridgeEvent::OurViewChange`
Remove entries in `pending_known` for all hashes not present in the view.
Ensure a vector is present in `pending_known` for each hash in the view that does not have an entry in `blocks`.
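A minimal sketch of the `OurViewChange` bookkeeping for `pending_known`. `Hash` and `PeerId` are hypothetical `u64` stand-ins, `PendingMessage` carries placeholder payloads, and `blocks` is modeled as a set of known hashes rather than a map to `BlockEntry`.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real hash and peer types.
type Hash = u64;
type PeerId = u64;

#[derive(Debug, Clone, PartialEq)]
enum PendingMessage {
    Assignment(u32), // placeholder payload
    Approval(u32),   // placeholder payload
}

struct State {
    // Stand-in for `blocks: HashMap<Hash, BlockEntry>`; only the keys matter here.
    blocks: HashSet<Hash>,
    pending_known: HashMap<Hash, Vec<(PeerId, PendingMessage)>>,
}

impl State {
    /// Apply `NetworkBridgeEvent::OurViewChange` to `pending_known`.
    fn on_our_view_change(&mut self, view: &[Hash]) {
        let in_view: HashSet<Hash> = view.iter().copied().collect();
        // Drop buffers for blocks that are no longer in our view.
        self.pending_known.retain(|hash, _| in_view.contains(hash));
        // Start buffering for viewed blocks that `NewBlocks` has not delivered yet,
        // keeping any messages already buffered for them.
        for hash in view {
            if !self.blocks.contains(hash) {
                self.pending_known.entry(*hash).or_default();
            }
        }
    }
}
```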
#### `NetworkBridgeEvent::PeerViewChange`
Invoke `unify_with_peer(peer, view)` to catch them up to messages we have.
@@ -116,6 +134,8 @@ From there, we can loop backwards from `constrain(view.finalized_number)` until
#### `NetworkBridgeEvent::PeerMessage`
If the block hash referenced by the message exists in `pending_known`, add it to the vector of pending messages and return.
If the message is of type `ApprovalDistributionV1Message::Assignment(assignment_cert, claimed_index)`, then call `import_and_circulate_assignment(MessageSource::Peer(sender), assignment_cert, claimed_index)`.
If the message is of type `ApprovalDistributionV1Message::Approval(approval_vote)`, then call `import_and_circulate_approval(MessageSource::Peer(sender), approval_vote)`.
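A sketch of the `PeerMessage` routing step, using hypothetical `u64` stand-ins. Each simplified `PendingMessage` carries its block hash directly (in the guide the hash lives inside the cert or vote), and the hypothetical `Routed` enum stands in for actually calling the import functions.

```rust
use std::collections::HashMap;

type Hash = u64;
type PeerId = u64;

#[derive(Debug, Clone, PartialEq)]
enum PendingMessage {
    Assignment(Hash, u32), // (block hash, claimed core index), simplified
    Approval(Hash, u32),   // (block hash, candidate index), simplified
}

impl PendingMessage {
    fn block_hash(&self) -> Hash {
        match self {
            PendingMessage::Assignment(h, _) | PendingMessage::Approval(h, _) => *h,
        }
    }
}

/// What the handler decided to do with an incoming message.
#[derive(Debug, PartialEq)]
enum Routed {
    Buffered,
    ImportAssignment,
    ImportApproval,
}

fn on_peer_message(
    pending_known: &mut HashMap<Hash, Vec<(PeerId, PendingMessage)>>,
    sender: PeerId,
    msg: PendingMessage,
) -> Routed {
    // The block is in our view but `NewBlocks` has not arrived: hold the message.
    if let Some(buffer) = pending_known.get_mut(&msg.block_hash()) {
        buffer.push((sender, msg));
        return Routed::Buffered;
    }
    // Otherwise dispatch by message type.
    match msg {
        PendingMessage::Assignment(..) => Routed::ImportAssignment,
        PendingMessage::Approval(..) => Routed::ImportApproval,
    }
}
```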
@@ -126,6 +146,9 @@ If the message is of type `ApprovalDistributionV1Message::Approval(approval_vote
Create `BlockEntry` and `CandidateEntries` for all blocks.
For all entries in `pending_known`:
* If there is now an entry under `blocks` for the block hash, drain all messages and import with `import_and_circulate_assignment` and `import_and_circulate_approval`.
For all peers:
* Compute `view_intersection` as the intersection of the peer's view blocks with the hashes of the new blocks.
* Invoke `unify_with_peer(peer, view_intersection)`.
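A sketch of how `NewBlocks` could drain `pending_known`, assuming the same simplified types as above. Rather than calling `import_and_circulate_assignment` and `import_and_circulate_approval`, the hypothetical `on_new_blocks` returns the buffered messages in the order they would be imported. Arrival order is preserved within a block; the order across blocks follows `HashMap` iteration and is unspecified.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;
type PeerId = u64;

#[derive(Debug, Clone, PartialEq)]
enum PendingMessage {
    Assignment(u32), // placeholder payload
    Approval(u32),   // placeholder payload
}

fn on_new_blocks(
    blocks: &mut HashSet<Hash>,
    pending_known: &mut HashMap<Hash, Vec<(PeerId, PendingMessage)>>,
    new_blocks: &[Hash],
) -> Vec<(Hash, PeerId, PendingMessage)> {
    // Stand-in for creating `BlockEntry`s and `CandidateEntry`s.
    blocks.extend(new_blocks.iter().copied());

    // Buffers whose block now has an entry under `blocks`.
    let ready: Vec<Hash> = pending_known
        .keys()
        .filter(|h| blocks.contains(*h))
        .copied()
        .collect();

    let mut to_import = Vec::new();
    for hash in ready {
        // Remove the buffer entirely and replay its messages in arrival order.
        if let Some(messages) = pending_known.remove(&hash) {
            for (peer, msg) in messages {
                to_import.push((hash, peer, msg));
            }
        }
    }
    to_import
}
```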
@@ -157,8 +180,8 @@ enum MessageSource {
Imports an assignment cert referenced by block hash and candidate index. As a postcondition, if the cert is valid, it will have been distributed to all peers who have the block in their view, excluding the peer referenced by the `MessageSource`.
We maintain a few invariants:
* we only send an assignment to a peer after we add its fingerprint to our knowledge
* we add a fingerprint of an assignment to our knowledge only if it's valid and hasn't been added before
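Both invariants can be captured by a knowledge set whose insert reports whether the fingerprint was new. This is a sketch, not the subsystem's code: `Fingerprint` is a hypothetical tuple (the guide's fingerprint type may carry more information), the names `Knowledge` and `known_messages` are borrowed from the guide but simplified, validity checking is assumed to have happened already, and filtering peers by view and excluding the source peer are left out.

```rust
use std::collections::HashSet;

// Hypothetical simplified fingerprint: (block hash, candidate index, validator index).
type Fingerprint = (u64, u32, u32);

#[derive(Default)]
struct Knowledge {
    known_messages: HashSet<Fingerprint>,
}

impl Knowledge {
    /// Returns `true` only the first time a fingerprint is recorded.
    fn note(&mut self, fp: Fingerprint) -> bool {
        self.known_messages.insert(fp)
    }
}

/// Record an already-validated assignment and return the peers it should be sent to.
/// Each peer's knowledge is updated before it is selected, so we never send a message
/// whose fingerprint is not yet in that peer's knowledge.
fn circulate(ours: &mut Knowledge, peers: &mut [(u64, Knowledge)], fp: Fingerprint) -> Vec<u64> {
    // Already in our knowledge: do not record or resend it.
    if !ours.note(fp) {
        return Vec::new();
    }
    peers
        .iter_mut()
        .filter_map(|(peer_id, knowledge)| knowledge.note(fp).then_some(*peer_id))
        .collect()
}
```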
The algorithm is the following:
@@ -167,7 +190,7 @@ The algorithm is the following:
* If the source is `MessageSource::Peer(sender)`:
* check if `peer` appears under `known_by` and whether the fingerprint is in the `known_messages` of the peer. If the peer does not know the block, report for providing data out-of-view and proceed. If the peer does know the block and the knowledge contains the fingerprint, report for providing duplicate data and return.
* If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation boost,
add the fingerprint to the peer's knowledge only if it knows about the block and return.
Note that we must do this after checking for out-of-view and whether the peer knows about the block, to avoid being spammed.
If we did this check earlier, a peer could provide data out-of-view repeatedly and be rewarded for it.
* Dispatch `ApprovalVotingMessage::CheckAndImportAssignment(assignment)` and wait for the response.
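A sketch of the peer-source checks for one message, showing why their order matters: the out-of-view and duplicate checks run before the already-known check, so an out-of-view peer is penalized and never earns the boost. The sketch reads "only if it knows about the block" as gating both the boost and the knowledge update, which is an interpretation. Reputation values are left abstract via the hypothetical `Report` enum, and `Fingerprint` is a hypothetical tuple. The same sequence applies to approvals.

```rust
use std::collections::HashSet;

// Hypothetical simplified fingerprint: (block hash, candidate index, validator index).
type Fingerprint = (u64, u32, u32);

#[derive(Debug, PartialEq)]
enum Report {
    OutOfView,
    Duplicate,
    SmallBoost,
}

#[derive(Debug, PartialEq)]
enum Next {
    /// Stop processing this message.
    Done,
    /// Dispatch `CheckAndImportAssignment` and wait for the response.
    AskApprovalVoting,
}

/// `peer_knowledge` is `None` when the peer is not under the block's `known_by`.
fn peer_checks(
    block_knowledge: &HashSet<Fingerprint>,
    mut peer_knowledge: Option<&mut HashSet<Fingerprint>>,
    fp: Fingerprint,
    reports: &mut Vec<Report>,
) -> Next {
    // Step 1: out-of-view and duplicate checks run first...
    match peer_knowledge.as_deref() {
        None => reports.push(Report::OutOfView),
        Some(known) if known.contains(&fp) => {
            reports.push(Report::Duplicate);
            return Next::Done;
        }
        Some(_) => {}
    }
    // Step 2: ...so this reward cannot be earned by sending out-of-view data.
    if block_knowledge.contains(&fp) {
        if let Some(known) = peer_knowledge.as_deref_mut() {
            known.insert(fp);
            reports.push(Report::SmallBoost);
        }
        return Next::Done;
    }
    Next::AskApprovalVoting
}
```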
@@ -194,7 +217,7 @@ Imports an approval signature referenced by block hash and candidate index:
* If the source is `MessageSource::Peer(sender)`:
* check if `peer` appears under `known_by` and whether the fingerprint is in the `known_messages` of the peer. If the peer does not know the block, report for providing data out-of-view and proceed. If the peer does know the block and the knowledge contains the fingerprint, report for providing duplicate data and return.
* If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation boost,
add the fingerprint to the peer's knowledge only if it knows about the block and return.
Note that we must do this after checking for out-of-view to avoid being spammed. If we did this check earlier, a peer could provide data out-of-view repeatedly and be rewarded for it.
* Dispatch `ApprovalVotingMessage::CheckAndImportApproval(approval)` and wait for the response.
* If the result is `VoteCheckResult::Accepted(())`:
@@ -135,7 +135,10 @@ struct State {
session_info: Vec<SessionInfo>,
babe_epoch: Option<BabeEpoch>, // information about a cached BABE epoch.
keystore: KeyStorePtr,
// A scheduler which keeps at most one wakeup per (block hash, candidate hash) pair and
// maps such pairs to `Tick`s.
wakeups: Wakeups,
// These are connected to each other.
background_tx: mpsc::Sender<BackgroundRequest>,
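A minimal sketch of a `Wakeups`-style scheduler: a `BTreeMap<Tick, Vec<(Hash, Hash)>>` ordered by tick plus a reverse index, so each (block hash, candidate hash) pair has at most one pending wakeup. Keeping the earliest tick when a pair is rescheduled is an assumption of this sketch, as are the `u64` types and the method names `schedule` and `next_due`.

```rust
use std::collections::{BTreeMap, HashMap};

type Tick = u64;
type Hash = u64; // hypothetical stand-in for a 32-byte hash

#[derive(Default)]
struct Wakeups {
    by_tick: BTreeMap<Tick, Vec<(Hash, Hash)>>,
    reverse: HashMap<(Hash, Hash), Tick>,
}

impl Wakeups {
    /// Schedule a wakeup for (block, candidate), keeping only the earliest tick.
    fn schedule(&mut self, block: Hash, candidate: Hash, tick: Tick) {
        let pair = (block, candidate);
        if let Some(&prev) = self.reverse.get(&pair) {
            if prev <= tick {
                return; // an earlier or equal wakeup is already scheduled
            }
            // Remove the later wakeup before replacing it.
            let now_empty = match self.by_tick.get_mut(&prev) {
                Some(pairs) => {
                    pairs.retain(|p| *p != pair);
                    pairs.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.by_tick.remove(&prev);
            }
        }
        self.reverse.insert(pair, tick);
        self.by_tick.entry(tick).or_default().push(pair);
    }

    /// Pop all pairs at the earliest tick if that tick is due (<= `now`).
    fn next_due(&mut self, now: Tick) -> Option<(Tick, Vec<(Hash, Hash)>)> {
        let (&tick, _) = self.by_tick.iter().next()?;
        if tick > now {
            return None;
        }
        let pairs = self.by_tick.remove(&tick)?;
        for pair in &pairs {
            self.reverse.remove(pair);
        }
        Some((tick, pairs))
    }
}
```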
@@ -48,8 +48,8 @@ Validators: Vec<ValidatorId>;
All failed checks should lead to an unrecoverable error making the block invalid.
* `process_bitfields(Bitfields, core_lookup: Fn(CoreIndex) -> Option<ParaId>)`:
* `process_bitfields(expected_bits, Bitfields, core_lookup: Fn(CoreIndex) -> Option<ParaId>)`:
1. check that there is at most 1 bitfield per validator and that the number of bits in each bitfield is equal to `expected_bits`.
1. check that there are no duplicates
1. check all validator signatures.
1. apply each bit of the bitfield to the corresponding pending candidate, looking up parathread cores using `core_lookup`. Disregard bitfields that have a `1` bit for any free cores.
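A sketch of the bitfield checks and the per-core tally, with signature verification omitted. `SignedBitfield`, `BitfieldError` and `tally_bitfields` are hypothetical. Requiring strictly increasing validator indices is one assumed way to enforce "at most 1 bitfield per validator" and "no duplicates" in a single pass, and a `None` from `core_lookup` is taken to mean the core is free.

```rust
/// Hypothetical simplified bitfield: validator index plus one bit per availability core.
struct SignedBitfield {
    validator_index: usize,
    bits: Vec<bool>,
}

#[derive(Debug, PartialEq)]
enum BitfieldError {
    WrongLength,
    DuplicateOrUnordered,
    UnknownValidator,
}

/// Returns, per core, how many validators set the bit for that core.
fn tally_bitfields(
    expected_bits: usize,
    n_validators: usize,
    bitfields: &[SignedBitfield],
    core_lookup: impl Fn(usize) -> Option<u32>, // core index -> para id
) -> Result<Vec<usize>, BitfieldError> {
    let mut last_validator: Option<usize> = None;
    let mut votes = vec![0usize; expected_bits];
    for bf in bitfields {
        if bf.bits.len() != expected_bits {
            return Err(BitfieldError::WrongLength);
        }
        if bf.validator_index >= n_validators {
            return Err(BitfieldError::UnknownValidator);
        }
        // Strictly increasing indices rule out duplicates and repeated validators.
        if last_validator.map_or(false, |last| bf.validator_index <= last) {
            return Err(BitfieldError::DuplicateOrUnordered);
        }
        last_validator = Some(bf.validator_index);
        // Disregard bitfields that set a bit for a free core.
        if bf.bits.iter().enumerate().any(|(core, &bit)| bit && core_lookup(core).is_none()) {
            continue;
        }
        for (core, &bit) in bf.bits.iter().enumerate() {
            if bit {
                votes[core] += 1;
            }
        }
    }
    Ok(votes)
}
```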
@@ -27,7 +27,7 @@ Included: Option<()>,
1. Invoke `Disputes::provide_multi_dispute_data`.
1. If `Disputes::is_frozen`, return and set `Included` to `Some(())`.
1. If there are any created disputes from the current session, invoke `Inclusion::collect_disputed` with the disputed candidates. Annotate each returned core with `FreedReason::Concluded`.
1. The `Bitfields` are first forwarded to the `Inclusion::process_bitfields` routine, returning a set of freed cores. Provide the number of availability cores (`Scheduler::availability_cores().len()`) as the expected number of bits and a `Scheduler::core_para` as a core-lookup to the `process_bitfields` routine. Annotate each of these freed cores with `FreedReason::Concluded`.
1. For each freed candidate from the `Inclusion::process_bitfields` call, invoke `Disputes::note_included(current_session, candidate)`.
1. If `Scheduler::availability_timeout_predicate` is `Some`, invoke `Inclusion::collect_pending` using it and annotate each of those freed cores with `FreedReason::TimedOut`.
1. Combine and sort the dispute-freed cores, the bitfield-freed cores, and the timed-out cores.
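The final combine-and-sort step can be sketched as below. `CoreIndex`, `FreedReason` and `combine_freed` are simplified stand-ins. Dispute-freed and bitfield-freed cores both carry `FreedReason::Concluded`, as the steps above prescribe, and the stable sort keeps entries with equal core indices in input order. The `expected_bits` handed to `process_bitfields` would simply be the length of the availability-cores list.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct CoreIndex(u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FreedReason {
    Concluded,
    TimedOut,
}

/// Merge the three sets of freed cores, annotate them, and sort by core index.
fn combine_freed(
    disputed: Vec<CoreIndex>,
    bitfield_freed: Vec<CoreIndex>,
    timed_out: Vec<CoreIndex>,
) -> Vec<(CoreIndex, FreedReason)> {
    let mut freed: Vec<(CoreIndex, FreedReason)> = disputed
        .into_iter()
        .chain(bitfield_freed)
        .map(|core| (core, FreedReason::Concluded))
        .chain(timed_out.into_iter().map(|core| (core, FreedReason::TimedOut)))
        .collect();
    freed.sort_by_key(|(core, _)| *core);
    freed
}
```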