I am dumb and can't spell (#1366)

* rename implementor's guide to implementer's guide

* fix typos in more places
This commit is contained in:
Robert Habermeier
2020-07-07 10:10:36 -04:00
committed by GitHub
parent 37da08a764
commit 42bd096413
54 changed files with 5 additions and 5 deletions
@@ -0,0 +1,10 @@
# Backing Subsystems
The backing subsystems, when conceived as a black box, receive an arbitrary quantity of parablock candidates and associated proofs of validity from arbitrary untrusted collators. From these, they produce a bounded quantity of backable candidates which relay chain block authors may choose to include in a subsequent block.
In broad strokes, the flow operates like this:
- **Candidate Selection** winnows the field of parablock candidates, selecting up to one of them to second.
- **Candidate Backing** ensures that a seconding candidate is valid, then generates the appropriate `Statement`. It also keeps track of which candidates have received the backing of a quorum of other validators.
- **Statement Distribution** is the networking component which ensures that all validators receive each others' statements.
- **PoV Distribution** is the networking component which ensures that validators considering a candidate can get the appropriate PoV.
@@ -0,0 +1,92 @@
# Candidate Backing
The Candidate Backing subsystem ensures every parablock considered for relay block inclusion has been seconded by at least one validator, and approved by a quorum. Parablocks for which no validator will assert correctness are discarded. If the block later proves invalid, the initial backers are slashable; this gives polkadot a rational threat model during subsequent stages.
Its role is to produce backable candidates for inclusion in new relay-chain blocks. It does so by issuing signed [`Statement`s](../../types/backing.md#statement-type) and tracking received statements signed by other validators. Once enough statements are received, they can be combined into backing for specific candidates.
Note that though the candidate backing subsystem attempts to produce as many backable candidates as possible, it does _not_ attempt to choose a single authoritative one. The choice of which actually gets included is ultimately up to the block author, by whatever metrics it may use; those are opaque to this subsystem.
Once a sufficient quorum has agreed that a candidate is valid, this subsystem notifies the [Provisioner](../utility/provisioner.md), which in turn engages block production mechanisms to include the parablock.
## Protocol
The [Candidate Selection subsystem](candidate-selection.md) is the primary source of non-overseer messages into this subsystem. That subsystem generates appropriate [`CandidateBackingMessage`s](../../types/overseer-protocol.md#candidate-backing-message), and passes them to this subsystem.
This subsystem validates the candidates and generates an appropriate [`SignedStatement`](../../types/backing.md#signed-statement-type). All `SignedStatement`s are then passed on to the [Statement Distribution subsystem](statement-distribution.md) to be gossiped to peers. All [Proofs of Validity](../../types/availability.md#proof-of-validity) should be distributed via the [PoV Distribution](pov-distribution.md) subsystem. When this subsystem decides that a candidate is invalid, and it was recommended to us to second by our own Candidate Selection subsystem, a message is sent to the Candidate Selection subsystem with the candidate's hash so that the collator which recommended it can be penalized.
## Functionality
The subsystem should maintain a set of handles to Candidate Backing Jobs that are currently live, as well as the relay-parent to which they correspond.
### On Overseer Signal
* If the signal is an [`OverseerSignal`](../../types/overseer-protocol.md#overseer-signal)`::StartWork(relay_parent)`, spawn a Candidate Backing Job with the given relay parent, storing a bidirectional channel with the Candidate Backing Job in the set of handles.
* If the signal is an [`OverseerSignal`](../../types/overseer-protocol.md#overseer-signal)`::StopWork(relay_parent)`, cease the Candidate Backing Job under that relay parent, if any.
### On `CandidateBackingMessage`
* If the message corresponds to a particular relay-parent, forward the message to the Candidate Backing Job for that relay-parent, if any is live.
> big TODO: "contextual execution"
>
> * At the moment we only allow inclusion of _new_ parachain candidates validated by _current_ validators.
> * Allow inclusion of _old_ parachain candidates validated by _current_ validators.
> * Allow inclusion of _old_ parachain candidates validated by _old_ validators.
>
> This will probably blur the lines between jobs, will probably require inter-job communication and a short-term memory of recently backable, but not backed candidates.
## Candidate Backing Job
The Candidate Backing Job represents the work a node does for backing candidates with respect to a particular relay-parent.
The goal of a Candidate Backing Job is to produce as many backable candidates as possible. This is done via signed [`Statement`s](../../types/backing.md#statement-type) by validators. If a candidate receives a majority of supporting Statements from the Parachain Validators currently assigned, then that candidate is considered backable.
### On Startup
* Fetch current validator set, validator -> parachain assignments from runtime API.
* Determine if the node controls a key in the current validator set. Call this the local key if so.
* If the local key exists, extract the parachain head and validation function for the parachain the local key is assigned to.
### On Receiving New Signed Statement
```rust
if let Statement::Seconded(candidate) = signed.statement {
if candidate is unknown and in local assignment {
spawn_validation_work(candidate, parachain head, validation function)
}
}
// add `Seconded` statements and `Valid` statements to a quorum. If quorum reaches validator-group
// majority, send a `BlockAuthorshipProvisioning::BackableCandidate(relay_parent, Candidate, Backing)` message.
```
### Spawning Validation Work
```rust
fn spawn_validation_work(candidate, parachain head, validation function) {
asynchronously {
let pov = (fetch pov block).await
// dispatched to sub-process (OS process) pool.
let valid = validate_candidate(candidate, validation function, parachain head, pov).await;
if valid {
// make PoV available for later distribution. Send data to the availability store to keep.
// sign and dispatch `valid` statement to network if we have not seconded the given candidate.
} else {
// sign and dispatch `invalid` statement to network.
}
}
}
```
### Fetch Pov Block
Create a `(sender, receiver)` pair.
Dispatch a `PovFetchSubsystemMessage(relay_parent, candidate_hash, sender)` and listen on the receiver for a response.
### On Receiving `CandidateBackingMessage`
* If the message is a `CandidateBackingMessage::RegisterBackingWatcher`, register the watcher and trigger it each time a new candidate is backable. Also trigger it once initially if there are any backable candidates at the time of receipt.
* If the message is a `CandidateBackingMessage::Second`, sign and dispatch a `Seconded` statement only if we have not seconded any other candidate and have not signed a `Valid` statement for the requested candidate. Signing both a `Seconded` and `Valid` message is a double-voting misbehavior with a heavy penalty, and this could occur if another validator has seconded the same candidate and we've received their message before the internal seconding request.
> TODO: send statements to Statement Distribution subsystem, handle shutdown signal from candidate backing subsystem
@@ -0,0 +1,39 @@
# Candidate Selection
The Candidate Selection Subsystem is run by validators, and is responsible for interfacing with Collators to select a candidate, along with its PoV, to second during the backing process relative to a specific relay parent.
This subsystem includes networking code for communicating with collators, and tracks which collations specific collators have submitted. This subsystem is responsible for disconnecting and blacklisting collators who are found to have submitted invalid collations. Typically an invalid collation will be discovered by a different subsystem.
This subsystem is only ever interested in parablocks assigned to the particular parachain which this validator is currently handling.
New parablock candidates may arrive from a potentially unbounded set of collators. This subsystem chooses either 0 or 1 of them per relay parent to second. If it chooses to second a candidate, it sends an appropriate message to the [Candidate Backing subsystem](candidate-backing.md) to generate an appropriate [`Statement`](../../types/backing.md#statement-type).
In the event that a parablock candidate proves invalid, this subsystem will receive a message back from the Candidate Backing subsystem indicating so. If that parablock candidate originated from a collator, this subsystem will blacklist that collator. If that parablock candidate originated from a peer, this subsystem generates a report for the [Misbehavior Arbitration subsystem](../utility/misbehavior-arbitration.md).
## Protocol
Input: [`CandidateSelectionMessage`](../../types/overseer-protocol.md#candidate-selection-message)
Output:
- Validation requests to Validation subsystem
- [`CandidateBackingMessage`](../../types/overseer-protocol.md#candidate-backing-message)`::Second`
- Peer set manager: report peers (collators who have misbehaved)
## Functionality
Overarching network protocol + job for every relay-parent
> TODO The Candidate Selection network protocol is currently intentionally unspecified pending further discussion.
Several approaches have been selected, but all have some issues:
- The most straightforward approach is for this subsystem to simply second the first valid parablock candidate which it sees per relay head. However, that protocol is vulnerable to a single collator which, as an attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
- It may be possible to do some BABE-like selection algorithm to choose an "Official" collator for the round, but that is tricky because the collator which produces the PoV does not necessarily actually produce the block.
- We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, +- 1 second. The collator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`, the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight schedule.
- A variation of that scheme would be to randomly choose a number `I`, and have a fixed acceptance window `D` for parablock candidates. At the end of the period `D`, count `C`: the number of parablock candidates received. Second the one with index `I % C`. Its drawback is the same: it must wait the full `D` period before seconding any of its received candidates, reducing throughput.
## Candidate Selection Job
- Aware of validator key and assignment
- One job for each relay-parent, which selects up to one collation for the Candidate Backing Subsystem
@@ -0,0 +1,121 @@
# PoV Distribution
This subsystem is responsible for distributing PoV blocks. For now, unified with [Statement Distribution subsystem](statement-distribution.md).
## Protocol
`ProtocolId`: `b"povd"`
Input: [`PoVDistributionMessage`](../../types/overseer-protocol.md#pov-distribution-message)
Output:
- NetworkBridge::RegisterEventProducer(`ProtocolId`)
- NetworkBridge::SendMessage(`[PeerId]`, `ProtocolId`, `Bytes`)
- NetworkBridge::ReportPeer(PeerId, cost_or_benefit)
## Functionality
This network protocol is responsible for distributing [`PoV`s](../../types/availability.md#proof-of-validity) by gossip. Since PoVs are heavy in practice, gossip is far from the most efficient way to distribute them. In the future, this should be replaced by a better network protocol that finds validators who have validated the block and connects to them directly. This protocol is descrbied
This protocol is described in terms of "us" and our peers, with the understanding that this is the procedure that any honest node will run. It has the following goals:
- We never have to buffer an unbounded amount of data
- PoVs will flow transitively across a network of honest nodes, stemming from the validators that originally seconded candidates requiring those PoVs.
As we are gossiping, we need to track which PoVs our peers are waiting for to avoid sending them data that they are not expecting. It is not reasonable to expect our peers to buffer unexpected PoVs, just as we will not buffer unexpected PoVs. So notifying our peers about what is being awaited is key. However it is important that the notifications system is also bounded.
For this, in order to avoid reaching into the internals of the [Statement Distribution](statement-distribution.md) Subsystem, we can rely on an expected propery of candidate backing: that each validator can only second one candidate at each chain head. So we can set a cap on the number of PoVs each peer is allowed to notify us that they are waiting for at a given relay-parent. This cap will be the number of validators at that relay-parent. And the view update mechanism of the [Network Bridge](../utility/network-bridge.md) ensures that peers are only allowed to consider a certain set of relay-parents as live. So this bounding mechanism caps the amount of data we need to store per peer at any time at `sum({ n_validators_at_head(head) | head in view_heads })`. Additionally, peers should only be allowed to notify us of PoV hashes they are waiting for in the context of relay-parents in our own local view, which means that `n_validators_at_head` is implied to be `0` for relay-parents not in our own local view.
View updates from peers and our own view updates are received from the network bridge. These will lag somewhat behind the `StartWork` and `StopWork` messages received from the overseer, which will influence the actual data we store. The `OurViewUpdate`s from the [`NetworkBridgeEvent`](../../types/overseer-protocol.md#network-bridge-update) must be considered canonical in terms of our peers' perception of us.
Lastly, the system needs to be bootstrapped with our own perception of which PoVs we are cognizant of but awaiting data for. This is done by receipt of the [`PoVDistributionMessage`](../../types/overseer-protocol.md#pov-distribution-message)::ValidatorStatement variant. We can ignore anything except for `Seconded` statements.
## Formal Description
This protocol can be implemented as a state machine with the following state:
```rust
struct State {
relay_parent_state: Map<Hash, BlockBasedState>,
peer_state: Map<PeerId, PeerState>,
our_view: View,
}
struct BlockBasedState {
known: Map<Hash, PoV>, // should be a shared PoV in practice. these things are heavy.
awaited: Set<Hash>, // awaited PoVs by blake2-256 hash.
fetching: Map<Hash, [ResponseChannel<PoV>]>,
n_validators: usize,
}
struct PeerState {
awaited: Map<Hash, Set<Hash>>,
}
```
We also assume the following network messages, which are sent and received by the [Network Bridge](../utility/network-bridge.md)
```rust
enum NetworkMessage {
/// Notification that we are awaiting the given PoVs (by hash) against a
/// specific relay-parent hash.
Awaiting(Hash, Vec<Hash>),
/// Notification of an awaited PoV, in a given relay-parent context.
/// (relay_parent, pov_hash, pov)
SendPoV(Hash, Hash, PoV),
}
```
Here is the logic of the state machine:
*Overseer Signals*
- On `StartWork(relay_parent)`:
- Get the number of validators at that relay parent by querying the [Runtime API](../utility/runtime-api.md) for the validators and then counting them.
- Create a blank entry in `relay_parent_state` under `relay_parent` with correct `n_validators` set.
- On `StopWork(relay_parent)`:
- Remove the entry for `relay_parent` from `relay_parent_state`.
- On `Concluded`: conclude.
*PoV Distribution Messages*
- On `ValidatorStatement(relay_parent, statement)`
- If this is not `Statement::Seconded`, ignore.
- If there is an entry under `relay_parent` in `relay_parent_state`, add the `pov_hash` of the seconded Candidate's [`CandidateDescriptor`](../../types/candidate.md#candidate-descriptor) to the `awaited` set of the entry.
- If the `pov_hash` was not previously awaited and there are `n_validators` or fewer entries in the `awaited` set, send `NetworkMessage::Awaiting(relay_parent, vec![pov_hash])` to all peers.
- On `FetchPoV(relay_parent, descriptor, response_channel)`
- If there is no entry in `relay_parent_state` under `relay_parent`, ignore.
- If there is a PoV under `descriptor.pov_hash` in the `known` map, send that PoV on the channel and return.
- Otherwise, place the `response_channel` in the `fetching` map under `descriptor.pov_hash`.
- On `DistributePoV(relay_parent, descriptor, PoV)`
- If there is no entry in `relay_parent_state` under `relay_parent`, ignore.
- Complete and remove any channels under `descriptor.pov_hash` in the `fetching` map.
- Send `NetworkMessage::SendPoV(relay_parent, descriptor.pov_hash, PoV)` to all peers who have the `descriptor.pov_hash` in the set under `relay_parent` in the `peer.awaited` map and remove the entry from `peer.awaited`.
- Note the PoV under `descriptor.pov_hash` in `known`.
*Network Bridge Updates*
- On `PeerConnected(peer_id, observed_role)`
- Make a fresh entry in the `peer_state` map for the `peer_id`.
- On `PeerDisconnected(peer_id)
- Remove the entry for `peer_id` from the `peer_state` map.
- On `PeerMessage(peer_id, bytes)`
- If the bytes do not decode to a `NetworkMessage` or the `peer_id` has no entry in the `peer_state` map, report and ignore.
- If this is `NetworkMessage::Awaiting(relay_parent, pov_hashes)`:
- If there is no entry under `peer_state.awaited` for the `relay_parent`, report and ignore.
- If `relay_parent` is not contained within `our_view`, report and ignore.
- Otherwise, if the `awaited` map combined with the `pov_hashes` would have more than `relay_parent_state[relay_parent].n_validators` entries, report and ignore. Note that we are leaning on the property of the network bridge that it sets our view based on `StartWork` messages.
- For each new `pov_hash` in `pov_hashes`, if there is a `pov` under `pov_hash` in the `known` map, send the peer a `NetworkMessage::SendPoV(relay_parent, pov_hash, pov)`.
- Otherwise, add the `pov_hash` to the `awaited` map
- If this is `NetworkMessage::SendPoV(relay_parent, pov_hash, pov)`:
- If there is no entry under `relay_parent` in `relay_parent_state` or no entry under `pov_hash` in our `awaited` map for that `relay_parent`, report and ignore.
- If the blake2-256 hash of the pov doesn't equal `pov_hash`, report and ignore.
- Complete and remove any listeners in the `fetching` map under `pov_hash`.
- Add to `known` map.
- Send `NetworkMessage::SendPoV(relay_parent, descriptor.pov_hash, PoV)` to all peers who have the `descriptor.pov_hash` in the set under `relay_parent` in the `peer.awaited` map and remove the entry from `peer.awaited`.
- On `PeerViewChange(peer_id, view)`
- If Peer is unknown, ignore.
- Ensure there is an entry under `relay_parent` for each `relay_parent` in `view` within the `peer.awaited` map, creating blank `awaited` lists as necessary.
- Remove all entries under `peer.awaited` that are not within `view`.
- On `OurViewChange(view)`
- Update `our_view` to `view`
@@ -0,0 +1,73 @@
# Statement Distribution
The Statement Distribution Subsystem is responsible for distributing statements about seconded candidates between validators.
## Protocol
`ProtocolId`: `b"stmd"`
Input:
- NetworkBridgeUpdate(update)
Output:
- NetworkBridge::RegisterEventProducer(`ProtocolId`)
- NetworkBridge::SendMessage(`[PeerId]`, `ProtocolId`, `Bytes`)
- NetworkBridge::ReportPeer(PeerId, cost_or_benefit)
## Functionality
Implemented as a gossip protocol. Register a network event producer on startup. Handle updates to our view and peers' views. Neighbor packets are used to inform peers which chain heads we are interested in data for.
Statement Distribution is the only backing subsystem which has any notion of peer nodes, who are any full nodes on the network. Validators will also act as peer nodes.
It is responsible for distributing signed statements that we have generated and forwarding them, and for detecting a variety of Validator misbehaviors for reporting to [Misbehavior Arbitration](../utility/misbehavior-arbitration.md). During the Backing stage of the inclusion pipeline, it's the main point of contact with peer nodes. On receiving a signed statement from a peer, assuming the peer receipt state machine is in an appropriate state, it sends the Candidate Receipt to the [Candidate Backing subsystem](candidate-backing.md) to handle the validator's statement.
Track equivocating validators and stop accepting information from them. Establish a data-dependency order:
- In order to receive a `Seconded` message we have the on corresponding chain head in our view
- In order to receive an `Invalid` or `Valid` message we must have received the corresponding `Seconded` message.
And respect this data-dependency order from our peers by respecting their views. This subsystem is responsible for checking message signatures.
The Statement Distribution subsystem sends statements to peer nodes.
## Peer Receipt State Machine
There is a very simple state machine which governs which messages we are willing to receive from peers. Not depicted in the state machine: on initial receipt of any [`SignedFullStatement`](../../types/backing.md#signed-statement-type), validate that the provided signature does in fact sign the included data. Note that each individual parablock candidate gets its own instance of this state machine; it is perfectly legal to receive a `Valid(X)` before a `Seconded(Y)`, as long as a `Seconded(X)` has been received.
A: Initial State. Receive `SignedFullStatement(Statement::Second)`: extract `Statement`, forward to Candidate Backing and PoV Distribution, proceed to B. Receive any other `SignedFullStatement` variant: drop it.
B: Receive any `SignedFullStatement`: check signature, forward to Candidate Backing. Receive `OverseerMessage::StopWork`: proceed to C.
C: Receive any message for this block: drop it.
## Peer Knowledge Tracking
The peer receipt state machine implies that for parsimony of network resources, we should model the knowledge of our peers, and help them out. For example, let's consider a case with peers A, B, and C, validators X and Y, and candidate M. A sends us a `Statement::Second(M)` signed by X. We've double-checked it, and it's valid. While we're checking it, we receive a copy of X's `Statement::Second(M)` from `B`, along with a `Statement::Valid(M)` signed by Y.
Our response to A is just the `Statement::Valid(M)` signed by Y. However, we haven't heard anything about this from C. Therefore, we send it everything we have: first a copy of X's `Statement::Second`, then Y's `Statement::Valid`.
This system implies a certain level of duplication of messages--we received X's `Statement::Second` from both our peers, and C may experience the same--but it minimizes the degree to which messages are simply dropped.
And respect this data-dependency order from our peers. This subsystem is responsible for checking message signatures.
No jobs. We follow view changes from the [`NetworkBridge`](../utility/network-bridge.md), which in turn is updated by the overseer.
## Equivocations and Flood Protection
An equivocation is a double-vote by a validator. The [Candidate Backing](candidate-backing.md) Subsystem is better-suited than this one to detect equivocations as it adds votes to quorum trackers.
At this level, we are primarily concerned about flood-protection, and to some extent, detecting equivocations is a part of that. In particular, we are interested in detecting equivocations of `Seconded` statements. Since every other statement is dependent on `Seconded` statements, ensuring that we only ever hold a bounded number of `Seconded` statements is sufficient for flood-protection.
The simple approach is to say that we only receive up to two `Seconded` statements per validator per chain head. However, the marginal cost of equivocation, conditional on having already equivocated, is close to 0, since a single double-vote offence is counted as all double-vote offences for a particular chain-head. Even if it were not, there is some amount of equivocations that can be done such that the marginal cost of issuing further equivocations is close to 0, as there would be an amount of equivocations necessary to be completely and totally obliterated by the slashing algorithm. We fear the validator with nothing left to lose.
With that in mind, this simple approach has a caveat worth digging deeper into.
First: We may be aware of two equivocated `Seconded` statements issued by a validator. A totally honest peer of ours can also be aware of one or two different `Seconded` statements issued by the same validator. And yet another peer may be aware of one or two _more_ `Seconded` statements. And so on. This interacts badly with pre-emptive sending logic. Upon sending a `Seconded` statement to a peer, we will want to pre-emptively follow up with all statements relative to that candidate. Waiting for acknowledgement introduces latency at every hop, so that is best avoided. What can happen is that upon receipt of the `Seconded` statement, the peer will discard it as it falls beyond the bound of 2 that it is allowed to store. It cannot store anything in memory about discarded candidates as that would introduce a DoS vector. Then, the peer would receive from us all of the statements pertaining to that candidate, which, from its perspective, would be undesired - they are data-dependent on the `Seconded` statement we sent them, but they have erased all record of that from their memory. Upon receiving a potential flood of undesired statements, this 100% honest peer may choose to disconnect from us. In this way, an adversary may be able to partition the network with careful distribution of equivocated `Seconded` statements.
The fix is to track, per-peer, the hashes of up to 4 candidates per validator (per relay-parent) that the peer is aware of. It is 4 because we may send them 2 and they may send us 2 different ones. We track the data that they are aware of as the union of things we have sent them and things they have sent us. If we receive a 1st or 2nd `Seconded` statement from a peer, we note it in the peer's known candidates even if we do disregard the data locally. And then, upon receipt of any data dependent on that statement, we do not reduce that peer's standing in our eyes, as the data was not undesired.
There is another caveat to the fix: we don't want to allow the peer to flood us because it has set things up in a way that it knows we will drop all of its traffic.
We also track how many statements we have received per peer, per candidate, and per chain-head. This is any statement concerning a particular candidate: `Seconded`, `Valid`, or `Invalid`. If we ever receive a statement from a peer which would push any of these counters beyond twice the amount of validators at the chain-head, we begin to lower the peer's standing and eventually disconnect. This bound is a massive overestimate and could be reduced to twice the number of validators in the corresponding validator group. It is worth noting that the goal at the time of writing is to ensure any finite bound on the amount of stored data, as any equivocation results in a large slash.