Mirror of https://github.com/pezkuwichain/pezkuwi-subxt.git (synced 2026-05-31 07:31:02 +00:00)
implement candidate selection subsystem (#1645)
* choose the straightforward candidate selection algorithm for now
* add draft implementation of candidate selection
* fix typo in summary
* more properly report misbehaving collators
* describe how CandidateSelection subsystem becomes aware of candidates
* revise candidate selection / collator protocol interaction pattern
* implement rest of candidate selection per the guide
* review: resolve nits
* start writing test suite, harness
* implement first test
* add second test
* implement third test

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Committed by GitHub
Parent: 2f800d1489
Commit: d1b1c17285
@@ -24,16 +24,18 @@ Output:
Overarching network protocol + job for every relay-parent

> TODO The Candidate Selection network protocol is currently intentionally unspecified pending further discussion.

Several approaches have been selected, but all have some issues:

- The most straightforward approach is for this subsystem to simply second the first valid parablock candidate which it sees per relay head. However, that protocol is vulnerable to a single collator which, as an attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
- It may be possible to do some BABE-like selection algorithm to choose an "Official" collator for the round, but that is tricky because the collator which produces the PoV does not necessarily actually produce the block.
- We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, +- 1 second. The collator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`, the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight schedule.
- A variation of that scheme would be to randomly choose a number `I`, and have a fixed acceptance window `D` for parablock candidates. At the end of the period `D`, count `C`: the number of parablock candidates received. Second the one with index `I % C`. Its drawback is the same: it must wait the full `D` period before seconding any of its received candidates, reducing throughput.

For the moment, the candidate selection algorithm is simply to second the first valid parablock candidate per relay head. See [Future Work](#future-work).
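The "first valid candidate per relay head" rule can be illustrated with a minimal sketch. This is not the subsystem's implementation: `Candidate`, `FirstValidSelector`, and the `valid` flag (which stands in for the outcome of real candidate validation) are hypothetical names chosen for illustration, and relay parents are reduced to plain integers.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a relay-parent hash.
type RelayParent = u64;

/// Hypothetical, heavily simplified candidate.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    relay_parent: RelayParent,
    collator: &'static str,
    /// Stands in for the result of real validation (assumption).
    valid: bool,
}

/// Remembers which relay parents already have a seconded candidate.
#[derive(Default)]
struct FirstValidSelector {
    seconded: HashSet<RelayParent>,
}

impl FirstValidSelector {
    /// Returns `true` if the candidate should be seconded: it is valid and
    /// nothing has been seconded for its relay parent so far.
    fn consider(&mut self, candidate: &Candidate) -> bool {
        if !candidate.valid || self.seconded.contains(&candidate.relay_parent) {
            return false;
        }
        self.seconded.insert(candidate.relay_parent);
        true
    }
}
```

Because the first valid arrival always wins, a collator that reliably delivers first is seconded every time, which is the weakness noted for this approach.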
## Candidate Selection Job

- Aware of validator key and assignment
- One job for each relay-parent, which selects up to one collation for the Candidate Backing Subsystem

## Future Work

Several approaches have been discussed, but all have some issues:

- The current approach is very straightforward. However, that protocol is vulnerable to a single collator which, as an attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
- It may be possible to do some BABE-like selection algorithm to choose an "Official" collator for the round, but that is tricky because the collator which produces the PoV does not necessarily actually produce the block.
- We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, +- 1 second. The collator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`, the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight schedule.
- A variation of that scheme would be to randomly choose a number `I`, and have a fixed acceptance window `D` for parablock candidates. At the end of the period `D`, count `C`: the number of parablock candidates received. Second the one with index `I % C`. Its drawback is the same: it must wait the full `D` period before seconding any of its received candidates, reducing throughput.
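The two delay-based schemes in the list above can be sketched as pure selection functions. This is only an illustration under stated assumptions: `select_with_delay` and `select_by_index` are hypothetical names, arrival times are given as offsets from the moment the relay head was seen and are assumed to be sorted, every listed candidate is assumed to have passed validation already, "after `D`" is read as strictly later than `D`, and how `D` or `I` would be derived from BABE randomness is left out.

```rust
use std::time::Duration;

/// Delay scheme: prefer the first candidate arriving after `d` (and by `2*d`);
/// if there is none, fall back to the last candidate that arrived by `2*d`.
/// `arrivals` must be sorted by arrival offset (assumption).
fn select_with_delay<T>(arrivals: &[(Duration, T)], d: Duration) -> Option<&T> {
    let two_d = d * 2;
    arrivals
        .iter()
        .find(|(t, _)| *t > d && *t <= two_d)
        .map(|(_, c)| c)
        .or_else(|| {
            arrivals
                .iter()
                .filter(|(t, _)| *t <= two_d)
                .last()
                .map(|(_, c)| c)
        })
}

/// Window scheme: after the acceptance window closes, second the candidate
/// with index `I % C`, where `C` is the number of candidates received.
fn select_by_index<T>(received: &[T], random_i: u64) -> Option<&T> {
    if received.is_empty() {
        return None; // C == 0: nothing to second
    }
    let c = received.len() as u64;
    received.get((random_i % c) as usize)
}
```

Both functions can only be evaluated once the relevant window has elapsed, which is the throughput cost described in the list.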
@@ -10,15 +10,17 @@ Validation of candidates is a heavy task, and furthermore, the [`PoV`][PoV] itse
> TODO: note the incremental validation function Ximin proposes at https://github.com/paritytech/polkadot/issues/1348

As this network protocol serves as a bridge between collators and validators, it communicates primarily with one subsystem on behalf of each. As a collator, this will receive messages from the [`CollationGeneration`][CG] subsystem. As a validator, this will communicate with the [`CandidateBacking`][CB] subsystem.
As this network protocol serves as a bridge between collators and validators, it communicates primarily with one subsystem on behalf of each. As a collator, this will receive messages from the [`CollationGeneration`][CG] subsystem. As a validator, this will communicate with the [`CandidateBacking`][CB] and [`CandidateSelection`][CS] subsystems.

## Protocol

Input: [`CollatorProtocolMessage`][CPM]

Output:
- [`RuntimeApiMessage`][RAM]
- [`NetworkBridgeMessage`][NBM]

- [`RuntimeApiMessage`][RAM]
- [`NetworkBridgeMessage`][NBM]
- [`CandidateSelectionMessage`][CSM]

## Functionality
@@ -102,18 +104,30 @@ When peers connect to us, they can `Declare` that they represent a collator with
The protocol tracks advertisements received and the source of the advertisement. The advertisement source is the `PeerId` of the peer who sent the message. We accept one advertisement per collator per source per relay-parent.

As a validator, we will handle requests from other subsystems to fetch a collation on a specific `ParaId` and relay-parent. These requests are made with the [`CollatorProtocolMessage`][CPM]`::FetchCollation`. To do so, we need to first check if we have already gathered a collation on that `ParaId` and relay-parent. If not, we need to select one of the advertisements and issue a request for it. If we've already issued a request, we shouldn't issue another one until the first has returned.

When acting on an advertisement, we issue a `WireMessage::RequestCollation`. If the request times out, we need to note the collator as being unreliable and reduce its priority relative to other collators. We then make another request, repeating until we get a response or the chain has moved on.

As a validator, once the collation has been fetched, some other subsystem will inspect and do deeper validation of the collation. That subsystem will report to this subsystem with a [`CollatorProtocolMessage`][CPM]`::ReportCollator` or `NoteGoodCollation` message. In the case of a `ReportCollator` message, if we are connected directly to the collator, we apply a cost to the `PeerId` associated with the collator and potentially disconnect or blacklist it.
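The advertisement bookkeeping and the "deprioritise on timeout, then retry" behaviour described above might be modelled roughly as follows. This is a sketch under assumptions, not the subsystem's code: `Advertisements` and its methods are hypothetical, identifiers are reduced to integers, and priority is modelled simply as a count of timed-out requests per collator.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified identifiers; the real types are keys and hashes.
type PeerId = u32;
type CollatorId = u32;
type RelayParent = u64;

#[derive(Default)]
struct Advertisements {
    /// At most one advertisement per (collator, source peer, relay parent).
    seen: HashSet<(CollatorId, PeerId, RelayParent)>,
    /// Timed-out requests per collator; a higher count means lower priority.
    timeouts: HashMap<CollatorId, u32>,
}

impl Advertisements {
    /// Records an advertisement. Returns `false` if it is a duplicate.
    fn note(&mut self, collator: CollatorId, source: PeerId, relay_parent: RelayParent) -> bool {
        self.seen.insert((collator, source, relay_parent))
    }

    /// Marks a request to `collator` as timed out, lowering its priority.
    fn note_timeout(&mut self, collator: CollatorId) {
        *self.timeouts.entry(collator).or_insert(0) += 1;
    }

    /// Picks the collator to request from next for this relay parent:
    /// the one with the fewest timeouts, ties broken by the lower id.
    fn next_to_request(&self, relay_parent: RelayParent) -> Option<CollatorId> {
        self.seen
            .iter()
            .filter(|(_, _, rp)| *rp == relay_parent)
            .map(|(collator, _, _)| *collator)
            .min_by_key(|c| (self.timeouts.get(c).copied().unwrap_or(0), *c))
    }
}
```

A caller would invoke `next_to_request` again after each `note_timeout`, stopping once a response arrives or the relay parent is no longer relevant.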
[PoV]: ../../types/availability.md#proofofvalidity
[CPM]: ../../types/overseer-protocol.md#collatorprotocolmessage
[CG]: collation-generation.md
### Interaction with [Candidate Selection][CS]

As collators advertise the availability of collations, we notify the Candidate Selection subsystem with a [`CandidateSelectionMessage`][CSM]`::Collation` message. Note that this message is lightweight: it only contains the relay parent, para id, and collator id.

At that point, the Candidate Selection subsystem is free to use an arbitrary algorithm to determine which, if any, of these messages to follow up on. It is expected to use the [`CollatorProtocolMessage`][CPM]`::FetchCollation` message to follow up.

The intent behind this design is to minimize the total number of (large) collations which must be transmitted.
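The notify-then-fetch pattern described in this section can be sketched as a small decision function. The types below are simplified, hypothetical stand-ins: only the `Collation` variant is modelled, the field order of `FetchCollation` is an assumption, and the `assigned_para` and `already_fetching` checks are one possible policy, not something the guide prescribes.

```rust
// Hypothetical simplified identifiers.
type RelayParent = u64;
type ParaId = u32;
type CollatorId = u32;

/// Lightweight notification: no collation data is carried.
#[derive(Debug, PartialEq)]
enum CandidateSelectionMessage {
    Collation(RelayParent, ParaId, CollatorId),
}

/// Follow-up request; the field order here is an assumption.
#[derive(Debug, PartialEq)]
enum CollatorProtocolMessage {
    FetchCollation(RelayParent, ParaId, CollatorId),
}

/// Decides whether to follow up on a notification. The large collation is
/// only transferred if this returns a `FetchCollation` request.
fn follow_up(
    msg: &CandidateSelectionMessage,
    assigned_para: ParaId,
    already_fetching: bool,
) -> Option<CollatorProtocolMessage> {
    match msg {
        CandidateSelectionMessage::Collation(relay_parent, para_id, collator_id) => {
            if *para_id != assigned_para || already_fetching {
                return None; // ignore: not our para, or a fetch is in flight
            }
            Some(CollatorProtocolMessage::FetchCollation(
                *relay_parent,
                *para_id,
                *collator_id,
            ))
        }
    }
}
```

Since declining a notification costs nothing beyond the small message itself, only collations that are actually wanted cross the network.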
[CB]: ../backing/candidate-backing.md
[CBM]: ../../types/overseer-protocol.md#candidate-backing-mesage
[CG]: collation-generation.md
[CPM]: ../../types/overseer-protocol.md#collator-protocol-message
[CS]: ../backing/candidate-selection.md
[CSM]: ../../types/overseer-protocol.md#candidate-selection-message
[NB]: ../utility/network-bridge.md
[CBM]: ../../types/overseer-protocol.md#candidatebackingmesage
[RAM]: ../../types/overseer-protocol.md#runtimeapimessage
[NBM]: ../../types/overseer-protocol.md#networkbridgemessage
[NBM]: ../../types/overseer-protocol.md#network-bridge-message
[PoV]: ../../types/availability.md#proofofvalidity
[RAM]: ../../types/overseer-protocol.md#runtime-api-message
[SCH]: ../../runtime/scheduler.md
@@ -124,6 +124,8 @@ These messages are sent to the [Candidate Selection subsystem](../node/backing/c
```rust
enum CandidateSelectionMessage {
    /// A candidate collation can be fetched from a collator and should be considered for seconding.
    Collation(RelayParent, ParaId, CollatorId),
    /// We recommended a particular candidate to be seconded, but it was invalid; penalize the collator.
    Invalid(CandidateReceipt),
}
```