feat: initialize Kurdistan SDK - independent fork of Polkadot SDK
@@ -0,0 +1,196 @@
# Collator Protocol

> NOTE: This module has undergone changes for the elastic scaling implementation. As a result, parts of this document may
be out of date and will be updated at a later time. Issue tracking the update:
https://github.com/pezkuwichain/pezkuwi-sdk/issues/132

The Collator Protocol implements the network protocol by which collators and validators communicate. It is used by
collators to distribute collations to validators and used by validators to accept collations from collators.

Collator-to-Validator networking is more difficult than Validator-to-Validator networking because the set of possible
collators for any given para is unbounded, unlike the validator set. Validator-to-Validator networking protocols can
easily be implemented as gossip because the data can be bounded, and validators can authenticate each other by their
`PeerId`s for the purposes of instantiating and accepting connections.

Since, at least at the level of the para abstraction, the collator-set for any given para is unbounded, validators need
to make sure that they are receiving connections from capable and honest collators and that their bandwidth and time are
not being wasted by attackers. Communicating across this trust-boundary is the most difficult part of this subsystem.

Validation of candidates is a heavy task, and furthermore, the [`PoV`][PoV] itself is a large piece of data.
Empirically, `PoV`s are on the order of 10MB.

> TODO: note the incremental validation function Ximin proposes at https://github.com/paritytech/polkadot/issues/1348

As this network protocol serves as a bridge between collators and validators, it communicates primarily with one
subsystem on behalf of each. As a collator, this subsystem will receive messages from the [`CollationGeneration`][CG]
subsystem. As a validator, it will communicate only with the [`CandidateBacking`][CB] subsystem.

## Protocol

Input: [`CollatorProtocolMessage`][CPM]

Output:

* [`RuntimeApiMessage`][RAM]
* [`NetworkBridgeMessage`][NBM]
* [`CandidateBackingMessage`][CBM]

## Functionality

This network protocol uses the `Collation` peer-set of the [`NetworkBridge`][NB].

It uses the [`CollatorProtocolV1Message`](../../types/network.md#collator-protocol) as its `WireMessage`.

Since this protocol functions both for validators and collators, it is easiest to go through the protocol actions for
each of them separately.

Validators and collators.
```dot process
digraph {
    c1 [shape=MSquare, label="Collator 1"];
    c2 [shape=MSquare, label="Collator 2"];

    v1 [shape=MSquare, label="Validator 1"];
    v2 [shape=MSquare, label="Validator 2"];

    c1 -> v1;
    c1 -> v2;
    c2 -> v2;
}
```

### Collators

It is assumed that collators are only collating on a single teyrchain. Collations are generated by the [Collation
Generation][CG] subsystem. We will keep up to one local collation per relay-parent, based on `DistributeCollation`
messages. If the para is not scheduled on any core at the relay-parent, or the relay-parent isn't in the active-leaves
set, we ignore the message as it must be invalid in that case, although this indicates a logic error elsewhere in the
node.

We keep track of the Para ID we are collating on as a collator. This starts as `None`, and is updated with each
`CollateOn` message received. If the `ParaId` of a collation requested to be distributed does not match the one we
expect, we ignore the message.

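The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the subsystem's actual code: `CollatorState`, `Collation`, the integer type aliases and the `scheduled` flag are hypothetical stand-ins, and the choice to keep the first collation for a relay-parent (rather than replace it) is an assumption, since the text only says that at most one is kept.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real types; plain integers keep the sketch self-contained.
type ParaId = u32;
type Hash = u64;

#[derive(Debug, Clone, PartialEq)]
struct Collation {
    para_id: ParaId,
    relay_parent: Hash,
}

#[derive(Default)]
struct CollatorState {
    /// Para we collate on; `None` until the first `CollateOn` message.
    collating_on: Option<ParaId>,
    /// Relay-parents currently in the active-leaves set.
    active_leaves: HashSet<Hash>,
    /// At most one local collation per relay-parent.
    collations: HashMap<Hash, Collation>,
}

impl CollatorState {
    /// Handles a `CollateOn` message by updating the tracked para.
    fn collate_on(&mut self, para_id: ParaId) {
        self.collating_on = Some(para_id);
    }

    /// Handles a `DistributeCollation` message.
    /// Returns `true` if the collation was stored, `false` if the message was ignored.
    /// `scheduled` says whether the para is assigned to a core at the relay-parent.
    fn distribute_collation(&mut self, collation: Collation, scheduled: bool) -> bool {
        if self.collating_on != Some(collation.para_id) {
            return false; // unexpected `ParaId`
        }
        if !scheduled || !self.active_leaves.contains(&collation.relay_parent) {
            return false; // invalid here; hints at a logic error elsewhere in the node
        }
        if self.collations.contains_key(&collation.relay_parent) {
            return false; // assumption: the first collation for a relay-parent is kept
        }
        self.collations.insert(collation.relay_parent, collation);
        true
    }
}
```

With this shape, a `DistributeCollation` arriving before any `CollateOn` is ignored because `collating_on` is still `None`.
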
As with most other subsystems, we track the active leaves set by following `ActiveLeavesUpdate` signals.

For the purposes of actually distributing a collation, we need to be connected to the validators who are interested in
collations on that `ParaId` at this point in time. We assume that there is a discovery API for connecting to a set of
validators.

As seen in the [Scheduler Module][SCH] of the runtime, validator groups are fixed for an entire session and their
rotations across cores are predictable. Collators will want to do these things when attempting to distribute collations
at a given relay-parent:
* Determine which core the para collated-on is assigned to.
* Determine the group on that core.
* Issue a discovery request for the validators of the current group
  with [`NetworkBridgeMessage`][NBM]`::ConnectToValidators`.

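The three steps in the list above could look roughly like this. The `Schedule` struct and its fields are a hypothetical snapshot of scheduling data at one relay-parent, invented for this sketch; the real information would come from the runtime, and the resulting validator list would be placed in the `ConnectToValidators` request.

```rust
type ParaId = u32;
type ValidatorId = u32; // hypothetical stand-in for a validator's discovery key

/// Snapshot of scheduling information at one relay-parent (hypothetical shape).
struct Schedule {
    /// `cores[i]` is the para assigned to core `i`, if any.
    cores: Vec<Option<ParaId>>,
    /// `core_groups[i]` is the index of the validator group currently on core `i`.
    core_groups: Vec<usize>,
    /// `groups[g]` lists the validators of group `g`.
    groups: Vec<Vec<ValidatorId>>,
}

/// Returns the validators to request connections to, or `None` if the para is not scheduled.
fn validators_to_connect(schedule: &Schedule, para: ParaId) -> Option<&[ValidatorId]> {
    // 1. Determine which core the para is assigned to.
    let core = schedule.cores.iter().position(|p| *p == Some(para))?;
    // 2. Determine the group on that core.
    let group = *schedule.core_groups.get(core)?;
    // 3. These validators would go into the discovery request.
    schedule.groups.get(group).map(|g| g.as_slice())
}
```

Returning `None` for an unscheduled para mirrors the rule that such collations are ignored rather than distributed.
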
Once connected to the relevant peers for the current group assigned to the core (transitively, the para), advertise the
collation to any of them which advertise the relay-parent in their view (as provided by the [Network Bridge][NB]). If
any respond with a request for the full collation, provide it. However, we only send one collation at a time per relay
parent; other requests need to wait. This reduces the bandwidth requirements of a collator and also increases the
chance of fully sending the collation to at least one validator. From the point where one validator has received the
collation and seconded it, it will also start to share this collation with other validators in its backing group. Upon
receiving a view update from any of these peers which includes a relay-parent for which we have a collation that they
will find relevant, advertise the collation to them if we haven't already.

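The "one collation at a time per relay parent" rule amounts to a small queue per relay-parent. The sketch below is illustrative only; `SendQueue`, `on_request` and `on_sent` are hypothetical names, and peers and hashes are modelled as integers.

```rust
use std::collections::{HashMap, VecDeque};

type Hash = u64;   // hypothetical stand-in for a relay-parent hash
type PeerId = u32; // hypothetical stand-in for a network peer id

#[derive(Default)]
struct SendQueue {
    /// Validator currently being sent the collation, per relay-parent.
    in_flight: HashMap<Hash, PeerId>,
    /// Validators waiting for their turn, per relay-parent.
    waiting: HashMap<Hash, VecDeque<PeerId>>,
}

impl SendQueue {
    /// Handles an incoming request. Returns `true` if sending starts immediately,
    /// `false` if the request was queued behind an ongoing transfer.
    fn on_request(&mut self, relay_parent: Hash, peer: PeerId) -> bool {
        if self.in_flight.contains_key(&relay_parent) {
            self.waiting.entry(relay_parent).or_default().push_back(peer);
            false
        } else {
            self.in_flight.insert(relay_parent, peer);
            true
        }
    }

    /// Called when the current transfer has finished; returns the next peer to serve, if any.
    fn on_sent(&mut self, relay_parent: Hash) -> Option<PeerId> {
        self.in_flight.remove(&relay_parent);
        let next = self.waiting.get_mut(&relay_parent)?.pop_front()?;
        self.in_flight.insert(relay_parent, next);
        Some(next)
    }
}
```

Requests for different relay-parents do not block each other in this sketch; only requests for the same relay-parent are serialized.
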
### Validators

On the validator side of the protocol, validators need to accept incoming connections from collators. They should keep
some peer slots open for accepting new speculative connections from collators and should disconnect from collators who
are not relevant.

```dot process
digraph G {
    label = "Declaring, advertising, and providing collations";
    labelloc = "t";
    rankdir = LR;

    subgraph cluster_collator {
        rank = min;
        label = "Collator";
        graph[style = border, rank = min];

        c1, c2 [label = ""];
    }

    subgraph cluster_validator {
        rank = same;
        label = "Validator";
        graph[style = border];

        v1, v2 [label = ""];
    }

    c1 -> v1 [label = "Declare and advertise"];

    v1 -> c2 [label = "Request"];

    c2 -> v2 [label = "Provide"];

    v2 -> v2 [label = "Note Good/Bad"];
}
```

When peers connect to us, they can `Declare` that they represent a collator with a given public key and intend to
collate on a specific para ID. Once they've declared that, and we have checked their signature, they can begin to send
advertisements of collations. The peers should not send us any advertisements for collations that are on a relay-parent
outside of our view or for a para outside of the one they've declared.

The protocol tracks advertisements received and the source of the advertisement. The advertisement source is the
`PeerId` of the peer who sent the message. We accept one advertisement per collator per source per relay-parent.

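A rough sketch of the declare-then-advertise checks follows. All names here (`ValidatorState`, `AdvertisementError`, `on_declare`, `on_advertise`) are hypothetical, signature verification is reduced to a boolean input, and the advertisement is assumed to name its para explicitly so that the "declared para" check can be shown.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;   // hypothetical stand-in for a relay-parent hash
type PeerId = u32; // hypothetical stand-in for a network peer id
type ParaId = u32;

#[derive(Debug, PartialEq)]
enum AdvertisementError {
    Undeclared, // peer has not sent a valid `Declare` yet
    WrongPara,  // para differs from the declared one
    OutOfView,  // relay-parent is not in our view
    Duplicate,  // this source already advertised for this relay-parent
}

#[derive(Default)]
struct ValidatorState {
    /// Relay-parents in our current view.
    view: HashSet<Hash>,
    /// Para each peer declared to collate on.
    declared: HashMap<PeerId, ParaId>,
    /// Advertisements seen so far, keyed by source and relay-parent.
    advertisements: HashSet<(PeerId, Hash)>,
}

impl ValidatorState {
    /// Records a declaration if its signature checked out; returns whether it was accepted.
    fn on_declare(&mut self, peer: PeerId, para: ParaId, signature_ok: bool) -> bool {
        if signature_ok {
            self.declared.insert(peer, para);
        }
        signature_ok
    }

    /// Validates and records an advertisement from `peer`.
    fn on_advertise(
        &mut self,
        peer: PeerId,
        para: ParaId,
        relay_parent: Hash,
    ) -> Result<(), AdvertisementError> {
        match self.declared.get(&peer) {
            None => return Err(AdvertisementError::Undeclared),
            Some(declared) if *declared != para => return Err(AdvertisementError::WrongPara),
            Some(_) => {}
        }
        if !self.view.contains(&relay_parent) {
            return Err(AdvertisementError::OutOfView);
        }
        // `insert` returns `false` when the pair was already present.
        if !self.advertisements.insert((peer, relay_parent)) {
            return Err(AdvertisementError::Duplicate);
        }
        Ok(())
    }
}
```

How a rejected advertisement affects the peer (for example, a reputation change or a disconnect) is left out of the sketch.
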
As a validator, we will handle requests from other subsystems to fetch a collation on a specific `ParaId` and
relay-parent. These requests are made with the `CollationFetchingRequest` request of the request-response protocol. To
do so, we need to first check if we have already gathered a collation on that `ParaId` and relay-parent. If not, we
need to select one of the advertisements and issue a request for it. If we've already issued a request, we shouldn't
issue another one until the first has returned.

When acting on an advertisement, we issue a `Requests::CollationFetchingV1`. However, we only request one collation at a
time per relay parent. This reduces the bandwidth requirements and, as we can second only one candidate per relay
parent, the others are probably not required anyway. If the request times out, we need to note the collator as being
unreliable and reduce its priority relative to other collators.

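The request discipline above (one outstanding request per relay parent, lower priority after a timeout) might be sketched like this. The `Fetcher` type and its integer `score` are assumptions made for illustration; the text does not specify how priority is represented.

```rust
use std::collections::HashMap;

type Hash = u64;   // hypothetical stand-in for a relay-parent hash
type PeerId = u32; // hypothetical stand-in for a network peer id

#[derive(Default)]
struct Fetcher {
    /// Peers that advertised a collation, per relay-parent, in arrival order.
    advertised: HashMap<Hash, Vec<PeerId>>,
    /// Request currently in flight, per relay-parent.
    pending: HashMap<Hash, PeerId>,
    /// Hypothetical reliability score; lower means lower priority.
    score: HashMap<PeerId, i32>,
}

impl Fetcher {
    /// Picks the advertiser to request from, unless a request is already in flight.
    fn next_request(&mut self, relay_parent: Hash) -> Option<PeerId> {
        if self.pending.contains_key(&relay_parent) {
            return None; // wait until the first request has returned
        }
        let score = &self.score;
        let candidates = self.advertised.get_mut(&relay_parent)?;
        let mut best: Option<(usize, i32)> = None;
        for (i, peer) in candidates.iter().enumerate() {
            let s = score.get(peer).copied().unwrap_or(0);
            // Strict `>` keeps the earliest advertiser on ties.
            if best.map_or(true, |(_, b)| s > b) {
                best = Some((i, s));
            }
        }
        let (index, _) = best?;
        let peer = candidates.remove(index);
        self.pending.insert(relay_parent, peer);
        Some(peer)
    }

    /// Notes a timed-out request: frees the slot and lowers the collator's priority.
    fn on_timeout(&mut self, relay_parent: Hash) {
        if let Some(peer) = self.pending.remove(&relay_parent) {
            *self.score.entry(peer).or_insert(0) -= 1;
        }
    }
}
```

After a timeout, the next call to `next_request` for the same relay parent moves on to another advertiser, and the unreliable peer loses ties against peers with a clean record on later relay parents.
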
### Interaction with [Candidate Backing][CB]

As collators advertise the availability of their collations, a validator will simply second the first valid parablock
candidate per relay head by sending a [`CandidateBackingMessage`][CBM]`::Second`. Note that this message contains the
relay parent of the advertised collation, the candidate receipt and the [PoV][PoV].

Subsequently, once a valid parablock candidate has been seconded, the [`CandidateBacking`][CB] subsystem will send a
[`CollatorProtocolMessage`][CPM]`::Seconded`, which will trigger this subsystem to notify the collator at the `PeerId`
that first advertised the parablock on the seconded relay head of their successful seconding.

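To notify the right collator later, the validator has to remember which peer supplied the candidate it handed to Candidate Backing. A minimal sketch of that bookkeeping, with the hypothetical names `SecondingTracker`, `on_valid_candidate` and `on_seconded`:

```rust
use std::collections::HashMap;

type Hash = u64;   // hypothetical stand-in for a relay-parent hash
type PeerId = u32; // hypothetical stand-in for a network peer id

#[derive(Default)]
struct SecondingTracker {
    /// Peer whose collation was passed on for seconding, per relay-parent.
    first_advertiser: HashMap<Hash, PeerId>,
}

impl SecondingTracker {
    /// Records the peer behind the first valid candidate for a relay-parent.
    /// Returns `true` if a `Second` message should be sent, `false` if one was sent before.
    fn on_valid_candidate(&mut self, relay_parent: Hash, peer: PeerId) -> bool {
        if self.first_advertiser.contains_key(&relay_parent) {
            return false;
        }
        self.first_advertiser.insert(relay_parent, peer);
        true
    }

    /// On `Seconded`, returns the peer that should be told about the successful seconding.
    fn on_seconded(&self, relay_parent: Hash) -> Option<PeerId> {
        self.first_advertiser.get(&relay_parent).copied()
    }
}
```
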
## Future Work

Several approaches have been discussed, but all have some issues:

* The current approach is very straightforward. However, that protocol is vulnerable to a single collator which, as an
  attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
* If collators produce blocks via Aura, BABE or, in the future, Sassafras, it may be possible to choose an "Official"
  collator for the round, but it may be tricky to ensure that the PVF logic is enforced at collator leader election.
* We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, +- 1 second. The
  validator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`,
  the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get
  its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight
  schedule.
* A variation of that scheme would be to have a fixed acceptance window `D` for parablock candidates and keep track of
  count `C`: the number of parablock candidates received. At the end of the period `D`, we choose a random number `I` in
  the range `[0, C)` and second the block at index `I`. Its drawback is the same: it must wait the full `D` period before
  seconding any of its received candidates, reducing throughput.
* In order to protect against DoS attacks, it may be prudent to throw out collations from collators that have
  behaved poorly (whether recently or historically) and subsequently only verify the PoV for the most suitable of
  collations.

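As an illustration of the fixed-window variant in the list above, the selection step at the end of the window `D` could be as small as the function below. `pick_candidate` is a hypothetical name, and `random` is assumed to be supplied by some external randomness source; the sketch does not say where it comes from.

```rust
/// Picks the candidate to second once the acceptance window `D` has closed.
/// `received` holds the candidates in arrival order; `random` is assumed to come
/// from an external randomness source.
fn pick_candidate<T>(received: &[T], random: u64) -> Option<&T> {
    let count = received.len(); // `C` in the text
    if count == 0 {
        return None; // nothing arrived during the window
    }
    // Reduce the random value into `[0, C)`; the slight modulo bias is ignored in this sketch.
    let index = (random % count as u64) as usize; // `I` in the text
    received.get(index)
}
```

Because the index is only drawn after the window closes, arriving first gives a collator no advantage, at the cost of always waiting out the full window.
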
[CB]: ../backing/candidate-backing.md
[CBM]: ../../types/overseer-protocol.md#candidate-backing-message
[CG]: collation-generation.md
[CPM]: ../../types/overseer-protocol.md#collator-protocol-message
[CS]: ../backing/candidate-selection.md
[CSM]: ../../types/overseer-protocol.md#candidate-selection-message
[NB]: ../utility/network-bridge.md
[NBM]: ../../types/overseer-protocol.md#network-bridge-message
[PoV]: ../../types/availability.md#proofofvalidity
[RAM]: ../../types/overseer-protocol.md#runtime-api-message
[SCH]: ../../runtime/scheduler.md