implement provisioner (#1473)

* sketch out provisioner basics

* handle provisionable data

* stub out select_inherent_data

* split runtime APIs into sub-chapters to improve linkability

* explain SignedAvailabilityBitfield semantics

* add internal link to further documentation

* some more work figuring out how the provisioner can do its thing

* fix broken link

* don't import enum variants where it's one layer deep

* make request_availability_cores a free fn in util

* document more precisely what should happen on block production

* finish first-draft implementation of provisioner

* start working on the full and proper backed candidate selection rule

* Pass number of block under construction via RequestInherentData

* Revert "Pass number of block under construction via RequestInherentData"

This reverts commit 850fe62cc0dfb04252580c21a985962000e693c8.

That initially looked like the better approach--it spent the time
budget for fetching the block number in the proposer, instead of
the provisioner, and that felt more appropriate--but it turns out
not to be obvious how to get the block number of the block under
construction from within the proposer. The Chain API may be less
ideal, but it should be easier to implement.

* wip: get the block under production from the Chain API

* add ChainApiMessage to AllMessages

* don't break the run loop if a provisionable data channel closes

* clone only those backed candidates which are coherent

* propagate chain_api subsystem through various locations

* add delegated_subsystem! macro to ease delegating subsystems

Unfortunately, it doesn't work right:

```
error[E0446]: private type `CandidateBackingJob` in public interface
   --> node/core/backing/src/lib.rs:775:1
    |
86  | struct CandidateBackingJob {
    | - `CandidateBackingJob` declared as private
...
775 | delegated_subsystem!(CandidateBackingJob as CandidateBackingSubsystem);
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't leak private type
```

I'm not sure precisely what's going wrong, here; I suspect the problem is
the use of `$job as JobTrait>::RunArgs` and `::ToJob`; the failure would be
that it's not reifying the types to verify that the actual types are public,
but instead referring to them via `CandidateBackingJob`, which is in fact private;
that privacy is the point.

Going to see if I can generic my way out of this, but we may be headed for a
quick revert here.

* fix delegated_subsystem

The invocation is a bit more verbose than I'd prefer, but it's also
more explicit about what types need to be public. I'll take it as a win.

* add provisioning subsystem; reduce public interface of provisioner

* deny missing docs in provisioner

* refactor core selection per code review suggestion

This is twice as much code when measured by line, but IMO it is
in fact somewhat clearer to read, so overall a win.

Also adds an improved rule for selecting availability bitfields,
which (unlike the previous implementation) guarantees that the
appropriate postconditions hold there.

* fix bad merge double-declaration

* update guide with (hopefully) complete provisioner candidate selection procedure

* clarify candidate selection algorithm

* Revert "clarify candidate selection algorithm"

This reverts commit c68a02ac9cf42b3a4a28eb197d38633a40d0e3e6.

* clarify candidate selection algorithm

* update provisioner to implement candidate selection per the guide

* add test that no more than one bitfield is selected per validator

* add test that each selected bitfield corresponds to an occupied core

* add test that more set bits win conflicts

* add macro for specializing runtime requests; specailize all runtime requests

* add tests harness for select_candidates tests

* add first real select_candidates test, fix test_harness

* add mock overseer and test that success is possible

* add test that the candidate selection algorithm picks the right ones

* make candidate selection test somewhat more stringent
This commit is contained in:
Peter Goodspeed-Niklaus
2020-08-06 18:12:15 +02:00
committed by GitHub
parent 877a5059aa
commit 21cec309a4
23 changed files with 1434 additions and 380 deletions
@@ -40,7 +40,54 @@ Note that block authors must re-send a `ProvisionerMessage::RequestBlockAuthorsh
## Block Production
When a validator is selected by BABE to author a block, it becomes a block producer. The provisioner is the subsystem best suited to choosing which specific backed candidates and availability bitfields should be assembled into the block. To engage this functionality, a `ProvisionerMessage::RequestInherentData` is sent; the response is a set of non-conflicting candidates and the appropriate bitfields. Non-conflicting means that there are never two distinct parachain candidates included for the same parachain and that new parachain candidates cannot be included until the previous one either gets declared available or expired.
When a validator is selected by BABE to author a block, it becomes a block producer. The provisioner is the subsystem best suited to choosing which specific backed candidates and availability bitfields should be assembled into the block. To engage this functionality, a `ProvisionerMessage::RequestInherentData` is sent; the response is a set of non-conflicting candidates and the appropriate bitfields. Non-conflicting means that there are never two distinct parachain candidates included for the same parachain and that new parachain candidates cannot be backed until the previous one either gets declared available or expired.
### Bitfield Selection
Our goal with respect to bitfields is simple: maximize availability. However, it's not quite as simple as always including all bitfields; there are constraints which still need to be met:
- We cannot choose more than one bitfield per validator.
- Each bitfield must correspond to an occupied core.
Beyond that, a semi-arbitrary selection policy is fine. In order to meet the goal of maximizing availability, a heuristic of picking the bitfield with the greatest number of 1 bits set in the event of conflict is useful.
### Candidate Selection
The goal of candidate selection is to determine which cores are free, and then to the degree possible, pick a candidate appropriate to each free core.
To determine availability:
- Get the list of core states from the runtime API
- For each core state:
- On `CoreState::Scheduled`, then we can make an `OccupiedCoreAssumption::Free`.
- On `CoreState::Occupied`, then we may be able to make an assumption:
- If the bitfields indicate availability and there is a scheduled `next_up_on_available`, then we can make an `OccupiedCoreAssumption::Included`.
- If the bitfields do not indicate availability, and there is a scheduled `next_up_on_time_out`, and `occupied_core.time_out_at == block_number_under_production`, then we can make an `OccupiedCoreAssumption::TimedOut`.
- If we did not make an `OccupiedCoreAssumption`, then continue on to the next core.
- Now compute the core's `validation_data_hash`: get the `LocalValidationData` from the runtime, given the known `ParaId` and `OccupiedCoreAssumption`; this can be combined with a cached `GlobalValidationData` to compute the hash.
- Find an appropriate candidate for the core.
- There are two constraints: `backed_candidate.candidate.descriptor.para_id == scheduled_core.para_id && candidate.candidate.descriptor.validation_data_hash == computed_validation_data_hash`.
- In the event that more than one candidate meets the constraints, selection between the candidates is arbitrary. However, not more than one candidate can be selected per core.
The end result of this process is a vector of `BackedCandidate`s, sorted in order of their core index.
### Determining Bitfield Availability
An occupied core has a `CoreAvailability` bitfield. We also have a list of `SignedAvailabilityBitfield`s. We need to determine from these whether or not a core at a particular index has become available.
The key insight required is that `CoreAvailability` is transverse to the `SignedAvailabilityBitfield`s: if we conceptualize the list of bitfields as many rows, each bit of which is its own column, then `CoreAvailability` for a given core index is the vertical slice of bits in the set at that index.
To compute bitfield availability, then:
- Start with a copy of `OccupiedCore.availability`
- For each bitfield in the list of `SignedAvailabilityBitfield`s:
- Get the bitfield's `validator_index`
- Update the availability. Conceptually, assuming bit vectors: `availability[validator_index] |= bitfield[core_idx]`
- Availability has a 2/3 threshold. Therefore: `3 * availability.count_ones() >= 2 * availability.len()`
### Notes
See also: [Scheduler Module: Availability Cores](../../runtime/scheduler.md#availability-cores).
One might ask: given `ProvisionerMessage::RequestInherentData`, what's the point of `ProvisionerMessage::RequestBlockAuthorshipData`? The answer is that the block authorship data includes more information than is present in the inherent data; disputes, for example.
@@ -51,10 +98,10 @@ The subsystem should maintain a set of handles to Block Authorship Provisioning
### On Overseer Signal
- `ActiveLeavesUpdate`:
- For each `activated` head:
- spawn a Block Authorship Provisioning Job with the given relay parent, storing a bidirectional channel with that job.
- For each `deactivated` head:
- terminate the Block Authorship Provisioning Job for the given relay parent, if any.
- For each `activated` head:
- spawn a Block Authorship Provisioning Job with the given relay parent, storing a bidirectional channel with that job.
- For each `deactivated` head:
- terminate the Block Authorship Provisioning Job for the given relay parent, if any.
- `Conclude`: Forward `Conclude` to all jobs, waiting a small amount of time for them to join, and then hard-exiting.
### On `ProvisionerMessage`