Availability store subsystem guide (#1424)

* Improve AVStore and Scheduler docs

* Update roadmap/implementers-guide/src/node/utility/availability-store.md

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Bug in linking to README.md

* Update against new runtime apis

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
This commit is contained in:
Fedor Sakharov
2020-07-27 18:12:31 +03:00
committed by GitHub
parent 32a20a178c
commit 824c6245e2
3 changed files with 205 additions and 41 deletions
@@ -23,9 +23,54 @@ There is also the case where a validator commits to make a PoV available, but th
There may be multiple competing blocks all ending the availability phase for a particular candidate. Until (and slightly beyond) finality, it will be unclear which of those is actually the canonical chain, so the pruning records for PoVs and Availability chunks should keep track of all such blocks.
## Lifetime of the PoV in the storage
```dot process
digraph {
label = "Block life FSM\n\n\n";
labelloc = "t";
rankdir="LR";
st [label = "Stored"; shape = circle]
inc [label = "Included"; shape = circle]
fin [label = "Finalized"; shape = circle]
prn [label = "Pruned"; shape = circle]
st -> inc [label = "Block\nincluded"]
st -> prn [label = "Stored block\ntimed out"]
inc -> fin [label = "Block\nfinalized"]
fin -> prn [label = "Block keep time\n(1 day) elapsed"]
}
```
## Lifetime of the chunk in the storage
```dot process
digraph {
label = "Chunk life FSM\n\n\n";
labelloc = "t";
rankdir="LR";
chst [label = "Chunk\nStored"; shape = circle]
st [label = "Block\nStored"; shape = circle]
inc [label = "Included"; shape = circle]
fin [label = "Finalized"; shape = circle]
prn [label = "Pruned"; shape = circle]
chst -> inc [label = "Block\nincluded"]
st -> inc [label = "Block\nincluded"]
st -> prn [label = "Stored block\ntimed out"]
inc -> fin [label = "Block\nfinalized"]
fin -> prn [label = "Block keep time\n(1 day + 1 hour) elapsed"]
}
```
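The keep periods implied by the two diagrams can be summarized as a small lookup, sketched here in Rust. `CandidateState`, the `Option` return, and the one-hour timeout for never-included data are illustrative assumptions, not the actual implementation:

```rust
use std::time::Duration;

// Illustrative per-candidate state tracked by the store.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CandidateState {
    Stored,    // data stored, candidate not yet seen included
    Included,  // candidate included in some (unfinalized) relay block
    Finalized, // the including relay block was finalized
}

const HOUR: u64 = 60 * 60;
const DAY: u64 = 24 * HOUR;

// How long to keep data once it enters a given state, per the FSMs above.
// `None` means "keep until finality resolves" (no fixed deadline).
fn keep_for(state: CandidateState, is_chunk: bool) -> Option<Duration> {
    match state {
        // assumed timeout for data whose candidate is never included
        CandidateState::Stored => Some(Duration::from_secs(HOUR)),
        // included but unfinalized: wait for a finality event
        CandidateState::Included => None,
        // after finality: 1 day for PoVs, 1 day + 1 hour for chunks
        CandidateState::Finalized => Some(Duration::from_secs(
            if is_chunk { DAY + HOUR } else { DAY },
        )),
    }
}
```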
## Protocol
Input: [`AvailabilityStoreMessage`][ASM]
Output:
- [`RuntimeApiMessage`][RAM]
## Functionality
@@ -35,9 +80,8 @@ For each head in the `activated` list:
- Note any new candidates backed in the block. Update pruning records for any stored `PoVBlock`s.
- Note any newly-included candidates in the block. Update pruning records for any stored availability chunks.
On `OverseerSignal::BlockFinalized(_)` events:
- Handle all pruning based on the newly-finalized block.
On `QueryPoV` message:
@@ -51,3 +95,107 @@ On `QueryChunk` message:
On `StoreChunk` message:
- Store the chunk along with its inclusion proof under the candidate hash and validator index.
On `StorePoV` message:
- Store the block; if a validator index is provided, store the respective chunk as well.
On finality event:
- For the finalized block and any earlier block (if any) update pruning records of `PoV`s and chunks to keep them for respective periods after finality.
### Note any backed, included and timed-out candidates in the block by `hash`
- Create a `(sender, receiver)` pair.
- Dispatch a [`RuntimeApiMessage`][RAM]`::Request(hash, RuntimeApiRequest::CandidateEvents(sender))` and listen on the receiver for a response.
- For every event in the response:
* For every `CandidateEvent::CandidateBacked` do nothing
* For every `CandidateEvent::CandidateIncluded` update pruning records of any blocks that the node stored previously.
* For every `CandidateEvent::CandidateTimedOut` use pruning records to prune the data; delete the info from records.
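The event-handling steps above might look roughly like the following sketch, where `CandidateEvent` mirrors the runtime API type and pruning records are modeled as a plain map; all names and the record shape are illustrative, not the actual implementation:

```rust
use std::collections::HashMap;

type CandidateHash = [u8; 32];

// Simplified mirror of the runtime API's candidate events.
enum CandidateEvent {
    CandidateBacked(CandidateHash),
    CandidateIncluded(CandidateHash),
    CandidateTimedOut(CandidateHash),
}

#[derive(Default)]
struct PruningRecords {
    // candidate hash -> whether the stored data is now known included
    records: HashMap<CandidateHash, bool>,
}

impl PruningRecords {
    fn handle_events(&mut self, events: Vec<CandidateEvent>) {
        for event in events {
            match event {
                // nothing to do for freshly backed candidates
                CandidateEvent::CandidateBacked(_) => {}
                // mark previously stored data as included
                CandidateEvent::CandidateIncluded(hash) => {
                    if let Some(included) = self.records.get_mut(&hash) {
                        *included = true;
                    }
                }
                // timed out: prune the data and drop the record
                CandidateEvent::CandidateTimedOut(hash) => {
                    self.records.remove(&hash);
                }
            }
        }
    }
}
```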
## Schema
### PoV pruning
We keep a record for every PoV in the store, tracking its state and the time after which it should be pruned.
As the state of the `Candidate` changes, so does the `Prune At` time according to the rules defined earlier.
| Record 1 | .. | Record N |
|----------------|----|----------------|
| CandidateHash1 | .. | CandidateHashN |
| Prune At | .. | Prune At |
| CandidateState | .. | CandidateState |
### Chunk pruning
Chunk pruning follows the same schema as PoV pruning.
| Record 1 | .. | Record N |
|----------------|----|----------------|
| CandidateHash1 | .. | CandidateHashN |
| Prune At | .. | Prune At |
| CandidateState | .. | CandidateState |
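One plausible way to realize these record tables is an index ordered by the `Prune At` timestamp, so that all expired entries can be drained in a single range operation. This sketch is an assumption about layout, not the actual schema:

```rust
use std::collections::BTreeMap;

type Timestamp = u64;
type CandidateHash = [u8; 32];

#[derive(Default)]
struct PruningIndex {
    // prune_at -> candidates expiring at that instant
    by_deadline: BTreeMap<Timestamp, Vec<CandidateHash>>,
}

impl PruningIndex {
    fn schedule(&mut self, prune_at: Timestamp, hash: CandidateHash) {
        self.by_deadline.entry(prune_at).or_default().push(hash);
    }

    // Drain every candidate whose deadline is at or before `now`.
    fn prune_expired(&mut self, now: Timestamp) -> Vec<CandidateHash> {
        // split_off keeps deadlines > now in `later`; what remains is expired
        let later = self.by_deadline.split_off(&(now + 1));
        let expired = std::mem::replace(&mut self.by_deadline, later);
        expired.into_values().flatten().collect()
    }
}
```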
### Included blocks caching
In order to process finality events correctly, we need to cache the set of parablocks included in each relay block, beginning with the last finalized block and up to the most recent heads. We have to cache this data since we can only query it from the state for the `k` last blocks, where `k` is a relatively small number (for more info, see `Assumptions`).
These are used to update Chunk pruning and PoV pruning records upon finality:
When a block finality notification is received:
- For any cached record older than this block:
  - Update the corresponding pruning records
  - Remove the cached record
| Relay Block N | .. | Chain Head 1 | Chain Head 2 |
|---------------|----|--------------|--------------|
| CandidateN_1 Included | .. | Candidate1_1 Included | Candidate2_1 Included |
| CandidateN_2 Included | .. | Candidate1_2 Included | Candidate2_2 Included |
| .. | .. | .. | .. |
| CandidateN_M Included | .. | Candidate1_K Included | Candidate2_L Included |
> TODO: It's likely we will have to have a way to go from block hash to `BlockNumber` to make this work.
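Assuming the block-hash-to-`BlockNumber` mapping mentioned in the TODO exists, the cache could be keyed by relay block number so that everything at or below the finalized height is drained in one operation. A hedged sketch, with illustrative names:

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];
type BlockNumber = u32;

// Cache of parablocks included per relay block, keyed by relay block number.
#[derive(Default)]
struct IncludedCache {
    by_number: BTreeMap<BlockNumber, Vec<Hash>>,
}

impl IncludedCache {
    fn note_included(&mut self, number: BlockNumber, candidates: Vec<Hash>) {
        self.by_number.entry(number).or_default().extend(candidates);
    }

    // On finality, return (and forget) candidates from all blocks up to and
    // including the finalized one, so their pruning records can be moved to
    // the post-finality keep period.
    fn on_finalized(&mut self, finalized: BlockNumber) -> Vec<Hash> {
        let keep = self.by_number.split_off(&(finalized + 1));
        let drained = std::mem::replace(&mut self.by_number, keep);
        drained.into_values().flatten().collect()
    }
}
```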
### Blocks
Blocks are simply stored as `(Hash, AvailableData)` key-value pairs.
### Chunks
Chunks are stored as `(Hash, Vec<ErasureChunk>)` key-value pairs.
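A minimal in-memory model of these two columns, with stand-in types for `AvailableData` and `ErasureChunk`; chunks are slotted by validator index, and the `Option` slots are an assumption made here to allow chunks to arrive sparsely:

```rust
use std::collections::HashMap;

type Hash = [u8; 32];
type AvailableData = Vec<u8>;  // stand-in for the real type
type ErasureChunk = Vec<u8>;   // stand-in for the real type

#[derive(Default)]
struct Columns {
    // (Hash, AvailableData) key-value pairs
    blocks: HashMap<Hash, AvailableData>,
    // (Hash, Vec<ErasureChunk>) key-value pairs, one slot per validator
    chunks: HashMap<Hash, Vec<Option<ErasureChunk>>>,
}

impl Columns {
    fn store_chunk(&mut self, hash: Hash, index: usize, n_validators: usize, chunk: ErasureChunk) {
        let slots = self
            .chunks
            .entry(hash)
            .or_insert_with(|| vec![None; n_validators]);
        slots[index] = Some(chunk);
    }
}
```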
## Basic scenarios to test
These tests check the correctness of data flow through the state FSMs described earlier; they assume some form of mocked time.
- Stored data that is never included is pruned after the availability timeout
  - A block (and/or a chunk) is added to the store.
  - We never note that the respective candidate is included.
  - Until a defined timeout, the data in question is available.
  - After this timeout, the data is no longer available.
- Stored data is kept until we are certain it is finalized.
  - A block (and/or a chunk) is added to the store.
  - It is available.
  - Before the inclusion timeout expires, notify the store that the candidate was included.
  - The data is still available.
  - Wait for an absurd amount of time (longer than 1 day).
  - Check that the data is still available.
  - Send a finality notification about the block in question.
  - Wait for some time below the finalized-data timeout.
  - The data is still available.
  - Wait until the data should have been pruned.
  - The data is no longer available.
- Forkfulness of the relay chain is taken into account
  - Block `B1` is added to the store.
  - Block `B2` is added to the store.
  - Notify the subsystem that both `B1` and `B2` were included in different leaves of the relay chain.
  - Notify the subsystem that the leaf with `B1` was finalized.
  - The leaf with `B2` is never finalized.
  - The leaf with `B2` is pruned and its data is no longer available.
  - Wait until the finalized data of `B1` should have been pruned.
  - `B1` is no longer available.
[RAM]: ../../types/overseer-protocol.md#runtime-api-message
[ASM]: ../../types/overseer-protocol.md#availability-store-message
@@ -16,48 +16,64 @@ It aims to achieve these tasks with these goals in mind:
The Scheduler manages resource allocation using the concept of "Availability Cores". There will be one availability core for each parachain, and a fixed number of cores used for multiplexing parathreads. Validators will be partitioned into groups, with the same number of groups as availability cores. Validator groups will be assigned to different availability cores over time.
An availability core can exist in either one of two states at the beginning or end of a block: free or occupied. A free availability core can have a parachain or parathread assigned to it for the potential to have a backed candidate included. After backing, the core enters the occupied state as the backed candidate is pending availability. There is an important distinction: a core is not considered occupied until it is in charge of a block pending availability, although the implementation may treat scheduled cores the same as occupied ones for brevity. A core exits the occupied state when the candidate is no longer pending availability - either on timeout or on availability. A core starting in the occupied state can move to the free state and back to occupied all within a single block, as availability bitfields are processed before backed candidates. At the end of the block, there is a possible timeout on availability which can move the core back to the free state if occupied.
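The two-state machine can be captured by a tiny transition function; note how processing availability bitfields before backed candidates lets a core go occupied, then free, then occupied again within a single block. Types and the function shape are illustrative assumptions:

```rust
// Illustrative two-state model of an availability core; the real runtime
// types carry assignment and pending-availability data.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum CoreState {
    Free,
    Occupied, // a backed candidate is pending availability
}

// One block's worth of transitions: bitfields first (possibly freeing the
// core on availability or timeout), then backed candidates (possibly
// occupying a free core).
fn step(core: CoreState, freed_by_bitfields: bool, newly_backed: bool) -> CoreState {
    let core = match core {
        CoreState::Occupied if freed_by_bitfields => CoreState::Free,
        other => other,
    };
    match core {
        CoreState::Free if newly_backed => CoreState::Occupied,
        other => other,
    }
}
```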
```dot process
digraph {
label = "Availability Core State Machine\n\n\n";
labelloc = "t";
{ rank=same vg1 vg2 }
vg1 [label = "Free" shape=rectangle]
vg2 [label = "Occupied" shape=rectangle]
vg1 -> vg2 [label = "Assignment & Backing" ]
vg2 -> vg1 [label = "Availability or Timeout" ]
}
```
```dot process
digraph {
label = "Availability Core Transitions within Block\n\n\n";
labelloc = "t";
splines="line";
subgraph cluster_left {
label = "";
labelloc = "t";
fr1 [label = "Free" shape=rectangle]
fr2 [label = "Free" shape=rectangle]
occ [label = "Occupied" shape=rectangle]
fr1 -> fr2 [label = "No Backing"]
fr1 -> occ [label = "Backing"]
{ rank=same fr2 occ }
}
subgraph cluster_right {
label = "";
labelloc = "t";
occ2 [label = "Occupied" shape=rectangle]
fr3 [label = "Free" shape=rectangle]
fr4 [label = "Free" shape=rectangle]
occ3 [label = "Occupied" shape=rectangle]
occ4 [label = "Occupied" shape=rectangle]
occ2 -> fr3 [label = "Availability"]
occ2 -> occ3 [label = "No availability"]
fr3 -> fr4 [label = "No backing"]
fr3 -> occ4 [label = "Backing"]
occ3 -> occ4 [label = "(no change)"]
occ3 -> fr3 [label = "Availability Timeout"]
{ rank=same; fr3[group=g1]; occ3[group=g2] }
{ rank=same; fr4[group=g1]; occ4[group=g2] }
}
}
```
Validator group assignments do not need to change very quickly. The security benefits of fast rotation are redundant with the challenge mechanism in the [Validity module](validity.md). Because of this, we only divide validators into groups at the beginning of the session and do not shuffle membership during the session. However, we do take steps to ensure that no particular validator group has dominance over a single parachain or parathread-multiplexer for an entire session to provide better guarantees of liveness.
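A rotation along these lines could be sketched as follows; the exact formula and parameter names are assumptions for illustration, not the runtime's rule:

```rust
// Which session-static group is assigned to `core` after
// `blocks_since_session_start` blocks, rotating every
// `rotation_frequency` blocks. Groups and cores are equal in number.
fn group_for_core(
    core: usize,
    n_groups: usize,
    blocks_since_session_start: u64,
    rotation_frequency: u64,
) -> usize {
    let rotations = (blocks_since_session_start / rotation_frequency) as usize;
    (core + rotations) % n_groups
}
```

With this shape, every group visits every core over the session, so no single group dominates one parachain.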
@@ -60,12 +60,12 @@ Messages to and from the availability store.
```rust
enum AvailabilityStoreMessage {
/// Query the `AvailableData` of a candidate by hash.
QueryAvailableData(Hash, ResponseChannel<Option<AvailableData>>),
/// Query whether an `AvailableData` exists within the AV Store.
QueryDataAvailability(Hash, ResponseChannel<bool>),
/// Query a specific availability chunk of the candidate's erasure-coding by validator index.
/// Returns the chunk and its inclusion proof against the candidate's erasure-root.
QueryChunk(Hash, ValidatorIndex, ResponseChannel<Option<AvailabilityChunkAndProof>>),
/// Store a specific chunk of the candidate's erasure-coding by validator index, with an
/// accompanying proof.
StoreChunk(Hash, ValidatorIndex, AvailabilityChunkAndProof, ResponseChannel<Result<()>>),