feat: initialize Kurdistan SDK - independent fork of Polkadot SDK

2025-12-13 15:44:15 +03:00
commit 286de54384
6841 changed files with 1848356 additions and 0 deletions
@@ -0,0 +1,11 @@
# Preamble
This document aims to describe the purpose, functionality, and implementation of the host for PezkuwiChain's _teyrchains_
functionality - that is, the software which provides security and advancement for constituent teyrchains. It is not for
the implementer of a specific teyrchain but rather for the implementer of the Teyrchain Host. In practice, this is for
the implementers of PezkuwiChain in general.
There are a number of other documents describing the research in more detail. All referenced documents will be linked
here and should be read alongside this document for the best understanding of the full picture. However, this is the
only document which aims to describe key aspects of PezkuwiChain's particular instantiation of much of that research down to
low-level technical details and software architecture.
@@ -0,0 +1,87 @@
# Summary
[Preamble](README.md)
- [Whence Teyrchains](whence-teyrchains.md)
- [Protocol Overview](protocol-overview.md)
  - [Approval Process](protocol-approval.md)
  - [Disputes Process](protocol-disputes.md)
    - [Dispute Flow](disputes-flow.md)
  - [Chain Selection and Finalization](protocol-chain-selection.md)
  - [Validator Disabling](protocol-validator-disabling.md)
- [Architecture Overview](architecture.md)
  - [Messaging Overview](messaging.md)
  - [PVF Pre-checking](pvf-prechecking.md)
- [Runtime Architecture](runtime/README.md)
  - [`Initializer` Pallet](runtime/initializer.md)
  - [`Configuration` Pallet](runtime/configuration.md)
  - [`Shared` Pallet](runtime/shared.md)
  - [`Disputes` Pallet](runtime/disputes.md)
  - [`Paras` Pallet](runtime/paras.md)
  - [`Scheduler` Pallet](runtime/scheduler.md)
  - [`Inclusion` Pallet](runtime/inclusion.md)
  - [`ParaInherent` Pallet](runtime/parainherent.md)
  - [`DMP` Pallet](runtime/dmp.md)
  - [`HRMP` Pallet](runtime/hrmp.md)
  - [`Session Info` Pallet](runtime/session_info.md)
- [Runtime APIs](runtime-api/README.md)
  - [Validators](runtime-api/validators.md)
  - [Validator Groups](runtime-api/validator-groups.md)
  - [Availability Cores](runtime-api/availability-cores.md)
  - [Persisted Validation Data](runtime-api/persisted-validation-data.md)
  - [Session Index](runtime-api/session-index.md)
  - [Validation Code](runtime-api/validation-code.md)
  - [Candidate Pending Availability](runtime-api/candidate-pending-availability.md)
  - [Candidate Events](runtime-api/candidate-events.md)
  - [Disputes Info](runtime-api/disputes-info.md)
  - [Candidates Included](runtime-api/candidates-included.md)
  - [PVF Pre-checking](runtime-api/pvf-prechecking.md)
- [Node Architecture](node/README.md)
  - [Subsystems and Jobs](node/subsystems-and-jobs.md)
  - [Overseer](node/overseer.md)
  - [GRANDPA Voting Rule](node/grandpa-voting-rule.md)
  - [Collator Subsystems](node/collators/README.md)
    - [Collation Generation](node/collators/collation-generation.md)
    - [Collator Protocol](node/collators/collator-protocol.md)
  - [Backing Subsystems](node/backing/README.md)
    - [Candidate Backing](node/backing/candidate-backing.md)
    - [Prospective Teyrchains](node/backing/prospective-teyrchains.md)
    - [Statement Distribution](node/backing/statement-distribution.md)
    - [Statement Distribution (Legacy)](node/backing/statement-distribution-legacy.md)
  - [Availability Subsystems](node/availability/README.md)
    - [Availability Distribution](node/availability/availability-distribution.md)
    - [Availability Recovery](node/availability/availability-recovery.md)
    - [Bitfield Distribution](node/availability/bitfield-distribution.md)
    - [Bitfield Signing](node/availability/bitfield-signing.md)
  - [Approval Subsystems](node/approval/README.md)
    - [Approval Voting](node/approval/approval-voting.md)
    - [Approval Distribution](node/approval/approval-distribution.md)
  - [Disputes Subsystems](node/disputes/README.md)
    - [Dispute Coordinator](node/disputes/dispute-coordinator.md)
    - [Dispute Distribution](node/disputes/dispute-distribution.md)
  - [Utility Subsystems](node/utility/README.md)
    - [Availability Store](node/utility/availability-store.md)
    - [Candidate Validation](node/utility/candidate-validation.md)
    - [PVF Host and Workers](node/utility/pvf-host-and-workers.md)
    - [Provisioner](node/utility/provisioner.md)
    - [Network Bridge](node/utility/network-bridge.md)
    - [Gossip Support](node/utility/gossip-support.md)
    - [Peer Set Manager](node/utility/peer-set-manager.md)
    - [Runtime API Requests](node/utility/runtime-api.md)
    - [Chain API Requests](node/utility/chain-api.md)
    - [Chain Selection Request](node/utility/chain-selection.md)
    - [PVF Pre-Checking](node/utility/pvf-prechecker.md)
- [Data Structures and Types](types/README.md)
  - [Candidate](types/candidate.md)
  - [Backing](types/backing.md)
  - [Availability](types/availability.md)
  - [Overseer and Subsystem Protocol](types/overseer-protocol.md)
  - [Runtime](types/runtime.md)
  - [Messages](types/messages.md)
  - [Network](types/network.md)
  - [Approvals](types/approval.md)
  - [Disputes](types/disputes.md)
  - [PVF Pre-checking](types/pvf-prechecking.md)
[Glossary](glossary.md)
[Further Reading](further-reading.md)
@@ -0,0 +1,102 @@
# Architecture Overview
This section aims to describe, at a high level, the code architecture and subsystems involved in the implementation of
an individual Teyrchain Host. It also illuminates certain subtleties and challenges faced in the design and
implementation of those subsystems.
To recap, Pezkuwi includes a blockchain known as the relay-chain. A blockchain is a Directed Acyclic Graph (DAG) of
state transitions, where every block can be considered to be the head of a linked-list (known as a "chain" or "fork")
with a cumulative state which is determined by applying the state transition of each block in turn. All paths through
the DAG terminate at the Genesis Block. In fact, the blockchain is a tree, since each block can have only one parent.
```dot process
digraph {
node [shape=box];
genesis [label = Genesis]
b1 [label = "Block 1"]
b2 [label = "Block 2"]
b3 [label = "Block 3"]
b4 [label = "Block 4"]
b5 [label = "Block 5"]
b5 -> b3
b4 -> b3
b3 -> b1
b2 -> genesis
b1 -> genesis
}
```
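The tree structure above can be modelled in a few lines of Rust. This is a minimal sketch under stated assumptions: blocks are identified by the labels from the diagram, and each block carries a hypothetical integer `delta` standing in for its state transition. Because every block records a single parent, walking parent links from any block yields exactly one path to genesis, and the cumulative state of that fork is obtained by applying each transition in turn.

```rust
use std::collections::HashMap;

/// Toy block tree: every block except genesis records exactly one parent and the
/// hypothetical state delta its transition applies.
struct BlockTree {
    parent: HashMap<&'static str, (&'static str, i64)>,
}

impl BlockTree {
    fn new() -> Self {
        BlockTree { parent: HashMap::new() }
    }

    /// Import `child` built on top of `parent`, carrying a toy state transition `delta`.
    fn import(&mut self, child: &'static str, parent: &'static str, delta: i64) {
        self.parent.insert(child, (parent, delta));
    }

    /// Walk parent links from `head` back to genesis. Because each block has a single
    /// parent, this path is unique: it is the chain (fork) headed by `head`.
    fn chain(&self, head: &'static str) -> Vec<&'static str> {
        let mut path = vec![head];
        let mut current = head;
        while let Some(&(p, _)) = self.parent.get(current) {
            path.push(p);
            current = p;
        }
        path.reverse(); // genesis first
        path
    }

    /// Cumulative state of the fork headed by `head`: apply each block's transition in turn.
    fn state_at(&self, head: &'static str) -> i64 {
        self.chain(head)
            .iter()
            .filter_map(|b| self.parent.get(b).map(|&(_, delta)| delta))
            .sum()
    }
}

fn main() {
    let mut tree = BlockTree::new();
    // Mirrors the diagram: Blocks 4 and 5 are competing forks built on Block 3.
    tree.import("Block 1", "Genesis", 1);
    tree.import("Block 2", "Genesis", 2);
    tree.import("Block 3", "Block 1", 3);
    tree.import("Block 4", "Block 3", 4);
    tree.import("Block 5", "Block 3", 5);
    println!("{:?} -> state {}", tree.chain("Block 5"), tree.state_at("Block 5"));
}
```

Note that Blocks 4 and 5 share the prefix Genesis, Block 1, Block 3, but end in different cumulative states: that is exactly why node-side code must reason about forks separately.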
A blockchain network is composed of nodes. These nodes each have a view of many different forks of a blockchain and
must decide which forks to follow and what actions to take based on the forks of the chain that they are aware of.
So in specifying an architecture to carry out the functionality of a Teyrchain Host, we have to answer two categories of
questions:
1. What is the state-transition function of the blockchain? What is necessary for a transition to be considered valid,
and what information is carried within the implicit state of a block?
1. Being aware of various forks of the blockchain as well as global private state such as a view of the current time,
what behaviors should a node undertake? What information should a node extract from the state of which forks, and how
should that information be used?
The first category of questions will be addressed by the Runtime, which defines the state-transition logic of the chain.
Runtime logic only has to focus on the perspective of one chain, as each state has only a single parent state.
The second category of questions is addressed by Node-side behavior. Node-side behavior defines all activities that a node
undertakes, given its view of the blockchain/block-DAG. Node-side behavior can take into account all or many of the
forks of the blockchain, and only conditionally undertake certain activities based on which forks it is aware of, as
well as the state of the head of those forks.
```dot process
digraph G {
Runtime [shape=box]
"Node" [shape=box margin=0.5]
Transport [shape=rectangle width=5]
Runtime -> "Node" [dir=both label="Runtime API"]
"Node" -> Transport [penwidth=1]
}
```
It is also helpful to divide Node-side behavior into two further categories: Networking and Core. Networking behaviors
relate to how information is distributed between nodes. Core behaviors relate to internal work that a specific node
does. These two categories of behavior often interact, but can be heavily abstracted from each other. Core behaviors
care that information is distributed and received, but not the internal details of how distribution and receipt
function. Networking behaviors act on requests for distribution or fetching of information, but are not concerned with
how the information is used afterwards. This allows us to create clean boundaries between Core and Networking
activities, improving the modularity of the code.
```text
 ___________________                    ____________________
/       Core        \                  /     Networking     \
|                   |  Send "Hello"    |                    |
|                   |-  to "foo"   --->|                    |
|                   |                  |                    |
|                   |                  |                    |
|                   |                  |                    |
|                   |    Got "World"   |                    |
|                   |<--  from "bar" --|                    |
|                   |                  |                    |
\___________________/                  \____________________/
                                          ______| |______
                                          ___Transport___
```
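The Core/Networking boundary can be illustrated with two threads connected only by channels. This is a minimal sketch, assuming hypothetical `NetworkRequest` and `NetworkEvent` types (they are not the node's real message types): Core only states what it wants distributed and reacts to what arrives, while Networking owns delivery, here faked by an immediate reply from peer "bar".

```rust
use std::sync::mpsc;
use std::thread;

/// Requests Core hands to Networking (hypothetical type).
enum NetworkRequest {
    Send { to: String, payload: String },
    Shutdown,
}

/// Events Networking reports back to Core (hypothetical type).
#[derive(Debug, PartialEq)]
enum NetworkEvent {
    Received { from: String, payload: String },
}

/// Networking: acts on distribution requests, unaware of how Core uses the data.
fn run_networking(requests: mpsc::Receiver<NetworkRequest>, events: mpsc::Sender<NetworkEvent>) {
    for request in requests {
        match request {
            NetworkRequest::Send { to: _to, payload: _payload } => {
                // A real implementation would hand the payload to the transport for peer `to`.
                // Here we fake a reply from peer "bar" so the round trip is observable.
                let reply = NetworkEvent::Received { from: "bar".into(), payload: "World".into() };
                if events.send(reply).is_err() {
                    break; // Core has gone away.
                }
            }
            NetworkRequest::Shutdown => break,
        }
    }
}

/// Core: decides what to send and reacts to what arrives, with no transport details.
fn run_core() -> NetworkEvent {
    let (request_tx, request_rx) = mpsc::channel();
    let (event_tx, event_rx) = mpsc::channel();
    let networking = thread::spawn(move || run_networking(request_rx, event_tx));

    request_tx
        .send(NetworkRequest::Send { to: "foo".into(), payload: "Hello".into() })
        .expect("networking thread is running");
    let event = event_rx.recv().expect("networking replied");

    request_tx.send(NetworkRequest::Shutdown).expect("networking thread is running");
    networking.join().expect("networking thread panicked");
    event
}

fn main() {
    println!("{:?}", run_core());
}
```

Because Core depends only on the two message enums, the networking side can be swapped for a mock in tests without touching Core logic.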
Node-side behavior is split up into various subsystems. Subsystems are long-lived workers that perform a particular
category of work. Subsystems can communicate with each other, and do so via an [Overseer](node/overseer.md) that
prevents race conditions.
Runtime logic is divided up into Modules and APIs. Modules encapsulate particular behavior of the system. Modules
consist of storage, routines, and entry-points. Routines are invoked by entry points, by other modules, or upon block
initialization or closing. Routines can read and alter the storage of the module. Entry-points are the means by which
new information is introduced to a module and can limit the origins (user, root, teyrchain) that they accept being
called by. Each block in the blockchain contains a set of Extrinsics. Each extrinsic specifies which entry point to
trigger and which data to pass to it. Runtime APIs provide a means for Node-side behavior to extract meaningful
information from the state of a single fork.
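The module concepts above (storage, routines, origin-restricted entry points, and extrinsics naming an entry point plus its data) can be sketched as follows. The `Origin`, `Extrinsic`, and `CounterModule` names are hypothetical and do not reflect the FRAME pallet API; the sketch only shows the shape of the design.

```rust
/// Who is calling an entry point (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Origin {
    User,
    Root,
    Teyrchain(u32),
}

/// An extrinsic names the entry point to trigger and the data to pass to it.
enum Extrinsic {
    Increment { origin: Origin, by: u64 },
    Reset { origin: Origin },
}

#[derive(Debug, PartialEq)]
enum DispatchError {
    BadOrigin,
}

/// A toy module: storage, routines, and entry points.
#[derive(Default)]
struct CounterModule {
    value: u64,                 // storage
    increments_this_block: u32, // storage reset by a block-initialization routine
}

impl CounterModule {
    /// Routine run upon block initialization.
    fn on_initialize(&mut self) {
        self.increments_this_block = 0;
    }

    /// Entry point callable by users and teyrchains, but not by root.
    fn increment(&mut self, origin: Origin, by: u64) -> Result<(), DispatchError> {
        match origin {
            Origin::User | Origin::Teyrchain(_) => {
                self.bump(by);
                Ok(())
            }
            Origin::Root => Err(DispatchError::BadOrigin),
        }
    }

    /// Entry point callable only by root.
    fn reset(&mut self, origin: Origin) -> Result<(), DispatchError> {
        if origin != Origin::Root {
            return Err(DispatchError::BadOrigin);
        }
        self.value = 0;
        Ok(())
    }

    /// Internal routine invoked by an entry point; reads and alters storage.
    fn bump(&mut self, by: u64) {
        self.value += by;
        self.increments_this_block += 1;
    }
}

/// Executing a block: run the initialization routine, then dispatch each extrinsic.
fn execute_block(module: &mut CounterModule, extrinsics: Vec<Extrinsic>) -> Vec<Result<(), DispatchError>> {
    module.on_initialize();
    extrinsics
        .into_iter()
        .map(|xt| match xt {
            Extrinsic::Increment { origin, by } => module.increment(origin, by),
            Extrinsic::Reset { origin } => module.reset(origin),
        })
        .collect()
}

fn main() {
    let mut module = CounterModule::default();
    let results = execute_block(&mut module, vec![
        Extrinsic::Increment { origin: Origin::User, by: 2 },
        Extrinsic::Increment { origin: Origin::Teyrchain(100), by: 3 },
        Extrinsic::Reset { origin: Origin::User }, // rejected: wrong origin
    ]);
    println!("{:?} value={} increments={}", results, module.value, module.increments_this_block);
}
```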
These two aspects of the implementation are heavily dependent on each other. The Runtime depends on Node-side behavior
to author blocks, and to include Extrinsics which trigger the correct entry points. The Node-side behavior relies on
Runtime APIs to extract information necessary to determine which actions to take.
@@ -0,0 +1,138 @@
# Disputes Flows
A component-free description in what-if form, with additional state graphs of the dispute.
```mermaid
stateDiagram-v2
[*] --> WaitForBackingVote: negative Vote received
[*] --> WaitForDisputeVote: backing Vote received
WaitForBackingVote --> Open: negative Vote received
WaitForDisputeVote --> Open: backing Vote received
Open --> Concluded: Incoming Vote via Gossip
Open --> Open: No ⅔ supermajority
Open --> [*]
Concluded --> [*]
```
---
```mermaid
stateDiagram-v2
[*] --> Open: First Vote(s) received
Open --> HasPoV : Fetch Availability Store for PoV
HasPoV --> HasCode : Fetch historical Code
HasCode --> VerifyWithRuntime: All Data locally avail
Open --> DisputeAvailabilityDataReceived
DisputeAvailabilityDataReceived --> VerifyWithRuntime: Received Gossip
HasPoV --> RequestDisputeAvailabilityData: nope
HasCode --> RequestDisputeAvailabilityData: nope
RequestDisputeAvailabilityData --> VerifyWithRuntime: Received
RequestDisputeAvailabilityData --> RequestDisputeAvailabilityData: Timed out - pick another peer
VerifyWithRuntime --> CastVoteValid: Block Valid
VerifyWithRuntime --> CastVoteInvalid: Block Invalid
CastVoteInvalid --> GossipVote
CastVoteValid --> GossipVote
GossipVote --> [*]
```
---
Dispute Availability Data
```mermaid
stateDiagram-v2
[*] --> Open: First Vote(s) received
Open --> DisputeDataAvail: somehow the data became available
Open --> RespondUnavailable: Data not available
IncomingRequestDisputeAvailabilityData --> RespondUnavailable
IncomingRequestDisputeAvailabilityData --> DisputeDataAvail
DisputeDataAvail --> RespondWithDisputeAvailabilityData: Send
VoteGossipReceived --> Track: implies source peer has<br />dispute availability data
```
---
Peer handling
```mermaid
stateDiagram-v2
[*] --> Open: First Vote(s) received
Open --> GossipVotes: for all current peers
Open --> PeerConnected: another
PeerConnected --> GossipVotes: Peer connects
GossipVotes --> [*]
```
## Conditional formulation
The set of validators eligible to vote consists of the validators that had duty at the time of backing, plus backing
votes by the backing validators.
If a validator receives an initial dispute message (a set of votes containing at least two opposing votes) and the PoV
or Code cannot be reconstructed from local storage, that validator must request the required data from its peers.
The dispute availability message must contain code, persisted validation data, and the proof of validity.
Only peers that already voted shall be queried for the dispute availability data.
The peer to be queried for dispute data must be picked at random.
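The peer-selection rules (query only peers that already voted, pick one at random, move to another peer on timeout) can be sketched as follows. This is an illustrative sketch under assumptions: to stay dependency-free it uses a tiny xorshift generator seeded by the caller instead of a proper randomness source, and `fetch` is a hypothetical closure standing in for the network request, returning `None` on timeout.

```rust
/// Tiny deterministic PRNG (xorshift64) so the sketch needs no external crates.
/// A real node would use a proper randomness source.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Request dispute availability data from peers that already voted, trying them in
/// random order and picking another peer whenever one times out (`fetch` returns `None`).
fn request_dispute_data<T>(
    voters: &[u32],
    seed: u64,
    mut fetch: impl FnMut(u32) -> Option<T>,
) -> Option<(u32, T)> {
    let mut remaining: Vec<u32> = voters.to_vec();
    let mut rng = XorShift(seed.max(1)); // xorshift must not start at zero
    while !remaining.is_empty() {
        let i = (rng.next() % remaining.len() as u64) as usize;
        let peer = remaining.swap_remove(i); // never ask the same peer twice
        if let Some(data) = fetch(peer) {
            return Some((peer, data));
        }
        // Timed out: loop around and pick another peer.
    }
    None
}

fn main() {
    // Peers 3 and 7 voted; only peer 7 answers in time.
    let result = request_dispute_data(&[3, 7], 42, |peer| if peer == 7 { Some("pov+code") } else { None });
    println!("{:?}", result);
}
```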
A validator must retain code, persisted validation data and PoV until the block containing the dispute resolution is
finalized, plus an additional 24 hours.
Dispute availability gossip must continue beyond the dispute resolution, until the post-resolution timeout has expired
(equivalent to the timeout until which additional late votes are accepted).
Remote disputes are disputes that relate to a chain that is not part of the local validator's active heads.
All incoming votes must be persisted.
Persisted votes stay persisted for `N` sessions, and are cleaned up on a per session basis.
Votes must be queryable by a particular validator, identified by its signing key.
Votes must be queryable by a particular validator, identified by a session index and the validator index valid in that
session.
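The persistence and query requirements above suggest an index keyed by session. A minimal sketch, assuming hypothetical `SessionIndex`, `ValidatorIndex`, `ValidatorId`, and `CandidateHash` aliases (the signing key is modelled as a plain byte array) and an in-memory `BTreeMap` standing in for persistent storage:

```rust
use std::collections::BTreeMap;

type SessionIndex = u32;
type ValidatorIndex = u32;
type ValidatorId = [u8; 4]; // hypothetical stand-in for a signing public key
type CandidateHash = u64;

#[derive(Clone, Debug, PartialEq)]
struct Vote {
    candidate: CandidateHash,
    valid: bool,
}

/// In-memory stand-in for the persisted vote store.
struct VoteStore {
    keep_sessions: u32, // the `N` from the text
    // session -> validator index -> (signing key, votes)
    votes: BTreeMap<SessionIndex, BTreeMap<ValidatorIndex, (ValidatorId, Vec<Vote>)>>,
}

impl VoteStore {
    fn new(keep_sessions: u32) -> Self {
        VoteStore { keep_sessions, votes: BTreeMap::new() }
    }

    /// Persist every incoming vote.
    fn record(&mut self, session: SessionIndex, index: ValidatorIndex, key: ValidatorId, vote: Vote) {
        self.votes
            .entry(session)
            .or_default()
            .entry(index)
            .or_insert_with(|| (key, Vec::new()))
            .1
            .push(vote);
    }

    /// Query by (session index, validator index valid in that session).
    fn by_index(&self, session: SessionIndex, index: ValidatorIndex) -> Vec<Vote> {
        self.votes
            .get(&session)
            .and_then(|s| s.get(&index))
            .map(|(_, v)| v.clone())
            .unwrap_or_default()
    }

    /// Query by signing key, across all retained sessions.
    fn by_key(&self, key: &ValidatorId) -> Vec<Vote> {
        self.votes
            .values()
            .flat_map(|s| s.values())
            .filter(|(k, _)| k == key)
            .flat_map(|(_, v)| v.iter().cloned())
            .collect()
    }

    /// Per-session cleanup: keep only the last `N` sessions up to `current`.
    fn prune(&mut self, current: SessionIndex) {
        let oldest_kept = current.saturating_sub(self.keep_sessions.saturating_sub(1));
        self.votes = self.votes.split_off(&oldest_kept);
    }
}

fn main() {
    let mut store = VoteStore::new(2);
    let alice: ValidatorId = [1, 0, 0, 0];
    store.record(4, 0, alice, Vote { candidate: 1, valid: true });
    store.record(5, 3, alice, Vote { candidate: 2, valid: false });
    store.prune(6); // drops session 4
    println!("{:?}", store.by_key(&alice));
}
```

Keying the outer map by session makes per-session cleanup a single `split_off`, while the stored signing key allows the second lookup path without a separate index.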
If there exists a negative and a positive vote for a particular block, a dispute is detected.
If a dispute is detected, all currently available votes for that block must be gossiped.
If an incoming dispute vote is detected, a validator must cast their own vote. The vote is determined by validating the
PoV with the Code at the time of backing the block in question.
If the validator was also a backer of the block, validation and casting an additional vote should be skipped.
If the count of votes for or against the disputed block reaches the required ⅔ supermajority (including the backing
votes), the conclusion must be recorded on chain, and the voters on the losing side, as well as no-shows, must be
slashed appropriately.
If a block is found invalid by a dispute resolution, it must be blacklisted to avoid resyncing or further building on
that chain if other chains are available (to be detailed in the GRANDPA fork choice rule).
A dispute continues to accept votes for 1 day after it is resolved.
If a vote is received after the dispute is resolved, the vote shall still be recorded in the state root, albeit
yielding less reward.
Recording in the state root may be batched and happen at timeout expiry.
If a new active head/chain appears, and the dispute resolution was not recorded on that chain yet, the dispute
resolution or open dispute must be recorded / transplanted to that chain as well, since the disputes must be present on
all chains to make sure the offender is punished.
If a validator votes in two opposing ways, this constitutes a double vote, as in other cases (backing, approval
voting).
If a dispute is not resolved within due time, all validators are to be slashed for a small amount.
If a dispute is not resolved within due time, governance mode shall be entered for manual resolution.
If a validator unexpectedly restarts, the dispute shall be continued with the state based on votes being cast and being
present in persistent storage.
@@ -0,0 +1,4 @@
# Further Reading
- Pezkuwi Wiki on Consensus: <https://wiki.network.pezkuwichain.io/docs/learn-consensus>
- Pezkuwi Spec: <https://github.com/w3f/polkadot-spec>
@@ -0,0 +1,79 @@
# Glossary
Here you can find definitions of a bunch of jargon, usually specific to the Pezkuwi project.
- **Approval Checker:** A validator who randomly self-selects so as to perform validity checks on a parablock which is
pending approval.
- **BABE:** (Blind Assignment for Blockchain Extension). The algorithm validators use to safely extend the Relay Chain.
See [the Pezkuwi wiki][0] for more information.
- **Backable Candidate:** A Teyrchain Candidate which is backed by a majority of validators assigned to a given
teyrchain.
- **Backed Candidate:** A Backable Candidate noted in a relay-chain block.
- **Backing:** A set of statements proving that a Teyrchain Candidate is backable.
- **Collator:** A node who generates Proofs-of-Validity (PoV) for blocks of a specific teyrchain.
- **DMP:** (Downward Message Passing). Message passing from the relay-chain to a teyrchain. Also there is a runtime
teyrchains module with the same name.
- **DMQ:** (Downward Message Queue). A message queue for messages from the relay-chain down to a teyrchain. A teyrchain
has exactly one downward message queue.
- **Extrinsic:** An element of a relay-chain block which triggers a specific entry-point of a runtime module with given
arguments.
- **GRANDPA:** (Ghost-based Recursive ANcestor Deriving Prefix Agreement). The algorithm validators use to guarantee
finality of the Relay Chain.
- **HRMP:** (Horizontally Relay-routed Message Passing). A mechanism for message passing between teyrchains (hence
horizontal) that leverages the relay-chain storage. Predates XCMP. Also there is a runtime teyrchains module with the
same name.
- **Inclusion Pipeline:** The set of steps taken to carry a Teyrchain Candidate from authoring, to backing, to
availability and full inclusion in an active fork of its teyrchain.
- **Module:** A component of the Runtime logic, encapsulating storage, routines, and entry-points.
- **Module Entry Point:** A recipient of new information presented to the Runtime. This may trigger routines.
- **Module Routine:** A piece of code executed within a module by block initialization, closing, or upon an entry point
being triggered. This may execute computation, and read or write storage.
- **MQC:** (Message Queue Chain). A cryptographic data structure that resembles an append-only linked list which doesn't
store original values but only their hashes. The whole structure is described by a single hash, referred to as a "head".
When a value is appended, its contents are hashed with the previous head, creating a hash that becomes the new head.
- **Node:** A participant in the Pezkuwi network, who follows the protocols of communication and connection to other
nodes. Nodes form a peer-to-peer network topology without a central authority.
- **Teyrchain Candidate, or Candidate:** A proposed block for inclusion into a teyrchain.
- **Parablock:** A block in a teyrchain.
- **Teyrchain:** A constituent chain secured by the Relay Chain's validators.
- **Teyrchain Validators:** A subset of validators assigned during a period of time to back candidates for a specific
teyrchain.
- **On-demand teyrchain:** A teyrchain which is scheduled on a pay-as-you-go basis.
- **Lease-holding teyrchain:** A teyrchain possessing an active slot lease. The lease holder is assigned a single
availability core for the duration of the lease, granting consistent blockspace scheduling at the rate of 1 parablock
per relay block.
- **PDK (Teyrchain Development Kit):** A toolset that allows one to develop a teyrchain. Cumulus is a PDK.
- **Preimage:** In our context, if `H(X) = Y` where `H` is a hash function and `Y` is the hash, then `X` is the hash
preimage.
- **Proof-of-Validity (PoV):** A stateless-client proof that a teyrchain candidate is valid, with respect to some
validation function.
- **PVF:** Teyrchain Validation Function. The validation code that is run by validators on teyrchains.
- **PVF Prechecking:** This is the process of checking a PVF when it appears
on-chain, either when the teyrchain is onboarded or when it signalled an
upgrade of its validation code. We attempt preparation of the PVF and make
sure that it succeeds within a given timeout, plus some additional checks.
- **PVF Preparation:** This is the process of preparing the WASM blob and includes both prevalidation and compilation.
- **PVF Prevalidation:** Some basic checks for correctness of the PVF blob. The
first step of PVF preparation, before compilation.
- **Relay Parent:** A block in the relay chain, referred to in a context where work is being done in the context of the
state at this block.
- **Runtime:** The relay-chain state machine.
- **Runtime Module:** See Module.
- **Runtime API:** A means for the node-side behavior to access structured information based on the state of a fork of
the blockchain.
- **Subsystem:** A long-running task which is responsible for carrying out a particular category of work.
- **UMP:** (Upward Message Passing) A vertical message passing mechanism from a teyrchain to the relay chain.
- **Validator:** Specially-selected node in the network who is responsible for validating teyrchain blocks and issuing
attestations about their validity.
- **Validation Function:** A piece of Wasm code that describes the state-transition function of a teyrchain.
- **VMP:** (Vertical Message Passing) A family of mechanisms that are responsible for message exchange between the relay
chain and teyrchains.
- **XCMP:** (Cross-Chain Message Passing) A type of horizontal message passing (i.e. between teyrchains) that allows
secure message passing directly between teyrchains and has minimal resource requirements from the relay chain, making it
highly scalable.
## See Also
Also of use is the [Substrate Glossary](https://substrate.dev/docs/en/knowledgebase/getting-started/glossary).
[0]: https://wiki.network.pezkuwichain.io/docs/learn-consensus
@@ -0,0 +1,103 @@
# Messaging Overview
The Pezkuwi Host has a few mechanisms that are responsible for message passing. They can generally be divided into two
categories: Horizontal and Vertical. Horizontal Message Passing (HMP) refers to mechanisms that are responsible for
exchanging messages between teyrchains. Vertical Message Passing (VMP) is used for communication between the relay chain
and teyrchains.
## Vertical Message Passing
```dot process
digraph {
rc [shape=Mdiamond label="Relay Chain"];
p1 [shape=box label = "Teyrchain"];
rc -> p1 [label="DMP"];
p1 -> rc [label="UMP"];
}
```
Downward Message Passing (DMP) is a mechanism for delivering messages to teyrchains from the relay chain.
Each teyrchain has its own queue that stores all pending inbound downward messages. A teyrchain doesn't have to process
all messages at once; however, there are rules as to how the downward message queue should be processed. Currently, at
least one message must be consumed per candidate if the queue is not empty. The downward message queue doesn't have a
cap on its size, and it is up to the relay-chain to put mechanisms that prevent spamming in place.
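The downward queue rule can be expressed as an acceptance check on each candidate. A minimal sketch, assuming a hypothetical `DownwardQueue` type and a candidate that reports how many queued messages it processed (`processed`). It encodes the "at least one message if the queue is non-empty" rule and, mirroring the text, deliberately places no cap on the queue size.

```rust
use std::collections::VecDeque;

/// Per-teyrchain queue of pending inbound downward messages (no size cap).
#[derive(Default)]
struct DownwardQueue {
    pending: VecDeque<Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum DmpError {
    MustProcessAtLeastOne,
    ProcessedMoreThanQueued,
}

impl DownwardQueue {
    fn push(&mut self, msg: Vec<u8>) {
        self.pending.push_back(msg);
    }

    /// Check a candidate's processed-message count against the queue and, if it is
    /// acceptable, remove the consumed messages from the front of the queue.
    fn enact(&mut self, processed: usize) -> Result<(), DmpError> {
        if !self.pending.is_empty() && processed == 0 {
            return Err(DmpError::MustProcessAtLeastOne);
        }
        if processed > self.pending.len() {
            return Err(DmpError::ProcessedMoreThanQueued);
        }
        self.pending.drain(..processed);
        Ok(())
    }
}

fn main() {
    let mut queue = DownwardQueue::default();
    queue.push(b"a".to_vec());
    queue.push(b"b".to_vec());
    println!("{:?}", queue.enact(0)); // rejected: queue is non-empty
    println!("{:?}, {} left", queue.enact(1), queue.pending.len());
}
```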
Upward Message Passing (UMP) is a mechanism responsible for delivering messages in the opposite direction: from a
teyrchain up to the relay chain. Upward messages are essentially byte blobs. However, they are interpreted by the
relay-chain according to the XCM standard.
The XCM standard is a common vocabulary of messages. The XCM standard doesn't require a particular interpretation of a
message. However, the teyrchains host (e.g. Pezkuwi) guarantees certain semantics for those.
Moreover, while most XCM messages are handled by the on-chain XCM interpreter, some of the messages are special-cased.
Specifically, those messages can be checked as part of the acceptance criteria, and thus invalid messages lead to
rejecting the candidate itself.
One kind of such a message is `Xcm::Transact`. This upward message can be seen as a way for a teyrchain to execute
arbitrary entrypoints on the relay-chain. `Xcm::Transact` messages resemble regular extrinsics with the exception that
they originate from a teyrchain.
The payload of `Xcm::Transact` messages is referred to as `Dispatchable`. When a candidate with such a message is
enacted, the dispatchables are put into a queue corresponding to the teyrchain. There can be only so many dispatchables
in that queue at once. The weight that processing of the dispatchables can consume is limited by a preconfigured value.
Therefore, it is possible that some dispatchables will be left for later blocks. To make the dispatching more fair, the
queues are processed turn-by-turn in a round-robin fashion.
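The bounded, weight-limited, round-robin processing of dispatchable queues can be sketched as below. The `Dispatchable` struct, its `weight`, the per-block `budget`, and `max_queue_len` are hypothetical stand-ins for the preconfigured limits mentioned in the text; the sketch only shows how a weight limit leaves work for later blocks and how rotating the starting queue spreads dispatching fairly across teyrchains.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical dispatchable: a name plus the weight its execution consumes.
struct Dispatchable {
    name: &'static str,
    weight: u64,
}

struct DispatchQueues {
    max_queue_len: usize,                          // "only so many dispatchables" per queue
    queues: BTreeMap<u32, VecDeque<Dispatchable>>, // teyrchain id -> its queue
    next_start: usize,                             // rotates which queue is served first
}

impl DispatchQueues {
    fn new(max_queue_len: usize) -> Self {
        DispatchQueues { max_queue_len, queues: BTreeMap::new(), next_start: 0 }
    }

    /// Called when a candidate is enacted; refuses the item if the queue is full.
    fn enqueue(&mut self, para: u32, item: Dispatchable) -> bool {
        let queue = self.queues.entry(para).or_default();
        if queue.len() >= self.max_queue_len {
            return false;
        }
        queue.push_back(item);
        true
    }

    /// Serve one dispatchable per teyrchain per round, in round-robin order, until no
    /// queue's next item fits in the remaining weight. Leftovers wait for later blocks.
    fn process_block(&mut self, mut budget: u64) -> Vec<&'static str> {
        let paras: Vec<u32> = self.queues.keys().copied().collect();
        let mut executed = Vec::new();
        if paras.is_empty() {
            return executed;
        }
        let start = self.next_start % paras.len();
        self.next_start += 1;
        loop {
            let mut progressed = false;
            for offset in 0..paras.len() {
                let queue = self.queues.get_mut(&paras[(start + offset) % paras.len()]).unwrap();
                let fits = queue.front().map_or(false, |item| item.weight <= budget);
                if fits {
                    let item = queue.pop_front().unwrap();
                    budget -= item.weight;
                    executed.push(item.name);
                    progressed = true;
                }
            }
            if !progressed {
                return executed;
            }
        }
    }
}

fn main() {
    let mut queues = DispatchQueues::new(2);
    queues.enqueue(1, Dispatchable { name: "a1", weight: 10 });
    queues.enqueue(1, Dispatchable { name: "a2", weight: 10 });
    queues.enqueue(2, Dispatchable { name: "b1", weight: 10 });
    println!("block 1: {:?}", queues.process_block(20));
    println!("block 2: {:?}", queues.process_block(20));
}
```

With a budget for only two items, teyrchain 2's single item is served before teyrchain 1's second one, rather than teyrchain 1 draining its whole queue first.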
The second category of special-cased XCM messages is for horizontal messaging channel management, namely messages meant
to request opening and closing HRMP channels (HRMP will be described below).
## Horizontal Message Passing
```dot process
digraph {
rc [shape=Mdiamond color="gray" fontcolor="gray" label="Relay Chain"];
subgraph {
rank = "same"
p1 [shape=box label = "Teyrchain 1"];
p2 [shape=box label = "Teyrchain 2"];
}
rc -> p1 [label="DMP" color="gray" fontcolor="gray"];
p1 -> rc [label="UMP" color="gray" fontcolor="gray"];
rc -> p2 [label="DMP" color="gray" fontcolor="gray"];
p2 -> rc [label="UMP" color="gray" fontcolor="gray"];
p2 -> p1 [dir=both label="XCMP"];
}
```
### Cross-Chain Message Passing
The most important member of this family is XCMP.
> XCMP is currently under construction and details are subject to change.
XCMP is a message passing mechanism between teyrchains that requires minimal involvement of the relay chain. The relay
chain provides means for sending teyrchains to authenticate messages sent to recipient teyrchains.
Semantically, communication occurs through so-called channels. A channel is unidirectional and has two endpoints, one
for the sender and one for the recipient. A channel can be opened only if both parties agree, but it can be closed
unilaterally.
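The channel rules (unidirectional, opened only with both parties' agreement, closed unilaterally by either endpoint) can be captured in a small state machine. All names below are hypothetical and do not reflect the extrinsics of the HRMP pallet; the sketch only encodes the semantics described in the paragraph above.

```rust
use std::collections::BTreeMap;

type ParaId = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ChannelState {
    /// The sender asked to open; the recipient has not agreed yet.
    Requested,
    Open,
}

#[derive(Debug, PartialEq)]
enum ChannelError {
    AlreadyExists,
    NoSuchRequest,
    NotAParticipant,
    NotOpen,
}

/// Hypothetical channel registry keyed by (sender, recipient), so (a, b) and (b, a)
/// are two distinct unidirectional channels.
#[derive(Default)]
struct Channels {
    channels: BTreeMap<(ParaId, ParaId), ChannelState>,
}

impl Channels {
    /// The sender proposes a channel to `recipient`.
    fn request_open(&mut self, sender: ParaId, recipient: ParaId) -> Result<(), ChannelError> {
        if self.channels.contains_key(&(sender, recipient)) {
            return Err(ChannelError::AlreadyExists);
        }
        self.channels.insert((sender, recipient), ChannelState::Requested);
        Ok(())
    }

    /// Only the recipient can accept, so a channel opens only when both parties agree.
    fn accept(&mut self, recipient: ParaId, sender: ParaId) -> Result<(), ChannelError> {
        match self.channels.get_mut(&(sender, recipient)) {
            Some(state) if *state == ChannelState::Requested => {
                *state = ChannelState::Open;
                Ok(())
            }
            _ => Err(ChannelError::NoSuchRequest),
        }
    }

    /// Either endpoint may close an open channel on its own (unilaterally).
    fn close(&mut self, caller: ParaId, sender: ParaId, recipient: ParaId) -> Result<(), ChannelError> {
        if caller != sender && caller != recipient {
            return Err(ChannelError::NotAParticipant);
        }
        if !self.is_open(sender, recipient) {
            return Err(ChannelError::NotOpen);
        }
        self.channels.remove(&(sender, recipient));
        Ok(())
    }

    fn is_open(&self, sender: ParaId, recipient: ParaId) -> bool {
        self.channels.get(&(sender, recipient)) == Some(&ChannelState::Open)
    }
}

fn main() {
    let mut channels = Channels::default();
    channels.request_open(1, 2).unwrap();
    channels.accept(2, 1).unwrap();
    println!("1 -> 2 open: {}, 2 -> 1 open: {}", channels.is_open(1, 2), channels.is_open(2, 1));
}
```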
Only the channel metadata is stored on the relay-chain in a very compact form: all messages and their contents sent by
the sender teyrchain are encoded using only one root hash. This root is referred to as the MQC head.
The authenticity of the messages must be proven to the receiving party, using that root hash, at candidate authoring
time. The proof stems from the relay parent storage that contains the root hash of the channel. Since not all messages
are required to be processed by the receiver's candidate, only the processed messages are supplied (i.e. preimages);
the rest are provided as hashes.
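A minimal MQC sketch, following the glossary's description of each append hashing the new message together with the previous head. It uses the standard library's `DefaultHasher` purely to stay dependency-free; that hasher is not cryptographic, so a real implementation would need a cryptographic hash function. The all-zero starting head and the exact `H(prev_head, H(message))` layout are assumptions of this sketch. The `verify` function shows how a receiver recomputes the head from preimages (messages it processes) and bare hashes (the rest).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Hash64 = u64;

/// Non-cryptographic stand-in for a real hash function.
fn hash_of<T: Hash + ?Sized>(value: &T) -> Hash64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Append one message hash: new_head = H(prev_head, H(message)).
fn append(prev_head: Hash64, message_hash: Hash64) -> Hash64 {
    hash_of(&(prev_head, message_hash))
}

/// MQC head over a full sequence of messages, starting from an all-zero head.
fn mqc_head(messages: &[&[u8]]) -> Hash64 {
    messages.iter().fold(0, |head, m| append(head, hash_of(*m)))
}

/// What the receiver is given for each message in the channel.
enum Item<'a> {
    Preimage(&'a [u8]), // a message the receiver's candidate processes
    Hashed(Hash64),     // a message it skips, supplied only as its hash
}

/// Recompute the head from the supplied items and compare it with the head
/// read from relay-parent storage.
fn verify(items: &[Item<'_>], expected_head: Hash64) -> bool {
    let head = items.iter().fold(0, |head, item| {
        let message_hash = match item {
            Item::Preimage(m) => hash_of(*m),
            Item::Hashed(h) => *h,
        };
        append(head, message_hash)
    });
    head == expected_head
}

fn main() {
    let messages: [&[u8]; 3] = [b"m1", b"m2", b"m3"];
    let head = mqc_head(&messages);
    // The receiver processes m1 and m2 and is handed only the hash of m3.
    let items = [Item::Preimage(b"m1"), Item::Preimage(b"m2"), Item::Hashed(hash_of(&b"m3"[..]))];
    println!("authentic: {}", verify(&items, head));
}
```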
Further details can be found at the official repository for the [Cross-Consensus Message Format
(XCM)](https://github.com/paritytech/xcm-format/blob/master/README.md), as well as at the [W3F research
website](https://research.web3.foundation/en/latest/polkadot/XCMP.html) and [this
blogpost](https://medium.com/web3foundation/polkadots-messaging-scheme-b1ec560908b7).
HRMP (Horizontally Relay-routed Message Passing) is a stopgap that predates XCMP. Semantically, it mimics XCMP's
interface. The crucial difference from XCMP, though, is that all the messages are stored in the relay-chain storage.
This keeps things simple, but it also makes HRMP more demanding in terms of resources and thus more expensive.
Once XCMP is available we expect to retire HRMP.
@@ -0,0 +1,50 @@
# Node Architecture
## Design Goals
* Modularity: Components of the system should be as self-contained as possible. Communication boundaries between
components should be well-defined and mockable. This is key to creating testable, easily reviewable code.
* Minimizing side effects: Components of the system should aim to minimize side effects and to communicate with other
components via message-passing.
* Operational Safety: The software will be managing signing keys where conflicting messages can lead to large amounts of
value being slashed. Care should be taken to ensure that no messages are signed incorrectly or in conflict with each
other.
The architecture of the node-side behavior aims to embody the Rust principles of ownership and message-passing to create
clean, isolatable code. Each resource should have a single owner, with minimal sharing where unavoidable.
Many operations that need to be carried out involve the network, which is asynchronous. This asynchrony affects all core
subsystems that rely on the network as well. The approach of hierarchical state machines is well-suited to this kind of
environment.
We introduce
## Components
The node architecture consists of the following components:
* The Overseer (and subsystems): A hierarchy of state machines where an overseer supervises subsystems. Subsystems can
contain their own internal hierarchy of jobs. This is elaborated on in the next section on Subsystems.
* A block proposer: Logic triggered by the consensus algorithm of the chain when the node should author a block.
* A GRANDPA voting rule: A strategy for selecting chains to vote on in the GRANDPA algorithm to ensure that only valid
teyrchain candidates appear in finalized relay-chain blocks.
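The overseer-and-subsystems hierarchy can be sketched with every message routed through the overseer, so subsystems never hold direct handles to each other. This is a minimal, single-threaded sketch with hypothetical subsystem names, message types, and a toy validity rule; the real overseer is considerably richer. Delivering queued messages strictly one at a time keeps message handling serialized and deterministic.

```rust
use std::collections::{HashMap, VecDeque};

/// Messages exchanged between subsystems (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum Msg {
    Second(u64),
    ValidateCandidate(u64),
    ValidationResult { candidate: u64, valid: bool },
    Backed(u64),
}

/// A subsystem reacts to one message at a time and returns messages addressed to
/// other subsystems by name; it never calls another subsystem directly.
trait Subsystem {
    fn handle(&mut self, msg: Msg) -> Vec<(&'static str, Msg)>;
}

struct CandidateBacking;
impl Subsystem for CandidateBacking {
    fn handle(&mut self, msg: Msg) -> Vec<(&'static str, Msg)> {
        match msg {
            Msg::Second(c) => vec![("candidate-validation", Msg::ValidateCandidate(c))],
            Msg::ValidationResult { candidate, valid: true } => vec![("provisioner", Msg::Backed(candidate))],
            _ => vec![],
        }
    }
}

struct CandidateValidation;
impl Subsystem for CandidateValidation {
    fn handle(&mut self, msg: Msg) -> Vec<(&'static str, Msg)> {
        match msg {
            // Toy validity rule standing in for real PVF execution: even hashes are valid.
            Msg::ValidateCandidate(c) => {
                vec![("backing", Msg::ValidationResult { candidate: c, valid: c % 2 == 0 })]
            }
            _ => vec![],
        }
    }
}

/// The overseer owns all subsystems and delivers queued messages one at a time.
struct Overseer {
    subsystems: HashMap<&'static str, Box<dyn Subsystem>>,
    queue: VecDeque<(&'static str, Msg)>,
    delivered: Vec<(&'static str, Msg)>, // delivery log, for inspection
}

impl Overseer {
    fn new() -> Self {
        Overseer { subsystems: HashMap::new(), queue: VecDeque::new(), delivered: Vec::new() }
    }

    fn register(&mut self, name: &'static str, subsystem: Box<dyn Subsystem>) {
        self.subsystems.insert(name, subsystem);
    }

    fn send(&mut self, to: &'static str, msg: Msg) {
        self.queue.push_back((to, msg));
    }

    /// Route messages until the queue is empty. Messages for unknown subsystems are only logged.
    fn run(&mut self) {
        while let Some((to, msg)) = self.queue.pop_front() {
            self.delivered.push((to, msg.clone()));
            if let Some(subsystem) = self.subsystems.get_mut(to) {
                self.queue.extend(subsystem.handle(msg));
            }
        }
    }
}

fn main() {
    let mut overseer = Overseer::new();
    overseer.register("backing", Box::new(CandidateBacking));
    overseer.register("candidate-validation", Box::new(CandidateValidation));
    overseer.send("backing", Msg::Second(4));
    overseer.run();
    for (to, msg) in &overseer.delivered {
        println!("{to}: {msg:?}");
    }
}
```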
## Assumptions
The Node-side code comes with a set of assumptions that we build upon. These assumptions encompass most of the
fundamental blockchain functionality.
We assume the following constraints regarding provided basic functionality:
* The underlying **consensus** algorithm, whether it is BABE or SASSAFRAS, is implemented.
* There is a **chain synchronization** protocol which will search for and download the longest available chains at all
times.
* The **state** of all blocks at the head of the chain is available. There may be **state pruning** such that the state
of the last `k` blocks behind the last finalized block is available, as well as the state of all their descendants.
This assumption implies that the state of all active leaves and their last `k` ancestors are all available. The
underlying implementation is expected to support `k` of a few hundred blocks, but we reduce this to a very
conservative `k=5` for our purposes.
* There is an underlying **networking** framework which provides **peer discovery** services which will provide us
with peers and will not create "loopback" connections to our own node. The number of peers we will have is assumed
to be bounded at 1000.
* There is a **transaction pool** and a **transaction propagation** mechanism which maintains a set of current
transactions and distributes them to connected peers. Current transactions are those which are not outdated relative to
some "best" fork of the chain, which is part of the active heads, and have not been included in the best fork.
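The state-availability assumption reduces to a simple height check. A minimal sketch, assuming blocks are identified by height only (forks are ignored) and using the conservative `k = 5` from the text; the function name `state_available` is hypothetical.

```rust
/// Conservative pruning depth assumed for node-side code (`k` in the text).
const K: u32 = 5;

/// Whether the state of the block at height `number` is assumed available, given the
/// height of the last finalized block: the last `K` blocks behind it, the finalized
/// block itself, and all of its descendants qualify.
fn state_available(number: u32, last_finalized: u32) -> bool {
    number >= last_finalized.saturating_sub(K)
}

fn main() {
    for n in [94, 95, 100, 130] {
        println!("block {n}: state available = {}", state_available(n, 100));
    }
}
```

Active leaves are always descendants of the last finalized block, which is why the check above also covers "all active leaves and their last `k` ancestors".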
@@ -0,0 +1,10 @@
# Approval Subsystems
The approval subsystems implement the node-side of the [Approval Protocol](../../protocol-approval.md).
We divide the work between the [assignment/voting logic](approval-voting.md) and the [distribution
logic](approval-distribution.md) that distributes assignment certifications and approval votes. The logic in the
assignment and voting also informs the GRANDPA voting rule on how to vote.
These subsystems are intended to flag issues and begin participating in live disputes. Dispute subsystems also track all
observed votes (backing, approval, and dispute-specific) by all validators on all candidates.
@@ -0,0 +1,348 @@
# Approval Distribution
A subsystem for the distribution of assignments and approvals for approval checks on candidates over the network.
The [Approval Voting](approval-voting.md) subsystem is responsible for active participation in a protocol designed to
select a sufficient number of validators to check each and every candidate which appears in the relay chain. Statements
of participation in this checking process are divided into two kinds:
* **Assignments** indicate that validators have been selected to do checking
* **Approvals** indicate that validators have checked and found the candidate satisfactory.
The [Approval Voting](approval-voting.md) subsystem handles all the issuing and tallying of this protocol, but this
subsystem is responsible for the disbursal of statements among the validator-set.
The inclusion pipeline of candidates concludes after availability, and only after inclusion do candidates actually get
pushed into the approval checking pipeline. As such, this protocol deals with the candidates _made available by_
particular blocks, as opposed to the candidates which actually appear within those blocks, which are the candidates
_backed by_ those blocks. Unless stated otherwise, whenever we reference a candidate partially by block hash, we are
referring to the set of candidates _made available by_ those blocks.
We implement this protocol as a gossip protocol, and like other teyrchain-related gossip protocols our primary concerns
are about ensuring fast message propagation while maintaining an upper bound on the number of messages any given node
must store at any time.
Approval messages should always follow assignments, so we need to be able to discern two pieces of information based on
our [View](../../types/network.md#universal-types):
1. Is a particular assignment relevant under a given `View`?
2. Is a particular approval relevant to any assignment in a set?
For our own local view, these two queries must not yield false negatives. When applied to our peers' views, it is
acceptable for them to yield false negatives. The reason for that is that our peers' views may be beyond ours, and we
are not capable of fully evaluating them. Once we have caught up, we can check again for false negatives to continue
distributing.
For assignments, what we need to be checking is whether we are aware of the (block, candidate) pair that the assignment
references. For approvals, we need to be aware of an assignment by the same validator which references the candidate
being approved.
However, awareness on its own of a (block, candidate) pair would imply that even ancient candidates all the way back to
the genesis are relevant. We are actually not interested in anything before finality.
We gossip assignments along a grid topology produced by the [Gossip Support Subsystem](../utility/gossip-support.md) and
also to a few random peers. The first time we accept an assignment or approval, regardless of the source, which
originates from a validator peer in a shared dimension of the grid, we propagate the message to validator peers in the
unshared dimension as well as a few random peers.
But, in case these mechanisms don't work on their own, we need to trade bandwidth for protocol liveness by introducing
aggression.
Aggression has 3 levels:
* Aggression Level 0: The basic behaviors described above.
* Aggression Level 1: The originator of a message sends to all peers. Other peers follow the rules above.
* Aggression Level 2: All peers send all messages to all their row and column neighbors. This means that each validator
will, on average, receive each message approximately 2*sqrt(n) times.
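As a rough illustration of the grid neighborhoods that Level 2 relies on, the sketch below assumes validators occupy positions `0..n` laid out row-major in a square grid of width `ceil(sqrt(n))`, and returns the row and column peers of one position. The helper `grid_neighbors` and this layout are assumptions for illustration; how validators are actually placed in the grid is decided by the Gossip Support Subsystem and is not modelled here.

```rust
/// Hypothetical helper: the row and column peers of the validator at position
/// `me` in a row-major grid of `n` validators whose width is `ceil(sqrt(n))`.
pub fn grid_neighbors(me: usize, n: usize) -> (Vec<usize>, Vec<usize>) {
    assert!(me < n, "position must be inside the grid");
    // Smallest width such that width * width >= n, i.e. ceil(sqrt(n)).
    let mut width = 1;
    while width * width < n {
        width += 1;
    }
    let (row, col) = (me / width, me % width);

    // Same row: positions row*width .. (row+1)*width, clipped to n.
    let row_peers: Vec<usize> = (row * width..((row + 1) * width).min(n))
        .filter(|&i| i != me)
        .collect();
    // Same column: every position with the same column, stepping by the width.
    let col_peers: Vec<usize> = (col..n).step_by(width).filter(|&i| i != me).collect();
    (row_peers, col_peers)
}
```

With `n = 9`, position 4 has row peers `[3, 5]` and column peers `[1, 7]`: `2*(sqrt(9) - 1) = 4` grid neighbors, consistent with the `2*sqrt(n)` figure for Level 2 above.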
These aggression levels are chosen based on how long a block has taken to finalize: assignments and approvals related to
the unfinalized block will be propagated with more aggression. In particular, it's only the earliest unfinalized blocks
that aggression should be applied to, because descendants may be unfinalized only by virtue of being descendants.
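The level selection can be sketched as a function of how far finality lags. In this sketch the lag is measured in blocks between the best and the last finalized block, and the thresholds `L1_LAG` and `L2_LAG` are hypothetical placeholders, not values taken from the implementation. Only the oldest unfinalized height is escalated, following the reasoning above.

```rust
/// The three aggression levels described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Aggression {
    Level0,
    Level1,
    Level2,
}

// Hypothetical thresholds, in blocks of finality lag.
pub const L1_LAG: u32 = 8;
pub const L2_LAG: u32 = 16;

/// Aggression for messages about the block at `block_number`.
pub fn aggression_for(block_number: u32, finalized_number: u32, best_number: u32) -> Aggression {
    // Only the oldest unfinalized height is escalated: its descendants may be
    // unfinalized merely because they descend from it.
    if block_number != finalized_number.saturating_add(1) {
        return Aggression::Level0;
    }
    let lag = best_number.saturating_sub(finalized_number);
    if lag >= L2_LAG {
        Aggression::Level2
    } else if lag >= L1_LAG {
        Aggression::Level1
    } else {
        Aggression::Level0
    }
}
```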
## Protocol
Input:
* `ApprovalDistributionMessage::NewBlocks`
* `ApprovalDistributionMessage::DistributeAssignment`
* `ApprovalDistributionMessage::DistributeApproval`
* `ApprovalDistributionMessage::NetworkBridgeUpdate`
* `OverseerSignal::BlockFinalized`
Output:
* `ApprovalVotingMessage::ImportAssignment`
* `ApprovalVotingMessage::ImportApproval`
* `NetworkBridgeMessage::SendValidationMessage::ApprovalDistribution`
## Functionality
```rust
type BlockScopedCandidate = (Hash, CandidateHash);
enum PendingMessage {
Assignment(IndirectAssignmentCert, CoreIndex),
Approval(IndirectSignedApprovalVote),
}
/// The `State` struct is responsible for tracking the overall state of the subsystem.
///
/// It tracks metadata about our view of the unfinalized chain, which assignments and approvals we have seen, and our peers' views.
struct State {
// These two fields are used in conjunction to construct a view over the unfinalized chain.
blocks_by_number: BTreeMap<BlockNumber, Vec<Hash>>,
blocks: HashMap<Hash, BlockEntry>,
/// Our view updates to our peers can race with `NewBlocks` updates. We store messages received
/// against the directly mentioned blocks in our view in this map until `NewBlocks` is received.
///
/// As long as the parent is already in the `blocks` map and `NewBlocks` messages aren't delayed
/// by more than a block length, this strategy will work well for mitigating the race. This is
/// also a race that occurs typically on local networks.
    pending_known: HashMap<Hash, Vec<(PeerId, PendingMessage)>>,
// Peer view data is partially stored here, and partially inline within the `BlockEntry`s
peer_views: HashMap<PeerId, View>,
}
enum MessageFingerprint {
Assignment(Hash, u32, ValidatorIndex),
Approval(Hash, u32, ValidatorIndex),
}
struct Knowledge {
known_messages: HashSet<MessageFingerprint>,
}
struct PeerKnowledge {
/// The knowledge we've sent to the peer.
sent: Knowledge,
/// The knowledge we've received from the peer.
received: Knowledge,
}
/// Information about blocks in our current view as well as whether peers know of them.
struct BlockEntry {
// Peers who we know are aware of this block and thus, the candidates within it. This maps to their knowledge of messages.
known_by: HashMap<PeerId, PeerKnowledge>,
// The number of the block.
number: BlockNumber,
// The parent hash of the block.
parent_hash: Hash,
// Our knowledge of messages.
knowledge: Knowledge,
// A votes entry for each candidate.
candidates: IndexMap<CandidateHash, CandidateEntry>,
}
enum ApprovalState {
Assigned(AssignmentCert),
Approved(AssignmentCert, ApprovalSignature),
}
/// Information about candidates in the context of a particular block they are included in. In other words,
/// multiple `CandidateEntry`s may exist for the same candidate, if it is included by multiple blocks - this is likely the case
/// when there are forks.
struct CandidateEntry {
approvals: HashMap<ValidatorIndex, ApprovalState>,
}
```
### Network updates
#### `NetworkBridgeEvent::PeerConnected`
Add a blank view to the `peer_views` state.
#### `NetworkBridgeEvent::PeerDisconnected`
Remove the view under the associated `PeerId` from `State::peer_views`.
Iterate over every `BlockEntry` and remove the `PeerId` from its `known_by` map.
#### `NetworkBridgeEvent::OurViewChange`
Remove entries in `pending_known` for all hashes not present in the view. Ensure a vector is present in `pending_known`
for each hash in the view that does not have an entry in `blocks`.
#### `NetworkBridgeEvent::PeerViewChange`
Invoke `unify_with_peer(peer, view)` to catch them up to messages we have.
We also need to use the `view.finalized_number` to remove the `PeerId` from any blocks that it won't be wanting
information about anymore. Note that we have to be on guard for peers doing crazy stuff like jumping their
`finalized_number` forward 10 trillion blocks to try and get us stuck in a loop for ages.
One of the safeguards we can implement is to reject view updates from peers where the new `finalized_number` is less
than the previous.
We augment that by defining `constrain(x)` to clamp `x` to the range between the first and last block numbers in
`state.blocks_by_number`.
From there, we can loop backwards from `constrain(view.finalized_number)` until `constrain(last_view.finalized_number)`
is reached, removing the `PeerId` from all `BlockEntry`s referenced at that height. We can break the loop early if we
ever exit the bound supplied by the first block in `state.blocks_by_number`.
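A minimal sketch of this clean-up, assuming `u32` block numbers, `[u8; 32]` hashes and `u64` peer ids as stand-ins, and modelling each `BlockEntry::known_by` as a plain `HashSet` of peers (hypothetical simplifications). Because `BTreeMap::range` only visits heights that are actually stored, the cost of a forged `finalized_number` is bounded by the number of blocks we track.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

// Stand-in types for this sketch only.
type BlockNumber = u32;
type Hash = [u8; 32];
type PeerId = u64;

/// Clamp `x` to the range of block numbers we currently track, or `None` if we
/// track no blocks at all.
fn constrain(x: BlockNumber, blocks_by_number: &BTreeMap<BlockNumber, Vec<Hash>>) -> Option<BlockNumber> {
    let first = *blocks_by_number.keys().next()?;
    let last = *blocks_by_number.keys().next_back()?;
    Some(x.clamp(first, last))
}

/// Handle a peer's new `finalized_number`. `known_by` maps each block hash to the
/// peers that know it. Returns `false` if the update is rejected because the
/// peer's finalized number went backwards.
pub fn remove_peer_below_finalized(
    blocks_by_number: &BTreeMap<BlockNumber, Vec<Hash>>,
    known_by: &mut HashMap<Hash, HashSet<PeerId>>,
    peer: PeerId,
    last_finalized: BlockNumber,
    new_finalized: BlockNumber,
) -> bool {
    if new_finalized < last_finalized {
        return false;
    }
    let (Some(lo), Some(hi)) = (
        constrain(last_finalized, blocks_by_number),
        constrain(new_finalized, blocks_by_number),
    ) else {
        return true; // nothing stored, nothing to clean up
    };
    // Walk backwards over the stored heights only.
    for (_, hashes) in blocks_by_number.range(lo..=hi).rev() {
        for hash in hashes {
            if let Some(peers) = known_by.get_mut(hash) {
                peers.remove(&peer);
            }
        }
    }
    true
}
```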
#### `NetworkBridgeEvent::PeerMessage`
If the block hash referenced by the message exists in `pending_known`, add it to the vector of pending messages and
return.
If the message is of type `ApprovalDistributionV1Message::Assignment(assignment_cert, claimed_index)`, then call
`import_and_circulate_assignment(MessageSource::Peer(sender), assignment_cert, claimed_index)`.
If the message is of type `ApprovalDistributionV1Message::Approval(approval_vote)`, then call
`import_and_circulate_approval(MessageSource::Peer(sender), approval_vote)`.
### Subsystem Updates
#### `ApprovalDistributionMessage::NewBlocks`
Create a `BlockEntry` for each new block and a `CandidateEntry` for each of its candidates.
For all entries in `pending_known`:
* If there is now an entry under `blocks` for the block hash, drain all messages and import with
`import_and_circulate_assignment` and `import_and_circulate_approval`.
For all peers:
* Compute `view_intersection` as the intersection of the peer's view blocks with the hashes of the new blocks.
* Invoke `unify_with_peer(peer, view_intersection)`.
#### `ApprovalDistributionMessage::DistributeAssignment`
Call `import_and_circulate_assignment` with `MessageSource::Local`.
#### `ApprovalDistributionMessage::DistributeApproval`
Call `import_and_circulate_approval` with `MessageSource::Local`.
#### `OverseerSignal::BlockFinalized`
Prune all lists from `blocks_by_number` with number less than or equal to `finalized_number`. Prune all the
`BlockEntry`s referenced by those lists.
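Because `blocks_by_number` is a `BTreeMap`, this pruning can be done with a single `split_off`. A minimal sketch, with the generic `V` standing in for `BlockEntry` and `u32`/`[u8; 32]` as stand-in types for block numbers and hashes:

```rust
use std::collections::{BTreeMap, HashMap};

// Stand-in types for this sketch.
type BlockNumber = u32;
type Hash = [u8; 32];

/// Remove every height `<= finalized_number` from `blocks_by_number`, together
/// with the matching entries in `blocks` (`V` stands in for `BlockEntry`).
pub fn prune_finalized<V>(
    blocks_by_number: &mut BTreeMap<BlockNumber, Vec<Hash>>,
    blocks: &mut HashMap<Hash, V>,
    finalized_number: BlockNumber,
) {
    // `split_off(&k)` leaves keys `< k` in the map and returns keys `>= k`.
    let unfinalized = match finalized_number.checked_add(1) {
        Some(first_unfinalized) => blocks_by_number.split_off(&first_unfinalized),
        None => BTreeMap::new(), // every representable height is finalized
    };
    let finalized = std::mem::replace(blocks_by_number, unfinalized);
    for hash in finalized.into_values().flatten() {
        blocks.remove(&hash);
    }
}
```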
### Utility
```rust
enum MessageSource {
Peer(PeerId),
Local,
}
```
#### `import_and_circulate_assignment(...)`
`import_and_circulate_assignment(source: MessageSource, assignment: IndirectAssignmentCert, claimed_candidate_index:
CandidateIndex)`
Imports an assignment cert referenced by block hash and candidate index. As a postcondition, if the cert is valid, it
will have distributed the cert to all peers who have the block in their view, with the exclusion of the peer referenced
by the `MessageSource`.
We maintain a few invariants:
* we only send an assignment to a peer after we add its fingerprint to our knowledge
* we add a fingerprint of an assignment to our knowledge only if it's valid and hasn't been added before
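Both invariants follow if knowledge is a set and every send is gated on the fingerprint having been inserted first. A minimal sketch, with the fingerprint simplified to a `(block hash, candidate index, validator index)` tuple and a caller-supplied `send` closure standing in for the network bridge (both hypothetical simplifications):

```rust
use std::collections::HashSet;

/// Simplified fingerprint: (block hash, candidate index, validator index).
pub type Fingerprint = ([u8; 32], u32, u32);

#[derive(Default)]
pub struct Knowledge {
    known_messages: HashSet<Fingerprint>,
}

impl Knowledge {
    /// Returns `true` if the fingerprint was not known before.
    pub fn insert(&mut self, fp: Fingerprint) -> bool {
        self.known_messages.insert(fp)
    }

    pub fn contains(&self, fp: &Fingerprint) -> bool {
        self.known_messages.contains(fp)
    }
}

/// Record an assignment that the caller has already validated and forward it
/// to one peer. The fingerprint enters our knowledge at most once (second
/// invariant), and `send` is only called after that insertion (first
/// invariant). The peer's `sent` knowledge is updated too, so the message is
/// never sent to that peer twice. Returns `false` if we already knew it.
pub fn note_and_forward(
    ours: &mut Knowledge,
    peer_sent: &mut Knowledge,
    fp: Fingerprint,
    mut send: impl FnMut(Fingerprint),
) -> bool {
    if !ours.insert(fp) {
        return false;
    }
    if peer_sent.insert(fp) {
        send(fp);
    }
    true
}
```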
The algorithm is the following:
* Load the `BlockEntry` using `assignment.block_hash`. If it does not exist, report the source if it is
`MessageSource::Peer` and return.
* Compute a fingerprint for the `assignment` using `claimed_candidate_index`.
* If the source is `MessageSource::Peer(sender)`:
  * check if `sender` appears under `known_by` and whether the fingerprint is in the knowledge of the peer. If the peer
    does not know the block, report for providing data out-of-view and proceed. If the peer does know the block and
    its knowledge already contains the fingerprint, return: if the fingerprint is already in the `received` knowledge,
    first report the peer for providing replicate data; otherwise (we sent the message while the peer was sending it to
    us) insert it into the `received` knowledge. If the peer knows the block but not the fingerprint, proceed.
  * If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation
    boost, add the fingerprint to the peer's knowledge only if it knows about the block and return. Note that we must do
    this after checking for out-of-view and whether the peer knows about the block to avoid being spammed. If we did
    this check earlier, a peer could provide data out-of-view repeatedly and be rewarded for it.
* Check the assignment certificate is valid.
* If the cert kind is `RelayVRFModulo`, then the certificate is valid as long as `sample <
session_info.relay_vrf_samples` and the VRF is valid for the validator's key with the input
`block_entry.relay_vrf_story ++ sample.encode()` as described with
[the approvals protocol section](../../protocol-approval.md#assignment-criteria). We set
`core_index = vrf.make_bytes().to_u32() % session_info.n_cores`. If the `BlockEntry` causes
inclusion of a candidate at `core_index`, then this is a valid assignment for the candidate
at `core_index` and has delay tranche 0. Otherwise, it can be ignored.
* If the cert kind is `RelayVRFModuloCompact`, then the certificate is valid as long as the VRF
is valid for the validator's key with the input `block_entry.relay_vrf_story ++ relay_vrf_samples.encode()`
as described with [the approvals protocol section](../../protocol-approval.md#assignment-criteria).
We enforce that all `core_bitfield` indices are included in the set of the core indices sampled from the
      VRF output. The assignment is considered a valid tranche 0 assignment for all claimed candidates if all
      `core_bitfield` indices match the core indices where the claimed candidates were included.
* If the cert kind is `RelayVRFDelay`, then we check if the VRF is valid for the validator's key with the
input `block_entry.relay_vrf_story ++ cert.core_index.encode()` as described in [the approvals protocol
section](../../protocol-approval.md#assignment-criteria). The cert can be ignored if the block did not
cause inclusion of a candidate on that core index. Otherwise, this is a valid assignment for the included
      candidate. The delay tranche for the assignment is computed as
      `(vrf.make_bytes().to_u64() % (session_info.n_delay_tranches + session_info.zeroth_delay_tranche_width)).saturating_sub(session_info.zeroth_delay_tranche_width)`.
    * We also check that the core index derived from the VRF output is covered by the `VRFProof` by means of an auxiliary signature.
* If the delay tranche is too far in the future, return `AssignmentCheckResult::TooFarInFuture`.
* If the result is `AssignmentCheckResult::Accepted`
* Dispatch `ApprovalVotingMessage::ImportAssignment(assignment)` to approval-voting to import the assignment.
    * If the assignment was accepted and is not a duplicate, give the peer a positive reputation boost.
    * Add the fingerprint to both our and the peer's knowledge in the `BlockEntry`. Note that we only do this after
      making sure we have the right fingerprint.
* If the result is `AssignmentCheckResult::AcceptedDuplicate`, add the fingerprint to the peer's knowledge if it
knows about the block and return.
* If the result is `AssignmentCheckResult::TooFarInFuture`, mildly punish the peer and return.
* If the result is `AssignmentCheckResult::Bad`, punish the peer and return.
* If the source is `MessageSource::Local`:
  * check if the fingerprint appears under the `BlockEntry`'s knowledge. If not, add it.
* Load the candidate entry for the given candidate index. It should exist unless there is a logic error in the
approval voting subsystem.
* Set the approval state for the validator index to `ApprovalState::Assigned` unless the approval state is set
already. This should not happen as long as the approval voting subsystem instructs us to ignore duplicate
assignments.
* Dispatch an `ApprovalDistributionV1Message::Assignment(assignment, candidate_index)` to all peers in the
`BlockEntry`'s `known_by` set, excluding the peer in the `source`, if `source` has kind `MessageSource::Peer`. Add
the fingerprint of the assignment to the knowledge of each peer.
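The arithmetic in the `RelayVRFModulo` and `RelayVRFDelay` checks can be separated from the cryptography. In this sketch the VRF output is assumed to be already verified and reduced to an integer (`vrf_u32` and `vrf_u64` stand in for `vrf.make_bytes().to_u32()` and `.to_u64()`), and `SessionParams` is a hypothetical subset of `SessionInfo`; VRF and signature checks are not shown.

```rust
/// Hypothetical subset of `SessionInfo` needed for the arithmetic.
pub struct SessionParams {
    pub n_cores: u32,
    pub n_delay_tranches: u32,
    pub zeroth_delay_tranche_width: u32,
}

/// `RelayVRFModulo`: the core a verified sample maps to (tranche is always 0).
/// Panics if `n_cores` is 0.
pub fn modulo_core(vrf_u32: u32, s: &SessionParams) -> u32 {
    vrf_u32 % s.n_cores
}

/// `RelayVRFDelay`: the delay tranche for a verified VRF output.
pub fn delay_tranche(vrf_u64: u64, s: &SessionParams) -> u32 {
    let width = u64::from(s.zeroth_delay_tranche_width);
    let buckets = u64::from(s.n_delay_tranches) + width;
    let bucket = vrf_u64 % buckets;
    // Buckets 0..=width all collapse to tranche 0; each later bucket is its own
    // tranche. The result is < n_delay_tranches, so the cast cannot truncate.
    bucket.saturating_sub(width) as u32
}
```

`delay_tranche` maps the first `zeroth_delay_tranche_width + 1` buckets to tranche 0, so tranche 0 is wider than each later tranche, which covers a single bucket.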
#### `import_and_circulate_approval(source: MessageSource, approval: IndirectSignedApprovalVote)`
Imports an approval signature referenced by block hash and candidate index:
* Load the `BlockEntry` using `approval.block_hash` and the candidate entry using `approval.candidate_index`. If
  either does not exist, report the source if it is `MessageSource::Peer` and return.
* Compute a fingerprint for the approval.
* Compute a fingerprint for the corresponding assignment. If the `BlockEntry`'s knowledge does not contain that
fingerprint, then report the source if it is `MessageSource::Peer` and return. All references to a fingerprint after
this refer to the approval's, not the assignment's.
* If the source is `MessageSource::Peer(sender)`:
  * check if `sender` appears under `known_by` and whether the fingerprint is in the knowledge of the peer. If the peer
    does not know the block, report for providing data out-of-view and proceed. If the peer does know the block and
    its knowledge already contains the fingerprint, return: if the fingerprint is already in the `received` knowledge,
    first report the peer for providing replicate data; otherwise (we sent the message while the peer was sending it to
    us) insert it into the `received` knowledge. If the peer knows the block but not the fingerprint, proceed.
* If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation
boost, add the fingerprint to the peer's knowledge only if it knows about the block and return. Note that we must do
this after checking for out-of-view to avoid being spammed. If we did this check earlier, a peer could provide data
out-of-view repeatedly and be rewarded for it.
  * Construct a `SignedApprovalVote` using the candidate hashes and check it against the validator's approval key,
    based on the session info of the block. If it is invalid or there is no such validator, return `Err(InvalidVoteError)`.
  * If the result of checking the signature is `Ok(CheckedIndirectSignedApprovalVote)`:
    * Dispatch `ApprovalVotingMessage::ImportApproval(approval)`.
* Give the peer a positive reputation boost and add the fingerprint to both our and the peer's knowledge.
* If the result is `Err(InvalidVoteError)`:
* Report the peer and return.
* Load the candidate entry for the given candidate index. It should exist unless there is a logic error in the
approval voting subsystem.
* Set the approval state for the validator index to `ApprovalState::Approved`. It should already be in the `Assigned`
state as our `BlockEntry` knowledge contains a fingerprint for the assignment.
* Dispatch an `ApprovalDistributionV1Message::Approval(approval)` to all peers in the `BlockEntry`'s `known_by` set,
  excluding the peer in the `source`, if `source` has kind `MessageSource::Peer`. Add the fingerprint of the
  approval to the knowledge of each peer. Note that this obeys the politeness conditions:
* We guarantee elsewhere that all peers within `known_by` are aware of all assignments relative to the block.
* We've checked that this specific approval has a corresponding assignment within the `BlockEntry`.
* Thus, all peers are aware of the assignment or have a message to them in-flight which will make them so.
#### `unify_with_peer(peer: PeerId, view)`
1. Initialize a set `missing_knowledge = {}`
For each block in the view:
1. Load the `BlockEntry` for the block. If the block is unknown, or the number is less than or equal to the view's
finalized number go to step 6.
1. Inspect the `known_by` set of the `BlockEntry`. If the peer already knows all assignments/approvals, go to step 6.
1. Add the peer to `known_by` and add the hash and missing knowledge of the block to `missing_knowledge`.
1. Return to step 2 with the ancestor of the block.
1. For each block in `missing_knowledge`, send all assignments and approvals for all candidates in those blocks to the
peer.
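A minimal sketch of the ancestry walk above, using `u64` hashes and `u32` peer ids as stand-ins and simplifying "the peer already knows all assignments/approvals" to "the peer is already in `known_by`" (a hypothetical simplification; the real knowledge is tracked per message). It returns the blocks whose messages still need to be sent.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64; // stand-in for a 32-byte block hash
type PeerId = u32; // stand-in for a network peer id

pub struct Entry {
    pub number: u32,
    pub parent_hash: Hash,
    pub known_by: HashSet<PeerId>,
}

pub fn unify_with_peer(
    blocks: &mut HashMap<Hash, Entry>,
    peer: PeerId,
    view_heads: &[Hash],
    view_finalized: u32,
) -> Vec<Hash> {
    let mut missing = Vec::new();
    for &head in view_heads {
        let mut cursor = head;
        // Walk towards genesis until the block is unknown, finalized in the
        // peer's view, or already known by the peer.
        while let Some(entry) = blocks.get_mut(&cursor) {
            if entry.number <= view_finalized || !entry.known_by.insert(peer) {
                break;
            }
            missing.push(cursor);
            cursor = entry.parent_hash;
        }
    }
    missing
}
```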
@@ -0,0 +1,30 @@
# Approval voting parallel
The approval-voting-parallel subsystem acts as an orchestrator for the tasks handled by the [Approval Voting](approval-voting.md)
and [Approval Distribution](approval-distribution.md) subsystems. Initially, these two systems operated separately and interacted
with each other and other subsystems through orchestra.
With approval-voting-parallel, we have a single subsystem that creates two types of workers:
- Four approval-distribution workers that operate in parallel, each handling tasks based on the validator_index of the message
originator.
- One approval-voting worker that performs the tasks previously managed by the standalone approval-voting subsystem.
This subsystem does not maintain any state. Instead, it functions as an orchestrator that:
- Spawns and initializes each worker.
- Forwards each message and signal to the appropriate worker.
- Aggregates results for messages that require input from more than one worker, such as GetApprovalSignatures.
## Forwarding logic
The messages received and forwarded by approval-voting-parallel fall into four categories:
- Signals, which need to be forwarded to all workers.
- Messages that only the `approval-voting` worker needs to handle: `ApprovalVotingParallelMessage::ApprovedAncestor`
  and `ApprovalVotingParallelMessage::GetApprovalSignaturesForCandidate`.
- Control messages that all `approval-distribution` workers need to receive: `ApprovalVotingParallelMessage::NewBlocks`,
  `ApprovalVotingParallelMessage::ApprovalCheckingLagUpdate`, and all `ApprovalVotingParallelMessage::NetworkBridgeUpdate`
  variants except `ApprovalVotingParallelMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage)`.
- Data messages, `ApprovalVotingParallelMessage::NetworkBridgeUpdate(NetworkBridgeEvent::PeerMessage)`, which need to be
  sent to just a single `approval-distribution` worker based on the `ValidatorIndex`. The logic for assigning the work is:
```
assigned_worker_index = validator_index % number_of_workers;
```
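A minimal sketch of these forwarding rules, with `Msg` as a hypothetical, payload-free stand-in for `ApprovalVotingParallelMessage` and the worker count taken from the four approval-distribution workers described above:

```rust
/// The four approval-distribution workers described above.
pub const NUM_DISTRIBUTION_WORKERS: usize = 4;

/// Destination of a message, one variant per category above.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    AllWorkers,          // overseer signals
    ApprovalVoting,      // approval-voting-only requests
    AllDistribution,     // control messages for every distribution worker
    Distribution(usize), // peer data, sharded by validator index
}

/// Hypothetical, payload-free stand-in for `ApprovalVotingParallelMessage`.
pub enum Msg {
    Signal,
    ApprovedAncestor,
    GetApprovalSignaturesForCandidate,
    NewBlocks,
    ApprovalCheckingLagUpdate,
    /// Any `NetworkBridgeUpdate` other than `PeerMessage`.
    NetworkBridgeControl,
    /// `NetworkBridgeUpdate(PeerMessage)` originated by `validator_index`.
    PeerMessage { validator_index: u32 },
}

pub fn route(msg: &Msg) -> Route {
    match msg {
        Msg::Signal => Route::AllWorkers,
        Msg::ApprovedAncestor | Msg::GetApprovalSignaturesForCandidate => Route::ApprovalVoting,
        Msg::NewBlocks | Msg::ApprovalCheckingLagUpdate | Msg::NetworkBridgeControl => {
            Route::AllDistribution
        }
        // assigned_worker_index = validator_index % number_of_workers
        Msg::PeerMessage { validator_index } => {
            Route::Distribution(*validator_index as usize % NUM_DISTRIBUTION_WORKERS)
        }
    }
}
```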
@@ -0,0 +1,531 @@
# Approval Voting
Reading the [section on the approval protocol](../../protocol-approval.md) will likely be necessary to understand the
aims of this subsystem.
Approval votes are split into two parts: Assignments and Approvals. Validators first broadcast their assignment to
indicate intent to check a candidate. Upon successfully checking, they don't send the vote immediately; instead, they
queue it for up to `MAX_APPROVAL_COALESCE_WAIT_TICKS` so that the validator has the opportunity to vote for more than
one candidate at once. Once `MAX_APPROVAL_COALESCE_WAIT_TICKS` have passed, or at least `MAX_APPROVAL_COALESCE_COUNT`
candidates are ready, they broadcast a single approval vote for all of them. If a validator
doesn't broadcast their approval vote shortly after issuing an assignment, this is an indication that they are
being prevented from recovering or validating the block data and that more validators should self-select to
check the candidate. This is known as a "no-show".
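A minimal sketch of the coalescing decision. The values of `MAX_APPROVAL_COALESCE_WAIT_TICKS` and `MAX_APPROVAL_COALESCE_COUNT` below are illustrative only, and `PendingApprovals` is a hypothetical type; `Tick` is a `u64` as in the schema further down.

```rust
// Illustrative values only, not the implementation's constants.
pub const MAX_APPROVAL_COALESCE_WAIT_TICKS: u64 = 2;
pub const MAX_APPROVAL_COALESCE_COUNT: usize = 6;

type Tick = u64;

/// Candidates we have checked but not yet signed for, plus the tick at which
/// the oldest of them became ready.
#[derive(Default)]
pub struct PendingApprovals {
    candidates: Vec<u32>, // candidate indices
    first_ready_at: Option<Tick>,
}

impl PendingApprovals {
    pub fn push(&mut self, candidate_index: u32, now: Tick) {
        self.first_ready_at.get_or_insert(now);
        self.candidates.push(candidate_index);
    }

    /// If the wait has expired or enough candidates are queued, drain them so a
    /// single approval vote can be signed over all of them.
    pub fn take_if_ready(&mut self, now: Tick) -> Option<Vec<u32>> {
        let first = self.first_ready_at?;
        let waited_long_enough = now.saturating_sub(first) >= MAX_APPROVAL_COALESCE_WAIT_TICKS;
        if waited_long_enough || self.candidates.len() >= MAX_APPROVAL_COALESCE_COUNT {
            self.first_ready_at = None;
            Some(std::mem::take(&mut self.candidates))
        } else {
            None
        }
    }
}
```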
The core of this subsystem is a Tick-based timer loop, where Ticks are 500ms. We also reason about time in terms of
`DelayTranche`s, which measure the number of ticks elapsed since a block was produced. We track metadata for all
un-finalized but included candidates. We compute our local assignments to check each candidate, as well as which
`DelayTranche` those assignments may be minimally triggered at. As the same candidate may appear in more than one block,
we must produce our potential assignments for each (Block, Candidate) pair. The timing loop is based on waiting for
assignments to become no-shows or waiting to broadcast and begin our own assignment to check.
Another main component of this subsystem is the logic for determining when a (Block, Candidate) pair has been approved
and when to broadcast and trigger our own assignment. Once a (Block, Candidate) pair has been approved, we mark a
corresponding bit in the `BlockEntry` that indicates the candidate has been approved under the block. When we trigger
our own assignment, we broadcast it via Approval Distribution, begin fetching the data from Availability Recovery, and
then pass it through to the Candidate Validation. Once these steps are successful, we issue our approval vote. If any of
these steps fail, we don't issue any vote and will "no-show" from the perspective of other validators. In addition, a
dispute is raised via the dispute-coordinator by sending `IssueLocalStatement`.
Where this all fits into Pezkuwi is via block finality. Our goal is to not finalize any block containing a candidate
that is not approved. We provide a hook for a custom GRANDPA voting rule - GRANDPA makes requests of the form (target,
minimum) consisting of a target block (i.e. longest chain) that it would like to finalize, and a minimum block which,
due to the rules of GRANDPA, must be voted on. The minimum is typically the last finalized block, but may be beyond it,
in the case of having a last-round-estimate beyond the last finalized. Thus, our goal is to inform GRANDPA of some block
between target and minimum which we believe can be finalized safely. We do this by iterating backwards from the target
to the minimum and finding the longest continuous chain from minimum where all candidates included by those blocks have
been approved.
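The backwards walk can be sketched as a single pass over the ancestry of `target`, ordered from `target` down to (but excluding) `minimum`, where each entry records whether every candidate included by that block is approved. The slice-based input is an assumption for illustration; the subsystem obtains this information from its database.

```rust
/// `ancestry` runs from `target` backwards to the child of `minimum`; each
/// entry is (block hash, all candidates approved?). Returns the highest block
/// that heads an unbroken run of approved blocks starting just above `minimum`,
/// or `minimum` itself if there is none.
pub fn approved_ancestor<H: Copy>(minimum: H, ancestry: &[(H, bool)]) -> H {
    let mut best: Option<H> = None;
    for &(hash, approved) in ancestry {
        if !approved {
            // Nothing above an unapproved block can be finalized.
            best = None;
        } else if best.is_none() {
            // Highest block of the current run of approved blocks.
            best = Some(hash);
        }
    }
    best.unwrap_or(minimum)
}
```

For example, with ancestry `[(5, true), (4, false), (3, true), (2, true)]` and `minimum = 1`, block 4 breaks the chain, so the result is 3.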
## Protocol
Input:
* `ApprovalVotingMessage::ImportAssignment`
* `ApprovalVotingMessage::ImportApproval`
* `ApprovalVotingMessage::ApprovedAncestor`
Output:
* `ApprovalDistributionMessage::DistributeAssignment`
* `ApprovalDistributionMessage::DistributeApproval`
* `RuntimeApiMessage::Request`
* `ChainApiMessage`
* `AvailabilityRecoveryMessage::Recover`
* `CandidateExecutionMessage::ValidateFromExhaustive`
## Functionality
The approval voting subsystem is responsible for casting votes and determining approval of candidates and as a result,
blocks.
This subsystem wraps a database which is used to store metadata about unfinalized blocks and the candidates within them.
Candidates may appear in multiple blocks, and assignment criteria are chosen differently based on the hash of the block
they appear in.
## Database Schema
The database schema is designed with the following goals in mind:
1. To provide an easy index from unfinalized blocks to candidates
1. To provide a lookup from candidate hash to approval status
1. To be easy to clear on start-up. What has happened while we were offline is unimportant.
1. To be fast to clear entries outdated by finality
Structs:
```rust
struct TrancheEntry {
tranche: DelayTranche,
// assigned validators who have not yet approved, and the instant we received
// their assignment.
assignments: Vec<(ValidatorIndex, Tick)>,
}
pub struct OurAssignment {
/// Our assignment certificate.
cert: AssignmentCertV2,
/// The tranche for which the assignment refers to.
tranche: DelayTranche,
/// Our validator index for the session in which the candidates were included.
validator_index: ValidatorIndex,
/// Whether the assignment has been triggered already.
triggered: bool,
}
pub struct ApprovalEntry {
tranches: Vec<TrancheEntry>, // sorted ascending by tranche number.
backing_group: GroupIndex,
our_assignment: Option<OurAssignment>,
our_approval_sig: Option<ValidatorSignature>,
assigned_validators: Bitfield, // `n_validators` bits.
approved: bool,
}
struct CandidateEntry {
candidate: CandidateReceipt,
session: SessionIndex,
// Assignments are based on blocks, so we need to track assignments separately
// based on the block we are looking at.
block_assignments: HashMap<Hash, ApprovalEntry>,
approvals: Bitfield, // n_validators bits
}
struct BlockEntry {
block_hash: Hash,
session: SessionIndex,
slot: Slot,
// random bytes derived from the VRF submitted within the block by the block
// author as a credential and used as input to approval assignment criteria.
relay_vrf_story: [u8; 32],
// The candidates included as-of this block and the index of the core they are
// leaving. Sorted ascending by core index.
candidates: Vec<(CoreIndex, Hash)>,
// A bitfield where the i'th bit corresponds to the i'th candidate in `candidates`.
// The i'th bit is `true` iff the candidate has been approved in the context of
    // this block. The block can be considered approved if all bits are set to 1.
approved_bitfield: Bitfield,
children: Vec<Hash>,
    // Candidates we have checked but for which we have not yet signed and
    // advertised an approval vote.
candidates_pending_signature: BTreeMap<CandidateIndex, CandidateSigningContext>,
// Assignments we already distributed. A 1 bit means the candidate index for which
// we already have sent out an assignment. We need this to avoid distributing
// multiple core assignments more than once.
distributed_assignments: Bitfield,
}
// slot_duration * 2 + DelayTranche gives the number of delay tranches since the
// unix epoch.
type Tick = u64;
struct StoredBlockRange(BlockNumber, BlockNumber);
```
In the schema, we map
```
"StoredBlocks" => StoredBlockRange
BlockNumber => Vec<BlockHash>
BlockHash => BlockEntry
CandidateHash => CandidateEntry
```
## Logic
```rust
const APPROVAL_SESSIONS: SessionIndex = 6;
// The minimum amount of ticks that an assignment must have been known for.
const APPROVAL_DELAY: Tick = 2;
```
In-memory state:
```rust
struct ApprovalVoteRequest {
validator_index: ValidatorIndex,
block_hash: Hash,
candidate_index: CandidateIndex,
}
// Requests that background work (approval voting tasks) may need to make of the main subsystem
// task.
enum BackgroundRequest {
ApprovalVote(ApprovalVoteRequest),
// .. others, unspecified as per implementation.
}
// This is the general state of the subsystem. The actual implementation may split this
// into further pieces.
struct State {
earliest_session: SessionIndex,
session_info: Vec<SessionInfo>,
babe_epoch: Option<BabeEpoch>, // information about a cached BABE epoch.
keystore: Keystore,
    // A scheduler which keeps at most one wakeup per (block hash, candidate hash)
    // pair and maps such pairs to `Tick`s.
wakeups: Wakeups,
// These are connected to each other.
background_tx: mpsc::Sender<BackgroundRequest>,
background_rx: mpsc::Receiver<BackgroundRequest>,
}
```
This guide section makes no explicit references to writes to or reads from disk. Instead, it handles them implicitly,
with the understanding that updates to block, candidate, and approval entries are persisted to disk.
[`SessionInfo`](../../runtime/session_info.md)
On start-up, we clear everything currently stored by the database. This is done by loading the `StoredBlockRange`,
iterating through each block number, iterating through each block hash, and iterating through each candidate referenced
by each block. Although this is `O(o*n*p)`, where `o`, `n` and `p` are the numbers of block heights, blocks per height and
candidates per block, we don't expect to have more than a few unfinalized blocks at any time and, in extreme cases, a few
thousand. The clearing operation should be relatively fast as a result.
Main loop:
* Each iteration, select over all of
* The next `Tick` in `wakeups`: trigger `wakeup_process` for each `(Hash, Hash)` pair scheduled under the `Tick` and
then remove all entries under the `Tick`.
* The next message from the overseer: handle the message as described in the [Incoming Messages
section](#incoming-messages)
* The next approval vote request from `background_rx`
* If this is an `ApprovalVoteRequest`, [Issue an approval vote](#issue-approval-vote).
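A minimal sketch of a `Wakeups` scheduler with at most one entry per (block hash, candidate hash) pair, driving the first branch of the loop above. When a pair is rescheduled, this sketch keeps the earlier tick, which is an assumption about the policy; field and method names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

type Tick = u64;
type Hash = [u8; 32];

#[derive(Default)]
pub struct Wakeups {
    by_tick: BTreeMap<Tick, HashSet<(Hash, Hash)>>,
    by_pair: HashMap<(Hash, Hash), Tick>,
}

impl Wakeups {
    /// Schedule a wakeup, keeping only the earliest one per pair.
    pub fn schedule(&mut self, block: Hash, candidate: Hash, tick: Tick) {
        let pair = (block, candidate);
        if let Some(&existing) = self.by_pair.get(&pair) {
            if existing <= tick {
                return; // an earlier (or equal) wakeup is already pending
            }
            if let Some(set) = self.by_tick.get_mut(&existing) {
                set.remove(&pair);
                if set.is_empty() {
                    self.by_tick.remove(&existing);
                }
            }
        }
        self.by_pair.insert(pair, tick);
        self.by_tick.entry(tick).or_default().insert(pair);
    }

    /// The tick the main loop should sleep until, if any.
    pub fn next_tick(&self) -> Option<Tick> {
        self.by_tick.keys().next().copied()
    }

    /// Remove and return all pairs scheduled at or before `now`.
    pub fn drain_due(&mut self, now: Tick) -> Vec<(Hash, Hash)> {
        let later = match now.checked_add(1) {
            Some(t) => self.by_tick.split_off(&t),
            None => BTreeMap::new(),
        };
        let due = std::mem::replace(&mut self.by_tick, later);
        let pairs: Vec<(Hash, Hash)> = due.into_values().flatten().collect();
        for p in &pairs {
            self.by_pair.remove(p);
        }
        pairs
    }
}
```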
### Incoming Messages
#### `OverseerSignal::BlockFinalized`
On receiving an `OverseerSignal::BlockFinalized(h)`, we fetch the block number `b` of that block from the `ChainApi`
subsystem. We update our `StoredBlockRange` to begin at `b+1`. Additionally, we remove all block entries and candidates
referenced by them up to and including `b`. Lastly, we transitively prune the forks that do not descend from `h`: when we
remove a `BlockEntry` with number `b` that is not equal to `h`, we recursively delete all the `BlockEntry`s referenced as
its children. We remove the `block_assignments` entry for the block hash and if `block_assignments` is now empty, remove the
`CandidateEntry`. We also update each of the `BlockNumber -> Vec<Hash>` keys in the database to reflect the blocks at
that height, clearing if empty.
#### `OverseerSignal::ActiveLeavesUpdate`
On receiving an `OverseerSignal::ActiveLeavesUpdate(update)`:
* We determine the set of new blocks that were not in our previous view. This is done by querying the ancestry of all
new items in the view and contrasting against the stored `BlockNumber`s. Typically, there will be only one new
block. We fetch the headers and information on these blocks from the `ChainApi` subsystem. Stale leaves in the
update can be ignored.
* We update the `StoredBlockRange` and the `BlockNumber` maps.
* We use the `RuntimeApiSubsystem` to determine information about these blocks. It is generally safe to assume that
runtime state is available for recent, unfinalized blocks. In the case that it isn't, it means that we are catching
up to the head of the chain and needn't worry about assignments to those blocks anyway, as the security assumption
of the protocol tolerates nodes being temporarily offline or out-of-date.
* We fetch the set of candidates included by each block by dispatching a `RuntimeApiRequest::CandidateEvents` and
checking the `CandidateIncluded` events.
* We fetch the session of the block by dispatching a `session_index_for_child` request with the parent-hash of the
block.
  * If `session_index - APPROVAL_SESSIONS > state.earliest_session`, then bump `state.earliest_session` to that
    value and prune earlier sessions.
  * If the session isn't in our `state.session_info`, load the session info for it and for all sessions since the
    earliest session, including the earliest session if it is missing. It can be missing just after pruning if we
    have made a big jump forward, as is the case when we have just finished chain synchronization.
* If any of the runtime API calls fail, we just warn and skip the block.
* We use the `RuntimeApiSubsystem` to determine the set of candidates included in these blocks and use BABE logic to
determine the slot number and VRF of the blocks.
* We also note how late we appear to have received the block. We create a `BlockEntry` for each block and a
`CandidateEntry` for each candidate obtained from `CandidateIncluded` events after making a
`RuntimeApiRequest::CandidateEvents` request.
* For each candidate, if the number of needed approvals is greater than the number of validators remaining after the
  candidate's backing group is subtracted, the candidate is insta-approved, as approval would be impossible otherwise.
  If all candidates in the block are insta-approved, or there are no candidates in the block, then the block is
  insta-approved. If the block is insta-approved, a [`ChainSelectionMessage::Approved`][CSM] should be sent for the
  block.
* Ensure that the `CandidateEntry` contains a `block_assignments` entry for the block, with the correct backing group
set.
* If we are a validator in this session, compute and assign `our_assignment` for the `block_assignments`:
  * Only if we are not a member of the backing group.
  * Run `RelayVRFModulo` and `RelayVRFDelay` according to the [approvals protocol
    section](../../protocol-approval.md#assignment-criteria). Ensure that the assigned core derived from the output is
    covered by the auxiliary signature aggregated in the `VRFProof`.
* [Handle Wakeup](#handling-wakeup) for each new candidate in each new block. This will automatically broadcast a
  0-tranche assignment, kick off approval work, and schedule the next delay.
* Dispatch an `ApprovalDistributionMessage::NewBlocks` with the meta information filled out for each new block.
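Two of the arithmetic steps above, bumping `earliest_session` and the insta-approval rule, can be sketched as follows. This is a minimal illustration and not the implementation. The `u32` session type and the `APPROVAL_SESSIONS` value of 6 are assumptions. The subtraction saturates at zero so that the earliest sessions cannot underflow.

```rust
/// Session index type, assumed to be `u32` for this sketch.
type SessionIndex = u32;

/// Hypothetical window size; the real `APPROVAL_SESSIONS` value is defined by the implementation.
const APPROVAL_SESSIONS: SessionIndex = 6;

/// Returns the new `earliest_session` when `session_index - APPROVAL_SESSIONS` has moved past it.
/// Subtraction saturates at zero so early sessions never underflow.
fn bump_earliest_session(session_index: SessionIndex, earliest_session: SessionIndex) -> Option<SessionIndex> {
    let lower_bound = session_index.saturating_sub(APPROVAL_SESSIONS);
    if lower_bound > earliest_session {
        Some(lower_bound)
    } else {
        None
    }
}

/// A candidate is insta-approved when the validators outside its backing group
/// are too few to ever supply `needed_approvals` votes.
fn is_insta_approved(needed_approvals: usize, n_validators: usize, backing_group_size: usize) -> bool {
    needed_approvals > n_validators.saturating_sub(backing_group_size)
}

/// A block is insta-approved if every candidate is, which includes the case of no candidates.
fn is_block_insta_approved(candidates: &[bool]) -> bool {
    candidates.iter().all(|&insta_approved| insta_approved)
}
```

For example, with 6 validators and a backing group of 3, a candidate needing 5 approvals is insta-approved because only 3 validators could ever check it.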
#### `ApprovalVotingMessage::ImportAssignment`
On receiving an `ApprovalVotingMessage::ImportAssignment` message, we assume that the assignment cert itself has
already been checked for validity, and proceed to import the assignment into the block entry. The cert itself contains
the information necessary to determine the candidate that is being assigned to. In detail:
* Load the `BlockEntry` for the relay-parent referenced by the message. If there is none, return
`AssignmentCheckResult::Bad`.
* Fetch the `SessionInfo` for the session of the block.
* Determine the assignment key of the validator based on that.
* Determine the claimed core index by looking up the candidate with the given index in `block_entry.candidates`.
  Return `AssignmentCheckResult::Bad` if missing.
* Import the assignment.
  * Load the candidate in question and access the `approval_entry` for the block hash the cert references.
  * Ignore if we already observe the validator as having been assigned.
  * Ensure the validator index is not part of the backing group for the candidate.
  * Ensure the validator index is not present in the approval entry already.
  * Create a tranche entry for the delay tranche in the approval entry and note the assignment within it.
  * Note the candidate index within the approval entry.
* [Schedule a wakeup](#schedule-wakeup) for this block, candidate pair.
* Return the appropriate `AssignmentCheckResult` on the response channel.
#### `ApprovalVotingMessage::ImportApproval`
On receiving an `ImportApproval(indirect_approval_vote, response_channel)` message:
* Fetch the `BlockEntry` from the indirect approval vote's `block_hash`. If none, return `ApprovalCheckResult::Bad`.
* Fetch every `CandidateEntry` referenced by the indirect approval vote's `candidate_indices`. If the block did not
  include enough candidates to cover all of those indices, return `ApprovalCheckResult::Bad`.
* Send `ApprovalCheckResult::Accepted`
* [Import the checked approval vote](#import-checked-approval) for all candidates
#### `ApprovalVotingMessage::ApprovedAncestor`
On receiving an `ApprovedAncestor(Hash, BlockNumber, response_channel)`:
* Iterate over the ancestry of the hash, starting from the provided block hash and going all the way back to the given
  block number. Load the `CandidateHash`es from each block entry.
* Keep track of an `all_approved_max: Option<(Hash, BlockNumber, Vec<(Hash, Vec<CandidateHash>)>)>`.
* For each block hash encountered, load the associated `BlockEntry`. If any are not found, return `None` on the
  response channel and conclude.
* If the block entry's `approved_bitfield` has all bits set to 1 and `all_approved_max == None`, set
  `all_approved_max = Some((current_hash, current_number, Vec::new()))`.
* If the block entry's `approved_bitfield` has any 0 bits, set `all_approved_max = None`.
* If `all_approved_max` is `Some`, push the current block hash and candidate hashes onto its list of blocks and
  candidates.
* After iterating all ancestry, return `all_approved_max`.
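The sketch below shows a minimal version of the `ApprovedAncestor` walk, with a `HashMap` standing in for the database. The `u64`/`u32` hash, number and candidate types are hypothetical simplifications. The sketch also assumes the walk stops just above the given block number, so the given block itself is not inspected.

```rust
use std::collections::HashMap;

// Simplified, hypothetical stand-ins for the real hash and number types.
type Hash = u64;
type BlockNumber = u32;
type CandidateHash = u64;

struct BlockEntry {
    number: BlockNumber,
    parent_hash: Hash,
    candidates: Vec<CandidateHash>,
    /// One bit per candidate: `true` once that candidate is approved under this block.
    approved_bitfield: Vec<bool>,
}

/// `(hash, number, [(block_hash, candidate_hashes)])` of the highest fully approved block.
type AllApprovedMax = (Hash, BlockNumber, Vec<(Hash, Vec<CandidateHash>)>);

/// Walks from `head` down to (but excluding) `lower_bound`, returning the highest block
/// whose entire ancestry in that range is approved.
fn approved_ancestor(
    db: &HashMap<Hash, BlockEntry>,
    head: Hash,
    lower_bound: BlockNumber,
) -> Option<AllApprovedMax> {
    let mut all_approved_max: Option<AllApprovedMax> = None;
    let mut current = head;
    loop {
        // A missing block entry means we cannot answer at all.
        let entry = db.get(&current)?;
        if entry.number <= lower_bound {
            break;
        }
        if entry.approved_bitfield.iter().all(|&bit| bit) {
            // The first approved block after an unapproved stretch becomes the new maximum.
            let max = all_approved_max.get_or_insert_with(|| (current, entry.number, Vec::new()));
            max.2.push((current, entry.candidates.clone()));
        } else {
            // Any unapproved block invalidates everything above it.
            all_approved_max = None;
        }
        if entry.number == lower_bound + 1 {
            break;
        }
        current = entry.parent_hash;
    }
    all_approved_max
}
```

The walk runs from the head downwards, so any unapproved block resets the result. What remains at the end is the highest block whose ancestry down to the bound is fully approved.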
### Updates and Auxiliary Logic
#### Import Checked Approval
* Import an approval vote which we can assume has passed signature checks and corresponds to an imported assignment.
* Requires `(BlockEntry, CandidateEntry, ValidatorIndex)`
* Set the corresponding bit of the `approvals` bitfield in the `CandidateEntry` to `1`. If already `1`, return.
* Check the approval state of the candidate under the specific block, and update the block and candidate entries
  accordingly:
  * Check the `ApprovalEntry` for the block.
  * [Determine the tranches to inspect](#determine-required-tranches) of the candidate.
  * If [the candidate is approved under the block](#check-approval), set the corresponding bit in the
    `block_entry.approved_bitfield`.
    * If the block is now fully approved and was not before, send a [`ChainSelectionMessage::Approved`][CSM].
  * Otherwise, [schedule a wakeup of the candidate](#schedule-wakeup).
* If the approval vote originates locally, set the `our_approval_sig` in the candidate entry.
#### Handling Wakeup
* Handle a previously-scheduled wakeup of a candidate under a specific block.
* Requires `(relay_block, candidate_hash)`
* Load the `BlockEntry` and `CandidateEntry` from disk. If either is not present, we may have lost a race with
  finality and the wakeup can be ignored. Also load the `ApprovalEntry` for the block and candidate.
* [Determine the `RequiredTranches` of the candidate](#determine-required-tranches).
* Determine if we should trigger our assignment.
  * If we've already triggered or `OurAssignment` is `None`, we do not trigger.
  * If we have `RequiredTranches::All`, then we trigger if the candidate is [not approved](#check-approval). We have
    no next wakeup as we assume that other validators are doing the same and we will be implicitly woken up by
    handling new votes.
  * If we have `RequiredTranches::Pending { considered, next_no_show, maximum_broadcast, clock_drift }`, then we
    trigger if our assignment's tranche is less than or equal to `maximum_broadcast` and the current tick, with
    `clock_drift` applied, is at least the tick of our tranche.
  * If we have `RequiredTranches::Exact { .. }` then we do not trigger, because this value indicates that no new
    assignments are needed at the moment.
* If we should trigger our assignment:
  * Import the assignment to the `ApprovalEntry`.
  * Broadcast it on the network with an `ApprovalDistributionMessage::DistributeAssignment`.
  * [Launch approval work](#launch-approval-work) for the candidate.
* [Schedule a new wakeup](#schedule-wakeup) of the candidate.
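The trigger decision can be sketched as a pure function. The simplified `RequiredTranches` here keeps only the fields the decision reads. The sketch also assumes each tranche lasts exactly one tick, so the tick of tranche `t` is `block_tick + t`. Both are assumptions for illustration.

```rust
type DelayTranche = u32;
type Tick = u64;

/// Simplified, hypothetical version of `RequiredTranches` with only the fields used here.
enum RequiredTranches {
    All,
    Pending { maximum_broadcast: DelayTranche, clock_drift: Tick },
    Exact,
}

/// Decides whether to trigger our own assignment. `our_tranche` is `None` when we have no assignment.
fn should_trigger(
    required: &RequiredTranches,
    our_tranche: Option<DelayTranche>,
    already_triggered: bool,
    candidate_approved: bool,
    tick_now: Tick,
    block_tick: Tick,
) -> bool {
    let tranche = match our_tranche {
        Some(t) if !already_triggered => t,
        // Already triggered, or nothing to trigger.
        _ => return false,
    };
    match *required {
        RequiredTranches::All => !candidate_approved,
        RequiredTranches::Pending { maximum_broadcast, clock_drift } => {
            // Assumes one tick per tranche.
            let our_tick = block_tick + tranche as Tick;
            // Treat the local clock as `clock_drift` ticks earlier.
            tranche <= maximum_broadcast && tick_now.saturating_sub(clock_drift) >= our_tick
        }
        // No new assignments are needed.
        RequiredTranches::Exact => false,
    }
}
```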
#### Schedule Wakeup
* Requires `(approval_entry, candidate_entry)` which effectively denotes a `(Block Hash, Candidate Hash)` pair - the
candidate, along with the block it appears in.
* Also requires `RequiredTranches`
* If the `approval_entry` is approved, it doesn't need to be woken up again.
* If `RequiredTranches::All`, there is no wakeup. We assume other incoming votes will trigger a wakeup and potentially
  re-schedule.
* If `RequiredTranches::Pending { considered, next_no_show, maximum_broadcast, clock_drift }`, schedule a wakeup at
  the earlier of the next no-show tick and the tick, offset positively by `clock_drift`, of the next non-empty tranche
  after `considered` that we are aware of, including any tranche containing our own unbroadcast assignment. This can
  lead to no wakeup in the case that we have already broadcast our assignment and there are no pending no-shows; that
  is, we have approval votes for every assignment we've received that is not already a no-show. In this case, we will
  be re-triggered by other validators broadcasting their assignments.
* If `RequiredTranches::Exact { next_no_show, last_assignment_tick, .. }`, set a wakeup for the earlier of the next
  no-show tick and `last_assignment_tick + APPROVAL_DELAY`.
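The following is a sketch of the wakeup computation. Several inputs are assumptions made for illustration:

* `known_tranches` lists the non-empty tranches we know of, in ascending order, including one holding our own unbroadcast assignment.
* Each tranche converts to one tick.
* `APPROVAL_DELAY` has the value 2.

```rust
type DelayTranche = u32;
type Tick = u64;

/// Hypothetical value for illustration.
const APPROVAL_DELAY: Tick = 2;

/// Simplified, hypothetical version of `RequiredTranches` with only the fields used here.
enum RequiredTranches {
    All,
    Pending { considered: DelayTranche, next_no_show: Option<Tick>, clock_drift: Tick },
    Exact { next_no_show: Option<Tick>, last_assignment_tick: Option<Tick> },
}

/// The earlier of two optional ticks, ignoring `None`.
fn earliest(a: Option<Tick>, b: Option<Tick>) -> Option<Tick> {
    a.into_iter().chain(b).min()
}

/// Returns the tick at which the `(block, candidate)` pair should next be woken, if any.
fn next_wakeup(
    approved: bool,
    required: &RequiredTranches,
    known_tranches: &[DelayTranche],
    block_tick: Tick,
) -> Option<Tick> {
    if approved {
        return None;
    }
    match *required {
        // Incoming votes will wake us up.
        RequiredTranches::All => None,
        RequiredTranches::Pending { considered, next_no_show, clock_drift } => {
            // Tick of the next non-empty tranche after `considered`, offset by the clock drift.
            let next_tranche_tick = known_tranches
                .iter()
                .find(|&&t| t > considered)
                .map(|&t| block_tick + t as Tick + clock_drift);
            earliest(next_no_show, next_tranche_tick)
        }
        RequiredTranches::Exact { next_no_show, last_assignment_tick } => {
            earliest(next_no_show, last_assignment_tick.map(|t| t + APPROVAL_DELAY))
        }
    }
}
```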
#### Launch Approval Work
* Requires `(SessionIndex, SessionInfo, CandidateReceipt, ValidatorIndex, backing_group, block_hash, candidate_index)`
* Extract the public key of the `ValidatorIndex` from the `SessionInfo` for the session.
* Issue an `AvailabilityRecoveryMessage::RecoverAvailableData(candidate, session_index, Some(backing_group),
Some(core_index), response_sender)`
* Load the historical validation code of the teyrchain by dispatching a
`RuntimeApiRequest::ValidationCodeByHash(descriptor.validation_code_hash)` against the state of `block_hash`.
* Spawn a background task with a clone of `background_tx`:
  * Wait for the available data.
  * Issue a `CandidateValidationMessage::ValidateFromExhaustive` message with `APPROVAL_EXECUTION_TIMEOUT` as the
    timeout parameter.
  * Wait for the result of validation.
  * Check that the result of validation, if valid, matches the commitments in the receipt.
  * If valid, issue a message on `background_tx` detailing the request.
  * If any of the data, the candidate, or the commitments are invalid, issue on `background_tx` a
    [`DisputeCoordinatorMessage::IssueLocalStatement`](../../types/overseer-protocol.md#dispute-coordinator-message)
    with `valid = false` to initiate a dispute.
#### Issue Approval Vote
* Fetch the block entry and candidate entry. Ignore if `None`; we've probably just lost a race with finality.
* [Import the checked approval vote](#import-checked-approval). It is "checked" as we've just issued the signature.
* If `MAX_APPROVAL_COALESCE_COUNT` candidates are in the waiting queue:
  * Construct a `SignedApprovalVote` with the validator index for the session and all candidate hashes in the waiting
    queue.
  * Construct an `IndirectSignedApprovalVote` using the information about the vote.
  * Dispatch `ApprovalDistributionMessage::DistributeApproval`.
* Otherwise:
  * Queue the candidate in `BlockEntry::candidates_pending_signature`.
  * Arm a per-`BlockEntry` timer for the latest tick at which we can send the vote.
### Delayed vote distribution
* [Issue Approval Vote](#issue-approval-vote) arms a per-block timer, once, if there is no requirement to send the
  vote immediately.
* When the timer wakes up it will either:
  * If there is a candidate in the queue past its sending tick:
    * Construct a `SignedApprovalVote` with the validator index for the session and all candidate hashes in the
      waiting queue.
    * Construct an `IndirectSignedApprovalVote` using the information about the vote.
    * Dispatch `ApprovalDistributionMessage::DistributeApproval`.
  * Otherwise:
    * Re-arm the timer for the latest tick at which we have to send the vote.
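The coalescing queue behind Issue Approval Vote and the delayed distribution timer can be sketched as a small per-block structure. `PendingVotes`, `TimerAction` and the `MAX_APPROVAL_COALESCE_COUNT` value of 3 are hypothetical. The sketch also makes two interpretive assumptions:

* The newly approved candidate counts towards the coalesce limit.
* "Past its sending tick" means a sending tick at or before the current tick.

```rust
type Tick = u64;
type CandidateHash = u64;

/// Hypothetical value for illustration.
const MAX_APPROVAL_COALESCE_COUNT: usize = 3;

/// What the per-block timer should do when it fires.
#[derive(Debug, PartialEq)]
enum TimerAction {
    /// Sign and distribute one vote covering these candidates.
    Send(Vec<CandidateHash>),
    /// Nothing is due yet; re-arm for this tick.
    Rearm(Tick),
    /// The queue is empty.
    Idle,
}

/// Hypothetical stand-in for `BlockEntry::candidates_pending_signature`.
#[derive(Default)]
struct PendingVotes {
    /// `(candidate, latest tick at which its vote must be sent)`.
    queue: Vec<(CandidateHash, Tick)>,
}

impl PendingVotes {
    /// Queues a freshly approved candidate; returns the batch to sign if the queue is full.
    fn on_approved(&mut self, candidate: CandidateHash, send_by: Tick) -> Option<Vec<CandidateHash>> {
        self.queue.push((candidate, send_by));
        if self.queue.len() >= MAX_APPROVAL_COALESCE_COUNT {
            Some(self.drain())
        } else {
            None
        }
    }

    /// Timer callback: flush everything if any candidate is due, otherwise re-arm.
    fn on_timer(&mut self, tick_now: Tick) -> TimerAction {
        if self.queue.iter().any(|&(_, send_by)| send_by <= tick_now) {
            return TimerAction::Send(self.drain());
        }
        match self.queue.iter().map(|&(_, send_by)| send_by).min() {
            Some(tick) => TimerAction::Rearm(tick),
            None => TimerAction::Idle,
        }
    }

    fn drain(&mut self) -> Vec<CandidateHash> {
        self.queue.drain(..).map(|(candidate, _)| candidate).collect()
    }
}
```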
### Determining Approval of Candidate
#### Determine Required Tranches
This logic is for inspecting an approval entry that tracks the assignments received, along with information on which
assignments have corresponding approval votes. Inspection also involves the current time and expected requirements and
is used to help the higher-level code determine the following:
* Whether to broadcast the local assignment
* Whether to check that the candidate entry has been completely approved.
* If the candidate is waiting on approval, when to schedule the next wakeup of the `(candidate, block)` pair at a
point where the state machine could be advanced.
These routines are pure functions which only depend on the environmental state. The expectation is that this
determination is re-run every time we attempt to update an approval entry: either when we trigger a wakeup to advance
the state machine based on a no-show or our own broadcast, or when we receive further assignments or approvals from the
network.
Thus it may be that at some point in time we consider tranches 0..X to be required, but as we receive more
information, we might require fewer tranches. Or votes that we perceived to be missing and requiring replacement are
filled in and change our view.
Requires `(approval_entry, approvals_received, tranche_now, block_tick, no_show_duration, needed_approvals)`
```rust
// Type aliases assumed here so that the definition is self-contained.
type DelayTranche = u32;
type Tick = u64;

enum RequiredTranches {
    /// All validators appear to be required, based on tranches already taken and remaining no-shows.
    All,
    /// More tranches required - we're awaiting more assignments.
    Pending {
        /// The highest considered delay tranche when counting assignments.
        considered: DelayTranche,
        /// The tick at which the next no-show, of the assignments counted, would occur.
        next_no_show: Option<Tick>,
        /// The highest tranche to consider when looking to broadcast own assignment.
        /// This should be considered along with the clock drift to avoid broadcasting
        /// assignments that are before the local time.
        maximum_broadcast: DelayTranche,
        /// The clock drift, in ticks, to apply to the local clock when determining whether
        /// to broadcast an assignment or when to schedule a wakeup. The local clock should be treated
        /// as though it is `clock_drift` ticks earlier.
        clock_drift: Tick,
    },
    /// An exact number of required tranches and a number of no-shows. This indicates that
    /// `needed_approvals` assignments are present and, additionally, all no-shows are covered.
    Exact {
        /// The tranche to inspect up to.
        needed: DelayTranche,
        /// The amount of missing votes that should be tolerated.
        tolerated_missing: usize,
        /// When the next no-show would be, if any. This is used to schedule the next wakeup in the
        /// event that there are some assignments that don't have corresponding approval votes. If this
        /// is `None`, all assignments have approvals.
        next_no_show: Option<Tick>,
        /// The last tick at which a needed assignment was received.
        last_assignment_tick: Option<Tick>,
    },
}
```
**Clock-drift and Tranche-taking**
Our vote-counting procedure depends heavily on how we interpret time based on the presence of no-shows - assignments
which have no corresponding approval after some time.
This is because of how we handle no-shows: we keep track of the depth of no-shows we are covering.
As an example: there may be initial no-shows in tranche 0. It'll take `no_show_duration` ticks before those are
considered no-shows. Then, we don't want to immediately take `no_show_duration` more tranches. Instead, we want to take
one tranche for each uncovered no-show. However, as we take those tranches, there may be further no-shows. Since these
depth-1 no-shows should have only been triggered after the depth-0 no-shows were already known to be no-shows, we need
to discount the local clock by `no_show_duration` to see whether these should be considered no-shows or not. There may
be malicious parties who broadcast their assignment earlier than they were meant to, who shouldn't be counted as instant
no-shows. We continue onwards to cover all depth-1 no-shows which may lead to depth-2 no-shows and so on.
Likewise, when considering how many tranches to take, the no-show depth should be used to apply a depth-discount or
clock drift to the `tranche_now`.
**Procedure**
* Start with `depth = 0`.
* Set a clock drift of `depth * no_show_duration`.
* Take tranches up to `tranche_now - clock_drift` until all needed assignments are met.
* Keep track of the `next_no_show` according to the clock drift, as we go.
* Keep track of the `last_assignment_tick` as we go.
* If we run out of tranches before then, return
  `Pending { considered, next_no_show, maximum_broadcast, clock_drift }`.
* If there are no no-shows, return `Exact { needed, tolerated_missing, next_no_show, last_assignment_tick }`.
* `maximum_broadcast` is `DelayTranche::max_value()` at depth 0; otherwise it is the last considered tranche plus the
  number of uncovered no-shows at this point.
* If there are no-shows, return to the beginning, incrementing `depth` and attempting to cover the number of no-shows.
  Each no-show must be covered by a non-empty tranche, that is, a tranche with at least one assignment. Each non-empty
  tranche covers exactly one no-show.
* If at any point it seems that all validators are required, return early with `RequiredTranches::All`, which
  indicates that everyone should broadcast.
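The depth-dependent arithmetic in the procedure can be isolated in a few helpers. This is a partial sketch, not the full tranche-taking loop. It makes two assumptions:

* There is one tick per tranche, so ticks of clock drift can be subtracted directly from `tranche_now`.
* The `DelayTranche::max_value()` case applies to the initial pass, at depth 0.

```rust
type DelayTranche = u32;
type Tick = u64;

/// Clock drift applied at a given no-show depth: `depth * no_show_duration`.
fn clock_drift(depth: u32, no_show_duration: Tick) -> Tick {
    depth as Tick * no_show_duration
}

/// Highest tranche that may be taken at this depth, assuming one tick per tranche.
fn drifted_tranche_now(tranche_now: DelayTranche, depth: u32, no_show_duration: Tick) -> DelayTranche {
    (tranche_now as Tick).saturating_sub(clock_drift(depth, no_show_duration)) as DelayTranche
}

/// Highest tranche from which an assignment should be broadcast at this depth.
fn maximum_broadcast(depth: u32, last_considered: DelayTranche, uncovered_no_shows: u32) -> DelayTranche {
    if depth == 0 {
        // Initial pass: accept assignments from any tranche.
        DelayTranche::MAX
    } else {
        // One extra tranche per uncovered no-show.
        last_considered.saturating_add(uncovered_no_shows)
    }
}
```

Take depth 2 with a `no_show_duration` of 12 ticks. Tranches are then only taken up to `tranche_now - 24`. This keeps depth-2 no-shows from being counted before they could legitimately have occurred.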
#### Check Approval
* Check whether a candidate is approved under a particular block.
* Requires `(block_entry, candidate_entry, approval_entry, n_tranches)`
* If we have `3 * n_approvals > n_validators`, return true. This is because any set of f+1 validators must contain at
  least one honest validator, who has approved the candidate.
* If `n_tranches` is `RequiredTranches::Pending`, return false.
* If `n_tranches` is `RequiredTranches::All`, return false.
* If `n_tranches` is `RequiredTranches::Exact { needed, tolerated_missing, last_assignment_tick, .. }`, then we
  return whether all assigned validators up to tranche `needed`, less `tolerated_missing`, have approved and
  `last_assignment_tick + APPROVAL_DELAY <= tick_now`.
  * e.g. if we had 5 tranches and 1 tolerated missing, we would accept only if all but 1 of the assigned validators in
    tranches 0..=5 have approved. In that example, we also accept all validators in tranches 0..=5 having approved,
    but that would indicate that the `RequiredTranches` value was incorrectly constructed, so it is not realistic.
    `tolerated_missing` actually represents covered no-shows. If there are more missing approvals than there are
    tolerated missing, that indicates that there are some assignments which are not yet no-shows, but may become
    no-shows, and we should wait for the validators to either approve or become no-shows.
  * e.g. if the above passes and the `last_assignment_tick` was 5 and the current tick was 6, then we'd return
    false.
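The sketch below is a simplified version of the check. Assignments are given as `(validator_index, tranche)` pairs and approvals as one `bool` per validator. The `APPROVAL_DELAY` value of 2 is hypothetical. With it, the delay condition reproduces the worked example in the text: last assignment at tick 5, current tick 6, result false.

```rust
type DelayTranche = u32;
type Tick = u64;

/// Hypothetical value for illustration.
const APPROVAL_DELAY: Tick = 2;

/// Simplified, hypothetical version of `RequiredTranches` with only the fields used here.
enum RequiredTranches {
    All,
    Pending,
    Exact { needed: DelayTranche, tolerated_missing: usize, last_assignment_tick: Option<Tick> },
}

fn check_approval(
    required: &RequiredTranches,
    assignments: &[(usize, DelayTranche)],
    approvals: &[bool],
    tick_now: Tick,
) -> bool {
    let n_validators = approvals.len();
    let n_approvals = approvals.iter().filter(|&&approved| approved).count();
    // More than a third approving guarantees at least one honest approval.
    if 3 * n_approvals > n_validators {
        return true;
    }
    match *required {
        RequiredTranches::All | RequiredTranches::Pending => false,
        RequiredTranches::Exact { needed, tolerated_missing, last_assignment_tick } => {
            // Validators assigned in tranches 0..=needed.
            let assigned: Vec<usize> =
                assignments.iter().filter(|&&(_, t)| t <= needed).map(|&(v, _)| v).collect();
            let n_approved = assigned
                .iter()
                .filter(|&&v| approvals.get(v).copied().unwrap_or(false))
                .count();
            let enough = n_approved + tolerated_missing >= assigned.len();
            // Wait `APPROVAL_DELAY` ticks after the last needed assignment.
            let delay_passed = last_assignment_tick.map_or(true, |t| t + APPROVAL_DELAY <= tick_now);
            enough && delay_passed
        }
    }
}
```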
### Time
#### Current Tranche
* Given the slot number of a block and the current time, this determines the current tranche.
* Convert `time.saturating_sub(slot_number.to_time())` to a delay tranche value.
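The following sketches the conversion under three hypothetical assumptions:

* Ticks are 500 ms long.
* There is one tranche per tick.
* A slot starts at `slot * slot_duration`, measured from the same epoch as the current time.

```rust
type DelayTranche = u32;
type Tick = u64;

/// Hypothetical tick length for illustration.
const TICK_DURATION_MILLIS: u64 = 500;

/// Tick at which `slot` starts.
fn slot_to_tick(slot: u64, slot_duration_millis: u64) -> Tick {
    slot * slot_duration_millis / TICK_DURATION_MILLIS
}

/// Ticks elapsed since the block's slot started, used directly as the tranche.
fn current_tranche(now_millis: u64, slot: u64, slot_duration_millis: u64) -> DelayTranche {
    let tick_now = now_millis / TICK_DURATION_MILLIS;
    // Saturate so a block from the "future" maps to tranche 0.
    tick_now.saturating_sub(slot_to_tick(slot, slot_duration_millis)) as DelayTranche
}
```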
[CSM]: ../../types/overseer-protocol.md#chainselectionmessage
# Availability Subsystems
The availability subsystems are responsible for ensuring that Proofs of Validity of backed candidates are widely
available within the validator set, without requiring every node to retain a full copy. They accomplish this by broadly
distributing erasure-coded chunks of the PoV, keeping track of which validator has which chunk by means of signed
bitfields. They are also responsible for reassembling a complete PoV when required, e.g. when an approval checker needs
to validate a teyrchain block.
# Availability Distribution
This subsystem is responsible for distributing availability data to peers. Availability data are chunks, `PoV`s and
`AvailableData` (which is `PoV` + `PersistedValidationData`). It does so via request/response protocols.
In particular this subsystem is responsible for:
- Respond to network requests for availability data by querying the [Availability
  Store](../utility/availability-store.md).
- Request chunks from backing validators and put them in the local `Availability Store` whenever we find an occupied
  core on any fresh leaf. This happens after a candidate is backed and ensures availability by at least 2/3+ of all
  validators.
- Fetch `PoV` from validators, when requested via `FetchPoV` message from backing (`pov_requester` module).
The backing subsystem is responsible for making available data available in the local `Availability Store` upon
validation. This subsystem will serve any network requests by querying that store.
## Protocol
This subsystem does not handle any peer set messages, but the `pov_requester` does connect to validators of the same
backing group on the validation peer set. This ensures fast propagation of statements between those validators and
keeps connections already established for requesting `PoV`s. Other than that, this subsystem drives request/response
protocols.
Input:
- `OverseerSignal::ActiveLeaves(ActiveLeavesUpdate)`
- `AvailabilityDistributionMessage{msg: ChunkFetchingRequest}`
- `AvailabilityDistributionMessage{msg: PoVFetchingRequest}`
- `AvailabilityDistributionMessage{msg: FetchPoV}`
Output:
- `NetworkBridgeMessage::SendRequests(Requests, IfDisconnected::TryConnect)`
- `AvailabilityStore::QueryChunk(candidate_hash, index, response_channel)`
- `AvailabilityStore::StoreChunk(candidate_hash, chunk)`
- `AvailabilityStore::QueryAvailableData(candidate_hash, response_channel)`
- `RuntimeApiRequest::SessionIndexForChild`
- `RuntimeApiRequest::SessionInfo`
- `RuntimeApiRequest::AvailabilityCores`
## Functionality
### PoV Requester
The PoV requester in the `pov_requester` module takes care of staying connected to validators of the current backing
group of this very validator on the `Validation` peer set and it will handle `FetchPoV` requests by issuing network
requests to those validators. It will check the hash of the received `PoV`, but will not do any further validation. That
needs to be done by the original `FetchPoV` sender (backing subsystem).
### Chunk Requester
After a candidate is backed, the availability of the PoV block must be confirmed by 2/3+ of all validators. The chunk
requester is responsible for making that availability a reality.
It does that by checking occupied cores for all active leaves. For each occupied core it will spawn a task fetching
the erasure chunk that has the `ValidatorIndex` of the node. For this, a `ChunkFetchingRequest` is issued via
Substrate's generic request/response protocol.
The spawned task will try to fetch the chunk from validators in the group responsible for the occupied core, in a
random order. To use already open TCP connections wherever possible, the requester maintains a cache and preserves
that random order for the entire session.
Note, however, that because not all validators in a group have to be actual backers, not all of them are required to
have the needed chunk. This in turn could lead to low throughput, as we have to wait for fetches to fail before
reaching a validator that actually has our chunk. We do rank validators that fail to deliver our chunk lower, but as
backers can legitimately vary from block to block, this is still not ideal. See issues
[2509](https://github.com/paritytech/polkadot/issues/2509) and
[2512](https://github.com/paritytech/polkadot/issues/2512) for more information.
The current implementation also only fetches chunks for occupied cores in blocks in active leaves. This means that if
active leaves skip a block, or we are particularly slow in fetching our chunk, we might not fetch our chunk at all if
availability reached 2/3 quickly enough (the slot becomes free). This is not desirable, as we would like as many
validators as possible to have their chunk. See this [issue](https://github.com/paritytech/polkadot/issues/2513) for
more details.
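One way to realize the per-session random order and the de-prioritization of validators that do not deliver is sketched below without external crates. `GroupOrderCache` and `fetch_chunk` are hypothetical names. The xorshift PRNG stands in for a proper random source, and the rotate-to-the-back policy is an assumption. None of this is the `availability-distribution` implementation.

```rust
use std::collections::HashMap;

type ValidatorIndex = u32;
type GroupIndex = u32;
type SessionIndex = u32;

/// Per-session cache of the randomised order in which each group's validators are tried.
struct GroupOrderCache {
    session: SessionIndex,
    orders: HashMap<GroupIndex, Vec<ValidatorIndex>>,
    rng_state: u64,
}

impl GroupOrderCache {
    fn new(session: SessionIndex, seed: u64) -> Self {
        // xorshift must not start from zero.
        GroupOrderCache { session, orders: HashMap::new(), rng_state: seed | 1 }
    }

    /// xorshift64: a simple non-cryptographic PRNG, used only to keep the sketch dependency-free.
    fn next_rand(&mut self) -> u64 {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        x
    }

    /// Returns the cached order for `group`, shuffling `members` once per session.
    fn order_for(&mut self, session: SessionIndex, group: GroupIndex, members: &[ValidatorIndex]) -> &mut Vec<ValidatorIndex> {
        if session != self.session {
            // A new session invalidates all cached orders.
            self.session = session;
            self.orders.clear();
        }
        if !self.orders.contains_key(&group) {
            // Fisher-Yates shuffle.
            let mut order = members.to_vec();
            for i in (1..order.len()).rev() {
                let j = (self.next_rand() % (i as u64 + 1)) as usize;
                order.swap(i, j);
            }
            self.orders.insert(group, order);
        }
        self.orders.get_mut(&group).expect("inserted above")
    }
}

/// Tries validators in the cached order until one returns the chunk. A validator that fails
/// to deliver is moved to the back so it is tried last next time.
fn fetch_chunk<F>(order: &mut [ValidatorIndex], mut try_fetch: F) -> Option<(ValidatorIndex, Vec<u8>)>
where
    F: FnMut(ValidatorIndex) -> Option<Vec<u8>>,
{
    for _ in 0..order.len() {
        let validator = order[0];
        if let Some(chunk) = try_fetch(validator) {
            return Some((validator, chunk));
        }
        order.rotate_left(1);
    }
    None
}
```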
### Serving
On the other side, the subsystem will listen for incoming `ChunkFetchingRequest`s and `PoVFetchingRequest`s from the
network bridge and respond to them by looking up the requested chunks and `PoV`s in the availability store. This
happens in the `responder` module.
We rely on the backing subsystem to make available data available locally in the `Availability Store` after it has
validated it.
# Availability Recovery
This subsystem is responsible for recovering the data made available via the
[Availability Distribution](availability-distribution.md) subsystem, which is necessary for candidate validation during
the approval/disputes processes. It is also used by collators to recover PoVs in adversarial scenarios where the other
collators of the para are censoring blocks.
According to the Pezkuwi protocol, in order to recover any given `AvailableData`, we generally must recover at least
`f + 1` pieces from validators of the session. Thus, we should connect to and query randomly chosen validators until we
have received `f + 1` pieces.
In practice, there are various optimisations implemented in this subsystem which avoid querying all chunks from
different validators and/or avoid doing the chunk reconstruction altogether.
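The `f + 1` threshold follows from the usual Byzantine bound `n >= 3f + 1`. Assuming `f` is the largest integer satisfying that bound, the number of pieces needed can be computed as:

```rust
/// Largest number of faulty validators `f` tolerated with `n_validators >= 3f + 1`.
fn max_faulty(n_validators: usize) -> usize {
    n_validators.saturating_sub(1) / 3
}

/// Minimum number of pieces needed to recover the data: `f + 1`.
fn recovery_threshold(n_validators: usize) -> usize {
    max_faulty(n_validators) + 1
}
```

With 10 validators, `f = 3`, so any 4 pieces suffice. At least one of any `f + 1` responders is honest.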
## Protocol
This version of the availability recovery subsystem is based only on request-response network protocols.
Input:
* `AvailabilityRecoveryMessage::RecoverAvailableData(candidate, session, backing_group, core_index, response)`
Output:
* `NetworkBridgeMessage::SendRequests`
* `AvailabilityStoreMessage::QueryAllChunks`
* `AvailabilityStoreMessage::QueryAvailableData`
* `AvailabilityStoreMessage::QueryChunkSize`
## Functionality
We hold a state which tracks the currently ongoing recovery tasks. A `RecoveryTask` is a structure encapsulating all
network tasks needed in order to recover the available data with respect to a candidate.
Each `RecoveryTask` has a collection of ordered recovery strategies to try.
```rust
/// Subsystem state.
struct State {
    /// Each recovery task is implemented as its own async task,
    /// and these handles are for communicating with them.
    ongoing_recoveries: FuturesUnordered<RecoveryHandle>,
    /// A recent block hash for which state should be available.
    live_block: (BlockNumber, Hash),
    /// An LRU cache of recently recovered data.
    availability_lru: LruMap<CandidateHash, CachedRecovery>,
    /// Cached runtime info.
    runtime_info: RuntimeInfo,
}

struct RecoveryParams {
    /// Discovery ids of `validators`.
    pub validator_authority_keys: Vec<AuthorityDiscoveryId>,
    /// Number of validators.
    pub n_validators: usize,
    /// The number of regular chunks needed.
    pub threshold: usize,
    /// The number of systematic chunks needed.
    pub systematic_threshold: usize,
    /// A hash of the relevant candidate.
    pub candidate_hash: CandidateHash,
    /// The root of the erasure encoding of the candidate.
    pub erasure_root: Hash,
    /// Metrics to report.
    pub metrics: Metrics,
    /// Do not request data from availability-store. Useful for collators.
    pub bypass_availability_store: bool,
    /// The type of check to perform after available data was recovered.
    pub post_recovery_check: PostRecoveryCheck,
    /// The blake2-256 hash of the PoV.
    pub pov_hash: Hash,
    /// Protocol name for ChunkFetchingV1.
    pub req_v1_protocol_name: ProtocolName,
    /// Protocol name for ChunkFetchingV2.
    pub req_v2_protocol_name: ProtocolName,
    /// Whether or not chunk mapping is enabled.
    pub chunk_mapping_enabled: bool,
    /// Channel to the erasure task handler.
    pub erasure_task_tx: mpsc::Sender<ErasureTask>,
}

pub struct RecoveryTask<Sender: overseer::AvailabilityRecoverySenderTrait> {
    sender: Sender,
    params: RecoveryParams,
    strategies: VecDeque<Box<dyn RecoveryStrategy<Sender>>>,
    state: task::State,
}

/// Common trait for runnable recovery strategies.
#[async_trait::async_trait]
pub trait RecoveryStrategy<Sender: overseer::AvailabilityRecoverySenderTrait>: Send {
    /// Main entry point of the strategy.
    async fn run(
        mut self: Box<Self>,
        state: &mut task::State,
        sender: &mut Sender,
        common_params: &RecoveryParams,
    ) -> Result<AvailableData, RecoveryError>;

    /// Return the name of the strategy for logging purposes.
    fn display_name(&self) -> &'static str;

    /// Return the strategy type for use as a metric label.
    fn strategy_type(&self) -> &'static str;
}
```
### Signal Handling
On `ActiveLeavesUpdate`, if `activated` is non-empty, set `state.live_block` to the first block in `activated`.
Ignore `BlockFinalized` signals.
On `Conclude`, shut down the subsystem.
#### `AvailabilityRecoveryMessage::RecoverAvailableData(...)`
1. Check the `availability_lru` for the candidate and return the data if present.
1. Check if there is already a recovery handle for the request. If so, add the response handle to it.
1. Otherwise, load the session info for the given session under the state of `live_block`, and initiate a recovery
   task with `launch_recovery_task`. Add a recovery handle to the state and add the response channel to it.
1. If the session info is not available, return `RecoveryError::Unavailable` on the response channel.
### Recovery logic
#### `handle_recover(...) -> Result<()>`
Instantiate the appropriate `RecoveryStrategy` implementations, based on the subsystem configuration, params and session info.
Call `launch_recovery_task()`.
#### `launch_recovery_task(state, ctx, response_sender, recovery_strategies, params) -> Result<()>`
Create the `RecoveryTask` and launch it as a background task running `recovery_task.run()`.
#### `recovery_task.run(mut self) -> Result<AvailableData, RecoveryError>`
* Loop:
  * Pop a strategy from the queue. If none are left, return `RecoveryError::Unavailable`.
  * Run the strategy.
  * If the strategy returned successfully or returned `RecoveryError::Invalid`, break the loop.
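The fallback loop can be sketched synchronously. The real trait is async and takes the task state, sender and `RecoveryParams`. This simplified `RecoveryStrategy` trait, the `Fixed` test strategy and the two-variant `RecoveryError` are assumptions that keep the example self-contained.

```rust
use std::collections::VecDeque;

/// Simplified, hypothetical error type with only the variants the loop distinguishes.
#[derive(Debug, PartialEq)]
enum RecoveryError {
    Invalid,
    Unavailable,
}

type AvailableData = Vec<u8>;

/// Synchronous stand-in for the async `RecoveryStrategy` trait.
trait RecoveryStrategy {
    fn display_name(&self) -> &'static str;
    fn run(self: Box<Self>) -> Result<AvailableData, RecoveryError>;
}

/// A trivial strategy returning a preset result, used only to exercise the loop.
struct Fixed(&'static str, Result<AvailableData, RecoveryError>);

impl RecoveryStrategy for Fixed {
    fn display_name(&self) -> &'static str {
        self.0
    }
    fn run(self: Box<Self>) -> Result<AvailableData, RecoveryError> {
        let this = *self;
        this.1
    }
}

fn run_strategies(mut strategies: VecDeque<Box<dyn RecoveryStrategy>>) -> Result<AvailableData, RecoveryError> {
    loop {
        let strategy = match strategies.pop_front() {
            Some(strategy) => strategy,
            // No strategies left.
            None => return Err(RecoveryError::Unavailable),
        };
        eprintln!("trying recovery strategy: {}", strategy.display_name());
        match strategy.run() {
            // Success, or provably invalid data: stop trying further strategies.
            Ok(data) => return Ok(data),
            Err(RecoveryError::Invalid) => return Err(RecoveryError::Invalid),
            // Any other failure: fall back to the next strategy.
            Err(_) => continue,
        }
    }
}
```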
### Recovery strategies
#### `FetchFull`
This strategy tries requesting the full available data from the validators in the backing group to
which the node is already connected. They are tried one by one in a random order.
It is very performant if there's enough network bandwidth and the backing group is not overloaded.
The costly Reed-Solomon reconstruction is not needed.
#### `FetchSystematicChunks`
Very similar to `FetchChunks` below, but requests chunks from the validators that hold the systematic chunks, so that we
avoid Reed-Solomon reconstruction. Only possible if `node_features::FeatureIndex::AvailabilityChunkMapping` is enabled
and the `core_index` is supplied (currently only for recoveries triggered by approval voting).
More info in
[RFC-47](https://github.com/polkadot-fellows/RFCs/blob/main/text/0047-assignment-of-availability-chunks.md).
#### `FetchChunks`
The least performant strategy but also the most comprehensive one. It's the only one that cannot fail under the
byzantine threshold assumption, so it's always added as the last one in the `recovery_strategies` queue.
Performs parallel chunk requests to validators. When enough chunks have been received, do the reconstruction.
In the worst case, all validators will be tried.
### Default recovery strategy configuration
#### For validators
If the estimated available data size is smaller than a configured constant (currently 1 MiB for Pezkuwi or 4 MiB for
other networks), try doing `FetchFull` first.
Next, if the preconditions described in `FetchSystematicChunks` above are met, try systematic recovery.
As a last resort, do `FetchChunks`.
#### For collators
Collators currently only use `FetchChunks`, as they only attempt recoveries in rare scenarios.
Moreover, the recovery task is specially configured not to attempt requesting data from the local availability-store
(because it doesn't exist) and not to re-encode the data after a successful recovery (because it's an expensive check
that is not needed; checking the `pov_hash` is enough for collators).
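The default configuration above amounts to building an ordered strategy list. The `Strategy` enum and the function signature are assumptions for illustration. Skipping `FetchFull` when no size estimate is available is also an assumption. The size limits mirror the figures quoted above.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Strategy {
    FetchFull,
    FetchSystematicChunks,
    FetchChunks,
}

/// Size limits mirroring the figures quoted above (1 MiB for Pezkuwi, 4 MiB elsewhere).
const SMALL_POV_LIMIT_PEZKUWI: usize = 1024 * 1024;
const SMALL_POV_LIMIT_OTHER: usize = 4 * 1024 * 1024;

fn default_strategies(
    is_collator: bool,
    is_pezkuwi: bool,
    estimated_size: Option<usize>,
    chunk_mapping_enabled: bool,
    core_index_known: bool,
) -> Vec<Strategy> {
    // Collators only ever fetch regular chunks.
    if is_collator {
        return vec![Strategy::FetchChunks];
    }
    let limit = if is_pezkuwi { SMALL_POV_LIMIT_PEZKUWI } else { SMALL_POV_LIMIT_OTHER };
    let mut strategies = Vec::new();
    if estimated_size.map_or(false, |size| size < limit) {
        strategies.push(Strategy::FetchFull);
    }
    if chunk_mapping_enabled && core_index_known {
        strategies.push(Strategy::FetchSystematicChunks);
    }
    // Always last: the only strategy that cannot fail under the Byzantine threshold assumption.
    strategies.push(Strategy::FetchChunks);
    strategies
}
```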
# Bitfield Distribution
Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a
single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based
on a 2/3+ quorum.
## Protocol
`PeerSet`: `Validation`
Input: [`BitfieldDistributionMessage`](../../types/overseer-protocol.md#bitfield-distribution-message) which are
gossiped to all peers, no matter if validator or not.
Output:
- `NetworkBridge::SendValidationMessage([PeerId], message)` gossip a verified incoming bitfield on to interested
  peers.
- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)` improve or penalize the reputation of peers based on the messages
that are received relative to the current view.
- `ProvisionerMessage::ProvisionableData(ProvisionableData::Bitfield(relay_parent, SignedAvailabilityBitfield))` pass on
the bitfield to the other submodules via the overseer.
## Functionality
This is implemented as a gossip system.
It is necessary to track peer connection, view change, and disconnection events in order to maintain an index of which
peers are interested in which relay-parent bitfields.
Before gossiping an incoming bitfield, check that it is signed by one of the validators in the validator set relevant
to its relay parent. Only accept bitfields relevant to our current view, and only distribute bitfields to other peers
when they are relevant to those peers' most recent view. Accept and distribute only one bitfield per validator.
When receiving a bitfield either from the network or from a `DistributeBitfield` message, forward it to the block
authorship (provisioning) subsystem for potential inclusion in a block.
Valid bitfield gossip messages must be cached, so that peers which connect (or reconnect) after those messages were
received can be sent them upon connection.
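The bookkeeping rules above (one bitfield per validator, view-based filtering, and replaying cached bitfields to peers that connect later) can be sketched as follows. This is a simplified model under stated assumptions: identifiers are plain integers, signature checks are assumed to have happened already, and `GossipState` with its methods is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified identifiers; the real subsystem uses hashes, `PeerId`s
// and `SignedAvailabilityBitfield`s.
type RelayParent = u64;
type PeerId = u32;
type ValidatorIndex = u32;
type Bitfield = Vec<bool>;

#[derive(Default)]
struct GossipState {
    our_view: HashSet<RelayParent>,
    peer_views: HashMap<PeerId, HashSet<RelayParent>>,
    // At most one bitfield per validator per relay parent. This map doubles as the
    // cache replayed to peers that connect or change their view later.
    received: HashMap<RelayParent, HashMap<ValidatorIndex, Bitfield>>,
}

impl GossipState {
    /// Imports a bitfield whose signature was already checked. Returns the peers to
    /// forward it to, or `None` if it is outside our view or a duplicate.
    fn import(&mut self, rp: RelayParent, v: ValidatorIndex, bf: Bitfield) -> Option<Vec<PeerId>> {
        if !self.our_view.contains(&rp) {
            return None;
        }
        let per_rp = self.received.entry(rp).or_default();
        if per_rp.contains_key(&v) {
            return None; // only one bitfield per validator
        }
        per_rp.insert(v, bf);
        let mut peers: Vec<PeerId> = self
            .peer_views
            .iter()
            .filter(|(_, view)| view.contains(&rp))
            .map(|(peer, _)| *peer)
            .collect();
        peers.sort();
        Some(peers)
    }

    /// Records a peer's new view (also used on connection) and returns the cached
    /// bitfields for relay parents that were not in its previous view.
    fn on_peer_view(&mut self, peer: PeerId, view: HashSet<RelayParent>) -> Vec<(RelayParent, ValidatorIndex)> {
        let old = self.peer_views.insert(peer, view.clone()).unwrap_or_default();
        let mut out = Vec::new();
        for rp in view.difference(&old) {
            if let Some(per_rp) = self.received.get(rp) {
                out.extend(per_rp.keys().map(|v| (*rp, *v)));
            }
        }
        out.sort();
        out
    }
}

fn main() {
    let mut state = GossipState::default();
    state.our_view.insert(1);
    state.on_peer_view(10, HashSet::from([1]));
    assert_eq!(state.import(1, 0, vec![true]), Some(vec![10]));
    assert_eq!(state.import(1, 0, vec![false]), None); // second bitfield from validator 0
    assert_eq!(state.import(2, 1, vec![true]), None); // relay parent not in our view
    // Peer 11 connects later and is sent the cached bitfield.
    assert_eq!(state.on_peer_view(11, HashSet::from([1])), vec![(1, 0)]);
}
```

A fuller implementation would also remember what each peer has already been sent, instead of replaying everything for every relay parent newly added to its view.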
# Bitfield Signing
Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a
single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based
on a 2/3+ quorum.
## Protocol
Input:
There is no dedicated input mechanism for bitfield signing. Instead, Bitfield Signing produces a bitfield representing
the current state of availability on `StartWork`.
Output:
- `BitfieldDistribution::DistributeBitfield`: distribute a locally signed bitfield
- `AvailabilityStore::QueryChunk(CandidateHash, validator_index, response_channel)`: check whether we hold the chunk for
our validator index
## Functionality
Upon receipt of an `ActiveLeavesUpdate`, launch a bitfield signing job for each `activated` head referring to a fresh
leaf. Stop the job for each `deactivated` head.
## Bitfield Signing Job
Localized to a specific relay-parent `r`. If not running as a validator, do nothing.
- For each fresh leaf, begin by waiting a fixed period of time so availability distribution has the chance to make
candidates available.
- Determine our validator index `i`, the set of backed candidates pending availability in `r`, and which bit of the
bitfield each corresponds to.
- Start with an empty bitfield. For each bit in the bitfield, if there is a candidate pending availability, query the
[Availability Store](../utility/availability-store.md) for whether we have the availability chunk for our validator
index. The `OccupiedCore` struct contains the candidate hash, so the full candidate does not need to be fetched from
the runtime.
- For all chunks we have, set the corresponding bit in the bitfield.
- Sign the bitfield and dispatch a `BitfieldDistribution::DistributeBitfield` message.
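The steps above can be sketched as a pure function that maps availability cores to bits. The `Core` enum is a simplified, hypothetical stand-in for the runtime's core states, the `has_chunk` closure stands in for the Availability Store query, and the initial waiting period and the signing step are omitted.

```rust
use std::collections::HashSet;

type CandidateHash = [u8; 32];
type ValidatorIndex = u32;

/// Simplified, hypothetical view of an availability core.
enum Core {
    Free,
    Occupied { candidate_hash: CandidateHash },
}

/// Bit `i` is set iff core `i` is occupied by a candidate for which `has_chunk`
/// reports that we hold the chunk for our validator index.
fn build_bitfield<F>(cores: &[Core], our_index: ValidatorIndex, mut has_chunk: F) -> Vec<bool>
where
    F: FnMut(&CandidateHash, ValidatorIndex) -> bool,
{
    cores
        .iter()
        .map(|core| match core {
            Core::Free => false,
            // The hash comes straight from the core, so the full candidate is not needed.
            Core::Occupied { candidate_hash } => has_chunk(candidate_hash, our_index),
        })
        .collect()
}

fn main() {
    let (a, b) = ([1u8; 32], [2u8; 32]);
    let cores = vec![
        Core::Occupied { candidate_hash: a },
        Core::Free,
        Core::Occupied { candidate_hash: b },
    ];
    // Pretend the availability store only holds our (validator 7) chunk of `a`.
    let stored: HashSet<(CandidateHash, ValidatorIndex)> = HashSet::from([(a, 7)]);
    let bitfield = build_bitfield(&cores, 7, |hash, index| stored.contains(&(*hash, index)));
    assert_eq!(bitfield, vec![true, false, false]);
    // Signing and dispatching `DistributeBitfield` would follow here.
}
```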
# Backing Subsystems
The backing subsystems, when conceived as a black box, receive an arbitrary quantity of parablock candidates and
associated proofs of validity from arbitrary untrusted collators. From these, they produce a bounded quantity of
backable candidates which relay chain block authors may choose to include in a subsequent block.
In broad strokes, the flow operates like this:
- **Candidate Selection** winnows the field of parablock candidates, selecting up to one of them to second.
- **Candidate Backing** ensures that a candidate to be seconded is valid, then generates the appropriate `Statement`. It
also keeps track of which candidates have received the backing of a quorum of other validators.
- **Statement Distribution** is the networking component which ensures that all validators receive each other's
statements.
- **PoV Distribution** is the networking component which ensures that validators considering a candidate can get the
appropriate PoV.
# Candidate Backing
> NOTE: This module has undergone changes for the elastic scaling implementation. As a result, parts of this document
> may be out of date and will be updated at a later time. Issue tracking the update:
> https://github.com/pezkuwichain/pezkuwi-sdk/issues/132
The Candidate Backing subsystem ensures every parablock considered for relay block inclusion has been seconded by at
least one validator, and approved by a quorum. Parablocks for which not enough validators will assert correctness are
discarded. If the block later proves invalid, the initial backers are slashable; this gives Pezkuwi a rational threat
model during subsequent stages.
Its role is to produce backable candidates for inclusion in new relay-chain blocks. It does so by issuing signed
[`Statement`s][Statement] and tracking received statements signed by other validators. Once enough statements are
received, they can be combined into backing for specific candidates.
Note that though the candidate backing subsystem attempts to produce as many backable candidates as possible, it does
_not_ attempt to choose a single authoritative one. The choice of which actually gets included is ultimately up to the
block author, by whatever metrics it may use; those are opaque to this subsystem.
Once a sufficient quorum has agreed that a candidate is valid, this subsystem notifies the [Provisioner][PV], which in
turn engages block production mechanisms to include the parablock.
## Protocol
Input: [`CandidateBackingMessage`][CBM]
Output:
* [`CandidateValidationMessage`][CVM]
* [`RuntimeApiMessage`][RAM]
* [`CollatorProtocolMessage`][CPM]
* [`ProvisionerMessage`][PM]
* [`AvailabilityDistributionMessage`][ADM]
* [`StatementDistributionMessage`][SDM]
## Functionality
The [Collator Protocol][CP] subsystem is the primary source of non-overseer messages into this subsystem. That subsystem
generates appropriate [`CandidateBackingMessage`s][CBM] and passes them to this subsystem.
This subsystem requests validation from the [Candidate Validation][CV] subsystem and generates an appropriate
[`Statement`][Statement]. All `Statement`s are then passed on to the [Statement Distribution][SD] subsystem to be
gossiped to peers. When [Candidate Validation][CV] decides that a candidate is invalid, and it was recommended to us to
second by our own [Collator Protocol][CP] subsystem, a message is sent to the [Collator Protocol][CP] subsystem with the
candidate's hash so that the collator which recommended it can be penalized.
The subsystem should maintain a set of handles to Candidate Backing Jobs that are currently live, as well as the
relay-parent to which they correspond.
### On Overseer Signal
* If the signal is an [`OverseerSignal`][OverseerSignal]`::ActiveLeavesUpdate`:
  * spawn a Candidate Backing Job for each `activated` head referring to a fresh leaf, storing a bidirectional channel
    with the Candidate Backing Job in the set of handles.
  * cease the Candidate Backing Job for each `deactivated` head, if any.
* If the signal is an [`OverseerSignal`][OverseerSignal]`::Conclude`: Forward conclude messages to all jobs, wait a
  small amount of time for them to join, and then exit.
### On Receiving `CandidateBackingMessage`
* If the message is a [`CandidateBackingMessage`][CBM]`::GetBackedCandidates`, get all backable candidates from the
statement table and send them back.
* If the message is a [`CandidateBackingMessage`][CBM]`::Second`, sign and dispatch a `Seconded` statement only if we
have not seconded any other candidate and have not signed a `Valid` statement for the requested candidate. Signing
both a `Seconded` and `Valid` message is a double-voting misbehavior with a heavy penalty, and this could occur if
another validator has seconded the same candidate and we've received their message before the internal seconding
request.
* If the message is a [`CandidateBackingMessage`][CBM]`::Statement`, count the statement to the quorum. If the statement
in the message is `Seconded` and it contains a candidate that belongs to our assignment, request the corresponding
`PoV` from the backing node via `AvailabilityDistribution` and launch validation. Issue our own `Valid` or `Invalid`
statement as a result.
If the seconding node does not provide us with the `PoV`, we retry fetching it from other backing validators.
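The double-voting guard described above can be sketched as a small per-relay-parent record of what the local validator has signed. It follows this section's rule of seconding at most one candidate, which may differ from current elastic-scaling behaviour (see the note at the top of this page). `LocalVotes` and its methods are hypothetical.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a candidate hash.
type CandidateHash = u64;

/// Tracks what the local validator has signed for one relay parent, so that it
/// never signs both `Seconded` and `Valid` for the same candidate.
#[derive(Default)]
struct LocalVotes {
    seconded: Option<CandidateHash>,
    signed_valid: HashSet<CandidateHash>,
}

impl LocalVotes {
    /// Handles a `Second` request: allowed only if we have seconded nothing yet and
    /// have not already signed `Valid` for this candidate.
    fn try_second(&mut self, candidate: CandidateHash) -> bool {
        if self.seconded.is_some() || self.signed_valid.contains(&candidate) {
            return false;
        }
        self.seconded = Some(candidate);
        true
    }

    /// Signing `Valid` is allowed only for candidates we did not second, and at most once.
    fn try_sign_valid(&mut self, candidate: CandidateHash) -> bool {
        if self.seconded == Some(candidate) {
            return false;
        }
        // `insert` returns false if we already signed `Valid` for this candidate.
        self.signed_valid.insert(candidate)
    }
}

fn main() {
    let mut votes = LocalVotes::default();
    // Another validator's `Seconded` arrived first and we validated the candidate.
    assert!(votes.try_sign_valid(42));
    // A later internal request to second the same candidate must be refused.
    assert!(!votes.try_second(42));
    assert!(votes.try_second(7));
    assert!(!votes.try_second(8)); // already seconded another candidate
}
```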
> big TODO: "contextual execution"
>
> * At the moment we only allow inclusion of _new_ teyrchain candidates validated by _current_ validators.
> * Allow inclusion of _old_ teyrchain candidates validated by _current_ validators.
> * Allow inclusion of _old_ teyrchain candidates validated by _old_ validators.
>
> This will probably blur the lines between jobs, will probably require inter-job communication and a short-term memory
> of recently backable, but not backed candidates.
## Candidate Backing Job
The Candidate Backing Job represents the work a node does for backing candidates with respect to a particular
relay-parent.
The goal of a Candidate Backing Job is to produce as many backable candidates as possible. This is done via signed
[`Statement`s][STMT] by validators. If a candidate receives a majority of supporting Statements from the Teyrchain
Validators currently assigned, then that candidate is considered backable.
### On Startup
* Fetch the current validator set and the validator -> teyrchain assignments from the [`Runtime API`][RA] subsystem
using [`RuntimeApiRequest::Validators`][RAM] and [`RuntimeApiRequest::ValidatorGroups`][RAM].
* Determine if the node controls a key in the current validator set. Call this the local key if so.
* If the local key exists, extract the teyrchain head and validation function from the [`Runtime API`][RA] for the
teyrchain the local key is assigned to, by issuing [`RuntimeApiRequest::PersistedValidationData`][RAM] and
[`RuntimeApiRequest::ValidationCode`][RAM] requests.
* Issue a [`RuntimeApiRequest::SigningContext`][RAM] message to get a context that will later be used upon signing.
### On Receiving New Candidate Backing Message
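```rust
// Illustrative sketch: helper functions and bindings are placeholders.
match msg {
    CandidateBackingMessage::GetBackedCandidates(hashes, tx) => {
        // Send back the backable candidates for the requested hashes.
        let _ = tx.send(backable_candidates(&hashes));
    }
    CandidateBackingMessage::Second(hash, candidate) => {
        // Only second an unknown candidate in our local assignment, and only if we
        // have not seconded another candidate or signed `Valid` for this one.
        if !is_known(&candidate) && is_local_assignment(&candidate) && can_second(&candidate) {
            let validity = validate_candidate(&candidate, &teyrchain_head, &validation_function).await;
            if validity == Validity::Valid {
                sign_and_dispatch(Statement::Seconded(candidate));
            }
        }
    }
    CandidateBackingMessage::Statement(hash, statement) => {
        // Count the statement towards the quorum for its candidate.
        import_statement(&statement);
        if let Statement::Seconded(candidate) = statement {
            if candidate.teyrchain_id == our_assignment {
                // Validation issues our own `Valid` or `Invalid` statement when it completes.
                spawn_validation_work(candidate, teyrchain_head, validation_function);
            }
        }
    }
}
```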
Add `Seconded` statements and `Valid` statements to a quorum. If the quorum reaches a pre-defined threshold, send a
[`ProvisionerMessage`][PM]`::ProvisionableData(ProvisionableData::BackedCandidate(CandidateReceipt))` message. `Invalid`
statements that conflict with already witnessed `Seconded` and `Valid` statements for the given candidate, statements
that are double-votes, self-contradictions and so on, should result in issuing a
[`ProvisionerMessage`][PM]`::MisbehaviorReport` message for each newly detected case of this kind.
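A minimal sketch of the quorum counting described above, assuming a hypothetical `StatementTable` keyed by candidate hash and validator index, and a caller-supplied backing threshold. It only detects contradictions by the same validator (for example `Seconded` followed by `Valid`, or `Valid` followed by `Invalid`); conflicts between different validators are not modelled.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified identifiers.
type CandidateHash = u64;
type ValidatorIndex = u32;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Vote {
    Seconded,
    Valid,
    Invalid,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The vote was recorded.
    Counted,
    /// The vote was recorded and the candidate just reached the threshold.
    Backable,
    /// The validator contradicted its own earlier vote: report misbehavior.
    Misbehavior,
    /// An exact repeat of a vote we already hold.
    Duplicate,
}

#[derive(Default)]
struct StatementTable {
    votes: HashMap<CandidateHash, HashMap<ValidatorIndex, Vote>>,
    backable: HashSet<CandidateHash>,
}

impl StatementTable {
    fn import(&mut self, c: CandidateHash, v: ValidatorIndex, vote: Vote, threshold: usize) -> Outcome {
        let per_candidate = self.votes.entry(c).or_default();
        match per_candidate.get(&v) {
            Some(prev) if *prev == vote => return Outcome::Duplicate,
            // `Seconded` + `Valid` is a double vote; `Invalid` after either is a
            // self-contradiction.
            Some(_) => return Outcome::Misbehavior,
            None => {}
        }
        per_candidate.insert(v, vote);
        // `Seconded` and `Valid` both count as support.
        let support = per_candidate.values().filter(|x| **x != Vote::Invalid).count();
        // `insert` returns true only the first time, so `Backable` is reported once.
        if support >= threshold && self.backable.insert(c) {
            Outcome::Backable
        } else {
            Outcome::Counted
        }
    }
}

fn main() {
    let mut table = StatementTable::default();
    let threshold = 2; // e.g. a majority of a 3-validator backing group
    assert_eq!(table.import(1, 0, Vote::Seconded, threshold), Outcome::Counted);
    assert_eq!(table.import(1, 1, Vote::Valid, threshold), Outcome::Backable);
    assert_eq!(table.import(1, 2, Vote::Valid, threshold), Outcome::Counted);
    assert_eq!(table.import(1, 0, Vote::Valid, threshold), Outcome::Misbehavior);
}
```

`Backable` would correspond to sending the `BackedCandidate` provisionable data, and `Misbehavior` to a `MisbehaviorReport`.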
Backing does not need to concern itself with providing statements to the dispute coordinator, as the dispute
coordinator scrapes them from the chain. This way the import is batched and contains only statements that actually made
it onto some chain.
### Validating Candidates
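```rust
// Illustrative sketch: helper functions and types are placeholders.
fn spawn_validation_work(candidate: CandidateReceipt, teyrchain_head: HeadData, validation_function: ValidationCode) {
    // Validation runs in the background so the job can keep handling messages.
    spawn(async move {
        let pov = fetch_pov_block(&candidate).await;
        let valid = validate_pov_block(&candidate, &pov, &teyrchain_head, &validation_function).await;
        if valid {
            // Make the PoV available for later distribution by sending the data
            // to the availability store to keep.
            store_available_data(&candidate, pov).await;
            // Sign and dispatch a `Valid` statement if we have not seconded this candidate.
            if !seconded_by_us(&candidate) {
                sign_and_dispatch(Statement::Valid(candidate.hash()));
            }
        } else {
            // Sign and dispatch an `Invalid` statement.
            sign_and_dispatch(Statement::Invalid(candidate.hash()));
        }
    });
}
```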
### Fetch PoV Block
Create a `(sender, receiver)` pair. Dispatch an [`AvailabilityDistributionMessage`][ADM]`::FetchPoV{ validator_index,
pov_hash, candidate_hash, tx, }` and listen on the passed receiver for a response. Availability distribution will send
the request to the validator specified by `validator_index`, which might not serve it for any number of reasons; in that
case we need to retry with other backing validators.
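The retry behaviour can be sketched synchronously as trying each backing validator in order until one serves the PoV. The ordering (the seconder first, then the other backers), the `fetch` closure standing in for one `FetchPoV` round trip, and `fetch_pov_with_retry` itself are assumptions for illustration; the real flow is asynchronous and message-driven.

```rust
// Hypothetical stand-ins: validator indices and raw PoV bytes.
type ValidatorIndex = u32;
type PoV = Vec<u8>;

/// Asks backing validators in the given order and returns the first PoV served.
/// `fetch` stands in for one `FetchPoV` round trip and returns `None` on failure.
fn fetch_pov_with_retry<F>(backers: &[ValidatorIndex], fetch: F) -> Option<PoV>
where
    F: FnMut(ValidatorIndex) -> Option<PoV>,
{
    // `find_map` stops at the first validator that actually serves the PoV.
    backers.iter().copied().find_map(fetch)
}

fn main() {
    let mut asked = Vec::new();
    // Validator 3 seconded the candidate but does not respond; validator 5 does.
    let pov = fetch_pov_with_retry(&[3, 5, 8], |v| {
        asked.push(v);
        if v == 5 { Some(vec![0xAB]) } else { None }
    });
    assert_eq!(pov, Some(vec![0xAB]));
    assert_eq!(asked, vec![3, 5]); // validator 8 was never asked
}
```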
### Validate PoV Block
Create a `(sender, receiver)` pair. Dispatch a `CandidateValidationMessage::Validate(validation function, candidate,
pov, BACKING_EXECUTION_TIMEOUT, sender)` and listen on the receiver for a response.
### Distribute Signed Statement
Dispatch a [`StatementDistributionMessage`][SDM]`::Share(relay_parent, SignedFullStatementWithPVD)`.
[OverseerSignal]: ../../types/overseer-protocol.md#overseer-signal
[Statement]: ../../types/backing.md#statement-type
[STMT]: ../../types/backing.md#statement-type
[CPM]: ../../types/overseer-protocol.md#collator-protocol-message
[RAM]: ../../types/overseer-protocol.md#runtime-api-message
[CVM]: ../../types/overseer-protocol.md#validation-request-type
[PM]: ../../types/overseer-protocol.md#provisioner-message
[CBM]: ../../types/overseer-protocol.md#candidate-backing-message
[ADM]: ../../types/overseer-protocol.md#availability-distribution-message
[SDM]: ../../types/overseer-protocol.md#statement-distribution-message
[DCM]: ../../types/overseer-protocol.md#dispute-coordinator-message
[CP]: ../collators/collator-protocol.md
[CV]: ../utility/candidate-validation.md
[SD]: statement-distribution.md
[RA]: ../utility/runtime-api.md
[PV]: ../utility/provisioner.md
# PoV Distribution
# Prospective Teyrchains
> NOTE: This module has undergone changes for the elastic scaling implementation. As a result, parts of this document
> may be out of date and will be updated at a later time. Issue tracking the update:
> https://github.com/pezkuwichain/pezkuwi-sdk/issues/132
## Overview
**Purpose:** Tracks and handles prospective teyrchain fragments and informs
other backing-stage subsystems of work to be done.
"prospective":
- [*prə'spɛktɪv*] adj.
- future, likely, potential
Asynchronous backing changes the runtime to accept teyrchain candidates from a
certain allowed range of historic relay-parents. This means we can now build
*prospective teyrchains*, that is, trees of potential (but likely) future
teyrchain blocks. This is the subsystem responsible for doing so.
Other subsystems such as Backing rely on Prospective Teyrchains, e.g. for
determining if a candidate can be seconded. This subsystem is the main
coordinator of work within the node for the collation and backing phases of
teyrchain consensus.
Prospective Teyrchains is primarily an implementation of fragment trees. It also
handles concerns such as:
- the relay-chain being forkful
- session changes
See the following sections for more details.
### Fragment Trees
This subsystem builds up fragment trees, which are trees of prospective para
candidates. Each path through the tree represents a possible state transition
path for the para. Each potential candidate is a fragment, or a node, in the
tree. Candidates are validated against constraints as they are added.
This subsystem builds up trees for each relay-chain block in the view, for each
para. These fragment trees are used for:
- providing backable candidates to other subsystems
- sanity-checking that candidates can be seconded
- getting seconded candidates under active leaves
- etc.
For example, here is a tree with several possible paths:
```
Para Head registered by the relay chain:     included_head
                                            ↲           ↳
depth 0:                            head_0_a             head_0_b
                                   ↲                             ↳
depth 1:                   head_1_a                               head_1_b
                        ↲     |     ↳
depth 2:        head_2_a1 head_2_a2 head_2_a3
```
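The following sketch models a fragment tree like the one pictured above. It is a simplified, hypothetical model: heads are string labels, each candidate is an edge from the head it builds on to the head it produces, and constraint checking is reduced to a maximum depth. It shows how each root-to-leaf path is one possible state-transition path for the para.

```rust
use std::collections::HashMap;

// Hypothetical simplification: head data is represented by a string label.
type Head = &'static str;

/// Each candidate is an edge from the head it builds on to the head it outputs.
/// Constraint checking is reduced to a maximum depth for illustration.
struct FragmentTree {
    root: Head,
    max_depth: usize,
    children: HashMap<Head, Vec<Head>>,
    depth: HashMap<Head, usize>,
}

impl FragmentTree {
    fn new(root: Head, max_depth: usize) -> Self {
        FragmentTree { root, max_depth, children: HashMap::new(), depth: HashMap::new() }
    }

    /// Adds a candidate building on `parent` and producing `output`. Rejects it if
    /// the parent is unknown, the output is already present, or the depth limit is hit.
    fn add(&mut self, parent: Head, output: Head) -> bool {
        let d = if parent == self.root {
            0
        } else {
            match self.depth.get(parent) {
                Some(parent_depth) => parent_depth + 1,
                None => return false,
            }
        };
        if d > self.max_depth || self.depth.contains_key(output) {
            return false;
        }
        self.depth.insert(output, d);
        self.children.entry(parent).or_default().push(output);
        true
    }

    /// Every root-to-leaf path is one possible state-transition path for the para.
    fn paths(&self) -> Vec<Vec<Head>> {
        let mut out = Vec::new();
        self.walk(self.root, &mut Vec::new(), &mut out);
        out
    }

    fn walk(&self, node: Head, path: &mut Vec<Head>, out: &mut Vec<Vec<Head>>) {
        match self.children.get(node) {
            Some(kids) => {
                for &kid in kids {
                    path.push(kid);
                    self.walk(kid, path, out);
                    path.pop();
                }
            }
            None if !path.is_empty() => out.push(path.clone()),
            None => {}
        }
    }
}

fn main() {
    // Rebuild the example tree above, with depths 0..=2.
    let mut tree = FragmentTree::new("included_head", 2);
    assert!(tree.add("included_head", "head_0_a"));
    assert!(tree.add("included_head", "head_0_b"));
    assert!(tree.add("head_0_a", "head_1_a"));
    assert!(tree.add("head_0_b", "head_1_b"));
    for head in ["head_2_a1", "head_2_a2", "head_2_a3"] {
        assert!(tree.add("head_1_a", head));
    }
    assert!(!tree.add("head_2_a1", "head_3")); // depth 3 exceeds the limit
    assert!(!tree.add("unknown", "x")); // parent not in the tree
    assert_eq!(tree.paths().len(), 4);
    for path in tree.paths() {
        println!("{}", path.join(" -> "));
    }
}
```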
### The Relay-Chain Being Forkful
We account for the same candidate possibly appearing in different forks. While
we still build fragment trees for each head in each fork, we are efficient with
how we reference candidates to save space.
### Session Changes
Allowed ancestry doesn't cross session boundaries. That is, you can only build on
top of the freshest relay parent when the session starts. This is a current
limitation that may be lifted in the future.
Also, runtime configuration values needed for constraints (such as
`max_pov_size`) are constant within a session. This is important when building
prospective validation data. This is unlikely to change.
## Messages
### Incoming
- `ActiveLeaves`
  - Notification of a change in the set of active leaves.
  - Constructs fragment trees for each para for each new leaf.
- `ProspectiveTeyrchainsMessage::IntroduceCandidate`
  - Informs the subsystem of a new candidate.
  - Sent by the Backing Subsystem when it is importing a statement for a
    new candidate.
- `ProspectiveTeyrchainsMessage::CandidateSeconded`
  - Informs the subsystem that a previously introduced candidate has
    been seconded.
  - Sent by the Backing Subsystem when it is importing a statement for a
    new candidate after it sends `IntroduceCandidate`, if that wasn't
    rejected by Prospective Teyrchains.
- `ProspectiveTeyrchainsMessage::CandidateBacked`
  - Informs the subsystem that a previously introduced candidate has
    been backed.
  - Sent by the Backing Subsystem after it successfully imports a
    statement giving a candidate the necessary quorum of backing votes.
- `ProspectiveTeyrchainsMessage::GetBackableCandidates`
  - Gets the requested number of backable candidate hashes, along with their relay parents, for a
    given teyrchain under a given relay-parent (leaf) hash, which are descendants of the given
    candidate hashes.
  - Sent by the Provisioner when requesting backable candidates, when
    selecting candidates for a given relay-parent.
- `ProspectiveTeyrchainsMessage::GetHypotheticalMembership`
  - Gets the hypothetical frontier membership of candidates with the
    given properties under the specified active leaves' fragment trees.
  - Sent by the Backing Subsystem when sanity-checking whether a candidate can
    be seconded based on its hypothetical frontiers.
- `ProspectiveTeyrchainsMessage::GetMinimumRelayParents`
  - Gets the minimum accepted relay-parent number for each para in the
    fragment tree for the given relay-chain block hash.
  - That is, this returns the minimum relay-parent block number in the
    same branch of the relay-chain which is accepted in the fragment
    tree for each para-id.
  - Sent by the Backing, Statement Distribution, and Collator Protocol
    subsystems when activating leaves in the implicit view.
- `ProspectiveTeyrchainsMessage::GetProspectiveValidationData`
  - Gets the validation data of some prospective candidate. The
    candidate doesn't need to be part of any fragment tree.
  - Sent by the Collator Protocol subsystem (validator side) when
    handling a fetched collation result.
### Outgoing
- `RuntimeApiRequest::ParaBackingState`
  - Gets the backing state of the given para (the constraints of the para and
    candidates pending availability).
- `RuntimeApiRequest::BackingConstraints`
  - Gets the constraints on the actions that can be taken by a new teyrchain
    block.
- `RuntimeApiRequest::AvailabilityCores`
  - Gets information on all availability cores.
- `ChainApiMessage::Ancestors`
  - Requests the `k` ancestor block hashes of a block with the given
    hash.
- `ChainApiMessage::BlockHeader`
  - Requests the block header by hash.
## Glossary
- **Candidate storage:** Stores candidates and information about them
  such as their relay-parents and their backing states. It is indexed in
  various ways.
- **Constraints:**
  - Constraints on the actions that can be taken by a new teyrchain
    block.
  - Exhaustively define the set of valid inputs and outputs to teyrchain
    execution.
- **Fragment:** A prospective para block (that is, a block not yet referenced by
  the relay-chain). Fragments are anchored to the relay-chain at a particular
  relay-parent.
- **Fragment tree:**
  - A tree of fragments. Together, these fragments define one or more
    prospective paths a teyrchain's state may transition through.
  - See the "Fragment Trees" section.
- **Inclusion emulation:** Emulation of the logic that the runtime uses
  for checking teyrchain blocks.
- **Relay-parent:** A particular relay-chain block that a fragment is
  anchored to.
- **Scope:** The scope of a fragment tree, defining limits on nodes
  within the tree.
# Statement Distribution
This subsystem is responsible for distributing signed statements that we have generated and forwarding statements
generated by our peers. Received candidate receipts and statements are passed to the [Candidate Backing
subsystem](candidate-backing.md) to handle producing local statements. On receiving
`StatementDistributionMessage::Share`, this subsystem distributes the message across the network with redundancy to
ensure a fast backing process.
## Overview
**Goal:** every well-connected node is aware of every next potential teyrchain block.
Validators can either:
- receive a teyrchain block from a collator, check the block, and gossip a statement.
- receive statements from other validators, check the teyrchain block if it originated within their own group, and
gossip the statement onward if valid.
Validators must have statements, candidates, and persisted validation data from all other validators. This is because we
need to store statements from validators who've checked the candidate on the relay chain, so we know whom to hold
accountable in case of disputes. Any validator can be selected as the next relay-chain block author, and this is not
revealed in advance for security reasons. As a result, all validators must have an up-to-date view of all possible
teyrchain candidates + backing statements that could be placed on-chain in the next block.
[This blog post](https://pezkuwichain.io/blog/polkadot-v1-0-sharding-and-economic-security) puts it another way:
"Validators who aren't assigned to the teyrchain still listen for the attestations [statements] because whichever
validator ends up being the author of the relay-chain block needs to bundle up attested teyrchain blocks for several
teyrchains and place them into the relay-chain block."
Backing-group quorum (that is, enough backing group votes) must be reached before the block author will consider the
candidate. Therefore, validators need to consider _all_ seconded candidates within their own group, because that's what
they're assigned to work on. Validators only need to consider _backable_ candidates from other groups. This informs the
design of the statement distribution protocol to have separate phases for in-group and out-group distribution,
respectively called "cluster" and "grid" mode (see below).
### With Async Backing
Asynchronous backing changes the runtime to accept teyrchain candidates from a certain allowed range of historic
relay-parents. These candidates must be backed by the group assigned to the teyrchain as-of their corresponding relay
parents.
## Protocol
To address the concern of dealing with large numbers of spam candidates or statements, the overall design approach is to
combine a focused "clustering" protocol for legitimate fresh candidates with a broad-distribution "grid" protocol to
quickly get backed candidates into the hands of many validators. Validators do not eagerly send each other heavy
`CommittedCandidateReceipt`, but instead request these lazily through request/response protocols.
A high-level description of the protocol follows:
### Messages
Nodes can send each other a few kinds of messages: `Statement`, `BackedCandidateManifest`,
`BackedCandidateAcknowledgement`.
- `Statement` messages contain only a signed compact statement, without full candidate info.
- `BackedCandidateManifest` messages advertise a description of a backed candidate and stored statements.
- `BackedCandidateAcknowledgement` messages acknowledge that a backed candidate is fully known.
### Request/response protocol
Nodes can request the full `CommittedCandidateReceipt` and `PersistedValidationData`, along with statements, over a
request/response protocol. This is the `AttestedCandidateRequest`; the response is `AttestedCandidateResponse`.
### Importability and the Hypothetical Frontier
The **prospective teyrchains** subsystem maintains prospective "fragment trees" which can be used to determine whether a
particular teyrchain candidate could possibly be included in the future. Candidates which either are within a fragment
tree or _would be_ part of a fragment tree if accepted are said to be in the "hypothetical frontier".
The **statement-distribution** subsystem keeps track of all candidates, and updates its knowledge of the hypothetical
frontier based on events such as new relay parents, new confirmed candidates, and newly backed candidates.
We only consider statements as "importable" when the corresponding candidate is part of the hypothetical frontier, and
only send "importable" statements to the backing subsystem itself.
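The importability rule can be sketched as a gate that holds statements back until their candidate enters the hypothetical frontier. This is a hypothetical simplification: frontier membership is passed in directly (in the node it is derived from the prospective teyrchains subsystem), statements are plain labels, and candidates leaving the frontier are not modelled.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical placeholders for a candidate hash and a signed statement.
type CandidateHash = u64;
type Statement = &'static str;

/// Holds back statements until their candidate is in the hypothetical frontier.
#[derive(Default)]
struct ImportGate {
    frontier: HashSet<CandidateHash>,
    pending: HashMap<CandidateHash, Vec<Statement>>,
}

impl ImportGate {
    /// Returns the statements that may be sent to the backing subsystem now.
    fn on_statement(&mut self, c: CandidateHash, s: Statement) -> Vec<Statement> {
        if self.frontier.contains(&c) {
            vec![s]
        } else {
            self.pending.entry(c).or_default().push(s);
            Vec::new()
        }
    }

    /// Called when a frontier update (new relay parent, confirmed or backed
    /// candidate) shows that `c` is now importable; releases its held statements.
    fn on_enter_frontier(&mut self, c: CandidateHash) -> Vec<Statement> {
        self.frontier.insert(c);
        self.pending.remove(&c).unwrap_or_default()
    }
}

fn main() {
    let mut gate = ImportGate::default();
    assert!(gate.on_statement(9, "seconded-by-v1").is_empty());
    assert_eq!(gate.on_enter_frontier(9), vec!["seconded-by-v1"]);
    assert_eq!(gate.on_statement(9, "valid-by-v2"), vec!["valid-by-v2"]);
}
```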
### Cluster Mode
- Validator nodes are partitioned into groups (with some exceptions), and validators within a group at a relay-parent
  can send each other `Statement` messages for any candidates within that group and based on that relay-parent.
  - This is referred to as the "cluster" mode.
  - Right now these are the same as backing groups, though "cluster" specifically refers to the set of nodes
    communicating with each other in the first phase of distribution.
- `Seconded` statements must be sent before `Valid` statements.
- `Seconded` statements may only be sent to other members of the group when the candidate is fully known by the local
  validator.
  - "Fully known" means the validator has the full `CommittedCandidateReceipt` and `PersistedValidationData`, which it
    receives on request from other validators or from a collator.
  - The reason for this is that sending a statement (which is always a `CompactStatement` carrying nothing but a hash
    and signature) to the cluster is also a signal that the sending node is available to request the candidate from.
  - This makes the protocol easier to reason about, while also reducing network messages about candidates that don't
    really exist.
- Validators in a cluster receiving messages about unknown candidates request the candidate (and statements) from other
  cluster members which have it.
- Spam considerations
  - The maximum depth of candidates allowed in asynchronous backing determines the maximum number of `Seconded`
    statements originating from a validator V which each validator in a cluster may send to others. This bounds the
    number of candidates.
  - There is a small number of validators in each group, which further limits the number of candidates.
- We accept candidates which don't fit in the fragment trees of any relay parents.
  - "Accept" means "attempt to request and store in memory until useful or expired".
  - We listen to the prospective teyrchains subsystem to learn of new additions to the fragment trees.
  - Use this to attempt to import the candidate later.
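The per-originator bound from the spam considerations can be sketched as a counter of distinct seconded candidates per (sender, originator) pair for one relay parent. The limit is supplied by the caller and is assumed to be derived from the maximum candidate depth; `SecondedLimiter` is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified identifiers.
type ValidatorIndex = u32;
type CandidateHash = u64;

/// For one relay parent: which distinct candidates each cluster member (`sender`)
/// has reported as seconded by each `originator`.
struct SecondedLimiter {
    limit: usize,
    seen: HashMap<(ValidatorIndex, ValidatorIndex), HashSet<CandidateHash>>,
}

impl SecondedLimiter {
    fn new(limit: usize) -> Self {
        SecondedLimiter { limit, seen: HashMap::new() }
    }

    /// Accepts a `Seconded` statement by `originator` received from `sender`, unless it
    /// names a new candidate beyond the limit. Already-known candidates do not count twice.
    fn accept(&mut self, sender: ValidatorIndex, originator: ValidatorIndex, c: CandidateHash) -> bool {
        let known = self.seen.entry((sender, originator)).or_default();
        if known.contains(&c) {
            return true;
        }
        if known.len() >= self.limit {
            return false;
        }
        known.insert(c);
        true
    }
}

fn main() {
    // Suppose the depth limit allows 2 `Seconded` statements per originator.
    let mut limiter = SecondedLimiter::new(2);
    assert!(limiter.accept(1, 1, 10));
    assert!(limiter.accept(1, 1, 11));
    assert!(!limiter.accept(1, 1, 12)); // third distinct candidate from validator 1: rejected
    assert!(limiter.accept(1, 1, 10)); // repeat of a known candidate
    assert!(limiter.accept(2, 1, 12)); // another sender has its own budget
}
```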
### Grid Mode
- Every consensus session provides randomness and a fixed validator set, which is used to build a redundant grid
  topology.
  - It's redundant in the sense that there are 2 paths from every node to every other node. See the "Grid Topology"
    section for more details.
- This grid topology is used to create a sending path from each validator group to every validator.
- When a node observes a candidate as backed, it sends a `BackedCandidateManifest` to its "receiving" nodes.
  - If receiving nodes don't yet know the candidate, they request it.
  - Once they know the candidate, they respond with a `BackedCandidateAcknowledgement`.
- Once two nodes perform a manifest/acknowledgement exchange, they can send `Statement` messages directly to each other
  for any new statements they might need.
  - This limits the amount of statements we'd have to deal with w.r.t. candidates that don't really exist. See the
    "Manifest Exchange" section.
- There are limitations on the number of candidates that can be advertised by each peer, similar to those in the
  cluster. Validators do not request candidates which exceed these limitations.
- Validators request candidates as soon as they are advertised, but do not import the statements until the candidate is
  part of the hypothetical frontier, and do not re-advertise or acknowledge until the candidate is considered both
  backable and part of the hypothetical frontier.
- Note that requesting is not an implicit acknowledgement, and an explicit acknowledgement must be sent upon receipt.
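The "2 paths" property can be illustrated with a simple grid sketch. It assumes the validator list has already been shuffled with the session randomness and is laid out row by row in a grid about √n wide. Direct neighbours share a row or a column, so two validators that share neither are connected through two intermediaries. The layout and the helper names are illustrative assumptions, not the exact topology construction used by the node.

```rust
/// Integer ceiling of the square root of `n`: the grid width.
fn grid_width(n: usize) -> usize {
    let mut w = 0;
    while w * w < n {
        w += 1;
    }
    w
}

/// (row, column) of the validator at position `i`, laid out row by row.
fn pos(i: usize, width: usize) -> (usize, usize) {
    (i / width, i % width)
}

/// Validators sharing a row or a column with `i`: its direct grid neighbours.
fn neighbours(i: usize, n: usize) -> Vec<usize> {
    let w = grid_width(n);
    let (r, c) = pos(i, w);
    (0..n)
        .filter(|&j| {
            let (rj, cj) = pos(j, w);
            j != i && (rj == r || cj == c)
        })
        .collect()
}

/// The (up to) two intermediaries relaying from `a` to `b` when they share neither
/// row nor column: the cells at (row of a, column of b) and (row of b, column of a).
fn relays(a: usize, b: usize, n: usize) -> Vec<usize> {
    let w = grid_width(n);
    let ((ra, ca), (rb, cb)) = (pos(a, w), pos(b, w));
    [ra * w + cb, rb * w + ca]
        .iter()
        .copied()
        .filter(|&x| x < n) // a cell in an incomplete last row may be empty
        .collect()
}

fn main() {
    // 9 validators in a 3x3 grid:
    // 0 1 2
    // 3 4 5
    // 6 7 8
    assert_eq!(neighbours(4, 9), vec![1, 3, 5, 7]);
    // 0 reaches 8 via 2 (same row, then same column) and via 6.
    assert_eq!(relays(0, 8, 9), vec![2, 6]);
}
```

When the last row is incomplete, one of the two cells can be empty; a production topology has to handle that case explicitly.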
### Disabled validators
After a validator is disabled in the runtime, other validators should no longer
accept statements from it. Filtering out statements from disabled validators
on the node side is purely an optimization, as it will be done in the runtime
as well.
We use the state of the relay parent to check whether a validator is disabled
to avoid race conditions and ensure that disabling works well in the presence
of re-enabling.
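Keying the disabled set by relay parent can be sketched as follows. `DisabledView` is a hypothetical cache of the runtime's disabled validators as read from each relay parent's state; treating unknown relay parents as acceptable (leaving them to other checks) is an assumption of this sketch.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified identifiers.
type RelayParent = u64;
type ValidatorIndex = u32;

/// Disabled validators as read from the runtime state of each relay parent.
struct DisabledView {
    disabled_at: HashMap<RelayParent, HashSet<ValidatorIndex>>,
}

impl DisabledView {
    /// Checks the originator against the state of the statement's own relay parent
    /// rather than our latest block, which avoids races and keeps disabling correct
    /// when validators are re-enabled. Unknown relay parents are left to other checks.
    fn accept(&self, relay_parent: RelayParent, originator: ValidatorIndex) -> bool {
        self.disabled_at
            .get(&relay_parent)
            .map_or(true, |disabled| !disabled.contains(&originator))
    }
}

fn main() {
    // Validator 3 is disabled at relay parent 1 and re-enabled by relay parent 2.
    let view = DisabledView {
        disabled_at: HashMap::from([(1, HashSet::from([3])), (2, HashSet::new())]),
    };
    assert!(!view.accept(1, 3));
    assert!(view.accept(2, 3));
    assert!(view.accept(1, 4));
}
```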
## Messages
### Incoming
- `ActiveLeaves`
  - Notification of a change in the set of active leaves.
- `StatementDistributionMessage::Share`
  - Notification of a locally-originating statement. That is, this statement comes from our node and should be
    distributed to other nodes.
  - Sent by the Backing Subsystem after it successfully imports a locally-originating statement.
- `StatementDistributionMessage::Backed`
  - Notification of a candidate being backed (received enough validity votes from the backing group).
  - Sent by the Backing Subsystem after it successfully imports a statement for the first time and after sending
    `Share`.
- `StatementDistributionMessage::NetworkBridgeUpdate`
  - See next section.
#### Network bridge events
- v1 compatibility
  - Messages for the v1 protocol are routed to the legacy statement distribution.
- `Statement`
  - Notification of a signed statement.
  - Sent by a peer's Statement Distribution subsystem when circulating statements.
- `BackedCandidateManifest`
  - Notification of a backed candidate being known by the sending node.
  - Lets the receiving node request the candidate, if needed.
  - Announcement.
  - Sent by a peer's Statement Distribution subsystem.
- `BackedCandidateKnown`
  - Notification of a backed candidate being known by the sending node.
  - For informing a receiving node which already has the candidate.
  - Acknowledgement.
  - Sent by a peer's Statement Distribution subsystem.
### Outgoing
- `NetworkBridgeTxMessage::SendValidationMessages`
  - Sends a peer all pending messages / acknowledgements / statements for a relay parent, either through the cluster or
    the grid.
- `NetworkBridgeTxMessage::SendValidationMessage`
  - Circulates a compact statement to all peers who need it, either through the cluster or the grid.
- `NetworkBridgeTxMessage::ReportPeer`
  - Reports a peer (either good or bad).
- `CandidateBackingMessage::Statement`
  - Note a validator's statement about a particular candidate.
- `ProspectiveTeyrchainsMessage::GetHypotheticalMembership`
  - Gets the hypothetical frontier membership of candidates under active leaves' fragment trees.
- `NetworkBridgeTxMessage::SendRequests`
  - Sends requests, initiating the request/response protocol.
## Request/Response
We also have a request/response protocol because validators do not eagerly send each other heavy
`CommittedCandidateReceipt`, but instead need to request these lazily.
### Protocol
1. Requesting Validator
   - Requests are queued up with `RequestManager::get_or_insert`.
     - Done as needed, when handling incoming manifests/statements.
   - `RequestManager::dispatch_requests` sends any queued-up requests.
     - Calls `RequestManager::next_request` to completion.
       - Creates the `OutgoingRequest`, saves the receiver in `RequestManager::pending_responses`.
     - Does nothing if we have more responses pending than the limit of parallel requests.
2. Peer
   - Requests come in on a peer on the `IncomingRequestReceiver`.
     - Runs in a background responder task which feeds requests to `answer_request` through `MuxedMessage`.
     - This responder task has a limit on the number of parallel requests.
   - `answer_request` on the peer takes the request and sends a response.
     - Does this using the response sender on the request.
3. Requesting Validator
   - `receive_response` on the original validator yields a response.
     - Response was sent on the request's response sender.
     - Uses `RequestManager::await_incoming` to await on pending responses in an unordered fashion.
     - Runs on the `MuxedMessage` receiver.
   - `handle_response` handles the response.
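The queueing and parallelism limit on the requesting side can be modelled synchronously as below. The method names `get_or_insert`, `dispatch_requests`, and `handle_response` are borrowed from the text, but the signatures and behaviour here are a hypothetical simplification, not the actual `RequestManager` API; networking and response payloads are omitted.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical placeholder for a candidate hash.
type CandidateHash = u64;

/// Simplified, synchronous model of the request bookkeeping described above.
struct RequestManager {
    max_parallel: usize,
    queued: VecDeque<CandidateHash>,
    pending: HashSet<CandidateHash>,
}

impl RequestManager {
    fn new(max_parallel: usize) -> Self {
        RequestManager { max_parallel, queued: VecDeque::new(), pending: HashSet::new() }
    }

    /// Queues a request for `c` unless one is already queued or in flight.
    fn get_or_insert(&mut self, c: CandidateHash) {
        if !self.pending.contains(&c) && !self.queued.contains(&c) {
            self.queued.push_back(c);
        }
    }

    /// Sends queued requests until the parallel-request limit is reached and
    /// returns the candidates whose requests were sent.
    fn dispatch_requests(&mut self) -> Vec<CandidateHash> {
        let mut sent = Vec::new();
        while self.pending.len() < self.max_parallel {
            match self.queued.pop_front() {
                Some(c) => {
                    self.pending.insert(c);
                    sent.push(c);
                }
                None => break,
            }
        }
        sent
    }

    /// Handles a response, freeing a slot for the next queued request.
    /// Returns `false` for responses we were not waiting for.
    fn handle_response(&mut self, c: CandidateHash) -> bool {
        self.pending.remove(&c)
    }
}

fn main() {
    let mut manager = RequestManager::new(2);
    for c in [1, 2, 3, 1] {
        manager.get_or_insert(c); // the second `1` is deduplicated
    }
    assert_eq!(manager.dispatch_requests(), vec![1, 2]);
    assert!(manager.dispatch_requests().is_empty()); // limit reached
    assert!(manager.handle_response(1));
    assert_eq!(manager.dispatch_requests(), vec![3]);
}
```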
### API
- `dispatch_requests`
  - Dispatches pending requests for candidate data & statements.
- `answer_request`
  - Answers an incoming request for a candidate.
  - Takes an incoming `AttestedCandidateRequest`.
- `receive_response`
  - Waits on the next incoming response.
  - If there are no requests pending, this future never resolves.
  - Returns `UnhandledResponse`.
- `handle_response`
  - Handles an incoming response.
  - Takes `UnhandledResponse`.
## Manifests
A manifest is a message about a known backed candidate, along with a description of the statements backing it. It can be
one of two kinds:
- `Full`: Contains information about the candidate and should be sent to peers who may not have the candidate yet. This
is also called an `Announcement`.
- `Acknowledgement`: Omits information implicit in the candidate, and should be sent to peers which are guaranteed to
have the candidate already.
### Manifest Exchange
Manifest exchange is when a receiving node has received a `Full` manifest and replied with an `Acknowledgement`. It
indicates that both nodes know the candidate to be valid and backed. This allows the nodes to send `Statement` messages
directly to each other for any new statements.
Why? This limits the number of statements we have to deal with for candidates that don't really exist. Limiting
out-of-group statement distribution between peers to only candidates that both peers agree are backed and exist ensures
we only have to store statements about real candidates.
In practice, manifest exchange means that one of three things has happened:
- They announced, we acknowledged.
- We announced, they acknowledged.
- We announced, they announced.
Concerning the last case, note that it is possible for two nodes to have each other in their sending sets. Consider:
```
1 2
3 4
```
If validators 2 and 4 are in group B, then there is a path `2->1->3` and `4->3->1`. Therefore, 1 and 3 might send each
other manifests for the same candidate at the same time, without having seen the other's yet. This also counts as a
manifest exchange, but is only allowed to occur in this way.
After the exchange is complete, we update pending statements. Pending statements are those we know locally that the
remote node does not.
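The three allowed forms of manifest exchange and the "pending statements" rule can be expressed as a small state check. This is a hypothetical sketch. `ManifestState` and `pending_statements` are illustrative names, not the subsystem's API, and statements are reduced to validator indices (`u32`) for simplicity.

```rust
use std::collections::HashSet;

/// The two kinds of manifest (payloads omitted).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ManifestKind {
    Full,
    Acknowledgement,
}

/// Which manifest we sent to, and received from, one peer for one candidate (hypothetical).
struct ManifestState {
    sent: Option<ManifestKind>,
    received: Option<ManifestKind>,
}

impl ManifestState {
    /// True if a manifest exchange happened in one of the three allowed ways.
    fn exchange_complete(&self) -> bool {
        matches!(
            (self.sent, self.received),
            // They announced, we acknowledged.
            (Some(ManifestKind::Acknowledgement), Some(ManifestKind::Full))
                // We announced, they acknowledged.
                | (Some(ManifestKind::Full), Some(ManifestKind::Acknowledgement))
                // We announced, they announced.
                | (Some(ManifestKind::Full), Some(ManifestKind::Full))
        )
    }
}

/// Pending statements: those we know locally that the remote node does not (sorted).
fn pending_statements(local: &HashSet<u32>, remote: &HashSet<u32>) -> Vec<u32> {
    let mut pending: Vec<u32> = local.difference(remote).copied().collect();
    pending.sort_unstable();
    pending
}
```

Two `Acknowledgement`s, or a one-sided `Full`, do not count as an exchange in this sketch. That matches the three cases listed above.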
#### Alternative Paths Through The Topology
Nodes should send a `BackedCandidateAcknowledgement(CandidateHash, StatementFilter)` notification to any peer which has
sent a manifest for a candidate that the node has already acquired by other means. This keeps alternative paths through
the topology open, which allows nodes to receive additional statements that come later, but not after the candidate has
been posted on-chain.
This is mostly about the limitation that the runtime has no way for block authors to post statements that come after the
parablock is posted on-chain and ensure those validators still get rewarded. Technically, we only need enough statements
to back the candidate and the manifest + request will provide that. But more statements might come shortly afterwards,
and we want those to end up on-chain as well to ensure all validators in the group are rewarded.
For clarity, here is the full timeline:
1. candidate seconded
1. backable in cluster
1. distributed along grid
1. latecomers issue statements
1. candidate posted on chain
1. really latecomers issue statements
## Cluster Module
The cluster module provides direct distribution of unbacked candidates within a group. By utilizing this initial phase
of propagating only within clusters/groups, we bound the number of `Seconded` messages per validator per relay-parent,
helping us prevent spam. Validators can try to circumvent this, but they would only consume a few KB of memory and it is
trivially slashable on chain.
The cluster module determines whether to accept/reject messages from other validators in the same group. It keeps track
of what we have sent to other validators in the group, and pending statements. For the full protocol, see "Protocol".
## Grid Module
The grid module provides distribution of backed candidates and late statements outside the backing group. For the full
protocol, see the "Protocol" section.
### Grid Topology
For distributing outside our cluster (aka backing group) we use a 2D grid topology. This limits the number of peers we
send messages to, and handles view updates.
The basic operation of the grid topology is that:
- A validator producing a message sends it to its row-neighbors and its column-neighbors.
- A validator receiving a message originating from one of its row-neighbors sends it to its column-neighbors.
- A validator receiving a message originating from one of its column-neighbors sends it to its row-neighbors.
This grid approach defines 2 unique paths for every validator to reach every other validator in at most 2 hops,
providing redundancy.
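The basic forwarding rule can be sketched as a pure function of where a message came from. This is a hypothetical sketch, not the subsystem's code. The row-neighbors and column-neighbors are passed in as plain index slices, and `Origin` is an illustrative type.

```rust
/// Where a message originated, relative to us in the grid (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Origin {
    /// We produced the message ourselves.
    Local,
    /// It originated from one of our row-neighbors.
    Row,
    /// It originated from one of our column-neighbors.
    Column,
}

/// Peers to forward a message to under the basic grid rule.
fn forward_targets(origin: Origin, row: &[usize], column: &[usize]) -> Vec<usize> {
    match origin {
        // Our own message goes to both dimensions.
        Origin::Local => row.iter().chain(column).copied().collect(),
        // A message from a row-neighbor is relayed along our column, and vice versa.
        Origin::Row => column.to_vec(),
        Origin::Column => row.to_vec(),
    }
}
```

Because each relay step switches dimension, any validator is reached within two hops. A message can take either route: the originator's row then a column, or its column then a row.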
Propagation follows these rules:
- Each node has a receiving set and a sending set. These are different for each group. That is, if a node receives a
candidate from group A, it checks if it is allowed to receive from that node for candidates from group A.
- For groups that we are in, receive from nobody and send to our X/Y peers.
- For groups that we are not part of:
- We receive from any validator in the group we share a slice with and send to the corresponding X/Y slice in the
other dimension.
- For any validators we don't share a slice with, we receive from the nodes which share a slice with them.
### Example
For size 11, the matrix would be:
```
0 1 2
3 4 5
6 7 8
9 10
```
For example, for index 10, the neighbors would be 1, 4, 7 and 9. These are the nodes we could directly communicate with
(i.e. either send to or receive from).
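The neighbor sets in this example can be computed mechanically. Validators are laid out row by row in a grid whose width is the integer square root of the validator count. The width rule is an assumption chosen because it reproduces the 11-validator matrix above, and `grid_neighbors` is a hypothetical helper.

```rust
/// Row- and column-neighbors of validator `index` among `n` validators, laid out row by row in a
/// grid of width `floor(sqrt(n))` (assumed width rule).
fn grid_neighbors(n: usize, index: usize) -> (Vec<usize>, Vec<usize>) {
    assert!(index < n, "validator index out of range");
    // Integer square root, avoiding floating-point rounding issues.
    let mut width = 1;
    while (width + 1) * (width + 1) <= n {
        width += 1;
    }
    // Same row: the contiguous run of indices sharing `index / width`.
    let start = (index / width) * width;
    let end = (start + width).min(n);
    let row = (start..end).filter(|&i| i != index).collect();
    // Same column: every `width`-th index starting at `index % width`.
    let column = (index % width..n).step_by(width).filter(|&i| i != index).collect();
    (row, column)
}
```

For `n = 11`, the width is 3. Index 10 gets row-neighbor 9 and column-neighbors 1, 4 and 7, as listed above.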
Now, which of these neighbors can 10 receive from? Recall that the sending/receiving sets for 10 would be different for
different groups. Here are some hypothetical scenarios:
- **Scenario 1:** 9 belongs to group A but not 10. Here, 10 can directly receive candidates from group A from 9. 10
would propagate them to the nodes in {1, 4, 7} that are not in A.
- **Scenario 2:** 6 is in group A instead of 9, and 7 is not in group A. 10 can receive group A messages from 7 or 9. 10
will try to relay these messages, but 7 and 9 together should have already propagated the message to all x/y peers of
10. If so, then 10 will just receive acknowledgements in reply rather than requests.
- **Scenario 3:** 10 itself is in group A. 10 would not receive candidates from this group from any other nodes through
the grid. It would itself send such candidates to all its neighbors that are not in A.
### Seconding Limit
The seconding limit is a per-validator limit. Before asynchronous backing, we had a rule that every validator was only
allowed to second one candidate per relay parent. With asynchronous backing, we have a 'maximum depth' which makes it
possible to second multiple candidates per relay parent. The seconding limit is set to `max depth + 1` to set an upper
bound on candidates entering the system.
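A per-validator, per-relay-parent counter is enough to enforce the `max depth + 1` bound. This is a hypothetical sketch. Validators and relay-parents are reduced to integers, and each call is assumed to represent a distinct seconded candidate.

```rust
use std::collections::HashMap;

/// Counts `Seconded` statements per (validator, relay-parent) and enforces `max_depth + 1`.
struct SecondingLimiter {
    limit: usize,
    counts: HashMap<(u32, u64), usize>,
}

impl SecondingLimiter {
    fn new(max_depth: usize) -> Self {
        Self { limit: max_depth + 1, counts: HashMap::new() }
    }

    /// Record a new seconded candidate; returns `false` (reject) once the limit is reached.
    fn note_seconded(&mut self, validator: u32, relay_parent: u64) -> bool {
        let count = self.counts.entry((validator, relay_parent)).or_insert(0);
        if *count >= self.limit {
            return false;
        }
        *count += 1;
        true
    }
}
```

With `max_depth = 2`, the sketch accepts three seconded candidates from a validator at a given relay-parent and rejects the fourth.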
## Candidates Module
The candidates module provides a tracker for all known candidates in the view, whether they are confirmed or not, and
how peers have advertised the candidates. What is a confirmed candidate? It is a candidate for which we have the full
receipt and the persisted validation data. This module gets confirmed candidates from two sources:
- A validator may have fetched a collation directly from the collator and validated it.
- The first time a validator gets an announcement for an unknown candidate, it will send a request for the candidate.
Upon receiving a response and validating it (see `UnhandledResponse::validate_response`), it will mark the candidate
as confirmed.
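The confirmed/unconfirmed distinction can be modelled as a two-state entry per candidate. This is a hypothetical sketch. `Receipt` and `PersistedValidationData` are empty placeholders, and peers are plain integers. `note_advertised` returns `true` only the first time an unknown candidate is seen, which is when a request would be sent.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

type CandidateHash = [u8; 32];
type PeerId = u32; // simplified

/// Placeholders for the full receipt and persisted validation data (hypothetical).
struct Receipt;
struct PersistedValidationData;

#[allow(dead_code)]
enum CandidateState {
    /// Known only from announcements; we track which peers advertised it.
    Unconfirmed { advertised_by: HashSet<PeerId> },
    /// We hold the full receipt and the persisted validation data.
    Confirmed { receipt: Receipt, pvd: PersistedValidationData },
}

#[derive(Default)]
struct Candidates {
    states: HashMap<CandidateHash, CandidateState>,
}

impl Candidates {
    /// Record an advertisement; returns `true` if the candidate was previously unknown.
    fn note_advertised(&mut self, candidate: CandidateHash, peer: PeerId) -> bool {
        match self.states.entry(candidate) {
            Entry::Occupied(mut e) => {
                if let CandidateState::Unconfirmed { advertised_by } = e.get_mut() {
                    advertised_by.insert(peer);
                }
                false
            }
            Entry::Vacant(e) => {
                e.insert(CandidateState::Unconfirmed { advertised_by: HashSet::from([peer]) });
                true
            }
        }
    }

    /// Mark a candidate confirmed (fetched and validated); returns the peers that had advertised it.
    fn confirm(&mut self, candidate: CandidateHash, receipt: Receipt, pvd: PersistedValidationData) -> Option<HashSet<PeerId>> {
        match self.states.insert(candidate, CandidateState::Confirmed { receipt, pvd }) {
            Some(CandidateState::Unconfirmed { advertised_by }) => Some(advertised_by),
            _ => None,
        }
    }

    fn is_confirmed(&self, candidate: &CandidateHash) -> bool {
        matches!(self.states.get(candidate), Some(CandidateState::Confirmed { .. }))
    }
}
```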
## Requests Module
The requests module provides a manager for pending requests for candidate data, as well as pending responses. See the
"Request/Response" section for a high-level description of the flow, and the module docs for full details.
## Glossary
- **Acknowledgement:** A partial manifest sent to a validator that already has the candidate to inform them that the
sending node also knows the candidate. Concludes a manifest exchange.
- **Announcement:** A full manifest indicating that a backed candidate is known by the sending node. Initiates a
manifest exchange.
- **Attestation:** See "Statement".
- **Backable vs. Backed:**
- Note that we sometimes use "backed" to refer to candidates that are "backable", but not yet backed on chain.
- **Backed** should technically mean that the parablock candidate and its backing statements have been added to a
relay chain block.
- **Backable** is when the necessary backing statements have been acquired but those statements and the parablock
candidate haven't been backed in a relay chain block yet.
- **Fragment tree:** A teyrchain fragment not referenced by the relay-chain. It is a tree of prospective teyrchain
blocks.
- **Manifest:** A message about a known backed candidate, along with a description of the statements backing it. There
are two kinds of manifest, `Acknowledgement` and `Announcement`. See "Manifests" section.
- **Peer:** Another validator that a validator is connected to.
- **Request/response:** A protocol used to lazily request and receive heavy candidate data when needed.
- **Reputation:** Tracks reputation of peers. Applies annoyance cost and good behavior benefits.
- **Statement:** Signed statements that can be made about teyrchain candidates.
- **Seconded:** Proposal of a teyrchain candidate. Implicit validity vote.
- **Valid:** States that a teyrchain candidate is valid.
- **Target:** Target validator to send a statement to.
- **View:** Current knowledge of the chain state.
- **Explicit view** / **immediate view**
- The view a peer has of the relay chain heads and highest finalized block.
- **Implicit view**
- Derived from the immediate view. Composed of active leaves and minimum relay-parents allowed for candidates of
various teyrchains at those leaves.
@@ -0,0 +1,8 @@
# Collators
Collators are special nodes which bridge a teyrchain to the relay chain. They are simultaneously full nodes of the
teyrchain, and at least light clients of the relay chain. Their overall contribution to the system is the generation of
Proofs of Validity for teyrchain candidates.
The **Collation Generation** subsystem triggers collators to produce collations and then forwards them to **Collator
Protocol** to circulate to validators.
@@ -0,0 +1,142 @@
# Collation Generation
The collation generation subsystem is executed on collator nodes and produces candidates to be distributed to
validators. If configured to produce collations for a para, it produces collations and then feeds them to the [Collator
Protocol][CP] subsystem, which handles the networking.
## Protocol
Collation generation for Teyrchains currently works in the following way:
1. A new relay chain block is imported.
2. The collation generation subsystem checks if the core associated with the teyrchain is free and, if so, continues.
3. Collation generation calls our collator callback, if present, to generate a PoV. If none exists, it does nothing.
4. Authoring logic determines if the current node should build a PoV.
5. The collator builds a new PoV and gives it back to collation generation.
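The decision flow in steps 1-5 can be sketched as a single function run on each imported relay chain block. This is a simplified, synchronous sketch under stated assumptions. `Leaf` and `PoV` are hypothetical stand-ins. The real `CollatorFn` is asynchronous and takes the relay-parent hash plus `&PersistedValidationData`. Here the authoring decision of step 4 is folded into the callback, which returns `None` when this node should not build.

```rust
/// Hypothetical, simplified view of a newly imported relay chain block.
struct Leaf {
    relay_parent: u64,
    /// Whether the core assigned to our teyrchain is free at this block.
    core_free: bool,
}

/// Stand-in for a proof of validity.
#[derive(Debug, PartialEq)]
struct PoV(Vec<u8>);

/// Run steps 2-5 for one new leaf.
fn on_new_leaf<F>(leaf: &Leaf, collator: Option<&F>) -> Option<PoV>
where
    F: Fn(u64) -> Option<PoV>,
{
    // Step 2: the core is occupied, so there is nothing to do.
    if !leaf.core_free {
        return None;
    }
    // Step 3: no collator callback registered, so do nothing.
    let collator = collator?;
    // Steps 4-5: authoring logic decides whether to build, and returns the PoV if it does.
    collator(leaf.relay_parent)
}
```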
## Messages
### Incoming
- `ActiveLeaves`
- Notification of a change in the set of active leaves.
- Triggers collation generation procedure outlined in "Protocol" section.
- `CollationGenerationMessage::Initialize`
- Initializes the subsystem. Carries a config.
- No more than one initialization message should ever be sent to the collation generation subsystem.
- Sent by a collator to initialize this subsystem.
- `CollationGenerationMessage::SubmitCollation`
- If the subsystem isn't initialized or the relay-parent is too old to be relevant, ignore the message.
- Otherwise, use the provided parameters to generate a [`CommittedCandidateReceipt`].
- Submit the collation to the collator-protocol with `CollatorProtocolMessage::DistributeCollation`.
### Outgoing
- `CollatorProtocolMessage::DistributeCollation`
- Provides a generated collation to distribute to validators.
## Functionality
The process of generating a collation for a teyrchain is very teyrchain-specific. As such, the details of how to do so
are beyond the scope of this description. The subsystem should be implemented as an abstract wrapper, which is aware of
this configuration:
```rust
/// The output of a collator.
///
/// This differs from `CandidateCommitments` in two ways:
///
/// - does not contain the erasure root; that's computed at the Pezkuwi level, not at Cumulus
/// - contains a proof of validity.
pub struct Collation {
    /// Messages destined to be interpreted by the Relay chain itself.
    pub upward_messages: Vec<UpwardMessage>,
    /// The horizontal messages sent by the teyrchain.
    pub horizontal_messages: Vec<OutboundHrmpMessage<ParaId>>,
    /// New validation code.
    pub new_validation_code: Option<ValidationCode>,
    /// The head-data produced as a result of execution.
    pub head_data: HeadData,
    /// Proof to verify the state transition of the teyrchain.
    pub proof_of_validity: PoV,
    /// The number of messages processed from the DMQ.
    pub processed_downward_messages: u32,
    /// The mark which specifies the block number up to which all inbound HRMP messages are processed.
    pub hrmp_watermark: BlockNumber,
}

/// Result of the [`CollatorFn`] invocation.
pub struct CollationResult {
    /// The collation that was built.
    pub collation: Collation,
    /// An optional result sender that should be informed about a successfully seconded collation.
    ///
    /// There is no guarantee that this sender is ever informed about any result; it is completely okay to just drop
    /// it. However, if it is called, it should be called with the signed statement of a teyrchain validator seconding
    /// the collation.
    pub result_sender: Option<oneshot::Sender<CollationSecondedSignal>>,
}

/// Signal that is returned when a collation was seconded by a validator.
pub struct CollationSecondedSignal {
    /// The hash of the relay chain block that was used as context to sign [`Self::statement`].
    pub relay_parent: Hash,
    /// The statement about seconding the collation.
    ///
    /// Anything other than `Statement::Seconded` is forbidden here.
    pub statement: SignedFullStatement,
}

/// Collation function.
///
/// Will be called with the hash of the relay chain block the teyrchain block should be built on and the
/// [`PersistedValidationData`] that provides information about the state of the teyrchain on the relay chain.
///
/// Returns an optional [`CollationResult`].
pub type CollatorFn = Box<
    dyn Fn(
            Hash,
            &PersistedValidationData,
        ) -> Pin<Box<dyn Future<Output = Option<CollationResult>> + Send>>
        + Send
        + Sync,
>;

/// Configuration for the collation generator.
pub struct CollationGenerationConfig {
    /// Collator's authentication key, so it can sign things.
    pub key: CollatorPair,
    /// Collation function. See [`CollatorFn`] for more details.
    pub collator: Option<CollatorFn>,
    /// The teyrchain that this collator collates for.
    pub para_id: ParaId,
}
```
The configuration should be optional, to allow for the case where the node is not run with the capability to collate.
### Summary in plain English
- **Collation (output of a collator)**
- Contains the PoV (proof to verify the state transition of the teyrchain) and other data.
- **Collation result**
- Contains the collation, and an optional result sender for a collation-seconded signal.
- **Collation seconded signal**
- The signal that is returned when a collation was seconded by a validator.
- **Collation function**
- Called with the hash of the relay chain block that the parablock will be built on top of.
- Called with the persisted validation data, which provides information about the state of the teyrchain on the relay
chain.
- **Collation generation config**
- Contains collator's authentication key, optional collator function, and teyrchain ID.
[CP]: collator-protocol.md
@@ -0,0 +1,196 @@
# Collator Protocol
> NOTE: This module has undergone changes for the elastic scaling implementation. As a result, parts of this document
may be out of date and will be updated at a later time. Issue tracking the update:
https://github.com/pezkuwichain/pezkuwi-sdk/issues/132
The Collator Protocol implements the network protocol by which collators and validators communicate. It is used by
collators to distribute collations to validators and by validators to accept collations from collators.
Collator-to-Validator networking is more difficult than Validator-to-Validator networking because the set of possible
collators for any given para is unbounded, unlike the validator set. Validator-to-Validator networking protocols can
easily be implemented as gossip because the data can be bounded, and validators can authenticate each other by their
`PeerId`s for the purposes of instantiating and accepting connections.
Since, at least at the level of the para abstraction, the collator-set for any given para is unbounded, validators need
to make sure that they are receiving connections from capable and honest collators and that their bandwidth and time are
not being wasted by attackers. Communicating across this trust-boundary is the most difficult part of this subsystem.
Validation of candidates is a heavy task, and furthermore, the [`PoV`][PoV] itself is a large piece of data.
Empirically, `PoV`s are on the order of 10MB.
> TODO: note the incremental validation function Ximin proposes at https://github.com/paritytech/polkadot/issues/1348
As this network protocol serves as a bridge between collators and validators, it communicates primarily with one
subsystem on behalf of each. As a collator, this will receive messages from the [`CollationGeneration`][CG] subsystem.
As a validator, this will communicate only with the [`CandidateBacking`][CB] subsystem.
## Protocol
Input: [`CollatorProtocolMessage`][CPM]
Output:
* [`RuntimeApiMessage`][RAM]
* [`NetworkBridgeMessage`][NBM]
* [`CandidateBackingMessage`][CBM]
## Functionality
This network protocol uses the `Collation` peer-set of the [`NetworkBridge`][NB].
It uses the [`CollatorProtocolV1Message`](../../types/network.md#collator-protocol) as its `WireMessage`.
Since this protocol functions both for validators and collators, it is easiest to go through the protocol actions for
each of them separately.
Validators and collators.
```dot process
digraph {
c1 [shape=MSquare, label="Collator 1"];
c2 [shape=MSquare, label="Collator 2"];
v1 [shape=MSquare, label="Validator 1"];
v2 [shape=MSquare, label="Validator 2"];
c1 -> v1;
c1 -> v2;
c2 -> v2;
}
```
### Collators
It is assumed that collators are only collating on a single teyrchain. Collations are generated by the [Collation
Generation][CG] subsystem. We will keep up to one local collation per relay-parent, based on `DistributeCollation`
messages. If the para is not scheduled on any core at the relay-parent, or the relay-parent isn't in the active-leaves
set, we ignore the message, as it must be invalid in that case; this would, however, indicate a logic error elsewhere in
the node.
We keep track of the Para ID we are collating on as a collator. This starts as `None`, and is updated with each
`CollateOn` message received. If the `ParaId` of a collation requested to be distributed does not match the one we
expect, we ignore the message.
As with most other subsystems, we track the active leaves set by following `ActiveLeavesUpdate` signals.
For the purposes of actually distributing a collation, we need to be connected to the validators who are interested in
collations on that `ParaId` at this point in time. We assume that there is a discovery API for connecting to a set of
validators.
As seen in the [Scheduler Module][SCH] of the runtime, validator groups are fixed for an entire session and their
rotations across cores are predictable. Collators will want to do these things when attempting to distribute collations
at a given relay-parent:
* Determine which core the para collated-on is assigned to.
* Determine the group on that core.
* Issue a discovery request for the validators of the current group with
[`NetworkBridgeMessage`][NBM]`::ConnectToValidators`.
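Because group rotations are predictable, the first two steps above reduce to arithmetic. This sketch assumes a round-robin rotation: every `rotation_frequency` blocks since the session start, each core moves to the next group. The exact rule lives in the [Scheduler Module][SCH], and `group_for_core` and its parameters are hypothetical.

```rust
/// Index of the backing group assigned to `core` at block `now`, assuming round-robin rotation
/// every `rotation_frequency` blocks from `session_start`.
fn group_for_core(core: usize, n_groups: usize, session_start: u32, rotation_frequency: u32, now: u32) -> usize {
    assert!(n_groups > 0 && rotation_frequency > 0);
    // Number of full rotations since the session began.
    let rotations = (now.saturating_sub(session_start) / rotation_frequency) as usize;
    (core + rotations) % n_groups
}
```

With 5 groups, a session starting at block 100 and rotations every 10 blocks, core 0 is served by group 2 at block 125. Under the same parameters, core 4 is served by group 1.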
Once connected to the relevant peers for the current group assigned to the core (transitively, the para), advertise the
collation to any of them which advertise the relay-parent in their view (as provided by the [Network Bridge][NB]). If
any respond with a request for the full collation, provide it. However, we only send one collation at a time per relay
parent; other requests need to wait. This reduces the bandwidth requirements of a collator and also increases the
chance of fully sending the collation to at least one validator. Once a validator has received and seconded the
collation, it will also start to share it with other validators in its backing group. Upon receiving a view update
from any of these peers which includes a relay-parent for which we have a collation that they will find relevant,
advertise the collation to them if we haven't already.
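The "one collation at a time per relay parent" rule amounts to a per-relay-parent FIFO of waiting requesters. This is a hypothetical sketch, with peers and relay-parents reduced to integers. It only decides who is served next and does not perform any actual upload.

```rust
use std::collections::{HashMap, VecDeque};

type PeerId = u32;
type RelayParent = u64;

/// Serves at most one collation upload per relay-parent at a time (hypothetical type).
#[derive(Default)]
struct CollationServer {
    /// The peer currently being served, per relay-parent.
    in_flight: HashMap<RelayParent, PeerId>,
    /// Requests waiting their turn, per relay-parent.
    waiting: HashMap<RelayParent, VecDeque<PeerId>>,
}

impl CollationServer {
    /// Returns `Some(peer)` if the upload to `peer` should start now; otherwise the request is queued.
    fn on_request(&mut self, rp: RelayParent, peer: PeerId) -> Option<PeerId> {
        if self.in_flight.contains_key(&rp) {
            self.waiting.entry(rp).or_default().push_back(peer);
            None
        } else {
            self.in_flight.insert(rp, peer);
            Some(peer)
        }
    }

    /// Called when an upload for `rp` finishes; returns the next peer to serve, if any.
    fn on_sent(&mut self, rp: RelayParent) -> Option<PeerId> {
        self.in_flight.remove(&rp);
        let next = self.waiting.get_mut(&rp).and_then(|queue| queue.pop_front())?;
        self.in_flight.insert(rp, next);
        Some(next)
    }
}
```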
### Validators
On the validator side of the protocol, validators need to accept incoming connections from collators. They should keep
some peer slots open for accepting new speculative connections from collators and should disconnect from collators who
are not relevant.
```dot process
digraph G {
label = "Declaring, advertising, and providing collations";
labelloc = "t";
rankdir = LR;
subgraph cluster_collator {
rank = min;
label = "Collator";
graph[style = border, rank = min];
c1, c2 [label = ""];
}
subgraph cluster_validator {
rank = same;
label = "Validator";
graph[style = border];
v1, v2 [label = ""];
}
c1 -> v1 [label = "Declare and advertise"];
v1 -> c2 [label = "Request"];
c2 -> v2 [label = "Provide"];
v2 -> v2 [label = "Note Good/Bad"];
}
```
When peers connect to us, they can `Declare` that they represent a collator with a given public key and intend to
collate on a specific para ID. Once they've declared that, and we have checked their signature, they can begin to send
advertisements of collations. The peers should not send us any advertisements for collations that are on a
relay-parent outside of our view or for a para outside of the one they've declared.
The protocol tracks advertisements received and the source of the advertisement. The advertisement source is the
`PeerId` of the peer who sent the message. We accept one advertisement per collator per source per relay-parent.
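The acceptance rules for advertisements can be sketched as a sequence of checks:

1. The peer must have declared (signature already verified).
2. The relay-parent must be in our view.
3. The para must match the declaration.
4. At most one advertisement is accepted per collator, source and relay-parent.

This is a hypothetical sketch with integer identifiers. `Advertisements` and `AdvertisementError` are illustrative names, not the subsystem's API.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = u32;
type CollatorId = u32; // simplified public key
type ParaId = u32;
type RelayParent = u64;

#[derive(Debug, PartialEq)]
enum AdvertisementError {
    UndeclaredPeer,
    OutOfView,
    WrongPara,
    Duplicate,
}

#[derive(Default)]
struct Advertisements {
    /// Peers that declared (with a checked signature) as a collator for a para.
    declared: HashMap<PeerId, (CollatorId, ParaId)>,
    /// Accepted advertisements, keyed by (collator, source peer, relay-parent).
    seen: HashSet<(CollatorId, PeerId, RelayParent)>,
}

impl Advertisements {
    fn handle_advertisement(
        &mut self,
        from: PeerId,
        relay_parent: RelayParent,
        para: ParaId,
        view: &HashSet<RelayParent>,
    ) -> Result<(), AdvertisementError> {
        let &(collator, declared_para) =
            self.declared.get(&from).ok_or(AdvertisementError::UndeclaredPeer)?;
        if !view.contains(&relay_parent) {
            return Err(AdvertisementError::OutOfView);
        }
        if para != declared_para {
            return Err(AdvertisementError::WrongPara);
        }
        // `insert` returns false if this (collator, source, relay-parent) was already accepted.
        if !self.seen.insert((collator, from, relay_parent)) {
            return Err(AdvertisementError::Duplicate);
        }
        Ok(())
    }
}
```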
As a validator, we will handle requests from other subsystems to fetch a collation on a specific `ParaId` and
relay-parent. These requests are made with the `CollationFetchingRequest` request/response protocol. To do so, we
need to first check if we have already gathered a collation on that `ParaId` and relay-parent. If not, we need to
select one of the advertisements and issue a request for it. If we've already issued a request, we shouldn't issue
another one until the first has returned.
When acting on an advertisement, we issue a `Requests::CollationFetchingV1`. However, we only request one collation at a
time per relay parent. This reduces the bandwidth requirements, and since we can second only one candidate per relay
parent, the others are probably not required anyway. If the request times out, we need to note the collator as being
unreliable and reduce its priority relative to other collators.
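The validator-side fetching rule can be sketched as follows. Keep at most one fetch in flight per relay-parent, prefer the highest-priority advertiser, and lower a collator's priority when its request times out. The numeric priority score and `TIMEOUT_PENALTY` are hypothetical; the text above only says the collator's priority is reduced.

```rust
use std::collections::{HashMap, HashSet};

type CollatorId = u32;
type RelayParent = u64;

/// Hypothetical penalty applied to a collator whose request timed out.
const TIMEOUT_PENALTY: i32 = 10;

#[derive(Default)]
struct FetchScheduler {
    /// Priority score per collator; higher is preferred, unknown collators score 0.
    priority: HashMap<CollatorId, i32>,
    /// Relay-parents with a fetch currently in flight.
    in_flight: HashSet<RelayParent>,
}

impl FetchScheduler {
    /// Pick an advertiser to fetch from for `rp`, or `None` if a fetch is already in flight
    /// or there is nothing to fetch. Ties go to the last advertiser in `advertisers`.
    fn pick(&mut self, rp: RelayParent, advertisers: &[CollatorId]) -> Option<CollatorId> {
        if self.in_flight.contains(&rp) {
            return None;
        }
        let best = advertisers
            .iter()
            .copied()
            .max_by_key(|c| self.priority.get(c).copied().unwrap_or(0))?;
        self.in_flight.insert(rp);
        Some(best)
    }

    /// The request to `collator` timed out: free the slot and penalize the collator.
    fn on_timeout(&mut self, rp: RelayParent, collator: CollatorId) {
        self.in_flight.remove(&rp);
        *self.priority.entry(collator).or_insert(0) -= TIMEOUT_PENALTY;
    }

    /// The collation arrived: free the slot.
    fn on_success(&mut self, rp: RelayParent) {
        self.in_flight.remove(&rp);
    }
}
```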
### Interaction with [Candidate Backing][CB]
As collators advertise the availability of collations, a validator will simply second the first valid parablock
candidate per relay head by sending a [`CandidateBackingMessage`][CBM]`::Second`. Note that this message contains the
relay parent of the advertised collation, the candidate receipt and the [PoV][PoV].
Subsequently, once a valid parablock candidate has been seconded, the [`CandidateBacking`][CB] subsystem will send a
[`CollatorProtocolMessage`][CPM]`::Seconded`, which will trigger this subsystem to notify the collator at the `PeerId`
that first advertised the parablock on the seconded relay head that its collation was successfully seconded.
## Future Work
Several approaches have been discussed, but all have some issues:
* The current approach is very straightforward. However, that protocol is vulnerable to a single collator which, as an
attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
* If collators produce blocks via Aura, BABE or in future Sassafras, it may be possible to choose an "Official" collator
for the round, but it may be tricky to ensure that the PVF logic is enforced at collator leader election.
* We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, ± 1 second. The
validator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`,
the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get
its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight
schedule.
* A variation of that scheme would be to have a fixed acceptance window `D` for parablock candidates and keep track of
count `C`: the number of parablock candidates received. At the end of the period `D`, we choose a random number `I` in
the range `[0, C)` and second the block at index `I`. Its drawback is the same: it must wait the full `D` period before
seconding any of its received candidates, reducing throughput.
* In order to protect against DoS attacks, it may be prudent to throw out collations from collators that have
behaved poorly (whether recently or historically) and subsequently only verify the PoV for the most suitable
collations.
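The acceptance-window variant boils down to picking an index uniformly from the candidates received during `D`. This is a minimal sketch that assumes a random value (for example, derived from BABE randomness) is supplied by the caller; `pick_in_window` is a hypothetical helper. Reducing a `u64` modulo `C` introduces a small bias, which is acceptable only for illustration.

```rust
/// Given the candidates received during the window (`C = received.len()`) and a random value,
/// choose the candidate at index `I = random mod C`. Returns `None` if nothing arrived.
fn pick_in_window<T>(received: &[T], random: u64) -> Option<&T> {
    if received.is_empty() {
        return None;
    }
    let index = (random % received.len() as u64) as usize;
    received.get(index)
}
```

For example, with three candidates and a random value of 7, index `7 mod 3 = 1` is seconded.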
[CB]: ../backing/candidate-backing.md
[CBM]: ../../types/overseer-protocol.md#candidate-backing-mesage
[CG]: collation-generation.md
[CPM]: ../../types/overseer-protocol.md#collator-protocol-message
[CS]: ../backing/candidate-selection.md
[CSM]: ../../types/overseer-protocol.md#candidate-selection-message
[NB]: ../utility/network-bridge.md
[NBM]: ../../types/overseer-protocol.md#network-bridge-message
[PoV]: ../../types/availability.md#proofofvalidity
[RAM]: ../../types/overseer-protocol.md#runtime-api-message
[SCH]: ../../runtime/scheduler.md
@@ -0,0 +1,18 @@
# Disputes Subsystems
If approval voting finds an invalid candidate, a dispute is raised. The disputes
subsystems are concerned with the following:
1. Disputes can be raised
1. Disputes (votes) get propagated to all other validators
1. Votes get recorded as necessary
1. Nodes will participate in disputes in a sensible fashion
1. Finality is stopped while a candidate is being disputed on chain
1. Chains can be reverted in case a dispute concludes invalid
1. Votes are provided to the provisioner for importing on chain, in order for
slashing to work.
The dispute-coordinator subsystem interfaces with the provisioner and chain
selection to make the bulk of this possible. `dispute-distribution` is concerned
with getting votes out to other validators and receiving them in a spam
resilient way.
@@ -0,0 +1,659 @@
# Dispute Coordinator
The coordinator is the central subsystem of the node-side components which participate in disputes. It wraps a database,
which is used to track statements observed by _all_ validators over some window of sessions. Votes older than this
session window are pruned.
In particular the dispute-coordinator is responsible for:
- Ensuring that the node is able to raise a dispute in case an invalid candidate is found during approval checking.
- Ensuring that backing and approval votes will be recorded on chain. With these votes on chain we can be certain that
appropriate targets for slashing will be available for concluded disputes. Also, scraping these votes during a dispute
is necessary for critical spam prevention measures.
- Ensuring backing votes will never get overridden by explicit votes.
- Coordinating actual participation in a dispute, ensuring that the node participates in any justified dispute in a way
that ensures resolution of disputes on the network even in the case of many disputes raised (flood/DoS scenario).
- Ensuring disabled validators are not able to spam disputes.
- Ensuring disputes resolve, even for candidates on abandoned forks as much as reasonably possible, to rule out "free
tries" and thus guarantee our gambler's ruin property.
- Providing an API for chain selection, so we can prevent finalization of any chain which has included candidates for
which a dispute is either ongoing or concluded invalid and avoid building on chains with an included invalid
candidate.
- Providing an API for retrieving (resolved) disputes, including all votes, both implicit (approval, backing) and
explicit dispute votes, so that validators can be rewarded or slashed accordingly.
## Ensuring That Disputes Can Be Raised
If a candidate turns out invalid in approval checking, the `approval-voting` subsystem will try to issue a dispute. For
this, it will send a `DisputeCoordinatorMessage::IssueLocalStatement` message to the dispute coordinator, instructing it
to cast an explicit invalid vote. On reception of such a message, it is the responsibility of the dispute coordinator to
create and sign that explicit invalid vote and trigger a dispute if none is already ongoing for that candidate.
In order to raise a dispute, a node has to be able to provide two opposing votes. Given that the purpose of the backing
phase is to have validators with skin in the game, the opposing valid vote will very likely be a backing vote. It could
also be some already cast approval vote, but the significant point here is: as long as we have backing votes available,
any node will be able to raise a dispute.
Therefore a vital responsibility of the dispute coordinator is to make sure backing votes are available for all
candidates that might still get disputed. To accomplish this task in an efficient way, the dispute-coordinator relies on
chain scraping. Whenever a candidate gets backed on chain, we record in chain storage the backing votes imported in that
block. This way, given the chain state for a given relay chain block, we can retrieve the backing votes imported by that
block via a provided runtime API. The dispute coordinator makes sure to query those votes for any non-finalized blocks;
in case of missed blocks, it will do chain traversal as necessary.
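The chain traversal for missed blocks can be sketched as walking back from a new head until hitting a block that was already scraped, or the last finalized block. This is a hypothetical sketch. Block hashes are integers, ancestry is a plain parent map, and the traversal stops early if ancestry is unknown. The result is returned oldest first, so votes are queried in chain order.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;

/// Blocks between `finalized` (exclusive) and `head` (inclusive) that still need their backing
/// votes scraped, oldest first.
fn blocks_to_scrape(
    head: Hash,
    parent_of: &HashMap<Hash, Hash>,
    scraped: &HashSet<Hash>,
    finalized: Hash,
) -> Vec<Hash> {
    let mut out = Vec::new();
    let mut current = head;
    while current != finalized && !scraped.contains(&current) {
        out.push(current);
        match parent_of.get(&current) {
            Some(&parent) => current = parent,
            // Unknown ancestry: stop rather than guess.
            None => break,
        }
    }
    out.reverse();
    out
}
```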
Relying on chain scraping is very efficient for two reasons:
1. Votes are already batched. We import all available backing votes for a candidate all at once. If instead we imported
votes from candidate-backing as they came along, we would import each vote individually which is inefficient in the
current dispute coordinator implementation (quadratic complexity).
2. We also import fewer votes in total, as we avoid importing statements for candidates that never got successfully
backed on any chain.
It is also secure, because disputes are only ever raised in the approval voting phase. A node only starts the approval
process after it has seen a candidate included on some chain, and for that to happen the candidate must have been backed
previously. Therefore backing votes are available at that point in time. Signals are processed first, so even if a
block is skipped and we only start importing backing votes on the including block, we will have seen the backing votes
by the time we process messages from approval voting.
In summary, for making it possible for a dispute to be raised, recording of backing votes from chain is sufficient and
efficient. In particular, there is no need to preemptively import approval votes, which has proven to be a very
inefficient process. (Quadratic complexity adds up, with 35 votes in total per candidate.)
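To see how quadratic import cost adds up, assume a simple cost model. Importing a vote one at a time re-processes every vote already held, so the `k`-th individual import costs `k` units, while a batched import of `n` votes costs `n` units. This model is an assumption for illustration, not a measurement of the dispute coordinator. Under it, 35 votes imported one by one cost `35 * 36 / 2 = 630` units versus 35 units batched.

```rust
/// Cost of importing `n` votes one at a time, assuming the k-th import touches all k votes held.
fn one_by_one_cost(n: u64) -> u64 {
    (1..=n).sum()
}

/// Cost of importing `n` votes in a single batch under the same model.
fn batched_cost(n: u64) -> u64 {
    n
}
```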
Approval votes are very relevant nonetheless as we are going to see in the next section.
## Ensuring approval votes will be recorded
### Ensuring Recording
Only votes recorded by the dispute coordinator will be considered for slashing.
While there is no need to record approval votes in the dispute coordinator preemptively, we make some effort to have any
approval votes received by approval-voting recorded when a dispute actually happens.
This is not required for concluding the dispute, as nodes send their own vote anyway (either explicit valid or their
existing approval-vote). What nodes could do, though, is participate in approval-voting, cast a vote, and later, when a
dispute is raised, reconsider their vote and send an explicit invalid vote. If they managed to only have that one
recorded, then they could avoid a slash.
This is not a problem for our basic security assumptions: the backers are the ones who are supposed to have skin in the
game, so we are not too worried about colluding approval voters getting away slash free, as the gambler's ruin property
is maintained anyway. There is, however, a separate problem from colluding approval voters: "lazy" approval voters.
If it were easy and reliable for approval voters to reconsider their vote in case of an actual dispute, then they would
not have a direct incentive (apart from playing a part in securing the network) to properly run the validation function
at all; they could just always vote "valid" totally risk free. (They would always risk a slash by voting invalid.)
So we do want to fetch approval votes from approval-voting. Importing votes is most efficient when batched. At the same
time, approval voting and disputes run concurrently, so approval votes are expected to keep trickling in while a dispute
is already ongoing.
Hence, we have the following requirements for importing approval votes:
1. Only import them when there is a dispute, because otherwise we would _always_ be wasting lots of resources for the
exceptional case of a dispute.
2. Import votes batched when possible, to avoid quadratic import complexity.
3. Take into account that approval voting is still ongoing, while a dispute is already running.
With a design where approval voting sends votes to the dispute-coordinator by itself, we would need to make approval
voting aware of ongoing disputes; once aware, it could start sending all already existing votes batched, followed by the
trickling-in votes as they come. The problem with this is that it adds some unnecessary complexity to approval-voting,
and we might still import most of the votes unbatched, one by one, depending on the point in time at which the dispute
was raised.
Instead of the dispute coordinator informing approval-voting of an ongoing dispute for it to begin forwarding votes to
the dispute coordinator, it makes more sense for the dispute-coordinator to just ask approval-voting for votes of
candidates in dispute. This way, the dispute coordinator can also pick the best time for maximizing the number of votes
in the batch.
Now the question remains, when should the dispute coordinator ask approval-voting for votes?
In fact, for slashing it is only relevant to have them once the dispute has concluded, so we can query approval voting
the moment the dispute concludes! Two concerns come to mind, which are addressed as follows:
1. Timing: We would like to rely as little as possible on implementation details of approval voting. In particular, if
the dispute is ongoing for a long time, do we have any guarantees that approval votes are kept around long enough by
approval voting? Will approval votes still be present by the time the dispute concludes in all cases? The answer is
nuanced, but in general we cannot rely on it. The first problem is that finalization and approval-voting are off-chain
processes, so there is no global consensus: As soon as at least f+1 honest nodes (f=n/3, where n is the number of
validators/nodes) have seen the dispute conclude, finalization will take place and approval votes will be
cleared. This would still be fine if we had some guarantee that those honest nodes will be able to include those
votes in a block. Unfortunately, this guarantee does not exist; we discuss the problem and solutions in more
detail [below](#ensuring-chain-import).
The second problem is that approval-voting will abandon votes as soon as a chain can no longer be finalized (some
other/better fork already has been). This second problem can be partly mitigated by also importing votes as soon as
a dispute is detected, but not fully resolved: it is still inherently racy. The good thing is that this should be good
enough: We are worried about lazy approval checkers, and the system does not need to be perfect. It should be enough if
there is some risk of getting caught.
2. We are not worried about the dispute not concluding, as nodes will always send their own vote, regardless of whether
it is an explicit vote or an already existing approval vote.
Conclusion: As long as we make sure that, if our own approval vote gets imported (which would prevent dispute
participation), we also distribute it via dispute-distribution, disputes can conclude. To mitigate raciness with
approval-voting deleting votes, we will import approval votes twice during a dispute: once when it is raised, to make as
sure as possible to see approval votes also for abandoned forks, and a second time when the dispute concludes, to
maximize the number of potentially malicious approval votes that get recorded. This obviously does not fully resolve the
raciness, but that is fine, as argued above.
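The benefit of importing twice can be illustrated with a small sketch. It is a hypothetical simplification, not the subsystem's code: `import_approval_batch` is an invented helper, and approval votes are reduced to bare validator indices instead of signed statements. Each call imports, as one batch, whatever approval-voting still holds and the coordinator has not yet recorded.

```rust
use std::collections::BTreeSet;

/// Hypothetical index type; real approval votes are signed statements.
type ValidatorIndex = u32;

/// Imports, as a single batch, every approval vote that approval-voting still
/// holds for the disputed candidate and that the dispute coordinator has not
/// recorded yet. Returns the newly recorded votes.
fn import_approval_batch(
    approval_votes_now: &BTreeSet<ValidatorIndex>,
    recorded: &mut BTreeSet<ValidatorIndex>,
) -> Vec<ValidatorIndex> {
    let new_votes: Vec<ValidatorIndex> =
        approval_votes_now.difference(recorded).copied().collect();
    recorded.extend(new_votes.iter().copied());
    new_votes
}
```

In a scenario where approval-voting holds votes from validators 1 and 2 when the dispute is raised, then prunes vote 1 (abandoned fork) while vote 3 trickles in before conclusion, importing only at conclusion would miss vote 1 and importing only when raised would miss vote 3; the two batches together record all three.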
Ensuring vote import on chain is covered in the next section.
What we don't care about is that honest approval voters will likely validate twice, once in approval voting and once via
dispute participation. Avoiding that does not seem worthwhile: firstly, disputes are exceptional, so a little wasted
effort won't affect everyday performance; secondly, even with eager importing of approval votes, that doubled work would
still be present, as disputes and approvals are racing. Every time participation is faster than approval, a node would
do double work.
### Ensuring Chain Import
While in the previous section we discussed means for nodes to ensure relevant votes are recorded so lazy approval
checkers get slashed properly, it is crucial to also discuss the actual chain import. Only if we guarantee that recorded
votes get imported on chain (on all potential chains, really) will we succeed in executing slashes. In particular, we
need to make sure backing votes end up on chain consistently.
Dispute distribution will make sure all explicit dispute votes get distributed among nodes, which includes current block
producers (the current authority set). This is an important property: If the dispute carries on across an era change, we
need to ensure that the new validator set learns about any disputes and their votes, so it can put that information on
chain. Dispute-distribution luckily has this property and always sends votes to the current authority set. The issue is
that with dispute-distribution, nodes only send their own explicit vote (or, in some cases, their approval vote) in
addition to some opposing vote. This guarantees that at least some backing or approval vote will be present at the
block producer, but we have no 100% guarantee of having votes for all backers, and even less so for approval checkers.
The reason for backing votes: While backing votes will be present on at least some chain, that does not mean that any
such chain is still considered for block production in the current set; they might only exist on an already abandoned
fork. This means a block producer that just joined the set might not have seen any of them.
For approvals it is even trickier, and less necessary: Approval voting together with finalization is a completely
off-chain process, therefore those protocols don't care about block production at all. Approval votes only have a
guarantee of being propagated between the nodes that are responsible for finalizing the concerned blocks. This implies
that on an era change the current authority set will not necessarily get informed about any approval votes for the
previous era. Hence, even if all validators of the previous era successfully recorded all approval votes in the dispute
coordinator, they won't get a chance to put them on chain, so those votes won't be considered for slashing.
It is important to note that the essential properties of the system still hold: Dispute-distribution will distribute at
_least one_ "valid" vote to the current authority set, hence at least one node will get slashed in case of outcome
"invalid". Also, in reality the validator set is rarely replaced completely, therefore in practice some validators in
the current authority set will overlap with the ones in the previous set and will be able to record votes on chain.
Still, for maximum accountability we need to make sure a previous authority set can communicate votes to the next one,
regardless of any chain: This is yet to be implemented; see section "Resiliency" in dispute-distribution and
[this](https://github.com/paritytech/polkadot/issues/3398) ticket.
## Coordinating Actual Dispute Participation
Once the dispute coordinator learns about a dispute, it is its responsibility to make sure the local node participates
in that dispute.
The dispute coordinator learns about a dispute by importing votes from either chain scraping or
dispute-distribution. If it finds opposing votes (always the case when coming from dispute-distribution), it records the
presence of a dispute. Then, if it does not find any local vote for that dispute already, it needs to trigger
participation in the dispute (see the previous section for considerations when the found local vote is an approval
vote).
Participation means recovering availability and re-evaluating the PoV. The result of that validation (either valid or
invalid) will be the node's vote on that dispute: either explicit "invalid" or "valid". The dispute coordinator will
inform `dispute-distribution` about our vote, and `dispute-distribution` will make sure that our vote gets distributed
to all other validators.
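A minimal sketch of the import step just described, under simplifying assumptions: `CandidateVotes` and `ImportOutcome` here are hypothetical reduced stand-ins (validator indices only, no statement kinds or signatures), and the spam and genuineness checks discussed in the following sections are deliberately left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical index type; real votes carry statement kinds and signatures.
type ValidatorIndex = u32;

/// Reduced stand-in for the per-candidate vote record.
#[derive(Default)]
struct CandidateVotes {
    valid: BTreeSet<ValidatorIndex>,
    invalid: BTreeSet<ValidatorIndex>,
}

/// What the coordinator should do after an import (hypothetical).
#[derive(Debug, PartialEq)]
enum ImportOutcome {
    /// No opposing votes, so there is no dispute.
    NoDispute,
    /// There is a dispute, but we already have a local vote.
    AlreadyVoted,
    /// There is a dispute and no local vote: participation is needed.
    TriggerParticipation,
}

impl CandidateVotes {
    fn import(
        &mut self,
        valid: &[ValidatorIndex],
        invalid: &[ValidatorIndex],
        local: ValidatorIndex,
    ) -> ImportOutcome {
        self.valid.extend(valid.iter().copied());
        self.invalid.extend(invalid.iter().copied());
        // A dispute exists once both a `valid` and an `invalid` vote are present.
        if self.valid.is_empty() || self.invalid.is_empty() {
            return ImportOutcome::NoDispute;
        }
        if self.valid.contains(&local) || self.invalid.contains(&local) {
            ImportOutcome::AlreadyVoted
        } else {
            ImportOutcome::TriggerParticipation
        }
    }
}
```

For example, importing two scraped backing votes yields `NoDispute`; a later `invalid` vote from dispute-distribution yields `TriggerParticipation` as long as no local vote is recorded.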
Nothing is ever that easy, though. We cannot blindly import anything that comes along and trigger participation no
matter what.
### Spam Considerations
In Pezkuwi's security model, it is important that attempts to attack the system result in a slash of the offenders.
Therefore we need to make sure that this slash actually happens. Attackers could try to prevent the slashing from
taking place by overwhelming validators with disputes in such a way that no single dispute ever concludes, because
nodes are busy processing newly incoming ones. Other attacks are imaginable as well, like raising disputes for
candidates that don't exist, just to slowly fill up everyone's disk or, worse, to make nodes try to participate, which
will result in lots of network requests for recovering availability.
The last point brings up a significant consideration in general: Disputes are about escalation. Every node will suddenly
want to check, instead of only a few. A single message will trigger the whole network to start a significant amount of
work and will cause lots of network traffic and messages. Hence the dispute system is very susceptible to being a brutal
amplifier for DoS attacks, making such attacks very easy and cheap if we are not careful.
One countermeasure we are taking is making the raising of disputes costly: If you raise a dispute because you claim a
candidate is invalid, although it is in fact valid, you will get slashed, hence you pay for consuming those resources.
The issue is: This only works if the dispute concerns a candidate that actually exists!
If a node raises a dispute for a candidate that never got included (became available) on any chain, then the dispute can
never conclude, hence nobody gets slashed. It makes sense to point out that this is less bad than it might sound at
first, as trying to participate in a dispute for a non-existing candidate is "relatively" cheap. Each node will send out
a few hundred tiny request messages for availability chunks, which will all end up in a tiny "NoSuchChunk" response, and
then no participation will actually happen, as there is nothing to participate in. Malicious nodes could provide chunks,
which would make things more costly, but at the full expense of the attacker's bandwidth: no amplification here. I am
bringing this up for completeness only: Triggering a thousand nodes to send out a thousand tiny network messages by just
sending out a single garbage message is still a significant amplification and nothing to ignore; this could absolutely
be used to cause harm!
### Participation
As explained, just blindly participating in any "dispute" that comes along is not a good idea. First, we would like to
make sure the dispute is actually genuine, to prevent cheap DoS attacks. Secondly, in case of genuine disputes, we would
like to conclude them one after the other, instead of processing all of them at the same time, which would slow down
progress on all of them and, in the worst case, bring individual processing to a complete halt (nodes get overwhelmed
at some stage in the pipeline).
To ensure that we only spend significant work on genuine disputes, we only trigger participation on a _vote import_
if any of the following holds true:
- We saw the disputed candidate included in some not yet finalized block on at least one fork of the chain.
- We have seen the disputed candidate backed in some not yet finalized block on at least one fork of the chain. This
ensures the candidate is at least not completely made up and that some effort has already gone into that
candidate. Generally speaking, a dispute shouldn't be raised for a candidate which is backed but not yet included, as
disputes are raised during approval checking. We participate in such disputes as a precaution; maybe we haven't seen
the `CandidateIncluded` event yet?
- The dispute is already confirmed, meaning that 1/3+1 nodes have already participated. In our threat model this
implies that at least one honest node has already voted, so the dispute must be genuine.
In addition to that, we only participate in a non-confirmed dispute if at least one vote against the candidate is from
a non-disabled validator.
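The participation rule can be condensed into a single predicate. This is a sketch under assumptions: `DisputeInfo` and `should_participate` are hypothetical names, and each condition is reduced to a precomputed boolean instead of being derived from chain state and vote sets.

```rust
/// Hypothetical summary of what the coordinator knows about a disputed candidate.
#[derive(Clone, Copy)]
struct DisputeInfo {
    /// Seen included in a not yet finalized block on some fork.
    included_in_unfinalized: bool,
    /// Seen backed in a not yet finalized block on some fork.
    backed_in_unfinalized: bool,
    /// At least 1/3+1 validators have already voted.
    confirmed: bool,
    /// At least one vote against the candidate is from a non-disabled validator.
    invalid_vote_from_enabled_validator: bool,
}

/// Participation rule from the list above.
fn should_participate(d: &DisputeInfo) -> bool {
    if d.confirmed {
        // A confirmed dispute involves at least one honest voter: participate.
        return true;
    }
    // Non-confirmed disputes need evidence the candidate is real, plus an
    // opposing vote from a validator that is not disabled.
    let seen_on_chain = d.included_in_unfinalized || d.backed_in_unfinalized;
    seen_on_chain && d.invalid_vote_from_enabled_validator
}
```

Because chain knowledge can change after a dispute arrives (see the note below), a predicate like this would be re-evaluated on block import rather than only once.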
Note: A node might be out of sync with the chain, and we might only learn about a block including a candidate after we
learned about the dispute. This means we have to re-evaluate participation decisions on block import!
With this, nodes won't waste significant resources on completely made up candidates. The next step is to process dispute
participation in a (globally) ordered fashion, meaning a majority of validators should arrive at roughly the same
ordering of participation, for disputes to get resolved one after another. This order is only relevant if there are
lots of disputes, so we only need to worry about order if participations start queuing up.
We treat participation for candidates that we have seen included with priority and put them on a priority queue which
sorts participation based on the block number of the relay parent of the candidate and for candidates with the same
relay parent height further by the `CandidateHash`. This ordering is globally unique and also prioritizes older
candidates.
The latter property makes sense, because if an older candidate turns out invalid, we can roll back the full chain at
once. If we resolved disputes for newer candidates first and they turned out invalid as well, we might need to roll back
a couple of times instead of just once to the oldest offender. In particular, this ordering makes it impossible for an
attacker to prevent rolling back a very old candidate by continuously raising disputes for newer candidates.
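The ordering can be expressed with a derived `Ord` on a key struct. This is a hypothetical sketch: `QueueKey` and `PriorityQueue` are invented names, and a `BTreeSet` stands in for whatever container an implementation actually uses. Rust's derived `Ord` compares fields in declaration order, so the relay-parent block number is compared first and the candidate hash only breaks ties.

```rust
use std::collections::BTreeSet;

type BlockNumber = u32;
/// Hypothetical stand-in for a 32-byte candidate hash.
type CandidateHash = [u8; 32];

/// Hypothetical queue key. The derived `Ord` compares fields in declaration
/// order: relay-parent block number first (older, i.e. smaller, first), then
/// the candidate hash as a globally unique tie-breaker.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct QueueKey {
    relay_parent_block_number: BlockNumber,
    candidate_hash: CandidateHash,
}

/// Participation priority queue kept in a sorted set.
#[derive(Default)]
struct PriorityQueue {
    queue: BTreeSet<QueueKey>,
}

impl PriorityQueue {
    fn push(&mut self, key: QueueKey) {
        self.queue.insert(key);
    }

    /// Removes and returns the participation with the oldest relay parent.
    fn pop_next(&mut self) -> Option<QueueKey> {
        let first = *self.queue.iter().next()?;
        self.queue.remove(&first);
        Some(first)
    }
}
```

Since every honest node derives the same key from the same candidate, nodes that see the same set of disputes pop them in the same order.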
For candidates we have not seen included, but which we know are backed (thanks to chain scraping) or for which we have
seen a dispute with 1/3+1 participation (confirmed dispute), we put participation on a best-effort queue. It has the same
ordering as the priority one: by block height of the relay parent, with older blocks taking priority. It is possible
that we cannot obtain the block number of the relay parent when inserting the dispute into the queue. To
account for races, we will promote any existing participation request to the priority queue once we learn about an
including block. NOTE: this is still work in progress and is tracked by [this
issue](https://github.com/paritytech/polkadot/issues/5875).
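The interplay of the two queues and promotion might look like the following sketch. All names are hypothetical, candidate hashes are reduced to `u64`, and treating an unknown relay-parent number as sorting last is purely an assumption of this sketch (the source leaves that case open as work in progress).

```rust
use std::collections::BTreeSet;

type BlockNumber = u32;
/// Hypothetical simplification of a candidate hash.
type CandidateHash = u64;
/// Ordering key: (relay-parent block number, candidate hash).
type Key = (BlockNumber, CandidateHash);

/// Assumption for this sketch: an unknown relay-parent number sorts last.
fn key(relay_parent: Option<BlockNumber>, hash: CandidateHash) -> Key {
    (relay_parent.unwrap_or(BlockNumber::MAX), hash)
}

#[derive(Default)]
struct ParticipationQueues {
    /// Candidates seen included on some chain.
    priority: BTreeSet<Key>,
    /// Candidates only known to be backed, or with a confirmed dispute.
    best_effort: BTreeSet<Key>,
}

impl ParticipationQueues {
    fn queue(&mut self, included: bool, relay_parent: Option<BlockNumber>, hash: CandidateHash) {
        let k = key(relay_parent, hash);
        if included {
            self.priority.insert(k);
        } else {
            self.best_effort.insert(k);
        }
    }

    /// Called when an including block for `hash` shows up after queuing.
    fn promote(&mut self, hash: CandidateHash) -> bool {
        let found = self.best_effort.iter().find(|&&(_, h)| h == hash).copied();
        match found {
            Some(k) => {
                self.best_effort.remove(&k);
                self.priority.insert(k);
                true
            }
            None => false,
        }
    }

    /// The priority queue is always drained before the best-effort queue.
    fn pop_next(&mut self) -> Option<Key> {
        let queue = if self.priority.is_empty() { &mut self.best_effort } else { &mut self.priority };
        let first = *queue.iter().next()?;
        queue.remove(&first);
        Some(first)
    }
}
```

Promotion searches by candidate hash rather than by full key, so a request queued before its relay-parent number was known can still be found.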
### Abandoned Forks
Finalization: As mentioned, we care about included and backed candidates on any non-finalized chain. Given that no
disputed chain will get finalized, we don't need to care about finalized blocks, but what about forks that fall
behind the finalized chain in terms of block number? For those we would still like to be able to participate in any
raised disputes; otherwise attackers might be able to avoid a slash if they manage to create a better fork after they
learned about the approval checkers. Therefore we do care about those forks even after they have fallen behind the
finalized chain.
For simplicity, we also care about the actual finalized chain (not just forks) up to a certain depth. We do have to limit
the depth, because otherwise we open a DoS vector again. The depth (into the finalized chain) should be based on the
approval-voting execution timeout; in particular, it should be significantly larger. Otherwise, by the time the
execution is allowed to finish, we will already have dropped information about those candidates and the dispute could
not conclude.
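The resulting retention rule is a simple height comparison. In this hypothetical sketch `keep_candidate_info` and `max_finalized_depth` are invented names, and the concrete depth is left to the caller; per the text it should be significantly larger than the approval-voting execution timeout expressed in blocks.

```rust
type BlockNumber = u32;

/// Hypothetical retention rule: keep candidate information for blocks above
/// the finalized height, and for blocks at or below it (on the finalized chain
/// or on an abandoned fork) that are at most `max_finalized_depth` blocks
/// behind. `saturating_sub` avoids underflow close to genesis.
fn keep_candidate_info(
    block_number: BlockNumber,
    finalized_number: BlockNumber,
    max_finalized_depth: BlockNumber,
) -> bool {
    block_number >= finalized_number.saturating_sub(max_finalized_depth)
}
```

With a finalized height of 100 and a depth of 10, information for block 90 is kept while block 89 is dropped, whether or not it sits on the finalized chain.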
## Import
### Spam Considerations
In the last section we looked at how to queue participations to handle heavy dispute load well. This already
ensures that honest nodes won't amplify cheap DoS attacks. There is one minor issue remaining: Even if we delay
participation until we have some confirmation of the authenticity of the dispute, we should also not blindly import all
votes arriving into the database, as this might be used to slowly fill up disk space until the node is no longer
functional. This leads to our last protection mechanism at the dispute coordinator level (dispute-distribution also has
its own): spam slots. For each import containing an invalid vote, where we don't know whether it might be spam
or not, we increment a counter for each signing participant of explicit `invalid` votes.
What votes do we treat as potential spam? A vote will increase a spam slot if and only if all of the following
conditions are satisfied:
- the candidate under dispute was seen neither included nor backed on any chain
- the dispute is not confirmed
- we haven't cast a vote for the dispute
- at least one vote against the candidate is from a non-disabled validator
Whenever any vote on a dispute is imported, these conditions are checked. If the dispute is found not to be potential
spam, then spam slots for the disputed candidate hash are cleared. This decrements the spam count for every validator
which had voted invalid.
To keep spam slots from filling up unnecessarily, we want to clear spam slots whenever a candidate is seen to be backed
or included. Fortunately, this behavior is achieved by clearing slots on vote import as described above: Because
on-chain backing votes are processed when a block backing the disputed candidate is discovered, spam slots are cleared
for every backed candidate. Included candidates have also been seen as backed on the same fork, so clearing spam slots
is handled in that case as well.
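The spam-slot bookkeeping can be sketched as follows. This is illustrative only: `SpamCheck`, `SpamSlots`, `record_potential_spam` and `clear_candidate` are hypothetical names, candidate hashes are reduced to `u64`, and the per-validator limit is a constructor parameter rather than a real configured constant.

```rust
use std::collections::{HashMap, HashSet};

type SessionIndex = u32;
type ValidatorIndex = u32;
/// Hypothetical simplification of a candidate hash.
type CandidateHash = u64;

/// The four conditions listed above.
#[derive(Clone, Copy)]
struct SpamCheck {
    included_or_backed: bool,
    confirmed: bool,
    we_voted: bool,
    invalid_vote_from_enabled_validator: bool,
}

impl SpamCheck {
    fn is_potential_spam(&self) -> bool {
        !self.included_or_backed
            && !self.confirmed
            && !self.we_voted
            && self.invalid_vote_from_enabled_validator
    }
}

/// Hypothetical spam-slot bookkeeping: per (session, validator), the set of
/// candidates for which that validator cast a potentially-spam `invalid` vote.
struct SpamSlots {
    slots: HashMap<(SessionIndex, ValidatorIndex), HashSet<CandidateHash>>,
    max_per_validator: usize,
}

impl SpamSlots {
    fn new(max_per_validator: usize) -> Self {
        SpamSlots { slots: HashMap::new(), max_per_validator }
    }

    /// Occupies a slot for `validator`; returns `false` (do not import) when
    /// all of the validator's slots are taken.
    fn record_potential_spam(
        &mut self,
        session: SessionIndex,
        candidate: CandidateHash,
        validator: ValidatorIndex,
    ) -> bool {
        let set = self.slots.entry((session, validator)).or_default();
        if set.len() >= self.max_per_validator && !set.contains(&candidate) {
            return false;
        }
        set.insert(candidate);
        true
    }

    /// Frees the slots of every validator that voted `invalid` on `candidate`,
    /// e.g. once it is seen backed on chain or the dispute gets confirmed.
    fn clear_candidate(&mut self, session: SessionIndex, candidate: CandidateHash) {
        for ((s, _), set) in self.slots.iter_mut() {
            if *s == session {
                set.remove(&candidate);
            }
        }
    }
}
```

With a limit of two, a validator's third potentially-spam `invalid` vote is rejected until one of its earlier candidates is cleared, for instance because that candidate turned up backed on chain.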
The reason this works is that we only need to worry about actual dispute votes. Import of backing votes is already
rate limited and concerns only real candidates. For approval votes a similar argument holds (if they come from
approval-voting), but we also don't import them until there is a dispute. For actual dispute votes we need two
opposing votes, so there must be an explicit `invalid` vote in the import. Only a third of the validators can be
malicious, so spam disk usage is limited to `2*vote_size*n/3*NUM_SPAM_SLOTS`, with `n` being the number of validators.
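The bound can be evaluated directly. This is a hypothetical helper, not part of the subsystem; the factor 2 is read here as the two opposing votes each spam dispute needs, and the integer division rounds n/3 down. The example values in the usage note are illustrative, not PezkuwiChain parameters.

```rust
/// Upper bound on disk usage caused by spam, following the formula above:
/// at most n/3 malicious validators, each occupying `num_spam_slots` disputes,
/// with every such dispute storing two opposing votes of `vote_size` bytes.
fn max_spam_disk_usage(vote_size: u64, n_validators: u64, num_spam_slots: u64) -> u64 {
    2 * vote_size * (n_validators / 3) * num_spam_slots
}
```

For instance, with an assumed vote size of 200 bytes, 1000 validators and 50 spam slots, the bound is 2 × 200 × 333 × 50 = 6,660,000 bytes, i.e. a few megabytes.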
### Disabling
Once a validator has committed an offence (e.g. losing a dispute), it is considered disabled for the rest of the era.
In addition to using the on-chain state of disabled validators, we also keep track of validators who lost a dispute
off-chain. The reason for this is that a dispute can be raised for a candidate in a previous era, which means that a
validator that is going to be slashed for it might not even be in the current active set, so it can't be disabled
on-chain. We need a way to prevent someone from disputing all valid candidates in the previous era. We do this by
keeping track of the validators who lost a dispute in the past few sessions and using that list in addition to the
on-chain disabled validators state. Besides covering past-session misbehavior, this also helps in case a slash is
delayed.
When we receive a dispute statements set, we do the following:
1. Take the on-chain state of disabled validators at the relay parent block.
1. Take a list of those who lost a dispute in that session in the order that prioritizes the biggest and newest offence.
1. Combine the two lists and take the first `byzantine_threshold` validators from the result.
1. If the dispute is unconfirmed, check if all votes against the candidate are from disabled validators.
If so, we don't participate in the dispute, but record the votes.
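The four steps can be sketched as two functions. The names (`LostDispute`, `effective_disabled`, `skip_participation`) and the numeric `severity`/`block_number` fields used to rank offences are hypothetical. Placing on-chain entries ahead of off-chain ones when combining the lists is an assumption of this sketch; the text only says the lists are combined.

```rust
use std::collections::BTreeSet;

type ValidatorIndex = u32;

/// Hypothetical off-chain record of a validator that lost a dispute.
#[derive(Clone, Copy)]
struct LostDispute {
    validator: ValidatorIndex,
    /// Hypothetical measure of how big the offence was.
    severity: u32,
    /// Hypothetical recency marker (higher = newer).
    block_number: u32,
}

/// Steps 1-3: on-chain disabled validators, then off-chain offenders ordered by
/// biggest and newest offence, de-duplicated and capped at `byzantine_threshold`.
fn effective_disabled(
    on_chain: &[ValidatorIndex],
    mut lost: Vec<LostDispute>,
    byzantine_threshold: usize,
) -> BTreeSet<ValidatorIndex> {
    lost.sort_by(|a, b| {
        b.severity.cmp(&a.severity).then(b.block_number.cmp(&a.block_number))
    });
    let mut combined: Vec<ValidatorIndex> = Vec::new();
    for v in on_chain.iter().copied().chain(lost.iter().map(|l| l.validator)) {
        if !combined.contains(&v) {
            combined.push(v);
        }
    }
    combined.into_iter().take(byzantine_threshold).collect()
}

/// Step 4: an unconfirmed dispute whose `invalid` votes all come from disabled
/// validators is recorded but does not trigger participation.
fn skip_participation(invalid_voters: &[ValidatorIndex], disabled: &BTreeSet<ValidatorIndex>) -> bool {
    invalid_voters.iter().all(|v| disabled.contains(v))
}
```

Capping the combined list at the byzantine threshold prevents the off-chain record from ever disabling more validators than the threat model allows to be malicious.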
### Backing Votes
Backing votes are special in some ways. For starters, they are the only valid votes that are guaranteed to exist for any
valid dispute to be raised. Second, they are the only votes that commit to a shorter execution timeout,
`BACKING_EXECUTION_TIMEOUT`, compared to a more lenient timeout used in approval voting. To properly account for
execution time variance across machines, slashing might treat backing votes differently (more aggressively) than other
`valid` votes. Hence, on import we shall never override a backing vote with another valid vote. They cannot be
assumed to be interchangeable.
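The "never override a backing vote" rule is shown below as an import guard. In this hypothetical sketch, `ValidVoteKind` is a reduced enum rather than the real dispute statement kinds. The text only forbids overriding backing votes; keeping any other existing vote unless a backing vote arrives is an assumption made here for concreteness.

```rust
use std::collections::BTreeMap;

type ValidatorIndex = u32;

/// Hypothetical, reduced set of `valid` vote kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ValidVoteKind {
    Backing,
    Approval,
    Explicit,
}

/// Records a `valid` vote. A backing vote is never overridden. Assumption of
/// this sketch: any other existing vote is kept as well, except that it may be
/// replaced by a backing vote. Returns whether the stored vote changed.
fn insert_valid_vote(
    votes: &mut BTreeMap<ValidatorIndex, ValidVoteKind>,
    validator: ValidatorIndex,
    kind: ValidVoteKind,
) -> bool {
    match votes.get(&validator).copied() {
        Some(ValidVoteKind::Backing) => false,
        Some(_) if kind != ValidVoteKind::Backing => false,
        _ => {
            votes.insert(validator, kind);
            true
        }
    }
}
```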
## Attacks & Considerations
The following attacks on the priority queue and the best-effort queue are considered in the design above.
### Priority Queue
On the priority queue, we will only queue participations for candidates we have seen included on any chain. Any attack
attempt would start with a candidate included on some chain, but an attacker could try to reveal the including
relay chain blocks only to some honest validators and stop as soon as it learns that some honest validator would have a
relevant approval assignment.
Without revealing the including block to any honest validator, we don't really have an attack yet. Once the block is
revealed, though, the above is actually very hard. Each honest validator will re-distribute the block it just learned
about. This means an attacker would need to pull off a targeted DoS attack, which allows the validator to send its
assignment, but prevents it from forwarding and sharing the relay chain block.
This already sounds hard enough, provided that we also start participation if we learned about an including block after
the dispute has been raised already (we need to update participation queues on new leaves), but to be even safer we
choose to have an additional best-effort queue.
### Best-Effort Queue
While attacking the priority queue is already pretty hard, attacking the best-effort queue is even harder. For a
candidate to be a threat, it has to be included on some chain. For it to be included, it has to have been backed before,
and at least n/3 honest nodes must have seen that block, so availability (inclusion) can be reached. Making a full third
of the nodes not further propagate a block, while at the same time allowing them to fetch chunks, sign and distribute
bitfields, seems almost infeasible. Even if accomplished, those nodes would be enough to confirm a dispute, and we have
not even touched on the fact that, for an attack, the following including block must also be shared with honest
validators.
It is worth mentioning that a successful attack on the priority queue as outlined above is already outside of our threat
model, as it assumes n/3 malicious nodes plus additional malfunctioning/DoSed nodes. Even more so for attacks on the
best-effort queue, as our threat model only allows for n/3 malicious _or_ malfunctioning nodes in total. It would
therefore be a valid decision to ditch the best-effort queue if it proves to be a burden or creates other issues.
One issue we should not be worried about, though, is spam. Abusing the best-effort queue for spam would require the
following scenario:
An attacker controls a backing group: The attacker can then have candidates backed and choose not to provide chunks.
This should come at the cost of missing out on backing rewards, so it is not free. At the same time it is rate limited,
as a backing group can only back so many candidates legitimately (~1 per slot):
1. They have to wait until a malicious actor becomes block producer (for causing additional forks via equivocation for
example).
2. Forks are possible, but if caused by equivocation also not free.
3. For each fork the attacker has to wait until the candidate times out, for backing another one.
Assuming there can only be a handful of forks, points 2 and 3 (the candidate timeout restriction) together keep the
frequency in the ballpark of once per slot. This scales linearly with the number of controlled backing groups, so two
groups would mean two backings per slot, and so on.
So by this reasoning an attacker could only do very limited harm and at the same time would have to pay some price for
it (missed rewards). Overall, the work done by the network might even be in the same ballpark as if actors just
behaved honestly:
1. Validators would have fetched chunks.
2. Approval checkers would have done approval checks.
With the attack (backing, not providing chunks and afterwards disputing the candidate), by contrast, the work for 1000
validators would be:
All validators sending out ~1000 tiny requests over already established connections, with equally tiny (byte-sized)
responses. This means around a million requests, while in the honest case it would be ~10000 (30 approval checkers x
330 requests), where each request triggers a response in the range of kilobytes. Hence network load alone will likely
be higher in the honest case than in the DoS attempt case, which would mean the DoS attempt actually reduces load,
while also costing rewards.
In the worst case this can happen multiple times, as we would retry that on every vote import. The effect would still be
in the same ballpark as honest behavior though and can also be mitigated by chilling repeated availability recovery
requests for example.
## Out of Scope
### No Disputes for Non Included Candidates
We only ever care about disputes for candidates that have been included on at least some chain (became available). This
is because the availability system was designed for precisely that: Only with inclusion (availability) do we have
guarantees that the candidate is actually available, and only then can malicious backers be reliably checked and
slashed. Also, by design, non-included candidates do not pose any threat to the system.
One could think of an (additional) dispute system to make it possible to dispute any candidate that has been proposed by
a validator, no matter whether it got successfully included or even backed. Unfortunately, it would be very brittle (no
availability), and spam protection would be much harder than for the disputes handled by the dispute-coordinator. In
fact, all the spam handling strategies described above would simply be unavailable.
It is worth thinking about who could actually raise such disputes anyway: Approval checkers certainly would not, as they
only ever check once availability has succeeded. The only other nodes that meaningfully could/would are honest backing
nodes or collators. For collators, spam considerations would be even worse, as there can be an unlimited number of them
and we cannot charge them for spam, so trying to handle disputes raised by collators would be even more complex. For
honest backers: It actually makes more sense for them to wait until availability is reached as well, as only then do
they have guarantees that other nodes will be able to check. If they disputed before, all nodes would need to recover
the data from them, so they would be an easy DoS target.
In summary: The availability system was designed for raising disputes in a meaningful and secure way after availability
was reached. Trying to raise disputes before does not meaningfully contribute to the system's security and might even
weaken it, as attackers are warned before availability is reached, while at the same time adding a significant amount of
complexity. We therefore punt on such disputes and concentrate on disputes the system was designed to handle.
### No Disputes for Already Finalized Blocks
Note that by the rules in the `Participation` section above, we will not participate in disputes concerning a candidate
in an already finalized block. This is because disputing an already finalized block is simply too late and therefore of
little value. Once finalized, bridges have already processed the block, for example, so we have to assume the damage is
already done. Governance has to step in and fix what can be fixed.
Making disputes for already finalized blocks possible would only provide two features:
1. We can at least still slash attackers.
2. We can freeze the chain to some governance only mode, in an attempt to minimize potential harm done.
Both seem somewhat worthwhile, although, as argued above, it is likely that there is not much that can be done in 2, and
we would likely only end up DoSing the whole system. 1 can also be achieved via governance mechanisms.
In any case, our focus should be making as sure as reasonably possible that any potentially invalid block does not get
finalized in the first place. Not allowing disputes for already finalized blocks actually helps a great deal with this
goal, as it massively reduces the number of candidates that can be disputed.
This makes attempts to overwhelm the system with disputes significantly harder and countermeasures much easier. We can
limit inclusion, for example (as suggested [here](https://github.com/paritytech/polkadot/issues/5898)), in case of high
dispute load. Another measure we have at our disposal is that on finality lag, block production will slow down,
implicitly reducing the rate of new candidates that can be disputed. Hence, cutting off the unlimited supply of
candidates from already finalized blocks provides the necessary DoS protection and ensures we can have measures in place
to keep up with processing of disputes.
If we allowed participation in disputes for already finalized candidates, the above spam protection mechanisms would be
insufficient, relying entirely on full and quick disabling of spamming validators.
## Database Schema
We use an underlying Key-Value database where we assume we have the following operations available:
- `write(key, value)`
- `read(key) -> Option<value>`
- `iter_with_prefix(prefix) -> Iterator<(key, value)>` - gives all keys and values in lexicographical order where the
key starts with `prefix`.
We use this database to encode the following schema:
```rust
("candidate-votes", SessionIndex, CandidateHash) -> Option<CandidateVotes>
"recent-disputes" -> RecentDisputes
"earliest-session" -> Option<SessionIndex>
```
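To make the three database operations concrete, here is a hypothetical in-memory stand-in. `MemDb` and `candidate_votes_key` are invented for illustration and say nothing about the real storage backend or key encoding. A `BTreeMap` already keeps keys in lexicographical byte order, so `iter_with_prefix` becomes a range scan that stops at the first key without the prefix. Encoding the session index big-endian (an assumption) makes byte order match numeric session order.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the key-value database. A `BTreeMap`
/// keeps keys in lexicographical byte order, which `iter_with_prefix` relies on.
#[derive(Default)]
struct MemDb {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemDb {
    fn write(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.map.insert(key, value);
    }

    fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    /// All entries whose key starts with `prefix`, in lexicographical order.
    fn iter_with_prefix<'a>(
        &'a self,
        prefix: &'a [u8],
    ) -> impl Iterator<Item = (&'a Vec<u8>, &'a Vec<u8>)> + 'a {
        self.map
            .range(prefix.to_vec()..)
            .take_while(move |(k, _)| k.starts_with(prefix))
    }
}

/// Hypothetical encoding of `("candidate-votes", SessionIndex, CandidateHash)`:
/// the session index is big-endian so that keys sort numerically by session.
fn candidate_votes_key(session: u32, candidate_hash: [u8; 32]) -> Vec<u8> {
    let mut key = b"candidate-votes".to_vec();
    key.extend_from_slice(&session.to_be_bytes());
    key.extend_from_slice(&candidate_hash);
    key
}
```

With this layout, pruning votes for sessions older than `earliest-session` reduces to walking the `candidate-votes` prefix in order and stopping at the first key whose session is recent enough.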
The meta information that we track per-candidate is defined as the `CandidateVotes` struct. This draws on the [dispute
statement types][DisputeTypes].
```rust
/// Tracked votes on candidates, for the purposes of dispute resolution.
pub struct CandidateVotes {
    /// The receipt of the candidate itself.
    pub candidate_receipt: CandidateReceipt,
    /// Votes of validity, sorted by validator index.
    pub valid: Vec<(ValidDisputeStatementKind, ValidatorIndex, ValidatorSignature)>,
    /// Votes of invalidity, sorted by validator index.
    pub invalid: Vec<(InvalidDisputeStatementKind, ValidatorIndex, ValidatorSignature)>,
}

/// The mapping for recent disputes; any which have not yet been pruned for being ancient.
pub type RecentDisputes = std::collections::BTreeMap<(SessionIndex, CandidateHash), DisputeStatus>;

/// The status of a dispute. This is a state machine which can be altered by the
/// helper methods.
pub enum DisputeStatus {
    /// The dispute is active and unconcluded.
    Active,
    /// The dispute has been concluded in favor of the candidate
    /// since the given timestamp.
    ConcludedFor(Timestamp),
    /// The dispute has been concluded against the candidate
    /// since the given timestamp.
    ///
    /// This takes precedence over `ConcludedFor` in the case that
    /// both are true, which is impossible unless a large number of
    /// validators are participating on both sides.
    ConcludedAgainst(Timestamp),
    /// Dispute has been confirmed (more than `byzantine_threshold` have already participated,
    /// or we have seen the candidate included already/participated successfully ourselves).
    Confirmed,
}
```
## Protocol
Input: [`DisputeCoordinatorMessage`][DisputeCoordinatorMessage]
Output:
- [`RuntimeApiMessage`][RuntimeApiMessage]
## Functionality
This assumes a constant `DISPUTE_WINDOW: SessionWindowSize`. This should correspond to at least 1 day.
Ephemeral in-memory state:
```rust
struct State {
keystore: Arc<LocalKeystore>,
rolling_session_window: RollingSessionWindow,
highest_session: SessionIndex,
spam_slots: SpamSlots,
participation: Participation,
ordering_provider: OrderingProvider,
participation_receiver: WorkerMessageReceiver,
metrics: Metrics,
// This tracks only rolling session window failures.
// It can be a `Vec` if the need to track more arises.
error: Option<SessionsUnavailable>,
/// Latest relay blocks that have been successfully scraped.
last_scraped_blocks: LruMap<Hash, ()>,
}
```
### On startup
When the subsystem is initialised, it waits for a new leaf (message `OverseerSignal::ActiveLeaves`). The leaf is used to
initialise a `RollingSessionWindow` instance (which contains the leaf hash and `DISPUTE_WINDOW`, a constant).
Next, the active disputes are loaded from the DB and the spam slots are initialized accordingly. Then, for each loaded
dispute, we either send a `DisputeDistributionMessage::SendDispute` if a local vote from us is available or, if there is
none and participation is in order, push the dispute to participation.
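The per-dispute decision taken on startup can be sketched as below. This is a simplified illustration under stated
assumptions, not the coordinator's code: `LoadedDispute`, `StartupAction`, and `startup_actions` are hypothetical names,
candidate hashes are plain byte arrays, and "participation is in order" is reduced to a boolean flag.

```rust
/// Hypothetical, simplified view of a dispute loaded from the DB on startup.
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedDispute {
    pub candidate_hash: [u8; 32],
    /// Whether we already have a local vote for this candidate.
    pub has_local_vote: bool,
    /// Stand-in for "participation is in order" (e.g. not spam, not yet participated).
    pub participation_allowed: bool,
}

/// What to do with each loaded dispute.
#[derive(Debug, Clone, PartialEq)]
pub enum StartupAction {
    /// Re-send our existing vote via `DisputeDistributionMessage::SendDispute`.
    SendDispute([u8; 32]),
    /// Queue the dispute for participation.
    Participate([u8; 32]),
}

/// Decide, for every loaded dispute, whether to send our vote or to participate.
/// Disputes with neither a local vote nor permission to participate are skipped.
pub fn startup_actions(disputes: &[LoadedDispute]) -> Vec<StartupAction> {
    disputes
        .iter()
        .filter_map(|d| {
            if d.has_local_vote {
                Some(StartupAction::SendDispute(d.candidate_hash))
            } else if d.participation_allowed {
                Some(StartupAction::Participate(d.candidate_hash))
            } else {
                None
            }
        })
        .collect()
}
```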
### The main loop
Just after the subsystem initialisation, the main loop (`fn run_until_error()`) runs until the `OverseerSignal::Conclude`
signal is received. Before executing the actual main loop, the leaf and the participations obtained during startup are
enqueued for processing. If there is capacity (the number of running participations is less than
`MAX_PARALLEL_PARTICIPATIONS`), participation jobs are started (`fn participate`). Finally, the component waits for
messages from the Overseer. The behaviour on each message is described in the following subsections.
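A minimal sketch of the capacity check on participation jobs, assuming a simple FIFO queue. `Participation`, `enqueue`,
`start_until_capacity`, and `finished` are hypothetical names, requests are reduced to `u32` ids, and the value of
`MAX_PARALLEL_PARTICIPATIONS` is an assumption chosen for illustration.

```rust
use std::collections::VecDeque;

/// Hypothetical value, for illustration only.
pub const MAX_PARALLEL_PARTICIPATIONS: usize = 3;

/// Simplified participation scheduler: requests wait in a FIFO queue and are
/// started only while fewer than `MAX_PARALLEL_PARTICIPATIONS` are running.
#[derive(Default)]
pub struct Participation {
    /// Queued requests (candidate ids stand in for full requests).
    queue: VecDeque<u32>,
    /// Requests currently being worked on.
    running: Vec<u32>,
}

impl Participation {
    /// Enqueue a participation request.
    pub fn enqueue(&mut self, candidate: u32) {
        self.queue.push_back(candidate);
    }

    /// Start queued participations while there is capacity; returns the ones started.
    pub fn start_until_capacity(&mut self) -> Vec<u32> {
        let mut started = Vec::new();
        while self.running.len() < MAX_PARALLEL_PARTICIPATIONS {
            let Some(candidate) = self.queue.pop_front() else { break };
            self.running.push(candidate);
            started.push(candidate);
        }
        started
    }

    /// A participation finished, freeing its slot.
    pub fn finished(&mut self, candidate: u32) {
        self.running.retain(|&c| c != candidate);
    }
}
```

Keeping the queue FIFO means requests are started in the order they were enqueued once a slot frees up.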
### On `OverseerSignal::ActiveLeaves`
Initiates processing via the `Participation` module and updates the internal state of the subsystem. More concretely:
- Passes the `ActiveLeavesUpdate` message to the ordering provider.
- Updates the session info cache.
- Updates `self.highest_session`.
- Prunes old spam slots in case the session window has advanced.
- Scrapes on chain votes.
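Pruning spam slots when the session window advances can be sketched with an ordered map keyed by
`(session, validator)`. The `SpamSlots` layout below is a hypothetical simplification (a plain counter per key); the
point is that `BTreeMap::split_off` drops every session older than the new earliest session in one call.

```rust
use std::collections::BTreeMap;

/// Hypothetical aliases, for illustration only.
pub type SessionIndex = u32;
pub type ValidatorIndex = u32;

/// Simplified spam slots: a counter of unconfirmed disputes per (session, validator).
pub struct SpamSlots {
    pub slots: BTreeMap<(SessionIndex, ValidatorIndex), u32>,
}

impl SpamSlots {
    /// Drop all slots of sessions older than `earliest_session`, i.e. sessions
    /// that have left the dispute window.
    pub fn prune_old(&mut self, earliest_session: SessionIndex) {
        // `split_off` returns every entry with key >= the given key, and
        // `(earliest_session, 0)` is the smallest key of the earliest kept session.
        self.slots = self.slots.split_off(&(earliest_session, 0));
    }
}
```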
### On `MuxedMessage::Participation`
This message is sent from the `Participation` module and indicates a processed dispute participation. It's the result
of the processing job initiated with `OverseerSignal::ActiveLeaves`. The subsystem issues a `DisputeMessage` with the
result.
### On `OverseerSignal::Conclude`
Exit gracefully.
### On `OverseerSignal::BlockFinalized`
Performs cleanup of the finalized candidate.
### On `DisputeCoordinatorMessage::ImportStatements`
Import statements by validators are processed in `fn handle_import_statements()`. The function has three main
responsibilities:
- Initiate participation in disputes and send out any existing own approval vote in case a dispute is raised.
- Persist all fresh votes in the database. Fresh votes in this context are votes that have not already been processed
by the node.
- Spam protection on all invalid (`DisputeStatement::Invalid`) votes. Please check the SpamSlots section for details on
how spam protection works.
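The "fresh votes" rule amounts to keeping only votes from validators that have not been recorded yet on that side. A
minimal sketch, assuming votes are keyed by validator index and signatures are opaque bytes; `import_fresh` and this
simplified `CandidateVotes` are hypothetical and differ from the struct in the schema above.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical alias, for illustration only.
pub type ValidatorIndex = u32;

/// Simplified votes on one candidate: validator index -> opaque signature, per
/// side. `BTreeMap` keeps each side sorted by validator index.
#[derive(Default)]
pub struct CandidateVotes {
    pub valid: BTreeMap<ValidatorIndex, Vec<u8>>,
    pub invalid: BTreeMap<ValidatorIndex, Vec<u8>>,
}

/// Merge incoming `(validator, is_valid, signature)` votes and return how many
/// were fresh (not yet recorded for that validator on that side). Only fresh
/// votes would need to be written to the database.
pub fn import_fresh(
    votes: &mut CandidateVotes,
    incoming: Vec<(ValidatorIndex, bool, Vec<u8>)>,
) -> usize {
    let mut fresh = 0;
    for (validator, is_valid, signature) in incoming {
        let side = if is_valid { &mut votes.valid } else { &mut votes.invalid };
        if let Entry::Vacant(entry) = side.entry(validator) {
            entry.insert(signature);
            fresh += 1;
        }
    }
    fresh
}
```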
### On `DisputeCoordinatorMessage::RecentDisputes`
Returns all recent disputes saved in the DB.
### On `DisputeCoordinatorMessage::ActiveDisputes`
Returns all recent disputes that are still active or that concluded within the last `ACTIVE_DURATION_SECS`.
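A sketch of the filter, assuming a dispute counts as "active" if it has not concluded yet (`Active` or `Confirmed`) or
concluded no more than `ACTIVE_DURATION_SECS` ago. `active_disputes` and `concluded_at` are hypothetical helpers,
`CandidateHash` is reduced to a byte array, and the constant's value is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical aliases and value, for illustration only.
pub type SessionIndex = u32;
pub type CandidateHash = [u8; 32];
/// Seconds since the UNIX epoch.
pub type Timestamp = u64;
pub const ACTIVE_DURATION_SECS: Timestamp = 180;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DisputeStatus {
    Active,
    ConcludedFor(Timestamp),
    ConcludedAgainst(Timestamp),
    Confirmed,
}

impl DisputeStatus {
    /// The time the dispute concluded, if it has.
    pub fn concluded_at(&self) -> Option<Timestamp> {
        match self {
            DisputeStatus::ConcludedFor(t) | DisputeStatus::ConcludedAgainst(t) => Some(*t),
            DisputeStatus::Active | DisputeStatus::Confirmed => None,
        }
    }
}

/// Disputes that are unconcluded, or concluded no more than `ACTIVE_DURATION_SECS` ago.
pub fn active_disputes(
    recent: &BTreeMap<(SessionIndex, CandidateHash), DisputeStatus>,
    now: Timestamp,
) -> Vec<(SessionIndex, CandidateHash)> {
    recent
        .iter()
        .filter(|(_, status)| {
            status.concluded_at().map_or(true, |at| at + ACTIVE_DURATION_SECS >= now)
        })
        .map(|(key, _)| *key)
        .collect()
}
```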
### On `DisputeCoordinatorMessage::QueryCandidateVotes`
Loads `candidate-votes` for every `(SessionIndex, CandidateHash)` in the input query and returns the data within each
`CandidateVotes`. If a particular `candidate-votes` entry is missing, that request is omitted from the response.
### On `DisputeCoordinatorMessage::IssueLocalStatement`
Executes `fn issue_local_statement()` which performs the following operations:
- Deconstruct into parts `{ session_index, candidate_hash, candidate_receipt, is_valid }`.
- Construct a [`DisputeStatement`][DisputeStatement] based on `Valid` or `Invalid`, depending on the parameterization of
this routine.
- Sign the statement with each key in the `SessionInfo`'s list of teyrchain validation keys which is present in the
keystore, except those whose indices appear in `voted_indices`. This will typically just be one key, but this does
provide some future-proofing for situations where the same node may run on behalf of multiple validators. At the time
of writing, this is not a use-case we support, as other subsystems do not invariably provide this guarantee.
- Write statement to DB.
- Send a `DisputeDistributionMessage::SendDispute` message to get the vote distributed to other validators.
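The key selection step of `issue_local_statement()` can be sketched as follows, assuming keys are compared as raw bytes
and the keystore is represented by a set of public keys. `signing_indices` is a hypothetical helper; real signing would
go through the keystore API.

```rust
use std::collections::HashSet;

/// Hypothetical aliases, for illustration only.
pub type ValidatorIndex = u32;
/// Stand-in for a teyrchain validator public key.
pub type ValidatorId = [u8; 32];

/// Validator indices to sign a local dispute statement with: every session
/// validator whose key is in our keystore, skipping `voted_indices`.
pub fn signing_indices(
    session_validators: &[ValidatorId],
    keystore: &HashSet<ValidatorId>,
    voted_indices: &HashSet<ValidatorIndex>,
) -> Vec<ValidatorIndex> {
    session_validators
        .iter()
        .enumerate()
        .map(|(i, key)| (i as ValidatorIndex, key))
        .filter(|(i, key)| keystore.contains(*key) && !voted_indices.contains(i))
        .map(|(i, _)| i)
        .collect()
}
```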
### On `DisputeCoordinatorMessage::DetermineUndisputedChain`
Executes `fn determine_undisputed_chain()` which performs the following:
- Load `"recent-disputes"`.
- Deconstruct into parts `{ base_number, block_descriptions, rx }`.
- Starting from the beginning of `block_descriptions`, where `block_descriptions[k]` describes the block at height
`base_number + k + 1`:
1. Check the `RecentDisputes` for a dispute of each candidate in the block description.
1. If there is a dispute which is active or concluded negative, exit the loop.
- Let `i` be the number of blocks checked without finding such a dispute. If `i` is 0, send `None` on the channel.
Otherwise, send `(base_number + i, block_hash)`, where `block_hash` is the hash of `block_descriptions[i - 1]`, the last
undisputed block.
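The walk can be sketched as follows, assuming `block_descriptions[k]` is the block at height `base_number + k + 1`. This
is a simplified illustration: `BlockDescription` is reduced to a hash plus candidate hashes, the set `disputed` stands in
for the `RecentDisputes` lookup (active or concluded negative), and the result is returned instead of sent on `rx`.

```rust
use std::collections::HashSet;

/// Hypothetical aliases, for illustration only.
pub type BlockNumber = u32;
pub type Hash = [u8; 32];
pub type CandidateHash = [u8; 32];

/// Simplified block description: the block's hash and the candidates it includes.
pub struct BlockDescription {
    pub block_hash: Hash,
    pub candidates: Vec<CandidateHash>,
}

/// Return the last block before the first block that includes a disputed
/// candidate, or `None` if the very first block is disputed (or there are none).
pub fn determine_undisputed_chain(
    base_number: BlockNumber,
    block_descriptions: &[BlockDescription],
    // Candidates with an active or negatively concluded dispute.
    disputed: &HashSet<CandidateHash>,
) -> Option<(BlockNumber, Hash)> {
    // Number of leading blocks that contain no disputed candidate.
    let i = block_descriptions
        .iter()
        .take_while(|b| !b.candidates.iter().any(|c| disputed.contains(c)))
        .count();
    if i == 0 {
        None
    } else {
        Some((base_number + i as BlockNumber, block_descriptions[i - 1].block_hash))
    }
}
```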
[DisputeTypes]: ../../types/disputes.md
[DisputeStatement]: ../../types/disputes.md#disputestatement
[DisputeCoordinatorMessage]: ../../types/overseer-protocol.md#dispute-coordinator-message
[RuntimeApiMessage]: ../../types/overseer-protocol.md#runtime-api-message
@@ -0,0 +1,429 @@
# Dispute Distribution
Dispute distribution is responsible for ensuring all concerned validators will
be aware of a dispute and have the relevant votes.
## Design Goals
This design should result in a protocol that is:
- resilient to nodes being temporarily unavailable
- quick to make nodes aware of a dispute
- relatively efficient, not causing too much stress on the network
- resilient to spam
- simple and boring: we want disputes to work when they happen
## Protocol
Distributing disputes needs to be a reliable protocol. We would like to make as
sure as possible that our vote is properly delivered to all concerned
validators. For this to work, this subsystem won't be gossip-based, but instead
will use a request/response protocol for application-level confirmations. The
request will be the payload (the actual votes/statements), the response will
be the confirmation. See [below](#wire-format).
### Input
[`DisputeDistributionMessage`][DisputeDistributionMessage]
### Output
- [`DisputeCoordinatorMessage::ActiveDisputes`][DisputeCoordinatorMessage]
- [`DisputeCoordinatorMessage::ImportStatements`][DisputeCoordinatorMessage]
- [`DisputeCoordinatorMessage::QueryCandidateVotes`][DisputeCoordinatorMessage]
- [`RuntimeApiMessage`][RuntimeApiMessage]
### Wire format
#### Disputes
Protocol: `"/<genesis_hash>/<fork_id>/send_dispute/1"`
Request:
```rust
struct DisputeRequest {
/// The candidate being disputed.
pub candidate_receipt: CandidateReceipt,
/// The session the candidate appears in.
pub session_index: SessionIndex,
/// The invalid vote data that makes up this dispute.
pub invalid_vote: InvalidDisputeVote,
/// The valid vote that makes this dispute request valid.
pub valid_vote: ValidDisputeVote,
}
/// Any invalid vote (currently only explicit).
pub struct InvalidDisputeVote {
/// The voting validator index.
pub validator_index: ValidatorIndex,
/// The validator signature, that can be verified when constructing a
/// `SignedDisputeStatement`.
pub signature: ValidatorSignature,
/// Kind of dispute statement.
pub kind: InvalidDisputeStatementKind,
}
/// Any valid vote (backing, approval, explicit).
pub struct ValidDisputeVote {
/// The voting validator index.
pub validator_index: ValidatorIndex,
/// The validator signature, that can be verified when constructing a
/// `SignedDisputeStatement`.
pub signature: ValidatorSignature,
/// Kind of dispute statement.
pub kind: ValidDisputeStatementKind,
}
```
Response:
```rust
enum DisputeResponse {
Confirmed
}
```
#### Vote Recovery
Protocol: `"/<genesis_hash>/<fork_id>/req_votes/1"`
```rust
struct IHaveVotesRequest {
candidate_hash: CandidateHash,
session: SessionIndex,
valid_votes: Bitfield,
invalid_votes: Bitfield,
}
```
Response:
```rust
struct VotesResponse {
/// All votes we have, but the requester was missing.
missing: Vec<(DisputeStatement, ValidatorIndex, ValidatorSignature)>,
}
```
## Starting a Dispute
A dispute is initiated once a node sends the first `DisputeRequest` wire message,
which must contain an "invalid" vote and a "valid" vote.
The dispute distribution subsystem can get instructed to send that message out to
all concerned validators by means of a `DisputeDistributionMessage::SendDispute`
message. That message must contain an invalid vote from the local node and some
valid one, e.g. a backing statement.
We include a valid vote as well, so that any node, regardless of whether it is synced
with the chain or has seen the backing/approval votes, can see that there are
conflicting votes available and hence that we have a valid dispute. Nodes will still
need to check whether the disputing votes are reasonably recent and not stale
ones.
## Participating in a Dispute
Upon receiving a `DisputeRequest` message, dispute distribution will trigger the
import of the received votes via the dispute coordinator
(`DisputeCoordinatorMessage::ImportStatements`). The dispute coordinator will
take care of participating in that dispute if necessary. Once it is done, the
coordinator will send a `DisputeDistributionMessage::SendDispute` message to dispute
distribution. From here, everything is the same as for starting a dispute,
except that if the local node deemed the candidate valid, the `SendDispute`
message will contain a valid vote signed by our node and will contain the
initially received `Invalid` vote.
Note that we rely on the `dispute-coordinator` to check the validity of a dispute for spam
protection (see below).
## Sending of messages
Starting and participating in a dispute are pretty similar from the perspective
of dispute distribution. Once we receive a `SendDispute` message, we try to make
sure to get the data out. We keep track of all the teyrchain validators that
should see the message, which are all the teyrchain validators of the session
where the dispute happened, as they will want to participate in the dispute. In
addition, we need to get the votes out to all authorities of the current
session (which may or may not be the same set, and may change during the dispute).
Those authorities will not participate in the dispute, but need to see the
statements so they can include them in blocks.
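A minimal sketch of computing the recipient set as the union of both groups, assuming plain integer ids.
`dispute_recipients` is a hypothetical helper, and excluding our own id is an assumption of the sketch.

```rust
use std::collections::BTreeSet;

/// Stand-in for an authority/validator identifier, for illustration only.
pub type AuthorityId = u32;

/// Everyone who should receive our dispute votes: all teyrchain validators of
/// the dispute's session (they participate) plus all authorities of the current
/// session (they may put the votes in blocks), deduplicated, without ourselves.
pub fn dispute_recipients(
    dispute_session_validators: &[AuthorityId],
    current_session_authorities: &[AuthorityId],
    our_id: AuthorityId,
) -> BTreeSet<AuthorityId> {
    dispute_session_validators
        .iter()
        .chain(current_session_authorities)
        .copied()
        .filter(|id| *id != our_id)
        .collect()
}
```

Because the current session's authorities may change while the dispute is ongoing, such a set would need to be
recomputed on session changes.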
### Reliability
We only consider a message transmitted once we have received a confirmation message.
Until then, we keep retrying to get that message out as long as the dispute is
deemed alive. To determine whether a dispute is still alive we will ask the
`dispute-coordinator` for a list of all still active disputes via a
`DisputeCoordinatorMessage::ActiveDisputes` message before each retry run. Once
a dispute is no longer live, we will clean up the state accordingly.
### Order
We assume `SendDispute` messages arrive in order of importance, hence
`dispute-distribution` will make sure to send out network messages in the same
order, even on retry.
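Reliability and ordering together can be sketched as a queue that remembers arrival order and, on each retry run, drops
disputes that are no longer active or are fully confirmed. `SendQueue` and its methods are hypothetical; the set of
active candidates stands in for the `DisputeCoordinatorMessage::ActiveDisputes` response.

```rust
use std::collections::{BTreeMap, HashSet};

/// Stand-in types, for illustration only.
pub type CandidateHash = u32;
pub type PeerId = u32;

/// Pending sends, kept in the order the `SendDispute` messages arrived.
#[derive(Default)]
pub struct SendQueue {
    next_seq: u64,
    /// Arrival sequence number -> (candidate, peers that have not confirmed yet).
    pending: BTreeMap<u64, (CandidateHash, HashSet<PeerId>)>,
}

impl SendQueue {
    /// Register a `SendDispute` for the given recipients.
    pub fn push(&mut self, candidate: CandidateHash, recipients: impl IntoIterator<Item = PeerId>) {
        self.pending.insert(self.next_seq, (candidate, recipients.into_iter().collect()));
        self.next_seq += 1;
    }

    /// A peer confirmed receipt: no need to send it that dispute again.
    pub fn confirmed(&mut self, candidate: CandidateHash, peer: PeerId) {
        for (c, peers) in self.pending.values_mut() {
            if *c == candidate {
                peers.remove(&peer);
            }
        }
    }

    /// One retry run: forget disputes that are no longer active or are fully
    /// confirmed, then return the outstanding sends in arrival order.
    pub fn retry_run(&mut self, active: &HashSet<CandidateHash>) -> Vec<(CandidateHash, PeerId)> {
        self.pending
            .retain(|_, entry| active.contains(&entry.0) && !entry.1.is_empty());
        let mut sends = Vec::new();
        for (candidate, peers) in self.pending.values() {
            let mut peers: Vec<PeerId> = peers.iter().copied().collect();
            peers.sort_unstable(); // deterministic order within one dispute
            sends.extend(peers.into_iter().map(|peer| (*candidate, peer)));
        }
        sends
    }
}
```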
### Rate Limit
For spam protection (see below), we artificially rate limit outgoing messages
so as not to hit the rate limit on the receiving side, which would result in our
messages getting dropped and our reputation getting reduced.
## Reception
As we shall see, the receiving side is mostly about handling spam and ensuring
the dispute-coordinator learns about disputes as fast as possible.
Goals for the receiving side:
1. Get new disputes to the dispute-coordinator as fast as possible, so
prioritization can happen properly.
2. Batch votes per dispute as much as possible for good import performance.
3. Prevent malicious nodes from exhausting node resources by sending lots of messages.
4. Prevent malicious nodes from sending so many messages/(fake) disputes that
we are prevented from concluding good ones.
5. Limit the ability of malicious nodes to delay the vote import through the batching
logic.
Goals 1 and 2 seem to conflict, but an easy compromise is possible: When
learning about a new dispute, we will import the vote immediately, making the
dispute coordinator aware and also getting immediate feedback on the validity.
Then, if valid, we can batch further incoming votes with fewer time constraints, as
the dispute-coordinator already knows about the dispute.
Goals 3 and 4 are obviously closely related and both can easily be solved via rate
limiting, as we shall see below. Rate limits should already be implemented at the
Substrate level, but [are not](https://github.com/paritytech/substrate/issues/7750)
at the time of writing. But even if they were, the enforced Substrate limits would
likely not be configurable and thus would still be too high for our needs, as we can
rely on the following observations:
1. Each honest validator will only send one message (apart from duplicates on
timeout) per candidate/dispute.
2. An honest validator needs to fully recover availability and validate the
candidate for casting a vote.
With these two observations, we can conclude that honest validators will usually
not send messages at a high rate. We can therefore enforce conservative rate
limits and thus minimize the harm spamming malicious nodes can do.
Before we dive into how rate limiting solves all spam issues elegantly, let's
discuss that honest behaviour further:
What about session changes? Here we might have to inform a new validator set of
lots of already existing disputes at once.
With observation 1) and a rate limit that is per peer, we are still good:
Let's assume a rate limit of one message per 200ms per sender. This means 5
messages from each validator per second. 5 messages means 5 disputes!
Consequently, we will be able to conclude 5 disputes per second, no matter what
malicious actors are doing. This assumes dispute messages are sent in order,
but even if they are not perfectly ordered, it will be 5 disputes per second on average.
This is good enough! All those disputes are valid ones and will result in
slashing and disabling of validators. If we assume all of them conclude `valid`
and that we disable validators only after 100 raised disputes that concluded valid, we
would still start disabling misbehaving validators after only 20 seconds (100 disputes at 5 per second).
One might also argue that participation is expected to take longer than import,
which would mean that on average we can import/conclude disputes faster than they are
generated, regardless of dispute spam. Unfortunately this is not necessarily
true: There might be teyrchains with very light load where recovery and
validation can be accomplished very quickly, maybe faster than we can import
those disputes.
This is probably an argument against imposing too low a rate limit, although the
issue is more general: Even without any rate limit, if an attacker generates
disputes at a very high rate, nodes will have trouble keeping up with participation,
hence the problem should be mitigated at a [more fundamental
layer](https://github.com/paritytech/polkadot/issues/5898).
For nodes that have been offline for a while, the same argument as for session
changes holds, but matters even less: We assume at least 2/3 of nodes to be online, so
even if in the worst case 1/3 are offline and could not import votes fast
enough (as argued above, they in fact can), it would not matter for consensus.
### Rate Limiting
As suggested previously, rate limiting allows us to mitigate, as far as
dispute-distribution is concerned, all threats that come from malicious actors
trying to overwhelm the system in order to get away without a slash. This section
explains how in greater detail.
The idea is to open a queue with limited size for each peer. We will process
incoming messages as fast as we can by doing the following:
1. Check that the sending peer is actually a valid authority - otherwise drop
message and decrease reputation/disconnect.
2. Put message on the peer's queue, if queue is full - drop it.
Every `RATE_LIMIT` seconds (or rather milliseconds), we pause processing
incoming requests, go a full circle, and process one message from each queue.
Processing means `Batching` as explained in the next section.
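The per-peer queues and the round-robin draining can be sketched as below. `RateLimitedReceiver`, `Received`, and
`PEER_QUEUE_CAPACITY` are hypothetical names and values; `tick` is assumed to be called by a timer every `RATE_LIMIT`,
and reputation handling is left to the caller.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Stand-in type and hypothetical capacity, for illustration only.
pub type PeerId = u32;
pub const PEER_QUEUE_CAPACITY: usize = 10;

/// Outcome of receiving a message, so the caller can adjust reputation.
#[derive(Debug, PartialEq)]
pub enum Received {
    Queued,
    /// The sender's queue was full; the message was dropped.
    QueueFull,
    /// The sender is not an authority; drop and decrease reputation/disconnect.
    NotAuthority,
}

/// Bounded per-peer queues, drained one message per peer per tick.
pub struct RateLimitedReceiver<M> {
    authorities: HashSet<PeerId>,
    queues: HashMap<PeerId, VecDeque<M>>,
}

impl<M> RateLimitedReceiver<M> {
    pub fn new(authorities: HashSet<PeerId>) -> Self {
        Self { authorities, queues: HashMap::new() }
    }

    /// Steps 1 and 2: check the sender, then enqueue or drop.
    pub fn receive(&mut self, peer: PeerId, msg: M) -> Received {
        if !self.authorities.contains(&peer) {
            return Received::NotAuthority;
        }
        let queue = self.queues.entry(peer).or_default();
        if queue.len() >= PEER_QUEUE_CAPACITY {
            return Received::QueueFull;
        }
        queue.push_back(msg);
        Received::Queued
    }

    /// Called once every `RATE_LIMIT`: take at most one message from each peer.
    pub fn tick(&mut self) -> Vec<(PeerId, M)> {
        let mut out = Vec::new();
        for (peer, queue) in self.queues.iter_mut() {
            if let Some(msg) = queue.pop_front() {
                out.push((*peer, msg));
            }
        }
        self.queues.retain(|_, q| !q.is_empty());
        out
    }
}
```

Since each peer gets at most one message processed per tick, a flooding peer only fills and overflows its own queue.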
### Batching
To achieve goal 2 we will batch incoming votes/messages together before passing
them on as a single batch to the `dispute-coordinator`. To adhere to goal 1 as
well, we will do the following:
1. For an incoming message, we check whether we have an existing batch for that
candidate; if not, we import directly to the dispute-coordinator, as we have
to assume this concerns a new dispute.
2. We open a batch and start collecting incoming messages for that candidate,
instead of immediately forwarding them.
3. We keep collecting votes in the batch until we receive fewer than
`MIN_KEEP_BATCH_ALIVE_VOTES` unique votes in the last `BATCH_COLLECTING_INTERVAL`. This is
important to accommodate goals 5 and 3.
4. We send the whole batch to the dispute-coordinator.
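A minimal sketch of the batching steps, assuming votes are identified by validator index only and `on_tick` is called
every `BATCH_COLLECTING_INTERVAL`. `Batches`, `on_message`, and `on_tick` are hypothetical; the returned vectors stand
in for imports to the dispute-coordinator, and the threshold value matches the example numbers below.

```rust
use std::collections::hash_map::Entry;
use std::collections::{BTreeSet, HashMap};

/// Stand-in types and a hypothetical threshold, for illustration only.
pub type CandidateHash = u32;
pub type ValidatorIndex = u32;
pub const MIN_KEEP_BATCH_ALIVE_VOTES: usize = 10;

#[derive(Default)]
struct Batch {
    /// Unique votes collected so far (by validator index).
    votes: BTreeSet<ValidatorIndex>,
    /// Unique votes added since the last `BATCH_COLLECTING_INTERVAL` tick.
    new_since_tick: usize,
}

#[derive(Default)]
pub struct Batches {
    batches: HashMap<CandidateHash, Batch>,
}

impl Batches {
    /// Returns `Some(votes)` if they must be imported right away (no batch yet,
    /// so possibly a new dispute), or `None` if they were added to a batch.
    pub fn on_message(&mut self, candidate: CandidateHash, votes: &[ValidatorIndex]) -> Option<Vec<ValidatorIndex>> {
        match self.batches.entry(candidate) {
            Entry::Vacant(entry) => {
                // Steps 1 and 2: import directly, then open a batch for later votes.
                entry.insert(Batch::default());
                Some(votes.to_vec())
            }
            Entry::Occupied(mut entry) => {
                let batch = entry.get_mut();
                for &vote in votes {
                    if batch.votes.insert(vote) {
                        batch.new_since_tick += 1;
                    }
                }
                None
            }
        }
    }

    /// Called every `BATCH_COLLECTING_INTERVAL`: batches that received fewer than
    /// `MIN_KEEP_BATCH_ALIVE_VOTES` unique votes are closed (step 3) and their
    /// votes returned for a single import (step 4); the rest stay open.
    pub fn on_tick(&mut self) -> Vec<(CandidateHash, Vec<ValidatorIndex>)> {
        let mut flushed = Vec::new();
        self.batches.retain(|candidate, batch| {
            if batch.new_since_tick < MIN_KEEP_BATCH_ALIVE_VOTES {
                if !batch.votes.is_empty() {
                    flushed.push((*candidate, batch.votes.iter().copied().collect()));
                }
                false
            } else {
                batch.new_since_tick = 0;
                true
            }
        });
        flushed
    }
}
```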
This together with rate limiting explained above ensures we will be able to
process valid disputes: We can limit the number of simultaneous existing batches
to some high value, but can be rather certain that this limit will never be
reached - hence we won't drop valid disputes:
Let's assume `MIN_KEEP_BATCH_ALIVE_VOTES` is 10, `BATCH_COLLECTING_INTERVAL`
is `500ms` and the above `RATE_LIMIT` is `100ms`. 1/3 of validators are malicious,
so for 1000 validators this means around 330 malicious actors in the worst case.
All those actors can send a message every `100ms`, that is 10 per second. This
means at the beginning of an attack they can open up around 3300 batches, each
containing two votes. So memory usage is still negligible. In reality it is even
less, as we also demand 10 new votes to trickle in per batch every `500ms` in order
to keep it alive. Hence for the first second, each batch requires 20 votes. Each
message contains 2 votes, so this means 10 messages per batch. Hence to
keep those batches alive, 10 attackers are needed for each batch. This reduces
the number of opened batches by a factor of 10: So we only have 330 batches in 1
second, each containing 20 votes.
In the next second, attackers have to maintain 10 messages per batch per second in
order to grow memory usage further. The number of batches equals the number
of attackers, each of whom sends 10 messages per second, all of which are needed to
keep the batches in memory. Therefore we have a hard cap of around 330 (the number of
malicious nodes) open batches. Each can be filled with at most one vote per malicious
actor. So 330 batches with 330 votes each: Let's assume approximately 100
bytes per signature/vote. This results in a worst case memory usage of
`330 * 330 * 100 ~= 10 MiB`.
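The worst-case estimate can be written as a small helper, assuming a third of the validator set is malicious and every
open batch holds at most one vote per malicious node. `worst_case_batch_memory` is a hypothetical name; unlike the
rounded figure of 330 used in the text, it uses integer division (333 malicious nodes for 1000 validators).

```rust
/// Worst-case batch memory in bytes: with `m` malicious validators (a third of
/// the set), at most `m` batches stay open and each holds at most `m` votes.
pub fn worst_case_batch_memory(validators: u64, bytes_per_vote: u64) -> u64 {
    let malicious = validators / 3;
    malicious * malicious * bytes_per_vote
}
```

For 1000 validators and 100 bytes per vote this gives 11_088_900 bytes (about 10.6 MiB); for 10_000 validators it
gives 1_110_888_900 bytes (about 1.03 GiB), since the cost grows quadratically with the validator count.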
For 10_000 validators (around 3_300 malicious ones, so `3_300 * 3_300 * 100 ~= 1 GiB`),
we are already in the Gigabyte range, which means that with a validator set that large
we might want to be stricter with the rate limit or require a larger rate of incoming
votes per batch to keep them alive.
For a thousand validators, a limit of around 1000 batches should never be
reached in practice. Hence, due to rate limiting, we have a very good chance of
never having to drop a potentially valid dispute due to some resource limit.
Further safeguards are possible: The dispute-coordinator actually
confirms/denies imports. So once we receive a denial by the dispute-coordinator
for the initially imported votes, we can opt into flushing the batch immediately
and importing the votes. This trades memory usage for CPU usage, but if that
import is deemed invalid again we can immediately decrease the reputation of the
sending peers, so this should be a net win. For the time being we punt on this
for simplicity.
Instead of filling batches to maximize memory usage, attackers could also try to
overwhelm the dispute coordinator by only sending votes for new candidates all
the time. This attack vector is also mitigated by the above rate limit and by
decreasing the peer's reputation when the coordinator denies the invalid imports.
### Node Startup
Nothing special happens on node startup. We expect the `dispute-coordinator` to
inform us about any ongoing disputes via `SendDispute` messages.
## Backing and Approval Votes
Backing and approval votes get imported into the dispute coordinator by the
corresponding subsystems when they arrive or are created.
We assume that under normal operation each node will be aware of backing and
approval votes and optimize for that case. Nevertheless we want disputes to
conclude quickly and reliably, therefore if a node is not aware of backing/approval
votes it can request the missing votes from the node that informed it about the
dispute (see [Resiliency](#resiliency)).
## Resiliency
The above protocol should be sufficient for most cases, but there are certain
cases we also want to have covered:
- Non-validator nodes might be interested in ongoing voting, even before it is
recorded on chain.
- Nodes might have missed votes, especially backing or approval votes.
Recovering them from chain is difficult and expensive, due to runtime upgrades
and untyped extrinsics.
- More importantly, on era changes the new authority set has, from the perspective
of approval-voting, no need to see "old" approval votes. Its members might therefore
not see them, cannot import them into the dispute coordinator, and as a result
no authority will put them on chain.
To cover those cases, we introduce a second request/response protocol, which can
be handled on a lower priority basis than the one above. It consists of the
request/response messages as described in the [protocol
section](#vote-recovery).
Nodes may send those requests to validators if they feel they are missing
votes, e.g. after some timeout, if no majority has been reached yet from their point of
view, or if they are not aware of any backing/approval votes for a received
disputed candidate.
The receiver of an `IHaveVotesRequest` message will do the following:
1. See if the sender is missing votes we are aware of - if so, respond with
those votes.
2. Check whether the sender knows about any votes we don't know about and if so,
send an `IHaveVotesRequest` request back with our knowledge.
3. Record the peer's knowledge.
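Steps 1 and 2 can be sketched by comparing bitfields, represented here as `Vec<bool>` per side. `Knowledge`,
`missing_for_peer`, and `peer_has_more` are hypothetical, and the sketch returns `(validator, is_valid)` pairs instead of
the signed statements a real `VotesResponse` carries.

```rust
/// Hypothetical alias, for illustration only.
pub type ValidatorIndex = u32;

/// Simplified `IHaveVotesRequest` knowledge: entry `i` is true if the holder
/// has validator `i`'s vote on that side.
#[derive(Debug, Clone, PartialEq)]
pub struct Knowledge {
    pub valid: Vec<bool>,
    pub invalid: Vec<bool>,
}

/// Indices set in `ours` but not in `theirs`, tagged with the side.
fn missing_on_side(ours: &[bool], theirs: &[bool], is_valid: bool) -> Vec<(ValidatorIndex, bool)> {
    ours.iter()
        .enumerate()
        .filter(|&(i, &has)| has && !theirs.get(i).copied().unwrap_or(false))
        .map(|(i, _)| (i as ValidatorIndex, is_valid))
        .collect()
}

/// Step 1: the votes we should send back to the requester.
pub fn missing_for_peer(ours: &Knowledge, theirs: &Knowledge) -> Vec<(ValidatorIndex, bool)> {
    let mut missing = missing_on_side(&ours.valid, &theirs.valid, true);
    missing.extend(missing_on_side(&ours.invalid, &theirs.invalid, false));
    missing
}

/// Step 2: whether the requester knows votes we lack, in which case we send an
/// `IHaveVotesRequest` of our own.
pub fn peer_has_more(ours: &Knowledge, theirs: &Knowledge) -> bool {
    !missing_for_peer(theirs, ours).is_empty()
}
```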
When to send `IHaveVotesRequest` messages:
1. Whenever we are asked to do so via
`DisputeDistributionMessage::FetchMissingVotes`.
2. Approximately once per block to some random validator as long as the dispute
is active.
Spam considerations: Nodes want to accept those messages at most once per validator and
per slot. They are free to drop more frequent requests or requests for stale
data. Requests coming from non-validator nodes can be handled on a best-effort
basis.
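Accepting at most one request per validator and slot can be sketched with a set of seen `(validator, slot)` pairs.
`RequestGate` is a hypothetical name, and pruning by slot is an assumption of the sketch to keep memory bounded.

```rust
use std::collections::HashSet;

/// Hypothetical aliases, for illustration only.
pub type ValidatorIndex = u32;
pub type Slot = u64;

/// Serves at most one `IHaveVotesRequest` per validator and slot.
#[derive(Default)]
pub struct RequestGate {
    seen: HashSet<(ValidatorIndex, Slot)>,
}

impl RequestGate {
    /// Returns true if the request should be served; repeats are dropped.
    pub fn accept(&mut self, validator: ValidatorIndex, slot: Slot) -> bool {
        self.seen.insert((validator, slot))
    }

    /// Forget slots older than `oldest_slot` to bound memory.
    pub fn prune(&mut self, oldest_slot: Slot) {
        self.seen.retain(|&(_, slot)| slot >= oldest_slot);
    }
}
```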
## Considerations
Dispute distribution is critical. We should keep track of available validator
connections and issue warnings if we are not connected to a majority of
validators. We should also keep track of failed sending attempts and log
warnings accordingly. As disputes are rare and TCP is a reliable protocol,
each failed attempt should probably trigger a warning in the logs and also be
recorded in some Prometheus metric.
## Disputes for non-available candidates
If deemed necessary we can later on also support disputes for non-available
candidates, but disputes for those cases have totally different requirements.
First of all, such disputes are not time-critical. We just want to have
some offender slashed at some point, but we have no risk of finalizing any bad
data.
Second, as we won't have availability for such data, the node that initiated the
dispute will be responsible for providing the disputed data initially. Then
nodes which have already done the check also become providers of the data,
distributing load and making it harder and harder over time to prevent the
dispute from concluding. Assuming an attacker cannot DoS a node forever, the
dispute will succeed eventually, which is all that matters. And again, even if
an attacker managed to prevent such a dispute from happening somehow, there is
no real harm done: There was no serious attack to begin with.
[DisputeDistributionMessage]: ../../types/overseer-protocol.md#dispute-distribution-message
[DisputeCoordinatorMessage]: ../../types/overseer-protocol.md#dispute-coordinator-message
[RuntimeApiMessage]: ../../types/overseer-protocol.md#runtime-api-message
@@ -0,0 +1,25 @@
# GRANDPA Voting Rule
Specifics on the motivation and types of constraints we apply to the GRANDPA voting logic as well as the definitions of
**viable** and **finalizable** blocks can be found in the [Chain Selection Protocol](../protocol-chain-selection.md)
section. The subsystem which provides us with viable leaves is the [Chain Selection
Subsystem](utility/chain-selection.md).
GRANDPA's regular voting rule is for each validator to select the longest chain they are aware of. GRANDPA proceeds in
rounds, collecting information from all online validators and determining the blocks that a supermajority of validators
all have in common with each other.
The low-level GRANDPA logic will provide us with a **required block**. We can find the best leaf containing that block
in its chain with the
[`ChainSelectionMessage::BestLeafContaining`](../types/overseer-protocol.md#chain-selection-message). If the result is
`None`, then we will simply cast a vote on the required block.
The **viable** leaves provided by the chain selection subsystem are not necessarily **finalizable**, so we need to
perform further work to discover the finalizable ancestor of the block. The first constraint is to avoid voting on any
unapproved block. The highest approved ancestor of a given block can be determined by querying the Approval Voting
subsystem via the [`ApprovalVotingMessage::ApprovedAncestor`](../types/overseer-protocol.md#approval-voting) message. If
the response is `Some`, we continue and apply the second constraint. The second constraint is to avoid voting on any
block containing a candidate undergoing an active dispute. The list of block hashes and candidates returned from
`ApprovedAncestor` should be reversed and passed to the
[`DisputeCoordinatorMessage::DetermineUndisputedChain`](../types/overseer-protocol.md#dispute-coordinator-message) to
determine the **finalizable** block which will be our eventual vote.
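The composition of the three queries can be sketched as below. `VotingBackend` is a hypothetical trait standing in for
the subsystem messages, hashes are reduced to integers, and falling back to the required block whenever a step returns
`None` is an assumption extrapolated from the `BestLeafContaining` case.

```rust
/// Stand-in for a block hash, for illustration only.
pub type Hash = u64;

/// Hypothetical stand-ins for the three subsystem requests used by the rule.
pub trait VotingBackend {
    /// Like `ChainSelectionMessage::BestLeafContaining`.
    fn best_leaf_containing(&self, required: Hash) -> Option<Hash>;
    /// Like `ApprovalVotingMessage::ApprovedAncestor`: the blocks from the highest
    /// approved ancestor of `leaf` down to just above `required`, highest first.
    fn approved_ancestor(&self, leaf: Hash, required: Hash) -> Option<Vec<Hash>>;
    /// Like `DisputeCoordinatorMessage::DetermineUndisputedChain`: given blocks in
    /// ascending order, the highest block before the first disputed one.
    fn undisputed_chain(&self, required: Hash, ascending: &[Hash]) -> Option<Hash>;
}

/// Pick the block to vote on, never going below the required block.
pub fn grandpa_vote<B: VotingBackend>(backend: &B, required: Hash) -> Hash {
    let Some(leaf) = backend.best_leaf_containing(required) else {
        return required;
    };
    let Some(mut blocks) = backend.approved_ancestor(leaf, required) else {
        return required;
    };
    // ApprovedAncestor lists blocks highest first; the dispute query wants ascending order.
    blocks.reverse();
    backend.undisputed_chain(required, &blocks).unwrap_or(required)
}
```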
@@ -0,0 +1,147 @@
# Overseer
The overseer is responsible for these tasks:
1. Setting up, monitoring, and handling failure for overseen subsystems.
1. Providing a "heartbeat" of which relay-parents subsystems should be working on.
1. Acting as a message bus between subsystems.
The hierarchy of subsystems:
```text
+--------------+ +------------------+ +--------------------+
| | | |----> Subsystem A |
| Block Import | | | +--------------------+
| Events |------> | +--------------------+
+--------------+ | |----> Subsystem B |
| Overseer | +--------------------+
+--------------+ | | +--------------------+
| | | |----> Subsystem C |
| Finalization |------> | +--------------------+
| Events | | | +--------------------+
| | | |----> Subsystem D |
+--------------+ +------------------+ +--------------------+
```
The overseer determines work to do based on block import events and block finalization events. It does this by keeping
track of the set of relay-parents for which work is currently being done. This is known as the "active leaves" set. It
determines an initial set of active leaves on startup based on the data on-disk, and uses events about blockchain import
to update the active leaves. Updates lead to
[`OverseerSignal`](../types/overseer-protocol.md#overseer-signal)`::ActiveLeavesUpdate` being sent according to new
relay-parents, as well as relay-parents to stop considering. Block import events inform the overseer of leaves that no
longer need to be built on, now that they have children, and inform us to begin building on those children. Block
finalization events inform us when we can stop focusing on blocks that appear to have been orphaned.
The overseer is also responsible for tracking the freshness of active leaves. Leaves are fresh when they're encountered
for the first time, and stale when they're encountered on subsequent occasions. This can occur after chain reversions or
when the fork-choice rule abandons some chain. This distinction is used to manage **Reversion Safety**. Consensus
messages are often localized to a specific relay-parent, and it is often a misbehavior to equivocate or sign two
conflicting messages. When reverting the chain, we may begin work on a leaf that subsystems have already signed messages
for. Subsystems which need to account for reversion safety should avoid performing work on stale leaves.
The overseer's logic can be described with these functions:
## On Startup
* Start all subsystems
* Determine all blocks of the blockchain that should be built on. This should typically be the head of the best fork of
the chain we are aware of. Sometimes add recent forks as well.
* Send an `OverseerSignal::ActiveLeavesUpdate` to all subsystems with `activated` containing each of these blocks.
* Begin listening for block import and finality events
## On Block Import Event
* Apply the block import event to the active leaves. A new block should lead to its addition to the active leaves set
and its parent being deactivated.
* Mark any stale leaves as stale. The overseer should track all leaves it activates to determine whether leaves are
fresh or stale.
* Send an `OverseerSignal::ActiveLeavesUpdate` message to all subsystems containing all activated and deactivated
leaves.
* Ensure all `ActiveLeavesUpdate` messages are flushed before resuming activity as a message router.
> TODO: in the future, we may want to avoid building on too many sibling blocks at once. The notion of a "preferred
> head" among many competing sibling blocks would imply changes in our "active leaves" update rules here.
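A minimal sketch of the import rule with freshness tracking, assuming leaves are identified by integer hashes.
`ActiveLeaves` and this simplified `ActiveLeavesUpdate` (which flags each activated leaf as fresh or stale) are
hypothetical and not the overseer's real types.

```rust
use std::collections::HashSet;

/// Stand-in for a block hash, for illustration only.
pub type Hash = u64;

/// Simplified update: activated leaves flagged fresh (`true`) or stale (`false`),
/// plus deactivated leaves.
#[derive(Debug, Default, PartialEq)]
pub struct ActiveLeavesUpdate {
    pub activated: Vec<(Hash, bool)>,
    pub deactivated: Vec<Hash>,
}

#[derive(Default)]
pub struct ActiveLeaves {
    leaves: HashSet<Hash>,
    /// Every leaf ever activated; a leaf activated again is stale.
    ever_activated: HashSet<Hash>,
}

impl ActiveLeaves {
    /// A new block becomes a leaf and its parent is deactivated.
    pub fn on_block_import(&mut self, block: Hash, parent: Hash) -> ActiveLeavesUpdate {
        let mut update = ActiveLeavesUpdate::default();
        if self.leaves.insert(block) {
            // Inserting into the history set returns true only the first time.
            let fresh = self.ever_activated.insert(block);
            update.activated.push((block, fresh));
        }
        if self.leaves.remove(&parent) {
            update.deactivated.push(parent);
        }
        update
    }
}
```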
## On Finalization Event
* Note the height `h` of the newly finalized block `B`.
* Prune all leaves from the active leaves which have height `<= h` and are not `B`.
* Issue `OverseerSignal::ActiveLeavesUpdate` containing all deactivated leaves.
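The finalization rule can be sketched as below, assuming active leaves are stored together with their heights.
`on_finalization` is a hypothetical helper; the returned hashes would populate the deactivated part of the
`ActiveLeavesUpdate`.

```rust
use std::collections::HashMap;

/// Stand-in types, for illustration only.
pub type Hash = u64;
pub type BlockNumber = u32;

/// Finalization of `finalized` at height `h`: deactivate every active leaf at
/// height `<= h` other than `finalized`, returning them sorted.
pub fn on_finalization(
    active_leaves: &mut HashMap<Hash, BlockNumber>,
    finalized: Hash,
    h: BlockNumber,
) -> Vec<Hash> {
    let mut deactivated: Vec<Hash> = active_leaves
        .iter()
        .filter(|&(&hash, &number)| number <= h && hash != finalized)
        .map(|(&hash, _)| hash)
        .collect();
    for hash in &deactivated {
        active_leaves.remove(hash);
    }
    deactivated.sort_unstable();
    deactivated
}
```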
## On Subsystem Failure
Subsystems are essential tasks meant to run as long as the node does. Subsystems can spawn ephemeral work in the form of
jobs, but the subsystems themselves should not go down. If a subsystem goes down, it will be because of a critical error
that should take the entire node down as well.
## Communication Between Subsystems
When a subsystem wants to communicate with another subsystem, or, more typically, a job within a subsystem wants to
communicate with its counterpart under another subsystem, that communication must happen via the overseer. Consider this
example where a job on subsystem A wants to send a message to its counterpart under subsystem B. This is a realistic
scenario, where you can imagine that both jobs correspond to work under the same relay-parent.
```text
+--------+ +--------+
| | | |
|Job A-1 | (sends message) (receives message) |Job B-1 |
| | | |
+----|---+ +----^---+
| +------------------------------+ ^
v | | |
+---------v---------+ | | +---------|---------+
| | | | | |
| Subsystem A | | Overseer / Message | | Subsystem B |
| -------->> Bus -------->> |
| | | | | |
+-------------------+ | | +-------------------+
| |
+------------------------------+
```
First, the subsystem that spawned a job is responsible for handling the first step of the communication. The overseer is
not aware of the hierarchy of tasks within any given subsystem and is only responsible for subsystem-to-subsystem
communication. So the sending subsystem must pass on the message via the overseer to the receiving subsystem, in such a
way that the receiving subsystem can further address the communication to one of its internal tasks, if necessary.
This communication prevents a certain class of race conditions. When the Overseer determines that it is time for
subsystems to begin working on top of a particular relay-parent, it will dispatch an `ActiveLeavesUpdate` message to all
subsystems to do so, and those messages will be handled asynchronously by those subsystems. Some subsystems will receive
those messages before others, and it is important that a message sent by subsystem A after receiving its
`ActiveLeavesUpdate` message arrives at subsystem B after B's own `ActiveLeavesUpdate` message. If subsystem A
maintained an independent channel to communicate with subsystem B, it would be possible for subsystem B to handle the
side message before the `ActiveLeavesUpdate` message, but it wouldn't have any logical course of action to take with the
side message, leading to it being discarded or improperly handled. Well-architected state machines should have a
single source of inputs, so that is what we do here.
It is reasonable to make one exception for responses to requests. A request should be made via the overseer in order to
ensure that it arrives after any relevant `ActiveLeavesUpdate` message. A subsystem issuing a request as a result of an
`ActiveLeavesUpdate` message can safely receive the response via a side-channel for two reasons:
1. Since it's impossible for a request to be answered before it arrives, any response to a request provably obeys the
same ordering constraint.
1. The request was sent as a result of handling an `ActiveLeavesUpdate` message, so there is no possible future in
which that `ActiveLeavesUpdate` message has not been handled by the time the response is received.
So as a single exception to the rule that all communication must happen via the overseer we allow the receipt of
responses to requests via a side-channel, which may be established for that purpose. This simplifies any cases where the
outside world desires to make a request to a subsystem, as the outside world can then establish a side-channel to
receive the response on.
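The ordering argument can be illustrated with a small synchronous model. In this sketch, `std::sync::mpsc` channels stand in for the overseer's message bus and for the response side-channel (the real node uses async channels and `oneshot`); `ToSubsystemB` and `run_subsystem_b` are hypothetical names. Because the update and the request travel through the same ordered inbox, B has always seen the leaf before it answers.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical messages delivered to subsystem B through the overseer.
enum ToSubsystemB {
    /// Signal forwarded by the overseer.
    ActiveLeavesUpdate { activated: u32 },
    /// A request whose answer travels back over a dedicated side-channel.
    Request { relay_parent: u32, respond_to: Sender<String> },
}

/// Subsystem B drains its single inbox in order, so any request issued after an
/// `ActiveLeavesUpdate` is handled after that update.
fn run_subsystem_b(inbox: Receiver<ToSubsystemB>) {
    let mut known_leaves = Vec::new();
    for msg in inbox {
        match msg {
            ToSubsystemB::ActiveLeavesUpdate { activated } => known_leaves.push(activated),
            ToSubsystemB::Request { relay_parent, respond_to } => {
                let answer = if known_leaves.contains(&relay_parent) {
                    format!("work for {relay_parent}")
                } else {
                    "unknown relay parent".to_string()
                };
                // The requester may have stopped waiting; ignore send errors here.
                let _ = respond_to.send(answer);
            }
        }
    }
}
```

Only the response bypasses the bus; the request itself still goes through the ordered inbox.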
It's important to note that the overseer is not aware of the internals of subsystems, and this extends to the jobs that
they spawn. The overseer isn't aware of the existence or definition of those jobs, and is only aware of the outer
subsystems with which it interacts. This gives subsystem implementations leeway to define internal jobs as they see fit,
and to wrap a more complex hierarchy of state machines than having a single layer of jobs for relay-parent-based work.
Likewise, subsystems aren't required to spawn jobs. Certain types of subsystems, such as those for shared storage or
networking resources, won't perform block-based work but would still benefit from being on the Overseer's message bus.
These subsystems can just ignore the overseer's signals for block-based work.
Furthermore, the protocols by which subsystems communicate with each other should be well-defined irrespective of the
implementation of the subsystem. In other words, their interface should be distinct from their implementation. This will
prevent subsystems from accessing aspects of each other that are beyond the scope of the communication boundary.
## On shutdown
Send an `OverseerSignal::Conclude` message to each subsystem and wait some time for them to conclude before
hard-exiting.
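A minimal sketch of this shutdown sequence, assuming each subsystem runs on its own OS thread with a signal channel; the real overseer is async, and `Signal` and `shutdown` are hypothetical names. The function polls `JoinHandle::is_finished` until a grace period expires and reports whether a hard exit is needed.

```rust
use std::sync::mpsc::Sender;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical signal type; only `Conclude` matters for shutdown.
enum Signal {
    Conclude,
}

/// Sends `Conclude` to every subsystem, then polls for up to `grace` until all of
/// them have exited. Returns `false` if some subsystem is still running when the
/// grace period ends, in which case the caller should hard-exit.
fn shutdown(subsystems: &[(Sender<Signal>, JoinHandle<()>)], grace: Duration) -> bool {
    for (tx, _) in subsystems {
        // A subsystem that already exited has dropped its receiver; ignore the error.
        let _ = tx.send(Signal::Conclude);
    }
    let deadline = Instant::now() + grace;
    loop {
        if subsystems.iter().all(|(_, handle)| handle.is_finished()) {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```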
@@ -0,0 +1,469 @@
# Subsystems and Jobs
In this section we define the notions of Subsystems and Jobs. These are
guidelines for how we will employ an architecture of hierarchical state
machines. We'll have a top-level state machine which oversees the next level of
state machines which oversee another layer of state machines and so on. The next
sections will lay out these guidelines for what we've called subsystems and
jobs, since this model applies to many of the tasks that the Node-side behavior
needs to encompass, but these are only guidelines and some Subsystems may have
deeper hierarchies internally.
Subsystems are long-lived worker tasks that are in charge of performing some
particular kind of work. All subsystems can communicate with each other via a
well-defined protocol. Subsystems can't generally communicate directly, but must
coordinate communication through an [Overseer](overseer.md), which is
responsible for relaying messages, handling subsystem failures, and dispatching
work signals.
Most work that happens on the Node-side is related to building on top of a
specific relay-chain block, which is contextually known as the "relay parent".
We call it the relay parent to explicitly denote that it is a block in the relay
chain and not on a teyrchain. We refer to the parent because when we are in the
process of building a new block, we don't know what that new block is going to
be. The parent block is our only stable point of reference, even though it is
usually only useful when it is not yet a parent but in fact a leaf of the
block-DAG expected to soon become a parent (because validators are authoring on
top of it). Furthermore, we are assuming a forkful blockchain-extension
protocol, which means that there may be multiple possible children of the
relay-parent. Even if the relay parent has multiple children blocks, the parent
of those children is the same, and the context in which those children are
authored should be the same. The parent block is the best and most stable
reference to use for defining the scope of work items and messages, and is
typically referred to by its cryptographic hash.
Since this goal of determining when to start and conclude work relative to a
specific relay-parent is common to most, if not all, subsystems, it is logically
the job of the Overseer to distribute those signals as opposed to each subsystem
duplicating that effort, potentially being out of synchronization with each
other. Subsystem A should be able to expect that subsystem B is working on the
same relay-parents as it is. One of the Overseer's tasks is to provide this
heartbeat, or synchronized rhythm, to the system.
The work that subsystems spawn to be done on a specific relay-parent is known as
a job. Subsystems should set up and tear down jobs according to the signals
received from the overseer. Subsystems may share or cache state between jobs.
Subsystems must be robust to spurious exits. The outputs of the set of
subsystems as a whole consist of signed messages and data committed to disk.
Care must be taken to avoid issuing messages that are not substantiated. Since
subsystems need to be safe under spurious exits, it is the expected behavior
that an `OverseerSignal::Conclude` can just lead to breaking the loop and
exiting directly as opposed to waiting for everything to shut down gracefully.
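The job lifecycle described above can be sketched as a loop over overseer signals. This is a simplified, synchronous model: `OverseerSignal` is reduced to two hypothetical variants, `Job` holds no real work, and `run_subsystem` returns the surviving jobs only so they can be inspected.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a relay-parent hash.
type Hash = u64;

/// Hypothetical subset of overseer signals relevant to job management.
enum OverseerSignal {
    ActiveLeavesUpdate { activated: Vec<Hash>, deactivated: Vec<Hash> },
    Conclude,
}

/// Hypothetical per-relay-parent job; a real job would own tasks and channels.
struct Job {
    relay_parent: Hash,
}

/// Handles signals in order: one job per activated leaf, torn down when the leaf is
/// deactivated. `Conclude` just breaks the loop, since subsystems must already be
/// safe under spurious exits. Returns the jobs still alive, for inspection.
fn run_subsystem(signals: impl IntoIterator<Item = OverseerSignal>) -> HashMap<Hash, Job> {
    let mut jobs = HashMap::new();
    for signal in signals {
        match signal {
            OverseerSignal::ActiveLeavesUpdate { activated, deactivated } => {
                for relay_parent in activated {
                    jobs.insert(relay_parent, Job { relay_parent });
                }
                for relay_parent in deactivated {
                    jobs.remove(&relay_parent);
                }
            }
            OverseerSignal::Conclude => break,
        }
    }
    jobs
}
```

Any signals that arrive after `Conclude` are never processed.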
## Subsystem Message Traffic
Which subsystems send messages to which other subsystems.
**Note**: This diagram omits the overseer for simplicity. In fact, all messages
are relayed via the overseer.
**Note**: Messages with a filled diamond arrowhead ("♦") include a
`oneshot::Sender` which communicates a response from the recipient. Messages
with an open triangle arrowhead ("Δ") do not include a return sender.
```dot process
digraph {
rankdir=LR;
node [shape = oval];
concentrate = true;
av_store [label = "Availability Store"]
avail_dist [label = "Availability Distribution"]
avail_rcov [label = "Availability Recovery"]
bitf_dist [label = "Bitfield Distribution"]
bitf_sign [label = "Bitfield Signing"]
cand_back [label = "Candidate Backing"]
cand_sel [label = "Candidate Selection"]
cand_val [label = "Candidate Validation"]
chn_api [label = "Chain API"]
coll_gen [label = "Collation Generation"]
coll_prot [label = "Collator Protocol"]
net_brdg [label = "Network Bridge"]
pov_dist [label = "PoV Distribution"]
provisioner [label = "Provisioner"]
runt_api [label = "Runtime API"]
stmt_dist [label = "Statement Distribution"]
av_store -> runt_api [arrowhead = "diamond", label = "Request::CandidateEvents"]
av_store -> chn_api [arrowhead = "diamond", label = "BlockNumber"]
av_store -> chn_api [arrowhead = "diamond", label = "BlockHeader"]
av_store -> runt_api [arrowhead = "diamond", label = "Request::Validators"]
av_store -> chn_api [arrowhead = "diamond", label = "FinalizedBlockHash"]
avail_dist -> net_brdg [arrowhead = "onormal", label = "Request::SendValidationMessages"]
avail_dist -> runt_api [arrowhead = "diamond", label = "Request::AvailabilityCores"]
avail_dist -> net_brdg [arrowhead = "onormal", label = "ReportPeer"]
avail_dist -> av_store [arrowhead = "diamond", label = "QueryDataAvailability"]
avail_dist -> av_store [arrowhead = "diamond", label = "QueryChunk"]
avail_dist -> av_store [arrowhead = "diamond", label = "StoreChunk"]
avail_dist -> runt_api [arrowhead = "diamond", label = "Request::Validators"]
avail_dist -> chn_api [arrowhead = "diamond", label = "Ancestors"]
avail_dist -> runt_api [arrowhead = "diamond", label = "Request::SessionIndexForChild"]
avail_rcov -> net_brdg [arrowhead = "onormal", label = "ReportPeer"]
avail_rcov -> av_store [arrowhead = "diamond", label = "QueryChunk"]
avail_rcov -> net_brdg [arrowhead = "diamond", label = "ConnectToValidators"]
avail_rcov -> net_brdg [arrowhead = "onormal", label = "SendValidationMessage::Chunk"]
avail_rcov -> net_brdg [arrowhead = "onormal", label = "SendValidationMessage::RequestChunk"]
bitf_dist -> net_brdg [arrowhead = "onormal", label = "ReportPeer"]
bitf_dist -> provisioner [arrowhead = "onormal", label = "ProvisionableData::Bitfield"]
bitf_dist -> net_brdg [arrowhead = "onormal", label = "SendValidationMessage"]
bitf_dist -> runt_api [arrowhead = "diamond", label = "Request::Validators"]
bitf_dist -> runt_api [arrowhead = "diamond", label = "Request::SessionIndexForChild"]
bitf_sign -> av_store [arrowhead = "diamond", label = "QueryChunkAvailability"]
bitf_sign -> runt_api [arrowhead = "diamond", label = "Request::AvailabilityCores"]
bitf_sign -> bitf_dist [arrowhead = "onormal", label = "DistributeBitfield"]
cand_back -> av_store [arrowhead = "diamond", label = "StoreAvailableData"]
cand_back -> pov_dist [arrowhead = "diamond", label = "FetchPoV"]
cand_back -> cand_val [arrowhead = "diamond", label = "ValidateFromChainState"]
cand_back -> cand_sel [arrowhead = "onormal", label = "Invalid"]
cand_back -> provisioner [arrowhead = "onormal", label = "ProvisionableData::MisbehaviorReport"]
cand_back -> provisioner [arrowhead = "onormal", label = "ProvisionableData::BackedCandidate"]
cand_back -> pov_dist [arrowhead = "onormal", label = "DistributePoV"]
cand_back -> stmt_dist [arrowhead = "onormal", label = "Share"]
cand_sel -> coll_prot [arrowhead = "diamond", label = "FetchCollation"]
cand_sel -> cand_back [arrowhead = "onormal", label = "Second"]
cand_val -> runt_api [arrowhead = "diamond", label = "Request::PersistedValidationData"]
cand_val -> runt_api [arrowhead = "diamond", label = "Request::ValidationCode"]
cand_val -> runt_api [arrowhead = "diamond", label = "Request::CheckValidationOutputs"]
coll_gen -> coll_prot [arrowhead = "onormal", label = "DistributeCollation"]
coll_prot -> net_brdg [arrowhead = "onormal", label = "ReportPeer"]
coll_prot -> net_brdg [arrowhead = "onormal", label = "Declare"]
coll_prot -> net_brdg [arrowhead = "onormal", label = "AdvertiseCollation"]
coll_prot -> net_brdg [arrowhead = "onormal", label = "Collation"]
coll_prot -> net_brdg [arrowhead = "onormal", label = "RequestCollation"]
coll_prot -> cand_sel [arrowhead = "onormal", label = "Collation"]
net_brdg -> avail_dist [arrowhead = "onormal", label = "NetworkBridgeUpdate"]
net_brdg -> bitf_dist [arrowhead = "onormal", label = "NetworkBridgeUpdate"]
net_brdg -> pov_dist [arrowhead = "onormal", label = "NetworkBridgeUpdate"]
net_brdg -> stmt_dist [arrowhead = "onormal", label = "NetworkBridgeUpdate"]
net_brdg -> coll_prot [arrowhead = "onormal", label = "NetworkBridgeUpdate"]
pov_dist -> net_brdg [arrowhead = "onormal", label = "SendValidationMessage"]
pov_dist -> net_brdg [arrowhead = "onormal", label = "ReportPeer"]
provisioner -> cand_back [arrowhead = "diamond", label = "GetBackedCandidates"]
provisioner -> chn_api [arrowhead = "diamond", label = "BlockNumber"]
stmt_dist -> net_brdg [arrowhead = "onormal", label = "SendValidationMessage"]
stmt_dist -> net_brdg [arrowhead = "onormal", label = "ReportPeer"]
stmt_dist -> cand_back [arrowhead = "onormal", label = "Statement"]
stmt_dist -> runt_api [arrowhead = "diamond", label = "Request::Validators"]
stmt_dist -> runt_api [arrowhead = "diamond", label = "Request::SessionIndexForChild"]
}
```
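The two arrowhead kinds in the diagram above correspond to two message shapes. In this sketch, `std::sync::mpsc::Sender` stands in for `oneshot::Sender`, and both enums are hypothetical, heavily reduced versions of the real message types: a "diamond" message carries a reply channel, while a "triangle" message is fire-and-forget.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical "diamond" message: the embedded sender is how the recipient replies.
enum AvailabilityStoreMessage {
    QueryChunkAvailability { candidate: u64, index: u32, respond_to: Sender<bool> },
}

/// Hypothetical "triangle" message: fire-and-forget, with no way to reply.
enum BitfieldDistributionMessage {
    DistributeBitfield { relay_parent: u64, bits: Vec<bool> },
}

/// Recipient side of a diamond message: compute the answer and send it back.
/// `stored` lists the `(candidate, chunk index)` pairs this toy store holds.
fn handle_store_message(msg: AvailabilityStoreMessage, stored: &[(u64, u32)]) {
    match msg {
        AvailabilityStoreMessage::QueryChunkAvailability { candidate, index, respond_to } => {
            // The requester may have stopped waiting; a failed send is not an error here.
            let _ = respond_to.send(stored.contains(&(candidate, index)));
        }
    }
}
```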
## The Path to Inclusion (Node Side)
Let's contextualize that diagram a bit by following a teyrchain block from its
creation through finalization. Teyrchains can use completely arbitrary processes
to generate blocks. The relay chain doesn't know or care about the details; each
teyrchain just needs to provide a [collator](collators/collation-generation.md).
**Note**: Inter-subsystem communications are relayed via the overseer, but that
step is omitted here for brevity.
**Note**: Dashed lines indicate a request/response cycle, where the response is
communicated asynchronously via a oneshot channel. Adjacent dashed lines may be
processed in parallel.
```mermaid
sequenceDiagram
participant Overseer
participant CollationGeneration
participant RuntimeApi
participant CollatorProtocol
Overseer ->> CollationGeneration: ActiveLeavesUpdate
loop for each activated head
CollationGeneration -->> RuntimeApi: Request availability cores
CollationGeneration -->> RuntimeApi: Request validators
Note over CollationGeneration: Determine an appropriate ScheduledCore <br/>and OccupiedCoreAssumption
CollationGeneration -->> RuntimeApi: Request full validation data
Note over CollationGeneration: Build the collation
CollationGeneration ->> CollatorProtocol: DistributeCollation
end
```
The `DistributeCollation` messages that `CollationGeneration` sends to the
`CollatorProtocol` contain two items: a `CandidateReceipt` and a `PoV`. The
`CollatorProtocol` is then responsible for distributing that collation to
interested validators. However, not all potential collations are of interest.
The `CandidateSelection` subsystem is responsible for determining which
collations are interesting, before `CollatorProtocol` actually fetches the
collation.
```mermaid
sequenceDiagram
participant CollationGeneration
participant CS as CollatorProtocol::CollatorSide
participant NB as NetworkBridge
participant VS as CollatorProtocol::ValidatorSide
participant CandidateSelection
CollationGeneration ->> CS: DistributeCollation
CS -->> NB: ConnectToValidators
Note over CS,NB: This connects to multiple validators.
CS ->> NB: Declare
NB ->> VS: Declare
Note over CS: Ensure that the connected validator is among<br/>the para's validator set. Otherwise, skip it.
CS ->> NB: AdvertiseCollation
NB ->> VS: AdvertiseCollation
VS ->> CandidateSelection: Collation
Note over CandidateSelection: Lots of other machinery in play here,<br/>but there are only two outcomes from the<br/>perspective of the `CollatorProtocol`:
alt happy path
CandidateSelection -->> VS: FetchCollation
Activate VS
VS ->> NB: RequestCollation
NB ->> CS: RequestCollation
CS ->> NB: Collation
NB ->> VS: Collation
Deactivate VS
else CandidateSelection already selected a different candidate
Note over CandidateSelection: silently drop
end
```
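The fetch-or-drop outcome at the end of the diagram above reduces to a small piece of state. This sketch assumes `CandidateSelection` only remembers the relay parents for which it has already seconded a candidate. The type and method names are hypothetical, and the "ready to second" condition is not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a relay-parent hash.
type RelayParent = u64;

/// Outcome for an advertised collation.
#[derive(Debug, PartialEq)]
enum Decision {
    /// Issue `FetchCollation` to the collator protocol.
    Fetch,
    /// Silently drop the advertisement.
    Ignore,
}

/// Hypothetical selection state: relay parents we have already seconded for.
#[derive(Default)]
struct CandidateSelection {
    seconded: HashSet<RelayParent>,
}

impl CandidateSelection {
    /// Fetch only while nothing has been seconded for this relay parent.
    fn on_collation(&self, relay_parent: RelayParent) -> Decision {
        if self.seconded.contains(&relay_parent) {
            Decision::Ignore
        } else {
            Decision::Fetch
        }
    }

    /// Record a seconded candidate so later advertisements are dropped.
    fn note_seconded(&mut self, relay_parent: RelayParent) {
        self.seconded.insert(relay_parent);
    }
}
```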
Assuming we hit the happy path, flow continues with `CandidateSelection`
receiving a `(candidate_receipt, pov)` as the return value from its
`FetchCollation` request. The only time `CandidateSelection` actively requests a
collation is when it hasn't yet seconded one for some `relay_parent`, and is
ready to second.
```mermaid
sequenceDiagram
participant CS as CandidateSelection
participant CB as CandidateBacking
participant CV as CandidateValidation
participant PV as Provisioner
participant SD as StatementDistribution
participant PD as PoVDistribution
CS ->> CB: Second
%% fn validate_and_make_available
CB -->> CV: ValidateFromChainState
Note over CB,CV: There's some complication in the source, as<br/>candidates are actually validated in a separate task.
alt valid
Note over CB: This is where we transform the CandidateReceipt into a CommittedCandidateReceipt
%% CandidateBackingJob::sign_import_and_distribute_statement
%% CandidateBackingJob::import_statement
CB ->> PV: ProvisionableData::BackedCandidate
%% CandidateBackingJob::issue_new_misbehaviors
opt if there is misbehavior to report
CB ->> PV: ProvisionableData::MisbehaviorReport
end
%% CandidateBackingJob::distribute_signed_statement
CB ->> SD: Share
%% CandidateBackingJob::distribute_pov
CB ->> PD: DistributePoV
else invalid
CB ->> CS: Invalid
end
```
At this point, you'll see that control flows in two directions: to
`StatementDistribution` to distribute the `SignedStatement`, and to
`PoVDistribution` to distribute the `PoV`. However, that's largely a mirage:
while the initial implementation distributes `PoV`s by gossip, that's
inefficient, and will be replaced with a system which fetches `PoV`s only when
actually necessary.
> TODO: figure out more precisely the current status and plans; write them up
Therefore, we'll follow the `SignedStatement`. The `StatementDistribution`
subsystem is largely concerned with implementing a gossip protocol:
```mermaid
sequenceDiagram
participant SD as StatementDistribution
participant NB as NetworkBridge
alt On receipt of a<br/>SignedStatement from CandidateBacking
%% fn circulate_statement_and_dependents
SD ->> NB: SendValidationMessage
Note right of NB: Bridge sends validation message to all appropriate peers
else On receipt of peer validation message
NB ->> SD: NetworkBridgeUpdate
%% fn handle_incoming_message
alt if we aren't already aware of the relay parent for this statement
SD ->> NB: ReportPeer
end
%% fn circulate_statement
opt if we know of peers who haven't seen this message, gossip it
SD ->> NB: SendValidationMessage
end
end
```
But who are these `Listener`s who've asked to be notified about incoming
`SignedStatement`s? Nobody, as yet.
Let's pick back up with the PoV Distribution subsystem.
```mermaid
sequenceDiagram
participant CB as CandidateBacking
participant PD as PoVDistribution
participant Listener
participant NB as NetworkBridge
CB ->> PD: DistributePoV
Note over PD,Listener: Various subsystems can register listeners for when PoVs arrive
loop for each Listener
PD ->> Listener: Arc<PoV>
end
Note over PD: Gossip to connected peers
PD ->> NB: SendPoV
Note over PD,NB: On receipt of a network PoV, PoVDistribution forwards it to each Listener.<br/>It also penalizes bad gossipers.
```
Unlike in the case of `StatementDistribution`, there is another subsystem which
in various circumstances already registers a listener to be notified when a new
`PoV` arrives: `CandidateBacking`. Note that this is the second time that
`CandidateBacking` has gotten involved. The first instance was from the
perspective of the validator choosing to second a candidate via its
`CandidateSelection` subsystem. This time, it's from the perspective of some
other validator, being informed that this foreign `PoV` has been received.
```mermaid
sequenceDiagram
participant SD as StatementDistribution
participant CB as CandidateBacking
participant PD as PoVDistribution
participant AS as AvailabilityStore
SD ->> CB: Statement
%% CB::maybe_validate_and_import => CB::kick_off_validation_work
CB -->> PD: FetchPoV
Note over CB,PD: This call creates the Listener from the previous diagram
CB ->> AS: StoreAvailableData
```
At this point, things have gone a bit nonlinear. Let's pick up the thread again
with `BitfieldSigning`. As the `Overseer` activates each relay parent, it starts
a `BitfieldSigningJob` which operates on an extremely simple metric: after
creation, it immediately goes to sleep for 1.5 seconds. On waking, it records
the state of the world pertaining to availability at that moment.
```mermaid
sequenceDiagram
participant OS as Overseer
participant BS as BitfieldSigning
participant RA as RuntimeApi
participant AS as AvailabilityStore
participant BD as BitfieldDistribution
OS ->> BS: ActiveLeavesUpdate
loop for each activated relay parent
Note over BS: Wait 1.5 seconds
BS -->> RA: Request::AvailabilityCores
loop for each availability core
BS -->> AS: QueryChunkAvailability
end
BS ->> BD: DistributeBitfield
end
```
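What the job records on waking can be sketched as one bit per availability core. This is a simplified model. It assumes a core either is occupied by a candidate or is not, and `has_chunk` stands in for a `QueryChunkAvailability` round-trip. `CoreState` and `build_bitfield` are hypothetical names.

```rust
/// Hypothetical, simplified view of an availability core.
enum CoreState {
    /// Occupied by a candidate awaiting availability.
    Occupied { candidate: u64 },
    /// Free or scheduled: nothing to attest to.
    Free,
}

/// Builds one bit per core: set iff the core is occupied and our chunk for that
/// candidate is present in the availability store.
fn build_bitfield(cores: &[CoreState], has_chunk: impl Fn(u64) -> bool) -> Vec<bool> {
    cores
        .iter()
        .map(|core| match core {
            CoreState::Occupied { candidate } => has_chunk(*candidate),
            CoreState::Free => false,
        })
        .collect()
}
```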
`BitfieldDistribution` is, like the other `*Distribution` subsystems, primarily
interested in implementing a peer-to-peer gossip network propagating its
particular messages. However, it also serves as an essential relay, passing each
signed bitfield along to the `Provisioner`.
```mermaid
sequenceDiagram
participant BS as BitfieldSigning
participant BD as BitfieldDistribution
participant NB as NetworkBridge
participant PV as Provisioner
BS ->> BD: DistributeBitfield
BD ->> PV: ProvisionableData::Bitfield
BD ->> NB: SendValidationMessage::BitfieldDistribution::Bitfield
```
We've now seen the message flow to the `Provisioner`: both `CandidateBacking`
and `BitfieldDistribution` contribute provisionable data. Now, let's look at
that subsystem.
Much like the `BitfieldSigning` subsystem, the `Provisioner` creates a new job
for each newly-activated leaf, and starts a timer. Unlike `BitfieldSigning`, we
won't depict that part of the process, because the `Provisioner` also has other
things going on.
```mermaid
sequenceDiagram
participant A as Arbitrary
participant PV as Provisioner
participant CB as CandidateBacking
participant BD as BitfieldDistribution
participant RA as RuntimeApi
participant PI as TeyrchainsInherentDataProvider
alt receive provisionable data
alt
CB ->> PV: ProvisionableData
else
BD ->> PV: ProvisionableData
end
loop over stored Senders
PV ->> A: ProvisionableData
end
Note over PV: store bitfields and backed candidates
else receive request for inherent data
PI ->> PV: RequestInherentData
alt we have already constructed the inherent data
PV ->> PI: send the inherent data
else we have not yet constructed the inherent data
Note over PV,PI: Store the return sender without sending immediately
end
else timer times out
note over PV: Waited 2 seconds
PV -->> RA: RuntimeApiRequest::AvailabilityCores
Note over PV: construct and store the inherent data
loop over stored inherent data requests
PV ->> PI: (SignedAvailabilityBitfields, BackedCandidates)
end
end
```
In principle, any arbitrary subsystem could send a `RequestInherentData` to the
`Provisioner`. In practice, only the `TeyrchainsInherentDataProvider` does so.
The tuple `(SignedAvailabilityBitfields, BackedCandidates, ParentHeader)` is
injected by the `TeyrchainsInherentDataProvider` into the inherent data. From
that point on, control passes from the node to the runtime.
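The "answer now or park the sender" behavior in the diagram above can be sketched as follows. `std::sync::mpsc::Sender` stands in for the real response channel. `ProvisionerJob` is hypothetical, inherent data is reduced to two lists of opaque ids, and the selection that uses `AvailabilityCores` is omitted.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical inherent data: (bitfields, backed candidates) as opaque ids.
type InherentData = (Vec<u32>, Vec<u32>);

/// Hypothetical provisioner job state for one relay parent.
#[derive(Default)]
struct ProvisionerJob {
    bitfields: Vec<u32>,
    backed: Vec<u32>,
    inherent: Option<InherentData>,
    pending: Vec<Sender<InherentData>>,
}

impl ProvisionerJob {
    /// Answer immediately if the inherent data exists; otherwise park the sender.
    fn request_inherent_data(&mut self, respond_to: Sender<InherentData>) {
        match &self.inherent {
            Some(data) => {
                let _ = respond_to.send(data.clone());
            }
            None => self.pending.push(respond_to),
        }
    }

    /// Called when the timer fires: construct the data and answer parked requests.
    fn on_timeout(&mut self) {
        let data = (self.bitfields.clone(), self.backed.clone());
        for tx in self.pending.drain(..) {
            let _ = tx.send(data.clone());
        }
        self.inherent = Some(data);
    }
}
```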
@@ -0,0 +1,3 @@
# Utility Subsystems
The utility subsystems are an assortment which don't have a natural home in another subsystem collection.
@@ -0,0 +1,240 @@
# Availability Store
This is a utility subsystem responsible for keeping certain data available and pruning that data.
The two data types:
- Full PoV blocks of candidates we have validated
- Availability chunks of candidates that were backed and noted available on-chain.
For each of these data types we have pruning rules that determine how long we need to keep that data available.
PoVs hypothetically only need to be kept around until the block where the data was made fully available is finalized.
However, disputes can revert finality, so we need to be a bit more conservative and we add a delay. We should keep the
PoV until a block that finalized availability of it has been finalized for 1 day + 1 hour.
Availability chunks need to be kept available until the dispute period for the corresponding candidate has ended. We can
accomplish this by using the same criterion as the above. This gives us a pruning condition of the block finalizing
availability of the chunk being final for 1 day + 1 hour.
There is also the case where a validator commits to make a PoV available, but the corresponding candidate is never
backed. In this case, we keep the PoV available for 1 hour.
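The two keep windows above can be expressed as simple arithmetic on wall-clock seconds. In this sketch, `prune_at` is a hypothetical helper. `finalized` is true once the block that concluded availability has been finalized; otherwise the 1-hour window applies.

```rust
/// Keep windows from the pruning rules above, in seconds.
const UNAVAILABLE_KEEP_SECS: u64 = 60 * 60; // 1 hour
const FINALIZED_KEEP_SECS: u64 = 24 * 60 * 60 + 60 * 60; // 1 day + 1 hour

/// Hypothetical helper: the Unix timestamp at which the data becomes prunable.
fn prune_at(now_secs: u64, finalized: bool) -> u64 {
    let keep = if finalized { FINALIZED_KEEP_SECS } else { UNAVAILABLE_KEEP_SECS };
    now_secs + keep
}
```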
There may be multiple competing blocks all ending the availability phase for a particular candidate. Until finality, it
will be unclear which of those is actually the canonical chain, so the pruning records for PoVs and Availability chunks
should keep track of all such blocks.
## Lifetime of the block data and chunks in storage
```dot process
digraph {
label = "Block data FSM\n\n\n";
labelloc = "t";
rankdir="LR";
st [label = "Stored"; shape = circle]
inc [label = "Included"; shape = circle]
fin [label = "Finalized"; shape = circle]
prn [label = "Pruned"; shape = circle]
st -> inc [label = "Block\nincluded"]
st -> prn [label = "Stored block\ntimed out"]
inc -> fin [label = "Block\nfinalized"]
inc -> st [label = "Competing blocks\nfinalized"]
fin -> prn [label = "Block keep time\n(1 day + 1 hour) elapsed"]
}
```
## Database Schema
We use an underlying Key-Value database where we assume we have the following operations available:
- `write(key, value)`
- `read(key) -> Option<value>`
- `iter_with_prefix(prefix) -> Iterator<(key, value)>` - gives all keys and values in lexicographical order where the
key starts with `prefix`.
We use this database to encode the following schema:
```rust
("available", CandidateHash) -> Option<AvailableData>
("chunk", CandidateHash, u32) -> Option<ErasureChunk>
("meta", CandidateHash) -> Option<CandidateMeta>
("unfinalized", BlockNumber, BlockHash, CandidateHash) -> Option<()>
("prune_by_time", Timestamp, CandidateHash) -> Option<()>
```
Timestamps are the wall-clock seconds since Unix epoch. Timestamps and block numbers are both encoded as big-endian so
lexicographic order is ascending.
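The reason for big-endian encoding can be checked directly. This sketch assumes a hypothetical raw key layout of a string prefix followed by fixed-width fields. The actual on-disk layout is not specified here.

```rust
/// Hypothetical key layout: a raw prefix followed by fixed-width fields.
fn prune_by_time_key(timestamp: u64, candidate_hash: [u8; 32]) -> Vec<u8> {
    let mut key = b"prune_by_time".to_vec();
    // Big-endian puts the most significant byte first, so comparing keys byte by
    // byte (lexicographically) orders them by ascending timestamp.
    key.extend_from_slice(&timestamp.to_be_bytes());
    key.extend_from_slice(&candidate_hash);
    key
}
```

With little-endian bytes, `1` would be encoded as `[1, 0, ...]` and sort after `256` (`[0, 1, ...]`), which would break the early-exit in prefix iteration.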
The meta information that we track per-candidate is defined as the `CandidateMeta` struct:
```rust
struct CandidateMeta {
state: State,
data_available: bool,
chunks_stored: Bitfield,
}
enum State {
/// Candidate data was first observed at the given time but is not available in any block.
Unavailable(Timestamp),
/// The candidate was first observed at the given time and was included in the given list of unfinalized blocks, which may be
/// empty. The timestamp here is not used for pruning. Either one of these blocks will be finalized or the state will regress to
/// `State::Unavailable`, in which case the same timestamp will be reused.
Unfinalized(Timestamp, Vec<(BlockNumber, BlockHash)>),
/// Candidate data has appeared in a finalized block and did so at the given time.
Finalized(Timestamp)
}
```
We maintain the invariant that if a candidate has a meta entry, its available data exists on disk if `data_available` is
true. All chunks mentioned in the meta entry are available.
Additionally, there is exactly one `prune_by_time` entry which holds the candidate hash unless the state is
`Unfinalized`. There may be zero, one, or many "unfinalized" keys with the given candidate, and this will correspond to
the `state` of the meta entry.
## Protocol
Input: [`AvailabilityStoreMessage`][ASM]
Output:
- [`RuntimeApiMessage`][RAM]
## Functionality
For each head in the `activated` list:
- Load all ancestors of the head back to the finalized block so we don't miss anything if import notifications are
missed. If a `StoreChunk` message is received for a candidate which has no entry, then we will prematurely lose the
data.
- Note any new candidates backed in the head. Update the `CandidateMeta` for each. If the `CandidateMeta` does not
exist, create it as `Unavailable` with the current timestamp. Register a `"prune_by_time"` entry based on the current
timestamp + 1 hour.
- Note any new candidates included in the head. Update the `CandidateMeta` for each, performing a transition from
`Unavailable` to `Unfinalized` if necessary. That includes removing the `"prune_by_time"` entry. Add the head hash and
number to the state, if unfinalized. Add an `"unfinalized"` entry for the block and candidate.
- The `CandidateEvent` runtime API can be used for this purpose.
On `OverseerSignal::BlockFinalized(finalized)` events:
- For each key in `iter_with_prefix("unfinalized")`:
  - Stop if the key is beyond `("unfinalized", finalized)`.
  - For each block number `f` that we encounter, load the finalized hash for that block.
    - The state of each `CandidateMeta` we encounter here must be `Unfinalized`, since we loaded the candidate from an
      `"unfinalized"` key.
    - For each candidate that we encounter under `f` and the finalized block hash,
      - Update the `CandidateMeta` to have `State::Finalized`. Remove all `"unfinalized"` entries from the old
        `Unfinalized` state.
      - Register a `"prune_by_time"` entry for the candidate based on the current time + 1 day + 1 hour.
    - For each candidate that we encounter under `f` which is not under the finalized block hash,
      - Remove all entries under `f` in the `Unfinalized` state.
      - If the `CandidateMeta` has state `Unfinalized` with an empty list of blocks, downgrade to `Unavailable` and
        re-schedule pruning under the timestamp + 1 hour. We do not prune here as the candidate still may be included in
        a descendant of the finalized chain.
    - Remove all `"unfinalized"` keys under `f`.
- Update `last_finalized` = finalized.
This is roughly `O(n * m)`, where `n` is the number of blocks finalized since the last update and `m` is the number of
teyrchains.
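The finalization steps above can be sketched over an in-memory model. In this sketch, a `BTreeMap` range query stands in for `iter_with_prefix`, hashes are `u64` stand-ins, and `finalized_hash_at` stands in for loading the finalized hash at a height. `Store` and `on_block_finalized` are hypothetical names, and the `last_finalized` bookkeeping is omitted.

```rust
use std::collections::{BTreeMap, HashMap};

type BlockNumber = u32;
type Hash = u64; // hypothetical stand-in for block and candidate hashes

const ONE_HOUR: u64 = 60 * 60;
const ONE_DAY_ONE_HOUR: u64 = 25 * ONE_HOUR;

/// Mirrors the `State` enum above, with timestamps as seconds since the Unix epoch.
#[derive(Debug, PartialEq)]
enum State {
    Unavailable(u64),
    Unfinalized(u64, Vec<(BlockNumber, Hash)>),
    Finalized(u64),
}

/// Toy in-memory model of the keys involved in finalization handling.
#[derive(Default)]
struct Store {
    /// `("unfinalized", BlockNumber, BlockHash, CandidateHash)` keys, in key order.
    unfinalized: BTreeMap<(BlockNumber, Hash, Hash), ()>,
    /// `("meta", CandidateHash)` entries, reduced to their `state` field.
    meta: HashMap<Hash, State>,
    /// `("prune_by_time", Timestamp, CandidateHash)` keys.
    prune_by_time: BTreeMap<(u64, Hash), ()>,
}

impl Store {
    fn on_block_finalized(
        &mut self,
        finalized: BlockNumber,
        now: u64,
        finalized_hash_at: impl Fn(BlockNumber) -> Hash,
    ) {
        // Every "unfinalized" key at or below the finalized height.
        let keys: Vec<_> = self
            .unfinalized
            .range(..=(finalized, u64::MAX, u64::MAX))
            .map(|(key, _)| *key)
            .collect();
        for (f, block_hash, candidate) in keys {
            self.unfinalized.remove(&(f, block_hash, candidate));
            let Some(state) = self.meta.get_mut(&candidate) else { continue };
            // Skip candidates already finalized earlier in this pass.
            let State::Unfinalized(ts, blocks) = state else { continue };
            if block_hash == finalized_hash_at(f) {
                // Included in the finalized chain: drop its other "unfinalized" keys.
                for (n, h) in blocks.drain(..) {
                    self.unfinalized.remove(&(n, h, candidate));
                }
                *state = State::Finalized(now);
                self.prune_by_time.insert((now + ONE_DAY_ONE_HOUR, candidate), ());
            } else {
                // Included only in an abandoned fork at height `f`.
                blocks.retain(|(n, h)| !(*n == f && *h == block_hash));
                if blocks.is_empty() {
                    let ts = *ts;
                    *state = State::Unavailable(ts);
                    self.prune_by_time.insert((ts + ONE_HOUR, candidate), ());
                }
            }
        }
    }
}
```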
On `QueryAvailableData` message:
- Query `("available", candidate_hash)`
This is `O(n)` in the size of the data, which may be large.
On `QueryDataAvailability` message:
- Query whether `("meta", candidate_hash)` exists and `data_available == true`.
This is `O(n)` in the size of the metadata, which is small.
On `QueryChunk` message:
- Query `("chunk", candidate_hash, index)`
This is `O(n)` in the size of the data, which may be large.
On `QueryAllChunks` message:
- Query `("meta", candidate_hash)`. If `None`, send an empty response and return.
- For all `1` bits in the `chunks_stored`, query `("chunk", candidate_hash, index)`. Ignore but warn on errors, and
return a vector of all loaded chunks.
On `QueryChunkAvailability` message:
- Query whether `("meta", candidate_hash)` exists and the bit at `index` is set.
This is `O(n)` in the size of the metadata, which is small.
On `StoreChunk` message:
- If there is a `CandidateMeta` under the candidate hash, set the bit of the erasure-chunk in the `chunks_stored`
bitfield to `1`. If it was not `1` already, write the chunk under `("chunk", candidate_hash, chunk_index)`.
This is `O(n)` in the size of the chunk.
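The `StoreChunk` rule can be sketched with a `Vec<bool>` standing in for the `chunks_stored` bitfield. `Store` and `store_chunk` are hypothetical names. Ignoring an out-of-range chunk index is an assumption of this sketch and is not stated above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model: `chunks_stored` as a `Vec<bool>` bitfield.
struct CandidateMeta {
    chunks_stored: Vec<bool>,
}

#[derive(Default)]
struct Store {
    meta: HashMap<u64, CandidateMeta>,
    chunks: HashMap<(u64, u32), Vec<u8>>,
}

impl Store {
    /// Returns `true` if the chunk was written. Chunks for candidates without a meta
    /// entry are ignored, and a chunk whose bit is already set is not rewritten.
    fn store_chunk(&mut self, candidate: u64, index: u32, chunk: Vec<u8>) -> bool {
        let Some(meta) = self.meta.get_mut(&candidate) else { return false };
        let Some(bit) = meta.chunks_stored.get_mut(index as usize) else { return false };
        if *bit {
            return false;
        }
        *bit = true;
        self.chunks.insert((candidate, index), chunk);
        true
    }
}
```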
On `StoreAvailableData` message:
- Compute the erasure root of the available data and compare it with `expected_erasure_root`. Return
`StoreAvailableDataError::InvalidErasureRoot` on mismatch.
- If there is no `CandidateMeta` under the candidate hash, create it with `State::Unavailable(now)`. Load the
`CandidateMeta` otherwise.
- Store `data` under `("available", candidate_hash)` and set `data_available` to true.
- Store each chunk under `("chunk", candidate_hash, index)` and set every bit in `chunks_stored` to `1`.
This is `O(n)` in the size of the data as the aggregate size of the chunks is proportional to the data.
Every 5 minutes, run a pruning routine:
- For each key in `iter_with_prefix("prune_by_time")`:
  - If the key is beyond `("prune_by_time", now)`, return.
  - Remove the key.
  - Extract `candidate_hash` from the key.
  - Load and remove the `("meta", candidate_hash)`
  - For each erasure chunk bit set, remove `("chunk", candidate_hash, bit_index)`.
  - If `data_available`, remove `("available", candidate_hash)`
This is `O(n * m)` in the number of candidates and the average size of the data stored. This is probably the most expensive
operation but does not need to be run very often.
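The pruning routine can be sketched over an in-memory model. In this sketch, a `BTreeMap` keyed by `(timestamp, candidate)` stands in for the `"prune_by_time"` prefix, so all due keys form one contiguous range. The type names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, reduced `CandidateMeta`.
struct CandidateMeta {
    data_available: bool,
    chunks_stored: Vec<bool>,
}

#[derive(Default)]
struct Store {
    /// `("prune_by_time", Timestamp, CandidateHash)` keys.
    prune_by_time: BTreeMap<(u64, u64), ()>,
    meta: HashMap<u64, CandidateMeta>,
    available: HashMap<u64, Vec<u8>>,
    chunks: HashMap<(u64, u32), Vec<u8>>,
}

impl Store {
    /// Removes every candidate whose pruning time is at or before `now`.
    fn prune(&mut self, now: u64) {
        // Keys are ordered by timestamp, so everything due sits in one range.
        let due: Vec<(u64, u64)> =
            self.prune_by_time.range(..=(now, u64::MAX)).map(|(key, _)| *key).collect();
        for key in due {
            self.prune_by_time.remove(&key);
            let candidate = key.1;
            let Some(meta) = self.meta.remove(&candidate) else { continue };
            for (index, stored) in meta.chunks_stored.iter().enumerate() {
                if *stored {
                    self.chunks.remove(&(candidate, index as u32));
                }
            }
            if meta.data_available {
                self.available.remove(&candidate);
            }
        }
    }
}
```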
## Basic scenarios to test
Basically we need to test the correctness of data flow through state FSMs described earlier. These tests obviously
assume that some mocking of time is happening.
- Stored data that is never included is pruned after the necessary timeout.
  - A block (and/or a chunk) is added to the store.
  - We never note that the respective candidate is included.
  - Until a defined timeout the data in question is available.
  - After this timeout the data is no longer available.
- Stored data is kept until we are certain it is finalized.
  - A block (and/or a chunk) is added to the store.
  - It is available.
  - Before the inclusion timeout expires notify storage that the candidate was included.
  - The data is still available.
  - Wait for an absurd amount of time (longer than 1 day).
  - Check that the data is still available.
  - Send finality notification about the block in question.
  - Wait for some time below finalized data timeout.
  - The data is still available.
  - Wait until the data should have been pruned.
  - The data is no longer available.
- Fork-awareness of the relay chain is taken into account.
  - Block `B1` is added to the store.
  - Block `B2` is added to the store.
  - Notify the subsystem that both `B1` and `B2` were included in different leaves of the relay chain.
  - Notify the subsystem that the leaf with `B1` was finalized.
  - Leaf with `B2` is never finalized.
  - Leaf with `B2` is pruned and its data is no longer available.
  - Wait until the finalized data of `B1` should have been pruned.
  - `B1` is no longer available.
[RAM]: ../../types/overseer-protocol.md#runtime-api-message
[ASM]: ../../types/overseer-protocol.md#availability-store-message
@@ -0,0 +1,99 @@
# Candidate Validation
This subsystem is responsible for handling candidate validation requests. It is a simple request/response server.
A variety of subsystems want to know if a teyrchain block candidate is valid. None of them care about the detailed
mechanics of how a candidate gets validated, just the results. This subsystem handles those details.
## High-Level Flow
```dot process
digraph {
rankdir="LR";
pre [label = "Pvf-Checker"; shape = square]
bac [label = "Backing"; shape = square]
app [label = "Approval\nVoting"; shape = square]
dis [label = "Dispute\nCoordinator"; shape = square]
can [label = "Candidate\nValidation"; shape = square]
pvf [label = "PVF Host"; shape = square]
pre -> can [style = dashed]
bac -> can
app -> can
dis -> can
can -> pvf [label = "Precheck"; style = dashed]
can -> pvf [label = "Validate"]
}
```
## Protocol
Input: [`CandidateValidationMessage`](../../types/overseer-protocol.md#validation-request-type)
Output: Validation result via the provided response side-channel.
## Functionality
This subsystem groups the requests it handles into two categories: *candidate validation* and *PVF pre-checking*.
The first category can be further subdivided into two request types: one which draws out validation data from the state,
and another which accepts all validation data exhaustively. Validation returns three possible outcomes on the response
channel: the candidate is valid, the candidate is invalid, or an internal error occurred.
Teyrchain candidates are validated against their validation function: a piece of Wasm code that describes the
state-transition of the teyrchain. Validation function execution is not metered. This means that an execution which is
an infinite loop or simply takes too long must be forcibly exited by some other means. For this reason, we recommend
running candidate validation in subprocesses which can be killed if they time out.
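The following is a minimal sketch of the kill-on-timeout pattern using only the Rust standard library. It assumes the job runs as an external process. The actual node delegates this work to the PVF host (see [PVF Host and Workers](pvf-host-and-workers.md)). `run_with_timeout` and `JobOutcome` are hypothetical names. Polling with `try_wait` is simply the most dependency-free approach.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug)]
enum JobOutcome {
    /// The process exited on its own within the time limit.
    Finished(ExitStatus),
    /// The process was killed after exceeding the time limit.
    TimedOut,
}

/// Spawns `cmd` and kills it if it is still running after `timeout`.
fn run_with_timeout(mut cmd: Command, timeout: Duration) -> io::Result<JobOutcome> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check for exit.
        if let Some(status) = child.try_wait()? {
            return Ok(JobOutcome::Finished(status));
        }
        if Instant::now() >= deadline {
            // Unmetered code cannot be interrupted from the inside: kill it.
            child.kill()?;
            child.wait()?; // reap the killed process
            return Ok(JobOutcome::TimedOut);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

The test below runs `sleep` and `true`, so it assumes a Unix-like host.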
Upon receiving a validation request, the first thing the candidate validation subsystem should do is make sure it has
all the necessary parameters to the validation function. These are:
* The Validation Function itself.
* The [`CandidateDescriptor`](../../types/candidate.md#candidatedescriptor).
* The [`ValidationData`](../../types/candidate.md#validationdata).
* The [`PoV`](../../types/availability.md#proofofvalidity).
The second category is for PVF pre-checking. This is primarily used by the [PVF pre-checker](pvf-prechecker.md)
subsystem.
### Determining Parameters
For a [`CandidateValidationMessage`][CVM]`::ValidateFromExhaustive`, these parameters are exhaustively provided.
For a [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`, some more work needs to be done. Due to the
uncertainty of Availability Cores (implemented in the [`Scheduler`](../../runtime/scheduler.md) module of the runtime),
a candidate at a particular relay-parent and for a particular para may have two different valid sets of validation data
to be executed under, depending on what is assumed to happen if the para is occupying a core at the onset of the new block.
This is encoded as an `OccupiedCoreAssumption` in the runtime API.
The way that we can determine which assumption the candidate is meant to be executed under is simply to do an exhaustive
check of both possibilities based on the state of the relay-parent. First we fetch the validation data under the
assumption that the block occupying the core becomes available (`Included`). If the `validation_data_hash` of the `CandidateDescriptor`
matches this validation data, we use that. Otherwise, if the `validation_data_hash` matches the validation data fetched
under the `TimedOut` assumption, we use that. Otherwise, we return a `ValidationResult::Invalid` response and conclude.
Then, we can fetch the validation code from the runtime based on which type of candidate this is. This gives us all the
parameters. The descriptor and PoV come from the request itself, and the other parameters have been derived from the
state.
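The exhaustive check reduces to trying the assumptions in order and comparing hashes. In this sketch, a closure stands in for the runtime API call that fetches and hashes the validation data under a given assumption, and hashes are simplified to `u64`. `determine_assumption` and `Resolution` are illustrative names. The runtime's `OccupiedCoreAssumption` also has a `Free` variant, which is omitted here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum OccupiedCoreAssumption {
    Included,
    TimedOut,
}

#[derive(Debug, PartialEq)]
enum Resolution {
    Matched(OccupiedCoreAssumption),
    Invalid,
}

/// `fetch_hash` stands in for "fetch the validation data under this assumption
/// and hash it"; it returns `None` if the runtime has no data.
fn determine_assumption<F>(descriptor_hash: u64, mut fetch_hash: F) -> Resolution
where
    F: FnMut(OccupiedCoreAssumption) -> Option<u64>,
{
    // The occupying block becoming available is tried first, then the timeout case.
    for assumption in [OccupiedCoreAssumption::Included, OccupiedCoreAssumption::TimedOut] {
        if fetch_hash(assumption) == Some(descriptor_hash) {
            return Resolution::Matched(assumption);
        }
    }
    Resolution::Invalid
}
```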
> TODO: This would be a great place for caching to avoid making lots of runtime requests. That would need a job, though.
### Execution of the Teyrchain Wasm
Once we have all parameters, we can spin up a background task to perform the validation in a way that doesn't hold up
the entire event loop. Before invoking the validation function itself, this should first do some basic checks:
* The collator signature is valid (only if `CandidateDescriptor` has version 1)
* The PoV provided matches the `pov_hash` field of the descriptor
For more details please see [PVF Host and Workers](pvf-host-and-workers.md).
### Checking Validation Outputs
If we can assume the presence of the relay-chain state (that is, while processing
[`CandidateValidationMessage`][CVM]`::ValidateFromChainState`), we can run all the checks that the relay-chain would run
at inclusion time, thus confirming that the candidate will be accepted.
[CVM]: ../../types/overseer-protocol.md#validationrequesttype
@@ -0,0 +1,23 @@
# Chain API
The Chain API subsystem is responsible for providing a single point of access to chain state data via a set of
pre-determined queries.
## Protocol
Input: [`ChainApiMessage`](../../types/overseer-protocol.md#chain-api-message)
Output: None
## Functionality
On receipt of `ChainApiMessage`, answer the request and provide the response to the side-channel embedded within the
request.
Currently, the following requests are supported:
* Block hash to number
* Block hash to header
* Block weight
* Finalized block number to hash
* Last finalized block number
* Ancestors
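Standard-library channels can illustrate the side-channel pattern: each request carries the `Sender` on which its answer is returned. This is a sketch only, because the real messages are defined in the overseer protocol. The two request variants, the `u64` block hashes, and `serve` are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{Receiver, Sender};

// Hypothetical subset of requests; each embeds the channel for its answer.
enum ChainApiRequest {
    /// Block hash to number; `None` if the block is unknown.
    BlockNumber(u64, Sender<Option<u32>>),
    /// Last finalized block number.
    FinalizedBlockNumber(Sender<u32>),
}

/// Answers requests until every request sender has been dropped.
fn serve(requests: Receiver<ChainApiRequest>, numbers: HashMap<u64, u32>, finalized: u32) {
    for request in requests {
        match request {
            ChainApiRequest::BlockNumber(hash, reply) => {
                // A send error only means the requester stopped waiting.
                let _ = reply.send(numbers.get(&hash).copied());
            }
            ChainApiRequest::FinalizedBlockNumber(reply) => {
                let _ = reply.send(finalized);
            }
        }
    }
}
```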
@@ -0,0 +1,61 @@
# Chain Selection Subsystem
This subsystem maintains the metadata necessary to implement the [chain
selection](../../protocol-chain-selection.md) portion of the protocol.
The subsystem wraps a database component which maintains a view of the unfinalized chain and records the properties of
each block: whether the block is **viable**, whether it is **stagnant**, and whether it is **reverted**. It should also
maintain an updated set of active leaves in accordance with this view, which should be cheap to query. Leaves are
ordered descending first by weight and then by block number.
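The leaf ordering can be expressed as a sort on a composite key. This sketch reads "descending" as applying to both keys, so heavier leaves come first and, among equal weights, higher block numbers come first. That reading is an assumption. `LeafEntry` and `sort_leaves` are illustrative names.

```rust
use std::cmp::Reverse;

// Simplified leaf entry; the hash is shortened to a `u64`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LeafEntry {
    hash: u64,
    weight: u128,
    block_number: u32,
}

/// Sorts leaves descending by weight, then descending by block number.
fn sort_leaves(leaves: &mut [LeafEntry]) {
    leaves.sort_by_key(|leaf| Reverse((leaf.weight, leaf.block_number)));
}
```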
This subsystem needs to update its information on the unfinalized chain:
* On every leaf-activated signal
* On every block-finalized signal
* On every `ChainSelectionMessage::Approved`
* On every `ChainSelectionMessage::RevertBlocks`
* Periodically, to detect stagnation.
Simple implementations of these updates do `O(n_unfinalized_blocks)` disk operations. If the number of unfinalized
blocks is relatively small, the updates should not take very much time. However, in cases where there are hundreds or
thousands of unfinalized blocks the naive implementations of these update algorithms would have to be replaced with more
sophisticated versions.
## `OverseerSignal::ActiveLeavesUpdate`
Determine all new blocks implicitly referenced by any new active leaves and add them to the view. Update the set of
viable leaves accordingly. The weights of imported blocks can be determined by the
[`ChainApiMessage::BlockWeight`](../../types/overseer-protocol.md#chain-api-message).
## `OverseerSignal::BlockFinalized`
Delete data for all orphaned chains and update all metadata descending from the new finalized block accordingly. Note
that finalizing a **reverted** or **stagnant** block means that the descendants of those blocks may lose that status,
because the definitions of those properties don't include the finalized chain. Update the set of viable leaves
accordingly.
## `ChainSelectionMessage::Approved`
Update the approval status of the referenced block. If the block was stagnant and thus non-viable and is now viable,
then the metadata of all of its descendants needs to be updated as well, as they may no longer be stagnant either.
Update the set of viable leaves accordingly.
## `ChainSelectionMessage::Leaves`
Gets all leaves of the chain, i.e. block hashes that are suitable to build upon and have no suitable children. Supplies
the leaves in descending order by score.
## `ChainSelectionMessage::BestLeafContaining`
If the required block is unknown or not viable, then return `None`. Iterate over all leaves in order of descending
weight, returning the first leaf containing the required block in its chain, and `None` otherwise.
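The next sketch implements `BestLeafContaining` over a simplified in-memory view. It assumes the view stores three things: parent links for unfinalized blocks, a viability flag per block, and leaves already sorted by descending weight. `ChainView` and its fields are hypothetical. A real implementation reads this data from the subsystem's database.

```rust
use std::collections::HashMap;

type Hash = u64; // simplified block hash

struct ChainView {
    /// Parent of each unfinalized block.
    parents: HashMap<Hash, Hash>,
    /// Viability of each block known to the view.
    viable: HashMap<Hash, bool>,
    /// Leaves sorted by descending weight.
    leaves: Vec<Hash>,
}

impl ChainView {
    fn best_leaf_containing(&self, required: Hash) -> Option<Hash> {
        // Unknown or non-viable required block.
        if !self.viable.get(&required).copied().unwrap_or(false) {
            return None;
        }
        // First (heaviest) leaf whose chain contains the required block.
        self.leaves
            .iter()
            .copied()
            .find(|&leaf| self.chain_contains(leaf, required))
    }

    /// Follows parent links from `leaf` until they run out.
    fn chain_contains(&self, leaf: Hash, required: Hash) -> bool {
        let mut current = Some(leaf);
        while let Some(hash) = current {
            if hash == required {
                return true;
            }
            current = self.parents.get(&hash).copied();
        }
        false
    }
}
```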
## `ChainSelectionMessage::RevertBlocks`
This message indicates that a dispute has concluded against a teyrchain block candidate. The message passes along a
vector containing the block number and block hash of each block where the disputed candidate was included. The passed
blocks will be marked as reverted, and their descendants will be marked as non-viable.
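Marking reverted blocks and their descendants amounts to a tree walk. The minimal sketch below assumes a child map is available. `BlockTree` and `revert_blocks` are hypothetical names, and this in-memory version does not need the block number in each pair.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64; // simplified block hash

#[derive(Default)]
struct BlockTree {
    children: HashMap<Hash, Vec<Hash>>,
    reverted: HashSet<Hash>,
    non_viable: HashSet<Hash>,
}

impl BlockTree {
    /// Marks each passed `(number, hash)` block as reverted and every
    /// descendant as non-viable.
    fn revert_blocks(&mut self, blocks: &[(u32, Hash)]) {
        for &(_number, hash) in blocks {
            self.reverted.insert(hash);
            // Depth-first walk over the descendants of the reverted block.
            let mut stack = self.children.get(&hash).cloned().unwrap_or_default();
            while let Some(descendant) = stack.pop() {
                self.non_viable.insert(descendant);
                if let Some(kids) = self.children.get(&descendant) {
                    stack.extend(kids.iter().copied());
                }
            }
        }
    }
}
```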
## Periodically
Detect stagnant blocks and apply the stagnant definition to all descendants. Update the set of viable leaves
accordingly.
@@ -0,0 +1,19 @@
# Gossip Support
The Gossip Support Subsystem is responsible for keeping track of session changes
and issuing a connection request to all validators in the next, current and
a few past sessions if we are a validator in these sessions.
The request will add all validators to a reserved PeerSet, meaning we will not
reject a connection request from any validator in that set.
In addition, it creates a gossip overlay topology per session which
limits the number of messages sent and received to the order of the square root of
the number of validators. Our neighbors in this graph will be forwarded to the network bridge
with the `NetworkBridgeMessage::NewGossipTopology` message.
See https://github.com/paritytech/polkadot/issues/3239 for more details.
The gossip topology is used by teyrchain distribution subsystems,
such as Bitfield Distribution, (small) Statement Distribution and
Approval Distribution to limit the amount of peers we send messages to
and handle view updates.
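The text only states that the overlay limits traffic to the order of the square root of the validator count. One common way to get that bound is a square grid, where each validator's neighbors are those sharing its row or column. That yields at most about `2 * sqrt(n)` neighbors. The grid layout is an assumption for this sketch, not something this document specifies. `grid_neighbors` is a hypothetical helper, and the ordering of validators within the grid is left out.

```rust
/// Neighbors of validator `me` when `n` validators are laid out row by row in
/// a square grid of width `ceil(sqrt(n))`: those sharing a row or a column.
fn grid_neighbors(me: usize, n: usize) -> Vec<usize> {
    // Smallest width whose square covers all validators (integer ceil-sqrt).
    let width = (1..=n).find(|w| w * w >= n).unwrap_or(1);
    let (row, col) = (me / width, me % width);
    (0..n)
        .filter(|&other| other != me && (other / width == row || other % width == col))
        .collect()
}
```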
@@ -0,0 +1,161 @@
# Network Bridge
One of the main features of the overseer/subsystem duality is to avoid shared ownership of resources and to communicate
via message-passing. However, implementing each networking subsystem as its own network protocol brings a fair share of
challenges.
The most notable challenge is coordinating and eliminating race conditions of peer connection and disconnection events.
If we have many network protocols that peers are supposed to be connected on, it is difficult to enforce that a peer is
indeed connected on all of them or the order in which those protocols receive notifications that peers have connected.
This becomes especially difficult when attempting to share peer state across protocols. All of the Teyrchain-Host's
gossip protocols eliminate DoS with a data-dependency on current chain heads. However, it is inefficient and confusing
to implement the logic for tracking our current chain heads as well as our peers' on each of those subsystems. Having
one subsystem for tracking this shared state and distributing it to the others is an improvement in architecture and
efficiency.
One other piece of shared state to track is peer reputation. When peers are found to have provided value or cost, we
adjust their reputation accordingly.
So in short, this Subsystem acts as a bridge between an actual network component and a subsystem's protocol. The
implementation of the underlying network component is beyond the scope of this module. We make certain assumptions about
the network component:
- The network allows registering of protocols and multiple versions of each protocol.
- The network handles version negotiation of protocols with peers and only connects the peer on the highest version of
the protocol.
- Each protocol has its own peer-set, although there may be some overlap.
- The network provides peer-set management utilities for discovering the peer-IDs of validators and a means of dialing
peers with given IDs.
The network bridge makes use of the peer-set feature, but is not generic over peer-set. Instead, it exposes two
peer-sets that event producers can attach to: `Validation` and `Collation`. More information can be found on the
documentation of the [`NetworkBridgeMessage`][NBM].
## Protocol
Input: [`NetworkBridgeMessage`][NBM]
Output:
- [`ApprovalDistributionMessage`][AppD]`::NetworkBridgeUpdate`
- [`BitfieldDistributionMessage`][BitD]`::NetworkBridgeUpdate`
- [`CollatorProtocolMessage`][CollP]`::NetworkBridgeUpdate`
- [`StatementDistributionMessage`][StmtD]`::NetworkBridgeUpdate`
## Functionality
This network bridge sends messages of these types over the network.
```rust
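/// A message sent over the wire on one protocol, generic over that protocol's message type `M`.
enum WireMessage<M> {
    /// A protocol-specific message.
    ProtocolMessage(M),
    /// An update of the sender's view of the relay chain.
    ViewUpdate(View),
}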
```
and instantiates this type twice, once using the [`ValidationProtocolV1`][VP1] message type, and once with the
[`CollationProtocolV1`][CP1] message type.
```rust
type ValidationV1Message = WireMessage<ValidationProtocolV1>;
type CollationV1Message = WireMessage<CollationProtocolV1>;
```
### Startup
On startup, we register two protocols with the underlying network utility. One for validation and one for collation. We
register only version 1 of each of these protocols.
### Main Loop
The bulk of the work done by this subsystem is in responding to network events, signals from the overseer, and messages
from other subsystems.
Each network event is associated with a particular peer-set.
### Overseer Signal: `ActiveLeavesUpdate`
The `activated` and `deactivated` lists determine the evolution of our local view over time. A
`WireMessage::ViewUpdate` is issued to each connected peer on each peer-set, and a
`NetworkBridgeEvent::OurViewChange` is issued to each event handler for each protocol.
We only send view updates if the node has indicated that it has finished major blockchain synchronization.
If we are connected to the same peer on both peer-sets, we will send the peer two view updates as a result.
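The sketch below shows one way to fold an `ActiveLeavesUpdate` into the local view, with sending gated on major sync as described above. `View` is simplified to a set of head hashes, although the real view also tracks `finalized_number`. `apply_leaves_update` is a hypothetical name.

```rust
use std::collections::BTreeSet;

type Hash = u64; // simplified relay-chain block hash

/// Simplified local view: the set of active heads.
#[derive(Clone, Debug, Default, PartialEq)]
struct View {
    heads: BTreeSet<Hash>,
}

/// Folds an `ActiveLeavesUpdate` into `view`. The view always evolves, but it
/// is only returned for sending once major sync has finished.
fn apply_leaves_update(
    view: &mut View,
    activated: &[Hash],
    deactivated: &[Hash],
    major_sync_finished: bool,
) -> Option<View> {
    view.heads.extend(activated.iter().copied());
    for hash in deactivated {
        view.heads.remove(hash);
    }
    if major_sync_finished {
        Some(view.clone())
    } else {
        None
    }
}
```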
### Overseer Signal: `BlockFinalized`
We update our view's `finalized_number` to the provided one and delay `WireMessage::ViewUpdate` and
`NetworkBridgeEvent::OurViewChange` until the next `ActiveLeavesUpdate`.
### Network Event: `PeerConnected`
Issue a `NetworkBridgeEvent::PeerConnected` for each [Event Handler](#event-handlers) of the peer-set and negotiated
protocol version of the peer. Also issue a `NetworkBridgeEvent::PeerViewChange` and send the peer our current view, but
only if the node has indicated that it has finished major blockchain synchronization. Otherwise, we only send the peer
an empty view.
### Network Event: `PeerDisconnected`
Issue a `NetworkBridgeEvent::PeerDisconnected` for each [Event Handler](#event-handlers) of the peer-set and negotiated
protocol version of the peer.
### Network Event: `ProtocolMessage`
Map the message onto the corresponding [Event Handler](#event-handlers) based on the peer-set this message was received
on and dispatch via overseer.
### Network Event: `ViewUpdate`
- Check that the new view is valid and note it as the most recent view update of the peer on this peer-set.
- Map a `NetworkBridgeEvent::PeerViewChange` onto the corresponding [Event Handler](#event-handlers) based on the
peer-set this message was received on and dispatch via overseer.
### `ReportPeer`
- Adjust peer reputation according to the cost or benefit provided.
### `DisconnectPeer`
- Disconnect the peer from the peer-set requested, if connected.
### `SendValidationMessage` / `SendValidationMessages`
- Issue a corresponding `ProtocolMessage` to each listed peer on the validation peer-set.
### `SendCollationMessage` / `SendCollationMessages`
- Issue a corresponding `ProtocolMessage` to each listed peer on the collation peer-set.
### `ConnectToValidators`
- Determine the DHT keys to use for each validator based on the relay-chain state and Runtime API.
- Recover the Peer IDs of the validators from the DHT. There may be more than one peer ID per validator.
- Send all `(ValidatorId, PeerId)` pairs on the response channel.
- Feed all Peer IDs to the peer-set manager provided by the underlying network.
### `NewGossipTopology`
- Map all `AuthorityDiscoveryId`s to `PeerId`s and issue a corresponding `NetworkBridgeUpdate` to all validation
subsystems.
## Event Handlers
Network bridge event handlers are the intended recipients of particular network protocol messages. These are each a
variant of a message to be sent via the overseer.
### Validation V1
- `ApprovalDistributionV1Message -> ApprovalDistributionMessage::NetworkBridgeUpdate`
- `BitfieldDistributionV1Message -> BitfieldDistributionMessage::NetworkBridgeUpdate`
- `StatementDistributionV1Message -> StatementDistributionMessage::NetworkBridgeUpdate`
### Collation V1
- `CollatorProtocolV1Message -> CollatorProtocolMessage::NetworkBridgeUpdate`
[NBM]: ../../types/overseer-protocol.md#network-bridge-message
[AppD]: ../../types/overseer-protocol.md#approval-distribution-message
[BitD]: ../../types/overseer-protocol.md#bitfield-distribution-message
[StmtD]: ../../types/overseer-protocol.md#statement-distribution-message
[CollP]: ../../types/overseer-protocol.md#collator-protocol-message
[VP1]: ../../types/network.md#validation-v1
[CP1]: ../../types/network.md#collation-v1
@@ -0,0 +1,9 @@
# Peer Set Manager
> TODO
## Protocol
## Functionality
## Jobs, if any
@@ -0,0 +1,271 @@
# Provisioner
> NOTE: This module has suffered changes for the elastic scaling implementation. As a result, parts of this document may
be out of date and will be updated at a later time. Issue tracking the update:
https://github.com/pezkuwichain/pezkuwi-sdk/issues/132
Relay chain block authorship authority is governed by BABE and is beyond the scope of the Overseer and the rest of the
subsystems. That said, ultimately the block author needs to select a set of backable teyrchain candidates and other
consensus data, and assemble a block from them. This subsystem is responsible for providing the necessary data to all
potential block authors.
## Provisionable Data
There are several distinct types of provisionable data, but they all share one property: each should eventually be
included in a relay chain block.
### Backed Candidates
The block author can choose 0 or 1 backed teyrchain candidates per teyrchain; the only constraint is that each backable
candidate has the appropriate relay parent. However, the choice of a backed candidate must be the block author's. The
provisioner subsystem is how those block authors make this choice in practice.
### Signed Bitfields
[Signed bitfields](../../types/availability.md#signed-availability-bitfield) are attestations from a particular
validator about which candidates it believes are available. Those will only be provided on fresh leaves.
### Misbehavior Reports
Misbehavior reports are self-contained proofs of misbehavior by a validator or group of validators. For example, it is
very easy to verify a double-voting misbehavior report: the report contains two votes signed by the same key, advocating
different outcomes. Concretely, misbehavior reports become inherents which cause dots to be slashed.
Note that there is no mechanism in place which forces a block author to include a misbehavior report which it doesn't
like, for example if it would be slashed by such a report. The chain's defense against this is a relatively long
slash period, so that an honest author is likely to be encountered before the slash period expires.
### Dispute Inherent
The dispute inherent is similar to a misbehavior report in that it is an attestation of misbehavior on the part of a
validator or group of validators. Unlike a misbehavior report, it is not self-contained: resolution requires coordinated
action by several validators. The canonical example of a dispute inherent involves an approval checker discovering that
a set of validators has improperly approved an invalid teyrchain block: resolving this requires the entire validator set
to re-validate the block, so that the minority can be slashed.
Dispute resolution is complex and is explained in substantially more detail [here](../../runtime/disputes.md).
## Protocol
The subsystem should maintain a set of handles to Block Authorship Provisioning iterations that are currently live.
### On Overseer Signal
- `ActiveLeavesUpdate`:
  - For each `activated` head:
    - spawn a Block Authorship Provisioning iteration with the given relay parent, storing a bidirectional channel with
      that iteration.
  - For each `deactivated` head:
    - terminate the Block Authorship Provisioning iteration for the given relay parent, if any.
- `Conclude`: Forward `Conclude` to all iterations, waiting a small amount of time for them to join, and then
  hard-exiting.
### On `ProvisionerMessage`
Forward the message to the appropriate Block Authorship Provisioning iteration, or discard if no appropriate iteration
is currently active.
### Per Provisioning Iteration
Input: [`ProvisionerMessage`](../../types/overseer-protocol.md#provisioner-message). Backed candidates come from the
[Candidate Backing subsystem](../backing/candidate-backing.md), signed bitfields come from the [Bitfield Distribution
subsystem](../availability/bitfield-distribution.md), and disputes come from the [Disputes
Subsystem](../disputes/dispute-coordinator.md). Misbehavior reports are currently sent from the [Candidate Backing
subsystem](../backing/candidate-backing.md) and contain the following misbehaviors:
1. `Misbehavior::ValidityDoubleVote`
2. `Misbehavior::UnauthorizedStatement`
3. `Misbehavior::DoubleSign`
However, we choose not to punish these forms of misbehavior for the time being. Risks from misbehavior are sufficiently
mitigated at the protocol level via reputation changes. Punitive actions here may become worth implementing in the
future.
At initialization, this subsystem has no outputs.
Block authors request the inherent data they should use for constructing the inherent in the block which contains
teyrchain execution information.
## Block Production
When a validator is selected by BABE to author a block, it becomes a block producer. The provisioner is the subsystem
best suited to choosing which specific backed candidates and availability bitfields should be assembled into the block.
To engage this functionality, a `ProvisionerMessage::RequestInherentData` is sent; the response is a
[`ParaInherentData`](../../types/runtime.md#parainherentdata). Each relay chain block backs at most one backable
teyrchain block candidate per teyrchain. Additionally, no further block candidate can be backed until the previous one
has either been declared available or expired. If bitfields indicate that candidate A, predecessor of B, should be declared
available, then B can be backed in the same relay block. Appropriate bitfields, as outlined in the section on [bitfield
selection](#bitfield-selection), and any dispute statements should be attached as well.
### Bitfield Selection
Our goal with respect to bitfields is simple: maximize availability. However, it's not quite as simple as always
including all bitfields; there are constraints which still need to be met:
- not more than one bitfield per validator
- each 1 bit must correspond to an occupied core
Beyond that, a semi-arbitrary selection policy is fine. In order to meet the goal of maximizing availability, a
heuristic of picking the bitfield with the greatest number of 1 bits set in the event of conflict is useful.
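The two constraints and the max-ones heuristic fit in a short filter-then-select pass. In this sketch, signatures are omitted and `occupied` marks which cores are occupied. A bitfield that sets a bit for a non-occupied core is dropped rather than repaired, which is an assumption, since the text only states the constraint. `Bitfield` and `select_bitfields` are illustrative names.

```rust
use std::collections::HashMap;

/// Simplified signed bitfield (signature omitted): one bit per availability core.
#[derive(Clone, Debug, PartialEq)]
struct Bitfield {
    validator_index: u32,
    bits: Vec<bool>,
}

fn count_ones(bits: &[bool]) -> usize {
    bits.iter().filter(|&&bit| bit).count()
}

/// Keeps at most one bitfield per validator, preferring the one with the most
/// 1 bits, and drops bitfields that set a bit for a core that is not occupied.
fn select_bitfields(bitfields: Vec<Bitfield>, occupied: &[bool]) -> Vec<Bitfield> {
    let mut best: HashMap<u32, Bitfield> = HashMap::new();
    for bitfield in bitfields {
        // Constraint: each 1 bit must correspond to an occupied core.
        let valid = bitfield
            .bits
            .iter()
            .enumerate()
            .all(|(core, &bit)| !bit || occupied.get(core).copied().unwrap_or(false));
        if !valid {
            continue;
        }
        // Constraint: one per validator; on conflict keep the one with more 1 bits.
        let better = match best.get(&bitfield.validator_index) {
            Some(current) => count_ones(&bitfield.bits) > count_ones(&current.bits),
            None => true,
        };
        if better {
            best.insert(bitfield.validator_index, bitfield);
        }
    }
    let mut selected: Vec<Bitfield> = best.into_values().collect();
    selected.sort_by_key(|bitfield| bitfield.validator_index);
    selected
}
```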
### Dispute Statement Selection
This is the point at which the block author provides further votes to active disputes or initiates new disputes in the
runtime state.
The block-authoring logic of the runtime has an extra step between handling the inherent-data and producing the actual
inherent call, which we assume performs the work of filtering out disputes which are not relevant to the on-chain state.
Backing votes are always kept in the dispute statement set. This ensures we punish the maximum number of misbehaving
backers.
To select disputes:
- Issue a `DisputeCoordinatorMessage::RecentDisputes` message and wait for the response. This is a set of all disputes
in recent sessions which we are aware of.
### Determining Bitfield Availability
An occupied core has a `CoreAvailability` bitfield. We also have a list of `SignedAvailabilityBitfield`s. We need to
determine from these whether or not a core at a particular index has become available.
The key insight required is that `CoreAvailability` is transverse to the `SignedAvailabilityBitfield`s: if we
conceptualize the list of bitfields as many rows, each bit of which is its own column, then `CoreAvailability` for a
given core index is the vertical slice of bits in the set at that index.
To compute bitfield availability, then:
- Start with a copy of `OccupiedCore.availability`.
- For each bitfield in the list of `SignedAvailabilityBitfield`s:
  - Get the bitfield's `validator_index`.
  - Update the availability. Conceptually, assuming bit vectors: `availability[validator_index] |= bitfield[core_idx]`.
- Availability has a 2/3 threshold. Therefore: `3 * availability.count_ones() >= 2 * availability.len()`.
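Here is the computation above, written out for one core. Bitfields are simplified to `(validator_index, Vec<bool>)` pairs with signatures omitted. `core_becomes_available` is a hypothetical name. The threshold check uses the integer form given above, which avoids floating point.

```rust
/// Decides whether the core at `core_idx` becomes available, given the occupied
/// core's current per-validator `availability` and new `(validator_index, bits)` bitfields.
fn core_becomes_available(
    mut availability: Vec<bool>,
    bitfields: &[(usize, Vec<bool>)],
    core_idx: usize,
) -> bool {
    for (validator_index, bits) in bitfields {
        // availability[validator_index] |= bitfield[core_idx]
        if let (Some(slot), Some(&bit)) = (availability.get_mut(*validator_index), bits.get(core_idx)) {
            *slot |= bit;
        }
    }
    let ones = availability.iter().filter(|&&bit| bit).count();
    // 2/3 threshold in integer arithmetic.
    3 * ones >= 2 * availability.len()
}
```

For example, with six validators of which three have already attested, two more bitfields setting the core's bit bring the count to five, and `15 >= 12` holds.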
### Candidate Selection: Prospective Teyrchains Mode
The state of the provisioner `PerRelayParent` tracks an important setting, `ProspectiveTeyrchainsMode`. This setting
determines which backable candidate selection method the provisioner uses.
`ProspectiveTeyrchainsMode::Disabled` - The provisioner uses its own internal legacy candidate selection.
`ProspectiveTeyrchainsMode::Enabled` - The provisioner requests that [prospective
teyrchains](../backing/prospective-teyrchains.md) provide selected candidates.
Candidates selected with `ProspectiveTeyrchainsMode::Enabled` are able to benefit from the increased block production
time asynchronous backing allows. For this reason all Pezkuwi protocol networks will eventually use prospective
teyrchains candidate selection. Then legacy candidate selection will be removed as obsolete.
### Prospective Teyrchains Candidate Selection
The goal of candidate selection is to determine which cores are free, and then to the degree possible, pick a candidate
appropriate to each free core. In prospective teyrchains candidate selection the provisioner handles the former process
while [prospective teyrchains](../backing/prospective-teyrchains.md) handles the latter.
To select backable candidates:
- Get the list of core states from the runtime API.
- For each core state:
  - On `CoreState::Free`:
    - The core is unscheduled and doesn't need to be provisioned with a candidate.
  - On `CoreState::Scheduled`:
    - The core is unoccupied and scheduled to accept a backed block for a particular `para_id`.
    - The provisioner requests a backable candidate from [prospective teyrchains](../backing/prospective-teyrchains.md)
      with the desired relay parent, the core's scheduled `para_id`, and an empty required path.
  - On `CoreState::Occupied`:
    - The availability core is occupied by a teyrchain block candidate pending availability. A further candidate need
      not be provided by the provisioner unless the core will be vacated this block. This is the case when either
      bitfields indicate the current core occupant has been made available or a timeout is reached.
    - If `bitfields_indicate_availability`:
      - If `Some(scheduled_core) = occupied_core.next_up_on_available`, the core will be vacated and in need of a
        provisioned candidate. The provisioner requests a backable candidate from [prospective
        teyrchains](../backing/prospective-teyrchains.md) with the core's scheduled `para_id` and a required path with
        one entry. This entry corresponds to the parablock candidate previously occupying this core, which was made
        available and can be built upon even though it hasn't been seen as included in a relay chain block yet. See the
        Required Path section below for more detail.
      - If `occupied_core.next_up_on_available` is `None`, then the core being vacated is unscheduled and doesn't need
        to be provisioned with a candidate.
    - Else, if `occupied_core.time_out_at == block_number`:
      - If `Some(scheduled_core) = occupied_core.next_up_on_time_out`, the core will be vacated and in need of a
        provisioned candidate. A candidate is requested in exactly the same way as with `CoreState::Scheduled`.
      - Otherwise, the core being vacated is unscheduled and doesn't need to be provisioned with a candidate.

The end result of this process is a vector of `CandidateHash`s, sorted in order of their core index.
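The per-core decision can be sketched as a function that returns the request to send to prospective teyrchains, or `None` when the core needs no candidate. The types are simplified stand-ins: `u32` para IDs, `u64` candidate hashes, and `Option<ParaId>` in place of a scheduled core. `BackableRequest` and `request_for_core` are hypothetical names, not the actual message API.

```rust
// Simplified stand-ins for the real types.
type ParaId = u32;
type CandidateHash = u64;

enum CoreState {
    Free,
    Scheduled { para_id: ParaId },
    Occupied {
        candidate_hash: CandidateHash,
        next_up_on_available: Option<ParaId>,
        next_up_on_time_out: Option<ParaId>,
        time_out_at: u32,
    },
}

/// What to ask prospective teyrchains for: a para and a required path.
#[derive(Debug, PartialEq)]
struct BackableRequest {
    para_id: ParaId,
    required_path: Vec<CandidateHash>,
}

/// Returns the request for one core, or `None` if it needs no candidate.
fn request_for_core(
    core: &CoreState,
    bitfields_indicate_availability: bool,
    block_number: u32,
) -> Option<BackableRequest> {
    match *core {
        CoreState::Free => None,
        CoreState::Scheduled { para_id } => Some(BackableRequest { para_id, required_path: Vec::new() }),
        CoreState::Occupied { candidate_hash, next_up_on_available, next_up_on_time_out, time_out_at } => {
            if bitfields_indicate_availability {
                // Build on the occupant, which is becoming available.
                next_up_on_available.map(|para_id| BackableRequest {
                    para_id,
                    required_path: vec![candidate_hash],
                })
            } else if time_out_at == block_number {
                // Same request shape as for a scheduled core.
                next_up_on_time_out.map(|para_id| BackableRequest { para_id, required_path: Vec::new() })
            } else {
                None
            }
        }
    }
}
```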
#### Required Path
Required path is a parameter for `ProspectiveTeyrchainsMessage::GetBackableCandidates`, which the provisioner sends in
candidate selection.
An empty required path indicates that the requested candidate chain should start with the most recently included
parablock for the given `para_id` as of the given relay parent.
In contrast, a required path with one or more entries prompts [prospective
teyrchains](../backing/prospective-teyrchains.md) to step forward through its fragment tree for the given `para_id` and
relay parent until the desired parablock is reached. We then select the chain starting with the direct child of that
parablock to pass to the provisioner.
The parablocks making up a required path do not need to have been previously seen as included in relay chain blocks.
Thus the ability to provision backable candidates based on a required path effectively decouples backing from inclusion.
### Legacy Candidate Selection
Legacy candidate selection takes place in the provisioner. Thus the provisioner needs to keep an up-to-date record of
all [backed candidates](../../types/backing.md#backed-candidate), per relay parent (`PerRelayParent`), to pick from.
The goal of candidate selection is to determine which cores are free, and then to the degree possible, pick a candidate
appropriate to each free core.
To determine availability:
- Get the list of core states from the runtime API.
- For each core state:
  - On `CoreState::Scheduled`, we can make an `OccupiedCoreAssumption::Free`.
  - On `CoreState::Occupied`, we may be able to make an assumption:
    - If the bitfields indicate availability and there is a scheduled `next_up_on_available`, then we can make an
      `OccupiedCoreAssumption::Included`.
    - If the bitfields do not indicate availability, and there is a scheduled `next_up_on_time_out`, and
      `occupied_core.time_out_at == block_number_under_production`, then we can make an
      `OccupiedCoreAssumption::TimedOut`.
  - If we did not make an `OccupiedCoreAssumption`, then continue on to the next core.
  - Now compute the core's `validation_data_hash`: get the `PersistedValidationData` from the runtime, given the known
    `ParaId` and `OccupiedCoreAssumption`.
  - Find an appropriate candidate for the core.
    - There are two constraints: `backed_candidate.candidate.descriptor.para_id == scheduled_core.para_id &&
      backed_candidate.candidate.descriptor.validation_data_hash == computed_validation_data_hash`.
    - In the event that more than one candidate meets the constraints, selection between the candidates is arbitrary.
      However, not more than one candidate can be selected per core.

The end result of this process is a vector of `CandidateHash`s, sorted in order of their core index.
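
The availability walk above can be sketched as a single loop over the cores. This is a minimal sketch, not the provisioner's code: `CoreState`, `BackedCandidate` and `select_candidates` are hypothetical, heavily simplified stand-ins, the bitfield check and the runtime query for `PersistedValidationData` are injected as closures, and hashes are plain byte arrays.

```rust
// Hypothetical, heavily simplified stand-ins for the real primitives.
type ParaId = u32;
type Hash = [u8; 32];

#[derive(Clone, Copy, Debug, PartialEq)]
enum OccupiedCoreAssumption {
    Included,
    TimedOut,
    Free,
}

enum CoreState {
    Scheduled { para_id: ParaId },
    Occupied {
        next_up_on_available: Option<ParaId>,
        next_up_on_time_out: Option<ParaId>,
        time_out_at: u32,
    },
    /// Nothing scheduled: there is nothing to provision for this core.
    Free,
}

struct BackedCandidate {
    candidate_hash: Hash,
    para_id: ParaId,
    validation_data_hash: Hash,
}

/// Returns the selected candidate hashes in core-index order (cores are visited in order).
fn select_candidates(
    cores: &[CoreState],
    // Whether the availability bitfields indicate that the candidate on core `i` is available.
    bitfields_indicate_availability: impl Fn(usize) -> bool,
    block_number_under_production: u32,
    // Stand-in for fetching `PersistedValidationData` from the runtime and hashing it.
    compute_validation_data_hash: impl Fn(ParaId, OccupiedCoreAssumption) -> Option<Hash>,
    backed_candidates: &[BackedCandidate],
) -> Vec<Hash> {
    let mut selected = Vec::new();
    for (core_index, core) in cores.iter().enumerate() {
        let (para_id, assumption) = match core {
            CoreState::Scheduled { para_id } => (*para_id, OccupiedCoreAssumption::Free),
            CoreState::Occupied { next_up_on_available, next_up_on_time_out, time_out_at } => {
                let available = bitfields_indicate_availability(core_index);
                match (available, next_up_on_available, next_up_on_time_out) {
                    (true, Some(next), _) => (*next, OccupiedCoreAssumption::Included),
                    (false, _, Some(next)) if *time_out_at == block_number_under_production => {
                        (*next, OccupiedCoreAssumption::TimedOut)
                    }
                    // No assumption can be made: move on to the next core.
                    _ => continue,
                }
            }
            CoreState::Free => continue,
        };
        let Some(computed) = compute_validation_data_hash(para_id, assumption) else { continue };
        // If several candidates match, the choice is arbitrary; this sketch takes the first.
        // At most one candidate is pushed per core.
        if let Some(c) = backed_candidates
            .iter()
            .find(|c| c.para_id == para_id && c.validation_data_hash == computed)
        {
            selected.push(c.candidate_hash);
        }
    }
    selected
}
```

Because each core is visited once and contributes at most one hash, the "one candidate per core" and "sorted by core index" properties fall out of the loop structure rather than needing a separate pass.
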
### Retrieving Full `BackedCandidate`s for Selected Hashes
Legacy candidate selection and prospective teyrchains candidate selection both leave us with a vector of
`CandidateHash`s. These are passed to the backing subsystem with `CandidateBackingMessage::GetBackedCandidates`.
The response is a vector of `BackedCandidate`s, sorted in order of their core index and ready to be provisioned to block
authoring. The candidate selection and retrieval process should select at most one candidate which upgrades the
runtime validation code.
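
The "at most one code upgrade" rule can be enforced as a final filter over the core-ordered result. This is a sketch under assumptions: `Candidate` is a hypothetical simplified view that only records the core index and whether the candidate carries new validation code, and keeping the upgrade on the lowest core index is an illustrative choice, not something the text prescribes.

```rust
/// Hypothetical, simplified view of a backed candidate.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    core_index: u32,
    /// Whether the candidate's commitments include new validation code.
    upgrades_code: bool,
}

/// Sorts by core index and keeps at most one code-upgrading candidate
/// (here: the one on the lowest core index; which one to keep is an assumption).
fn enforce_single_code_upgrade(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by_key(|c| c.core_index);
    let mut seen_upgrade = false;
    candidates.retain(|c| {
        if !c.upgrades_code {
            return true;
        }
        if seen_upgrade {
            return false; // a second upgrade: drop it
        }
        seen_upgrade = true;
        true
    });
    candidates
}
```

Candidates that do not touch the validation code are never affected, so the filter only costs block space when two upgrades compete.
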
## Glossary
- **Relay-parent:**
- A particular relay-chain block which serves as an anchor and reference point for processes and data which depend on
relay-chain state.
- **Active Leaf:**
- A relay chain block which is the head of an active fork of the relay chain.
- Block authorship provisioning jobs are spawned per active leaf and concluded for any leaves which become inactive.
- **Candidate Selection:**
- The process by which the provisioner selects backable teyrchain block candidates to pass to block authoring.
- There are two versions: prospective teyrchains candidate selection and legacy candidate selection. See their
respective protocol sections for details.
- **Availability Core:**
- Often referred to simply as "cores", availability cores are an abstraction used for resource management. For the
provisioner, availability cores are most relevant in that core states determine which `para_id`s to provision
backable candidates for.
- For more on availability cores see [Scheduler Module: Availability
Cores](../../runtime/scheduler.md#availability-cores)
- **Availability Bitfield:**
- Often referred to simply as a "bitfield", an availability bitfield represents the view of parablock candidate
availability from a particular validator's perspective. Each bit in the bitfield corresponds to a single
[availability core](../../runtime-api/availability-cores.md).
- For more on availability bitfields see [availability](../../types/availability.md).
- **Backable vs. Backed:**
- Note that we sometimes use "backed" to refer to candidates that are "backable", but not yet backed on chain.
- Backable means that a quorum of the candidate's assigned backing group have provided signed affirming statements.
@@ -0,0 +1,265 @@
# PVF Host and Workers
The PVF host is responsible for handling requests to prepare and execute PVF
code blobs, which it sends to PVF **workers** running in their own child
processes. These workers are spawned from the `pezkuwi-prepare-worker` and
`pezkuwi-execute-worker` binaries.
While the workers are generally long-lived, they also spawn one-off secure
**job processes** that perform the jobs. See the "Job Processes" section below.
## High-Level Flow
```dot process
digraph {
    rankdir="LR";

    can [label = "Candidate\nValidation\nSubsystem"; shape = square]
    pvf [label = "PVF Host"; shape = square]
    pq [label = "Prepare\nQueue"; shape = square]
    eq [label = "Execute\nQueue"; shape = square]
    pp [label = "Prepare\nPool"; shape = square]

    subgraph "cluster partial_sandbox_prep" {
        label = "pezkuwi-prepare-worker\n(Partial Sandbox)\n\n\n";
        labelloc = "t";
        pw [label = "Prepare\nWorker"; shape = square]

        subgraph "cluster full_sandbox_prep" {
            label = "Fully Isolated Sandbox\n\n\n";
            labelloc = "t";
            pj [label = "Prepare\nJob"; shape = square]
        }
    }

    subgraph "cluster partial_sandbox_exec" {
        label = "pezkuwi-execute-worker\n(Partial Sandbox)\n\n\n";
        labelloc = "t";
        ew [label = "Execute\nWorker"; shape = square]

        subgraph "cluster full_sandbox_exec" {
            label = "Fully Isolated Sandbox\n\n\n";
            labelloc = "t";
            ej [label = "Execute\nJob"; shape = square]
        }
    }

    can -> pvf [label = "Precheck"; style = dashed]
    can -> pvf [label = "Validate"]
    pvf -> pq [label = "Prepare"; style = dashed]
    pvf -> eq [label = "Execute";]
    pvf -> pvf [label = "see (2) and (3)"; style = dashed]
    pq -> pp [style = dashed]
    pp -> pw [style = dashed]
    eq -> ew
    pw -> pj [style = dashed]
    ew -> ej
}
```
Some notes about the graph:
1. Once a job has finished, the response will flow back up the way it came.
2. In the case of execution, the host will send a request for preparation to the
Prepare Queue if needed. In that case, only after the preparation succeeds
does the Execute Queue continue with validation.
3. Multiple requests for preparing the same artifact are coalesced, so that the
work is only done once.
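
Note (3) describes request coalescing. A minimal sketch of the idea, assuming a hypothetical `PrepareCoalescer` that keys in-flight work by a hypothetical `ArtifactId` and uses `std::sync::mpsc` channels as stand-ins for the host's response channels: only the first request for an artifact dispatches a job, and every waiter receives the single result.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical artifact identifier.
type ArtifactId = u64;

#[derive(Default)]
struct PrepareCoalescer {
    /// Everyone waiting on an artifact whose preparation is in flight.
    in_flight: HashMap<ArtifactId, Vec<Sender<Result<(), String>>>>,
    /// Number of preparation jobs actually dispatched (for illustration).
    jobs_dispatched: usize,
}

impl PrepareCoalescer {
    /// Registers interest in `id`; only the first request triggers a preparation job.
    fn request(&mut self, id: ArtifactId) -> Receiver<Result<(), String>> {
        let (tx, rx) = channel();
        let waiters = self.in_flight.entry(id).or_default();
        if waiters.is_empty() {
            // Stand-in for sending the job to the prepare queue/pool.
            self.jobs_dispatched += 1;
        }
        waiters.push(tx);
        rx
    }

    /// Called once the single preparation job for `id` finishes: fan the result out.
    fn complete(&mut self, id: ArtifactId, result: Result<(), String>) {
        for tx in self.in_flight.remove(&id).unwrap_or_default() {
            // A waiter may have gone away in the meantime; that is fine to ignore.
            let _ = tx.send(result.clone());
        }
    }
}
```

Three requests for two distinct artifacts therefore dispatch only two preparation jobs.
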
## Goals
This system has two high-level goals that we will touch on here: *determinism*
and *security*.
## Determinism
One high-level goal is to make PVF operations as deterministic as possible, to
reduce the rate of disputes. Disputes can happen due to e.g. a job timing out on
one machine, but not another. While we do not have full determinism, there are
some dispute reduction mechanisms in place right now.
### Retrying execution requests
If the execution request fails during **preparation**, we will retry if it is
possible that the preparation error was transient (e.g. if the error was a panic
or time out). We will only retry preparation if another request comes in after
15 minutes, to ensure any potential transient conditions had time to be
resolved. We will retry up to 5 times.
If the actual **execution** of the artifact fails, we will retry once if it was
a possibly transient error, to allow the conditions that led to the error to
hopefully resolve. We use a shorter delay here (1 second, as opposed to 15
minutes for preparation), because a successful execution must happen in a short
amount of time.
If the execution fails during the backing phase, we won't retry, to reduce the
chance of supporting nondeterministic candidates. This reduces the chance of
nondeterministic blocks getting backed and honest backers getting slashed.
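
The timing rules above fit in two small predicates. This is a sketch: the constant names, `Phase`, and both function names are hypothetical, while the values (15 minutes, 5 retries, 1 second, no retry while backing) are the ones stated in the text.

```rust
use std::time::{Duration, Instant};

// Values from the text above; the names are hypothetical.
const PREPARE_FAILURE_COOLDOWN: Duration = Duration::from_secs(15 * 60);
const NUM_PREPARE_RETRIES: u32 = 5;
const EXECUTE_RETRY_DELAY: Duration = Duration::from_secs(1);

#[derive(Clone, Copy, PartialEq)]
enum Phase {
    Backing,
    /// Any non-backing phase (e.g. approval checking).
    Approval,
}

/// Whether a new execution request may re-attempt a preparation that failed
/// with a possibly transient error.
fn can_retry_prepare(num_failures: u32, last_failed: Instant, now: Instant) -> bool {
    num_failures <= NUM_PREPARE_RETRIES
        && now.saturating_duration_since(last_failed) >= PREPARE_FAILURE_COOLDOWN
}

/// Delay before re-running a failed execution, or `None` if we must not retry.
fn execute_retry_delay(phase: Phase, already_retried: bool, possibly_transient: bool) -> Option<Duration> {
    if phase == Phase::Backing || already_retried || !possibly_transient {
        return None;
    }
    Some(EXECUTE_RETRY_DELAY)
}
```

Gating preparation retries on the arrival of a *new* request (rather than a timer) means nothing is retried for a PVF nobody needs anymore.
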
We currently know of the following specific cases that will lead to a retried
execution request:
1. **OOM:** We have memory limits to try to prevent attackers from exhausting
host memory. If the memory limit is hit, we kill the job process and retry
the job. Alternatively, the host might have been temporarily low on memory
due to other processes running on the same machine. **NOTE:** This case will
lead to voting against the candidate (and possibly a dispute) if the retry is
still not successful.
2. **Syscall violations:** If the job attempts a system call that is blocked by
the sandbox's security policy, the job process is immediately killed and we
retry. **NOTE:** In the future, if we have a proper way to detect that the
job died due to a security violation, it might make sense not to retry in
this case.
3. **Artifact missing:** The prepared artifact might have been deleted due to
operator error or some bug in the system.
4. **Job errors:** For example, the job process panicked for some indeterminate
reason, which may or may not be independent of the candidate or PVF.
5. **Internal errors:** See the "Internal errors" section below. In this case,
after the retry we abstain from voting.
6. **`RuntimeConstruction` errors:** Pre-checking handles the general case of a
wrong artifact, but doesn't guarantee its consistency between preparation and
execution. Something may have happened to the artifact between its preparation
and its execution, e.g. the artifact was corrupted on disk, or a dirty node
upgrade left the prepare worker with a wasmtime version different from the
execute worker's. We treat such an error as possibly transient due to local
issues and retry one time.
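
The list above amounts to a classification of failures into "possibly transient" and everything else. A minimal sketch, assuming a hypothetical `ExecFailure` enum; the `InvalidCandidate` variant is not from the list and stands for an assumed deterministic "the PVF judged the candidate invalid" outcome, which is not retried.

```rust
/// Hypothetical, simplified classification of execution failures.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecFailure {
    OutOfMemory,
    SyscallViolation,
    ArtifactMissing,
    JobError,
    Internal,
    RuntimeConstruction,
    /// Assumption, not from the list: the PVF deterministically reported the candidate invalid.
    InvalidCandidate,
}

impl ExecFailure {
    /// Cases 1-6 above are treated as possibly transient and retried (once).
    fn is_possibly_transient(self) -> bool {
        !matches!(self, ExecFailure::InvalidCandidate)
    }
}
```

What happens if the retry fails again differs per case (e.g. voting against after OOM, abstaining after an internal error), so that decision is kept separate from the retry classification.
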
### Preparation timeouts
We use timeouts for both preparation and execution jobs to limit the amount of
time they can take. As the time for a job can vary depending on the machine and
load on the machine, this can potentially lead to disputes where some validators
successfully execute a PVF and others don't.
One dispute mitigation we have in place is a more lenient timeout for
preparation during execution than during pre-checking. The rationale is that the
PVF has already passed pre-checking, so we know it should be valid, and we allow
it to take longer than expected, as this is likely due to an issue with the
machine and not the PVF.
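
The lenient-timeout rule is just a different limit per preparation kind. The sketch below assumes a hypothetical `PrepareKind` enum and uses illustrative durations (60 s strict, 360 s lenient); the real values come from the node's configuration and executor parameters and may differ.

```rust
use std::time::Duration;

/// Why a preparation is being run (hypothetical).
#[derive(Clone, Copy)]
enum PrepareKind {
    Precheck,
    ForExecution,
}

// Illustrative values only.
const PRECHECK_PREPARATION_TIMEOUT: Duration = Duration::from_secs(60);
const LENIENT_PREPARATION_TIMEOUT: Duration = Duration::from_secs(360);

fn preparation_timeout(kind: PrepareKind) -> Duration {
    match kind {
        // Strict: a PVF that cannot be prepared in time is voted against during pre-checking.
        PrepareKind::Precheck => PRECHECK_PREPARATION_TIMEOUT,
        // Lenient: the PVF already passed pre-checking, so slowness is more likely local.
        PrepareKind::ForExecution => LENIENT_PREPARATION_TIMEOUT,
    }
}
```
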
### CPU clock timeouts
Another timeout-related mitigation we employ is to measure the time taken by
jobs using CPU time, rather than wall clock time. This is because the CPU time
of a process is less variable under different system conditions. When the
overall system is under heavy load, the wall clock time of a job is affected
more than the CPU time.
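
A CPU-time timeout is typically enforced by a watchdog that sleeps on the wall clock but decides on CPU time. Rust's standard library has no portable process CPU-time clock, so this sketch injects the clock, the "job finished" check and the sleep as closures; `cpu_time_watchdog` and `POLL_SLACK` are hypothetical names, and a real implementation would read the job's CPU time from the OS.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum WatchdogOutcome {
    JobFinished,
    TimedOut,
}

/// Extra wall-clock slack per poll so the watchdog does not spin when the job is
/// close to its budget (the value is an assumption).
const POLL_SLACK: Duration = Duration::from_millis(50);

fn cpu_time_watchdog(
    cpu_limit: Duration,
    mut job_cpu_time: impl FnMut() -> Duration,
    mut job_finished: impl FnMut() -> bool,
    mut sleep: impl FnMut(Duration),
) -> WatchdogOutcome {
    loop {
        if job_finished() {
            return WatchdogOutcome::JobFinished;
        }
        let used = job_cpu_time();
        if used >= cpu_limit {
            return WatchdogOutcome::TimedOut;
        }
        // A single-threaded job cannot use more CPU time than the wall time we sleep,
        // so it can overshoot the budget by at most the slack before we check again.
        sleep(cpu_limit - used + POLL_SLACK);
    }
}
```

On a loaded machine where the job only gets half of each wall-clock second, the watchdog lets it run for roughly twice the limit in wall time, but it still times out after the same amount of CPU work, which is exactly the property that keeps verdicts consistent across machines.
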
### Internal errors
An internal, or local, error is one that we treat as independent of the PVF
and/or candidate, i.e. local to the running machine. If this happens, then we
will first retry the job and, if the error persists, we simply do not vote.
This prevents slashes, since otherwise our vote may not agree with that of the
other validators.
In general, we have to be very careful with errors that do not raise a dispute.
Not raising a dispute is only sound if either:
1. We ruled out that error in pre-checking. If something is not checked in
pre-checking, even if independent of the candidate and PVF, we must raise a
dispute.
2. We are 100% confident that it is a hardware/local issue, e.g. a corrupted
file.
Reasoning: Otherwise it would be possible to register a PVF where candidates
cannot be checked, but we don't get a dispute, so nobody gets punished. Second,
we end up with a finality stall that is not going to resolve!
Note that we cannot treat any error from the job process as internal. The job
runs untrusted code and an attacker can therefore return arbitrary errors. If
they were to return errors that we treat as internal, they could make us abstain
from voting. Since we are unsure if such errors are legitimate, we will first
retry the candidate, and if the issue persists we are forced to vote invalid.
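
The asymmetry between internal errors and job errors can be written as a small decision table. This is a sketch with hypothetical `ErrorOrigin`, `Decision` and `on_execution_error` names; it only encodes the two rules stated above (retry first, then abstain for internal errors and vote invalid for job errors).

```rust
/// Where an execution error originated (hypothetical, simplified).
#[derive(Clone, Copy)]
enum ErrorOrigin {
    /// Local to our machine and outside the untrusted job.
    Internal,
    /// Reported by, or caused inside, the job process running untrusted code.
    Job,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Retry,
    Abstain,
    VoteInvalid,
}

fn on_execution_error(origin: ErrorOrigin, already_retried: bool) -> Decision {
    match (origin, already_retried) {
        // Both kinds of error get one retry first.
        (_, false) => Decision::Retry,
        // Only errors we trust to be local may lead to abstaining.
        (ErrorOrigin::Internal, true) => Decision::Abstain,
        // The job runs untrusted code and could fake any error, so it never earns an abstain.
        (ErrorOrigin::Job, true) => Decision::VoteInvalid,
    }
}
```
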
## Security
With [on-demand teyrchains](https://github.com/orgs/paritytech/projects/67), it
is much easier to submit PVFs to the chain for preparation and execution. This
makes it easier for erroneous disputes and slashing to occur, whether
intentional (as a result of a malicious attacker) or not (a bug or operator
error occurred).
Therefore, another goal of ours is to harden our security around PVFs, in order
to protect the economic interests of validators and increase overall confidence
in the system.
### Possible attacks / threat model
WebAssembly is already sandboxed, but multiple CVEs enabling remote code
execution have already been reported. See e.g. these two advisories from
[Mar 2023](https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-ff4p-7xrq-q5r8)
and [Jul 2022](https://github.com/bytecodealliance/wasmtime/security/advisories/GHSA-7f6x-jwh5-m9r4).
So what are we actually worried about? Things that come to mind:
1. **Consensus faults** - If an attacker can get some source of randomness they
could vote against with 50% chance and cause unresolvable disputes.
2. **Targeted slashes** - An attacker can target certain validators (e.g. some
validators running on vulnerable hardware) and make them vote invalid and get
them slashed.
3. **Mass slashes** - With some source of randomness they can do an untargeted
attack. I.e. a baddie can do significant economic damage by voting against
with 1/3 chance, without even stealing keys or completely replacing the
binary.
4. **Stealing keys** - That would be pretty bad. Should not be possible with
sandboxing. We should at least not allow filesystem-access or network access.
5. **Taking control over the validator.** E.g. replacing the `pezkuwi` binary
with a `pezkuwi-evil` binary. Should again not be possible with the above
sandboxing in place.
6. **Intercepting and manipulating packages** - Effect very similar to the
above, hard to do without also being able to do 4 or 5.
We do not protect against (1), (2), and (3), because there are too many sources
of randomness for an attacker to exploit.
We provide very good protection against (4), (5), and (6).
### Job Processes
As mentioned above, our architecture includes long-living **worker processes**
and one-off **job processes**. This separation is important so that the handling
of untrusted code can be limited to the job processes. A hijacked job process
can therefore not interfere with other jobs running in separate processes.
Furthermore, if an unexpected execution error occurred in the execution worker
and not the job itself, we generally can be confident that it has nothing to do
with the candidate, so we can abstain from voting. On the other hand, a hijacked
job is able to send back erroneous responses for candidates, so we know that we
should not abstain from voting on such errors from jobs. Otherwise, an attacker
could trigger a finality stall. (See "Internal Errors" section above.)
### Restricting file-system access
A basic security mechanism is to make sure that any process directly interfacing
with untrusted code does not have unnecessary access to the file-system. This
provides some protection against attackers accessing sensitive data or modifying
data on the host machine.
*Currently this is only supported on Linux.*
### Restricting networking
We also disable networking on PVF threads by disabling certain syscalls, such as
the creation of sockets. This prevents attackers from either downloading
payloads or communicating sensitive data from the validator's machine to the
outside world.
*Currently this is only supported on Linux.*
### Clearing env vars
We clear environment variables before handling untrusted code, because why give
attackers potentially sensitive data unnecessarily? And even if everything else
is locked down, env vars can potentially provide a source of randomness (see
point 1, "Consensus faults" above).
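
One way to make sure untrusted code starts with no environment is to clear it when spawning the job process. This is a minimal sketch, not how the workers are necessarily implemented: `spawn_job_without_env` is a hypothetical helper, and it only uses the standard `std::process::Command::env_clear` builder method, with none of the real sandboxing or IPC setup.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Spawns a job process with an empty environment (sketch only).
fn spawn_job_without_env(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program)
        .args(args)
        // The child starts with no environment variables at all, so nothing
        // sensitive (or usable as a source of randomness) is inherited.
        .env_clear()
        .stdout(Stdio::piped())
        .spawn()
}
```

Running `/usr/bin/env` through this helper prints nothing, since the child sees an empty environment.
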
@@ -0,0 +1,73 @@
# PVF Pre-checker
The PVF pre-checker is a subsystem that is responsible for watching the relay chain for new PVFs that require
pre-checking. Head over to [overview] for the PVF pre-checking process overview.
## Protocol
There is no dedicated input mechanism for the PVF pre-checker. Instead, the PVF pre-checker watches the
`ActiveLeavesUpdate` event stream for work.
This subsystem does not produce any output messages either. The subsystem will, however, send messages to the
[Runtime API] subsystem to query for the pending PVFs and to submit votes. In addition to that, it will also
communicate with the [Candidate Validation] subsystem to request PVF pre-checks.
## Functionality
If the node is running in collator mode, this subsystem will be disabled. The PVF pre-checker subsystem keeps track of
the PVFs that are relevant for the subsystem.
To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_precheck` runtime API][PVF pre-checking
runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be
relevant.
As soon as a PVF becomes relevant, the subsystem will send a message to the [Candidate Validation] subsystem asking
for the pre-check.
Upon receiving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its
judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement`
runtime API][PVF pre-checking runtime API]. If a judgement is received for a PVF that is no longer in view, it is
ignored.
Since a vote is only valid during [one session][overview], the subsystem will have to re-sign and resubmit the
statements for the new session. A new session is assumed to have started if at least one of the leaves has a greater
session index than was previously observed in any of the leaves.
The subsystem tracks all the statements that it submitted within a session. If for some reason a PVF became irrelevant
and then becomes relevant again, the subsystem will not submit a new statement for that PVF within the same session.
If the node is not in the active validator set, it will still perform all the checks. However, it will only submit the
check statements when the node is in the active validator set.
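
The bookkeeping described above (ignore out-of-view judgements, one statement per PVF per session, resubmit on a new session, only submit while in the active set) can be sketched as a small state machine. `PreCheckerState` and its methods are hypothetical; signing and the runtime API call are reduced to returning `(pvf, accept)` pairs.

```rust
use std::collections::{HashMap, HashSet};

type PvfHash = [u8; 32];
type SessionIndex = u32;
/// A statement to sign and submit: the PVF and whether we accept it.
type Statement = (PvfHash, bool);

#[derive(Default)]
struct PreCheckerState {
    /// Highest session index observed in any active leaf.
    session: SessionIndex,
    /// Judgements received from candidate validation (true = accept).
    judgements: HashMap<PvfHash, bool>,
    /// PVFs we already submitted a statement for in `session`.
    submitted: HashSet<PvfHash>,
}

impl PreCheckerState {
    /// Handles a judgement. Returns a statement to submit, unless the PVF is out of view,
    /// we are not an active validator, or we already submitted one this session.
    fn on_judgement(&mut self, pvf: PvfHash, accept: bool, relevant: &HashSet<PvfHash>, is_active_validator: bool) -> Option<Statement> {
        if !relevant.contains(&pvf) {
            return None; // no longer in view: ignored
        }
        self.judgements.insert(pvf, accept);
        if !is_active_validator || !self.submitted.insert(pvf) {
            return None;
        }
        Some((pvf, accept))
    }

    /// Handles the session index seen in a leaf. On a new session, returns all
    /// known judgements for relevant PVFs, to be re-signed and resubmitted.
    fn on_leaf_session(&mut self, session: SessionIndex, relevant: &HashSet<PvfHash>, is_active_validator: bool) -> Vec<Statement> {
        if session <= self.session {
            return Vec::new();
        }
        self.session = session;
        self.submitted.clear();
        if !is_active_validator {
            return Vec::new();
        }
        let mut statements = Vec::new();
        for pvf in relevant {
            if let Some(&accept) = self.judgements.get(pvf) {
                self.submitted.insert(*pvf);
                statements.push((*pvf, accept));
            }
        }
        statements
    }
}
```
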
### Rejecting failed PVFs
It is possible that the candidate validation was not able to check the PVF, e.g. if it timed out. In that case, the PVF
pre-checker will vote against it. This is considered safe, as there is no slashing for being on the wrong side of a
pre-check vote.
Rejecting instead of abstaining is better in several ways:
1. Conclusion is reached faster - we have actual votes, instead of relying on a timeout.
1. Being strict in pre-checking makes it safer to be more lenient with preparation errors afterwards. Hence we have more
leeway in avoiding raising dubious disputes, without making things less secure.
Also, if we only abstain, an attacker can specially craft a PVF wasm blob so that it will fail on e.g. 50% of the
validators. In that case a supermajority will never be reached and the vote will repeat multiple times, most likely with
the same result (since all votes are cleared on a session change). This is avoided by rejecting failed PVFs, and by only
requiring 1/3 of validators to reject a PVF to reach a decision.
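
The voting rule can be illustrated with a tally in which acceptance needs a supermajority and rejection needs only enough votes to make that supermajority unreachable. This is a sketch: the threshold formula `n - (n - 1) / 3` is an assumed BFT-style supermajority, and `tally` and `PrecheckOutcome` are hypothetical names.

```rust
/// Smallest number of accept votes treated as a supermajority of `n` validators
/// (assumed formula: n - floor((n - 1) / 3)).
fn supermajority_threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

#[derive(Debug, PartialEq)]
enum PrecheckOutcome {
    Accepted,
    Rejected,
    Undecided,
}

fn tally(n_validators: usize, accept: usize, reject: usize) -> PrecheckOutcome {
    let threshold = supermajority_threshold(n_validators);
    if accept >= threshold {
        PrecheckOutcome::Accepted
    } else if reject > n_validators - threshold {
        // Roughly a third rejected: a supermajority to accept is no longer reachable.
        PrecheckOutcome::Rejected
    } else {
        PrecheckOutcome::Undecided
    }
}
```

With 10 validators, 7 accepts are needed to accept, while 4 rejects already settle the vote. A blob crafted to fail on half the validators (5 accept, 5 reject) is therefore rejected instead of cycling through sessions.
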
### Note on Disputes
Having a pre-checking phase allows us to make certain assumptions later when preparing the PVF for execution. If a
runtime passed pre-checking, then we know that the runtime should be valid, and therefore any issue during preparation
for execution can be assumed to be a local problem on the current node.
For this reason, even deterministic preparation errors should not trigger disputes. And since we do not dispute as a
result of the pre-checking phase, as stated above, it should be impossible for preparation in general to result in
disputes.
[overview]: ../../pvf-prechecking.md
[Runtime API]: runtime-api.md
[PVF pre-checking runtime API]: ../../runtime-api/pvf-prechecking.md
[Candidate Validation]: candidate-validation.md
[PvfCheckStatement]: ../../types/pvf-prechecking.md#pvfcheckstatement
@@ -0,0 +1,21 @@
# Runtime API
The Runtime API subsystem is responsible for providing a single point of access to runtime state data via a set of
pre-determined queries. This prevents shared ownership of a blockchain client resource by providing
## Protocol
Input: [`RuntimeApiMessage`](../../types/overseer-protocol.md#runtime-api-message)
Output: None
## Functionality
On receipt of `RuntimeApiMessage::Request(relay_parent, request)`, answer the request using the post-state of the
`relay_parent` provided and provide the response to the side-channel embedded within the request.
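
The request/side-channel pattern can be sketched with standard channels. Everything here is a simplified assumption: the two request variants are loosely modeled on runtime API queries, `PostState` stands in for the runtime state after `relay_parent`, and `std::sync::mpsc::Sender` replaces the subsystem's real response channel.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

type Hash = [u8; 32];

/// Two hypothetical example queries, each carrying its own response side-channel.
enum RuntimeApiRequest {
    SessionIndexForChild(Sender<u32>),
    Validators(Sender<Vec<String>>),
}

enum RuntimeApiMessage {
    Request(Hash, RuntimeApiRequest),
}

/// Stand-in for the runtime post-state of a relay-parent.
struct PostState {
    session_index_for_child: u32,
    validators: Vec<String>,
}

fn handle(states: &HashMap<Hash, PostState>, msg: RuntimeApiMessage) {
    match msg {
        RuntimeApiMessage::Request(relay_parent, request) => {
            // Unknown relay-parent: drop the sender, which the requester
            // observes as a disconnected channel.
            let Some(state) = states.get(&relay_parent) else { return };
            match request {
                RuntimeApiRequest::SessionIndexForChild(tx) => {
                    let _ = tx.send(state.session_index_for_child);
                }
                RuntimeApiRequest::Validators(tx) => {
                    let _ = tx.send(state.validators.clone());
                }
            }
        }
    }
}
```

Because each request carries its own sender, the subsystem never has to track who asked what.
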
## Jobs
> TODO Don't limit requests based on parent hash, but limit caching. No caching should be done for any requests on
> `relay_parent`s that are not active based on `ActiveLeavesUpdate` messages. Maybe with some leeway for things that
> have just been stopped.
@@ -0,0 +1,405 @@
# Approval Process
The Approval Process is the mechanism by which the relay-chain ensures that only valid parablocks are finalized and that
backing validators are held accountable for managing to get bad blocks included into the relay chain.
Having a teyrchain include a bad block into a fork of the relay-chain is not catastrophic as long as the block isn't
finalized by the relay-chain's finality gadget, GRANDPA. If the block isn't finalized, that means that the fork of the
relay-chain can be reverted in favor of another by means of a dynamic fork-choice rule which leads honest validators to
ignore any forks containing that parablock.
Dealing with a bad parablock proceeds in these stages:
1. Detection
2. Escalation
3. Consequences
First, the bad block must be detected by an honest party. Second, the honest party must escalate the bad block to be
checked by all validators. And last, the correct consequences of a bad block must occur. The first consequence, as
mentioned above, is to revert the chain so what full nodes perceive to be best no longer contains the bad parablock. The
second consequence is to slash all malicious validators. Note that if the chain containing the bad block is reverted,
the result of the dispute needs to be transplanted or at least transplantable to all other forks of the chain so
that malicious validators are slashed in all possible histories. Phrased alternatively, there needs to be no possible
relay-chain in which malicious validators get away cost-free.
Accepting a parablock is the end result of having passed through the detection stage without dispute, or having passed
through the escalation/dispute stage with a positive outcome. For this to work, we need the detection procedure to have
the properties that enough honest validators are always selected to check the parablock and that they cannot be
interfered with by an adversary. This needs to be balanced with the scaling concern of teyrchains in general: the
easiest way to get the first property is to have everyone check everything, but that is clearly too heavy. So we also
want a competing property: that as few validators as possible check any particular parablock. Our assignment function
is the method by which we select validators to do approval checks on parablocks.
It often makes more sense to think of relay-chain blocks as having been approved or not as opposed to thinking about
whether parablocks have been approved. A relay-chain block containing a single bad parablock needs to be reverted, and a
relay-chain block that contains only approved parablocks can be called approved, as long as its parent relay-chain block
is also approved. It is important that the validity of any particular relay-chain block depend on the validity of its
ancestry, so we do not finalize a block which has a bad block in its ancestry.
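
The ancestry rule above can be stated as a walk towards the finalized base. A minimal sketch, assuming a hypothetical `RelayBlock` view that records only its parent and the approval status of each parablock it includes:

```rust
use std::collections::HashMap;

type BlockHash = u64;

/// Hypothetical view of a relay-chain block for approval purposes.
struct RelayBlock {
    /// `None` for the already-finalized base of the chain.
    parent: Option<BlockHash>,
    parablocks_approved: Vec<bool>,
}

/// A block is approved only if all its parablocks are approved and its parent is approved.
fn is_approved(chain: &HashMap<BlockHash, RelayBlock>, mut hash: BlockHash) -> bool {
    loop {
        let Some(block) = chain.get(&hash) else { return false }; // unknown block
        if !block.parablocks_approved.iter().all(|&a| a) {
            return false;
        }
        match block.parent {
            Some(parent) => hash = parent,
            None => return true, // reached the finalized base
        }
    }
}
```

A block whose own parablocks are all fine is still unapproved if any ancestor contains a bad parablock, so finality can never jump over it.
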
```dot process Approval Process
digraph {
Included -> Assignments -> Approval -> Finality
Assignments -> Escalation -> Consequences
}
```
Approval has roughly two parts:
- **Assignments** determines which validators perform approval checks on which candidates. It ensures that each
candidate receives enough random checkers, while reducing adversaries' odds for obtaining enough checkers, and
limiting adversaries' foreknowledge. It tracks approval votes to identify when a "no show" approval check takes
suspiciously long, perhaps indicating the node being under attack, and assigns more checks in this case. It tracks
relay chain equivocations to determine when adversaries possibly gained foreknowledge about assignments, and adds
additional checks in this case.
- **Approval checks** listens to the assignments subsystem for outgoing assignment notices that we shall check specific
candidates. It then performs these checks by first invoking the reconstruction subsystem to obtain the candidate,
second invoking the candidate validity utility subsystem upon the candidate, and finally sending out an approval vote,
or perhaps initiating a dispute.
These both run first as off-chain consensus protocols using messages gossiped among all validators, and second as an
on-chain record of this off-chain protocol's progress after the fact. We need the on-chain protocol to provide rewards
for the off-chain protocol.
Approval requires two gossiped message types, assignment notices created by its assignments subsystem, and approval
votes sent by our approval checks subsystem when authorized by the candidate validity utility subsystem.
## Approval keys
We need two separate keys for the approval subsystem:
- **Approval assignment keys** are sr25519/schnorrkel keys used only for the assignment criteria VRFs. We implicitly
sign assignment notices with approval assignment keys by including their relay chain context and additional data in
the VRF's extra message, but exclude these from its VRF input.
- **Approval vote keys** would only sign off on candidate parablock validity and have no natural key type restrictions.
There's no need for this to actually embody a new session key type. We just want to make a distinction between
assignments and approvals, although distant future node configurations might favor separate roles. We re-use the same
keys as are used for teyrchain backing in practice.
Approval vote keys could relatively easily be handled by some hardened signer tooling, perhaps even HSMs assuming we
select ed25519 for approval vote keys. Approval assignment keys might or might not support hardened signer tooling, but
doing so sounds far more complex. In fact, assignment keys determine only VRF outputs that determine approval checker
assignments, for which they can only act or not act, so they cannot equivocate, lie, etc. and represent little if any
slashing risk for validator operators.
In future, we shall determine which among the several hardening techniques best benefits the network as a whole. We
could provide a multi-process multi-machine architecture for validators, perhaps even reminiscent of GNUNet, or perhaps
more resembling smart HSM tooling. We might instead design a system that more resembled full systems, like Cosmos'
sentry nodes. In either case, approval assignments might be handled by a slightly hardened machine, but not necessarily
nearly as hardened as approval votes, but approval votes machines must similarly run foreign WASM code, which increases
their risk, so assignments being separate sounds helpful.
## Assignments
Approval assignment determines on which candidate teyrchain blocks each validator performs approval checks. An approval
session considers only one relay chain block and assigns only those candidates that relay chain block declares
available.
Assignment balances several concerns:
- limits adversaries' foreknowledge about assignments,
- ensures enough checkers, and
- distributes assignments relatively equitably.
Assignees determine their own assignments to check specific candidates using two or three assignment criteria.
Assignees never reveal their assignments until relevant, and gossip delays assignments sent early, which limits others'
foreknowledge. Assignees learn their assignment only with the relay chain block.
All criteria require the validator evaluate a verifiable random function (VRF) using their VRF secret key. All criteria
input specific data called "stories" about the session's relay chain block, and output candidates to check and a
precedence called a `DelayTranche`.
We liberate availability cores when their candidate becomes available of course, but one approval assignment criterion
continues associating each candidate with the core number it occupied when it became available.
Assignment operates in loosely timed rounds determined by these `DelayTranche`s, which proceed roughly 12 times faster
than six second block production assuming half second gossip times. If a candidate `C` needs more approval checkers by
the time we reach round `t`, then any validators with an assignment to `C` in delay tranche `t` gossip their
assignment notice for `C`. We continue until all candidates have enough approval checkers assigned. We take entire
tranches together if we do not yet have enough, so we expect strictly more than enough checkers. We also take later
tranches if some checkers return their approval votes too slowly (see no shows below).
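
The tranche logic can be sketched as two functions: one mapping elapsed time to the current tranche, one deciding how many whole tranches to take. Assumptions: a 500 ms tranche (six-second blocks divided by 12), per-tranche assignment counts given as a slice, and each no-show modeled as one extra checker still needed; `current_tranche` and `tranches_to_take` are hypothetical names.

```rust
use std::time::Duration;

type DelayTranche = u32;

/// Assumed tranche length: 6 s block time / 12 = 0.5 s.
const TRANCHE_DURATION: Duration = Duration::from_millis(500);

/// Current tranche, given the time elapsed since the relay-chain block.
fn current_tranche(elapsed: Duration) -> DelayTranche {
    (elapsed.as_millis() / TRANCHE_DURATION.as_millis()) as DelayTranche
}

/// `per_tranche[t]` is the number of assignments to the candidate in tranche `t`.
/// Returns the last tranche we must take so that *whole* tranches provide at least
/// `needed` checkers plus one replacement per no-show, or `None` if the tranches
/// reached so far (up to `now`) are not enough yet.
fn tranches_to_take(per_tranche: &[usize], no_shows: usize, needed: usize, now: DelayTranche) -> Option<DelayTranche> {
    let target = needed + no_shows;
    let mut have = 0;
    for (tranche, &count) in per_tranche.iter().enumerate() {
        let tranche = tranche as DelayTranche;
        if tranche > now {
            break; // assignments in future tranches are not gossiped yet
        }
        have += count; // take the entire tranche, never part of it
        if have >= target {
            return Some(tranche);
        }
    }
    None
}
```

Taking whole tranches is why we usually end up with strictly more checkers than needed: with 2 assignments in each of tranches 0 and 1 and 3 checkers needed, both tranches are taken, yielding 4.
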
Assignment ensures validators check those relay chain blocks for which they have delay tranche zero aka the highest
precedence, so that adversaries always face honest checkers equal to the expected number of assignments with delay
tranche zero.
Among these criteria, the BABE VRF output provides the story for two, which reduces how frequently adversaries could
position their own checkers. We have one criterion whose story consists of the candidate's block hash plus external
knowledge that a relay chain equivocation exists with a conflicting candidate. It provides unforeseeable assignments
when adversaries gain foreknowledge about the other two by committing an equivocation in relay chain block production.
## Announcements / Notices
We gossip assignment notices among nodes so that all validators know which validators should check each candidate, and
if any candidate requires more checkers.
Assignment notices consist of a relay chain context given by a block hash, an assignment criteria, consisting of the
criteria identifier and optionally a criteria specific field, an assignee identifier, and a VRF signature by the
assignee, which itself consists of a VRF pre-output and a DLEQ proof. Its VRF input consists of the criteria, usually
including a criteria specific field, and a "story" about its relay chain context block.
We never include stories inside the gossip messages containing assignment notices, but require each validator
reconstruct them. We never care about assignments in the disputes process, so this does not complicate remote disputes.
In a Schnorr VRF, there is an extra signed message distinct from this input, which we set to the relay chain block hash.
As a result, assignment notices are self signing and can be "politely" gossiped without additional signatures, meaning
between nodes who can compute the story from the relay chain context. In other words, if we cannot compute the story
required by an assignment notice's VRF part then our self signing property fails and we cannot verify its origin. We
could fix this with either another signature layer (64 bytes) or by including the VRF input point computed from the
story (32 bytes), but doing so appears unhelpful.
Any validator could send their assignment notices and/or approval votes too early. We gossip the approval votes early
because they represent a major commitment by the validator. We delay gossiping the assignment notices until they agree
with our local clock however. We also impose a politeness condition that the recipient knows the relay chain context
used by the assignment notice.
## Stories
We base assignment criteria upon two possible "stories" about the relay chain block `R` that included the candidate aka
declared the candidate available. All stories have an output that attempts to minimize adversarial influence, which
then acts as the VRF input for an assignment criteria.
We first have a `RelayVRFStory` that outputs the randomness from another VRF output produced by the relay chain block
producer when creating `R`. Among honest nodes, only this one relay chain block producer who creates `R` knew the story
in advance, and even they knew nothing two epochs previously.
In BABE, we create this value calling `schnorrkel::vrf::VRFInOut::make_bytes` with a context "A&V RC-VRF", with the
`VRFInOut` coming from either the VRF that authorized block production for primary blocks, or else from the secondary
block VRF for the secondary block type.
In Sassafras, we shall always use the non-anonymized recycling VRF output, never the anonymized ring VRF that authorizes
block production. We do not currently know if Sassafras shall have a separate schnorrkel key, but if it reuses its ring
VRF key there is an equivalent `ring_vrf::VRFInOut::make_bytes`.
We like that `RelayVRFStory` admits relatively few choices, but an adversary who equivocates in relay chain block
production could learn assignments that depend upon the `RelayVRFStory` too early because the same relay chain VRF
appears in multiple blocks.
We therefore provide a secondary `RelayEquivocationStory` that outputs the candidate's block hash, but only for
candidate equivocations. We say a candidate `C` in `R` is an equivocation when there exists another relay chain block
`R1` that equivocates for `R` in the sense that `R` and `R1` have the same `RelayVRFStory`, but `R` contains `C` and
`R1` does not contain `C`.
We want checkers for candidate equivocations that lie outside our preferred relay chain as well, which represents a
slightly different usage for the assignments module, and might require more information in the gossip messages.
## Assignment criteria
Assignment criteria compute actual assignments using stories and the validators' secret approval assignment key.
Assignment criteria output a `Position` consisting of both a `ParaId` to be checked, as well as a precedence
`DelayTranche` for when the assignment becomes valid.
Assignment criteria come in four flavors, `RelayVRFModuloCompact`, `RelayVRFDelay`, `RelayEquivocation` and the
deprecated `RelayVRFModulo`. Among these, `RelayVRFModulo`, `RelayVRFModuloCompact` and `RelayVRFDelay` run a
VRF whose input is the output of a `RelayVRFStory`, while `RelayEquivocation` runs a VRF whose input is the
output of a `RelayEquivocationStory`.
Among these, we have two distinct VRF output computations:
`RelayVRFModulo` runs several distinct samples whose VRF input is the `RelayVRFStory` and the sample number. It
computes the VRF output with `schnorrkel::vrf::VRFInOut::make_bytes` using the context "A&V Core", reduces this number
modulo the number of availability cores, and outputs the candidate just declared available by, and thus included by
(aka leaving), that availability core. We drop any samples that return no candidate because no candidate was leaving the
sampled availability core in this relay chain block. We choose three samples initially, but we could make Pezkuwi more
secure and efficient by increasing this to four or five, and reducing the backing checks accordingly. All successful
`RelayVRFModulo` samples are assigned delay tranche zero.
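The per-sample reduction can be sketched as follows. This is a minimal sketch under stated assumptions: each entry of `sample_outputs` stands in for 4 bytes that `make_bytes` with context "A&V Core" would produce for one sample, the little-endian decoding is an assumption, and `leaving` is a hypothetical per-core list of the candidate leaving that core (if any).

```rust
/// Sketch of `RelayVRFModulo` sampling.
/// `sample_outputs[i]` stands in for the VRF bytes of sample `i` (assumed
/// little-endian); `leaving[core]` is the candidate leaving `core`, if any.
/// Returns `(candidate, delay_tranche)` pairs.
fn relay_vrf_modulo_assignments(sample_outputs: &[[u8; 4]], leaving: &[Option<u64>]) -> Vec<(u64, u32)> {
    let n_cores = leaving.len() as u32;
    if n_cores == 0 {
        return Vec::new();
    }
    sample_outputs
        .iter()
        .filter_map(|bytes| {
            // Reduce the sample modulo the number of availability cores.
            let core = u32::from_le_bytes(*bytes) % n_cores;
            // Drop samples whose core had no candidate leaving it;
            // successful samples land in delay tranche zero.
            leaving[core as usize].map(|candidate| (candidate, 0))
        })
        .collect()
}
```

With three samples, as suggested initially, a validator obtains between zero and three tranche-zero assignments per relay chain block.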
`RelayVRFModuloCompact` runs a single sample whose VRF input is the `RelayVRFStory` and the sample count. Like
`RelayVRFModulo`, it introduces multiple core assignments for tranche zero. It computes the VRF output with
`schnorrkel::vrf::VRFInOut::make_bytes` using the context "A&V Core v2" and samples up to 160 bytes of the output
as an array of `u32`. It then reduces each `u32` modulo the number of availability cores, and outputs up
to `relay_vrf_modulo_samples` availability core indices.
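The compact variant can be sketched in the same style. This is a minimal sketch, assuming `vrf_bytes` stands in for the "A&V Core v2" output, that each `u32` is decoded little-endian, and that repeated cores are skipped; the deduplication is an assumption not stated in the text above.

```rust
/// Sketch of `RelayVRFModuloCompact` core selection from one VRF output.
/// Assumptions: little-endian `u32` decoding and skipping duplicate cores.
fn relay_vrf_modulo_compact_cores(vrf_bytes: &[u8], n_cores: u32, relay_vrf_modulo_samples: usize) -> Vec<u32> {
    let mut cores = Vec::new();
    if n_cores == 0 {
        return cores;
    }
    // Use at most 160 bytes, read as consecutive 4-byte `u32` values.
    let len = vrf_bytes.len().min(160);
    for chunk in vrf_bytes[..len].chunks_exact(4) {
        if cores.len() == relay_vrf_modulo_samples {
            break;
        }
        let core = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]) % n_cores;
        if !cores.contains(&core) {
            cores.push(core);
        }
    }
    cores
}
```

One VRF evaluation thus yields up to `relay_vrf_modulo_samples` cores, which is what lets a single compact assignment notice cover several tranche-zero assignments.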
There is no sampling process for `RelayVRFDelay` and `RelayEquivocation`. We instead run them on specific candidates
and they compute a delay from their VRF output. `RelayVRFDelay` runs for all candidates included under, aka declared
available by, a relay chain block, and inputs the associated VRF output via `RelayVRFStory`. `RelayEquivocation` runs
only on candidate block equivocations, and inputs their block hashes via the `RelayEquivocationStory`.
`RelayVRFDelay` and `RelayEquivocation` both compute their output with `schnorrkel::vrf::VRFInOut::make_bytes` using the
context "A&V Tranche" and reduce the result modulo `num_delay_tranches + zeroth_delay_tranche_width`, and consolidate
results 0 through `zeroth_delay_tranche_width` to be 0. In this way, they ensure the zeroth delay tranche has
`zeroth_delay_tranche_width+1` times as many assignments as any other tranche.
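The reduction and consolidation can be written out directly. This is a minimal sketch, assuming `vrf_bytes` stands in for 4 bytes from `make_bytes` with context "A&V Tranche", decoded little-endian; mapping `0..=zeroth_delay_tranche_width` to tranche zero by a saturating subtraction is one straightforward way to realize the consolidation described above.

```rust
/// Sketch of the delay computation shared by `RelayVRFDelay` and
/// `RelayEquivocation`. Panics if both parameters are zero (modulo by zero).
fn delay_tranche(vrf_bytes: [u8; 4], num_delay_tranches: u32, zeroth_delay_tranche_width: u32) -> u32 {
    // Reduce into `num_delay_tranches + zeroth_delay_tranche_width` buckets.
    let wide = u32::from_le_bytes(vrf_bytes) % (num_delay_tranches + zeroth_delay_tranche_width);
    // Buckets 0..=width collapse to tranche 0; later buckets shift down by `width`.
    wide.saturating_sub(zeroth_delay_tranche_width)
}
```

For example, with `num_delay_tranches = 4` and `zeroth_delay_tranche_width = 2`, the six residues map to tranches `0, 0, 0, 1, 2, 3`, so tranche zero receives `2 + 1 = 3` times the share of any other tranche.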
As future work (or TODO?), we should merge assignment notices with the same delay and story using `vrf_merge`. We
cannot merge those with the same delay and different stories because `RelayEquivocationStory`s could change but
`RelayVRFStory` never changes.
## Announcer and Watcher/Tracker
We track all validators' announced approval assignments for each candidate associated with each relay chain block, which
tells us which validators were assigned to which candidates.
We permit at most one assignment per candidate per story per validator, so one validator could be assigned under both
the `RelayVRFDelay` and `RelayEquivocation` criteria, but not under both `RelayVRFModulo/RelayVRFModuloCompact`
and `RelayVRFDelay` criteria, since those both use the same story. We permit only one approval vote per candidate per
validator, which counts for any applicable criteria.
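The "one assignment per candidate per story per validator" rule amounts to deduplicating on a `(validator, candidate, story)` key. This is a minimal sketch; `AssignmentTracker`, the `u32` validator index and the `u64` candidate hash are hypothetical simplifications.

```rust
use std::collections::HashSet;

/// The story that feeds a criterion's VRF input.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Story {
    RelayVRF,
    RelayEquivocation,
}

#[derive(Clone, Copy, Debug)]
enum Criterion {
    RelayVRFModulo,
    RelayVRFModuloCompact,
    RelayVRFDelay,
    RelayEquivocation,
}

impl Criterion {
    fn story(self) -> Story {
        match self {
            Criterion::RelayVRFModulo | Criterion::RelayVRFModuloCompact | Criterion::RelayVRFDelay => Story::RelayVRF,
            Criterion::RelayEquivocation => Story::RelayEquivocation,
        }
    }
}

/// Hypothetical tracker keyed by (validator, candidate, story).
#[derive(Default)]
struct AssignmentTracker {
    seen: HashSet<(u32, u64, Story)>,
}

impl AssignmentTracker {
    /// Returns false, ignoring the assignment, if this validator already holds
    /// an assignment for this candidate under the same story.
    fn import(&mut self, validator: u32, candidate: u64, criterion: Criterion) -> bool {
        self.seen.insert((validator, candidate, criterion.story()))
    }
}
```

A `RelayVRFDelay` assignment followed by a `RelayEquivocation` assignment for the same validator and candidate is accepted, while a later `RelayVRFModulo` assignment is rejected because it shares the `RelayVRFStory`.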
We announce, and start checking for, our own assignments when the delay of their tranche is reached, but only if the
tracker says the assignee candidate requires more approval checkers. We never announce an assignment we believe unnecessary
because early announcements give an adversary information. All delay tranche zero assignments always get announced,
which includes all `RelayVRFModulo` and `RelayVRFModuloCompact` assignments.
In other words, if some candidate `C` needs more approval checkers by the time we reach tranche `t`, then any validators
with an assignment to `C` in delay tranche `t` gossip their assignment notice for `C`, and begin reconstruction and
validation for `C`. If however `C` has already reached enough assignments, then validators with later assignments skip
announcing their assignments.
We continue until all candidates have enough approval checkers assigned. We never prioritize assignments within
tranches and count all or no assignments for a given tranche together, so we often overshoot the target number of
assigned approval checkers.
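The announcement decision described above fits in a few lines. This is a minimal sketch; `should_announce` and its parameters are hypothetical, with `assigned_so_far` counting the assignments from earlier tranches as seen by our local tracker.

```rust
/// Decide whether to announce our assignment for a candidate.
/// `assigned_so_far` counts assignments from tranches before `our_tranche`.
fn should_announce(our_tranche: u32, current_tranche: u32, assigned_so_far: usize, needed: usize) -> bool {
    // Never announce before our tranche's delay has elapsed.
    if current_tranche < our_tranche {
        return false;
    }
    // Tranche zero assignments are always announced.
    if our_tranche == 0 {
        return true;
    }
    // Later tranches announce only while the candidate still needs checkers.
    assigned_so_far < needed
}
```

Because every validator in tranche `t` sees the same count from earlier tranches, they all announce together or not at all, which is exactly why the whole-tranche counting tends to overshoot the target.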
### No shows
We have a "no show" timeout longer than one relay chain slot, so at least 6 seconds, during which we expect approval
checks should succeed in reconstructing the candidate block, in redoing its erasure coding to check the candidate
receipt, and finally in rechecking the candidate block itself.
We consider a validator a "no show" if they do not approve or dispute within this "no show" timeout from our receiving
their assignment notice. We time this from our receipt of their assignment notice instead of our imagined real time for
their tranche because otherwise receiving late assignment notices creates immediate "no shows" and unnecessary work.
We worry "no shows" represent a validator under denial of service attack, presumably to prevent it from reconstructing
the candidate, but perhaps delaying it from gossiping a dispute too. We therefore always replace "no shows" by adding
one entire extra delay tranche worth of validators, so such attacks always result in additional checkers.
As an example, imagine we need 20 checkers, but tranche zero produces only 14, and tranche one only 4, then we take all
5 from tranche two, and thus require 23 checkers for that candidate. If one checker Charlie from tranche one or two
does not respond within say 8 seconds, then we add all 7 checkers from tranche three. If again one checker Cindy from
tranche three does not respond within 8 seconds then we take all 3 checkers from tranche four. We now have 33 checkers
working on the candidate, so this escalated quickly.
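The arithmetic of this example can be reproduced with a simplified model. This is a minimal sketch that ignores timing and late approvals: `escalate` is a hypothetical helper that takes whole tranches until `needed` checkers are assigned, then pulls in one further whole tranche per "no show".

```rust
/// Simplified no-show escalation. Returns (tranches taken, checkers assigned).
fn escalate(tranche_sizes: &[usize], needed: usize, no_shows: usize) -> (usize, usize) {
    let mut taken = 0; // tranches taken so far
    let mut total = 0; // checkers assigned so far
    // Count tranches all-or-nothing, so we may overshoot `needed`.
    while taken < tranche_sizes.len() && total < needed {
        total += tranche_sizes[taken];
        taken += 1;
    }
    // Each no-show is replaced by an entire extra tranche.
    for _ in 0..no_shows {
        if taken == tranche_sizes.len() {
            break;
        }
        total += tranche_sizes[taken];
        taken += 1;
    }
    (taken, total)
}
```

With tranche sizes `[14, 4, 5, 7, 3]` and 20 checkers needed, this gives 23 checkers with no "no shows", 30 after Charlie, and 33 after Cindy, matching the example.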
We escalated so quickly because we worried that Charlie and Cindy might be the only honest checkers assigned to that
candidate. If therefore either Charlie or Cindy finally returns an approval, then we can conclude approval, and abandon
the checkers from tranche four.
We therefore require the "no show" timeout to be longer than a relay chain slot so that we can witness "no shows"
on-chain. We discuss below how this helps reward validators who replace "no shows".
We avoid slashing for "no shows" alone, although being a "no show" could feed into some computation that punishes
repeated poor performance, presumably replacing `ImOnline`; we could also reduce their rewards and further reward those
who filled in.
As future work, we foresee expanding the "no show" scheme to anonymize the additional checkers, for example by using
assignment notices with a new criterion that employs a ring VRF and then all validators providing cover by requesting a
couple of erasure coded pieces, but such anonymity schemes sound extremely complex and lie far beyond our initial
functionality.
## Assignment postponement
We expect validators could occasionally become overloaded when they randomly acquire too many assignments. All these
fluctuations amortize over multiple blocks fairly well, but this slows down finality.
We therefore permit validators to intentionally delay sending their assignment notices. If nobody knows about their
assignment then they avoid creating "no shows" and the workload progresses normally.
We strongly prefer if postponements come from tranches higher aka less important than zero because tranche zero checks
provide somewhat more security.
TODO: When? Is this optimal for the network? etc.
## Approval coalescing
To reduce network bandwidth and CPU time when a validator has more than one candidate to approve, we make a best effort
to send a single message that approves all available candidates with a single signature.
The implemented heuristic is that each time we are ready to create a signature and send a vote for a candidate, we
delay sending it until one of three things happens:
- We gathered a maximum of `MAX_APPROVAL_COALESCE_COUNT` candidates that we have already checked and we are
ready to sign approval for.
- `MAX_APPROVAL_COALESCE_WAIT_TICKS` have passed since we checked the oldest candidate and became ready to sign
and send the approval message.
- We are already in the last third of the no-show period in order to avoid creating accidental no-shows, which in
turn might trigger other assignments.
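The three flush conditions can be combined into a single check. This is a minimal sketch: the constant values below are arbitrary placeholders for illustration, not the implementation's settings, and `should_send_coalesced` with its tick-based parameters is hypothetical.

```rust
/// Placeholder values for illustration only.
const MAX_APPROVAL_COALESCE_COUNT: usize = 6;
const MAX_APPROVAL_COALESCE_WAIT_TICKS: u64 = 2;

/// Decide whether to sign and send the pending batch of approvals now.
/// `oldest_ready_tick`: when the oldest pending candidate became ready to sign.
/// `no_show_start`/`no_show_end`: the no-show window of the most urgent candidate.
fn should_send_coalesced(pending: usize, oldest_ready_tick: u64, now: u64, no_show_start: u64, no_show_end: u64) -> bool {
    if pending == 0 {
        return false;
    }
    // 1. The batch is full.
    if pending >= MAX_APPROVAL_COALESCE_COUNT {
        return true;
    }
    // 2. The oldest pending approval has waited long enough.
    if now.saturating_sub(oldest_ready_tick) >= MAX_APPROVAL_COALESCE_WAIT_TICKS {
        return true;
    }
    // 3. We are in the last third of the no-show period.
    let period = no_show_end.saturating_sub(no_show_start);
    now >= no_show_end.saturating_sub(period / 3)
}
```

Whichever condition fires first flushes the whole batch under one signature; the third condition guards against the coalescing delay itself producing a "no show".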
## On-chain verification
We should verify approval on-chain to reward approval checkers. We therefore require the "no show" timeout to be longer
than a relay chain slot so that we can witness "no shows" on-chain, which helps with this goal. The major challenge with
an on-chain record of the off-chain process is adversarial block producers who may either censor votes or publish votes
to the chain which cause other votes to be ignored and unrewarded (reward stealing).
In principle, all validators have some "tranche" at which they're assigned to the teyrchain candidate, which ensures we
reach enough validators eventually. As noted above, we often retract "no shows" when the slow validator eventually
shows up, so witnessing their initially being a "no show" helps manage rewards.
We expect on-chain verification should work in two phases: We first record assignment notices and approval votes
on-chain in relay chain blocks, doing the VRF or regular signature verification again in block verification, and
inserting chain authenticated unsigned notes into the relay chain state that contain the checker, tranche, paraid, and
relay block height for each assignment notice. We then later have another relay chain block that runs some "approved"
intrinsic, which extracts all these notes from the state and feeds them into our approval code.
We now encounter one niche concern in the interaction between postponement and on-chain verification: Any validator
with a tranche zero (or other low) assignment could delay sending an assignment notice, for example because they
postponed their assigned tranche (which is allowed). If they later send this assignment notice right around finality
time, then they race with this "approved" intrinsic: If their announcement gets on-chain (also allowed), then yes it
delays finality. If it does not get on-chain, then we have one announcement that the off-chain consensus system says is
valid, but the chain ignores for being too slow.
We need the chain to win in this case, but doing this requires imposing an annoyingly long overarching delay upon
finality. We might explore limits on postponement too, but this sounds much harder.
## Parameters
We prefer doing approval checker assignments under `RelayVRFModulo` or `RelayVRFModuloCompact` as opposed to
`RelayVRFDelay` because `RelayVRFModulo` avoids giving individual checkers too many assignments and tranche zero
assignments benefit security the most. We suggest assigning at least 16 checkers under `RelayVRFModulo` or
`RelayVRFModuloCompact`, although assignment levels have never been properly analyzed.
Our delay criteria `RelayVRFDelay` and `RelayEquivocation` both have two primary parameters, expected checkers per
tranche and the zeroth delay tranche width.
We require expected checkers per tranche to be less than three because otherwise an adversary with 1/3 stake could force
all nodes into checking all blocks. We strongly recommend expected checkers per tranche to be less than two, which
helps avoid both accidental and intentional explosions. We also suggest expected checkers per tranche be larger than
one, which helps prevent adversaries from predicting that advancing one tranche adds only their own validators.
We improve security more with tranche zero assignments, so `RelayEquivocation` should consolidate its first several
tranches into tranche zero. We describe this as the zeroth delay tranche width, which initially we set to 12 for
`RelayEquivocation` and 1 for `RelayVRFDelay`.
## Why VRFs?
We do assignments with VRFs to give "enough" checkers some meaning beyond merely "expected" checkers:
We could specify a protocol that used only system randomness, which works because our strongest defense is the expected
number of honest checkers who assign themselves. In this, adversaries could trivially flood their own blocks with their
own checkers, so this strong defense becomes our only defense, and delay tranches become useless, so some blocks
actually have zero approval checkers and possibly only one checker overall.
VRFs though require adversaries to wait far longer between such attacks, which also helps against adversaries with little
at stake because they compromised validators. VRFs raise user confidence that no such "drive by" attacks occurred
because the delay tranche system ensures at least some minimum number of approval checkers. In this vein, VRFs permit
reducing backing checks and increasing approval checks, which makes Pezkuwi more efficient.
## Gossip
Any validator could send their assignment notices and/or approval votes too early. We gossip the approval votes because
they represent a major commitment by the validator. We retain but delay gossiping the assignment notices until they
agree with our local clock.
Assignment notices being gossiped too early might create a denial of service vector. If so, we might exploit the
relative time scheme that synchronizes our clocks, which conceivably permits just dropping excessively early
assignments.
## Finality GRANDPA Voting Rule
The relay-chain requires validators to participate in GRANDPA. In GRANDPA, validators submit off-chain votes on what
they believe to be the best block of the chain, and GRANDPA determines the common block contained by a supermajority of
sub-chains. There are also additional constraints on what can be submitted based on results of previous rounds of
voting.
In order to avoid finalizing anything which has not received enough approval votes or is disputed, we will pair the
approval protocol with an alteration to the GRANDPA voting strategy for honest nodes which causes them to vote only on
chains where every teyrchain candidate within has been approved. Furthermore, the voting rule prevents voting for
chains where there is any live dispute or any dispute has resolved to a candidate being invalid.
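Applied to a single chain, the voting rule restricts an honest node's vote to the longest acceptable prefix above the finalized block. This is a minimal sketch; `BlockInfo` and `votable_prefix_len` are hypothetical summaries, not the node's actual types.

```rust
/// Hypothetical per-block summary used by the voting rule sketch.
struct BlockInfo {
    all_candidates_approved: bool,
    has_live_dispute: bool,
    has_invalid_dispute: bool,
}

/// Given the unfinalized chain, ordered from just above the finalized block
/// up to the best leaf, return how many blocks an honest node may vote for.
fn votable_prefix_len(chain: &[BlockInfo]) -> usize {
    chain
        .iter()
        // Stop at the first block that is unapproved or touched by a
        // live or invalid-concluded dispute; its descendants are excluded too.
        .take_while(|b| b.all_candidates_approved && !b.has_live_dispute && !b.has_invalid_dispute)
        .count()
}
```

The vote then targets the last block of that prefix, or the finalized block itself when the prefix is empty.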
Thus, the finalized relay-chain should contain only relay-chain blocks where a majority believe that every block within
has been sufficiently approved.
### Future work
We could consider additional gossip messages with which nodes claim "slow availability" and/or "slow candidate" to fine
tune the assignments "no show" system, but long enough "no show" delays probably suffice.
We shall develop more practical experience with UDP once the availability system works using direct UDP connections. In
this, we should discover if reconstruction performs adequately with a complete graph or benefits from topology
restrictions. At this point, an assignment notice could implicitly request pieces from a random 1/3rd, perhaps
topology restricted, which saves one gossip round. If this preliminary fast reconstruction fails, then nodes request
alternative pieces directly. There is an interesting design space in how this overlaps with "slow availability" claims.
# Chain Selection
Chain selection processes in blockchains are used for the purpose of selecting blocks to build on and finalize. It is
important for these processes to be consistent among nodes and resilient to a maximum proportion of malicious nodes
which do not obey the chain selection process.
The teyrchain host uses both a block authoring system and a finality gadget. The chain selection strategy of the
teyrchain host involves two key components: a _leaf-selection_ rule and a set of _finality constraints_. When it's a
validator's turn to author a block, they are expected to select the best block via the leaf-selection rule to build
on top of. When a validator is participating in finality, there is a minimum block which can be voted on, which is
usually the finalized block. The validator should select the best chain according to the leaf-selection rule and
subsequently apply the finality constraints to arrive at the actual vote cast by that validator.
Before diving into the particularities of the leaf-selection rule and the finality constraints, it's important to
discuss the goals that these components are meant to achieve. For this it is useful to create the definitions of
_viable_ and _finalizable_ blocks.
## Property Definitions
A block is considered **viable** when all of the following hold:
1. It is or descends from the finalized block
2. It is not **stagnant**
3. It is not **reverted**.
A block is considered a **viable leaf** when all of the following hold:
1. It is **viable**
2. It has no **viable** descendant.
A block is considered **stagnant** when either:
1. It is unfinalized, is not approved, and has not been approved within 2 minutes
2. Its parent is **stagnant**.
A block is considered **reverted** when either:
1. It is unfinalized and includes a candidate which has lost a dispute
2. Its parent is **reverted**
A block is considered **finalizable** when all of the following hold:
1. It is **viable**
2. Its parent, if unfinalized, is **finalizable**.
3. It is either finalized or approved.
4. It is either finalized or includes no candidates which have unresolved disputes or have lost a dispute.
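The recursive definitions above translate directly into predicates over a block tree. This is a minimal sketch under stated assumptions: `Block` is a hypothetical flattened record whose `parent` indexes into the same slice, every block is assumed to be or descend from the finalized block, and `approval_timed_out` stands in for "has not been approved within 2 minutes".

```rust
/// Hypothetical per-block facts; `parent` is `None` at the tree root.
struct Block {
    parent: Option<usize>,
    finalized: bool,
    approved: bool,
    /// Stand-in for "has not been approved within 2 minutes".
    approval_timed_out: bool,
    includes_lost_dispute: bool,
    includes_unresolved_dispute: bool,
}

fn is_stagnant(blocks: &[Block], i: usize) -> bool {
    let b = &blocks[i];
    (!b.finalized && !b.approved && b.approval_timed_out)
        || b.parent.map_or(false, |p| is_stagnant(blocks, p))
}

fn is_reverted(blocks: &[Block], i: usize) -> bool {
    let b = &blocks[i];
    (!b.finalized && b.includes_lost_dispute) || b.parent.map_or(false, |p| is_reverted(blocks, p))
}

/// Assumes every block in `blocks` is, or descends from, the finalized block.
fn is_viable(blocks: &[Block], i: usize) -> bool {
    !is_stagnant(blocks, i) && !is_reverted(blocks, i)
}

fn is_finalizable(blocks: &[Block], i: usize) -> bool {
    let b = &blocks[i];
    // Only an unfinalized parent must itself be finalizable.
    let parent_ok = match b.parent {
        Some(p) if !blocks[p].finalized => is_finalizable(blocks, p),
        _ => true,
    };
    is_viable(blocks, i)
        && parent_ok
        && (b.finalized || b.approved)
        && (b.finalized || (!b.includes_unresolved_dispute && !b.includes_lost_dispute))
}
```

A real implementation would cache these flags per block rather than recomputing them along the ancestry on every query.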
## The leaf-selection rule
We assume that every block has an implicit weight or score which can be used to compare blocks. In BABE, this is
determined by the number of primary slots included in the chain. In PoW, this is the chain with either the most work or
GHOST weight.
The leaf-selection rule based on our definitions above is simple: we take the maximum-scoring viable leaf we are aware
of. In the case of a tie we select the one with a lower lexicographical block hash.
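The leaf-selection rule is a maximum under a compound ordering. This is a minimal sketch, assuming a hypothetical `Leaf` summary with a numeric score (for example the primary-slot count under BABE) and a 32-byte hash compared byte-wise.

```rust
/// Hypothetical viable-leaf summary.
#[derive(Clone, Debug, PartialEq)]
struct Leaf {
    score: u64,
    hash: [u8; 32],
}

/// Pick the highest-scoring viable leaf; on a tie, prefer the
/// lexicographically lower hash (hence the reversed hash comparison).
fn select_leaf(viable_leaves: &[Leaf]) -> Option<&Leaf> {
    viable_leaves
        .iter()
        .max_by(|a, b| a.score.cmp(&b.score).then_with(|| b.hash.cmp(&a.hash)))
}
```

Breaking ties on the hash makes the choice deterministic, so honest nodes with the same view of the viable leaves build on the same block.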
## The best-chain-containing rule
Finality gadgets, as mentioned above, will often impose an additional requirement to vote on a chain containing a
specific block, known as the **required** block. Although this is typically the most recently finalized block, it is
possible that it may be a block that is unfinalized. When receiving such a request:
1. If the required block is the best finalized block, then select the best viable leaf.
2. If the required block is unfinalized and non-viable, then select the required block and go no further. This is likely
an indication that something bad will be finalized in the network, which will never happen when approvals & disputes
are functioning correctly. Nevertheless we account for the case here.
3. If the required block is unfinalized and viable, then iterate over the viable leaves in descending order by score and
select the first one which contains the required block in its chain. Backwards iteration is a simple way to check
this, but if unfinalized chains grow long then Merkle Mountain-Ranges will most likely be more efficient.
Once selecting a leaf, the chain should be constrained to the maximum of the required block or the highest
**finalizable** ancestor.
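Steps 1 to 3 can be sketched with a parent map and the simple backwards iteration mentioned above. This is a minimal sketch: `Tree`, the `u64` block hashes, and the caller-supplied `required_is_best_finalized` and `required_is_viable` flags are hypothetical, and the final constraint to the highest finalizable ancestor is left out.

```rust
use std::collections::HashMap;

/// Hypothetical block tree: child hash -> parent hash.
struct Tree {
    parent: HashMap<u64, u64>,
}

impl Tree {
    /// Walk backwards from `leaf` looking for `required`.
    fn contains(&self, leaf: u64, required: u64) -> bool {
        let mut cur = leaf;
        loop {
            if cur == required {
                return true;
            }
            match self.parent.get(&cur) {
                Some(&p) => cur = p,
                None => return false,
            }
        }
    }
}

/// `leaves_by_score` must already be sorted by descending score.
fn select_target(
    tree: &Tree,
    leaves_by_score: &[u64],
    required: u64,
    required_is_best_finalized: bool,
    required_is_viable: bool,
) -> Option<u64> {
    // Step 1: the required block is the best finalized block.
    if required_is_best_finalized {
        return leaves_by_score.first().copied();
    }
    // Step 2: an unfinalized, non-viable required block is selected as-is.
    if !required_is_viable {
        return Some(required);
    }
    // Step 3: the best viable leaf whose chain contains the required block.
    leaves_by_score.iter().copied().find(|&leaf| tree.contains(leaf, required))
}
```

Each call to `contains` costs time linear in the unfinalized chain length, which is the cost that Merkle Mountain Ranges would reduce.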
# Disputes
Fast forward to [more detailed disputes requirements](./disputes-flow.md).
## Motivation and Background
All teyrchain blocks that end up in the finalized relay chain should be valid. This does not apply to blocks that are
only backed, but not included.
We have two primary components for ensuring that nothing invalid ends up in the finalized relay chain:
* Approval Checking, as described [here](./protocol-approval.md) and implemented accordingly in the [Approval
Voting](node/approval/approval-voting.md) subsystem. This protocol can be shown to prevent invalid teyrchain blocks
from making their way into the finalized relay chain as long as the number of attempts is limited.
* Disputes, this protocol, which ensures that each attempt to include something bad is caught, and the offending
validators are punished. Disputes differ from the backing and approval processes (and cannot be part of them) in that a
dispute is independent of a particular fork, while both backing and approval operate on particular forks. This
distinction is important! Approval voting stops if an alternative fork which might not contain the currently approved
candidate gets finalized. This is totally fine from the perspective of approval voting as its sole purpose is to make
sure invalid blocks won't get finalized. For disputes, on the other hand, we have different requirements: even though the
"danger" is past and the adversaries were not able to get their invalid block approved, we still want them to get
slashed for the attempt. Otherwise they would have gotten a free try, which is something we need to avoid in
our security model, as it is based on the assumption that the probability of getting an invalid block finalized is very
low and an attacker would go bankrupt before it could try often enough.
Every dispute stems from a disagreement among two or more validators. If a bad actor creates a bad block, but the bad
actor never distributes it to honest validators, then nobody will dispute it. Of course, such a situation is not even an
attack on the network, so we don't need to worry about defending against it.
We are interested in identifying and deterring the following attack scenario:
* A parablock included on a branch of the relay chain is bad
We are also interested in identifying these additional scenarios:
* A parablock backed on a branch of the relay chain is bad
* A parablock seconded, but not backed on any branch of the relay chain, is bad.
Punishing misbehavior in the latter two scenarios doesn't affect our security guarantees and introduces substantial
technical challenges as described in the `No Disputes for Non Included Candidates` section of [Dispute
Coordinator](./node/disputes/dispute-coordinator.md). We therefore choose to punt on disputes in these cases, instead
favoring the protocol simplicity resulting from only punishing in the first scenario.
As covered in the [protocol overview](./protocol-overview.md), checking a teyrchain block requires 3 pieces of data: the
teyrchain validation code, the [`AvailableData`](types/availability.md), and the
[`CandidateReceipt`](types/candidate.md). The validation code is available on-chain, and published ahead of time, so
that no two branches of the relay chain have diverging views of the validation code for a given teyrchain. Note that
only for the first scenario, where the parablock has been included on a branch of the relay chain, is the data
necessarily available. Thus, dispute processes should begin with an availability process to ensure availability of the
`AvailableData`. This availability process will conclude quickly if the data is already available. If the data is not
already available, then the initiator of the dispute must make it available.
Disputes have both an on-chain and an off-chain component. Slashing and punishment is handled on-chain, so votes by
validators on either side of the dispute must be placed on-chain. Furthermore, a dispute on one branch of the relay
chain should be transposed to all other active branches of the relay chain. The fact that slashing occurs _in all
histories_ is crucial for deterring attempts to attack the network. The attacker should not be able to escape with their
funds because the network has moved on to another branch of the relay chain where no attack was attempted.
In fact, this is why we introduce a distinction between _local_ and _remote_ disputes. We categorize disputes as either
local or remote relative to any particular branch of the relay chain. Local disputes are about dealing with our first
scenario, where a parablock has been included on the specific branch we are looking at. In these cases, the chain is
corrupted all the way back to the point where the parablock was backed and must be discarded. However, as mentioned
before, the dispute must propagate to all other branches of the relay chain. All other disputes are considered _remote_.
For the on-chain component, when handling a dispute for a block which was not included in the current fork of the relay
chain, it is impossible to discern between our attack scenarios. It is possible that the parablock was included
somewhere, or backed somewhere, or wasn't backed anywhere. The on-chain component for handling these cases will be the
same.
## Initiation
Disputes are initiated by any validator who finds their opinion on the validity of a parablock in opposition to another
issued statement. As all statements currently gathered by the relay chain imply validity, disputes will be initiated
only by nodes which perceive that the parablock is bad.
The initiation of a dispute begins off-chain. A validator signs a message indicating that it disputes the validity of
the parablock and notifies all other validators, off-chain, of all of the statements it is aware of for the disputed
parablock. These may be backing statements or approval-checking statements. It is worth noting that there is no special
message type for initiating a dispute. It is the same message as is used to participate in a dispute and vote
negatively. As such, there is no consensus required on who initiated a dispute, only on the fact that there is a dispute
in-progress.
In practice, the initiator of a dispute will be either one of the backers or one of the approval checkers for the
parablock. If the result of execution is found to be invalid, the validator will initiate the dispute as described
above. Furthermore, if the dispute occurs during the backing phase, the initiator must make the data available to other
validators. If the dispute occurs during approval checking, the data is already available.
Lastly, for backing disputes, i.e. where the data is not already available among all validators, an adversary may DoS
the few parties who are checking the block to prevent them from distributing the data to other validators participating
in the dispute process. Note that this can only occur pre-inclusion for any given parablock, so the downside of this
attack is small and it is not security-critical to address these cases. However, we assume that the adversary can only
prevent the validator from issuing messages for a limited amount of time. We also assume that there is a side-channel
where the relay chain's governance mechanisms can trigger disputes by providing the full PoV and candidate receipt
on-chain manually.
## Dispute Participation
Once becoming aware of a dispute, it is the responsibility of all validators to participate in the dispute. Concretely,
this means:
* Circulate all statements about the candidate that we are aware of - backing statements, approval checking
statements, and dispute statements.
* If we have already issued any type of statement about the candidate, go no further.
* Download the [`AvailableData`](types/availability.md). If possible, this should first be attempted from other
dispute participants or backing validators, and then [(via
erasure-coding)](node/availability/availability-recovery.md) from all validators.
* Extract the Validation Code from any recent relay chain block. Code is guaranteed to be kept available on-chain, so
we don't need to download any particular fork of the chain.
* Execute the block under the validation code, using the `AvailableData`, and check that all outputs are correct,
including the `erasure-root` of the [`CandidateReceipt`](types/candidate.md).
* Issue a dispute participation statement to the effect of the validity of the candidate block.
Disputes _conclude_ after ⅔ supermajority is reached in either direction.
The on-chain component of disputes can be initiated by providing any two conflicting votes and it also waits for a ⅔
supermajority on either side. The on-chain component also tracks which parablocks have already been disputed so the same
parablock may only be disputed once on any particular branch of the relay chain. Lastly, it also tracks which blocks
have been included on the current branch of the relay chain. When a dispute is initiated for a para, inclusion is halted
for the para until the dispute concludes.
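The conclusion check can be sketched as a threshold comparison. This is a minimal sketch that assumes one common BFT formulation of a ⅔ supermajority, `n - (n - 1) / 3` votes for `n` validators; the exact threshold and the names `supermajority_threshold`, `DisputeState` and `dispute_state` are assumptions for illustration, not taken from this document.

```rust
/// Assumed supermajority threshold: tolerate f = (n - 1) / 3 faulty validators.
fn supermajority_threshold(n_validators: usize) -> usize {
    n_validators - n_validators.saturating_sub(1) / 3
}

#[derive(Debug, PartialEq)]
enum DisputeState {
    Active,
    ConcludedValid,
    ConcludedInvalid,
}

/// A dispute concludes once either side reaches the supermajority.
fn dispute_state(valid_votes: usize, invalid_votes: usize, n_validators: usize) -> DisputeState {
    let threshold = supermajority_threshold(n_validators);
    if valid_votes >= threshold {
        DisputeState::ConcludedValid
    } else if invalid_votes >= threshold {
        DisputeState::ConcludedInvalid
    } else {
        DisputeState::Active
    }
}
```

For 10 validators this threshold is 7 votes, and for 100 validators it is 67, so at most one side can ever reach it.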
The author of a relay chain block should initiate the on-chain component of disputes for all disputes which the chain is
not aware of, and provide all statements to the on-chain component as well. This should all be done via _inherents_.
Validators can learn about dispute statements in two ways:
* Receiving them from other validators over gossip
* Scraping them from imported blocks of the relay chain. This is also used for validators to track other types of
statements, such as backing statements.
Validators are rewarded for providing statements to the chain as well as for participating in the dispute, on either
side. However, the losing side of the dispute is slashed.
## Dispute Conclusion
Disputes, roughly, are over when one side reaches a ⅔ supermajority. They may also never conclude, with neither side
reaching a supermajority, which will only happen if the majority of validators are unable to vote for some reason.
Furthermore, disputes on-chain will stay open for some fixed amount of time even after concluding, to accept new votes.
Late votes, after the dispute already reached a ⅔ supermajority, must be rewarded (albeit a smaller amount) as well.
# Protocol Overview
This section aims to describe, at a high level, the actors and protocols involved in running teyrchains in Pezkuwi.
Specifically, we describe how different actors communicate with each other, what data structures they keep both
individually and collectively, and the high-level reasons why they do these things.
Our top-level goal is to carry a teyrchain block from authoring to secure inclusion, and define a process which can be
carried out repeatedly and in parallel for many different teyrchains to extend them over time. Understanding of the
high-level approach taken here is important to provide context for the proposed architecture further on. The key parts
of Pezkuwi relevant to this are the main Pezkuwi blockchain, known as the relay-chain, and the actors which provide
security and inputs to this blockchain.
First, it's important to go over the main actors we have involved in this protocol.
1. Validators. These nodes are responsible for validating proposed teyrchain blocks. They do so by checking a
Proof-of-Validity (PoV) of the block and ensuring that the PoV remains available. They put financial capital down as
"skin in the game" which can be slashed (destroyed) if they are proven to have misvalidated.
1. Collators. These nodes are responsible for creating the Proofs-of-Validity that validators know how to check.
Creating a PoV typically requires familiarity with the transaction format and block authoring rules of the teyrchain,
as well as having access to the full state of the teyrchain.
This implies a simple pipeline where collators send validators teyrchain blocks and their requisite PoV to check. Then,
validators validate the block using the PoV, signing statements which describe either the positive or negative outcome,
and with enough positive statements, the block can be noted on the relay-chain. Negative statements are not a veto but
will lead to a dispute, with those on the wrong side being slashed. If another validator later detects that a validator
or group of validators incorrectly signed a statement claiming a block was valid, then those validators will be
_slashed_, with the checker receiving a bounty.
However, there is a problem with this formulation. In order for another validator to check the previous group of
validators' work after the fact, the PoV must remain _available_ so the other validator can fetch it in order to check
the work. The PoVs are expected to be too large to include in the blockchain directly, so we require an alternate _data
availability_ scheme which requires validators to prove that the inputs to their work will remain available, and so
their work can be checked. Empirical tests tell us that many PoVs may be between 1 and 10MB during periods of heavy
load.
Here is a description of the Inclusion Pipeline: the path a teyrchain block (or parablock, for short) takes from
creation to inclusion:
1. Validators are selected and assigned to teyrchains by the Validator Assignment routine.
1. A collator produces the teyrchain block, which is known as a teyrchain candidate or candidate, along with a PoV for
the candidate.
1. The collator forwards the candidate and PoV to validators assigned to the same teyrchain via the [Collator
Protocol](node/collators/collator-protocol.md).
1. The validators assigned to a teyrchain at a given point in time participate in the [Candidate Backing
subsystem](node/backing/candidate-backing.md) to validate candidates that were put forward for validation. Candidates
which gather enough signed validity statements from validators are considered "backable". Their backing is the set of
signed validity statements.
1. A relay-chain block author, selected by BABE, can note up to one (1) backable candidate for each teyrchain to include
in the relay-chain block alongside its backing. A backable candidate once included in the relay-chain is considered
backed in that fork of the relay-chain.
1. Once backed in the relay-chain, the teyrchain candidate is considered to be "pending availability". It is not
considered to be included as part of the teyrchain until it is proven available.
1. In the following relay-chain blocks, validators will participate in the [Availability Distribution
subsystem](node/availability/availability-distribution.md) to ensure availability of the candidate. Information
regarding the availability of the candidate will be noted in the subsequent relay-chain blocks.
1. Once the relay-chain state machine has enough information to consider the candidate's PoV as being available, the
candidate is considered to be part of the teyrchain and is graduated to being a full teyrchain block, or parablock
for short.
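Step 5 limits a relay-chain block author to at most one backable candidate per teyrchain. The sketch below illustrates that rule. The types `ParaId` and `BackableCandidate` and the function `select_for_relay_block` are hypothetical, and the tie-breaking policy (prefer the candidate with the most validity votes) is an assumption, since this section does not say how an author chooses between competing candidates.

```rust
use std::collections::BTreeMap;

/// Hypothetical teyrchain identifier.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct ParaId(pub u32);

/// Hypothetical summary of a backable candidate as seen by a block author.
#[derive(Clone, Debug, PartialEq)]
pub struct BackableCandidate {
    pub para_id: ParaId,
    pub candidate_hash: [u8; 32],
    /// Number of signed validity statements in the backing.
    pub validity_votes: usize,
}

/// Picks at most one backable candidate per teyrchain for a relay-chain block.
/// Preferring the most-voted candidate is an illustrative policy only.
pub fn select_for_relay_block(backable: &[BackableCandidate]) -> Vec<BackableCandidate> {
    let mut chosen: BTreeMap<ParaId, &BackableCandidate> = BTreeMap::new();
    for candidate in backable {
        chosen
            .entry(candidate.para_id)
            .and_modify(|best| {
                if candidate.validity_votes > best.validity_votes {
                    *best = candidate;
                }
            })
            .or_insert(candidate);
    }
    // One entry per teyrchain, ordered by ParaId.
    chosen.into_values().cloned().collect()
}
```

Using a map keyed by teyrchain makes the "at most one per teyrchain" invariant structural rather than something checked afterwards.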
Note that the candidate can fail to be included in any of the following ways:
- The collator is not able to propagate the candidate to any validators assigned to the teyrchain.
- The candidate is not backed by validators participating in the Candidate Backing Subsystem.
- The candidate is not selected by a relay-chain block author to be included in the relay chain.
- The candidate's PoV is not considered as available within a timeout and is discarded from the relay chain.
This process can be broken down further. Steps 2 & 3 relate to the work of the collator in collating and distributing
the candidate to validators via the Collation Distribution Subsystem. Steps 4 & 5 relate to the work of the validators
in the Candidate Backing Subsystem and the block author (itself a validator) to include the block into the relay chain.
Steps 6, 7, and 8 correspond to the logic of the relay-chain state-machine (otherwise known as the Runtime) used to
fully incorporate the block into the chain. Step 7 requires further work on the validators' parts to participate in the
Availability Distribution Subsystem and include that information into the relay chain for step 8 to be fully realized.
This brings us to the second part of the process. Once a parablock is considered available and part of the teyrchain, it
is still "pending approval". At this stage in the pipeline, the parablock has been backed by a majority of validators in
the group assigned to that teyrchain, and its data has been guaranteed available by the set of validators as a whole.
Once it's considered available, the host will even begin to accept children of that block. At this point, we can
consider the parablock as having been tentatively included in the teyrchain, although more confirmations are desired.
However, the validators in the teyrchain-group (known as the "Teyrchain Validators" for that teyrchain) are sampled from
a validator set which contains some proportion of byzantine, or arbitrarily malicious members. This implies that the
Teyrchain Validators for some teyrchain may be majority-dishonest, which means that (secondary) approval checks must be
done on the block before it can be considered approved. This is necessary only because the Teyrchain Validators for a
given teyrchain are sampled from an overall validator set which is assumed to be less than 1/3 dishonest, meaning that
there is a chance to randomly sample Teyrchain Validators for a teyrchain that are majority or fully dishonest and can
back a candidate wrongly. The Approval Process allows us to detect such misbehavior after the fact without allocating
more Teyrchain Validators, which would reduce the throughput of the system. A parablock's failure to pass the approval process
will invalidate the block as well as all of its descendants. However, only the validators who backed the block in
question will be slashed, not the validators who backed the descendants.
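To make the sampling risk concrete, the sketch below computes the probability that a randomly sampled group is majority or fully dishonest. It assumes each group member is independently dishonest with probability `p` (a binomial approximation of sampling without replacement from the full validator set). The group size used in the usage note is hypothetical, since this section does not fix one.

```rust
/// Binomial coefficient C(n, k) as an f64 (requires k <= n).
fn binomial(n: u32, k: u32) -> f64 {
    (0..k).fold(1.0, |acc, i| acc * f64::from(n - i) / f64::from(i + 1))
}

/// Probability that strictly more than half of a group of `group_size`
/// validators is dishonest, if each member is independently dishonest with
/// probability `p_dishonest` (binomial approximation).
pub fn prob_majority_dishonest(group_size: u32, p_dishonest: f64) -> f64 {
    let p_honest = 1.0 - p_dishonest;
    (group_size / 2 + 1..=group_size)
        .map(|j| {
            binomial(group_size, j)
                * p_dishonest.powi(j as i32)
                * p_honest.powi((group_size - j) as i32)
        })
        .sum()
}

/// Probability that every member of the group is dishonest.
pub fn prob_fully_dishonest(group_size: u32, p_dishonest: f64) -> f64 {
    p_dishonest.powi(group_size as i32)
}
```

With a hypothetical group of 5 and `p = 1/3`, a majority-dishonest group occurs with probability 51/243 (about 0.21) and a fully dishonest one with probability 1/243. Probabilities of that size are why a separate approval process is needed rather than trusting the backing group alone.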
The Approval Process, at a glance, looks like this:
1. Parablocks that have been included by the Inclusion Pipeline are pending approval for a time-window known as the
secondary checking window.
1. During the secondary-checking window, validators randomly self-select to perform secondary checks on the parablock.
1. These validators, known in this context as secondary checkers, acquire the parablock and its PoV, and re-run the
validation function.
1. The secondary checkers gossip the result of their checks. Contradictory results lead to escalation, where all
validators are required to check the block. The validators on the losing side of the dispute are slashed.
1. At the end of the Approval Process, the parablock is either Approved or it is rejected. More on the rejection process
later.
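The decision at the end of the steps above can be sketched as a small function over the secondary checkers' results. This is an illustrative simplification: `CheckResult`, `ApprovalOutcome`, `evaluate` and the `required_approvals` parameter are hypothetical, and no-shows and timing are ignored. Any `Invalid` result is treated as contradictory, because it disagrees with the backers' validity statements.

```rust
/// Result of one secondary checker re-running the validation function.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CheckResult {
    Valid,
    Invalid,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ApprovalOutcome {
    /// Not enough secondary checks have arrived yet.
    Pending,
    /// Enough checkers found the parablock valid and none disagreed.
    Approved,
    /// Contradictory results: escalate so that all validators check the block.
    Escalate,
}

pub fn evaluate(results: &[CheckResult], required_approvals: usize) -> ApprovalOutcome {
    // An invalid result contradicts the backing, which asserted validity.
    if results.contains(&CheckResult::Invalid) {
        return ApprovalOutcome::Escalate;
    }
    // At this point every result is Valid.
    if results.len() >= required_approvals {
        ApprovalOutcome::Approved
    } else {
        ApprovalOutcome::Pending
    }
}
```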
More information on the Approval Process can be found in the dedicated section on [Approval](protocol-approval.md). More
information on Disputes can be found in the dedicated section on [Disputes](protocol-disputes.md).
These two pipelines sum up the sequence of events necessary to extend and acquire full security on a Parablock. Note
that the Inclusion Pipeline must conclude for a specific teyrchain before a new block can be accepted on that teyrchain.
After inclusion, the Approval Process kicks off, and can be running for many teyrchain blocks at once.
Reiterating the lifecycle of a candidate:
1. Candidate: put forward by a collator to a validator.
1. Seconded: put forward by a validator to other validators.
1. Backable: validity attested to by a majority of assigned validators.
1. Backed: Backable & noted in a fork of the relay-chain.
1. Pending availability: Backed but not yet considered available.
1. Included: Backed and considered available.
1. Accepted: Backed, available, and undisputed.
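The lifecycle above can be read as a linear happy-path state machine. The enum below is a hypothetical illustration only; it lists Backed and Pending availability as separate steps to mirror the list, even though a candidate becomes pending availability as soon as it is backed in a relay-chain fork, and it omits the failure paths described earlier.

```rust
/// Hypothetical happy-path states of a teyrchain candidate.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CandidateState {
    Candidate,
    Seconded,
    Backable,
    Backed,
    PendingAvailability,
    Included,
    Accepted,
}

impl CandidateState {
    /// The next state on the happy path, or `None` once the candidate is accepted.
    pub fn next(self) -> Option<CandidateState> {
        use CandidateState::*;
        match self {
            Candidate => Some(Seconded),
            Seconded => Some(Backable),
            Backable => Some(Backed),
            Backed => Some(PendingAvailability),
            PendingAvailability => Some(Included),
            Included => Some(Accepted),
            Accepted => None,
        }
    }
}
```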
```dot process Inclusion Pipeline
digraph {
subgraph cluster_vg {
label=<
Teyrchain Validators
<br/>
(subset of all)
>
labeljust=l
style=filled
color=lightgrey
node [style=filled color=white]
v1 [label="Validator 1"]
v2 [label="Validator 2"]
v3 [label="Validator 3"]
b [label="(3) Backable", shape=box]
v1 -> v2 [label="(2) Seconded"]
v1 -> v3 [label="(2) Seconded"]
v2 -> b [style=dashed arrowhead=none]
v3 -> b [style=dashed arrowhead=none]
v1 -> b [style=dashed arrowhead=none]
}
v4 [label=<
<b>Validator 4</b> (relay chain)
<br/>
<font point-size="10">
(selected by BABE)
</font>
>]
col [label="Collator"]
pa [label="(5) Relay Block (Pending Availability)", shape=box]
pb [label="Parablock", shape=box]
rc [label="Relay Chain Validators"]
subgraph cluster_approval {
label=<
Secondary Checkers
<br/>
(subset of all)
>
labeljust=l
style=filled
color=lightgrey
node [style=filled color=white]
a5 [label="Validator 5"]
a6 [label="Validator 6"]
a7 [label="Validator 7"]
}
b -> v4 [label="(4) Backed"]
col -> v1 [label="(1) Candidate"]
v4 -> pa
pa -> pb [label="(6) a few blocks later..." arrowhead=none]
pb -> a5
pb -> a6
pb -> a7
a5 -> rc [label="(7) Approved"]
a6 -> rc [label="(7) Approved"]
a7 -> rc [label="(7) Approved"]
}
```
The diagram above shows the happy path of a block from (1) Candidate to the (7) Approved state.
It is also important to take note of the fact that the relay-chain is extended by BABE, which is a forkful algorithm.
That means that different block authors can be chosen at the same time, and may not be building on the same block
parent. Furthermore, the set of validators is not fixed, nor is the set of teyrchains. And even with the same set of
validators and teyrchains, the validators' assignments to teyrchains are flexible. This means that the architecture
proposed in the next chapters must deal with the variability and multiplicity of the network state.
```dot process
digraph {
rca [label="Relay Block A" shape=box]
rcb [label="Relay Block B" shape=box]
rcc [label="Relay Block C" shape=box]
vg1 [label=<
<b>Validator Group 1</b>
<br/>
<br/>
<font point-size="10">
(Validator 4)
<br/>
(Validator 1) (Validator 2)
<br/>
(Validator 5)
</font>
>]
vg2 [label=<
<b>Validator Group 2</b>
<br/>
<br/>
<font point-size="10">
(Validator 7)
<br/>
(Validator 3) (Validator 6)
</font>
>]
rcb -> rca
rcc -> rcb
vg1 -> rcc [label="Building on C" style=dashed arrowhead=none]
vg2 -> rcb [label="Building on B" style=dashed arrowhead=none]
}
```
In this example, group 1 has received block C while the others have not due to network asynchrony. Now, a validator from
group 2 may be able to build another block on top of B, called `C'`. Assume that afterwards, some validators become
aware of both C and `C'`, while others remain only aware of one.
```dot process
digraph {
rca [label="Relay Block A" shape=box]
rcb [label="Relay Block B" shape=box]
rcc [label="Relay Block C" shape=box]
rcc_prime [label="Relay Block C'" shape=box]
vg1 [label=<
<b>Validator Group 1</b>
<br/>
<br/>
<font point-size="10">
(Validator 4) (Validator 1)
</font>
>]
vg2 [label=<
<b>Validator Group 2</b>
<br/>
<br/>
<font point-size="10">
(Validator 7) (Validator 6)
</font>
>]
vg3 [label=<
<b>Validator Group 3</b>
<br/>
<br/>
<font point-size="10">
(Validator 2) (Validator 3)
<br/>
(Validator 5)
</font>
>]
rcb -> rca
rcc -> rcb
rcc_prime -> rcb
vg1 -> rcc [style=dashed arrowhead=none]
vg2 -> rcc_prime [style=dashed arrowhead=none]
vg3 -> rcc_prime [style=dashed arrowhead=none]
vg3 -> rcc [style=dashed arrowhead=none]
}
```
Those validators that are aware of many competing heads must be aware of the work happening on each one. They may
contribute partially or fully to both. It is possible that due to network asynchrony two forks may grow in
parallel for some time, although in the absence of an adversarial network this is unlikely in the case where there are
validators who are aware of both chain heads.
# Validator Disabling
## Background
As established in the [approval process](protocol-approval.md), dealing with bad parablocks is a three-step process:
1. Detection
1. Escalation
1. Consequences
The main system responsible for dispensing **consequences** for malicious actors is the [dispute
system](protocol-disputes.md) which eventually dispenses slash events. The slashes themselves can be dispensed quickly (a
matter of blocks), but for an extra layer of auditing all slashes are deferred for 27 days (in Pezkuwi/Kusama), which
gives time for Governance to investigate and potentially alter the punishment. A concluded dispute does not by itself
immediately remove the validator from the active validator set.
> **Note:** \
> There was an additional mechanism of automatically chilling the validator which removed their intent to participate in
> the next election, but the removed validator could simply re-register their intent to validate.
There is a need for a more immediate way to deal with malicious validators. This is where validator disabling
comes in. It is focused on dispensing **low latency** consequences for malicious actors. It is important to note that
validator disabling is not a replacement for the dispute or slashing systems. It is a complementary system that is
focused on lighter but immediate consequences, usually in the form of restricted validator privileges.
The primary goals are:
- Eliminate or minimize cases where attackers can get free attempts at attacking the network
- Eliminate or minimize the risks of honest nodes being pushed out of consensus when getting unjustly slashed (defense
in depth)
The above two goals are generally at odds, so a careful balance has to be struck between them. We will achieve them by
sacrificing some **liveness** in favor of **soundness** when the network is under stress. Maintaining some liveness is
desirable, but absolute soundness is paramount.
> **Note:** \
> Liveness = Valid candidates can go through (at a decent pace) \
> Soundness = Invalid candidates cannot go through (or are statistically very improbable)
Side goals are:
- Reduce the damages to honest nodes that had a fault which might cause repeated slashes
- Reduce liveness impact of individual malicious attackers
## System Overview
High level assumptions and goals of the validator disabling system that will be further discussed in the following
sections:
1. If a validator gets slashed (even 0%), we mark them as disabled in the runtime and on the node side.
1. We only disable up to the byzantine threshold of the validators.
1. If there are more offenders than the byzantine threshold, disable only the highest offenders. (Some might get re-enabled.)
1. Disablement lasts for 1 era.
1. Disabled validators remain in the active validator set but have some limited permissions.
1. Disabled validators can get re-elected.
1. Disabled validators can participate in approval checking.
1. Disabled validators can participate in GRANDPA/BEEFY, but equivocations cause disablement.
1. Disabled validators cannot author blocks.
1. Disabled validators cannot back candidates.
1. Disabled validators cannot initiate disputes, but their votes are still counted if a dispute occurs.
1. Disabled validators making dispute statements no-show in approval checking.
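Points 7 to 11 amount to a permission check on a handful of actions. The sketch below encodes them as a single function. The `Action` enum and `is_permitted` are hypothetical names for illustration, not the runtime's or node's actual API, and Point 12 (dispute statements counting as no-shows) is not modelled here.

```rust
/// Hypothetical set of validator actions affected by disabling.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    ApprovalChecking,
    GrandpaOrBeefyVoting,
    AuthorBlock,
    BackCandidate,
    InitiateDispute,
    /// Voting in a dispute that was raised by someone else.
    VoteInExistingDispute,
}

/// Whether a validator may perform `action`, given its disabled status.
pub fn is_permitted(action: Action, disabled: bool) -> bool {
    if !disabled {
        return true;
    }
    match action {
        // Points 7, 8 and 11: still allowed while disabled.
        Action::ApprovalChecking | Action::GrandpaOrBeefyVoting | Action::VoteInExistingDispute => true,
        // Points 9, 10 and 11: withdrawn while disabled.
        Action::AuthorBlock | Action::BackCandidate | Action::InitiateDispute => false,
    }
}
```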
</br></br></br>
# Risks
## Risks of NOT having validator disabling
Assume that if an offense is committed, a slash is deposited but the perpetrator can still act normally. They will be
slashed 100% with a long delay (the slash deferral duration, which is 27 days). This is akin to the current design.
A simple argument for disabling is that if someone is already slashed 100% and has nothing to lose, they could
cause harm to the network and should be silenced.
What harm could they cause?
**1. Liveness attacks:**
- 1.1. Break sharding (with mass no-shows or mass disputes): It forces everyone to do all the work which affects
liveness but doesn't kill it completely. The chain can progress at a slow rate.
- 1.2. Mass invalid candidate backing: Spawns a lot of worthless work that needs to be done but it is bounded by backing
numbers. Honest backers will still back valid candidates and that cannot be stopped. Honest block authors will
eventually select valid candidates and even if disputed they will win and progress the chain.
**2. Soundness attacks:**
- 2.1. The best and possibly only way to affect soundness is by getting lucky in the approval process. If by chance all
approval voters were malicious, the attackers could get a single invalid candidate through. Their chances would be
relatively low but in general this risk has to be taken seriously as it significantly reduces the safety buffer around
approval checking.
> **Note:** With 30 approvals needed, the chance of a malicious candidate going through is around 4\*10^-15. Assuming
> attackers can back invalid candidates on 50 cores for 48 hours straight and only those candidates get included it
> still gives a 7\*10^-9 chance of success which is still relatively small considering the cost (all malicious stake
> slashed).
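The figures in the note can be reproduced with a back-of-the-envelope calculation, sketched below. The assumptions are ours, not stated in this section: each of the 30 approval checkers is independently malicious with probability 1/3, relay-chain blocks arrive every 6 seconds, and each of the 50 cores includes one attacker candidate per block. Under those assumptions, (1/3)^30 is about 4.9\*10^-15 per candidate, and over 1,440,000 candidates the overall success probability is about 7\*10^-9.

```rust
/// Probability that all `approvals` sampled approval checkers are malicious,
/// assuming each is independently malicious with probability `p_malicious`.
pub fn p_all_malicious(approvals: i32, p_malicious: f64) -> f64 {
    p_malicious.powi(approvals)
}

/// Probability that at least one of `attempts` independent attempts succeeds,
/// i.e. 1 - (1 - p)^n, computed with ln_1p/exp_m1 to stay accurate for tiny p.
pub fn p_any_success(attempts: f64, p_single: f64) -> f64 {
    -(attempts * (-p_single).ln_1p()).exp_m1()
}

/// Number of attacker candidates: one per core per relay-chain block.
/// `block_time_secs` is an assumed parameter (6 s in the usage below).
pub fn attack_attempts(cores: f64, hours: f64, block_time_secs: f64) -> f64 {
    cores * hours * 3600.0 / block_time_secs
}
```

Calling `p_any_success(attack_attempts(50.0, 48.0, 6.0), p_all_malicious(30, 1.0 / 3.0))` yields roughly 7\*10^-9, matching the note.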
Attacks 1.2 and 2.1 should generally be pretty futile for a solo attacker, while 1.1 could be possible with mass disputes
even from a single attacker. Nevertheless, whatever the attack vector, within the old system the attackers would
*eventually* get slashed and pushed out of the active validator set, but they had plenty of time to wreak havoc.
## Risks of having validator disabling
Assume we fully push out validators when they commit offenses.
The primary risk behind having any sort of disabling is that it is a double-edged sword: in case of any dispute bugs
or sources of PVF non-determinism, it could disable honest nodes or be abused by attackers to specifically silence honest
nodes.
Validators being pushed out of the validator set are an issue because that can greatly skew the numbers game in approval
checking (the probability of drawing 30-ish malicious approval checkers in a row).
There are also censorship or liveness issues if backing is suddenly dominated by malicious nodes but in general even if
some honest blocks get backed liveness should be preserved.
> **Note:** It is worth noting that this is fundamentally a defense in depth strategy, because if we assume disputes are
> perfect it should not be a real concern. In reality, disputes and determinism are difficult to get right, and
> non-determinism does happen, so defense in depth is crucial when handling those subsystems.
</br></br></br>
# Risks Mitigation
## Addressing the risks of having validator disabling
One safety measure is bounding the number of disabled validators to 1/3 ([**Point 2.**](#system-overview)), or to be
exact the byzantine threshold. If for any reason more than 1/3 of validators are getting disabled, it means that some part
of the protocol failed or there are more than 1/3 malicious nodes, which breaks the assumptions.
Even in such a dire situation where more than 1/3 got disabled, the most likely scenario is a non-determinism bug or
sacrifice attack bug. Those attacks generally cause minor slashes to multiple honest nodes. In such a case the situation
could be salvaged by prioritizing highest offenders for disabling ([**Point 3.**](#system-overview)).
> **Note:** \
> System can be launched with re-enabling and will still provide some security improvements. Re-enabling will be
> launched in an upgrade after the initial deployment.
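Points 2 and 3 together describe a capped, priority-ordered selection. The sketch below recomputes the disabled set from scratch, which is one simple way to get re-enabling: when a higher offender appears, the lowest offender drops out of the set. `Offender`, `select_disabled` and the parts-per-billion slash representation are hypothetical, and the sketch assumes one record per validator. The byzantine threshold is taken as the largest `f` with `n >= 3f + 1`.

```rust
/// Largest number of faulty validators tolerated among `n` (n >= 3f + 1).
pub fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Hypothetical offender record: validator index and its highest slash this
/// era, in parts per billion (0 still marks the validator for disabling).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Offender {
    pub validator: u32,
    pub slash_ppb: u32,
}

/// Returns the indices of validators to disable, sorted ascending: the
/// highest offenders, capped at the byzantine threshold.
pub fn select_disabled(offenders: &[Offender], validator_count: usize) -> Vec<u32> {
    let mut ranked = offenders.to_vec();
    // Highest slash first; ties broken by lower validator index for determinism.
    ranked.sort_by(|a, b| {
        b.slash_ppb
            .cmp(&a.slash_ppb)
            .then(a.validator.cmp(&b.validator))
    });
    let mut disabled: Vec<u32> = ranked
        .iter()
        .take(byzantine_threshold(validator_count))
        .map(|o| o.validator)
        .collect();
    disabled.sort_unstable();
    disabled
}
```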
Fully pushing offending validators out of the validator set is too risky in case of a dispute bug, non-determinism or
sacrifice attacks. The main issue lies in skewing the numbers in approval checking, so instead of fully blocking disabled
nodes a different approach can be taken: one where only some functionalities are disabled ([**Point 5.**](#system-overview)).
Approval voting is one functionality that, as pointed out above, is so crucial that even in a disabled state nodes
should be able to participate in it ([**Point 7.**](#system-overview)).
> **Note:** \
> Approval checking statements are implicitly valid. Sending a statement for an invalid candidate is part of the
> dispute logic which we did not yet discuss. For now we only allow nodes to state that a candidate is valid or remain
> silent. But this solves the main risk of disabling.
Because we capped the number of disabled nodes to 1/3 there will always be at least 1/3 honest nodes to participate in
backing so liveness should be preserved. That means that backing **COULD** be safely disabled for disabled nodes
([**Point 10.**](#system-overview)).
## Addressing the risks of NOT having validator disabling
To determine if backing **SHOULD** be disabled, the attack vectors 1.2 (Mass invalid candidate backing) and 2.1
(Getting lucky in approval voting) need to be considered. In both of those cases having extra backed malicious
candidates gives attackers extra chances to get lucky in approval checking. The solution is to not allow for backing in
disablement. ([**Point 10.**](#system-overview))
The attack vector 1.1 (Break sharding) requires a bit more nuance. If we assume that the attacker is a single entity
that can push a lot of disputes through, they could potentially break sharding very easily. This generally points
in the direction of disallowing dispute initiation during disablement ([**Point 11.**](#system-overview)).
This might seem like an issue because it takes away the escalation privileges of disabled approval checkers but this is
NOT true. By issuing a dispute statement those nodes remain silent in approval checking because they skip their approval
statement and thus will count as a no-show. This will create a mini escalation for that particular candidate. This means
that disabled nodes maintain just enough escalation that they can protect soundness (same argument as soundness
protection during a DoS attack on approval checking), but they lose their extreme escalation privileges, which are only
given to flawlessly performing nodes ([**Point 12.**](#system-overview)).
As a defense in depth measure, dispute statements from disabled validators count toward confirming disputes (byzantine
threshold needed to confirm). If a dispute is confirmed, everyone participates in it. This protects us from situations
where due to a bug more than the byzantine threshold of validators would be disabled.
> **Note:** \
> In implementation, this behavior is achieved easily: honest nodes note down dispute statements from
> disabled validators just like they would for normal nodes, but they do not release their own dispute statements unless
> the dispute is already confirmed. This simply stops the escalation process of disputes.
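The participation rule in the note can be sketched as follows. `is_confirmed` and `should_participate` are hypothetical helpers, and the confirmation rule used here (more than the byzantine threshold of distinct voters, so at least one honest validator is involved) is our reading of "byzantine threshold needed to confirm". The usage below includes a bug scenario with four disabled validators out of ten, which exceeds the normal cap of three.

```rust
use std::collections::BTreeSet;

/// Largest f such that n >= 3f + 1.
pub fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// A dispute counts as confirmed once more than the byzantine threshold of
/// distinct validators, disabled or not, have cast statements in it.
pub fn is_confirmed(voters: &BTreeSet<u32>, validator_count: usize) -> bool {
    voters.len() > byzantine_threshold(validator_count)
}

/// Whether an honest node should release its own dispute statement.
/// Statements from disabled validators are recorded and counted, but on their
/// own they only trigger participation once the dispute is confirmed.
pub fn should_participate(
    voters: &BTreeSet<u32>,
    disabled: &BTreeSet<u32>,
    validator_count: usize,
) -> bool {
    let raised_by_enabled = voters.iter().any(|v| !disabled.contains(v));
    raised_by_enabled || is_confirmed(voters, validator_count)
}
```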
</br></br>
# Disabling Duration
## Context
A crucial point to understand is that, as of the time of writing, all slashing events, as alluded to in the beginning, are
delayed for 27 days before being executed. This is primarily because it gives governance enough time to investigate and
potentially intervene. For the duration that the slash is pending, the stake is locked and cannot be moved. The time to
unbond your stake is 28 days, which ensures that the stake will eventually be slashed before being withdrawn.
## Design
A few options for the duration of disablement were considered:
- 1 epoch (4h in Pezkuwi)
- 1 era (24h in Pezkuwi)
- 2-26 eras
- 27 eras
1 epoch is a short period and between a few epochs the validator set will most likely be exactly the same. It is also very
difficult to fix any local node issues for an honest validator in such a short time, so the chance of a repeated offense is
high.
1 era gives a bit more time to fix any minor issues. Additionally, it guarantees a validator set change, so many of
the currently disabled validators might no longer be present anyway. It also gives the validator time to chill
themselves if they have identified a cause and want to spend more time fixing it. ([**Point 4.**](#system-overview))
Higher values could be considered, and the main argument for them is that they reduce the number
of repeated attacks that will be allowed before the slash execution. Generally, 1 attack per era for 27 eras, resulting in
27 attacks at most, should not compromise our safety assumptions. That said, this direction could be explored further and
might be parametrized for governance to decide.
</br></br></br>
# Economic consequences of Disablement
Disablement is generally a form of punishment and that will be reflected in the rewards at the end of an era. A disabled
validator will not receive any rewards for backing or block authoring, which will reduce its profits.
That means that the opportunity cost of being disabled is a punishment by itself, and thus it can be used for some cases
where a minor punishment is needed. The existing implementation used 0% slashes to mark nodes for chilling, and a similar
approach of 0% slashes can be used to mark validators for disablement. ([**Point 1.**](#system-overview)) 0% slashes
could for instance be used to punish approval checkers voting invalid on valid candidates.
Anything higher than 0% will of course also lead to a disablement.
> **Notes:** \
> Alternative designs incorporating disabling proportional to offenses were explored, but they were deemed too complex
> and not worth the effort. The main issue with those is that proportional disabling would cause back and forth between
> disabled and enabled states, which complicates tracking the state of disabled validators and interferes with optimistic
> node optimizations. The main benefit was that validators with minor slashes would barely be disabled, which has nice
> properties against sacrifice attacks.
</br></br></br>
# Redundancy
Some systems can be greatly simplified or outright removed thanks to the above changes. This leads to reduced complexity
around the systems that were hard to reason about and were sources of potential bugs or new attack vectors.
## Automatic Chilling
Chilling is the process of a validator dropping their intent to validate. This removes them from the upcoming NPoS
elections and effectively pushes them out of the validator set as soon as the next era (or 2 eras in the case of late
offenses). All nominators of that validator were also unsubscribed from that validator. A validator could
re-register their intent to validate at any time. The intent behind this logic was to protect honest stakes from
repeated slashes caused by unnoticed bugs. It would give validators time to fix their issue before continuing as a
validator.
Chilling had a myriad of problems. It assumes that validators and nominators remain very active and monitor everything.
If a validator got slashed, they were automatically chilled and their nominators were unsubscribed. This was
an issue because of minor non-malicious slashes due to node operator mistakes or small bugs. Validators got those bugs
fixed quickly and were reimbursed, but nominators had to manually re-subscribe to the validator, which they often
postponed for very lengthy amounts of time, most likely due to simply not checking their stake. **This forced
unsubscribing of nominators was later disabled.**
Automatic chilling was achieving its goals in ideal scenarios (no attackers, no lazy nominators) but it opened new
vulnerabilities for attackers. The biggest issue was that chilling in case of honest node slashes could lead to honest
validators being quickly pushed out of the next validator set within the next era. This retains the validator set size
but gives an edge to attackers as they can more easily win slots in the NPoS election.
Disabling allows for punishment that limits the damages malicious actors can cause without having to resort to kicking
them out of the validator set. This protects us from the edge case of honest validators getting quickly pushed out of
the set by slashes. ([**Point 6.**](#system-overview))
> **Notes:** \
> As long as honest slashes absolutely cannot occur, automatic chilling is sensible and desirable. This means it could
> be re-enabled once PolkaVM introduces deterministic gas metering. Then best of both worlds could be achieved.
## Forcing New Era
The previous implementation of disabling had some limited mechanisms allowing for validator disablement and, if too many
were disabled, forcing a new era (new election). The FRAME staking pallet offered the ability to force a new era, but it
was also deemed unsafe as it could be abused and compromise the security of the network, for instance by weakening the
randomness used throughout the protocol.
</br></br></br>
# Other types of slashing
The slashes above specifically referred to slashing events coming from disputes against candidates, but in Pezkuwi
other types of offenses exist, for example GRANDPA equivocations or block authoring offenses. The question is whether
the design defined above can handle those offenses.
## GRANDPA/BEEFY Offenses
The main offences for GRANDPA/BEEFY are equivocations. It is not a very serious offense: a few nodes committing it do not
endanger the system, and performance is barely affected. If more than the byzantine threshold of nodes equivocate, it is
a catastrophic failure, potentially resulting in 2 finalized blocks at the same height in the case of GRANDPA.
Honest nodes generally should not commit those offenses so the goal of protecting them does not apply here.
> **Note:** \
> A validator running multiple nodes with the same identity might equivocate. Doing that is strongly discouraged, but it
> has happened before.
It's not a game of chance, so giving attackers extra chances does not compromise soundness. Also, GRANDPA requires a
supermajority of honest nodes to successfully finalize blocks, so any disabling of honest nodes from GRANDPA might
compromise liveness.
The best approach is to allow disabled nodes to participate in GRANDPA/BEEFY as normal, and, as mentioned before,
GRANDPA/BABE/BEEFY equivocations should not happen to honest nodes, so we can safely disable the offenders. Additionally,
the slashes for singular equivocations will be very low, so those offenders would easily get re-enabled in the case of
more serious offenders showing up. ([**Point 8.**](#system-overview))
## Block Authoring Offenses (BABE Equivocations)
Even if all honest nodes are disabled in Block Authoring (BA), liveness is generally preserved. At least 50% of blocks
produced should still be honest. Soundness-wise, disabled nodes can create a decent amount of wasted work by creating bad
blocks, but they only get to do it in bounded amounts.
Disabling in BA is not a requirement, as both liveness and soundness are preserved, but it is the current default
behavior and it results in slightly less wasted work.
Offenses in BA, just like in backing, can be caused by faulty PVFs or bugs. They might happen to honest nodes, and
disabling here, while not a requirement, can also ensure that this node does not repeat the offense, as it might not be
trusted with its PVF anymore.
Neither point above presents significant risks when disabling, so the default behavior is both to disable in BA and to
disable because of offenses in BA. ([**Point 9.**](#system-overview)) This filters out honest faulty nodes as well as
protects from some attackers.
</br></br></br>
# Extra Design Considerations
## Disabling vs Accumulating Slashes
Instant disabling generally allows us to remove the need for accumulating slashes. It is a more immediate punishment and
it is a more lenient punishment for honest nodes.
The current architecture of using max slashing can be used and it works around the problems of delaying the slash for a
long period.
An alternative design with immediate slashing and accumulating slashing could be relevant to other systems, but it goes
against the governance auditing mechanisms, so it is not suitable for Pezkuwi.
## Disabling vs Getting Pushed Out of NPoS Elections
Validator disabling and getting forced out of NPoS elections (1 era) due to slashes are actually very similar processes
in terms of outcomes but there are some differences:
- **latency** (next few blocks for validator disabling and 27 days for getting pushed out organically)
- **pool restriction** (validator disabling could effectively lower the number of active validators during an era if we
fully disable)
- **granularity** (validator disabling could remove only a portion of validator privileges instead of all)
Granularity is particularly crucial in the final design as only a few select functions are disabled while others remain.
## Enabling Approval Voter Slashes
The original Pezkuwi 1.0 design describes that all validators on the losing side of the dispute are slashed. In the
current system only the backers are slashed, and any approval voters on the wrong side will not be slashed. This creates
some undesirable incentives:
- Lazy approval checkers (approval voters yay-ing everything)
- Spammy approval checkers (approval voters nay-ing everything)
Initially those slashes were disabled to reduce the complexity and to minimize the risk surface in case the system
malfunctioned. This is especially risky in case any nondeterministic bugs are present in the system. Once validator
re-enabling is launched approval voter slashes can be re-instated. Numbers need to be further explored but slashes
between 0-2% are reasonable. 0% would still disable, which, given the opportunity cost, should be enough.
> **Note:** \
> Spammy approval checkers are in fact not a big issue, thanks to a side effect of the offchain disabling introduced by
> Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/pezkuwichain/pezkuwi-sdk/issues/119). All
> validators losing a dispute are locally disabled and ignored for dispute initiation, which effectively silences
> spammers. They can still no-show, but the damage is minimized.
## Interaction with all types of misbehaviors
With re-enabling in place, and potentially with approval voter slashes enabled, the overall misbehaviour-punishment
system can be summarized as in the table below:
|Misbehaviour |Slash % |Onchain Disabling |Offchain Disabling |Chilling |Reputation Costs |
|------------ |------- |----------------- |------------------ |-------- |----------------- |
|Backing Invalid |100% |Yes (High Prio) |Yes (High Prio) |No |No |
|ForInvalid Vote |2% |Yes (Mid Prio) |Yes (Mid Prio) |No |No |
|AgainstValid Vote |0% |Yes (Low Prio) |Yes (Low Prio) |No |No |
|GRANDPA / BABE / BEEFY Equivocations |0.01-100% |Yes (Varying Prio) |No |No |No |
|Seconded + Valid Equivocation |- |No |No |No |No |
|Double Seconded Equivocation |- |No |No |No |Yes |
\* Ignoring AURA offences.
\*\* There are some other misbehaviour types handled by reputation only (DoS prevention etc.), but they are not relevant to this strategy.
\*\*\* BEEFY will soon introduce new slash types, so this strategy table will need to be revised, but no major changes are expected.
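As a compact restatement of the backing and dispute rows of the table, the sketch below encodes each misbehaviour's
slash, disabling priority and reputation cost as Rust data. The enum, struct and function names are hypothetical, not
the runtime's actual types. The GRANDPA / BABE / BEEFY row is left out because its slash and priority vary per offence,
and chilling is omitted because it is "No" for every row.

```rust
/// Disabling priority, ordered so that `High` compares greater than `Low`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Prio {
    Low,
    Mid,
    High,
}

/// Hypothetical names for the backing/dispute rows of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Misbehaviour {
    BackingInvalid,
    ForInvalidVote,
    AgainstValidVote,
    SecondedAndValidEquivocation,
    DoubleSecondedEquivocation,
}

/// Punishment for one misbehaviour; `None` means "not applied".
#[derive(Debug, PartialEq)]
struct Punishment {
    slash_percent: Option<u8>,
    onchain_disabling: Option<Prio>,
    offchain_disabling: Option<Prio>,
    reputation_cost: bool,
}

fn punishment(m: Misbehaviour) -> Punishment {
    let (slash_percent, prio, reputation_cost) = match m {
        Misbehaviour::BackingInvalid => (Some(100), Some(Prio::High), false),
        Misbehaviour::ForInvalidVote => (Some(2), Some(Prio::Mid), false),
        Misbehaviour::AgainstValidVote => (Some(0), Some(Prio::Low), false),
        Misbehaviour::SecondedAndValidEquivocation => (None, None, false),
        Misbehaviour::DoubleSecondedEquivocation => (None, None, true),
    };
    // For these rows, on-chain and off-chain disabling share the same priority.
    Punishment {
        slash_percent,
        onchain_disabling: prio,
        offchain_disabling: prio,
        reputation_cost,
    }
}
```

Deriving `Ord` on `Prio` lets code that must choose between offenders compare priorities directly.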
# Implementation
Implementation of the above design covers a few additional areas that allow for node-side optimizations.
## Core Features
1. Disabled Validators Tracking (**Runtime**) [#2950](https://github.com/pezkuwichain/pezkuwi-sdk/issues/125)
- Expose a ``disabled_validators`` map through a Runtime API
1. Enforce Backing Disabling (**Runtime**) [#1592](https://github.com/pezkuwichain/pezkuwi-sdk/issues/110)
- Filter out votes from ``disabled_validators`` in ``BackedCandidates`` in ``process_inherent_data``
1. Substrate Byzantine Threshold (BZT) as Limit for Disabling
[#1963](https://github.com/pezkuwichain/pezkuwi-sdk/issues/114)
- Can be parametrized but defaults to BZT
- Disable only up to 1/3 of validators
1. Respect Disabling in Backing Statement Distribution (**Node**)
[#1591](https://github.com/pezkuwichain/pezkuwi-sdk/issues/112)
- This is an optimization as in the end it would get filtered in the runtime anyway
- Filter out backing statements coming from ``disabled_validators``
1. Respect Disablement in Backing (**Node**) [#2951](https://github.com/pezkuwichain/pezkuwi-sdk/issues/126)
- This is an optimization as in the end it would get filtered in the runtime anyway
- Don't start backing new candidates when disabled
- Don't react to backing requests when disabled
1. Stop Automatic Chilling of Offenders [#1962](https://github.com/pezkuwichain/pezkuwi-sdk/issues/113)
- Chilling still persists as a state but is no longer automatically applied on offenses
1. Respect Disabling in Dispute Participation (**Node**) [#2225](https://github.com/pezkuwichain/pezkuwi-sdk/issues/119)
- Receive dispute statements from ``disabled_validators`` but do not release own statements
- Ensure a dispute is still confirmed once BZT statements have been received from disabled validators
1. Remove Liveness Slashes [#1964](https://github.com/pezkuwichain/pezkuwi-sdk/issues/115)
- Remove liveness slashes from the system
- There are other incentives to be online, and liveness slashes could be abused to attack the system
1. Defense Against Past-Era Dispute Spam (**Node**) [#2225](https://github.com/pezkuwichain/pezkuwi-sdk/issues/119)
- This is needed because the runtime cannot disable validators which it no longer knows about
- Add a node-side parallel store of ``disabled_validators``
- Add new disabled validators to the node-side store when they lose a dispute in any leaf in scope
- Runtime ``disabled_validators`` always have priority over node-side ``disabled_validators``
- Respect the BZT threshold
> **Note:** \
> An alternative design was considered here, where a relay parent is used instead of tracking new incoming leaves.
> This would guarantee determinism, as different nodes can see different leaves, but the approach left too wide a
> window because of Async-Backing. The relay parent could be significantly in the past, which would give a lot of time
> for past-session disputes to be spammed.
1. Do not block finality for "disabled" disputes [#3358](https://github.com/paritytech/polkadot-sdk/pull/3358)
- Emergency fix to not block finality for disputes initiated only by disabled validators
1. Re-enable small offender when approaching BZT (**Runtime**) #TODO
- When the BZT limit is reached and there are more offenders to be disabled, re-enable the smallest offenders so that
the biggest ones can be disabled
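The BZT cap, the node-side store with runtime priority, and the planned re-enabling of small offenders can be sketched
together as follows. This is a minimal illustration under stated assumptions, not the runtime's implementation: the BZT
limit is taken as `(n - 1) / 3`, an offender's size is modelled as a hypothetical `severity: u32`, and `DisabledSet`,
`report` and `effective_disabled` are hypothetical names.

```rust
use std::collections::BTreeMap;

type ValidatorIndex = u32;

/// Assumed BZT limit: the largest number of faulty validators tolerated, (n - 1) / 3.
fn byzantine_threshold(n_validators: usize) -> usize {
    n_validators.saturating_sub(1) / 3
}

/// Runtime-side disabled set (hypothetical). `severity` stands in for the size of
/// the offender's slash; a bigger value means a bigger offender.
struct DisabledSet {
    n_validators: usize,
    disabled: BTreeMap<ValidatorIndex, u32>,
}

impl DisabledSet {
    fn new(n_validators: usize) -> Self {
        DisabledSet { n_validators, disabled: BTreeMap::new() }
    }

    /// Disables `who` if possible. Returns the validator re-enabled to make room, if any.
    fn report(&mut self, who: ValidatorIndex, severity: u32) -> Option<ValidatorIndex> {
        if let Some(existing) = self.disabled.get_mut(&who) {
            // Already disabled: remember the most severe offence.
            *existing = (*existing).max(severity);
            return None;
        }
        if self.disabled.len() < byzantine_threshold(self.n_validators) {
            self.disabled.insert(who, severity);
            return None;
        }
        // At the BZT limit: swap out the smallest offender only if `who` is bigger.
        let (&smallest, &smallest_severity) =
            self.disabled.iter().min_by_key(|(_, s)| **s)?;
        if severity > smallest_severity {
            self.disabled.remove(&smallest);
            self.disabled.insert(who, severity);
            Some(smallest)
        } else {
            None
        }
    }
}

/// Node-side view (hypothetical): runtime-disabled validators always take priority;
/// validators that lost a dispute in a leaf in scope fill the remaining slots,
/// never exceeding the BZT limit.
fn effective_disabled(
    runtime: &[ValidatorIndex],
    node_side: &[ValidatorIndex],
    n_validators: usize,
) -> Vec<ValidatorIndex> {
    let limit = byzantine_threshold(n_validators);
    let mut out = Vec::new();
    for &v in runtime.iter().chain(node_side) {
        if out.len() == limit {
            break;
        }
        if !out.contains(&v) {
            out.push(v);
        }
    }
    out
}
```

With 10 validators the limit is 3. Once three offenders are disabled, a fourth, bigger offender causes the smallest one
to be re-enabled, while a smaller one is left enabled.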
@@ -0,0 +1,102 @@
# PVF Pre-checking Overview
## Motivation
A teyrchain's validation function is described by a wasm module that we refer to as a PVF. Since a PVF is a wasm
module, the typical way of executing it is to compile it to machine code.
Typically an optimizing compiler consists of algorithms that are able to optimize the resulting machine code heavily.
However, while those algorithms perform quite well for typical wasm code produced by standard toolchains (e.g.
rustc/LLVM), they can be abused to consume a lot of resources. Moreover, since those algorithms are rather complex,
there is a lot of room for bugs that can crash the compiler.
If compilation of a Teyrchain Validation Function (PVF) takes too long or uses too much memory, this can leave a node in
limbo as to whether a candidate of that teyrchain is valid or not.
The amount of time that a PVF takes to compile is a subjective resource limit, since it differs from validator to
validator. As such, PVFs may be maliciously crafted so that there is, e.g., a 50/50 split of validators which can and
cannot compile and execute the PVF.
This has the following implications:
- In backing, inclusion may be slow due to backing groups being unable to execute the block
- In approval checking, there may be many no-shows, leading to slow finality
- In disputes, neither side may reach supermajority. Nobody will get slashed and the chain will not be reverted or
finalized.
As a result of this issue we need a fairly hard guarantee that the PVFs of registered teyrchains/threads can be compiled
within a reasonable amount of time.
## Solution
The problem is solved by having a pre-checking process.
### Pre-checking
Pre-checking mostly consists of attempting to prepare (compile) the PVF WASM blob. We use stricter limits (e.g.
timeouts) here compared to regular preparation for execution. This way, errors during later preparation are likely
unrelated to the PVF itself, as it already passed pre-checking, and we can treat such errors as local node issues.
We also have an additional step where we attempt to instantiate the WASM runtime without running it. This is unrelated
to preparation so we don't time it, but it does help us catch more issues.
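The split between strict pre-checking limits and more lenient execution limits can be illustrated with a small decision
function. The `PrepareKind` and `Verdict` types and the 60 s / 360 s timeouts are hypothetical values chosen for the
sketch. The only properties taken from the text are that pre-checking is stricter, that instantiation is checked but
not timed, and that a preparation failure after a successful pre-check is treated as a local node issue.

```rust
use std::time::Duration;

/// Why a PVF is being prepared (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PrepareKind {
    Precheck,
    Execution,
}

/// Hypothetical timeouts: pre-checking is stricter than preparation for execution.
fn prepare_timeout(kind: PrepareKind) -> Duration {
    match kind {
        PrepareKind::Precheck => Duration::from_secs(60),
        PrepareKind::Execution => Duration::from_secs(360),
    }
}

#[derive(Debug, PartialEq)]
enum Verdict {
    /// Pre-checking outcome, to be submitted as a vote.
    VoteAccept,
    VoteReject,
    /// Artifact ready for execution.
    Prepared,
    /// The PVF already passed pre-checking, so blame this node, not the PVF.
    LocalIssue,
}

fn judge(kind: PrepareKind, elapsed: Duration, compiled_ok: bool, instantiate_ok: bool) -> Verdict {
    let prepared = compiled_ok && elapsed <= prepare_timeout(kind);
    match kind {
        // Instantiation is an extra, untimed check performed only during pre-checking.
        PrepareKind::Precheck if prepared && instantiate_ok => Verdict::VoteAccept,
        PrepareKind::Precheck => Verdict::VoteReject,
        PrepareKind::Execution if prepared => Verdict::Prepared,
        PrepareKind::Execution => Verdict::LocalIssue,
    }
}
```

A compile that takes 90 s fails pre-checking under these numbers but would still succeed when preparing for execution.
This headroom is what keeps honest validators from failing on PVFs that already passed pre-checking.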
### Protocol
Pre-checking is run when new validation code is included in the chain. A new PVF can be added in two cases:
- A new teyrchain is registered.
- An existing teyrchain signalled an upgrade of its validation code.
Before either of those operations finishes, a PVF pre-checking vote is initiated. The PVF pre-checking vote is
identified by the code hash of the PVF being voted on. If a PVF pre-checking process is already running for that code
hash, no new vote is started. Instead, the operation just subscribes to the existing vote.
The pre-checking vote concludes either when a threshold of votes for one decision is reached, or when it expires. The
threshold to accept is a supermajority of 2/3 of validators. We reject once a supermajority for acceptance is no longer
possible.
Each validator checks the list of PVFs available for voting. The vote is binary, i.e. accept or reject a given PVF. As
soon as the threshold is reached for one side, voting concludes in that direction and its effects are enacted.
Only validators from the active set can participate in the vote. The set of active validators can change each session,
which is why we reset the votes each session. A vote that has remained open for a certain number of sessions is
rejected.
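The voting rules above can be sketched as follows. The supermajority threshold is assumed to be `n - (n - 1) / 3`
(the smallest count strictly above 2/3 of `n`), each validator is assumed to vote at most once per session, and
`max_sessions` is a hypothetical stand-in for the configured expiry. `PvfVote` and `Outcome` are illustrative names,
not the [paras] module's types.

```rust
/// Assumed supermajority rule: the smallest count strictly greater than 2/3 of `n`.
fn supermajority_threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Pending,
    Accepted,
    Rejected,
}

/// One pre-checking vote for a single code hash (simplified).
struct PvfVote {
    /// One slot per active validator: `Some(true)` = accept, `Some(false)` = reject.
    votes: Vec<Option<bool>>,
    sessions_observed: u32,
}

impl PvfVote {
    fn new(n_validators: usize) -> Self {
        PvfVote { votes: vec![None; n_validators], sessions_observed: 0 }
    }

    /// Records a vote (at most once per validator per session) and returns the outcome.
    fn vote(&mut self, validator: usize, accept: bool) -> Outcome {
        if let Some(slot) = self.votes.get_mut(validator) {
            if slot.is_none() {
                *slot = Some(accept);
            }
        }
        self.outcome()
    }

    fn outcome(&self) -> Outcome {
        let n = self.votes.len();
        if n == 0 {
            return Outcome::Pending;
        }
        let threshold = supermajority_threshold(n);
        let accepts = self.votes.iter().filter(|v| **v == Some(true)).count();
        let rejects = self.votes.iter().filter(|v| **v == Some(false)).count();
        if accepts >= threshold {
            Outcome::Accepted
        } else if rejects > n - threshold {
            // Even if every remaining validator accepted, `threshold` is out of reach.
            Outcome::Rejected
        } else {
            Outcome::Pending
        }
    }

    /// Session change: the active set may differ, so votes are reset. After
    /// `max_sessions` session changes the vote expires and is rejected.
    fn on_new_session(&mut self, n_validators: usize, max_sessions: u32) -> Outcome {
        self.sessions_observed += 1;
        if self.sessions_observed >= max_sessions {
            return Outcome::Rejected;
        }
        self.votes = vec![None; n_validators];
        Outcome::Pending
    }
}
```

With 10 validators the sketch accepts on the 7th accept vote and rejects on the 4th reject vote, since at that point at
most 6 accepts remain possible.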
The effects of accepting a PVF depend on the operations that requested it:
1. All onboardings subscribed to the approved PVF pre-checking process will get scheduled and after passing 2 session
boundaries they will be onboarded.
1. All upgrades subscribed to the approved PVF pre-checking process will get scheduled very similarly to the existing
process. Upgrades with pre-checking are really the same process that is just delayed by the time required for
pre-checking voting. In case of instant approval the mechanism is exactly the same.
If the PVF pre-checking process concludes with rejection, all operations subscribed to it are cancelled: the onboarding
or upgrade does not take place.
The logic described above is implemented by the [paras] module.
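The subscription model, where one vote per code hash is shared by every operation waiting on it, can be sketched as a
map from code hash to subscribers. `PrecheckRegistry`, `Cause` and `Action` are hypothetical names for illustration;
the two acceptance effects follow the list above, and rejection cancels every subscriber.

```rust
use std::collections::HashMap;

type CodeHash = [u8; 32];
type ParaId = u32;

/// Operations that can wait on a pre-checking vote (hypothetical names).
#[derive(Debug, Clone, PartialEq)]
enum Cause {
    Onboarding(ParaId),
    Upgrade(ParaId),
}

/// What to enact once the vote concludes.
#[derive(Debug, PartialEq)]
enum Action {
    /// Onboard after passing 2 session boundaries.
    ScheduleOnboarding(ParaId),
    /// Continue with the regular upgrade process.
    ScheduleUpgrade(ParaId),
    Cancel(Cause),
}

#[derive(Default)]
struct PrecheckRegistry {
    subscribers: HashMap<CodeHash, Vec<Cause>>,
}

impl PrecheckRegistry {
    /// Returns `true` if this starts a new vote, `false` if the operation just
    /// subscribed to a vote already running for the same code hash.
    fn subscribe(&mut self, code: CodeHash, cause: Cause) -> bool {
        let starts_vote = !self.subscribers.contains_key(&code);
        self.subscribers.entry(code).or_default().push(cause);
        starts_vote
    }

    /// Concludes the vote for `code`, returning one action per subscriber.
    fn conclude(&mut self, code: &CodeHash, accepted: bool) -> Vec<Action> {
        self.subscribers
            .remove(code)
            .unwrap_or_default()
            .into_iter()
            .map(|cause| match (accepted, cause) {
                (true, Cause::Onboarding(p)) => Action::ScheduleOnboarding(p),
                (true, Cause::Upgrade(p)) => Action::ScheduleUpgrade(p),
                (false, cause) => Action::Cancel(cause),
            })
            .collect()
    }
}
```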
### Subsystem
On the node side, there is a PVF pre-checking [subsystem][pvf-prechecker-subsystem] that scans the chain for new PVFs
using [runtime APIs][pvf-runtime-api]. Upon finding a new PVF, the subsystem initiates a PVF pre-checking request and
waits for the result. Once the result is obtained, the subsystem uses the [runtime API][pvf-runtime-api] to submit a
vote for the PVF. The vote is an unsigned transaction and is distributed via gossip like a normal transaction.
Eventually a block producer includes the vote in a block, where it is handled by the [runtime][paras].
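The subsystem's main loop can be sketched against a trait that stands in for the two runtime APIs. `PrecheckApi`,
`precheck_pass` and `MockRuntime` are hypothetical. The real statement also carries a session index, a validator index
and a signature, and real pre-checking is asynchronous. Both are simplified away here.

```rust
use std::collections::HashSet;

type CodeHash = [u8; 32];

/// Hypothetical stand-in for the two PVF pre-checking runtime APIs.
trait PrecheckApi {
    fn pvfs_require_precheck(&self) -> Vec<CodeHash>;
    fn submit_pvf_check_statement(&mut self, code: CodeHash, accepted: bool);
}

/// One pass of the subsystem: judge every PVF still awaiting votes that this node
/// has not judged yet, and submit one vote per PVF. Returns the number of votes sent.
fn precheck_pass<A: PrecheckApi>(
    api: &mut A,
    judged: &mut HashSet<CodeHash>,
    precheck: impl Fn(&CodeHash) -> bool,
) -> usize {
    let mut submitted = 0;
    for code in api.pvfs_require_precheck() {
        // `insert` returns false if we already voted on this code hash.
        if judged.insert(code) {
            let accepted = precheck(&code);
            api.submit_pvf_check_statement(code, accepted);
            submitted += 1;
        }
    }
    submitted
}

/// In-memory mock used to exercise the loop.
struct MockRuntime {
    pending: Vec<CodeHash>,
    submitted: Vec<(CodeHash, bool)>,
}

impl PrecheckApi for MockRuntime {
    fn pvfs_require_precheck(&self) -> Vec<CodeHash> {
        self.pending.clone()
    }
    fn submit_pvf_check_statement(&mut self, code: CodeHash, accepted: bool) {
        self.submitted.push((code, accepted));
    }
}
```

Keeping a set of already-judged hashes avoids voting twice while the runtime keeps returning a PVF whose vote has not
concluded yet.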
## Summary
Teyrchains' validation function is described by a wasm module that we refer to as a PVF.
In order to make the PVF usable for candidate validation it has to be registered on-chain.
As part of the registration process, it has to go through pre-checking. Pre-checking is a process of attempting
preparation and additional checks, and reporting the results back on-chain.
We define preparation as a process that validates the consistency of the wasm binary (aka prevalidation) and compiles
the wasm module into machine code (referred to as an artifact).
Besides pre-checking, preparation can also be triggered by execution, since a compiled artifact is needed for
execution. If an artifact already exists, execution skips preparation. If execution does need to prepare, it uses a
more lenient timeout than pre-checking, to avoid the situation where honest validators fail on valid, pre-checked PVFs.
[paras]: runtime/paras.md
[pvf-runtime-api]: runtime-api/pvf-prechecking.md
[pvf-prechecker-subsystem]: node/utility/pvf-prechecker.md
@@ -0,0 +1,78 @@
# Runtime APIs
Runtime APIs are the means by which the node-side code extracts information from the state of the runtime.
Every block in the relay-chain contains a *state root* which is the root hash of a state trie encapsulating all storage
of runtime modules after execution of the block. This is a cryptographic commitment to a unique state. We use the
terminology of accessing the *state at* a block to refer to accessing the state referred to by the state root of that
block.
Although Runtime APIs are often used for simple storage access, they are actually empowered to do arbitrary computation.
The implementation of the Runtime APIs lives within the Runtime as Wasm code and exposes `extern` functions that can be
invoked with arguments and have a return value. Runtime APIs have access to a variety of host functions, which are
contextual functions provided by the Wasm execution context, that allow it to carry out many different types of
behaviors.
Abilities provided by host functions include:
* State Access
* Offchain-DB Access
* Submitting transactions to the transaction queue
* Optimized versions of cryptographic functions
* More
So it is clear that Runtime APIs are a versatile and powerful tool to leverage the state of the chain. In general, we
will use Runtime APIs for these purposes:
* Access of a storage item
* Access of a bundle of related storage items
* Deriving a value from storage based on arguments
* Submitting misbehavior reports
More broadly, we have the goal of using Runtime APIs to write Node-side code that fulfills the requirements set by the
Runtime, in particular the constraints set forth by the [Scheduler](../runtime/scheduler.md) and
[Inclusion](../runtime/inclusion.md) modules. These modules are responsible for advancing paras with a two-phase
protocol where validators are first chosen to validate and back a candidate and then required to ensure availability of
referenced data. In the second phase, validators are meant to attest to those para-candidates that they have their
availability chunk for. As the Node-side code needs to generate the inputs into these two phases, the runtime API needs
to transmit information from the runtime that is aware of the Availability Cores model instantiated by the Scheduler and
Inclusion modules.
Node-side code is also responsible for detecting and reporting misbehavior performed by other validators, and the set of
Runtime APIs needs to provide methods for observing live disputes and submitting reports as transactions.
The next sections will contain information on specific runtime APIs. The format is this:
```rust
/// Fetch the value of the runtime API at the block.
///
/// Definitionally, the `at` parameter cannot be any block that is not in the chain.
/// Thus the return value is unconditional. However, for in-practice implementations
/// it may be possible to provide an `at` parameter as a hash, which may not refer to a
/// valid block or one which implements the runtime API. In those cases it would be
/// best for the implementation to return an error indicating the failure mode.
fn some_runtime_api(at: Block, arg1: Type1, arg2: Type2, ...) -> ReturnValue;
```
Certain runtime APIs concerning the state of a para require the caller to provide an `OccupiedCoreAssumption`. This
indicates how the result of the runtime API should be computed if there is a candidate from the para occupying an
availability core in the [Inclusion Module](../runtime/inclusion.md).
The choices of assumption are whether the candidate occupying that core should be assumed to have been made available
and included or timed out and discarded, along with a third option to assert that the core was not occupied. This choice
affects the parent head-data, the validation code, and the state of message queues. Typically, users
will take the assumption that either the core was free or that the occupying candidate was included, as timeouts are
expected only in adversarial circumstances and even so, only in a small minority of blocks directly following validator
set rotations.
```rust
/// An assumption being made about the state of an occupied core.
enum OccupiedCoreAssumption {
/// The candidate occupying the core was made available and included to free the core.
Included,
/// The candidate occupying the core timed out and freed the core without advancing the para.
TimedOut,
/// The core was not occupied to begin with.
Free,
}
```
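To illustrate how the assumption changes a result, the sketch below computes the parent head-data a para would build on
under each `OccupiedCoreAssumption`. `ParaState` and `parent_head` are a hypothetical simplification. The case where the
caller asserts `Free` but the core is occupied returns `None`, matching the behaviour described for
`persisted_validation_data`.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum OccupiedCoreAssumption {
    Included,
    TimedOut,
    Free,
}

/// Hypothetical para state: the current head-data and, if a candidate occupies
/// a core, the head-data that candidate would produce.
struct ParaState {
    head: Vec<u8>,
    pending_candidate_head: Option<Vec<u8>>,
}

/// Parent head-data to build on under the given assumption.
fn parent_head(state: &ParaState, assumption: OccupiedCoreAssumption) -> Option<Vec<u8>> {
    match (assumption, &state.pending_candidate_head) {
        // Core not occupied: every assumption yields the current head.
        (_, None) => Some(state.head.clone()),
        // Occupying candidate assumed included: its output becomes the parent.
        (OccupiedCoreAssumption::Included, Some(new_head)) => Some(new_head.clone()),
        // Timed out: the para did not advance.
        (OccupiedCoreAssumption::TimedOut, Some(_)) => Some(state.head.clone()),
        // The caller asserted the core was free, but it is occupied.
        (OccupiedCoreAssumption::Free, Some(_)) => None,
    }
}
```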
@@ -0,0 +1,71 @@
# Availability Cores
Yields information on all availability cores. Cores are either free or occupied. Free cores can have paras assigned to
them. Occupied cores don't, but they can become available part-way through a block due to bitfields and then have
something scheduled on them. To allow optimistic validation of candidates, the occupied cores are accompanied by
information on what is upcoming. This information can be leveraged when validators perceive that there is a high
likelihood of a core becoming available based on bitfields seen, and then optimistically validate something that would
become scheduled based on that, although there is no guarantee on what the block producer will actually include in the
block.
See also the [Scheduler Module](../runtime/scheduler.md) for a high-level description of what an availability core is
and why it exists.
```rust
fn availability_cores(at: Block) -> Vec<CoreState>;
```
This is all the information that a validator needs about scheduling for the current block. It includes all information
on [Scheduler](../runtime/scheduler.md) core-assignments and [Inclusion](../runtime/inclusion.md) state of blocks
occupying availability cores. It includes data necessary to determine not only which paras are assigned now, but which
cores are likely to become freed after processing bitfields, and exactly which bitfields would be necessary to make them
so. The implementation of this runtime API should invoke `Scheduler::clear` and `Scheduler::schedule(Vec::new(),
current_block_number + 1)` to ensure that scheduling is accurate.
```rust
struct OccupiedCore {
// NOTE: this has no ParaId as it can be deduced from the candidate descriptor.
/// If this core is freed by availability, this is the assignment that is next up on this
/// core, if any. None if there is nothing queued for this core.
next_up_on_available: Option<ScheduledCore>,
/// The relay-chain block number this began occupying the core at.
occupied_since: BlockNumber,
/// The relay-chain block this will time-out at, if any.
time_out_at: BlockNumber,
/// If this core is freed by being timed-out, this is the assignment that is next up on this
/// core. None if there is nothing queued for this core or there is no possibility of timing
/// out.
next_up_on_time_out: Option<ScheduledCore>,
/// A bitfield with 1 bit for each validator in the set. `1` bits mean that the corresponding
/// validator has attested to availability on-chain. A 2/3+ majority of `1` bits means that
/// this will be available.
availability: Bitfield,
/// The group assigned to distribute availability pieces of this candidate.
group_responsible: GroupIndex,
/// The hash of the candidate occupying the core.
candidate_hash: CandidateHash,
/// The descriptor of the candidate occupying the core.
candidate_descriptor: CandidateDescriptor,
}
struct ScheduledCore {
/// The ID of a para scheduled.
para_id: ParaId,
/// The collator required to author the block, if any.
collator: Option<CollatorId>,
}
enum CoreState {
/// The core is currently occupied.
Occupied(OccupiedCore),
/// The core is currently free, with a para scheduled and given the opportunity
/// to occupy.
///
/// If a particular Collator is required to author this block, that is also present in this
/// variant.
Scheduled(ScheduledCore),
/// The core is currently free and there is nothing scheduled. This can be the case for on-demand
/// cores when there are no on-demand teyrchain blocks queued. Leased cores will never be left idle.
Free,
}
```
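To make the roles of `availability` and `next_up_on_available` concrete, here is a minimal sketch of how node-side code
might predict, from bitfields it has seen off-chain, whether an occupied core will be freed and what would then be
scheduled. Bitfields are modelled as `Vec<bool>`, para IDs as `u32`, and the 2/3+ rule as `n - (n - 1) / 3`. These
simplifications and all names are assumptions. As noted above, the block producer may include different bitfields, so
the result is only a prediction.

```rust
/// Assumed supermajority rule: strictly more than 2/3 of `n`.
fn supermajority_threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

/// Heavily simplified occupied core (hypothetical types).
struct OccupiedCore {
    availability: Vec<bool>,
    next_up_on_available: Option<u32>,
}

#[derive(Debug, PartialEq)]
enum Prediction {
    StaysOccupied,
    FreedThenScheduled(u32),
    FreedIdle,
}

/// Predicts the core's state if the bitfields of `extra_attesters`
/// (validator indices seen off-chain) are included in the next block.
fn predict(core: &OccupiedCore, extra_attesters: &[usize]) -> Prediction {
    let mut bits = core.availability.clone();
    for &v in extra_attesters {
        if let Some(bit) = bits.get_mut(v) {
            *bit = true;
        }
    }
    let ones = bits.iter().filter(|b| **b).count();
    if bits.is_empty() || ones < supermajority_threshold(bits.len()) {
        return Prediction::StaysOccupied;
    }
    match core.next_up_on_available {
        Some(para) => Prediction::FreedThenScheduled(para),
        None => Prediction::FreedIdle,
    }
}
```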
@@ -0,0 +1,16 @@
# Candidate Events
Yields a vector of events concerning candidates that occurred within the given block.
```rust
enum CandidateEvent {
/// This candidate receipt was backed in the most recent block.
CandidateBacked(CandidateReceipt, HeadData, CoreIndex, GroupIndex),
/// This candidate receipt was included and became a parablock at the most recent block.
CandidateIncluded(CandidateReceipt, HeadData, CoreIndex, GroupIndex),
/// This candidate receipt was not made available in time and timed out.
CandidateTimedOut(CandidateReceipt, HeadData, CoreIndex),
}
fn candidate_events(at: Block) -> Vec<CandidateEvent>;
```
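One way node-side code might consume `candidate_events` is to fold each block's events into a map of candidates that
are backed but not yet available. The sketch reduces receipts to a hypothetical 32-byte hash and drops head-data and
group indices. It is an illustration only, not how any particular subsystem is implemented.

```rust
use std::collections::HashMap;

type CandidateHash = [u8; 32];
type CoreIndex = u32;

/// Simplified events: receipts reduced to a candidate hash (hypothetical).
enum CandidateEvent {
    CandidateBacked(CandidateHash, CoreIndex),
    CandidateIncluded(CandidateHash, CoreIndex),
    CandidateTimedOut(CandidateHash, CoreIndex),
}

/// Applies one block's events in order to the set of candidates pending
/// availability, returning the candidates included in that block.
fn apply_events(
    pending: &mut HashMap<CandidateHash, CoreIndex>,
    events: Vec<CandidateEvent>,
) -> Vec<CandidateHash> {
    let mut included = Vec::new();
    for event in events {
        match event {
            CandidateEvent::CandidateBacked(hash, core) => {
                pending.insert(hash, core);
            }
            CandidateEvent::CandidateIncluded(hash, _) => {
                pending.remove(&hash);
                included.push(hash);
            }
            CandidateEvent::CandidateTimedOut(hash, _) => {
                pending.remove(&hash);
            }
        }
    }
    included
}
```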
@@ -0,0 +1,11 @@
# Candidate Pending Availability
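Get the receipts of candidates pending availability for a para. The deprecated `candidate_pending_availability` returns
`Some` for any para assigned to an occupied core in `availability_cores` and `None` otherwise. Its replacement,
`candidates_pending_availability`, returns all such receipts for the para as a `Vec`, which is empty if there are none.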
```rust
// Deprecated.
fn candidate_pending_availability(at: Block, ParaId) -> Option<CommittedCandidateReceipt>;
// Use this one
fn candidates_pending_availability(at: Block, ParaId) -> Vec<CommittedCandidateReceipt>;
```
@@ -0,0 +1,8 @@
# Candidates Included
This runtime API is for checking which candidates have been included within the chain, locally.
```rust
/// Input and output have the same length.
fn candidates_included(Vec<(SessionIndex, CandidateHash)>) -> Vec<bool>;
```
@@ -0,0 +1,27 @@
# Disputes Info
Get information about all disputes known by the chain as well as information about which validators the disputes
subsystem will accept disputes from. These disputes may be either live or concluded. The
[`DisputeState`](../types/disputes.md#disputestate) can be used to determine whether the dispute still accepts votes, as
well as which validators' votes may be included.
```rust
struct Dispute {
session: SessionIndex,
candidate: CandidateHash,
dispute_state: DisputeState,
local: bool,
}
struct SpamSlotsInfo {
max_spam_slots: u32,
session_spam_slots: Vec<(SessionIndex, Vec<u32>)>,
}
struct DisputesInfo {
disputes: Vec<Dispute>,
spam_slots: SpamSlotsInfo,
}
fn disputes_info() -> DisputesInfo;
```
@@ -0,0 +1,13 @@
# Persisted Validation Data
Yields the [`PersistedValidationData`](../types/candidate.md#persistedvalidationdata) for the given
[`ParaId`](../types/candidate.md#paraid) along with an assumption that should be used if the para currently occupies a
core:
```rust
/// Returns the persisted validation data for the given para and occupied core assumption.
///
/// Returns `None` if either the para is not registered or the assumption is `Free`
/// and the para already occupies a core.
fn persisted_validation_data(at: Block, ParaId, OccupiedCoreAssumption) -> Option<PersistedValidationData>;
```
@@ -0,0 +1,22 @@
# PVF Pre-checking
> ⚠️ This runtime API was added in v2.
There are two main runtime APIs to work with PVF pre-checking.
The first runtime API is designed to fetch all PVFs that require pre-checking voting. The PVFs are identified by their
code hashes. As soon as the PVF gains required support, the runtime API will not return the PVF anymore.
```rust
fn pvfs_require_precheck() -> Vec<ValidationCodeHash>;
```
The second runtime API is needed to submit the judgement for a PVF, whether it is approved or not. The voting process
uses unsigned transactions. The [`PvfCheckStatement`](../types/pvf-prechecking.md) is circulated through the network via
gossip similar to a normal transaction. At some point a block author will include the statement in a block, where it
will be processed by the runtime. If that was the last vote needed to reach the supermajority, this PVF will no longer
be returned by `pvfs_require_precheck`.
```rust
fn submit_pvf_check_statement(stmt: PvfCheckStatement, signature: ValidatorSignature);
```
@@ -0,0 +1,13 @@
# Session Index
Get the session index that is expected at the child of a block.
In the [`Initializer`](../runtime/initializer.md) module, session changes are buffered by one block. The session index
of the child of any relay block is always predictable by that block's state.
This session index can be used to derive a [`SigningContext`](../types/candidate.md#signing-context).
```rust
/// Returns the session index expected at a child of the block.
fn session_index_for_child(at: Block) -> SessionIndex;
```
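A sketch of deriving a signing context from this API follows. It assumes the context pairs the child's session index
with the hash of the queried block, which is the child's parent. `SessionApi`, `signing_context_for_child` and
`FixedSession` are hypothetical names used only for illustration.

```rust
type Hash = [u8; 32];
type SessionIndex = u32;

/// Assumed shape of the signing context.
#[derive(Debug, Clone, PartialEq)]
struct SigningContext {
    session_index: SessionIndex,
    parent_hash: Hash,
}

/// Hypothetical stand-in for the `session_index_for_child` runtime API.
trait SessionApi {
    fn session_index_for_child(&self, at: Hash) -> SessionIndex;
}

/// Signing context for statements made about a child of `parent`.
fn signing_context_for_child<A: SessionApi>(api: &A, parent: Hash) -> SigningContext {
    SigningContext {
        session_index: api.session_index_for_child(parent),
        parent_hash: parent,
    }
}

/// Mock that always reports the same session.
struct FixedSession(SessionIndex);

impl SessionApi for FixedSession {
    fn session_index_for_child(&self, _at: Hash) -> SessionIndex {
        self.0
    }
}
```

Because session changes are buffered by one block, this needs only the parent's state, with no knowledge of when the
child will be authored.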
@@ -0,0 +1,21 @@
# Validation Code
Fetch the validation code used by a para, making the given `OccupiedCoreAssumption`.
```rust
fn validation_code(at: Block, ParaId, OccupiedCoreAssumption) -> Option<ValidationCode>;
```
Fetch the validation code (past, present or future) by its hash.
```rust
fn validation_code_by_hash(at: Block, ValidationCodeHash) -> Option<ValidationCode>;
```
Fetch the validation code hash used by a para, making the given `OccupiedCoreAssumption`.
> ⚠️ This API was introduced in `TeyrchainHost` v2.
```rust
fn validation_code_hash(at: Block, ParaId, OccupiedCoreAssumption) -> Option<ValidationCodeHash>;
```
@@ -0,0 +1,32 @@
# Validator Groups
Yields the validator groups used during the current session. The validators in the groups are referred to by their index
into the validator-set and this is assumed to be as-of the child of the block whose state is being queried.
```rust
/// A helper data-type for tracking validator-group rotations.
struct GroupRotationInfo {
session_start_block: BlockNumber,
group_rotation_frequency: BlockNumber,
now: BlockNumber, // The successor of the block in whose state this runtime API is queried.
}
impl GroupRotationInfo {
/// Returns the index of the group needed to validate the core at the given index,
/// assuming the given amount of cores/groups.
fn group_for_core(&self, core_index: CoreIndex, cores: usize) -> GroupIndex;
/// Returns the block number of the next rotation after the current block. If the current block
/// is 10 and the rotation frequency is 5, this should return 15.
fn next_rotation_at(&self) -> BlockNumber;
/// Returns the block number of the last rotation before or including the current block. If the
/// current block is 10 and the rotation frequency is 5, this should return 10.
fn last_rotation_at(&self) -> BlockNumber;
}
/// Returns the validator groups and rotation info localized based on the block whose state
/// this is invoked on. Note that `now` in the `GroupRotationInfo` should be the successor of
/// the number of the block.
fn validator_groups(at: Block) -> (Vec<Vec<ValidatorIndex>>, GroupRotationInfo);
```
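A minimal sketch of the `GroupRotationInfo` methods follows. It assumes rotations are counted from
`session_start_block` and that each rotation shifts the group assigned to a core by one. The rotation rule and the
`rotations` helper are assumptions made for illustration. The checks reuse the example from the doc comments (current
block 10, frequency 5).

```rust
type BlockNumber = u32;
type CoreIndex = u32;
type GroupIndex = u32;

struct GroupRotationInfo {
    session_start_block: BlockNumber,
    group_rotation_frequency: BlockNumber,
    now: BlockNumber,
}

impl GroupRotationInfo {
    /// Number of completed rotations since the session started (hypothetical helper).
    fn rotations(&self) -> BlockNumber {
        if self.group_rotation_frequency == 0 {
            return 0;
        }
        self.now.saturating_sub(self.session_start_block) / self.group_rotation_frequency
    }

    /// Each rotation shifts the group assigned to a core by one (assumed rule).
    fn group_for_core(&self, core_index: CoreIndex, cores: usize) -> GroupIndex {
        assert!(cores > 0, "need at least one core/group");
        ((core_index as usize + self.rotations() as usize) % cores) as GroupIndex
    }

    /// Block number of the last rotation at or before `now`.
    fn last_rotation_at(&self) -> BlockNumber {
        self.session_start_block + self.rotations() * self.group_rotation_frequency
    }

    /// Block number of the next rotation after `now` (for a non-zero frequency).
    fn next_rotation_at(&self) -> BlockNumber {
        self.last_rotation_at() + self.group_rotation_frequency
    }
}
```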
@@ -0,0 +1,8 @@
# Validators
Yields the validator-set at the state of a given block. This validator set is always the one responsible for backing
teyrchains in the child of the provided block.
```rust
fn validators(at: Block) -> Vec<ValidatorId>;
```
@@ -0,0 +1,99 @@
# Runtime Architecture
It's clear that we want to separate different aspects of the runtime logic into different modules. Modules define their
own storage, routines, and entry-points. They also define initialization and finalization logic.
Due to the (lack of) guarantees provided by a particular blockchain-runtime framework, there is no defined or dependable
order in which modules' initialization or finalization logic will run. Supporting this blockchain-runtime framework is
important enough to include that same uncertainty in our model of runtime modules in this guide. Furthermore,
initialization logic of modules can trigger the entry-points or routines of other modules. This is one architectural
pressure against dividing the runtime logic into multiple modules. However, in this case the benefits of splitting
things up outweigh the costs, provided that we take certain precautions against initialization and entry-point races.
We also expect, although it's beyond the scope of this guide, that these runtime modules will exist alongside various
other modules. This has two facets to consider. First, even if the modules that we describe here don't invoke each
other's entry points or routines during initialization, we still have to protect against those other modules doing that.
Second, some of those modules are expected to provide governance capabilities for the chain. Configuration exposed by
teyrchain-host modules is mostly for the benefit of these governance modules, to allow the operators or community of the
chain to tweak parameters.
The runtime's primary role is to manage scheduling and updating of teyrchains, as well as handling misbehavior reports
and slashing. This guide doesn't focus on how teyrchains are registered, only that they are. Also, this runtime
description assumes that validator sets are selected somehow, but doesn't assume any other details than a periodic
_session change_ event. Session changes give information about the incoming validator set and the validator set of the
following session.
The runtime also serves another role, which is to make data available to the Node-side logic via Runtime APIs. These
Runtime APIs should be sufficient for the Node-side code to author blocks correctly.
There is some functionality of the relay chain relating to teyrchains that we also consider beyond the scope of this
document. In particular, all modules related to how teyrchains are registered aren't part of this guide, although we do
provide routines that should be called by the registration process.
We will split the logic of the runtime up into these modules:
- Initializer: manages initialization order of the other modules.
- Shared: manages shared storage and configurations for other modules.
- Configuration: manages configuration and configuration updates in a non-racy manner.
- Paras: manages chain-head and validation code for teyrchains.
- Scheduler: manages teyrchain scheduling as well as validator assignments.
- Inclusion: handles the inclusion and availability of scheduled teyrchains.
- SessionInfo: manages various session keys of validators and other params stored per session.
- Disputes: handles dispute resolution for included, available parablocks.
- Slashing: handles slashing logic for concluded disputes.
- HRMP: handles horizontal messages between paras.
- UMP: handles upward messages from a para to the relay chain.
- DMP: handles downward messages from the relay chain to the para.
The [Initializer module](initializer.md) is special - it's responsible for handling the initialization logic of the
other modules to ensure that the correct initialization order and related invariants are maintained. The other modules
won't specify on-initialize logic, but will instead expose a special semi-private routine that the initialization
module will call. The other modules are relatively straightforward and perform the roles described above.
The Teyrchain Host operates under a changing set of validators. Time is split up into periodic sessions, where each
session brings a potentially new set of validators. Sessions are buffered by one, meaning that the validators of the
upcoming session `n+1` are determined at the end of session `n-1`, right before session `n` starts. Teyrchain Host
runtime modules need to react to changes in the validator set, as it will affect the runtime logic for processing
candidate backing, availability bitfields, and misbehavior reports. The Teyrchain Host modules can't determine
ahead-of-time exactly when session change notifications are going to happen within the block (note: this depends on
module initialization order again - better to put session before teyrchains modules).
The relay chain is intended to use BABE or SASSAFRAS, which both have the property that a session changing at a block is
determined not by the number of the block but instead by the time the block is authored. In some sense, sessions change
in-between blocks, not at blocks. This has the side effect that the session of a child block cannot be determined solely
by the parent block's identifier. Being able to unilaterally determine the validator-set at a specific block based on
its parent hash would make a lot of Node-side logic much simpler.
In order to regain the property that the validator set of a block is predictable by its parent block, we delay session
changes' application to Teyrchains by 1 block. This means that if there is a session change at block X, that session
change will be stored and applied during initialization of direct descendants of X. The principal side effect of this
change is that the Teyrchains runtime can disagree with session or consensus modules about which session it currently
is. Misbehavior reporting routines in particular will be affected by this, although not severely. The teyrchains runtime
might believe it is the last block of the session while the system is really in the first block of the next session. In
such cases, a historical validator-set membership proof will need to accompany any misbehavior report, although such
proofs are typically not needed for current-session misbehavior reports.
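The one-block delay can be sketched as a buffer that the initializer fills when a session change is reported during
block X and drains when a child of X is initialized. This is a hypothetical, heavily simplified model: only the session
index is buffered, not the full notification. It also shows why the child's session is predictable from the parent's
state.

```rust
type SessionIndex = u32;

/// Simplified initializer state (hypothetical).
#[derive(Default)]
struct Initializer {
    /// Session change seen during the current block, applied in the next one.
    buffered: Option<SessionIndex>,
    /// Session the teyrchains modules currently believe they are in.
    current: SessionIndex,
}

impl Initializer {
    /// The session/consensus module reports a change while block X executes.
    fn on_new_session(&mut self, new: SessionIndex) {
        self.buffered = Some(new);
    }

    /// Start of a block: apply any change buffered by the parent block.
    fn on_initialize(&mut self) -> SessionIndex {
        if let Some(new) = self.buffered.take() {
            self.current = new;
        }
        self.current
    }

    /// Known from block X's state alone: the session its children will use.
    fn session_index_for_child(&self) -> SessionIndex {
        self.buffered.unwrap_or(self.current)
    }
}
```

In block X itself the teyrchains modules still report the old session. This is exactly the disagreement with the
session and consensus modules described above.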
So the other role of the initializer module is to forward session change notifications to modules in the initialization
order. Session change is also the point at which the [Configuration Module](configuration.md) updates the configuration.
Most of the other modules will handle changes in the configuration during their session change operation, so the
initializer should provide both the old and new configuration to all the other modules alongside the session change
notification. This means that a session change notification should consist of the following data:
```rust
struct SessionChangeNotification {
// The new validators in the session.
validators: Vec<ValidatorId>,
// The validators for the next session.
queued: Vec<ValidatorId>,
// The configuration before handling the session change.
prev_config: HostConfiguration,
// The configuration after handling the session change.
new_config: HostConfiguration,
// A secure random seed for the session, gathered from BABE.
random_seed: [u8; 32],
// The session index of the beginning session.
session_index: SessionIndex,
}
```
> TODO Diagram: order of runtime operations (initialization, session change)
@@ -0,0 +1,72 @@
# Configuration Pallet
This module is responsible for managing all configuration of the teyrchain host in-flight. It provides a central point
for configuration updates to prevent races between configuration changes and teyrchain-processing logic. Configuration
can only change during the session change routine, and, as this module handles the session change notification first,
it guarantees the invariant that the configuration does not change throughout the entire session. Both the
[scheduler](scheduler.md) and [inclusion](inclusion.md) modules rely on this invariant to ensure proper behavior of the
scheduler.
The configuration that we will be tracking is the [`HostConfiguration`](../types/runtime.md#host-configuration) struct.
## Storage
The configuration module is responsible for two main pieces of storage, plus a flag controlling the consistency checks.
```rust
/// The current configuration to be used.
Configuration: HostConfiguration;
/// A pending configuration to be applied on session change.
PendingConfigs: Vec<(SessionIndex, HostConfiguration)>;
/// A flag that says if the consistency checks should be omitted.
BypassConsistencyCheck: bool;
```
## Session change
The session change routine works as follows:
- If there are no pending configurations, then return early.
- Take all pending configurations whose session index is less than or equal to the current session index.
- Get the pending configuration with the highest session index and apply it to the current configuration. Discard the
earlier ones if any.
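The session change routine above can be sketched as follows. This is a minimal sketch under several assumptions. `HostConfiguration` is cut down to a single hypothetical field. `apply_pending_configs` is an illustrative name. `pending` is assumed to be sorted by ascending session index.

```rust
/// Hypothetical stand-in for `HostConfiguration`; only one field is modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct HostConfiguration {
    pub max_code_size: u32,
}

/// Applies pending configurations at the start of session `current`.
pub fn apply_pending_configs(
    current_config: &mut HostConfiguration,
    pending: &mut Vec<(u32, HostConfiguration)>,
    current: u32,
) {
    // No pending configurations: return early.
    if pending.is_empty() {
        return;
    }
    // Take every entry scheduled for a session <= `current`; later entries stay pending.
    let split = pending.partition_point(|(session, _)| *session <= current);
    let due: Vec<_> = pending.drain(..split).collect();
    // The last due entry has the highest session index; the earlier ones are discarded.
    if let Some((_, config)) = due.into_iter().last() {
        *current_config = config;
    }
}
```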
## Routines
```rust
enum InconsistentError {
// ...
}
impl HostConfiguration {
fn check_consistency(&self) -> Result<(), InconsistentError> { /* ... */ }
}
/// Get the host configuration.
pub fn configuration() -> HostConfiguration {
Configuration::get()
}
/// Schedules updating the host configuration. The update is given by the `updater` closure. The
/// closure takes the current version of the configuration and returns the new version.
/// Returns an `Err` if the closure returns a broken configuration. However, there are a couple of
/// exceptions:
///
/// - If the configuration that was passed to the closure is already broken, then the update is
/// accepted: you cannot break something that is already broken.
/// - If the `BypassConsistencyCheck` flag is set, then the checks will be skipped.
///
/// The changes made by this function will always be scheduled at session X, where X is the current session index + 2.
/// If there is already a pending update for X, then the closure will receive the already pending configuration for
/// session X.
///
/// If there is already a pending update for the current session index + 1, then it won't be touched. Otherwise,
/// that would violate the promise of this function that changes will be applied on the second session change (cur + 2).
fn schedule_config_update(updater: impl FnOnce(&mut HostConfiguration<BlockNumberFor<T>>)) -> DispatchResult
```
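The scheduling rules of `schedule_config_update` can be sketched in plain Rust. This is a sketch under stated assumptions, not the pallet's code. `ConfigState` and the single-field `HostConfiguration` are hypothetical, and so is the consistency rule. The closure is assumed to receive the latest pending configuration, or the active one if nothing is pending. That matches "the already pending configuration for session X" when such an entry exists.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct HostConfiguration {
    pub max_code_size: u32,
}

impl HostConfiguration {
    /// Hypothetical consistency rule, for illustration only.
    fn check_consistency(&self) -> Result<(), &'static str> {
        if self.max_code_size == 0 { Err("max_code_size must be non-zero") } else { Ok(()) }
    }
}

pub struct ConfigState {
    pub active: HostConfiguration,
    /// Stand-in for `PendingConfigs`, sorted by session index.
    pub pending: Vec<(u32, HostConfiguration)>,
    /// Stand-in for `BypassConsistencyCheck`.
    pub bypass_consistency_check: bool,
}

impl ConfigState {
    pub fn schedule_config_update(
        &mut self,
        current_session: u32,
        updater: impl FnOnce(&mut HostConfiguration),
    ) -> Result<(), &'static str> {
        // Changes always land on the second session change from now.
        let scheduled = current_session + 2;
        // Start from the latest pending config (for X or X - 1), else the active one.
        let mut config = self
            .pending
            .last()
            .map(|(_, c)| c.clone())
            .unwrap_or_else(|| self.active.clone());
        let base_was_consistent = config.check_consistency().is_ok();
        updater(&mut config);
        // An already-broken base, or the bypass flag, skips the check.
        if !self.bypass_consistency_check && base_was_consistent {
            config.check_consistency()?;
        }
        let has_entry_for_x =
            matches!(self.pending.last(), Some((session, _)) if *session == scheduled);
        if has_entry_for_x {
            // Overwrite the pending entry for session X.
            if let Some(last) = self.pending.last_mut() {
                last.1 = config;
            }
        } else {
            // Nothing pending for X yet; an entry for `current_session + 1` is left untouched.
            self.pending.push((scheduled, config));
        }
        Ok(())
    }
}
```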
## Entry-points
The Configuration module exposes an entry point for each configuration member. These entry-points accept calls only from
governance origins. These entry-points will use the `schedule_config_update` routine to update the specific
configuration field.
@@ -0,0 +1,167 @@
# Disputes Pallet
After a backed candidate is made available, it is included and proceeds into an acceptance period during which
validators are randomly selected to do (secondary) approval checks of the parablock. Any reports disputing the validity
of the candidate will cause escalation, where even more validators are requested to check the block, and so on, until
either the parablock is determined to be invalid or valid. Those on the wrong side of the dispute are slashed and, if
the parablock is deemed invalid, the relay chain is rolled back to a point before that block was included.
However, this isn't the end of the story. We are working in a forkful blockchain environment, which carries three
important considerations:
1. For security, validators that misbehave shouldn't only be slashed on one fork, but on all possible forks. Validators
that misbehave shouldn't be able to create a new fork of the chain when caught and get away with their misbehavior.
1. It is possible (and likely) that the parablock being contested has not appeared on all forks.
1. If a block author believes that there is a disputed parablock on a specific fork that will resolve to a reversion of
the fork, that block author has more incentive to build on a different fork which does not include that parablock.
This means that disputes may well be started on one fork of the relay chain and, as soon as the dispute resolution
process starts to indicate that the parablock is indeed invalid, that fork of the relay chain will be abandoned and the
dispute will never be fully resolved on that chain.
Even if this doesn't happen, there may be two disputes underway, one of which resolves and leads to a reversion of the
chain before the other has concluded. In this case we want to transplant both the concluded and the unconcluded dispute
onto other forks of the chain.
We account for these requirements by having the disputes module handle two kinds of disputes.
1. Local disputes: those contesting the validity of the current fork by disputing a parablock included within it.
1. Remote disputes: a dispute that has partially or fully resolved on another fork which is transplanted to the local
fork for completion and eventual slashing.
When a local dispute concludes negatively, the chain needs to be abandoned and reverted back to a block where the state
does not contain the bad parablock. Thanks to the [Approval Checking Protocol](../protocol-approval.md), we expect that
the currently executing block is not yet finalized. So we do two things when a local dispute concludes negatively:
1. Freeze the state of teyrchains so nothing further is backed or included.
1. Issue a digest in the header of the block that signals to nodes that this branch of the chain is to be abandoned.
If, as is expected, the chain is unfinalized, the freeze will have no effect as no honest validator will attempt to
build on the frozen chain. However, if the approval checking protocol has failed and the bad parablock is finalized, the
freeze serves to put the chain into a governance-only mode.
The storage of this module is designed around tracking [`DisputeState`s](../types/disputes.md#disputestate), updating
them with votes, and tracking blocks included by this branch of the relay chain. It also contains a `Frozen` parameter
designed to freeze the state of all teyrchains.
## Storage
Storage Layout:
```rust
LastPrunedSession: Option<SessionIndex>,
// All ongoing or concluded disputes for the last several sessions.
Disputes: double_map (SessionIndex, CandidateHash) -> Option<DisputeState>,
// All included blocks on the chain, as well as the block number in this chain that
// should be reverted back to if the candidate is disputed and determined to be invalid.
Included: double_map (SessionIndex, CandidateHash) -> Option<BlockNumber>,
// Whether the chain is frozen or not. Starts as `None`. When this is `Some`,
// the chain will not accept any new teyrchain blocks for backing or inclusion,
// and its value indicates the last valid block number in the chain.
// It can only be set back to `None` by governance intervention.
Frozen: Option<BlockNumber>,
```
> `byzantine_threshold` refers to the maximum number `f` of validators which may be byzantine. The total number of
> validators is `n = 3f + e` where `e in { 1, 2, 3 }`.
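From `n = 3f + e` with `e` in `{1, 2, 3}` it follows that `f = (n - 1) / 3` under integer division. Below is a minimal sketch of that formula. Treating the supermajority as `n - f` votes is an assumption made for illustration here.

```rust
/// Maximum number `f` of byzantine validators, such that `n = 3f + e` with `e` in {1, 2, 3}.
pub fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Votes needed for a supermajority, assumed here to be `n - f`.
pub fn supermajority_threshold(n: usize) -> usize {
    n - byzantine_threshold(n)
}
```

For example, 7 validators give `f = 2` (7 = 3 * 2 + 1). The supermajority is then 5 votes.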
## Session Change
1. If the current session is not greater than `config.dispute_period + 1`, nothing to do here.
1. Set `pruning_target = current_session - config.dispute_period - 1`. We add the extra `1` because we want to keep
things for `config.dispute_period` _full_ sessions. The stuff at the end of the most recent session has been around
for a little over 0 sessions, not a little over 1.
1. If `LastPrunedSession` is `None`, then set `LastPrunedSession` to `Some(pruning_target)` and return.
1. Otherwise, clear out all disputes and included candidates entries in the range `last_pruned..=pruning_target` and set
`LastPrunedSession` to `Some(pruning_target)`.
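The pruning steps can be sketched as a pure function. It returns the session range to clear, if any. This sketch makes assumptions. `prune_range` is a hypothetical name. `last_pruned` stands in for `LastPrunedSession`. Actually deleting the `Disputes` and `Included` entries is left to the caller.

```rust
use std::ops::RangeInclusive;

/// Returns the sessions whose `Disputes`/`Included` entries should be cleared,
/// updating `last_pruned` (a stand-in for `LastPrunedSession`) in place.
pub fn prune_range(
    current_session: u32,
    dispute_period: u32,
    last_pruned: &mut Option<u32>,
) -> Option<RangeInclusive<u32>> {
    // Nothing to do until the current session exceeds `dispute_period + 1`.
    if current_session <= dispute_period + 1 {
        return None;
    }
    // Keep `dispute_period` full sessions of data.
    let pruning_target = current_session - dispute_period - 1;
    // First time: only record the target. Afterwards: clear `last_pruned..=pruning_target`.
    last_pruned
        .replace(pruning_target)
        .map(|prev| prev..=pruning_target)
}
```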
## Block Initialization
This is currently a no-op.
## Routines
* `filter_multi_dispute_data(MultiDisputeStatementSet) -> MultiDisputeStatementSet`:
1. Takes a `MultiDisputeStatementSet` and filters it down to a `MultiDisputeStatementSet` that satisfies all the
criteria of `provide_multi_dispute_data`. That is, eliminating ancient votes, duplicates and unconfirmed disputes.
This can be used by block authors to create the final submission in a block which is guaranteed to pass the
`provide_multi_dispute_data` checks.
* `provide_multi_dispute_data(MultiDisputeStatementSet) -> Vec<(SessionIndex, Hash)>`:
1. Pass on each dispute statement set to `provide_dispute_data`, propagating failure.
1. Return a list of all candidates that just had disputes initiated.
* `provide_dispute_data(DisputeStatementSet) -> bool`: Provide data to an ongoing dispute or initiate a dispute.
1. All statements must be issued under the correct session for the correct candidate.
1. `SessionInfo` is used to check statement signatures and this function should fail if any signatures are invalid.
1. If there is no dispute under `Disputes`, create a new `DisputeState` with blank bitfields.
1. If `concluded_at` is `Some` and `concluded_at + config.post_conclusion_acceptance_period < now`, return false.
1. Import all statements into the dispute. This should fail if any statements are duplicate or if the corresponding
bit for the corresponding validator is set in the dispute already.
1. If `concluded_at` is `None`, reward all statements.
1. If `concluded_at` is `Some`, reward all statements slightly less.
1. If either side now has supermajority and did not previously, slash the other side. This may be both sides, and we
support this possibility in code, but note that this requires validators to participate on both sides which has
negative expected value. Set `concluded_at` to `Some(now)` if it was `None`.
1. If just concluded against the candidate and the `Included` map contains `(session, candidate)`: invoke
`revert_and_freeze` with the stored block number.
1. Return true if just initiated, false otherwise.
* `disputes() -> Vec<(SessionIndex, CandidateHash, DisputeState)>`: Get a list of all disputes and info about dispute
state.
1. Iterate over all disputes in `Disputes` and collect into a vector.
* `note_included(SessionIndex, CandidateHash, included_in: BlockNumber)`:
1. Add `(SessionIndex, CandidateHash)` to the `Included` map with `included_in - 1` as the value.
1. If there is a dispute under `(SessionIndex, CandidateHash)` that has concluded against the candidate, invoke
`revert_and_freeze` with the stored block number.
* `concluded_invalid(SessionIndex, CandidateHash) -> bool`: Returns whether a candidate has already concluded a dispute
in the negative.
* `is_frozen()`: Load the value of `Frozen` from storage. Return true if `Some` and false if `None`.
* `revert_and_freeze(BlockNumber)`:
1. If `is_frozen()` return.
1. Set `Frozen` to `Some(BlockNumber)` to indicate a rollback to the block number.
1. Issue a `Revert(BlockNumber + 1)` log to indicate a rollback of the block's child in the header chain, which is the
same as a rollback to the block number.
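The interplay between `note_included` and `revert_and_freeze` can be sketched with in-memory maps. This sketch assumes the following. `DisputesState` is a hypothetical type. `HashMap`/`HashSet` stand in for runtime storage. The `Revert` header digest is modeled as a plain vector of block numbers. The set of disputes concluded against a candidate is given directly rather than derived from `DisputeState`.

```rust
use std::collections::{HashMap, HashSet};

type SessionIndex = u32;
type CandidateHash = [u8; 32];
type BlockNumber = u32;

#[derive(Default)]
pub struct DisputesState {
    /// Stand-in for `Included`: block number to revert to if the candidate is found invalid.
    pub included: HashMap<(SessionIndex, CandidateHash), BlockNumber>,
    /// Disputes that have concluded against the candidate.
    pub concluded_invalid: HashSet<(SessionIndex, CandidateHash)>,
    /// Stand-in for `Frozen`: `Some(n)` once the chain is frozen with `n` as the last valid block.
    pub frozen: Option<BlockNumber>,
    /// Stand-in for header digests: each entry models a `Revert(n)` log.
    pub revert_logs: Vec<BlockNumber>,
}

impl DisputesState {
    pub fn is_frozen(&self) -> bool {
        self.frozen.is_some()
    }

    pub fn revert_and_freeze(&mut self, revert_to: BlockNumber) {
        if self.is_frozen() {
            return;
        }
        self.frozen = Some(revert_to);
        // Reverting the child of `revert_to` is the same as reverting to `revert_to`.
        self.revert_logs.push(revert_to + 1);
    }

    pub fn note_included(&mut self, session: SessionIndex, candidate: CandidateHash, included_in: BlockNumber) {
        // The parent of the inclusion block is the last block without the candidate.
        let revert_to = included_in.saturating_sub(1);
        self.included.insert((session, candidate), revert_to);
        if self.concluded_invalid.contains(&(session, candidate)) {
            self.revert_and_freeze(revert_to);
        }
    }
}
```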
# Disputes filtering
All disputes delivered to the runtime by the client are filtered before the actual import. In this context, actual
import means persisting in the runtime storage. The filtering has two purposes:
* Limit the amount of data saved onchain.
* Prevent persisting malicious dispute data onchain.
*Implementation note*: Filtering is performed in function `filter_dispute_data` from `Disputes` pallet.
The filtering is performed on the whole statement set which is about to be imported onchain. The following filters are
applied:
1. Remove ancient disputes: if a dispute concluded before the block number indicated by the `OLDEST_ACCEPTED`
parameter, it is removed from the set. `OLDEST_ACCEPTED` is a runtime configuration option. *Implementation note*:
`dispute_post_conclusion_acceptance_period` from `HostConfiguration` is used in the current Pezkuwi/Kusama
implementation.
2. Remove votes from unknown validators: votes from validators that were not authorities in the session where the
dispute was raised are removed. Note that this step removes only the individual votes, not the whole dispute.
3. Remove one-sided disputes: if a dispute doesn't contain two opposing votes, it is not imported onchain. A dispute is
raised only if there are two opposing votes, so if the client does not send them, the dispute is potentially spam.
4. Remove unconfirmed disputes: if a dispute does not contain more votes than the byzantine threshold, it is removed.
This is also a spam precaution. A legitimate client will send only confirmed disputes to the runtime.
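The filters above can be sketched over a simplified dispute type. This is not the pallet's `filter_dispute_data`, and it makes assumptions. `DisputeSet` and `filter_disputes` are hypothetical. Votes are reduced to `(validator_index, voted_valid)` pairs. A dispute counts as confirmed when it holds more votes than the byzantine threshold, following the definition in the rewards section.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified dispute statement set.
#[derive(Clone, Debug, PartialEq)]
pub struct DisputeSet {
    /// Relay-chain block at which the dispute concluded, if it has.
    pub concluded_at: Option<u32>,
    /// `(validator_index, voted_valid)` pairs.
    pub votes: Vec<(u32, bool)>,
}

pub fn filter_disputes(
    disputes: Vec<DisputeSet>,
    oldest_accepted: u32,
    session_validators: &HashSet<u32>,
    byzantine_threshold: usize,
) -> Vec<DisputeSet> {
    disputes
        .into_iter()
        .filter_map(|mut d| {
            // 1. Drop ancient disputes.
            if matches!(d.concluded_at, Some(at) if at < oldest_accepted) {
                return None;
            }
            // 2. Drop individual votes from validators outside the session's authority set.
            d.votes.retain(|(v, _)| session_validators.contains(v));
            // 3. Drop one-sided disputes: both a "valid" and an "invalid" vote are required.
            let has_valid = d.votes.iter().any(|(_, valid)| *valid);
            let has_invalid = d.votes.iter().any(|(_, valid)| !*valid);
            if !(has_valid && has_invalid) {
                return None;
            }
            // 4. Drop unconfirmed disputes.
            if d.votes.len() <= byzantine_threshold {
                return None;
            }
            Some(d)
        })
        .collect()
}
```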
# Rewards and slashing
After the disputes are filtered, the validators participating in the disputes are rewarded and, more importantly, the
offenders are slashed. Generally there can be two types of punishments:
* "against valid" - the offender claimed that a valid candidate is invalid.
* "for invalid" - the offender claimed that an invalid candidate is valid.
A dispute might be inconclusive, meaning that it has timed out without being confirmed. A confirmed dispute is one
containing more votes than the byzantine threshold (just under 1/3 of the active validators). Validators participating
in inconclusive disputes are not slashed. Thanks to the filtering described in the previous section, one can be
confident that there are no spam disputes in the runtime. So if a validator does not vote, it is for some other reason
(e.g. being under a DoS attack), and there is no reason to punish such validators with a slash.
*Implementation note*: Slashing is performed in `process_checked_dispute_data` from `Disputes` pallet.
@@ -0,0 +1,52 @@
# DMP Pallet
A module responsible for Downward Message Processing (DMP). See [Messaging Overview](../messaging.md) for more details.
## Storage
Storage layout required for implementation of DMP.
```rust
/// The downward messages addressed for a certain para.
DownwardMessageQueues: map ParaId => Vec<InboundDownwardMessage>;
/// A mapping that stores the downward message queue MQC head for each para.
///
/// Each link in this chain has a form:
/// `(prev_head, B, H(M))`, where
/// - `prev_head`: is the previous head hash or zero if none.
/// - `B`: is the relay-chain block number in which a message was appended.
/// - `H(M)`: is the hash of the message being appended.
DownwardMessageQueueHeads: map ParaId => Hash;
```
## Initialization
No initialization routine runs for this module.
## Routines
Candidate Acceptance Function:
* `check_processed_downward_messages(P: ParaId, relay_parent_number: BlockNumber, processed_downward_messages: u32)`:
1. Checks that `processed_downward_messages` is at least 1 if `DownwardMessageQueues` for `P` is not empty at the
given `relay_parent_number`.
1. Checks that `DownwardMessageQueues` for `P` is at least `processed_downward_messages` long.
Candidate Enactment:
* `prune_dmq(P: ParaId, processed_downward_messages: u32)`:
1. Remove the first `processed_downward_messages` from the `DownwardMessageQueues` of `P`.
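The acceptance check and the pruning step can be sketched as follows. This sketch makes several assumptions. The queue is represented only by each message's `sent_at` block number, in ascending order. Under that ordering, "not empty at `relay_parent_number`" reduces to checking the oldest message. A `VecDeque` stands in for the stored queue.

```rust
use std::collections::VecDeque;

pub fn check_processed_downward_messages(
    sent_at: &[u32],
    relay_parent_number: u32,
    processed_downward_messages: u32,
) -> Result<(), &'static str> {
    let processed = processed_downward_messages as usize;
    // The queue is ordered by `sent_at`, so checking the oldest message is enough.
    let non_empty_at_relay_parent = sent_at.first().map_or(false, |&b| b <= relay_parent_number);
    if non_empty_at_relay_parent && processed == 0 {
        return Err("at least one downward message must be processed");
    }
    if processed > sent_at.len() {
        return Err("processed more messages than are queued");
    }
    Ok(())
}

/// Removes the first `processed_downward_messages` entries from the queue.
pub fn prune_dmq<M>(queue: &mut VecDeque<M>, processed_downward_messages: u32) {
    let n = (processed_downward_messages as usize).min(queue.len());
    queue.drain(..n);
}
```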
Utility routines:
* `queue_downward_message(P: ParaId, M: DownwardMessage)`:
1. Check if the size of `M` exceeds the `config.max_downward_message_size`. If so, return an error.
1. Wrap `M` into `InboundDownwardMessage` using the current block number for `sent_at`.
1. Obtain a new MQC link for the resulting `InboundDownwardMessage` and replace `DownwardMessageQueueHeads` for `P`
with the resulting hash.
1. Add the resulting `InboundDownwardMessage` into `DownwardMessageQueues` for `P`.
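The MQC link `(prev_head, B, H(M))` and the queueing routine can be sketched like this. Some parts are assumptions. `std`'s `DefaultHasher` is a non-cryptographic stand-in for the chain's real 32-byte hash and is used purely for illustration. `ParaDmq`, `mqc_link`, and `hash_of` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative 64-bit stand-in for the 32-byte `Hash` (NOT cryptographic).
type Hash64 = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct InboundDownwardMessage {
    pub sent_at: u32,
    pub msg: Vec<u8>,
}

fn hash_of<T: Hash>(value: &T) -> Hash64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

/// Next MQC head: hash of `(prev_head, B, H(M))`.
pub fn mqc_link(prev_head: Hash64, sent_at: u32, msg: &[u8]) -> Hash64 {
    hash_of(&(prev_head, sent_at, hash_of(&msg)))
}

/// Hypothetical per-para view of `DownwardMessageQueues` and `DownwardMessageQueueHeads`.
pub struct ParaDmq {
    pub queue: Vec<InboundDownwardMessage>,
    /// Zero if no message was ever queued.
    pub head: Hash64,
}

pub fn queue_downward_message(
    dmq: &mut ParaDmq,
    msg: Vec<u8>,
    now: u32,
    max_downward_message_size: usize,
) -> Result<(), &'static str> {
    if msg.len() > max_downward_message_size {
        return Err("message exceeds max_downward_message_size");
    }
    let inbound = InboundDownwardMessage { sent_at: now, msg };
    dmq.head = mqc_link(dmq.head, inbound.sent_at, &inbound.msg);
    dmq.queue.push(inbound);
    Ok(())
}
```

Because each head commits to the previous one, a teyrchain can check which messages it has received by recomputing the chain.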
## Session Change
1. For each `P` in `outgoing_paras` (generated by `Paras::on_new_session`):
1. Remove all `DownwardMessageQueues` of `P`.
1. Remove `DownwardMessageQueueHeads` for `P`.
@@ -0,0 +1,278 @@
# HRMP Pallet
A module responsible for Horizontally Relay-routed Message Passing (HRMP). See [Messaging Overview](../messaging.md) for
more details.
## Storage
HRMP related structs:
```rust
/// A description of a request to open an HRMP channel.
struct HrmpOpenChannelRequest {
/// Indicates if this request was confirmed by the recipient.
confirmed: bool,
/// The amount that the sender supplied at the time of creation of this request.
sender_deposit: Balance,
/// The maximum message size that could be put into the channel.
max_message_size: u32,
/// The maximum number of messages that can be pending in the channel at once.
max_capacity: u32,
/// The maximum total size of the messages that can be pending in the channel at once.
max_total_size: u32,
}
/// Metadata of an HRMP channel.
struct HrmpChannel {
/// The amount that the sender supplied as a deposit when opening this channel.
sender_deposit: Balance,
/// The amount that the recipient supplied as a deposit when accepting the request to open this channel.
recipient_deposit: Balance,
/// The maximum number of messages that can be pending in the channel at once.
max_capacity: u32,
/// The maximum total size of the messages that can be pending in the channel at once.
max_total_size: u32,
/// The maximum message size that could be put into the channel.
max_message_size: u32,
/// The current number of messages pending in the channel.
/// Invariant: should be less than or equal to `max_capacity`.
msg_count: u32,
/// The total size in bytes of all message payloads in the channel.
/// Invariant: should be less than or equal to `max_total_size`.
total_size: u32,
/// A head of the Message Queue Chain for this channel. Each link in this chain has a form:
/// `(prev_head, B, H(M))`, where
/// - `prev_head`: is the previous value of `mqc_head` or zero if none.
/// - `B`: is the [relay-chain] block number in which a message was appended
/// - `H(M)`: is the hash of the message being appended.
/// This value is initialized to a special value that consists of all zeroes which indicates
/// that no messages were previously added.
mqc_head: Option<Hash>,
}
```
HRMP related storage layout:
```rust
/// The set of pending HRMP open channel requests.
///
/// The set is accompanied by a list for iteration.
///
/// Invariant:
/// - There are no channels that exist in the list but not in the set and vice versa.
HrmpOpenChannelRequests: map HrmpChannelId => Option<HrmpOpenChannelRequest>;
HrmpOpenChannelRequestsList: Vec<HrmpChannelId>;
/// This mapping tracks how many open channel requests are initiated by a given sender para.
/// Invariant: the number of items in `HrmpOpenChannelRequests` with a key of the form `(X, _)`
/// should equal the value of `HrmpOpenChannelRequestCount` for `X`.
HrmpOpenChannelRequestCount: map ParaId => u32;
/// This mapping tracks how many open channel requests were accepted by a given recipient para.
/// Invariant: the number of items in `HrmpOpenChannelRequests` with a key of the form `(_, X)` and
/// `confirmed` set to true should equal the value of `HrmpAcceptedChannelRequestCount` for `X`.
HrmpAcceptedChannelRequestCount: map ParaId => u32;
/// A set of pending HRMP close channel requests that are going to be closed during the session change.
/// Used for checking if a given channel is registered for closure.
///
/// The set is accompanied by a list for iteration.
///
/// Invariant:
/// - There are no channels that exist in the list but not in the set and vice versa.
HrmpCloseChannelRequests: map HrmpChannelId => Option<()>;
HrmpCloseChannelRequestsList: Vec<HrmpChannelId>;
/// The HRMP watermark associated with each para.
/// Invariant:
/// - each para `P` used here as a key should satisfy `Paras::is_valid_para(P)` within a session.
HrmpWatermarks: map ParaId => Option<BlockNumber>;
/// HRMP channel data associated with each para.
/// Invariant:
/// - each participant in the channel should satisfy `Paras::is_valid_para(P)` within a session.
HrmpChannels: map HrmpChannelId => Option<HrmpChannel>;
/// Ingress/egress indexes make it possible to find all the senders and receivers given the opposite
/// side, i.e.
///
/// (a) the ingress index finds all the senders for a given recipient.
/// (b) the egress index finds all the recipients for a given sender.
///
/// Invariants:
/// - for each ingress index entry for `P`, each item `I` in the index should be present in `HrmpChannels`
/// as `(I, P)`.
/// - for each egress index entry for `P`, each item `E` in the index should be present in `HrmpChannels`
/// as `(P, E)`.
/// - there should be no other dangling channels in `HrmpChannels`.
/// - the vectors are sorted.
HrmpIngressChannelsIndex: map ParaId => Vec<ParaId>;
HrmpEgressChannelsIndex: map ParaId => Vec<ParaId>;
/// Storage for the messages for each channel.
/// Invariant: cannot be non-empty if the corresponding channel in `HrmpChannels` is `None`.
HrmpChannelContents: map HrmpChannelId => Vec<InboundHrmpMessage>;
/// Maintains a mapping that can be used to answer the question:
/// What paras sent a message at the given block number for a given receiver.
/// Invariants:
/// - The inner `Vec<ParaId>` is never empty.
/// - The inner `Vec<ParaId>` cannot contain the same `ParaId` twice.
/// - The outer vector is sorted ascending by block number and cannot store two items with the same
/// block number.
HrmpChannelDigests: map ParaId => Vec<(BlockNumber, Vec<ParaId>)>;
```
## Initialization
No initialization routine runs for this module.
## Routines
Candidate Acceptance Function:
* `check_hrmp_watermark(P: ParaId, new_hrmp_watermark)`:
1. `new_hrmp_watermark` should be strictly greater than the value of `HrmpWatermarks` for `P` (if any).
1. `new_hrmp_watermark` must not be greater than the context's block number.
1. `new_hrmp_watermark` should be either
1. equal to the context's block number
1. or in `HrmpChannelDigests` for `P` an entry with the block number should exist
* `check_outbound_hrmp(sender: ParaId, Vec<OutboundHrmpMessage>)`:
1. Checks that there are at most `config.hrmp_max_message_num_per_candidate` messages.
1. Checks that horizontal messages are sorted by ascending recipient ParaId and that no two horizontal messages have
the same recipient.
1. For each horizontal message `M` with the channel `C` identified by `(sender, M.recipient)` check:
1. `C` exists
1. `M`'s payload size doesn't exceed a preconfigured limit `C.max_message_size`
1. `M`'s payload size summed with the `C.total_size` doesn't exceed a preconfigured limit `C.max_total_size`.
1. `C.msg_count + 1` doesn't exceed a preconfigured limit `C.max_capacity`.
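Both acceptance checks can be sketched as pure functions over simplified inputs. This sketch assumes several things. `HrmpChannel` keeps only the fields the checks read. The block numbers found in `HrmpChannelDigests` for `P` are passed in as a slice. Channels are looked up in a `HashMap` keyed by `(sender, recipient)`.

```rust
use std::collections::HashMap;

type ParaId = u32;
type BlockNumber = u32;

/// Simplified channel metadata: only the fields the checks need.
pub struct HrmpChannel {
    pub max_capacity: u32,
    pub max_total_size: u32,
    pub max_message_size: u32,
    pub msg_count: u32,
    pub total_size: u32,
}

pub struct OutboundHrmpMessage {
    pub recipient: ParaId,
    pub data: Vec<u8>,
}

pub fn check_hrmp_watermark(
    last_watermark: Option<BlockNumber>,
    digest_blocks: &[BlockNumber], // block numbers present in `HrmpChannelDigests` for P
    context_block: BlockNumber,
    new_watermark: BlockNumber,
) -> Result<(), &'static str> {
    if matches!(last_watermark, Some(last) if new_watermark <= last) {
        return Err("watermark must strictly advance");
    }
    if new_watermark > context_block {
        return Err("watermark is ahead of the context block");
    }
    // Either land exactly on the context block or on a block where messages arrived.
    if new_watermark != context_block && !digest_blocks.contains(&new_watermark) {
        return Err("watermark does not land on a digest entry");
    }
    Ok(())
}

pub fn check_outbound_hrmp(
    sender: ParaId,
    msgs: &[OutboundHrmpMessage],
    channels: &HashMap<(ParaId, ParaId), HrmpChannel>,
    max_message_num_per_candidate: usize,
) -> Result<(), &'static str> {
    if msgs.len() > max_message_num_per_candidate {
        return Err("too many messages");
    }
    // Strictly ascending recipients means sorted with no duplicate recipient.
    if msgs.windows(2).any(|w| w[0].recipient >= w[1].recipient) {
        return Err("recipients are not strictly ascending");
    }
    for m in msgs {
        let c = channels.get(&(sender, m.recipient)).ok_or("channel does not exist")?;
        let size = m.data.len() as u32;
        if size > c.max_message_size {
            return Err("message exceeds max_message_size");
        }
        if c.total_size.saturating_add(size) > c.max_total_size {
            return Err("channel would exceed max_total_size");
        }
        if c.msg_count + 1 > c.max_capacity {
            return Err("channel would exceed max_capacity");
        }
    }
    Ok(())
}
```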
Candidate Enactment:
* `queue_outbound_hrmp(sender: ParaId, Vec<OutboundHrmpMessage>)`:
1. For each horizontal message `HM` with the channel `C` identified by `(sender, HM.recipient)`:
1. Append `HM` into `HrmpChannelContents` that corresponds to `C` with `sent_at` equal to the current block
number.
1. Locate or create an entry in `HrmpChannelDigests` for `HM.recipient` and append `sender` into the entry's
list.
1. Increment `C.msg_count`
1. Increment `C.total_size` by `HM`'s payload size
1. Append a new link to the MQC and save the new head in `C.mqc_head`. Note that the current block number as of
enactment is used for the link.
* `prune_hrmp(recipient, new_hrmp_watermark)`:
1. From `HrmpChannelDigests` for `recipient` remove all entries up to an entry with block number equal to
`new_hrmp_watermark`.
1. From the removed digests construct a set of paras that sent new messages within the interval between the old and
new watermarks.
1. For each channel `C` identified by `(sender, recipient)` for each `sender` coming from the set, prune messages up
to the `new_hrmp_watermark`.
1. For each pruned message `M` from channel `C`:
1. Decrement `C.msg_count`
1. Decrement `C.total_size` by `M`'s payload size.
1. Set `HrmpWatermarks` for `recipient` to be equal to `new_hrmp_watermark`.
> NOTE: Collecting digests can be inefficient, and the time it takes grows very fast. Thanks to the aggressive
> parameterization this shouldn't be a big deal. If that becomes a problem, consider introducing an extra
> dictionary which records at what block the given sender sent a message to the recipient.
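`prune_hrmp` can be sketched with in-memory collections. This sketch makes assumptions. `Channel` is hypothetical and models a channel's contents as `(sent_at, payload_size)` pairs. The recipient's inbound channels are keyed by sender. Both the digests and each channel's contents are assumed to be sorted by block number, as the invariants above require.

```rust
use std::collections::{BTreeSet, HashMap};

type ParaId = u32;
type BlockNumber = u32;

/// Simplified channel: queued messages as `(sent_at, payload_size)` plus counters.
pub struct Channel {
    pub contents: Vec<(BlockNumber, u32)>,
    pub msg_count: u32,
    pub total_size: u32,
}

pub fn prune_hrmp(
    digests: &mut Vec<(BlockNumber, Vec<ParaId>)>, // `HrmpChannelDigests` for `recipient`
    inbound: &mut HashMap<ParaId, Channel>,        // channels `(sender, recipient)`, keyed by sender
    watermarks: &mut HashMap<ParaId, BlockNumber>, // stand-in for `HrmpWatermarks`
    recipient: ParaId,
    new_watermark: BlockNumber,
) {
    // Remove digest entries up to and including `new_watermark`, collecting their senders.
    let split = digests.partition_point(|(block, _)| *block <= new_watermark);
    let senders: BTreeSet<ParaId> = digests.drain(..split).flat_map(|(_, s)| s).collect();
    for sender in senders {
        if let Some(c) = inbound.get_mut(&sender) {
            // Prune messages sent at or before the new watermark, updating the counters.
            let keep_from = c.contents.partition_point(|(sent_at, _)| *sent_at <= new_watermark);
            for (_, size) in c.contents.drain(..keep_from) {
                c.msg_count -= 1;
                c.total_size -= size;
            }
        }
    }
    watermarks.insert(recipient, new_watermark);
}
```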
## Entry-points
The following entry-points are meant to be used for HRMP channel management.
These entry-points are meant to be called from a teyrchain. `origin` is defined as the `ParaId` of the teyrchain that
executed the message.
* `hrmp_init_open_channel(recipient, proposed_max_capacity, proposed_max_message_size)`:
1. Check that the `origin` is not `recipient`.
1. Check that `proposed_max_capacity` is less than or equal to `config.hrmp_channel_max_capacity` and greater than
zero.
1. Check that `proposed_max_message_size` is less than or equal to `config.hrmp_channel_max_message_size` and greater
than zero.
1. Check that `recipient` is a valid para.
1. Check that there is no existing channel for `(origin, recipient)` in `HrmpChannels`.
1. Check that there is no existing open channel request (`origin`, `recipient`) in `HrmpOpenChannelRequests`.
1. Check that the sum of the number of already opened HRMP channels by the `origin` (the size of the set found in
`HrmpEgressChannelsIndex` for `origin`) and the number of open requests by the `origin` (the value from
`HrmpOpenChannelRequestCount` for `origin`) doesn't exceed the limit of channels
(`config.hrmp_max_teyrchain_outbound_channels` or `config.hrmp_max_parathread_outbound_channels`) minus 1.
1. Check that `origin`'s balance is greater than or equal to `config.hrmp_sender_deposit`.
1. Reserve the deposit for the `origin` according to `config.hrmp_sender_deposit`
1. Increase `HrmpOpenChannelRequestCount` by 1 for `origin`.
1. Append `(origin, recipient)` to `HrmpOpenChannelRequestsList`.
1. Add a new entry to `HrmpOpenChannelRequests` for `(origin, recipient)`
1. Set `sender_deposit` to `config.hrmp_sender_deposit`
1. Set `max_capacity` to `proposed_max_capacity`
1. Set `max_message_size` to `proposed_max_message_size`
1. Set `max_total_size` to `config.hrmp_channel_max_total_size`
1. Send a downward message to `recipient` notifying about an inbound HRMP channel request.
* The DM is sent using `queue_downward_message`.
* The DM is represented by the `HrmpNewChannelOpenRequest` XCM message.
* `sender` is set to `origin`,
* `max_message_size` is set to `proposed_max_message_size`,
* `max_capacity` is set to `proposed_max_capacity`.
* `hrmp_accept_open_channel(sender)`:
1. Check that there is an existing request between (`sender`, `origin`) in `HrmpOpenChannelRequests`
1. Check that it is not confirmed.
1. Check that the sum of the number of inbound HRMP channels opened to `origin` (the size of the set found in
`HrmpIngressChannelsIndex` for `origin`) and the number of accepted open requests by the `origin` (the value from
`HrmpAcceptedChannelRequestCount` for `origin`) doesn't exceed the limit of channels
(`config.hrmp_max_teyrchain_inbound_channels` or `config.hrmp_max_parathread_inbound_channels`) minus 1.
1. Check that `origin`'s balance is greater than or equal to `config.hrmp_recipient_deposit`.
1. Reserve the deposit for the `origin` according to `config.hrmp_recipient_deposit`
1. For the request in `HrmpOpenChannelRequests` identified by `(sender, origin)`, set the `confirmed` flag to `true`.
1. Increase `HrmpAcceptedChannelRequestCount` by 1 for `origin`.
1. Send a downward message to `sender` notifying that the channel request was accepted.
* The DM is sent using `queue_downward_message`.
* The DM is represented by the `HrmpChannelAccepted` XCM message.
* `recipient` is set to `origin`.
* `hrmp_cancel_open_request(ch)`:
1. Check that `origin` is either `ch.sender` or `ch.recipient`
1. Check that the open channel request `ch` exists.
1. Check that the open channel request for `ch` is not confirmed.
1. Remove `ch` from `HrmpOpenChannelRequests` and `HrmpOpenChannelRequestsList`
1. Decrement `HrmpOpenChannelRequestCount` for `ch.sender` by 1.
1. Unreserve the deposit of `ch.sender`.
* `hrmp_close_channel(ch)`:
1. Check that `origin` is either `ch.sender` or `ch.recipient`
1. Check that `HrmpChannels` for `ch` exists.
1. Check that `ch` is not in the `HrmpCloseChannelRequests` set.
1. If not already there, insert a new entry `Some(())` to `HrmpCloseChannelRequests` for `ch` and append `ch` to
`HrmpCloseChannelRequestsList`.
1. Send a downward message to the opposite party notifying about the channel closing.
* The DM is sent using `queue_downward_message`.
* The DM is represented by the `HrmpChannelClosing` XCM message with:
* `initiator` is set to `origin`,
* `sender` is set to `ch.sender`,
* `recipient` is set to `ch.recipient`.
* The opposite party is `ch.sender` if `origin` is `ch.recipient` and `ch.recipient` if `origin` is `ch.sender`.
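The validation phase of `hrmp_init_open_channel` can be sketched as a single function. It stops before any deposit is reserved or any storage is written. This sketch assumes the following. `Limits`, `SenderView`, and `InitOpenError` are hypothetical types. All storage lookups are gathered up front into `SenderView`. The outbound limit is whichever of the teyrchain or parathread limits applies. "Doesn't exceed the limit minus 1" is expressed as "adding this request must not exceed the limit", which also behaves sensibly when the limit is zero.

```rust
#[derive(Debug, PartialEq)]
pub enum InitOpenError {
    ToSelf,
    InvalidCapacity,
    InvalidMessageSize,
    InvalidRecipient,
    ChannelAlreadyExists,
    RequestAlreadyExists,
    ChannelLimitReached,
    InsufficientBalance,
}

/// Limits read from the host configuration (hypothetical flattened form).
pub struct Limits {
    pub max_capacity: u32,
    pub max_message_size: u32,
    pub max_outbound_channels: u32,
    pub sender_deposit: u128,
}

/// Storage lookups the checks depend on, gathered up front for the sketch.
pub struct SenderView {
    pub recipient_is_valid: bool,
    pub channel_exists: bool,
    pub request_exists: bool,
    pub egress_channels: u32, // size of `HrmpEgressChannelsIndex` for origin
    pub open_requests: u32,   // `HrmpOpenChannelRequestCount` for origin
    pub free_balance: u128,
}

pub fn check_init_open_channel(
    origin: u32,
    recipient: u32,
    proposed_max_capacity: u32,
    proposed_max_message_size: u32,
    view: &SenderView,
    limits: &Limits,
) -> Result<(), InitOpenError> {
    use InitOpenError::*;
    if origin == recipient { return Err(ToSelf); }
    if proposed_max_capacity == 0 || proposed_max_capacity > limits.max_capacity { return Err(InvalidCapacity); }
    if proposed_max_message_size == 0 || proposed_max_message_size > limits.max_message_size {
        return Err(InvalidMessageSize);
    }
    if !view.recipient_is_valid { return Err(InvalidRecipient); }
    if view.channel_exists { return Err(ChannelAlreadyExists); }
    if view.request_exists { return Err(RequestAlreadyExists); }
    // Existing channels plus pending requests plus this new request must fit under the limit.
    if view.egress_channels + view.open_requests + 1 > limits.max_outbound_channels {
        return Err(ChannelLimitReached);
    }
    if view.free_balance < limits.sender_deposit { return Err(InsufficientBalance); }
    Ok(())
}
```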
## Session Change
1. For each `P` in `outgoing_paras` (generated by `Paras::on_new_session`):
1. Remove all inbound channels of `P`, i.e. `(_, P)`,
1. Remove all outbound channels of `P`, i.e. `(P, _)`,
1. Remove `HrmpOpenChannelRequestCount` for `P`
1. Remove `HrmpAcceptedChannelRequestCount` for `P`.
1. Remove `HrmpOpenChannelRequests` and `HrmpOpenChannelRequestsList` for `(P, _)` and `(_, P)`.
1. For each removed channel request `C`:
1. Unreserve the sender's deposit if the sender is not present in `outgoing_paras`
1. Unreserve the recipient's deposit if `C` is confirmed and the recipient is not present in
`outgoing_paras`
1. For each channel designator `D` in `HrmpOpenChannelRequestsList` we query the request `R` from
`HrmpOpenChannelRequests`:
1. if `R.confirmed = true`,
1. if neither `D.sender` nor `D.recipient` is offboarded:
1. create a new channel `C` between `(D.sender, D.recipient)`.
1. Initialize the `C.sender_deposit` with `R.sender_deposit` and `C.recipient_deposit` with the value
found in the configuration `config.hrmp_recipient_deposit`.
1. Insert `sender` into the set `HrmpIngressChannelsIndex` for the `recipient`.
1. Insert `recipient` into the set `HrmpEgressChannelsIndex` for the `sender`.
1. decrement `HrmpOpenChannelRequestCount` for `D.sender` by 1.
1. decrement `HrmpAcceptedChannelRequestCount` for `D.recipient` by 1.
1. remove `R`
1. remove `D`
1. For each HRMP channel designator `D` in `HrmpCloseChannelRequestsList`
1. remove the channel identified by `D`, if exists.
1. remove `D` from `HrmpCloseChannelRequests`.
1. remove `D` from `HrmpCloseChannelRequestsList`
To remove an HRMP channel `C` identified by a tuple `(sender, recipient)`:
1. Return `C.sender_deposit` to the `sender`.
1. Return `C.recipient_deposit` to the `recipient`.
1. Remove `C` from `HrmpChannels`.
1. Remove `C` from `HrmpChannelContents`.
1. Remove `recipient` from the set `HrmpEgressChannelsIndex` for `sender`.
1. Remove `sender` from the set `HrmpIngressChannelsIndex` for `recipient`.
@@ -0,0 +1,184 @@
# Inclusion Pallet
> NOTE: This module has suffered changes for the elastic scaling implementation. As a result, parts of this document may
be out of date and will be updated at a later time. Issue tracking the update:
https://github.com/pezkuwichain/pezkuwi-sdk/issues/132
The inclusion module is responsible for inclusion and availability of scheduled teyrchains. It also manages the UMP
dispatch queue of each teyrchain.
## Storage
Helper structs:
```rust
struct AvailabilityBitfield {
bitfield: BitVec, // one bit per core.
submitted_at: BlockNumber, // for accounting, as meaning of bits may change over time.
}
struct CandidatePendingAvailability {
core: CoreIndex, // availability core
hash: CandidateHash,
descriptor: CandidateDescriptor,
availability_votes: Bitfield, // one bit per validator.
relay_parent_number: BlockNumber, // number of the relay-parent.
backers: Bitfield, // one bit per validator, set for those who backed the candidate.
backed_in_number: BlockNumber,
backing_group: GroupIndex,
}
```
Storage Layout:
```rust
/// The latest bitfield for each validator, referred to by index.
bitfields: map ValidatorIndex => AvailabilityBitfield;
/// Candidates pending availability.
PendingAvailability: map ParaId => CandidatePendingAvailability;
/// The commitments of candidates pending availability, by ParaId.
PendingAvailabilityCommitments: map ParaId => CandidateCommitments;
```
## Config Dependencies
* `MessageQueue`: The message queue provides general queueing and processing functionality. Currently it replaces the
old `UMP` dispatch queue. Other use-cases can be implemented as well by adding new variants to
`AggregateMessageOrigin`. Normally it should be set to an instance of the `MessageQueue` pallet.
## Session Change
1. Clear out all candidates pending availability.
1. Clear out all validator bitfields.
Optional:
1. The UMP queue of all outgoing paras can be "swept". This would prevent the dispatch queue from automatically being
serviced. It is a consideration for the chain and specific behaviour is not defined.
## Initialization
No initialization routine runs for this module. However, the initialization of the `MessageQueue` pallet will attempt to
process any pending UMP messages.
## Routines
All failed checks should lead to an unrecoverable error making the block invalid.
* `process_bitfields(expected_bits, Bitfields, core_lookup: Fn(CoreIndex) -> Option<ParaId>)`:
1. Call `sanitize_bitfields<true>` and use the sanitized `signed_bitfields` from now on.
1. Call `sanitize_backed_candidates<true>` and use the sanitized `backed_candidates` from now on.
1. Apply each bit of bitfield to the corresponding pending candidate, looking up on-demand teyrchain cores using the
`core_lookup`. Disregard bitfields that have a `1` bit for any free cores.
1. For each applied bit of each availability-bitfield, set the bit for the validator in the
`CandidatePendingAvailability`'s `availability_votes` bitfield. Track all candidates that now have >2/3 of bits set
in their `availability_votes`. These candidates are now available and can be enacted.
1. For all now-available candidates, invoke the `enact_candidate` routine with the candidate and relay-parent number.
1. Return a list of `(CoreIndex, CandidateHash)` from freed cores consisting of the cores where candidates have become
available.
* `sanitize_bitfields<T: crate::inclusion::Config>(unchecked_bitfields: UncheckedSignedAvailabilityBitfields,
disputed_bitfield: DisputedBitfield, expected_bits: usize, parent_hash: T::Hash, session_index: SessionIndex,
validators: &[ValidatorId], full_check: FullCheck)`:
1. check that `disputed_bitfield` has the same number of bits as `expected_bits`; if not, return early with an
empty vec.
1. the checks below apply to each bitfield individually. A bitfield that fails any of them is skipped.
1. check that there are no bits set that reference a disputed candidate.
1. check that the number of bits is equal to `expected_bits`.
1. check that the validator index is strictly increasing (and thus also unique).
1. check that the validator bit index is not out of bounds.
1. check the validator's signature, but only when `full_check` is `FullCheck::Yes`.
* `sanitize_backed_candidates<T: crate::inclusion::Config, F: FnMut(usize, &BackedCandidate<T::Hash>) -> bool>(mut
backed_candidates: Vec<BackedCandidate<T::Hash>>, candidate_has_concluded_invalid_dispute: F, scheduled:
&[CoreAssignment])`:
1. filter out any backed candidates that a dispute has concluded to be invalid.
1. keep only the backed candidates whose para ID is scheduled according to the provided `scheduled` parameter.
1. sort the remaining candidates by the core index assigned to them.
* `process_candidates(allowed_relay_parents, BackedCandidates, scheduled: Vec<CoreAssignment>, group_validators:
Fn(GroupIndex) -> Option<Vec<ValidatorIndex>>)`:
> For details on `AllowedRelayParentsTracker` see the documentation for the [Shared](./shared.md) module.
1. check that each candidate corresponds to a scheduled core and that they are ordered in the same order the cores
appear in assignments in `scheduled`.
1. check that `scheduled` is sorted ascending by `CoreIndex`, without duplicates.
1. check that the relay-parent from each candidate receipt is one of the allowed relay-parents.
1. check that there is no candidate pending availability for any scheduled `ParaId`.
1. check that each candidate's `validation_data_hash` corresponds to a `PersistedValidationData` computed from the
state of the context block.
1. If the core assignment includes a specific collator, ensure the backed candidate is issued by that collator.
1. Ensure that any code upgrade scheduled by the candidate does not happen within `config.validation_upgrade_cooldown`
of `Paras::last_code_upgrade(para_id, true)`, if any, comparing against the value of `Paras::FutureCodeUpgrades`
for the given para ID.
1. Check the collator's signature on the candidate data (only if `CandidateDescriptor` is version 1)
1. check the backing of the candidate using the signatures and the bitfields, comparing against the validators
assigned to the groups, fetched with the `group_validators` lookup, while group indices are computed by `Scheduler`
according to group rotation info.
1. call `check_upward_messages(config, para, commitments.upward_messages)` to check that the upward messages are
valid.
1. call `Dmp::check_processed_downward_messages(para, commitments.processed_downward_messages)` to check that the DMQ
is properly drained.
1. call `Hrmp::check_hrmp_watermark(para, commitments.hrmp_watermark)` for each candidate to check rules of processing
the HRMP watermark.
1. using `Hrmp::check_outbound_hrmp(sender, commitments.horizontal_messages)` ensure that each candidate sent a
valid set of horizontal messages.
1. create an entry in the `PendingAvailability` map for each backed candidate with a blank `availability_votes`
bitfield.
1. create a corresponding entry in the `PendingAvailabilityCommitments` with the commitments.
1. Return a `Vec<CoreIndex>` of all scheduled cores, from the list of passed assignments, for which a candidate was
successfully backed, sorted ascending by `CoreIndex`.
* `enact_candidate(relay_parent_number: BlockNumber, CommittedCandidateReceipt)`:
1. If the receipt contains a code upgrade, call `Paras::schedule_code_upgrade(para_id, code, relay_parent_number,
config)`.
> TODO: Note that this is safe as long as we never enact candidates where the relay parent is across a session
> boundary. In that case, which we should be careful to avoid with contextual execution, the configuration might
> have changed and the para may de-sync from the host's understanding of it.
1. Reward all backing validators of each candidate, contained within the `backers` field.
1. call `receive_upward_messages` for each backed candidate, using the
[`UpwardMessage`s](../types/messages.md#upward-message) from the
[`CandidateCommitments`](../types/candidate.md#candidate-commitments).
1. call `Dmp::prune_dmq` with the para id of the candidate and the candidate's `processed_downward_messages`.
1. call `Hrmp::prune_hrmp` with the para id of the candidate and the candidate's `hrmp_watermark`.
1. call `Hrmp::queue_outbound_hrmp` with the para id of the candidate and the list of horizontal messages taken from
the commitments.
1. Call `Paras::note_new_head` using the `HeadData` from the receipt and `relay_parent_number`.
* `collect_pending`:
```rust
fn collect_pending(f: impl Fn(CoreIndex, BlockNumber) -> bool) -> Vec<CoreIndex> {
    // Sweep through all paras pending availability. If the predicate returns true when given
    // the core index and the block number since which the candidate has been pending
    // availability, clean up the corresponding storage for that candidate and its commitments.
    // Return a vector of the cleaned-up core IDs.
}
```
* `force_enact(ParaId)`: Forcibly enact the pending candidates of the given para ID as though they had been deemed
available by bitfields. It is a no-op if there is no candidate pending availability for this para ID.
If there are multiple candidates pending availability for this para ID, all of them are enacted.
This should generally not be used, but it is useful during execution of Runtime APIs,
where the changes to the state are expected to be discarded directly afterwards.
* `candidate_pending_availability(ParaId) -> Option<CommittedCandidateReceipt>`: returns the `CommittedCandidateReceipt`
pending availability for the para provided, if any.
* `candidates_pending_availability(ParaId) -> Vec<CommittedCandidateReceipt>`: returns the `CommittedCandidateReceipt`s
pending availability for the para provided, if any.
* `pending_availability(ParaId) -> Option<CandidatePendingAvailability>`: returns the metadata around the candidate
pending availability for the para, if any.
* `free_disputed(disputed: Vec<CandidateHash>) -> Vec<CoreIndex>`: Sweeps through all paras pending availability. If
the candidate hash is one of the disputed candidates, then clean up the corresponding storage for that candidate and
the commitments. Return a vector of cleaned-up core IDs.
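The bitfield checks of `sanitize_bitfields` and the availability tally of `process_bitfields` can be illustrated with a simplified model. It is a sketch under several assumptions, not the pallet's code: bitfields are `Vec<bool>` with one entry per core, the signature check is replaced by a precomputed `signature_ok` flag, a pending candidate is represented only by its per-validator `availability_votes`, and `apply_bitfields` is a hypothetical helper. A candidate counts as available once strictly more than two thirds of the validators have set its bit.

```rust
/// One validator's signed availability bitfield (simplified).
#[derive(Clone, Debug, PartialEq)]
struct SignedBitfield {
    validator_index: usize,
    bits: Vec<bool>,    // one bit per availability core
    signature_ok: bool, // stands in for a real signature check (assumption)
}

/// Simplified model of the `sanitize_bitfields` checks.
fn sanitize_bitfields(
    unchecked: Vec<SignedBitfield>,
    disputed: &[bool],
    expected_bits: usize,
    n_validators: usize,
    full_check: bool,
) -> Vec<SignedBitfield> {
    // A malformed disputed bitfield invalidates the whole batch.
    if disputed.len() != expected_bits {
        return Vec::new();
    }
    let mut last_index: Option<usize> = None;
    let mut sanitized = Vec::new();
    for bf in unchecked {
        let wrong_len = bf.bits.len() != expected_bits;
        let touches_dispute = bf.bits.iter().zip(disputed).any(|(bit, d)| *bit && *d);
        let not_increasing = last_index.map_or(false, |last| bf.validator_index <= last);
        let out_of_bounds = bf.validator_index >= n_validators;
        let bad_signature = full_check && !bf.signature_ok;
        if wrong_len || touches_dispute || not_increasing || out_of_bounds || bad_signature {
            continue; // skip this bitfield, keep checking the rest
        }
        last_index = Some(bf.validator_index);
        sanitized.push(bf);
    }
    sanitized
}

/// Strictly more than two thirds of the validators have voted.
fn is_available(votes: &[bool]) -> bool {
    let set = votes.iter().filter(|v| **v).count();
    3 * set > 2 * votes.len()
}

/// Hypothetical helper: applies sanitized bitfields to the per-core vote vectors
/// (`None` = free core) and returns the cores whose candidate became available.
fn apply_bitfields(pending: &mut [Option<Vec<bool>>], bitfields: &[SignedBitfield]) -> Vec<usize> {
    for bf in bitfields {
        // Disregard bitfields that set a bit for a free core.
        if bf.bits.iter().zip(pending.iter()).any(|(bit, p)| *bit && p.is_none()) {
            continue;
        }
        for (bit, slot) in bf.bits.iter().zip(pending.iter_mut()) {
            if let (true, Some(votes)) = (*bit, slot.as_mut()) {
                votes[bf.validator_index] = true;
            }
        }
    }
    pending
        .iter()
        .enumerate()
        .filter_map(|(core, p)| match p {
            Some(votes) if is_available(votes) => Some(core),
            _ => None,
        })
        .collect()
}
```

With four validators, three votes are needed (`3 * 3 > 8`), while two are not (`3 * 2 > 8` is false).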
These functions were formerly part of the UMP pallet:
* `check_upward_messages(P: ParaId, Vec<UpwardMessage>)`:
1. Checks that the teyrchain is not currently offboarding, and errors otherwise.
2. Checks that there are at most `config.max_upward_message_num_per_candidate` messages to be enqueued.
3. Checks that no message exceeds `config.max_upward_message_size`.
4. Checks that the total resulting queue size would not exceed `co`.
5. Verify that queuing up the messages could not result in exceeding the queue's footprint according to the config
items `config.max_upward_queue_count` and `config.max_upward_queue_size`. The queue's current footprint is provided
in `well_known_keys` in order to facilitate oraclisation on to the para.
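A sketch of the acceptance checks in `check_upward_messages`, covering items 1, 2, 3 and 5 above. The `UmpConfig` field names mirror the configuration items named in the text; the `UmpError` enum, the `is_offboarding` flag, and passing the current queue footprint in as a `(count, bytes)` pair are assumptions made for illustration.

```rust
/// The configuration items used by the checks (field names taken from the text).
struct UmpConfig {
    max_upward_message_num_per_candidate: u32,
    max_upward_message_size: u32,
    max_upward_queue_count: u32,
    max_upward_queue_size: u32,
}

#[derive(Debug, PartialEq)]
enum UmpError {
    Offboarding,
    TooManyMessages { count: u32, limit: u32 },
    MessageTooBig { index: usize, size: u32, limit: u32 },
    QueueCountOverflow { count: u32, limit: u32 },
    QueueSizeOverflow { size: u32, limit: u32 },
}

/// `queue_footprint` is the para's current `(message count, total bytes)` in the queue.
fn check_upward_messages(
    config: &UmpConfig,
    is_offboarding: bool,
    queue_footprint: (u32, u32),
    messages: &[Vec<u8>],
) -> Result<(), UmpError> {
    if is_offboarding {
        return Err(UmpError::Offboarding);
    }
    let count = messages.len() as u32;
    if count > config.max_upward_message_num_per_candidate {
        return Err(UmpError::TooManyMessages { count, limit: config.max_upward_message_num_per_candidate });
    }
    let mut total_size: u32 = 0;
    for (index, msg) in messages.iter().enumerate() {
        let size = msg.len() as u32;
        if size > config.max_upward_message_size {
            return Err(UmpError::MessageTooBig { index, size, limit: config.max_upward_message_size });
        }
        total_size = total_size.saturating_add(size);
    }
    // The queue's footprint after enqueuing must stay within both limits.
    let (queued_count, queued_size) = queue_footprint;
    let new_count = queued_count.saturating_add(count);
    if new_count > config.max_upward_queue_count {
        return Err(UmpError::QueueCountOverflow { count: new_count, limit: config.max_upward_queue_count });
    }
    let new_size = queued_size.saturating_add(total_size);
    if new_size > config.max_upward_queue_size {
        return Err(UmpError::QueueSizeOverflow { size: new_size, limit: config.max_upward_queue_size });
    }
    Ok(())
}
```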
Candidate Enactment:
* `receive_upward_messages(P: ParaId, Vec<UpwardMessage>)`:
1. Process each upward message `M` in order:
1. Place in the dispatch queue according to its para ID (or handle it immediately).
@@ -0,0 +1,56 @@
# Initializer Pallet
This module is responsible for initializing the other modules in a deterministic order. It also has one other purpose as
described in the overview of the runtime: accepting and forwarding session change notifications.
## Storage
```rust
HasInitialized: bool;
// buffered session changes along with the block number at which they should be applied.
//
// typically this will be empty or one element long. ordered ascending by BlockNumber and insertion
// order.
BufferedSessionChanges: Vec<(BlockNumber, ValidatorSet, ValidatorSet)>;
```
## Initialization
Before initializing modules, remove all changes from the `BufferedSessionChanges` with number less than or equal to the
current block number, and apply the last one. The session change is applied to all modules in the same order as
initialization.
The other teyrchains modules are initialized in this order:
1. Configuration
1. Shared
1. Paras
1. Scheduler
1. Inclusion
1. SessionInfo
1. Disputes
1. DMP
1. UMP
1. HRMP
The [Configuration Module](configuration.md) is first, since all other modules need to operate under the same
configuration as each other. Then the [Shared](shared.md) module is invoked, which determines the set of active
validators. It would lead to inconsistency if, for example, the scheduler ran first and then the configuration was
updated before the Inclusion module.
Set `HasInitialized` to true.
## Session Change
Store the session change information in `BufferedSessionChanges` along with the block number at which it was submitted,
plus one. Although the expected operational parameters of the block authorship system should prevent more than one
change from being buffered at any time, it may still occur. Regardless, we always track the block number at which the
session change can be applied, so that it does not matter whether a session change notification is issued before or
after initialization of the current block.
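The buffering rule and the application rule from the Initialization section can be modeled together in a few lines. This is a minimal sketch under stated assumptions: `SessionChange` is a hypothetical placeholder for the pair of validator sets, and `on_new_session`/`take_due_change` are illustrative names. Because entries are appended with `now + 1` and block numbers only grow, the buffer stays sorted, so the due prefix can be located with a binary search.

```rust
type BlockNumber = u32;

/// Placeholder for the `(ValidatorSet, ValidatorSet)` pair carried by a session change.
#[derive(Clone, Debug, PartialEq)]
struct SessionChange {
    session_index: u32,
}

#[derive(Default)]
struct Initializer {
    /// Ordered ascending by block number, then by insertion order.
    buffered_session_changes: Vec<(BlockNumber, SessionChange)>,
}

impl Initializer {
    /// Session change notification: buffer it for the block after the current one.
    fn on_new_session(&mut self, now: BlockNumber, change: SessionChange) {
        self.buffered_session_changes.push((now + 1, change));
    }

    /// Block initialization: remove every change due at or before `now`
    /// and return the last one, which is the one to apply.
    fn take_due_change(&mut self, now: BlockNumber) -> Option<SessionChange> {
        let due = self.buffered_session_changes.partition_point(|(at, _)| *at <= now);
        self.buffered_session_changes.drain(..due).last().map(|(_, change)| change)
    }
}
```

If two notifications arrive in the same block, both are drained at the next block but only the later one is applied, matching the "apply the last one" rule.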
## Finalization
Finalization order is less important in this case than initialization order, so we finalize the modules in the reverse
order from initialization.
Set `HasInitialized` to false.
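The ordering rules can be captured in a small sketch: modules are initialized in the listed order and finalized in reverse, with `HasInitialized` toggled around them. The `Module` enum and the `log` vector are hypothetical stand-ins used only to record the call order.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Module {
    Configuration,
    Shared,
    Paras,
    Scheduler,
    Inclusion,
    SessionInfo,
    Disputes,
    Dmp,
    Ump,
    Hrmp,
}

/// Initialization order as listed above.
const INIT_ORDER: [Module; 10] = [
    Module::Configuration,
    Module::Shared,
    Module::Paras,
    Module::Scheduler,
    Module::Inclusion,
    Module::SessionInfo,
    Module::Disputes,
    Module::Dmp,
    Module::Ump,
    Module::Hrmp,
];

#[derive(Default)]
struct Initializer {
    has_initialized: bool,
    log: Vec<(&'static str, Module)>, // records the call order
}

impl Initializer {
    fn on_initialize(&mut self) {
        for module in INIT_ORDER.iter() {
            self.log.push(("initialize", *module));
        }
        self.has_initialized = true;
    }

    fn on_finalize(&mut self) {
        // Finalize in reverse order of initialization.
        for module in INIT_ORDER.iter().rev() {
            self.log.push(("finalize", *module));
        }
        self.has_initialized = false;
    }
}
```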
@@ -0,0 +1,98 @@
# `ParaInherent`
> NOTE: This module has undergone changes as part of the elastic scaling implementation. As a result, parts of this document may
be out of date and will be updated at a later time. Issue tracking the update:
https://github.com/pezkuwichain/pezkuwi-sdk/issues/132
This module is responsible for providing all data given to the runtime by the block author to the various teyrchains
modules. The entry-point is mandatory, in that it must be invoked exactly once within every block, and it is also
"inherent", in that it is provided with no origin by the block author. The data within it carries its own
authentication; i.e. the data takes the form of signed statements by validators. Invalid data will be filtered and not
applied.
This module does not have the same initialization/finalization concerns as the others, as it only requires that entry
points be triggered after all modules have initialized and that finalization happens after entry points are triggered.
Both of these are assumptions we have already made about the runtime's order of operations, so this module doesn't need
to be initialized or finalized by the `Initializer`.
There are a couple of important notes to the operations in this inherent as they relate to disputes.
1. We don't accept bitfields or backed candidates while in "governance-only" mode, which results from a local dispute
concluding on this fork.
1. When disputes are initiated, we remove the block from pending availability. This allows us to roll back chains to the
block before blocks are included as opposed to backing. It's important to do this before processing bitfields.
1. `Inclusion::free_disputed` is relatively expensive, so it's important to gate it on whether there are actually any
new disputes, which in normal operation should never be the case.
1. We don't accept parablocks that have open disputes or disputes that have concluded against the candidate. It's
important to import dispute statements before backing, but this is already the case as disputes are imported before
processing bitfields.
## Storage
```rust
/// Whether the para inherent was included or not.
Included: Option<()>,
```
```rust
/// Scraped on chain votes to be used in disputes off-chain.
OnChainVotes: Option<ScrapedOnChainVotes>,
```
## Finalization
1. Take (get and clear) the value of `Included`. If it is not `Some`, throw an unrecoverable error.
## Entry Points
* `enter`: This entry-point accepts one parameter: [`ParaInherentData`](../types/runtime.md#ParaInherentData).
* `create_inherent`: This entry-point accepts one parameter: `InherentData`.
Both entry points share mostly the same code. `create_inherent` will meaningfully limit inherent data to adhere to the
weight limit, in addition to sanitizing any inputs and filtering out invalid data. Conceptually it is part of the block
production. The `enter` call on the other hand is part of block import and consumes/imports the data previously produced
by `create_inherent`.
In practice both calls process inherent data and apply it to the state. Block production and block import should arrive
at the same new state. Hence we re-use the same logic to ensure this is the case.
The only real difference between the two is that on `create_inherent` we actually need the processed and filtered
inherent data to build the block, while on `enter` the processed data is, first, expected to be identical to the incoming
inherent data (assuming honest block producers) and, second, irrelevant, as we are not building a block but just
processing it, so the processed inherent data is simply dropped.
This also means that the `enter` function keeps data around for no good reason. This seems acceptable, though, as the size
of a block is rather limited. Nevertheless, if we ever wanted to optimize this, we could easily implement an inherent
collector that has two implementations, where one clones and stores the data and the other just passes it on.
## Sanitization
`ParasInherent` with the entry point of `create_inherent` sanitizes the input data, while the `enter` entry point
enforces already sanitized input data. If unsanitized data is provided the module generates an error.
Disputes are included in the block with priority for security reasons. It's important to include as many dispute
votes onchain as possible so that disputes conclude faster and the offenders are punished. However if there are too many
disputes to include in a block the dispute set is trimmed so that it respects max block weight.
Dispute data is first deduplicated and sorted by block number (older first) and dispute location (local then remote).
Concluded and ancient (disputes initiated before the post conclusion acceptance period) disputes are filtered out.
Votes with invalid signatures or from unknown validators (not found in the active set for the current session) are also
filtered out.
All dispute statements are included in the order described in the previous paragraph until the available block weight is
exhausted. After the dispute data is included all remaining weight is filled in with candidates and availability
bitfields. Bitfields are included with priority, then candidates containing code updates and finally any backed
candidates. If there is not enough weight for all backed candidates they are trimmed by random selection. Disputes are
processed in three separate functions - `deduplicate_and_sort_dispute_data`, `filter_dispute_data` and
`limit_and_sanitize_disputes`.
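The dispute prioritization described above can be sketched as a single function. It is a simplified model, not the actual `deduplicate_and_sort_dispute_data`, `filter_dispute_data` or `limit_and_sanitize_disputes`: a dispute is identified by a bare `u64` candidate, every statement carries the same hypothetical weight, signature and validator-set filtering are omitted, and the cutoff for ancient disputes is passed in as `oldest_acceptable`. Which of the two sort keys takes precedence is an assumption here: the sketch places local disputes before remote ones and, within each group, older disputes first.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Debug, PartialEq)]
struct DisputeSet {
    candidate: u64,       // stands in for the candidate hash
    block_number: u32,    // when the dispute was initiated
    is_local: bool,       // dispute on this fork vs. a remote one
    concluded: bool,      // already concluded on-chain
    statements: Vec<u32>, // stands in for signed dispute statements
}

/// Hypothetical fixed weight charged per included statement.
const WEIGHT_PER_STATEMENT: u64 = 1;

/// Returns the dispute sets to include and the weight left for bitfields and candidates.
fn limit_disputes(mut sets: Vec<DisputeSet>, oldest_acceptable: u32, max_weight: u64) -> (Vec<DisputeSet>, u64) {
    // Deduplicate: keep the first set seen for each candidate.
    let mut seen = BTreeSet::new();
    sets.retain(|s| seen.insert(s.candidate));
    // Drop concluded and ancient disputes.
    sets.retain(|s| !s.concluded && s.block_number >= oldest_acceptable);
    // Local before remote (`false < true`), then older before newer.
    sets.sort_by_key(|s| (!s.is_local, s.block_number));
    let mut used = 0;
    let mut included = Vec::new();
    for set in sets {
        let weight = set.statements.len() as u64 * WEIGHT_PER_STATEMENT;
        if used + weight > max_weight {
            break; // block weight exhausted
        }
        used += weight;
        included.push(set);
    }
    (included, max_weight - used)
}
```

The remaining weight returned alongside the included sets is what would then be filled with bitfields and backed candidates.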
Availability bitfields are also sanitized by dropping those that are malformed, contain disputed cores or carry bad
signatures. Refer to the `sanitize_bitfields` function for implementation details.
Backed candidate sanitization removes malformed candidates, candidates that a dispute has concluded to be invalid, and
candidates produced by unassigned cores. Furthermore, any backing votes from disabled validators for a candidate are
dropped. This is part of the validator disabling strategy. After filtering out the statements from disabled validators, a
backed candidate may end up with fewer votes than `minimum_backing_votes` (a parameter from `HostConfiguration`).
In this case the whole candidate is dropped, since otherwise it would be rejected by `process_candidates` from the
inclusion pallet.
All checks related to backed candidates are implemented in `sanitize_backed_candidates` and
`filter_backed_statements_from_disabled_validators`.
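Backed candidate sanitization, including the disabled-validator filter, can be sketched as below. This is a minimal model under stated assumptions: each candidate carries a `u64` hash, a para ID, an assigned core index and the validator indices that backed it; concluded-invalid disputes, scheduled paras and disabled validators are given as sets; and `sanitize_backed` is a hypothetical name that combines the roles of `sanitize_backed_candidates` and `filter_backed_statements_from_disabled_validators`.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Debug, PartialEq)]
struct BackedCandidate {
    hash: u64,         // stands in for CandidateHash
    para_id: u32,
    core_index: u32,
    backers: Vec<u32>, // validator indices with a backing vote attached
}

/// Hypothetical combination of the backed-candidate sanitization steps.
fn sanitize_backed(
    mut candidates: Vec<BackedCandidate>,
    concluded_invalid: &BTreeSet<u64>,
    scheduled_paras: &BTreeSet<u32>,
    disabled_validators: &BTreeSet<u32>,
    minimum_backing_votes: usize,
) -> Vec<BackedCandidate> {
    // Drop candidates disputed as invalid or whose para is not scheduled.
    candidates.retain(|c| !concluded_invalid.contains(&c.hash) && scheduled_paras.contains(&c.para_id));
    // Drop backing votes from disabled validators.
    for c in candidates.iter_mut() {
        c.backers.retain(|v| !disabled_validators.contains(v));
    }
    // A candidate left with too few votes would fail `process_candidates`, so drop it now.
    candidates.retain(|c| c.backers.len() >= minimum_backing_votes);
    // Order by assigned core, as `process_candidates` expects.
    candidates.sort_by_key(|c| c.core_index);
    candidates
}
```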
@@ -0,0 +1,287 @@
# Paras Pallet
The Paras module is responsible for storing information on teyrchains. Registered teyrchains cannot change except at
session boundaries and after at least a full session has passed. This is primarily to ensure that the number and meaning
of bits required for the availability bitfields does not change except at session boundaries.
It's also responsible for:
- managing teyrchain validation code upgrades as well as maintaining availability of old teyrchain code and its pruning.
- vetting PVFs by means of the PVF pre-checking mechanism.
## Storage
### Utility Structs
```rust
// the two key times necessary to track for every code replacement.
pub struct ReplacementTimes {
/// The relay-chain block number that the code upgrade was expected to be activated.
/// This is when the code change occurs from the para's perspective - after the
/// first parablock included with a relay-parent with number >= this value.
expected_at: BlockNumber,
/// The relay-chain block number at which the parablock activating the code upgrade was
/// actually included, i.e. considered included and available, so this is the time at which
/// that parablock enters the acceptance period in this fork of the relay-chain.
activated_at: BlockNumber,
}
/// Metadata used to track previous teyrchain validation code that we keep in
/// the state.
pub struct ParaPastCodeMeta {
// Block numbers where the code was expected to be replaced and where the code
// was actually replaced, respectively. The first is used to do accurate lookups
// of historic code in historic contexts, whereas the second is used to do
// pruning on an accurate timeframe. These can be used as indices
// into the `PastCode` map along with the `ParaId` to fetch the code itself.
upgrade_times: Vec<ReplacementTimes>,
// This tracks the highest pruned code-replacement, if any.
last_pruned: Option<BlockNumber>,
}
struct ParaGenesisArgs {
/// The initial head-data to use.
genesis_head: HeadData,
/// The validation code to start with.
validation_code: ValidationCode,
/// True if teyrchain, false if parathread.
teyrchain: bool,
}
/// The possible states of a para, to take into account delayed lifecycle changes.
pub enum ParaLifecycle {
/// A Para is new and is onboarding.
Onboarding,
/// Para is a Parathread (on-demand teyrchain).
Parathread,
/// Para is a lease holding Teyrchain.
Teyrchain,
/// Para is a Parathread (on-demand Teyrchain) which is upgrading to a lease holding Teyrchain.
UpgradingParathread,
/// Para is a lease holding Teyrchain which is downgrading to an on-demand teyrchain.
DowngradingTeyrchain,
/// Parathread (on-demand teyrchain) is being offboarded.
OutgoingParathread,
/// Teyrchain is being offboarded.
OutgoingTeyrchain,
}
enum PvfCheckCause {
/// PVF vote was initiated by the initial onboarding process of the given para.
Onboarding(ParaId),
/// PVF vote was initiated by signalling of an upgrade by the given para.
Upgrade {
/// The ID of the teyrchain that initiated or is waiting for the conclusion of pre-checking.
id: ParaId,
/// The relay-chain block number that was used as the relay-parent for the parablock that
/// initiated the upgrade.
relay_parent_number: BlockNumber,
},
}
struct PvfCheckActiveVoteState {
// The two following vectors have their length equal to the number of validators in the active
// set. They start with all zeroes. A 1 is set at an index when the validator at that index
// makes a vote. Once a 1 is set for either of the vectors, that validator cannot vote anymore.
// Since the active validator set changes each session, the bit vectors are reinitialized as
// well: zeroed and resized so that each validator gets its own bit.
votes_accept: BitVec,
votes_reject: BitVec,
/// The number of session changes this PVF vote has observed. Therefore, this number is
/// increased at each session boundary. When created, it is initialized with 0.
age: SessionIndex,
/// The block number at which this PVF vote was created.
created_at: BlockNumber,
/// A list of causes for this PVF pre-checking. Has at least one.
causes: Vec<PvfCheckCause>,
}
```
#### Para Lifecycle
Because the state changes of teyrchains are delayed, we track the specific state of the para using the `ParaLifecycle`
enum.
```
None Parathread (on-demand teyrchain) Teyrchain
+ + +
| | |
| (≈2 Session Delay) | |
| | |
+----------------------->+ |
| Onboarding | |
| | |
+-------------------------------------------------->+
| Onboarding | |
| | |
| +------------------------->+
| | UpgradingParathread |
| | |
| +<-------------------------+
| | DowngradingTeyrchain |
| | |
|<-----------------------+ |
| OutgoingParathread | |
| | |
+<--------------------------------------------------+
| | OutgoingTeyrchain |
| | |
+ + +
```
Note that if PVF pre-checking is enabled, onboarding of a para may potentially be delayed. This can happen due to PVF
pre-checking voting concluding late.
During the transition period, the para object is still considered in its existing state.
### Storage Layout
```rust
use frame_system::pallet_prelude::BlockNumberFor;
/// All currently active PVF pre-checking votes.
///
/// Invariant:
/// - There are no PVF pre-checking votes that exist in the list but not in the map, and vice versa.
PvfActiveVoteMap: map ValidationCodeHash => PvfCheckActiveVoteState;
/// The list of all currently active PVF votes. Auxiliary to `PvfActiveVoteMap`.
PvfActiveVoteList: Vec<ValidationCodeHash>;
/// All teyrchains. Ordered ascending by ParaId. On-demand teyrchains are not included.
Teyrchains: Vec<ParaId>;
/// The current lifecycle state of all known Para Ids.
ParaLifecycle: map ParaId => Option<ParaLifecycle>;
/// The head-data of every registered para.
Heads: map ParaId => Option<HeadData>;
/// The context (relay-chain block number) of the most recent teyrchain head.
MostRecentContext: map ParaId => BlockNumber;
/// The validation code hash of every live para.
CurrentCodeHash: map ParaId => Option<ValidationCodeHash>;
/// Actual past code hash, indicated by the para id as well as the block number at which it became outdated.
PastCodeHash: map (ParaId, BlockNumber) => Option<ValidationCodeHash>;
/// Past code of teyrchains. The teyrchains themselves may not be registered anymore,
/// but we also keep their code on-chain for the same amount of time as outdated code
/// to keep it available for secondary checkers.
PastCodeMeta: map ParaId => ParaPastCodeMeta;
/// Which paras have past code that needs pruning and the relay-chain block at which the code was replaced.
/// Note that this is the actual height of the included block, not the expected height at which the
/// code upgrade would be applied, although they may be equal.
/// This is to ensure the entire acceptance period is covered, not an offset acceptance period starting
/// from the time at which the teyrchain perceives a code upgrade as having occurred.
/// Multiple entries for a single para are permitted. Ordered ascending by block number.
PastCodePruning: Vec<(ParaId, BlockNumber)>;
/// The block number at which the planned code change is expected for a para.
/// The change will be applied after the first parablock for this ID included which executes
/// in the context of a relay chain block with a number >= `expected_at`.
FutureCodeUpgrades: map ParaId => Option<BlockNumber>;
/// Hash of the actual future code of a para.
FutureCodeHash: map ParaId => Option<ValidationCodeHash>;
/// This is used by the relay-chain to communicate to a teyrchain a go-ahead within the upgrade procedure.
///
/// This value is absent when there are no upgrades scheduled or during the time the relay chain
/// performs the checks. It is set at the first relay-chain block when the corresponding teyrchain
/// can switch its upgrade function. As soon as the teyrchain's block is included, the value
/// gets reset to `None`.
///
/// NOTE that this field is used by teyrchains via merkle storage proofs, therefore changing
/// the format will require migration of teyrchains.
UpgradeGoAheadSignal: map hasher(twox_64_concat) ParaId => Option<UpgradeGoAhead>;
/// This is used by the relay-chain to communicate that there are restrictions for performing
/// an upgrade for this teyrchain.
///
/// This may be because the teyrchain waits for the upgrade cooldown to expire. Another
/// potential use case is when we want to perform some maintenance (such as storage migration)
/// we could restrict upgrades to make the process simpler.
///
/// NOTE that this field is used by teyrchains via merkle storage proofs, therefore changing
/// the format will require migration of teyrchains.
UpgradeRestrictionSignal: map hasher(twox_64_concat) ParaId => Option<UpgradeRestriction>;
/// The list of teyrchains that are waiting for their upgrade restriction cooldown to expire.
///
/// Ordered ascending by block number.
UpgradeCooldowns: Vec<(ParaId, BlockNumberFor<T>)>;
/// The list of upcoming code upgrades. Each item is a pair of the para performing a code
/// upgrade and the relay-chain block at which the upgrade is expected.
///
/// Ordered ascending by block number.
UpcomingUpgrades: Vec<(ParaId, BlockNumberFor<T>)>;
/// The actions to perform during the start of a specific session index.
ActionsQueue: map SessionIndex => Vec<ParaId>;
/// Upcoming paras instantiation arguments.
///
/// NOTE that after PVF pre-checking is enabled the para genesis arg will have its code set
/// to empty. Instead, the code will be saved into the storage right away via `CodeByHash`.
UpcomingParasGenesis: map ParaId => Option<ParaGenesisArgs>;
/// The number of references on the validation code in `CodeByHash` storage.
CodeByHashRefs: map ValidationCodeHash => u32;
/// Validation code stored by its hash.
CodeByHash: map ValidationCodeHash => Option<ValidationCode>
```
## Session Change
1. Execute all queued actions for paralifecycle changes:
1. Clean up outgoing paras.
1. This means removing the entries under `Heads`, `CurrentCode`, `FutureCodeUpgrades`, `FutureCode` and
`MostRecentContext`. A corresponding entry should be added to `PastCode`, `PastCodeMeta`, and `PastCodePruning`
using the outgoing `ParaId` and the removed `CurrentCode` value. This is because any outdated validation code must
remain available on-chain for a determined number of blocks, and validation code outdated by de-registering the
para is still subject to that invariant.
1. Apply all incoming paras by initializing the `Heads` and `CurrentCode` using the genesis parameters as well as
`MostRecentContext` to `0`.
1. Amend the `Teyrchains` list and `ParaLifecycle` to reflect changes in registered teyrchains.
1. Amend the `ParaLifecycle` set to reflect changes in registered on-demand teyrchains.
1. Upgrade all on-demand teyrchains that should become lease holding teyrchains, updating the `Teyrchains` list and
`ParaLifecycle`.
1. Downgrade all lease holding teyrchains that should become on-demand teyrchains, updating the `Teyrchains` list and
`ParaLifecycle`.
1. (Deferred) Return list of outgoing paras to the initializer for use by other modules.
1. Go over all active PVF pre-checking votes:
1. Increment `age` of the vote.
1. If `age` reached `cfg.pvf_voting_ttl`, then enact PVF rejection and remove the vote from the active list.
1. Otherwise, reinitialize the ballots:
1. Resize the `votes_accept`/`votes_reject` to have the same length as the incoming validator set.
1. Zero all the votes.
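The PVF vote bookkeeping from `PvfCheckActiveVoteState` and the session-change handling above can be sketched as follows. The sketch models the bit vectors as `Vec<bool>` and leaves out how a vote concludes, which this section does not describe; `PvfVote`, `vote` and `on_session_change` are hypothetical names.

```rust
/// Simplified `PvfCheckActiveVoteState`: one accept and one reject bit per validator.
struct PvfVote {
    votes_accept: Vec<bool>,
    votes_reject: Vec<bool>,
    /// Number of session changes observed by this vote.
    age: u32,
}

impl PvfVote {
    fn new(n_validators: usize) -> Self {
        PvfVote { votes_accept: vec![false; n_validators], votes_reject: vec![false; n_validators], age: 0 }
    }

    /// Records a vote. Returns `false` if the index is out of range or the validator already voted.
    fn vote(&mut self, validator: usize, accept: bool) -> bool {
        if validator >= self.votes_accept.len() || self.votes_accept[validator] || self.votes_reject[validator] {
            return false;
        }
        if accept {
            self.votes_accept[validator] = true;
        } else {
            self.votes_reject[validator] = true;
        }
        true
    }

    /// Session change: returns `true` if the vote has reached `pvf_voting_ttl` and must be
    /// rejected and removed; otherwise resets the ballots for the incoming validator set.
    fn on_session_change(&mut self, n_incoming_validators: usize, pvf_voting_ttl: u32) -> bool {
        self.age += 1;
        if self.age >= pvf_voting_ttl {
            return true;
        }
        self.votes_accept = vec![false; n_incoming_validators];
        self.votes_reject = vec![false; n_incoming_validators];
        false
    }
}
```

Resetting both vectors on every surviving session change is what lets a validator that voted in the previous session vote again under the new validator set.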
## Initialization
1. Do pruning based on all entries in `PastCodePruning` with `BlockNumber <= now`. Update the corresponding
`PastCodeMeta` and `PastCode` accordingly.
1. Toggle the upgrade-related signals:
1. Collect all `(para_id, expected_at)` from `UpcomingUpgrades` where `expected_at <= now` and prune them. For each
para pruned set `UpgradeGoAheadSignal` to `GoAhead`. Reserve weight for the state modification to upgrade each para
pruned.
1. Collect all `(para_id, next_possible_upgrade_at)` from `UpgradeCooldowns` where `next_possible_upgrade_at <= now`.
For each para obtained this way reserve weight to remove its `UpgradeRestrictionSignal` on finalization.
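A sketch of how the two ordered lists might be processed, covering this initialization step and the cooldown pruning performed at finalization. It is a simplified model: `UpgradeGoAhead` is given the two variants assumed here, `UpgradeGoAheadSignal` is a plain `BTreeMap`, weight accounting is omitted, and `on_initialize`/`on_finalize` are illustrative names. Because both lists are ordered ascending by block number, the entries that are due always form a prefix.

```rust
use std::collections::BTreeMap;

type ParaId = u32;
type BlockNumber = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UpgradeGoAhead {
    Abort,
    GoAhead,
}

#[derive(Default)]
struct UpgradeState {
    upcoming_upgrades: Vec<(ParaId, BlockNumber)>, // ascending by expected_at
    upgrade_cooldowns: Vec<(ParaId, BlockNumber)>, // ascending by next_possible_upgrade_at
    go_ahead_signal: BTreeMap<ParaId, UpgradeGoAhead>,
}

impl UpgradeState {
    /// Initialization: signal `GoAhead` for due upgrades and report which paras'
    /// restriction signals will be cleared at finalization.
    fn on_initialize(&mut self, now: BlockNumber) -> Vec<ParaId> {
        let due = self.upcoming_upgrades.partition_point(|(_, at)| *at <= now);
        for (para, _) in self.upcoming_upgrades.drain(..due) {
            self.go_ahead_signal.insert(para, UpgradeGoAhead::GoAhead);
        }
        let cooled = self.upgrade_cooldowns.partition_point(|(_, at)| *at <= now);
        self.upgrade_cooldowns[..cooled].iter().map(|(para, _)| *para).collect()
    }

    /// Finalization: prune expired cooldowns, returning the paras whose
    /// `UpgradeRestrictionSignal` should be removed.
    fn on_finalize(&mut self, now: BlockNumber) -> Vec<ParaId> {
        let cooled = self.upgrade_cooldowns.partition_point(|(_, at)| *at <= now);
        self.upgrade_cooldowns.drain(..cooled).map(|(para, _)| para).collect()
    }
}
```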
## Routines
- `schedule_para_initialize(ParaId, ParaGenesisArgs)`: Schedule a para to be initialized at the next session. No-op if
the para is already registered in the system with some `ParaLifecycle`.
- `schedule_para_cleanup(ParaId)`: Schedule a para to be cleaned up after the next full session.
- `schedule_parathread_upgrade(ParaId)`: Schedule a parathread (on-demand teyrchain) to be upgraded to a teyrchain.
- `schedule_teyrchain_downgrade(ParaId)`: Schedule a teyrchain to be downgraded from lease holding to on-demand.
- `schedule_code_upgrade(ParaId, new_code, relay_parent: BlockNumber, HostConfiguration)`: Schedule a future code
upgrade of the given teyrchain. If PVF pre-checking is disabled, or the new code is already present in the
storage, the upgrade will be applied after inclusion of a block of the same teyrchain executed in the context of a
relay-chain block with number >= `relay_parent + config.validation_upgrade_delay`. If the upgrade is scheduled,
`UpgradeRestrictionSignal` is set and remains set until `relay_parent + config.validation_upgrade_cooldown`. If PVF
pre-checking is enabled and the new code is not already present in the storage, then a PVF pre-checking run is
scheduled for that validation code. If the pre-checking concludes with rejection, the upgrade is canceled. Otherwise,
once pre-checking concludes, the upgrade is scheduled and enacted as described above.
- `note_new_head(ParaId, HeadData, BlockNumber)`: Note that a para has progressed to a new head, executed in the
context of the relay-chain block with the given number; that block number is inserted into the `MostRecentContext`
mapping. This applies any pending code upgrade based on the provided block number. If an upgrade took place, the
`UpgradeGoAheadSignal` is cleared.
- `lifecycle(ParaId) -> Option<ParaLifecycle>`: Return the `ParaLifecycle` of a para.
- `is_teyrchain(ParaId) -> bool`: Returns true if the para ID references any live lease holding teyrchain, including
those which may be transitioning to an on-demand teyrchain in the future.
- `is_parathread(ParaId) -> bool`: Returns true if the para ID references any live parathread (on-demand teyrchain),
including those which may be transitioning to a lease holding teyrchain in the future.
- `is_valid_para(ParaId) -> bool`: Returns true if the para ID references either a live on-demand teyrchain or live
lease holding teyrchain.
- `can_upgrade_validation_code(ParaId) -> bool`: Returns true if the given para can signal a code upgrade right now.
- `pvfs_require_prechecking() -> Vec<ValidationCodeHash>`: Returns the list of PVF validation code hashes that require
PVF pre-checking votes.
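The two paths of `schedule_code_upgrade` can be summarised as a small decision function. This is a sketch under stated assumptions: `UpgradeConfig`, `UpgradePath` and `plan_code_upgrade` are hypothetical, only the two configuration fields named in the text are modelled, and pre-checking is assumed to run exactly when it is enabled and the code is not yet in storage.

```rust
type BlockNumber = u32;

/// Hypothetical subset of `HostConfiguration` relevant to code upgrades.
pub struct UpgradeConfig {
    pub validation_upgrade_delay: BlockNumber,
    pub validation_upgrade_cooldown: BlockNumber,
}

#[derive(Debug, PartialEq)]
pub enum UpgradePath {
    /// Scheduled immediately: applied once a candidate with relay-parent number
    /// >= `apply_at_or_after` is included; upgrades restricted until `restriction_until`.
    Scheduled { apply_at_or_after: BlockNumber, restriction_until: BlockNumber },
    /// A PVF pre-checking vote must conclude before anything is scheduled.
    AwaitPrecheck,
}

pub fn plan_code_upgrade(
    relay_parent: BlockNumber,
    cfg: &UpgradeConfig,
    prechecking_enabled: bool,
    code_already_stored: bool,
) -> UpgradePath {
    if !prechecking_enabled || code_already_stored {
        UpgradePath::Scheduled {
            apply_at_or_after: relay_parent + cfg.validation_upgrade_delay,
            restriction_until: relay_parent + cfg.validation_upgrade_cooldown,
        }
    } else {
        UpgradePath::AwaitPrecheck
    }
}
```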
## Finalization
Collect all `(para_id, next_possible_upgrade_at)` from `UpgradeCooldowns` where `next_possible_upgrade_at <= now` and
prune them. For each para pruned remove its `UpgradeRestrictionSignal`.
@@ -0,0 +1,310 @@
# Scheduler Pallet
> TODO: this section is still heavily under construction. key questions about availability cores and validator
> assignment are still open and the flow of the section may be contradictory or inconsistent
The Scheduler module is responsible for two main tasks:
- Partitioning validators into groups and assigning groups to teyrchains.
- Scheduling teyrchains for each block
It aims to achieve these tasks with these goals in mind:
- It should be possible to know at least a block ahead-of-time, ideally more, which validators are going to be assigned
to which teyrchains.
- Teyrchains that have a candidate pending availability in this fork of the chain should not be assigned.
- Validator assignments should not be gameable. Malicious cartels should not be able to manipulate the scheduler to
assign themselves as desired.
- Throughput of teyrchains should be high, ideally close to optimal. Work among validator groups should be balanced.
## Availability Cores
The Scheduler manages resource allocation using the concept of "Availability Cores". There will be one availability core
for each lease holding teyrchain, and a fixed number of cores used for multiplexing on-demand teyrchains. Validators
will be partitioned into groups, with the same number of groups as availability cores. Validator groups will be assigned
to different availability cores over time.
At the beginning or end of a block, an availability core is in one of two states: free or occupied. A free
availability core can have a lease holding or on-demand teyrchain assigned to it, giving that teyrchain the chance to
have a backed candidate included. After backing, the core enters the occupied state as the backed candidate is pending
availability.
There is an important distinction: a core is not considered occupied until it is in charge of a block pending
availability, although the implementation may treat scheduled cores the same as occupied ones for brevity. A core exits
the occupied state when the candidate is no longer pending availability - either on timeout or on availability. A core
starting in the occupied state can move to the free state and back to occupied all within a single block, as
availability bitfields are processed before backed candidates. At the end of the block, there is a possible timeout on
availability which can move the core back to the free state if occupied.
Cores are treated as an ordered list and are typically referred to by their index in that list.
```dot process
digraph {
label = "Availability Core State Machine\n\n\n";
labelloc = "t";
{ rank=same vg1 vg2 }
vg1 [label = "Free" shape=rectangle]
vg2 [label = "Occupied" shape=rectangle]
vg1 -> vg2 [label = "Assignment & Backing" ]
vg2 -> vg1 [label = "Availability or Timeout" ]
}
```
```dot process
digraph {
label = "Availability Core Transitions within Block\n\n\n";
labelloc = "t";
splines="line";
subgraph cluster_left {
label = "";
labelloc = "t";
fr1 [label = "Free" shape=rectangle]
fr2 [label = "Free" shape=rectangle]
occ [label = "Occupied" shape=rectangle]
fr1 -> fr2 [label = "No Backing"]
fr1 -> occ [label = "Backing"]
{ rank=same fr2 occ }
}
subgraph cluster_right {
label = "";
labelloc = "t";
occ2 [label = "Occupied" shape=rectangle]
fr3 [label = "Free" shape=rectangle]
fr4 [label = "Free" shape=rectangle]
occ3 [label = "Occupied" shape=rectangle]
occ4 [label = "Occupied" shape=rectangle]
occ2 -> fr3 [label = "Availability"]
occ2 -> occ3 [label = "No availability"]
fr3 -> fr4 [label = "No backing"]
fr3 -> occ4 [label = "Backing"]
occ3 -> occ4 [label = "(no change)"]
occ3 -> fr3 [label = "Availability Timeout"]
{ rank=same; fr3[group=g1]; occ3[group=g2] }
{ rank=same; fr4[group=g1]; occ4[group=g2] }
}
}
```
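The per-block transitions in the second diagram can be modelled as a pure function. This sketch follows the diagram's ordering (availability or timeout first, then backing) and uses hypothetical `CoreState`, `BlockEvents` and `step` names; it ignores which teyrchain is assigned and how long a core has been occupied.

```rust
/// Simplified state of one availability core at a block boundary.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CoreState {
    Free,
    Occupied,
}

/// What was observed for one core while processing a single relay-chain block.
pub struct BlockEvents {
    pub became_available: bool,
    pub timed_out: bool,
    pub backed: bool,
}

pub fn step(state: CoreState, ev: &BlockEvents) -> CoreState {
    // 1. Availability (or an availability timeout) can free an occupied core.
    let state = match state {
        CoreState::Occupied if ev.became_available || ev.timed_out => CoreState::Free,
        s => s,
    };
    // 2. Backed candidates are processed afterwards, so a core freed in step 1
    //    can become occupied again within the same block.
    match state {
        CoreState::Free if ev.backed => CoreState::Occupied,
        s => s,
    }
}
```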
## Validator Groups
Validator group assignments do not need to change very quickly. The security benefits of fast rotation are redundant
with the challenge mechanism in the [Approval process](../protocol-approval.md). Because of this, we only divide
validators into groups at the beginning of the session and do not shuffle membership during the session. However, we do
take steps to ensure that no particular validator group has dominance over a single lease holding teyrchain or on-demand
teyrchain-multiplexer for an entire session, to provide better liveness guarantees.
Validator groups rotate across availability cores in a round-robin fashion, with rotation occurring at fixed intervals.
The i'th group will be assigned to the `(i+k)%n`'th core at any point in time, where `k` is the number of rotations that
have occurred in the session, and `n` is the number of cores. This makes upcoming rotations within the same session
predictable.
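A sketch of the round-robin formula, assuming rotations are counted from the session start block in steps of a `group_rotation_frequency` (the field of that name in `GroupRotationInfo`). Both helper functions are hypothetical.

```rust
type BlockNumber = u32;

/// Number of rotations `k` that have happened since the session started.
pub fn rotations_since_session_start(
    session_start_block: BlockNumber,
    now: BlockNumber,
    group_rotation_frequency: BlockNumber,
) -> u32 {
    if group_rotation_frequency == 0 {
        return 0; // no rotation configured
    }
    now.saturating_sub(session_start_block) / group_rotation_frequency
}

/// Core served by group `i` after `k` rotations among `n` cores: `(i + k) % n`.
pub fn core_for_group(group_index: u32, k: u32, n_cores: u32) -> u32 {
    assert!(n_cores > 0, "there must be at least one core");
    // Reduce both terms first so the sum cannot overflow.
    (group_index % n_cores + k % n_cores) % n_cores
}
```

For example, with rotations every 10 blocks and 5 cores, 25 blocks into the session `k = 2`, so group 4 serves core `(4 + 2) % 5 = 1`.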
When a rotation occurs, validator groups are still responsible for distributing availability chunks for any previous
cores that are still occupied and pending availability. In practice, rotation and availability-timeout frequencies
should be set so this will only be the core they have just been rotated from. It is possible that a validator group is
rotated onto a core which is currently occupied. In this case, the validator group will have nothing to do until the
previously-assigned group finishes their availability work and frees the core or the availability process times out.
Depending on whether the core is for a lease holding teyrchain or an on-demand teyrchain, a different timeout `t` from the
[`HostConfiguration`](../types/runtime.md#host-configuration) will apply. Availability timeouts should only be triggered
in the first `t-1` blocks after the beginning of a rotation.
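One way to express this window is a function returning an optional predicate, in the spirit of `availability_timeout_predicate` in the routines below. Assumptions, all hypothetical: blocks since the last rotation are measured from the session start modulo `group_rotation_frequency`, "the first `t-1` blocks" is read as offsets `0..t-1`, and a core times out once it has been occupied for at least `t` blocks.

```rust
type BlockNumber = u32;
type CoreIndex = u32;

pub fn availability_timeout_predicate(
    now: BlockNumber,
    session_start_block: BlockNumber,
    group_rotation_frequency: BlockNumber,
    t: BlockNumber,
) -> Option<impl Fn(CoreIndex, BlockNumber) -> bool> {
    let since_start = now.saturating_sub(session_start_block);
    let since_rotation = if group_rotation_frequency == 0 {
        since_start
    } else {
        since_start % group_rotation_frequency
    };
    // Outside the first `t - 1` blocks of the rotation: no timing-out at all.
    if since_rotation >= t.saturating_sub(1) {
        return None;
    }
    // Inside the window: a core times out once occupied for at least `t` blocks.
    Some(move |_core: CoreIndex, occupied_since: BlockNumber| {
        now.saturating_sub(occupied_since) >= t
    })
}
```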
## Claims
On-demand teyrchains operate on a system of claims. Collators purchase claims on authoring the next block of an
on-demand teyrchain, although the purchase mechanism is beyond the scope of the scheduler. The scheduler guarantees that
they'll be given at least a certain number of attempts to author a candidate that is backed. Attempts that fail during
the availability phase are not counted, since ensuring availability at that stage is the responsibility of the backing
validators, not of the collator. When a claim is accepted, it is placed into a queue of claims, and each claim is
assigned to a particular on-demand teyrchain-multiplexing core in advance. Given that the current assignments of
validator groups to cores are known, and the upcoming assignments are predictable, on-demand teyrchain collators can
know which validator group they should be talking to now and which groups they should begin establishing connections
with as a fallback.
With this information, the Node-side can be aware of which on-demand teyrchains have a good chance of being includable
within the relay-chain block and can focus any additional resources on backing candidates from those on-demand
teyrchains. Furthermore, Node-side code is aware of which validator group will be responsible for that on-demand teyrchain. If the
necessary conditions are reached for core reassignment, those candidates can be backed within the same block as the core
being freed.
On-demand claims, when scheduled onto a free core, may not result in a block pending availability. This may be due to
collator error, networking timeout, or censorship by the validator group. In this case, the claims should be retried a
certain number of times to give the collator a fair shot.
## Storage
Utility structs:
```rust
// A claim on authoring the next block for a given parathread (on-demand teyrchain).
struct ParathreadClaim(ParaId, CollatorId);
// An entry tracking a parathread (on-demand teyrchain) claim to ensure it does not
// pass the maximum number of retries.
struct ParathreadEntry {
    claim: ParathreadClaim,
    retries: u32,
}
// A queued parathread (on-demand teyrchain) entry, pre-assigned to a core.
struct QueuedParathread {
    claim: ParathreadEntry,
    /// Offset within the set of on-demand cores, in the range `0..config.parathread_cores`.
    core_offset: u32,
}
struct ParathreadQueue {
    queue: Vec<QueuedParathread>,
    /// Offset within the set of on-demand cores, in the range `0..config.parathread_cores`.
    next_core_offset: u32,
}
enum CoreOccupied {
    // On-demand teyrchain
    Parathread(ParathreadEntry), // claim & retries
    Teyrchain,
}
enum AssignmentKind {
    Teyrchain,
    // On-demand teyrchain
    Parathread(CollatorId, u32), // collator & retries
}
struct CoreAssignment {
    core: CoreIndex,
    para_id: ParaId,
    kind: AssignmentKind,
    group_idx: GroupIndex,
}
// Reasons a core might be freed.
enum FreedReason {
    Concluded,
    TimedOut,
}
```
Storage layout:
```rust
/// All the validator groups. One for each core. Indices are into the `ActiveValidators` storage.
ValidatorGroups: Vec<Vec<ValidatorIndex>>;
/// A queue of upcoming parathread (on-demand teyrchain) claims and which core they should be mapped onto.
ParathreadQueue: ParathreadQueue;
/// One entry for each availability core. Entries are `None` if the core is not currently occupied.
/// The i'th teyrchain lease belongs to the i'th core, with the remaining cores all being
/// on-demand teyrchain-multiplexers.
AvailabilityCores: Vec<Option<CoreOccupied>>;
/// An index used to ensure that only one claim on a parathread (on-demand teyrchain) exists in the queue or is
/// currently being handled by an occupied core.
ParathreadClaimIndex: Vec<ParaId>;
/// The block number where the session start occurred. Used to track how many group rotations have occurred.
SessionStartBlock: BlockNumber;
/// Currently scheduled cores - free but up to be occupied.
/// The value contained here will not be valid after the end of a block.
/// Runtime APIs should be used to determine scheduled cores
/// for the upcoming block.
Scheduled: Vec<CoreAssignment>, // sorted ascending by CoreIndex.
```
## Session Change
Session changes are the only time that configuration can change, and the [Configuration module](configuration.md)'s
session-change logic is handled before this module's. We also lean on the behavior of the [Inclusion
module](inclusion.md) which clears all its occupied cores on session change. Thus we don't have to worry about cores
being occupied across session boundaries, and it is safe to resize `AvailabilityCores`.
Actions:
1. Set `SessionStartBlock` to current block number + 1, as session changes are applied at the end of the block.
1. Clear all `Some` members of `AvailabilityCores`. Return all parathread claims to queue with retries un-incremented.
1. Set `configuration = Configuration::configuration()` (see
[`HostConfiguration`](../types/runtime.md#host-configuration))
1. Fetch `Shared::ActiveValidators` as AV.
1. Determine the number of cores & validator groups as `n_cores`. This is the maximum of
1. `paras::Teyrchains::<T>::get().len() + configuration.parathread_cores`
1. `n_validators / max_validators_per_core` if `configuration.max_validators_per_core` is `Some` and non-zero.
1. Resize `AvailabilityCores` to have length `n_cores` with all `None` entries.
1. Compute new validator groups by shuffling using a secure randomness beacon:
- Note that the total number of validators `V` in AV may not be evenly divisible by `n_cores`.
- The groups are selected by partitioning AV. The first `V % n_cores` groups will have `(V / n_cores) + 1` members,
while the remaining groups will have `V / n_cores` members each.
- Instead of using the indices within AV, which point to the broader set, indices _into_ AV should be used. This
implies that groups should have simply ascending validator indices.
1. Prune the parathread (on-demand teyrchain) queue to remove all claims whose retries exceed
`configuration.parathread_retries`.
- Also prune all on-demand claims corresponding to de-registered teyrchains.
- All pruned claims should have their entry removed from the parathread (on-demand teyrchain) index.
- Assign all non-pruned claims to new cores if the number of on-demand teyrchain cores has changed between the
`new_config` and `old_config` of the `SessionChangeNotification`.
- If rebalancing, assign claims in equal balance across all cores, and set the `next_core_offset` of the
`ParathreadQueue` (on-demand queue) by incrementing the relative index of the last assigned core and taking it modulo
the number of on-demand cores.
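The core count and the partition of AV into groups can be sketched as follows. The sketch assumes AV has already been shuffled (the Shared pallet shuffles and truncates the validator list at session change), so groups are contiguous, ascending ranges of indices into AV; `n_cores` and `partition_groups` are hypothetical helpers.

```rust
/// `max(teyrchains + parathread_cores, n_validators / max_validators_per_core)`,
/// where the second term only applies if the limit is `Some` and non-zero.
pub fn n_cores(
    n_teyrchains: u32,
    parathread_cores: u32,
    n_validators: u32,
    max_validators_per_core: Option<u32>,
) -> u32 {
    let base = n_teyrchains + parathread_cores;
    match max_validators_per_core {
        Some(max) if max > 0 => base.max(n_validators / max),
        _ => base,
    }
}

/// Splits indices `0..n_validators` into `n_cores` groups: the first
/// `V % n_cores` groups get one extra member.
pub fn partition_groups(n_validators: u32, n_cores: u32) -> Vec<Vec<u32>> {
    assert!(n_cores > 0, "there must be at least one core");
    let base = n_validators / n_cores;
    let larger = n_validators % n_cores;
    let mut next = 0u32;
    (0..n_cores)
        .map(|g| {
            let size = if g < larger { base + 1 } else { base };
            let group: Vec<u32> = (next..next + size).collect();
            next += size;
            group
        })
        .collect()
}
```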
## Initialization
No initialization routine runs for this module.
## Finalization
No finalization routine runs for this module.
## Routines
- `add_parathread_claim(ParathreadClaim)`: Add a parathread (on-demand teyrchain) claim to the queue.
- Fails if any on-demand claim on the same teyrchain is currently indexed.
- Fails if the queue length is >= `config.scheduling_lookahead * config.parathread_cores`.
- The core used for the on-demand claim is computed by adding `paras::Teyrchains::<T>::get().len()` to the
`next_core_offset` field of the `ParathreadQueue` (on-demand queue).
- `next_core_offset` is then updated by adding 1 and taking it modulo `config.parathread_cores`.
- The claim is then added to the claim index.
- `free_cores(Vec<(CoreIndex, FreedReason)>)`: indicate previously-occupied cores which are to be considered returned
and why they are being returned.
- All freed lease holding teyrchain cores should be assigned to their respective teyrchain.
- All freed on-demand teyrchain cores whose reason for freeing was `FreedReason::Concluded` should have the claim
removed from the claim index.
- All freed on-demand cores whose reason for freeing was `FreedReason::TimedOut` should have the claim added to the
parathread queue (on-demand queue) again, without incrementing retries.
- All freed on-demand cores should take the next on-demand teyrchain entry from the queue.
- `schedule(Vec<(CoreIndex, FreedReason)>, now: BlockNumber)`: schedule new core assignments, with a parameter
indicating previously-occupied cores which are to be considered returned and why they are being returned.
- Invoke `free_cores(freed_cores)`
- The i'th validator group will be assigned to the `(i+k)%n`'th core at any point in time, where `k` is the number of
rotations that have occurred in the session, and `n` is the total number of cores. This makes upcoming rotations
within the same session predictable. Rotations are computed from `now`.
- `scheduled() -> Vec<CoreAssignment>`: Get currently scheduled core assignments.
- `occupied(Vec<CoreIndex>)`. Note that the given cores have become occupied.
- Behavior undefined if any given cores were not scheduled.
- Behavior undefined if the given cores are not sorted ascending by core index
- This clears them from `Scheduled` and marks each corresponding `core` in the `AvailabilityCores` as occupied.
- Since both the availability cores and the newly-occupied cores lists are sorted ascending, this method can be
implemented efficiently.
- `group_validators(GroupIndex) -> Option<Vec<ValidatorIndex>>`: return all validators in a given group, if the group
index is valid for this session.
- `availability_timeout_predicate() -> Option<impl Fn(CoreIndex, BlockNumber) -> bool>`: returns an optional predicate
that should be used for timing out occupied cores. If `None`, no timing-out should be done. The predicate accepts the
index of the core, and the block number since which it has been occupied. The predicate should be implemented based on
the time since the last validator group rotation, and the respective teyrchain timeouts, i.e. only within
`max(config.chain_availability_period, config.thread_availability_period)` of the last rotation would this return
`Some`.
- `group_rotation_info(now: BlockNumber) -> GroupRotationInfo`: Returns a helper for determining group rotation.
- `next_up_on_available(CoreIndex) -> Option<ScheduledCore>`: Return the next thing that will be scheduled on this core
assuming it is currently occupied and the candidate occupying it became available. Returns in `ScheduledCore` format
(todo: link to Runtime APIs page; linkcheck doesn't allow this right now). For lease holding teyrchains, this is
always the ID of the teyrchain and no specified collator. For on-demand teyrchains, this is based on the next item in
the `ParathreadQueue` (on-demand queue) assigned to that core, and is `None` if there isn't one.
- `next_up_on_time_out(CoreIndex) -> Option<ScheduledCore>`: Return the next thing that will be scheduled on this core
assuming it is currently occupied and the candidate occupying it timed out. Returns in `ScheduledCore` format (todo:
link to Runtime APIs page; linkcheck doesn't allow this right now). For lease holding teyrchains, this is always the ID
of the teyrchain and no specified collator. For on-demand teyrchains, this is based on the next item in the `ParathreadQueue`
(on-demand queue) assigned to that core, or if there isn't one, the claim that is currently occupying the core.
Otherwise `None`.
- `clear()`:
- Free all scheduled cores and return on-demand claims to queue, with retries incremented. Skip on-demand teyrchains
which no longer exist under paras.
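A sketch of `add_parathread_claim`, assuming the claim index is a `BTreeSet<ParaId>` rather than a `Vec`, `CollatorId` is a plain integer, and the function returns the absolute core index the claim is mapped to. All type and parameter names beyond those in the text are hypothetical.

```rust
use std::collections::BTreeSet;

type ParaId = u32;
type CollatorId = u64; // hypothetical stand-in for a collator key

pub struct QueuedClaim {
    pub para: ParaId,
    pub collator: CollatorId,
    /// Offset within the on-demand cores, in `0..parathread_cores`.
    pub core_offset: u32,
}

pub struct OnDemandQueue {
    pub queue: Vec<QueuedClaim>,
    pub next_core_offset: u32,
}

#[derive(Debug, PartialEq)]
pub enum ClaimError {
    AlreadyIndexed,
    QueueFull,
}

pub fn add_parathread_claim(
    queue: &mut OnDemandQueue,
    claim_index: &mut BTreeSet<ParaId>,
    para: ParaId,
    collator: CollatorId,
    n_teyrchains: u32,
    scheduling_lookahead: u32,
    parathread_cores: u32,
) -> Result<u32, ClaimError> {
    // At most one claim per on-demand teyrchain may be indexed.
    if claim_index.contains(&para) {
        return Err(ClaimError::AlreadyIndexed);
    }
    // Bound the queue; with zero on-demand cores this rejects every claim.
    if queue.queue.len() as u32 >= scheduling_lookahead * parathread_cores {
        return Err(ClaimError::QueueFull);
    }
    let core_offset = queue.next_core_offset;
    queue.queue.push(QueuedClaim { para, collator, core_offset });
    queue.next_core_offset = (core_offset + 1) % parathread_cores;
    claim_index.insert(para);
    // On-demand cores come after the lease holding teyrchain cores.
    Ok(n_teyrchains + core_offset)
}
```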
@@ -0,0 +1,81 @@
# Session Info
For disputes and approvals, we need access to information about validator sets from prior sessions. We also often want
easy access to the same information about the current session's validator set. This module aggregates and stores this
information in a rolling window while providing easy APIs for access.
## Storage
Helper structs:
```rust
struct SessionInfo {
    /// Validators in canonical ordering.
    ///
    /// NOTE: There might be more authorities in the current session than `validators` participating
    /// in teyrchain consensus. See
    /// [`max_validators`](https://github.com/paritytech/polkadot/blob/a52dca2be7840b23c19c153cf7e110b1e3e475f8/runtime/parachains/src/configuration.rs#L148).
    ///
    /// `SessionInfo::validators` will be limited to `max_validators` when set.
    validators: Vec<ValidatorId>,
    /// Validators' authority discovery keys for the session in canonical ordering.
    ///
    /// NOTE: The first `validators.len()` entries will match the corresponding validators in
    /// `validators`; the remaining entries belong to authorities not participating in
    /// teyrchain consensus - see
    /// [`max_validators`](https://github.com/paritytech/polkadot/blob/a52dca2be7840b23c19c153cf7e110b1e3e475f8/runtime/parachains/src/configuration.rs#L148)
    #[cfg_attr(feature = "std", ignore_malloc_size_of = "outside type")]
    discovery_keys: Vec<AuthorityDiscoveryId>,
    /// The assignment keys for validators.
    ///
    /// NOTE: There might be more authorities in the current session than validators participating
    /// in teyrchain consensus. See
    /// [`max_validators`](https://github.com/paritytech/polkadot/blob/a52dca2be7840b23c19c153cf7e110b1e3e475f8/runtime/parachains/src/configuration.rs#L148).
    ///
    /// Therefore:
    /// ```ignore
    /// assignment_keys.len() == validators.len() && validators.len() <= discovery_keys.len()
    /// ```
    assignment_keys: Vec<AssignmentId>,
    /// Validators in shuffled ordering - these are the validator groups as produced
    /// by the `Scheduler` module for the session and are typically referred to by
    /// `GroupIndex`.
    validator_groups: Vec<Vec<ValidatorIndex>>,
    /// The number of availability cores used by the protocol during this session.
    n_cores: u32,
    /// The zeroth delay tranche width.
    zeroth_delay_tranche_width: u32,
    /// The number of samples we do of `relay_vrf_modulo`.
    relay_vrf_modulo_samples: u32,
    /// The number of delay tranches in total.
    n_delay_tranches: u32,
    /// How many slots (BABE / SASSAFRAS) must pass before an assignment is considered a
    /// no-show.
    no_show_slots: u32,
    /// The number of validators needed to approve a block.
    needed_approvals: u32,
}
```
Storage Layout:
```rust
/// The earliest session for which previous session info is stored.
EarliestStoredSession: SessionIndex,
/// Session information. Should have an entry from `EarliestStoredSession..=CurrentSessionIndex`
Sessions: map SessionIndex => Option<SessionInfo>,
```
## Session Change
1. Update `EarliestStoredSession` based on `config.dispute_period` and remove all entries from `Sessions` from the
previous value up to the new value.
1. Create a new entry in `Sessions` with information about the current session. Use `shared::ActiveValidators` to
determine the indices into the broader validator sets (validation, assignment, discovery) which are actually used for
teyrchain validation. Only these validators should appear in the `SessionInfo`.
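A sketch of the rolling window, assuming the new earliest session is `new_session - dispute_period` (saturating at zero) and that the window never moves backwards. `on_new_session` is a hypothetical helper, and `SessionInfo` is replaced by a generic value `V`.

```rust
use std::collections::BTreeMap;

type SessionIndex = u32;

pub fn on_new_session<V>(
    sessions: &mut BTreeMap<SessionIndex, V>,
    earliest_stored: &mut SessionIndex,
    new_session: SessionIndex,
    dispute_period: SessionIndex,
    info: V,
) {
    // Never move the window backwards.
    let new_earliest = new_session.saturating_sub(dispute_period).max(*earliest_stored);
    // Drop everything from the previous earliest session up to (not including) the new one.
    for session in *earliest_stored..new_earliest {
        sessions.remove(&session);
    }
    *earliest_stored = new_earliest;
    // Record information about the session that is starting.
    sessions.insert(new_session, info);
}
```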
## Routines
* `EarliestStoredSession::<T>::get() -> SessionIndex`: Yields the earliest session for which we have information stored.
* `Sessions::<T>::get(session: SessionIndex) -> Option<SessionInfo>`: Yields the session info for the given session, if
stored.
@@ -0,0 +1,89 @@
# Shared Pallet
This module is responsible for managing shared storage and configuration for other modules.
It is important that other pallets are able to use the Shared Module, so it should not have a dependency on any other
modules in the Teyrchains Runtime.
Its main uses are to track the current session index across the Teyrchains Runtime system and to determine when future
changes to Paras or Configurations may be scheduled. It also stores the active validator set and the allowed relay
parents described below.
## Constants
```rust
// `SESSION_DELAY` is used to delay any changes to Paras registration or configurations.
// Wait until the session index is 2 larger than the current index to apply any changes,
// which guarantees that at least one full session has passed before any changes are applied.
pub(crate) const SESSION_DELAY: SessionIndex = 2;
```
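The effect of `SESSION_DELAY` in numbers: a change requested during session `N` is applied at session `N + 2`, so session `N + 1` runs in full under the old parameters. `scheduled_session` below is a hypothetical helper in the spirit of the `scheduled_sessions()` function listed under Functions.

```rust
type SessionIndex = u32;

/// Mirrors the constant above.
pub const SESSION_DELAY: SessionIndex = 2;

/// Session index at which a change requested during `current` may be applied.
pub fn scheduled_session(current: SessionIndex) -> SessionIndex {
    current.saturating_add(SESSION_DELAY)
}
```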
## Storage
Helper structs:
```rust
struct AllowedRelayParentsTracker<Hash, BlockNumber> {
    // The past relay parents, paired with state roots, that are viable to build upon.
    //
    // They are in ascending chronological order, so the newest relay parents are at
    // the back of the deque.
    //
    // (relay_parent, state_root)
    //
    // NOTE: the size limit of look-back is currently defined as a constant in Runtime.
    buffer: VecDeque<(Hash, Hash)>,
    // The number of the most recent relay-parent, if any.
    latest_number: BlockNumber,
}
```
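A minimal sketch of how such a buffer could behave: push the newest relay parent at the back, evict from the front beyond a length limit, and look up a relay parent's state root and number. It assumes consecutive relay-chain block numbers, fixes `BlockNumber` to `u32`, and passes the limit as a hypothetical `max_len` argument instead of the runtime constant; `update`, `lookup` and `clear` are illustrative names.

```rust
use std::collections::VecDeque;

/// Simplified tracker with `BlockNumber` fixed to `u32`.
pub struct AllowedRelayParentsTracker<Hash> {
    /// `(relay_parent, state_root)`, oldest at the front, newest at the back.
    buffer: VecDeque<(Hash, Hash)>,
    /// Number of the newest relay parent in `buffer`.
    latest_number: u32,
}

impl<Hash: PartialEq + Clone> AllowedRelayParentsTracker<Hash> {
    pub fn new() -> Self {
        Self { buffer: VecDeque::new(), latest_number: 0 }
    }

    /// Appends the newest relay parent and evicts the oldest beyond `max_len`.
    pub fn update(&mut self, relay_parent: Hash, state_root: Hash, number: u32, max_len: usize) {
        self.buffer.push_back((relay_parent, state_root));
        self.latest_number = number;
        while self.buffer.len() > max_len {
            self.buffer.pop_front();
        }
    }

    /// State root and block number of `relay_parent`, if it is still allowed.
    pub fn lookup(&self, relay_parent: &Hash) -> Option<(Hash, u32)> {
        let pos = self.buffer.iter().position(|(rp, _)| rp == relay_parent)?;
        // Assumes consecutive numbers: each step back from the newest entry is one block.
        let age = (self.buffer.len() - 1 - pos) as u32;
        Some((self.buffer[pos].1.clone(), self.latest_number.saturating_sub(age)))
    }

    /// Called on every session change.
    pub fn clear(&mut self) {
        self.buffer.clear();
    }
}
```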
Storage Layout:
```rust
/// The current session index within the Teyrchains Runtime system.
CurrentSessionIndex: SessionIndex;
/// All the validators actively participating in teyrchain consensus.
/// Indices are into the broader validator set.
ActiveValidatorIndices: Vec<ValidatorIndex>,
/// The teyrchain attestation keys of the validators actively participating in teyrchain consensus.
/// This should be the same length as `ActiveValidatorIndices`.
ActiveValidatorKeys: Vec<ValidatorId>,
/// Relay-parents allowed to build candidates upon.
AllowedRelayParents: AllowedRelayParentsTracker<T::Hash, T::BlockNumber>,
```
## Initialization
The Shared Module currently has no initialization routines.
The Shared Module is initialized directly after the Configuration module, but before all other modules. It is important
to update the Shared Module before any other module since its state may be used within the logic of other modules, and
it is important that the state is consistent across them.
## Session Change
During a session change, the Shared Module receives and stores the current Session Index directly from the initializer
module, along with the broader validator set, and it returns the new list of validators.
The list of validators should first be shuffled according to the chain's random seed and then truncated to
`max_validators` from the configuration, if set. The indices of these validators should be set to
`ActiveValidatorIndices` and then returned to the initializer. `ActiveValidatorKeys` should be set accordingly.
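A sketch of the shuffle-then-truncate step. A toy xorshift generator stands in for the chain's randomness (it is not cryptographically secure, and real seed handling is not shown); the shuffle is a standard Fisher-Yates, and `select_active_validators` is a hypothetical name.

```rust
/// Toy xorshift64 generator: a hypothetical, NOT cryptographically secure
/// stand-in for the chain's random seed.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns indices into the broader validator set: shuffled, then truncated.
pub fn select_active_validators(n_validators: u32, seed: u64, max_validators: Option<u32>) -> Vec<u32> {
    let mut indices: Vec<u32> = (0..n_validators).collect();
    let mut rng = XorShift64(seed | 1); // xorshift state must be non-zero
    // Fisher-Yates: swap each position with a pseudo-randomly chosen position at or before it.
    for i in (1..indices.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        indices.swap(i, j);
    }
    if let Some(max) = max_validators {
        indices.truncate(max as usize);
    }
    indices
}
```

Because truncation happens after shuffling, the validators left out of teyrchain consensus vary with the seed rather than always being the highest indices.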
This information is used by the:
* Configuration Module: For delaying updates to configurations until at least one full session has passed.
* Paras Module: For delaying updates to paras until at least one full session has passed.
The allowed relay parents buffer, which is maintained by the [ParaInherent](./parainherent.md) module, is cleared on
every session change.
## Finalization
The Shared Module currently has no finalization routines.
## Functions
* `scheduled_sessions() -> SessionIndex`: Return the next session index where updates to the Teyrchains Runtime system
would be safe to apply.
* `set_session_index(SessionIndex)`: For tests. Set the current session index in the Shared Module.
@@ -0,0 +1,421 @@
# Type Definitions
This section of the guide provides type definitions of various categories.
## V1 Overview
Diagrams are rendered in high resolution; open them in a separate tab to see full scale.
These data types are defined in `pezkuwi/primitives/src/v1.rs`:
```dot process
digraph {
rankdir = LR;
node [shape = plain]
CandidateDescriptor [label = <
<table>
<tr><td border="0" colspan="2" port="name">CandidateDescriptor&lt;H = Hash&gt;</td></tr>
<tr><td>para_id</td><td port="para_id">Id</td></tr>
<tr><td>relay_parent</td><td port="relay_parent">H</td></tr>
<tr><td>collator</td><td port="collator">CollatorId</td></tr>
<tr><td>persisted_validation_data_hash</td><td port="persisted_validation_data_hash">Hash</td></tr>
<tr><td>pov_hash</td><td port="pov_hash">Hash</td></tr>
<tr><td>erasure_root</td><td port="erasure_root">Hash</td></tr>
<tr><td>signature</td><td port="signature">CollatorSignature</td></tr>
</table>
>]
CandidateDescriptor:para_id -> Id:w
CandidateDescriptor:pov_hash -> PoVHash
CandidateDescriptor:collator -> CollatorId:w
CandidateDescriptor:persisted_validation_data_hash -> PersistedValidationDataHash
Id [label="pezkuwi_teyrchain_primitives::primitives::Id"]
CollatorId [label="pezkuwi_primitives::v2::CollatorId"]
PoVHash [label = "Hash", shape="doublecircle", fill="gray90"]
PoVHash -> PoV:name
CandidateReceipt [label = <
<table>
<tr><td border="0" colspan="2" port="name">CandidateReceipt&lt;H = Hash&gt;</td></tr>
<tr><td>descriptor</td><td port="descriptor">CandidateDescriptor&lt;H&gt;</td></tr>
<tr><td>commitments_hash</td><td port="commitments_hash">Hash</td></tr>
</table>
>]
CandidateReceipt:descriptor -> CandidateDescriptor:name
CandidateReceipt:commitments_hash -> CandidateCommitmentsHash
CandidateHash [label = "Hash", shape="doublecircle", fill="gray90"]
CandidateHash -> CandidateReceipt:name
CandidateCommitmentsHash [label = "Hash", shape="doublecircle", fill="gray90"]
CandidateCommitmentsHash -> CandidateCommitments:name
CommittedCandidateReceipt [label = <
<table>
<tr><td border="0" colspan="2" port="name">CommittedCandidateReceipt&lt;H = Hash&gt;</td></tr>
<tr><td>descriptor</td><td port="descriptor">CandidateDescriptor&lt;H&gt;</td></tr>
<tr><td>commitments</td><td port="commitments">CandidateCommitments</td></tr>
</table>
>]
CommittedCandidateReceipt:descriptor -> CandidateDescriptor:name
CommittedCandidateReceipt:commitments -> CandidateCommitments:name
ValidationData [label = <
<table>
<tr><td border="0" colspan="2" port="name">ValidationData&lt;N = BlockNumber&gt;</td></tr>
<tr><td>persisted</td><td port="persisted">PersistedValidationData&lt;N&gt;</td></tr>
<tr><td>transient</td><td port="transient">TransientValidationData&lt;N&gt;</td></tr>
</table>
>]
ValidationData:persisted -> PersistedValidationData:name
ValidationData:transient -> TransientValidationData:name
PersistedValidationData [label = <
<table>
<tr><td border="0" colspan="2" port="name">PersistedValidationData&lt;N = BlockNumber&gt;</td></tr>
<tr><td>parent_head</td><td port="parent_head">HeadData</td></tr>
<tr><td>block_number</td><td port="block_number">N</td></tr>
<tr><td>relay_parent_storage_root</td><td port="relay_parent_storage_root">Hash</td></tr>
<tr><td>max_pov_size</td><td port="max_pov_size">u32</td></tr>
</table>
>]
PersistedValidationData:parent_head -> HeadData:w
PersistedValidationDataHash [label = "Hash", shape="doublecircle", fill="gray90"]
PersistedValidationDataHash -> PersistedValidationData:name
TransientValidationData [label = <
<table>
<tr><td border="0" colspan="2" port="name">TransientValidationData&lt;N = BlockNumber&gt;</td></tr>
<tr><td>max_code_size</td><td port="max_code_size">u32</td></tr>
<tr><td>max_head_data_size</td><td port="max_head_data_size">u32</td></tr>
<tr><td>balance</td><td port="balance">Balance</td></tr>
<tr><td>code_upgrade_allowed</td><td port="code_upgrade_allowed">Option&lt;N&gt;</td></tr>
<tr><td>dmq_length</td><td port="dmq_length">u32</td></tr>
</table>
>]
TransientValidationData:balance -> "pezkuwi_core_primitives::v2::Balance":w
CandidateCommitments [label = <
<table>
<tr><td border="0" colspan="2" port="name">CandidateCommitments&lt;N = BlockNumber&gt;</td></tr>
<tr><td>upward_messages</td><td port="upward_messages">Vec&lt;UpwardMessage&gt;</td></tr>
<tr><td>horizontal_messages</td><td port="horizontal_messages">Vec&lt;OutboundHrmpMessage&lt;Id&gt;&gt;</td></tr>
<tr><td>new_validation_code</td><td port="new_validation_code">Option&lt;ValidationCode&gt;</td></tr>
<tr><td>head_data</td><td port="head_data">HeadData</td></tr>
<tr><td>processed_downward_messages</td><td port="processed_downward_messages">u32</td></tr>
<tr><td>hrmp_watermark</td><td port="hrmp_watermark">N</td></tr>
</table>
>]
CandidateCommitments:upward_messages -> "pezkuwi_teyrchain_primitives::primitives::UpwardMessage":w
CandidateCommitments:horizontal_messages -> "pezkuwi_core_primitives::v2::OutboundHrmpMessage":w
CandidateCommitments:head_data -> HeadData:w
CandidateCommitments:horizontal_messages -> "pezkuwi_teyrchain_primitives::primitives::Id":w
CandidateCommitments:new_validation_code -> "pezkuwi_teyrchain_primitives::primitives::ValidationCode":w
PoV [label = <
<table>
<tr><td border="0" colspan="2" port="name">PoV</td></tr>
<tr><td>block_data</td><td port="block_data">BlockData</td></tr>
</table>
>]
PoV:block_data -> "pezkuwi_teyrchain_primitives::primitives::BlockData":w
BackedCandidate [label = <
<table>
<tr><td border="0" colspan="2" port="name">BackedCandidate&lt;H = Hash&gt;</td></tr>
<tr><td>candidate</td><td port="candidate">CommittedCandidateReceipt&lt;H&gt;</td></tr>
<tr><td>validity_votes</td><td port="validity_votes">Vec&lt;ValidityAttestation&gt;</td></tr>
<tr><td>validator_indices</td><td port="validator_indices">BitVec</td></tr>
</table>
>]
BackedCandidate:candidate -> CommittedCandidateReceipt:name
BackedCandidate:validity_votes -> "pezkuwi_primitives::v0::ValidityAttestation":w
HeadData [label = "pezkuwi_teyrchain_primitives::primitives::HeadData"]
CoreIndex [label = <
<table>
<tr><td border="0" colspan="2" port="name">CoreIndex</td></tr>
<tr><td>0</td><td port="0">u32</td></tr>
</table>
>]
GroupIndex [label = <
<table>
<tr><td border="0" colspan="2" port="name">GroupIndex</td></tr>
<tr><td>0</td><td port="0">u32</td></tr>
</table>
>]
ParathreadClaim [label = <
<table>
<tr><td border="0" colspan="2" port="name">ParathreadClaim</td></tr>
<tr><td>0</td><td port="0">Id</td></tr>
<tr><td>1</td><td port="1">CollatorId</td></tr>
</table>
>]
ParathreadClaim:0 -> Id:w
ParathreadClaim:1 -> CollatorId:w
MessageQueueChainLink [label = "(prev_head, B, H(M))\nSee doc of AbridgedHrmpChannel::mqc_head"]
MQCHash [label = "Hash", shape="doublecircle", style="filled", fillcolor="gray90"]
MQCHash -> MessageQueueChainLink
ParathreadEntry [label = <
<table>
<tr><td border="0" colspan="2" port="name">ParathreadEntry</td></tr>
<tr><td>claim</td><td port="claim">ParathreadClaim</td></tr>
<tr><td>retries</td><td port="retries">u32</td></tr>
</table>
>]
ParathreadEntry:claim -> ParathreadClaim:name
CoreOccupied [label = <
<table>
<tr><td border="0" colspan="2" port="name"><i>enum</i> CoreOccupied</td></tr>
<tr><td></td><td port="parathread">Parathread(ParathreadEntry)</td></tr>
<tr><td></td><td port="teyrchain">Teyrchain</td></tr>
</table>
>]
CoreOccupied:parathread -> ParathreadEntry:name
AvailableData [label = <
<table>
<tr><td border="0" colspan="2" port="name">AvailableData</td></tr>
<tr><td>pov</td><td port="pov">Arc&lt;PoV&gt;</td></tr>
<tr><td>validation_data</td><td port="validation_data">PersistedValidationData</td></tr>
</table>
>]
AvailableData:pov -> PoV:name
AvailableData:validation_data -> PersistedValidationData:name
GroupRotationInfo [label = <
<table>
<tr><td border="0" colspan="2" port="name">GroupRotationInfo&lt;N = BlockNumber&gt;</td></tr>
<tr><td>session_start_block</td><td port="session_start_block">N</td></tr>
<tr><td>group_rotation_frequency</td><td port="group_rotation_frequency">N</td></tr>
<tr><td>now</td><td port="now">N</td></tr>
</table>
>]
OccupiedCore [label = <
<table>
<tr><td border="0" colspan="2" port="name">OccupiedCore&lt;H = Hash, N = BlockNumber&gt;</td></tr>
<tr><td>next_up_on_available</td><td port="next_up_on_available">Option&lt;ScheduledCore&gt;</td></tr>
<tr><td>occupied_since</td><td port="occupied_since">N</td></tr>
<tr><td>time_out_at</td><td port="time_out_at">N</td></tr>
<tr><td>next_up_on_time_out</td><td port="next_up_on_time_out">Option&lt;ScheduledCore&gt;</td></tr>
<tr><td>availability</td><td port="availability">BitVec</td></tr>
<tr><td>group_responsible</td><td port="group_responsible">GroupIndex</td></tr>
<tr><td>candidate_hash</td><td port="candidate_hash">CandidateHash</td></tr>
<tr><td>candidate_descriptor</td><td port="candidate_descriptor">CandidateDescriptor</td></tr>
</table>
>]
OccupiedCore:next_up_on_available -> ScheduledCore:name
OccupiedCore:next_up_on_time_out -> ScheduledCore:name
OccupiedCore:group_responsible -> GroupIndex
OccupiedCore:candidate_hash -> CandidateHash
OccupiedCore:candidate_descriptor -> CandidateDescriptor:name
ScheduledCore [label = <
<table>
<tr><td border="0" colspan="2" port="name">ScheduledCore</td></tr>
<tr><td>para_id</td><td port="para_id">Id</td></tr>
<tr><td>collator</td><td port="collator">Option&lt;CollatorId&gt;</td></tr>
</table>
>]
ScheduledCore:para_id -> Id:w
ScheduledCore:collator -> CollatorId:w
CoreState [label = <
<table>
<tr><td border="0" colspan="2" port="name"><i>enum</i> CoreState&lt;H = Hash, N = BlockNumber&gt;</td></tr>
<tr><td></td><td port="occupied">Occupied(OccupiedCore&lt;H, N&gt;)</td></tr>
<tr><td></td><td port="scheduled">Scheduled(ScheduledCore)</td></tr>
<tr><td></td><td port="free">Free</td></tr>
</table>
>]
CoreState:occupied -> OccupiedCore:name
CoreState:scheduled -> ScheduledCore:name
CandidateEvent [label = <
<table>
<tr><td border="0" colspan="2" port="name"><i>enum</i> CandidateEvent&lt;H = Hash&gt;</td></tr>
<tr><td></td><td port="CandidateBacked">CandidateBacked(CandidateReceipt&lt;H&gt;, HeadData)</td></tr>
<tr><td></td><td port="CandidateIncluded">CandidateIncluded(CandidateReceipt&lt;H&gt;, HeadData)</td></tr>
<tr><td></td><td port="CandidateTimedOut">CandidateTimedOut(CandidateReceipt&lt;H&gt;, HeadData)</td></tr>
</table>
>]
CandidateEvent:e -> CandidateReceipt:name
CandidateEvent:e -> HeadData:w
SessionInfo [label = <
<table>
<tr><td border="0" colspan="2" port="name">SessionInfo</td></tr>
<tr><td>validators</td><td port="validators">Vec&lt;ValidatorId&gt;</td></tr>
<tr><td>discovery_keys</td><td port="discovery_keys">Vec&lt;AuthorityDiscoveryId&gt;</td></tr>
<tr><td>assignment_keys</td><td port="assignment_keys">Vec&lt;AssignmentId&gt;</td></tr>
<tr><td>validator_groups</td><td port="validator_groups">Vec&lt;Vec&lt;ValidatorIndex&gt;&gt;</td></tr>
<tr><td>n_cores</td><td port="n_cores">u32</td></tr>
<tr><td>zeroth_delay_tranche_width</td><td port="zeroth_delay_tranche_width">u32</td></tr>
<tr><td>relay_vrf_modulo_samples</td><td port="relay_vrf_modulo_samples">u32</td></tr>
<tr><td>n_delay_tranches</td><td port="n_delay_tranches">u32</td></tr>
<tr><td>no_show_slots</td><td port="no_show_slots">u32</td></tr>
<tr><td>needed_approvals</td><td port="needed_approvals">u32</td></tr>
</table>
>]
SessionInfo:validators -> ValidatorId:w
SessionInfo:discovery_keys -> AuthorityDiscoveryId:w
SessionInfo:validator_groups -> ValidatorIndex:w
ValidatorId [label = "pezkuwi_primitives::v2::ValidatorId"]
AuthorityDiscoveryId [label = "sp_authority_discovery::AuthorityId"]
ValidatorIndex [label = "pezkuwi_primitives::v2::ValidatorIndex"]
AbridgedHostConfiguration [label = <
<table>
<tr><td border="0" colspan="2" port="name">AbridgedHostConfiguration</td></tr>
<tr><td>max_code_size</td><td port="max_code_size">u32</td></tr>
<tr><td>max_head_data_size</td><td port="max_head_data_size">u32</td></tr>
<tr><td>max_upward_queue_count</td><td port="max_upward_queue_count">u32</td></tr>
<tr><td>max_upward_queue_size</td><td port="max_upward_queue_size">u32</td></tr>
<tr><td>max_upward_message_size</td><td port="max_upward_message_size">u32</td></tr>
<tr><td>max_upward_messages_num_per_candidate</td><td port="max_upward_messages_num_per_candidate">u32</td></tr>
<tr><td>hrmp_max_message_num_per_candidate</td><td port="hrmp_max_message_num_per_candidate">u32</td></tr>
<tr><td>validation_upgrade_cooldown</td><td port="validation_upgrade_cooldown">BlockNumber</td></tr>
<tr><td>validation_upgrade_delay</td><td port="validation_upgrade_delay">BlockNumber</td></tr>
</table>
>]
AbridgedHrmpChannel [label = <
<table>
<tr><td border="0" colspan="2" port="name">AbridgedHrmpChannel</td></tr>
<tr><td>max_capacity</td><td port="max_capacity">u32</td></tr>
<tr><td>max_total_size</td><td port="max_total_size">u32</td></tr>
<tr><td>max_message_size</td><td port="max_message_size">u32</td></tr>
<tr><td>msg_count</td><td port="msg_count">u32</td></tr>
<tr><td>total_size</td><td port="total_size">u32</td></tr>
<tr><td>mqc_head</td><td port="mqc_head">Option&lt;Hash&gt;</td></tr>
</table>
>]
AbridgedHrmpChannel:mqc_head -> MQCHash
}
```
These data types are defined in `pezkuwi/teyrchain/src/primitives.rs`:
```dot process
digraph {
rankdir = LR;
node [shape = plain]
HeadData [label = <
<table>
<tr><td border="0" colspan="2" port="name">HeadData</td></tr>
<tr><td>0</td><td port="0">Vec&lt;u8&gt;</td></tr>
</table>
>]
ValidationCode [label = <
<table>
<tr><td border="0" colspan="2" port="name">ValidationCode</td></tr>
<tr><td>0</td><td port="0">Vec&lt;u8&gt;</td></tr>
</table>
>]
BlockData [label = <
<table>
<tr><td border="0" colspan="2" port="name">BlockData</td></tr>
<tr><td>0</td><td port="0">Vec&lt;u8&gt;</td></tr>
</table>
>]
Id [label = <
<table>
<tr><td border="0" colspan="2" port="name">Id</td></tr>
<tr><td>0</td><td port="0">u32</td></tr>
</table>
>]
Sibling [label = <
<table>
<tr><td border="0" colspan="2" port="name">Sibling</td></tr>
<tr><td>0</td><td port="0">Id</td></tr>
</table>
>]
Sibling:0 -> Id:name
HrmpChannelId [label = <
<table>
<tr><td border="0" colspan="2" port="name">HrmpChannelId</td></tr>
<tr><td>sender</td><td port="sender">Id</td></tr>
<tr><td>recipient</td><td port="recipient">Id</td></tr>
</table>
>]
HrmpChannelId:e -> Id:name
ValidationParams [label = <
<table>
<tr><td border="0" colspan="2" port="name">ValidationParams</td></tr>
<tr><td>parent_head</td><td port="parent_head">HeadData</td></tr>
<tr><td>block_data</td><td port="block_data">BlockData</td></tr>
<tr><td>relay_parent_number</td><td port="relay_parent_number">RelayChainBlockNumber</td></tr>
<tr><td>relay_parent_storage_root</td><td port="relay_parent_storage_root">Hash</td></tr>
</table>
>]
ValidationParams:parent_head -> HeadData:name
ValidationParams:block_data -> BlockData:name
ValidationParams:relay_parent_number -> RelayChainBlockNumber:w
RelayChainBlockNumber [label = "pezkuwi_core_primitives::BlockNumber"]
ValidationResult [label = <
<table>
<tr><td border="0" colspan="2" port="name">ValidationResult</td></tr>
<tr><td>head_data</td><td port="head_data">HeadData</td></tr>
<tr><td>new_validation_code</td><td port="new_validation_code">Option&lt;ValidationCode&gt;</td></tr>
<tr><td>upward_messages</td><td port="upward_messages">Vec&lt;UpwardMessage&gt;</td></tr>
<tr><td>horizontal_messages</td><td port="horizontal_messages">Vec&lt;OutboundHrmpMessage&lt;Id&gt;&gt;</td></tr>
<tr><td>processed_downward_messages</td><td port="processed_downward_messages">u32</td></tr>
<tr><td>hrmp_watermark</td><td port="hrmp_watermark">RelayChainBlockNumber</td></tr>
</table>
>]
ValidationResult:head_data -> HeadData:name
ValidationResult:new_validation_code -> ValidationCode:name
ValidationResult:upward_messages -> UpwardMessage:w
ValidationResult:horizontal_messages -> OutboundHrmpMessage:w
ValidationResult:horizontal_messages -> Id:name
ValidationResult:hrmp_watermark -> RelayChainBlockNumber:w
UpwardMessage [label = "Vec<u8>"]
OutboundHrmpMessage [label = "pezkuwi_core_primitives::OutboundHrmpMessage"]
}
```
@@ -0,0 +1,137 @@
# Approval Types
## `AssignmentId`
The public key of a keypair used by a validator for determining assignments to approve included teyrchain candidates.
## `AssignmentCert`
An `AssignmentCert`, short for Assignment Certificate, is a piece of data provided by a validator to prove that they
have been selected to perform approval checks on an included candidate.
These certificates can be checked in the context of a specific block, candidate, and validator assignment VRF key. The
block state will also provide further context about the availability core states at that block.
```rust
enum AssignmentCertKind {
RelayVRFModulo {
sample: u32,
},
RelayVRFDelay {
core_index: CoreIndex,
}
}
enum AssignmentCertKindV2 {
/// Multiple assignment stories based on the VRF that authorized the relay-chain block where the
/// candidates were included.
///
/// The context is [`v2::RELAY_VRF_MODULO_CONTEXT`]
RelayVRFModuloCompact {
/// A bitfield representing the core indices claimed by this assignment.
core_bitfield: CoreBitfield,
},
/// An assignment story based on the VRF that authorized the relay-chain block where the
/// candidate was included combined with the index of a particular core.
///
/// The context is [`v2::RELAY_VRF_DELAY_CONTEXT`]
RelayVRFDelay {
/// The core index chosen in this cert.
core_index: CoreIndex,
},
/// Deprecated assignment. Soon to be removed.
///
/// An assignment story based on the VRF that authorized the relay-chain block where the
/// candidate was included combined with a sample number.
///
/// The context used to produce bytes is [`v1::RELAY_VRF_MODULO_CONTEXT`]
RelayVRFModulo {
/// The sample number used in this cert.
sample: u32,
},
}
struct AssignmentCert {
// The criterion which is claimed to be met by this cert.
kind: AssignmentCertKind,
// The VRF showing the criterion is met.
vrf: (VRFPreOut, VRFProof),
}
```
> TODO: `RelayEquivocation` cert. Probably can only be broadcast to chains that have handled an equivocation report.
## `IndirectAssignmentCert`
An assignment cert that refers to the candidate it applies to indirectly, by the hash of a block in which that candidate appears.
```rust
struct IndirectAssignmentCert {
// A block hash where the candidate appears.
block_hash: Hash,
validator: ValidatorIndex,
cert: AssignmentCert,
}
```
## `ApprovalVote`
A vote of approval on a candidate.
```rust
struct ApprovalVote(Hash);
```
## `SignedApprovalVote`
An approval vote signed with a validator's key. This should be verifiable under the `ValidatorId` corresponding to the
`ValidatorIndex` of the session, which should be implicit from context.
```rust
struct SignedApprovalVote {
vote: ApprovalVote,
validator: ValidatorIndex,
signature: ValidatorSignature,
}
```
## `IndirectSignedApprovalVote`
A signed approval vote which references the candidate indirectly via the block. If there exists a look-up to the
candidate hash from the block hash and candidate index, then this can be transformed into a `SignedApprovalVote`.
Although this vote references the candidate by a specific block hash and candidate index, the signature is computed over
the payload of the corresponding `SignedApprovalVote`, i.e. the `ApprovalVote` on the candidate hash.
```rust
struct IndirectSignedApprovalVote {
// A block hash where the candidate appears.
block_hash: Hash,
// The index of the candidate in the list of candidates fully included as-of the block.
candidate_index: CandidateIndex,
validator: ValidatorIndex,
signature: ValidatorSignature,
}
```
## `CheckedAssignmentCert`
An assignment cert for which both the VRF and the validity of the implied assignment, according to the protocol's
selection criteria, have been checked. This type should be declared so that it can only be instantiated once the checks
have actually been done. Fields should be accessible via getters, not direct struct access.
```rust
struct CheckedAssignmentCert {
cert: AssignmentCert,
validator: ValidatorIndex,
relay_block: Hash,
candidate_hash: Hash,
delay_tranche: DelayTranche,
}
```
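One way to make a type "instantiable only when the checks have actually been done" in Rust is to keep its fields private and expose a single checking constructor plus read-only getters. The sketch below assumes simplified placeholder types (`Hash` as `[u8; 32]`, `ValidatorIndex` as `u32`, a cut-down `AssignmentCert`). The `check` constructor and its `verify` closure are hypothetical: the closure stands in for the real VRF verification and selection-criteria evaluation, which are not shown here.

```rust
mod approval {
    // Placeholder types for this sketch only.
    pub type Hash = [u8; 32];
    pub type ValidatorIndex = u32;
    pub type DelayTranche = u32;

    /// Cut-down stand-in for the real `AssignmentCert`.
    #[derive(Clone, Debug, PartialEq)]
    pub struct AssignmentCert {
        pub vrf_output: Vec<u8>,
    }

    /// All fields are private: code outside this module cannot build one
    /// except through `check`.
    #[derive(Debug)]
    pub struct CheckedAssignmentCert {
        cert: AssignmentCert,
        validator: ValidatorIndex,
        relay_block: Hash,
        candidate_hash: Hash,
        delay_tranche: DelayTranche,
    }

    impl CheckedAssignmentCert {
        /// Runs `verify` (hypothetical stand-in for the VRF and criteria checks).
        /// It returns the delay tranche on success; only then is a
        /// `CheckedAssignmentCert` produced.
        pub fn check<F>(
            cert: AssignmentCert,
            validator: ValidatorIndex,
            relay_block: Hash,
            candidate_hash: Hash,
            verify: F,
        ) -> Option<Self>
        where
            F: FnOnce(&AssignmentCert, ValidatorIndex, &Hash, &Hash) -> Option<DelayTranche>,
        {
            let delay_tranche = verify(&cert, validator, &relay_block, &candidate_hash)?;
            Some(Self { cert, validator, relay_block, candidate_hash, delay_tranche })
        }

        // Read-only getters; there are deliberately no setters.
        pub fn cert(&self) -> &AssignmentCert { &self.cert }
        pub fn validator(&self) -> ValidatorIndex { self.validator }
        pub fn relay_block(&self) -> &Hash { &self.relay_block }
        pub fn candidate_hash(&self) -> &Hash { &self.candidate_hash }
        pub fn delay_tranche(&self) -> DelayTranche { self.delay_tranche }
    }
}
```

Because the fields are private, writing `approval::CheckedAssignmentCert { .. }` outside the module fails to compile, so every instance has passed through `check`.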
## `DelayTranche`
```rust
type DelayTranche = u32;
```
@@ -0,0 +1,71 @@
# Availability
One of the key roles of validators is to ensure availability of all data necessary to validate candidates for the
duration of a challenge period. This is done by erasure-coding the data that must be kept available.
## Signed Availability Bitfield
A bitfield [signed](backing.md#signed-wrapper) by a particular validator about the availability of pending candidates.
```rust
type SignedAvailabilityBitfield = Signed<BitVec>;
struct Bitfields(Vec<SignedAvailabilityBitfield>); // bitfields sorted by validator index, ascending
```
### Semantics
A `SignedAvailabilityBitfield` represents the view from a particular validator's perspective. Each bit in the bitfield
corresponds to a single [availability core](../runtime-api/availability-cores.md). A `1` bit indicates that the
validator believes the following statements to be true for a core:
- the availability core is occupied
- there exists a [`CommittedCandidateReceipt`](candidate.md#committed-candidate-receipt) corresponding to that core.
In other words, that para has a block in progress.
- the validator's [Availability Store](../node/utility/availability-store.md) contains a chunk of that parablock's PoV.
Taken together, the bitfields of all validators are the transpose of
[`OccupiedCore::availability`](../runtime-api/availability-cores.md), which holds one bit per validator for each core.
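The transpose can be sketched as follows, assuming plain `Vec<bool>` bitfields in place of signed `BitVec`s (signature checking is omitted). The input is indexed by validator and then by core; the output is indexed by core and then by validator. The function name `availability_per_core` is hypothetical.

```rust
/// `bitfields[v][c]` is validator `v`'s claim that it holds a chunk for the
/// candidate occupying core `c`. Returns `per_core[c][v]`: the same bits
/// regrouped per core, one bit per validator, like `OccupiedCore::availability`.
/// Returns `None` if any bitfield does not have exactly one bit per core.
pub fn availability_per_core(bitfields: &[Vec<bool>], n_cores: usize) -> Option<Vec<Vec<bool>>> {
    let n_validators = bitfields.len();
    let mut per_core = vec![vec![false; n_validators]; n_cores];
    for (validator, bits) in bitfields.iter().enumerate() {
        if bits.len() != n_cores {
            return None;
        }
        for (core, &bit) in bits.iter().enumerate() {
            per_core[core][validator] = bit;
        }
    }
    Some(per_core)
}
```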
## Proof-of-Validity
Often referred to as PoV, this is a type-safe wrapper around bytes (`Vec<u8>`) when referring to data that acts as a
stateless-client proof of validity of a candidate, when used as input to the validation function of the para.
```rust
struct PoV(Vec<u8>);
```
## Available Data
This is the data we want to keep available for each [candidate](candidate.md) included in the relay chain. It consists of
the PoV of the block together with the [`PersistedValidationData`](candidate.md#persistedvalidationdata).
```rust
struct AvailableData {
/// The Proof-of-Validity of the candidate.
pov: Arc<PoV>,
/// The persisted validation data used to check the candidate.
validation_data: PersistedValidationData,
}
```
> TODO: With XCMP, we also need to keep available the outgoing messages as a result of para-validation.
## Erasure Chunk
The [`AvailableData`](#available-data) is split up into an erasure coding as part of the availability process. Each
validator gets a chunk. This type describes one of those chunks, along with its proof against a Merkle root hash, which
should be apparent from context and is the `erasure_root` field of a
[`CandidateDescriptor`](candidate.md#candidate-descriptor).
```rust
struct ErasureChunk {
/// The erasure-encoded chunk of data belonging to the candidate block.
chunk: Vec<u8>,
/// The index of this erasure-encoded chunk of data.
index: u32,
/// Proof for this chunk's branch in the Merkle tree.
proof: Vec<Vec<u8>>,
}
```
@@ -0,0 +1,127 @@
# Backing Types
[Candidates](candidate.md) go through many phases before being considered included in a fork of the relay chain and
eventually accepted.
These types describe the data used in the backing phase. Some are sent over the wire within subsystems, and some are
simply included in the relay-chain block.
## Validity Attestation
An attestation of validity for a candidate, used as part of a backing. Both the `Seconded` and `Valid` statements are
considered attestations of validity. This structure is only useful where the candidate referenced is apparent.
```rust
enum ValidityAttestation {
/// Implicit validity attestation by issuing.
/// This corresponds to issuance of a `Seconded` statement.
Implicit(ValidatorSignature),
/// An explicit attestation. This corresponds to issuance of a
/// `Valid` statement.
Explicit(ValidatorSignature),
}
```
## Signed Wrapper
There are a few distinct types which we need to sign and whose signatures we need to validate. Instead of duplicating
this work for each of them, we extract a generic signed wrapper.
```rust,ignore
/// A signed type which encapsulates the common desire to sign some data and validate a signature.
///
/// Note that the internal fields are not public; they are all accessible by immutable getters.
/// This reduces the chance that they are accidentally mutated, invalidating the signature.
struct Signed<Payload, RealPayload=Payload> {
/// The payload is part of the signed data. The rest is the signing context,
/// which is known both at signing and at validation.
payload: Payload,
/// The index of the validator signing this statement.
validator_index: ValidatorIndex,
/// The signature by the validator of the signed payload.
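signature: ValidatorSignature,
/// Zero-sized marker for the otherwise unused `RealPayload` type parameter.
real_payload: core::marker::PhantomData<RealPayload>,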
}
impl<Payload: EncodeAs<RealPayload>, RealPayload: Encode> Signed<Payload, RealPayload> {
fn sign(payload: Payload, context: SigningContext, index: ValidatorIndex, key: ValidatorPair) -> Signed<Payload, RealPayload> { ... }
fn validate(&self, context: SigningContext, key: ValidatorId) -> bool { ... }
}
```
Note the presence of the [`SigningContext`](../types/candidate.md#signing-context) in the signatures of the `sign` and
`validate` methods. To ensure cryptographic security, the actual signed payload is always the SCALE encoding of
`(payload.into(), signing_context)`. Including the signing context prevents replay attacks.
`EncodeAs` is a helper trait with a blanket impl which ensures that any `T` can `EncodeAs<T>`. Therefore, for the
generic case where `RealPayload = Payload`, it changes nothing. However, we `impl EncodeAs<CompactStatement> for
Statement`, which helps efficiency.
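The sketch below builds the signed bytes for a `CompactStatement` by hand, showing why the signing context prevents replay: the same statement signed for a different session or relay parent yields different bytes. It assumes placeholder types (`Hash` as `[u8; 32]`, `SessionIndex` as `u32`), the plain derived SCALE layout (the real `CompactStatement` encoding may differ), and the `SigningContext` field order as listed in this guide. No actual signing is performed, and the function names are hypothetical.

```rust
// Placeholder types for this sketch.
pub type Hash = [u8; 32];
pub type SessionIndex = u32;

/// Mirrors the guide's `CompactStatement`.
pub enum CompactStatement {
    Seconded(Hash),
    Valid(Hash),
}

/// Mirrors the guide's `SigningContext`.
pub struct SigningContext {
    pub parent_hash: Hash,
    pub session_index: SessionIndex,
}

// Hand-rolled SCALE: an enum is its variant index as one byte followed by its
// fields; `[u8; 32]` is its raw bytes; `u32` is 4 little-endian bytes; a tuple
// or struct is the concatenation of its fields' encodings.
fn encode_statement(statement: &CompactStatement, out: &mut Vec<u8>) {
    let (index, hash) = match statement {
        CompactStatement::Seconded(h) => (0u8, h),
        CompactStatement::Valid(h) => (1u8, h),
    };
    out.push(index);
    out.extend_from_slice(hash);
}

fn encode_context(ctx: &SigningContext, out: &mut Vec<u8>) {
    out.extend_from_slice(&ctx.parent_hash);
    out.extend_from_slice(&ctx.session_index.to_le_bytes());
}

/// The bytes a validator would sign: the encoding of `(payload, signing_context)`.
pub fn signing_payload(payload: &CompactStatement, ctx: &SigningContext) -> Vec<u8> {
    let mut out = Vec::new();
    encode_statement(payload, &mut out);
    encode_context(ctx, &mut out);
    out
}
```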
## Statement Type
The [Candidate Backing subsystem](../node/backing/candidate-backing.md) issues and signs these after candidate
validation.
```rust
/// A statement about the validity of a teyrchain candidate.
enum Statement {
/// A statement about a new candidate being seconded by a validator. This is an implicit validity vote.
///
/// The main semantic difference between `Seconded` and `Valid` comes from the fact that every validator may
/// second only 1 candidate; this places an upper bound on the total number of candidates whose validity
/// needs to be checked. A validator who seconds more than 1 teyrchain candidate per relay head is subject
/// to slashing.
Seconded(CommittedCandidateReceipt),
/// A statement about the validity of a candidate, based on candidate's hash.
Valid(Hash),
}
/// A statement about the validity of a teyrchain candidate.
///
/// This enum should only be used in the production of `SignedStatement`s. The only difference between
/// this enum and `Statement` is that the `Seconded` variant contains a `Hash` instead of a `CommittedCandidateReceipt`.
/// The rationale behind the difference is that the signature should always be on the hash instead of the
/// full data, as this lowers the requirement for checking while retaining necessary cryptographic properties.
enum CompactStatement {
/// A statement about a new candidate being seconded by a validator. This is an implicit validity vote.
Seconded(Hash),
/// A statement about the validity of a candidate, based on candidate's hash.
Valid(Hash),
}
```
`CompactStatement` exists because a `CommittedCandidateReceipt` includes `HeadData` in its commitments, which does not
have a bounded size.
## Signed Statement Type
A statement which has been [cryptographically signed](#signed-wrapper) by a validator.
```rust
/// A signed statement, containing the committed candidate receipt in the `Seconded` variant.
pub type SignedFullStatement = Signed<Statement, CompactStatement>;
/// A signed statement, containing only the hash.
pub type SignedStatement = Signed<CompactStatement>;
```
Munging the signed `Statement` into a `CompactStatement` before signing allows the candidate receipt itself to be
omitted when checking a signature on a `Seconded` statement.
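The conversion can be sketched as below. The receipt is cut down to a single unbounded field, and the candidate hash is computed with the standard library's `DefaultHasher` as a stand-in for blake2-256. Both simplifications are assumptions of this sketch, not the real types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Placeholder candidate hash; the real one is a 256-bit blake2 hash.
pub type CandidateHash = u64;

/// Stand-in for `CommittedCandidateReceipt`, reduced to one unbounded field.
#[derive(Clone, Debug, Hash)]
pub struct CommittedCandidateReceipt {
    pub head_data: Vec<u8>,
}

impl CommittedCandidateReceipt {
    /// Stand-in hash using std's `DefaultHasher`, NOT blake2-256.
    pub fn candidate_hash(&self) -> CandidateHash {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// The full statement, carrying the receipt in the `Seconded` variant.
pub enum Statement {
    Seconded(CommittedCandidateReceipt),
    Valid(CandidateHash),
}

/// The compact form that is actually signed.
#[derive(Debug, PartialEq)]
pub enum CompactStatement {
    Seconded(CandidateHash),
    Valid(CandidateHash),
}

impl From<&Statement> for CompactStatement {
    fn from(statement: &Statement) -> Self {
        match statement {
            // Only the receipt's hash survives, so a verifier does not need the
            // (unbounded) receipt itself to check the signature.
            Statement::Seconded(receipt) => CompactStatement::Seconded(receipt.candidate_hash()),
            Statement::Valid(hash) => CompactStatement::Valid(*hash),
        }
    }
}
```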
## Backed Candidate
A [`CommittedCandidateReceipt`](candidate.md#committed-candidate-receipt) along with all data necessary to prove its
backing. This is submitted to the relay-chain to process and move along the candidate to the pending-availability stage.
```rust
struct BackedCandidate {
candidate: CommittedCandidateReceipt,
validity_votes: Vec<ValidityAttestation>,
// The indices of validators who signed the candidate within the group. There is no need to include a
// bit for any validator who is not in the group, so this is more compact.
// The number of bits is the number of validators in the group.
//
// The group should be apparent from context.
validator_indices: BitVec,
}
struct BackedCandidates(Vec<BackedCandidate>); // sorted by para-id.
```
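The `validator_indices` bookkeeping can be sketched as follows: the bitfield is mapped back onto the backing group to recover which validators signed, and its shape is checked against the group size and the number of `validity_votes`. This assumes `BitVec` as `&[bool]` and `ValidatorIndex` as `u32`. The function `signers` is hypothetical, and real processing also verifies each vote's signature, which is omitted here.

```rust
/// Placeholder for the guide's `ValidatorIndex`.
pub type ValidatorIndex = u32;

/// Maps the per-group `validator_indices` bitfield back to session-wide
/// validator indices, in group order. Returns `None` if the bitfield length
/// differs from the group size, or if the number of set bits differs from the
/// number of validity votes (each set bit pairs with one vote).
pub fn signers(
    group: &[ValidatorIndex],
    validator_indices: &[bool],
    n_validity_votes: usize,
) -> Option<Vec<ValidatorIndex>> {
    if validator_indices.len() != group.len() {
        return None;
    }
    let mut signed = Vec::new();
    for (&validator, &bit) in group.iter().zip(validator_indices) {
        if bit {
            signed.push(validator);
        }
    }
    if signed.len() != n_validity_votes {
        return None;
    }
    Some(signed)
}
```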
@@ -0,0 +1,190 @@
# Candidate Types
Para candidates are some of the most common types, both within the runtime and on the Node-side. Candidates are the
fundamental datatype for advancing teyrchains, encapsulating the collator's signature, the context of the parablock, the
commitments to the output, and a commitment to the data which proves it valid.
In a way, this entire guide is about these candidates: how they are scheduled, constructed, backed, included, and
challenged.
This section will describe the base candidate type, its components, and variants that contain extra data.
## Para Id
A unique 32-bit identifier referring to a specific para (chain or thread). The relay-chain runtime guarantees that
`ParaId`s are unique for the duration of any session, but recycling and reuse over a longer period of time is permitted.
```rust
struct ParaId(u32);
```
## Candidate Receipt
Compact representation of the result of a validation. This is what validators
receive from collators, together with the PoV.
```rust
/// A candidate-receipt.
struct CandidateReceipt {
/// The descriptor of the candidate.
descriptor: CandidateDescriptor,
/// The hash of the encoded commitments made as a result of candidate execution.
commitments_hash: Hash,
}
```
## Committed Candidate Receipt
This is a variant of the candidate receipt which includes the commitments of the candidate receipt alongside the
descriptor. This should be favored over the [`Candidate Receipt`](#candidate-receipt) in situations where the candidate
is not going to be executed but the actual data committed to is important. This is often the case in the backing phase.
The hash of the committed candidate receipt will be the same as the corresponding [`Candidate
Receipt`](#candidate-receipt), because it is computed by first hashing the encoding of the commitments to form a plain
[`Candidate Receipt`](#candidate-receipt).
```rust
/// A candidate-receipt with commitments directly included.
struct CommittedCandidateReceipt {
/// The descriptor of the candidate.
descriptor: CandidateDescriptor,
/// The commitments of the candidate receipt.
commitments: CandidateCommitments,
}
```
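The hash equivalence described above can be sketched by always hashing a committed receipt through its plain form. The types below are heavily simplified stand-ins, and `DefaultHasher` replaces blake2-256; both are assumptions of this sketch. The method names `to_plain` and `candidate_hash` are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 256-bit hash: a `u64` from std's `DefaultHasher`, NOT blake2-256.
pub type StandInHash = u64;

fn stand_in_hash<T: Hash>(value: &T) -> StandInHash {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Heavily simplified stand-ins for the guide's types.
#[derive(Clone, Hash)]
pub struct CandidateDescriptor { pub para_id: u32, pub pov_hash: StandInHash }

#[derive(Clone, Hash)]
pub struct CandidateCommitments { pub head_data: Vec<u8>, pub processed_downward_messages: u32 }

#[derive(Hash)]
pub struct CandidateReceipt { pub descriptor: CandidateDescriptor, pub commitments_hash: StandInHash }

pub struct CommittedCandidateReceipt { pub descriptor: CandidateDescriptor, pub commitments: CandidateCommitments }

impl CandidateReceipt {
    pub fn candidate_hash(&self) -> StandInHash { stand_in_hash(self) }
}

impl CommittedCandidateReceipt {
    /// Collapse the commitments into their hash, yielding the plain receipt.
    pub fn to_plain(&self) -> CandidateReceipt {
        CandidateReceipt {
            descriptor: self.descriptor.clone(),
            commitments_hash: stand_in_hash(&self.commitments),
        }
    }

    /// Hash via the plain receipt, so both forms of one candidate hash identically.
    pub fn candidate_hash(&self) -> StandInHash { self.to_plain().candidate_hash() }
}
```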
## Candidate Descriptor
This struct is a pure description of the candidate, in a lightweight format.
```rust
/// A unique descriptor of the candidate receipt.
struct CandidateDescriptor {
/// The ID of the para this is a candidate for.
para_id: ParaId,
/// The hash of the relay-chain block this is executed in the context of.
relay_parent: Hash,
/// The collator's sr25519 public key.
collator: CollatorId,
/// The blake2-256 hash of the persisted validation data. These are extra parameters
/// derived from relay-chain state that influence the validity of the block which
/// must also be kept available for approval checkers.
persisted_validation_data_hash: Hash,
/// The blake2-256 hash of the `pov-block`.
pov_hash: Hash,
/// The root of a block's erasure encoding Merkle tree.
erasure_root: Hash,
/// Signature on blake2-256 of components of this receipt:
/// The teyrchain index, the relay parent, the validation data hash, and the `pov_hash`.
signature: CollatorSignature,
/// Hash of the para header that is being generated by this candidate.
para_head: Hash,
/// The blake2-256 hash of the validation code bytes.
validation_code_hash: ValidationCodeHash,
}
```
## `ValidationParams`
```rust
/// Validation parameters for evaluating the teyrchain validity function.
pub struct ValidationParams {
/// Previous head-data.
pub parent_head: HeadData,
/// The collation body.
pub block_data: BlockData,
/// The current relay-chain block number.
pub relay_parent_number: RelayChainBlockNumber,
/// The relay-chain block's storage root.
pub relay_parent_storage_root: Hash,
}
```
## `PersistedValidationData`
The validation data provides information about how to create the inputs for validation of a candidate. This information
is derived from the chain state and will vary from para to para, although some of the fields may be the same for every
para.
Since this data is used to form inputs to the validation function, it needs to be persisted by the availability system
to avoid dependence on availability of the relay-chain state.
Furthermore, the validation data acts as a way to authorize the additional data the collator needs to pass to the
validation function. For example, the validation function can check whether the incoming messages (e.g. downward
messages) were actually sent by checking them against the so-called MQC (message queue chain) heads provided in the
validation data.
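The sketch below illustrates that check. Each chain link follows the `(prev_head, B, H(M))` shape of the `MessageQueueChainLink` node in the type diagram, where `B` is the relay-chain block number at which message `M` was sent. The validation function folds the messages supplied by the collator onto the last head it processed and compares the result with the MQC head from the validation data. The stand-in hasher (`DefaultHasher` instead of blake2-256) and the function names are assumptions of this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a 256-bit hash (NOT blake2-256).
pub type StandInHash = u64;

fn stand_in_hash<T: Hash>(value: &T) -> StandInHash {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A message together with the relay-chain block number `B` at which it was sent.
pub struct InboundMessage {
    pub sent_at: u32,
    pub data: Vec<u8>,
}

/// Extends the chain by one link: new_head = H(prev_head, B, H(M)).
pub fn extend(prev_head: StandInHash, msg: &InboundMessage) -> StandInHash {
    stand_in_hash(&(prev_head, msg.sent_at, stand_in_hash(&msg.data)))
}

/// True if folding `messages` onto `last_processed_head` reproduces `expected_head`,
/// i.e. the collator passed exactly the messages that were actually sent, in order.
pub fn messages_authorized(
    last_processed_head: StandInHash,
    messages: &[InboundMessage],
    expected_head: StandInHash,
) -> bool {
    messages.iter().fold(last_processed_head, extend) == expected_head
}
```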
Since the commitments of the validation function are checked by the relay-chain, approval checkers can rely on the
invariant that the relay-chain only includes para-blocks for which these checks have already been done. As such, there
is no need for the validation data used to inform validators and collators about the checks the relay-chain will perform
to be persisted by the availability system.
The `PersistedValidationData` should be relatively lightweight primarily because it is constructed during inclusion for
each candidate and therefore lies on the critical path of inclusion.
```rust
struct PersistedValidationData {
/// The parent head-data.
parent_head: HeadData,
/// The relay-chain block number this is in the context of. This informs the collator.
relay_parent_number: BlockNumber,
/// The relay-chain block storage root this is in the context of.
relay_parent_storage_root: Hash,
/// The list of MQC heads for the inbound channels paired with the sender para ids. This
/// vector is sorted ascending by the para id and doesn't contain multiple entries with the same
/// sender.
///
/// The HRMP MQC heads will be used by the validation function to authorize the input messages passed
/// by the collator.
hrmp_mqc_heads: Vec<(ParaId, Hash)>,
/// The maximum legal size of a PoV block, in bytes.
max_pov_size: u32,
}
```
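The ordering invariant on `hrmp_mqc_heads` (ascending by para id, no repeated sender) amounts to the vector being strictly increasing by sender, which can be checked in one pass. A minimal sketch assuming `ParaId` as `u32` and `Hash` as `[u8; 32]`; the function name is hypothetical.

```rust
pub type ParaId = u32;
pub type Hash = [u8; 32];

/// True if `heads` is sorted strictly ascending by sender para id. Strictness
/// also rules out duplicate senders.
pub fn mqc_heads_well_formed(heads: &[(ParaId, Hash)]) -> bool {
    heads.windows(2).all(|pair| pair[0].0 < pair[1].0)
}
```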
## `HeadData`
Head data is a type-safe abstraction around bytes (`Vec<u8>`) for the purposes of representing heads of teyrchains.
```rust
struct HeadData(Vec<u8>);
```
## Candidate Commitments
The execution and validation of teyrchain candidates produces a number of values which either must be committed to
blocks on the relay chain or committed to the state of the relay chain.
```rust
/// Commitments made in a `CandidateReceipt`. Many of these are outputs of validation.
#[derive(PartialEq, Eq, Clone, Encode, Decode)]
#[cfg_attr(feature = "std", derive(Debug, Default))]
struct CandidateCommitments {
/// Messages directed to other paras routed via the relay chain.
horizontal_messages: Vec<OutboundHrmpMessage>,
/// Messages destined to be interpreted by the Relay chain itself.
upward_messages: Vec<UpwardMessage>,
/// New validation code.
new_validation_code: Option<ValidationCode>,
/// The head-data produced as a result of execution.
head_data: HeadData,
/// The number of messages processed from the DMQ.
processed_downward_messages: u32,
/// The mark which specifies the block number up to which all inbound HRMP messages are processed.
hrmp_watermark: BlockNumber,
}
```
## Signing Context
This struct provides context to signatures by combining with various payloads to localize the signature to a particular
session index and relay-chain hash. Having these fields included in the signature makes misbehavior attribution much
simpler.
```rust
struct SigningContext {
/// The relay-chain block hash this signature is in the context of.
parent_hash: Hash,
/// The session index this signature is in the context of.
session_index: SessionIndex,
}
```
@@ -0,0 +1,91 @@
# Disputes
## `DisputeStatementSet`
```rust
/// A set of statements about a specific candidate.
struct DisputeStatementSet {
candidate_hash: CandidateHash,
session: SessionIndex,
statements: Vec<(DisputeStatement, ValidatorIndex, ValidatorSignature)>,
}
```
## `DisputeStatement`
```rust
/// A statement about a candidate, to be used within some dispute resolution process.
///
/// Statements are either in favor of the candidate's validity or against it.
enum DisputeStatement {
/// A valid statement, of the given kind.
Valid(ValidDisputeStatementKind),
/// An invalid statement, of the given kind.
Invalid(InvalidDisputeStatementKind),
}
```
## Dispute Statement Kinds
Kinds of dispute statements. Each of these can be combined with a candidate hash, session index, validator public key,
and validator signature to reproduce and check the original statement.
```rust
enum ValidDisputeStatementKind {
Explicit,
BackingSeconded(Hash),
BackingValid(Hash),
ApprovalChecking,
}
enum InvalidDisputeStatementKind {
Explicit,
}
```
## `ExplicitDisputeStatement`
```rust
struct ExplicitDisputeStatement {
valid: bool,
candidate_hash: CandidateHash,
session: SessionIndex,
}
```
## `MultiDisputeStatementSet`
Sets of statements for many (zero or more) disputes.
```rust
type MultiDisputeStatementSet = Vec<DisputeStatementSet>;
```
## `DisputeState`
```rust
struct DisputeState {
validators_for: Bitfield, // one bit per validator.
validators_against: Bitfield, // one bit per validator.
start: BlockNumber,
concluded_at: Option<BlockNumber>,
}
```
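The two bitfields in `DisputeState` are enough to decide whether a dispute has concluded. This section does not state the conclusion rule. The sketch assumes a dispute concludes once strictly more than two thirds of the session's validators vote the same way, computed as `n - (n - 1) / 3`. It also assumes `Vec<bool>` in place of `Bitfield`. `Outcome` and `outcome` are hypothetical names.

```rust
/// `Bitfield` modelled as `Vec<bool>`, one bit per validator.
pub struct DisputeState {
    pub validators_for: Vec<bool>,
    pub validators_against: Vec<bool>,
    pub start: u32,
    pub concluded_at: Option<u32>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Valid,
    Invalid,
    Open,
}

/// Assumed threshold: the smallest count strictly greater than 2/3 of `n`.
pub fn supermajority_threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

fn count_set(bits: &[bool]) -> usize {
    bits.iter().filter(|&&b| b).count()
}

impl DisputeState {
    pub fn outcome(&self) -> Outcome {
        let n = self.validators_for.len();
        if n == 0 {
            return Outcome::Open;
        }
        let threshold = supermajority_threshold(n);
        if count_set(&self.validators_for) >= threshold {
            Outcome::Valid
        } else if count_set(&self.validators_against) >= threshold {
            Outcome::Invalid
        } else {
            Outcome::Open
        }
    }
}
```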
## `ScrapedOnChainVotes`
```rust
/// Type for conveying dispute-relevant votes and conclusions
/// recorded on-chain to the off-chain `DisputesCoordinator`.
struct ScrapedOnChainVotes<H = Hash> {
/// The session index at which the block was included.
session: SessionIndex,
/// The backing and seconding validity attestations for all candidates, providing the full candidate receipt.
backing_validators_per_candidate: Vec<(CandidateReceipt<H>, Vec<(ValidatorIndex, ValidityAttestation)>)>,
/// Set of concluded disputes that were recorded
/// on chain within the inherent.
disputes: MultiDisputeStatementSet,
}
```
@@ -0,0 +1,74 @@
# Message types
Types of messages that are passed between teyrchains and the relay chain: UMP, DMP, XCMP.
There is also HRMP (Horizontally Relay-routed Message Passing), which provides the same functionality as XCMP
but with lower scalability potential.
## Vertical Message Passing
Types required for message passing between the relay-chain and a teyrchain.
The actual contents of the messages are specified by the XCM standard.
```rust,ignore
/// A message sent from a teyrchain to the relay-chain.
type UpwardMessage = Vec<u8>;
/// A message sent from the relay-chain down to a teyrchain.
///
/// The size of the message is limited by the `config.max_downward_message_size`
/// parameter.
type DownwardMessage = Vec<u8>;
/// This struct extends `DownwardMessage` by adding the relay-chain block number when the message was
/// enqueued in the downward message queue.
struct InboundDownwardMessage {
/// The block number at which this message was put into the downward message queue.
pub sent_at: BlockNumber,
/// The actual downward message to process.
pub msg: DownwardMessage,
}
```
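The size limit and the `sent_at` stamp described above can be combined into a single enqueue step. The following is a minimal sketch of that step. `queue_downward_message`, its error type, and the choice to allow a message of exactly `max_downward_message_size` bytes are all assumptions for illustration. The `DMP` pallet chapter is authoritative for the real behaviour.

```rust
type BlockNumber = u32;
type DownwardMessage = Vec<u8>;

struct InboundDownwardMessage {
    pub sent_at: BlockNumber,
    pub msg: DownwardMessage,
}

/// Hypothetical error for a message exceeding the configured limit.
#[derive(Debug, PartialEq)]
enum QueueDownwardMessageError {
    ExceedsMaxMessageSize,
}

/// Appends `msg` to a para's downward queue, stamped with the current
/// relay-chain block number. `max_downward_message_size` mirrors the
/// `config.max_downward_message_size` parameter.
fn queue_downward_message(
    queue: &mut Vec<InboundDownwardMessage>,
    max_downward_message_size: u32,
    now: BlockNumber,
    msg: DownwardMessage,
) -> Result<(), QueueDownwardMessageError> {
    if msg.len() > max_downward_message_size as usize {
        return Err(QueueDownwardMessageError::ExceedsMaxMessageSize);
    }
    queue.push(InboundDownwardMessage { sent_at: now, msg });
    Ok(())
}
```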
## Horizontal Message Passing
## HrmpChannelId
A type that uniquely identifies an HRMP channel. An HRMP channel is established between two paras.
In text, we use the notation `(A, B)` to specify a channel between A and B. The channels are
unidirectional, meaning that `(A, B)` and `(B, A)` refer to different channels. The convention is
that we use the first item of the tuple for the sender and the second for the recipient. Only one channel
is allowed between two participants in one direction, i.e. there cannot be 2 different channels
identified by `(A, B)`.
```rust,ignore
struct HrmpChannelId {
sender: ParaId,
recipient: ParaId,
}
```
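Since channels are unidirectional and at most one may exist per direction, deriving `Hash`/`Eq` on `HrmpChannelId` is enough to enforce uniqueness with a set. The `ChannelRegistry` below is a hypothetical illustration. It is not the `HRMP` pallet's storage layout, and `ParaId` is simplified to a `u32`.

```rust
use std::collections::HashSet;

/// Simplified para identifier (hypothetical stand-in for `ParaId`).
type ParaId = u32;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct HrmpChannelId {
    sender: ParaId,
    recipient: ParaId,
}

/// Tracks open channels, allowing at most one channel per direction.
#[derive(Default)]
struct ChannelRegistry {
    channels: HashSet<HrmpChannelId>,
}

impl ChannelRegistry {
    /// Returns `false` if the channel `(sender, recipient)` already exists.
    /// `(B, A)` is a different key from `(A, B)`, so both directions may coexist.
    fn open(&mut self, sender: ParaId, recipient: ParaId) -> bool {
        self.channels.insert(HrmpChannelId { sender, recipient })
    }
}
```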
## Horizontal Message
This is a message sent from a teyrchain to another teyrchain that travels through the relay chain.
This message ends up in the recipient's mailbox. The size of a horizontal message is defined by its
`data` payload.
```rust,ignore
struct OutboundHrmpMessage {
/// The para that will get this message in its downward message queue.
pub recipient: ParaId,
/// The message payload.
pub data: Vec<u8>,
}
struct InboundHrmpMessage {
/// The block number at which this message was sent.
/// Specifically, it is the block number at which the candidate that sends this message was
/// enacted.
pub sent_at: BlockNumber,
/// The message payload.
pub data: Vec<u8>,
}
```
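An `OutboundHrmpMessage` becomes an `InboundHrmpMessage` by dropping the recipient, which now identifies the mailbox, and adding the block number at which the sending candidate was enacted. The sketch below shows that conversion. The `route` helper and the per-recipient inbox map are hypothetical, and `ParaId`/`BlockNumber` are simplified to `u32`.

```rust
use std::collections::HashMap;

type ParaId = u32;
type BlockNumber = u32;

struct OutboundHrmpMessage {
    pub recipient: ParaId,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct InboundHrmpMessage {
    pub sent_at: BlockNumber,
    pub data: Vec<u8>,
}

/// Routes a sender's outbound messages into per-recipient inboxes, stamping
/// each with the block at which the sending candidate was enacted.
/// Messages to the same recipient keep their original relative order.
fn route(
    enacted_at: BlockNumber,
    outbound: Vec<OutboundHrmpMessage>,
    inboxes: &mut HashMap<ParaId, Vec<InboundHrmpMessage>>,
) {
    for OutboundHrmpMessage { recipient, data } in outbound {
        inboxes
            .entry(recipient)
            .or_default()
            .push(InboundHrmpMessage { sent_at: enacted_at, data });
    }
}
```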
@@ -0,0 +1,195 @@
# Network Types
These types are those that are actually sent over the network to subsystems.
## Universal Types
```rust
type RequestId = u64;
type ProtocolVersion = u32;
struct PeerId(...); // opaque, unique identifier of a peer.
struct View {
// Up to `N` (5?) chain heads.
heads: Vec<Hash>,
// The number of the finalized block.
finalized_number: BlockNumber,
}
enum ObservedRole {
Full,
Light,
}
```
## V1 Network Subsystem Message Types
### Approval Distribution V1
```rust
enum ApprovalDistributionV1Message {
/// Assignments for candidates in recent, unfinalized blocks.
///
/// The u32 is the claimed index of the candidate this assignment corresponds to. Actually checking the assignment
/// may yield a different result.
Assignments(Vec<(IndirectAssignmentCert, u32)>),
/// Approvals for candidates in some recent, unfinalized block.
Approvals(Vec<IndirectSignedApprovalVote>),
}
```
### Availability Distribution V1
```rust
enum AvailabilityDistributionV1Message {
/// An erasure chunk for a given candidate hash.
Chunk(CandidateHash, ErasureChunk),
}
```
### Availability Recovery V1
```rust
enum AvailabilityRecoveryV1Message {
/// Request a chunk for a given candidate hash and validator index.
RequestChunk(RequestId, CandidateHash, ValidatorIndex),
/// Respond with chunk for a given candidate hash and validator index.
/// The response may be `None` if the requestee does not have the chunk.
Chunk(RequestId, Option<ErasureChunk>),
/// Request the full data for a given candidate hash.
RequestFullData(RequestId, CandidateHash),
/// Respond with the full data for the requested candidate hash.
/// The response may be `None` if the requestee does not have the data.
FullData(RequestId, Option<AvailableData>),
}
```
### Bitfield Distribution V1
```rust
enum BitfieldDistributionV1Message {
/// A signed availability bitfield for a given relay-parent hash.
Bitfield(Hash, SignedAvailabilityBitfield),
}
```
### PoV Distribution V1
```rust
enum PoVDistributionV1Message {
/// Notification that we are awaiting the given PoVs (by hash) against a
/// specific relay-parent hash.
Awaiting(Hash, Vec<Hash>),
/// Notification of an awaited PoV, in a given relay-parent context.
/// (`relay_parent`, `pov_hash`, `pov`)
SendPoV(Hash, Hash, PoV),
}
```
### Statement Distribution V1
```rust
enum StatementDistributionV1Message {
/// A signed full statement under a given relay-parent.
Statement(Hash, SignedFullStatement)
}
```
### Collator Protocol V1
```rust
enum CollatorProtocolV1Message {
/// Declare the intent to advertise collations under a collator ID and `Para`, attaching a
/// signature of the `PeerId` of the node using the given collator ID key.
Declare(CollatorId, ParaId, CollatorSignature),
/// Advertise a collation to a validator. Can only be sent once the peer has
/// declared that they are a collator with given ID.
AdvertiseCollation(Hash),
/// A collation sent to a validator was seconded.
CollationSeconded(SignedFullStatement),
}
```
## V1 Wire Protocols
### Validation V1
These are the messages for the protocol on the validation peer-set.
```rust
enum ValidationProtocolV1 {
ApprovalDistribution(ApprovalDistributionV1Message),
AvailabilityDistribution(AvailabilityDistributionV1Message),
AvailabilityRecovery(AvailabilityRecoveryV1Message),
BitfieldDistribution(BitfieldDistributionV1Message),
PoVDistribution(PoVDistributionV1Message),
StatementDistribution(StatementDistributionV1Message),
}
```
### Collation V1
These are the messages for the protocol on the collation peer-set.
```rust
enum CollationProtocolV1 {
CollatorProtocol(CollatorProtocolV1Message),
}
```
## Network Bridge Event
These updates are posted from the [Network Bridge Subsystem](../node/utility/network-bridge.md) to other subsystems
based on registered listeners.
```rust
struct NewGossipTopology {
/// The session index this topology corresponds to.
session: SessionIndex,
/// The topology itself.
topology: SessionGridTopology,
/// The local validator index, if any.
local_index: Option<ValidatorIndex>,
}
struct SessionGridTopology {
/// An array mapping validator indices to their indices in the
/// shuffling itself. This has the same size as the number of validators
/// in the session.
shuffled_indices: Vec<usize>,
/// The canonical shuffling of validators for the session.
canonical_shuffling: Vec<TopologyPeerInfo>,
}
struct TopologyPeerInfo {
/// The validator's known peer IDs.
peer_ids: Vec<PeerId>,
/// The index of the validator in the discovery keys of the corresponding
/// `SessionInfo`. This can extend _beyond_ the set of active teyrchain validators.
validator_index: ValidatorIndex,
/// The authority discovery public key of the validator in the corresponding
/// `SessionInfo`.
discovery_id: AuthorityDiscoveryId,
}
enum NetworkBridgeEvent<M> {
/// A peer with given ID is now connected.
PeerConnected(PeerId, ObservedRole, ProtocolVersion, Option<HashSet<AuthorityDiscoveryId>>),
/// A peer with given ID is now disconnected.
PeerDisconnected(PeerId),
/// Our neighbors in the new gossip topology.
/// We're not necessarily connected to all of them.
///
/// This message is issued only on the validation peer set.
///
/// Note that the distribution subsystems need to handle the last
/// view update of the newly added gossip peers manually.
NewGossipTopology(NewGossipTopology),
/// We received a message from the given peer.
PeerMessage(PeerId, M),
/// The given peer has updated its description of its view.
PeerViewChange(PeerId, View), // guaranteed to come after peer connected event.
/// We have posted the given view update to all connected peers.
OurViewChange(View),
}
```
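`shuffled_indices` is the inverse of `canonical_shuffling`: entry `v` gives the position of validator `v` in the shuffling. The sketch below derives it and uses it for lookups. It assumes the shuffling is a permutation of validator indices `0..n`, and it reduces `TopologyPeerInfo` to its `validator_index`. `from_shuffling` and `peer_info` are hypothetical helpers.

```rust
type ValidatorIndex = u32;

/// Reduced to the field needed here; `peer_ids` and `discovery_id` are omitted.
struct TopologyPeerInfo {
    validator_index: ValidatorIndex,
}

struct SessionGridTopology {
    shuffled_indices: Vec<usize>,
    canonical_shuffling: Vec<TopologyPeerInfo>,
}

impl SessionGridTopology {
    /// Derives `shuffled_indices` as the inverse of `canonical_shuffling`.
    fn from_shuffling(canonical_shuffling: Vec<TopologyPeerInfo>) -> Self {
        let mut shuffled_indices = vec![0; canonical_shuffling.len()];
        for (position, info) in canonical_shuffling.iter().enumerate() {
            // Out-of-range indices are skipped rather than panicking.
            if let Some(slot) = shuffled_indices.get_mut(info.validator_index as usize) {
                *slot = position;
            }
        }
        SessionGridTopology { shuffled_indices, canonical_shuffling }
    }

    /// Looks up a validator's peer info via its position in the shuffling.
    fn peer_info(&self, validator: ValidatorIndex) -> Option<&TopologyPeerInfo> {
        let position = *self.shuffled_indices.get(validator as usize)?;
        self.canonical_shuffling.get(position)
    }
}
```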
@@ -0,0 +1,936 @@
# Overseer Protocol
This chapter contains message types sent to and from the overseer, and the underlying subsystem message types that are
transmitted using these.
## Overseer Signal
Signals from the overseer to a subsystem requesting a change in execution, which the subsystem must obey.
```rust
enum OverseerSignal {
/// Signal about a change in active leaves.
ActiveLeavesUpdate(ActiveLeavesUpdate),
/// Signal about a new best finalized block.
BlockFinalized(Hash),
/// Conclude all operation.
Conclude,
}
```
All subsystems have their own message types; all of them need to be able to listen for overseer signals as well. There
are currently two proposals for how to handle that with unified communication channels:
1. Retaining the `OverseerSignal` definition above, add `enum FromOrchestra<T> {Signal(OverseerSignal), Message(T)}`.
1. Add a generic variant to `OverseerSignal`: `Message(T)`.
Either way, there will be some top-level type encapsulating messages from the overseer to each subsystem.
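The first proposal can be sketched as a generic wrapper that a subsystem matches on in its main loop. This is a synchronous illustration only. Real subsystems receive from async channels, and the `ActiveLeavesUpdate` variant is omitted here. `run_subsystem` and the `[u8; 32]` stand-in for `Hash` are assumptions.

```rust
/// Hypothetical stand-in for the block hash type.
type Hash = [u8; 32];

/// Simplified signal set; the `ActiveLeavesUpdate` variant is omitted.
enum OverseerSignal {
    BlockFinalized(Hash),
    Conclude,
}

/// Proposal 1: wrap signals and subsystem-specific messages in one type.
enum FromOrchestra<T> {
    Signal(OverseerSignal),
    Message(T),
}

/// Processes inputs in order until `Conclude`, returning the messages handled.
fn run_subsystem<T>(inputs: impl IntoIterator<Item = FromOrchestra<T>>) -> Vec<T> {
    let mut handled = Vec::new();
    for input in inputs {
        match input {
            // Signals must be obeyed: `Conclude` stops all further processing.
            FromOrchestra::Signal(OverseerSignal::Conclude) => break,
            FromOrchestra::Signal(OverseerSignal::BlockFinalized(_)) => {
                // A real subsystem would prune state below the finalized block here.
            }
            FromOrchestra::Message(msg) => handled.push(msg),
        }
    }
    handled
}
```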
## Active Leaves Update
Indicates a change in active leaves. Activated leaves should have jobs, whereas deactivated leaves should lead to
winding-down of work based on those leaves.
```rust
struct ActiveLeavesUpdate {
activated: Vec<(Hash, BlockNumber)>,
deactivated: Vec<Hash>,
}
```
## All Messages
A message type tying together all message types that are used across Subsystems.
```rust
enum AllMessages {
CandidateValidation(CandidateValidationMessage),
CandidateBacking(CandidateBackingMessage),
ChainApi(ChainApiMessage),
CollatorProtocol(CollatorProtocolMessage),
StatementDistribution(StatementDistributionMessage),
AvailabilityDistribution(AvailabilityDistributionMessage),
AvailabilityRecovery(AvailabilityRecoveryMessage),
BitfieldDistribution(BitfieldDistributionMessage),
BitfieldSigning(BitfieldSigningMessage),
Provisioner(ProvisionerMessage),
RuntimeApi(RuntimeApiMessage),
AvailabilityStore(AvailabilityStoreMessage),
NetworkBridge(NetworkBridgeMessage),
CollationGeneration(CollationGenerationMessage),
ApprovalVoting(ApprovalVotingMessage),
ApprovalDistribution(ApprovalDistributionMessage),
GossipSupport(GossipSupportMessage),
DisputeCoordinator(DisputeCoordinatorMessage),
ChainSelection(ChainSelectionMessage),
PvfChecker(PvfCheckerMessage),
}
```
## Approval Voting Message
Messages received by the approval voting subsystem.
```rust
enum AssignmentCheckResult {
// The vote was accepted and should be propagated onwards.
Accepted,
// The vote was valid but duplicate and should not be propagated onwards.
AcceptedDuplicate,
// The vote was valid but too far in the future to accept right now.
TooFarInFuture,
// The vote was bad and should be ignored, reporting the peer who propagated it.
Bad(AssignmentCheckError),
}
pub enum AssignmentCheckError {
UnknownBlock(Hash),
UnknownSessionIndex(SessionIndex),
InvalidCandidateIndex(CandidateIndex),
InvalidCandidate(CandidateIndex, CandidateHash),
InvalidCert(ValidatorIndex),
Internal(Hash, CandidateHash),
}
enum ApprovalCheckResult {
// The vote was accepted and should be propagated onwards.
Accepted,
// The vote was bad and should be ignored, reporting the peer who propagated it.
Bad(ApprovalCheckError),
}
pub enum ApprovalCheckError {
UnknownBlock(Hash),
UnknownSessionIndex(SessionIndex),
InvalidCandidateIndex(CandidateIndex),
InvalidValidatorIndex(ValidatorIndex),
InvalidCandidate(CandidateIndex, CandidateHash),
InvalidSignature(ValidatorIndex),
NoAssignment(ValidatorIndex),
Internal(Hash, CandidateHash),
}
enum ApprovalVotingMessage {
/// Import an assignment into the approval-voting database.
///
/// Should not be sent unless the block hash is known and the VRF assignment checks out.
ImportAssignment(CheckedIndirectAssignment, Option<oneshot::Sender<AssignmentCheckResult>>),
/// Import an approval vote into the approval-voting database.
///
/// Should not be sent unless the block hash within the indirect vote is known, vote is
/// correctly signed and we had a previous assignment for the candidate.
ImportApproval(CheckedIndirectSignedApprovalVote, Option<oneshot::Sender<ApprovalCheckResult>>),
/// Returns the highest possible ancestor hash of the provided block hash which is
/// acceptable to vote on finality for. Along with that, return the lists of candidate hashes
/// which appear in every block from the (non-inclusive) base number up to (inclusive) the specified
/// approved ancestor.
/// This list starts from the highest block (the approved ancestor itself) and moves backwards
/// towards the base number.
///
/// The base number is typically the number of the last finalized block, but in GRANDPA it is
/// possible for the base to be slightly higher than the last finalized block.
///
/// The `BlockNumber` provided is the number of the block's ancestor which is the
/// earliest possible vote.
///
/// It can also return the same block hash, if that is acceptable to vote upon.
/// Return `None` if the input hash is unrecognized.
ApprovedAncestor {
target_hash: Hash,
base_number: BlockNumber,
rx: ResponseChannel<Option<(Hash, BlockNumber, Vec<(Hash, Vec<CandidateHash>)>)>>
},
}
```
## Approval Distribution Message
Messages received by the approval distribution subsystem.
```rust
/// Metadata about a block which is now live in the approval protocol.
struct BlockApprovalMeta {
/// The hash of the block.
hash: Hash,
/// The number of the block.
number: BlockNumber,
/// The hash of the block's parent.
parent_hash: Hash,
/// The candidates included by the block. Note that these are not the same as the candidates that appear within the
/// block body.
candidates: Vec<CandidateHash>,
/// The consensus slot of the block.
slot: Slot,
/// The session of the block.
session: SessionIndex,
}
enum ApprovalDistributionMessage {
/// Notify the `ApprovalDistribution` subsystem about new blocks and the candidates contained within
/// them.
NewBlocks(Vec<BlockApprovalMeta>),
/// Distribute an assignment cert from the local validator. The cert is assumed
/// to be valid, relevant, and for the given relay-parent and validator index.
///
/// The `u32` param is the candidate index in the fully-included list.
DistributeAssignment(IndirectAssignmentCert, u32),
/// Distribute an approval vote for the local validator. The approval vote is assumed to be
/// valid, relevant, and the corresponding approval already issued. If not, the subsystem is free to drop
/// the message.
DistributeApproval(IndirectSignedApprovalVote),
/// An update from the network bridge.
NetworkBridgeUpdate(NetworkBridgeEvent<ApprovalDistributionV1Message>),
}
```
## Availability Distribution Message
Messages received by the availability distribution subsystem.
This is a network protocol that receives messages of type
[`AvailabilityDistributionV1Message`][AvailabilityDistributionV1NetworkMessage].
```rust
enum AvailabilityDistributionMessage {
/// Incoming network request for an availability chunk.
ChunkFetchingRequest(IncomingRequest<req_res_v1::ChunkFetchingRequest>),
/// Incoming network request for a seconded PoV.
PoVFetchingRequest(IncomingRequest<req_res_v1::PoVFetchingRequest>),
/// Instruct availability distribution to fetch a remote PoV.
///
/// NOTE: The result of this fetch is not yet locally validated and could be bogus.
FetchPoV {
/// The relay parent giving the necessary context.
relay_parent: Hash,
/// Validator to fetch the PoV from.
from_validator: ValidatorIndex,
/// Candidate hash to fetch the PoV for.
candidate_hash: CandidateHash,
/// Expected hash of the PoV, a PoV not matching this hash will be rejected.
pov_hash: Hash,
/// Sender for getting back the result of this fetch.
///
/// The sender will be canceled if the fetching failed for some reason.
tx: oneshot::Sender<PoV>,
},
}
```
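`FetchPoV` rejects any fetched PoV whose hash does not match `pov_hash`. Passing that check does not mean the PoV is valid, only that it is the expected one. The sketch below illustrates the check with Rust's `DefaultHasher` as a stand-in. That hasher is not the hash function used by the real protocol. `PoV` is reduced to its block data, and `accept_fetched_pov` is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Debug, PartialEq)]
struct PoV {
    block_data: Vec<u8>,
}

/// Stand-in hash (NOT the protocol's PoV hash); deterministic within a process.
fn pov_hash(pov: &PoV) -> u64 {
    let mut hasher = DefaultHasher::new();
    pov.block_data.hash(&mut hasher);
    hasher.finish()
}

/// Keeps the fetched PoV only if it matches the expected hash.
/// The result is still unvalidated: it may hash correctly yet be an invalid block.
fn accept_fetched_pov(expected: u64, fetched: PoV) -> Option<PoV> {
    if pov_hash(&fetched) == expected {
        Some(fetched)
    } else {
        None
    }
}
```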
## Availability Recovery Message
Messages received by the availability recovery subsystem.
```rust
enum RecoveryError {
Invalid,
Unavailable,
}
enum AvailabilityRecoveryMessage {
/// Recover available data from validators on the network.
RecoverAvailableData(
CandidateReceipt,
SessionIndex,
Option<GroupIndex>, // Backing validator group to request the data directly from.
Option<CoreIndex>, /* A `CoreIndex` needs to be specified for the recovery process to
* prefer systematic chunk recovery. This is the core that the candidate
* was occupying while pending availability. */
ResponseChannel<Result<AvailableData, RecoveryError>>,
),
}
```
## Availability Store Message
Messages to and from the availability store.
```rust
pub enum AvailabilityStoreMessage {
/// Query an `AvailableData` from the AV store.
QueryAvailableData(CandidateHash, oneshot::Sender<Option<AvailableData>>),
/// Query whether an `AvailableData` exists within the AV Store.
///
/// This is useful in cases when existence
/// matters, but we don't want to necessarily pass around multiple
/// megabytes of data to get a single bit of information.
QueryDataAvailability(CandidateHash, oneshot::Sender<bool>),
/// Query an `ErasureChunk` from the AV store by the candidate hash and validator index.
QueryChunk(CandidateHash, ValidatorIndex, oneshot::Sender<Option<ErasureChunk>>),
/// Get the size of an `ErasureChunk` from the AV store by the candidate hash.
QueryChunkSize(CandidateHash, oneshot::Sender<Option<usize>>),
/// Query all chunks that we have for the given candidate hash.
QueryAllChunks(CandidateHash, oneshot::Sender<Vec<ErasureChunk>>),
/// Query whether an `ErasureChunk` exists within the AV Store.
///
/// This is useful in cases like bitfield signing, when existence
/// matters, but we don't want to necessarily pass around large
/// quantities of data to get a single bit of information.
QueryChunkAvailability(CandidateHash, ValidatorIndex, oneshot::Sender<bool>),
/// Store an `ErasureChunk` in the AV store.
///
/// Return `Ok(())` if the store operation succeeded, `Err(())` if it failed.
StoreChunk {
/// A hash of the candidate this chunk belongs to.
candidate_hash: CandidateHash,
/// The chunk itself.
chunk: ErasureChunk,
/// Sending side of the channel to send result to.
tx: oneshot::Sender<Result<(), ()>>,
},
/// Computes and checks the erasure root of `AvailableData` before storing all of its chunks in
/// the AV store.
///
/// Return `Ok(())` if the store operation succeeded, `Err(StoreAvailableDataError)` if it failed.
StoreAvailableData {
/// A hash of the candidate this `available_data` belongs to.
candidate_hash: CandidateHash,
/// The number of validators in the session.
n_validators: u32,
/// The `AvailableData` itself.
available_data: AvailableData,
/// Erasure root we expect to get after chunking.
expected_erasure_root: Hash,
/// Sending side of the channel to send result to.
tx: oneshot::Sender<Result<(), StoreAvailableDataError>>,
},
}
/// The error result type of an [`AvailabilityStoreMessage::StoreAvailableData`] request.
pub enum StoreAvailableDataError {
InvalidErasureRoot,
}
```
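Several of the chunk queries above are simple lookups keyed by `(CandidateHash, ValidatorIndex)`. `QueryChunkAvailability`, for example, needs only a key-existence check and no data. The in-memory store below is a hypothetical illustration of these semantics, not the real AV store, which persists to a database and prunes old data. `ErasureChunk` is reduced to an index and payload.

```rust
use std::collections::HashMap;

type CandidateHash = [u8; 32];
type ValidatorIndex = u32;

/// Simplified chunk: the real type also carries a Merkle proof.
#[derive(Clone, Debug, PartialEq)]
struct ErasureChunk {
    index: ValidatorIndex,
    chunk: Vec<u8>,
}

#[derive(Default)]
struct InMemoryAvStore {
    chunks: HashMap<(CandidateHash, ValidatorIndex), ErasureChunk>,
}

impl InMemoryAvStore {
    /// Mirrors `StoreChunk`: this in-memory version cannot fail.
    fn store_chunk(&mut self, candidate_hash: CandidateHash, chunk: ErasureChunk) -> Result<(), ()> {
        self.chunks.insert((candidate_hash, chunk.index), chunk);
        Ok(())
    }

    /// Mirrors `QueryChunk`.
    fn query_chunk(&self, candidate_hash: CandidateHash, index: ValidatorIndex) -> Option<ErasureChunk> {
        self.chunks.get(&(candidate_hash, index)).cloned()
    }

    /// Mirrors `QueryChunkAvailability`: a single bit, no chunk data copied.
    fn query_chunk_availability(&self, candidate_hash: CandidateHash, index: ValidatorIndex) -> bool {
        self.chunks.contains_key(&(candidate_hash, index))
    }

    /// Mirrors `QueryAllChunks`, sorted by validator index for determinism.
    fn query_all_chunks(&self, candidate_hash: CandidateHash) -> Vec<ErasureChunk> {
        let mut all: Vec<ErasureChunk> = self
            .chunks
            .iter()
            .filter(|(key, _)| key.0 == candidate_hash)
            .map(|(_, chunk)| chunk.clone())
            .collect();
        all.sort_by_key(|chunk| chunk.index);
        all
    }
}
```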
## Bitfield Distribution Message
Messages received by the bitfield distribution subsystem. This is a network protocol that receives messages of type
[`BitfieldDistributionV1Message`][BitfieldDistributionV1NetworkMessage].
```rust
enum BitfieldDistributionMessage {
/// Distribute a bitfield signed by a validator to other validators.
/// The bitfield distribution subsystem will assume this is indeed correctly signed.
DistributeBitfield(Hash, SignedAvailabilityBitfield), // (relay_parent, bitfield)
/// Receive a network bridge update.
NetworkBridgeUpdate(NetworkBridgeEvent<BitfieldDistributionV1Message>),
}
```
## Bitfield Signing Message
Currently, the bitfield signing subsystem receives no specific messages.
```rust
/// Non-instantiable message type
enum BitfieldSigningMessage { }
```
## Candidate Backing Message
```rust
enum CandidateBackingMessage {
/// Requests a set of backable candidates attested by the subsystem.
/// The order of candidates of the same para must be preserved in the response.
/// If a backed candidate of a para cannot be retrieved, the response should not contain any
/// candidates of the same para that follow it in the input vector. In other words, assuming
/// candidates are supplied in dependency order, we must ensure that this dependency order is
/// preserved.
GetBackedCandidates(
HashMap<ParaId, Vec<(CandidateHash, Hash)>>,
oneshot::Sender<HashMap<ParaId, Vec<BackedCandidate>>>,
),
/// Note that the Candidate Backing subsystem should second the given candidate in the context of the
/// given relay-parent (ref. by hash). This candidate must be validated using the provided PoV.
/// The PoV is expected to match the `pov_hash` in the descriptor.
Second(Hash, CandidateReceipt, PoV),
/// Note a peer validator's statement about a particular candidate. Disagreements about validity must be escalated
/// to a broader check by the Disputes Subsystem, though that escalation is deferred until the approval voting
/// stage to guarantee availability. Agreements are simply tallied until a quorum is reached.
Statement(Statement),
}
```
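The ordering rule for `GetBackedCandidates` amounts to taking, per para, the longest prefix of the requested candidates that are all backed. The sketch below shows this rule. It simplifies the request's `(CandidateHash, Hash)` pairs to bare candidate hashes and returns hashes instead of `BackedCandidate`s. `select_backed` and the `u64` hash type are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

type ParaId = u32;
type CandidateHash = u64;

/// For each para, keeps requested candidates in order and stops at the first
/// one that is not backed, so a later candidate never appears without its
/// predecessors.
fn select_backed(
    requested: &HashMap<ParaId, Vec<CandidateHash>>,
    backed: &HashSet<CandidateHash>,
) -> HashMap<ParaId, Vec<CandidateHash>> {
    requested
        .iter()
        .map(|(para, candidates)| {
            let prefix: Vec<CandidateHash> = candidates
                .iter()
                .copied()
                .take_while(|c| backed.contains(c))
                .collect();
            (*para, prefix)
        })
        .collect()
}
```

Note that candidate `12` in the usage below is backed but still dropped, because its predecessor `11` is not.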
## Chain API Message
The Chain API subsystem is responsible for providing an interface to chain data.
```rust
enum ChainApiMessage {
/// Get the block number by hash.
/// Returns `None` if a block with the given hash is not present in the db.
BlockNumber(Hash, ResponseChannel<Result<Option<BlockNumber>, Error>>),
/// Request the block header by hash.
/// Returns `None` if a block with the given hash is not present in the db.
BlockHeader(Hash, ResponseChannel<Result<Option<BlockHeader>, Error>>),
/// Get the cumulative weight of the given block, by hash.
/// If the block or weight is unknown, this returns `None`.
///
/// Weight is used for comparing blocks in a fork-choice rule.
BlockWeight(Hash, ResponseChannel<Result<Option<Weight>, Error>>),
/// Get the finalized block hash by number.
/// Returns `None` if a block with the given number is not present in the db.
/// Note: the caller must ensure the block is finalized.
FinalizedBlockHash(BlockNumber, ResponseChannel<Result<Option<Hash>, Error>>),
/// Get the last finalized block number.
/// This request always succeeds.
FinalizedBlockNumber(ResponseChannel<Result<BlockNumber, Error>>),
/// Request the hashes of the `k` ancestors of the block with the given hash.
/// The response channel may return a `Vec` of up to `k` ancestor hashes,
/// in the following order: `parent`, `grandparent`, ...
Ancestors {
/// The hash of the block in question.
hash: Hash,
/// The number of ancestors to request.
k: usize,
/// The response channel.
response_channel: ResponseChannel<Result<Vec<Hash>, Error>>,
}
}
```
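The `Ancestors` response can contain fewer than `k` entries when the walk runs out of known blocks. Below is a minimal sketch over a hypothetical in-memory parent map. The real subsystem reads headers from the client database instead.

```rust
use std::collections::HashMap;

type Hash = u64;

/// Returns up to `k` ancestors of `hash`, ordered parent, grandparent, ...
/// Stops early when a block's parent is not in `parents`.
fn ancestors(parents: &HashMap<Hash, Hash>, hash: Hash, k: usize) -> Vec<Hash> {
    let mut out = Vec::new();
    let mut current = hash;
    while out.len() < k {
        match parents.get(&current) {
            Some(&parent) => {
                out.push(parent);
                current = parent;
            }
            None => break,
        }
    }
    out
}
```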
## Chain Selection Message
Messages received by the [Chain Selection subsystem](../node/utility/chain-selection.md).
```rust
enum ChainSelectionMessage {
/// Signal to the chain selection subsystem that a specific block has been approved.
Approved(Hash),
/// Request the leaves in descending order by score.
Leaves(ResponseChannel<Vec<Hash>>),
/// Request the best leaf containing the given block in its ancestry. Return `None` if
/// there is no such leaf.
BestLeafContaining(Hash, ResponseChannel<Option<Hash>>),
}
```
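`BestLeafContaining` can be answered by scanning leaves in the descending-score order returned by `Leaves` and returning the first one whose ancestry includes the required block. The sketch below follows that approach. It assumes a block counts as being in its own ancestry and that the parent map is acyclic. `best_leaf_containing` is hypothetical.

```rust
use std::collections::HashMap;

type Hash = u64;

/// `leaves` must already be sorted by descending score. Returns the first
/// (best) leaf whose ancestry, including the leaf itself, contains `required`.
fn best_leaf_containing(
    leaves: &[Hash],
    parents: &HashMap<Hash, Hash>,
    required: Hash,
) -> Option<Hash> {
    leaves.iter().copied().find(|&leaf| {
        let mut current = leaf;
        loop {
            if current == required {
                return true;
            }
            match parents.get(&current) {
                Some(&parent) => current = parent,
                None => return false, // reached the oldest known block
            }
        }
    })
}
```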
## Collator Protocol Message
Messages received by the [Collator Protocol subsystem](../node/collators/collator-protocol.md).
This is a network protocol that receives messages of type
[`CollatorProtocolV1Message`][CollatorProtocolV1NetworkMessage].
```rust
enum CollatorProtocolMessage {
/// Signal to the collator protocol that it should connect to validators with the expectation
/// of collating on the given para. This is only expected to be called once, early on, if at all,
/// and only by the Collation Generation subsystem. As such, it will overwrite the value of
/// the previous signal.
///
/// This should be sent before any `DistributeCollation` message.
CollateOn(ParaId),
/// Provide a collation to distribute to validators with an optional result sender.
///
/// The result sender should be informed when at least one teyrchain validator seconded the collation. It is also
/// completely okay to just drop the sender.
DistributeCollation(CandidateReceipt, PoV, Option<oneshot::Sender<CollationSecondedSignal>>),
/// Fetch a collation under the given relay-parent for the given ParaId.
FetchCollation(Hash, ParaId, ResponseChannel<(CandidateReceipt, PoV)>),
/// Note a collator as having provided a good collation.
NoteGoodCollation(CollatorId, SignedFullStatement),
/// Notify a collator that its collation was seconded.
NotifyCollationSeconded(CollatorId, Hash, SignedFullStatement),
}
```
## Collation Generation Message
Messages received by the [Collation Generation subsystem](../node/collators/collation-generation.md).
This is the core interface by which collators built on top of a Pezkuwi node submit collations to validators. As such,
these messages are not sent by any subsystem but are instead sent from outside of the overseer.
```rust
/// A function provided to the subsystem which it uses to pull new collations.
///
/// This mode of querying collations is obsoleted by `CollationGenerationMessage::SubmitCollation`.
///
/// The response channel, if present, is meant to receive a `Seconded` statement as a
/// form of authentication, for collation mechanisms which rely on this for anti-spam.
type CollatorFn = Fn(Hash, PersistedValidationData) -> Future<Output = (Collation, Option<ResponseChannel<SignedStatement>>)>;
/// Configuration for the collation generator
struct CollationGenerationConfig {
/// Collator's authentication key, so it can sign things.
key: CollatorPair,
/// Collation function. See [`CollatorFn`] for more details.
collator: CollatorFn,
/// The teyrchain that this collator collates for.
para_id: ParaId,
}
/// Parameters for submitting a collation
struct SubmitCollationParams {
/// The relay-parent the collation is built against.
relay_parent: Hash,
/// The collation itself (PoV and commitments)
collation: Collation,
/// The parent block's head-data.
parent_head: HeadData,
/// The hash of the validation code the collation was created against.
validation_code_hash: ValidationCodeHash,
/// A response channel for receiving a `Seconded` message about the candidate
/// once produced by a validator. This is not guaranteed to provide anything.
result_sender: Option<ResponseChannel<SignedStatement>>,
}
enum CollationGenerationMessage {
/// Initialize the collation generation subsystem.
Initialize(CollationGenerationConfig),
/// Submit a collation to the subsystem. This will package it into a signed
/// [`CommittedCandidateReceipt`] and distribute it across the network to validators.
///
/// If sent before `Initialize`, this will be ignored.
SubmitCollation(SubmitCollationParams),
}
```
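The rule that `SubmitCollation` is ignored before `Initialize` makes the subsystem a two-state machine. The sketch below illustrates it. The message payloads are cut down to a `para_id` and a `u64` stand-in for the relay-parent hash. The `submitted` list stands in for "package and distribute". All of these simplifications are assumptions.

```rust
type ParaId = u32;

struct CollationGenerationConfig {
    para_id: ParaId,
}

struct SubmitCollationParams {
    relay_parent: u64, // stand-in for `Hash`
}

enum CollationGenerationMessage {
    Initialize(CollationGenerationConfig),
    SubmitCollation(SubmitCollationParams),
}

#[derive(Default)]
struct CollationGeneration {
    config: Option<CollationGenerationConfig>,
    /// `(para_id, relay_parent)` pairs that would be packaged and distributed.
    submitted: Vec<(ParaId, u64)>,
}

impl CollationGeneration {
    fn handle(&mut self, msg: CollationGenerationMessage) {
        match msg {
            CollationGenerationMessage::Initialize(config) => self.config = Some(config),
            CollationGenerationMessage::SubmitCollation(params) => {
                // Submissions that arrive before `Initialize` are dropped.
                if let Some(config) = &self.config {
                    self.submitted.push((config.para_id, params.relay_parent));
                }
            }
        }
    }
}
```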
## Dispute Coordinator Message
Messages received by the [Dispute Coordinator subsystem](../node/disputes/dispute-coordinator.md).
This subsystem coordinates participation in disputes and tracks both live disputes and the validator statements
observed by other subsystems.
```rust
enum DisputeCoordinatorMessage {
/// Import a statement by a validator about a candidate.
///
/// The subsystem will silently discard ancient statements or sets of only dispute-specific statements for
/// candidates that are previously unknown to the subsystem. The former is simply because ancient
/// data is not relevant and the latter is as a DoS prevention mechanism. Both backing and approval
/// statements already undergo anti-DoS procedures in their respective subsystems, but statements
/// cast specifically for disputes are not necessarily relevant to any candidate the system is
/// already aware of and thus present a DoS vector. Our expectation is that nodes will notify each
/// other of disputes over the network by providing (at least) 2 conflicting statements, of which one is either
/// a backing or validation statement.
///
/// This does not do any checking of the message signature.
ImportStatements {
/// The hash of the candidate.
candidate_hash: CandidateHash,
/// The candidate receipt itself.
candidate_receipt: CandidateReceipt,
/// The session the candidate appears in.
session: SessionIndex,
/// Triples containing the following:
/// - A statement, either indicating validity or invalidity of the candidate.
/// - The validator index (within the session of the candidate) of the validator casting the vote.
/// - The signature of the validator casting the vote.
statements: Vec<(DisputeStatement, ValidatorIndex, ValidatorSignature)>,
/// Inform the requester once we finished importing.
///
/// That is, we either discarded the votes, just recorded them because we
/// had already cast our vote, or successfully recovered availability for
/// the candidate.
pending_confirmation: oneshot::Sender<ImportStatementsResult>
},
/// Fetch a list of all recent disputes that the co-ordinator is aware of.
/// These are disputes which have occurred any time in recent sessions, which may have already concluded.
RecentDisputes(ResponseChannel<Vec<(SessionIndex, CandidateHash)>>),
/// Fetch a list of all active disputes that the co-ordinator is aware of.
/// These disputes are either unconcluded or recently concluded.
ActiveDisputes(ResponseChannel<Vec<(SessionIndex, CandidateHash)>>),
/// Get candidate votes for a candidate.
QueryCandidateVotes(SessionIndex, CandidateHash, ResponseChannel<Option<CandidateVotes>>),
/// Sign and issue local dispute votes. A value of `true` indicates validity, and `false` invalidity.
IssueLocalStatement(SessionIndex, CandidateHash, CandidateReceipt, bool),
/// Determine the highest undisputed block within the given chain, based on where candidates
/// were included. If even the base block should not be finalized due to a dispute,
/// then `None` should be returned on the channel.
///
/// The block descriptions begin counting upwards from the block after the given `base_number`. The `base_number`
/// is typically the number of the last finalized block but may be slightly higher. This block
/// is inevitably going to be finalized so it is not accounted for by this function.
DetermineUndisputedChain {
base_number: BlockNumber,
block_descriptions: Vec<(BlockHash, SessionIndex, Vec<CandidateHash>)>,
rx: ResponseSender<Option<(BlockNumber, BlockHash)>>,
}
}
/// Result of `ImportStatements`.
pub enum ImportStatementsResult {
/// Import was invalid (candidate was not available) and the sending peer should get banned.
InvalidImport,
/// Import was valid and can be confirmed to peer.
ValidImport
}
```
## Dispute Distribution Message
Messages received by the [Dispute Distribution subsystem](../node/disputes/dispute-distribution.md). This subsystem is
responsible for distributing explicit dispute statements.
```rust
enum DisputeDistributionMessage {
/// Tell dispute distribution to distribute an explicit dispute statement to
/// validators.
SendDispute((ValidVote, InvalidVote)),
/// Ask DisputeDistribution to get votes we don't know about.
/// Fetched votes will be reported via `DisputeCoordinatorMessage::ImportStatements`
FetchMissingVotes {
candidate_hash: CandidateHash,
session: SessionIndex,
known_valid_votes: Bitfield,
known_invalid_votes: Bitfield,
/// Optional validator to query from. `ValidatorIndex` as in the above
/// referenced session.
from_validator: Option<ValidatorIndex>,
}
}
```
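The two bitfields in `FetchMissingVotes` tell dispute distribution which validators' votes are already known. Every other validator is a candidate to query. The sketch below computes that set under two assumptions. First, `Bitfield` is a `Vec<bool>` indexed by `ValidatorIndex`. Second, both bitfields cover the same validators. `missing_voters` is hypothetical.

```rust
/// Simplified bitfield: one `bool` per validator in the session.
type Bitfield = Vec<bool>;
type ValidatorIndex = u32;

/// Validators for which neither a valid nor an invalid vote is known locally.
fn missing_voters(known_valid: &Bitfield, known_invalid: &Bitfield) -> Vec<ValidatorIndex> {
    known_valid
        .iter()
        .zip(known_invalid.iter())
        .enumerate()
        .filter(|(_, (valid, invalid))| !**valid && !**invalid)
        .map(|(index, _)| index as ValidatorIndex)
        .collect()
}
```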
## Network Bridge Message
Messages received by the network bridge. This subsystem is invoked by others to manipulate access to the low-level
networking code.
```rust
/// Peer-sets handled by the network bridge.
enum PeerSet {
/// The collation peer-set is used to distribute collations from collators to validators.
Collation,
/// The validation peer-set is used to distribute information relevant to teyrchain
/// validation among validators. This may include nodes which are not validators,
/// as some protocols on this peer-set are expected to be gossip.
Validation,
}
enum NetworkBridgeMessage {
/// Report a cost or benefit of a peer. Negative values are costs, positive are benefits.
ReportPeer(PeerId, i32),
/// Disconnect a peer from the given peer-set without affecting their reputation.
DisconnectPeer(PeerId, PeerSet),
/// Send a message to one or more peers on the validation peerset.
SendValidationMessage([PeerId], ValidationProtocolV1),
/// Send a message to one or more peers on the collation peerset.
SendCollationMessage([PeerId], ValidationProtocolV1),
/// Send multiple validation messages.
SendValidationMessages([([PeerId, ValidationProtocolV1])]),
/// Send multiple collation messages.
SendCollationMessages([([PeerId, ValidationProtocolV1])]),
/// Connect to peers who represent the given `validator_ids`.
///
/// Also ask the network to stay connected to these peers at least
/// until a new request is issued.
///
/// Because it overrides the previous request, it must be ensured
/// that `validator_ids` include all peers the subsystems
/// are interested in (per `PeerSet`).
///
/// A caller can learn about validator connections by listening to the
/// `PeerConnected` events from the network bridge.
ConnectToValidators {
/// Ids of the validators to connect to.
validator_ids: HashSet<AuthorityDiscoveryId>,
/// The underlying protocol to use for this request.
peer_set: PeerSet,
/// Sends back the number of `AuthorityDiscoveryId`s which
/// authority discovery has failed to resolve.
failed: oneshot::Sender<usize>,
},
/// Inform the distribution subsystems about the new
/// gossip network topology formed.
NewGossipTopology {
/// The session info this gossip topology is concerned with.
session: SessionIndex,
/// Our validator index in the session, if any.
local_index: Option<ValidatorIndex>,
/// The canonical shuffling of validators for the session.
canonical_shuffling: Vec<(AuthorityDiscoveryId, ValidatorIndex)>,
/// The reverse mapping of `canonical_shuffling`: from validator index
/// to the index in `canonical_shuffling`.
shuffled_indices: Vec<usize>,
}
}
```
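`NewGossipTopology` describes `shuffled_indices` as the reverse mapping of `canonical_shuffling`. The sketch below builds it. `AuthorityDiscoveryId` is simplified to a `String`, and every validator index in `0..len` is assumed to appear exactly once. `reverse_shuffling` is a hypothetical name.

```rust
/// Simplified stand-ins for the guide's types (assumption).
pub type AuthorityDiscoveryId = String;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ValidatorIndex(pub u32);

/// Hypothetical helper: `shuffled_indices[v]` is the position of validator `v`
/// in `canonical_shuffling`. Panics if an index is out of range, which would
/// violate the "each index exactly once" assumption.
pub fn reverse_shuffling(
    canonical_shuffling: &[(AuthorityDiscoveryId, ValidatorIndex)],
) -> Vec<usize> {
    let mut shuffled_indices = vec![0usize; canonical_shuffling.len()];
    for (position, (_, ValidatorIndex(v))) in canonical_shuffling.iter().enumerate() {
        shuffled_indices[*v as usize] = position;
    }
    shuffled_indices
}
```

With this mapping, a subsystem that only knows a `ValidatorIndex` can find that validator's place in the shuffled order in constant time.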
## Misbehavior Report
```rust
pub type Misbehavior = generic::Misbehavior<
CommittedCandidateReceipt,
CandidateHash,
ValidatorIndex,
ValidatorSignature,
>;
mod generic {
/// Misbehavior: voting more than one way on candidate validity.
///
/// Since there are three possible ways to vote, a double vote is possible in
/// three combinations (unordered).
pub enum ValidityDoubleVote<Candidate, Digest, Signature> {
/// Implicit vote by issuing and explicitly voting validity.
IssuedAndValidity((Candidate, Signature), (Digest, Signature)),
/// Implicit vote by issuing and explicitly voting invalidity.
IssuedAndInvalidity((Candidate, Signature), (Digest, Signature)),
/// Direct votes for validity and invalidity
ValidityAndInvalidity(Candidate, Signature, Signature),
}
/// Misbehavior: multiple signatures on same statement.
pub enum DoubleSign<Candidate, Digest, Signature> {
/// On candidate.
Candidate(Candidate, Signature, Signature),
/// On validity.
Validity(Digest, Signature, Signature),
/// On invalidity.
Invalidity(Digest, Signature, Signature),
}
/// Misbehavior: submitted statement for wrong group.
pub struct UnauthorizedStatement<Candidate, Digest, AuthorityId, Signature> {
/// A signed statement which was submitted without proper authority.
pub statement: SignedStatement<Candidate, Digest, AuthorityId, Signature>,
}
pub enum Misbehavior<Candidate, Digest, AuthorityId, Signature> {
/// Voted more than one way on candidate validity.
ValidityDoubleVote(ValidityDoubleVote<Candidate, Digest, Signature>),
/// Submitted a message that was unauthorized.
UnauthorizedStatement(UnauthorizedStatement<Candidate, Digest, AuthorityId, Signature>),
/// Submitted two valid signatures for the same message.
DoubleSign(DoubleSign<Candidate, Digest, Signature>),
}
}
```
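`ValidityDoubleVote` treats issuing a candidate as an implicit validity vote. That leaves three vote kinds, which form exactly three unordered pairs. The sketch below classifies two votes from the same validator on the same candidate. It uses hypothetical `VoteKind`, `DoubleVoteKind` and `classify` names and drops the candidate, digest and signature payloads.

```rust
/// The three ways a validator can vote on a candidate. Declaration order gives the
/// derived ordering Issued < Valid < Invalid, used below to normalize pairs.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum VoteKind { Issued, Valid, Invalid }

/// Payload-free mirror of `ValidityDoubleVote` (assumption: only the combination matters here).
#[derive(Debug, PartialEq, Eq)]
pub enum DoubleVoteKind { IssuedAndValidity, IssuedAndInvalidity, ValidityAndInvalidity }

/// Hypothetical classifier. Two votes of the same kind are not a double vote; with
/// differing signatures they would be a `DoubleSign` instead.
pub fn classify(a: VoteKind, b: VoteKind) -> Option<DoubleVoteKind> {
    // Sort the pair so (Valid, Issued) and (Issued, Valid) are treated alike.
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    match (lo, hi) {
        (VoteKind::Issued, VoteKind::Valid) => Some(DoubleVoteKind::IssuedAndValidity),
        (VoteKind::Issued, VoteKind::Invalid) => Some(DoubleVoteKind::IssuedAndInvalidity),
        (VoteKind::Valid, VoteKind::Invalid) => Some(DoubleVoteKind::ValidityAndInvalidity),
        _ => None,
    }
}
```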
## PoV Distribution Message
This is a network protocol that receives messages of type [`PoVDistributionV1Message`][PoVDistributionV1NetworkMessage].
```rust
enum PoVDistributionMessage {
/// Fetch a PoV from the network.
///
/// This `CandidateDescriptor` should correspond to a candidate seconded under the provided
/// relay-parent hash.
FetchPoV(Hash, CandidateDescriptor, ResponseChannel<PoV>),
/// Distribute a PoV for the given relay-parent and `CandidateDescriptor`.
/// The PoV should correctly hash to the PoV hash mentioned in the `CandidateDescriptor`.
DistributePoV(Hash, CandidateDescriptor, PoV),
/// An update from the network bridge.
NetworkBridgeUpdate(NetworkBridgeEvent<PoVDistributionV1Message>),
}
```
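`DistributePoV` requires that the PoV hash to the PoV hash in the `CandidateDescriptor`. The sketch below shows that check before distribution. The protocol uses a cryptographic hash; `std`'s `DefaultHasher` stands in here only to keep the example dependency-free. `PoV`, `CandidateDescriptor`, `hash_pov` and `check_pov` are simplified or hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-ins (assumption): the PoV is opaque bytes and the descriptor
/// carries only the hash we need to compare against.
pub struct PoV { pub block_data: Vec<u8> }
pub struct CandidateDescriptor { pub pov_hash: u64 }

/// Stand-in hash, NOT cryptographic; used only so the sketch needs nothing beyond `std`.
pub fn hash_pov(pov: &PoV) -> u64 {
    let mut hasher = DefaultHasher::new();
    pov.block_data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical guard: refuse to distribute a PoV that does not match its descriptor.
pub fn check_pov(descriptor: &CandidateDescriptor, pov: &PoV) -> Result<(), &'static str> {
    if hash_pov(pov) == descriptor.pov_hash {
        Ok(())
    } else {
        Err("PoV hash mismatch")
    }
}
```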
## Provisioner Message
```rust
/// This data becomes inherents or extrinsics which should be included in a future relay chain block.
enum ProvisionableData {
/// This bitfield indicates the availability of various candidate blocks.
Bitfield(Hash, SignedAvailabilityBitfield),
/// The Candidate Backing subsystem believes that this candidate is valid, pending availability.
BackedCandidate(CandidateReceipt),
/// Misbehavior reports are self-contained proofs of validator misbehavior.
MisbehaviorReport(Hash, MisbehaviorReport),
/// Disputes trigger a broad dispute resolution process.
Dispute(Hash, Signature),
}
/// Message to the Provisioner.
///
/// In all cases, the Hash is that of the relay parent.
enum ProvisionerMessage {
/// This message allows external subsystems to request current inherent data that could be used for
/// advancing the state of teyrchain consensus in a block building upon the given hash.
///
/// If called at different points in time, this may give different results.
RequestInherentData(Hash, oneshot::Sender<ParaInherentData>),
/// This data should become part of a relay chain block.
ProvisionableData(ProvisionableData),
}
```
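The provisioner collects `ProvisionableData` per relay parent. It answers `RequestInherentData` with whatever it holds at that moment, which is why repeated requests may give different results. The sketch below models this under simplifying assumptions: hashes are byte arrays, bitfields are `Vec<bool>`, and candidates are plain `u32` ids. `std::sync::mpsc::Sender` stands in for `oneshot::Sender`, and `CollectedInherentData` is a hypothetical, reduced stand-in for `ParaInherentData`.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

pub type Hash = [u8; 32];

/// Hypothetical, reduced stand-in for `ParaInherentData`.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CollectedInherentData {
    pub bitfields: Vec<Vec<bool>>,
    pub backed_candidates: Vec<u32>,
}

/// Hypothetical provisioner state: everything collected so far, keyed by relay parent.
#[derive(Default)]
pub struct Provisioner {
    per_relay_parent: HashMap<Hash, CollectedInherentData>,
}

impl Provisioner {
    /// Handle `ProvisionableData::Bitfield`.
    pub fn note_bitfield(&mut self, relay_parent: Hash, bitfield: Vec<bool>) {
        self.per_relay_parent.entry(relay_parent).or_default().bitfields.push(bitfield);
    }

    /// Handle `ProvisionableData::BackedCandidate`.
    pub fn note_backed_candidate(&mut self, relay_parent: Hash, candidate: u32) {
        self.per_relay_parent.entry(relay_parent).or_default().backed_candidates.push(candidate);
    }

    /// Handle `RequestInherentData`: reply with a snapshot of the current data.
    pub fn request_inherent_data(&self, relay_parent: Hash, tx: mpsc::Sender<CollectedInherentData>) {
        let data = self.per_relay_parent.get(&relay_parent).cloned().unwrap_or_default();
        // The requester may have stopped waiting; that is not an error for us.
        let _ = tx.send(data);
    }
}
```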
## Runtime API Message
The Runtime API subsystem is responsible for providing an interface to the state of the chain's runtime.
This is fueled by an auxiliary type encapsulating all request types defined in the [Runtime API section](../runtime-api)
of the guide.
```rust
enum RuntimeApiRequest {
/// Get the version of the runtime API at the given parent hash, if any.
Version(ResponseChannel<u32>),
/// Get the current validator set.
Validators(ResponseChannel<Vec<ValidatorId>>),
/// Get the validator groups and rotation info.
ValidatorGroups(ResponseChannel<(Vec<Vec<ValidatorIndex>>, GroupRotationInfo)>),
/// Get information about all availability cores.
AvailabilityCores(ResponseChannel<Vec<CoreState>>),
/// Get the persisted validation data for a particular para, using the given occupied core assumption.
PersistedValidationData(
ParaId,
OccupiedCoreAssumption,
ResponseChannel<Option<PersistedValidationData>>,
),
/// Sends back `true` if the commitments pass all acceptance criteria checks.
CheckValidationOutputs(
ParaId,
CandidateCommitments,
RuntimeApiSender<bool>,
),
/// Get the session index for children of the block. This can be used to construct a signing
/// context.
SessionIndexForChild(ResponseChannel<SessionIndex>),
/// Get the validation code for a specific para, using the given occupied core assumption.
ValidationCode(ParaId, OccupiedCoreAssumption, ResponseChannel<Option<ValidationCode>>),
/// Get validation code by its hash, either past, current or future code can be returned,
/// as long as state is still available.
ValidationCodeByHash(ValidationCodeHash, RuntimeApiSender<Option<ValidationCode>>),
/// Get a committed candidate receipt for all candidates pending availability.
CandidatePendingAvailability(ParaId, ResponseChannel<Option<CommittedCandidateReceipt>>),
/// Get all events concerning candidates in the last block.
CandidateEvents(ResponseChannel<Vec<CandidateEvent>>),
/// Get the session info for the given session, if stored.
SessionInfo(SessionIndex, ResponseChannel<Option<SessionInfo>>),
/// Get all the pending inbound messages in the downward message queue for a para.
DmqContents(ParaId, ResponseChannel<Vec<InboundDownwardMessage<BlockNumber>>>),
/// Get the contents of all channels addressed to the given recipient. Channels that have no
/// messages in them are also included.
InboundHrmpChannelsContents(ParaId, ResponseChannel<BTreeMap<ParaId, Vec<InboundHrmpMessage<BlockNumber>>>>),
/// Get information about the BABE epoch this block was produced in.
BabeEpoch(ResponseChannel<BabeEpoch>),
}
enum RuntimeApiMessage {
/// Make a request of the runtime API against the post-state of the given relay-parent.
Request(Hash, RuntimeApiRequest),
/// Get the version of the runtime API at the given parent hash, if any.
Version(Hash, ResponseChannel<Option<u32>>)
}
```
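Every `RuntimeApiRequest` carries its own response channel, and `RuntimeApiMessage::Request` pairs it with the relay-parent to query against. The mock below shows that request/response shape. It uses two variants only, with `std::sync::mpsc::Sender` standing in for `ResponseChannel`. It replies from fixed values instead of executing the runtime, and `spawn_mock_runtime_api` is hypothetical.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Simplified stand-ins (assumption).
pub type Hash = u64;
pub type SessionIndex = u32;

/// Two-variant subset of `RuntimeApiRequest`.
pub enum RuntimeApiRequest {
    Version(Sender<u32>),
    SessionIndexForChild(Sender<SessionIndex>),
}

pub enum RuntimeApiMessage {
    /// Make a request of the runtime API against the post-state of the given relay-parent.
    Request(Hash, RuntimeApiRequest),
}

/// Hypothetical mock subsystem answering from fixed values; a real one would
/// evaluate the request against the state at `relay_parent`.
pub fn spawn_mock_runtime_api(api_version: u32, session: SessionIndex) -> Sender<RuntimeApiMessage> {
    let (tx, rx) = channel::<RuntimeApiMessage>();
    thread::spawn(move || {
        // Iteration ends once every `Sender` for this channel has been dropped.
        for message in rx {
            match message {
                RuntimeApiMessage::Request(_relay_parent, request) => match request {
                    // A dropped reply channel only means the requester stopped waiting.
                    RuntimeApiRequest::Version(reply) => { let _ = reply.send(api_version); }
                    RuntimeApiRequest::SessionIndexForChild(reply) => { let _ = reply.send(session); }
                },
            }
        }
    });
    tx
}
```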
## Statement Distribution Message
The Statement Distribution subsystem distributes signed statements and candidates from validators to other validators.
It does this by distributing full statements, which embed the candidate receipt, as opposed to compact statements which
don't. It receives updates from the network bridge and signed statements to share with other validators.
This is a network protocol that receives messages of type
[`StatementDistributionV1Message`][StatementDistributionV1NetworkMessage].
```rust
enum StatementDistributionMessage {
/// An update from the network bridge.
NetworkBridgeUpdate(NetworkBridgeEvent<StatementDistributionV1Message>),
/// We have validated a candidate and want to share our judgment with our peers.
/// The hash is the relay parent.
///
/// The statement distribution subsystem assumes that the statement should be correctly
/// signed.
Share(Hash, SignedFullStatementWithPVD),
}
```
## Validation Request Type
Various modules request that the [Candidate Validation subsystem](../node/utility/candidate-validation.md) validate a
block with this message. It returns [`ValidationOutputs`](candidate.md#validationoutputs) for successful validation.
```rust
/// The outcome of the candidate-validation's PVF pre-check request.
pub enum PreCheckOutcome {
/// The PVF has been compiled successfully within the given constraints.
Valid,
/// The PVF could not be compiled. This variant is used when the candidate-validation subsystem
/// can be sure that the PVF is invalid. To give a couple of examples: a PVF that cannot be
/// decompressed or that does not represent a structurally valid WebAssembly file.
Invalid,
/// This variant is used when the PVF cannot be compiled for reasons other than those
/// covered by [`PreCheckOutcome::Invalid`]. This variant may indicate that the PVF in
/// question is invalid; however, a PVF that receives this judgement is not necessarily
/// invalid.
///
/// For example, if the preparation worker was killed during compilation, we cannot be sure
/// why it happened: because the PVF was malicious and made the worker use too much memory,
/// or because the host machine was under severe memory pressure and killed the worker.
Failed,
}
/// Result of the validation of the candidate.
enum ValidationResult {
/// Candidate is valid, and here are the outputs and the validation data used to form inputs.
/// In practice, this should be a shared type so that validation caching can be done.
Valid(CandidateCommitments, PersistedValidationData),
/// Candidate is invalid.
Invalid,
}
const BACKING_EXECUTION_TIMEOUT: Duration = Duration::from_secs(2);
const APPROVAL_EXECUTION_TIMEOUT: Duration = Duration::from_secs(6);
/// Messages received by the Validation subsystem.
///
/// ## Validation Requests
///
/// Validation requests made to the subsystem should return an error only on internal error.
/// Otherwise, they should return either `Ok(ValidationResult::Valid(_))`
/// or `Ok(ValidationResult::Invalid)`.
#[derive(Debug)]
pub enum CandidateValidationMessage {
/// Validate a candidate with provided, exhaustive parameters for validation.
///
/// Explicitly provide the `PersistedValidationData` and `ValidationCode` so this can do full
/// validation without needing to access the state of the relay-chain.
///
/// This request doesn't involve acceptance criteria checking, therefore it is only useful in
/// cases where the validity of the candidate has already been established. This is the case
/// for the typical use-case: approval checkers would use this request relying on the full
/// prior checks performed by the relay-chain.
ValidateFromExhaustive(
PersistedValidationData,
ValidationCode,
CandidateDescriptor,
Arc<PoV>,
Duration, // Execution timeout.
oneshot::Sender<Result<ValidationResult, ValidationFailed>>,
),
/// Try to compile the given validation code and send back
/// the outcome.
///
/// The validation code is specified by the hash and will be queried from the runtime API at the
/// given relay-parent.
PreCheck(
// Relay-parent
Hash,
ValidationCodeHash,
oneshot::Sender<PreCheckOutcome>,
),
}
```
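`ValidateFromExhaustive` takes the execution timeout as an explicit `Duration`. The caller therefore decides which of the two timeout constants applies. The sketch below makes that choice explicit with a hypothetical `ExecutionKind` enum; the constants are copied from the listing above as real `Duration` values. One plausible reading of the asymmetry is that backing is kept stricter, so a candidate that only just fits is less likely to time out later during approval checking.

```rust
use std::time::Duration;

/// Timeouts from the listing above.
pub const BACKING_EXECUTION_TIMEOUT: Duration = Duration::from_secs(2);
pub const APPROVAL_EXECUTION_TIMEOUT: Duration = Duration::from_secs(6);

/// Hypothetical: which kind of check is asking for validation.
#[derive(Debug, Clone, Copy)]
pub enum ExecutionKind { Backing, Approval }

/// Choose the `Duration` to pass as the execution timeout of `ValidateFromExhaustive`.
pub fn execution_timeout(kind: ExecutionKind) -> Duration {
    match kind {
        ExecutionKind::Backing => BACKING_EXECUTION_TIMEOUT,
        ExecutionKind::Approval => APPROVAL_EXECUTION_TIMEOUT,
    }
}
```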
## PVF Pre-checker Message
Currently, the PVF pre-checker subsystem receives no specific messages.
```rust
/// Non-instantiable message type
pub enum PvfCheckerMessage { }
```
[NBE]: ../network.md#network-bridge-event
[AvailabilityDistributionV1NetworkMessage]: network.md#availability-distribution-v1
[BitfieldDistributionV1NetworkMessage]: network.md#bitfield-distribution-v1
[PoVDistributionV1NetworkMessage]: network.md#pov-distribution-v1
[StatementDistributionV1NetworkMessage]: network.md#statement-distribution-v1
[CollatorProtocolV1NetworkMessage]: network.md#collator-protocol-v1
@@ -0,0 +1,24 @@
# PVF Pre-checking types
## `PvfCheckStatement`
> ⚠️ This type was added in v2.
One of the main units of information on which PVF pre-checking voting is built is the `PvfCheckStatement`.
This is a statement by the validator who ran the pre-checking process for a PVF. A PVF is identified by the `ValidationCodeHash`.
The statement is valid only during a single session, specified in the `session_index`.
```rust
struct PvfCheckStatement {
/// `true` if the subject passed pre-checking and `false` otherwise.
pub accept: bool,
/// The validation code hash that was checked.
pub subject: ValidationCodeHash,
/// The index of a session during which this statement is considered valid.
pub session_index: SessionIndex,
/// The index of the validator from which this statement originates.
pub validator_index: ValidatorIndex,
}
```
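A `PvfCheckStatement` is valid only within its `session_index`, so a tally for a PVF should ignore statements from other sessions. It should also count at most one statement per validator. The sketch below shows such a tally. The thresholds are an assumption for illustration: accept with strictly more than 2/3 of validators in favour, and reject once that supermajority can no longer be reached. See the PVF pre-checking chapter for the actual rules. `tally` and `Verdict` are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified stand-ins (assumption).
pub type ValidationCodeHash = [u8; 32];
pub type SessionIndex = u32;
pub type ValidatorIndex = u32;

/// Mirrors the struct above.
pub struct PvfCheckStatement {
    pub accept: bool,
    pub subject: ValidationCodeHash,
    pub session_index: SessionIndex,
    pub validator_index: ValidatorIndex,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict { Accepted, Rejected, Pending }

/// Hypothetical tally for one PVF in `current_session` with `n_validators` validators.
pub fn tally(
    statements: &[PvfCheckStatement],
    subject: ValidationCodeHash,
    current_session: SessionIndex,
    n_validators: u32,
) -> Verdict {
    // One vote per validator; drop other subjects, other sessions and unknown indices.
    let mut votes: HashMap<ValidatorIndex, bool> = HashMap::new();
    for s in statements {
        if s.subject == subject && s.session_index == current_session && s.validator_index < n_validators {
            votes.entry(s.validator_index).or_insert(s.accept);
        }
    }
    let ayes = votes.values().filter(|&&a| a).count() as u32;
    let nays = votes.len() as u32 - ayes;
    if 3 * ayes > 2 * n_validators {
        // Strictly more than two thirds, in integer arithmetic.
        Verdict::Accepted
    } else if 3 * (n_validators - nays) <= 2 * n_validators {
        // Even if all remaining validators accepted, ayes could not exceed 2/3.
        Verdict::Rejected
    } else {
        Verdict::Pending
    }
}
```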
@@ -0,0 +1,43 @@
# Runtime
Types used within the runtime exclusively and pervasively.
## Host Configuration
The internal-to-runtime configuration of the teyrchain host is kept in `struct HostConfiguration`. This is expected to
be altered only by governance procedures or via migrations from the Pezkuwi-SDK codebase. The latest definition of
`HostConfiguration` can be found in the project repo
[here](https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/runtime/parachains/src/configuration.rs). Each
parameter has a doc comment, so refer to the code for details.
Some related parameters in `HostConfiguration` are grouped together so that they can be managed easily. These are:
* `async_backing_params` in `struct AsyncBackingParams`
* `executor_params` in `struct ExecutorParams`
* `approval_voting_params` in `struct ApprovalVotingParams`
* `scheduler_params` in `struct SchedulerParams`
Check the definitions of these structs for further details.
### Configuration migrations
Modifying `HostConfiguration` requires a storage migration. These migrations are located in the
[`migrations`](https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/runtime/parachains/src/configuration.rs)
subfolder of the Pezkuwi-SDK repo.
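The configuration is kept in storage as an encoded value of the struct layout that was current when it was written. After the struct changes, those stored bytes no longer decode as the new type until they are rewritten. The core of such a migration is a translation from the old layout to the new one. The sketch below uses hypothetical `ConfigV1`/`ConfigV2` structs with made-up field names (not the real `HostConfiguration` fields). It also leaves out the storage access and version bookkeeping a real runtime migration performs.

```rust
/// Hypothetical old layout; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigV1 {
    pub param_a: u32,
    pub param_b: u32,
}

/// Hypothetical new layout: one parameter was added.
#[derive(Debug, Clone, PartialEq)]
pub struct ConfigV2 {
    pub param_a: u32,
    pub param_b: u32,
    pub added_param: u32,
}

/// Hypothetical default for the added parameter.
pub const ADDED_PARAM_DEFAULT: u32 = 10;

/// Keep every existing parameter and fill the new one with its default.
pub fn migrate_v1_to_v2(old: ConfigV1) -> ConfigV2 {
    ConfigV2 {
        param_a: old.param_a,
        param_b: old.param_b,
        added_param: ADDED_PARAM_DEFAULT,
    }
}
```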
## ParaInherentData
Inherent data passed to a runtime entry-point for the advancement of teyrchain consensus.
This contains 4 pieces of data:
1. [`Bitfields`](availability.md#signed-availability-bitfield)
2. [`BackedCandidates`](backing.md#backed-candidate)
3. [`MultiDisputeStatementSet`](disputes.md#multidisputestatementset)
4. Parent `Header`
```rust
struct ParaInherentData {
bitfields: Bitfields,
backed_candidates: BackedCandidates,
dispute_statements: MultiDisputeStatementSet,
parent_header: Header
}
```
@@ -0,0 +1,50 @@
# Whence Teyrchains
Teyrchains are the solution to a problem. As with any solution, it cannot be understood without first understanding the
problem. So let's start by going over the issues faced by blockchain technology that led to us beginning to explore the
design space for something like teyrchains.
## Issue 1: Scalability
It became clear a few years ago that the transaction throughput of simple Proof-of-Work (PoW) blockchains such as
Bitcoin, Ethereum, and myriad others was simply too low.
> TODO: what if there were more blockchains, etc.
Proof-of-Stake (PoS) systems can accomplish higher throughput than PoW blockchains. PoS systems are secured by bonded
capital as opposed to spent effort - liquidity opportunity cost vs. burning electricity. The way they work is by
selecting a set of validators with known economic identity who lock up tokens in exchange for earning the right to
"validate" or participate in the consensus process. If they are found to carry out that process wrongly, they will be
slashed, meaning some or all of the locked tokens will be burned. This provides a strong disincentive against
misbehavior.
Since the consensus protocol doesn't revolve around wasting effort, blocks can be produced and agreed upon much faster.
Solutions to PoW challenges don't have to be found before a block can be authored, so the overhead of authoring a block
is reduced to only the costs of creating and distributing the block.
However, consensus on a PoS chain requires agreement of more than two-thirds (2/3+) of the validator set for everything that occurs at
Layer 1: all logic which is carried out as part of the blockchain's state machine. This means that everybody still needs
to check everything. Furthermore, validators may have different views of the system based on the information that they
receive over an asynchronous network, making agreement on the latest state more difficult.
Teyrchains are an example of a **sharded** protocol. Sharding is a concept borrowed from traditional database
architecture. Rather than requiring every participant to check every transaction, we require each participant to check
some subset of transactions, with enough redundancy baked in that byzantine (arbitrarily malicious) participants can't
sneak in invalid transactions - at least not without being detected and getting slashed, with those transactions
reverted.
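The model below gives a rough sense of why "enough redundancy" works. It is a simplification, not the protocol's actual assignment scheme. Assume each piece of work is checked by $k$ validators drawn independently at random, and a fraction $f$ of all validators is byzantine. An invalid transaction can then slip past its checkers only if every one of them is byzantine:

```latex
P(\text{all } k \text{ checkers byzantine}) = f^{k}, \qquad
f = \tfrac{1}{3},\ k = 5 \;\Rightarrow\; \left(\tfrac{1}{3}\right)^{5} = \tfrac{1}{243} \approx 0.0041
```

Each additional checker multiplies this probability by $f$, so modest redundancy already makes undetected fraud unlikely. The real system adds further detection and slashing on top of this.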
Sharding and Proof-of-Stake in combination allow a teyrchain host to provide full security on many
teyrchains, even without all participants checking all state transitions.
> TODO: note about network effects & bridging
## Issue 2: Flexibility / Specialization
"Dumb" VMs don't give you that flexibility. Any engineer knows that being able to specialize on a problem gives them and
their users more _leverage_.
> TODO: expand on leverage
Having recognized these issues, we set out to find a solution that would allow developers to create
and deploy purpose-built blockchains unified under a common source of security, with the capability of message-passing
between them; a _heterogeneous sharding solution_, which we have come to know as **Teyrchains**.