pezkuwi-subxt/polkadot/roadmap/implementors-guide/guide.md at bd2304ec98979003b6e8c41765db78bb8c59167f

Robert Habermeier bd2304ec98 New parachain runtime skeleton (#1158 )

2020-06-02 12:34:07 -04:00


The Polkadot Parachain Host Implementers' Guide

Ramble / Preamble

This document aims to describe the purpose, functionality, and implementation of a host for Polkadot's parachains. It is not for the implementor of a specific parachain but rather for the implementor of the Parachain Host, which provides security and advancement for constituent parachains. In practice, this is for the implementors of Polkadot.

There are a number of other documents describing the research in more detail. All referenced documents will be linked here and should be read alongside this document for the best understanding of the full picture. However, this is the only document which aims to describe key aspects of Polkadot's particular instantiation of much of that research down to low-level technical details and software architecture.

Origins
Parachains: Basic Functionality
Architecture
- Node-side
- Runtime
Architecture: Runtime
Architecture: Node-side
Data Structures and Types
Glossary / Jargon

Origins

Parachains are the solution to a problem. As with any solution, it cannot be understood without first understanding the problem. So let's start by going over the issues faced by blockchain technology that led to us beginning to explore the design space for something like parachains.

Issue 1: Scalability

It became clear a few years ago that the transaction throughput of simple Proof-of-Work (PoW) blockchains such as Bitcoin, Ethereum, and myriad others was simply too low. [TODO: PoS, sharding, what if there were more blockchains, etc. etc.]

Proof-of-Stake (PoS) systems can accomplish higher throughput than PoW blockchains. PoS systems are secured by bonded capital as opposed to spent effort - liquidity opportunity cost vs. burning electricity. They work by selecting a set of validators with known economic identity who lock up tokens in exchange for earning the right to "validate" or participate in the consensus process. If they are found to carry out that process wrongly, they will be slashed, meaning some or all of the locked tokens will be burned. This provides a strong disincentive against misbehavior.

Since the consensus protocol doesn't revolve around wasting effort, block times and agreement can occur much faster. Solutions to PoW challenges don't have to be found before a block can be authored, so the overhead of authoring a block is reduced to only the costs of creating and distributing the block.

However, consensus on a PoS chain requires full agreement of 2/3+ of the validator set for everything that occurs at Layer 1: all logic which is carried out as part of the blockchain's state machine. This means that everybody still needs to check everything. Furthermore, validators may have different views of the system based on the information that they receive over an asynchronous network, making agreement on the latest state more difficult.

Parachains are an example of a sharded protocol. Sharding is a concept borrowed from traditional database architecture. Rather than requiring every participant to check every transaction, we require each participant to check some subset of transactions, with enough redundancy baked in that byzantine (arbitrarily malicious) participants can't sneak in invalid transactions - at least not without being detected and getting slashed, with those transactions reverted.

Sharding and Proof-of-Stake in coordination with each other allow a parachain host to provide full security on many parachains, even without all participants checking all state transitions.

[TODO: note about network effects & bridging]

Issue 2: Flexibility / Specialization

"Dumb" VMs don't give you flexibility. Any engineer knows that being able to specialize on a problem gives them and their users more leverage. [TODO]

Having recognized these issues, we set out to find a solution that would allow developers to create and deploy purpose-built blockchains unified under a common source of security, with the capability of message-passing between them: a heterogeneous sharding solution, which we have come to know as Parachains.

Parachains: Basic Functionality

This section aims to describe, at a high level, the architecture, actors, and Subsystems involved in the implementation of parachains. It also illuminates certain subtleties and challenges faced in the design and implementation of those Subsystems. Our goal is to carry a parachain block from authoring to secure inclusion, and define a process which can be carried out repeatedly and in parallel for many different parachains to extend them over time. Understanding of the high-level approach taken here is important to provide context for the proposed architecture further on.

The Parachain Host is a blockchain, known as the relay-chain, and the actors which provide security and inputs to the blockchain.

First, it's important to go over the main actors we have involved in the parachain host.

Validators. These nodes are responsible for validating proposed parachain blocks. They do so by checking a Proof-of-Validity (PoV) of the block and ensuring that the PoV remains available. They put financial capital down as "skin in the game" which can be slashed (destroyed) if they are proven to have misvalidated.
Collators. These nodes are responsible for creating the Proofs-of-Validity that validators know how to check. Creating a PoV typically requires familiarity with the transaction format and block authoring rules of the parachain, as well as having access to the full state of the parachain.
Fishermen. These are user-operated, permissionless nodes whose goal is to catch misbehaving validators in exchange for a bounty. Collators and validators can behave as Fishermen too. Fishermen aren't necessary for security, and aren't covered in-depth by this document.

This alludes to a simple pipeline where collators send validators parachain blocks and their requisite PoV to check. Then, validators validate the block using the PoV, signing statements which describe either the positive or negative outcome, and with enough positive statements, the block can be noted on the relay-chain. Negative statements are not a veto but will lead to a dispute, with those on the wrong side being slashed. If another validator later detects that a validator or group of validators incorrectly signed a statement claiming a block was valid, then those validators will be slashed, with the checker receiving a bounty.

However, there is a problem with this formulation. In order for another validator to check the previous group of validators' work after the fact, the PoV must remain available so the other validator can fetch it in order to check the work. The PoVs are expected to be too large to include in the blockchain directly, so we require an alternate data availability scheme in which validators prove that the inputs to their work will remain available, so that their work can be checked. Empirical tests tell us that many PoVs may be between 1 and 10 MB during periods of heavy load.

Here is a description of the Inclusion Pipeline: the path a parachain block (or parablock, for short) takes from creation to inclusion:

1. Validators are selected and assigned to parachains by the Validator Assignment routine.
2. A collator produces the parachain block, which is known as a parachain candidate or candidate, along with a PoV for the candidate.
3. The collator forwards the candidate and PoV to validators assigned to the same parachain via the Collation Distribution Subsystem.
4. The validators assigned to a parachain at a given point in time participate in the Candidate Backing Subsystem to validate candidates that were put forward for validation. Candidates which gather enough signed validity statements from validators are considered "backable". Their backing is the set of signed validity statements.
5. A relay-chain block author, selected by BABE, can note up to one (1) backable candidate for each parachain to include in the relay-chain block alongside its backing. A backable candidate once included in the relay-chain is considered backed in that fork of the relay-chain.
6. Once backed in the relay-chain, the parachain candidate is considered to be "pending availability". It is not considered to be included as part of the parachain until it is proven available.
7. In the following relay-chain blocks, validators will participate in the Availability Distribution Subsystem to ensure availability of the candidate. Information regarding the availability of the candidate will be noted in the subsequent relay-chain blocks.
8. Once the relay-chain state machine has enough information to consider the candidate's PoV as being available, the candidate is considered to be part of the parachain and is graduated to being a full parachain block, or parablock for short.

Note that the candidate can fail to be included in any of the following ways:

The collator is not able to propagate the candidate to any validators assigned to the parachain.
The candidate is not backed by validators participating in the Candidate Backing Subsystem.
The candidate is not selected by a relay-chain block author to be included in the relay chain.
The candidate's PoV is not considered as available within a timeout and is discarded from the relay chain.

This process can be broken down further. Steps 2 & 3 relate to the work of the collator in collating and distributing the candidate to validators via the Collation Distribution Subsystem. Steps 4 & 5 relate to the work of the validators in the Candidate Backing Subsystem and the block author (itself a validator) to include the block into the relay chain. Steps 6, 7, and 8 correspond to the logic of the relay-chain state-machine (otherwise known as the Runtime) used to fully incorporate the block into the chain. Step 7 requires further work on the validators' parts to participate in the Availability Distribution Subsystem and include that information into the relay chain for step 8 to be fully realized.

This brings us to the second part of the process. Once a parablock is considered available and part of the parachain, it is still "pending approval". At this stage in the pipeline, the parablock has been backed by a majority of validators in the group assigned to that parachain, and its data has been guaranteed available by the set of validators as a whole. Once it's considered available, the host will even begin to accept children of that block. At this point, we can consider the parablock as having been tentatively included in the parachain, although more confirmations are desired. However, the validators in the parachain-group (known as the "Parachain Validators" for that parachain) are sampled from an overall validator set which is assumed to be less than 1/3 dishonest, where dishonest means byzantine, or arbitrarily malicious. This means that there is a chance of randomly sampling Parachain Validators for a parachain that are majority or fully dishonest and can back a candidate wrongly, so secondary checks must be done on the block before it can be considered approved. The Approval Process allows us to detect such misbehavior after-the-fact without allocating more Parachain Validators and reducing the throughput of the system. A parablock's failure to pass the approval process will invalidate the block as well as all of its descendants. However, only the validators who backed the block in question will be slashed, not the validators who backed the descendants.
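To get a feel for how likely a majority-dishonest group is, the sketch below computes that probability under the assumption (not stated in this guide) that a group is drawn uniformly at random, without replacement, from the full validator set. The function names and the example numbers are hypothetical and chosen only for illustration.

```rust
/// Binomial coefficient C(n, k) as a floating-point value (0 when k > n).
fn binomial(n: u64, k: u64) -> f64 {
    if k > n {
        return 0.0;
    }
    let k = k.min(n - k);
    let mut result = 1.0;
    for i in 0..k {
        result *= (n - i) as f64 / (i + 1) as f64;
    }
    result
}

/// Probability that a uniformly sampled group of `group` validators contains
/// a strict majority of dishonest members (hypergeometric distribution).
///
/// Assumption: groups are sampled without replacement from `total` validators,
/// of which `dishonest` are byzantine.
fn prob_majority_dishonest(total: u64, dishonest: u64, group: u64) -> f64 {
    assert!(dishonest <= total && group <= total);
    let honest = total - dishonest;
    let majority = group / 2 + 1;
    let favourable: f64 = (majority..=group)
        .map(|d| binomial(dishonest, d) * binomial(honest, group - d))
        .sum();
    favourable / binomial(total, group)
}
```

For example, `prob_majority_dishonest(1000, 333, 5)` is noticeably larger than `prob_majority_dishonest(1000, 333, 25)`: small groups give more throughput per validator but are easier to corrupt by chance, which is the gap the Approval Process is meant to close.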

The Approval Process looks like this:

Parablocks that have been included by the Inclusion Pipeline are pending approval for a time-window known as the secondary checking window.
During the secondary-checking window, validators randomly self-select to perform secondary checks on the parablock.
These validators, known in this context as secondary checkers, acquire the parablock and its PoV, and re-run the validation function.
The secondary checkers submit the result of their checks to the relay chain. Contradictory results lead to escalation, where even more secondary checkers are selected and the secondary-checking window is extended.
At the end of the Approval Process, the parablock is either Approved or it is rejected. More on the rejection process later.

These two pipelines sum up the sequence of events necessary to extend and acquire full security on a Parablock. Note that the Inclusion Pipeline must conclude for a specific parachain before a new block can be accepted on that parachain. After inclusion, the Approval Process kicks off, and can be running for many parachain blocks at once.

Reiterating the lifecycle of a candidate:

Candidate: put forward by a collator to a validator.
Seconded: put forward by a validator to other validators.
Backable: validity attested to by a majority of assigned validators.
Backed: Backable & noted in a fork of the relay-chain.
Pending availability: Backed but not yet considered available.
Included: Backed and considered available.
Accepted: Backed, available, and undisputed.
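The lifecycle above can be read as a linear state machine. The following is a minimal sketch, not part of the guide's specification: the `CandidateStage` type and its `next` method are hypothetical names, "Backed" and "Pending availability" are folded into one stage because the list describes them as the same point in time, and failure paths (timeouts, disputes) are left out.

```rust
/// Hypothetical stages of a candidate, in the order given by the lifecycle list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CandidateStage {
    /// Put forward by a collator to a validator.
    Candidate,
    /// Put forward by a validator to other validators.
    Seconded,
    /// Validity attested to by a majority of assigned validators.
    Backable,
    /// Backed in a fork of the relay-chain, but not yet considered available.
    PendingAvailability,
    /// Backed and considered available.
    Included,
    /// Backed, available, and undisputed.
    Accepted,
}

impl CandidateStage {
    /// The stage that follows on the successful path, or `None` at the end.
    fn next(self) -> Option<CandidateStage> {
        use CandidateStage::*;
        match self {
            Candidate => Some(Seconded),
            Seconded => Some(Backable),
            Backable => Some(PendingAvailability),
            PendingAvailability => Some(Included),
            Included => Some(Accepted),
            Accepted => None,
        }
    }
}
```

Repeatedly calling `next` starting from `CandidateStage::Candidate` walks the successful path through the Inclusion Pipeline and the Approval Process.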

[TODO Diagram: Inclusion Pipeline & Approval Subsystems interaction]

It is also important to note that the relay-chain is extended by BABE, which is a forkful algorithm. That means that different block authors can be chosen at the same time, and may not be building on the same block parent. Furthermore, the set of validators is not fixed, nor is the set of parachains. And even with the same set of validators and parachains, the validators' assignments to parachains are flexible. This means that the architecture proposed in the next chapters must deal with the variability and multiplicity of the network state.


   ....... Validator Group 1 ..........
   .                                  .
   .         (Validator 4)            .
   .  (Validator 1) (Validator 2)     .
   .         (Validator 5)            .
   .                                  .
   ..........Building on C  ...........        ........ Validator Group 2 ...........
            +----------------------+           .                                    .
            |    Relay Block C     |           .           (Validator 7)            .
            +----------------------+           .    ( Validator 3) (Validator 6)    .
                            \                  .                                    .
                             \                 ......... Building on B  .............
                              \
                      +----------------------+
                      |  Relay Block B       |
                      +----------------------+
                                 |
                      +----------------------+
                      |  Relay Block A       |
                      +----------------------+

In this example, group 1 has received block C while the others have not due to network asynchrony. Now, a validator from group 2 may be able to build another block on top of B, called C'. Assume that afterwards, some validators become aware of both C and C', while others remain only aware of one.

   ....... Validator Group 1 ..........      ........ Validator Group 2 ...........
   .                                  .      .                                    .
   .  (Validator 4) (Validator 1)     .      .    (Validator 7) (Validator 6)     .
   .                                  .      .                                    .
   .......... Building on C  ..........      ......... Building on C' .............


   ....... Validator Group 3 ..........
   .                                  .
   .   (Validator 2) (Validator 3)    .
   .        (Validator 5)             .
   .                                  .
   ....... Building on C and C' .......

            +----------------------+         +----------------------+
            |    Relay Block C     |         |    Relay Block C'    |
            +----------------------+         +----------------------+
                            \                 /
                             \               /
                              \             /
                      +----------------------+
                      |  Relay Block B       |
                      +----------------------+
                                 |
                      +----------------------+
                      |  Relay Block A       |
                      +----------------------+

Those validators that are aware of many competing heads must be aware of the work happening on each one. They may contribute to some or a full extent on both. It is possible that due to network asynchrony two forks may grow in parallel for some time, although in the absence of an adversarial network this is unlikely in the case where there are validators who are aware of both chain heads.

Architecture

Our Parachain Host includes a blockchain known as the relay-chain. A blockchain is a Directed Acyclic Graph (DAG) of state transitions, where every block can be considered to be the head of a linked-list (known as a "chain" or "fork") with a cumulative state which is determined by applying the state transition of each block in turn. All paths through the DAG terminate at the Genesis Block. In fact, the blockchain is a tree, since each block can have only one parent.

          +----------------+     +----------------+
          |    Block 4     |     | Block 5        |
          +----------------+     +----------------+
                        \           /
                         V         V
                      +---------------+
                      |    Block 3    |
                      +---------------+
                              |
                              V
                     +----------------+     +----------------+
                     |    Block 1     |     |   Block 2      |
                     +----------------+     +----------------+
                                  \            /
                                   V          V
                                +----------------+
                                |    Genesis     |
                                +----------------+

A blockchain network is composed of nodes. These nodes each have a view of many different forks of a blockchain and must decide which forks to follow and what actions to take based on the forks of the chain that they are aware of.

So in specifying an architecture to carry out the functionality of a Parachain Host, we have to answer two categories of questions:

What is the state-transition function of the blockchain? What is necessary for a transition to be considered valid, and what information is carried within the implicit state of a block?
Being aware of various forks of the blockchain as well as global private state such as a view of the current time, what behaviors should a node undertake? What information should a node extract from the state of which forks, and how should that information be used?

The first category of questions will be addressed by the Runtime, which defines the state-transition logic of the chain. Runtime logic only has to focus on the perspective of one chain, as each state has only a single parent state.

The second category of questions will be addressed by Node-side behavior. Node-side behavior defines all activities that a node undertakes, given its view of the blockchain/block-DAG. Node-side behavior can take into account all or many of the forks of the blockchain, and only conditionally undertake certain activities based on which forks it is aware of, as well as the state of the head of those forks.


                     __________________________________
                    /                                  \
                    |            Runtime               |
                    |                                  |
                    \_________(Runtime API )___________/
                                |       ^
                                V       |
               +----------------------------------------------+
               |                                              |
               |                   Node                       |
               |                                              |
               |                                              |
               +----------------------------------------------+
                                   +  +
                                   |  |
               --------------------+  +------------------------
                                 Transport
               ------------------------------------------------

It is also helpful to divide Node-side behavior into two further categories: Networking and Core. Networking behaviors relate to how information is distributed between nodes. Core behaviors relate to internal work that a specific node does. These two categories of behavior often interact, but can be heavily abstracted from each other. Core behaviors care that information is distributed and received, but not the internal details of how distribution and receipt function. Networking behaviors act on requests for distribution or fetching of information, but are not concerned with how the information is used afterwards. This allows us to create clean boundaries between Core and Networking activities, improving the modularity of the code.

          ___________________                    ____________________
         /       Core        \                  /     Networking     \
         |                   |  Send "Hello"    |                    |
         |                   |-  to "foo"   --->|                    |
         |                   |                  |                    |
         |                   |                  |                    |
         |                   |                  |                    |
         |                   |    Got "World"   |                    |
         |                   |<--  from "bar" --|                    |
         |                   |                  |                    |
         \___________________/                  \____________________/
                                                   ______| |______
                                                   ___Transport___
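One way to picture the boundary in the diagram is as a pair of message channels. The sketch below is illustrative only: `NetworkRequest`, `NetworkEvent`, and `run_networking` are hypothetical names, and the "transport" is faked by answering every request with the reply shown in the diagram. Core only sees requests going out and events coming in; it never touches the transport.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// What Core asks Networking to do.
enum NetworkRequest {
    Send { peer: String, message: String },
}

/// What Networking reports back to Core.
enum NetworkEvent {
    Received { peer: String, message: String },
}

/// Hypothetical Networking loop. A real implementation would hand each
/// message to the transport; this stand-in replies as in the diagram.
fn run_networking(requests: Receiver<NetworkRequest>, events: Sender<NetworkEvent>) {
    // Iteration ends once every sender for `requests` has been dropped.
    for request in requests {
        match request {
            NetworkRequest::Send { peer, message } => {
                // Placeholder for "deliver `message` to `peer` over the transport".
                let _ = (peer, message);
                let reply = NetworkEvent::Received {
                    peer: "bar".to_string(),
                    message: "World".to_string(),
                };
                if events.send(reply).is_err() {
                    break; // Core has gone away
                }
            }
        }
    }
}
```

Because the two sides share only the request and event types, either side can be replaced or tested in isolation, which is the modularity benefit described above.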

Node-side behavior is split up into various subsystems. Subsystems are long-lived workers that perform a particular category of work. Subsystems can communicate with each other, and do so via an Overseer that prevents race conditions.

Runtime logic is divided up into Modules and APIs. Modules encapsulate particular behavior of the system. Modules consist of storage, routines, and entry-points. Routines are invoked by entry points, by other modules, or upon block initialization or closing. Routines can read and alter the storage of the module. Entry-points are the means by which new information is introduced to a module and can limit the origins (user, root, parachain) that they accept being called by. Each block in the blockchain contains a set of Extrinsics. Each extrinsic targets a specific entry point to trigger and specifies which data should be passed to it. Runtime APIs provide a means for Node-side behavior to extract meaningful information from the state of a single fork.
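The relationship between storage, routines, entry-points, origins, and extrinsics can be sketched with a toy module. Everything here is hypothetical (`LimitModule`, `Extrinsic::SetLimit`, the error string) and does not correspond to any module defined later in this guide; it only shows the shape of the pattern.

```rust
/// Who is calling an entry point (the guide names user, root, and parachain origins).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Origin {
    User,
    Root,
    Parachain(u32),
}

/// Hypothetical extrinsic: names the entry point to trigger and carries its data.
enum Extrinsic {
    SetLimit(u32),
}

/// Hypothetical module with a single storage item.
#[derive(Default)]
struct LimitModule {
    limit: u32, // storage
}

impl LimitModule {
    /// Routine: reads and alters the module's storage.
    fn apply_limit(&mut self, new_limit: u32) {
        self.limit = new_limit;
    }

    /// Entry point: only accepts the root origin, then invokes the routine.
    fn set_limit(&mut self, origin: Origin, new_limit: u32) -> Result<(), &'static str> {
        if origin != Origin::Root {
            return Err("bad origin");
        }
        self.apply_limit(new_limit);
        Ok(())
    }

    /// Dispatches an extrinsic from a block to the entry point it targets.
    fn dispatch(&mut self, origin: Origin, extrinsic: Extrinsic) -> Result<(), &'static str> {
        match extrinsic {
            Extrinsic::SetLimit(value) => self.set_limit(origin, value),
        }
    }
}
```

An extrinsic dispatched with `Origin::User` or `Origin::Parachain(_)` is rejected and leaves storage untouched, while the same extrinsic with `Origin::Root` updates `limit`.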

These two aspects of the implementation are heavily dependent on each other. The Runtime depends on Node-side behavior to author blocks, and to include Extrinsics which trigger the correct entry points. The Node-side behavior relies on Runtime APIs to extract information necessary to determine which actions to take.

Architecture: Runtime

Broad Strokes

It's clear that we want to separate different aspects of the runtime logic into different modules. Modules define their own storage, routines, and entry-points. They also define initialization and finalization logic.

Due to the (lack of) guarantees provided by a particular blockchain-runtime framework, there is no defined or dependable order in which modules' initialization or finalization logic will run. Supporting this blockchain-runtime framework is important enough to include that same uncertainty in our model of runtime modules in this guide. Furthermore, initialization logic of modules can trigger the entry-points or routines of other modules. This is one architectural pressure against dividing the runtime logic into multiple modules. However, in this case the benefits of splitting things up outweigh the costs, provided that we take certain precautions against initialization and entry-point races.

We also expect, although it's beyond the scope of this guide, that these runtime modules will exist alongside various other modules. This has two facets to consider. First, even if the modules that we describe here don't invoke each others' entry points or routines during initialization, we still have to protect against those other modules doing that. Second, some of those modules are expected to provide governance capabilities for the chain. Configuration exposed by parachain-host modules is mostly for the benefit of these governance modules, to allow the operators or community of the chain to tweak parameters.

The runtime's primary role is to manage scheduling and updating of parachains and parathreads, as well as handling misbehavior reports and slashing. This guide doesn't focus on how parachains or parathreads are registered, only that they are. Also, this runtime description assumes that validator sets are selected somehow, but doesn't assume any other details than a periodic session change event. Session changes give information about the incoming validator set and the validator set of the following session.

The runtime also serves another role, which is to make data available to the Node-side logic via Runtime APIs. These Runtime APIs should be sufficient for the Node-side code to author blocks correctly.

There is some functionality of the relay chain relating to parachains that we also consider beyond the scope of this document. In particular, all modules related to how parachains are registered aren't part of this guide, although we do provide routines that should be called by the registration process.

We will split the logic of the runtime up into these modules:

Initializer: manage initialization order of the other modules.
Configuration: manage configuration and configuration updates in a non-racy manner.
Paras: manage chain-head and validation code for parachains and parathreads.
Scheduler: manages parachain and parathread scheduling as well as validator assignments.
Inclusion: handles the inclusion and availability of scheduled parachains and parathreads.
Validity: handles secondary checks and dispute resolution for included, available parablocks.

The Initializer module is special - it's responsible for handling the initialization logic of the other modules to ensure that the correct initialization order and related invariants are maintained. The other modules won't specify any on-initialize logic, but will instead expose a special semi-private routine that the Initializer module will call. The other modules are relatively straightforward and perform the roles described above.

The Parachain Host operates under a changing set of validators. Time is split up into periodic sessions, where each session brings a potentially new set of validators. Sessions are buffered by one, meaning that the validators of the upcoming session are fixed and always known. Parachain Host runtime modules need to react to changes in the validator set, as it will affect the runtime logic for processing candidate backing, availability bitfields, and misbehavior reports. The Parachain Host modules can't determine ahead-of-time exactly when session change notifications are going to happen within the block (note: this depends on module initialization order again - better to put session before parachains modules). Ideally, session changes are always handled before initialization. It is clearly a problem if we compute validator assignments to parachains during initialization and then the set of validators changes. In the best case, we can recognize that re-initialization needs to be done. In the worst case, bugs would occur.

There are 3 main ways that we can handle this issue:

Establish an invariant that session change notifications always happen after initialization. This means that when we receive a session change notification before initialization, we call the initialization routines before handling the session change.
Require that session change notifications always occur before initialization. Brick the chain if session change notifications ever happen after initialization.
Handle both the before and after cases.

Although option 3 is the most comprehensive, it runs counter to our goal of simplicity. Option 1 means requiring the runtime to do redundant work at all sessions and will also mean, like option 3, designing things in such a way that initialization can be rolled back and reapplied under the new environment. That leaves option 2, although it is a "nuclear" option in a way and requires us to constrain the parachain host to only run in full runtimes with a certain order of operations.

So the other role of the initializer module is to forward session change notifications to modules in the initialization order, throwing an unrecoverable error if the notification is received after initialization. Session change is the point at which the configuration module updates the configuration. Most of the other modules will handle changes in the configuration during their session change operation, so the initializer should provide both the old and new configuration to all the other modules alongside the session change notification. This means that a session change notification should consist of the following data:

struct SessionChangeNotification {
	// The new validators in the session.
	validators: Vec<ValidatorId>,
	// The validators for the next session.
	queued: Vec<ValidatorId>,
	// The configuration before handling the session change.
	prev_config: HostConfiguration,
	// The configuration after handling the session change.
	new_config: HostConfiguration,
	// A secure random seed for the session, gathered from BABE.
	random_seed: [u8; 32],
}

[REVIEW: other options? arguments in favor of going for options 1 or 3 instead of 2. we could do a "soft" version of 2 where we note that the chain is potentially broken due to bad initialization order]

[TODO Diagram: order of runtime operations (initialization, session change)]

The Initializer Module

Description

This module is responsible for initializing the other modules in a deterministic order. It also has one other purpose as described above: accepting and forwarding session change notifications.

Storage

HasInitialized: bool

Initialization

The other modules are initialized in this order:

Configuration
Paras
Scheduler
Inclusion
Validity

The configuration module is first, since all other modules need to operate under the same configuration as each other. It would lead to inconsistency if, for example, the scheduler ran first and then the configuration was updated before the Inclusion module.

Set HasInitialized to true.

Session Change

If HasInitialized is true, throw an unrecoverable error (panic). Otherwise, forward the session change notification to other modules in initialization order.

Finalization

Finalization order is less important in this case than initialization order, so we finalize the modules in the reverse order from initialization.

Set HasInitialized to false.
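
The interplay between the three hooks above can be sketched as follows. This is a minimal model and not the runtime module itself: the `Initializer` struct, the `log` field, and the `MODULES` constant are hypothetical names introduced for illustration, and the other modules are represented only by their names.

```rust
/// Initialization order as listed in the text above.
const MODULES: [&str; 5] = ["Configuration", "Paras", "Scheduler", "Inclusion", "Validity"];

/// Hypothetical stand-in for the Initializer module.
#[derive(Default)]
pub struct Initializer {
    /// Mirrors the `HasInitialized` storage item.
    has_initialized: bool,
    /// Records which hook touched which module, for illustration only.
    pub log: Vec<String>,
}

impl Initializer {
    /// Forward a session change to every module in initialization order.
    /// Panics if initialization has already run in this block.
    pub fn on_session_change(&mut self) {
        assert!(
            !self.has_initialized,
            "session change received after initialization"
        );
        for m in MODULES.iter() {
            self.log.push(format!("session_change:{}", m));
        }
    }

    /// Initialize the modules in order, then set the flag.
    pub fn initialize(&mut self) {
        for m in MODULES.iter() {
            self.log.push(format!("init:{}", m));
        }
        self.has_initialized = true;
    }

    /// Finalize in reverse order, then clear the flag for the next block.
    pub fn finalize(&mut self) {
        for m in MODULES.iter().rev() {
            self.log.push(format!("finalize:{}", m));
        }
        self.has_initialized = false;
    }
}
```

In this sketch a block that calls `on_session_change`, `initialize`, and `finalize` in that order succeeds, while calling `on_session_change` after `initialize` panics, which corresponds to option 2 above.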

The Configuration Module

Description

This module is responsible for managing all configuration of the parachain host in-flight. It provides a central point for configuration updates to prevent races between configuration changes and parachain-processing logic. Configuration can only change during the session change routine, and as this module handles the session change notification first it provides an invariant that the configuration does not change throughout the entire session. Both the scheduler and inclusion modules rely on this invariant to ensure proper behavior of the scheduler.

The configuration that we will be tracking is the HostConfiguration struct.

Storage

The configuration module is responsible for two main pieces of storage.

/// The current configuration to be used.
Configuration: HostConfiguration;
/// A pending configuration to be applied on session change.
PendingConfiguration: Option<HostConfiguration>;

Session change

The session change routine for the Configuration module is simple. If the PendingConfiguration is Some, take its value and set Configuration to be equal to it. Reset PendingConfiguration to None.

Routines

/// Get the host configuration.
pub fn configuration() -> HostConfiguration {
  Configuration::get()
}

/// Update the pending configuration to be applied later.
fn update_configuration(f: impl FnOnce(&mut HostConfiguration)) {
  PendingConfiguration::mutate(|pending| {
    // `pending` is a mutable reference, so take the value out instead of moving through the borrow.
    let mut x = pending.take().unwrap_or_else(Self::configuration);
    f(&mut x);
    *pending = Some(x);
  })
}
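
The session change routine and the two routines above can be modeled together outside of a runtime. The following is a sketch under stated assumptions: `ConfigurationModule` is a hypothetical plain struct standing in for the two storage items, and `HostConfiguration` is reduced to a single `parathread_cores` field whose type is assumed.

```rust
/// Reduced stand-in for the real `HostConfiguration` (one field, type assumed).
#[derive(Clone, Debug, PartialEq)]
pub struct HostConfiguration {
    pub parathread_cores: u32,
}

/// Hypothetical in-memory model of the Configuration module's storage.
pub struct ConfigurationModule {
    configuration: HostConfiguration,
    pending: Option<HostConfiguration>,
}

impl ConfigurationModule {
    pub fn new(initial: HostConfiguration) -> Self {
        ConfigurationModule { configuration: initial, pending: None }
    }

    /// Get the current host configuration.
    pub fn configuration(&self) -> HostConfiguration {
        self.configuration.clone()
    }

    /// Stage a change. Starts from the already-pending value if there is one,
    /// otherwise from the current configuration.
    pub fn update_configuration(&mut self, f: impl FnOnce(&mut HostConfiguration)) {
        let mut x = self
            .pending
            .take()
            .unwrap_or_else(|| self.configuration.clone());
        f(&mut x);
        self.pending = Some(x);
    }

    /// Apply the pending configuration, if any, and reset it to `None`.
    pub fn on_session_change(&mut self) {
        if let Some(pending) = self.pending.take() {
            self.configuration = pending;
        }
    }
}
```

A staged update is not visible through `configuration()` until `on_session_change` runs, which is the invariant the scheduler and inclusion modules rely on.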

Entry-points

The Configuration module exposes an entry point for each configuration member. These entry-points accept calls only from governance origins. These entry-points will use the update_configuration routine to update the specific configuration field.

The Paras Module

Description

The Paras module is responsible for storing information on parachains and parathreads. Registered parachains and parathreads cannot change except at session boundaries. This is primarily to ensure that the number of bits required for the availability bitfields does not change except at session boundaries.

It's also responsible for managing parachain validation code upgrades as well as maintaining availability of old parachain code and its pruning.

Storage

Utility structs:

// the two key times necessary to track for every code replacement.
pub struct ReplacementTimes {
	/// The relay-chain block number that the code upgrade was expected to be activated.
	/// This is when the code change occurs from the para's perspective - after the
	/// first parablock included with a relay-parent with number >= this value.
	expected_at: BlockNumber,
	/// The relay-chain block number at which the parablock activating the code upgrade was
	/// actually included. This means considered included and available, so this is the time at which
	/// that parablock enters the acceptance period in this fork of the relay-chain.
	activated_at: BlockNumber,
}

/// Metadata used to track previous parachain validation code that we keep in
/// the state.
pub struct ParaPastCodeMeta {
	// Block numbers where the code was expected to be replaced and where the code
	// was actually replaced, respectively. The first is used to do accurate lookups
	// of historic code in historic contexts, whereas the second is used to do
	// pruning on an accurate timeframe. These can be used as indices
	// into the `PastCode` map along with the `ParaId` to fetch the code itself.
	upgrade_times: Vec<ReplacementTimes>,
	// This tracks the highest pruned code-replacement, if any.
	last_pruned: Option<BlockNumber>,
}

enum UseCodeAt {
	// Use the current code.
	Current,
	// Use the code that was replaced at the given block number.
	ReplacedAt(BlockNumber),
}

struct ParaGenesisArgs {
  /// The initial head-data to use.
  genesis_head: HeadData,
  /// The validation code to start with.
  validation_code: ValidationCode,
  /// True if parachain, false if parathread.
  parachain: bool,
}

Storage layout:

/// All parachains. Ordered ascending by ParaId. Parathreads are not included.
Parachains: Vec<ParaId>,
/// The head-data of every registered para.
Heads: map ParaId => Option<HeadData>;
/// The validation code of every live para.
ValidationCode: map ParaId => Option<ValidationCode>;
/// Actual past code, indicated by the para id as well as the block number at which it became outdated.
PastCode: map (ParaId, BlockNumber) => Option<ValidationCode>;
/// Past code of parachains. The parachains themselves may not be registered anymore,
/// but we also keep their code on-chain for the same amount of time as outdated code
/// to keep it available for secondary checkers.
PastCodeMeta: map ParaId => ParaPastCodeMeta;
/// Which paras have past code that needs pruning and the relay-chain block at which the code was replaced.
/// Note that this is the actual height of the included block, not the expected height at which the
/// code upgrade would be applied, although they may be equal.
/// This is to ensure the entire acceptance period is covered, not an offset acceptance period starting
/// from the time at which the parachain perceives a code upgrade as having occurred.
/// Multiple entries for a single para are permitted. Ordered ascending by block number.
PastCodePruning: Vec<(ParaId, BlockNumber)>;
/// The block number at which the planned code change is expected for a para.
/// The change will be applied after the first parablock for this ID included which executes
/// in the context of a relay chain block with a number >= `expected_at`.
FutureCodeUpgrades: map ParaId => Option<BlockNumber>;
/// The actual future code of a para.
FutureCode: map ParaId => Option<ValidationCode>;

/// Upcoming paras (chains and threads). These are only updated on session change. Corresponds to an
/// entry in the upcoming-genesis map.
UpcomingParas: Vec<ParaId>;
/// Upcoming paras instantiation arguments.
UpcomingParasGenesis: map ParaId => Option<ParaGenesisArgs>;
/// Paras that are to be cleaned up at the end of the session.
OutgoingParas: Vec<ParaId>;

Session Change

1. Clean up outgoing paras. This means removing the entries under Heads, ValidationCode, FutureCodeUpgrades, and FutureCode. A corresponding entry should be added to PastCode, PastCodeMeta, and PastCodePruning using the outgoing ParaId and the removed ValidationCode value. This is because any outdated validation code must remain available on-chain for a set number of blocks, and validation code outdated by de-registering the para is still subject to that invariant.
2. Apply all incoming paras by initializing the Heads and ValidationCode using the genesis parameters.
3. Amend the Parachains list to reflect changes in registered parachains.

Initialization

Do pruning based on all entries in PastCodePruning with BlockNumber <= now. Update the corresponding PastCodeMeta and PastCode accordingly.
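
Because PastCodePruning is ordered ascending by block number, the entries that are due always form a prefix of the list. The sketch below takes the comparison in the sentence above literally (`BlockNumber <= now`) and leaves out both the PastCodeMeta update and any acceptance-period offset. The type aliases and the `prune_past_code` function are hypothetical, and the storage items are modeled as plain collections.

```rust
use std::collections::BTreeMap;

// Assumed stand-ins for the real types.
type ParaId = u32;
type BlockNumber = u32;
type ValidationCode = Vec<u8>;

/// Remove every pruning entry whose block number is `<= now`, together with
/// the matching `PastCode` entry. Returns the keys that were pruned.
pub fn prune_past_code(
    pruning: &mut Vec<(ParaId, BlockNumber)>,
    past_code: &mut BTreeMap<(ParaId, BlockNumber), ValidationCode>,
    now: BlockNumber,
) -> Vec<(ParaId, BlockNumber)> {
    // The list is sorted ascending by block number, so stop at the first
    // entry that is not yet due.
    let due = pruning.iter().take_while(|(_, at)| *at <= now).count();
    let pruned: Vec<(ParaId, BlockNumber)> = pruning.drain(..due).collect();
    for key in &pruned {
        past_code.remove(key);
    }
    pruned
}
```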

Routines

schedule_para_initialize(ParaId, ParaGenesisArgs): schedule a para to be initialized at the next session.
schedule_para_cleanup(ParaId): schedule a para to be cleaned up at the next session.
schedule_code_upgrade(ParaId, ValidationCode, expected_at: BlockNumber): Schedule a future code upgrade of the given parachain, to be applied after inclusion of a block of the same parachain executed in the context of a relay-chain block with number >= expected_at.
note_new_head(ParaId, HeadData, BlockNumber): note that a para has progressed to a new head, where the new head was executed in the context of a relay-chain block with given number. This will apply pending code upgrades based on the block number provided.
validation_code_at(ParaId, at: BlockNumber, assume_intermediate: Option<BlockNumber>): Fetches the validation code to be used when validating a block in the context of the given relay-chain height. A second block number parameter may be used to tell the lookup to proceed as if an intermediate parablock has been included at the given relay-chain height. This may return past, current, or (with certain choices of assume_intermediate) future code. assume_intermediate, if provided, must be before at. If the validation code has been pruned, this will return None.

Finalization

No finalization routine runs for this module.

The Scheduler Module

Description

[TODO: this section is still heavily under construction. key questions about availability cores and validator assignment are still open and the flow of the section may be contradictory or inconsistent]

The Scheduler module is responsible for two main tasks:

Partitioning validators into groups and assigning groups to parachains and parathreads.
Scheduling parachains and parathreads.

It aims to achieve these tasks with these goals in mind:

It should be possible to know at least a block ahead-of-time, ideally more, which validators are going to be assigned to which parachains.
Parachains that have a candidate pending availability in this fork of the chain should not be assigned.
Validator assignments should not be gameable. Malicious cartels should not be able to manipulate the scheduler to assign themselves as desired.
High or close to optimal throughput of parachains and parathreads. Work among validator groups should be balanced.

The Scheduler manages resource allocation using the concept of "Availability Cores". There will be one availability core for each parachain, and a fixed number of cores used for multiplexing parathreads. Validators will be partitioned into groups, with the same number of groups as availability cores. Validator groups will be assigned to different availability cores over time.

An availability core can exist in either one of two states at the beginning or end of a block: free or occupied. A free availability core can have a parachain or parathread assigned to it for the potential to have a backed candidate included. After inclusion, the core enters the occupied state as the backed candidate is pending availability. There is an important distinction: a core is not considered occupied until it is in charge of a block pending availability, although the implementation may treat scheduled cores the same as occupied ones for brevity. A core exits the occupied state when the candidate is no longer pending availability - either on timeout or on availability. A core starting in the occupied state can move to the free state and back to occupied all within a single block, as availability bitfields are processed before backed candidates. At the end of the block, there is a possible timeout on availability which can move the core back to the free state if occupied.

Availability Core State Machine

              Assignment &
              Backing
+-----------+              +-----------+
|           +-------------->           |
|  Free     |              | Occupied  |
|           <--------------+           |
+-----------+ Availability +-----------+
              or Timeout
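
The two-state machine above can be written down as a small transition function. This is only an illustrative sketch: `CoreState`, `CoreEvent`, and `transition` are hypothetical names, and events that do not apply to the current state are assumed to leave it unchanged.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CoreState {
    Free,
    Occupied,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CoreEvent {
    /// A para was assigned to the core and its candidate was backed.
    Backed,
    /// The pending candidate became available.
    Available,
    /// The pending candidate timed out on availability.
    TimedOut,
}

/// Apply one event to a core. Events that make no sense in the current
/// state are ignored here (an assumption of this sketch).
pub fn transition(state: CoreState, event: CoreEvent) -> CoreState {
    match (state, event) {
        (CoreState::Free, CoreEvent::Backed) => CoreState::Occupied,
        (CoreState::Occupied, CoreEvent::Available) => CoreState::Free,
        (CoreState::Occupied, CoreEvent::TimedOut) => CoreState::Free,
        (unchanged, _) => unchanged,
    }
}
```

Folding `[Available, Backed]` over an occupied core ends in the occupied state again, which is the "free and back to occupied within a single block" case described earlier.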

Availability Core Transitions within Block

              +-----------+                |                    +-----------+
              |           |                |                    |           |
              | Free      |                |                    | Occupied  |
              |           |                |                    |           |
              +--/-----\--+                |                    +--/-----\--+
               /-       -\                 |                     /-       -\
 No Backing  /-           \ Backing        |      Availability /-           \ No availability
           /-              \               |                  /              \
         /-                 -\             |                /-                -\
  +-----v-----+         +----v------+      |         +-----v-----+        +-----v-----+
  |           |         |           |      |         |           |        |           |
  | Free      |         | Occupied  |      |         | Free      |        | Occupied  |
  |           |         |           |      |         |           |        |           |
  +-----------+         +-----------+      |         +-----|---\-+        +-----|-----+
                                           |               |    \               |
                                           |    No backing |     \ Backing      | (no change)
                                           |               |      -\            |
                                           |         +-----v-----+  \     +-----v-----+
                                           |         |           |   \    |           |
                                           |         | Free      -----+---> Occupied  |
                                           |         |           |        |           |
                                           |         +-----------+        +-----------+
                                           |                 Availability Timeout

Validator group assignments do not need to change very quickly. The security benefits of fast rotation are redundant with the challenge mechanism in the Validity module. Because of this, we only divide validators into groups at the beginning of the session and do not shuffle membership during the session. However, we do take steps to ensure that no particular validator group has dominance over a single parachain or parathread-multiplexer for an entire session to provide better guarantees of liveness.

Validator groups rotate across availability cores in a round-robin fashion, with rotation occurring at fixed intervals. The i'th group will be assigned to the (i+k)%n'th core at any point in time, where k is the number of rotations that have occurred in the session, and n is the number of cores. This makes upcoming rotations within the same session predictable.
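
As a worked form of the rotation rule, the sketch below computes `k` from the block number and then the core for a group. The `rotation_frequency` parameter (blocks between rotations) is a hypothetical name for the "fixed interval" mentioned above, and block numbers, group indices, and core indices are all assumed to fit in `u32`.

```rust
/// Number of rotations `k` that have occurred since the session started.
/// `rotation_frequency` is a hypothetical parameter and must be non-zero,
/// otherwise the division panics.
pub fn rotations_since_session_start(
    now: u32,
    session_start: u32,
    rotation_frequency: u32,
) -> u32 {
    now.saturating_sub(session_start) / rotation_frequency
}

/// Core assigned to group `i` after `k` rotations with `n` cores: (i + k) % n.
/// `n_cores` must be non-zero.
pub fn core_for_group(group: u32, rotations: u32, n_cores: u32) -> u32 {
    // Reduce both terms first so the addition cannot overflow.
    ((group % n_cores) + (rotations % n_cores)) % n_cores
}
```

With 4 cores, group 3 sits on core 3 before any rotation and moves to core 0 after the first one.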

When a rotation occurs, validator groups are still responsible for distributing availability pieces for any previous cores that are still occupied and pending availability. In practice, rotation and availability-timeout frequencies should be set so this will only be the core they have just been rotated from. It is possible that a validator group is rotated onto a core which is currently occupied. In this case, the validator group will have nothing to do until the previously-assigned group finishes their availability work and frees the core or the availability process times out. Depending on whether the core is for a parachain or parathread, a different timeout t from the HostConfiguration will apply. Availability timeouts should only be triggered in the first t-1 blocks after the beginning of a rotation.

Parathreads operate on a system of claims. Collators participate in auctions to stake a claim on authoring the next block of a parathread, although the auction mechanism is beyond the scope of the scheduler. The scheduler guarantees that they'll be given at least a certain number of attempts to author a candidate that is backed. Attempts that fail during the availability phase are not counted, since ensuring availability at that stage is the responsibility of the backing validators, not of the collator. When a claim is accepted, it is placed into a queue of claims, and each claim is assigned to a particular parathread-multiplexing core in advance. Given that the current assignments of validator groups to cores are known, and the upcoming assignments are predictable, it is possible for parathread collators to know who they should be talking to now and who they should begin establishing connections with as a fallback.

With this information, the Node-side can be aware of which parathreads have a good chance of being includable within the relay-chain block and can focus any additional resources on backing candidates from those parathreads. Furthermore, Node-side code is aware of which validator group will be responsible for that thread. If the necessary conditions are reached for core reassignment, those candidates can be backed within the same block as the core being freed.

Parathread claims, when scheduled onto a free core, may not result in a block pending availability. This may be due to collator error, networking timeout, or censorship by the validator group. In this case, the claims should be retried a certain number of times to give the collator a fair shot.

Cores are treated as an ordered list and are typically referred to by their index in that list.

Storage

Utility structs:

// A claim on authoring the next block for a given parathread.
struct ParathreadClaim(ParaId, CollatorId);

// An entry tracking a claim to ensure it does not pass the maximum number of retries.
struct ParathreadEntry {
  claim: ParathreadClaim,
  retries: u32,
}

// A queued parathread entry, pre-assigned to a core.
struct QueuedParathread {
	claim: ParathreadEntry,
	core: CoreIndex,
}

struct ParathreadQueue {
	queue: Vec<QueuedParathread>,
	// this value is between 0 and config.parathread_cores
	next_core: CoreIndex,
}

enum CoreOccupied {
  Parathread(ParathreadEntry), // claim & retries
  Parachain,
}

struct CoreAssignment {
  core: CoreIndex,
  para_id: ParaId,
  collator: Option<CollatorId>,
  group_idx: GroupIndex,
}

Storage layout:

/// All the validator groups. One for each core.
ValidatorGroups: Vec<Vec<ValidatorIndex>>;
/// A queue of upcoming claims and which core they should be mapped onto.
ParathreadQueue: ParathreadQueue;
/// One entry for each availability core. Entries are `None` if the core is not currently occupied. Can be
/// temporarily `Some` if scheduled but not occupied.
/// The i'th parachain belongs to the i'th core, with the remaining cores all being
/// parathread-multiplexers.
AvailabilityCores: Vec<Option<CoreOccupied>>;
/// An index used to ensure that only one claim on a parathread exists in the queue or is
/// currently being handled by an occupied core.
ParathreadClaimIndex: Vec<ParaId>;
/// The block number where the session start occurred. Used to track how many group rotations have occurred.
SessionStartBlock: BlockNumber;
/// Currently scheduled cores - free but up to be occupied. Ephemeral storage item that's wiped on finalization.
Scheduled: Vec<CoreAssignment>, // sorted ascending by CoreIndex.

Session Change

Session changes are the only time that configuration can change, and the configuration module's session-change logic is handled before this module's. We also lean on the behavior of the inclusion module which clears all its occupied cores on session change. Thus we don't have to worry about cores being occupied across session boundaries and it is safe to re-size the AvailabilityCores list.

Actions:

Set SessionStartBlock to current block number.
Clear all Some members of AvailabilityCores. Return all parathread claims to queue with retries un-incremented.
Set configuration = Configuration::configuration() (see HostConfiguration)
Resize AvailabilityCores to have length Paras::parachains().len() + configuration.parathread_cores with all None entries.
Compute new validator groups by shuffling using a secure randomness beacon
- We need a total of N = Paras::parachains().len() + configuration.parathread_cores validator groups, one for each availability core.
- The total number of validators V in the SessionChangeNotification's validators may not be evenly divisible by N.
- First, we obtain "shuffled validators" SV by shuffling the validators using the SessionChangeNotification's random seed.
- The groups are selected by partitioning SV. The first V % N groups will have (V / N) + 1 members, while the remaining groups will have (V / N) members each.
Prune the parathread queue to remove all retries beyond configuration.parathread_retries.
- all pruned claims should have their entry removed from the parathread index.
- Assign all non-pruned claims to new cores if the number of parathread cores has changed between the prev_config and new_config of the SessionChangeNotification.
- Assign claims in equal balance across all cores if rebalancing, and set the next_core of the ParathreadQueue by incrementing the relative index of the last assigned core and taking it modulo the number of parathread cores.
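
The partitioning rule for validator groups in the list above can be sketched as follows. The function takes validators that have already been shuffled (the shuffle with the session's random seed is out of scope here) and splits them so that the first `len % n_groups` groups receive one extra member. The name `partition_groups` is hypothetical.

```rust
/// Split already-shuffled validators into `n_groups` contiguous groups.
/// With `v = shuffled.len()`, the first `v % n_groups` groups get
/// `v / n_groups + 1` members and the rest get `v / n_groups`.
pub fn partition_groups<T: Clone>(shuffled: &[T], n_groups: usize) -> Vec<Vec<T>> {
    if n_groups == 0 {
        return Vec::new();
    }
    let base = shuffled.len() / n_groups;
    let larger = shuffled.len() % n_groups;

    let mut groups = Vec::with_capacity(n_groups);
    let mut start = 0;
    for i in 0..n_groups {
        let len = if i < larger { base + 1 } else { base };
        groups.push(shuffled[start..start + len].to_vec());
        start += len;
    }
    groups
}
```

For 10 validators and 3 groups this yields group sizes 4, 3, and 3. If there are fewer validators than groups, the trailing groups are empty in this sketch.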

Initialization

Schedule free cores by calling schedule(Vec::new()).

Finalization

Actions:

Free all scheduled cores and return parathread claims to queue, with retries incremented.

Routines

add_parathread_claim(ParathreadClaim): Add a parathread claim to the queue.
- Fails if any parathread claim on the same parathread is currently indexed.
- Fails if the queue length is >= config.scheduling_lookahead * config.parathread_cores.
- The core used for the parathread claim is obtained by taking the next_core field of the ParathreadQueue and adding Paras::parachains().len() to it.
- next_core is then updated by adding 1 and taking it modulo config.parathread_cores.
- The claim is then added to the claim index.
schedule(Vec<CoreIndex>): schedule new core assignments, with a parameter indicating previously-occupied cores which are to be considered returned.
- All freed parachain cores should be assigned to their respective parachain
- All freed parathread cores should have the claim removed from the claim index.
- All freed parathread cores should take the next parathread entry from the queue.
- The i'th validator group will be assigned to the (i+k)%n'th core at any point in time, where k is the number of rotations that have occurred in the session, and n is the total number of cores. This makes upcoming rotations within the same session predictable.
scheduled() -> Vec<CoreAssignment>: Get currently scheduled core assignments.
occupied(Vec<CoreIndex>): note that the given cores have become occupied.
- Fails if any given cores were not scheduled.
- Fails if the given cores are not sorted ascending by core index
- This clears them from Scheduled and marks each corresponding core in the AvailabilityCores as occupied.
- Since both the availability cores and the newly-occupied cores lists are sorted ascending, this method can be implemented efficiently.
core_para(CoreIndex) -> ParaId: return the currently-scheduled or occupied ParaId for the given core.
group_validators(GroupIndex) -> Option<Vec<ValidatorIndex>>: return all validators in a given group, if the group index is valid for this session.
availability_timeout_predicate() -> Option<impl Fn(CoreIndex, BlockNumber) -> bool>: returns an optional predicate that should be used for timing out occupied cores. if None, no timing-out should be done. The predicate accepts the index of the core, and the block number since which it has been occupied. The predicate should be implemented based on the time since the last validator group rotation, and the respective parachain and parathread timeouts, i.e. only within max(config.chain_availability_period, config.thread_availability_period) of the last rotation would this return Some.
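
As an illustration of the add_parathread_claim routine listed above, the sketch below models the queue, the claim index, and the core assignment with plain vectors. It is a simplification under stated assumptions: the `Scheduler` struct here is hypothetical, the collator and retry count of a claim are omitted, and the configuration values are passed in directly rather than read from the Configuration module.

```rust
// Assumed stand-ins for the real types.
type ParaId = u32;
type CoreIndex = u32;

/// Hypothetical in-memory model of the scheduler state needed for claims.
pub struct Scheduler {
    queue: Vec<(ParaId, CoreIndex)>,
    next_core: CoreIndex,
    claim_index: Vec<ParaId>,
    n_parachains: u32,
    parathread_cores: u32,
    scheduling_lookahead: u32,
}

impl Scheduler {
    pub fn new(n_parachains: u32, parathread_cores: u32, scheduling_lookahead: u32) -> Self {
        Scheduler {
            queue: Vec::new(),
            next_core: 0,
            claim_index: Vec::new(),
            n_parachains,
            parathread_cores,
            scheduling_lookahead,
        }
    }

    /// Queue a claim for `para` and return the absolute core index it was assigned.
    pub fn add_parathread_claim(&mut self, para: ParaId) -> Result<CoreIndex, &'static str> {
        if self.claim_index.contains(&para) {
            return Err("a claim for this parathread is already indexed");
        }
        // With zero parathread cores this limit is zero, so we return here
        // and never reach the modulo below.
        let max_len = (self.scheduling_lookahead * self.parathread_cores) as usize;
        if self.queue.len() >= max_len {
            return Err("parathread queue is full");
        }
        // Parathread cores come after the parachain cores in the core list.
        let core = self.next_core + self.n_parachains;
        self.queue.push((para, core));
        self.next_core = (self.next_core + 1) % self.parathread_cores;
        self.claim_index.push(para);
        Ok(core)
    }
}
```

With 3 parachains and 2 parathread cores, successive claims land on cores 3, 4, 3, 4, and further claims fail once the queue holds `scheduling_lookahead * parathread_cores` entries.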

The Inclusion Module

Description

The inclusion module is responsible for inclusion and availability of scheduled parachains and parathreads.

Storage

Helper structs:

struct AvailabilityBitfield {
  bitfield: BitVec, // one bit per core.
  submitted_at: BlockNumber, // for accounting, as meaning of bits may change over time.
}

struct CandidatePendingAvailability {
  core: CoreIndex, // availability core
  receipt: AbridgedCandidateReceipt,
  availability_votes: Bitfield, // one bit per validator.
  relay_parent_number: BlockNumber, // number of the relay-parent.
  backed_in_number: BlockNumber,
}

Storage Layout:

/// The latest bitfield for each validator, referred to by index.
bitfields: map ValidatorIndex => AvailabilityBitfield;
/// Candidates pending availability.
PendingAvailability: map ParaId => CandidatePendingAvailability;

[TODO: CandidateReceipt and AbridgedCandidateReceipt can contain code upgrades which make them very large. the code entries should be split into a different storage map with infrequent access patterns]

Session Change

Clear out all candidates pending availability.
Clear out all validator bitfields.

Routines

All failed checks should lead to an unrecoverable error making the block invalid.

process_bitfields(Bitfields, core_lookup: Fn(CoreIndex) -> Option<ParaId>):
1. check that the number of bitfields and bits in each bitfield is correct.
2. check that there are no duplicates
3. check all validator signatures.
4. apply each bit of bitfield to the corresponding pending candidate, looking up parathread cores using the core_lookup. Disregard bitfields that have a 1 bit for any free cores.
5. For each applied bit of each availability-bitfield, set the bit for the validator in the CandidatePendingAvailability's availability_votes bitfield. Track all candidates that now have >2/3 of bits set in their availability_votes. These candidates are now available and can be enacted.
6. For all now-available candidates, invoke the enact_candidate routine with the candidate and relay-parent number.
7. [TODO] pass it onwards to Validity module.
8. Return a list of freed cores consisting of the cores where candidates have become available.
process_candidates(BackedCandidates, scheduled: Vec<CoreAssignment>):
1. check that each candidate corresponds to a scheduled core and that they are ordered in ascending order by ParaId.
2. Ensure that any code upgrade scheduled by the candidate does not happen within config.validation_upgrade_frequency of the currently scheduled upgrade, if any, comparing against the value of Paras::FutureCodeUpgrades for the given para ID.
3. check the backing of the candidate using the signatures and the bitfields.
4. create an entry in the PendingAvailability map for each backed candidate with a blank availability_votes bitfield.
5. Return a Vec<CoreIndex> of all scheduled cores of the list of passed assignments that a candidate was successfully backed for, sorted ascending by CoreIndex.
enact_candidate(relay_parent_number: BlockNumber, AbridgedCandidateReceipt):
1. If the receipt contains a code upgrade, call Paras::schedule_code_upgrade(para_id, code, relay_parent_number + config.validation_upgrade_delay). [TODO] Note that this is safe as long as we never enact candidates where the relay parent is across a session boundary. In that case, which we should be careful to avoid with contextual execution, the configuration might have changed and the para may de-sync from the host's understanding of it.
2. Call Paras::note_new_head using the HeadData from the receipt and relay_parent_number.
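
The ">2/3 of bits set" check in step 5 of process_bitfields can be done in integer arithmetic without rounding. The sketch below models the availability_votes bitfield as a slice of booleans, one per validator; the function names `is_available` and `note_vote` are hypothetical.

```rust
/// True if strictly more than two thirds of the validators have voted.
/// Uses `3 * set > 2 * total` to avoid fractional arithmetic.
pub fn is_available(availability_votes: &[bool]) -> bool {
    let set = availability_votes.iter().filter(|bit| **bit).count();
    3 * set > 2 * availability_votes.len()
}

/// Record a vote from `validator_index` and report whether the candidate
/// is now available. Out-of-range indices are ignored in this sketch.
pub fn note_vote(availability_votes: &mut [bool], validator_index: usize) -> bool {
    if let Some(bit) = availability_votes.get_mut(validator_index) {
        *bit = true;
    }
    is_available(availability_votes)
}
```

With 6 validators, 4 votes are exactly two thirds and therefore not enough; the fifth vote crosses the threshold.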

collect_pending:

  fn collect_pending(f: impl Fn(CoreIndex, BlockNumber) -> bool) -> Vec<u32> {
    // sweep through all paras pending availability. if the predicate returns true, when given the core index and
    // the block number the candidate has been pending availability since, then clean up the corresponding storage for that candidate.
    // return a vector of cleaned-up core IDs.
  }

The InclusionInherent Module

Description

This module is responsible for all the logic carried by the Inclusion entry-point. This entry-point is mandatory, in that it must be invoked exactly once within every block, and it is also "inherent", in that it is provided with no origin by the block author. The data within it carries its own authentication. If any of the steps within fails, the entry-point is considered as having failed and the block will be invalid.

This module does not have the same initialization/finalization concerns as the others, as it only requires that entry points be triggered after all modules have initialized and that finalization happens after entry points are triggered. Both of these are assumptions we have already made about the runtime's order of operations, so this module doesn't need to be initialized or finalized by the Initializer.

Storage

Included: Option<()>,

Finalization

Take (get and clear) the value of Included. If it is not Some, throw an unrecoverable error.

Entry Points

inclusion: This entry-point accepts two parameters: Bitfields and BackedCandidates.
1. The Bitfields are first forwarded to the process_bitfields routine, returning a set of freed cores. Provide a Scheduler::core_para as a core-lookup to the process_bitfields routine.
2. If Scheduler::availability_timeout_predicate is Some, invoke Inclusion::collect_pending using it, and add timed-out cores to the free cores.
3. Invoke Scheduler::schedule(freed)
4. Pass the BackedCandidates along with the output of Scheduler::scheduled to the Inclusion::process_candidates routine, getting a list of all newly-occupied cores.
5. Call Scheduler::occupied for all scheduled cores where a backed candidate was submitted.
6. If all of the above succeeds, set Included to Some(()).
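
The role of the `Included: Option<()>` storage item across the entry-point and finalization can be sketched as below. The numbered steps are abstracted into a closure, since they depend on the Scheduler and Inclusion modules. The `InclusionInherent` struct is a hypothetical model, and rejecting a second invocation within a block is an assumption drawn from the "exactly once" requirement rather than something the steps above spell out.

```rust
/// Hypothetical in-memory model of the InclusionInherent module.
#[derive(Default)]
pub struct InclusionInherent {
    /// Mirrors the `Included` storage item.
    included: Option<()>,
}

impl InclusionInherent {
    /// Run the entry-point. `steps` stands in for steps 1 through 5 above.
    pub fn inclusion(
        &mut self,
        steps: impl FnOnce() -> Result<(), &'static str>,
    ) -> Result<(), &'static str> {
        // Assumption: a second call in the same block is an error.
        if self.included.is_some() {
            return Err("inclusion already invoked in this block");
        }
        steps()?;
        self.included = Some(());
        Ok(())
    }

    /// Take (get and clear) `Included`; panic if the entry-point never ran.
    pub fn finalize(&mut self) {
        self.included
            .take()
            .expect("inclusion entry-point was not invoked in this block");
    }
}
```

Using `Option<()>` rather than `bool` lets `take()` express "get and clear" in a single operation.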

The Validity Module

[TODO: store all included candidate and attestations on them here. accept additional backing after the fact. accept reports based on VRF. candidate included in session S should only be reported on by validator keys from session S. trigger slashing. probably only slash for session S even if the report was submitted in session S+k because it is hard to unify identity]

Architecture: Node-side

Design Goals

Modularity: Components of the system should be as self-contained as possible. Communication boundaries between components should be well-defined and mockable. This is key to creating testable, easily reviewable code.
Minimizing side effects: Components of the system should aim to minimize side effects and to communicate with other components via message-passing.
Operational Safety: The software will be managing signing keys where conflicting messages can lead to large amounts of value being slashed. Care should be taken to ensure that no messages are signed incorrectly or in conflict with each other.

The architecture of the node-side behavior aims to embody the Rust principles of ownership and message-passing to create clean, isolatable code. Each resource should have a single owner, with minimal sharing where unavoidable.

Many operations that need to be carried out involve the network, which is asynchronous. This asynchrony affects all core subsystems that rely on the network as well. The approach of hierarchical state machines is well-suited to this kind of environment.

We introduce a hierarchy of state machines consisting of an overseer supervising subsystems, where Subsystems can contain their own internal hierarchy of jobs. This is elaborated on in the next section on Subsystems.

Subsystems and Jobs

In this section we define the notions of Subsystems and Jobs. These are guidelines for how we will employ an architecture of hierarchical state machines. We'll have a top-level state machine which oversees the next level of state machines which oversee another layer of state machines and so on. The next sections will lay out these guidelines for what we've called subsystems and jobs, since this model applies to many of the tasks that the Node-side behavior needs to encompass, but these are only guidelines and some Subsystems may have deeper hierarchies internally.

Subsystems are long-lived worker tasks that are in charge of performing some particular kind of work. All subsystems can communicate with each other via a well-defined protocol. Subsystems can't communicate directly, but must communicate through an Overseer, which is responsible for relaying messages, handling subsystem failures, and dispatching work signals.

Most work that happens on the Node-side is related to building on top of a specific relay-chain block, which is contextually known as the "relay parent". We call it the relay parent to explicitly denote that it is a block in the relay chain and not on a parachain. We refer to the parent because when we are in the process of building a new block, we don't know what that new block is going to be. The parent block is our only stable point of reference, even though it is usually only useful when it is not yet a parent but in fact a leaf of the block-DAG expected to soon become a parent (because validators are authoring on top of it). Furthermore, we are assuming a forkful blockchain-extension protocol, which means that there may be multiple possible children of the relay-parent. Even if the relay parent has multiple child blocks, the parent of those children is the same, and the context in which those children are authored should be the same. The parent block is the best and most stable reference to use for defining the scope of work items and messages, and is typically referred to by its cryptographic hash.

Since this goal of determining when to start and conclude work relative to a specific relay-parent is common to most, if not all subsystems, it is logically the job of the Overseer to distribute those signals as opposed to each subsystem duplicating that effort, potentially being out of synchronization with each other. Subsystem A should be able to expect that subsystem B is working on the same relay-parents as it is. One of the Overseer's tasks is to provide this heartbeat, or synchronized rhythm, to the system.

The work that subsystems spawn to be done on a specific relay-parent is known as a job. Subsystems should set up and tear down jobs according to the signals received from the overseer. Subsystems may share or cache state between jobs.

Overseer

The overseer is responsible for these tasks:

Setting up, monitoring, and handling failure for overseen subsystems.
Providing a "heartbeat" of which relay-parents subsystems should be working on.
Acting as a message bus between subsystems.

The hierarchy of subsystems:

+--------------+      +------------------+    +--------------------+
|              |      |                  |---->   Subsystem A      |
| Block Import |      |                  |    +--------------------+
|    Events    |------>                  |    +--------------------+
+--------------+      |                  |---->   Subsystem B      |
                      |   Overseer       |    +--------------------+
+--------------+      |                  |    +--------------------+
|              |      |                  |---->   Subsystem C      |
| Finalization |------>                  |    +--------------------+
|    Events    |      |                  |    +--------------------+
|              |      |                  |---->   Subsystem D      |
+--------------+      +------------------+    +--------------------+

The overseer determines work to do based on block import events and block finalization events (TODO: are finalization events needed?). It does this by keeping track of the set of relay-parents for which work is currently being done. This is known as the "active leaves" set. It determines an initial set of active leaves on startup based on the data on-disk, and uses events about blockchain import to update the active leaves. Updates lead to OverseerSignal::StartWork and OverseerSignal::StopWork being sent according to new relay-parents, as well as relay-parents to stop considering.

The overseer's logic can be described with these functions:

On Startup

Start all subsystems
Determine all blocks of the blockchain that should be built on. This should typically be the head of the best fork of the chain we are aware of. Sometimes add recent forks as well.
For each of these blocks, send an OverseerSignal::StartWork to all subsystems.
Begin listening for block import events.

On Block Import Event

Apply the block import event to the active leaves. A new block should lead to its addition to the active leaves set and its parent being deactivated.
For any deactivated leaves send an OverseerSignal::StopWork message to all subsystems.
For any activated leaves send an OverseerSignal::StartWork message to all subsystems.
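
The block-import rule above can be sketched as a small update function over the active leaves set. This is only an illustration under stated assumptions: `ActiveLeaves` and `on_block_import` are hypothetical names, the hash is modelled as a plain 32-byte array, and the parent hash is passed in directly rather than being read from the header of a `BlockImportEvent`.

```rust
use std::collections::HashSet;

/// Stand-in for a relay-chain block hash (assumption: 32 bytes).
type Hash = [u8; 32];

/// Mirrors the `OverseerSignal` type defined later in this guide.
#[derive(Debug, Clone, PartialEq, Eq)]
enum OverseerSignal {
    StartWork(Hash),
    StopWork(Hash),
}

/// Hypothetical container for the overseer's "active leaves" set.
#[derive(Default)]
struct ActiveLeaves {
    leaves: HashSet<Hash>,
}

impl ActiveLeaves {
    /// Apply a block import: deactivate the parent if it was an active leaf,
    /// then activate the new block. Returns the signals to broadcast to all
    /// subsystems, `StopWork` first and `StartWork` second.
    fn on_block_import(&mut self, hash: Hash, parent_hash: Hash) -> Vec<OverseerSignal> {
        let mut signals = Vec::new();
        if self.leaves.remove(&parent_hash) {
            signals.push(OverseerSignal::StopWork(parent_hash));
        }
        if self.leaves.insert(hash) {
            signals.push(OverseerSignal::StartWork(hash));
        }
        signals
    }
}
```

Note that importing a sibling of an existing leaf produces only a `StartWork` signal, because the shared parent has already been deactivated.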

(TODO: in the future, we may want to avoid building on too many sibling blocks at once. the notion of a "preferred head" among many competing sibling blocks would imply changes in our "active set" update rules here)

On Subsystem Failure

Subsystems are essential tasks meant to run as long as the node does. Subsystems can spawn ephemeral work in the form of jobs, but the subsystems themselves should not go down. If a subsystem goes down, it will be because of a critical error that should take the entire node down as well.

Communication Between Subsystems

When a subsystem wants to communicate with another subsystem, or, more typically, a job within a subsystem wants to communicate with its counterpart under another subsystem, that communication must happen via the overseer. Consider this example where a job on subsystem A wants to send a message to its counterpart under subsystem B. This is a realistic scenario, where you can imagine that both jobs correspond to work under the same relay-parent.

     +--------+                                                           +--------+
     |        |                                                           |        |
     |Job A-1 | (sends message)                       (receives message)  |Job B-1 |
     |        |                                                           |        |
     +----|---+                                                           +----^---+
          |                  +------------------------------+                  ^
          v                  |                              |                  |
+---------v---------+        |                              |        +---------|---------+
|                   |        |                              |        |                   |
| Subsystem A       |        |       Overseer / Message     |        | Subsystem B       |
|                   -------->>                  Bus         -------->>                   |
|                   |        |                              |        |                   |
+-------------------+        |                              |        +-------------------+
                             |                              |
                             +------------------------------+

First, the subsystem that spawned a job is responsible for handling the first step of the communication. The overseer is not aware of the hierarchy of tasks within any given subsystem and is only responsible for subsystem-to-subsystem communication. So the sending subsystem must pass on the message via the overseer to the receiving subsystem, in such a way that the receiving subsystem can further address the communication to one of its internal tasks, if necessary.

This communication prevents a certain class of race conditions. When the Overseer determines that it is time for subsystems to begin working on top of a particular relay-parent, it will dispatch a StartWork message to all subsystems to do so, and those messages will be handled asynchronously by those subsystems. Some subsystems will receive those messages before others, and it is important that a message sent by subsystem A after receiving a StartWork message will arrive at subsystem B after its StartWork message. If subsystem A maintained an independent channel with subsystem B to communicate, it would be possible for subsystem B to handle the side message before the StartWork message, but it wouldn't have any logical course of action to take with the side message - leading to it being discarded or improperly handled. Well-architected state machines should have a single source of inputs, so that is what we do here.
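
To make the ordering argument concrete, here is a minimal sketch in which subsystem B has a single inbox carrying both overseer signals and relayed messages. `FromOverseer`, `JobMessage`, and `drain_inbox` are hypothetical names, and a blocking `std::sync::mpsc` channel stands in for whatever asynchronous channel an implementation would really use.

```rust
use std::collections::HashSet;
use std::sync::mpsc::Receiver;

type Hash = [u8; 32];

enum OverseerSignal {
    StartWork(Hash),
    StopWork(Hash),
}

/// Hypothetical envelope: everything a subsystem receives arrives on one channel.
enum FromOverseer<M> {
    Signal(OverseerSignal),
    Communication(M),
}

/// Hypothetical relayed message: (relay_parent, payload).
type JobMessage = (Hash, &'static str);

/// Process everything currently queued, in send order. Returns how many
/// relayed messages found a live relay-parent and how many had to be dropped
/// because no `StartWork` had been seen for their relay-parent.
fn drain_inbox(inbox: &Receiver<FromOverseer<JobMessage>>) -> (usize, usize) {
    let mut active: HashSet<Hash> = HashSet::new();
    let (mut handled, mut dropped) = (0usize, 0usize);
    for item in inbox.try_iter() {
        match item {
            FromOverseer::Signal(OverseerSignal::StartWork(h)) => {
                active.insert(h);
            }
            FromOverseer::Signal(OverseerSignal::StopWork(h)) => {
                active.remove(&h);
            }
            FromOverseer::Communication((relay_parent, _payload)) => {
                if active.contains(&relay_parent) {
                    handled += 1;
                } else {
                    dropped += 1;
                }
            }
        }
    }
    (handled, dropped)
}
```

When the overseer is the only sender, the `StartWork` signal is always queued ahead of any message produced in response to it, so the dropped count stays at zero. A side channel that could deliver the message first would produce exactly the unhandleable case described above.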

It's important to note that the overseer is not aware of the internals of subsystems, and this extends to the jobs that they spawn. The overseer isn't aware of the existence or definition of those jobs, and is only aware of the outer subsystems with which it interacts. This gives subsystem implementations leeway to define internal jobs as they see fit, and to wrap a more complex hierarchy of state machines than having a single layer of jobs for relay-parent-based work. Likewise, subsystems aren't required to spawn jobs. Certain types of subsystems, such as those for shared storage or networking resources, won't perform block-based work but would still benefit from being on the Overseer's message bus. These subsystems can just ignore the overseer's signals for block-based work.

Furthermore, the protocols by which subsystems communicate with each other should be well-defined irrespective of the implementation of the subsystem. In other words, their interface should be distinct from their implementation. This will prevent subsystems from accessing aspects of each other that are beyond the scope of the communication boundary.

Candidate Backing Subsystem

Description

The Candidate Backing subsystem is engaged in by validators to contribute to the backing of parachain candidates submitted by other validators.

Its role is to produce backable candidates for inclusion in new relay-chain blocks. It does so by issuing signed Statements and tracking received statements signed by other validators. Once enough statements are received, they can be combined into backing for specific candidates.

It also detects double-vote misbehavior by validators as it imports votes, passing on the misbehavior to the correct reporter and handler.

When run as a validator, this is the subsystem which actually validates incoming candidates.

Protocol

This subsystem receives messages of the type CandidateBackingSubsystemMessage.

Functionality

The subsystem should maintain a set of handles to Candidate Backing Jobs that are currently live, as well as the relay-parent to which they correspond.

On Overseer Signal

If the signal is an OverseerSignal::StartWork(relay_parent), spawn a Candidate Backing Job with the given relay parent, storing a bidirectional channel with the Candidate Backing Job in the set of handles.
If the signal is an OverseerSignal::StopWork(relay_parent), cease the Candidate Backing Job under that relay parent, if any.

On CandidateBackingSubsystemMessage

If the message corresponds to a particular relay-parent, forward the message to the Candidate Backing Job for that relay-parent, if any is live.
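
The signal and message handling above could be sketched roughly as follows. The names `CandidateBackingSubsystem`, `BackingMessage`, `on_signal`, and `on_message` are hypothetical, the message type is a placeholder for `CandidateBackingSubsystemMessage`, and the channel is one-directional for brevity even though the text calls for a bidirectional one.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type Hash = [u8; 32];

enum OverseerSignal {
    StartWork(Hash),
    StopWork(Hash),
}

/// Placeholder message; only the relay-parent matters for routing here.
struct BackingMessage {
    relay_parent: Hash,
    payload: &'static str,
}

/// Hypothetical subsystem state: one sender per live job, keyed by relay-parent.
#[derive(Default)]
struct CandidateBackingSubsystem {
    jobs: HashMap<Hash, Sender<BackingMessage>>,
}

impl CandidateBackingSubsystem {
    /// On `StartWork`, store a handle and return the receiving end that the
    /// newly spawned job would listen on. On `StopWork`, drop the handle.
    fn on_signal(&mut self, signal: OverseerSignal) -> Option<Receiver<BackingMessage>> {
        match signal {
            OverseerSignal::StartWork(relay_parent) => {
                let (tx, rx) = channel();
                self.jobs.insert(relay_parent, tx);
                Some(rx)
            }
            OverseerSignal::StopWork(relay_parent) => {
                // Dropping the sender closes the channel, which the job can
                // treat as a request to wind down.
                self.jobs.remove(&relay_parent);
                None
            }
        }
    }

    /// Forward a message to the job for its relay-parent.
    /// Returns false when no job is live for that relay-parent.
    fn on_message(&mut self, msg: BackingMessage) -> bool {
        match self.jobs.get(&msg.relay_parent) {
            Some(job) => job.send(msg).is_ok(),
            None => false,
        }
    }
}
```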

(big TODO: "contextual execution"

At the moment we only allow inclusion of new parachain candidates validated by current validators.
Allow inclusion of old parachain candidates validated by current validators.
Allow inclusion of old parachain candidates validated by old validators.

This will probably blur the lines between jobs, will probably require inter-job communication and a short-term memory of recently backable, but not backed candidates. )

Candidate Backing Job

The Candidate Backing Job represents the work a node does for backing candidates with respect to a particular relay-parent.

The goal of a Candidate Backing Job is to produce as many backable candidates as possible. This is done via signed Statements by validators. If a candidate receives a majority of supporting Statements from the Parachain Validators currently assigned, then that candidate is considered backable.
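
As a rough sketch of the backability rule, the check below counts supporting validators within the assigned group. It assumes that "majority" means strictly more than half of the group and that the group contains no duplicate entries; `is_backable` and the `u32` validator index are illustrative choices, not definitions from this guide.

```rust
use std::collections::HashSet;

/// Assumption: validators are identified by a numeric index.
type ValidatorIndex = u32;

/// Returns true when strictly more than half of the assigned group has
/// issued a supporting statement (`Seconded` or `Valid`) for the candidate.
/// Supporters outside the group are ignored.
fn is_backable(group: &[ValidatorIndex], supporters: &HashSet<ValidatorIndex>) -> bool {
    let votes = group.iter().filter(|v| supporters.contains(*v)).count();
    votes * 2 > group.len()
}
```

With a group of five validators, three supporting statements from group members are needed. Statements from validators outside the group do not count.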

on startup

Fetch current validator set, validator -> parachain assignments from runtime API.
Determine if the node controls a key in the current validator set. Call this the local key if so.
If the local key exists, extract the parachain head and validation function for the parachain the local key is assigned to.

on receiving new signed Statement

if let Statement::Seconded(candidate) = signed.statement {
  if candidate is unknown and in local assignment {
    spawn_validation_work(candidate, parachain head, validation function)
  }
}

spawning validation work

fn spawn_validation_work(candidate, parachain head, validation function) {
  asynchronously {
    let pov = (fetch pov block).await;

    // dispatched to sub-process (OS process) pool.
    let valid = validate_candidate(candidate, validation function, parachain head, pov).await;
    if valid {
      // make PoV available for later distribution.
      // sign and dispatch `valid` statement to network if we have not seconded the given candidate.
    } else {
      // sign and dispatch `invalid` statement to network.
    }
  }
}

fetch pov block

Create a (sender, receiver) pair. Dispatch a PovFetchSubsystemMessage(relay_parent, candidate_hash, sender) and listen on the receiver for a response.
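
A minimal sketch of this request/response pattern follows. The tuple shape of `PovFetchSubsystemMessage` is taken from the sentence above. The `PoV` alias, the `request_pov` helper, and the use of a blocking `std::sync::mpsc` channel in place of an asynchronous one-shot channel are assumptions made for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

type Hash = [u8; 32];

/// Placeholder for a proof-of-validity block.
type PoV = Vec<u8>;

/// Shape taken from the text: (relay_parent, candidate_hash, sender).
struct PovFetchSubsystemMessage(Hash, Hash, Sender<PoV>);

/// Build the request and return it together with the receiver that the job
/// listens on. Dispatching the message via the overseer is left to the caller.
fn request_pov(
    relay_parent: Hash,
    candidate_hash: Hash,
) -> (PovFetchSubsystemMessage, Receiver<PoV>) {
    let (sender, receiver) = channel();
    (
        PovFetchSubsystemMessage(relay_parent, candidate_hash, sender),
        receiver,
    )
}
```

The subsystem that answers the request only needs the `Sender` half embedded in the message, so the job never has to know which subsystem served the PoV.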

on receiving CandidateBackingSubsystemMessage

If the message is a CandidateBackingSubsystemMessage::RegisterBackingWatcher, register the watcher and trigger it each time a new candidate is backable. Also trigger it once initially if there are any backable candidates at the time of receipt.
If the message is a CandidateBackingSubsystemMessage::Second, sign and dispatch a Seconded statement only if we have not seconded any other candidate and have not signed a Valid statement for the requested candidate. Signing both a Seconded and Valid message is a double-voting misbehavior with a heavy penalty, and this could occur if another validator has seconded the same candidate and we've received their message before the internal seconding request.
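
The double-vote guard for seconding can be sketched as a small piece of local state. `LocalVotes`, `try_second`, and `try_valid` are hypothetical names; the sketch tracks only candidate hashes and leaves signing and dispatch out entirely.

```rust
use std::collections::HashSet;

type Hash = [u8; 32];

/// Hypothetical record of the statements this node has already signed
/// under one relay-parent.
#[derive(Default)]
struct LocalVotes {
    seconded: Option<Hash>,
    issued_valid: HashSet<Hash>,
}

impl LocalVotes {
    /// Returns true if a `Seconded` statement may be signed for `candidate`:
    /// nothing has been seconded yet and no `Valid` was issued for it.
    fn try_second(&mut self, candidate: Hash) -> bool {
        if self.seconded.is_some() || self.issued_valid.contains(&candidate) {
            return false;
        }
        self.seconded = Some(candidate);
        true
    }

    /// Returns true if a new `Valid` statement should be signed for
    /// `candidate`: we did not second it and have not already voted on it.
    fn try_valid(&mut self, candidate: Hash) -> bool {
        if self.seconded == Some(candidate) {
            return false;
        }
        self.issued_valid.insert(candidate)
    }
}
```

Routing both the internal seconding request and the validation result through the same state is what rules out signing both a Seconded and a Valid statement for one candidate.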

(TODO: send statements to Statement Distribution subsystem, handle shutdown signal from candidate backing subsystem)

[TODO: subsystems for gathering data necessary for block authorship, for networking, for misbehavior reporting, etc.]

Data Structures and Types

[TODO]

CandidateReceipt
CandidateCommitments
AbridgedCandidateReceipt
GlobalValidationSchedule
LocalValidationData (should commit to code hash too?)

Block Import Event

/// Indicates that a new block has been added to the blockchain.
struct BlockImportEvent {
  /// The block header-hash.
  hash: Hash,
  /// The header itself.
  header: Header,
  /// Whether this block is considered the head of the best chain according to the
  /// event emitter's fork-choice rule.
  new_best: bool,
}

Block Finalization Event

/// Indicates that a new block has been finalized.
struct BlockFinalizationEvent {
  /// The block header-hash.
  hash: Hash,
  /// The header of the finalized block.
  header: Header,
}

Statement Type

/// A statement about the validity of a parachain candidate.
enum Statement {
  /// A statement about a new candidate being seconded by a validator. This is an implicit validity vote.
  Seconded(CandidateReceipt),
  /// A statement about the validity of a candidate, based on candidate's hash.
  Valid(Hash),
  /// A statement about the invalidity of a candidate.
  Invalid(Hash),
}

Signed Statement Type

The actual signed payload should reference only the hash of the CandidateReceipt, even in the Seconded case, and should include a relay parent which provides context to the signature. This protects against replay attacks and allows the candidate receipt itself to be omitted when checking a signature on a Seconded statement.

/// A signed statement.
struct SignedStatement {
  statement: Statement,
  signed: ValidatorId,
  signature: Signature
}
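
One way to picture the signed payload is sketched below. The guide does not fix an encoding, so the one-byte variant tags and the `tag ++ candidate_hash ++ relay_parent` layout are assumptions, as are the names `StatementPayload` and `signing_payload`.

```rust
type Hash = [u8; 32];

/// Hypothetical hash-only view of a `Statement`: the `Seconded` case carries
/// the hash of the `CandidateReceipt` rather than the receipt itself.
enum StatementPayload {
    Seconded(Hash),
    Valid(Hash),
    Invalid(Hash),
}

/// Build the bytes to be signed: variant tag, candidate hash, relay-parent.
/// The tag values and field order are illustrative assumptions only.
fn signing_payload(statement: &StatementPayload, relay_parent: &Hash) -> Vec<u8> {
    let (tag, candidate_hash) = match statement {
        StatementPayload::Seconded(h) => (1u8, h),
        StatementPayload::Valid(h) => (2u8, h),
        StatementPayload::Invalid(h) => (3u8, h),
    };
    let mut payload = Vec::with_capacity(1 + 32 + 32);
    payload.push(tag);
    payload.extend_from_slice(candidate_hash);
    payload.extend_from_slice(relay_parent);
    payload
}
```

Because the relay-parent is part of the signed bytes, a signature produced in the context of one relay-parent does not verify in the context of another.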

Overseer Signal

Signals from the overseer to a subsystem to request a change in execution that has to be obeyed by the subsystem.

enum OverseerSignal {
  /// Signal to start work localized to the relay-parent hash.
  StartWork(Hash),
  /// Signal to stop (or phase down) work localized to the relay-parent hash.
  StopWork(Hash),
}

Candidate Backing subsystem Message

enum CandidateBackingSubsystemMessage {
  /// Registers a stream listener for updates to the set of backable candidates that could be backed
  /// in a child of the given relay-parent, referenced by its hash.
  RegisterBackingWatcher(Hash, TODO),
  /// Note that the Candidate Backing subsystem should second the given candidate in the context of the
  /// given relay-parent (ref. by hash). This candidate must be validated.
  Second(Hash, CandidateReceipt)
}

Host Configuration

The internal-to-runtime configuration of the parachain host. This is expected to be altered only by governance procedures.

struct HostConfiguration {
  /// The minimum frequency at which parachains can update their validation code.
  pub validation_upgrade_frequency: BlockNumber,
  /// The delay, in blocks, before a validation upgrade is applied.
  pub validation_upgrade_delay: BlockNumber,
  /// The acceptance period, in blocks. This is the number of blocks after availability that validators
  /// and fishermen have to perform secondary checks or issue reports.
  pub acceptance_period: BlockNumber,
  /// The maximum validation code size, in bytes.
  pub max_code_size: u32,
  /// The maximum head-data size, in bytes.
  pub max_head_data_size: u32,
  /// The number of availability cores to dedicate to parathreads.
  pub parathread_cores: u32,
  /// The number of retries that a parathread author has to submit their block.
  pub parathread_retries: u32,
  /// How often parachain groups should be rotated across parachains.
  pub parachain_rotation_frequency: BlockNumber,
  /// The availability period, in blocks, for parachains. This is the number of blocks
  /// after inclusion that validators have to make the block available and signal its availability to
  /// the chain. Must be at least 1.
  pub chain_availability_period: BlockNumber,
  /// The availability period, in blocks, for parathreads. Same as the `chain_availability_period`,
  /// but a differing timeout due to differing requirements. Must be at least 1.
  pub thread_availability_period: BlockNumber,
  /// The number of blocks ahead to schedule parathreads.
  pub scheduling_lookahead: u32,
}

Signed Availability Bitfield

A bitfield signed by a particular validator about the availability of pending candidates.

struct SignedAvailabilityBitfield {
  validator_index: ValidatorIndex,
  bitfield: BitVec,
  signature: ValidatorSignature, // signature is on payload: bitfield ++ relay_parent ++ validator index
}

struct Bitfields(Vec<SignedAvailabilityBitfield>); // bitfields sorted by validator index, ascending
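
The ordering requirement on the bitfields could be checked with something like the sketch below. It assumes the ordering is strict, so duplicate validator indices are also rejected, and it uses `Vec<bool>` and `Vec<u8>` as stand-ins for the bit-vector and signature types. `is_sorted_ascending` is a hypothetical helper.

```rust
type ValidatorIndex = u32;

/// Simplified stand-in for the structure above.
struct SignedAvailabilityBitfield {
    validator_index: ValidatorIndex,
    bitfield: Vec<bool>, // stand-in for the bit-vector type
    signature: Vec<u8>,  // stand-in for `ValidatorSignature`
}

/// Returns true if the bitfields are sorted by validator index in strictly
/// ascending order. Strictness (no duplicates) is an assumption.
fn is_sorted_ascending(bitfields: &[SignedAvailabilityBitfield]) -> bool {
    bitfields
        .windows(2)
        .all(|pair| pair[0].validator_index < pair[1].validator_index)
}
```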

Validity Attestation

An attestation of validity for a candidate, used as part of a backing. Both the Seconded and Valid statements are considered attestations of validity. This structure is only useful where the candidate referenced is apparent.

enum ValidityAttestation {
  /// Implicit validity attestation by issuing.
  /// This corresponds to issuance of a `Seconded` statement.
  Implicit(ValidatorSignature),
  /// An explicit attestation. This corresponds to issuance of a
  /// `Valid` statement.
  Explicit(ValidatorSignature),
}

Backed Candidate

A CandidateReceipt along with all data necessary to prove its backing. This is submitted to the relay-chain to process and move along the candidate to the pending-availability stage.

struct BackedCandidate {
  candidate: AbridgedCandidateReceipt,
  validity_votes: Vec<ValidityAttestation>,
  // the indices of validators who signed the candidate within the group. There is no need to include
  // a bit for any validators who are not in the group, so this is more compact.
  validator_indices: BitVec,
}

struct BackedCandidates(Vec<BackedCandidate>); // sorted by para-id.
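
To illustrate how the group-relative `validator_indices` might be interpreted, the sketch below maps set bits back to validators in the assigned group and checks that the number of set bits matches the number of validity votes. The matching-count rule, the `Vec<bool>`-style bitfield, and the name `attesting_validators` are assumptions, not requirements stated in this guide.

```rust
type ValidatorIndex = u32;

/// Resolve group-relative bits to the validators who signed.
/// Returns `None` if the bitfield does not cover the group exactly or if the
/// number of set bits differs from the number of validity votes supplied.
fn attesting_validators(
    group: &[ValidatorIndex],
    validator_indices: &[bool],
    vote_count: usize,
) -> Option<Vec<ValidatorIndex>> {
    if validator_indices.len() != group.len() {
        return None;
    }
    let signers: Vec<ValidatorIndex> = group
        .iter()
        .zip(validator_indices)
        .filter(|(_, set)| **set)
        .map(|(validator, _)| *validator)
        .collect();
    if signers.len() == vote_count {
        Some(signers)
    } else {
        None
    }
}
```

Because the bits are relative to the group, a group of four validators needs only four bits regardless of the size of the full validator set.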

Glossary

Here you can find definitions of a bunch of jargon, usually specific to the Polkadot project.

BABE: (Blind Assignment for Blockchain Extension). The algorithm validators use to safely extend the Relay Chain. See the Polkadot wiki for more information.
Backable Candidate: A Parachain Candidate which is backed by a majority of validators assigned to a given parachain.
Backed Candidate: A Backable Candidate noted in a relay-chain block.
Backing: A set of statements proving that a Parachain Candidate is backable.
Collator: A node who generates Proofs-of-Validity (PoV) for blocks of a specific parachain.
Extrinsic: An element of a relay-chain block which triggers a specific entry-point of a runtime module with given arguments.
GRANDPA: (Ghost-based Recursive ANcestor Deriving Prefix Agreement). The algorithm validators use to guarantee finality of the Relay Chain.
Inclusion Pipeline: The set of steps taken to carry a Parachain Candidate from authoring, to backing, to availability and full inclusion in an active fork of its parachain.
Module: A component of the Runtime logic, encapsulating storage, routines, and entry-points.
Module Entry Point: A recipient of new information presented to the Runtime. This may trigger routines.
Module Routine: A piece of code executed within a module by block initialization, closing, or upon an entry point being triggered. This may execute computation, and read or write storage.
Node: A participant in the Polkadot network, who follows the protocols of communication and connection to other nodes. Nodes form a peer-to-peer network topology without a central authority.
Parachain Candidate, or Candidate: A proposed block for inclusion into a parachain.
Parablock: A block in a parachain.
Parachain: A constituent chain secured by the Relay Chain's validators.
Parachain Validators: A subset of validators assigned during a period of time to back candidates for a specific parachain.
Parathread: A parachain which is scheduled on a pay-as-you-go basis.
Proof-of-Validity (PoV): A stateless-client proof that a parachain candidate is valid, with respect to some validation function.
Relay Parent: A block in the relay chain, referred to in a context where work is being done in the context of the state at this block.
Runtime: The relay-chain state machine.
Runtime Module: See Module.
Runtime API: A means for the node-side behavior to access structured information based on the state of a fork of the blockchain.
Secondary Checker: A validator who has been randomly selected to perform secondary checks on a parablock which is pending approval.
Subsystem: A long-running task which is responsible for carrying out a particular category of work.
Validator: Specially-selected node in the network who is responsible for validating parachain blocks and issuing attestations about their validity.
Validation Function: A piece of Wasm code that describes the state-transition function of a parachain.

Also of use is the Substrate Glossary.

Index

Polkadot Wiki on Consensus: https://wiki.polkadot.network/docs/en/learn-consensus
Polkadot Runtime Spec: https://github.com/w3f/polkadot-spec/tree/spec-rt-anv-vrf-gen-and-announcement/runtime-spec
