Markdown linter (#1309)

* Add markdown linting

- add linter default rules
- adapt rules to current code
- fix the code for linting to pass
- add CI check

fix #1243

* Fix markdown for Substrate
* Fix tooling install
* Fix workflow
* Add documentation
* Remove trailing spaces
* Update .github/.markdownlint.yaml

Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Fix mangled markdown/lists
* Fix capitalization issues on known words
Chevdor
2023-09-04 11:02:32 +02:00
committed by GitHub
parent 830fde2a60
commit a30092ab42
271 changed files with 6289 additions and 4450 deletions
+210
@@ -0,0 +1,210 @@
# Default state for all rules
default: true
# Path to configuration file to extend
extends: null
# MD001/heading-increment/header-increment - Heading levels should only increment by one level at a time
MD001: true
# MD002/first-heading-h1/first-header-h1 - First heading should be a top-level heading
MD002:
  # Heading level
  level: 1
# MD003/heading-style/header-style - Heading style
MD003:
  # Heading style
  style: "consistent"
# MD004/ul-style - Unordered list style
MD004:
  # List style
  style: "consistent"
# MD005/list-indent - Inconsistent indentation for list items at the same level
MD005: false
# MD006/ul-start-left - Consider starting bulleted lists at the beginning of the line
MD006: false
# MD007/ul-indent - Unordered list indentation
MD007: false
# MD009/no-trailing-spaces - Trailing spaces
MD009:
  # Spaces for line break
  br_spaces: 2
  # Allow spaces for empty lines in list items
  list_item_empty_lines: false
  # Include unnecessary breaks
  strict: false
# MD010/no-hard-tabs - Hard tabs
MD010: false
# MD011/no-reversed-links - Reversed link syntax
MD011: true
# MD012/no-multiple-blanks - Multiple consecutive blank lines
MD012:
  # Consecutive blank lines
  maximum: 2
# MD013/line-length - Line length
MD013:
  # Number of characters
  line_length: 120
  # Number of characters for headings
  heading_line_length: 120
  # Number of characters for code blocks
  code_block_line_length: 150
  # Include code blocks
  code_blocks: true
  # Include tables
  tables: true
  # Include headings
  headings: true
  # Include headings
  headers: true
  # Strict length checking
  strict: false
  # Stern length checking
  stern: false
# MD014/commands-show-output - Dollar signs used before commands without showing output
MD014: true
# MD018/no-missing-space-atx - No space after hash on atx style heading
MD018: true
# MD019/no-multiple-space-atx - Multiple spaces after hash on atx style heading
MD019: true
# MD020/no-missing-space-closed-atx - No space inside hashes on closed atx style heading
MD020: true
# MD021/no-multiple-space-closed-atx - Multiple spaces inside hashes on closed atx style heading
MD021: true
# MD022/blanks-around-headings/blanks-around-headers - Headings should be surrounded by blank lines
MD022: false
# MD023/heading-start-left/header-start-left - Headings must start at the beginning of the line
MD023: true
# MD024/no-duplicate-heading/no-duplicate-header - Multiple headings with the same content
MD024: false
# MD025/single-title/single-h1 - Multiple top-level headings in the same document
MD025: false
# MD026/no-trailing-punctuation - Trailing punctuation in heading
MD026:
  # Punctuation characters
  punctuation: ".,;:!。，；：！"
# MD027/no-multiple-space-blockquote - Multiple spaces after blockquote symbol
MD027: true
# MD028/no-blanks-blockquote - Blank line inside blockquote
MD028: true
# MD029/ol-prefix - Ordered list item prefix
MD029:
  # List style
  style: "one_or_ordered"
# MD030/list-marker-space - Spaces after list markers
MD030:
  # Spaces for single-line unordered list items
  ul_single: 1
  # Spaces for single-line ordered list items
  ol_single: 1
  # Spaces for multi-line unordered list items
  ul_multi: 1
  # Spaces for multi-line ordered list items
  ol_multi: 1
# MD031/blanks-around-fences - Fenced code blocks should be surrounded by blank lines
MD031: false
# MD032/blanks-around-lists - Lists should be surrounded by blank lines
MD032: false
# MD033/no-inline-html - Inline HTML
MD033: false
# MD034/no-bare-urls - Bare URL used
MD034: false
# MD035/hr-style - Horizontal rule style
MD035:
  # Horizontal rule style
  style: "consistent"
# MD036/no-emphasis-as-heading/no-emphasis-as-header - Emphasis used instead of a heading
MD036: false
# MD037/no-space-in-emphasis - Spaces inside emphasis markers
MD037: true
# MD038/no-space-in-code - Spaces inside code span elements
MD038: true
# MD039/no-space-in-links - Spaces inside link text
MD039: true
# MD040/fenced-code-language - Fenced code blocks should have a language specified
MD040: false
# MD041/first-line-heading/first-line-h1 - First line in a file should be a top-level heading
MD041: false
# MD042/no-empty-links - No empty links
MD042: true
# MD043/required-headings/required-headers - Required heading structure
MD043: false
# MD044/proper-names - Proper names should have the correct capitalization
MD044:
  # List of proper names
  names: ["Polkadot", "Substrate", "Cumulus", "Parity"]
  # Include code blocks
  code_blocks: false
  # Include HTML elements
  html_elements: false
# MD045/no-alt-text - Images should have alternate text (alt text)
MD045: false
# MD046/code-block-style - Code block style
MD046:
  # Block style
  style: "consistent"
# MD047/single-trailing-newline - Files should end with a single newline character
MD047: true
# MD048/code-fence-style - Code fence style
MD048:
  # Code fence style
  style: "consistent"
# MD049/emphasis-style - Emphasis style should be consistent
MD049: false
# MD050/strong-style - Strong style should be consistent
MD050:
  # Strong style
  style: "consistent"
# MD051/link-fragments - Link fragments should be valid
MD051: false
# MD052/reference-links-images - Reference links and images should use a label that is defined
MD052: false
# MD053/link-image-reference-definitions - Link and image reference definitions should be needed
MD053: false
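Most of the Markdown changes in this commit appear to respond to the MD013 and MD044 settings above: prose is reflowed to 120 columns, and crate names are wrapped in backticks. A minimal Python sketch of what those two rules enforce follows. It is an approximation for illustration, not markdownlint's code. `check_markdown` is a hypothetical name, only backtick fences are recognised, and most exemptions of the real rules are left out. Based on our reading of the markdownlint rule docs, the sketch models two details from the config. With `strict: false`, an over-long line is reported only if it has whitespace past the limit, so a single long URL may overflow. With `code_blocks: false`, MD044 skips fenced blocks and inline code spans.

```python
import re

# Values taken from the MD013 and MD044 entries of the config above.
LINE_LENGTH = 120
CODE_BLOCK_LINE_LENGTH = 150
PROPER_NAMES = ["Polkadot", "Substrate", "Cumulus", "Parity"]
FENCE = "`" * 3  # a code fence marker, built here to avoid a literal fence in this block


def check_markdown(text: str) -> list:
    """Approximate MD013 (line length) and MD044 (proper names); returns (line, message) pairs."""
    problems = []
    in_code_block = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_code_block = not in_code_block
            continue  # fence lines themselves are skipped in this sketch

        # MD013 (code_blocks: true) uses a separate, larger limit inside code blocks.
        limit = CODE_BLOCK_LINE_LENGTH if in_code_block else LINE_LENGTH
        # strict: false -> report only when there is whitespace past the limit.
        if len(line) > limit and re.search(r"\s", line[limit:]):
            problems.append((lineno, f"MD013: {len(line)} > {limit} characters"))

        # MD044 (code_blocks: false) ignores fenced blocks and inline code spans.
        if in_code_block:
            continue
        prose = re.sub(r"`[^`]*`", "", line)
        for name in PROPER_NAMES:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, prose, flags=re.IGNORECASE):
                if match.group(0) != name:
                    problems.append((lineno, f"MD044: expected {name!r}, found {match.group(0)!r}"))
    return problems


sample = "\n".join([
    "Built with substrate and `parity-scale-codec`.",  # lowercase name in prose; crate in a code span
    "https://example.com/" + "a" * 120,                # long, but no whitespace past column 120
    "word " * 30,                                      # 150 characters with spaces past column 120
])
for lineno, message in check_markdown(sample):
    print(lineno, message)
# prints:
# 1 MD044: expected 'Substrate', found 'substrate'
# 3 MD013: 150 > 120 characters
```

The inline code span check shows why `parity-scale-codec` in backticks passes MD044 in this sketch while a bare lowercase "parity" in prose would not.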
+33
@@ -0,0 +1,33 @@
name: Check Markdown
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
permissions:
  packages: read
jobs:
  lint-markdown:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout sources
        uses: actions/checkout@v3
      - uses: actions/setup-node@v3.8.1
        with:
          node-version: "18.x"
          registry-url: "https://npm.pkg.github.com"
          scope: "@paritytech"
      - name: Install tooling
        run: |
          npm install -g markdownlint-cli
          markdownlint --version
      - name: Check Markdown
        env:
          CONFIG: .github/.markdownlint.yaml
        run: |
          markdownlint --config "$CONFIG" --ignore target .
+27 -23
@@ -1,6 +1,6 @@
> NOTE: We have recently made significant changes to our repository structure. In order to
streamline our development process and foster better contributions, we have merged three separate
repositories Cumulus, Substrate and Polkadot into this repository. Read more about the changes [
> NOTE: We have recently made significant changes to our repository structure. In order to streamline our development
process and foster better contributions, we have merged three separate repositories Cumulus, Substrate and Polkadot into
this repository. Read more about the changes [
here](https://polkadot-public.notion.site/Polkadot-SDK-FAQ-fbc4cecc2c46443fb37b9eeec2f0d85f).
# Polkadot SDK
@@ -9,27 +9,29 @@ here](https://polkadot-public.notion.site/Polkadot-SDK-FAQ-fbc4cecc2c46443fb37b9
[![StackExchange](https://img.shields.io/badge/StackExchange-Community%20&%20Support-222222?logo=stackexchange)](https://substrate.stackexchange.com/)
The Polkadot SDK repository provides all the resources needed to start building on the Polkadot
network, a multi-chain blockchain platform that enables different blockchains to interoperate and
share information in a secure and scalable way. The Polkadot SDK comprises three main pieces of
software:
The Polkadot SDK repository provides all the resources needed to start building on the Polkadot network, a multi-chain
blockchain platform that enables different blockchains to interoperate and share information in a secure and scalable
way. The Polkadot SDK comprises three main pieces of software:
## [Polkadot](./polkadot/)
[![PolkadotForum](https://img.shields.io/badge/Polkadot_Forum-e6007a?logo=polkadot)](https://forum.polkadot.network/) [![Polkadot-license](https://img.shields.io/badge/License-GPL3-blue)](./polkadot/LICENSE)
[![PolkadotForum](https://img.shields.io/badge/Polkadot_Forum-e6007a?logo=polkadot)](https://forum.polkadot.network/)
[![Polkadot-license](https://img.shields.io/badge/License-GPL3-blue)](./polkadot/LICENSE)
Implementation of a node for the https://polkadot.network in Rust, using the Substrate framework.
This directory currently contains runtimes for the Polkadot, Kusama, Westend, and Rococo networks.
In the future, these will be relocated to the [`runtimes`](https://github.com/polkadot-fellows/runtimes/) repository.
Implementation of a node for the https://polkadot.network in Rust, using the Substrate framework. This directory
currently contains runtimes for the Polkadot, Kusama, Westend, and Rococo networks. In the future, these will be
relocated to the [`runtimes`](https://github.com/polkadot-fellows/runtimes/) repository.
## [Substrate](./substrate/)
[![SubstrateRustDocs](https://img.shields.io/badge/Rust_Docs-Substrate-24CC85?logo=rust)](https://paritytech.github.io/substrate/master/substrate/index.html) [![Substrate-license](https://img.shields.io/badge/License-GPL3%2FApache2.0-blue)](./substrate/README.md#LICENSE)
[![SubstrateRustDocs](https://img.shields.io/badge/Rust_Docs-Substrate-24CC85?logo=rust)](https://paritytech.github.io/substrate/master/substrate/index.html)
[![Substrate-license](https://img.shields.io/badge/License-GPL3%2FApache2.0-blue)](./substrate/README.md#LICENSE)
Substrate is the primary blockchain SDK used by developers to create the parachains that make up
the Polkadot network. Additionally, it allows for the development of self-sovereign blockchains
that operate completely independently of Polkadot.
Substrate is the primary blockchain SDK used by developers to create the parachains that make up the Polkadot network.
Additionally, it allows for the development of self-sovereign blockchains that operate completely independently of
Polkadot.
## [Cumulus](./cumulus/)
[![CumulusRustDocs](https://img.shields.io/badge/Rust_Docs-Cumulus-222222?logo=rust)](https://paritytech.github.io/cumulus/cumulus_client_collator/index.html) [![Cumulus-license](https://img.shields.io/badge/License-GPL3-blue)](./cumulus/LICENSE)
[![CumulusRustDocs](https://img.shields.io/badge/Rust_Docs-Cumulus-222222?logo=rust)](https://paritytech.github.io/cumulus/cumulus_client_collator/index.html)
[![Cumulus-license](https://img.shields.io/badge/License-GPL3-blue)](./cumulus/LICENSE)
Cumulus is a set of tools for writing Substrate-based Polkadot parachains.
@@ -37,10 +39,10 @@ Cumulus is a set of tools for writing Substrate-based Polkadot parachains.
Below are the primary upstream dependencies utilized in this project:
- [parity-scale-codec](https://crates.io/crates/parity-scale-codec)
- [parity-db](https://crates.io/crates/parity-db)
- [parity-common](https://github.com/paritytech/parity-common)
- [trie](https://github.com/paritytech/trie)
- [`parity-scale-codec`](https://crates.io/crates/parity-scale-codec)
- [`parity-db`](https://crates.io/crates/parity-db)
- [`parity-common`](https://github.com/paritytech/parity-common)
- [`trie`](https://github.com/paritytech/trie)
## Security
@@ -48,9 +50,11 @@ The security policy and procedures can be found in [docs/SECURITY.md](./docs/SEC
## Contributing & Code of Conduct
Ensure you follow our [contribution guidelines](./docs/CONTRIBUTING.md). In every interaction and contribution, this project adheres to the [Contributor Covenant Code of Conduct](./docs/CODE_OF_CONDUCT.md).
Ensure you follow our [contribution guidelines](./docs/CONTRIBUTING.md). In every interaction and contribution, this
project adheres to the [Contributor Covenant Code of Conduct](./docs/CODE_OF_CONDUCT.md).
## Additional Resources
- For monitoring upcoming changes and current proposals related to the technical implementation of the Polkadot network, visit the [`Requests for Comment (RFC)`](https://github.com/polkadot-fellows/RFCs) repository. While it's maintained by the Polkadot Fellowship, the RFC process welcomes contributions from everyone.
- For monitoring upcoming changes and current proposals related to the technical implementation of the Polkadot network,
visit the [`Requests for Comment (RFC)`](https://github.com/polkadot-fellows/RFCs) repository. While it's maintained
by the Polkadot Fellowship, the RFC process welcomes contributions from everyone.
+17 -17
@@ -1,13 +1,13 @@
# Using Parity Bridges Common dependency (`git subtree`).
# Using Parity Bridges Common dependency (`git subtree`)
In `./bridges` sub-directory you can find a `git subtree` imported version of:
[parity-bridges-common](https://github.com/paritytech/parity-bridges-common/) repository.
[`parity-bridges-common`](https://github.com/paritytech/parity-bridges-common/) repository.
(For regular Cumulus contributor 1. is relevant) \
(For Cumulus maintainer 1. and 2. are relevant) \
(For Bridges team 1. and 2. and 3. are relevant)
# 1. How to fix broken Bridges code?
## How to fix broken Bridges code?
To fix Bridges code simply create a commit in current (`Cumulus`) repo. Best if
the commit is isolated to changes in `./bridges` sub-directory, because it makes
@@ -16,7 +16,7 @@ it easier to import that change back to upstream repo.
(Any changes to the `bridges` subtree require Bridges team approval, and they should manage the backport to the Bridges repo)
# 2. How to pull latest Bridges code to the `bridges` subtree
## How to pull latest Bridges code to the `bridges` subtree
(in practice)
The `bridges` repo has a stabilized branch `polkadot-staging` dedicated for releasing.
@@ -25,7 +25,7 @@ The `bridges` repo has a stabilized branch `polkadot-staging` dedicated for rele
cd <cumulus-git-repo-dir>
# this will update new git branches from bridges repo
# there could be unresolved conflicts, but dont worry,
# there could be unresolved conflicts, but don't worry,
# lots of them are caused because of removed unneeded files with patch step,
BRANCH=polkadot-staging ./scripts/bridges_update_subtree.sh fetch
@@ -45,9 +45,9 @@ BRANCH=polkadot-staging ./scripts/bridges_update_subtree.sh fetch
# so after all conflicts are solved and patch passes and compiles,
# then we need to finish merge with:
git merge --continue
````
```
# 3. How to pull latest Bridges code or contribute back?
## How to pull latest Bridges code or contribute back?
(in theory)
Note that it's totally fine to ping the **Bridges Team** to do that for you. The point
@@ -58,34 +58,34 @@ If you still would like to either update the code to match latest code from the
or create an upstream PR read below. The following commands should be run in the
current (`polkadot`) repo.
1. Add Bridges repo as a local remote:
### Add Bridges repo as a local remote
```
$ git remote add -f bridges git@github.com:paritytech/parity-bridges-common.git
git remote add -f bridges git@github.com:paritytech/parity-bridges-common.git
```
If you plan to contribute back, consider forking the repository on GitHub and adding
your personal fork as a remote as well.
```
$ git remote add -f my-bridges git@github.com:tomusdrw/parity-bridges-common.git
git remote add -f my-bridges git@github.com:tomusdrw/parity-bridges-common.git
```
2. To update Bridges:
### To update Bridges
```
git fetch bridges polkadot-staging
git subtree pull --prefix=bridges bridges polkadot-staging --squash
```
$ git fetch bridges polkadot-staging
$ git subtree pull --prefix=bridges bridges polkadot-staging --squash
````
We use `--squash` to avoid adding individual commits and rather squashing them
all into one.
3. Clean unneeded files here:
### Clean unneeded files here
```
./bridges/scripts/verify-pallets-build.sh --ignore-git-state --no-revert
```
4. Contributing back to Bridges (creating upstream PR)
### Contributing back to Bridges (creating upstream PR)
```
$ git subtree push --prefix=bridges my-bridges polkadot-staging
git subtree push --prefix=bridges my-bridges polkadot-staging
```
This command will push changes to your personal fork of Bridges repo, from where
you can simply create a PR to the main repo.
+75 -87
@@ -2,59 +2,53 @@
[![Doc](https://img.shields.io/badge/cumulus%20docs-master-brightgreen)](https://paritytech.github.io/cumulus/)
This repository contains both the Cumulus SDK and also specific chains implemented on top of this
SDK.
This repository contains both the Cumulus SDK and also specific chains implemented on top of this SDK.
If you only want to run a **Polkadot Parachain Node**, check out our [container
section](./docs/container.md).
If you only want to run a **Polkadot Parachain Node**, check out our [container section](./docs/container.md).
## Cumulus SDK
A set of tools for writing [Substrate](https://substrate.io/)-based
[Polkadot](https://wiki.polkadot.network/en/)
[parachains](https://wiki.polkadot.network/docs/en/learn-parachains). Refer to the included
[overview](docs/overview.md) for architectural details, and the [Connect to a relay chain how-to
guide](https://docs.substrate.io/reference/how-to-guides/parachains/connect-to-a-relay-chain/) for a
guided walk-through of using these tools.
A set of tools for writing [Substrate](https://substrate.io/)-based [Polkadot](https://wiki.polkadot.network/en/)
[parachains](https://wiki.polkadot.network/docs/en/learn-parachains). Refer to the included [overview](docs/overview.md)
for architectural details, and the [Connect to a relay chain how-to
guide](https://docs.substrate.io/reference/how-to-guides/parachains/connect-to-a-relay-chain/) for a guided walk-through
of using these tools.
It's easy to write blockchains using Substrate, and the overhead of writing parachains'
distribution, p2p, database, and synchronization layers should be just as low. This project aims to
make it easy to write parachains for Polkadot by leveraging the power of Substrate.
It's easy to write blockchains using Substrate, and the overhead of writing parachains' distribution, p2p, database, and
synchronization layers should be just as low. This project aims to make it easy to write parachains for Polkadot by
leveraging the power of Substrate.
Cumulus clouds are shaped sort of like dots; together they form a system that is intricate,
beautiful and functional.
Cumulus clouds are shaped sort of like dots; together they form a system that is intricate, beautiful and functional.
### Consensus
[`parachain-consensus`](https://github.com/paritytech/polkadot-sdk/blob/master/cumulus/client/consensus/common/src/parachain_consensus.rs)
is a [consensus engine](https://docs.substrate.io/v3/advanced/consensus) for Substrate that follows
a Polkadot [relay chain](https://wiki.polkadot.network/docs/en/learn-architecture#relay-chain). This
will run a Polkadot node internally, and dictate to the client and synchronization algorithms which
chain to follow,
[finalize](https://wiki.polkadot.network/docs/en/learn-consensus#probabilistic-vs-provable-finality),
and treat as best.
is a [consensus engine](https://docs.substrate.io/v3/advanced/consensus) for Substrate that follows a Polkadot [relay
chain](https://wiki.polkadot.network/docs/en/learn-architecture#relay-chain). This will run a Polkadot node internally,
and dictate to the client and synchronization algorithms which chain to follow,
[finalize](https://wiki.polkadot.network/docs/en/learn-consensus#probabilistic-vs-provable-finality), and treat as best.
### Collator
A Polkadot [collator](https://wiki.polkadot.network/docs/en/learn-collator) for the parachain is
implemented by the `polkadot-parachain` binary (previously called `polkadot-collator`).
A Polkadot [collator](https://wiki.polkadot.network/docs/en/learn-collator) for the parachain is implemented by the
`polkadot-parachain` binary (previously called `polkadot-collator`).
You may run `polkadot-parachain` locally after building it or using one of the container options
described [here](./docs/container.md).
You may run `polkadot-parachain` locally after building it or using one of the container options described
[here](./docs/container.md).
### Relay Chain Interaction
To operate a parachain node, a connection to the corresponding relay
chain is necessary. This can be achieved in one of three ways:
1. Run a full relay chain node within the parachain node (default)
2. Connect to an external relay chain node via WebSocket RPC
### Relay Chain Interaction
To operate a parachain node, a connection to the corresponding relay chain is necessary. This can be achieved in one of
three ways:
1. Run a full relay chain node within the parachain node (default)
2. Connect to an external relay chain node via WebSocket RPC
3. Run a light client for the relay chain
#### In-process Relay Chain Node
If an external relay chain node is not specified (default behavior), then a full relay chain node is
spawned within the same process.
If an external relay chain node is not specified (default behavior), then a full relay chain node is spawned within the
same process.
This node has all of the typical components of a regular Polkadot node and will have to fully sync
with the relay chain to work.
This node has all of the typical components of a regular Polkadot node and will have to fully sync with the relay chain
to work.
##### Example command
```bash
@@ -66,19 +60,16 @@ polkadot-parachain \
```
#### External Relay Chain Node
An external relay chain node is connected via WebSocket RPC by using the
`--relay-chain-rpc-urls` command line argument. This option accepts one or more
space-separated WebSocket URLs to a full relay chain node. By default, only the
first URL will be used, with the rest as a backup in case the connection to the
first node is lost.
An external relay chain node is connected via WebSocket RPC by using the `--relay-chain-rpc-urls` command line
argument. This option accepts one or more space-separated WebSocket URLs to a full relay chain node. By default, only
the first URL will be used, with the rest as a backup in case the connection to the first node is lost.
Parachain nodes using this feature won't have to fully sync with the relay chain
to work, so in general they will use fewer system resources.
Parachain nodes using this feature won't have to fully sync with the relay chain to work, so in general they will use
fewer system resources.
**Note:** At this time, any parachain nodes using this feature will still spawn a
significantly cut-down relay chain node in-process. Even though they lack the
majority of normal Polkadot subsystems, they will still need to connect directly
to the relay chain network.
**Note:** At this time, any parachain nodes using this feature will still spawn a significantly cut-down relay chain
node in-process. Even though they lack the majority of normal Polkadot subsystems, they will still need to connect
directly to the relay chain network.
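The behaviour described for `--relay-chain-rpc-urls` is to prefer the first URL and fall back to the others. It can be pictured as a small selection function. This is a hypothetical Python model, not the Rust code in Cumulus. `select_relay_chain_rpc`, the `try_connect` probe, and the example URLs are invented. The model also assumes that every (re)connection attempt walks the list from the front, which the text above does not specify.

```python
from typing import Callable, Optional, Sequence


def select_relay_chain_rpc(urls: Sequence[str], try_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first URL that accepts a connection, or None if none do.

    `urls` keeps the order given on the command line, so the first URL is preferred
    and the remaining ones only serve as backups. `try_connect` is a hypothetical
    stand-in for opening a WebSocket connection to a relay chain node.
    """
    for url in urls:
        if try_connect(url):
            return url
    return None


# Hypothetical example: the first node is unreachable, so the backup is chosen.
urls = ["ws://relay-a.example:9944", "ws://relay-b.example:9944"]
reachable = {"ws://relay-b.example:9944"}
print(select_relay_chain_rpc(urls, lambda url: url in reachable))  # ws://relay-b.example:9944
```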
##### Example command
@@ -94,17 +85,15 @@ polkadot-parachain \
```
#### Relay Chain Light Client
An internal relay chain light client provides a fast and lightweight approach
for connecting to the relay chain network. It provides relay chain notifications
and facilitates runtime calls.
An internal relay chain light client provides a fast and lightweight approach for connecting to the relay chain network.
It provides relay chain notifications and facilitates runtime calls.
To specify which chain the light client should connect to, users need to supply
a relay chain chain-spec as part of the relay chain arguments.
To specify which chain the light client should connect to, users need to supply a relay chain chain-spec as part of the
relay chain arguments.
**Note:** At this time, any parachain nodes using this feature will still spawn
a significantly cut-down relay chain node in-process. Even though they lack the
majority of normal Polkadot subsystems, they will still need to connect directly
to the relay chain network.
**Note:** At this time, any parachain nodes using this feature will still spawn a significantly cut-down relay chain
node in-process. Even though they lack the majority of normal Polkadot subsystems, they will still need to connect
directly to the relay chain network.
##### Example command
@@ -118,23 +107,22 @@ polkadot-parachain \
```
## Installation and Setup
Before building Cumulus SDK based nodes / runtimes prepare your environment by
following Substrate [installation instructions](https://docs.substrate.io/main-docs/install/).
Before building Cumulus SDK based nodes / runtimes prepare your environment by following Substrate [installation
instructions](https://docs.substrate.io/main-docs/install/).
To launch a local network, you can use [zombienet](https://github.com/paritytech/zombienet)
for quick setup and experimentation or follow the [manual setup](#manual-setup).
To launch a local network, you can use [zombienet](https://github.com/paritytech/zombienet) for quick setup and
experimentation or follow the [manual setup](#manual-setup).
### Zombienet
We use Zombienet to spin up networks for integration tests and local networks.
Follow [these installation steps](https://github.com/paritytech/zombienet#requirements-by-provider)
to set it up on your machine. A simple network specification with two relay chain
nodes and one collator is located at [zombienet/examples/small_network.toml](zombienet/examples/small_network.toml).
We use Zombienet to spin up networks for integration tests and local networks. Follow [these installation
steps](https://github.com/paritytech/zombienet#requirements-by-provider) to set it up on your machine. A simple network
specification with two relay chain nodes and one collator is located at
[zombienet/examples/small_network.toml](zombienet/examples/small_network.toml).
#### Which provider should I use?
Zombienet offers multiple providers to run networks. Choose the one that best fits your needs:
- **Podman:** Choose this if you want to spin up a network quickly and easily.
- **Native:** Choose this if you want to develop and deploy your changes. Requires compilation
of the binaries.
- **Native:** Choose this if you want to develop and deploy your changes. Requires compilation of the binaries.
- **Kubernetes:** Choose this for advanced use-cases or running on cloud-infrastructure.
#### How to run
@@ -183,13 +171,16 @@ cargo build --release --bin polkadot-parachain
./target/release/polkadot-parachain export-genesis-wasm > genesis-wasm
# Collator1
./target/release/polkadot-parachain --collator --alice --force-authoring --tmp --port 40335 --rpc-port 9946 -- --chain ../polkadot/rococo-local-cfde.json --port 30335
./target/release/polkadot-parachain --collator --alice --force-authoring \
--tmp --port 40335 --rpc-port 9946 -- --chain ../polkadot/rococo-local-cfde.json --port 30335
# Collator2
./target/release/polkadot-parachain --collator --bob --force-authoring --tmp --port 40336 --rpc-port 9947 -- --chain ../polkadot/rococo-local-cfde.json --port 30336
./target/release/polkadot-parachain --collator --bob --force-authoring \
--tmp --port 40336 --rpc-port 9947 -- --chain ../polkadot/rococo-local-cfde.json --port 30336
# Parachain Full Node 1
./target/release/polkadot-parachain --tmp --port 40337 --rpc-port 9948 -- --chain ../polkadot/rococo-local-cfde.json --port 30337
./target/release/polkadot-parachain --tmp --port 40337 --rpc-port 9948 -- \
--chain ../polkadot/rococo-local-cfde.json --port 30337
```
#### Register the parachain
@@ -199,8 +190,8 @@ cargo build --release --bin polkadot-parachain
## Asset Hub 🪙
This repository also contains the Asset Hub runtimes. Asset Hub is a system parachain
providing an asset store for the Polkadot ecosystem.
This repository also contains the Asset Hub runtimes. Asset Hub is a system parachain providing an asset store for the
Polkadot ecosystem.
### Build & Launch a Node
@@ -228,20 +219,18 @@ See [the `contracts-rococo` readme](parachains/runtimes/contracts/contracts-roco
See [the `bridge-hubs` readme](parachains/runtimes/bridge-hubs/README.md) for details.
## Rococo 👑
[Rococo](https://polkadot.js.org/apps/?rpc=wss://rococo-rpc.polkadot.io) is becoming a
[Community Parachain Testbed](https://polkadot.network/blog/rococo-revamp-becoming-a-community-parachain-testbed/)
for parachain teams in the Polkadot ecosystem. It supports multiple parachains with the
differentiation of long-term connections and recurring short-term connections. To see
which parachains are currently connected and how long they will be connected for,
[see here](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-rpc.polkadot.io#/parachains).
[Rococo](https://polkadot.js.org/apps/?rpc=wss://rococo-rpc.polkadot.io) is becoming a [Community Parachain
Testbed](https://polkadot.network/blog/rococo-revamp-becoming-a-community-parachain-testbed/) for parachain teams in the
Polkadot ecosystem. It supports multiple parachains with the differentiation of long-term connections and recurring
short-term connections. To see which parachains are currently connected and how long they will be connected for, [see
here](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-rpc.polkadot.io#/parachains).
Rococo is an elaborate style of design and the name describes the painstaking effort that
has gone into this project.
Rococo is an elaborate style of design and the name describes the painstaking effort that has gone into this project.
### Build & Launch Rococo Collators
Collators are similar to validators in the relay chain. These nodes build the blocks that
will eventually be included by the relay chain for a parachain.
Collators are similar to validators in the relay chain. These nodes build the blocks that will eventually be included by
the relay chain for a parachain.
To run a Rococo collator you will need to compile the following binary:
@@ -250,8 +239,7 @@ To run a Rococo collator you will need to compile the following binary:
cargo build --release --locked --bin polkadot-parachain
```
Once the executable is built, launch collators for each parachain (repeat once each for chain
`tick`, `trick`, `track`):
Once the executable is built, launch collators for each parachain (repeat once each for chain `tick`, `trick`, `track`):
```bash
./target/release/polkadot-parachain --chain $CHAIN --validator
@@ -261,10 +249,10 @@ You can also build [using a container](./docs/container.md).
### Parachains
* [Asset Hub](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-statemint-rpc.polkadot.io#/explorer)
* [Contracts on Rococo](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-contracts-rpc.polkadot.io#/explorer)
* [RILT](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo.kilt.io#/explorer)
- [Asset Hub](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-statemint-rpc.polkadot.io#/explorer)
- [Contracts on Rococo](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-contracts-rpc.polkadot.io#/explorer)
- [RILT](https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo.kilt.io#/explorer)
The network uses horizontal message passing (HRMP) to enable communication between
parachains and the relay chain and, in turn, between parachains. This means that every
message is sent to the relay chain, and from the relay chain to its destination parachain.
The network uses horizontal message passing (HRMP) to enable communication between parachains and the relay chain and,
in turn, between parachains. This means that every message is sent to the relay chain, and from the relay chain to its
destination parachain.
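The relay-routed delivery described above can be sketched as a toy Python model. In this model, every message goes to the relay chain first and then on to its destination parachain. The class names, method names, and para IDs 1000 and 2000 are hypothetical. The queues are plain in-memory structures rather than relay chain storage. The sketch also models one-way channels that must be opened before sending. This is how HRMP channels work, but the text above does not mention it.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    sender: int     # para ID of the sending parachain
    recipient: int  # para ID of the destination parachain
    payload: bytes


class ToyRelayChain:
    """Toy model of relay-routed parachain messaging (not the real HRMP implementation)."""

    def __init__(self) -> None:
        self.channels = set()             # open one-way (sender, recipient) channels
        self.queues = defaultdict(deque)  # recipient para ID -> pending messages

    def open_channel(self, sender: int, recipient: int) -> None:
        self.channels.add((sender, recipient))

    def send(self, msg: Message) -> None:
        # Step 1: the sending parachain hands the message to the relay chain.
        if (msg.sender, msg.recipient) not in self.channels:
            raise ValueError(f"no channel from {msg.sender} to {msg.recipient}")
        self.queues[msg.recipient].append(msg)

    def deliver(self, recipient: int) -> list:
        # Step 2: the relay chain passes queued messages on, in arrival order.
        queue = self.queues[recipient]
        delivered = list(queue)
        queue.clear()
        return delivered


relay = ToyRelayChain()
relay.open_channel(1000, 2000)
relay.send(Message(1000, 2000, b"ping"))
print([m.payload for m in relay.deliver(2000)])  # [b'ping']
```

Because channels are one-way in this model, a reply from 2000 back to 1000 would need its own `open_channel(2000, 1000)` call first.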
+2 -2
@@ -34,9 +34,9 @@ of preference. We see that blockchains are naturally community platforms with u
ultimate decision makers. We assert that good software will maximise user agency by facilitating
user-expression on the network. As such:
- This project will strive to give users as much choice as is both reasonable and possible over what
* This project will strive to give users as much choice as is both reasonable and possible over what
protocol they adhere to; but
* use of the project's technical forums, commenting systems, pull requests and issue trackers as a
means to express individual protocol preferences is forbidden.
## Our Responsibilities
This is a collection of components for building bridges.
These components include Substrate pallets for syncing headers, passing arbitrary messages, as well as libraries for
building relayers to provide cross-chain communication capabilities.
Three bridge nodes are also available. The nodes can be used to run test networks which bridge other Substrate chains.
🚧 The bridges are currently under construction - a hardhat is recommended beyond this point 🚧
## Installation
To get up and running you need both stable and nightly Rust. Rust nightly is used to build the WebAssembly (Wasm)
runtime for the node. You can configure the Wasm support as follows:
```bash
rustup install nightly
cargo build --all
cargo test --all
```
You can also build the repo with the
[Parity CI Docker image](https://github.com/paritytech/scripts/tree/master/dockerfiles/bridges-ci):
```bash
docker pull paritytech/bridges-ci:production
```
If you want to reproduce other steps of the CI process, you can use the following
[guide](https://github.com/paritytech/scripts#reproduce-ci-locally).
If you need more information about setting up your development environment, [Substrate's Installation
page](https://docs.substrate.io/main-docs/install/) is a good resource.
## High-Level Architecture
This repo supports bridging foreign chains together using a combination of Substrate pallets and external processes
called relayers. A bridge chain is one that is able to follow the consensus of a foreign chain independently. For
example, consider the case below, where we want to bridge two Substrate-based chains.
```
+---------------+ +---------------+
+---------------+
```
The Millau chain must be able to accept Rialto headers and verify their integrity. It does this by using a runtime
module designed to track GRANDPA finality. Since two blockchains can't interact directly, they need an external service,
called a relayer, to communicate. The relayer subscribes to new Rialto headers via RPC and submits them to the Millau
chain for verification.
Take a look at [Bridge High Level Documentation](./docs/high-level-overview.md) for a more in-depth description of the
bridge interaction.
## Project Layout
Here's an overview of how the project is laid out. The main bits are the `bin`, which is the actual "blockchain", the
`modules`, which are used to build the blockchain's logic (a.k.a. the runtime), and the `relays`, which are used to pass
messages between chains.
```
├── bin // Node and Runtime for the various Substrate chains
```
## Running the Bridge
To run the Bridge you need to be able to connect the bridge relay node to the RPC interface of nodes on each side of the
bridge (source and target chain).
There are two ways to run the bridge, described below:
- building & running from source: with this option, you'll be able to run the bridge between two standalone chains that
are running the GRANDPA finality gadget to achieve finality;
- running a Docker Compose setup: this is the recommended option, where you'll see bridges with parachains, complex
relays and more.
### Using the Source
### Running a Dev network
We will launch a dev network to demonstrate how to relay a message between two Substrate-based chains (named Rialto and
Millau).
To do this we will need two nodes, two relayers which will relay headers, and two relayers which will relay messages.
#### Running from local scripts
To run a simple dev network you can use the scripts located in the [`deployments/local-scripts`
folder](./deployments/local-scripts).
First, we must run the two Substrate nodes.
After the nodes are up we can run the header relayers.

```bash
./deployments/local-scripts/relay-rialto-to-millau.sh
```
At this point you should see the relayer submitting headers from the Millau Substrate chain to the Rialto Substrate
chain.
```
# Header Relayer Logs
```

You will also see the message lane relayers listening for new messages.

```
[Millau_to_Rialto_MessageLane_00000000] [date] DEBUG bridge Asking Millau::ReceivingConfirmationsDelivery about best message nonces
[...] [date] INFO bridge Synced Some(2) of Some(3) nonces in Millau::MessagesDelivery -> Rialto::MessagesDelivery race
[...] [date] DEBUG bridge Asking Millau::MessagesDelivery about message nonces
[...] [date] DEBUG bridge Received best nonces from Millau::ReceivingConfirmationsDelivery: TargetClientNonces { latest_nonce: 0, nonces_data: () }
[...] [date] DEBUG bridge Asking Millau::ReceivingConfirmationsDelivery about finalized message nonces
[...] [date] DEBUG bridge Received finalized nonces from Millau::ReceivingConfirmationsDelivery: TargetClientNonces { latest_nonce: 0, nonces_data: () }
[...] [date] DEBUG bridge Received nonces from Millau::MessagesDelivery: SourceClientNonces { new_nonces: {}, confirmed_nonce: Some(0) }
[...] [date] DEBUG bridge Asking Millau node about its state
[...] [date] DEBUG bridge Received state from Millau node: ClientState { best_self: HeaderId(1593, 0xacac***), best_finalized_self: HeaderId(1590, 0x0be81d...), best_finalized_peer_at_best_self: HeaderId(0, 0xdcdd89...) }
```
To send a message see the ["How to send a message" section](#how-to-send-a-message).
### How to send a message
In this section we'll show you how to quickly send a bridge message. The message is just an encoded XCM `Trap(43)`
message.
```bash
# In `parity-bridges-common` folder
```
And in the Rialto node logs you'll see something like this:
```
... runtime::bridge-messages: Received messages: total=1, valid=1. Weight used: Weight(ref_time: 1215065371, proof_size: 48559)/Weight(ref_time: 1215065371, proof_size: 54703).
```
It means that the message has been delivered and dispatched. The message may still be dispatched with an error,
though: the goal of our test bridge is to ensure that messages are successfully delivered and all involved components
are working.
## Full Network Docker Compose Setup
For a more sophisticated deployment which includes bidirectional header sync, message passing, monitoring dashboards,
etc., see the [Deployments README](./deployments/README.md).
Images for all the bridge components are published on [Docker Hub](https://hub.docker.com/u/paritytech).
To run a Rialto node for example, you can use the following command:
## Community
The main hangout for the community is [Element](https://element.io/) (formerly Riot). Element is a chat server like,
for example, Discord. Most discussions around Polkadot and Substrate happen in various Element "rooms" (channels), so
joining Element might be a good idea anyway.
If you are interested in information exchange and development of Polkadot-related bridges, please feel free to join the
[Polkadot Bridges](https://app.element.io/#/room/#bridges:web3.foundation) Element channel.
The [Substrate Technical](https://app.element.io/#/room/#substrate-technical:matrix.org) Element channel is most suited
for discussions regarding Substrate itself.
## Reporting a vulnerability
If you find something that can be treated as a security vulnerability, please do not use the issue tracker or discuss it
in the public forum as it can cause more damage, rather than giving real help to the ecosystem.
Security vulnerabilities should be reported via the [contact form](https://security-submission.parity.io/).
If you think that your report might be eligible for the Bug Bounty Program, please mark this during the submission.
Please check the up-to-date [Parity Bug Bounty Program rules](https://www.parity.io/bug-bounty) for information about
our Bug Bounty Program.
**Warning**: This is a unified SECURITY.md file for the Paritytech GitHub Organization. The presence of this file does not
mean that this repository is covered by the Bug Bounty program. Please always check the Bug Bounty Program scope for
information.
# High-Level Bridge Documentation
This document gives a brief, abstract description of the main components that may be found in this repository. If you
want to see how we're using them to build the Rococo <> Wococo (Kusama <> Polkadot) bridge, please refer to the
[Polkadot <> Kusama Bridge](./polkadot-kusama-bridge-overview.md).
## Purpose
This repo contains all components required to build a trustless connection between standalone Substrate chains that
use GRANDPA finality, their parachains, or any combination of those. On top of this connection, we offer a messaging
pallet that provides the means to organize message exchange.
On top of that layered infrastructure, anyone may build their own bridge applications - e.g. [XCM
messaging](./polkadot-kusama-bridge-overview.md), [encoded calls
messaging](https://github.com/paritytech/parity-bridges-common/releases/tag/encoded-calls-messaging) and so on.
## Terminology
Even though we support (and require) two-way bridging, the documentation will generally talk about a one-sided
interaction. That's to say, we will only talk about syncing finality proofs and messages from a _source_ chain to a
_target_ chain. This is because the two-sided interaction is really just the one-sided interaction with the source and
target chains switched.
The bridge has both on-chain (pallets) and offchain (relayers) components.
## On-chain components
On-chain bridge components are pallets that are deployed in the chain runtime. Finality pallets must be deployed at the
target chain, while the messages pallet needs to be deployed at both the source and target chains.
### Bridge GRANDPA Finality Pallet
A GRANDPA light client of the source chain built into the target chain's runtime. It provides a "source of truth" about
the source chain headers which have been finalized. This is useful for higher level applications.
The pallet tracks the current GRANDPA authority set and only accepts finality proofs (GRANDPA justifications) generated
by that set. The GRANDPA protocol itself requires the current authority set to generate an explicit justification for
the header that enacts the next authority set. Such headers and their finality proofs are called mandatory in the
pallet, and the relayer pays no fee for submitting them.
The pallet does not require all headers to be imported or provided. The relayer itself chooses which headers it wants
to submit (with the exception of mandatory headers).
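A simplified sketch of the acceptance rule, assuming equally weighted authorities and leaving out signature and ancestry
checks: a justification counts only if more than two thirds of the *current* set signed it, and importing a header that
enacts a new set switches the tracked set. The types and the `submit_finality_proof` method are hypothetical, not the
pallet's real API.

```rust
use std::collections::BTreeSet;

/// Hypothetical authority identifier (real GRANDPA authorities are public keys).
pub type AuthorityId = u32;

/// Simplified justification: who signed it, and the next authority set if the
/// justified header enacts a set change.
pub struct Justification {
    pub header_number: u64,
    pub signers: BTreeSet<AuthorityId>,
    pub enacts_new_set: Option<BTreeSet<AuthorityId>>,
}

pub struct GrandpaLightClient {
    pub authorities: BTreeSet<AuthorityId>,
    pub set_id: u64,
    pub best_finalized: u64,
}

impl GrandpaLightClient {
    /// Returns `Ok(true)` if the accepted header was mandatory (enacted a new set).
    pub fn submit_finality_proof(&mut self, j: Justification) -> Result<bool, &'static str> {
        if j.header_number <= self.best_finalized {
            return Err("header is not newer than the best finalized one");
        }
        // Only signers from the *current* set count; require more than 2/3 of it.
        let valid = j.signers.intersection(&self.authorities).count();
        if 3 * valid <= 2 * self.authorities.len() {
            return Err("not enough signatures from the current authority set");
        }
        self.best_finalized = j.header_number;
        // Without importing this header, no later justification (signed by the
        // next set) could be verified: that is what makes it mandatory.
        match j.enacts_new_set {
            Some(next) => {
                self.authorities = next;
                self.set_id += 1;
                Ok(true)
            }
            None => Ok(false),
        }
    }
}
```

Returning whether the header was mandatory is one way a caller could decide to waive the submission fee.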
More: [pallet level documentation and code](../modules/grandpa/).
### Bridge Parachains Finality Pallet
Parachains are not supposed to have their own finality, so we can't use the bridge GRANDPA pallet to verify their
finality proofs. Instead, they rely on their relay chain finality. The parachain header is considered final when it is
accepted by the [`paras`
pallet](https://github.com/paritytech/polkadot/tree/1a034bd6de0e76721d19aed02a538bcef0787260/runtime/parachains/src/paras)
at its relay chain. Obviously, the relay chain block where it is accepted must also be finalized by the relay chain
GRANDPA gadget.
That said, the bridge parachains pallet accepts a storage proof of one or several parachain heads, inserted into the
[`Heads`](https://github.com/paritytech/polkadot/blob/1a034bd6de0e76721d19aed02a538bcef0787260/runtime/parachains/src/paras/mod.rs#L642)
map of the [`paras`
pallet](https://github.com/paritytech/polkadot/tree/1a034bd6de0e76721d19aed02a538bcef0787260/runtime/parachains/src/paras).
To verify this storage proof, the pallet uses the relay chain header, imported earlier by the bridge GRANDPA pallet.
The pallet may track multiple parachains at once, and those parachains may use different primitives, so parachain
header decoding never happens at the pallet level. To maintain the order of headers, the pallet uses the relay chain
header number.
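The bookkeeping described above can be sketched as a map from parachain id to the still-encoded head plus the relay
block number it was proven at. This is a hypothetical model (`ParachainsPallet`, `StoredHead` and `submit_head` are
illustrative names); verifying the storage proof against the relay chain header is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical parachain identifier.
pub type ParaId = u32;

/// What this sketch keeps per tracked parachain: the *encoded* head (never
/// decoded, since parachains may use different primitives) and the relay
/// chain block number at which it was proven.
#[derive(Debug, PartialEq)]
pub struct StoredHead {
    pub at_relay_block: u64,
    pub head_data: Vec<u8>,
}

#[derive(Default)]
pub struct ParachainsPallet {
    pub best_heads: BTreeMap<ParaId, StoredHead>,
}

impl ParachainsPallet {
    /// Accepts a head proven at `relay_block` (assumed already finalized and
    /// imported by the GRANDPA pallet). Ordering uses relay block numbers only.
    pub fn submit_head(&mut self, para: ParaId, relay_block: u64, head_data: Vec<u8>) -> bool {
        if let Some(current) = self.best_heads.get(&para) {
            if relay_block <= current.at_relay_block {
                return false; // stale: not newer than what we already have
            }
        }
        self.best_heads.insert(para, StoredHead { at_relay_block: relay_block, head_data });
        true
    }
}
```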
More: [pallet level documentation and code](../modules/parachains/).
### Bridge Messages Pallet
The pallet is responsible for queuing messages at the source chain and receiving message proofs at the target chain.
Messages are sent to a particular _lane_, where they are guaranteed to be received in the same order they are sent. The
pallet supports many lanes.
The lane has two ends. The outbound lane end stores the number of messages that have been sent and the number of
messages that have been received. The inbound lane end stores the number of messages that have been received, and also
a map that maps messages to the relayers that have delivered those messages to the target chain.
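A minimal model of the two lane ends, assuming message nonces start at 1 and ignoring proofs, weights and pruning. The
struct and field names are hypothetical and only loosely mirror the pallet's storage.

```rust
use std::collections::VecDeque;

/// Hypothetical message nonce: messages on a lane are numbered 1, 2, 3, ...
pub type Nonce = u64;

/// Source-chain end of a lane.
#[derive(Default)]
pub struct OutboundLane {
    pub latest_generated_nonce: Nonce, // messages sent so far
    pub latest_received_nonce: Nonce,  // messages confirmed as received
}

/// Target-chain end of a lane.
#[derive(Default)]
pub struct InboundLane {
    pub last_delivered_nonce: Nonce,
    /// (relayer, first nonce, last nonce) it delivered; kept until confirmed.
    pub relayers: VecDeque<(String, Nonce, Nonce)>,
}

impl OutboundLane {
    /// Queues a message and returns its nonce.
    pub fn send_message(&mut self) -> Nonce {
        self.latest_generated_nonce += 1;
        self.latest_generated_nonce
    }
}

impl InboundLane {
    /// Accepts messages `begin..=end` only if they continue the lane in order,
    /// which is what gives the same-order delivery guarantee.
    pub fn receive(&mut self, relayer: &str, begin: Nonce, end: Nonce) -> Result<(), &'static str> {
        if begin != self.last_delivered_nonce + 1 || end < begin {
            return Err("messages must be delivered in order");
        }
        self.last_delivered_nonce = end;
        self.relayers.push_back((relayer.to_string(), begin, end));
        Ok(())
    }
}
```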
The pallet has three main entrypoints:
- the `send_message` may be used by the other runtime pallets to send the messages;
- the `receive_messages_proof` is responsible for parsing the messages proof and handing messages over to the dispatch
code;
- the `receive_messages_delivery_proof` is responsible for parsing the messages delivery proof and rewarding relayers
that have delivered the message.
Many things are abstracted by the pallet:
- the message itself may mean anything, the pallet doesn't care about its content;
- the messages proof and messages delivery proof are verified outside of the pallet;
- the relayers incentivization scheme is defined outside of the pallet.
Outside of the messaging pallet, we have a set of adapters, where messages and delivery proofs are regular storage
proofs. The proofs are generated at the bridged chain and require bridged chain finality, so the messages pallet, in
this case, depends on one of the finality pallets. The messages are XCM messages, and we are using the XCM executor to
dispatch them on receipt. You may find more info in the [Polkadot <> Kusama Bridge](./polkadot-kusama-bridge-overview.md)
document.
More: [pallet level documentation and code](../modules/messages/).
### Bridge Relayers Pallet
The pallet is quite simple. It just registers relayer rewards and has an entrypoint to collect them. The conditions for
registering a reward and the reward amount are both configured outside of the pallet.
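The registry can be modelled as a per-relayer balance that other code increments and the relayer drains. The
`RelayerRewards` type and its methods are hypothetical; as described above, the caller decides when to reward and how
much.

```rust
use std::collections::BTreeMap;

/// Hypothetical relayer-rewards registry: it only accumulates and pays out.
#[derive(Default)]
pub struct RelayerRewards {
    pending: BTreeMap<String, u128>,
}

impl RelayerRewards {
    /// Called by other code (e.g. on delivery confirmation) to add a reward.
    pub fn register_reward(&mut self, relayer: &str, amount: u128) {
        *self.pending.entry(relayer.to_string()).or_insert(0) += amount;
    }

    /// Entrypoint for the relayer: returns and clears its accumulated reward.
    pub fn claim_rewards(&mut self, relayer: &str) -> Option<u128> {
        self.pending.remove(relayer)
    }
}
```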
More: [pallet level documentation and code](../modules/relayers/).
## Offchain Components
Offchain bridge components are separate processes, called relayers. Relayers are connected to both the source chain and
the target chain nodes. Relayers read the state of the source chain, compare it to the state of the target chain and, if
the state at the target chain needs to be updated, submit a target chain transaction.
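Every relay follows this read-compare-submit cycle. As a sketch, with chain state reduced to a single number (for
example, a best finalized header number) and the hypothetical `SourceClient`/`TargetClient` traits standing in for the
RPC connections:

```rust
/// Hypothetical view of the source chain: some monotonically growing state,
/// e.g. its best finalized header number or latest message nonce.
pub trait SourceClient {
    fn state(&self) -> u64;
}

/// Hypothetical view of the target chain.
pub trait TargetClient {
    /// What the target chain currently knows about the source state.
    fn known_source_state(&self) -> u64;
    /// Submits a transaction bringing the target up to `new_state`.
    fn submit(&mut self, new_state: u64);
}

/// One iteration of the read-compare-submit cycle.
/// Returns `true` if a transaction was submitted.
pub fn relay_iteration<S: SourceClient, T: TargetClient>(source: &S, target: &mut T) -> bool {
    let source_state = source.state();
    if source_state > target.known_source_state() {
        target.submit(source_state);
        true
    } else {
        false // target is already up to date: nothing to pay for
    }
}
```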
### GRANDPA Finality Relay
The task of the relay is to submit source chain GRANDPA justifications and their corresponding headers to the Bridge
GRANDPA Finality Pallet, deployed at the target chain. For that, the relay subscribes to the source chain GRANDPA
justifications stream and submits every new justification it sees to the target chain GRANDPA light client. In
addition, the relay searches for mandatory headers and submits their justifications; without them, the pallet would be
unable to move forward.
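The header-selection step can be sketched as merging the streamed justifications with the mandatory headers found by
scanning, dropping anything the target already has and submitting oldest first. This assumes header numbers are enough
to identify headers and that mandatory justifications can be fetched on demand; `headers_to_submit` is a hypothetical
helper, not part of the relay's code.

```rust
use std::collections::BTreeSet;

/// - `streamed`: header numbers whose justifications arrived on the source
///   chain's GRANDPA justification subscription;
/// - `mandatory`: header numbers that enact a new authority set, found by
///   scanning source headers;
/// - `target_best`: best finalized source header already imported at target.
/// Returns numbers to submit, oldest first, so every mandatory header lands
/// before anything signed by the authority set it enacts.
pub fn headers_to_submit(streamed: &[u64], mandatory: &[u64], target_best: u64) -> Vec<u64> {
    let pending: BTreeSet<u64> = streamed
        .iter()
        .chain(mandatory)
        .copied()
        .filter(|n| *n > target_best)
        .collect(); // BTreeSet removes duplicates and sorts ascending
    pending.into_iter().collect()
}
```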
More: [GRANDPA Finality Relay Sequence Diagram](./grandpa-finality-relay.html), [pallet level documentation and
code](../relays/finality/).
### Parachains Finality Relay
The relay connects to the source _relay_ chain and the target chain nodes. It doesn't need to connect to the tracked
parachain nodes. The relay looks at the
[`Heads`](https://github.com/paritytech/polkadot/blob/1a034bd6de0e76721d19aed02a538bcef0787260/runtime/parachains/src/paras/mod.rs#L642)
map of the [`paras`
pallet](https://github.com/paritytech/polkadot/tree/1a034bd6de0e76721d19aed02a538bcef0787260/runtime/parachains/src/paras)
in the source chain, and compares the value with the best parachain head stored in the bridge parachains pallet at the
target chain. If a new parachain head appears at the relay chain block `B`, the relay process **waits** until header `B`
or one of its descendants appears at the target chain. Once it is available, the storage proof of the map entry is
generated and submitted to the target chain.
Like its on-chain component (which requires the bridge GRANDPA pallet to be deployed nearby), the parachains finality
relay requires the GRANDPA finality relay to be running in parallel. Without it, the finality of header `B` or any of
its children won't be relayed to the target, and the target chain won't be able to verify the generated storage proof.
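The wait-or-submit decision can be written as a small pure function. This is a sketch under simplifying assumptions:
heads are compared by the relay block number they were seen at, and `Action` and `decide` are hypothetical names.

```rust
/// Hypothetical decision of the parachains finality relay for one parachain.
#[derive(Debug, PartialEq)]
pub enum Action {
    /// The target already has this (or a newer) head.
    Idle,
    /// The new head appeared at relay block `needed`, but the target has not
    /// imported that block or a descendant, so a proof could not be verified.
    WaitForRelayHeader { needed: u64 },
    /// Prove the `Heads` entry at this relay block and submit it.
    SubmitProofAt { relay_block: u64 },
}

/// `source_head_at`: relay block where the new head first appears at source;
/// `target_para_at`: relay block of the head stored in the target's pallet;
/// `target_relay_best`: best relay header finalized at the target.
pub fn decide(source_head_at: u64, target_para_at: u64, target_relay_best: u64) -> Action {
    if source_head_at <= target_para_at {
        Action::Idle
    } else if target_relay_best < source_head_at {
        Action::WaitForRelayHeader { needed: source_head_at }
    } else {
        // `B` or a descendant is known at the target; proving there yields this
        // head or an even newer one.
        Action::SubmitProofAt { relay_block: target_relay_best }
    }
}
```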
More: [Parachains Finality Relay Sequence Diagram](./parachains-finality-relay.html), [code](../relays/parachains/).
### Messages Relay
The messages relay is actually two relays running in a single process: the message delivery relay and the delivery
confirmation relay. Even though they are more complex and have many caveats, the overall algorithm is the same as in
other relays.
The message delivery relay connects to the source chain and looks at the outbound lane end, waiting until new messages
are queued there. Once they appear at source block `B`, the relay starts waiting for block `B` or one of its descendants
to appear at the target chain. Then the messages storage proof is generated and submitted to the bridge messages pallet
at the target chain. In addition, the transaction may include a storage proof of the outbound lane state, which proves
that relayer rewards have been paid and that this data (the map of relayer accounts to the delivered messages) may be
pruned from the inbound lane state at the target chain.
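The core of the delivery decision, namely which nonces can be proven right now, can be sketched as below. The
`OutboundSnapshot` type, the `max_per_tx` limit and `nonces_to_deliver` are assumptions made for illustration.

```rust
use std::ops::RangeInclusive;

/// Hypothetical snapshot of the source outbound lane at one source block.
pub struct OutboundSnapshot {
    pub source_block: u64,
    pub latest_generated_nonce: u64,
}

/// `target_knows_source_up_to`: best source header finalized at the target;
/// `last_delivered`: inbound lane's last delivered nonce at the target;
/// `max_per_tx`: hypothetical cap on messages per delivery transaction.
pub fn nonces_to_deliver(
    snapshots: &[OutboundSnapshot],
    target_knows_source_up_to: u64,
    last_delivered: u64,
    max_per_tx: u64,
) -> Option<RangeInclusive<u64>> {
    // Only lane state at a source block the target can verify is provable.
    let provable = snapshots
        .iter()
        .filter(|s| s.source_block <= target_knows_source_up_to)
        .map(|s| s.latest_generated_nonce)
        .max()?;
    if provable <= last_delivered || max_per_tx == 0 {
        return None; // nothing new that the target could verify yet
    }
    let begin = last_delivered + 1;
    let end = provable.min(last_delivered + max_per_tx);
    Some(begin..=end)
}
```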
The delivery confirmation relay connects to the target chain and starts watching the inbound lane end. When new
messages are delivered to the target chain, the corresponding _source chain account_ is inserted into the map in the
inbound lane data. The relay detects that, say, at target chain block `B` and waits until that block or one of its
descendants appears at the source chain. Once that happens, the relay crafts a storage proof of that data and sends it
to the messages pallet deployed at the source chain.
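Here is a minimal Rust sketch of the inbound lane data that both relays touch. It assumes a simplified model. On each delivery, the target chain appends the delivering relayer's source chain account and nonce range. It prunes those entries once a proof of the outbound lane state shows they are confirmed. `InboundLaneData`, `RelayerEntry` and the method names are hypothetical, not the pallet's actual types.

```rust
use std::collections::VecDeque;

/// One entry of a (hypothetical) relayers map: the source chain account that
/// delivered a contiguous range of nonces.
#[derive(Debug, Clone, PartialEq)]
pub struct RelayerEntry {
    pub relayer: String,
    pub begin: u64,
    pub end: u64,
}

/// Simplified inbound lane state kept at the target chain.
#[derive(Debug, Default)]
pub struct InboundLaneData {
    pub relayers: VecDeque<RelayerEntry>,
    pub last_delivered_nonce: u64,
}

impl InboundLaneData {
    /// Records that `relayer` delivered `count` messages following the last delivered one.
    pub fn record_delivery(&mut self, relayer: &str, count: u64) {
        if count == 0 {
            return;
        }
        let begin = self.last_delivered_nonce + 1;
        let end = self.last_delivered_nonce + count;
        // Consecutive deliveries by the same relayer extend its last entry.
        let same_relayer = self.relayers.back().map_or(false, |e| e.relayer == relayer);
        if same_relayer {
            if let Some(last) = self.relayers.back_mut() {
                last.end = end;
            }
        } else {
            self.relayers.push_back(RelayerEntry { relayer: relayer.to_string(), begin, end });
        }
        self.last_delivered_nonce = end;
    }

    /// Prunes entries once the outbound lane state proves that messages up to
    /// `last_confirmed_nonce` are confirmed (and their rewards paid) at the source chain.
    pub fn prune_confirmed(&mut self, last_confirmed_nonce: u64) {
        while self.relayers.front().map_or(false, |e| e.end <= last_confirmed_nonce) {
            self.relayers.pop_front();
        }
        // A partially confirmed entry keeps only its unconfirmed tail.
        if let Some(first) = self.relayers.front_mut() {
            first.begin = first.begin.max(last_confirmed_nonce + 1);
        }
    }
}
```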
As you can see, the messages relay also requires finality relays to operate in parallel. Since the messages relay
submits transactions to both source and target chains, it requires both _source-to-target_ and _target-to-source_
finality relays. They can be GRANDPA finality relays or GRANDPA+parachains finality relays, depending on the type of
the connected chain.
More: [Messages Relay Sequence Diagram](./messages-relay.html), [pallet level documentation and
code](../relays/messages/).
### Complex Relay
Every relay transaction has a cost. The only transaction that is "free" for the relayer is the one submitting a
mandatory GRANDPA header. A relay that feeds the bridge with every relay chain and/or parachain head it sees will have
to pay a (quite large) cost. And if no messages are sent through the bridge, that is just a waste of money.
We have a special relay mode, called _complex relay_, where the relay mostly sleeps and only submits transactions that
are required for the messages/confirmations delivery. This mode starts two message relays (in both directions). All
required finality relays are also started in a special _on-demand_ mode. In this mode they do not submit any headers
without a special request. As always, the only exception is when the GRANDPA finality relay sees a mandatory header: it
is submitted without such a request.
The message relays watch their lanes and when, at some block `B`, they see new messages/confirmations to be delivered,
they ask the on-demand relays to relay this block `B`. The on-demand relays do that, and then the message relay may
perform its job. If an on-demand relay is a parachain finality relay, it also runs its own on-demand GRANDPA relay,
which is used to relay the required relay chain headers.
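The on-demand behaviour can be sketched as follows. This is a simplified model, not the relay's actual code. `OnDemandRelay` and its methods are hypothetical, and header numbers stand in for real headers. The nested on-demand GRANDPA relay of a parachain finality relay is left out.

```rust
/// A hypothetical on-demand finality relay: it stays idle until a message relay
/// requires some header, and then submits only what the target chain is missing.
#[derive(Debug, Default)]
pub struct OnDemandRelay {
    /// Best source header number already known to the bridge at the target chain.
    pub best_known_at_target: u32,
    /// Highest header number requested by message relays so far.
    pub required: u32,
}

impl OnDemandRelay {
    /// Called by a message relay that sees new messages/confirmations at block `b`.
    pub fn require(&mut self, b: u32) {
        self.required = self.required.max(b);
    }

    /// Decides which finalized header (if any) to submit now.
    pub fn next_header_to_submit(&self, best_finalized: u32, next_mandatory: Option<u32>) -> Option<u32> {
        // Mandatory headers are always submitted, even without a request.
        if let Some(m) = next_mandatory {
            if m > self.best_known_at_target && m <= best_finalized {
                return Some(m);
            }
        }
        // Otherwise act only on a pending request; any finalized descendant of the
        // required block is good enough.
        if self.required > self.best_known_at_target && best_finalized >= self.required {
            return Some(best_finalized);
        }
        None
    }

    /// Records that a header has been submitted to (and accepted by) the target chain.
    pub fn on_submitted(&mut self, number: u32) {
        self.best_known_at_target = self.best_known_at_target.max(number);
    }
}
```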
More: [Complex Relay Sequence Diagram](./complex-relay.html),
[code](../relays/bin-substrate/src/cli/relay_headers_and_messages/).
@@ -1,35 +1,35 @@
# Polkadot <> Kusama Bridge Overview
This document describes how we use all the components described in the [High-Level Bridge
Documentation](./high-level-overview.md) to build the XCM bridge between Kusama and Polkadot. In this case, our
components merely work as an XCM transport (like XCMP/UMP/HRMP) between chains that are not part of the same consensus
system.
The overall architecture may be seen in [this diagram](./polkadot-kusama-bridge.html).
## Bridge Hubs
All operations on the relay chain are expensive. Ideally, all non-mandatory transactions should happen on parachains.
That's why we are planning to have two parachains: Polkadot Bridge Hub under Polkadot consensus and Kusama Bridge Hub
under Kusama consensus.
The Bridge Hub will have all required bridge pallets in its runtime. We hope that later, other teams will be able to use
our bridge hubs too and have their pallets there.
The Bridge Hub will use the base token of the ecosystem - KSM at Kusama Bridge Hub and DOT at Polkadot Bridge Hub. The
runtime will have a minimal set of non-bridge pallets, so there's not much you can do directly on bridge hubs.
## Connecting Parachains
You won't be able to directly use bridge hub transactions to send XCM messages over the bridge. Instead, you'll need to
use transactions of other parachains, which will use HRMP to deliver messages to the Bridge Hub. The Bridge Hub will
just queue these messages in its outbound lane, which is dedicated to delivering messages between two parachains.
Our first planned bridge will connect the Polkadot and Kusama Asset Hubs. A bridge between those two parachains would
allow Asset Hub Polkadot accounts to hold wrapped KSM tokens and Asset Hub Kusama accounts to hold wrapped DOT tokens.
For that bridge (pair of parachains under different consensus systems) we'll be using the lane 00000000. Later, when
other parachains join the bridge, they will be using other lanes for their messages.
## Running Relayers
@@ -38,9 +38,9 @@ justifications to the bridge hubs at the other side. It'll also relay finalized
Hub heads. This will only happen when messages are queued at hubs, so most of the time the relayer will be idle.
There are no active relayer sets or anything like that. Anyone may start their own relayer and relay queued messages.
We are not against that and, as always, appreciate any community efforts. Of course, running a relayer has a cost.
Apart from paying for the CPU and network, the relayer pays for transactions on both sides of the bridge. We have a
mechanism for rewarding relayers.
### Compensating the Cost of Message Delivery Transactions
@@ -56,51 +56,49 @@ is the relayer, which is following our rules:
- we compensate the cost of message delivery transactions that have actually delivered the messages. So if your
transaction has claimed to deliver messages `[42, 43, 44]`, but, for some reason, has actually delivered
messages `[42, 43]`, the transaction will be free for the relayer. If it has not delivered any messages, then the
relayer pays the full cost of the transaction;
- we compensate the cost of message delivery and all required finality calls, if they are part of the same
[`frame_utility::batch_all`](https://github.com/paritytech/substrate/blob/891d6a5c870ab88521183facafc811a203bb6541/frame/utility/src/lib.rs#L326)
transaction. Of course, the calls inside the batch must be linked, e.g. the submitted parachain head must be used to
prove the messages, and the relay chain header must be used to prove the parachain head finality. If one of the calls
fails, or if they are not linked together, the relayer pays the full transaction cost.
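The compensation rules above can be expressed as a small predicate. This is a hypothetical sketch. `DeliveryOutcome` and `is_refundable` are illustrative names, not the API of the runtime's refund signed extension. The "linked batch" check is reduced to a single flag.

```rust
use std::ops::RangeInclusive;

/// Hypothetical summary of a dispatched delivery transaction (plain call or `batch_all`).
pub struct DeliveryOutcome {
    /// Nonces that the relayer claimed to deliver.
    pub claimed: RangeInclusive<u64>,
    /// Latest received nonce at the inbound lane before and after the call.
    pub received_before: u64,
    pub received_after: u64,
    /// For batches: all calls succeeded and the finality calls were used to prove the messages.
    pub batch_ok_and_linked: bool,
}

/// Returns `true` if the relayer should be compensated for the transaction cost.
pub fn is_refundable(o: &DeliveryOutcome) -> bool {
    // At least one claimed message must actually have been delivered; a partial delivery
    // (e.g. `[42, 43]` out of `[42, 43, 44]`) is still compensated.
    let delivered_any = o.received_after > o.received_before && o.claimed.contains(&o.received_after);
    delivered_any && o.batch_ok_and_linked
}
```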
Please keep in mind that the fee of "zero-cost" transactions is still withdrawn from the relayer account. But the
compensation is registered in the `pallet_bridge_relayers::RelayerRewards` map at the target bridge hub. The relayer may
later claim all its rewards using the `pallet_bridge_relayers::claim_rewards` call.
*A side note*: why don't we simply set the cost of useful transactions to zero? That's because the bridge has its cost.
If we didn't take any fees, it would mean that the sender is not obliged to pay for its messages. And Bridge Hub
collators (and, maybe, "treasury") would not receive any payment for including transactions. More about this later, in
the [Who is Rewarding Relayers](#who-is-rewarding-relayers) section.
### Message Delivery Confirmation Rewards
In addition to the "zero-cost" message delivery transactions, the relayer is also rewarded for:
- delivering every message. The reward is registered during the delivery confirmation transaction at the Source
Bridge Hub;
- submitting the delivery confirmation transaction. The relayer may submit a delivery confirmation that, e.g., confirms
delivery of four messages, of which only one (or zero) was actually delivered by this relayer. It receives some fee for
confirming messages delivered by other relayers.
Both rewards may be claimed using the `pallet_bridge_relayers::claim_rewards` call at the Source Bridge Hub.
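Here is a rough model of how both kinds of rewards could accumulate in a map similar to `pallet_bridge_relayers::RelayerRewards` and be drained by a claim. The reward amounts and all type and function names here are hypothetical. Real values come from the runtime configuration.

```rust
use std::collections::BTreeMap;

/// Hypothetical reward parameters (real values are runtime configuration).
pub const REWARD_PER_DELIVERED_MESSAGE: u128 = 10;
pub const REWARD_FOR_CONFIRMATION: u128 = 5;

/// Simplified stand-in for a relayer -> pending reward map.
#[derive(Debug, Default)]
pub struct RelayerRewards(pub BTreeMap<String, u128>);

impl RelayerRewards {
    fn register(&mut self, relayer: &str, amount: u128) {
        *self.0.entry(relayer.to_string()).or_insert(0) += amount;
    }

    /// Called while processing a delivery confirmation at the source bridge hub.
    /// `delivered_by` lists (relayer, number of newly confirmed messages it delivered).
    pub fn on_delivery_confirmed(&mut self, confirmer: &str, delivered_by: &[(&str, u64)]) {
        for (relayer, count) in delivered_by {
            self.register(relayer, REWARD_PER_DELIVERED_MESSAGE * u128::from(*count));
        }
        // The confirming relayer is paid even if it delivered none of the messages.
        self.register(confirmer, REWARD_FOR_CONFIRMATION);
    }

    /// Mirrors the idea of a rewards claim: returns and clears the pending amount.
    pub fn claim(&mut self, relayer: &str) -> u128 {
        self.0.remove(relayer).unwrap_or(0)
    }
}
```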
### Who is Rewarding Relayers
Obviously, there should be someone who is paying relayer rewards. We want bridge transactions to have a cost, so we
can't use fees for rewards. Instead, the parachains using the bridge use sovereign accounts on both sides of the bridge
to cover relayer rewards.
Bridged parachains will have sovereign accounts at bridge hubs. For example, the Kusama Asset Hub will have an account
at the Polkadot Bridge Hub, and the Polkadot Asset Hub will have an account at the Kusama Bridge Hub. The sovereign
accounts are used as a source of funds when the relayer is calling `pallet_bridge_relayers::claim_rewards`.
Since a message lane is only used by a single pair of parachains, there's no collision between different bridges. E.g.,
the Kusama Asset Hub will only reward relayers that are delivering messages from the Kusama Asset Hub. The Kusama Asset
Hub sovereign account is not used to cover rewards of bridging with some other Polkadot parachain.
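To illustrate why lanes do not collide, here is a hypothetical sketch. Pending rewards are keyed by relayer and lane, and a claim is paid only from the sovereign account that funds that lane. `BridgeHub`, `LaneId` as a plain `[u8; 4]`, and the toy `claim_rewards` method are assumptions for illustration only.

```rust
use std::collections::BTreeMap;

/// Hypothetical 4-byte lane identifier.
pub type LaneId = [u8; 4];

#[derive(Debug)]
pub enum ClaimError {
    NothingToClaim,
    InsufficientSovereignBalance,
}

/// Toy bridge hub state.
#[derive(Debug, Default)]
pub struct BridgeHub {
    /// Balance of the sovereign account that funds rewards on each lane.
    pub sovereign_balance: BTreeMap<LaneId, u128>,
    /// Pending rewards per (relayer, lane).
    pub pending: BTreeMap<(String, LaneId), u128>,
    /// Free balances of relayer accounts.
    pub balances: BTreeMap<String, u128>,
}

impl BridgeHub {
    /// Pays the pending reward for `lane` from that lane's sovereign account only.
    pub fn claim_rewards(&mut self, relayer: &str, lane: LaneId) -> Result<u128, ClaimError> {
        let key = (relayer.to_string(), lane);
        let amount = *self.pending.get(&key).ok_or(ClaimError::NothingToClaim)?;
        let sovereign = self.sovereign_balance.entry(lane).or_insert(0);
        if *sovereign < amount {
            return Err(ClaimError::InsufficientSovereignBalance);
        }
        *sovereign -= amount;
        *self.balances.entry(relayer.to_string()).or_insert(0) += amount;
        self.pending.remove(&key);
        Ok(amount)
    }
}
```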
### Multiple Relayers and Rewards
@@ -108,25 +106,24 @@ Our goal is to incentivize running honest relayers. But we have no relayers sets
message delivery transaction, hoping that the cost of this transaction will be compensated. So what if some message is
currently queued and two relayers are submitting two identical message delivery transactions at once? Without any
special means, only the cost of the first included transaction is compensated, not the other one. An honest,
but unlucky relayer will lose some money. In addition, we'll waste some portion of block size and weight, which may be
used by other useful transactions.
To solve the problem, we have two signed extensions
([`generate_bridge_reject_obsolete_headers_and_messages! {}`](../bin/runtime-common/src/lib.rs) and
[`RefundRelayerForMessagesFromParachain`](../bin/runtime-common/src/refund_relayer_extension.rs)) that prevent bridge
transactions with obsolete data from being included in the block. We reject the following transactions:
- transactions that submit the GRANDPA justification for the best finalized header, or one of its ancestors;
- transactions that submit the proof of the current best parachain head, or one of its ancestors;
- transactions that deliver already delivered messages. If at least one of the messages is not yet delivered, the
transaction is not rejected;
- transactions that confirm delivery of already confirmed messages. If at least one of the confirmations is new, the
transaction is not rejected;
- [`frame_utility::batch_all`](https://github.com/paritytech/substrate/blob/891d6a5c870ab88521183facafc811a203bb6541/frame/utility/src/lib.rs#L326)
transactions that have both finality and message delivery calls. All restrictions from the [Compensating the Cost of
Message Delivery Transactions](#compensating-the-cost-of-message-delivery-transactions) section apply.
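The rejection rules can be summarised as a single check against the current on-chain bridge state. This is a hypothetical simplification of what the signed extensions do. The `BridgeCall` variants and `BridgeState` fields are illustrative, and a batch is treated as obsolete if any of its calls is.

```rust
use std::ops::RangeInclusive;

/// Hypothetical, simplified view of the bridge calls that are inspected.
pub enum BridgeCall {
    SubmitFinalityProof { header_number: u32 },
    SubmitParachainHead { para_head_number: u32 },
    ReceiveMessagesProof { nonces: RangeInclusive<u64> },
    ReceiveMessagesDeliveryProof { latest_confirmed: u64 },
    BatchAll(Vec<BridgeCall>),
}

/// On-chain bridge state that the checks compare against.
pub struct BridgeState {
    pub best_finalized: u32,
    pub best_para_head: u32,
    pub latest_received_nonce: u64,
    pub latest_confirmed_nonce: u64,
}

/// Returns `true` if the call brings nothing new and should be rejected.
pub fn is_obsolete(call: &BridgeCall, s: &BridgeState) -> bool {
    match call {
        BridgeCall::SubmitFinalityProof { header_number } => *header_number <= s.best_finalized,
        BridgeCall::SubmitParachainHead { para_head_number } => *para_head_number <= s.best_para_head,
        // Rejected only if every nonce in the range is already delivered.
        BridgeCall::ReceiveMessagesProof { nonces } => *nonces.end() <= s.latest_received_nonce,
        // Rejected only if no confirmation is new.
        BridgeCall::ReceiveMessagesDeliveryProof { latest_confirmed } => {
            *latest_confirmed <= s.latest_confirmed_nonce
        }
        // A batch is rejected if any of its calls is obsolete.
        BridgeCall::BatchAll(calls) => calls.iter().any(|c| is_obsolete(c, s)),
    }
}
```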
+145 -172
@@ -1,8 +1,7 @@
# Bridge Messages Pallet
The messages pallet is used to deliver messages from the source chain to the target chain. A message is (almost) opaque
to the module, and the final goal is to hand the message to the message dispatch mechanism.
## Contents
@@ -14,229 +13,203 @@ mechanism.
## Overview
A message lane is a unidirectional channel, where messages are sent from the source chain to the target chain. At the
same time, a single instance of the messages module supports both outbound lanes and inbound lanes. So the chain where
the module is deployed (this chain) may act as a source chain for outbound messages (heading to a bridged chain) and as
a target chain for inbound messages (coming from a bridged chain).
The messages module supports multiple message lanes. Every message lane is identified by a 4-byte identifier. Messages
sent through the lane are assigned a unique (for this lane) increasing integer value, known as a nonce ("number that can
only be used once"). Messages that are sent over the same lane are guaranteed to be delivered to the target chain in the
same order they're sent from the source chain. In other words, the message with nonce `N` will be delivered right before
the message with nonce `N+1`.
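The ordering guarantee boils down to two counters per lane. This is a hypothetical sketch, not the pallet's storage layout. The source side hands out increasing nonces. The target side accepts only the next expected nonce, so it rejects gaps and duplicates.

```rust
/// Hypothetical 4-byte lane identifier.
pub type LaneId = [u8; 4];

/// Source side: assigns increasing nonces to messages sent over the lane.
#[derive(Debug)]
pub struct OutboundLane {
    pub id: LaneId,
    pub latest_generated_nonce: u64,
}

impl OutboundLane {
    pub fn new(id: LaneId) -> Self {
        Self { id, latest_generated_nonce: 0 }
    }

    pub fn next_nonce(&mut self) -> u64 {
        self.latest_generated_nonce += 1;
        self.latest_generated_nonce
    }
}

/// Target side: accepts messages strictly in nonce order.
#[derive(Debug)]
pub struct InboundLane {
    pub id: LaneId,
    pub latest_received_nonce: u64,
}

impl InboundLane {
    pub fn new(id: LaneId) -> Self {
        Self { id, latest_received_nonce: 0 }
    }

    /// Accepts `nonce` only if it is exactly the next expected one, so `N` is always
    /// delivered right before `N + 1`.
    pub fn receive(&mut self, nonce: u64) -> bool {
        if nonce != self.latest_received_nonce + 1 {
            return false;
        }
        self.latest_received_nonce = nonce;
        true
    }
}
```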
A single message lane may be seen as a transport channel for a single application (onchain, offchain or mixed). At the
same time, the module itself never dictates any lane or message rules. In the end, it is the runtime developer who
defines what message lane and message mean for this runtime.
In our [Kusama<>Polkadot bridge](../../docs/polkadot-kusama-bridge-overview.md) we are using a lane as a channel of
communication between two parachains of different relay chains. For example, lane `[0, 0, 0, 0]` is used for Polkadot <>
Kusama Asset Hub communications. Other lanes may be used to bridge other parachains.
## Message Workflow
The pallet is not intended to be used by end users and provides no public calls to send the message. Instead, it
provides a runtime-internal method that allows other pallets (or other runtime code) to queue outbound messages.
The message "appears" when some runtime code calls the `send_message()` method of the pallet. The submitter specifies
the lane that they're willing to use and the message itself. If some fee must be paid for sending the message, it must
be paid outside of the pallet. If a message passes all checks (which include, for example, a message size check, a
disabled lane check, ...), the nonce is assigned and the message is stored in the module storage. The message is now in
the "undelivered" state.
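Here is a minimal sketch of the `send_message()` flow described above, assuming a toy in-memory storage. The message passes a size check and a disabled lane check, gets the next nonce of its lane and is stored as undelivered. `MessagesPallet`, `SendError` and `MAX_MESSAGE_SIZE` are hypothetical. As in the text, fees are left to the caller.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical 4-byte lane identifier.
pub type LaneId = [u8; 4];

/// Hypothetical size limit; a real runtime configures its own limits.
pub const MAX_MESSAGE_SIZE: usize = 1024;

#[derive(Debug, PartialEq)]
pub enum SendError {
    MessageTooLarge,
    LaneDisabled,
}

#[derive(Debug, Default)]
pub struct MessagesPallet {
    pub disabled_lanes: BTreeSet<LaneId>,
    pub latest_generated_nonce: BTreeMap<LaneId, u64>,
    /// Stored "undelivered" messages, keyed by (lane, nonce).
    pub outbound_messages: BTreeMap<(LaneId, u64), Vec<u8>>,
}

impl MessagesPallet {
    /// Runtime-internal entry point; any fee is assumed to be charged by the caller.
    pub fn send_message(&mut self, lane: LaneId, payload: Vec<u8>) -> Result<u64, SendError> {
        if payload.len() > MAX_MESSAGE_SIZE {
            return Err(SendError::MessageTooLarge);
        }
        if self.disabled_lanes.contains(&lane) {
            return Err(SendError::LaneDisabled);
        }
        // Assign the next nonce of this lane and store the message.
        let nonce = self.latest_generated_nonce.entry(lane).or_insert(0);
        *nonce += 1;
        let nonce = *nonce;
        self.outbound_messages.insert((lane, nonce), payload);
        Ok(nonce)
    }
}
```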
We assume that there are external, offchain actors, called relayers, that submit module-related transactions to both
target and source chains. The pallet itself makes no assumptions about the relayer incentivization scheme, but it has
some callbacks for paying rewards. See [Integrating Messages Module into
Runtime](#integrating-messages-module-into-runtime) for details.
Eventually, some relayer will notice this message in the "undelivered" state and decide to deliver it. The relayer then
crafts a `receive_messages_proof()` transaction (aka delivery transaction) for the messages module instance deployed at
the target chain. The relayer provides its account id at the source chain, the proof of the message (or several
messages), the number of messages in the transaction and their cumulative dispatch weight. Once the transaction is
mined, the message is considered "delivered".
Once a message is delivered, the relayer may want to confirm delivery back to the source chain. There are two reasons
why it would want to do that. The first is that we intentionally limit the number of "delivered", but not yet
"confirmed" messages at inbound lanes (see [What about other Constants in the Messages Module Configuration
Trait](#what-about-other-constants-in-the-messages-module-configuration-trait) for an explanation). So at some point, the
target chain may stop accepting new messages until relayers confirm some of them. The second is that if the relayer
wants to be rewarded for delivery, it must prove that it has actually delivered the message, and this proof may only be
generated after the delivery transaction is mined. So the relayer crafts the `receive_messages_delivery_proof()`
transaction (aka confirmation transaction) for the messages module instance deployed at the source chain. Once this
transaction is mined, the message is considered "confirmed".
The "confirmed" state is the final state of the message. But there's one last thing related to the message: the fact
that it is now "confirmed" and the reward has been paid to the relayer (or at least the callback for this has been
called) must be confirmed to the target chain. Otherwise, we may reach the limit of "unconfirmed" messages at the target
chain and it will stop accepting new messages. So the relayer sometimes includes the nonce of the latest "confirmed"
message in the next
`receive_messages_proof()` transaction, proving that some messages have been confirmed.
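The interplay between the unconfirmed-messages limit and the confirmed nonce carried in `receive_messages_proof()` can be sketched as follows. The limit value, the struct and the partial-acceptance behaviour are assumptions made for illustration, not the pallet's actual logic.

```rust
/// Hypothetical limit on "delivered, but not yet confirmed" messages at the inbound lane.
pub const MAX_UNCONFIRMED_MESSAGES: u64 = 4;

/// Simplified inbound lane state at the target chain.
#[derive(Debug, Default)]
pub struct InboundLane {
    pub latest_received_nonce: u64,
    /// Latest nonce that the target chain knows to be "confirmed" at the source chain.
    pub latest_confirmed_nonce: u64,
}

impl InboundLane {
    /// Toy delivery call: tries to deliver `count` new messages. The relayer may include
    /// `confirmed_hint`, the nonce of the latest "confirmed" message.
    /// Returns how many messages were accepted.
    pub fn receive_messages(&mut self, count: u64, confirmed_hint: Option<u64>) -> u64 {
        if let Some(confirmed) = confirmed_hint {
            // Only delivered messages can be confirmed, and the value never goes back.
            let confirmed = confirmed.min(self.latest_received_nonce);
            self.latest_confirmed_nonce = self.latest_confirmed_nonce.max(confirmed);
        }
        // Invariant: this never exceeds MAX_UNCONFIRMED_MESSAGES, so the subtraction is safe.
        let unconfirmed = self.latest_received_nonce - self.latest_confirmed_nonce;
        let accepted = count.min(MAX_UNCONFIRMED_MESSAGES - unconfirmed);
        self.latest_received_nonce += accepted;
        accepted
    }
}
```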
## Integrating Messages Module into Runtime
As said above, the messages module supports both outbound and inbound message lanes. So if we integrate the module into
some runtime, it may act as the source chain runtime for outbound messages and as the target chain runtime for inbound
messages. In this section, we'll sometimes refer to the chain we're currently integrating with as "this chain" and the
other chain as "bridged chain".
The messages module doesn't simply accept transactions claiming that the bridged chain has some updated data for us.
Instead, the module assumes that the bridged chain is able to prove that updated data in some way. The proof is
abstracted from the module and may be of any kind. In our Substrate-to-Substrate bridge we're using runtime storage
proofs. Other bridges may use transaction proofs, Substrate header digests or anything else that may be proved.
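Because the proof is abstracted from the module, generic code only needs a verifier interface. The trait below is a hypothetical illustration, not the pallet's real configuration trait. `KnownRootVerifier` is a toy stand-in for checking a storage proof against a state root of a finalized bridged header.

```rust
/// Hypothetical verifier interface: turns a proof into verified `(nonce, payload)` pairs.
pub trait MessagesProofVerifier {
    type Proof;
    type Error;
    fn verify(&self, proof: Self::Proof) -> Result<Vec<(u64, Vec<u8>)>, Self::Error>;
}

/// Toy proof: claims to come from the bridged chain state with the given root.
pub struct ToyProof {
    pub state_root: u64,
    pub messages: Vec<(u64, Vec<u8>)>,
}

/// Toy verifier standing in for a storage proof check against the state root
/// of a finalized bridged header.
pub struct KnownRootVerifier {
    pub finalized_state_root: u64,
}

impl MessagesProofVerifier for KnownRootVerifier {
    type Proof = ToyProof;
    type Error = &'static str;

    fn verify(&self, proof: ToyProof) -> Result<Vec<(u64, Vec<u8>)>, &'static str> {
        if proof.state_root != self.finalized_state_root {
            return Err("invalid proof");
        }
        Ok(proof.messages)
    }
}

/// Generic module-side code: it never looks inside the proof, it only asks the verifier.
pub fn accept_messages<V: MessagesProofVerifier>(v: &V, proof: V::Proof) -> Result<usize, V::Error> {
    Ok(v.verify(proof)?.len())
}
```

Swapping in a transaction-proof or header-digest verifier would only mean another `MessagesProofVerifier` implementation. `accept_messages` stays unchanged.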
**IMPORTANT NOTE**: everything below in this chapter describes details of the messages module configuration. But if
you're interested in a well-tested and relatively easy integration of two Substrate-based chains, you may want to look
at the [bridge-runtime-common](../../bin/runtime-common/) crate. This crate provides a lot of helpers for integration,
which may be used directly from within your runtime. Then, if you decide to change something in this scheme, come back
here for detailed information.
### General Information
The messages module supports instances. Every module instance is supposed to bridge this chain and some bridged chain.
To bridge with another chain, using another instance is suggested (this isn't enforced anywhere in the code, though).
Keep in mind that the pallet may be used to build virtual channels between multiple chains, as we do in our [Polkadot <>
Kusama bridge](../../docs/polkadot-kusama-bridge-overview.md). There, the pallet actually bridges only two parachains:
Kusama Bridge Hub and Polkadot Bridge Hub. However, other Kusama and Polkadot parachains are able to send (XCM) messages
to their Bridge Hubs. The messages will be delivered to the other side of the bridge and routed to the proper
destination parachain within the bridged chain consensus.
Message submitters may track message progress by inspecting module events. When a message is accepted, the
`MessageAccepted` event is emitted. The event contains both the message lane identifier and the nonce that has been
assigned to the message. Once delivery of a message to the target chain has been confirmed, the `MessagesDelivered`
event is emitted from the `receive_messages_delivery_proof()` transaction. The `MessagesDelivered` event contains the
message lane identifier and the inclusive range of delivered message nonces.
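A minimal sketch of how a submitter could track a message using these two events. The `Event` enum, `LaneId` and
`MessageNonce` below are hypothetical stand-ins that only mirror the fields described above; they are not the pallet's
real types.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for the pallet's lane identifier type.
pub type LaneId = [u8; 4];
/// Hypothetical stand-in for the pallet's message nonce type.
pub type MessageNonce = u64;

/// Simplified mirror of the two events described above (not the pallet's real event enum).
pub enum Event {
    /// Emitted when a message is accepted: the lane and the nonce assigned to the message.
    MessageAccepted { lane_id: LaneId, nonce: MessageNonce },
    /// Emitted from `receive_messages_delivery_proof()`: the lane and an inclusive range of nonces.
    MessagesDelivered { lane_id: LaneId, delivered: RangeInclusive<MessageNonce> },
}

/// Extracts the nonce assigned to our message from a `MessageAccepted` event on our lane.
pub fn accepted_nonce(event: &Event, lane: LaneId) -> Option<MessageNonce> {
    match event {
        Event::MessageAccepted { lane_id, nonce } if *lane_id == lane => Some(*nonce),
        _ => None,
    }
}

/// Returns `true` once any observed `MessagesDelivered` event on our lane covers `nonce`.
pub fn is_delivered(events: &[Event], lane: LaneId, nonce: MessageNonce) -> bool {
    events.iter().any(|event| match event {
        Event::MessagesDelivered { lane_id, delivered } => *lane_id == lane && delivered.contains(&nonce),
        _ => false,
    })
}
```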
The pallet provides no means to get the result of message dispatch at the target chain. If that is required, it must be
done outside of the pallet. For example, XCM messages, when dispatched, have special instructions to send some data back
to the sender. Other dispatchers may use a similar mechanism for that.
### How to plug-in Messages Module to Send Messages to the Bridged Chain?
The `pallet_bridge_messages::Config` trait has 3 main associated types that are used to work with outbound messages. The
`pallet_bridge_messages::Config::TargetHeaderChain` defines how we see the bridged chain as the target for our outbound
messages. It must be able to check that the bridged chain may accept our message, e.g. that the message size is below
the maximal possible transaction size of the chain. And when the relayer sends us a confirmation transaction, this
implementation must be able to parse and verify the proof of messages delivery. Normally, you would reuse the same
(configurable) type on all chains that are sending messages to the same bridged chain.
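The two responsibilities of `TargetHeaderChain` can be sketched as follows. This is a hypothetical simplification, not
the real trait: the size limit, the transaction overhead and the `DeliveryProof` shape are assumptions, and a real
implementation would verify a storage proof against the state root instead of trusting the claimed nonce.

```rust
/// Hypothetical view of the bridged chain, as seen by our outbound lane.
pub struct BridgedChainView {
    /// Assumed maximal extrinsic size of the bridged chain, in bytes.
    pub max_extrinsic_size: usize,
    /// Assumed space taken by the rest of the delivery transaction (signature, proof, etc.).
    pub tx_overhead: usize,
    /// State roots of bridged headers that our finality pallet has already imported.
    pub finalized_state_roots: Vec<[u8; 32]>,
}

/// Hypothetical delivery proof: the latest delivered nonce, anchored to a bridged state root.
pub struct DeliveryProof {
    pub state_root: [u8; 32],
    pub latest_delivered_nonce: u64,
}

impl BridgedChainView {
    /// Rejects messages that could never fit into a delivery transaction on the bridged chain.
    pub fn verify_message(&self, payload: &[u8]) -> Result<(), &'static str> {
        if payload.len() + self.tx_overhead > self.max_extrinsic_size {
            return Err("message is too large for the bridged chain");
        }
        Ok(())
    }

    /// Accepts a delivery proof only if it refers to an already finalized bridged header.
    /// A real implementation would also check a storage proof against that root.
    pub fn verify_messages_delivery_proof(&self, proof: &DeliveryProof) -> Result<u64, &'static str> {
        if !self.finalized_state_roots.contains(&proof.state_root) {
            return Err("proof refers to an unknown bridged header");
        }
        Ok(proof.latest_delivered_nonce)
    }
}
```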
The `pallet_bridge_messages::Config::LaneMessageVerifier` defines a single callback to verify outbound messages. The
simplest callback may just accept all messages. But in this case you'll need to answer many questions first. Who will
pay for the delivery and confirmation transactions? Are we sure that someone will ever deliver this message to the
bridged chain? Are we sure that we don't bloat our runtime storage by accepting this message? What if the message is
improperly encoded or has some fields set to invalid values? Answering all those (and similar) questions leads to a
correct implementation.
There's another thing to consider when implementing a type for use in
`pallet_bridge_messages::Config::LaneMessageVerifier`: do we treat all message lanes identically, or do they have
different sets of verification rules? For example, you may reserve lane#1 for messages coming from some
'wrapped-token' pallet - then you may verify in your implementation that the origin is associated with this pallet.
Lane#2 may be reserved for 'system' messages and you may charge zero fee for such messages. You may have some rate
limiting for messages sent over lane#3. Or you may just verify the same set of rules for all outbound messages - it is
all up to the `pallet_bridge_messages::Config::LaneMessageVerifier` implementation.
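The per-lane policy from the example above could look roughly like this. Everything here is hypothetical: the lane
numbers, the `Origin` enum, the fee value and the per-block counter only illustrate the three rules described in the
text and are not part of the pallet's API.

```rust
/// Hypothetical lane identifiers matching the example in the text.
pub const WRAPPED_TOKEN_LANE: u32 = 1;
pub const SYSTEM_LANE: u32 = 2;
pub const RATE_LIMITED_LANE: u32 = 3;

/// Hypothetical origin kinds; a real runtime would inspect its own origin type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    WrappedTokenPallet,
    System,
    Signed(u64),
}

/// State kept by this illustrative verifier.
pub struct LaneVerifier {
    pub fee_per_message: u128,
    pub max_messages_per_block_on_lane_3: u32,
    pub sent_on_lane_3_this_block: u32,
}

impl LaneVerifier {
    /// Returns the fee to charge for the message, or an error if it must be rejected.
    pub fn verify(&mut self, lane: u32, origin: Origin) -> Result<u128, &'static str> {
        match lane {
            WRAPPED_TOKEN_LANE if origin != Origin::WrappedTokenPallet => {
                Err("lane 1 is reserved for the wrapped-token pallet")
            }
            WRAPPED_TOKEN_LANE => Ok(self.fee_per_message),
            // 'System' messages travel for free; nobody else may use this lane.
            SYSTEM_LANE if origin == Origin::System => Ok(0),
            SYSTEM_LANE => Err("lane 2 only accepts system messages"),
            RATE_LIMITED_LANE => {
                if self.sent_on_lane_3_this_block >= self.max_messages_per_block_on_lane_3 {
                    return Err("rate limit for lane 3 reached");
                }
                self.sent_on_lane_3_this_block += 1;
                Ok(self.fee_per_message)
            }
            // All other lanes share the same default rule.
            _ => Ok(self.fee_per_message),
        }
    }
}
```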
The last type is the `pallet_bridge_messages::Config::DeliveryConfirmationPayments`. When a confirmation transaction is
received, we call the `pay_reward()` method, passing the range of delivered messages. You may use the
[`pallet-bridge-relayers`](../relayers/) pallet and its
[`DeliveryConfirmationPaymentsAdapter`](../relayers/src/payment_adapter.rs) adapter as a possible implementation. It
allows you to pay a fixed reward for relaying the message and a portion of it for confirming delivery.
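A rough sketch of the bookkeeping behind such a payment: only the part of each delivery entry that falls inside the
confirmed range is paid. The fixed per-message reward and the confirmation share split below are assumptions made for
illustration; this is not the adapter's actual logic, and `DeliveredEntry` and `compute_rewards` are hypothetical.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical entry of the unrewarded relayers map: who delivered which nonces.
pub struct DeliveredEntry {
    pub relayer: &'static str,
    pub nonces: RangeInclusive<u64>,
}

/// For every confirmed message, the delivery relayer earns `reward_per_message - confirmation_share`
/// and the confirmation relayer earns `confirmation_share`.
pub fn compute_rewards(
    entries: &[DeliveredEntry],
    confirmed: RangeInclusive<u64>,
    confirmation_relayer: &'static str,
    reward_per_message: u128,
    confirmation_share: u128,
) -> BTreeMap<&'static str, u128> {
    let delivery_share = reward_per_message.saturating_sub(confirmation_share);
    let mut rewards = BTreeMap::new();
    for entry in entries {
        // Intersect the entry with the confirmed range; skip entries outside of it.
        let start = (*entry.nonces.start()).max(*confirmed.start());
        let end = (*entry.nonces.end()).min(*confirmed.end());
        if start > end {
            continue;
        }
        let count = u128::from(end - start + 1);
        *rewards.entry(entry.relayer).or_insert(0) += count * delivery_share;
        *rewards.entry(confirmation_relayer).or_insert(0) += count * confirmation_share;
    }
    rewards
}
```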
### I have a Messages Module in my Runtime, but I Want to Reject all Outbound Messages. What shall I do?
You should be looking at the `bp_messages::source_chain::ForbidOutboundMessages` structure from the
[`bp_messages::source_chain`](../../primitives/messages/src/source_chain.rs) module. It implements all required traits
and will simply reject all transactions related to outbound messages.
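The idea behind such a structure can be sketched with two hypothetical, simplified traits: one type implements every
outbound-side trait and always returns an error. The traits and `ForbidOutboundSketch` are illustrations only; the real
`bp_messages` traits have different names and signatures.

```rust
/// Hypothetical, simplified outbound-side traits (not the real `bp_messages` traits).
pub trait VerifyOutboundMessage {
    fn verify_message(lane: u32, payload: &[u8]) -> Result<(), &'static str>;
}

pub trait VerifyDeliveryProof {
    fn verify_messages_delivery_proof(proof: &[u8]) -> Result<u64, &'static str>;
}

/// Same idea as `ForbidOutboundMessages`: implement every outbound trait and always reject.
pub struct ForbidOutboundSketch;

const ALL_OUTBOUND_MESSAGES_REJECTED: &str = "this chain rejects all outbound messages";

impl VerifyOutboundMessage for ForbidOutboundSketch {
    fn verify_message(_lane: u32, _payload: &[u8]) -> Result<(), &'static str> {
        Err(ALL_OUTBOUND_MESSAGES_REJECTED)
    }
}

impl VerifyDeliveryProof for ForbidOutboundSketch {
    fn verify_messages_delivery_proof(_proof: &[u8]) -> Result<u64, &'static str> {
        Err(ALL_OUTBOUND_MESSAGES_REJECTED)
    }
}
```

The inbound counterpart follows the same pattern for the inbound-side traits.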
### How to plug-in Messages Module to Receive Messages from the Bridged Chain?
The `pallet_bridge_messages::Config` trait has 2 main associated types that are used to work with inbound messages. The
`pallet_bridge_messages::Config::SourceHeaderChain` defines how we see the bridged chain as the source of our inbound
messages. When a relayer sends us a delivery transaction, this implementation must be able to parse and verify the proof
of messages wrapped in this transaction. Normally, you would reuse the same (configurable) type on all chains that are
sending messages to the same bridged chain.
The `pallet_bridge_messages::Config::MessageDispatch` defines how to dispatch delivered messages. Apart from actually
dispatching the message, the implementation must return the correct dispatch weight of the message before dispatch is
called.
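Why the weight must be known before dispatch can be illustrated with a hypothetical, simplified dispatcher: the
delivery loop checks the declared weight against its budget first and only then dispatches. `DispatchSketch`, the
length-based weight formula and the stop-at-budget policy are assumptions, not the pallet's real behaviour.

```rust
/// Hypothetical, simplified dispatcher interface (not the real `MessageDispatch` trait).
pub trait DispatchSketch {
    /// Must be known before dispatch, so the delivery can be weighed up front.
    fn dispatch_weight(payload: &[u8]) -> u64;
    /// Executes the message and returns the weight that was actually spent.
    fn dispatch(payload: &[u8]) -> u64;
}

/// Toy dispatcher: weight is assumed to grow linearly with the payload length.
pub struct LengthWeightedDispatch;

impl DispatchSketch for LengthWeightedDispatch {
    fn dispatch_weight(payload: &[u8]) -> u64 {
        1_000 + 10 * payload.len() as u64
    }

    fn dispatch(payload: &[u8]) -> u64 {
        // A real dispatcher would decode and execute the message here.
        Self::dispatch_weight(payload)
    }
}

/// Dispatches messages while their declared weights fit into `max_weight`; returns how many were dispatched.
pub fn deliver<D: DispatchSketch>(messages: &[Vec<u8>], max_weight: u64) -> usize {
    let mut used = 0u64;
    let mut dispatched = 0;
    for message in messages {
        let declared = D::dispatch_weight(message);
        if used + declared > max_weight {
            break;
        }
        let actual = D::dispatch(message);
        // Spending more than declared would break the weight accounting done up front.
        debug_assert!(actual <= declared);
        used += actual;
        dispatched += 1;
    }
    dispatched
}
```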
### I have a Messages Module in my Runtime, but I Want to Reject all Inbound Messages. What shall I do?
You should be looking at the `bp_messages::target_chain::ForbidInboundMessages` structure from the
[`bp_messages::target_chain`](../../primitives/messages/src/target_chain.rs) module. It implements all required traits
and will simply reject all transactions related to inbound messages.
### What about other Constants in the Messages Module Configuration Trait?
Two settings are used to check messages in the `send_message()` function. The
`pallet_bridge_messages::Config::ActiveOutboundLanes` is an array of all message lanes that may be used to send
messages. All messages sent using other lanes are rejected. All messages with a size above
`pallet_bridge_messages::Config::MaximalOutboundPayloadSize` are also rejected.
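The two checks can be sketched in isolation. The constant values, the 4-byte lane identifiers and the `SendError`
variants below are hypothetical placeholders for whatever a runtime configures; they are not the pallet's real
definitions.

```rust
/// Hypothetical values standing in for `ActiveOutboundLanes` and `MaximalOutboundPayloadSize`.
pub const ACTIVE_OUTBOUND_LANES: &[[u8; 4]] = &[[0, 0, 0, 0], [0, 0, 0, 1]];
pub const MAXIMAL_OUTBOUND_PAYLOAD_SIZE: usize = 64 * 1024;

#[derive(Debug, PartialEq, Eq)]
pub enum SendError {
    InactiveOutboundLane,
    MessageIsTooLarge,
}

/// The two `send_message()` checks described above, applied in order.
pub fn check_outbound(lane: [u8; 4], payload: &[u8]) -> Result<(), SendError> {
    if !ACTIVE_OUTBOUND_LANES.contains(&lane) {
        return Err(SendError::InactiveOutboundLane);
    }
    if payload.len() > MAXIMAL_OUTBOUND_PAYLOAD_SIZE {
        return Err(SendError::MessageIsTooLarge);
    }
    Ok(())
}
```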
To be able to reward the relayer for delivering messages, we store a map of message nonces range => identifier of the
relayer that has delivered this range at the target chain runtime storage. If a relayer delivers multiple consecutive
ranges, they're merged into a single entry. Ranges that are not consecutive (e.g. because another relayer has delivered
messages in between) are kept as separate entries, so there may be more than one entry for the same relayer.
Eventually, this whole map must be delivered back to the source chain to confirm delivery and pay rewards. So to make
sure we are able to craft this confirmation transaction, we need to: (1) keep the size of this map below a certain
limit and (2) make sure that the weight of processing this map is below a certain limit. Both size and processing
weight mostly depend on the number of entries. The number of entries is limited with the
`pallet_bridge_messages::Config::MaxUnrewardedRelayerEntriesAtInboundLane` parameter. Processing weight also depends on
the total number of messages that are being confirmed, because every confirmed message needs to be read. So there's
another `pallet_bridge_messages::Config::MaxUnconfirmedMessagesAtInboundLane` parameter for that.
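The merging rule and both limits can be sketched on a simplified map. `UnrewardedRelayer`, `Limits` and
`note_delivery` are hypothetical; the sketch delivers one nonce at a time and uses a `VecDeque` as the map, which is an
assumption made for brevity rather than the pallet's storage layout.

```rust
use std::collections::VecDeque;
use std::ops::RangeInclusive;

/// Simplified entry of the inbound lane's unrewarded relayers map.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnrewardedRelayer {
    pub relayer: u64,
    pub messages: RangeInclusive<u64>,
}

/// Hypothetical limits standing in for `MaxUnrewardedRelayerEntriesAtInboundLane`
/// and `MaxUnconfirmedMessagesAtInboundLane`.
pub struct Limits {
    pub max_entries: usize,
    pub max_unconfirmed_messages: u64,
}

/// Records delivery of `nonce` by `relayer`. A nonce directly following the last entry of the same
/// relayer extends that entry; anything else starts a new entry. Deliveries beyond either limit fail.
pub fn note_delivery(
    map: &mut VecDeque<UnrewardedRelayer>,
    limits: &Limits,
    relayer: u64,
    nonce: u64,
) -> Result<(), &'static str> {
    let unconfirmed: u64 = map.iter().map(|e| e.messages.end() - e.messages.start() + 1).sum();
    if unconfirmed >= limits.max_unconfirmed_messages {
        return Err("too many unconfirmed messages");
    }
    if let Some(last) = map.back_mut() {
        if last.relayer == relayer && *last.messages.end() + 1 == nonce {
            last.messages = *last.messages.start()..=nonce;
            return Ok(());
        }
    }
    if map.len() >= limits.max_entries {
        return Err("too many unrewarded relayer entries");
    }
    map.push_back(UnrewardedRelayer { relayer, messages: nonce..=nonce });
    Ok(())
}
```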
When choosing values for these parameters, you must also keep in mind that if the proof in your scheme is based on
finality of headers (and it is the most obvious option for Substrate-based chains with finality notion), then choosing
too small values for these parameters may cause significant delays in message delivery. That's because many actors are
involved in this scheme:

1. authorities that are finalizing headers of the target chain need to finalize a header with a non-empty map;
2. the headers relayer then needs to submit this header and its finality proof to the source chain;
3. the messages relayer must then send the confirmation transaction (storage proof of this map) to the source chain;
4. when the confirmation transaction is included in some header, source chain authorities must finalize this header;
5. the headers relayer then needs to submit this header and its finality proof to the target chain;
6. only now may the messages relayer submit new messages from the source to the target chain and prune the entry from
   the map.
The delivery transaction requires the relayer to provide both the number of entries and the total number of messages in
the map. This means that the module never charges an extra cost for delivering a map - the relayer pays exactly for the
number of entries and messages it has delivered. So the best guess for values of these parameters would be the pair
that would occupy `N` percent of the maximal transaction size and weight of the source chain. The `N` should be large
enough to process large maps, while keeping a reserve for future source chain upgrades.
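The "`N` percent" rule of thumb can be turned into arithmetic. The cost model below (fixed size per entry, fixed
weights per entry and per message, one base weight) and all numbers in the usage check are assumptions for
illustration; real values would come from the source chain's benchmarks. This sketch bounds entries by the size budget
and gives messages the weight that remains, which is only one possible split.

```rust
/// Hypothetical cost model of the confirmation transaction on the source chain.
pub struct ConfirmationCosts {
    pub base_weight: u64,
    pub weight_per_entry: u64,
    pub weight_per_message: u64,
    pub size_per_entry: u64,
}

/// Source chain limits; `n_percent` is the `N` from the text.
pub struct SourceLimits {
    pub max_tx_weight: u64,
    pub max_tx_size: u64,
    pub n_percent: u64,
}

/// Returns `(max entries, max messages)` so that a confirmation carrying that many entries and messages
/// uses at most `N` percent of the source chain's maximal transaction size and weight.
pub fn pick_limits(costs: &ConfirmationCosts, limits: &SourceLimits) -> (u64, u64) {
    let weight_budget = limits.max_tx_weight * limits.n_percent / 100;
    let size_budget = limits.max_tx_size * limits.n_percent / 100;
    // The size of the map is dominated by its entries.
    let max_entries = size_budget / costs.size_per_entry;
    // Messages get whatever weight is left after processing the maximal number of entries.
    let entries_weight = costs.base_weight + max_entries * costs.weight_per_entry;
    let max_messages = weight_budget.saturating_sub(entries_weight) / costs.weight_per_message;
    (max_entries, max_messages)
}
```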
## Non-Essential Functionality
There may be a special account in every runtime where the messages module is deployed. This account, named 'module
owner', is like a module-level sudo account - it is able to halt and resume all module operations without requiring a
runtime upgrade. Calls that are related to this account are:
- `fn set_owner()`: the current module owner may call it to transfer "ownership" to another account;
- `fn halt_operations()`: the module owner (or sudo account) may call this function to stop all module operations. After
  this call, all message-related transactions will be rejected until a subsequent `resume_operations` call. This call
  may be used when something extraordinary happens with the bridge;
- `fn resume_operations()`: the module owner may call this function to resume bridge operations. The module will resume
  its regular operations after this call.
If the pallet owner is not defined, governance may be used to make those calls.
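The owner and halt/resume behaviour can be modelled as a tiny state machine. `Module`, `Caller` and `OperatingMode`
are hypothetical; `Caller::Root` stands for sudo or governance, and letting root call `set_owner()` as well is an
assumption of this sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperatingMode {
    Normal,
    Halted,
}

/// Hypothetical caller kinds: `Root` stands for sudo or governance.
pub enum Caller {
    Root,
    Signed(u64),
}

/// Hypothetical module state; `owner` is `None` when no module owner is configured.
pub struct Module {
    pub owner: Option<u64>,
    pub mode: OperatingMode,
}

impl Module {
    /// Root is always allowed; a signed caller only if it is the current owner.
    fn ensure_owner_or_root(&self, caller: &Caller) -> Result<(), &'static str> {
        match caller {
            Caller::Root => Ok(()),
            Caller::Signed(who) if self.owner == Some(*who) => Ok(()),
            Caller::Signed(_) => Err("only the module owner or root may do this"),
        }
    }

    pub fn set_owner(&mut self, caller: &Caller, new_owner: Option<u64>) -> Result<(), &'static str> {
        self.ensure_owner_or_root(caller)?;
        self.owner = new_owner;
        Ok(())
    }

    pub fn halt_operations(&mut self, caller: &Caller) -> Result<(), &'static str> {
        self.ensure_owner_or_root(caller)?;
        self.mode = OperatingMode::Halted;
        Ok(())
    }

    pub fn resume_operations(&mut self, caller: &Caller) -> Result<(), &'static str> {
        self.ensure_owner_or_root(caller)?;
        self.mode = OperatingMode::Normal;
        Ok(())
    }

    /// Every message-related call checks the operating mode first.
    pub fn send_message(&self) -> Result<(), &'static str> {
        if self.mode == OperatingMode::Halted {
            return Err("module operations are halted");
        }
        Ok(())
    }
}
```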
## Messages Relay
We have an offchain actor that watches for new messages and submits them to the bridged chain: the messages relay.
You may look at its [crate level documentation and the code](../../relays/messages/).
@@ -19,7 +19,7 @@ validators. Validators validate the block and register the new parachain head in
[`Heads` map](https://github.com/paritytech/polkadot/blob/88013730166ba90745ae7c9eb3e0c1be1513c7cc/runtime/parachains/src/paras/mod.rs#L645)
of the [`paras`](https://github.com/paritytech/polkadot/tree/master/runtime/parachains/src/paras) pallet,
deployed at the relay chain. Keep in mind that this pallet, deployed at a relay chain, is **NOT** a bridge pallet,
even though the names are similar.
What the bridge parachains pallet does is simply verify storage proofs of parachain heads within that
`Heads` map. It does that using a relay chain header that has been previously imported by the
@@ -1,19 +1,22 @@
# Using Containers
Using containers via **Podman** or **Docker** brings benefits, whether it is to build a container image or run a node
while keeping a minimum footprint on your local system.
This document mentions using `podman` or `docker`. Those are usually interchangeable, and using **Podman** is
preferred. If you have podman installed and want to use all the commands mentioned below, you can simply create an
alias with `alias docker=podman`.
There are a few options to build a node within a container and inject a binary inside an image.
## Parity built container image
Parity builds and publishes a container image that can be found as `docker.io/parity/polkadot-parachain`.
## Parity CI image
Parity maintains and uses internally a generic "CI" image that can be used as a base to build binaries: [Parity CI
container image](https://github.com/paritytech/scripts/tree/master/dockerfiles/ci-linux):
The command below allows building a Linux binary without even having to install Rust or any dependency locally:
@@ -29,19 +32,22 @@ sudo chown -R $(id -u):$(id -g) target/
If you want to reproduce other steps of the CI process, you can use the following
[guide](https://github.com/paritytech/scripts#gitlab-ci-for-building-docker-images).
## Injected image
Injecting a binary inside a base image is the quickest option to get a working container image. This only works if you
were able to build a Linux binary, either locally, or using a container as described above.
After building a Linux binary (`polkadot-parachain`) with cargo or with the Parity CI image as documented above, the
following command produces a new container image where the compiled binary is injected:
```bash
./docker/scripts/build-injected-image.sh
```
## Container build
Alternatively, you can build an image with a builder pattern. This option takes a while but offers a simple method for
anyone to get a working container image without requiring any of the Rust toolchain installed locally.
```bash
docker build \
@@ -37,8 +37,8 @@ performed during the release process.
### <a name="burnin"></a>Burn In
Ensure that Parity DevOps has run the new release on Westend and Kusama Asset Hub collators for 12h prior to publishing
the release.
### Build Artifacts
@@ -75,56 +75,61 @@ function of the appropriate pallets.
### Extrinsic Ordering & Storage
Offline signing libraries depend on a consistent ordering of call indices and functions. Compare the metadata of the
current and new runtimes and ensure that the `module index, call index` tuples map to the same set of functions. Also
check whether there have been any changes in `storage`. In case of a breaking change, increase `transaction_version`.
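The comparison rule can be expressed as a small function over two call tables. `CallTable` is a hypothetical,
flattened view of runtime metadata (`(module index, call index)` to `"Pallet::call"`), not the real metadata format;
the sketch only captures the rule that an existing tuple must keep pointing at the same function, while new tuples are
not breaking.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened metadata: `(module index, call index)` -> `"Pallet::call"`.
pub type CallTable = BTreeMap<(u8, u8), String>;

/// Returns the tuples whose function changed or disappeared between `old` and `new`.
/// Tuples that only exist in `new` (added functions) are not reported.
pub fn breaking_changes(old: &CallTable, new: &CallTable) -> Vec<(u8, u8)> {
    old.iter()
        .filter(|(key, name)| new.get(*key) != Some(*name))
        .map(|(key, _)| *key)
        .collect()
}
```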
To verify the order has not changed, manually start the following [GitHub
Action](https://github.com/paritytech/cumulus/actions/workflows/extrinsic-ordering-check-from-bin.yml). It takes around
a minute to run and produces a report as an artifact that you need to check manually.
To run it, in the _Run Workflow_ dropdown:
1. **Use workflow from**: to ignore, leave `master` as default
2. **The WebSocket url of the reference node**:
   - Asset Hub Polkadot: `wss://statemint-rpc.polkadot.io`
   - Asset Hub Kusama: `wss://statemine-rpc.polkadot.io`
   - Asset Hub Westend: `wss://westmint-rpc.polkadot.io`
3. **A url to a Linux binary for the node containing the runtime to test**: Paste the URL of the latest
   release-candidate binary from the draft-release on GitHub. The binary must have been uploaded to S3 beforehand (the
   GitHub link to the binary keeps changing).
   - E.g.: https://releases.parity.io/cumulus/v0.9.270-rc3/polkadot-parachain
4. **The name of the chain under test. Usually, you would pass a local chain**:
   - Asset Hub Polkadot: `asset-hub-polkadot-local`
   - Asset Hub Kusama: `asset-hub-kusama-local`
   - Asset Hub Westend: `asset-hub-westend-local`
5. Click **Run workflow**
When the workflow is done, click on it and download the zip artifact; inside, you'll find an `output.txt` file. The
things to look for in the output are lines like:
- `[Identity] idx 28 -> 25 (calls 15)` - indicates the index for Identity has changed
- `[+] Society, Recovery` - indicates the new version includes 2 additional modules/pallets.
- If no indices have changed, every module's line should look something like `[Identity] idx 25 (calls 15)`
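Scanning `output.txt` for changed indices can be automated with a few lines. This sketch assumes the report lines look
exactly like the examples above (`[Name] idx OLD -> NEW (calls N)` for a change, `[Name] idx N (calls N)` otherwise);
`changed_indices` is a hypothetical helper, not part of the workflow.

```rust
/// Returns the names of pallets whose index changed, based on report lines such as
/// `[Identity] idx 28 -> 25 (calls 15)`. Unchanged lines and `[+] ...` lines are skipped.
pub fn changed_indices(report: &str) -> Vec<String> {
    report
        .lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix('[')?;
            let (name, rest) = rest.split_once(']')?;
            // Lines such as `[+] Society, Recovery` have no `idx` part and are dropped here.
            let index_part = rest.trim_start().strip_prefix("idx ")?;
            let indices = index_part.split(" (").next()?;
            if indices.contains("->") {
                Some(name.to_string())
            } else {
                None
            }
        })
        .collect()
}
```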
**Note**: Adding new functions to the runtime does not constitute a breaking change as long as the indices do not
change.
**Note**: Changes to extrinsic function signatures (adding/removing and reordering arguments) are not caught by the job,
so those changes should be reviewed manually.
### Benchmarks
Benchmarks can now be started from the CI. First find the CI pipeline from
[here](https://gitlab.parity.io/parity/mirrors/cumulus/-/pipelines?page=1&scope=all&ref=release-parachains-v9220) and
pick the latest one. [Guide](https://github.com/paritytech/ci_cd/wiki/Benchmarks:-cumulus)
### Integration Tests
Until https://github.com/paritytech/ci_cd/issues/499 is done, tests will have to be run manually.
1. Go to https://github.com/paritytech/parachains-integration-tests and check out the release branch. E.g.
https://github.com/paritytech/parachains-integration-tests/tree/release-v9270-v0.9.27 for `release-parachains-v0.9.270`
2. Clone the `release-parachains-<version>` branch from Cumulus
3. `cargo build --release`
4. Copy `./target/release/polkadot-parachain` to `./bin`
5. Clone `it/release-<version>-fast-sudo` from Polkadot. In case the branch does not exist (it is a manual process),
   cherry-pick `paritytech/polkadot@791c8b8` and run:
   `find . -type f -name "*.toml" -print0 | xargs -0 sed -i '' -e 's/polkadot-vX.X.X/polkadot-v<version>/g'`
6. `cargo build --release --features fast-runtime`
7. Copy `./target/release/polkadot` into `./bin` (in Cumulus)
8. Run the tests:
- Asset Hub Polkadot: `yarn zombienet-test -c ./examples/statemint/config.toml -t ./examples/statemint`
- Asset Hub Kusama: `yarn zombienet-test -c ./examples/statemine/config.toml -t ./examples/statemine`
@@ -1 +1 @@
License: Apache-2.0
@@ -19,4 +19,4 @@ parathreads [here](https://wiki.polkadot.network/docs/learn-parathreads).
🧙 Learn about how to use this template and run your own parachain testnet for it in the
[Devhub Cumulus Tutorial](https://docs.substrate.io/tutorials/v3/cumulus/start-relay/).
@@ -1,19 +1,23 @@
E2E tests concerning Polkadot Governance and the Collectives Parachain. The tests are run by the Parachain Integration
Tests [tool](https://github.com/paritytech/parachains-integration-tests/).
# Requirements
The tests require some changes to the regular production runtime builds:
## RelayChain runtime
1. Alice has SUDO
2. Public Referenda `StakingAdmin`, `FellowshipAdmin` tracks settings (see the corresponding keys of the `TRACKS_DATA`
constant in the `governance::tracks` module of the Relay Chain runtime crate):
``` yaml
prepare_period: 5 Block,
decision_period: 1 Block,
confirm_period: 1 Block,
min_enactment_period: 1 Block,
```
## Collectives runtime
1. Fellowship Referenda `Fellows` track settings (see the corresponding key of the `TRACKS_DATA` constant in the
`fellowship::tracks` module of the Collectives runtime crate):
``` yaml
prepare_period: 5 Block,
decision_period: 1 Block,
@@ -1,26 +1,26 @@
- [Bridge-hub Parachains](#bridge-hub-parachains)
- [Requirements for local run/testing](#requirements-for-local-runtesting)
- [How to test local Rococo <-> Wococo bridge](#how-to-test-local-rococo---wococo-bridge)
- [Run chains (Rococo + BridgeHub, Wococo + BridgeHub) with
zombienet](#run-chains-rococo--bridgehub-wococo--bridgehub-with-zombienet)
- [Run relayer (BridgeHubRococo, BridgeHubWococo)](#run-relayer-bridgehubrococo-bridgehubwococo)
- [Run with script (alternative 1)](#run-with-script-alternative-1)
- [Run with binary (alternative 2)](#run-with-binary-alternative-2)
- [Send messages - transfer asset over bridge](#send-messages---transfer-asset-over-bridge)
- [How to test live BridgeHubRococo/BridgeHubWococo](#how-to-test-live-bridgehubrococobridgehubwococo)
- [How to test local BridgeHubKusama/BridgeHubPolkadot](#how-to-test-local-bridgehubkusamabridgehubpolkadot)
# Bridge-hub Parachains
_BridgeHub(s)_ are **_system parachains_** that will house trustless bridges from the local ecosystem to others. The
current trustless bridges planned for the BridgeHub(s) are:
- `BridgeHubPolkadot` system parachain:
1. Polkadot <-> Kusama bridge
2. Polkadot <-> Ethereum bridge (Snowbridge)
- `BridgeHubKusama` system parachain:
1. Kusama <-> Polkadot bridge
2. Kusama <-> Ethereum bridge
The high-level responsibilities of each bridge living on BridgeHub:
- sync finality proofs between relay chains (or equivalent)
- sync finality proofs between BridgeHub parachains
- pass (XCM) messages between different BridgeHub parachains
@@ -192,43 +192,40 @@ RUST_LOG=runtime=trace,rpc=trace,bridge=trace \
```
**Check relay-chain headers relaying:**
- Rococo parachain:
  - https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A8943#/chainstate
  - Pallet: **bridgeWococoGrandpa**
  - Keys: **bestFinalized()**
- Wococo parachain:
  - https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A8945#/chainstate
  - Pallet: **bridgeRococoGrandpa**
  - Keys: **bestFinalized()**
**Check parachain headers relaying:**
- Rococo parachain:
  - https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A8943#/chainstate
  - Pallet: **bridgeWococoParachain**
  - Keys: **bestParaHeads()**
- Wococo parachain:
  - https://polkadot.js.org/apps/?rpc=ws%3A%2F%2F127.0.0.1%3A8945#/chainstate
  - Pallet: **bridgeRococoParachain**
  - Keys: **bestParaHeads()**
### Send messages - transfer asset over bridge
TODO: see `# !!! READ HERE` above
## How to test live BridgeHubRococo/BridgeHubWococo
(An older PoC from branch `origin/bko-transfer-asset-via-bridge` is still deployed here. It uses a custom extrinsic,
which is going to be replaced by `pallet_xcm`.)
- uses account seed on Live Rococo:Rockmine2
```
cd <cumulus-git-repo-dir>
./scripts/bridges_rococo_wococo.sh transfer-asset-from-asset-hub-rococo
```
- open explorers:
  - Rockmine2 (see events `xcmpQueue.XcmpMessageSent`, `bridgeTransfer.ReserveAssetsDeposited`,
    `bridgeTransfer.TransferInitiated`)
    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fws-rococo-rockmine2-collator-node-0.parity-testnet.parity.io#/explorer
  - BridgeHubRococo (see `bridgeWococoMessages.MessageAccepted`)
    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Frococo-bridge-hub-rpc.polkadot.io#/explorer
  - BridgeHubWococo (see `bridgeRococoMessages.MessagesReceived`)
    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fwococo-bridge-hub-rpc.polkadot.io#/explorer
  - Wockmint (see `xcmpQueue.Success` for `transfer-asset` and `xcmpQueue.Fail` for `ping-via-bridge`)
    https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fwococo-wockmint-rpc.polkadot.io#/explorer
  - BridgeHubRococo (see `bridgeWococoMessages.MessagesDelivered`)
## How to test local BridgeHubKusama/BridgeHubPolkadot
@@ -33,8 +33,8 @@ There are also different user interfaces and command-line tools you can use to d
or interact with contracts:
* [Contracts UI](https://paritytech.github.io/contracts-ui/) a beginner-friendly UI for smart contract developers.
* [`polkadot-js`](https://polkadot.js.org/apps/) the go-to expert UI for smart contract developers.
* [`cargo-contract`](https://github.com/paritytech/cargo-contract) a CLI tool, ideal for scripting or your terminal workflow.
If you are looking for a quickstart, we can recommend
[ink!'s Guided Tutorial for Beginners](https://docs.substrate.io/tutorials/v3/ink-workshop/pt1/).
@@ -0,0 +1,77 @@
# Changelog
Currently, the changelog is built locally. It will be moved to CI once labels stabilize.
For now, a bit of preparation is required before you can run the script:
- fetch the srtool digests
- store them under the `digests` folder as `<chain>-srtool-digest.json`
- ensure the `.env` file is up to date with correct information
The content of the release notes is generated from the template files under the `scripts/ci/changelog/templates` folder.
For readability and maintenance, the template is split into several small snippets.
Run:
```
./bin/changelog <ref_since> [<ref_until>=HEAD]
```
For instance:
```
./bin/changelog parachains-v7.0.0-rc8
```
A file called `release-notes.md` will be generated and can be used for the release.
## ENV
You may use the following ENV for testing:
```
RUSTC_STABLE="rustc 1.56.1 (59eed8a2a 2021-11-01)"
RUSTC_NIGHTLY="rustc 1.57.0-nightly (51e514c0f 2021-09-12)"
PRE_RELEASE=true
HIDE_SRTOOL_ROCOCO=true
HIDE_SRTOOL_SHELL=true
REF1=statemine-v5.0.0
REF2=HEAD
DEBUG=1
NO_CACHE=1
```
By default, the template will include all the information, including the runtime data. For client releases, we don't
need that data and it can be skipped by setting the following env:
```
RELEASE_TYPE=client
```
## Considered labels
The following list will likely evolve over time and it will be hard to keep it in sync. In any case, if you want to find
all the labels that are used, search for `meta` in the templates. Currently, the considered labels are:
- Priority: C<N> labels
- Audit: D<N> labels
- E4 => new host function
- B0 => silent, not showing up
- B1-releasenotes (misc unless other labels)
- B5-client (client changes)
- B7-runtimenoteworthy (runtime changes)
- T6-XCM
Note that labels with the same letter are mutually exclusive. A PR should not have both `B0` and `B5`, or both `C1` and
`C9`. In case of conflicts, the template will decide which label will be considered.
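The exclusivity rule can be checked mechanically. Below is a minimal sketch. It assumes labels are plain strings whose first character is the group letter and whose second character is a digit. `conflicting_labels` is a hypothetical helper, not part of the changelog tooling. It only reports conflicts and does not decide which label wins, since that choice is left to the template.

```rust
use std::collections::BTreeMap;

/// Returns every group of labels that share the same leading letter (e.g. `B0-silent` and
/// `B5-client`), i.e. every violation of the "same letter is mutually exclusive" rule.
///
/// Hypothetical helper for illustration only.
pub fn conflicting_labels<'a>(labels: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut groups: BTreeMap<char, Vec<&'a str>> = BTreeMap::new();
    for &label in labels {
        // Only labels shaped like `<Letter><digit>...` take part in the exclusivity rule.
        let mut chars = label.chars();
        if let (Some(letter), Some(digit)) = (chars.next(), chars.next()) {
            if letter.is_ascii_uppercase() && digit.is_ascii_digit() {
                groups.entry(letter).or_default().push(label);
            }
        }
    }
    // A group with more than one label means the PR carries conflicting labels.
    groups.into_values().filter(|group| group.len() > 1).collect()
}
```

For example, a PR labelled `B0-silent`, `B5-client` and `T6-XCM` is reported as having one conflict (`B0-silent` vs `B5-client`). `T6-XCM` is alone in its group.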
## Dev and debugging
### Hot Reload
The following command allows **Hot Reload**:
```
fswatch templates -e ".*\.md$" | xargs -n1 -I{} ./bin/changelog statemine-v5.0.0
```
### Caching
By default, if the changelog data from GitHub is already present, the calls to the GitHub API will be skipped and the
local version of the data will be used. This is much faster. If you know that some labels have changed in GitHub, you
probably want to refresh the data. You can then either manually delete the `cumulus.json` file or `export NO_CACHE=1` to
force refreshing the data.
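The caching behavior comes down to a single decision. The sketch below is hypothetical: the changelog script is not written this way, and `must_fetch` and `must_fetch_from_env` are made-up names. It assumes that any non-empty `NO_CACHE` value forces a refresh and that the cache is a single file such as `cumulus.json`.

```rust
use std::env;
use std::path::Path;

/// Decides whether the changelog data must be (re)fetched from the GitHub API.
///
/// Hypothetical sketch: mirrors the behavior described above, not the real script.
pub fn must_fetch(cache_file: &Path, no_cache: Option<&str>) -> bool {
    // Assumption: any non-empty NO_CACHE value forces a refresh.
    let forced = no_cache.map_or(false, |value| !value.is_empty());
    // Without a forced refresh, only fetch when there is no local copy yet.
    forced || !cache_file.exists()
}

/// Reads `NO_CACHE` from the environment and applies [`must_fetch`] to `cache_file`.
pub fn must_fetch_from_env(cache_file: &Path) -> bool {
    let no_cache = env::var("NO_CACHE").ok();
    must_fetch(cache_file, no_cache.as_deref())
}
```

Deleting the cache file and setting `NO_CACHE` both make `must_fetch` return `true`. That matches the two options given above.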
@@ -13,11 +13,11 @@ As the messages do not physically go through the same messaging infrastructure
there is some code that is not being tested compared to using slower E2E tests.
In future it may be possible to run these XCM emulated tests as E2E tests (without changes).
As well as the XCM message transport being mocked out, so too are areas around consensus,
in particular things like disputes, staking and iamonline events can't be tested.
## Alternatives
If you just wish to test execution of various XCM instructions
against the XCM VM then the `xcm-simulator` (in the Polkadot
repo) is the perfect tool for this.
@@ -1,22 +1,24 @@
# Database snapshot guide
In this guide we take a snapshot of a parachain and a relay chain. Note that we are using local chains here
(`rococo_local_testnet` and `local_testnet`); live chains will have different values.
*Please ensure that the database is not in use, i.e. no nodes are writing to it.*
# How to prepare database for a relaychain
To prepare a snapshot for a relay chain, we need to copy the database.
```
mkdir -p relaychain-snapshot/alice/data/chains/rococo_local_testnet/db/
cp -r chain-data/alice/data/chains/rococo_local_testnet/db/. relaychain-snapshot/alice/data/chains/rococo_local_testnet/db/
tar -C relaychain-snapshot/alice/ -czf relaychain.tgz data
```
# How to prepare database for a parachain
To prepare a snapshot for a parachain, we need to copy the database for both the collator node (parachain data) and
the validator (relay data).
```
#Parachain data
@@ -33,5 +35,6 @@ tar -C parachain-snapshot/charlie/ -czf parachain.tgz data relay-data
```
# Restoring a snapshot
Zombienet will automatically download the `*.tgz` file to the respective folder for a run. However, you can also download
it manually; just ensure you extract the tar file in the correct directory, i.e. the root directory
`chain-data/charlie/`.
@@ -32,11 +32,11 @@ If it is an urgent fix with no large change to logic, then it may be merged afte
contributor has reviewed it well and approved the review once CI is complete.
No PR should be merged until all reviews' comments are addressed.
### Labels
The set of labels and their description can be found [here](https://paritytech.github.io/labels/doc_polkadot-sdk.html).
### Process
1. Please use our [Pull Request Template](./PULL_REQUEST_TEMPLATE.md) and make sure all relevant
information is reflected in your PR.
@@ -50,12 +50,12 @@ The set of labels and their description can be found [here](https://paritytech.g
`T13-documentation`. The docs team will get in touch.
5. If your PR changes files in these paths:
   - `polkadot`: `^runtime/polkadot`
   - `polkadot`: `^runtime/kusama`
   - `polkadot`: `^primitives/src/`
   - `polkadot`: `^runtime/common`
   - `substrate`: `^frame/`
   - `substrate`: `^primitives/`
It should be added to the [security audit board](https://github.com/orgs/paritytech/projects/103)
and will need to undergo an audit before merge.
@@ -67,7 +67,7 @@ to change the code to make it work/compile.
It should also mention potential storage migrations and whether they require some special setup aside from adding
it to the list of migrations in the runtime.
## Reviewing pull requests
When reviewing a pull request, the end-goal is to suggest useful changes to the author.
Reviews should finish with approval unless there are issues that would result in:
@@ -1,11 +1,10 @@
# Substrate Documentation Guidelines
This document is focused on documenting parts of Substrate that relate to its external API. The list of such crates can
be found in [CODEOWNERS](./CODEOWNERS). Search for the crates auto-assigned to the `docs-audit` team.
These crates are used by external developers and need thorough documentation. They are mostly concerned with FRAME
development.
- [Substrate Documentation Guidelines](#substrate-documentation-guidelines)
- [General/Non-Pallet Crates](#generalnon-pallet-crates)
@@ -35,22 +34,19 @@ First, consider the case for all such crates, except for those that are pallets.
The first question is, what should you document? Use this filter:
1. In the crates assigned to `docs-audit` in [CODEOWNERS](./CODEOWNERS),
2. All `pub` items need to be documented. If not `pub`, it doesn't appear in the rust-docs, and is not public facing.
   - Within `pub` items, sometimes they are only `pub` to be used by another internal crate, and you can foresee that
     this won't be used by anyone else. These need **not** be documented thoroughly.
   - Reminder: `trait` items are public by definition if the trait is public.
3. All public modules (`mod`) should have reasonable module-level documentation (`//!`).
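Here is a minimal sketch of the filter above, using a hypothetical `math` module. The public module gets `//!` docs, and public functions and public trait items get `///` docs. A crate-internal helper only gets a `//` comment.

```rust
// A public module: its `//!` docs become the module-level documentation.
pub mod math {
    //! Small arithmetic helpers exposed to external developers.

    /// Returns the larger of `a` and `b`.
    pub fn max(a: u32, b: u32) -> u32 {
        if a >= b {
            a
        } else {
            b
        }
    }

    /// Types that can be doubled.
    ///
    /// Trait items are public whenever the trait is, so `double` needs its own docs too.
    pub trait Doubler {
        /// Returns twice the value of `self`.
        fn double(&self) -> u32;
    }

    impl Doubler for u32 {
        fn double(&self) -> u32 {
            *self * 2
        }
    }

    // Not part of the public API (`pub(crate)`), so rustdoc does not render it by default and
    // a short code comment is enough.
    pub(crate) fn clamp_to_ten(x: u32) -> u32 {
        x.min(10)
    }
}
```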
#### Rust Docs vs. Code Comments
Note that anything starting with `///` is an external rust-doc, and everything starting with `//` does not appear in the
rust-docs. It's important to not confuse the two in your documentation.
```rust
/// Computes the square root of the input, returning `Ok(_)` if successful.
@@ -73,100 +69,88 @@ pub fn sqrt(x: u32) -> Result<u32, ()> {
There are good sources to look into:
- [Rust Documentation Guide](https://doc.rust-lang.org/rustdoc/how-to-write-documentation.html)
- [Documentation in Rust
Book](https://doc.rust-lang.org/book/ch14-02-publishing-to-crates-io.html#making-useful-documentation-comments)
- [Guide on Writing Documentation for a Rust
Crate](https://blog.guillaume-gomez.fr/articles/2020-03-12+Guide+on+how+to+write+documentation+for+a+Rust+crate)
As mentioned
[here](https://web.mit.edu/rust-lang_v1.25/arch/amd64_ubuntu1404/share/doc/rust/html/book/first-edition/documentation.html#writing-documentation-comments)
and [here](https://blog.guillaume-gomez.fr/articles/2020-03-12+Guide+on+how+to+write+documentation+for+a+Rust+crate),
always start with a **single sentence** describing what is documented. All additional documentation should be added
*after an empty line*. Strive to make the first sentence short and succinct. The reason is that the first paragraph of
docs about an item (everything before the first empty line) is used as the excerpt that rustdoc displays for this item
when it appears in tables, such as the table listing all functions in a module. If this excerpt is too long, the module
docs will be very difficult to read.
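As an illustration, here is a hypothetical function whose docs follow this rule. It has a one-sentence summary, then an empty `///` line, then the details.

```rust
/// Converts an amount of whole units into the smallest unit, given the token's `decimals`.
///
/// Only the sentence above is used as the excerpt in rustdoc's item tables. The details live
/// in this second paragraph: the computation saturates at `u128::MAX` instead of overflowing,
/// so very large inputs are clamped rather than wrapped.
pub fn to_smallest_unit(units: u128, decimals: u32) -> u128 {
    // 10^decimals, saturating so that an absurd `decimals` value cannot overflow either.
    units.saturating_mul(10u128.saturating_pow(decimals))
}
```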
About [special
sections](https://web.mit.edu/rust-lang_v1.25/arch/amd64_ubuntu1404/share/doc/rust/html/book/first-edition/documentation.html#special-sections),
we will most likely not need to think about panic and safety in any runtime related code. Our code is never `unsafe`,
and will (almost) never panic.
Use `# Examples` as much as possible. These are great ways to further demonstrate what your APIs are doing, and they add
free test coverage. As an additional benefit, any code in rust-docs is compiled and run as an "integration test", not a
unit test: it exercises your crate through its public API, the way an external user would. So, it is both a win for
"more documentation" and a win for "more test coverage".
You can also consider adding an optional `# Errors` section. Of course, this only applies if a `Result` is being
returned, and if the `Error` variants are overly complicated.
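Here is a minimal sketch that combines both sections on a hypothetical `parse_percent` function. `my_crate` in the doc test is a placeholder crate name. The doc test uses a `~~~` fence only so that it can sit inside this Markdown code block. In a real crate, a regular triple-backtick fence behaves the same way.

```rust
/// Errors returned by [`parse_percent`].
#[derive(Debug, PartialEq, Eq)]
pub enum PercentError {
    /// The input is not a valid unsigned integer.
    NotANumber,
    /// The parsed value is larger than 100.
    OutOfRange(u32),
}

/// Parses a percentage between 0 and 100 (inclusive).
///
/// # Errors
///
/// - [`PercentError::NotANumber`] if `input` is not an unsigned integer.
/// - [`PercentError::OutOfRange`] if the parsed value is above 100.
///
/// # Examples
///
/// ~~~
/// use my_crate::{parse_percent, PercentError};
///
/// assert_eq!(parse_percent("42"), Ok(42));
/// assert_eq!(parse_percent("250"), Err(PercentError::OutOfRange(250)));
/// ~~~
pub fn parse_percent(input: &str) -> Result<u8, PercentError> {
    // Surrounding whitespace is ignored; anything else that is not a `u32` is rejected.
    let value: u32 = input.trim().parse().map_err(|_| PercentError::NotANumber)?;
    if value > 100 {
        return Err(PercentError::OutOfRange(value));
    }
    // Safe narrowing: `value` is at most 100 here.
    Ok(value as u8)
}
```

Running `cargo test` executes the `# Examples` block as a doc test, so the example cannot silently drift from the implementation.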
Strive to include correct links to other items in your written docs as much as possible. In other words, avoid
`` `some_func` `` and instead use ``[`some_func`]``. Read more about how to correctly use links in your rust-docs
[here](https://doc.rust-lang.org/rustdoc/write-documentation/linking-to-items-by-name.html#valid-links) and
[here](https://rust-lang.github.io/rfcs/1946-intra-rustdoc-links.html#additions-to-the-documentation-syntax).
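Here is a minimal sketch with hypothetical items. rustdoc resolves every name in square brackets to the item it names. `cargo doc` warns through the `rustdoc::broken_intra_doc_links` lint if a link cannot be resolved, so renamed items do not silently leave stale references behind.

```rust
/// Maximum number of items a [`BoundedQueue`] accepts.
pub const MAX_ITEMS: usize = 4;

/// A queue that rejects new items once it holds [`MAX_ITEMS`] of them.
///
/// Items are added with [`BoundedQueue::push`]; storage is a plain [`Vec`].
#[derive(Debug, Default)]
pub struct BoundedQueue {
    items: Vec<u32>,
}

impl BoundedQueue {
    /// Adds `item` and returns `true`, or returns `false` if the queue already holds
    /// [`MAX_ITEMS`] items.
    pub fn push(&mut self, item: u32) -> bool {
        if self.items.len() >= MAX_ITEMS {
            return false;
        }
        self.items.push(item);
        true
    }

    /// Returns the number of stored items, which never exceeds [`MAX_ITEMS`].
    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Returns `true` if the queue holds no items; see [`BoundedQueue::len`].
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}
```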
> While you are linking, you might notice that you need to link to (too many) foreign items in order to explain your
> API. This leans more towards API design than documentation, but it is a warning that the subject API might be
> slightly wrong. For example, most "glue" traits[^1] in `frame/support` should be designed and documented without
> making hard assumptions about particular pallets that implement them.
---
#### TLDR
0. Have the goal of enforcing `#![deny(missing_docs)]` mentally, even if it is not enforced by the compiler 🙈.
1. Start with a single, clear and concise sentence. Follow up with more context, after a newline, if needed.
2. Use examples as much as reasonably possible.
3. Use links as much as possible.
4. Think about context. If you are explaining a lot of foreign topics while documenting a trait that should not
explicitly depend on them, you have likely not designed it properly.
---
#### Proc-Macros
Note that there are special considerations when documenting proc macros. Doc links will appear to function _within_ your
proc macro crate, but often will no longer function when these proc macros are re-exported elsewhere in your project.
The exception is doc links to _other proc macros_ which will function just fine if they are also being re-exported. It
is also often necessary to disambiguate between a proc macro and a function of the same name, which can be done using
the `macro@my_macro_name` syntax in your link. Read more about how to correctly use links in your rust-docs
[here](https://doc.rust-lang.org/rustdoc/write-documentation/linking-to-items-by-name.html#valid-links) and
[here](https://rust-lang.github.io/rfcs/1946-intra-rustdoc-links.html#additions-to-the-documentation-syntax).
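Here is a minimal sketch of the disambiguation syntax. To keep the block self-contained, it uses a declarative `macro_rules!` macro instead of a proc macro, because proc macros must live in their own crate. The `macro@`, trailing `!` and trailing `()` disambiguators work the same way for both kinds of macro. The `double` function and macro are hypothetical.

```rust
/// Doubles `x` at runtime.
///
/// The macro with the same name can be linked as [`double!`], or explicitly as
/// [the `double` macro](macro@double).
pub fn double(x: u32) -> u32 {
    x * 2
}

/// Macro form of [`double()`]; the trailing `()` tells rustdoc to link the function.
#[macro_export]
macro_rules! double {
    ($x:expr) => {
        // `$x` is captured as a whole expression, so `double!(1 + 2)` yields 6.
        $x * 2
    };
}
```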
---
### Other Guidelines
The above five guidelines must always be reasonably respected in the documentation.
The following are a set of notes that may not necessarily hold in all circumstances:
---
#### Document Through Code
You should make sure that your code is properly-named and well-organized so that your code functions as a form of
documentation. However, given the complexity of our projects in Polkadot/Substrate, that is not enough. In particular,
things like examples, errors and panics cannot be documented only through properly-named and well-organized code.
> Our north star is self-documenting code that also happens to be well-documented and littered with examples.
- Your written documents should *complement* the code, not *repeat* it. As an example, documentation on top of a code
  example should never look like the following:
```rust
/// Sends request and handles the response.
@@ -175,15 +159,14 @@ following:
}
```
In the above example, the documentation has added no useful information not already contained within the properly-named
trait and is redundant.
---
#### Formatting Matters
The way you format your documents (newlines, heading and so on) makes a difference. Consider the below examples:
```rust
/// This function works with input u32 x and multiplies it by two. If
@@ -206,16 +189,15 @@ fn multiply_by_2(x: u32) -> u32 { .. }
// More efficiency can be achieved if we improve this via such and such.
fn multiply_by_2(x: u32) -> u32 { .. }
```
They are both roughly conveying the same set of facts, but one is easier to follow because it was formatted cleanly.
Especially for traits and types that you can foresee will be seen and used a lot, try and write a well formatted
version.
Similarly, make sure your comments are wrapped at 100 characters line-width (as defined by our
[`rustfmt.toml`](../rustfmt.toml)), no **more and no less**! The more is fixed by `rustfmt` and our CI, but if you (for
some unknown reason) wrap your lines at 59 characters, it will pass the CI, and it will not look good 🫣. Consider using
a plugin like [rewrap](https://marketplace.visualstudio.com/items?itemName=stkb.rewrap) (for Visual Studio Code) to
properly do this.
[^1]: Those that help two pallets talk to each other.
@@ -224,12 +206,11 @@ wrap your lines at 59 characters, it will pass the CI, and it will not look good
## Pallet Crates
The guidelines so far have been general in nature, and are applicable both to crates that are pallets and to
crates that are not pallets.
The following is relevant to how to document parts of a crate that is a pallet. See
[`pallet-fast-unstake`](../frame/fast-unstake/src/lib.rs) as one example of adhering to these guidelines.
---
@@ -252,15 +233,20 @@ For the top-level pallet docs, consider the following template:
//!
//! ### Example
//!
//! <Your pallet must have a few tests that cover important user journeys. Use https://crates.io/crates/docify to reuse
//! these as examples>.
//!
//! ## Pallet API
//!
//! <Reminder: inside the [`pallet`] module, a template that leads the reader to the relevant items is auto-generated.
//! There is no need to repeat things like "See Config trait for ...", which are generated inside [`pallet`] here anyways.
//! You can use the below line as-is:>
//!
//! See the [`pallet`] module for more information about the interfaces this pallet exposes, including its configuration
//! trait, dispatchables, storage items, events and errors.
//!
//! <The audience of this is those who want to know how this pallet works, to the extent of being able to build something
//! on top of it, like a DApp or another pallet>
//!
//! This section can most often be left as-is.
//!
@@ -268,7 +254,8 @@ For the top-level pallet docs, consider the following template:
//!
//! <The format of this section is up to you, but we suggest the Design-oriented approach that follows>
//!
//! <The audience of this would be your future self, or anyone who wants to gain a deep understanding of how the pallet
//! works so that they can eventually propose optimizations to it>
//!
//! ### Design Goals (optional)
//!
@@ -276,31 +263,31 @@ For the top-level pallet docs, consider the following template:
//!
//! ### Design (optional)
//!
//! <Describe how you've reached those goals. This should describe the storage layout of your pallet and what was your
//! approach in designing it that way.>
//!
//! ### Terminology (optional)
//!
//! <Optionally, explain any non-obvious terminology here. You can link to it if you want to use the terminology further
//! up>
```
This template's details (heading 3s and beyond) are left flexible and at the developer's discretion. For example, you
might want to include `### Terminology` or not. Moreover, you might find it
more useful to include it in `## Overview`.
Nonetheless, the flow from the highest-level explanation to the lowest-level explanation is important to follow.
As a rule of thumb, the Heading 2s (`##`) in this template can be considered a strict rule, while the Heading 3s (`###`)
and beyond are flexible.
---
#### Polkadot and Substrate
Optionally, in order to demonstrate the relation between the two, you can start
the pallet documentation with:
Optionally, in order to demonstrate the relation between the two, you can start the pallet documentation with:
```
//! > Made with *Substrate*, for *Polkadot*.
@@ -331,7 +318,8 @@ For each dispatchable (`fn` item inside `#[pallet::call]`), consider the followi
///
/// ## Errors (optional)
///
/// <If an extensive list of errors can be returned, list them individually instead of mentioning them in the section above>
/// <If an extensive list of errors can be returned, list them individually instead of mentioning them in the section
/// above>
///
/// ## Events (optional)
///
@@ -339,29 +327,26 @@ For each dispatchable (`fn` item inside `#[pallet::call]`), consider the followi
pub fn name_of_dispatchable(origin: OriginFor<T>, ...) -> DispatchResult {}
```
Consider the fact that these docs will be part of the metadata of the associated dispatchable, and might be used by wallets and explorers.
Consider the fact that these docs will be part of the metadata of the associated dispatchable, and might be used by
wallets and explorers.
---
### Storage Items
1. If a map-like type is being used, always note the choice of your hashers as
private code docs (`// Hasher X chosen because ...`). Recall that this is not
relevant information to external people, so it must be documented as `//`.
1. If a map-like type is being used, always note the choice of your hashers as private code docs
(`// Hasher X chosen because ...`). Recall that this is not relevant information to external readers, so it must be
documented as `//`.
2. Consider explaining the crypto-economics of how a deposit is being taken in
return of the storage being used.
2. Consider explaining the crypto-economics of how a deposit is taken in return for the storage being used.
3. Consider explaining why it is safe for the storage item to be unbounded, if
`#[pallet::unbounded]` or `#[pallet::without_storage_info]` is being used.
3. Consider explaining why it is safe for the storage item to be unbounded, if `#[pallet::unbounded]` or
`#[pallet::without_storage_info]` is being used.
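As an illustration of point 2, here is a minimal sketch in plain Rust (no FRAME types) of a deposit documented together with its economics. The constants `DEPOSIT_BASE` and `DEPOSIT_PER_BYTE` and the linear base-plus-per-byte formula are hypothetical assumptions chosen for the example, not values taken from any pallet.

```rust
/// Hypothetical deposit parameters. In a real pallet these would more likely be
/// configurable constants than hard-coded values.
const DEPOSIT_BASE: u128 = 1_000;
const DEPOSIT_PER_BYTE: u128 = 10;

/// Deposit held while an item of `encoded_len` bytes is kept in storage.
///
/// Economics (assumed for this sketch): the base part pays for the fixed cost of
/// creating a new storage entry, and the per-byte part scales with the size of the
/// stored value, so storing more data locks up proportionally more funds. Returning
/// the deposit when the item is removed rewards users for cleaning up state.
fn storage_deposit(encoded_len: u32) -> u128 {
    // Saturating arithmetic: an overflow can never wrap the deposit to a tiny value.
    DEPOSIT_BASE.saturating_add(DEPOSIT_PER_BYTE.saturating_mul(u128::from(encoded_len)))
}

fn main() {
    // 1_000 + 10 * 32 = 1_320
    println!("deposit for 32 bytes: {}", storage_deposit(32));
}
```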
---
### Errors and Events
Consider the fact that, similar to dispatchables, these docs will be part of
the metadata of the associated event/error, and might be used by wallets and
explorers.
Consider the fact that, similar to dispatchables, these docs will be part of the metadata of the associated event/error,
and might be used by wallets and explorers.
Specifically for `error`, explain why the error has happened, and what can be
done in order to avoid it.
For errors specifically, explain why the error happens and what can be done to avoid it.
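A minimal sketch of error docs written this way, in plain Rust rather than with FRAME's `#[pallet::error]` macro. The `Error` variants, `MAX_NAME_LEN`, and `set_name` are hypothetical and exist only for this example.

```rust
/// Errors of a hypothetical pallet, documented as suggested above: each variant says
/// why it happens and what the caller can do to avoid it.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    /// The name is longer than `MAX_NAME_LEN` bytes.
    ///
    /// To avoid it, shorten the name before submitting the call.
    NameTooLong,
    /// The caller's free balance is lower than the required deposit.
    ///
    /// To avoid it, make sure the account holds at least the deposit before calling.
    InsufficientBalance,
}

/// Hypothetical limit used only by this example.
const MAX_NAME_LEN: usize = 16;

/// Hypothetical dispatchable body that can fail with either error.
fn set_name(free_balance: u128, deposit: u128, name: &str) -> Result<(), Error> {
    if name.len() > MAX_NAME_LEN {
        return Err(Error::NameTooLong);
    }
    if free_balance < deposit {
        return Err(Error::InsufficientBalance);
    }
    Ok(())
}

fn main() {
    println!("{:?}", set_name(1, 5, "alice")); // Err(InsufficientBalance)
}
```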
+15 -11
@@ -2,25 +2,27 @@
✄ -----------------------------------------------------------------------------
Thank you for your Pull Request! 🙏 Please make sure it follows the contribution
guidelines outlined in [this document](https://github.com/paritytech/polkadot-sdk/blob/master/docs/CONTRIBUTING.md) and fill out the
sections below. Once you're ready to submit your PR for review, please delete
this section and leave only the text under the "Description" heading.
Thank you for your Pull Request! 🙏 Please make sure it follows the contribution guidelines outlined in
[this document](https://github.com/paritytech/polkadot-sdk/blob/master/docs/CONTRIBUTING.md) and fill
out the sections below. Once you're ready to submit your PR for review, please
delete this section and leave only the text under the "Description" heading.
# Description
*Please include a summary of the changes and the related issue. Please also include relevant motivation and context, including:*
*Please include a summary of the changes and the related issue. Please also include relevant motivation and context,
including:*
- What does this PR do?
- Why are these changes needed?
- How were these changes implemented and what do they affect?
*Use [Github semantic linking](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword) to address any open issues this PR relates to or closes.*
*Use [GitHub semantic
linking](https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)
to address any open issues this PR relates to or closes.*
Fixes # (issue number, *if applicable*)
Fixes # (issue number, *if applicable*)
Closes # (issue number, *if applicable*)
Closes # (issue number, *if applicable*)
Polkadot companion: (*if applicable*)
@@ -29,10 +31,12 @@ Cumulus companion: (*if applicable*)
# Checklist
- [ ] My PR includes a detailed description as outlined in the "Description" section above
- [ ] My PR follows the [labeling requirements](https://github.com/paritytech/polkadot-sdk/blob/master/docs/CONTRIBUTING.md#process) of this project (at minimum one label for `T` required)
- [ ] My PR follows the [labeling requirements](CONTRIBUTING.md#Process) of this project (at minimum one label for `T`
required)
- [ ] I have made corresponding changes to the documentation (if applicable)
- [ ] I have added tests that prove my fix is effective or that my feature works (if applicable)
- [ ] If this PR alters any external APIs or interfaces used by Polkadot, the corresponding Polkadot PR is ready as well as the corresponding Cumulus PR (optional)
- [ ] If this PR alters any external APIs or interfaces used by Polkadot, the corresponding Polkadot PR is ready as well
as the corresponding Cumulus PR (optional)
You can remove the "Checklist" section once all have been checked. Thank you for your contribution!
+20 -20
@@ -2,19 +2,19 @@
title: Style Guide for Rust in the Polkadot-SDK
---
Where possible these styles are enforced by settings in `rustfmt.toml` so if you run `cargo fmt`
Where possible these styles are enforced by settings in `rustfmt.toml` so if you run `cargo fmt`
then you will adhere to most of these style guidelines automatically.
# Formatting
- Indent using tabs.
- Lines should be longer than 100 characters long only in exceptional circumstances and certainly
- Indent using tabs.
- Lines should be no longer than 100 characters except in exceptional circumstances, and certainly
no longer than 120. For this purpose, tabs are considered 4 characters wide.
- Indent levels should be greater than 5 only in exceptional circumstances and certainly no
- Indent levels should be greater than 5 only in exceptional circumstances and certainly no
greater than 8. If they are greater than 5, then consider using `let` or auxiliary functions in
order to strip out complex inline expressions.
- Never have spaces on a line prior to a non-whitespace character
- Follow-on lines are only ever a single indent from the original line.
- Never have spaces on a line prior to a non-whitespace character
- Follow-on lines are only ever a single indent from the original line.
```rust
fn calculation(some_long_variable_a: i8, some_long_variable_b: i8) -> bool {
@@ -25,7 +25,7 @@ fn calculation(some_long_variable_a: i8, some_long_variable_b: i8) -> bool {
}
```
- Indent level should follow open parens/brackets, but should be collapsed to the smallest number
- Indent level should follow open parens/brackets, but should be collapsed to the smallest number
of levels actually used:
```rust
@@ -45,8 +45,8 @@ fn calculate(
}
```
- `where` is indented, and its items are indented one further.
- Argument lists or function invocations that are too long to fit on one line are indented
- `where` is indented, and its items are indented one further.
- Argument lists or function invocations that are too long to fit on one line are indented
similarly to code blocks, and once one param is indented in such a way, all others should be,
too. Run-on parameter lists are also acceptable for single-line run-ons of basic function calls.
@@ -92,7 +92,7 @@ fn foo(really_long_parameter_name_1: SomeLongTypeName, really_long_parameter_nam
}
```
- Always end last item of a multi-line comma-delimited set with `,` when legal:
- Always end the last item of a multi-line comma-delimited set with `,` when legal:
```rust
struct Point<T> {
@@ -104,7 +104,7 @@ struct Point<T> {
enum Meal { Breakfast, Lunch, Dinner };
```
- Avoid trailing `;`s where unneeded.
- Avoid trailing `;`s where unneeded.
```rust
if condition {
@@ -112,8 +112,8 @@ if condition {
}
```
- `match` arms may be either blocks or have a trailing `,` but not both.
- Blocks should not be used unnecessarily.
- `match` arms may be either blocks or have a trailing `,` but not both.
- Blocks should not be used unnecessarily.
```rust
match meal {
@@ -126,7 +126,7 @@ match meal {
# Style
- Panickers require explicit proofs they don't trigger. Calling `unwrap` is discouraged. The
- Panickers require explicit proofs they don't trigger. Calling `unwrap` is discouraged. The
exception to this rule is test code. Avoiding panickers by restructuring code is preferred if
feasible.
@@ -139,14 +139,14 @@ let mut target_path =
);
```
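A minimal sketch of both options from the rule above, using hypothetical helper functions: `first_or_zero` avoids a panicker by restructuring, while `max_with_floor` keeps an `expect` but states next to it why it can never trigger.

```rust
/// Restructured so that no panicker is needed: an empty slice yields 0.
fn first_or_zero(values: &[u32]) -> u32 {
    values.first().copied().unwrap_or(0)
}

/// Largest of `values` and `floor`.
fn max_with_floor(values: &[u32], floor: u32) -> u32 {
    values
        .iter()
        .copied()
        .chain(std::iter::once(floor))
        .max()
        // Proof: the iterator always yields at least `floor`, so `max` cannot be `None`.
        .expect("iterator contains `floor`, so it is never empty")
}

fn main() {
    println!("{} {}", first_or_zero(&[]), max_with_floor(&[1, 9, 4], 5)); // 0 9
}
```

Since the guide prefers restructuring where feasible, `max_with_floor` could also be rewritten without `expect`; it is kept here only to show how a proof accompanies a panicker.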
- Unsafe code requires explicit proofs just as panickers do. When introducing unsafe code,
- Unsafe code requires explicit proofs just as panickers do. When introducing unsafe code,
consider trade-offs between efficiency on one hand and reliability, maintenance costs, and
security on the other. Here is a list of questions that may help in evaluating the trade-off while
preparing or reviewing a PR:
- how much more performant or compact the resulting code will be using unsafe code,
- how likely is it that invariants could be violated,
- are issues stemming from the use of unsafe code caught by existing tests/tooling,
- what are the consequences if the problems slip into production.
- how much more performant or compact the resulting code will be using unsafe code,
- how likely is it that invariants could be violated,
- are issues stemming from the use of unsafe code caught by existing tests/tooling,
- what are the consequences if the problems slip into production.
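A small, hypothetical example of an `unsafe` block carrying its proof as a `// SAFETY:` comment, next to a safe equivalent. Applying the questions above, the unchecked version would be hard to justify unless a benchmark showed a measurable gain, since the safe iterator version is likely to compile to similarly efficient code (an assumption worth measuring, not a guarantee).

```rust
/// Safe version: sums the elements at even indices (0, 2, 4, ...).
fn sum_even_indices_safe(values: &[u64]) -> u64 {
    values.iter().step_by(2).sum()
}

/// Unsafe version of the same computation, with the proof next to the `unsafe` block.
fn sum_even_indices_unchecked(values: &[u64]) -> u64 {
    let mut sum = 0;
    let mut i = 0;
    while i < values.len() {
        // SAFETY: the loop condition guarantees `i < values.len()`, so the access is in bounds.
        sum += unsafe { *values.get_unchecked(i) };
        i += 2;
    }
    sum
}

fn main() {
    println!("{}", sum_even_indices_unchecked(&[1, 2, 3, 4, 5])); // 9
}
```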
# Manifest Formatting
@@ -177,4 +177,4 @@ default = [
# Comments go here as well ;)
"std",
]
```
```
+20
@@ -0,0 +1,20 @@
# Markdown linting
Since the introduction of [PR #1309](https://github.com/paritytech/polkadot-sdk/pull/1309), the markdown
files in this repository are checked by a linter for formatting and consistency.
The linter used is [`markdownlint`](https://github.com/DavidAnson/markdownlint) and can be installed locally on your
machine. It can also be set up as a
[pre-commit hook](https://github.com/igorshubovych/markdownlint-cli#use-with-pre-commit)
to ensure that your markdown passes all the checks.
The rules in place are defined
[here](https://github.com/paritytech/polkadot-sdk/blob/master/.github/.markdownlint.yaml).
You may run `markdownlint` locally using:
```bash
markdownlint --config .github/.markdownlint.yaml --ignore target .
```
There are also plugins for your favorite editor that can help ensure most of the rules pass and fix typical issues (such
as trailing spaces, a missing newline at the end of the file, long lines, etc.).
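To give a feel for what these checks look for, here is a rough Rust approximation of three of them: line length, trailing whitespace, and a missing final newline. It is a hypothetical helper, not part of `markdownlint`, and it deliberately ignores the exceptions the real linter applies (for example code blocks, URLs, and allowed hard line breaks). The 120-character limit passed in `main` is an assumption based on the linter configuration used by this repository.

```rust
/// Reports (1-based line number, message) for a few issues similar to those the linter
/// flags: lines over `max_len` characters, trailing whitespace, and a missing final newline.
fn lint(text: &str, max_len: usize) -> Vec<(usize, &'static str)> {
    let mut issues = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        let line_no = idx + 1;
        if line.chars().count() > max_len {
            issues.push((line_no, "line too long"));
        }
        if line.ends_with(' ') || line.ends_with('\t') {
            issues.push((line_no, "trailing whitespace"));
        }
    }
    if !text.is_empty() && !text.ends_with('\n') {
        issues.push((text.lines().count(), "missing newline at end of file"));
    }
    issues
}

fn main() {
    let text = format!("ok\ntrailing \n{}", "x".repeat(121));
    for (line, msg) in lint(&text, 120) {
        println!("line {line}: {msg}");
    }
}
```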
+4 -4
@@ -22,7 +22,7 @@ Installation from the Debian repository will create a `systemd` service that can
Polkadot node. This is disabled by default, and can be started by running `systemctl start polkadot`
on demand (use `systemctl enable polkadot` to make it auto-start after reboot). By default, it will
run as the `polkadot` user. Command-line flags passed to the binary can be customized by editing
`/etc/default/polkadot`. This file will not be overwritten on updating polkadot. You may also just
`/etc/default/polkadot`. This file will not be overwritten on updating Polkadot. You may also just
run the node directly from the command-line.
### Debian-based (Debian, Ubuntu)
@@ -128,7 +128,7 @@ Connect to the global Polkadot Mainnet network by running:
You can see your node on [telemetry] (set a custom name with `--name "my custom name"`).
[telemetry]: https://telemetry.polkadot.io/#list/Polkadot
[telemetry]: https://telemetry.polkadot.io/#list/Polkadot
### Connect to the "Kusama" Canary Network
@@ -140,7 +140,7 @@ Connect to the global Kusama canary network by running:
You can see your node on [telemetry] (set a custom name with `--name "my custom name"`).
[telemetry]: https://telemetry.polkadot.io/#list/Kusama
[telemetry]: https://telemetry.polkadot.io/#list/Kusama
### Connect to the Westend Testnet
@@ -152,7 +152,7 @@ Connect to the global Westend testnet by running:
You can see your node on [telemetry] (set a custom name with `--name "my custom name"`).
[telemetry]: https://telemetry.polkadot.io/#list/Westend
[telemetry]: https://telemetry.polkadot.io/#list/Westend
### Obtaining DOTs
+44 -49
@@ -1,57 +1,52 @@
Polkadot Release Process
------------------------
# Polkadot Release Process
### Branches
* release-candidate branch: The branch used for staging of the next release.
Named like `release-v0.8.26`
### Notes
* The release-candidate branch *must* be made in the paritytech/polkadot repo in
order for release automation to work correctly
* Any new pushes/merges to the release-candidate branch (for example,
refs/heads/release-v0.8.26) will result in the rc index being bumped (e.g., v0.8.26-rc1
to v0.8.26-rc2) and new wasms built.
## Branches
* release-candidate branch: The branch used for staging of the next release. Named like `release-v0.8.26`
### Release workflow
## Notes
* The release-candidate branch *must* be made in the `paritytech/polkadot` repo in order for release automation to work
correctly
* Any new pushes/merges to the release-candidate branch (for example, refs/heads/release-v0.8.26) will result in the rc
  index being bumped (e.g., v0.8.26-rc1 to v0.8.26-rc2) and new Wasm binaries being built.
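The rc bump described in the last note can be sketched as a small string transformation. The `bump_rc` helper below is hypothetical and only illustrates the naming scheme; it is not the release automation itself.

```rust
/// Bumps the rc index of a tag such as `v0.8.26-rc1` to `v0.8.26-rc2`.
/// Returns `None` if the tag does not end in `-rc<number>`.
fn bump_rc(tag: &str) -> Option<String> {
    let (base, rc) = tag.rsplit_once("-rc")?;
    let next = rc.parse::<u32>().ok()?.checked_add(1)?;
    Some(format!("{base}-rc{next}"))
}

fn main() {
    println!("{:?}", bump_rc("v0.8.26-rc1")); // Some("v0.8.26-rc2")
}
```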
Below are the steps of the release workflow. Steps prefixed with NOACTION are
automated and require no human action.
## Release workflow
Below are the steps of the release workflow. Steps prefixed with NOACTION are automated and require no human action.
1. To initiate the release process:
1. branch master off to a release candidate branch:
- `git checkout master; git pull; git checkout -b release-v0.8.26`
2. In the [substrate](https://github.com/paritytech/substrate) repo, check out the commit used by polkadot (this can be found using the following command in the *polkadot* repo: `grep 'paritytech/substrate' Cargo.lock | grep -E '[0-9a-f]{40}' | sort | uniq `
3. Branch off this **substrate** commit into its own branch: `git branch -b polkadot-v0.8.26; git push origin refs/heads/polkadot-v0.8.26`
4. In the **polkadot** repository, use [diener](https://github.com/bkchr/diener/) to switch to this branch: `diener update --branch "polkadot-v0.8.26" --substrate`. Update Cargo.lock (to do this, you can run `cargo build` and then ctrl+c once it finishes fetching and begins compiling)
5. Push the **polkadot** `release-v0.8.26` branch to Github: `git push origin refs/heads/release-v0.8.26`
2. NOACTION: The current HEAD of the release-candidate branch is tagged `v0.8.26-rc1`
3. NOACTION: A draft release and runtime WASMs are created for this
release-candidate automatically. A link to the draft release will be linked in
the internal polkadot matrix channel.
4. NOACTION: A new Github issue is created containing a checklist of manual
steps to be completed before we are confident with the release. This will be
linked in Matrix.
5. Complete the steps in the issue created in step 4, signing them off as
completed
6. (optional) If a fix is required to the release-candidate:
1. Branch off `master` into a release-candidate branch:
`git checkout master; git pull; git checkout -b release-v0.8.26`
1. In the [Substrate](https://github.com/paritytech/substrate) repo, check out the commit used by Polkadot (this can
be found using the following command in the *Polkadot* repo: `grep 'paritytech/substrate' Cargo.lock | grep -E
'[0-9a-f]{40}' | sort | uniq`
1. Branch off this **Substrate** commit into its own branch:
`git checkout -b polkadot-v0.8.26; git push origin refs/heads/polkadot-v0.8.26`
1. In the **Polkadot** repository, use [diener](https://github.com/bkchr/diener/) to switch to this branch:
`diener update --branch "polkadot-v0.8.26" --substrate`. Update `Cargo.lock` (to do this, you can run `cargo build`
and then press Ctrl+C once it finishes fetching and begins compiling).
1. Push the **Polkadot** `release-v0.8.26` branch to GitHub: `git push origin refs/heads/release-v0.8.26`
1. NOACTION: The current HEAD of the release-candidate branch is tagged `v0.8.26-rc1`
1. NOACTION: A draft release and runtime WASMs are created for this release-candidate automatically. A link to the draft
release will be posted in the internal Polkadot Matrix channel.
1. NOACTION: A new GitHub issue is created containing a checklist of manual steps to be completed before we are
confident with the release. This will be linked in Matrix.
1. Complete the steps in the issue created in step 4, signing them off as completed.
1. (optional) If a fix is required to the release-candidate:
1. Merge the fix with `master` first
2. Cherry-pick the commit from `master` to `release-v0.8.26`, fixing any
merge conflicts. Try to avoid unnecessarily bumping crates.
3. Push the release-candidate branch to Github - this is now the new release-
candidate
4. Depending on the cherry-picked changes, it may be necessary to perform some
or all of the manual tests again.
5. If there are **substrate** changes required, these should be cherry-picked to the substrate `polkadot-v0.8.26` branch and pushed, and the version of substrate used in **polkadot** updated using `cargo update -p sp-io`
7. Once happy with the release-candidate, tag the current top commit in the release candidate branch and push to Github: `git tag -s -m 'v0.8.26' v0.8.26; git push --tags`
9. NOACTION: The HEAD of the `release` branch will be tagged with `v0.8.26`,
and a final draft release will be created on Github.
1. Cherry-pick the commit from `master` to `release-v0.8.26`, fixing any merge conflicts. Try to avoid unnecessarily
bumping crates.
1. Push the release-candidate branch to GitHub; this is now the new release-candidate.
1. Depending on the cherry-picked changes, it may be necessary to perform some or all of the manual tests again.
1. If there are **Substrate** changes required, these should be cherry-picked to the Substrate `polkadot-v0.8.26`
branch and pushed, and the version of Substrate used in **Polkadot** updated using `cargo update -p sp-io`
1. Once happy with the release-candidate, tag the current top commit in the release-candidate branch and push to GitHub:
`git tag -s -m 'v0.8.26' v0.8.26; git push --tags`
1. NOACTION: The HEAD of the `release` branch will be tagged with `v0.8.26`, and a final draft release will be created
on GitHub.
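The `grep` pipeline used in the first step can be mirrored in Rust as below. This is a sketch that assumes Git dependencies appear in `Cargo.lock` as `source = "git+https://github.com/paritytech/substrate?branch=...#<commit>"` lines; unlike the shell pipeline, which prints whole matching lines, the hypothetical `substrate_commits` helper returns only the distinct 40-character commit hashes.

```rust
use std::collections::BTreeSet;

/// True if `s` is a full 40-character lowercase hex commit hash.
fn is_commit_hash(s: &str) -> bool {
    s.len() == 40 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Distinct Substrate commit hashes referenced by a `Cargo.lock` file, in sorted order.
fn substrate_commits(cargo_lock: &str) -> BTreeSet<String> {
    cargo_lock
        .lines()
        .filter(|line| line.contains("paritytech/substrate"))
        .filter_map(|line| {
            // The commit follows the last '#' and is terminated by the closing quote.
            let (_, rest) = line.rsplit_once('#')?;
            let hash = rest.trim_end_matches('"');
            is_commit_hash(hash).then(|| hash.to_string())
        })
        .collect()
}

fn main() {
    // Reads `Cargo.lock` from the current directory; an unreadable file yields no output.
    let lock = std::fs::read_to_string("Cargo.lock").unwrap_or_default();
    for commit in substrate_commits(&lock) {
        println!("{commit}");
    }
}
```

A `BTreeSet` plays the role of `sort | uniq`: it drops duplicates and iterates in sorted order.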
### Security releases
## Security releases
Occasionally there may be changes that need to be made to the most recently
released version of Polkadot, without taking *every* change to `master` since
the last release. For example, in the event of a security vulnerability being
found, where releasing a fixed version is a matter of some expediency. In cases
like this, the fix should first be merged with master, cherry-picked to a branch
forked from `release`, tested, and then finally merged with `release`. A
sensible versioning scheme for changes like this is `vX.Y.Z-1`.
Occasionally there may be changes that need to be made to the most recently released version of Polkadot, without taking
*every* change to `master` since the last release, for example when a security vulnerability is found and a fixed
version must be released quickly. In cases like this, the fix should first be merged with `master`, cherry-picked to a
branch forked from `release`, tested, and then finally merged with `release`. A sensible versioning scheme for changes
like this is `vX.Y.Z-1`.
+33 -17
@@ -1,6 +1,8 @@
# Using Containers
The following commands should work no matter if you use Docker or Podman. In general, Podman is recommended. All commands are "engine neutral" so you can use the container engine of your choice while still being able to copy/paste the commands below.
The following commands should work whether you use Docker or Podman. In general, Podman is recommended. All commands
are "engine neutral", so you can use the container engine of your choice while still being able to copy/paste the
commands below.
Let's start by defining Podman as our engine:
```
@@ -14,11 +16,15 @@ ENGINE=docker
## The easiest way
The easiest/faster option to run Polkadot in Docker is to use the latest release images. These are small images that use the latest official release of the Polkadot binary, pulled from our Debian package.
The easiest and fastest option to run Polkadot in Docker is to use the latest release images. These are small images that
use the latest official release of the Polkadot binary, pulled from our Debian package.
**_The following examples are running on westend chain and without SSL. They can be used to quick start and learn how Polkadot needs to be configured. Please find out how to secure your node, if you want to operate it on the internet. Do not expose RPC and WS ports, if they are not correctly configured._**
**_The following examples run on the Westend chain and without SSL. They can be used to get started quickly and learn
how Polkadot needs to be configured. Please find out how to secure your node if you want to operate it on the internet.
Do not expose RPC and WS ports if they are not correctly configured._**
Let's first check the version we have. The first time you run this command, the Polkadot docker image will be downloaded. This takes a bit of time and bandwidth, be patient:
Let's first check the version we have. The first time you run this command, the Polkadot docker image will be
downloaded. This takes a bit of time and bandwidth, be patient:
```bash
$ENGINE run --rm -it parity/polkadot:latest --version
@@ -32,11 +38,14 @@ $ENGINE run --rm -it parity/polkadot:latest --chain westend --name "PolkaDocker"
## Examples
Once you are done experimenting and picking the best node name :) you can start Polkadot as daemon, exposes the Polkadot ports and mount a volume that will keep your blockchain data locally. Make sure that you set the ownership of your local directory to the Polkadot user that is used by the container.
Once you are done experimenting and picking the best node name :) you can start Polkadot as a daemon, expose the
Polkadot ports, and mount a volume that will keep your blockchain data locally. Make sure that you set the ownership of
your local directory to the Polkadot user that is used by the container.
Set user id 1000 and group id 1000 by running `chown 1000:1000 /my/local/folder -R` if you use a bind mount.
To start a Polkadot node on default rpc port 9933 and default p2p port 30333 use the following command. If you want to connect to rpc port 9933, then must add Polkadot startup parameter: `--rpc-external`.
To start a Polkadot node on the default RPC port 9933 and the default p2p port 30333, use the following command. If you
want to connect to RPC port 9933, you must add the Polkadot startup parameter `--rpc-external`.
```bash
$ENGINE run -d -p 30333:30333 -p 9933:9933 \
@@ -82,7 +91,8 @@ services:
]
```
With following `docker-compose.yml` you can set up a node and use polkadot-js-apps as the front end on port 80. After starting the node use a browser and enter your Docker host IP in the URL field: _<http://[YOUR_DOCKER_HOST_IP]>_
With the following `docker-compose.yml` you can set up a node and use `polkadot-js-apps` as the front end on port 80.
After starting the node, use a browser and enter your Docker host IP in the URL field: _<http://[YOUR_DOCKER_HOST_IP]>_
```yaml
version: '2'
@@ -117,12 +127,14 @@ services:
Chain syncing will utilize all available memory and CPU power your server has to offer, which can lead to crashing.
If running on a low resource VPS, use `--memory` and `--cpus` to limit the resources used. E.g. To allow a maximum of 512MB memory and 50% of 1 CPU, use `--cpus=".5" --memory="512m"`. Read more about limiting a container's resources [here](https://docs.docker.com/config/containers/resource_constraints).
If running on a low-resource VPS, use `--memory` and `--cpus` to limit the resources used. For example, to allow a
maximum of 512MB memory and 50% of one CPU, use `--cpus=".5" --memory="512m"`. Read more about limiting a container's
resources [here](https://docs.docker.com/config/containers/resource_constraints).
## Build your own image
There are 3 options to build a polkadot container image:
There are 3 options to build a Polkadot container image:
- using the builder image
- using the injected "Debian" image
- using the generic injected image
@@ -131,27 +143,31 @@ There are 3 options to build a polkadot container image:
To get up and running with the smallest footprint on your system, you may use an existing Polkadot Container image.
You may also build a polkadot container image yourself (it takes a while...) using the container specs `scripts/ci/dockerfiles/polkadot/polkadot_builder.Dockerfile`.
You may also build a Polkadot container image yourself (it takes a while...) using the container specs
`scripts/ci/dockerfiles/polkadot/polkadot_builder.Dockerfile`.
### Debian injected
The Debian injected image is how the official polkadot container image is produced. It relies on the Debian package that is published upon each release. The Debian injected image is usually available a few minutes after a new release is published.
It has the benefit of relying on the GPG signatures embedded in the Debian package.
The Debian injected image is how the official Polkadot container image is produced. It relies on the Debian package that
is published upon each release. The Debian injected image is usually available a few minutes after a new release is
published. It has the benefit of relying on the GPG signatures embedded in the Debian package.
### Generic injected
For simple testing purposes, the easiest option for polkadot and also random binaries, is to use the `binary_injected.Dockerfile` container spec. This option is less secure since the injected binary is not checked at all but it has the benefit to be simple. This option requires to already have a valid `polkadot` binary, compiled for Linux.
For simple testing purposes, the easiest option for Polkadot, and also for arbitrary binaries, is to use the
`binary_injected.Dockerfile` container spec. This option is less secure since the injected binary is not
checked at all, but it has the benefit of being simple. This option requires an existing valid `polkadot` binary,
compiled for Linux.
This binary is then simply copied inside the `parity/base-bin` image.
## Reporting issues
If you run into issues with Polkadot when using docker, please run the following command
(replace the tag with the appropriate one if you do not use latest):
If you run into issues with Polkadot when using Docker, please run the following command (replace the tag with the
appropriate one if you do not use `latest`):
```bash
$ENGINE run --rm -it parity/polkadot:latest --version
```
This will show you the Polkadot version as well as the git commit ref that was used to build your container.
You can now paste the version information in a [new issue](https://github.com/paritytech/polkadot/issues/new/choose).
This will show you the Polkadot version as well as the git commit ref that was used to build your container. You can now
paste the version information in a [new issue](https://github.com/paritytech/polkadot/issues/new/choose).
+52 -59
@@ -1,98 +1,91 @@
## Notes
# Notes
### Burn In
## Burn In
Ensure that Parity DevOps has run the new release on Westend, Kusama, and
Polkadot validators for at least 12 hours prior to publishing the release.
Ensure that Parity DevOps has run the new release on Westend, Kusama, and Polkadot validators for at least 12 hours
prior to publishing the release.
### Build Artifacts
## Build Artifacts
Add any necessary assets to the release. They should include:
- Linux binary
- GPG signature of the Linux binary
- SHA256 of binary
- Source code
- Wasm binaries of any runtimes
* Linux binary
* GPG signature of the Linux binary
* SHA256 of binary
* Source code
* Wasm binaries of any runtimes
### Release notes
## Release notes
The release notes should list:
- The priority of the release (i.e., how quickly users should upgrade) - this is
based on the max priority of any *client* changes.
- Which native runtimes and their versions are included
- The proposal hashes of the runtimes as built with
[srtool](https://gitlab.com/chevdor/srtool)
- Any changes in this release that are still awaiting audit
* The priority of the release (i.e., how quickly users should upgrade) - this is based on the max priority of any
*client* changes.
* Which native runtimes and their versions are included
* The proposal hashes of the runtimes as built with [srtool](https://gitlab.com/chevdor/srtool)
* Any changes in this release that are still awaiting audit
The release notes may also list:
- Free text at the beginning of the notes mentioning anything important
regarding this release
- Notable changes (those labelled with B[1-9]-* labels) separated into sections
* Free text at the beginning of the notes mentioning anything important regarding this release
* Notable changes (those labelled with `B[1-9]-*` labels) separated into sections
### Spec Version
## Spec Version
A runtime upgrade must bump the spec number. This may follow a pattern with the
client release (e.g. runtime v12 corresponds to v0.8.12, even if the current
runtime is not v11).
A runtime upgrade must bump the spec number. This may follow a pattern with the client release (e.g. runtime v12
corresponds to v0.8.12, even if the current runtime is not v11).
### Old Migrations Removed
## Old Migrations Removed
Any previous `on_runtime_upgrade` functions from old upgrades must be removed
to prevent them from executing a second time. The `on_runtime_upgrade` function
can be found in `runtime/<runtime>/src/lib.rs`.
Any previous `on_runtime_upgrade` functions from old upgrades must be removed to prevent them from executing a second
time. The `on_runtime_upgrade` function can be found in `runtime/<runtime>/src/lib.rs`.
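Besides removing old migrations, a migration can also guard itself by checking a version before running, so that an accidental second execution is harmless. The sketch below models this in plain Rust with a hypothetical `Storage` struct and a `u16` version field; FRAME offers storage versions for this purpose, but none of FRAME's API is used here, and the guard complements the rule above rather than replacing it.

```rust
/// Hypothetical on-chain state: a version number plus some stored values.
struct Storage {
    version: u16,
    values: Vec<u32>,
}

/// Hypothetical migration from version 1 to 2 that multiplies every value by 10.
/// Returns whether it actually ran.
fn migrate_v1_to_v2(storage: &mut Storage) -> bool {
    // Guard: only migrate data that is still at version 1. A second execution
    // (or execution on an unexpected version) becomes a no-op.
    if storage.version != 1 {
        return false;
    }
    for value in storage.values.iter_mut() {
        *value = value.saturating_mul(10);
    }
    storage.version = 2;
    true
}

fn main() {
    let mut storage = Storage { version: 1, values: vec![1, 2, 3] };
    let first = migrate_v1_to_v2(&mut storage);
    let second = migrate_v1_to_v2(&mut storage);
    println!("{first} {second} {:?}", storage.values); // true false [10, 20, 30]
}
```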
### New Migrations
## New Migrations
Ensure that any migrations that are required due to storage or logic changes
are included in the `on_runtime_upgrade` function of the appropriate pallets.
Ensure that any migrations that are required due to storage or logic changes are included in the `on_runtime_upgrade`
function of the appropriate pallets.
### Extrinsic Ordering
## Extrinsic Ordering
Offline signing libraries depend on a consistent ordering of call indices and
functions. Compare the metadata of the current and new runtimes and ensure that
the `module index, call index` tuples map to the same set of functions. In case
Offline signing libraries depend on a consistent ordering of call indices and functions. Compare the metadata of the
current and new runtimes and ensure that the `module index, call index` tuples map to the same set of functions. In case
of a breaking change, increase `transaction_version`.
To verify the order has not changed, you may manually start the following [Github Action](https://github.com/paritytech/polkadot/actions/workflows/extrinsic-ordering-check-from-bin.yml). It takes around a minute to run and will produce the report as artifact you need to manually check.
To verify the order has not changed, you may manually start the following [GitHub
Action](https://github.com/paritytech/polkadot/actions/workflows/extrinsic-ordering-check-from-bin.yml). It takes around
a minute to run and will produce the report as an artifact that you need to check manually.
The things to look for in the output are lines like:
- `[Identity] idx 28 -> 25 (calls 15)` - indicates the index for `Identity` has changed
- `[+] Society, Recovery` - indicates the new version includes 2 additional modules/pallets.
- If no indices have changed, every modules line should look something like `[Identity] idx 25 (calls 15)`
* `[Identity] idx 28 -> 25 (calls 15)` - indicates the index for `Identity` has changed
* `[+] Society, Recovery` - indicates the new version includes 2 additional modules/pallets.
* If no indices have changed, every module's line should look something like `[Identity] idx 25 (calls 15)`
Note: Adding new functions to the runtime does not constitute a breaking change
as long as the indexes did not change.
Note: Adding new functions to the runtime does not constitute a breaking change as long as the indices do not change.
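The comparison behind this report can be approximated as follows. This is a hypothetical sketch, not the action's implementation: it assumes each runtime's metadata has already been reduced to a list of `(pallet name, pallet index, number of calls)` entries (`PalletInfo` and `compare` are illustrative names), and it only imitates the report line format quoted above.

```rust
use std::collections::BTreeMap;

/// Hypothetical summary of one pallet in the runtime metadata.
struct PalletInfo {
    name: &'static str,
    index: u8,
    calls: usize,
}

/// Compares old and new pallet summaries. Returns report lines plus a flag telling
/// whether any pallet present in both runtimes changed its index.
fn compare(old: &[PalletInfo], new: &[PalletInfo]) -> (Vec<String>, bool) {
    let old_by_name: BTreeMap<&str, &PalletInfo> = old.iter().map(|p| (p.name, p)).collect();
    let mut report = Vec::new();
    let mut index_changed = false;
    let mut added = Vec::new();

    for p in new {
        match old_by_name.get(p.name) {
            // Same pallet, different index: this breaks offline signers.
            Some(o) if o.index != p.index => {
                index_changed = true;
                report.push(format!("[{}] idx {} -> {} (calls {})", p.name, o.index, p.index, p.calls));
            }
            Some(_) => report.push(format!("[{}] idx {} (calls {})", p.name, p.index, p.calls)),
            // Pallet only exists in the new runtime.
            None => added.push(p.name),
        }
    }
    if !added.is_empty() {
        report.push(format!("[+] {}", added.join(", ")));
    }
    (report, index_changed)
}

fn main() {
    // Hypothetical metadata summaries; the call counts of the new pallets are made up.
    let old = [PalletInfo { name: "Identity", index: 28, calls: 15 }];
    let new = [
        PalletInfo { name: "Identity", index: 25, calls: 15 },
        PalletInfo { name: "Society", index: 26, calls: 20 },
        PalletInfo { name: "Recovery", index: 27, calls: 9 },
    ];
    let (report, index_changed) = compare(&old, &new);
    // Prints "[Identity] idx 28 -> 25 (calls 15)" and "[+] Society, Recovery".
    for line in &report {
        println!("{line}");
    }
    println!("index changed: {index_changed}");
}
```

A complete check would also compare the individual call indices inside each pallet and report removed pallets; counting calls alone does not detect calls being reordered within a pallet.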
### Proxy Filtering
## Proxy Filtering
The runtime contains proxy filters that map proxy types to allowable calls. If
the new runtime contains any new calls, verify that the proxy filters are up to
date to include them.
The runtime contains proxy filters that map proxy types to allowable calls. If the new runtime contains any new calls,
verify that the proxy filters are up to date to include them.
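Proxy filters are often written as a `match` from proxy type to the calls it may dispatch, which is why a new call is not automatically handled correctly. The sketch below is hypothetical: the `ProxyType` variants, the `Call` enum, and `proxy_allows` are illustrative and much smaller than any real runtime's.

```rust
/// Hypothetical subset of runtime calls.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Call {
    Transfer,
    Stake,
    Vote,
    /// A call introduced by the new runtime.
    NewGovernanceCall,
}

/// Hypothetical proxy types.
#[derive(Debug, Clone, Copy)]
enum ProxyType {
    Any,
    NonTransfer,
    Staking,
    Governance,
}

/// Returns whether a proxy of the given type may dispatch `call`.
fn proxy_allows(proxy: ProxyType, call: Call) -> bool {
    match proxy {
        ProxyType::Any => true,
        // Deny-list: every call except transfers is allowed, including new ones.
        ProxyType::NonTransfer => !matches!(call, Call::Transfer),
        // Allow-list: only the listed calls are allowed.
        ProxyType::Staking => matches!(call, Call::Stake),
        // The new governance call has to be added here explicitly, otherwise
        // governance proxies cannot use it.
        ProxyType::Governance => matches!(call, Call::Vote | Call::NewGovernanceCall),
    }
}

fn main() {
    let proxies = [ProxyType::Any, ProxyType::NonTransfer, ProxyType::Staking, ProxyType::Governance];
    for proxy in proxies {
        println!("{:?} may dispatch NewGovernanceCall: {}", proxy, proxy_allows(proxy, Call::NewGovernanceCall));
    }
}
```

Both styles need review at release time: allow-list arms silently reject new calls, while deny-list arms such as `NonTransfer` silently accept them, including a new call that should have been excluded.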
### Benchmarks
## Benchmarks
There are three benchmarking machines reserved for updating the weights at
release-time. To initialise a benchmark run for each production runtime
(westend, kusama, polkadot):
There are three benchmarking machines reserved for updating the weights at release time. To initialise a benchmark run
for each production runtime (`westend`, `kusama`, `polkadot`):
* Go to <https://gitlab.parity.io/parity/polkadot/-/pipelines?page=1&scope=branches&ref=master>
* Click the link to the last pipeline run for master
* Start each of the manual jobs:
* 'update_westend_weights'
* 'update_polkadot_weights'
* 'update_kusama_weights'
* When these jobs have completed (it takes a few hours), a git PATCH file will
be available to download as an artifact.
* `update_westend_weights`
* `update_polkadot_weights`
* `update_kusama_weights`
* When these jobs have completed (it takes a few hours), a git patch file will be available to download as an artifact.
* On your local machine, branch off master
* Download the patch file and apply it to your branch with `git apply patchfile.patch`
* Commit the changes to your branch and submit a PR against master
* The weights should be (Currently manually) checked to make sure there are no
big outliers (i.e., twice or half the weight).
* The weights should currently be checked manually to make sure there are no big outliers (e.g., twice or half the
previous weight).
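Until this check is automated, the "twice or half" rule can be expressed as a comparison between old and new weights. This is a hypothetical sketch: it assumes the weights from the current and the patched weight files have already been collected into maps keyed by extrinsic name, and it uses plain `u64` values rather than the runtime's weight type.

```rust
use std::collections::BTreeMap;

/// Returns the names of extrinsics whose new weight is at least twice, or at most
/// half, the old weight. Extrinsics missing from the old map are skipped.
fn weight_outliers(old: &BTreeMap<&str, u64>, new: &BTreeMap<&str, u64>) -> Vec<String> {
    let mut outliers = Vec::new();
    for (name, &new_weight) in new {
        if let Some(&old_weight) = old.get(name) {
            // Integer comparisons avoid floating point and division by zero.
            let doubled = new_weight >= old_weight.saturating_mul(2);
            let halved = new_weight.saturating_mul(2) <= old_weight;
            if new_weight != old_weight && (doubled || halved) {
                outliers.push(name.to_string());
            }
        }
    }
    outliers
}

fn main() {
    // Hypothetical weights before and after applying the patch.
    let old = BTreeMap::from([("transfer", 100u64), ("bond", 200), ("vote", 50)]);
    let new = BTreeMap::from([("transfer", 110u64), ("bond", 450), ("vote", 20)]);
    // Prints ["bond", "vote"]: `bond` more than doubled and `vote` dropped below half.
    println!("{:?}", weight_outliers(&old, &new));
}
```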
### Polkadot JS
## Polkadot JS
Ensure that a release of [Polkadot JS API]() contains any new types or
interfaces necessary to interact with the new runtime.
Ensure that a release of [Polkadot JS API](https://github.com/polkadot-js/api) contains any new types or interfaces
necessary to interact with the new runtime.
+4 -2
@@ -1,6 +1,7 @@
# Shell completion
The Polkadot CLI command supports shell auto-completion. For this to work, you will need to run the completion script matching you build and system.
The Polkadot CLI command supports shell auto-completion. For this to work, you will need to run the completion script
matching your build and system.
Assuming you built a release version using `cargo build --release` and use `bash`, run the following:
@@ -30,7 +31,8 @@ source $HOME/.bash_profile
## Update
When you build a new version of Polkadot, the following will ensure you auto-completion script matches the current binary:
When you build a new version of Polkadot, the following will ensure your auto-completion script matches the current
binary:
```bash
COMPL_DIR=$HOME/.completion
+50 -58
@@ -4,7 +4,7 @@ Automated testing is an essential tool to assure correctness.
## Scopes
The testing strategy for polkadot is 4-fold:
The testing strategy for Polkadot is 4-fold:
### Unit testing (1)
@@ -16,18 +16,15 @@ There are two variants of integration tests:
#### Subsystem tests (2)
One particular subsystem (subsystem under test) interacts with a
mocked overseer that is made to assert incoming and outgoing messages
of the subsystem under test.
This is largely present today, but has some fragmentation in the evolved
integration test implementation. A `proc-macro`/`macro_rules` would allow
for more consistent implementation and structure.
One particular subsystem (subsystem under test) interacts with a mocked overseer that is made to assert incoming and
outgoing messages of the subsystem under test. This is largely present today, but has some fragmentation in the evolved
integration test implementation. A `proc-macro`/`macro_rules` would allow for more consistent implementation and
structure.
#### Behavior tests (3)
Launching small scale networks, with multiple adversarial nodes without any further tooling required.
This should include tests around the thresholds in order to evaluate the error handling once certain
assumed invariants fail.
Launching small-scale networks with multiple adversarial nodes, without any further tooling required. This should
include tests around the thresholds in order to evaluate the error handling once certain assumed invariants fail.
For this purpose, the tests are based on `AllSubsystems` and the `proc-macro` `AllSubsystemsGen`.
@@ -35,18 +32,14 @@ This assumes a simplistic test runtime.
#### Testing at scale (4)
Launching many nodes with configurable network speed and node features in a cluster of nodes.
At this scale the [Simnet][simnet] comes into play which launches a full cluster of nodes.
The scale is handled by spawning a kubernetes cluster and the meta description
is covered by [Gurke][Gurke].
Asserts are made using Grafana rules, based on the existing prometheus metrics. This can
be extended by adding an additional service translating `jaeger` spans into addition
prometheus avoiding additional polkadot source changes.
Launching many nodes with configurable network speed and node features in a cluster of nodes. At this scale, the
[Simnet][simnet] comes into play, which launches a full cluster of nodes. The scale is handled by spawning a Kubernetes
cluster, and the meta description is covered by [Gurke][Gurke]. Asserts are made using Grafana rules, based on the
existing Prometheus metrics. This can be extended by adding an additional service translating `jaeger` spans into
additional Prometheus metrics, avoiding additional Polkadot source changes.
_Behavior tests_ and _testing at scale_ have naturally soft boundary.
The most significant difference is the presence of a real network and
the number of nodes, since a single host often not capable to run
multiple nodes at once.
_Behavior tests_ and _testing at scale_ have a naturally soft boundary. The most significant difference is the presence
of a real network and the number of nodes, since a single host is often not capable of running multiple nodes at once.
---
@@ -54,8 +47,8 @@ multiple nodes at once.
Coverage gives a _hint_ of the actually covered source lines by tests and test applications.
The state of the art is currently [tarpaulin][tarpaulin] which unfortunately yields a
lot of false negatives. Lines that are in fact covered, marked as uncovered due to a mere linebreak in a statement can cause these artifacts. This leads to
The state of the art is currently [tarpaulin][tarpaulin], which unfortunately yields a lot of false negatives: lines that
are in fact covered can be marked as uncovered due to a mere line break in a statement. This leads to reported coverage
percentages that are lower than the actual coverage.
Since late 2020, Rust has gained [MIR-based coverage tooling](
@@ -97,9 +90,11 @@ The test coverage in `lcov` can the be published to <https://codecov.io>.
bash <(curl -s https://codecov.io/bash) -f lcov.info
```
or just printed as part of the PR using a github action i.e. [`jest-lcov-reporter`](https://github.com/marketplace/actions/jest-lcov-reporter).
or just printed as part of the PR using a GitHub action, e.g.
[`jest-lcov-reporter`](https://github.com/marketplace/actions/jest-lcov-reporter).
For full examples on how to use [`grcov` /w polkadot specifics see the github repo](https://github.com/mozilla/grcov#coverallscodecov-output).
For full examples of how to use [`grcov` with Polkadot specifics, see the GitHub
repo](https://github.com/mozilla/grcov#coverallscodecov-output).
## Fuzzing
@@ -109,11 +104,12 @@ Currently implemented fuzzing targets:
* `erasure-coding`
The tooling of choice here is `honggfuzz-rs` as it allows _fastest_ coverage according to "some paper" which is a positive feature when run as part of PRs.
The tooling of choice here is `honggfuzz-rs` as it allows _fastest_ coverage according to "some paper" which is a
positive feature when run as part of PRs.
Fuzzing is generally not applicable for data secured by cryptographic hashes or signatures. Either the input has to be specifically crafted, such that the discarded input
percentage stays in an acceptable range.
System level fuzzing is hence simply not feasible due to the amount of state that is required.
Fuzzing is generally not applicable for data secured by cryptographic hashes or signatures: the input has to be
specifically crafted so that the percentage of discarded inputs stays in an acceptable range. System-level fuzzing is
hence not feasible due to the amount of state that is required.
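The problem with hash-protected data can be shown with a toy decoder that rejects any input whose trailing checksum does not match its payload: random bytes practically never pass, so a fuzzer spends its time on discarded inputs unless the harness crafts the checksum itself. This is a hypothetical sketch: `decode`, `craft`, and the 8-byte trailer format are illustrative, a tiny LCG stands in for the fuzzer, and `DefaultHasher` stands in for a real cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::convert::TryInto;
use std::hash::Hasher;

/// Hashes a payload; stands in for a cryptographic hash in this sketch.
fn checksum(payload: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(payload);
    hasher.finish()
}

/// Toy decoder: the last 8 bytes must be the little-endian checksum of the rest.
fn decode(input: &[u8]) -> Option<&[u8]> {
    if input.len() < 8 {
        return None;
    }
    let (payload, trailer) = input.split_at(input.len() - 8);
    let expected = u64::from_le_bytes(trailer.try_into().ok()?);
    (checksum(payload) == expected).then_some(payload)
}

/// Harness helper: turns arbitrary fuzzer bytes into a valid input by appending
/// the matching checksum, so the decoder's inner logic is actually exercised.
fn craft(fuzz_bytes: &[u8]) -> Vec<u8> {
    let mut input = fuzz_bytes.to_vec();
    input.extend_from_slice(&checksum(fuzz_bytes).to_le_bytes());
    input
}

fn main() {
    // Deterministic pseudo-random input generator (LCG) in place of a real fuzzer.
    let mut state: u64 = 42;
    let mut next_input = || {
        (0..16)
            .map(|_| {
                state = state.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
                (state >> 56) as u8
            })
            .collect::<Vec<u8>>()
    };

    let trials = 10_000;
    let (mut raw_accepted, mut crafted_accepted) = (0, 0);
    for _ in 0..trials {
        let bytes = next_input();
        if decode(&bytes).is_some() {
            raw_accepted += 1;
        }
        if decode(&craft(&bytes)).is_some() {
            crafted_accepted += 1;
        }
    }
    println!("raw inputs accepted: {raw_accepted}/{trials}");
    println!("crafted inputs accepted: {crafted_accepted}/{trials}");
}
```

Crafting works here only because the harness can compute the checksum; for signatures the harness would also need the signing keys, and for system-level state it would need to reproduce everything the checks depend on.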
Other candidates to implement fuzzing are:
@@ -128,14 +124,17 @@ There are various ways of performance metrics.
* cache hits/misses w/ `iai` harness or `criterion-perf`
* `coz`, a causal profiler
Most of them are standard tools to aid in the creation of statistical tests regarding change in time of certain unit tests.
Most of them are standard tools to aid in the creation of statistical tests regarding changes in the execution time of
certain unit tests.
`coz` is meant for runtime. In our case, the system is far too large to yield a sufficient number of measurements in finite time.
An alternative approach could be to record incoming package streams per subsystem and store dumps of them, which in return could be replayed repeatedly at an
accelerated speed, with which enough metrics could be obtained to yield
information on which areas would improve the metrics.
This unfortunately will not yield much information, since most if not all of the subsystem code is linear based on the input to generate one or multiple output messages, it is unlikely to get any useful metrics without mocking a sufficiently large part of the other subsystem which overlaps with [#Integration tests] which is unfortunately not repeatable as of now.
As such the effort gain seems low and this is not pursued at the current time.
`coz` is meant for runtime. In our case, the system is far too large to yield a sufficient number of measurements in
finite time. An alternative approach could be to record the incoming message streams per subsystem and store dumps of
them, which could then be replayed repeatedly at an accelerated speed to obtain enough metrics to identify which areas
would improve the metrics. Unfortunately, this will not yield much information: since most, if not all, of the
subsystem code is linear in the input that generates one or multiple output messages, it is unlikely to produce useful
metrics without mocking a sufficiently large part of the other subsystems. This overlaps with [#Integration tests],
which is unfortunately not repeatable as of now. As such, the expected gain seems low relative to the effort, and this
is not pursued at the current time.
## Writing small scope integration tests with preconfigured workers
@@ -152,29 +151,25 @@ Requirements:
### Goals
The main goals are is to allow creating a test node which
exhibits a certain behavior by utilizing a subset of _wrapped_ or _replaced_ subsystems easily.
The runtime must not matter at all for these tests and should be simplistic.
The execution must be fast, this mostly means to assure a close to zero network latency as
well as shorting the block time and epoch times down to a few `100ms` and a few dozend blocks per epoch.
The main goal is to allow easily creating a test node which exhibits a certain behavior by utilizing a subset of
_wrapped_ or _replaced_ subsystems. The runtime must not matter at all for these tests and should be simplistic. The
execution must be fast; this mostly means assuring close to zero network latency as well as shortening the block time
and epoch times down to a few `100ms` and a few dozen blocks per epoch.
### Approach
#### MVP
A simple small scale builder pattern would suffice for stage one implementation of allowing to
replace individual subsystems.
An alternative would be to harness the existing `AllSubsystems` type
and replace the subsystems as needed.
A simple small-scale builder pattern would suffice for a stage-one implementation that allows replacing individual
subsystems. An alternative would be to harness the existing `AllSubsystems` type and replace the subsystems as needed.
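A minimal sketch of such a builder, assuming a heavily simplified `Subsystem` trait that handles string messages synchronously. `Subsystem`, `TestNode`, `TestNodeBuilder`, and the subsystem names are hypothetical and are not the actual overseer API; the point is only that a test starts from the default set and swaps out one subsystem.

```rust
/// Hypothetical, heavily simplified subsystem interface.
trait Subsystem {
    fn name(&self) -> &'static str;
    fn handle(&mut self, msg: &str) -> String;
}

/// Default, honest validation subsystem.
struct HonestValidation;
impl Subsystem for HonestValidation {
    fn name(&self) -> &'static str { "honest-validation" }
    fn handle(&mut self, msg: &str) -> String { format!("valid({msg})") }
}

/// Replacement that makes a test node behave adversarially.
struct AlwaysInvalid;
impl Subsystem for AlwaysInvalid {
    fn name(&self) -> &'static str { "always-invalid" }
    fn handle(&mut self, msg: &str) -> String { format!("invalid({msg})") }
}

/// Default networking subsystem that just "sends" what it is given.
struct DefaultNetworking;
impl Subsystem for DefaultNetworking {
    fn name(&self) -> &'static str { "networking" }
    fn handle(&mut self, msg: &str) -> String { format!("sent({msg})") }
}

struct TestNode {
    validation: Box<dyn Subsystem>,
    networking: Box<dyn Subsystem>,
}

impl TestNode {
    fn names(&self) -> [&'static str; 2] {
        [self.validation.name(), self.networking.name()]
    }
    /// Validates a candidate and distributes the verdict.
    fn process(&mut self, candidate: &str) -> String {
        let verdict = self.validation.handle(candidate);
        self.networking.handle(&verdict)
    }
}

struct TestNodeBuilder {
    validation: Box<dyn Subsystem>,
    networking: Box<dyn Subsystem>,
}

impl TestNodeBuilder {
    /// Starts from the default (honest) set of subsystems.
    fn new() -> Self {
        Self { validation: Box::new(HonestValidation), networking: Box::new(DefaultNetworking) }
    }
    /// Replaces only the validation subsystem; everything else stays default.
    fn validation(mut self, s: impl Subsystem + 'static) -> Self {
        self.validation = Box::new(s);
        self
    }
    fn build(self) -> TestNode {
        TestNode { validation: self.validation, networking: self.networking }
    }
}

fn main() {
    let mut honest = TestNodeBuilder::new().build();
    let mut malicious = TestNodeBuilder::new().validation(AlwaysInvalid).build();
    let honest_out = honest.process("c1");
    let malicious_out = malicious.process("c1");
    println!("{:?}: {}", honest.names(), honest_out);
    println!("{:?}: {}", malicious.names(), malicious_out);
}
```

A _wrapped_ subsystem would follow the same pattern: a struct holding the default subsystem that delegates to it and alters selected messages.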
#### Full `proc-macro` implementation
`Overseer` is a common pattern.
It could be extracted as `proc` macro and generative `proc-macro`.
This would replace the `AllSubsystems` type as well as implicitly create
the `AllMessages` enum as `AllSubsystemsGen` does today.
`Overseer` is a common pattern. It could be extracted into a generative `proc-macro`. This would replace the
`AllSubsystems` type as well as implicitly create the `AllMessages` enum, as `AllSubsystemsGen` does today.
The implementation is yet to be completed, see the [implementation PR](https://github.com/paritytech/polkadot/pull/2962) for details.
The implementation is yet to be completed; see the [implementation PR](https://github.com/paritytech/polkadot/pull/2962)
for details.
##### Declare an overseer implementation
@@ -233,19 +228,16 @@ fn main() -> eyre::Result<()> {
#### Simnet
Spawn a kubernetes cluster based on a meta description using [Gurke] with the
[Simnet] scripts.
Spawn a Kubernetes cluster based on a meta description using [Gurke] with the [Simnet] scripts.
Coordinated attacks of multiple nodes or subsystems must be made possible via
a side-channel, that is out of scope for this document.
Coordinated attacks of multiple nodes or subsystems must be made possible via a side channel, which is out of scope for
this document.
The individual node configurations are done as targets with a particular
builder configuration.
The individual node configurations are done as targets with a particular builder configuration.
#### Behavior tests w/o Simnet
Commonly this will require multiple nodes, and most machines are limited to
running two or three nodes concurrently.
Commonly this will require multiple nodes, and most machines are limited to running two or three nodes concurrently.
Hence, this is not the common case and is just an implementation _idea_.
```rust
+4 -4
@@ -1,10 +1,10 @@
### Run benches
# Run benches
```
$ cd erasure-coding # ensure you are in the right directory
$ cargo bench
cd erasure-coding # ensure you are in the right directory
cargo bench
```
### `scaling_with_validators`
## `scaling_with_validators`
This benchmark evaluates the performance of constructing the chunks and the erasure root from PoV and
reconstructing the PoV from chunks. You can see the results of running this bench on 5950x below.
+26 -26
@@ -1,35 +1,35 @@
# Do I need this?
Polkadot nodes collect and produce Prometheus metrics and logs. These include health, performance and debug
information such as last finalized block, height of the chain, and many other deeper implementation details
of the Polkadot/Substrate node subsystems. These are crucial pieces of information that one needs to successfully
Polkadot nodes collect and produce Prometheus metrics and logs. These include health, performance and debug
information such as last finalized block, height of the chain, and many other deeper implementation details
of the Polkadot/Substrate node subsystems. These are crucial pieces of information that one needs to successfully
monitor the liveness and performance of a network and its validators.
# How does it work?
Just import the dashboard JSON files from this folder in your Grafana installation. All dashboards are grouped in
Just import the dashboard JSON files from this folder in your Grafana installation. All dashboards are grouped in
folders per category (for example, `parachains`). The files have been created by Grafana's export functionality and
follow the data model specified [here](https://grafana.com/docs/grafana/latest/dashboards/json-model/).
We aim to keep the dashboards here in sync with the implementation, except dashboards for development and
We aim to keep the dashboards here in sync with the implementation, except dashboards for development and
testing.
# Contributing
**Your contributions are most welcome!**
**Your contributions are most welcome!**
Please follow these design guidelines:
- Add a new entry in this file and describe the use case and key metrics
- Ensure proper names and descriptions for dashboard panels and add relevant documentation when needed.
This is very important as not all users have similar depth of understanding of the implementation
- Ensure proper names and descriptions for dashboard panels and add relevant documentation when needed.
This is very important, as not all users have a similar depth of understanding of the implementation.
- Label all axes
- Give all values proper units of measurement
- Use a crisp and clear color scheme
# Prerequisites
Before you continue make sure you have Grafana set up, or otherwise follow this
[guide](https://wiki.polkadot.network/docs/maintain-guides-how-to-monitor-your-node).
Before you continue make sure you have Grafana set up, or otherwise follow this
[guide](https://wiki.polkadot.network/docs/maintain-guides-how-to-monitor-your-node).
You might also need to [set up Loki](https://grafana.com/go/webinar/loki-getting-started/).
@@ -44,7 +44,7 @@ This section is a list of dashboards, their use case as well as the key metrics
## Node Versions
Useful for monitoring versions and logs of validator nodes. Includes time series panels that
Useful for monitoring versions and logs of validator nodes. Includes time series panels that
track node warning and error log rates. These can be further investigated in Grafana Loki.
Requires Loki for log aggregation and querying.
@@ -64,30 +64,30 @@ It includes panels covering key subsystems of the parachain node side implementa
- Disputes coordinator
- Chain selection
It is important to note that this dashboard applies only for validator nodes. The prometheus
queries assume the `instance` label value contains the string `validator` only for validator nodes.
It is important to note that this dashboard applies only to validator nodes. The Prometheus
queries assume the `instance` label value contains the string `validator` only for validator nodes.
[Dashboard JSON](parachains/status.json)
### Key liveness indicators
- **Relay chain finality lag**. How far behind finality is compared to the current best block. By design,
GRANDPA never finalizes past last 2 blocks, so this value is always >=2 blocks.
- **Approval checking finality lag**. The distance (in blocks) between the chain head and the last block
on which Approval voting is happening. The block is generally the highest approved ancestor of the head
- **Approval checking finality lag**. The distance (in blocks) between the chain head and the last block
on which Approval voting is happening. The block is generally the highest approved ancestor of the head
block and the metric is computed during relay chain selection.
- **Disputes finality lag**. How far behind the chain head is the last approved and non disputed block.
This value is always higher than approval checking lag as it further restricts finality to only undisputed
- **Disputes finality lag**. How far behind the chain head is the last approved and non-disputed block.
This value is always at least as high as the approval checking lag, as it further restricts finality to only undisputed
chains.
- **PVF preparation and execution time**. Each parachain has it's own PVF (parachain validation function):
a wasm blob that is executed by validators during backing, approval checking and disputing. The PVF
preparation time refers to the time it takes for the PVF wasm to be compiled. This step is done once and
then result cached. PVF execution will use the resulting artifact to execute the PVF for a given candidate.
PVFs are expected to have a limited execution time to ensure there is enough time left for the parachain
- **PVF preparation and execution time**. Each parachain has its own PVF (parachain validation function):
a Wasm blob that is executed by validators during backing, approval checking and disputing. The PVF
preparation time refers to the time it takes for the PVF Wasm to be compiled. This step is done once and
then the result is cached. PVF execution will use the resulting artifact to execute the PVF for a given candidate.
PVFs are expected to have a limited execution time to ensure there is enough time left for the parachain
block to be included in the relay block.
- **Time to recover and check candidate**. This is part of approval voting and covers the time it takes
- **Time to recover and check candidate**. This is part of approval voting and covers the time it takes
to recover the candidate's available data from other validators, check it (including PVF execution time)
and issue a statement or initiate a dispute.
- **Assignment delay tranches**. Approval voting is designed such that validators assigned to check a specific
candidate are split up into equal delay tranches (0.5 seconds each). All validators checks are ordered by the delay
tranche index. Early tranches of validators have the opportunity to check the candidate first before later tranches
- **Assignment delay tranches**. Approval voting is designed such that validators assigned to check a specific
candidate are split up into equal delay tranches (0.5 seconds each). All validators' checks are ordered by the delay
tranche index. Early tranches of validators have the opportunity to check the candidate first before later tranches
that act as backups in case of no-shows.
+1 -1
@@ -52,6 +52,6 @@ when providing to any of the log macros (`warn!`, `info!`, etc.).
The crate has to be used throughout the entire codebase to work consistently, to
disambiguate, the prefix `gum::` is used.
Feature parity with `tracing::{warn!,..}` is not desired. We want consistency
Feature parity with `tracing::{warn!,..}` is not desired. We want consistency
more than anything. All currently used features _are_ supported with _gum_ as
well.
+2 -2
@@ -18,7 +18,7 @@ defined in the [(DSL[(**D**omain **S**pecific **L**anguage)]) doc](https://parit
## Usage
> Assumes you already gained permissiones, ping in element @javier:matrix.parity.io to get access.
> Assumes you have already gained permissions (ping `@javier:matrix.parity.io` on Element to get access)
> and have cloned the [zombienet][zombienet] repo.
To launch a test case in the development cluster, use the following (e.g. for
`./node/malus/integrationtests/0001-dispute-valid-block.toml`):
@@ -48,7 +48,7 @@ This will also teardown the namespace after completion.
## Container Image Building Note
In order to build the container image you need to have the latest changes from
polkadot and substrate master branches.
Polkadot and Substrate master branches.
```sh
pwd # run this from the current dir
+1 -1
@@ -1,4 +1,4 @@
# polkadot-node-metrics
# `polkadot-node-metrics`
## Testing
+1 -1
@@ -1,4 +1,4 @@
# polkadot-test-service
# `polkadot-test-service`
## Testing
+2 -1
@@ -1,3 +1,4 @@
# Test Parachains
Each parachain consists of three parts: a `#![no_std]` library with the main execution logic, a WASM crate which wraps this logic, and a collator node.
Each parachain consists of three parts: a `#![no_std]` library with the main execution logic, a WASM crate which wraps
this logic, and a collator node.
@@ -19,7 +19,7 @@ Next start the collator that will collate for the adder parachain:
cargo run --release -p test-parachain-adder-collator -- --tmp --chain rococo-local --port 50553
```
The last step is to register the parachain using polkadot-js. The parachain id is
The last step is to register the parachain using `polkadot-js`. The parachain id is
100. The genesis state and the validation code are printed at startup by the collator.
To do this automatically, run `scripts/adder-collator.sh`.
@@ -1,5 +1,11 @@
# Preamble
This document aims to describe the purpose, functionality, and implementation of the host for Polkadot's _parachains_ functionality - that is, the software which provides security and advancement for constituent parachains. It is not for the implementer of a specific parachain but rather for the implementer of the Parachain Host. In practice, this is for the implementers of Polkadot in general.
This document aims to describe the purpose, functionality, and implementation of the host for Polkadot's _parachains_
functionality - that is, the software which provides security and advancement for constituent parachains. It is not for
the implementer of a specific parachain but rather for the implementer of the Parachain Host. In practice, this is for
the implementers of Polkadot in general.
There are a number of other documents describing the research in more detail. All referenced documents will be linked here and should be read alongside this document for the best understanding of the full picture. However, this is the only document which aims to describe key aspects of Polkadot's particular instantiation of much of that research down to low-level technical details and software architecture.
There are a number of other documents describing the research in more detail. All referenced documents will be linked
here and should be read alongside this document for the best understanding of the full picture. However, this is the
only document which aims to describe key aspects of Polkadot's particular instantiation of much of that research down to
low-level technical details and software architecture.
@@ -71,16 +71,16 @@
- [Chain Selection Request](node/utility/chain-selection.md)
- [PVF Pre-Checking](node/utility/pvf-prechecker.md)
- [Data Structures and Types](types/README.md)
- [Candidate](types/candidate.md)
- [Backing](types/backing.md)
- [Availability](types/availability.md)
- [Overseer and Subsystem Protocol](types/overseer-protocol.md)
- [Runtime](types/runtime.md)
- [Messages](types/messages.md)
- [Network](types/network.md)
- [Approvals](types/approval.md)
- [Disputes](types/disputes.md)
- [PVF Pre-checking](types/pvf-prechecking.md)
- [Candidate](types/candidate.md)
- [Backing](types/backing.md)
- [Availability](types/availability.md)
- [Overseer and Subsystem Protocol](types/overseer-protocol.md)
- [Runtime](types/runtime.md)
- [Messages](types/messages.md)
- [Network](types/network.md)
- [Approvals](types/approval.md)
- [Disputes](types/disputes.md)
- [PVF Pre-checking](types/pvf-prechecking.md)
[Glossary](glossary.md)
[Further Reading](further-reading.md)
@@ -1,8 +1,13 @@
# Architecture Overview
This section aims to describe, at a high level, the code architecture and subsystems involved in the implementation of an individual Parachain Host. It also illuminates certain subtleties and challenges faced in the design and implementation of those subsystems.
This section aims to describe, at a high level, the code architecture and subsystems involved in the implementation of
an individual Parachain Host. It also illuminates certain subtleties and challenges faced in the design and
implementation of those subsystems.
To recap, Polkadot includes a blockchain known as the relay-chain. A blockchain is a Directed Acyclic Graph (DAG) of state transitions, where every block can be considered to be the head of a linked-list (known as a "chain" or "fork") with a cumulative state which is determined by applying the state transition of each block in turn. All paths through the DAG terminate at the Genesis Block. In fact, the blockchain is a tree, since each block can have only one parent.
To recap, Polkadot includes a blockchain known as the relay-chain. A blockchain is a Directed Acyclic Graph (DAG) of
state transitions, where every block can be considered to be the head of a linked-list (known as a "chain" or "fork")
with a cumulative state which is determined by applying the state transition of each block in turn. All paths through
the DAG terminate at the Genesis Block. In fact, the blockchain is a tree, since each block can have only one parent.
```dot process
digraph {
@@ -22,16 +27,25 @@ digraph {
}
```
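The parent-pointer structure described above can be sketched as a map from block hash to parent hash: following parents from any block yields its chain back to the Genesis Block, and because each block stores exactly one parent (and must be imported after it), the structure stays a tree. This is a hypothetical sketch with `u64` hashes and no state; `BlockTree`, `import`, and `chain` are illustrative names, not a real client's block storage.

```rust
use std::collections::HashMap;

/// Hypothetical block tree: block hash -> parent hash (`None` for genesis).
struct BlockTree {
    parents: HashMap<u64, Option<u64>>,
}

impl BlockTree {
    fn new(genesis: u64) -> Self {
        Self { parents: HashMap::from([(genesis, None)]) }
    }

    /// Imports a block. Its parent must already be known and the block must be new,
    /// which rules out cycles.
    fn import(&mut self, hash: u64, parent: u64) -> Result<(), String> {
        if self.parents.contains_key(&hash) {
            return Err(format!("block {hash} already imported"));
        }
        if !self.parents.contains_key(&parent) {
            return Err(format!("unknown parent {parent}"));
        }
        self.parents.insert(hash, Some(parent));
        Ok(())
    }

    /// Returns the chain ("fork") ending at `head`, ordered from genesis to head,
    /// or `None` if `head` is unknown.
    fn chain(&self, head: u64) -> Option<Vec<u64>> {
        let mut chain = vec![head];
        let mut current = head;
        while let Some(parent) = *self.parents.get(&current)? {
            chain.push(parent);
            current = parent;
        }
        chain.reverse();
        Some(chain)
    }
}

fn main() {
    let mut tree = BlockTree::new(0);
    tree.import(1, 0).unwrap();
    tree.import(2, 1).unwrap();
    // Block 3 is a sibling of block 2, creating a second fork.
    tree.import(3, 1).unwrap();
    println!("fork ending at 2: {:?}", tree.chain(2));
    println!("fork ending at 3: {:?}", tree.chain(3));
}
```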
A blockchain network is comprised of nodes. These nodes each have a view of many different forks of a blockchain and must decide which forks to follow and what actions to take based on the forks of the chain that they are aware of.
A blockchain network is composed of nodes. These nodes each have a view of many different forks of a blockchain and
must decide which forks to follow and what actions to take based on the forks of the chain that they are aware of.
So in specifying an architecture to carry out the functionality of a Parachain Host, we have to answer two categories of questions:
So in specifying an architecture to carry out the functionality of a Parachain Host, we have to answer two categories of
questions:
1. What is the state-transition function of the blockchain? What is necessary for a transition to be considered valid, and what information is carried within the implicit state of a block?
1. Being aware of various forks of the blockchain as well as global private state such as a view of the current time, what behaviors should a node undertake? What information should a node extract from the state of which forks, and how should that information be used?
1. What is the state-transition function of the blockchain? What is necessary for a transition to be considered valid,
and what information is carried within the implicit state of a block?
1. Being aware of various forks of the blockchain as well as global private state such as a view of the current time,
what behaviors should a node undertake? What information should a node extract from the state of which forks, and how
should that information be used?
The first category of questions will be addressed by the Runtime, which defines the state-transition logic of the chain. Runtime logic only has to focus on the perspective of one chain, as each state has only a single parent state.
The first category of questions will be addressed by the Runtime, which defines the state-transition logic of the chain.
Runtime logic only has to focus on the perspective of one chain, as each state has only a single parent state.
The second category of questions addressed by Node-side behavior. Node-side behavior defines all activities that a node undertakes, given its view of the blockchain/block-DAG. Node-side behavior can take into account all or many of the forks of the blockchain, and only conditionally undertake certain activities based on which forks it is aware of, as well as the state of the head of those forks.
The second category of questions is addressed by Node-side behavior. Node-side behavior defines all activities that a
node undertakes, given its view of the blockchain/block-DAG. Node-side behavior can take into account all or many of
the forks of the blockchain, and only conditionally undertake certain activities based on which forks it is aware of,
as well as the state of the head of those forks.
```dot process
digraph G {
@@ -46,7 +60,13 @@ digraph G {
```
It is also helpful to divide Node-side behavior into two further categories: Networking and Core. Networking behaviors relate to how information is distributed between nodes. Core behaviors relate to internal work that a specific node does. These two categories of behavior often interact, but can be heavily abstracted from each other. Core behaviors care that information is distributed and received, but not the internal details of how distribution and receipt function. Networking behaviors act on requests for distribution or fetching of information, but are not concerned with how the information is used afterwards. This allows us to create clean boundaries between Core and Networking activities, improving the modularity of the code.
It is also helpful to divide Node-side behavior into two further categories: Networking and Core. Networking behaviors
relate to how information is distributed between nodes. Core behaviors relate to internal work that a specific node
does. These two categories of behavior often interact, but can be heavily abstracted from each other. Core behaviors
care that information is distributed and received, but not the internal details of how distribution and receipt
function. Networking behaviors act on requests for distribution or fetching of information, but are not concerned with
how the information is used afterwards. This allows us to create clean boundaries between Core and Networking
activities, improving the modularity of the code.
```text
___________________ ____________________
@@ -65,8 +85,18 @@ It is also helpful to divide Node-side behavior into two further categories: Net
```
Node-side behavior is split up into various subsystems. Subsystems are long-lived workers that perform a particular
category of work. Subsystems can communicate with each other, and do so via an [Overseer](node/overseer.md) that
prevents race conditions.
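One way to picture how routing through an overseer avoids races is a single queue through which all inter-subsystem traffic passes. The following is a minimal, single-threaded Rust sketch under that assumption: subsystems never hold each other's channels, they hand messages to the overseer, which delivers them one at a time in arrival order. `Overseer`, `Envelope` and `SubsystemId` are hypothetical names and do not reflect the API of the real overseer crate.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical subsystem identifiers; a real node has many more subsystems.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum SubsystemId {
    CandidateBacking,
    AvailabilityStore,
}

/// A message addressed to one subsystem.
#[derive(Debug)]
pub struct Envelope {
    pub to: SubsystemId,
    pub payload: String,
}

/// Owns the only inbound queue, so messages are handled one at a time in arrival order.
pub struct Overseer {
    inbox: Receiver<Envelope>,
    outboxes: HashMap<SubsystemId, Sender<String>>,
}

impl Overseer {
    /// Returns the overseer plus the (cloneable) sender that subsystems use to reach it.
    pub fn new() -> (Self, Sender<Envelope>) {
        let (tx, rx) = channel();
        (Overseer { inbox: rx, outboxes: HashMap::new() }, tx)
    }

    /// Registers a subsystem and returns the receiving end of its private queue.
    pub fn register(&mut self, id: SubsystemId) -> Receiver<String> {
        let (tx, rx) = channel();
        self.outboxes.insert(id, tx);
        rx
    }

    /// Routes every pending message; messages for unregistered subsystems are dropped.
    /// Returns how many messages were delivered.
    pub fn route_pending(&self) -> usize {
        let mut delivered = 0;
        while let Ok(envelope) = self.inbox.try_recv() {
            if let Some(tx) = self.outboxes.get(&envelope.to) {
                if tx.send(envelope.payload).is_ok() {
                    delivered += 1;
                }
            }
        }
        delivered
    }
}
```

In a real node each subsystem would run as its own long-lived task; the sketch keeps everything on one thread so that the ordering guarantee is easy to see.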
Runtime logic is divided up into Modules and APIs. Modules encapsulate particular behavior of the system. Modules
consist of storage, routines, and entry-points. Routines are invoked by entry points, by other modules, or upon block
initialization or closing. Routines can read and alter the storage of the module. Entry-points are the means by which
new information is introduced to a module and can limit the origins (user, root, parachain) that they accept being
called by. Each block in the blockchain contains a set of Extrinsics. Each extrinsic specifies which entry point to
trigger and which data should be passed to it. Runtime APIs provide a means for Node-side behavior to extract meaningful
information from the state of a single fork.
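The module structure described above can be sketched in plain Rust: storage owned by the module, an entry point that rejects disallowed origins, and a routine run at block initialization that alters storage. This is not FRAME code; `ConfigModule`, `Origin`, `DispatchError` and the `max_code_size` storage item are hypothetical names chosen only for illustration.

```rust
/// The kinds of origin named in the text above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Origin {
    User(u64),
    Root,
    Parachain(u32),
}

#[derive(Debug, PartialEq, Eq)]
pub enum DispatchError {
    BadOrigin,
}

/// Hypothetical module: two storage items, one entry point, one routine.
#[derive(Default)]
pub struct ConfigModule {
    active_max_code_size: u32,
    pending_max_code_size: Option<u32>,
}

impl ConfigModule {
    /// Entry point: accepts calls only from the Root origin and stages the new value.
    pub fn set_max_code_size(&mut self, origin: Origin, new_size: u32) -> Result<(), DispatchError> {
        if origin != Origin::Root {
            return Err(DispatchError::BadOrigin);
        }
        self.pending_max_code_size = Some(new_size);
        Ok(())
    }

    /// Routine run at block initialization: applies any staged value to storage.
    pub fn on_initialize(&mut self) {
        if let Some(size) = self.pending_max_code_size.take() {
            self.active_max_code_size = size;
        }
    }

    /// Read-only accessor, similar in spirit to what a Runtime API would expose.
    pub fn max_code_size(&self) -> u32 {
        self.active_max_code_size
    }
}
```

Staging the value and applying it in a routine shows the split between entry points (which introduce information) and routines (which act on storage at well-defined points in the block).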
These two aspects of the implementation are heavily dependent on each other. The Runtime depends on Node-side behavior
to author blocks, and to include Extrinsics which trigger the correct entry points. The Node-side behavior relies on
Runtime APIs to extract information necessary to determine which actions to take.
@@ -70,10 +70,12 @@ stateDiagram-v2
## Conditional formulation
The set of validators eligible to vote consists of the validators that had duty at the time of backing, plus backing
votes by the backing validators.
If a validator receives an initial dispute message (a set of votes containing at least two opposing votes), and the
PoV or Code cannot be reconstructed from local storage, that validator must request the required data from its peers.
The dispute availability message must contain code, persisted validation data, and the proof of validity.
@@ -81,9 +83,11 @@ Only peers that already voted shall be queried for the dispute availability data
The peer to be queried for dispute data must be picked at random.
A validator must retain code, persisted validation data and PoV until the block that contains the dispute resolution
is finalized, plus an additional 24 hours.
Dispute availability gossip must continue beyond the dispute resolution, until the post-resolution timeout expires
(equivalent to the timeout until which additional late votes are accepted).
Remote disputes are disputes that relate to a chain that is not part of the local validator's active heads.
@@ -93,32 +97,42 @@ Persisted votes stay persisted for `N` sessions, and are cleaned up on a per ses
Votes must be queryable by a particular validator, identified by its signing key.
Votes must be queryable by a particular validator, identified by a session index and the validator index valid in that
session.
If there exist both a negative and a positive vote for a particular block, a dispute is detected.
If a dispute is detected, all currently available votes for that block must be gossiped.
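The vote-keeping and detection rules above can be sketched as a small per-candidate vote map keyed by session index and validator index. This is a minimal sketch assuming votes are plain booleans (`true` for valid); `CandidateVotes` and its methods are hypothetical, and the real implementation stores signed statements rather than bare flags.

```rust
use std::collections::HashMap;

pub type SessionIndex = u32;
pub type ValidatorIndex = u32;

/// Votes on one candidate, keyed by (session index, validator index).
/// `true` means "valid", `false` means "invalid".
#[derive(Default)]
pub struct CandidateVotes {
    votes: HashMap<(SessionIndex, ValidatorIndex), bool>,
}

impl CandidateVotes {
    /// Records a vote. Returns `false`, leaving the stored vote untouched, if the
    /// validator already voted the opposite way (a double vote).
    pub fn insert(&mut self, session: SessionIndex, validator: ValidatorIndex, valid: bool) -> bool {
        let key = (session, validator);
        if let Some(&previous) = self.votes.get(&key) {
            if previous != valid {
                return false;
            }
        }
        self.votes.insert(key, valid);
        true
    }

    /// Looks up a single validator's vote.
    pub fn vote_of(&self, session: SessionIndex, validator: ValidatorIndex) -> Option<bool> {
        self.votes.get(&(session, validator)).copied()
    }

    /// A dispute exists once there is at least one positive and one negative vote.
    pub fn is_disputed(&self) -> bool {
        self.votes.values().any(|&v| v) && self.votes.values().any(|&v| !v)
    }
}
```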
If an incoming dispute vote is detected, a validator must cast their own vote. The vote is determined by validating the
PoV with the Code at the time of backing the block in question.
If the validator was also a backer of the block, validation and casting an additional vote should be skipped.
If the number of votes for or against the disputed block reaches the required ⅔ supermajority (including the backing
votes), the conclusion must be recorded on chain, and the voters on the losing side as well as no-shows must be slashed
appropriately.
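The supermajority check can be sketched as follows. The text only says "⅔ supermajority"; this sketch assumes the common Byzantine bound of at most f = ⌊(n − 1)/3⌋ faulty validators and a threshold of n − f votes, which is always strictly more than two thirds of n for n ≥ 1. The function names and the `Conclusion` type are illustrative.

```rust
/// Maximum number of faulty validators tolerated among `n`: f = floor((n - 1) / 3).
pub fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Votes needed to conclude: n - f, which is strictly more than 2n/3 for any n >= 1.
pub fn supermajority_threshold(n: usize) -> usize {
    n - byzantine_threshold(n)
}

#[derive(Debug, PartialEq, Eq)]
pub enum Conclusion {
    Valid,
    Invalid,
}

/// `valid_votes` must already include the backing votes, which count as "valid".
pub fn conclusion(n_validators: usize, valid_votes: usize, invalid_votes: usize) -> Option<Conclusion> {
    let threshold = supermajority_threshold(n_validators);
    if valid_votes >= threshold {
        Some(Conclusion::Valid)
    } else if invalid_votes >= threshold {
        Some(Conclusion::Invalid)
    } else {
        // Neither side has a supermajority yet: the dispute stays open.
        None
    }
}
```

For example, with 10 eligible validators the threshold is 7 votes.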
If a block is found invalid by a dispute resolution, it must be blacklisted to avoid resyncing or further building on
that chain if other chains are available (to be detailed in the GRANDPA fork choice rule).
A dispute continues to accept votes for 1 day after it is resolved.
If a vote is received after the dispute is resolved, the vote shall still be recorded in the state root, albeit
yielding less reward.
Recording in the state root may happen in batches, at timeout expiry.
If a new active head/chain appears, and the dispute resolution was not recorded on that chain yet, the dispute
resolution or open dispute must be recorded / transplanted to that chain as well, since the disputes must be present on
all chains to make sure the offender is punished.
If a validator votes in two opposing ways, this constitutes a double vote, as in other cases (backing, approval
voting).
If a dispute is not resolved within due time, all validators are to be slashed for a small amount.
If a dispute is not resolved within due time, governance mode shall be entered for manual resolution.
If a validator unexpectedly restarts, the dispute shall be continued from the state given by the votes that were
cast and are present in persistent storage.
@@ -2,46 +2,72 @@
Here you can find definitions of a bunch of jargon, usually specific to the Polkadot project.
- **Approval Checker:** A validator who randomly self-selects so as to perform validity checks on a parablock which
is pending approval.
- **BABE:** (Blind Assignment for Blockchain Extension). The algorithm validators use to safely extend the Relay Chain.
See [the Polkadot wiki][0] for more information.
- **Backable Candidate:** A Parachain Candidate which is backed by a majority of validators assigned to a given
parachain.
- **Backed Candidate:** A Backable Candidate noted in a relay-chain block.
- **Backing:** A set of statements proving that a Parachain Candidate is backable.
- **Collator:** A node who generates Proofs-of-Validity (PoV) for blocks of a specific parachain.
- **DMP:** (Downward Message Passing). Message passing from the relay-chain to a parachain. Also there is a runtime
parachains module with the same name.
- **DMQ:** (Downward Message Queue). A message queue for messages from the relay-chain down to a parachain. A parachain
has exactly one downward message queue.
- **Extrinsic:** An element of a relay-chain block which triggers a specific entry-point of a runtime module with given
arguments.
- **GRANDPA:** (Ghost-based Recursive ANcestor Deriving Prefix Agreement). The algorithm validators use to guarantee
finality of the Relay Chain.
- **HRMP:** (Horizontally Relay-routed Message Passing). A mechanism for message passing between parachains (hence
horizontal) that leverages the relay-chain storage. Predates XCMP. Also there is a runtime parachains module with the
same name.
- **Inclusion Pipeline:** The set of steps taken to carry a Parachain Candidate from authoring, to backing, to
availability and full inclusion in an active fork of its parachain.
- **Module:** A component of the Runtime logic, encapsulating storage, routines, and entry-points.
- **Module Entry Point:** A recipient of new information presented to the Runtime. This may trigger routines.
- **Module Routine:** A piece of code executed within a module by block initialization, closing, or upon an entry point
being triggered. This may execute computation, and read or write storage.
- **MQC:** (Message Queue Chain). A cryptographic data structure that resembles an append-only linked list which doesn't
store original values but only their hashes. The whole structure is described by a single hash, referred to as the
"head". When a value is appended, its contents are hashed with the previous head, creating a hash that becomes the new
head.
- **Node:** A participant in the Polkadot network, who follows the protocols of communication and connection to other
nodes. Nodes form a peer-to-peer network topology without a central authority.
- **Parachain Candidate, or Candidate:** A proposed block for inclusion into a parachain.
- **Parablock:** A block in a parachain.
- **Parachain:** A constituent chain secured by the Relay Chain's validators.
- **Parachain Validators:** A subset of validators assigned during a period of time to back candidates for a specific
parachain.
- **On-demand parachain:** A parachain which is scheduled on a pay-as-you-go basis.
- **Lease holding parachain:** A parachain possessing an active slot lease. The lease holder is assigned a single
availability core for the duration of the lease, granting consistent blockspace scheduling at the rate of 1 parablock
per relay block.
- **PDK (Parachain Development Kit):** A toolset that allows one to develop a parachain. Cumulus is a PDK.
- **Preimage:** In our context, if `H(X) = Y` where `H` is a hash function and `Y` is the hash, then `X` is the hash
preimage.
- **Proof-of-Validity (PoV):** A stateless-client proof that a parachain candidate is valid, with respect to some
validation function.
- **PVF:** Parachain Validation Function. The validation code that is run by validators on parachains.
- **PVF Prechecking:** This is the process of initially checking the PVF when it is first added. We attempt preparation
of the PVF and make sure it succeeds within a given timeout, plus some additional checks.
- **PVF Preparation:** This is the process of preparing the WASM blob and includes both prevalidation and compilation.
As there is no prevalidation right now, preparation just consists of compilation.
- **Relay Parent:** A block in the relay chain, referred to in a context where work is being done against the state at
this block.
- **Runtime:** The relay-chain state machine.
- **Runtime Module:** See Module.
- **Runtime API:** A means for the node-side behavior to access structured information based on the state of a fork of
the blockchain.
- **Subsystem:** A long-running task which is responsible for carrying out a particular category of work.
- **UMP:** (Upward Message Passing) A vertical message passing mechanism from a parachain to the relay chain.
- **Validator:** Specially-selected node in the network who is responsible for validating parachain blocks and issuing
attestations about their validity.
- **Validation Function:** A piece of Wasm code that describes the state-transition function of a parachain.
- **VMP:** (Vertical Message Passing) A family of mechanisms that are responsible for message exchange between the relay
chain and parachains.
- **XCMP:** (Cross-Chain Message Passing) A type of horizontal message passing (i.e. between parachains) that allows
secure message passing directly between parachains and has minimal resource requirements from the relay chain, and is
thus highly scalable.
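The MQC entry in the glossary can be sketched in a few lines of Rust. This sketch assumes each link hashes the previous head together with the hash of the new message, and it uses the standard library's `DefaultHasher` purely as a stand-in: that hasher is not cryptographic, so a real MQC needs a secure 256-bit hash. `Mqc` and `h` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash. DefaultHasher is NOT cryptographic; it only keeps the sketch dependency-free.
fn h<T: Hash + ?Sized>(data: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Message Queue Chain: only the current head is stored, never the messages themselves.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Mqc {
    pub head: u64,
}

impl Mqc {
    /// Appending hashes the previous head together with the hash of the new message.
    pub fn append(&mut self, message: &[u8]) {
        self.head = h(&(self.head, h(message)));
    }
}
```

Because each head commits to the previous one, the same messages appended in a different order produce a different head.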
## See Also
@@ -1,9 +1,9 @@
# Messaging Overview
The Polkadot Host has a few mechanisms that are responsible for message passing. They can be generally divided into two
categories: Horizontal and Vertical. Horizontal Message Passing (HMP) refers to mechanisms that are responsible for
exchanging messages between parachains. Vertical Message Passing (VMP) is used for communication between the relay chain
and parachains.
## Vertical Message Passing
@@ -19,35 +19,34 @@ digraph {
Downward Message Passing (DMP) is a mechanism for delivering messages to parachains from the relay chain.
Each parachain has its own queue that stores all pending inbound downward messages. A parachain doesn't have to process
all messages at once; however, there are rules as to how the downward message queue should be processed. Currently, at
least one message must be consumed per candidate if the queue is not empty. The downward message queue doesn't have a
cap on its size and it is up to the relay-chain to put in place mechanisms that prevent spamming.
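The processing rule above can be sketched as an acceptance check on the number of messages a candidate claims to have processed. This minimal sketch assumes messages are opaque byte vectors and that processing always starts at the front of the queue; `DownwardMessageQueue`, `check_processed` and `DmpError` are hypothetical names. Like the real queue described above, it has no size cap.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq, Eq)]
pub enum DmpError {
    MustProcessAtLeastOne,
    ProcessedMoreThanQueued,
}

/// Unbounded queue of pending downward messages for one parachain.
#[derive(Default)]
pub struct DownwardMessageQueue {
    messages: VecDeque<Vec<u8>>,
}

impl DownwardMessageQueue {
    pub fn push(&mut self, message: Vec<u8>) {
        self.messages.push_back(message);
    }

    pub fn len(&self) -> usize {
        self.messages.len()
    }

    pub fn is_empty(&self) -> bool {
        self.messages.is_empty()
    }

    /// Acceptance check for a candidate that claims to have processed `processed` messages.
    pub fn check_processed(&self, processed: usize) -> Result<(), DmpError> {
        if processed > self.messages.len() {
            return Err(DmpError::ProcessedMoreThanQueued);
        }
        // A non-empty queue must make progress with every candidate.
        if !self.messages.is_empty() && processed == 0 {
            return Err(DmpError::MustProcessAtLeastOne);
        }
        Ok(())
    }

    /// Removes the processed messages from the front once the candidate is accepted.
    pub fn prune(&mut self, processed: usize) -> Result<(), DmpError> {
        self.check_processed(processed)?;
        self.messages.drain(..processed);
        Ok(())
    }
}
```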
Upward Message Passing (UMP) is a mechanism responsible for delivering messages in the opposite direction: from a
parachain up to the relay chain. Upward messages are essentially byte blobs. However, they are interpreted by the
relay-chain according to the XCM standard.
The XCM standard is a common vocabulary of messages. The XCM standard doesn't require a particular interpretation of a
message. However, the parachains host (e.g. Polkadot) guarantees certain semantics for those.
Moreover, while most XCM messages are handled by the on-chain XCM interpreter, some of the messages are special-cased.
Specifically, those messages can be checked as part of the acceptance criteria, and thus invalid messages would lead to
rejecting the candidate itself.
One kind of such a message is `Xcm::Transact`. This upward message can be seen as a way for a parachain to execute
arbitrary entrypoints on the relay-chain. `Xcm::Transact` messages resemble regular extrinsics with the exception that
they originate from a parachain.
The payload of `Xcm::Transact` messages is referred to as `Dispatchable`. When a candidate with such a message is
enacted, the dispatchables are put into a queue corresponding to the parachain. There can be only so many dispatchables
in that queue at once. The weight that processing of the dispatchables can consume is limited by a preconfigured value.
Therefore, it is possible that some dispatchables will be left for later blocks. To make the dispatching fairer, the
queues are processed turn-by-turn in a round-robin fashion.
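The bounded per-parachain queues and weight-limited round-robin processing can be sketched as follows. This is a minimal sketch under stated assumptions: each dispatchable carries a known weight, one dispatchable is taken per parachain per turn, and a cursor remembers which parachain ran out of budget so it goes first in the next block. `DispatchQueues`, `Dispatchable` and their fields are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

pub type ParaId = u32;

/// A queued `Xcm::Transact` payload; `id` and `weight` are illustrative fields.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Dispatchable {
    pub id: u32,
    pub weight: u64,
}

pub struct DispatchQueues {
    queues: BTreeMap<ParaId, VecDeque<Dispatchable>>,
    max_queue_len: usize,
    // Parachain whose turn it is when the next block starts processing.
    cursor: ParaId,
}

impl DispatchQueues {
    pub fn new(max_queue_len: usize) -> Self {
        DispatchQueues { queues: BTreeMap::new(), max_queue_len, cursor: 0 }
    }

    /// Enqueues a dispatchable; returns `false` if the parachain's queue is full.
    pub fn enqueue(&mut self, para: ParaId, d: Dispatchable) -> bool {
        let queue = self.queues.entry(para).or_default();
        if queue.len() >= self.max_queue_len {
            return false;
        }
        queue.push_back(d);
        true
    }

    /// Takes one dispatchable per parachain per turn until the weight limit is reached.
    /// Returns the executed (parachain, dispatchable id) pairs in order.
    pub fn process(&mut self, weight_limit: u64) -> Vec<(ParaId, u32)> {
        let mut order: Vec<ParaId> = self.queues.keys().copied().collect();
        // Start this block's round where the previous block ran out of weight.
        let start = order.iter().position(|&p| p >= self.cursor).unwrap_or(0);
        order.rotate_left(start);

        let mut used = 0u64;
        let mut executed = Vec::new();
        'rounds: loop {
            let mut progressed = false;
            for &para in &order {
                let queue = self.queues.get_mut(&para).expect("para is a key of queues");
                let Some(next) = queue.front() else { continue };
                let weight = next.weight;
                if used + weight > weight_limit {
                    // Out of budget: this parachain goes first in the next block.
                    self.cursor = para;
                    break 'rounds;
                }
                used += weight;
                let d = queue.pop_front().expect("front was Some");
                executed.push((para, d.id));
                progressed = true;
            }
            if !progressed {
                break;
            }
        }
        executed
    }
}
```

In this sketch a dispatchable heavier than the whole weight limit would stall its queue forever; real code has to handle that case separately.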
The second category of special-cased XCM messages is for horizontal messaging channel management, namely messages meant
to request opening and closing HRMP channels (HRMP will be described below).
## Horizontal Message Passing
@@ -77,29 +76,28 @@ The most important member of this family is XCMP.
> XCMP is currently under construction and details are subject to change.
XCMP is a message passing mechanism between parachains that requires minimal involvement of the relay chain. The relay
chain provides means for sending parachains to authenticate messages sent to recipient parachains.
Semantically, communication occurs through so-called channels. A channel is unidirectional and has two endpoints, one
for the sender and one for the recipient. A channel can be opened only if both parties agree, and can be closed
unilaterally.
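The channel semantics above (directed, opened by mutual agreement, closed unilaterally) can be sketched as a set of directed `(sender, recipient)` pairs. This sketch assumes either endpoint may close a channel on its own and that opening is a two-step request/accept handshake; `Channels` and its methods are hypothetical and say nothing about how HRMP or XCMP encode these requests.

```rust
use std::collections::HashSet;

pub type ParaId = u32;

/// Channels are directed, so they are keyed by (sender, recipient).
#[derive(Default)]
pub struct Channels {
    requested: HashSet<(ParaId, ParaId)>,
    open: HashSet<(ParaId, ParaId)>,
}

impl Channels {
    /// The sender proposes a channel towards the recipient.
    pub fn request_open(&mut self, sender: ParaId, recipient: ParaId) {
        self.requested.insert((sender, recipient));
    }

    /// The recipient accepts; the channel opens only if a matching request exists.
    pub fn accept_open(&mut self, sender: ParaId, recipient: ParaId) -> bool {
        if self.requested.remove(&(sender, recipient)) {
            self.open.insert((sender, recipient));
            true
        } else {
            false
        }
    }

    /// Either endpoint may close the channel alone; no agreement is needed.
    pub fn close(&mut self, by: ParaId, sender: ParaId, recipient: ParaId) -> bool {
        if by != sender && by != recipient {
            return false;
        }
        self.open.remove(&(sender, recipient))
    }

    pub fn is_open(&self, sender: ParaId, recipient: ParaId) -> bool {
        self.open.contains(&(sender, recipient))
    }
}
```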
Only the channel metadata is stored on the relay-chain in a very compact form: all messages and their contents sent by
the sender parachain are encoded using only one root hash. This root is referred to as the MQC head.
The authenticity of the messages must be proven to the receiving party using that root hash at candidate authoring
time. The proof stems from the relay parent storage that contains the root hash of the channel. Since not all messages
are required to be processed by the receiver's candidate, only the processed messages are supplied (i.e. preimages);
the rest are provided as hashes.
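Assuming, as a sketch, that each MQC link is computed as `H(previous_head, H(message))`, a receiver can check a mix of full preimages and bare hashes against the head read from relay-parent storage by re-deriving the chain. `DefaultHasher` is used only as a non-cryptographic stand-in for the real hash, and `Entry`, `verify` and `h` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in hash. DefaultHasher is NOT cryptographic; it only keeps the sketch dependency-free.
fn h<T: Hash + ?Sized>(data: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// One message of the channel as supplied to the receiver, in send order.
pub enum Entry<'a> {
    /// A message the candidate processes: the full preimage is supplied.
    Processed(&'a [u8]),
    /// A message left for later: only its hash is supplied.
    Unprocessed(u64),
}

/// Re-derives the MQC head from `start_head` and compares it with the head
/// read from relay-parent storage.
pub fn verify(start_head: u64, entries: &[Entry<'_>], expected_head: u64) -> bool {
    let head = entries.iter().fold(start_head, |head, entry| {
        let message_hash = match entry {
            Entry::Processed(message) => h(*message),
            Entry::Unprocessed(hash) => *hash,
        };
        h(&(head, message_hash))
    });
    head == expected_head
}
```

Hashing the message separately before chaining it is what lets unprocessed messages be supplied as hashes only.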
Further details can be found at the official repository for the [Cross-Consensus Message Format
(XCM)](https://github.com/paritytech/xcm-format/blob/master/README.md), as well as at the [W3F research
website](https://research.web3.foundation/en/latest/polkadot/XCMP.html) and [this
blogpost](https://medium.com/web3foundation/polkadots-messaging-scheme-b1ec560908b7).
HRMP (Horizontally Relay-routed Message Passing) is a stopgap that predates XCMP. Semantically, it mimics XCMP's
interface. The crucial difference from XCMP, though, is that all the messages are stored in the relay-chain storage. This
keeps things simple, but at the same time makes HRMP more demanding in terms of resources and thus more expensive.
Once XCMP is available we expect to retire HRMP.
@@ -2,30 +2,49 @@
## Design Goals
* Modularity: Components of the system should be as self-contained as possible. Communication boundaries between
components should be well-defined and mockable. This is key to creating testable, easily reviewable code.
* Minimizing side effects: Components of the system should aim to minimize side effects and to communicate with other
components via message-passing.
* Operational Safety: The software will be managing signing keys where conflicting messages can lead to large amounts of
value to be slashed. Care should be taken to ensure that no messages are signed incorrectly or in conflict with each
other.
The architecture of the node-side behavior aims to embody the Rust principles of ownership and message-passing to create
clean, isolatable code. Each resource should have a single owner, with minimal sharing where unavoidable.
Many operations that need to be carried out involve the network, which is asynchronous. This asynchrony affects all core
subsystems that rely on the network as well. The approach of hierarchical state machines is well-suited to this kind of
environment.
We introduce
## Components
The node architecture consists of the following components:
* The Overseer (and subsystems): A hierarchy of state machines where an overseer supervises subsystems. Subsystems can
contain their own internal hierarchy of jobs. This is elaborated on in the next section on Subsystems.
* A block proposer: Logic triggered by the consensus algorithm of the chain when the node should author a block.
* A GRANDPA voting rule: A strategy for selecting chains to vote on in the GRANDPA algorithm to ensure that only valid
parachain candidates appear in finalized relay-chain blocks.
## Assumptions
The Node-side code comes with a set of assumptions that we build upon. These assumptions encompass most of the
fundamental blockchain functionality.
We assume the following constraints regarding provided basic functionality:
* The underlying **consensus** algorithm, whether it is BABE or SASSAFRAS, is implemented.
* There is a **chain synchronization** protocol which will search for and download the longest available chains at all
times.
* The **state** of all blocks at the head of the chain is available. There may be **state pruning** such that only the
  state of the last `k` blocks behind the last finalized block is available, as well as the state of all their
  descendants. This assumption implies that the state of all active leaves and their last `k` ancestors is available.
  The underlying implementation is expected to support `k` of a few hundred blocks, but we reduce this to a very
  conservative `k=5` for our purposes.
* There is an underlying **networking** framework which provides **peer discovery** services which will provide us
with peers and will not create "loopback" connections to our own node. The number of peers we will have is assumed
to be bounded at 1000.
* There is a **transaction pool** and a **transaction propagation** mechanism which maintains a set of current
  transactions and distributes them to connected peers. Current transactions are those which are not outdated relative
  to some "best" fork of the chain, which is part of the active heads, and have not been included in the best fork.
@@ -2,6 +2,9 @@
The approval subsystems implement the node-side of the [Approval Protocol](../../protocol-approval.md).
We make a divide between the [assignment/voting logic](approval-voting.md) and the [distribution
logic](approval-distribution.md) that distributes assignment certifications and approval votes. The logic in the
assignment and voting also informs the GRANDPA voting rule on how to vote.
These subsystems are intended to flag issues and begin participating in live disputes. Dispute subsystems also track all
observed votes (backing, approval, and dispute-specific) by all validators on all candidates.
@@ -2,50 +2,73 @@
A subsystem for the distribution of assignments and approvals for approval checks on candidates over the network.
The [Approval Voting](approval-voting.md) subsystem is responsible for active participation in a protocol designed to
select a sufficient number of validators to check each and every candidate which appears in the relay chain. Statements
of participation in this checking process are divided into two kinds:
* **Assignments** indicate that validators have been selected to do checking
* **Approvals** indicate that validators have checked and found the candidate satisfactory.
The [Approval Voting](approval-voting.md) subsystem handles all the issuing and tallying of this protocol, but this
subsystem is responsible for the disbursal of statements among the validator-set.
The inclusion pipeline of candidates concludes after availability, and only after inclusion do candidates actually get
pushed into the approval checking pipeline. As such, this protocol deals with the candidates _made available by_
particular blocks, as opposed to the candidates which actually appear within those blocks, which are the candidates
_backed by_ those blocks. Unless stated otherwise, whenever we reference a candidate partially by block hash, we are
referring to the set of candidates _made available by_ those blocks.
We implement this protocol as a gossip protocol, and like other parachain-related gossip protocols our primary concerns
are about ensuring fast message propagation while maintaining an upper bound on the number of messages any given node
must store at any time.
Approval messages should always follow assignments, so we need to be able to discern two pieces of information based on
our [View](../../types/network.md#universal-types):
1. Is a particular assignment relevant under a given `View`?
2. Is a particular approval relevant to any assignment in a set?
For our own local view, these two queries must not yield false negatives. When applied to our peers' views, it is
acceptable for them to yield false negatives. This is because our peers' views may be beyond ours, and we are not
capable of fully evaluating them. Once we have caught up, we can check again for false negatives to continue
distributing.
For assignments, what we need to be checking is whether we are aware of the (block, candidate) pair that the assignment
references. For approvals, we need to be aware of an assignment by the same validator which references the candidate
being approved.
However, awareness on its own of a (block, candidate) pair would imply that even ancient candidates all the way back to
the genesis are relevant. We are actually not interested in anything before finality.
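The two relevance queries can be sketched as predicates over local knowledge. This is a minimal illustration. It assumes hypothetical simplified types (`Hash`, `CandidateIndex`, `ValidatorIndex` as plain integers) and a hypothetical `LocalKnowledge` struct rather than the subsystem's real state. Following the paragraph above, blocks at or below the finalized number are treated as irrelevant.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified stand-ins for the real primitive types.
type Hash = u64;
type CandidateIndex = u32;
type ValidatorIndex = u32;
type BlockNumber = u32;

/// Hypothetical local knowledge: numbers of known blocks, the (block, candidate)
/// pairs they made available, and the assignments we have seen per validator.
#[derive(Default)]
struct LocalKnowledge {
    finalized_number: BlockNumber,
    block_numbers: HashMap<Hash, BlockNumber>,
    candidates: HashSet<(Hash, CandidateIndex)>,
    assignments: HashSet<(Hash, CandidateIndex, ValidatorIndex)>,
}

impl LocalKnowledge {
    /// Query 1: an assignment is relevant if it references a known
    /// (block, candidate) pair in a block above the finalized number.
    fn assignment_relevant(&self, block: Hash, candidate: CandidateIndex) -> bool {
        match self.block_numbers.get(&block) {
            Some(&n) if n > self.finalized_number => self.candidates.contains(&(block, candidate)),
            _ => false,
        }
    }

    /// Query 2: an approval is relevant if we already hold an assignment
    /// by the same validator for the same candidate.
    fn approval_relevant(&self, block: Hash, candidate: CandidateIndex, validator: ValidatorIndex) -> bool {
        self.assignment_relevant(block, candidate)
            && self.assignments.contains(&(block, candidate, validator))
    }
}
```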
We gossip assignments along a grid topology produced by the [Gossip Support Subsystem](../utility/gossip-support.md) and
also to a few random peers. The first time we accept an assignment or approval that originates from a validator peer in
a shared dimension of the grid, regardless of which peer we received it from, we propagate the message to validator
peers in the unshared dimension as well as to a few random peers.
But, in case these mechanisms don't work on their own, we need to trade bandwidth for protocol liveness by introducing
aggression.
Aggression has 3 levels:
* Aggression Level 0: The basic behaviors described above.
* Aggression Level 1: The originator of a message sends to all peers. Other peers follow the rules above.
* Aggression Level 2: All peers send all messages to all their row and column neighbors. This means that each validator
  will, on average, receive each message approximately `2 * sqrt(n)` times, where `n` is the number of validators.
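The `2 * sqrt(n)` figure follows from the grid shape, and the sketch below computes row and column neighbors in a square grid to show it. It assumes validators are laid out row-major in a grid whose side is the smallest `s` with `s * s >= n`. The real topology is produced by the Gossip Support Subsystem and may differ, and `grid_neighbors` is a hypothetical helper.

```rust
/// Row and column neighbors of validator `i` among `n` validators laid out
/// row-major in a square grid (hypothetical layout, for illustration only).
fn grid_neighbors(i: usize, n: usize) -> Vec<usize> {
    assert!(i < n, "validator index out of range");
    // Smallest side length that fits all validators.
    let mut side = 1;
    while side * side < n {
        side += 1;
    }
    let (row, col) = (i / side, i % side);
    let mut out = Vec::new();
    // Same row, excluding ourselves and empty trailing cells.
    for c in 0..side {
        let j = row * side + c;
        if j < n && c != col {
            out.push(j);
        }
    }
    // Same column, excluding ourselves and empty trailing cells.
    for r in 0..side {
        let j = r * side + col;
        if j < n && r != row {
            out.push(j);
        }
    }
    out
}
```

For `n = 100`, every validator has 9 row neighbors and 9 column neighbors. Under Aggression Level 2 it can therefore receive each message from up to 18 peers, which is roughly `2 * sqrt(n)`.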
These aggression levels are chosen based on how long a block has taken to finalize: assignments and approvals related to
the unfinalized block will be propagated with more aggression. In particular, it's only the earliest unfinalized blocks
that aggression should be applied to, because descendants may be unfinalized only by virtue of being descendants.
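Selecting the aggression level can be sketched as follows. The sketch uses the gap between the last finalized block and the best block as a stand-in for "how long a block has taken to finalize". `L1_THRESHOLD` and `L2_THRESHOLD` are hypothetical placeholders, not the subsystem's real configuration. As the paragraph above requires, only the earliest unfinalized block is escalated.

```rust
/// Hypothetical thresholds, in blocks of finality lag. Placeholder values only.
const L1_THRESHOLD: u32 = 8;
const L2_THRESHOLD: u32 = 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Aggression {
    Level0,
    Level1,
    Level2,
}

/// Aggression for messages about the unfinalized block at `block_number`.
fn aggression_for(block_number: u32, finalized_number: u32, best_number: u32) -> Aggression {
    // Only the earliest unfinalized block is escalated: its descendants may be
    // unfinalized merely because they descend from it.
    if finalized_number.checked_add(1) != Some(block_number) {
        return Aggression::Level0;
    }
    let lag = best_number.saturating_sub(finalized_number);
    if lag >= L2_THRESHOLD {
        Aggression::Level2
    } else if lag >= L1_THRESHOLD {
        Aggression::Level1
    } else {
        Aggression::Level0
    }
}
```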
## Protocol
Input:
* `ApprovalDistributionMessage::NewBlocks`
* `ApprovalDistributionMessage::DistributeAssignment`
* `ApprovalDistributionMessage::DistributeApproval`
* `ApprovalDistributionMessage::NetworkBridgeUpdate`
* `OverseerSignal::BlockFinalized`
Output:
* `ApprovalVotingMessage::CheckAndImportAssignment`
* `ApprovalVotingMessage::CheckAndImportApproval`
* `NetworkBridgeMessage::SendValidationMessage::ApprovalDistribution`
## Functionality
@@ -134,28 +157,37 @@ Iterate over every `BlockEntry` and remove `PeerId` from it.
#### `NetworkBridgeEvent::OurViewChange`
Remove entries in `pending_known` for all hashes not present in the view. Ensure a vector is present in `pending_known`
for each hash in the view that does not have an entry in `blocks`.
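This handler can be sketched as follows. The sketch reduces `blocks` to a set of known hashes and represents buffered messages as hypothetical `(peer id, payload)` tuples.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;
/// Hypothetical buffered message: (peer id, payload).
type PendingMessage = (u64, String);

/// Apply our new view to `pending_known`, following the two steps above.
fn on_our_view_change(
    view: &[Hash],
    blocks: &HashSet<Hash>,
    pending_known: &mut HashMap<Hash, Vec<PendingMessage>>,
) {
    // Drop buffers for hashes that have left the view.
    let in_view: HashSet<Hash> = view.iter().copied().collect();
    pending_known.retain(|h, _| in_view.contains(h));
    // Ensure a (possibly empty) buffer exists for every viewed hash without a `BlockEntry`.
    for h in view {
        if !blocks.contains(h) {
            pending_known.entry(*h).or_default();
        }
    }
}
```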
#### `NetworkBridgeEvent::PeerViewChange`
Invoke `unify_with_peer(peer, view)` to catch them up to messages we have.
We also need to use the `view.finalized_number` to remove the `PeerId` from any blocks that it won't be wanting
information about anymore. Note that we have to guard against peers doing crazy stuff like jumping their
`finalized_number` forward 10 trillion blocks to try and get us stuck in a loop for ages.
One of the safeguards we can implement is to reject view updates from peers where the new `finalized_number` is less
than the previous one.
We augment that by defining `constrain(x)` to output `x` clamped to the range between the first and last numbers in
`state.blocks_by_number`.
From there, we can loop backwards from `constrain(view.finalized_number)` until `constrain(last_view.finalized_number)`
is reached, removing the `PeerId` from all `BlockEntry`s referenced at that height. We can break the loop early if we
ever exit the bound supplied by the first block in `state.blocks_by_number`.
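The pruning loop can be sketched as follows. It assumes simplified integer types and a hypothetical `State` that holds only `blocks_by_number` and a per-hash `known_by` set. Instead of clamping by hand with `constrain`, this variant relies on `BTreeMap::range`. That method only visits heights that are actually stored, so a huge forward jump in a peer's `finalized_number` costs at most one pass over our stored blocks.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

type Hash = u64;
type PeerId = u64;
type BlockNumber = u32;

/// Hypothetical simplified state: hashes by height, and peers known to have each block.
struct State {
    blocks_by_number: BTreeMap<BlockNumber, Vec<Hash>>,
    known_by: HashMap<Hash, HashSet<PeerId>>,
}

impl State {
    /// Handle a peer's finalized number moving from `old` to `new`.
    /// Returns `false` if the update is rejected.
    fn on_peer_finalized(&mut self, peer: PeerId, old: BlockNumber, new: BlockNumber) -> bool {
        // Safeguard: reject views whose finalized number goes backwards.
        if new < old {
            return false;
        }
        // Walk backwards over stored heights in [old, new]; `range` never visits
        // heights we do not store, so the loop is bounded by our own data.
        for (_, hashes) in self.blocks_by_number.range(old..=new).rev() {
            for h in hashes {
                if let Some(peers) = self.known_by.get_mut(h) {
                    peers.remove(&peer);
                }
            }
        }
        true
    }
}
```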
#### `NetworkBridgeEvent::PeerMessage`
If the block hash referenced by the message exists in `pending_known`, add it to the vector of pending messages and
return.
If the message is of type `ApprovalDistributionV1Message::Assignment(assignment_cert, claimed_index)`, then call
`import_and_circulate_assignment(MessageSource::Peer(sender), assignment_cert, claimed_index)`.
If the message is of type `ApprovalDistributionV1Message::Approval(approval_vote)`, then call
`import_and_circulate_approval(MessageSource::Peer(sender), approval_vote)`.
### Subsystem Updates
@@ -164,7 +196,8 @@ If the message is of type `ApprovalDistributionV1Message::Approval(approval_vote
Create `BlockEntry` and `CandidateEntries` for all blocks.
For all entries in `pending_known`:
* If there is now an entry under `blocks` for the block hash, drain all messages and import with
`import_and_circulate_assignment` and `import_and_circulate_approval`.
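Draining the buffered messages can be sketched as follows. The sketch uses a hypothetical `PendingMessage` enum and omits the sending peer. It returns the messages instead of calling `import_and_circulate_assignment` / `import_and_circulate_approval` directly. Each block's messages keep their arrival order, so an assignment buffered before its approval is still imported first.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;

/// Hypothetical simplified buffered message (sender omitted).
#[derive(Debug, PartialEq)]
enum PendingMessage {
    Assignment(u32), // claimed candidate index
    Approval(u32),   // candidate index
}

/// Remove and return, in arrival order per block, all buffered messages for
/// blocks that now have an entry in `blocks`.
fn drain_ready(
    blocks: &HashSet<Hash>,
    pending_known: &mut HashMap<Hash, Vec<PendingMessage>>,
) -> Vec<(Hash, PendingMessage)> {
    let ready: Vec<Hash> = pending_known.keys().copied().filter(|h| blocks.contains(h)).collect();
    let mut out = Vec::new();
    for h in ready {
        if let Some(msgs) = pending_known.remove(&h) {
            out.extend(msgs.into_iter().map(|m| (h, m)));
        }
    }
    out
}
```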
For all peers:
* Compute `view_intersection` as the intersection of the peer's view blocks with the hashes of the new blocks.
@@ -180,7 +213,8 @@ Call `import_and_circulate_approval` with `MessageSource::Local`.
#### `OverseerSignal::BlockFinalized`
Prune all lists from `blocks_by_number` with number less than or equal to `finalized_number`. Prune all the
`BlockEntry`s referenced by those lists.
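If `blocks_by_number` is kept ordered by height, the pruning reduces to a single split. The sketch assumes a `BTreeMap` for `blocks_by_number` and uses a placeholder `String` in place of `BlockEntry`.

```rust
use std::collections::{BTreeMap, HashMap};

type Hash = u64;
type BlockNumber = u32;

/// Prune every height `<= finalized_number` and the entries those heights reference.
fn prune_finalized(
    blocks_by_number: &mut BTreeMap<BlockNumber, Vec<Hash>>,
    blocks: &mut HashMap<Hash, String>,
    finalized_number: BlockNumber,
) {
    // `split_off(&k)` returns the entries with keys >= k, which are the ones we keep.
    let keep = match finalized_number.checked_add(1) {
        Some(next) => blocks_by_number.split_off(&next),
        None => BTreeMap::new(), // every height is <= u32::MAX
    };
    let pruned = std::mem::replace(blocks_by_number, keep);
    for hash in pruned.into_values().flatten() {
        blocks.remove(&hash);
    }
}
```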
### Utility
@@ -192,9 +226,14 @@ enum MessageSource {
}
```
#### `import_and_circulate_assignment(...)`
`import_and_circulate_assignment(source: MessageSource, assignment: IndirectAssignmentCert, claimed_candidate_index:
CandidateIndex)`
Imports an assignment cert referenced by block hash and candidate index. As a postcondition, if the cert is valid, it
will have distributed the cert to all peers who have the block in their view, with the exclusion of the peer referenced
by the `MessageSource`.
We maintain a few invariants:
* we only send an assignment to a peer after we add its fingerprint to our knowledge
@@ -202,61 +241,84 @@ We maintain a few invariants:
The algorithm is the following:
* Load the `BlockEntry` using `assignment.block_hash`. If it does not exist, report the source if it is
`MessageSource::Peer` and return.
* Compute a fingerprint for the `assignment` using `claimed_candidate_index`.
* If the source is `MessageSource::Peer(sender)`:
  * Check whether `sender` appears under `known_by` and whether the fingerprint is in the peer's knowledge. If the peer
    does not know the block, report it for providing data out-of-view and proceed. If the peer does know the block and
    its `sent` knowledge contains the fingerprint, report it for providing duplicate data and return; otherwise, insert
    the fingerprint into its `received` knowledge and return.
  * If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation
    boost, add the fingerprint to the peer's knowledge only if it knows about the block, and return. Note that we must
    do this after checking for out-of-view data and whether the peer knows about the block, to avoid being spammed. If
    we did this check earlier, a peer could provide data out-of-view repeatedly and be rewarded for it.
* Dispatch `ApprovalVotingMessage::CheckAndImportAssignment(assignment)` and wait for the response.
* If the result is `AssignmentCheckResult::Accepted`
    * If the vote was accepted but not a duplicate, give the peer a positive reputation boost.
    * Add the fingerprint to both our and the peer's knowledge in the `BlockEntry`. Note that we only do this after
      making sure we have the right fingerprint.
* If the result is `AssignmentCheckResult::AcceptedDuplicate`, add the fingerprint to the peer's knowledge if it
knows about the block and return.
* If the result is `AssignmentCheckResult::TooFarInFuture`, mildly punish the peer and return.
* If the result is `AssignmentCheckResult::Bad`, punish the peer and return.
* If the source is `MessageSource::Local(CandidateIndex)`
  * Check if the fingerprint appears under the `BlockEntry`'s knowledge. If not, add it.
* Load the candidate entry for the given candidate index. It should exist unless there is a logic error in the
approval voting subsystem.
* Set the approval state for the validator index to `ApprovalState::Assigned` unless the approval state is set
already. This should not happen as long as the approval voting subsystem instructs us to ignore duplicate
assignments.
* Dispatch an `ApprovalDistributionV1Message::Assignment(assignment, candidate_index)` to all peers in the
  `BlockEntry`'s `known_by` set, excluding the peer in the `source`, if `source` has kind `MessageSource::Peer`. Add
  the fingerprint of the assignment to the knowledge of each peer.
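The verdict handling and the final circulation step can be sketched as follows. This is a minimal illustration under several assumptions:

* Fingerprints are hypothetical `(block hash, candidate index, validator index)` tuples.
* Reputation changes use a placeholder `Rep` enum, not the subsystem's real cost/benefit values.
* Only the peer-sourced verdict path is modeled.
* Beyond what the text specifies, `circulate` also skips peers whose knowledge already contains the fingerprint.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical simplified stand-ins, not the real Polkadot types.
type PeerId = u64;
/// (block hash, candidate index, validator index) identifying one assignment.
type Fingerprint = (u64, u32, u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AssignmentCheckResult {
    Accepted,
    AcceptedDuplicate,
    TooFarInFuture,
    Bad,
}

/// Placeholder reputation changes; the real cost/benefit values differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rep {
    Benefit,
    MinorCost,
    Cost,
}

#[derive(Default)]
struct PeerKnowledge {
    sent: HashSet<Fingerprint>,
    received: HashSet<Fingerprint>,
}

#[derive(Default)]
struct BlockEntry {
    knowledge: HashSet<Fingerprint>,
    known_by: HashMap<PeerId, PeerKnowledge>,
}

impl BlockEntry {
    /// Apply the approval-voting verdict for an assignment received from `peer`.
    /// Returns the reputation change to report and whether to circulate it.
    fn apply_check_result(
        &mut self,
        peer: PeerId,
        fp: Fingerprint,
        result: AssignmentCheckResult,
    ) -> (Option<Rep>, bool) {
        match result {
            AssignmentCheckResult::Accepted => {
                // The cert is verified, so only now is the fingerprint recorded.
                self.knowledge.insert(fp);
                if let Some(k) = self.known_by.get_mut(&peer) {
                    k.received.insert(fp);
                }
                (Some(Rep::Benefit), true)
            }
            AssignmentCheckResult::AcceptedDuplicate => {
                // Remember that the peer has it, but neither reward nor re-circulate.
                if let Some(k) = self.known_by.get_mut(&peer) {
                    k.received.insert(fp);
                }
                (None, false)
            }
            AssignmentCheckResult::TooFarInFuture => (Some(Rep::MinorCost), false),
            AssignmentCheckResult::Bad => (Some(Rep::Cost), false),
        }
    }

    /// Send to every peer in `known_by` except `source`, recording the
    /// fingerprint as sent. Returns the recipients in ascending order.
    fn circulate(&mut self, source: Option<PeerId>, fp: Fingerprint) -> Vec<PeerId> {
        // Invariant from the text: only send what is already in our knowledge.
        debug_assert!(self.knowledge.contains(&fp));
        let mut targets = Vec::new();
        for (peer, k) in self.known_by.iter_mut() {
            // Assumption: skip peers that already hold this fingerprint.
            if Some(*peer) == source || k.sent.contains(&fp) || k.received.contains(&fp) {
                continue;
            }
            k.sent.insert(fp);
            targets.push(*peer);
        }
        targets.sort_unstable();
        targets
    }
}
```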
#### `import_and_circulate_approval(source: MessageSource, approval: IndirectSignedApprovalVote)`
Imports an approval signature referenced by block hash and candidate index:
* Load the `BlockEntry` using `approval.block_hash` and the candidate entry using `approval.candidate_entry`. If
either does not exist, report the source if it is `MessageSource::Peer` and return.
* Compute a fingerprint for the approval.
* Compute a fingerprint for the corresponding assignment. If the `BlockEntry`'s knowledge does not contain that
fingerprint, then report the source if it is `MessageSource::Peer` and return. All references to a fingerprint after
this refer to the approval's, not the assignment's.
* If the source is `MessageSource::Peer(sender)`:
  * Check whether `sender` appears under `known_by` and whether the fingerprint is in the peer's knowledge. If the peer
    does not know the block, report it for providing data out-of-view and proceed. If the peer does know the block and
    its `sent` knowledge contains the fingerprint, report it for providing duplicate data and return; otherwise, insert
    the fingerprint into its `received` knowledge and return.
  * If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation
    boost, add the fingerprint to the peer's knowledge only if it knows about the block, and return. Note that we must
    do this after checking for out-of-view data to avoid being spammed. If we did this check earlier, a peer could
    provide data out-of-view repeatedly and be rewarded for it.
* Dispatch `ApprovalVotingMessage::CheckAndImportApproval(approval)` and wait for the response.
* If the result is `VoteCheckResult::Accepted(())`:
* Give the peer a positive reputation boost and add the fingerprint to both our and the peer's knowledge.
* If the result is `VoteCheckResult::Bad`:
* Report the peer and return.
* Load the candidate entry for the given candidate index. It should exist unless there is a logic error in the
approval voting subsystem.
* Set the approval state for the validator index to `ApprovalState::Approved`. It should already be in the `Assigned`
state as our `BlockEntry` knowledge contains a fingerprint for the assignment.
* Dispatch an `ApprovalDistributionV1Message::Approval(approval)` to all peers in the `BlockEntry`'s `known_by` set,
  excluding the peer in the `source`, if `source` has kind `MessageSource::Peer`. Add the fingerprint of the
  approval to the knowledge of each peer. Note that this obeys the politeness conditions:
* We guarantee elsewhere that all peers within `known_by` are aware of all assignments relative to the block.
* We've checked that this specific approval has a corresponding assignment within the `BlockEntry`.
* Thus, all peers are aware of the assignment or have a message to them in-flight which will make them so.
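The gating rule above says an approval is only accepted once the matching assignment fingerprint is in the `BlockEntry`'s knowledge. The sketch below illustrates that rule under some assumptions:

* It uses hypothetical simplified types.
* Fingerprints are an enum, so the assignment and approval fingerprints for the same (block, candidate, validator) stay distinct.
* The state update models the steps that follow a `VoteCheckResult::Accepted(())` verdict, which is taken as given here.

```rust
use std::collections::{HashMap, HashSet};

type BlockHash = u64;
type CandidateIndex = u32;
type ValidatorIndex = u32;

/// Hypothetical fingerprint shape: the message kind is part of the key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Fingerprint {
    Assignment(BlockHash, CandidateIndex, ValidatorIndex),
    Approval(BlockHash, CandidateIndex, ValidatorIndex),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ApprovalState {
    Assigned,
    Approved,
}

#[derive(Debug, PartialEq, Eq)]
enum ApprovalImportError {
    /// No known assignment from this validator: the source should be reported.
    UnknownAssignment,
}

#[derive(Default)]
struct BlockEntry {
    knowledge: HashSet<Fingerprint>,
    /// Per candidate, the approval state of each validator.
    candidates: HashMap<CandidateIndex, HashMap<ValidatorIndex, ApprovalState>>,
}

impl BlockEntry {
    fn import_approval(
        &mut self,
        block: BlockHash,
        candidate: CandidateIndex,
        validator: ValidatorIndex,
    ) -> Result<Fingerprint, ApprovalImportError> {
        // Reject approvals whose assignment we have never seen.
        if !self.knowledge.contains(&Fingerprint::Assignment(block, candidate, validator)) {
            return Err(ApprovalImportError::UnknownAssignment);
        }
        // From here on, "the fingerprint" is the approval's, not the assignment's.
        let approval_fp = Fingerprint::Approval(block, candidate, validator);
        self.knowledge.insert(approval_fp);
        // Move the validator from `Assigned` to `Approved` for this candidate.
        self.candidates
            .entry(candidate)
            .or_default()
            .insert(validator, ApprovalState::Approved);
        Ok(approval_fp)
    }
}
```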
#### `unify_with_peer(peer: PeerId, view)`
1. Initialize a set `missing_knowledge = {}`.
1. For each block in the view:
   1. Load the `BlockEntry` for the block. If the block is unknown, or its number is less than or equal to the view's
      finalized number, stop walking this chain and continue with the next block in the view.
   1. Inspect the `known_by` set of the `BlockEntry`. If the peer already knows all assignments/approvals, stop walking
      this chain and continue with the next block in the view.
   1. Add the peer to `known_by` and add the hash and missing knowledge of the block to `missing_knowledge`.
   1. Repeat from the first sub-step with the ancestor of the block.
1. For each block in `missing_knowledge`, send all assignments and approvals for all candidates in those blocks to the
   peer.
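The walk above can be sketched as follows. The sketch assumes a hypothetical `BlockEntry` reduced to a parent link, a number and a `known_by` set. "Already knows all assignments/approvals" is approximated by membership in `known_by`, and returning the hashes stands in for sending the messages.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;
type PeerId = u64;

/// Hypothetical simplified block entry.
struct BlockEntry {
    parent: Hash,
    number: u32,
    known_by: HashSet<PeerId>,
}

/// Walk back from each head in the peer's view. Stop a walk at an unknown block,
/// at a block at or below the peer's finalized number, or at a block the peer
/// already knows. Returns the hashes whose messages should be sent to the peer.
fn unify_with_peer(
    blocks: &mut HashMap<Hash, BlockEntry>,
    peer: PeerId,
    view_heads: &[Hash],
    view_finalized: u32,
) -> Vec<Hash> {
    let mut missing_knowledge = Vec::new();
    for &head in view_heads {
        let mut current = head;
        while let Some(entry) = blocks.get_mut(&current) {
            // `insert` returns false if the peer was already in `known_by`.
            if entry.number <= view_finalized || !entry.known_by.insert(peer) {
                break;
            }
            missing_knowledge.push(current);
            current = entry.parent;
        }
    }
    missing_knowledge
}
```

Because each visited block records the peer in `known_by`, a second head that shares ancestors with the first stops at the shared part. The same block is therefore never listed twice.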
@@ -1,35 +1,61 @@
# Approval Voting
Reading the [section on the approval protocol](../../protocol-approval.md) will likely be necessary to understand the
aims of this subsystem.
Approval votes are split into two parts: Assignments and Approvals. Validators first broadcast their assignment to
indicate intent to check a candidate. Upon successfully checking, they broadcast an approval vote. If a validator
doesn't broadcast their approval vote shortly after issuing an assignment, this is an indication that they are being
prevented from recovering or validating the block data and that more validators should self-select to check the
candidate. This is known as a "no-show".
The core of this subsystem is a Tick-based timer loop, where Ticks are 500ms. We also reason about time in terms of
`DelayTranche`s, which measure the number of ticks elapsed since a block was produced. We track metadata for all
unfinalized but included candidates. We compute our local assignments to check each candidate, as well as the earliest
`DelayTranche` at which each of those assignments may be triggered. As the same candidate may appear in more than one
block, we must produce our potential assignments for each (Block, Candidate) pair. The timing loop is based on waiting
for assignments to become no-shows or waiting to broadcast and begin our own assignment to check.
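The tick arithmetic can be sketched as follows. Two choices here are assumptions for illustration only: ticks are counted from the Unix epoch, and a block's tick is supplied by the caller.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Ticks are 500ms, as stated above.
const TICK_DURATION_MILLIS: u64 = 500;

type Tick = u64;
type DelayTranche = u32;

/// Convert a wall-clock time into whole ticks since the Unix epoch
/// (the choice of epoch is an assumption of this sketch).
fn time_to_tick(t: SystemTime) -> Tick {
    let since_epoch = t.duration_since(UNIX_EPOCH).unwrap_or(Duration::ZERO);
    (since_epoch.as_millis() / u128::from(TICK_DURATION_MILLIS)) as Tick
}

/// The current tranche of a block: ticks elapsed since the block's tick,
/// saturating at zero and at `u32::MAX`.
fn tranche_now(block_tick: Tick, now: Tick) -> DelayTranche {
    now.saturating_sub(block_tick).min(u64::from(u32::MAX)) as DelayTranche
}
```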
Another main component of this subsystem is the logic for determining when a (Block, Candidate) pair has been approved
and when to broadcast and trigger our own assignment. Once a (Block, Candidate) pair has been approved, we mark a
corresponding bit in the `BlockEntry` that indicates the candidate has been approved under the block. When we trigger
our own assignment, we broadcast it via Approval Distribution, begin fetching the data from Availability Recovery, and
then pass it through to Candidate Validation. Once these steps are successful, we issue our approval vote. If any of
these steps fail, we don't issue any vote and will "no-show" from the perspective of other validators; in addition, a
dispute is raised via the dispute-coordinator by sending `IssueLocalStatement`.
Where this all fits into Polkadot is via block finality. Our goal is to not finalize any block containing a candidate
that is not approved. We provide a hook for a custom GRANDPA voting rule - GRANDPA makes requests of the form (target,
minimum) consisting of a target block (i.e. longest chain) that it would like to finalize, and a minimum block which,
due to the rules of GRANDPA, must be voted on. The minimum is typically the last finalized block, but may be beyond it
when the last-round estimate is beyond the last finalized block. Thus, our goal is to inform GRANDPA of some block
between target and minimum which we believe can be finalized safely. We do this by iterating backwards from the target
to the minimum and finding the longest continuous chain from minimum where all candidates included by those blocks have
been approved.
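Below is a minimal sketch of that backwards search, for illustration only. It assumes a simplified, hypothetical
`BlockInfo` that records a block number and whether every candidate the block includes is approved. The real
subsystem works with block hashes and its database entries, not a slice.

```rust
/// Hypothetical, simplified view of a block above `minimum`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BlockInfo {
    pub number: u32,
    /// Whether every candidate included by this block has been approved.
    pub all_candidates_approved: bool,
}

/// Returns the highest block number GRANDPA can safely vote for.
///
/// `chain` lists the blocks from `minimum + 1` up to `target`, in ascending order.
pub fn finalizable_block(minimum: u32, chain: &[BlockInfo]) -> u32 {
    let mut best: Option<u32> = None;
    // Iterate backwards from the target towards the minimum.
    for block in chain.iter().rev() {
        if block.all_candidates_approved {
            // Remember the highest block of the current approved run.
            if best.is_none() {
                best = Some(block.number);
            }
        } else {
            // An unapproved block breaks the run; nothing above it is safe.
            best = None;
        }
    }
    // If the block directly above `minimum` is unapproved, fall back to `minimum`.
    best.unwrap_or(minimum)
}
```

Walking backwards and resetting on every unapproved block yields the top of the longest approved run that starts
directly above `minimum`. If the block right above `minimum` is unapproved, the sketch falls back to `minimum` itself,
which GRANDPA must vote on anyway.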
## Protocol
Input:
* `ApprovalVotingMessage::CheckAndImportAssignment`
* `ApprovalVotingMessage::CheckAndImportApproval`
* `ApprovalVotingMessage::ApprovedAncestor`
Output:
* `ApprovalDistributionMessage::DistributeAssignment`
* `ApprovalDistributionMessage::DistributeApproval`
* `RuntimeApiMessage::Request`
* `ChainApiMessage`
* `AvailabilityRecoveryMessage::RecoverAvailableData`
* `CandidateValidationMessage::ValidateFromExhaustive`
## Functionality
The approval voting subsystem is responsible for casting votes and for determining the approval of candidates and, as a
result, of blocks.
This subsystem wraps a database which is used to store metadata about unfinalized blocks and the candidates within them.
Candidates may appear in multiple blocks, and assignment criteria are chosen differently based on the hash of the block
they appear in.
## Database Schema
@@ -150,16 +176,22 @@ struct State {
}
```
This guide section makes no explicit references to writes to or reads from disk. Instead, it handles them implicitly,
with the understanding that updates to block, candidate, and approval entries are persisted to disk.
[`SessionInfo`](../../runtime/session_info.md)
On start-up, we clear everything currently stored by the database. This is done by loading the `StoredBlockRange`,
iterating through each block number, iterating through each block hash, and iterating through each candidate referenced
by each block. Although this is `O(o*n*p)` (block numbers, block hashes per number, candidates per block), we don't
expect to have more than a few unfinalized blocks at any time, or a few thousand in extreme cases. The clearing
operation should be relatively fast as a result.
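The start-up sweep can be modelled with a toy sketch. It replaces the on-disk columns with hypothetical in-memory maps
and assumes `StoredBlockRange` is a half-open `[start, end)` range. The three nested loops correspond to the
`o`, `n` and `p` factors.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the real hash types.
pub type Hash = u64;
pub type CandidateHash = u64;

/// Toy in-memory model of the persisted approval-voting data.
#[derive(Default)]
pub struct Db {
    /// Assumed half-open range `[start, end)` of block numbers with stored data.
    pub stored_block_range: Option<(u32, u32)>,
    /// Block number -> hashes of the blocks stored at that height.
    pub blocks_at_height: HashMap<u32, Vec<Hash>>,
    /// Block hash -> candidates included by that block.
    pub block_entries: HashMap<Hash, Vec<CandidateHash>>,
    /// Candidate hash -> opaque candidate metadata.
    pub candidate_entries: HashMap<CandidateHash, String>,
}

/// Clears everything by walking block numbers, then block hashes, then candidates.
/// Returns how many candidate references were visited.
pub fn clear_on_startup(db: &mut Db) -> usize {
    let mut visited = 0;
    if let Some((start, end)) = db.stored_block_range.take() {
        for number in start..end {
            for hash in db.blocks_at_height.remove(&number).unwrap_or_default() {
                for candidate in db.block_entries.remove(&hash).unwrap_or_default() {
                    // A candidate shared by several blocks is removed on first sight.
                    db.candidate_entries.remove(&candidate);
                    visited += 1;
                }
            }
        }
    }
    visited
}
```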
Main loop:
* Each iteration, select over all of
* The next `Tick` in `wakeups`: trigger `wakeup_process` for each `(Hash, Hash)` pair scheduled under the `Tick` and
then remove all entries under the `Tick`.
* The next message from the overseer: handle the message as described in the [Incoming Messages
section](#incoming-messages)
* The next approval vote request from `background_rx`
* If this is an `ApprovalVoteRequest`, [Issue an approval vote](#issue-approval-vote).
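The `wakeups` bookkeeping can be sketched roughly as follows. The sketch assumes a hypothetical `Wakeups` wrapper
over a `BTreeMap` keyed by `Tick`, so the earliest tick is always first, and uses `u64` stand-ins for hashes. Only
the bookkeeping is shown; the actual select over futures is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical aliases; the real code uses block and candidate hash types.
pub type Tick = u64;
pub type Hash = u64;

/// Wakeups keyed by tick, each holding `(block_hash, candidate_hash)` pairs.
#[derive(Default)]
pub struct Wakeups {
    by_tick: BTreeMap<Tick, Vec<(Hash, Hash)>>,
}

impl Wakeups {
    /// Schedules a wakeup of the `(block, candidate)` pair at `tick`.
    pub fn schedule(&mut self, tick: Tick, block: Hash, candidate: Hash) {
        self.by_tick.entry(tick).or_default().push((block, candidate));
    }

    /// The earliest scheduled tick, which the main loop would wait for.
    pub fn next_tick(&self) -> Option<Tick> {
        self.by_tick.keys().next().copied()
    }

    /// If the earliest tick is due, removes and returns every pair scheduled under it.
    pub fn take_due(&mut self, now: Tick) -> Vec<(Hash, Hash)> {
        match self.next_tick() {
            Some(tick) if tick <= now => self.by_tick.remove(&tick).unwrap_or_default(),
            _ => Vec::new(),
        }
    }
}
```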
#### `OverseerSignal::BlockFinalized`
On receiving an `OverseerSignal::BlockFinalized(h)`, we fetch the block number `b` of that block from the `ChainApi`
subsystem. We update our `StoredBlockRange` to begin at `b+1`. Additionally, we remove all block entries and candidates
referenced by them up to and including `b`. Lastly, we transitively prune all blocks that do not descend from `h`: when
we remove a `BlockEntry` with number `b` that is not equal to `h`, we recursively delete all the `BlockEntry`s
referenced as children. For each removed block, we remove its `block_assignments` entry from every `CandidateEntry` it
references and, if `block_assignments` is now empty, remove the `CandidateEntry`. We also update each of the
`BlockNumber -> Vec<Hash>` keys in the database to reflect the blocks at that height, clearing if empty.
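The pruning walk can be sketched in simplified form. The types are hypothetical: hashes are `u64`, each
`BlockEntry` records only its number and children, and candidate bookkeeping is left out.

```rust
use std::collections::HashMap;

pub type Hash = u64;
pub type BlockNumber = u32;

/// Hypothetical, stripped-down block entry.
pub struct BlockEntry {
    pub number: BlockNumber,
    pub children: Vec<Hash>,
}

#[derive(Default)]
pub struct Store {
    pub blocks: HashMap<Hash, BlockEntry>,
    pub blocks_at_height: HashMap<BlockNumber, Vec<Hash>>,
    /// Lowest block number still stored (the start of `StoredBlockRange`).
    pub range_start: BlockNumber,
}

impl Store {
    /// Removes `hash` and, recursively, every block descending from it.
    fn remove_with_descendants(&mut self, hash: Hash) {
        if let Some(entry) = self.blocks.remove(&hash) {
            let now_empty = match self.blocks_at_height.get_mut(&entry.number) {
                Some(hashes) => {
                    hashes.retain(|h| *h != hash);
                    hashes.is_empty()
                }
                None => false,
            };
            if now_empty {
                self.blocks_at_height.remove(&entry.number);
            }
            for child in entry.children {
                self.remove_with_descendants(child);
            }
        }
    }

    /// Handles finality of block `finalized` with number `b`.
    pub fn on_block_finalized(&mut self, finalized: Hash, b: BlockNumber) {
        for number in self.range_start..=b {
            for hash in self.blocks_at_height.remove(&number).unwrap_or_default() {
                if number == b && hash != finalized {
                    // A competitor of the finalized block: its whole subtree is dead.
                    self.remove_with_descendants(hash);
                } else {
                    // Finalized or below: drop the entry itself; forks reaching height `b` are caught there.
                    self.blocks.remove(&hash);
                }
            }
        }
        self.range_start = b + 1;
    }
}
```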
#### `OverseerSignal::ActiveLeavesUpdate`
On receiving an `OverseerSignal::ActiveLeavesUpdate(update)`:
* We determine the set of new blocks that were not in our previous view. This is done by querying the ancestry of all
new items in the view and contrasting against the stored `BlockNumber`s. Typically, there will be only one new
block. We fetch the headers and information on these blocks from the `ChainApi` subsystem. Stale leaves in the
update can be ignored.
* We update the `StoredBlockRange` and the `BlockNumber` maps.
* We use the `RuntimeApiSubsystem` to determine information about these blocks. It is generally safe to assume that
runtime state is available for recent, unfinalized blocks. In the case that it isn't, it means that we are catching
up to the head of the chain and needn't worry about assignments to those blocks anyway, as the security assumption
of the protocol tolerates nodes being temporarily offline or out-of-date.
* We fetch the set of candidates included by each block by dispatching a `RuntimeApiRequest::CandidateEvents` and
checking the `CandidateIncluded` events.
* We fetch the session of the block by dispatching a `session_index_for_child` request with the parent-hash of the
block.
  * If the `session index - APPROVAL_SESSIONS > state.earliest_session`, then bump `state.earliest_session` to that
    amount and prune earlier sessions.
  * If the session isn't in our `state.session_info`, load the session info for it and for all sessions since the
    earliest session, including the earliest session itself if it is missing. It can be missing just after pruning if
    we've done a big jump forward, as is the case when we've just finished chain synchronization.
* If any of the runtime API calls fail, we just warn and skip the block.
* We use the `RuntimeApiSubsystem` to determine the set of candidates included in these blocks and use BABE logic to
determine the slot number and VRF of the blocks.
* We also note how late we appear to have received the block. We create a `BlockEntry` for each block and a
`CandidateEntry` for each candidate obtained from `CandidateIncluded` events after making a
`RuntimeApiRequest::CandidateEvents` request.
* For each candidate, if the number of needed approvals exceeds the number of validators remaining after the backing
  group of the candidate is subtracted, then the candidate is insta-approved, as approval would be impossible
  otherwise. If all candidates in the block are insta-approved, or there are no candidates in the block, then the block
  is insta-approved. If the block is insta-approved, a [`ChainSelectionMessage::Approved`][CSM] should be sent for the
  block.
* Ensure that the `CandidateEntry` contains a `block_assignments` entry for the block, with the correct backing group
set.
* If we are a validator in this session, compute and assign `our_assignment` for the `block_assignments`
* Only if not a member of the backing group.
  * Run `RelayVRFModulo` and `RelayVRFDelay` according to the [approvals protocol
    section](../../protocol-approval.md#assignment-criteria). Ensure that the assigned core derived from the output is
    covered by the auxiliary signature aggregated in the `VRFProof`.
* [Handle Wakeup](#handle-wakeup) for each new candidate in each new block - this will automatically broadcast a
0-tranche assignment, kick off approval work, and schedule the next delay.
* Dispatch an `ApprovalDistributionMessage::NewBlocks` with the meta information filled out for each new block.
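The insta-approval rule reduces to two small predicates. This sketch uses plain counts and hypothetical function
names rather than the subsystem's real types.

```rust
/// A candidate is insta-approved when the validators outside its backing group
/// are too few to ever provide `needed_approvals` votes.
pub fn candidate_insta_approved(needed_approvals: usize, n_validators: usize, backing_group_len: usize) -> bool {
    needed_approvals > n_validators.saturating_sub(backing_group_len)
}

/// A block is insta-approved when all its candidates are, which includes the
/// case of a block with no candidates (`all` on an empty slice is `true`).
pub fn block_insta_approved(candidates_insta_approved: &[bool]) -> bool {
    candidates_insta_approved.iter().all(|&approved| approved)
}
```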
#### `ApprovalVotingMessage::CheckAndImportAssignment`
On receiving an `ApprovalVotingMessage::CheckAndImportAssignment` message, we check the assignment cert against the
block entry. The cert itself contains the information necessary to determine the candidate that is being assigned to. In
detail:
* Load the `BlockEntry` for the relay-parent referenced by the message. If there is none, return
`AssignmentCheckResult::Bad`.
* Fetch the `SessionInfo` for the session of the block
* Determine the assignment key of the validator based on that.
* Determine the claimed core index by looking up the candidate with given index in `block_entry.candidates`. Return
`AssignmentCheckResult::Bad` if missing.
* Check the assignment cert
* If the cert kind is `RelayVRFModulo`, then the certificate is valid as long as `sample <
session_info.relay_vrf_samples` and the VRF is valid for the validator's key with the input
    `block_entry.relay_vrf_story ++ sample.encode()` as described in [the approvals protocol
section](../../protocol-approval.md#assignment-criteria). We set `core_index = vrf.make_bytes().to_u32() %
session_info.n_cores`. If the `BlockEntry` causes inclusion of a candidate at `core_index`, then this is a valid
assignment for the candidate at `core_index` and has delay tranche 0. Otherwise, it can be ignored.
* If the cert kind is `RelayVRFDelay`, then we check if the VRF is valid for the validator's key with the input
`block_entry.relay_vrf_story ++ cert.core_index.encode()` as described in [the approvals protocol
section](../../protocol-approval.md#assignment-criteria). The cert can be ignored if the block did not cause
    inclusion of a candidate on that core index. Otherwise, this is a valid assignment for the included candidate. The
    delay tranche for the assignment is computed as `(vrf.make_bytes().to_u64() % (session_info.n_delay_tranches +
    session_info.zeroth_delay_tranche_width)).saturating_sub(session_info.zeroth_delay_tranche_width)`.
* We also check that the core index derived by the output is covered by the `VRFProof` by means of an auxiliary
signature.
* If the delay tranche is too far in the future, return `AssignmentCheckResult::TooFarInFuture`.
* Import the assignment.
* Load the candidate in question and access the `approval_entry` for the block hash the cert references.
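The arithmetic in both certificate checks can be sketched as follows. The `SessionParams` struct and the function
names are hypothetical. The VRF output is taken as an already-derived integer; how `make_bytes` turns it into a
`u32` or `u64` is outside this sketch.

```rust
/// Hypothetical subset of the session parameters used by the checks.
pub struct SessionParams {
    pub n_cores: u32,
    pub n_delay_tranches: u32,
    pub zeroth_delay_tranche_width: u32,
}

/// `RelayVRFModulo`: the core claimed by a sample, given the VRF output as a `u32`.
pub fn modulo_core_index(vrf_u32: u32, p: &SessionParams) -> u32 {
    vrf_u32 % p.n_cores
}

/// `RelayVRFDelay`: the delay tranche, given the VRF output as a `u64`.
/// Residues `0..=zeroth_delay_tranche_width` all collapse into tranche 0.
pub fn delay_tranche(vrf_u64: u64, p: &SessionParams) -> u32 {
    let modulus = u64::from(p.n_delay_tranches) + u64::from(p.zeroth_delay_tranche_width);
    // The result is below `n_delay_tranches`, so it always fits in a `u32`.
    (vrf_u64 % modulus).saturating_sub(u64::from(p.zeroth_delay_tranche_width)) as u32
}
```

Because of the `saturating_sub`, tranche 0 covers `zeroth_delay_tranche_width + 1` residues, while every later
tranche covers exactly one.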
On receiving a `CheckAndImportApproval(indirect_approval_vote, response_channel)` message:
* Fetch the `BlockEntry` from the indirect approval vote's `block_hash`. If none, return `ApprovalCheckResult::Bad`.
* Fetch the `CandidateEntry` from the indirect approval vote's `candidate_index`. If the block did not trigger
inclusion of enough candidates, return `ApprovalCheckResult::Bad`.
* Construct a `SignedApprovalVote` using the candidate hash and check against the validator's approval key, based on
the session info of the block. If invalid or no such validator, return `ApprovalCheckResult::Bad`.
* Send `ApprovalCheckResult::Accepted`
* [Import the checked approval vote](#import-checked-approval)
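The early-exit checks can be sketched under several simplifying assumptions. Hashes are `u64`, and a block is
represented only by its list of candidate hashes. The validator count stands in for the block's session info, and a
hypothetical `verify` closure stands in for the real signature verification.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a block or candidate hash.
pub type Hash = u64;

#[derive(Debug, PartialEq)]
pub enum ApprovalCheckResult {
    Accepted,
    Bad(&'static str),
}

/// Simplified indirect approval vote.
pub struct IndirectApprovalVote {
    pub block_hash: Hash,
    pub candidate_index: usize,
    pub validator: usize,
}

pub fn check_approval(
    blocks: &HashMap<Hash, Vec<Hash>>,
    n_validators: usize,
    vote: &IndirectApprovalVote,
    verify: impl Fn(Hash, usize) -> bool,
) -> ApprovalCheckResult {
    // Unknown block: reject.
    let candidates = match blocks.get(&vote.block_hash) {
        Some(c) => c,
        None => return ApprovalCheckResult::Bad("unknown block"),
    };
    // The block did not include enough candidates: reject.
    let candidate_hash = match candidates.get(vote.candidate_index) {
        Some(h) => *h,
        None => return ApprovalCheckResult::Bad("candidate index out of range"),
    };
    // No such validator, or the signature does not check out: reject.
    if vote.validator >= n_validators || !verify(candidate_hash, vote.validator) {
        return ApprovalCheckResult::Bad("no such validator or invalid signature");
    }
    ApprovalCheckResult::Accepted
}
```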
#### `ApprovalVotingMessage::ApprovedAncestor`
On receiving an `ApprovedAncestor(Hash, BlockNumber, response_channel)`:
* Iterate over the ancestry of the hash all the way back to the given block number, starting from the provided block
  hash. Load the `CandidateHash`es from each block entry.
  * Keep track of an `all_approved_max: Option<(Hash, BlockNumber, Vec<(Hash, Vec<CandidateHash>)>)>`.
* For each block hash encountered, load the `BlockEntry` associated. If any are not found, return `None` on the
response channel and conclude.
  * If the block entry's `approved_bitfield` has all bits set to 1 and `all_approved_max == None`, set
    `all_approved_max = Some((current_hash, current_number, Vec::new()))`.
  * If the block entry's `approved_bitfield` has any 0 bits, set `all_approved_max = None`.
  * If `all_approved_max` is `Some`, push the current block hash and candidate hashes onto the list of blocks and
    candidates in `all_approved_max`.
* After iterating all ancestry, return `all_approved_max`.
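The `ApprovedAncestor` walk can be sketched with hypothetical simplified types: a `BlockEntry` holds its number, its
parent, its candidates, and a `Vec<bool>` as the approved bitfield. The sketch also assumes the block at the given
number is itself excluded, since finalized entries are pruned from the database.

```rust
use std::collections::HashMap;

pub type Hash = u64;
pub type CandidateHash = u64;
pub type BlockNumber = u32;

/// Hypothetical, simplified block entry.
pub struct BlockEntry {
    pub number: BlockNumber,
    pub parent: Hash,
    pub candidates: Vec<CandidateHash>,
    /// One flag per candidate; `true` once approved under this block.
    pub approved_bitfield: Vec<bool>,
}

pub type ApprovedAncestry = Option<(Hash, BlockNumber, Vec<(Hash, Vec<CandidateHash>)>)>;

/// Walks from `head` down to (but excluding) `lower_bound`. Returns `None` if an
/// entry is missing or no fully approved run reaches down to `lower_bound + 1`.
pub fn approved_ancestor(
    entries: &HashMap<Hash, BlockEntry>,
    head: Hash,
    lower_bound: BlockNumber,
) -> ApprovedAncestry {
    let mut all_approved_max: ApprovedAncestry = None;
    let mut current = head;
    loop {
        // A missing entry ends the whole request with `None`.
        let entry = entries.get(&current)?;
        if entry.number <= lower_bound {
            break;
        }
        if entry.approved_bitfield.iter().all(|&b| b) {
            if all_approved_max.is_none() {
                all_approved_max = Some((current, entry.number, Vec::new()));
            }
        } else {
            // Any unapproved candidate resets the run.
            all_approved_max = None;
        }
        if let Some((_, _, blocks)) = all_approved_max.as_mut() {
            blocks.push((current, entry.candidates.clone()));
        }
        if entry.number == lower_bound + 1 {
            break;
        }
        current = entry.parent;
    }
    all_approved_max
}
```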
### Updates and Auxiliary Logic
#### Import Checked Approval
* Import an approval vote which we can assume to have passed signature checks and correspond to an imported
assignment.
* Requires `(BlockEntry, CandidateEntry, ValidatorIndex)`
* Set the corresponding bit of the `approvals` bitfield in the `CandidateEntry` to `1`. If already `1`, return.
* Checks the approval state of a candidate under a specific block, and updates the block and candidate entries
accordingly.
* Checks the `ApprovalEntry` for the block.
* [determine the tranches to inspect](#determine-required-tranches) of the candidate,
* [the candidate is approved under the block](#check-approval), set the corresponding bit in the
`block_entry.approved_bitfield`.
* If the block is now fully approved and was not before, send a [`ChainSelectionMessage::Approved`][CSM].
* Otherwise, [schedule a wakeup of the candidate](#schedule-wakeup)
* If the approval vote originates locally, set the `our_approval_sig` in the candidate entry.
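The bitfield bookkeeping of importing a checked approval can be sketched with hypothetical simplified entries. The
[approval check](#check-approval) is treated as an opaque `is_approved` closure, and wakeup scheduling and
`our_approval_sig` handling are omitted.

```rust
/// Hypothetical, stripped-down entries: one flag per validator / per candidate.
pub struct CandidateEntry {
    pub approvals: Vec<bool>,
}

pub struct BlockEntry {
    pub approved_bitfield: Vec<bool>,
}

#[derive(Debug, PartialEq)]
pub enum ImportOutcome {
    /// The vote was already recorded; nothing changes.
    AlreadyKnown,
    /// Recorded, but the candidate is not yet approved (a wakeup would be scheduled).
    StillPending,
    /// The candidate is now approved under the block.
    CandidateApproved,
    /// The block just became fully approved (`ChainSelectionMessage::Approved` would be sent).
    BlockNowFullyApproved,
}

/// Assumes `candidate_index` and `validator` were already validated against the entries.
pub fn import_checked_approval(
    block: &mut BlockEntry,
    candidate_index: usize,
    candidate: &mut CandidateEntry,
    validator: usize,
    is_approved: impl Fn(&CandidateEntry) -> bool,
) -> ImportOutcome {
    if candidate.approvals[validator] {
        return ImportOutcome::AlreadyKnown;
    }
    candidate.approvals[validator] = true;

    if !is_approved(&*candidate) {
        return ImportOutcome::StillPending;
    }

    let was_fully_approved = block.approved_bitfield.iter().all(|&b| b);
    block.approved_bitfield[candidate_index] = true;
    let now_fully_approved = block.approved_bitfield.iter().all(|&b| b);

    if now_fully_approved && !was_fully_approved {
        ImportOutcome::BlockNowFullyApproved
    } else {
        ImportOutcome::CandidateApproved
    }
}
```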
#### Handling Wakeup
* Handle a previously-scheduled wakeup of a candidate under a specific block.
* Requires `(relay_block, candidate_hash)`
* Load the `BlockEntry` and `CandidateEntry` from disk. If either is not present, this may have lost a race with
finality and can be ignored. Also load the `ApprovalEntry` for the block and candidate.
* [determine the `RequiredTranches` of the candidate](#determine-required-tranches).
* Determine if we should trigger our assignment.
* If we've already triggered or `OurAssignment` is `None`, we do not trigger.
* If we have `RequiredTranches::All`, then we trigger if the candidate is [not approved](#check-approval). We have
no next wakeup as we assume that other validators are doing the same and we will be implicitly woken up by
handling new votes.
* If we have `RequiredTranches::Pending { considered, next_no_show, uncovered, maximum_broadcast, clock_drift }`,
then we trigger if our assignment's tranche is less than or equal to `maximum_broadcast` and the current tick,
with `clock_drift` applied, is at least the tick of our tranche.
* If we have `RequiredTranches::Exact { .. }` then we do not trigger, because this value indicates that no new
assignments are needed at the moment.
* If we should trigger our assignment
* Import the assignment to the `ApprovalEntry`
* Broadcast on network with an `ApprovalDistributionMessage::DistributeAssignment`.
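The wakeup's trigger decision can be sketched as a pure function. `RequiredTranches` here is a cut-down, hypothetical
copy that keeps only the fields the decision reads. The sketch also assumes `tranche_tick`, the tick at which our
tranche begins, has been precomputed.

```rust
pub type Tick = u64;
pub type DelayTranche = u32;

/// Cut-down version of `RequiredTranches`, keeping only what the decision needs.
pub enum RequiredTranches {
    All,
    Pending { maximum_broadcast: DelayTranche, clock_drift: Tick },
    Exact,
}

/// Decides whether to broadcast our own assignment during a wakeup.
pub fn should_trigger(
    required: &RequiredTranches,
    our_tranche: Option<DelayTranche>, // `None` when `OurAssignment` is `None`
    already_triggered: bool,
    now: Tick,
    tranche_tick: Tick,
    candidate_approved: bool,
) -> bool {
    let our_tranche = match our_tranche {
        Some(t) if !already_triggered => t,
        _ => return false,
    };
    match *required {
        // Everyone is expected to check; trigger unless the candidate is already approved.
        RequiredTranches::All => !candidate_approved,
        // Our tranche must be within the broadcast limit and due on the drifted clock.
        RequiredTranches::Pending { maximum_broadcast, clock_drift } => {
            our_tranche <= maximum_broadcast && now.saturating_sub(clock_drift) >= tranche_tick
        }
        // Enough assignments are already known.
        RequiredTranches::Exact => false,
    }
}
```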
#### Schedule Wakeup
* Requires `(approval_entry, candidate_entry)` which effectively denotes a `(Block Hash, Candidate Hash)` pair - the
candidate, along with the block it appears in.
* Also requires `RequiredTranches`
* If the `approval_entry` is approved, this doesn't need to be woken up again.
* If `RequiredTranches::All` - no wakeup. We assume other incoming votes will trigger wakeup and potentially
re-schedule.
  * If `RequiredTranches::Pending { considered, next_no_show, uncovered, maximum_broadcast, clock_drift }` - schedule
    at the lesser of the next no-show tick and the tick of the next non-empty tranche we are aware of after
    `considered` (including any tranche containing our own unbroadcast assignment), offset positively by
    `clock_drift`. This can lead to no wakeup in the case that we have already broadcast our assignment and there are
    no pending no-shows; that is, we have approval votes for every assignment we've received that is not already a
    no-show. In this case, we will be re-triggered by other validators broadcasting their assignments.
* If `RequiredTranches::Exact { next_no_show, latest_assignment_tick, .. }` - set a wakeup for the earlier of the next
no-show tick or the latest assignment tick + `APPROVAL_DELAY`.
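The wakeup computation can be sketched under two assumptions. First, `RequiredTranches` is cut down so that the tick
of the next non-empty tranche after `considered` has already been computed from the approval entry. Second,
`APPROVAL_DELAY` is assumed to be 2 ticks.

```rust
pub type Tick = u64;

/// Assumed value, for illustration only.
pub const APPROVAL_DELAY: Tick = 2;

/// Cut-down `RequiredTranches` with the inputs the scheduling rule needs.
pub enum RequiredTranches {
    All,
    Pending { next_no_show: Option<Tick>, next_nonempty_tranche_tick: Option<Tick>, clock_drift: Tick },
    Exact { next_no_show: Option<Tick>, latest_assignment_tick: Option<Tick> },
}

/// The earlier of two optional ticks.
fn earliest(a: Option<Tick>, b: Option<Tick>) -> Option<Tick> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x.min(y)),
        (x, None) => x,
        (None, y) => y,
    }
}

/// Returns the tick to wake the `(block, candidate)` pair at, if any.
pub fn next_wakeup(approved: bool, required: &RequiredTranches) -> Option<Tick> {
    if approved {
        return None;
    }
    match *required {
        // Incoming votes will wake us up instead.
        RequiredTranches::All => None,
        RequiredTranches::Pending { next_no_show, next_nonempty_tranche_tick, clock_drift } => {
            earliest(next_no_show, next_nonempty_tranche_tick.map(|t| t + clock_drift))
        }
        RequiredTranches::Exact { next_no_show, latest_assignment_tick } => {
            earliest(next_no_show, latest_assignment_tick.map(|t| t + APPROVAL_DELAY))
        }
    }
}
```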
#### Launch Approval Work
* Requires `(SessionIndex, SessionInfo, CandidateReceipt, ValidatorIndex, backing_group, block_hash, candidate_index)`
* Extract the public key of the `ValidatorIndex` from the `SessionInfo` for the session.
* Issue an `AvailabilityRecoveryMessage::RecoverAvailableData(candidate, session_index, Some(backing_group),
response_sender)`
* Load the historical validation code of the parachain by dispatching a
`RuntimeApiRequest::ValidationCodeByHash(descriptor.validation_code_hash)` against the state of `block_hash`.
* Spawn a background task with a clone of `background_tx`
* Wait for the available data
* Issue a `CandidateValidationMessage::ValidateFromExhaustive` message with `APPROVAL_EXECUTION_TIMEOUT` as the
timeout parameter.
* Wait for the result of validation
* Check that the result of validation, if valid, matches the commitments in the receipt.
* If valid, issue a message on `background_tx` detailing the request.
* If any of the data, the candidate, or the commitments are invalid, issue on `background_tx` a
[`DisputeCoordinatorMessage::IssueLocalStatement`](../../types/overseer-protocol.md#dispute-coordinator-message)
with `valid = false` to initiate a dispute.
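The background task's decision, once all results are in, can be sketched as follows. The `Recovery`, `Validation`
and `BackgroundAction` types are hypothetical, and commitments are reduced to a single hash. Treating unrecoverable
(as opposed to invalid) data as "issue nothing" is an assumption of this sketch, not something stated above.

```rust
/// Hypothetical outcome of availability recovery.
pub enum Recovery {
    Available(Vec<u8>),
    /// The recovered data was provably invalid.
    Invalid,
    /// The data could not be recovered.
    Unavailable,
}

/// Hypothetical outcome of validation; commitments are reduced to one hash.
pub enum Validation {
    Valid { commitments_hash: u64 },
    Invalid,
}

#[derive(Debug, PartialEq)]
pub enum BackgroundAction {
    /// Send an `ApprovalVoteRequest` on `background_tx`.
    IssueApproval,
    /// Send `IssueLocalStatement` with `valid = false`.
    RaiseDispute,
    /// Issue nothing; other validators will see a no-show.
    Nothing,
}

pub fn approval_work_outcome(
    recovery: Recovery,
    validate: impl FnOnce(&[u8]) -> Validation,
    receipt_commitments_hash: u64,
) -> BackgroundAction {
    let data = match recovery {
        Recovery::Available(data) => data,
        Recovery::Invalid => return BackgroundAction::RaiseDispute,
        Recovery::Unavailable => return BackgroundAction::Nothing,
    };
    match validate(data.as_slice()) {
        // A valid result only counts if it matches the commitments in the receipt.
        Validation::Valid { commitments_hash } if commitments_hash == receipt_commitments_hash => {
            BackgroundAction::IssueApproval
        }
        _ => BackgroundAction::RaiseDispute,
    }
}
```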
#### Issue Approval Vote
* Fetch the block entry and candidate entry. Ignore if `None` - we've probably just lost a race with finality.
#### Determine Required Tranches
This logic is for inspecting an approval entry that tracks the assignments received, along with information on which
assignments have corresponding approval votes. Inspection also involves the current time and expected requirements and
is used to help the higher-level code determine the following:
* Whether to broadcast the local assignment
* Whether to check that the candidate entry has been completely approved.
* If the candidate is waiting on approval, when to schedule the next wakeup of the `(candidate, block)` pair at a
point where the state machine could be advanced.
These routines are pure functions which only depend on the environmental state. The expectation is that this
determination is re-run every time we attempt to update an approval entry: either when we trigger a wakeup to advance
the state machine based on a no-show or our own broadcast, or when we receive further assignments or approvals from the
network.
Thus it may be that at some point in time, we consider that tranches 0..X are required to be considered, but as we
receive more information, we might require fewer tranches. Alternatively, votes that we perceived to be missing and
requiring replacement may be filled in, changing our view.
Requires `(approval_entry, approvals_received, tranche_now, block_tick, no_show_duration, needed_approvals)`
@@ -327,7 +438,8 @@ enum RequiredTranches {
/// as though it is `clock_drift` ticks earlier.
clock_drift: Tick,
},
// An exact number of required tranches and a number of no-shows. This indicates that the amount of `needed_approvals`
// are assigned and additionally all no-shows are covered.
Exact {
/// The tranche to inspect up to.
needed: DelayTranche,
@@ -345,13 +457,21 @@ enum RequiredTranches {
**Clock-drift and Tranche-taking**
Our vote-counting procedure depends heavily on how we interpret time based on the presence of no-shows - assignments
which have no corresponding approval after some time.
This is because of how we handle no-shows: we keep track of the depth of no-shows we are covering.
As an example: there may be initial no-shows in tranche 0. It'll take `no_show_duration` ticks before those are
considered no-shows. Then, we don't want to immediately take `no_show_duration` more tranches. Instead, we want to take
one tranche for each uncovered no-show. However, as we take those tranches, there may be further no-shows. Since these
depth-1 no-shows should have only been triggered after the depth-0 no-shows were already known to be no-shows, we need
to discount the local clock by `no_show_duration` to see whether these should be considered no-shows or not. There may
be malicious parties who broadcast their assignment earlier than they were meant to, who shouldn't be counted as instant
no-shows. We continue onwards to cover all depth-1 no-shows which may lead to depth-2 no-shows and so on.
Likewise, when considering how many tranches to take, the no-show depth should be used to apply a depth-discount or clock drift to the `tranche_now`.
Likewise, when considering how many tranches to take, the no-show depth should be used to apply a depth-discount or
clock drift to the `tranche_now`.
**Procedure**
@@ -360,21 +480,35 @@ Likewise, when considering how many tranches to take, the no-show depth should b
* Take tranches up to `tranche_now - clock_drift` until all needed assignments are met.
* Keep track of the `next_no_show` according to the clock drift, as we go.
* Keep track of the `last_assignment_tick` as we go.
* If running out of tranches before then, return `Pending { considered, next_no_show, maximum_broadcast, clock_drift }`
* If running out of tranches before then, return
`Pending { considered, next_no_show, maximum_broadcast, clock_drift }`
* If there are no no-shows, return `Exact { needed, tolerated_missing, next_no_show, last_assignment_tick }`
* `maximum_broadcast` is either `DelayTranche::max_value()` at tranche 0 or otherwise by the last considered tranche + the number of uncovered no-shows at this point.
* If there are no-shows, return to the beginning, incrementing `depth` and attempting to cover the number of no-shows. Each no-show must be covered by a non-empty tranche, which are tranches that have at least one assignment. Each non-empty tranche covers exactly one no-show.
* If at any point, it seems that all validators are required, do an early return with `RequiredTranches::All` which indicates that everyone should broadcast.
* `maximum_broadcast` is either `DelayTranche::max_value()` at depth 0 or, otherwise, the last considered tranche plus
the number of uncovered no-shows at this point.
* If there are no-shows, return to the beginning, incrementing `depth` and attempting to cover the number of no-shows.
Each no-show must be covered by a non-empty tranche, that is, a tranche with at least one assignment. Each non-empty
tranche covers exactly one no-show.
* If at any point it seems that all validators are required, return early with `RequiredTranches::All`, which
indicates that everyone should broadcast.
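
The procedure above can be sketched in Rust as a single pass over tranches. This is a minimal, simplified model, not
the node's implementation. It assumes one tick per tranche, so `no_show_duration` and the clock drift are counted in
tranches. Assignments are flattened into a hypothetical `Assignment { tranche, approved }` record, and only a few
fields of `RequiredTranches` are kept. Tracking of `next_no_show`, `maximum_broadcast` and `last_assignment_tick` is
omitted.

```rust
/// Tranche index, as in the guide.
pub type DelayTranche = u32;

/// Hypothetical, simplified assignment: the tranche it was made in and whether
/// the assigned validator has approved yet.
#[derive(Clone, Copy, Debug)]
pub struct Assignment {
    pub tranche: DelayTranche,
    pub approved: bool,
}

/// Reduced form of `RequiredTranches`; the real enum carries more fields.
#[derive(Debug, PartialEq, Eq)]
pub enum RequiredTranches {
    All,
    Pending { considered: DelayTranche, clock_drift: DelayTranche },
    Exact { needed: DelayTranche, tolerated_missing: usize },
}

/// Sketch of tranche-taking, assuming one tick per tranche so that
/// `no_show_duration` and the clock drift are both measured in tranches.
pub fn required_tranches(
    assignments: &[Assignment],
    n_validators: usize,
    needed_approvals: usize,
    tranche_now: DelayTranche,
    no_show_duration: DelayTranche,
) -> RequiredTranches {
    let mut depth: DelayTranche = 0;
    let mut total = 0; // assignments seen so far
    // Depth 0: assignments still needed. Depth > 0: no-shows from the
    // previous depth that still need a covering tranche.
    let mut covering = needed_approvals;
    let mut uncovered = 0; // no-shows discovered at the current depth
    let mut covered = 0; // no-shows covered so far (becomes `tolerated_missing`)
    let mut tranche: DelayTranche = 0;

    loop {
        // Every level of no-show depth discounts the local clock further.
        let clock_drift = depth.saturating_mul(no_show_duration);
        let drifted_now = match tranche_now.checked_sub(clock_drift) {
            Some(now) if tranche <= now => now,
            // Ran out of tranches before the requirement was met.
            _ => {
                return RequiredTranches::Pending { considered: tranche.saturating_sub(1), clock_drift }
            }
        };

        let in_tranche: Vec<&Assignment> =
            assignments.iter().filter(|a| a.tranche == tranche).collect();
        let new_assignments = in_tranche.len();
        // An unapproved assignment is a no-show once `no_show_duration` has
        // elapsed on the drifted clock.
        let new_no_shows = in_tranche
            .iter()
            .filter(|a| !a.approved && a.tranche.saturating_add(no_show_duration) <= drifted_now)
            .count();

        // At depth 0 every assignment counts; deeper, each non-empty tranche
        // covers exactly one no-show.
        let new_covered = if depth == 0 { new_assignments } else { new_assignments.min(1) };
        let before = covering;
        covering = covering.saturating_sub(new_covered);
        if depth > 0 {
            covered += before - covering;
        }
        total += new_assignments;
        uncovered += new_no_shows;

        if covering == 0 {
            if uncovered == 0 {
                return RequiredTranches::Exact { needed: tranche, tolerated_missing: covered };
            }
            // Go one level deeper and try to cover the new no-shows.
            depth += 1;
            covering = uncovered;
            uncovered = 0;
        }

        // Early return when it looks like every validator is required.
        if depth > 0 && total + covering + uncovered >= n_validators {
            return RequiredTranches::All;
        }
        tranche += 1;
    }
}
```

Take `needed_approvals = 1`, `tranche_now = 5` and `no_show_duration = 2`, with a no-show in tranche 0 and an approval
in tranche 1. The sketch returns `Exact { needed: 1, tolerated_missing: 1 }`, because tranche 1 is the non-empty tranche
that covers the no-show.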
#### Check Approval
* Check whether a candidate is approved under a particular block.
* Requires `(block_entry, candidate_entry, approval_entry, n_tranches)`
* If we have `3 * n_approvals > n_validators`, return true. This is because any set with f+1 validators must have at least one honest validator, who has approved the candidate.
* If we have `3 * n_approvals > n_validators`, return true. More than a third of the validators (at least `f + 1`) have
approved, and any set of `f + 1` validators contains at least one honest validator.
* If `n_tranches` is `RequiredTranches::Pending`, return false.
* If `n_tranches` is `RequiredTranches::All`, return false.
* If `n_tranches` is `RequiredTranches::Exact { tranche, tolerated_missing, latest_assignment_tick, .. }`, then we return whether all assigned validators up to `tranche` less `tolerated_missing` have approved and `latest_assignment_tick + APPROVAL_DELAY >= tick_now`.
* e.g. if we had 5 tranches and 1 tolerated missing, we would accept only if all but 1 of assigned validators in tranches 0..=5 have approved. In that example, we also accept all validators in tranches 0..=5 having approved, but that would indicate that the `RequiredTranches` value was incorrectly constructed, so it is not realistic. `tolerated_missing` actually represents covered no-shows. If there are more missing approvals than there are tolerated missing, that indicates that there are some assignments which are not yet no-shows, but may become no-shows, and we should wait for the validators to either approve or become no-shows.
* e.g. If the above passes and the `latest_assignment_tick` was 5 and the current tick was 6, then we'd return false.
* If `n_tranches` is `RequiredTranches::Exact { tranche, tolerated_missing, latest_assignment_tick, .. }`, then we
return whether all assigned validators up to `tranche` less `tolerated_missing` have approved and
`latest_assignment_tick + APPROVAL_DELAY <= tick_now`.
* e.g. if `tranche` were 5 and `tolerated_missing` were 1, we would accept only if all but 1 of the validators
assigned in tranches 0..=5 have approved. In that example, we would also accept all validators in tranches 0..=5
having approved, but that would indicate that the `RequiredTranches` value was incorrectly constructed, so it is not
realistic. `tolerated_missing` actually represents covered no-shows. If there are more missing approvals than
`tolerated_missing`, some assignments are not yet no-shows but may become no-shows, and we should wait for those
validators to either approve or become no-shows.
* e.g. if the above passes, `latest_assignment_tick` was 5 and the current tick was 6, then we would return
false.
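
Here is a minimal sketch of the approval check. It rests on some assumptions. Assignments are flattened into
hypothetical `(tranche, approved)` pairs. `APPROVAL_DELAY` gets the illustrative value 2. The `Exact` variant is
reduced to the three fields the check reads. With these values, the sketch reproduces the last example above: a last
assignment at tick 5 is not yet enough at tick 6.

```rust
pub type DelayTranche = u32;
pub type Tick = u64;

/// Hypothetical value; this section of the guide does not fix `APPROVAL_DELAY`.
pub const APPROVAL_DELAY: Tick = 2;

/// Reduced `RequiredTranches`, keeping only the fields this check reads.
pub enum RequiredTranches {
    All,
    Pending,
    Exact { needed: DelayTranche, tolerated_missing: usize, last_assignment_tick: Option<Tick> },
}

/// `assignments` lists `(tranche, approved)` for every assigned validator
/// (a hypothetical flattening of the approval entry).
pub fn check_approval(
    n_validators: usize,
    assignments: &[(DelayTranche, bool)],
    required: &RequiredTranches,
    tick_now: Tick,
) -> bool {
    let n_approvals = assignments.iter().filter(|(_, approved)| *approved).count();
    // More than a third approved means at least f + 1 approvals, so at least one honest approver.
    if 3 * n_approvals > n_validators {
        return true;
    }
    match *required {
        RequiredTranches::Pending | RequiredTranches::All => false,
        RequiredTranches::Exact { needed, tolerated_missing, last_assignment_tick } => {
            // Assignments up to `needed` that have not approved yet.
            let missing = assignments
                .iter()
                .filter(|(tranche, approved)| *tranche <= needed && !*approved)
                .count();
            // Require `APPROVAL_DELAY` ticks to have passed since the last counted assignment.
            let delay_passed =
                last_assignment_tick.map_or(true, |t| t + APPROVAL_DELAY <= tick_now);
            missing <= tolerated_missing && delay_passed
        }
    }
}
```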
### Time
@@ -1,3 +1,7 @@
# Availability Subsystems
The availability subsystems are responsible for ensuring that Proofs of Validity of backed candidates are widely available within the validator set, without requiring every node to retain a full copy. They accomplish this by broadly distributing erasure-coded chunks of the PoV, keeping track of which validator has which chunk by means of signed bitfields. They are also responsible for reassembling a complete PoV when required, e.g. when an approval checker needs to validate a parachain block.
The availability subsystems are responsible for ensuring that Proofs of Validity of backed candidates are widely
available within the validator set, without requiring every node to retain a full copy. They accomplish this by broadly
distributing erasure-coded chunks of the PoV, keeping track of which validator has which chunk by means of signed
bitfields. They are also responsible for reassembling a complete PoV when required, e.g. when an approval checker needs
to validate a parachain block.
@@ -1,31 +1,26 @@
# Availability Distribution
This subsystem is responsible for distribution availability data to peers.
Availability data are chunks, `PoV`s and `AvailableData` (which is `PoV` +
`PersistedValidationData`). It does so via request response protocols.
This subsystem is responsible for distributing availability data to peers. Availability data are chunks, `PoV`s and
`AvailableData` (which is `PoV` + `PersistedValidationData`). It does so via request/response protocols.
In particular this subsystem is responsible for:
- Respond to network requests requesting availability data by querying the
[Availability Store](../utility/availability-store.md).
- Request chunks from backing validators to put them in the local `Availability
Store` whenever we find an occupied core on any fresh leaf,
this is to ensure availability by at least 2/3+ of all validators, this
happens after a candidate is backed.
- Fetch `PoV` from validators, when requested via `FetchPoV` message from
backing (`pov_requester` module).
- Responding to network requests for availability data by querying the
[Availability Store](../utility/availability-store.md).
- Requesting chunks from backing validators to put them in the local `Availability Store` whenever we find an occupied
core on any fresh leaf. This ensures availability by at least 2/3+ of all validators and happens after a candidate is
backed.
- Fetching `PoV`s from validators when requested via a `FetchPoV` message from backing (`pov_requester` module).
The backing subsystem is responsible of making available data available in the
local `Availability Store` upon validation. This subsystem will serve any
network requests by querying that store.
The backing subsystem is responsible for making available data available in the local `Availability Store` upon
validation. This subsystem will serve any network requests by querying that store.
## Protocol
This subsystem does not handle any peer set messages, but the `pov_requester`
does connect to validators of the same backing group on the validation peer
set, to ensure fast propagation of statements between those validators and for
ensuring already established connections for requesting `PoV`s. Other than that
this subsystem drives request/response protocols.
This subsystem does not handle any peer set messages, but the `pov_requester` does connect to validators of the same
backing group on the validation peer set, to ensure fast propagation of statements between those validators and to have
connections already established for requesting `PoV`s. Other than that, this subsystem drives request/response
protocols.
Input:
@@ -48,51 +43,42 @@ Output:
### PoV Requester
The PoV requester in the `pov_requester` module takes care of staying connected
to validators of the current backing group of this very validator on the `Validation`
peer set and it will handle `FetchPoV` requests by issuing network requests to
those validators. It will check the hash of the received `PoV`, but will not do any
further validation. That needs to be done by the original `FetchPoV` sender
(backing subsystem).
The PoV requester in the `pov_requester` module takes care of staying connected to validators of the current backing
group of this very validator on the `Validation` peer set and it will handle `FetchPoV` requests by issuing network
requests to those validators. It will check the hash of the received `PoV`, but will not do any further validation. That
needs to be done by the original `FetchPoV` sender (backing subsystem).
### Chunk Requester
After a candidate is backed, the availability of the PoV block must be confirmed
by 2/3+ of all validators. The chunk requester is responsible of making that
availability a reality.
After a candidate is backed, the availability of the PoV block must be confirmed by 2/3+ of all validators. The chunk
requester is responsible for making that availability a reality.
It does that by querying checking occupied cores for all active leaves. For each
occupied core it will spawn a task fetching the erasure chunk which has the
`ValidatorIndex` of the node. For this an `ChunkFetchingRequest` is issued, via
substrate's generic request/response protocol.
It does so by checking the occupied cores of all active leaves. For each occupied core it spawns a task fetching the
erasure chunk that has the `ValidatorIndex` of the node. For this, a `ChunkFetchingRequest` is issued via Substrate's
generic request/response protocol.
The spawned task will start trying to fetch the chunk from validators in
responsible group of the occupied core, in a random order. For ensuring that we
use already open TCP connections wherever possible, the requester maintains a
cache and preserves that random order for the entire session.
The spawned task will try to fetch the chunk from the validators in the group responsible for the occupied core, in a
random order. To ensure that we use already open TCP connections wherever possible, the requester maintains a cache
and preserves that random order for the entire session.
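
The per-session random order could be kept roughly as follows. This is a hypothetical sketch. The cache type, the
`seed` parameter and the small xorshift generator are illustrative choices; a real node would draw randomness from a
proper RNG. Only the idea of shuffling once and reusing the order for the whole session comes from the text.

```rust
use std::collections::HashMap;

pub type SessionIndex = u32;
pub type ValidatorIndex = u32;

/// Tiny xorshift64 generator, a hypothetical stand-in for a proper RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical cache holding one random request order per (session, group).
#[derive(Default)]
pub struct GroupOrderCache {
    orders: HashMap<(SessionIndex, usize), Vec<ValidatorIndex>>,
}

impl GroupOrderCache {
    /// Returns the cached order for this session and group, shuffling the group
    /// with `seed` only the first time it is requested.
    pub fn order(
        &mut self,
        session: SessionIndex,
        group_index: usize,
        group: &[ValidatorIndex],
        seed: u64,
    ) -> &[ValidatorIndex] {
        self.orders
            .entry((session, group_index))
            .or_insert_with(|| {
                let mut order = group.to_vec();
                // xorshift must not start from 0, so force the low bit on.
                let mut rng = XorShift64(seed | 1);
                // Fisher-Yates shuffle.
                for i in (1..order.len()).rev() {
                    let j = (rng.next() % (i as u64 + 1)) as usize;
                    order.swap(i, j);
                }
                order
            })
            .as_slice()
    }

    /// Forget orders from sessions before `current`.
    pub fn prune(&mut self, current: SessionIndex) {
        self.orders.retain(|&(session, _), _| session >= current);
    }
}
```

Keying the cache by session means later requests to the same group reuse the same order. They therefore tend to hit
validators we are already connected to.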
Note however that, because not all validators in a group have to be actual
backers, not all of them are required to have the needed chunk. This in turn
could lead to low throughput, as we have to wait for fetches to fail,
before reaching a validator finally having our chunk. We do rank back validators
not delivering our chunk, but as backers could vary from block to block on a
perfectly legitimate basis, this is still not ideal. See issues [2509](https://github.com/paritytech/polkadot/issues/2509) and [2512](https://github.com/paritytech/polkadot/issues/2512)
for more information.
Note, however, that because not all validators in a group have to be actual backers, not all of them are required to
have the needed chunk. This in turn could lead to low throughput, as we have to wait for fetches to fail before reaching
a validator that actually has our chunk. We do rank down validators that fail to deliver our chunk, but as backers can
vary from block to block on a perfectly legitimate basis, this is still not ideal. See issues
[2509](https://github.com/paritytech/polkadot/issues/2509) and
[2512](https://github.com/paritytech/polkadot/issues/2512) for more information.
The current implementation also only fetches chunks for occupied cores in blocks
in active leaves. This means though, if active leaves skips a block or we are
particularly slow in fetching our chunk, we might not fetch our chunk if
availability reached 2/3 fast enough (slot becomes free). This is not desirable
as we would like as many validators as possible to have their chunk. See this
[issue](https://github.com/paritytech/polkadot/issues/2513) for more details.
The current implementation also only fetches chunks for occupied cores in blocks in active leaves. This means that if
active leaves skip a block, or we are particularly slow in fetching our chunk, we might not fetch our chunk at all if
availability reached 2/3 fast enough (the slot becomes free). This is not desirable, as we would like as many validators
as possible to have their chunk. See this [issue](https://github.com/paritytech/polkadot/issues/2513) for more details.
### Serving
On the other side the subsystem will listen for incoming `ChunkFetchingRequest`s
and `PoVFetchingRequest`s from the network bridge and will respond to queries,
by looking the requested chunks and `PoV`s up in the availability store, this
happens in the `responder` module.
On the other side, the subsystem will listen for incoming `ChunkFetchingRequest`s and `PoVFetchingRequest`s from the
network bridge and will respond to queries by looking up the requested chunks and `PoV`s in the availability store.
This happens in the `responder` module.
We rely on the backing subsystem to make available data available locally in the
`Availability Store` after it has validated it.
We rely on the backing subsystem to make available data available locally in the `Availability Store` after it has
validated it.
@@ -1,8 +1,13 @@
# Availability Recovery
This subsystem is the inverse of the [Availability Distribution](availability-distribution.md) subsystem: validators will serve the availability chunks kept in the availability store to nodes who connect to them. And the subsystem will also implement the other side: the logic for nodes to connect to validators, request availability pieces, and reconstruct the `AvailableData`.
This subsystem is the inverse of the [Availability Distribution](availability-distribution.md) subsystem: validators
will serve the availability chunks kept in the availability store to nodes who connect to them. The subsystem also
implements the other side: the logic for nodes to connect to validators, request availability pieces, and reconstruct
the `AvailableData`.
This version of the availability recovery subsystem is based off of direct connections to validators. In order to recover any given `AvailableData`, we must recover at least `f + 1` pieces from validators of the session. Thus, we will connect to and query randomly chosen validators until we have received `f + 1` pieces.
This version of the availability recovery subsystem is based on direct connections to validators. In order to recover
any given `AvailableData`, we must recover at least `f + 1` pieces from validators of the session. Thus, we will connect
to and query randomly chosen validators until we have received `f + 1` pieces.
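
The `f + 1` threshold follows from `n = 3f + k` with `k` in {1, 2, 3}, as stated under *Recovery logic*. That makes
`f` the largest integer with `3f < n`. A minimal sketch of the arithmetic (the function name is illustrative):

```rust
/// Number of chunks needed to recover, `f + 1`, where `n = 3f + k` with `k` in
/// {1, 2, 3}. Returns `None` for an empty validator set.
pub fn recovery_threshold(n_validators: usize) -> Option<usize> {
    if n_validators == 0 {
        return None;
    }
    // Largest f with 3f < n, i.e. n = 3f + k for k in {1, 2, 3}.
    let f = (n_validators - 1) / 3;
    Some(f + 1)
}
```

For example, 10 validators give `f = 3`, so at least 4 chunks must be recovered.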
## Protocol
@@ -10,18 +15,20 @@ This version of the availability recovery subsystem is based off of direct conne
Input:
- `NetworkBridgeUpdate(update)`
- `AvailabilityRecoveryMessage::RecoverAvailableData(candidate, session, backing_group, response)`
* `NetworkBridgeUpdate(update)`
* `AvailabilityRecoveryMessage::RecoverAvailableData(candidate, session, backing_group, response)`
Output:
- `NetworkBridge::SendValidationMessage`
- `NetworkBridge::ReportPeer`
- `AvailabilityStore::QueryChunk`
* `NetworkBridge::SendValidationMessage`
* `NetworkBridge::ReportPeer`
* `AvailabilityStore::QueryChunk`
## Functionality
We hold a state which tracks the currently ongoing recovery tasks, as well as which request IDs correspond to which task. A recovery task is a structure encapsulating all recovery tasks with the network necessary to recover the available data in respect to one candidate.
We hold a state which tracks the currently ongoing recovery tasks, as well as which request IDs correspond to which
task. A recovery task is a structure encapsulating all network interactions necessary to recover the available data
for one candidate.
```rust
struct State {
@@ -87,17 +94,22 @@ On `Conclude`, shut down the subsystem.
1. Check the `availability_lru` for the candidate and return the data if so.
1. Check if there is already a recovery handle for the request. If so, add the response handle to it.
1. Otherwise, load the session info for the given session under the state of `live_block_hash`, and initiate a recovery task with *`launch_recovery_task`*. Add a recovery handle to the state and add the response channel to it.
1. Otherwise, load the session info for the given session under the state of `live_block_hash`, and initiate a recovery
task with *`launch_recovery_task`*. Add a recovery handle to the state and add the response channel to it.
1. If the session info is not available, return `RecoveryError::Unavailable` on the response channel.
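
The request-handling steps above amount to a small decision procedure. The following sketch is hypothetical.
`RecoveryState`, `Action` and the unbounded `HashMap` standing in for `availability_lru` are illustrative. A real
implementation would hold response channels rather than a waiter count.

```rust
use std::collections::HashMap;

/// Candidate identifier (a 32-byte hash in this sketch).
pub type CandidateHash = [u8; 32];

/// Outcome of handling one `RecoverAvailableData` request (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Data was cached and can be answered immediately.
    ServeCached(Vec<u8>),
    /// A recovery for this candidate is already running; wait on it.
    JoinExisting,
    /// No cached data and no running task: start a new recovery task.
    LaunchTask,
    /// Session info could not be loaded: answer `RecoveryError::Unavailable`.
    Unavailable,
}

#[derive(Default)]
pub struct RecoveryState {
    /// Stand-in for `availability_lru`; unbounded here for brevity.
    cache: HashMap<CandidateHash, Vec<u8>>,
    /// Number of response channels waiting on each running recovery.
    ongoing: HashMap<CandidateHash, usize>,
}

impl RecoveryState {
    pub fn handle_request(&mut self, candidate: CandidateHash, session_info_available: bool) -> Action {
        if let Some(data) = self.cache.get(&candidate) {
            return Action::ServeCached(data.clone());
        }
        if let Some(waiting) = self.ongoing.get_mut(&candidate) {
            *waiting += 1;
            return Action::JoinExisting;
        }
        if !session_info_available {
            return Action::Unavailable;
        }
        self.ongoing.insert(candidate, 1);
        Action::LaunchTask
    }

    /// Called when a task finishes successfully: cache the data and return how
    /// many waiting response channels should receive it.
    pub fn complete(&mut self, candidate: CandidateHash, data: Vec<u8>) -> usize {
        self.cache.insert(candidate, data);
        self.ongoing.remove(&candidate).unwrap_or(0)
    }
}
```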
### Recovery logic
#### `launch_recovery_task(session_index, session_info, candidate_receipt, candidate_hash, Option<backing_group_index>)`
1. Compute the threshold from the session info. It should be `f + 1`, where `n = 3f + k`, where `k in {1, 2, 3}`, and `n` is the number of validators.
1. Set the various fields of `RecoveryParams` based on the validator lists in `session_info` and information about the candidate.
1. If the `backing_group_index` is `Some`, start in the `RequestFromBackers` phase with a shuffling of the backing group validator indices and a `None` requesting value.
1. Otherwise, start in the `RequestChunksFromValidators` source with `received_chunks`,`requesting_chunks`, and `next_shuffling` all empty.
1. Compute the threshold from the session info. It should be `f + 1`, where `n = 3f + k`, where `k in {1, 2, 3}`, and
`n` is the number of validators.
1. Set the various fields of `RecoveryParams` based on the validator lists in `session_info` and information about the
candidate.
1. If the `backing_group_index` is `Some`, start in the `RequestFromBackers` phase with a shuffling of the backing group
validator indices and a `None` requesting value.
1. Otherwise, start in the `RequestChunksFromValidators` phase with `received_chunks`, `requesting_chunks`, and
`next_shuffling` all empty.
1. Set the `to_subsystems` sender to be equal to a clone of the `SubsystemContext`'s sender.
1. Initialize `received_chunks` to an empty set, as well as `requesting_chunks`.
@@ -115,19 +127,24 @@ const N_PARALLEL: usize = 50;
* Loop:
* If the `requesting_pov` is `Some`, poll for updates on it. If it concludes, set `requesting_pov` to `None`.
* If the `requesting_pov` is `None`, take the next backer off the `shuffled_backers`.
* If the backer is `Some`, issue a `NetworkBridgeMessage::Requests` with a network request for the `AvailableData` and wait for the response.
* If the backer is `Some`, issue a `NetworkBridgeMessage::Requests` with a network request for the
`AvailableData` and wait for the response.
* If it concludes with a `None` result, return to beginning.
* If it concludes with available data, attempt a re-encoding.
* If it has the correct erasure-root, break and issue an `Ok(available_data)`.
* If it has an incorrect erasure-root, return to beginning.
* Send the result to each member of `awaiting`.
* If the backer is `None`, set the source to `RequestChunksFromValidators` with a random shuffling of validators and empty `received_chunks`, and `requesting_chunks` and break the loop.
* If the backer is `None`, set the source to `RequestChunksFromValidators` with a random shuffling of validators,
empty `received_chunks` and empty `requesting_chunks`, and break the loop.
* If the task contains `RequestChunksFromValidators`:
* Request `AvailabilityStoreMessage::QueryAllChunks`. For each chunk that exists, add it to `received_chunks` and remote the validator from `shuffling`.
* Request `AvailabilityStoreMessage::QueryAllChunks`. For each chunk that exists, add it to `received_chunks` and
remove the validator from `shuffling`.
* Loop:
* If `received_chunks + requesting_chunks + shuffling` lengths are less than the threshold, break and return `Err(Unavailable)`.
* Poll for new updates from `requesting_chunks`. Check merkle proofs of any received chunks. If the request simply fails due to network issues, insert into the front of `shuffling` to be retried.
* If `received_chunks + requesting_chunks + shuffling` lengths are less than the threshold, break and return
`Err(Unavailable)`.
* Poll for new updates from `requesting_chunks`. Check Merkle proofs of any received chunks. If the request simply
fails due to network issues, insert into the front of `shuffling` to be retried.
* If `received_chunks` has at least `threshold` entries, attempt to recover the data.
* If that fails, return `Err(RecoveryError::Invalid)`
* If correct:
@@ -135,5 +152,6 @@ const N_PARALLEL: usize = 50;
* break and issue `Ok(available_data)`
* Send the result to each member of `awaiting`.
* While there are fewer than `N_PARALLEL` entries in `requesting_chunks`,
* Pop the next item from `shuffling`. If it's empty and `requesting_chunks` is empty, return `Err(RecoveryError::Unavailable)`.
* Pop the next item from `shuffling`. If it's empty and `requesting_chunks` is empty, return
`Err(RecoveryError::Unavailable)`.
* Issue a `NetworkBridgeMessage::Requests` and wait for the response in `requesting_chunks`.
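
The chunk-requesting phase can be approximated synchronously, under some assumptions. `request`, `verify` and
`recover` are hypothetical closures standing in for the network request, the Merkle-proof check and erasure-code
reconstruction. Every in-flight request completes on the next poll. The backers phase and the initial `QueryAllChunks`
lookup are omitted.

```rust
use std::collections::VecDeque;

pub type ValidatorIndex = u32;

const N_PARALLEL: usize = 50;

#[derive(Debug, PartialEq, Eq)]
pub enum RecoveryError {
    Unavailable,
    Invalid,
}

/// Hypothetical outcome of a single chunk request.
pub enum ChunkResponse {
    Chunk(Vec<u8>),
    NoChunk,
    NetworkError,
}

/// Synchronous sketch of the `RequestChunksFromValidators` phase.
pub fn request_chunks<F, V, R>(
    mut shuffling: VecDeque<ValidatorIndex>,
    threshold: usize,
    mut request: F, // stand-in for `NetworkBridgeMessage::Requests`
    verify: V,      // stand-in for the Merkle-proof check
    recover: R,     // stand-in for erasure-code reconstruction
) -> Result<Vec<u8>, RecoveryError>
where
    F: FnMut(ValidatorIndex) -> ChunkResponse,
    V: Fn(ValidatorIndex, &[u8]) -> bool,
    R: Fn(&[(ValidatorIndex, Vec<u8>)]) -> Option<Vec<u8>>,
{
    let mut received: Vec<(ValidatorIndex, Vec<u8>)> = Vec::new();
    let mut requesting: VecDeque<ValidatorIndex> = VecDeque::new();
    loop {
        // Not enough potential sources left to ever reach the threshold.
        if received.len() + requesting.len() + shuffling.len() < threshold {
            return Err(RecoveryError::Unavailable);
        }
        // "Poll" in-flight requests; in this sketch they all complete at once.
        while let Some(v) = requesting.pop_front() {
            match request(v) {
                ChunkResponse::Chunk(chunk) if verify(v, chunk.as_slice()) => received.push((v, chunk)),
                // Bad proof or no chunk: drop this validator.
                ChunkResponse::Chunk(_) | ChunkResponse::NoChunk => {}
                // Network failure: retry this validator first.
                ChunkResponse::NetworkError => shuffling.push_front(v),
            }
        }
        if received.len() >= threshold {
            return recover(received.as_slice()).ok_or(RecoveryError::Invalid);
        }
        // Keep up to N_PARALLEL requests in flight.
        while requesting.len() < N_PARALLEL {
            match shuffling.pop_front() {
                Some(v) => requesting.push_back(v),
                None if requesting.is_empty() => return Err(RecoveryError::Unavailable),
                None => break,
            }
        }
    }
}
```

Like the text, the sketch retries a validator after a network failure without bound. A real implementation would
presumably cap retries or rely on request timeouts.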
@@ -1,34 +1,40 @@
# Bitfield Distribution
Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based on a 2/3+ quorum.
Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a
single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based
on a 2/3+ quorum.
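
Counting the 2/3+ quorum from a set of bitfields is a per-bit tally. Here is a minimal sketch. It assumes one bitfield
per validator, indexed by availability core, and reads "2/3+" as strictly more than two thirds of the validators.

```rust
/// Given one availability bitfield per validator (bit `i` set means "I hold my
/// chunk for the candidate on core `i`"), report which cores reach the quorum.
/// Assumes "2/3+" means strictly more than two thirds of `n_validators`.
pub fn available_cores(bitfields: &[Vec<bool>], n_validators: usize, n_cores: usize) -> Vec<bool> {
    (0..n_cores)
        .map(|core| {
            // Missing bits count as "not available".
            let votes = bitfields
                .iter()
                .filter(|bits| bits.get(core).copied().unwrap_or(false))
                .count();
            3 * votes > 2 * n_validators
        })
        .collect()
}
```

With 4 validators, a candidate needs 3 set bits. 2 set bits are not enough, since 3 · 2 = 6 is not greater than
2 · 4 = 8.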
## Protocol
`PeerSet`: `Validation`
Input:
[`BitfieldDistributionMessage`](../../types/overseer-protocol.md#bitfield-distribution-message) which are gossiped to all peers, no matter if validator or not.
Input: [`BitfieldDistributionMessage`](../../types/overseer-protocol.md#bitfield-distribution-message), which is
gossiped to all peers, whether they are validators or not.
Output:
- `NetworkBridge::SendValidationMessage([PeerId], message)` gossip a verified incoming bitfield on to interested subsystems within this validator node.
- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)` improve or penalize the reputation of peers based on the messages that are received relative to the current view.
- `ProvisionerMessage::ProvisionableData(ProvisionableData::Bitfield(relay_parent, SignedAvailabilityBitfield))` pass
on the bitfield to the other submodules via the overseer.
- `NetworkBridge::SendValidationMessage([PeerId], message)` gossip a verified incoming bitfield on to interested
peers.
- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)` improve or penalize the reputation of peers based on the messages
that are received relative to the current view.
- `ProvisionerMessage::ProvisionableData(ProvisionableData::Bitfield(relay_parent, SignedAvailabilityBitfield))` pass on
the bitfield to the other subsystems via the overseer.
## Functionality
This is implemented as a gossip system.
It is necessary to track peer connection, view change, and disconnection events, in order to maintain an index of which peers are interested in which relay parent bitfields.
It is necessary to track peer connection, view change, and disconnection events, in order to maintain an index of which
peers are interested in which relay parent bitfields.
Before gossiping incoming bitfields, they must be checked to be signed by one of the validators
of the validator set relevant to the current relay parent.
Only accept bitfields relevant to our current view and only distribute bitfields to other peers when relevant to their most recent view.
Accept and distribute only one bitfield per validator.
Before gossiping incoming bitfields, they must be checked to be signed by one of the validators of the validator set
relevant to the current relay parent. Only accept bitfields relevant to our current view and only distribute bitfields
to other peers when relevant to their most recent view. Accept and distribute only one bitfield per validator.
When receiving a bitfield either from the network or from a `DistributeBitfield` message, forward it along to the block authorship (provisioning) subsystem for potential inclusion in a block.
When receiving a bitfield either from the network or from a `DistributeBitfield` message, forward it along to the block
authorship (provisioning) subsystem for potential inclusion in a block.
Peers connecting after a set of valid bitfield gossip messages was received, those messages must be cached and sent upon connection of new peers or re-connecting peers.
Valid bitfield gossip messages must be cached, so that they can be sent to new or re-connecting peers that connect
after those messages were received.
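
The acceptance rules above can be sketched with a little bookkeeping. A bitfield is accepted only if its relay parent
is in view, its signer is in the validator set, its signature is valid, and the validator has not sent one already.
Accepted bitfields are cached for late peers. All names here are hypothetical. `signature_ok` stands in for real
signature verification.

```rust
use std::collections::HashMap;

pub type Hash = [u8; 32];
pub type ValidatorIndex = u32;

/// Why an incoming bitfield was accepted or dropped (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Accept,
    OutOfView,
    UnknownValidator,
    BadSignature,
    Duplicate,
}

struct PerRelayParent {
    n_validators: u32,
    /// Accepted bitfields, kept so they can be replayed to (re)connecting peers.
    accepted: HashMap<ValidatorIndex, Vec<bool>>,
}

#[derive(Default)]
pub struct BitfieldGossip {
    view: HashMap<Hash, PerRelayParent>,
}

impl BitfieldGossip {
    /// Track a relay parent that entered our view, with its validator-set size.
    pub fn add_to_view(&mut self, relay_parent: Hash, n_validators: u32) {
        self.view
            .entry(relay_parent)
            .or_insert(PerRelayParent { n_validators, accepted: HashMap::new() });
    }

    /// Apply the acceptance rules; only accepted bitfields would be forwarded.
    pub fn import(
        &mut self,
        relay_parent: Hash,
        validator: ValidatorIndex,
        signature_ok: bool,
        bitfield: Vec<bool>,
    ) -> Verdict {
        let state = match self.view.get_mut(&relay_parent) {
            Some(state) => state,
            None => return Verdict::OutOfView,
        };
        if validator >= state.n_validators {
            return Verdict::UnknownValidator;
        }
        if !signature_ok {
            return Verdict::BadSignature;
        }
        // Only one bitfield per validator.
        if state.accepted.contains_key(&validator) {
            return Verdict::Duplicate;
        }
        state.accepted.insert(validator, bitfield);
        Verdict::Accept
    }

    /// Validators whose cached bitfields would be replayed to a newly connected peer.
    pub fn replay_for_new_peer(&self, relay_parent: &Hash) -> Vec<ValidatorIndex> {
        let mut validators = self
            .view
            .get(relay_parent)
            .map(|s| s.accepted.keys().copied().collect::<Vec<_>>())
            .unwrap_or_default();
        validators.sort();
        validators
    }
}
```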
@@ -1,12 +1,15 @@
# Bitfield Signing
Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based on a 2/3+ quorum.
Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a
single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based
on a 2/3+ quorum.
## Protocol
Input:
There is no dedicated input mechanism for bitfield signing. Instead, Bitfield Signing produces a bitfield representing the current state of availability on `StartWork`.
There is no dedicated input mechanism for bitfield signing. Instead, Bitfield Signing produces a bitfield representing
the current state of availability on `StartWork`.
Output:
@@ -15,15 +18,20 @@ Output:
## Functionality
Upon receipt of an `ActiveLeavesUpdate`, launch bitfield signing job for each `activated` head referring to a fresh leaf. Stop the job for each `deactivated` head.
Upon receipt of an `ActiveLeavesUpdate`, launch a bitfield signing job for each `activated` head referring to a fresh
leaf. Stop the job for each `deactivated` head.
## Bitfield Signing Job
Localized to a specific relay-parent `r`
If not running as a validator, do nothing.
Localized to a specific relay-parent `r`. If not running as a validator, do nothing.
- For each fresh leaf, begin by waiting a fixed period of time so availability distribution has the chance to make candidates available.
- Determine our validator index `i`, the set of backed candidates pending availability in `r`, and which bit of the bitfield each corresponds to.
- Start with an empty bitfield. For each bit in the bitfield, if there is a candidate pending availability, query the [Availability Store](../utility/availability-store.md) for whether we have the availability chunk for our validator index. The `OccupiedCore` struct contains the candidate hash so the full candidate does not need to be fetched from runtime.
- For each fresh leaf, begin by waiting a fixed period of time so availability distribution has the chance to make
candidates available.
- Determine our validator index `i`, the set of backed candidates pending availability in `r`, and which bit of the
bitfield each corresponds to.
- Start with an empty bitfield. For each bit in the bitfield, if there is a candidate pending availability, query the
[Availability Store](../utility/availability-store.md) for whether we have the availability chunk for our validator
index. The `OccupiedCore` struct contains the candidate hash so the full candidate does not need to be fetched from
runtime.
- For all chunks we have, set the corresponding bit in the bitfield.
- Sign the bitfield and dispatch a `BitfieldDistribution::DistributeBitfield` message.
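
Constructing the bitfield reduces to one availability-store lookup per core. In this hypothetical sketch, `CoreState`
keeps only the candidate hash of occupied cores, and `have_chunk` stands in for the Availability Store query for our
validator index. Signing is left out.

```rust
pub type CandidateHash = [u8; 32];

/// Simplified core state: only what the signing job reads (hypothetical).
pub enum CoreState {
    Free,
    Occupied { candidate_hash: CandidateHash },
}

/// One bit per core: set iff the core is occupied and we hold our chunk for
/// the candidate pending availability there.
pub fn construct_bitfield<F>(cores: &[CoreState], mut have_chunk: F) -> Vec<bool>
where
    F: FnMut(&CandidateHash) -> bool,
{
    cores
        .iter()
        .map(|core| match core {
            // The candidate hash is enough to query the store; no runtime fetch needed.
            CoreState::Occupied { candidate_hash } => have_chunk(candidate_hash),
            CoreState::Free => false,
        })
        .collect()
}
```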
@@ -1,10 +1,15 @@
# Backing Subsystems
The backing subsystems, when conceived as a black box, receive an arbitrary quantity of parablock candidates and associated proofs of validity from arbitrary untrusted collators. From these, they produce a bounded quantity of backable candidates which relay chain block authors may choose to include in a subsequent block.
The backing subsystems, when conceived as a black box, receive an arbitrary quantity of parablock candidates and
associated proofs of validity from arbitrary untrusted collators. From these, they produce a bounded quantity of
backable candidates which relay chain block authors may choose to include in a subsequent block.
In broad strokes, the flow operates like this:
- **Candidate Selection** winnows the field of parablock candidates, selecting up to one of them to second.
- **Candidate Backing** ensures that a seconding candidate is valid, then generates the appropriate `Statement`. It also keeps track of which candidates have received the backing of a quorum of other validators.
- **Statement Distribution** is the networking component which ensures that all validators receive each others' statements.
- **PoV Distribution** is the networking component which ensures that validators considering a candidate can get the appropriate PoV.
- **Candidate Backing** ensures that a candidate being seconded is valid, then generates the appropriate `Statement`.
It also keeps track of which candidates have received the backing of a quorum of other validators.
- **Statement Distribution** is the networking component which ensures that all validators receive each other's
statements.
- **PoV Distribution** is the networking component which ensures that validators considering a candidate can get the
appropriate PoV.
@@ -1,12 +1,20 @@
# Candidate Backing
The Candidate Backing subsystem ensures every parablock considered for relay block inclusion has been seconded by at least one validator, and approved by a quorum. Parablocks for which not enough validators will assert correctness are discarded. If the block later proves invalid, the initial backers are slashable; this gives polkadot a rational threat model during subsequent stages.
The Candidate Backing subsystem ensures every parablock considered for relay block inclusion has been seconded by at
least one validator, and approved by a quorum. Parablocks for which not enough validators will assert correctness are
discarded. If the block later proves invalid, the initial backers are slashable; this gives Polkadot a rational threat
model during subsequent stages.
Its role is to produce backable candidates for inclusion in new relay-chain blocks. It does so by issuing signed [`Statement`s][Statement] and tracking received statements signed by other validators. Once enough statements are received, they can be combined into backing for specific candidates.
Its role is to produce backable candidates for inclusion in new relay-chain blocks. It does so by issuing signed
[`Statement`s][Statement] and tracking received statements signed by other validators. Once enough statements are
received, they can be combined into backing for specific candidates.
Note that though the candidate backing subsystem attempts to produce as many backable candidates as possible, it does
_not_ attempt to choose a single authoritative one. The choice of which actually gets included is ultimately up to the
block author, by whatever metrics it may use; those are opaque to this subsystem.
Once a sufficient quorum has agreed that a candidate is valid, this subsystem notifies the [Provisioner][PV], which in
turn engages block production mechanisms to include the parablock.
## Protocol
@@ -14,33 +22,49 @@ Input: [`CandidateBackingMessage`][CBM]
Output:
* [`CandidateValidationMessage`][CVM]
* [`RuntimeApiMessage`][RAM]
* [`CollatorProtocolMessage`][CPM]
* [`ProvisionerMessage`][PM]
* [`AvailabilityDistributionMessage`][ADM]
* [`StatementDistributionMessage`][SDM]
## Functionality
The [Collator Protocol][CP] subsystem is the primary source of non-overseer messages into this subsystem. That subsystem
generates appropriate [`CandidateBackingMessage`s][CBM] and passes them to this subsystem.
This subsystem requests validation from the [Candidate Validation][CV] subsystem and generates an appropriate
[`Statement`][Statement]. All `Statement`s are then passed on to the [Statement Distribution][SD] subsystem to be
gossiped to peers. When [Candidate Validation][CV] decides that a candidate is invalid, and our own
[Collator Protocol][CP] subsystem recommended it to us for seconding, a message is sent to the [Collator Protocol][CP]
subsystem with the candidate's hash so that the collator which recommended it can be penalized.
The subsystem should maintain a set of handles to Candidate Backing Jobs that are currently live, as well as the
relay-parent to which they correspond.
### On Overseer Signal
* If the signal is an [`OverseerSignal`][OverseerSignal]`::ActiveLeavesUpdate`:
  * spawn a Candidate Backing Job for each `activated` head referring to a fresh leaf, storing a bidirectional channel
    with the Candidate Backing Job in the set of handles.
  * cease the Candidate Backing Job for each `deactivated` head, if any.
* If the signal is an [`OverseerSignal`][OverseerSignal]`::Conclude`: Forward conclude messages to all jobs, wait a
small amount of time for them to join, and then exit.
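The handle bookkeeping above can be sketched as a map keyed by relay-parent hash. This is a minimal sketch under
stated assumptions: `Hash`, `ToJob`, `JobHandle`, `ActiveLeavesUpdate` and `CandidateBacking` are hypothetical
simplifications rather than the real overseer types, and the channel only goes from the subsystem to the job instead of
being bidirectional.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a relay-chain block hash.
type Hash = [u8; 32];

/// Hypothetical messages the subsystem sends to a job.
#[derive(Debug, PartialEq)]
enum ToJob {
    Conclude,
}

/// Handle to a live Candidate Backing Job (simplified: subsystem -> job only).
struct JobHandle {
    to_job: Sender<ToJob>,
}

/// Simplified `ActiveLeavesUpdate`: newly activated and deactivated heads.
struct ActiveLeavesUpdate {
    activated: Vec<Hash>,
    deactivated: Vec<Hash>,
}

#[derive(Default)]
struct CandidateBacking {
    jobs: HashMap<Hash, JobHandle>,
}

impl CandidateBacking {
    /// Spawns a job for each fresh activated leaf and ceases jobs for deactivated heads.
    /// Returns the job-side receivers of the newly spawned jobs.
    fn on_active_leaves(&mut self, update: ActiveLeavesUpdate) -> Vec<(Hash, Receiver<ToJob>)> {
        let mut spawned = Vec::new();
        for head in update.activated {
            if self.jobs.contains_key(&head) {
                continue; // Not a fresh leaf: a job already exists.
            }
            let (tx, rx) = channel();
            self.jobs.insert(head, JobHandle { to_job: tx });
            spawned.push((head, rx));
        }
        for head in update.deactivated {
            if let Some(handle) = self.jobs.remove(&head) {
                // A send error only means the job has already exited.
                let _ = handle.to_job.send(ToJob::Conclude);
            }
        }
        spawned
    }

    /// On `Conclude`: forward a conclude message to every job and drop all handles.
    fn conclude_all(&mut self) {
        for (_, handle) in self.jobs.drain() {
            let _ = handle.to_job.send(ToJob::Conclude);
        }
    }
}
```

Waiting for the jobs to join is left out; a real implementation would also bound that wait.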
### On Receiving `CandidateBackingMessage`
* If the message is a [`CandidateBackingMessage`][CBM]`::GetBackedCandidates`, get all backable candidates from the
  statement table and send them back.
* If the message is a [`CandidateBackingMessage`][CBM]`::Second`, sign and dispatch a `Seconded` statement only if we
  have not seconded any other candidate and have not signed a `Valid` statement for the requested candidate. Signing
  both a `Seconded` and a `Valid` statement for the same candidate is a double-voting misbehavior with a heavy penalty,
  and this could occur if another validator has seconded the same candidate and we've received their message before the
  internal seconding request.
* If the message is a [`CandidateBackingMessage`][CBM]`::Statement`, count the statement towards the quorum. If the
  statement in the message is `Seconded` and it contains a candidate that belongs to our assignment, request the
  corresponding `PoV` from the backing node via `AvailabilityDistribution` and launch validation. Issue our own `Valid`
  or `Invalid` statement as a result.
If the seconding node did not provide us with the `PoV` we will retry fetching from other backing validators.
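The seconding rule can be sketched as a small guard that a job consults before signing anything. `LocalVotes` is a
hypothetical helper, not the actual backing code, and a `u64` stands in for a candidate hash. It refuses to second a
second candidate or a candidate we already signed `Valid` for, and likewise refuses `Valid` for a candidate we seconded.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a candidate hash.
type CandidateHash = u64;

/// What this node has signed so far for one relay-parent.
#[derive(Default)]
struct LocalVotes {
    seconded: Option<CandidateHash>,
    signed_valid: HashSet<CandidateHash>,
}

impl LocalVotes {
    /// Returns `true` (and records the vote) only if signing `Seconded(candidate)` is safe.
    fn try_second(&mut self, candidate: CandidateHash) -> bool {
        if self.seconded.is_some() || self.signed_valid.contains(&candidate) {
            return false; // Already seconded something, or this would be a double vote.
        }
        self.seconded = Some(candidate);
        true
    }

    /// Returns `true` (and records the vote) only if signing `Valid(candidate)` is safe.
    fn try_valid(&mut self, candidate: CandidateHash) -> bool {
        if self.seconded == Some(candidate) {
            return false; // We already seconded it; a `Valid` would be a double vote.
        }
        self.signed_valid.insert(candidate);
        true
    }
}
```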
@@ -51,19 +75,25 @@ If the seconding node did not provide us with the `PoV` we will retry fetching f
> * Allow inclusion of _old_ parachain candidates validated by _current_ validators.
> * Allow inclusion of _old_ parachain candidates validated by _old_ validators.
>
> This will probably blur the lines between jobs, will probably require inter-job communication and a short-term memory
> of recently backable, but not backed candidates.
## Candidate Backing Job
The Candidate Backing Job represents the work a node does for backing candidates with respect to a particular
relay-parent.
The goal of a Candidate Backing Job is to produce as many backable candidates as possible. This is done via signed
[`Statement`s][STMT] by validators. If a candidate receives a majority of supporting Statements from the Parachain
Validators currently assigned, then that candidate is considered backable.
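As a sketch of the backability rule, assume "majority" means strictly more than half of the assigned group and that each
validator counts at most once per candidate. `Quorum`, `ValidatorIndex` and `CandidateHash` are illustrative names; the
real threshold and statement table may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identifiers.
type CandidateHash = u64;
type ValidatorIndex = u32;

/// Tracks supporting statements (`Seconded` or `Valid`) per candidate for one group.
struct Quorum {
    group: HashSet<ValidatorIndex>,
    support: HashMap<CandidateHash, HashSet<ValidatorIndex>>,
}

impl Quorum {
    fn new(group: Vec<ValidatorIndex>) -> Self {
        Quorum { group: group.into_iter().collect(), support: HashMap::new() }
    }

    /// Records a supporting statement and reports whether the candidate is now backable.
    /// Statements from validators outside the assigned group are ignored.
    fn add_support(&mut self, candidate: CandidateHash, validator: ValidatorIndex) -> bool {
        if self.group.contains(&validator) {
            self.support.entry(candidate).or_default().insert(validator);
        }
        self.is_backable(candidate)
    }

    /// Backable once strictly more than half of the group supports the candidate.
    fn is_backable(&self, candidate: CandidateHash) -> bool {
        let votes = self.support.get(&candidate).map_or(0, |s| s.len());
        votes * 2 > self.group.len()
    }
}
```

Using a set per candidate makes duplicate statements from the same validator harmless for counting.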
### On Startup
* Fetch the current validator set and the validator -> parachain assignments from the [`Runtime API`][RA] subsystem
  using [`RuntimeApiRequest::Validators`][RAM] and [`RuntimeApiRequest::ValidatorGroups`][RAM].
* Determine if the node controls a key in the current validator set. Call this the local key if so.
* If the local key exists, extract the parachain head and validation function from the [`Runtime API`][RA] for the
  parachain the local key is assigned to by issuing a [`RuntimeApiRequest::Validators`][RAM].
* Issue a [`RuntimeApiRequest::SigningContext`][RAM] message to get a context that will later be used upon signing.
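Determining the local key and its assignment amounts to two lookups. In this sketch `ValidatorId` is a hypothetical
32-byte public key, `local_keys` stands in for whatever keys the node's keystore holds, and validator groups are assumed,
for illustration only, to be plain lists of indices into the validator set.

```rust
/// Hypothetical public-key type for a validator.
type ValidatorId = [u8; 32];

/// Our index in the current validator set, if we control one of its keys.
fn local_validator_index(validators: &[ValidatorId], local_keys: &[ValidatorId]) -> Option<usize> {
    validators.iter().position(|v| local_keys.contains(v))
}

/// The index of the group (and thus the parachain assignment) containing `index`.
fn local_group(groups: &[Vec<usize>], index: usize) -> Option<usize> {
    groups.iter().position(|g| g.contains(&index))
}
```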
### On Receiving New Candidate Backing Message
@@ -91,15 +121,17 @@ match msg {
}
```
Add `Seconded` statements and `Valid` statements to a quorum. If the quorum reaches a pre-defined threshold, send a
[`ProvisionerMessage`][PM]`::ProvisionableData(ProvisionableData::BackedCandidate(CandidateReceipt))` message. `Invalid`
statements that conflict with already witnessed `Seconded` and `Valid` statements for the given candidate, statements
that are double-votes, self-contradictions and so on, should result in issuing a
[`ProvisionerMessage`][PM]`::MisbehaviorReport` message for each newly detected case of this kind.
Backing does not need to concern itself with providing statements to the dispute coordinator, as the dispute
coordinator scrapes them from the chain. This way the import is batched and contains only statements that actually made
it onto some chain.
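To make the misbehavior cases more concrete, here is a hypothetical statement table covering only two of them: a
validator seconding two different candidates, and a validator both supporting and rejecting the same candidate. Names
and types are illustrative; the sketch neither deduplicates repeated reports nor handles the `Seconded`-plus-`Valid`
double vote.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers.
type CandidateHash = u64;
type ValidatorIndex = u32;

#[derive(Debug, PartialEq)]
enum Statement {
    Seconded(CandidateHash),
    Valid(CandidateHash),
    Invalid(CandidateHash),
}

#[derive(Debug, PartialEq)]
enum Misbehavior {
    /// The validator seconded two different candidates.
    MultipleSeconded(ValidatorIndex, CandidateHash, CandidateHash),
    /// The validator both supported and rejected the same candidate.
    SelfContradiction(ValidatorIndex, CandidateHash),
}

#[derive(Default)]
struct Table {
    seconded_by: HashMap<ValidatorIndex, CandidateHash>,
    /// First verdict seen per (validator, candidate): `true` = support, `false` = `Invalid`.
    verdicts: HashMap<(ValidatorIndex, CandidateHash), bool>,
}

impl Table {
    /// Imports a statement, returning a newly detected misbehavior, if any.
    fn import(&mut self, validator: ValidatorIndex, statement: Statement) -> Option<Misbehavior> {
        let (candidate, supports) = match statement {
            Statement::Seconded(c) => {
                match self.seconded_by.get(&validator).copied() {
                    Some(prev) if prev != c => {
                        return Some(Misbehavior::MultipleSeconded(validator, prev, c));
                    }
                    Some(_) => {}
                    None => {
                        self.seconded_by.insert(validator, c);
                    }
                }
                (c, true)
            }
            Statement::Valid(c) => (c, true),
            Statement::Invalid(c) => (c, false),
        };
        match self.verdicts.get(&(validator, candidate)).copied() {
            Some(prev) if prev != supports => Some(Misbehavior::SelfContradiction(validator, candidate)),
            Some(_) => None,
            None => {
                self.verdicts.insert((validator, candidate), supports);
                None
            }
        }
    }
}
```

Each `Some(..)` returned here would become one `ProvisionerMessage::MisbehaviorReport` in the flow described above.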
### Validating Candidates
```rust
fn spawn_validation_work(candidate, parachain head, validation function) {
@@ -119,14 +151,16 @@ fn spawn_validation_work(candidate, parachain head, validation function) {
### Fetch PoV Block
Create a `(sender, receiver)` pair. Dispatch an [`AvailabilityDistributionMessage`][ADM]`::FetchPoV{ validator_index,
pov_hash, candidate_hash, tx, }` and listen on the passed receiver for a response. Availability distribution will send
the request to the validator specified by `validator_index`, which might not be serving it for whatever reason;
therefore we need to retry with other backing validators in that case.
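The retry can be sketched as asking the seconding validator first and then the remaining backers in turn. `fetch` is a
hypothetical closure standing in for one `FetchPoV` round trip that yields `None` when the validator does not serve the
PoV; `PoV` and `ValidatorIndex` are simplified placeholders.

```rust
/// Hypothetical validator index.
type ValidatorIndex = u32;

/// Hypothetical PoV payload.
#[derive(Debug, Clone, PartialEq)]
struct PoV(Vec<u8>);

/// Asks the seconding validator first, then each other backer, until one returns the PoV.
fn fetch_pov_with_retry<F>(seconder: ValidatorIndex, other_backers: &[ValidatorIndex], fetch: F) -> Option<PoV>
where
    F: FnMut(ValidatorIndex) -> Option<PoV>,
{
    std::iter::once(seconder)
        .chain(other_backers.iter().copied())
        .find_map(fetch)
}
```

`find_map` stops at the first validator that serves the PoV, so later backers are never contacted unnecessarily.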
### Validate PoV Block
Create a `(sender, receiver)` pair. Dispatch a `CandidateValidationMessage::Validate(validation function, candidate,
pov, BACKING_EXECUTION_TIMEOUT, sender)` and listen on the receiver for a response.
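The `(sender, receiver)` pattern used in the two steps above can be illustrated with `std::sync::mpsc`. This is only an
analogy: the worker thread and `run_validation` (which treats any non-empty PoV as valid) are hypothetical stand-ins for
the Candidate Validation subsystem, and `wait` is a receive deadline chosen by the caller, not the
`BACKING_EXECUTION_TIMEOUT` carried by the real message.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum ValidationResult {
    Valid,
    Invalid,
}

/// Hypothetical stand-in for candidate validation: any non-empty PoV is "valid".
fn run_validation(pov: &[u8]) -> ValidationResult {
    if pov.is_empty() {
        ValidationResult::Invalid
    } else {
        ValidationResult::Valid
    }
}

/// Creates a `(sender, receiver)` pair, hands the sender to a worker, and waits on the
/// receiver for at most `wait`. Returns `None` if no answer arrives in time.
fn validate_pov(pov: Vec<u8>, wait: Duration) -> Option<ValidationResult> {
    let (sender, receiver) = channel();
    thread::spawn(move || {
        // A send error only means the requester stopped waiting.
        let _ = sender.send(run_validation(&pov));
    });
    receiver.recv_timeout(wait).ok()
}
```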
### Distribute Signed Statement
@@ -1,18 +1,16 @@
# Statement Distribution (Legacy)
This describes the legacy, backwards-compatible version of the Statement Distribution subsystem.
**Note:** All the V1 (legacy) code was extracted out to a `legacy_v1` module of the `statement-distribution` crate,
which doesn't alter any logic. V2 (new protocol) peers also run `legacy_v1` and communicate with V1 peers using V1
messages and with V2 peers using V2 messages. Once the runtime upgrade goes through on all networks, this `legacy_v1`
code will no longer be triggered and will be vestigial and can be removed.
## Overview
The Statement Distribution Subsystem is responsible for distributing statements about seconded candidates between
validators.
## Protocol
@@ -31,89 +29,133 @@ Output:
## Functionality
Implemented as a gossip protocol. Handles updates to our view and peers' views. Neighbor packets are used to inform
peers which chain heads we are interested in data for.
The Statement Distribution Subsystem is responsible for distributing signed statements that we have generated and for
forwarding statements generated by other validators. It also detects a variety of Validator misbehaviors for reporting
to the [Provisioner Subsystem](../utility/provisioner.md). During the Backing stage of the inclusion pipeline, Statement
Distribution is the main point of contact with peer nodes. On receiving a signed statement from a peer in the same
backing group, assuming the peer receipt state machine is in an appropriate state, it sends the Candidate Receipt to the
[Candidate Backing subsystem](candidate-backing.md) to handle the validator's statement. On receiving
`StatementDistributionMessage::Share` we make sure to send messages to our backing group in addition to random other
peers, to ensure a fast backing process and to get all statements quickly for distribution.
This subsystem tracks equivocating validators and stops accepting information from them. It establishes a
data-dependency order:
- In order to receive a `Seconded` message we must have the corresponding chain head in our view.
- In order to receive a `Valid` message we must have received the corresponding `Seconded` message.
It also respects this data-dependency order towards our peers by respecting their views. This subsystem is responsible
for checking message signatures.
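The two data-dependency rules can be sketched as an acceptance check. `DependencyTracker`, `Hash` and `CandidateHash`
are hypothetical names; signature checking, view updates and peer reputation are left out.

```rust
use std::collections::HashSet;

/// Hypothetical identifiers.
type Hash = u64;
type CandidateHash = u64;

enum Statement {
    Seconded { relay_parent: Hash, candidate: CandidateHash },
    Valid { relay_parent: Hash, candidate: CandidateHash },
}

#[derive(Default)]
struct DependencyTracker {
    /// Chain heads currently in our view.
    view: HashSet<Hash>,
    /// `Seconded` statements already accepted.
    seconded: HashSet<(Hash, CandidateHash)>,
}

impl DependencyTracker {
    /// Returns `true` if the statement respects the data-dependency order.
    fn accept(&mut self, statement: &Statement) -> bool {
        match *statement {
            Statement::Seconded { relay_parent, candidate } => {
                if !self.view.contains(&relay_parent) {
                    return false; // Chain head not in our view.
                }
                self.seconded.insert((relay_parent, candidate));
                true
            }
            // A `Valid` statement requires the corresponding `Seconded` statement first.
            Statement::Valid { relay_parent, candidate } => self.seconded.contains(&(relay_parent, candidate)),
        }
    }
}
```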
The Statement Distribution subsystem sends statements to peer nodes.
## Peer Receipt State Machine
There is a very simple state machine which governs which messages we are willing to receive from peers. Not depicted in
the state machine: on initial receipt of any [`SignedFullStatement`](../../types/backing.md#signed-statement-type),
validate that the provided signature does in fact sign the included data. Note that each individual parablock candidate
gets its own instance of this state machine; it is perfectly legal to receive a `Valid(X)` before a `Seconded(Y)`, as
long as a `Seconded(X)` has been received.
A: Initial State. Receive `SignedFullStatement(Statement::Second)`: extract `Statement`, forward to Candidate Backing,
proceed to B. Receive any other `SignedFullStatement` variant: drop it.
B: Receive any `SignedFullStatement`: check signature and determine whether the statement is new to us. If new, forward
to Candidate Backing and circulate to other peers. Receive `OverseerMessage::StopWork`: proceed to C.
C: Receive any message for this block: drop it.
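The per-candidate A/B/C state machine might be written as follows. The types are hypothetical simplifications: a
statement is identified only by its kind and the signer's index, `signature_ok` stands in for the signature check done
on initial receipt, and `Action` just names what the caller should do next.

```rust
use std::collections::HashSet;

/// Hypothetical statement about one candidate, tagged with the signer's index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Statement {
    Seconded(u32),
    Valid(u32),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    A, // Waiting for a `Seconded` statement.
    B, // Accepting any statement about this candidate.
    C, // Work stopped: drop everything.
}

#[derive(Debug, PartialEq)]
enum Action {
    Drop,
    Forward,             // Forward to Candidate Backing.
    ForwardAndCirculate, // Forward to Candidate Backing and gossip to other peers.
}

struct CandidateReceiptState {
    state: State,
    seen: HashSet<Statement>,
}

impl CandidateReceiptState {
    fn new() -> Self {
        CandidateReceiptState { state: State::A, seen: HashSet::new() }
    }

    fn on_statement(&mut self, statement: Statement, signature_ok: bool) -> Action {
        if !signature_ok {
            return Action::Drop;
        }
        match self.state {
            State::A => match statement {
                Statement::Seconded(_) => {
                    self.seen.insert(statement);
                    self.state = State::B;
                    Action::Forward
                }
                Statement::Valid(_) => Action::Drop,
            },
            // Only statements that are new to us are forwarded and circulated.
            State::B if self.seen.insert(statement) => Action::ForwardAndCirculate,
            State::B | State::C => Action::Drop,
        }
    }

    /// Corresponds to `OverseerMessage::StopWork`.
    fn stop_work(&mut self) {
        self.state = State::C;
    }
}
```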
For large statements (see below), we also keep track of the total number of large statements received per peer and have
a hard limit on that number for flood protection. This is necessary as in the current code we only forward statements
once we have all the data; therefore flood protection for large statements is a bit more subtle. This will become an
obsolete problem once [off-chain code upgrades](https://github.com/paritytech/polkadot/issues/2979) are implemented.
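A per-peer hard limit can be sketched as a simple counter. The actual limit value is not given here, so `max_per_peer`
is a hypothetical parameter, and `PeerId` is a placeholder type.

```rust
use std::collections::HashMap;

/// Hypothetical peer identifier.
type PeerId = u64;

/// Counts large statements received per peer against a hard limit.
struct LargeStatementLimiter {
    max_per_peer: usize,
    received: HashMap<PeerId, usize>,
}

impl LargeStatementLimiter {
    fn new(max_per_peer: usize) -> Self {
        LargeStatementLimiter { max_per_peer, received: HashMap::new() }
    }

    /// Returns `false` once `peer` has reached the limit; the statement should then be
    /// dropped and the peer's reputation lowered.
    fn note_large_statement(&mut self, peer: PeerId) -> bool {
        let max = self.max_per_peer;
        let count = self.received.entry(peer).or_insert(0);
        if *count >= max {
            return false;
        }
        *count += 1;
        true
    }
}
```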
## Peer Knowledge Tracking
The peer receipt state machine implies that for parsimony of network resources, we should model the knowledge of our
peers, and help them out. For example, let's consider a case with peers A, B, and C, validators X and Y, and candidate
M. A sends us a `Statement::Second(M)` signed by X. We've double-checked it, and it's valid. While we're checking it, we
receive a copy of X's `Statement::Second(M)` from `B`, along with a `Statement::Valid(M)` signed by Y.
Our response to A is just the `Statement::Valid(M)` signed by Y. However, we haven't heard anything about this from C.
Therefore, we send it everything we have: first a copy of X's `Statement::Second`, then Y's `Statement::Valid`.
This system implies a certain level of duplication of messages--we received X's `Statement::Second` from both our peers,
and C may experience the same--but it minimizes the degree to which messages are simply dropped.
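The A/B/C example can be replayed with a hypothetical knowledge tracker: per peer, we remember every statement we sent
them or received from them, and only send what they lack, `Second` statements first to respect the data dependency.
Peers are identified by `char` and all statements concern the single candidate M, purely for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical peer identifier.
type PeerId = char;

/// A statement about the single candidate M, tagged with its signer.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Statement {
    Second { signer: char },
    Valid { signer: char },
}

#[derive(Default)]
struct PeerKnowledge {
    known: HashMap<PeerId, HashSet<Statement>>,
}

impl PeerKnowledge {
    /// A peer that sent us a statement obviously knows it.
    fn note_received(&mut self, peer: PeerId, statement: Statement) {
        self.known.entry(peer).or_default().insert(statement);
    }

    /// Returns the statements from `ours` that `peer` lacks, `Second` first, and marks them as known.
    fn to_send(&mut self, peer: PeerId, ours: &[Statement]) -> Vec<Statement> {
        let known = self.known.entry(peer).or_default();
        let mut out: Vec<Statement> = ours.iter().copied().filter(|s| !known.contains(s)).collect();
        // `Valid` statements depend on the `Second` statement, so send the latter first.
        out.sort_by_key(|s| match s {
            Statement::Second { .. } => 0,
            Statement::Valid { .. } => 1,
        });
        known.extend(out.iter().copied());
        out
    }
}
```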
No jobs. We follow view changes from the [`NetworkBridge`](../utility/network-bridge.md), which in turn is updated by
the overseer.
## Equivocations and Flood Protection
An equivocation is a double-vote by a validator. The [Candidate Backing](candidate-backing.md) Subsystem is
better-suited than this one to detect equivocations as it adds votes to quorum trackers.
At this level, we are primarily concerned about flood-protection, and to some extent, detecting equivocations is a part
of that. In particular, we are interested in detecting equivocations of `Seconded` statements. Since every other
statement is dependent on `Seconded` statements, ensuring that we only ever hold a bounded number of `Seconded`
statements is sufficient for flood-protection.
The simple approach is to say that we only receive up to two `Seconded` statements per validator per chain head.
However, the marginal cost of equivocation, conditional on having already equivocated, is close to 0, since a single
double-vote offence is counted as all double-vote offences for a particular chain-head. Even if it were not, beyond some
number of equivocations the validator would already be completely and totally obliterated by the slashing algorithm, so
the marginal cost of issuing further equivocations would be close to 0. We fear the validator with nothing left to lose.
With that in mind, this simple approach has a caveat worth digging deeper into.
First: We may be aware of two equivocated `Seconded` statements issued by a validator. A totally honest peer of ours can
also be aware of one or two different `Seconded` statements issued by the same validator. And yet another peer may be
aware of one or two _more_ `Seconded` statements. And so on. This interacts badly with pre-emptive sending logic. Upon
sending a `Seconded` statement to a peer, we will want to pre-emptively follow up with all statements relative to that
candidate. Waiting for acknowledgment introduces latency at every hop, so that is best avoided. What can happen is that
upon receipt of the `Seconded` statement, the peer will discard it as it falls beyond the bound of 2 that it is allowed
to store. It cannot store anything in memory about discarded candidates as that would introduce a DoS vector. Then, the
peer would receive from us all of the statements pertaining to that candidate, which, from its perspective, would be
undesired - they are data-dependent on the `Seconded` statement we sent them, but they have erased all record of that
from their memory. Upon receiving a potential flood of undesired statements, this 100% honest peer may choose to
disconnect from us. In this way, an adversary may be able to partition the network with careful distribution of
equivocated `Seconded` statements.
The fix is to track, per-peer, the hashes of up to 4 candidates per validator (per relay-parent) that the peer is aware
of. It is 4 because we may send them 2 and they may send us 2 different ones. We track the data that they are aware of
as the union of things we have sent them and things they have sent us. If we receive a 1st or 2nd `Seconded` statement
from a peer, we note it in the peer's known candidates even if we do disregard the data locally. And then, upon receipt
of any data dependent on that statement, we do not reduce that peer's standing in our eyes, as the data was not
undesired.
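These bounds can be sketched with one helper that stores at most two `Seconded` candidates per validator, used for our
local store and for each direction of a peer's knowledge, so the union a peer may know of is at most 4. Everything here
(`insert_bounded`, `PeerKnownCandidates`, the integer identifiers) is hypothetical and scoped to a single relay-parent.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identifiers; everything below is scoped to one relay-parent.
type ValidatorIndex = u32;
type CandidateHash = u64;
type SecondedMap = HashMap<ValidatorIndex, HashSet<CandidateHash>>;

/// At most two `Seconded` candidates per validator, per direction.
const SECONDED_LIMIT: usize = 2;

/// Records `candidate` for `validator` unless that would exceed the limit.
/// Returns `true` if the candidate is (now) recorded.
fn insert_bounded(map: &mut SecondedMap, validator: ValidatorIndex, candidate: CandidateHash) -> bool {
    let set = map.entry(validator).or_default();
    if set.contains(&candidate) {
        return true;
    }
    if set.len() >= SECONDED_LIMIT {
        return false;
    }
    set.insert(candidate)
}

/// Candidates a peer is aware of: the union of what we sent and what it sent us (at most 4).
#[derive(Default)]
struct PeerKnownCandidates {
    sent_by_us: SecondedMap,
    received_from_peer: SecondedMap,
}

impl PeerKnownCandidates {
    fn note_sent(&mut self, v: ValidatorIndex, c: CandidateHash) -> bool {
        insert_bounded(&mut self.sent_by_us, v, c)
    }

    fn note_received(&mut self, v: ValidatorIndex, c: CandidateHash) -> bool {
        insert_bounded(&mut self.received_from_peer, v, c)
    }

    /// Data depending on a known candidate is not "undesired" and must not be penalized.
    fn knows(&self, v: ValidatorIndex, c: CandidateHash) -> bool {
        [&self.sent_by_us, &self.received_from_peer]
            .iter()
            .any(|m| m.get(&v).map_or(false, |s| s.contains(&c)))
    }
}
```

For example, when a peer sends a third `Seconded` candidate for a validator, our local `insert_bounded` call fails and
the data is dropped, but `note_received` still succeeds, so follow-up statements from that peer are not penalized.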
There is another caveat to the fix: we don't want to allow the peer to flood us because it has set things up in a way
that it knows we will drop all of its traffic. We also track how many statements we have received per peer, per
candidate, and per chain-head. This is any statement concerning a particular candidate: `Seconded`, `Valid`, or
`Invalid`. If we ever receive a statement from a peer which would push any of these counters beyond twice the number of
validators at the chain-head, we begin to lower the peer's standing and eventually disconnect. This bound is a massive
overestimate and could be reduced to twice the number of validators in the corresponding validator group. It is worth
noting that the goal at the time of writing is to ensure any finite bound on the amount of stored data, as any
equivocation results in a large slash.
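A sketch of these counters, reading "per peer, per candidate, and per chain-head" as one counter per
(peer, chain head, candidate) triple, with the bound set to twice the number of validators at the chain head. Names and
identifier types are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers.
type PeerId = u64;
type Hash = u64;
type CandidateHash = u64;

/// Counts statements (`Seconded`, `Valid` or `Invalid`) received per peer, chain head and candidate.
struct StatementCounters {
    limit: usize,
    counts: HashMap<(PeerId, Hash, CandidateHash), usize>,
}

impl StatementCounters {
    /// `n_validators` is the number of validators at the chain head.
    fn new(n_validators: usize) -> Self {
        StatementCounters { limit: 2 * n_validators, counts: HashMap::new() }
    }

    /// Counts one more statement; `false` means the bound was exceeded and the
    /// peer's standing should be lowered (eventually disconnecting it).
    fn note(&mut self, peer: PeerId, head: Hash, candidate: CandidateHash) -> bool {
        let limit = self.limit;
        let count = self.counts.entry((peer, head, candidate)).or_insert(0);
        *count += 1;
        *count <= limit
    }
}
```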
## Large statements
Seconded statements can become quite large, for example on parachain runtime upgrades. For this reason, there exists a
`LargeStatement` constructor for the `StatementDistributionMessage` wire message, which only contains light metadata of
a statement. The actual candidate data is not included. This message type is used whenever a message is deemed large.
The receiver of such a message needs to request the actual payload via request/response by means of a
`StatementFetchingV1` request.
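To illustrate the split between full and large statements, here is a hypothetical encoder. The size cutoff, the
metadata fields and the enum shape are assumptions made for this sketch; the text only says that a `LargeStatement`
carries light metadata instead of the candidate data, which the receiver then fetches via `StatementFetchingV1`.

```rust
/// Hypothetical cutoff above which a statement is "deemed large".
const LARGE_STATEMENT_THRESHOLD: usize = 1024;

/// Hypothetical simplification of the wire message.
#[derive(Debug, PartialEq)]
enum WireStatement {
    /// Full statement including the candidate data.
    Full { candidate_hash: u64, payload: Vec<u8> },
    /// Light metadata only; the receiver must fetch the payload via request/response.
    Large { candidate_hash: u64, payload_len: usize },
}

fn encode(candidate_hash: u64, payload: Vec<u8>) -> WireStatement {
    if payload.len() > LARGE_STATEMENT_THRESHOLD {
        WireStatement::Large { candidate_hash, payload_len: payload.len() }
    } else {
        WireStatement::Full { candidate_hash, payload }
    }
}
```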
This is necessary as distributing a large payload (megabytes) via gossip would make the network collapse, and timely
distribution of statements would no longer be possible. By using request/response it is ensured that each peer only
transfers large data once. We just take care to detect an overloaded peer early and immediately move on to a different
peer for fetching the data. This mechanism should result in a good load distribution and therefore a rather optimal
distribution path.
With these optimizations, distribution of payloads in the size of up to 3 to 4 MB should work with Kusama validator
specifications. For scaling up even more, runtime upgrades and message passing should be done off chain at some point.

Flood protection considerations: to make DoS attacks slightly harder on this subsystem, nodes only respond to large
statement requests from peers they previously notified about that statement via gossip. So it is not possible to DoS
nodes at scale by requesting candidate data over and over again.
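The flood-protection rule amounts to a membership check on the responder side. The sketch below is a minimal illustration. `PeerId`, `CandidateHash` and `LargeStatementResponder` are hypothetical names, with identifiers simplified to plain integers and byte arrays.

```rust
use std::collections::HashSet;

/// Hypothetical identifiers used only for this sketch.
pub type PeerId = u64;
pub type CandidateHash = [u8; 32];

/// Tracks which peers we have told about which large statements via gossip.
#[derive(Default)]
pub struct LargeStatementResponder {
    notified: HashSet<(PeerId, CandidateHash)>,
}

impl LargeStatementResponder {
    /// Record that `peer` was sent a `LargeStatement` for `candidate`.
    pub fn note_notified(&mut self, peer: PeerId, candidate: CandidateHash) {
        self.notified.insert((peer, candidate));
    }

    /// Only serve a fetch request if we previously notified that peer about
    /// this statement. Every other request is refused.
    pub fn should_answer(&self, peer: PeerId, candidate: &CandidateHash) -> bool {
        self.notified.contains(&(peer, *candidate))
    }
}
```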
@@ -1,158 +1,127 @@
# Statement Distribution

This subsystem is responsible for distributing signed statements that we have generated and forwarding statements
generated by our peers. Received candidate receipts and statements are passed to the [Candidate Backing
subsystem](candidate-backing.md) to handle producing local statements. On receiving
`StatementDistributionMessage::Share`, this subsystem distributes the message across the network with redundancy to
ensure a fast backing process.

## Overview

**Goal:** every well-connected node is aware of every next potential parachain block.

Validators can either:
- receive a parachain block from a collator, check the block, and gossip a statement.
- receive statements from other validators, check the parachain block if it originated within their own group, and
  gossip the statement onward if valid.

Validators must have statements, candidates, and persisted validation data from all other validators. This is because
we need to store statements from validators who've checked the candidate on the relay chain, so we know who to hold
accountable in case of disputes. Any validator can be selected as the next relay-chain block author, and this is not
revealed in advance for security reasons. As a result, all validators must have an up-to-date view of all possible
parachain candidates + backing statements that could be placed on-chain in the next block.

[This blog post](https://polkadot.network/blog/polkadot-v1-0-sharding-and-economic-security) puts it another way:
"Validators who aren't assigned to the parachain still listen for the attestations [statements] because whichever
validator ends up being the author of the relay-chain block needs to bundle up attested parachain blocks for several
parachains and place them into the relay-chain block."

Backing-group quorum (that is, enough backing group votes) must be reached before the block author will consider the
candidate. Therefore, validators need to consider _all_ seconded candidates within their own group, because that's what
they're assigned to work on. Validators only need to consider _backable_ candidates from other groups. This informs the
design of the statement distribution protocol to have separate phases for in-group and out-group distribution,
respectively called "cluster" and "grid" mode (see below).

### With Async Backing

Asynchronous backing changes the runtime to accept parachain candidates from a certain allowed range of historic
relay-parents. These candidates must be backed by the group assigned to the parachain as of their corresponding relay
parents.

## Protocol

To address the concern of dealing with large numbers of spam candidates or statements, the overall design approach is to
combine a focused "clustering" protocol for legitimate fresh candidates with a broad-distribution "grid" protocol to
quickly get backed candidates into the hands of many validators. Validators do not eagerly send each other heavy
`CommittedCandidateReceipt`, but instead request these lazily through request/response protocols.

A high-level description of the protocol follows:
### Messages

Nodes can send each other a few kinds of messages: `Statement`, `BackedCandidateManifest`,
`BackedCandidateAcknowledgement`.

- `Statement` messages contain only a signed compact statement, without full candidate info.
- `BackedCandidateManifest` messages advertise a description of a backed candidate and stored statements.
- `BackedCandidateAcknowledgement` messages acknowledge that a backed candidate is fully known.
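Below is a rough Rust model of the three notification messages. Field names and types are simplified assumptions: hashes are plain byte arrays, and the statement filter is two boolean vectors over the backing group. They are not the protocol's exact definitions. The sketch highlights that every message kind refers to exactly one candidate, and that only a compact statement (kind plus candidate hash) travels in `Statement`.

```rust
pub type Hash = [u8; 32];

/// Compact statement: only the kind and the candidate hash (signature kept separately).
#[derive(Debug, Clone, PartialEq)]
pub enum CompactStatement {
    Seconded(Hash),
    Valid(Hash),
}

/// Which validators in the backing group have seconded / validated (simplified).
#[derive(Debug, Clone, PartialEq, Default)]
pub struct StatementFilter {
    pub seconded_in_group: Vec<bool>,
    pub validated_in_group: Vec<bool>,
}

/// Hypothetical, simplified shape of the three message kinds.
#[derive(Debug, Clone, PartialEq)]
pub enum Message {
    Statement { relay_parent: Hash, validator_index: u32, statement: CompactStatement, signature: Vec<u8> },
    BackedCandidateManifest { relay_parent: Hash, candidate_hash: Hash, group_index: u32, statement_knowledge: StatementFilter },
    BackedCandidateAcknowledgement { candidate_hash: Hash, statement_knowledge: StatementFilter },
}

impl Message {
    /// The candidate this message is about.
    pub fn candidate_hash(&self) -> Hash {
        match self {
            Message::Statement { statement: CompactStatement::Seconded(h) | CompactStatement::Valid(h), .. } => *h,
            Message::BackedCandidateManifest { candidate_hash, .. }
            | Message::BackedCandidateAcknowledgement { candidate_hash, .. } => *candidate_hash,
        }
    }
}
```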
### Request/response protocol

Nodes can request the full `CommittedCandidateReceipt` and `PersistedValidationData`, along with statements, over a
request/response protocol. This is the `AttestedCandidateRequest`; the response is `AttestedCandidateResponse`.
### Importability and the Hypothetical Frontier

The **prospective parachains** subsystem maintains prospective "fragment trees" which can be used to determine whether a
particular parachain candidate could possibly be included in the future. Candidates which either are within a fragment
tree or _would be_ part of a fragment tree if accepted are said to be in the "hypothetical frontier".

The **statement-distribution** subsystem keeps track of all candidates, and updates its knowledge of the hypothetical
frontier based on events such as new relay parents, new confirmed candidates, and newly backed candidates.

We only consider statements as "importable" when the corresponding candidate is part of the hypothetical frontier, and
only send "importable" statements to the backing subsystem itself.
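The importability rule can be pictured as a buffer: statements about candidates outside the hypothetical frontier are held back and released once a frontier update includes their candidate. This is a minimal sketch. `ImportBuffer` is a hypothetical name, and the frontier is simplified to a set of candidate hashes.

```rust
use std::collections::{HashMap, HashSet};

pub type CandidateHash = [u8; 32];

/// Buffers statements until their candidate enters the hypothetical frontier.
/// `S` is whatever statement type the caller uses; it is opaque here.
pub struct ImportBuffer<S> {
    frontier: HashSet<CandidateHash>,
    pending: HashMap<CandidateHash, Vec<S>>,
}

impl<S> ImportBuffer<S> {
    pub fn new() -> Self {
        Self { frontier: HashSet::new(), pending: HashMap::new() }
    }

    /// Returns the statement if it is importable now; otherwise buffers it.
    pub fn on_statement(&mut self, candidate: CandidateHash, statement: S) -> Option<S> {
        if self.frontier.contains(&candidate) {
            Some(statement)
        } else {
            self.pending.entry(candidate).or_default().push(statement);
            None
        }
    }

    /// Called when the frontier is recomputed (new relay parent, confirmed or
    /// backed candidate). Returns buffered statements that became importable.
    pub fn on_frontier_update(&mut self, frontier: HashSet<CandidateHash>) -> Vec<S> {
        self.frontier = frontier;
        let ready: Vec<CandidateHash> =
            self.pending.keys().filter(|c| self.frontier.contains(*c)).copied().collect();
        ready.into_iter().flat_map(|c| self.pending.remove(&c).unwrap_or_default()).collect()
    }
}
```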
### Cluster Mode

- Validator nodes are partitioned into groups (with some exceptions), and validators within a group at a relay-parent
  can send each other `Statement` messages for any candidates within that group and based on that relay-parent.
  - This is referred to as the "cluster" mode.
  - Right now these are the same as backing groups, though "cluster" specifically refers to the set of nodes
    communicating with each other in the first phase of distribution.
- `Seconded` statements must be sent before `Valid` statements.
- `Seconded` statements may only be sent to other members of the group when the candidate is fully known by the local
  validator.
  - "Fully known" means the validator has the full `CommittedCandidateReceipt` and `PersistedValidationData`, which it
    receives on request from other validators or from a collator.
  - The reason for this is that sending a statement (which is always a `CompactStatement` carrying nothing but a hash
    and signature) to the cluster is also a signal that the sending node is available to request the candidate from.
  - This makes the protocol easier to reason about, while also reducing network messages about candidates that don't
    really exist.
- Validators in a cluster receiving messages about unknown candidates request the candidate (and statements) from other
  cluster members which have it.
- Spam considerations
  - The maximum depth of candidates allowed in asynchronous backing determines the maximum number of `Seconded`
    statements originating from a validator V which each validator in a cluster may send to others. This bounds the
    number of candidates.
  - There is a small number of validators in each group, which further limits the number of candidates.
  - We accept candidates which don't fit in the fragment trees of any relay parents.
    - "Accept" means "attempt to request and store in memory until useful or expired".
    - We listen to the prospective parachains subsystem to learn of new additions to the fragment trees.
      - Use this to attempt to import the candidate later.
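The two sender-side ordering rules above can be sketched as a small state tracker. First, `Seconded` is only sent once the candidate is fully known. Second, `Valid` only follows a `Seconded` already sent to the same peer. `ClusterSender`, `Kind` and the per-target bookkeeping are hypothetical simplifications of the cluster tracking.

```rust
use std::collections::HashSet;

pub type CandidateHash = [u8; 32];
pub type ValidatorIndex = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Seconded,
    Valid,
}

/// Hypothetical per-cluster sending state.
#[derive(Default)]
pub struct ClusterSender {
    /// Candidates for which we hold the full receipt and persisted validation data.
    fully_known: HashSet<CandidateHash>,
    /// (target, candidate) pairs for which we already sent `Seconded`.
    sent_seconded: HashSet<(ValidatorIndex, CandidateHash)>,
}

impl ClusterSender {
    pub fn note_fully_known(&mut self, c: CandidateHash) {
        self.fully_known.insert(c);
    }

    /// Decide whether a statement may be sent to `target` now, and record it.
    pub fn try_send(&mut self, target: ValidatorIndex, c: CandidateHash, kind: Kind) -> bool {
        match kind {
            // Sending `Seconded` signals that we can serve the candidate,
            // so only do it once we fully know it.
            Kind::Seconded => {
                if !self.fully_known.contains(&c) {
                    return false;
                }
                self.sent_seconded.insert((target, c));
                true
            }
            // `Valid` must come after `Seconded` for the same candidate and target.
            Kind::Valid => self.sent_seconded.contains(&(target, c)),
        }
    }
}
```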
### Grid Mode

- Every consensus session provides randomness and a fixed validator set, which is used to build a redundant grid
  topology.
  - It's redundant in the sense that there are 2 paths from every node to every other node. See the "Grid Topology"
    section for more details.
- This grid topology is used to create a sending path from each validator group to every validator.
- When a node observes a candidate as backed, it sends a `BackedCandidateManifest` to its "receiving" nodes.
  - If receiving nodes don't yet know the candidate, they request it.
  - Once they know the candidate, they respond with a `BackedCandidateAcknowledgement`.
- Once two nodes perform a manifest/acknowledgement exchange, they can send `Statement` messages directly to each other
  for any new statements they might need.
  - This limits the number of statements we'd have to deal with w.r.t. candidates that don't really exist. See the
    "Manifest Exchange" section.
- There are limitations on the number of candidates that can be advertised by each peer, similar to those in the
  cluster. Validators do not request candidates which exceed these limitations.
- Validators request candidates as soon as they are advertised, but do not import the statements until the candidate is
  part of the hypothetical frontier, and do not re-advertise or acknowledge until the candidate is considered both
  backable and part of the hypothetical frontier.
  - Note that requesting is not an implicit acknowledgement, and an explicit acknowledgement must be sent upon receipt.
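Taken together, the grid rules above give a small decision function for an incoming manifest. This sketch assumes the caller already knows how many candidates the peer has advertised (including this one), the per-peer limit, and the candidate's current state. `Action`, `CandidateState` and `on_manifest` are hypothetical names.

```rust
/// What a node does when it receives a `BackedCandidateManifest` (sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    /// Ignore: the sender exceeded its advertisement limit.
    Drop,
    /// Candidate unknown: request it. This is NOT an acknowledgement.
    Request,
    /// Known, but not yet both backable and in the frontier: hold off.
    Wait,
    /// Known, backable and in the hypothetical frontier: acknowledge
    /// (and re-advertise to our own grid targets).
    Acknowledge,
}

/// Hypothetical inputs; a real implementation would derive these from its trackers.
pub struct CandidateState {
    pub known: bool,
    pub backable: bool,
    pub in_frontier: bool,
}

pub fn on_manifest(advertised_by_peer: usize, limit: usize, s: &CandidateState) -> Action {
    if advertised_by_peer > limit {
        Action::Drop
    } else if !s.known {
        Action::Request
    } else if s.backable && s.in_frontier {
        Action::Acknowledge
    } else {
        Action::Wait
    }
}
```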
## Messages
@@ -161,27 +130,23 @@ backing subsystem itself.
- `ActiveLeaves`
  - Notification of a change in the set of active leaves.
- `StatementDistributionMessage::Share`
  - Notification of a locally-originating statement. That is, this statement comes from our node and should be
    distributed to other nodes.
  - Sent by the Backing Subsystem after it successfully imports a locally-originating statement.
- `StatementDistributionMessage::Backed`
  - Notification of a candidate being backed (received enough validity votes from the backing group).
  - Sent by the Backing Subsystem after it successfully imports a statement for the first time and after sending
    `Share`.
- `StatementDistributionMessage::NetworkBridgeUpdate`
  - See next section.
#### Network bridge events
- v1 compatibility
  - Messages for the v1 protocol are routed to the legacy statement distribution.
- `Statement`
  - Notification of a signed statement.
  - Sent by a peer's Statement Distribution subsystem when circulating statements.
- `BackedCandidateManifest`
  - Notification of a backed candidate being known by the sending node.
  - For the candidate being requested by the receiving node if needed.
@@ -196,26 +161,23 @@ backing subsystem itself.
### Outgoing
- `NetworkBridgeTxMessage::SendValidationMessages`
  - Sends a peer all pending messages / acknowledgements / statements for a relay parent, either through the cluster or
    the grid.
- `NetworkBridgeTxMessage::SendValidationMessage`
  - Circulates a compact statement to all peers who need it, either through the cluster or the grid.
- `NetworkBridgeTxMessage::ReportPeer`
  - Reports a peer (either good or bad).
- `CandidateBackingMessage::Statement`
  - Notes a validator's statement about a particular candidate.
- `ProspectiveParachainsMessage::GetHypotheticalFrontier`
  - Gets the hypothetical frontier membership of candidates under active leaves' fragment trees.
- `NetworkBridgeTxMessage::SendRequests`
  - Sends requests, initiating the request/response protocol.
## Request/Response

We also have a request/response protocol because validators do not eagerly send each other heavy
`CommittedCandidateReceipt`, but instead need to request these lazily.

### Protocol
@@ -225,16 +187,13 @@ lazily.
- Done as needed, when handling incoming manifests/statements.
- `RequestManager::dispatch_requests` sends any queued-up requests.
- Calls `RequestManager::next_request` to completion.
- Creates the `OutgoingRequest`, saves the receiver in `RequestManager::pending_responses`.
- Does nothing if we have more responses pending than the limit of parallel requests.
2. Peer
- Requests come in on a peer on the `IncomingRequestReceiver`.
- Runs in a background responder task which feeds requests to `answer_request` through `MuxedMessage`.
- This responder task has a limit on the number of parallel requests.
- `answer_request` on the peer takes the request and sends a response.
- Does this using the response sender on the request.
@@ -243,8 +202,7 @@ lazily.
- `receive_response` on the original validator yields a response.
- Response was sent on the request's response sender.
- Uses `RequestManager::await_incoming` to await on pending responses in an unordered fashion.
- Runs on the `MuxedMessage` receiver.
- `handle_response` handles the response.
@@ -265,25 +223,23 @@ lazily.
## Manifests

A manifest is a message about a known backed candidate, along with a description of the statements backing it. It can be
one of two kinds:

- `Full`: Contains information about the candidate and should be sent to peers who may not have the candidate yet. This
  is also called an `Announcement`.
- `Acknowledgement`: Omits information implicit in the candidate, and should be sent to peers which are guaranteed to
  have the candidate already.
### Manifest Exchange

Manifest exchange is when a receiving node receives a `Full` manifest and replies with an `Acknowledgement`. It
indicates that both nodes know the candidate as valid and backed. This allows the nodes to send `Statement` messages
directly to each other for any new statements.

Why? This limits the number of statements we'd have to deal with w.r.t. candidates that don't really exist. Limiting
out-of-group statement distribution between peers to only candidates that both peers agree are backed and exist ensures
we only have to store statements about real candidates.

In practice, manifest exchange means that one of three things has happened:
@@ -291,36 +247,31 @@ In practice, manifest exchange means that one of three things have happened:
- We announced, they acknowledged.
- We announced, they announced.

Concerning the last case, note that it is possible for two nodes to have each other in their sending set. Consider:

```
1 2
3 4
```

If validators 2 and 4 are in group B, then there is a path `2->1->3` and `4->3->1`. Therefore, 1 and 3 might send each
other manifests for the same candidate at the same time, without having seen the other's yet. This also counts as a
manifest exchange, but is only allowed to occur in this way.

After the exchange is complete, we update pending statements. Pending statements are those we know locally that the
remote node does not.
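The three completion cases and the "pending statements" update can be expressed compactly. In this minimal sketch, statement knowledge is modelled as a `u64` bitmask over the backing group (bit `i` set means validator `i`'s statement is known). That is an assumption that caps the group at 64 validators, whereas the protocol itself uses a `StatementFilter`. `Exchange` and `pending_statements` are hypothetical names.

```rust
/// Per-peer, per-candidate view of the manifest exchange (sketch).
#[derive(Default, Debug)]
pub struct Exchange {
    pub we_announced: bool,
    pub they_announced: bool,
    pub we_acknowledged: bool,
    pub they_acknowledged: bool,
}

impl Exchange {
    /// The three cases: they announced and we acknowledged, we announced and
    /// they acknowledged, or both announced (the simultaneous case above).
    pub fn is_complete(&self) -> bool {
        (self.they_announced && self.we_acknowledged)
            || (self.we_announced && self.they_acknowledged)
            || (self.we_announced && self.they_announced)
    }
}

/// Statements we know but the peer does not, as a bitmask over the backing
/// group. Nothing is pending until the exchange is complete.
pub fn pending_statements(ex: &Exchange, local: u64, remote: u64) -> u64 {
    if ex.is_complete() {
        local & !remote
    } else {
        0
    }
}
```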
#### Alternative Paths Through The Topology

Nodes should send a `BackedCandidateAcknowledgement(CandidateHash, StatementFilter)` notification to any peer which has
sent a manifest for a candidate that the node has acquired by other means. This keeps alternative paths through the
topology open, which allows nodes to receive additional statements that come later, but not after the candidate has
been posted on-chain.

This is mostly about the limitation that the runtime has no way for block authors to post statements that come after the
parablock is posted on-chain and ensure those validators still get rewarded. Technically, we only need enough statements
to back the candidate and the manifest + request will provide that. But more statements might come shortly afterwards,
and we want those to end up on-chain as well to ensure all validators in the group are rewarded.

For clarity, here is the full timeline:
@@ -333,52 +284,42 @@ For clarity, here is the full timeline:
## Cluster Module

The cluster module provides direct distribution of unbacked candidates within a group. By utilizing this initial phase
of propagating only within clusters/groups, we bound the number of `Seconded` messages per validator per relay-parent,
helping us prevent spam. Validators can try to circumvent this, but they would only consume a few KB of memory and it is
trivially slashable on-chain.

The cluster module determines whether to accept/reject messages from other validators in the same group. It keeps track
of what we have sent to other validators in the group, and pending statements. For the full protocol, see the "Protocol"
section.

## Grid Module

The grid module provides distribution of backed candidates and late statements outside the backing group. For the full
protocol, see the "Protocol" section.

### Grid Topology

For distributing outside our cluster (aka backing group) we use a 2D grid topology. This limits the number of peers we
send messages to, and handles view updates.

The basic operation of the grid topology is that:

- A validator producing a message sends it to its row-neighbors and its column-neighbors.
- A validator receiving a message originating from one of its row-neighbors sends it to its column-neighbors.
- A validator receiving a message originating from one of its column-neighbors sends it to its row-neighbors.

This grid approach defines 2 unique paths for every validator to reach every other validator in at most 2 hops,
providing redundancy.

Propagation follows these rules:

- Each node has a receiving set and a sending set. These are different for each group. That is, if a node receives a
  candidate from group A, it checks if it is allowed to receive from that node for candidates from group A.
- For groups that we are in, receive from nobody and send to our X/Y peers.
- For groups that we are not part of:
  - We receive from any validator in the group we share a slice with and send to the corresponding X/Y slice in the
    other dimension.
  - For any validators we don't share a slice with, we receive from the nodes which share a slice with them.
### Example
@@ -391,81 +332,63 @@ For size 11, the matrix would be:
9 10
```

For example, for index 10, the neighbors would be 1, 4, 7, 9: these are the nodes we could directly communicate with
(i.e. either send to or receive from).
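A small sketch that reproduces this example and the row/column forwarding rules from "Grid Topology". It assumes a row length of `floor(sqrt(n))` with validators laid out row by row, which matches the size-11 layout (rows of 3, last row holding 9 and 10). The layout rule and all function names are assumptions for illustration.

```rust
/// Row length of the grid. Assumption: floor(sqrt(n)), which reproduces the
/// size-11 layout in the example (rows of 3, last row holding 9 and 10).
pub fn row_len(n: usize) -> usize {
    let mut w = 1;
    while (w + 1) * (w + 1) <= n {
        w += 1;
    }
    w
}

/// Validators sharing a row with `v` (excluding `v`).
pub fn row_neighbors(v: usize, n: usize) -> Vec<usize> {
    let w = row_len(n);
    let start = (v / w) * w;
    (start..(start + w).min(n)).filter(|&u| u != v).collect()
}

/// Validators sharing a column with `v` (excluding `v`).
pub fn column_neighbors(v: usize, n: usize) -> Vec<usize> {
    let w = row_len(n);
    ((v % w)..n).step_by(w).filter(|&u| u != v).collect()
}

/// Forwarding rule: a locally produced message goes to row and column
/// neighbors; one received from a row neighbor goes to the column; one
/// received from a column neighbor goes to the row.
pub fn forward_to(v: usize, from: Option<usize>, n: usize) -> Vec<usize> {
    let (row, col) = (row_neighbors(v, n), column_neighbors(v, n));
    match from {
        None => row.into_iter().chain(col).collect(),
        Some(u) if row.contains(&u) => col,
        Some(u) if col.contains(&u) => row,
        Some(_) => Vec::new(), // not a grid neighbor: do not relay
    }
}
```

For validator 10 in a grid of 11, `forward_to(10, None, 11)` yields 9 (its row) followed by 1, 4 and 7 (its column), matching the neighbor list above.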

Now, which of these neighbors can 10 receive from? Recall that the sending/receiving sets for 10 would be different for
different groups. Here are some hypothetical scenarios:

- **Scenario 1:** 9 belongs to group A but not 10. Here, 10 can directly receive
candidates from group A from 9. 10 would propagate them to the nodes in {1, 4,
7} that are not in A.
- **Scenario 2:** 6 is in group A instead of 9, and 7 is not in group A. 10 can
receive group A messages from 7 or 9. 10 will try to relay these messages, but
7 and 9 together should have already propagated the message to all x/y
peers of 10. If so, then 10 will just receive acknowledgements in reply rather
than requests.
- **Scenario 3:** 10 itself is in group A. 10 would not receive candidates from
this group from any other nodes through the grid. It would itself send such
candidates to all its neighbors that are not in A.
- **Scenario 1:** 9 belongs to group A, but 10 does not. Here, 10 can directly receive candidates from group A from 9.
10 would propagate them to the nodes in {1, 4, 7} that are not in A.
- **Scenario 2:** 6 is in group A instead of 9, and 7 is not in group A. 10 can receive group A messages from 7 or 9. 10
will try to relay these messages, but 7 and 9 together should have already propagated the message to all x/y peers of
10. If so, then 10 will just receive acknowledgements in reply rather than requests.
- **Scenario 3:** 10 itself is in group A. 10 would not receive candidates from this group from any other nodes through
the grid. It would itself send such candidates to all its neighbors that are not in A.
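The three scenarios can be condensed into a simplified forwarding rule. This sketch assumes the same `floor(sqrt(n))` grid width and the rule that data received along one dimension is relayed along the other, skipping members of the group. `forward_targets` is hypothetical and ignores details such as acknowledgements versus requests.

```rust
use std::collections::HashSet;

/// Grid width, assumed to be floor(sqrt(n)) as in the size-11 example.
fn grid_width(n: usize) -> usize {
    let mut w = 1;
    while (w + 1) * (w + 1) <= n {
        w += 1;
    }
    w
}

/// Grid neighbors that `me` forwards a group's candidates to. `from` is the
/// neighbor we received them from, or `None` when `me` is in the group and
/// originates the data (Scenario 3). Members of `group` are always skipped.
fn forward_targets(me: usize, from: Option<usize>, group: &HashSet<usize>, n: usize) -> Vec<usize> {
    let w = grid_width(n);
    let same_row = |j: usize| j / w == me / w;
    let same_col = |j: usize| j % w == me % w;
    (0..n)
        .filter(|&j| j != me && !group.contains(&j))
        .filter(|&j| match from {
            // Originator: send along both dimensions.
            None => same_row(j) || same_col(j),
            // Received along our row: relay along our column (Scenario 1).
            Some(f) if same_row(f) => same_col(j),
            // Received along our column: relay along our row.
            Some(_) => same_row(j),
        })
        .collect()
}
```

Calling `forward_targets(10, Some(9), &HashSet::from([9]), 11)` returns `[1, 4, 7]`, as in Scenario 1.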
### Seconding Limit
The seconding limit is a per-validator limit. Before asynchronous backing, we
had a rule that every validator was only allowed to second one candidate per
relay parent. With asynchronous backing, we have a 'maximum depth' which makes
it possible to second multiple candidates per relay parent. The seconding limit
is set to `max depth + 1` to set an upper bound on candidates entering the
system.
The seconding limit is a per-validator limit. Before asynchronous backing, we had a rule that every validator was only
allowed to second one candidate per relay parent. With asynchronous backing, we have a 'maximum depth' which makes it
possible to second multiple candidates per relay parent. The seconding limit is set to `max depth + 1` to bound the
number of candidates entering the system.
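A hypothetical counter illustrating the limit. It assumes one tracker per relay parent and uses a stand-in `ValidatorIndex` type; the real bookkeeping may be organized differently.

```rust
use std::collections::HashMap;

/// Hypothetical validator index type for this sketch.
type ValidatorIndex = u32;

/// Counts `Seconded` statements per validator (assumed to be scoped to one
/// relay parent) and rejects any beyond `max_depth + 1`.
struct SecondingLimit {
    limit: usize,
    seconded: HashMap<ValidatorIndex, usize>,
}

impl SecondingLimit {
    fn new(max_depth: usize) -> Self {
        Self { limit: max_depth + 1, seconded: HashMap::new() }
    }

    /// Returns `true` if the statement is within the limit and was counted.
    fn note_seconded(&mut self, validator: ValidatorIndex) -> bool {
        let count = self.seconded.entry(validator).or_insert(0);
        if *count >= self.limit {
            return false;
        }
        *count += 1;
        true
    }
}
```

With `max_depth = 0` the limit is 1, which reproduces the pre-asynchronous-backing rule of one seconded candidate per relay parent.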
## Candidates Module
The candidates module provides a tracker for all known candidates in the view,
whether they are confirmed or not, and how peers have advertised the candidates.
What is a confirmed candidate? It is a candidate for which we have the full
receipt and the persisted validation data. This module gets confirmed candidates
from two sources:
The candidates module provides a tracker for all known candidates in the view, whether they are confirmed or not, and
how peers have advertised the candidates. What is a confirmed candidate? It is a candidate for which we have the full
receipt and the persisted validation data. This module gets confirmed candidates from two sources:
- It can be that a validator fetched a collation directly from the collator and
validated it.
- The first time a validator gets an announcement for an unknown candidate, it
will send a request for the candidate. Upon receiving a response and
validating it (see `UnhandledResponse::validate_response`), it will mark the
candidate as confirmed.
- A validator may have fetched a collation directly from the collator and validated it.
- The first time a validator gets an announcement for an unknown candidate, it will send a request for the candidate.
Upon receiving a response and validating it (see `UnhandledResponse::validate_response`), it will mark the candidate
as confirmed.
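A sketch of the confirmed/unconfirmed distinction. The types are simplified placeholders (`PeerId`, `CandidateReceipt` and `PersistedValidationData` here are not the real definitions), and `CandidateState` is a hypothetical name.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the real types.
type PeerId = u64;
#[derive(Clone, Debug)]
struct CandidateReceipt { hash: u64 }
#[derive(Clone, Debug)]
struct PersistedValidationData { parent_head: Vec<u8> }

/// A known candidate: unconfirmed until we hold both the full receipt and
/// the persisted validation data.
#[derive(Debug)]
enum CandidateState {
    Unconfirmed { advertised_by: HashSet<PeerId> },
    Confirmed { receipt: CandidateReceipt, pvd: PersistedValidationData },
}

impl CandidateState {
    fn new() -> Self {
        CandidateState::Unconfirmed { advertised_by: HashSet::new() }
    }

    /// Tracks which peers advertised the candidate while it is unconfirmed.
    fn note_advertisement(&mut self, peer: PeerId) {
        if let CandidateState::Unconfirmed { advertised_by } = self {
            advertised_by.insert(peer);
        }
    }

    /// Both sources of confirmation end up here: a collation fetched and
    /// validated locally, or a validated response to our request.
    fn confirm(&mut self, receipt: CandidateReceipt, pvd: PersistedValidationData) {
        *self = CandidateState::Confirmed { receipt, pvd };
    }

    fn is_confirmed(&self) -> bool {
        matches!(self, CandidateState::Confirmed { .. })
    }
}
```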
## Requests Module
The requests module provides a manager for pending requests for candidate data,
as well as pending responses. See "Request/Response Protocol" for a high-level
description of the flow. See module-docs for full details.
The requests module provides a manager for pending requests for candidate data, as well as pending responses. See
"Request/Response Protocol" for a high-level description of the flow. See module-docs for full details.
## Glossary
- **Acknowledgement:** A partial manifest sent to a validator that already has the
candidate to inform them that the sending node also knows the candidate.
Concludes a manifest exchange.
- **Announcement:** A full manifest indicating that a backed candidate is known by
the sending node. Initiates a manifest exchange.
- **Acknowledgement:** A partial manifest sent to a validator that already has the candidate to inform them that the
sending node also knows the candidate. Concludes a manifest exchange.
- **Announcement:** A full manifest indicating that a backed candidate is known by the sending node. Initiates a
manifest exchange.
- **Attestation:** See "Statement".
- **Backable vs. Backed:**
- Note that we sometimes use "backed" to refer to candidates that are
"backable", but not yet backed on chain.
- **Backed** should technically mean that the parablock candidate and its
backing statements have been added to a relay chain block.
- **Backable** is when the necessary backing statements have been acquired but
those statements and the parablock candidate haven't been backed in a relay
chain block yet.
- **Fragment tree:** A parachain fragment not referenced by the relay-chain.
It is a tree of prospective parachain blocks.
- **Manifest:** A message about a known backed candidate, along with a
description of the statements backing it. There are two kinds of manifest,
`Acknowledgement` and `Announcement`. See "Manifests" section.
- Note that we sometimes use "backed" to refer to candidates that are "backable", but not yet backed on chain.
- **Backed** should technically mean that the parablock candidate and its backing statements have been added to a
relay chain block.
- **Backable** is when the necessary backing statements have been acquired but those statements and the parablock
candidate haven't been backed in a relay chain block yet.
- **Fragment tree:** A parachain fragment not referenced by the relay-chain. It is a tree of prospective parachain
blocks.
- **Manifest:** A message about a known backed candidate, along with a description of the statements backing it. There
are two kinds of manifest, `Acknowledgement` and `Announcement`. See "Manifests" section.
- **Peer:** Another validator that a validator is connected to.
- **Request/response:** A protocol used to lazily request and receive heavy
candidate data when needed.
- **Reputation:** Tracks reputation of peers. Applies annoyance cost and good
behavior benefits.
- **Request/response:** A protocol used to lazily request and receive heavy candidate data when needed.
- **Reputation:** Tracks reputation of peers. Applies annoyance cost and good behavior benefits.
- **Statement:** Signed statements that can be made about parachain candidates.
- **Seconded:** Proposal of a parachain candidate. Implicit validity vote.
- **Valid:** States that a parachain candidate is valid.
@@ -474,6 +397,5 @@ description of the flow. See module-docs for full details.
- **Explicit view** / **immediate view**
- The view a peer has of the relay chain heads and highest finalized block.
- **Implicit view**
- Derived from the immediate view. Composed of active leaves and minimum
relay-parents allowed for candidates of various parachains at those
leaves.
- Derived from the immediate view. Composed of active leaves and minimum relay-parents allowed for candidates of
various parachains at those leaves.
@@ -1,6 +1,8 @@
# Collators
Collators are special nodes which bridge a parachain to the relay chain. They are simultaneously full nodes of the parachain, and at least light clients of the relay chain. Their overall contribution to the system is the generation of Proofs of Validity for parachain candidates.
Collators are special nodes which bridge a parachain to the relay chain. They are simultaneously full nodes of the
parachain, and at least light clients of the relay chain. Their overall contribution to the system is the generation of
Proofs of Validity for parachain candidates.
The **Collation Generation** subsystem triggers collators to produce collations
and then forwards them to **Collator Protocol** to circulate to validators.
The **Collation Generation** subsystem triggers collators to produce collations and then forwards them to **Collator
Protocol** to circulate to validators.
@@ -1,17 +1,18 @@
# Collation Generation
The collation generation subsystem is executed on collator nodes and produces candidates to be distributed to validators. If configured to produce collations for a para, it produces collations and then feeds them to the [Collator Protocol][CP] subsystem, which handles the networking.
The collation generation subsystem is executed on collator nodes and produces candidates to be distributed to
validators. If configured to produce collations for a para, it produces collations and then feeds them to the [Collator
Protocol][CP] subsystem, which handles the networking.
## Protocol
Collation generation for Parachains currently works in the following way:
1. A new relay chain block is imported.
2. The collation generation subsystem checks if the core associated to
the parachain is free and if yes, continues.
3. Collation generation calls our collator callback, if present, to generate a PoV. If none exists, do nothing.
4. Authoring logic determines if the current node should build a PoV.
5. Build new PoV and give it back to collation generation.
1. A new relay chain block is imported.
2. The collation generation subsystem checks if the core associated with the parachain is free and, if so, continues.
3. Collation generation calls our collator callback, if present, to generate a PoV. If none exists, do nothing.
4. Authoring logic determines if the current node should build a PoV.
5. Build new PoV and give it back to collation generation.
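The protocol steps can be sketched as a single function. This is an illustration only: `core_is_free` stands in for the step-2 check, the callback signature is simplified to take a relay block number, and `PoV` is a placeholder type.

```rust
/// Hypothetical stand-in for a proof of validity.
#[derive(Debug, PartialEq)]
struct PoV(Vec<u8>);

/// Collator callback: given the relay block number, either builds a PoV or
/// returns `None` when its authoring logic decides not to (step 4).
type CollatorCallback<'a> = &'a dyn Fn(u32) -> Option<PoV>;

/// Steps 2-5 for one imported relay chain block (step 1).
fn on_relay_block_imported(
    relay_block_number: u32,
    core_is_free: bool,
    collator: Option<CollatorCallback<'_>>,
) -> Option<PoV> {
    if !core_is_free {
        return None; // Step 2: core occupied, stop here.
    }
    let collator = collator?; // Step 3: no callback registered, do nothing.
    collator(relay_block_number) // Steps 4-5: build a PoV or decline.
}
```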
## Messages
@@ -22,8 +23,7 @@ Collation generation for Parachains currently works in the following way:
- Triggers collation generation procedure outlined in "Protocol" section.
- `CollationGenerationMessage::Initialize`
- Initializes the subsystem. Carries a config.
- No more than one initialization message should ever be sent to the collation
generation subsystem.
- No more than one initialization message should ever be sent to the collation generation subsystem.
- Sent by a collator to initialize this subsystem.
- `CollationGenerationMessage::SubmitCollation`
- If the subsystem isn't initialized or the relay-parent is too old to be relevant, ignore the message.
@@ -37,7 +37,9 @@ Collation generation for Parachains currently works in the following way:
## Functionality
The process of generating a collation for a parachain is very parachain-specific. As such, the details of how to do so are left beyond the scope of this description. The subsystem should be implemented as an abstract wrapper, which is aware of this configuration:
The process of generating a collation for a parachain is very parachain-specific. As such, the details of how to do so
are left beyond the scope of this description. The subsystem should be implemented as an abstract wrapper, which is
aware of this configuration:
```rust
/// The output of a collator.
@@ -117,30 +119,24 @@ The configuration should be optional, to allow for the case where the node is no
- **Collation (output of a collator)**
- Contains the PoV (proof to verify the state transition of the
parachain) and other data.
- Contains the PoV (proof to verify the state transition of the parachain) and other data.
- **Collation result**
- Contains the collation, and an optional result sender for a
collation-seconded signal.
- Contains the collation, and an optional result sender for a collation-seconded signal.
- **Collation seconded signal**
- The signal that is returned when a collation was seconded by a
validator.
- The signal that is returned when a collation was seconded by a validator.
- **Collation function**
- Called with the relay chain block the parablock will be built on top
of.
- Called with the relay chain block the parablock will be built on top of.
- Called with the validation data.
- Provides information about the state of the parachain on the relay
chain.
- Provides information about the state of the parachain on the relay chain.
- **Collation generation config**
- Contains collator's authentication key, optional collator function, and
parachain ID.
- Contains collator's authentication key, optional collator function, and parachain ID.
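The glossary entries above map roughly onto the following types. This is a simplified, hypothetical model: hashes and keys are plain byte arrays, the collation function is synchronous, and `std::sync::mpsc::Sender` stands in for whatever channel the real implementation uses for the collation-seconded signal.

```rust
use std::sync::mpsc::Sender;

// Hypothetical, simplified stand-ins for the real types.
type Hash = [u8; 32];
type ParaId = u32;

/// Output of a collator: the PoV (other data elided).
#[derive(Debug, Clone, PartialEq)]
struct Collation { pov: Vec<u8> }

/// Signal returned when a validator seconded the collation.
#[derive(Debug, PartialEq)]
struct CollationSecondedSignal { relay_parent: Hash }

/// The collation plus an optional sender for the collation-seconded signal.
struct CollationResult {
    collation: Collation,
    result_sender: Option<Sender<CollationSecondedSignal>>,
}

/// Validation data: the para's state on the relay chain (fields elided).
struct ValidationData { parent_head: Vec<u8> }

/// Called with the relay-parent to build on top of and the validation data.
type CollatorFn = Box<dyn Fn(Hash, &ValidationData) -> Option<CollationResult>>;

struct CollationGenerationConfig {
    key: Hash, // stand-in for the collator's authentication key
    collator: Option<CollatorFn>,
    para_id: ParaId,
}

impl CollationGenerationConfig {
    /// Runs the collator function, or returns `None` if none is configured.
    fn collate(&self, relay_parent: Hash, data: &ValidationData) -> Option<CollationResult> {
        self.collator.as_ref().and_then(|f| f(relay_parent, data))
    }
}
```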
[CP]: collator-protocol.md
@@ -1,16 +1,25 @@
# Collator Protocol
The Collator Protocol implements the network protocol by which collators and validators communicate. It is used by collators to distribute collations to validators and used by validators to accept collations by collators.
The Collator Protocol implements the network protocol by which collators and validators communicate. It is used by
collators to distribute collations to validators and used by validators to accept collations by collators.
Collator-to-Validator networking is more difficult than Validator-to-Validator networking because the set of possible collators for any given para is unbounded, unlike the validator set. Validator-to-Validator networking protocols can easily be implemented as gossip because the data can be bounded, and validators can authenticate each other by their `PeerId`s for the purposes of instantiating and accepting connections.
Collator-to-Validator networking is more difficult than Validator-to-Validator networking because the set of possible
collators for any given para is unbounded, unlike the validator set. Validator-to-Validator networking protocols can
easily be implemented as gossip because the data can be bounded, and validators can authenticate each other by their
`PeerId`s for the purposes of instantiating and accepting connections.
Since, at least at the level of the para abstraction, the collator-set for any given para is unbounded, validators need to make sure that they are receiving connections from capable and honest collators and that their bandwidth and time are not being wasted by attackers. Communicating across this trust-boundary is the most difficult part of this subsystem.
Since, at least at the level of the para abstraction, the collator-set for any given para is unbounded, validators need
to make sure that they are receiving connections from capable and honest collators and that their bandwidth and time are
not being wasted by attackers. Communicating across this trust-boundary is the most difficult part of this subsystem.
Validation of candidates is a heavy task, and furthermore, the [`PoV`][PoV] itself is a large piece of data. Empirically, `PoV`s are on the order of 10MB.
Validation of candidates is a heavy task, and furthermore, the [`PoV`][PoV] itself is a large piece of data.
Empirically, `PoV`s are on the order of 10MB.
> TODO: note the incremental validation function Ximin proposes at https://github.com/paritytech/polkadot/issues/1348
As this network protocol serves as a bridge between collators and validators, it communicates primarily with one subsystem on behalf of each. As a collator, this will receive messages from the [`CollationGeneration`][CG] subsystem. As a validator, this will communicate only with the [`CandidateBacking`][CB].
As this network protocol serves as a bridge between collators and validators, it communicates primarily with one
subsystem on behalf of each. As a collator, this will receive messages from the [`CollationGeneration`][CG] subsystem.
As a validator, this will communicate only with the [`CandidateBacking`][CB].
## Protocol
@@ -18,9 +27,9 @@ Input: [`CollatorProtocolMessage`][CPM]
Output:
- [`RuntimeApiMessage`][RAM]
- [`NetworkBridgeMessage`][NBM]
- [`CandidateBackingMessage`][CBM]
* [`RuntimeApiMessage`][RAM]
* [`NetworkBridgeMessage`][NBM]
* [`CandidateBackingMessage`][CBM]
## Functionality
@@ -28,7 +37,8 @@ This network protocol uses the `Collation` peer-set of the [`NetworkBridge`][NB]
It uses the [`CollatorProtocolV1Message`](../../types/network.md#collator-protocol) as its `WireMessage`
Since this protocol functions both for validators and collators, it is easiest to go through the protocol actions for each of them separately.
Since this protocol functions both for validators and collators, it is easiest to go through the protocol actions for
each of them separately.
Validators and collators.
```dot process
@@ -47,24 +57,44 @@ digraph {
### Collators
It is assumed that collators are only collating on a single parachain. Collations are generated by the [Collation Generation][CG] subsystem. We will keep up to one local collation per relay-parent, based on `DistributeCollation` messages. If the para is not scheduled on any core, at the relay-parent, or the relay-parent isn't in the active-leaves set, we ignore the message as it must be invalid in that case - although this indicates a logic error elsewhere in the node.
It is assumed that collators are only collating on a single parachain. Collations are generated by the [Collation
Generation][CG] subsystem. We will keep up to one local collation per relay-parent, based on `DistributeCollation`
messages. If the para is not scheduled on any core at the relay-parent, or the relay-parent isn't in the active-leaves
set, we ignore the message, as it must be invalid in that case (although this indicates a logic error elsewhere in the
node).
We keep track of the Para ID we are collating on as a collator. This starts as `None`, and is updated with each `CollateOn` message received. If the `ParaId` of a collation requested to be distributed does not match the one we expect, we ignore the message.
We keep track of the Para ID we are collating on as a collator. This starts as `None`, and is updated with each
`CollateOn` message received. If the `ParaId` of a collation requested to be distributed does not match the one we
expect, we ignore the message.
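The collator-side bookkeeping described above can be sketched as follows. `CollatorState` and its fields are hypothetical. How the scheduled paras per relay-parent are obtained is left as an assumed input, and ignoring a second collation for the same relay-parent is an assumption about what "up to one" means.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real types.
type ParaId = u32;
type Hash = u64;

#[derive(Debug, Clone, PartialEq)]
struct Collation { para_id: ParaId, pov: Vec<u8> }

#[derive(Default)]
struct CollatorState {
    /// Starts as `None`; updated by each `CollateOn` message.
    collating_on: Option<ParaId>,
    /// Maintained from `ActiveLeavesUpdate` signals.
    active_leaves: HashSet<Hash>,
    /// Paras scheduled on some core at each relay-parent (assumed input).
    scheduled: HashMap<Hash, HashSet<ParaId>>,
    /// At most one local collation per relay-parent.
    collations: HashMap<Hash, Collation>,
}

impl CollatorState {
    fn collate_on(&mut self, para: ParaId) {
        self.collating_on = Some(para);
    }

    /// Handles `DistributeCollation`; returns `true` if the collation is kept.
    fn distribute_collation(&mut self, relay_parent: Hash, collation: Collation) -> bool {
        let scheduled = self
            .scheduled
            .get(&relay_parent)
            .map_or(false, |paras| paras.contains(&collation.para_id));
        if self.collating_on != Some(collation.para_id)
            || !self.active_leaves.contains(&relay_parent)
            || !scheduled
            // Assumption: a second collation for the same relay-parent is ignored.
            || self.collations.contains_key(&relay_parent)
        {
            return false;
        }
        self.collations.insert(relay_parent, collation);
        true
    }
}
```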
As with most other subsystems, we track the active leaves set by following `ActiveLeavesUpdate` signals.
For the purposes of actually distributing a collation, we need to be connected to the validators who are interested in collations on that `ParaId` at this point in time. We assume that there is a discovery API for connecting to a set of validators.
For the purposes of actually distributing a collation, we need to be connected to the validators who are interested in
collations on that `ParaId` at this point in time. We assume that there is a discovery API for connecting to a set of
validators.
As seen in the [Scheduler Module][SCH] of the runtime, validator groups are fixed for an entire session and their rotations across cores are predictable. Collators will want to do these things when attempting to distribute collations at a given relay-parent:
As seen in the [Scheduler Module][SCH] of the runtime, validator groups are fixed for an entire session and their
rotations across cores are predictable. Collators will want to do these things when attempting to distribute collations
at a given relay-parent:
* Determine which core the para collated-on is assigned to.
* Determine the group on that core.
* Issue a discovery request for the validators of the current group with[`NetworkBridgeMessage`][NBM]`::ConnectToValidators`.
* Issue a discovery request for the validators of the current group
with [`NetworkBridgeMessage`][NBM]`::ConnectToValidators`.
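A sketch of the three steps. It assumes the rotation rule "group index = (core index + number of rotations since session start) mod number of groups". The helpers and their signatures are hypothetical. `now` is assumed to be at or after `session_start`, and the functions panic if there are no groups or if `rotation_frequency` is zero.

```rust
// Hypothetical stand-ins for the real types.
type ParaId = u32;
type ValidatorId = u32;

/// Step 1: the core the para is scheduled on, if any.
fn core_for_para(cores: &[Option<ParaId>], para: ParaId) -> Option<usize> {
    cores.iter().position(|c| *c == Some(para))
}

/// Step 2: groups are fixed for the session and rotate across cores every
/// `rotation_frequency` blocks. The exact formula is an assumption.
fn group_for_core(core: usize, n_groups: usize, session_start: u32, now: u32, rotation_frequency: u32) -> usize {
    let rotations = ((now - session_start) / rotation_frequency) as usize;
    (core + rotations) % n_groups
}

/// Step 3: the validators to pass to `ConnectToValidators`.
fn validators_to_connect<'a>(
    cores: &[Option<ParaId>],
    para: ParaId,
    groups: &'a [Vec<ValidatorId>],
    session_start: u32,
    now: u32,
    rotation_frequency: u32,
) -> Option<&'a [ValidatorId]> {
    let core = core_for_para(cores, para)?;
    let group = group_for_core(core, groups.len(), session_start, now, rotation_frequency);
    Some(groups[group].as_slice())
}
```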
Once connected to the relevant peers for the current group assigned to the core (transitively, the para), advertise the collation to any of them which advertise the relay-parent in their view (as provided by the [Network Bridge][NB]). If any respond with a request for the full collation, provide it. However, we only send one collation at a time per relay parent, other requests need to wait. This is done to reduce the bandwidth requirements of a collator and also increases the chance to fully send the collation to at least one validator. From the point where one validator has received the collation and seconded it, it will also start to share this collation with other validators in its backing group. Upon receiving a view update from any of these peers which includes a relay-parent for which we have a collation that they will find relevant, advertise the collation to them if we haven't already.
Once connected to the relevant peers for the current group assigned to the core (transitively, the para), advertise the
collation to any of them which advertise the relay-parent in their view (as provided by the [Network Bridge][NB]). If
any respond with a request for the full collation, provide it. However, we only send one collation at a time per relay
parent; other requests need to wait. This reduces the bandwidth requirements of a collator and also increases the
chance of fully sending the collation to at least one validator. From the point where one validator has received the
collation and seconded it, it will also start to share this collation with other validators in its backing group. Upon
receiving a view update from any of these peers which includes a relay-parent for which we have a collation that they
will find relevant, advertise the collation to them if we haven't already.
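The one-collation-at-a-time rule amounts to a small per-relay-parent queue. `CollationUploads` is a hypothetical name, and peers are plain integers here.

```rust
use std::collections::VecDeque;

type PeerId = u64;

/// Per-relay-parent upload state: at most one full collation is being sent;
/// further requests wait their turn in arrival order.
#[derive(Default)]
struct CollationUploads {
    active: Option<PeerId>,
    waiting: VecDeque<PeerId>,
}

impl CollationUploads {
    /// Returns `true` if we start sending to `peer` immediately.
    fn on_request(&mut self, peer: PeerId) -> bool {
        if self.active.is_none() {
            self.active = Some(peer);
            true
        } else {
            self.waiting.push_back(peer);
            false
        }
    }

    /// Called when the current upload finishes; returns the next peer to serve.
    fn on_upload_finished(&mut self) -> Option<PeerId> {
        self.active = self.waiting.pop_front();
        self.active
    }
}
```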
### Validators
On the validator side of the protocol, validators need to accept incoming connections from collators. They should keep some peer slots open for accepting new speculative connections from collators and should disconnect from collators who are not relevant.
On the validator side of the protocol, validators need to accept incoming connections from collators. They should keep
some peer slots open for accepting new speculative connections from collators and should disconnect from collators who
are not relevant.
```dot process
digraph G {
@@ -98,32 +128,62 @@ digraph G {
}
```
When peers connect to us, they can `Declare` that they represent a collator with given public key and intend to collate on a specific para ID. Once they've declared that, and we checked their signature, they can begin to send advertisements of collations. The peers should not send us any advertisements for collations that are on a relay-parent outside of our view or for a para outside of the one they've declared.
When peers connect to us, they can `Declare` that they represent a collator with a given public key and intend to
collate on a specific para ID. Once they've declared that, and we have checked their signature, they can begin to send
advertisements of collations. The peers should not send us any advertisements for collations that are on a
relay-parent outside of our view or for a para outside of the one they've declared.
The protocol tracks advertisements received and the source of the advertisement. The advertisement source is the `PeerId` of the peer who sent the message. We accept one advertisement per collator per source per relay-parent.
The protocol tracks advertisements received and the source of the advertisement. The advertisement source is the
`PeerId` of the peer who sent the message. We accept one advertisement per collator per source per relay-parent.
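Both advertisement rules above (only for the declared para and for relay-parents in our view, and at most one per collator per source per relay-parent) can be sketched as follows, with hypothetical placeholder types.

```rust
use std::collections::HashSet;

// Hypothetical stand-ins for the real types.
type CollatorId = u32;
type PeerId = u64;
type ParaId = u32;
type Hash = u64;

/// Rejects advertisements for a para other than the declared one, or for a
/// relay-parent outside our view.
fn advertisement_allowed(view: &HashSet<Hash>, declared: Option<ParaId>, para: ParaId, relay_parent: Hash) -> bool {
    declared == Some(para) && view.contains(&relay_parent)
}

/// Accepts at most one advertisement per collator per source per relay-parent.
#[derive(Default)]
struct Advertisements {
    seen: HashSet<(CollatorId, PeerId, Hash)>,
}

impl Advertisements {
    /// Returns `false` for a repeated advertisement.
    fn note(&mut self, collator: CollatorId, source: PeerId, relay_parent: Hash) -> bool {
        self.seen.insert((collator, source, relay_parent))
    }
}
```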
As a validator, we will handle requests from other subsystems to fetch a collation on a specific `ParaId` and relay-parent. These requests are made with the request response protocol `CollationFetchingRequest` request. To do so, we need to first check if we have already gathered a collation on that `ParaId` and relay-parent. If not, we need to select one of the advertisements and issue a request for it. If we've already issued a request, we shouldn't issue another one until the first has returned.
As a validator, we will handle requests from other subsystems to fetch a collation on a specific `ParaId` and
relay-parent. These requests are made via the `CollationFetchingRequest` request/response protocol. To do so,
we need to first check if we have already gathered a collation on that `ParaId` and relay-parent. If not, we need to
select one of the advertisements and issue a request for it. If we've already issued a request, we shouldn't issue
another one until the first has returned.
When acting on an advertisement, we issue a `Requests::CollationFetchingV1`. However, we only request one collation at a time per relay parent. This reduces the bandwidth requirements and as we can second only one candidate per relay parent, the others are probably not required anyway. If the request times out, we need to note the collator as being unreliable and reduce its priority relative to other collators.
When acting on an advertisement, we issue a `Requests::CollationFetchingV1`. However, we only request one collation at a
time per relay parent. This reduces bandwidth requirements, and since we can second only one candidate per relay parent,
the others are probably not required anyway. If the request times out, we need to note the collator as being unreliable
and reduce its priority relative to other collators.
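A sketch of the per-relay-parent fetch logic, assuming integer placeholder IDs. Deprioritizing an unreliable collator is modelled here as preferring pending advertisements from collators with fewer recorded timeouts; this is one possible policy, not necessarily the one the implementation uses.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical stand-ins for the real types.
type CollatorId = u32;
type Hash = u64;

/// Per-relay-parent fetch state.
#[derive(Default)]
struct FetchState {
    in_flight: Option<CollatorId>,
    pending: VecDeque<CollatorId>,
    fetched: bool,
}

#[derive(Default)]
struct Fetcher {
    per_relay_parent: HashMap<Hash, FetchState>,
    /// Timed-out requests per collator; more timeouts means lower priority.
    timeouts: HashMap<CollatorId, u32>,
}

impl Fetcher {
    /// Returns the collator to send a `CollationFetchingV1` request to now, if any.
    fn on_advertisement(&mut self, relay_parent: Hash, collator: CollatorId) -> Option<CollatorId> {
        let state = self.per_relay_parent.entry(relay_parent).or_default();
        if state.fetched {
            return None; // Already have a collation for this relay parent.
        }
        if state.in_flight.is_some() {
            state.pending.push_back(collator); // Wait for the current request.
            return None;
        }
        state.in_flight = Some(collator);
        Some(collator)
    }

    fn on_fetched(&mut self, relay_parent: Hash) {
        if let Some(state) = self.per_relay_parent.get_mut(&relay_parent) {
            state.in_flight = None;
            state.pending.clear();
            state.fetched = true;
        }
    }

    /// Notes the timeout, then picks the pending advertisement whose collator
    /// has the fewest timeouts (ties go to the earliest advertisement).
    fn on_timeout(&mut self, relay_parent: Hash) -> Option<CollatorId> {
        let state = self.per_relay_parent.get_mut(&relay_parent)?;
        let timed_out = state.in_flight.take()?;
        *self.timeouts.entry(timed_out).or_insert(0) += 1;
        let timeouts = &self.timeouts;
        let best = state
            .pending
            .iter()
            .enumerate()
            .min_by_key(|&(_, c)| timeouts.get(c).copied().unwrap_or(0))
            .map(|(i, _)| i)?;
        let next = state.pending.remove(best)?;
        state.in_flight = Some(next);
        Some(next)
    }
}
```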
As a validator, once the collation has been fetched some other subsystem will inspect and do deeper validation of the collation. The subsystem will report to this subsystem with a [`CollatorProtocolMessage`][CPM]`::ReportCollator`. In that case, if we are connected directly to the collator, we apply a cost to the `PeerId` associated with the collator and potentially disconnect or blacklist it. If the collation is seconded, we notify the collator and apply a benefit to the `PeerId` associated with the collator.
As a validator, once the collation has been fetched, some other subsystem will inspect and do deeper validation of the
collation. If the collation turns out to be invalid, that subsystem reports it to this subsystem with a
[`CollatorProtocolMessage`][CPM]`::ReportCollator`. In that case, if we are connected directly to the collator, we
apply a cost to the `PeerId` associated with the collator
and potentially disconnect or blacklist it. If the collation is seconded, we notify the collator and apply a benefit to
the `PeerId` associated with the collator.
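A minimal sketch of the reputation handling. The cost, benefit and disconnect threshold values are invented for illustration, and `Reputation` is a hypothetical type.

```rust
use std::collections::HashMap;

type PeerId = u64;

// Hypothetical values; the real cost/benefit constants are not given here.
const COST_REPORTED: i32 = -100;
const BENEFIT_SECONDED: i32 = 50;
const DISCONNECT_THRESHOLD: i32 = -200;

#[derive(Debug, PartialEq)]
enum Action {
    Keep,
    Disconnect,
}

#[derive(Default)]
struct Reputation {
    scores: HashMap<PeerId, i32>,
}

impl Reputation {
    /// `ReportCollator`: apply a cost; disconnect once the score drops too low.
    fn report(&mut self, peer: PeerId) -> Action {
        let score = self.scores.entry(peer).or_insert(0);
        *score += COST_REPORTED;
        if *score <= DISCONNECT_THRESHOLD { Action::Disconnect } else { Action::Keep }
    }

    /// Collation seconded: apply a benefit (notifying the collator is separate).
    fn seconded(&mut self, peer: PeerId) {
        *self.scores.entry(peer).or_insert(0) += BENEFIT_SECONDED;
    }
}
```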
### Interaction with [Candidate Backing][CB]
As collators advertise the availability, a validator will simply second the first valid parablock candidate per relay head by sending a [`CandidateBackingMessage`][CBM]`::Second`. Note that this message contains the relay parent of the advertised collation, the candidate receipt and the [PoV][PoV].
As collators advertise the availability of their collations, a validator will simply second the first valid parablock
candidate per relay head by sending a [`CandidateBackingMessage`][CBM]`::Second`. Note that this message contains the
relay parent of the advertised collation, the candidate receipt and the [PoV][PoV].
Subsequently, once a valid parablock candidate has been seconded, the [`CandidateBacking`][CB] subsystem will send a [`CollatorProtocolMessage`][CPM]`::Seconded`, which will trigger this subsystem to notify the collator at the `PeerId` that first advertised the parablock on the seconded relay head of their successful seconding.
Subsequently, once a valid parablock candidate has been seconded, the [`CandidateBacking`][CB] subsystem will send a
[`CollatorProtocolMessage`][CPM]`::Seconded`, which will trigger this subsystem to notify the collator (at the `PeerId`
that first advertised the parablock on the seconded relay head) that its collation was successfully seconded.
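The seconding rule and the later notification can be sketched as follows. Keying the first advertiser by relay parent and candidate hash is an assumption, as are the placeholder types and the `SecondingTracker` name.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real types.
type Hash = u64;
type CandidateHash = u64;
type PeerId = u64;

#[derive(Default)]
struct SecondingTracker {
    /// First peer to advertise each candidate, per relay parent.
    first_advertiser: HashMap<(Hash, CandidateHash), PeerId>,
    /// Relay parents for which we already sent `CandidateBackingMessage::Second`.
    seconded: HashSet<Hash>,
}

impl SecondingTracker {
    fn note_advertisement(&mut self, relay_parent: Hash, candidate: CandidateHash, peer: PeerId) {
        // `or_insert` keeps the first advertiser and ignores later ones.
        self.first_advertiser.entry((relay_parent, candidate)).or_insert(peer);
    }

    /// Returns `true` only for the first valid candidate per relay parent.
    fn should_second(&mut self, relay_parent: Hash) -> bool {
        self.seconded.insert(relay_parent)
    }

    /// On `CollatorProtocolMessage::Seconded`: which peer to notify.
    fn peer_to_notify(&self, relay_parent: Hash, candidate: CandidateHash) -> Option<PeerId> {
        self.first_advertiser.get(&(relay_parent, candidate)).copied()
    }
}
```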
## Future Work
Several approaches have been discussed, but all have some issues:
- The current approach is very straightforward. However, that protocol is vulnerable to a single collator which, as an attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
- If collators produce blocks via Aura, BABE or in future Sassafras, it may be possible to choose an "Official" collator for the round, but it may be tricky to ensure that the PVF logic is enforced at collator leader election.
- We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, +- 1 second. The collator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`, the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight schedule.
- A variation of that scheme would be to have a fixed acceptance window `D` for parablock candidates and keep track of count `C`: the number of parablock candidates received. At the end of the period `D`, we choose a random number I in the range `[0, C)` and second the block at Index I. Its drawback is the same: it must wait the full `D` period before seconding any of its received candidates, reducing throughput.
- In order to protect against DoS attacks, it may be prudent to run throw out collations from collators that have behaved poorly (whether recently or historically) and subsequently only verify the PoV for the most suitable of collations.
* The current approach is very straightforward. However, that protocol is vulnerable to a single collator which, as an
attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
* If collators produce blocks via Aura, BABE or, in the future, Sassafras, it may be possible to choose an "Official"
collator for the round, but it may be tricky to ensure that the PVF logic is enforced at collator leader election.
* We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, +/- 1 second. The
validator would then second the first valid parablock which arrives after `D`, or in case none has arrived by `2*D`,
the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get
its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight
schedule.
* A variation of that scheme would be to have a fixed acceptance window `D` for parablock candidates and keep track of
count `C`: the number of parablock candidates received. At the end of the period `D`, we choose a random number `I` in
the range `[0, C)` and second the block at index `I`. Its drawback is the same: it must wait the full `D` period before
seconding any of its received candidates, reducing throughput.
* In order to protect against DoS attacks, it may be prudent to throw out collations from collators that have
behaved poorly (whether recently or historically) and subsequently only verify the PoV for the most suitable of
collations.
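The two randomized selection schemes can be sketched as follows. Both are illustrative only: times are in milliseconds, candidates are plain integers, and the randomness input is a stand-in for relay-chain BABE randomness.

```rust
/// Delay scheme: `arrivals` holds (arrival time in ms, candidate) pairs sorted
/// by arrival time, and `d` is the delay `D`. Seconds the first candidate
/// arriving after `D`; if none arrived by `2*D`, the last one that had arrived.
fn select_with_delay(arrivals: &[(u64, u32)], d: u64) -> Option<u32> {
    arrivals
        .iter()
        .find(|&&(t, _)| t > d && t <= 2 * d)
        .or_else(|| arrivals.iter().filter(|&&(t, _)| t <= 2 * d).last())
        .map(|&(_, c)| c)
}

/// Fixed-window scheme: after the window `D`, second the candidate at index
/// `I = randomness % C`, where `C` is the number of candidates received.
/// Reducing modulo `C` is a simplification that introduces a slight bias.
fn select_in_window(received: &[u32], randomness: u64) -> Option<u32> {
    if received.is_empty() {
        return None;
    }
    let i = (randomness % received.len() as u64) as usize;
    Some(received[i])
}
```

Both functions make the throughput cost visible: neither can return before the window (`D` or `2*D`) has elapsed.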
[CB]: ../backing/candidate-backing.md
[CBM]: ../../types/overseer-protocol.md#candidate-backing-mesage
@@ -4,12 +4,12 @@ If approval voting finds an invalid candidate, a dispute is raised. The disputes
subsystems are concerned with the following:
1. Disputes can be raised
2. Disputes (votes) get propagated to all other validators
3. Votes get recorded as necessary
3. Nodes will participate in disputes in a sensible fashion
4. Finality is stopped while a candidate is being disputed on chain
5. Chains can be reverted in case a dispute concludes invalid
6. Votes are provided to the provisioner for importing on chain, in order for
1. Disputes (votes) get propagated to all other validators
1. Votes get recorded as necessary
1. Nodes will participate in disputes in a sensible fashion
1. Finality is stopped while a candidate is being disputed on chain
1. Chains can be reverted in case a dispute concludes invalid
1. Votes are provided to the provisioner for importing on chain, in order for
slashing to work.
The dispute-coordinator subsystem interfaces with the provisioner and chain
File diff suppressed because it is too large
@@ -202,8 +202,8 @@ the dispute-coordinator already knows about the dispute.
Goal 3 and 4 are obviously very related and both can easily be solved via rate
limiting as we shall see below. Rate limits should already be implemented at the
substrate level, but [are not](https://github.com/paritytech/substrate/issues/7750)
at the time of writing. But even if they were, the enforced substrate limits would
Substrate level, but [are not](https://github.com/paritytech/substrate/issues/7750)
at the time of writing. But even if they were, the enforced Substrate limits would
likely not be configurable and thus would still be too high for our needs as we can
rely on the following observations:
@@ -282,10 +282,10 @@ well, we will do the following:
to assume this is concerning a new dispute.
2. We open a batch and start collecting incoming messages for that candidate,
instead of immediately forwarding.
4. We keep collecting votes in the batch until we receive less than
3. We keep collecting votes in the batch until we receive fewer than
`MIN_KEEP_BATCH_ALIVE_VOTES` unique votes in the last `BATCH_COLLECTING_INTERVAL`. This is
important to accommodate goal 5 and also goal 3.
5. We send the whole batch to the dispute-coordinator.
4. We send the whole batch to the dispute-coordinator.
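The batching rule above can be sketched in Rust as follows. This is a minimal illustration, not the dispute-distribution implementation: the `Batch` type, the `ValidatorIndex` alias and the value chosen for `MIN_KEEP_BATCH_ALIVE_VOTES` are assumptions, and the caller is assumed to invoke `tick` once per `BATCH_COLLECTING_INTERVAL`.

```rust
use std::collections::HashSet;

/// Illustrative value only; the real constant may differ.
const MIN_KEEP_BATCH_ALIVE_VOTES: usize = 10;

/// Hypothetical identifier for the validator that cast a vote.
pub type ValidatorIndex = u32;

/// Collects votes for a single disputed candidate until the inflow of new
/// unique votes drops below `MIN_KEEP_BATCH_ALIVE_VOTES` per interval.
pub struct Batch {
    /// Every unique voter seen so far.
    voters: HashSet<ValidatorIndex>,
    /// Unique voters that arrived since the previous tick.
    new_in_interval: usize,
}

impl Batch {
    pub fn new() -> Self {
        Batch { voters: HashSet::new(), new_in_interval: 0 }
    }

    /// Record an incoming vote; duplicates do not keep the batch alive.
    pub fn add_vote(&mut self, voter: ValidatorIndex) {
        if self.voters.insert(voter) {
            self.new_in_interval += 1;
        }
    }

    /// Call once per `BATCH_COLLECTING_INTERVAL`. Returns `None` while the batch
    /// should keep collecting, or the sorted voters once it should be sent to
    /// the dispute-coordinator.
    pub fn tick(&mut self) -> Option<Vec<ValidatorIndex>> {
        let keep_alive = self.new_in_interval >= MIN_KEEP_BATCH_ALIVE_VOTES;
        self.new_in_interval = 0;
        if keep_alive {
            return None;
        }
        let mut voters: Vec<ValidatorIndex> = self.voters.drain().collect();
        voters.sort_unstable();
        Some(voters)
    }
}
```

Counting only unique voters means a single peer resending the same vote cannot keep a batch open indefinitely.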
This together with rate limiting explained above ensures we will be able to
process valid disputes: We can limit the number of simultaneous existing batches
@@ -312,8 +312,8 @@ of attackers, each has 10 messages per second, all are needed to maintain the
batches in memory. Therefore we have a hard cap of around 330 (number of
malicious nodes) open batches. Each can be filled with one vote from every malicious
actor. So 330 batches with 330 votes each: Let's assume approximately 100
bytes per signature/vote. This results in a worst case memory usage of 330 * 330
* 100 ~= 10 MiB.
bytes per signature/vote. This results in a worst case memory usage of
`330 * 330 * 100 ~= 10 MiB`.
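The worst-case estimate is simple arithmetic, and a small helper makes its quadratic scaling explicit. The function names are hypothetical, and the `3_330` figure used below for the 10_000-validator case assumes roughly one third of the validators are malicious, which is consistent with 330 for a set of about 1_000.

```rust
/// Worst-case memory held by open batches: one batch per malicious node,
/// each holding one vote from every malicious node.
pub fn worst_case_batch_bytes(malicious_nodes: u64, bytes_per_vote: u64) -> u64 {
    malicious_nodes * malicious_nodes * bytes_per_vote
}

/// Convert a byte count to mebibytes (2^20 bytes).
pub fn bytes_to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 20) as f64
}

/// Convert a byte count to gibibytes (2^30 bytes).
pub fn bytes_to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

With 330 malicious nodes this gives 10_890_000 bytes (about 10.4 MiB); with 3_330 it gives 1_108_890_000 bytes (about 1.03 GiB), in line with the estimate in the text.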
For 10_000 validators, we are already in the gigabyte range, which means that
with a validator set that large we might want to be stricter with the rate limit or
@@ -1,10 +1,25 @@
# GRANDPA Voting Rule
Specifics on the motivation and types of constraints we apply to the GRANDPA voting logic as well as the definitions of **viable** and **finalizable** blocks can be found in the [Chain Selection Protocol](../protocol-chain-selection.md) section.
The subsystem which provides us with viable leaves is the [Chain Selection Subsystem](utility/chain-selection.md).
Specifics on the motivation and types of constraints we apply to the GRANDPA voting logic as well as the definitions of
**viable** and **finalizable** blocks can be found in the [Chain Selection Protocol](../protocol-chain-selection.md)
section. The subsystem which provides us with viable leaves is the [Chain Selection
Subsystem](utility/chain-selection.md).
GRANDPA's regular voting rule is for each validator to select the longest chain they are aware of. GRANDPA proceeds in rounds, collecting information from all online validators and determines the blocks that a supermajority of validators all have in common with each other.
GRANDPA's regular voting rule is for each validator to select the longest chain they are aware of. GRANDPA proceeds in
rounds, collecting information from all online validators and determining the blocks that a supermajority of validators
all have in common with each other.
The low-level GRANDPA logic will provide us with a **required block**. We can find the best leaf containing that block in its chain with the [`ChainSelectionMessage::BestLeafContaining`](../types/overseer-protocol.md#chain-selection-message). If the result is `None`, then we will simply cast a vote on the required block.
The low-level GRANDPA logic will provide us with a **required block**. We can find the best leaf containing that block
in its chain with the
[`ChainSelectionMessage::BestLeafContaining`](../types/overseer-protocol.md#chain-selection-message). If the result is
`None`, then we will simply cast a vote on the required block.
The **viable** leaves provided from the chain selection subsystem are not necessarily **finalizable**, so we need to perform further work to discover the finalizable ancestor of the block. The first constraint is to avoid voting on any unapproved block. The highest approved ancestor of a given block can be determined by querying the Approval Voting subsystem via the [`ApprovalVotingMessage::ApprovedAncestor`](../types/overseer-protocol.md#approval-voting) message. If the response is `Some`, we continue and apply the second constraint. The second constraint is to avoid voting on any block containing a candidate undergoing an active dispute. The list of block hashes and candidates returned from `ApprovedAncestor` should be reversed, and passed to the [`DisputeCoordinatorMessage::DetermineUndisputedChain`](../types/overseer-protocol.md#dispute-coordinator-message) to determine the **finalizable** block which will be our eventual vote.
The **viable** leaves provided from the chain selection subsystem are not necessarily **finalizable**, so we need to
perform further work to discover the finalizable ancestor of the block. The first constraint is to avoid voting on any
unapproved block. The highest approved ancestor of a given block can be determined by querying the Approval Voting
subsystem via the [`ApprovalVotingMessage::ApprovedAncestor`](../types/overseer-protocol.md#approval-voting) message. If
the response is `Some`, we continue and apply the second constraint. The second constraint is to avoid voting on any
block containing a candidate undergoing an active dispute. The list of block hashes and candidates returned from
`ApprovedAncestor` should be reversed and passed to the
[`DisputeCoordinatorMessage::DetermineUndisputedChain`](../types/overseer-protocol.md#dispute-coordinator-message) to
determine the **finalizable** block which will be our eventual vote.
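Put together, the vote-selection procedure might look like the sketch below. The trait `VotingBackend` and its methods are hypothetical stand-ins for sending the three named messages through the overseer. The ordering of the `ApprovedAncestor` list, the use of the required block as the base for `DetermineUndisputedChain`, and the fallback when `ApprovedAncestor` returns `None` are assumptions, since the text does not pin them down.

```rust
/// Hypothetical hash types used only for this sketch.
pub type Hash = u64;
pub type CandidateHash = u64;

/// A block together with the candidates it includes.
#[derive(Clone, Debug, PartialEq)]
pub struct BlockDescription {
    pub hash: Hash,
    pub candidates: Vec<CandidateHash>,
}

/// Hypothetical stand-ins for the overseer requests named in the text.
pub trait VotingBackend {
    /// Models `ChainSelectionMessage::BestLeafContaining`.
    fn best_leaf_containing(&self, required: Hash) -> Option<Hash>;
    /// Models `ApprovalVotingMessage::ApprovedAncestor`. Assumed to list blocks
    /// from the highest approved block downwards.
    fn approved_ancestor(&self, leaf: Hash) -> Option<Vec<BlockDescription>>;
    /// Models `DisputeCoordinatorMessage::DetermineUndisputedChain`. Receives
    /// blocks in ascending order and returns the highest undisputed one,
    /// or `base` if none qualifies.
    fn determine_undisputed_chain(&self, base: Hash, chain: Vec<BlockDescription>) -> Hash;
}

/// Choose the block to vote on for a given required block.
pub fn select_vote<B: VotingBackend>(backend: &B, required: Hash) -> Hash {
    // No viable leaf contains the required block: vote on the required block.
    let Some(leaf) = backend.best_leaf_containing(required) else {
        return required;
    };
    // First constraint: only approved blocks. Falling back to the required
    // block on `None` is an assumption of this sketch.
    let Some(mut approved) = backend.approved_ancestor(leaf) else {
        return required;
    };
    // Second constraint: avoid blocks containing disputed candidates.
    approved.reverse();
    backend.determine_undisputed_chain(required, approved)
}
```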
@@ -24,27 +24,44 @@ The hierarchy of subsystems:
```
The overseer determines work to do based on block import events and block finalization events. It does this by keeping track of the set of relay-parents for which work is currently being done. This is known as the "active leaves" set. It determines an initial set of active leaves on startup based on the data on-disk, and uses events about blockchain import to update the active leaves. Updates lead to [`OverseerSignal`](../types/overseer-protocol.md#overseer-signal)`::ActiveLeavesUpdate` being sent according to new relay-parents, as well as relay-parents to stop considering. Block import events inform the overseer of leaves that no longer need to be built on, now that they have children, and inform us to begin building on those children. Block finalization events inform us when we can stop focusing on blocks that appear to have been orphaned.
The overseer determines work to do based on block import events and block finalization events. It does this by keeping
track of the set of relay-parents for which work is currently being done. This is known as the "active leaves" set. It
determines an initial set of active leaves on startup based on on-disk data, and uses events about blockchain import
to update the active leaves. Updates lead to
[`OverseerSignal`](../types/overseer-protocol.md#overseer-signal)`::ActiveLeavesUpdate` being sent according to new
relay-parents, as well as relay-parents to stop considering. Block import events inform the overseer of leaves that no
longer need to be built on, now that they have children, and inform us to begin building on those children. Block
finalization events inform us when we can stop focusing on blocks that appear to have been orphaned.
The overseer is also responsible for tracking the freshness of active leaves. Leaves are fresh when they're encountered for the first time, and stale when they're encountered for subsequent times. This can occur after chain reversions or when the fork-choice rule abandons some chain. This distinction is used to manage **Reversion Safety**. Consensus messages are often localized to a specific relay-parent, and it is often a misbehavior to equivocate or sign two conflicting messages. When reverting the chain, we may begin work on a leaf that subsystems have already signed messages for. Subsystems which need to account for reversion safety should avoid performing work on stale leaves.
The overseer is also responsible for tracking the freshness of active leaves. Leaves are fresh when they're encountered
for the first time, and stale when they're encountered again. This can occur after chain reversions or when the
fork-choice rule abandons some chain. This distinction is used to manage **Reversion Safety**. Consensus messages are
often localized to a specific relay-parent, and it is often considered misbehavior to equivocate, i.e. to sign two
conflicting messages. When reverting the chain, we may begin work on a leaf that subsystems have already signed messages
for. Subsystems which need to account for reversion safety should avoid performing work on stale leaves.
The overseer's logic can be described with these functions:
## On Startup
* Start all subsystems
* Determine all blocks of the blockchain that should be built on. This should typically be the head of the best fork of the chain we are aware of. Sometimes add recent forks as well.
* Determine all blocks of the blockchain that should be built on. This should typically be the head of the best fork of
the chain we are aware of. Sometimes add recent forks as well.
* Send an `OverseerSignal::ActiveLeavesUpdate` to all subsystems with `activated` containing each of these blocks.
* Begin listening for block import and finality events
## On Block Import Event
* Apply the block import event to the active leaves. A new block should lead to its addition to the active leaves set and its parent being deactivated.
* Mark any stale leaves as stale. The overseer should track all leaves it activates to determine whether leaves are fresh or stale.
* Send an `OverseerSignal::ActiveLeavesUpdate` message to all subsystems containing all activated and deactivated leaves.
* Apply the block import event to the active leaves. A new block should lead to its addition to the active leaves set
and its parent being deactivated.
* Mark any stale leaves as stale. The overseer should track all leaves it activates to determine whether leaves are
fresh or stale.
* Send an `OverseerSignal::ActiveLeavesUpdate` message to all subsystems containing all activated and deactivated
leaves.
* Ensure all `ActiveLeavesUpdate` messages are flushed before resuming activity as a message router.
> TODO: in the future, we may want to avoid building on too many sibling blocks at once. the notion of a "preferred head" among many competing sibling blocks would imply changes in our "active leaves" update rules here
> TODO: in the future, we may want to avoid building on too many sibling blocks at once. The notion of a "preferred
> head" among many competing sibling blocks would imply changes in our "active leaves" update rules here.
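A minimal sketch of the block-import handling and freshness tracking described above. `ActiveLeaves`, `LeafStatus` and this simplified `ActiveLeavesUpdate` struct are illustrative only; the real overseer types carry more information per leaf.

```rust
use std::collections::HashSet;

pub type Hash = u64;

/// Whether a leaf is being activated for the first time.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LeafStatus {
    Fresh,
    Stale,
}

/// Simplified update sent to all subsystems after an import.
#[derive(Debug, Default, PartialEq)]
pub struct ActiveLeavesUpdate {
    pub activated: Vec<(Hash, LeafStatus)>,
    pub deactivated: Vec<Hash>,
}

#[derive(Default)]
pub struct ActiveLeaves {
    /// Leaves currently being built on.
    active: HashSet<Hash>,
    /// Every leaf ever activated, used to tell fresh leaves from stale ones.
    seen: HashSet<Hash>,
}

impl ActiveLeaves {
    /// Activate a leaf (on startup, import or reversion) and classify it.
    /// Returns `None` if the leaf was already active.
    pub fn activate(&mut self, leaf: Hash) -> Option<(Hash, LeafStatus)> {
        if !self.active.insert(leaf) {
            return None;
        }
        let status = if self.seen.insert(leaf) { LeafStatus::Fresh } else { LeafStatus::Stale };
        Some((leaf, status))
    }

    /// Apply a block import: the new block becomes a leaf, its parent stops being one.
    pub fn on_block_import(&mut self, block: Hash, parent: Hash) -> ActiveLeavesUpdate {
        let mut update = ActiveLeavesUpdate::default();
        if self.active.remove(&parent) {
            update.deactivated.push(parent);
        }
        update.activated.extend(self.activate(block));
        update
    }
}
```

Keeping the `seen` set separate from `active` is what lets a leaf that returns after a reversion be reported as stale.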
## On Finalization Event
@@ -54,11 +71,16 @@ The overseer's logic can be described with these functions:
## On Subsystem Failure
Subsystems are essential tasks meant to run as long as the node does. Subsystems can spawn ephemeral work in the form of jobs, but the subsystems themselves should not go down. If a subsystem goes down, it will be because of a critical error that should take the entire node down as well.
Subsystems are essential tasks meant to run as long as the node does. Subsystems can spawn ephemeral work in the form of
jobs, but the subsystems themselves should not go down. If a subsystem goes down, it will be because of a critical error
that should take the entire node down as well.
## Communication Between Subsystems
When a subsystem wants to communicate with another subsystem, or, more typically, a job within a subsystem wants to communicate with its counterpart under another subsystem, that communication must happen via the overseer. Consider this example where a job on subsystem A wants to send a message to its counterpart under subsystem B. This is a realistic scenario, where you can imagine that both jobs correspond to work under the same relay-parent.
When a subsystem wants to communicate with another subsystem, or, more typically, a job within a subsystem wants to
communicate with its counterpart under another subsystem, that communication must happen via the overseer. Consider this
example where a job on subsystem A wants to send a message to its counterpart under subsystem B. This is a realistic
scenario, where you can imagine that both jobs correspond to work under the same relay-parent.
```text
+--------+ +--------+
@@ -78,21 +100,48 @@ When a subsystem wants to communicate with another subsystem, or, more typically
+------------------------------+
```
First, the subsystem that spawned a job is responsible for handling the first step of the communication. The overseer is not aware of the hierarchy of tasks within any given subsystem and is only responsible for subsystem-to-subsystem communication. So the sending subsystem must pass on the message via the overseer to the receiving subsystem, in such a way that the receiving subsystem can further address the communication to one of its internal tasks, if necessary.
First, the subsystem that spawned a job is responsible for handling the first step of the communication. The overseer is
not aware of the hierarchy of tasks within any given subsystem and is only responsible for subsystem-to-subsystem
communication. So the sending subsystem must pass on the message via the overseer to the receiving subsystem, in such a
way that the receiving subsystem can further address the communication to one of its internal tasks, if necessary.
This communication prevents a certain class of race conditions. When the Overseer determines that it is time for subsystems to begin working on top of a particular relay-parent, it will dispatch a `ActiveLeavesUpdate` message to all subsystems to do so, and those messages will be handled asynchronously by those subsystems. Some subsystems will receive those messsages before others, and it is important that a message sent by subsystem A after receiving `ActiveLeavesUpdate` message will arrive at subsystem B after its `ActiveLeavesUpdate` message. If subsystem A maintained an independent channel with subsystem B to communicate, it would be possible for subsystem B to handle the side message before the `ActiveLeavesUpdate` message, but it wouldn't have any logical course of action to take with the side message - leading to it being discarded or improperly handled. Well-architectured state machines should have a single source of inputs, so that is what we do here.
This communication prevents a certain class of race conditions. When the Overseer determines that it is time for
subsystems to begin working on top of a particular relay-parent, it will dispatch an `ActiveLeavesUpdate` message to
all subsystems to do so, and those messages will be handled asynchronously by those subsystems. Some subsystems will
receive those messages before others, and it is important that a message sent by subsystem A after receiving its
`ActiveLeavesUpdate` message arrives at subsystem B after B's own `ActiveLeavesUpdate` message. If subsystem A
maintained an independent channel with subsystem B to communicate, it would be possible for subsystem B to handle the
side message before the `ActiveLeavesUpdate` message, but it wouldn't have any logical course of action to take with the
side message, which would lead to it being discarded or improperly handled. Well-architected state machines should have
a single source of inputs, so that is what we do here.
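The ordering argument relies on every input to subsystem B arriving through one FIFO queue. The sketch below models that queue with `std::sync::mpsc`; the `ToSubsystemB` enum and `run_b` function are hypothetical. Because the overseer enqueues B's `ActiveLeavesUpdate` before it forwards anything A sent in response to its own update, B already knows the relay-parent when A's message is handled.

```rust
use std::collections::HashSet;
use std::sync::mpsc;

/// Hypothetical, simplified inputs to subsystem B, all delivered by the
/// overseer through a single queue.
#[derive(Debug, PartialEq)]
pub enum ToSubsystemB {
    /// Stand-in for an `ActiveLeavesUpdate` activating one leaf.
    ActiveLeavesUpdate(u64),
    /// A message forwarded from subsystem A about some relay-parent.
    FromA { relay_parent: u64 },
}

/// Subsystem B handles its inputs strictly in arrival order and records, for
/// each message from A, whether it already knew the relay-parent.
/// The loop ends once every sender has been dropped.
pub fn run_b(inputs: mpsc::Receiver<ToSubsystemB>) -> Vec<bool> {
    let mut active = HashSet::new();
    let mut known_on_arrival = Vec::new();
    for msg in inputs {
        match msg {
            ToSubsystemB::ActiveLeavesUpdate(leaf) => {
                active.insert(leaf);
            }
            ToSubsystemB::FromA { relay_parent } => {
                known_on_arrival.push(active.contains(&relay_parent));
            }
        }
    }
    known_on_arrival
}
```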
One exception is reasonable to make for responses to requests. A request should be made via the overseer in order to ensure that it arrives after any relevant `ActiveLeavesUpdate` message. A subsystem issuing a request as a result of a `ActiveLeavesUpdate` message can safely receive the response via a side-channel for two reasons:
One exception is reasonable to make for responses to requests. A request should be made via the overseer in order to
ensure that it arrives after any relevant `ActiveLeavesUpdate` message. A subsystem issuing a request as a result of an
`ActiveLeavesUpdate` message can safely receive the response via a side-channel for two reasons:
1. It's impossible for a request to be answered before it arrives, it is provable that any response to a request obeys the same ordering constraint.
1. The request was sent as a result of handling a `ActiveLeavesUpdate` message. Then there is no possible future in which the `ActiveLeavesUpdate` message has not been handled upon the receipt of the response.
1. It's impossible for a request to be answered before it arrives, so any response to a request provably obeys the
same ordering constraint.
1. The request was sent as a result of handling an `ActiveLeavesUpdate` message, so there is no possible future in
which the `ActiveLeavesUpdate` message has not been handled upon the receipt of the response.
So as a single exception to the rule that all communication must happen via the overseer we allow the receipt of responses to requests via a side-channel, which may be established for that purpose. This simplifies any cases where the outside world desires to make a request to a subsystem, as the outside world can then establish a side-channel to receive the response on.
So as a single exception to the rule that all communication must happen via the overseer we allow the receipt of
responses to requests via a side-channel, which may be established for that purpose. This simplifies any cases where the
outside world desires to make a request to a subsystem, as the outside world can then establish a side-channel to
receive the response on.
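The side-channel exception can be illustrated by embedding a reply channel in the request itself. In this sketch, `std::sync::mpsc::Sender` stands in for the `oneshot::Sender` mentioned elsewhere in this guide, and the `Request` enum and `serve` function are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc;

/// Hypothetical request sent over the overseer's bus. The embedded `reply`
/// sender is the side-channel on which the response comes back.
pub enum Request {
    GetValue { key: u64, reply: mpsc::Sender<Option<u64>> },
}

/// A subsystem that answers each request directly on its reply channel,
/// bypassing the overseer for the response only.
pub fn serve(requests: mpsc::Receiver<Request>, store: HashMap<u64, u64>) {
    for req in requests {
        match req {
            Request::GetValue { key, reply } => {
                // Ignore send errors: the requester may have stopped waiting.
                let _ = reply.send(store.get(&key).copied());
            }
        }
    }
}
```

The request itself still travels through the ordered queue, so only the response uses the side-channel, which matches the single exception described above.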
It's important to note that the overseer is not aware of the internals of subsystems, and this extends to the jobs that they spawn. The overseer isn't aware of the existence or definition of those jobs, and is only aware of the outer subsystems with which it interacts. This gives subsystem implementations leeway to define internal jobs as they see fit, and to wrap a more complex hierarchy of state machines than having a single layer of jobs for relay-parent-based work. Likewise, subsystems aren't required to spawn jobs. Certain types of subsystems, such as those for shared storage or networking resources, won't perform block-based work but would still benefit from being on the Overseer's message bus. These subsystems can just ignore the overseer's signals for block-based work.
It's important to note that the overseer is not aware of the internals of subsystems, and this extends to the jobs that
they spawn. The overseer isn't aware of the existence or definition of those jobs, and is only aware of the outer
subsystems with which it interacts. This gives subsystem implementations leeway to define internal jobs as they see fit,
and to wrap a more complex hierarchy of state machines than having a single layer of jobs for relay-parent-based work.
Likewise, subsystems aren't required to spawn jobs. Certain types of subsystems, such as those for shared storage or
networking resources, won't perform block-based work but would still benefit from being on the Overseer's message bus.
These subsystems can just ignore the overseer's signals for block-based work.
Furthermore, the protocols by which subsystems communicate with each other should be well-defined irrespective of the implementation of the subsystem. In other words, their interface should be distinct from their implementation. This will prevent subsystems from accessing aspects of each other that are beyond the scope of the communication boundary.
Furthermore, the protocols by which subsystems communicate with each other should be well-defined irrespective of the
implementation of the subsystem. In other words, their interface should be distinct from their implementation. This will
prevent subsystems from accessing aspects of each other that are beyond the scope of the communication boundary.
## On shutdown
Send an `OverseerSignal::Conclude` message to each subsystem and wait some time for them to conclude before hard-exiting.
Send an `OverseerSignal::Conclude` message to each subsystem and wait some time for them to conclude before
hard-exiting.
@@ -1,25 +1,66 @@
# Subsystems and Jobs
In this section we define the notions of Subsystems and Jobs. These are guidelines for how we will employ an architecture of hierarchical state machines. We'll have a top-level state machine which oversees the next level of state machines which oversee another layer of state machines and so on. The next sections will lay out these guidelines for what we've called subsystems and jobs, since this model applies to many of the tasks that the Node-side behavior needs to encompass, but these are only guidelines and some Subsystems may have deeper hierarchies internally.
In this section we define the notions of Subsystems and Jobs. These are
guidelines for how we will employ an architecture of hierarchical state
machines. We'll have a top-level state machine which oversees the next level of
state machines which oversee another layer of state machines and so on. The next
sections will lay out these guidelines for what we've called subsystems and
jobs, since this model applies to many of the tasks that the Node-side behavior
needs to encompass, but these are only guidelines and some Subsystems may have
deeper hierarchies internally.
Subsystems are long-lived worker tasks that are in charge of performing some particular kind of work. All subsystems can communicate with each other via a well-defined protocol. Subsystems can't generally communicate directly, but must coordinate communication through an [Overseer](overseer.md), which is responsible for relaying messages, handling subsystem failures, and dispatching work signals.
Subsystems are long-lived worker tasks that are in charge of performing some
particular kind of work. All subsystems can communicate with each other via a
well-defined protocol. Subsystems can't generally communicate directly, but must
coordinate communication through an [Overseer](overseer.md), which is
responsible for relaying messages, handling subsystem failures, and dispatching
work signals.
Most work that happens on the Node-side is related to building on top of a specific relay-chain block, which is contextually known as the "relay parent". We call it the relay parent to explicitly denote that it is a block in the relay chain and not on a parachain. We refer to the parent because when we are in the process of building a new block, we don't know what that new block is going to be. The parent block is our only stable point of reference, even though it is usually only useful when it is not yet a parent but in fact a leaf of the block-DAG expected to soon become a parent (because validators are authoring on top of it). Furthermore, we are assuming a forkful blockchain-extension protocol, which means that there may be multiple possible children of the relay-parent. Even if the relay parent has multiple children blocks, the parent of those children is the same, and the context in which those children is authored should be the same. The parent block is the best and most stable reference to use for defining the scope of work items and messages, and is typically referred to by its cryptographic hash.
Most work that happens on the Node-side is related to building on top of a
specific relay-chain block, which is contextually known as the "relay parent".
We call it the relay parent to explicitly denote that it is a block in the relay
chain and not on a parachain. We refer to the parent because when we are in the
process of building a new block, we don't know what that new block is going to
be. The parent block is our only stable point of reference, even though it is
usually only useful when it is not yet a parent but in fact a leaf of the
block-DAG expected to soon become a parent (because validators are authoring on
top of it). Furthermore, we are assuming a forkful blockchain-extension
protocol, which means that there may be multiple possible children of the
relay-parent. Even if the relay parent has multiple child blocks, the parent
of those children is the same, and the context in which those children are
authored should be the same. The parent block is the best and most stable
reference to use for defining the scope of work items and messages, and is
typically referred to by its cryptographic hash.
Since this goal of determining when to start and conclude work relative to a specific relay-parent is common to most, if not all subsystems, it is logically the job of the Overseer to distribute those signals as opposed to each subsystem duplicating that effort, potentially being out of synchronization with each other. Subsystem A should be able to expect that subsystem B is working on the same relay-parents as it is. One of the Overseer's tasks is to provide this heartbeat, or synchronized rhythm, to the system.
Since this goal of determining when to start and conclude work relative to a
specific relay-parent is common to most, if not all, subsystems, it is logically
the job of the Overseer to distribute those signals as opposed to each subsystem
duplicating that effort, potentially being out of synchronization with each
other. Subsystem A should be able to expect that subsystem B is working on the
same relay-parents as it is. One of the Overseer's tasks is to provide this
heartbeat, or synchronized rhythm, to the system.
The work that subsystems spawn to be done on a specific relay-parent is known as a job. Subsystems should set up and tear down jobs according to the signals received from the overseer. Subsystems may share or cache state between jobs.
The work that subsystems spawn to be done on a specific relay-parent is known as
a job. Subsystems should set up and tear down jobs according to the signals
received from the overseer. Subsystems may share or cache state between jobs.
Subsystems must be robust to spurious exits. The outputs of the set of subsystems as a whole comprises of signed messages and data committed to disk. Care must be taken to avoid issuing messages that are not substantiated. Since subsystems need to be safe under spurious exits, it is the expected behavior that an `OverseerSignal::Conclude` can just lead to breaking the loop and exiting directly as opposed to waiting for everything to shut down gracefully.
Subsystems must be robust to spurious exits. The outputs of the set of
subsystems as a whole consist of signed messages and data committed to disk.
Care must be taken to avoid issuing messages that are not substantiated. Since
subsystems need to be safe under spurious exits, it is the expected behavior
that an `OverseerSignal::Conclude` can just lead to breaking the loop and
exiting directly as opposed to waiting for everything to shut down gracefully.
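The job lifecycle and the handling of `OverseerSignal::Conclude` described above can be sketched as a subsystem loop. The simplified `OverseerSignal` enum, the `Job` struct and `run_subsystem` are assumptions for illustration; real subsystems receive signals asynchronously, interleaved with messages.

```rust
use std::collections::HashMap;

pub type Hash = u64;

/// Simplified overseer signal; the real type has more variants and fields.
pub enum OverseerSignal {
    ActiveLeavesUpdate { activated: Vec<Hash>, deactivated: Vec<Hash> },
    Conclude,
}

/// Hypothetical per-relay-parent job state.
#[derive(Debug)]
pub struct Job {
    pub relay_parent: Hash,
}

/// Drive a subsystem over a sequence of signals and return the relay-parents
/// that still had jobs when it stopped.
pub fn run_subsystem<I: IntoIterator<Item = OverseerSignal>>(signals: I) -> Vec<Hash> {
    let mut jobs: HashMap<Hash, Job> = HashMap::new();
    for signal in signals {
        match signal {
            OverseerSignal::ActiveLeavesUpdate { activated, deactivated } => {
                // Tear down jobs for relay-parents no longer being built on.
                for hash in deactivated {
                    jobs.remove(&hash);
                }
                // Set up a job for each newly activated relay-parent.
                for hash in activated {
                    jobs.insert(hash, Job { relay_parent: hash });
                }
            }
            // Subsystems are safe under spurious exits, so just leave the loop.
            OverseerSignal::Conclude => break,
        }
    }
    let mut remaining: Vec<Hash> = jobs.into_keys().collect();
    remaining.sort_unstable();
    remaining
}
```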
## Subsystem Message Traffic
Which subsystems send messages to which other subsystems.
**Note**: This diagram omits the overseer for simplicity. In fact, all messages are relayed via the overseer.
**Note**: This diagram omits the overseer for simplicity. In fact, all messages
are relayed via the overseer.
**Note**: Messages with a filled diamond arrowhead ("♦") include a `oneshot::Sender` which communicates a response from the recipient.
Messages with an open triangle arrowhead ("Δ") do not include a return sender.
**Note**: Messages with a filled diamond arrowhead ("♦") include a
`oneshot::Sender` which communicates a response from the recipient. Messages
with an open triangle arrowhead ("Δ") do not include a return sender.
```dot process
digraph {
@@ -125,14 +166,17 @@ digraph {
## The Path to Inclusion (Node Side)
Let's contextualize that diagram a bit by following a parachain block from its creation through finalization.
Parachains can use completely arbitrary processes to generate blocks. The relay chain doesn't know or care about
the details; each parachain just needs to provide a [collator](collators/collation-generation.md).
Let's contextualize that diagram a bit by following a parachain block from its
creation through finalization. Parachains can use completely arbitrary processes
to generate blocks. The relay chain doesn't know or care about the details; each
parachain just needs to provide a [collator](collators/collation-generation.md).
**Note**: Inter-subsystem communications are relayed via the overseer, but that step is omitted here for brevity.
**Note**: Inter-subsystem communications are relayed via the overseer, but that
step is omitted here for brevity.
**Note**: Dashed lines indicate a request/response cycle, where the response is communicated asynchronously via
a oneshot channel. Adjacent dashed lines may be processed in parallel.
**Note**: Dashed lines indicate a request/response cycle, where the response is
communicated asynchronously via a oneshot channel. Adjacent dashed lines may be
processed in parallel.
```mermaid
sequenceDiagram
@@ -156,11 +200,13 @@ sequenceDiagram
end
```
The `DistributeCollation` messages that `CollationGeneration` sends to the `CollatorProtocol` contains
two items: a `CandidateReceipt` and `PoV`. The `CollatorProtocol` is then responsible for distributing
that collation to interested validators. However, not all potential collations are of interest. The
`CandidateSelection` subsystem is responsible for determining which collations are interesting, before
`CollatorProtocol` actually fetches the collation.
The `DistributeCollation` messages that `CollationGeneration` sends to the
`CollatorProtocol` contain two items: a `CandidateReceipt` and a `PoV`. The
`CollatorProtocol` is then responsible for distributing that collation to
interested validators. However, not all potential collations are of interest.
The `CandidateSelection` subsystem is responsible for determining which
collations are interesting, before `CollatorProtocol` actually fetches the
collation.
```mermaid
sequenceDiagram
@@ -205,10 +251,11 @@ sequenceDiagram
end
```
Assuming we hit the happy path, flow continues with `CandidateSelection` receiving a `(candidate_receipt, pov)` as
the return value from its
`FetchCollation` request. The only time `CandidateSelection` actively requests a collation is when
it hasn't yet seconded one for some `relay_parent`, and is ready to second.
Assuming we hit the happy path, flow continues with `CandidateSelection`
receiving a `(candidate_receipt, pov)` as the return value from its
`FetchCollation` request. The only time `CandidateSelection` actively requests a
collation is when it hasn't yet seconded one for some `relay_parent`, and is
ready to second.
```mermaid
sequenceDiagram
@@ -243,15 +290,17 @@ sequenceDiagram
end
```
At this point, you'll see that control flows in two directions: to `StatementDistribution` to distribute
the `SignedStatement`, and to `PoVDistribution` to distribute the `PoV`. However, that's largely a mirage:
while the initial implementation distributes `PoV`s by gossip, that's inefficient, and will be replaced
with a system which fetches `PoV`s only when actually necessary.
At this point, you'll see that control flows in two directions: to
`StatementDistribution` to distribute the `SignedStatement`, and to
`PoVDistribution` to distribute the `PoV`. However, that's largely a mirage:
while the initial implementation distributes `PoV`s by gossip, that's
inefficient, and will be replaced with a system which fetches `PoV`s only when
actually necessary.
> TODO: figure out more precisely the current status and plans; write them up
Therefore, we'll follow the `SignedStatement`. The `StatementDistribution` subsystem is largely concerned
with implementing a gossip protocol:
Therefore, we'll follow the `SignedStatement`. The `StatementDistribution`
subsystem is largely concerned with implementing a gossip protocol:
```mermaid
sequenceDiagram
@@ -278,8 +327,8 @@ sequenceDiagram
end
```
But who are these `Listener`s who've asked to be notified about incoming `SignedStatement`s?
Nobody, as yet.
But who are these `Listener`s who've asked to be notified about incoming
`SignedStatement`s? Nobody, as yet.
Let's pick back up with the PoV Distribution subsystem.
@@ -305,11 +354,13 @@ sequenceDiagram
Note over PD,NB: On receipt of a network PoV, PovDistribution forwards it to each Listener.<br/>It also penalizes bad gossipers.
```
Unlike in the case of `StatementDistribution`, there is another subsystem which in various circumstances
already registers a listener to be notified when a new `PoV` arrives: `CandidateBacking`. Note that this
is the second time that `CandidateBacking` has gotten involved. The first instance was from the perspective
of the validator choosing to second a candidate via its `CandidateSelection` subsystem. This time, it's
from the perspective of some other validator, being informed that this foreign `PoV` has been received.
Unlike in the case of `StatementDistribution`, there is another subsystem which
in various circumstances already registers a listener to be notified when a new
`PoV` arrives: `CandidateBacking`. Note that this is the second time that
`CandidateBacking` has gotten involved. The first instance was from the
perspective of the validator choosing to second a candidate via its
`CandidateSelection` subsystem. This time, it's from the perspective of some
other validator, being informed that this foreign `PoV` has been received.
```mermaid
sequenceDiagram
@@ -326,10 +377,11 @@ sequenceDiagram
CB ->> AS: StoreAvailableData
```
At this point, things have gone a bit nonlinear. Let's pick up the thread again with `BitfieldSigning`. As
the `Overseer` activates each relay parent, it starts a `BitfieldSigningJob` which operates on an extremely
simple metric: after creation, it immediately goes to sleep for 1.5 seconds. On waking, it records the state
of the world pertaining to availability at that moment.
At this point, things have gone a bit nonlinear. Let's pick up the thread again
with `BitfieldSigning`. As the `Overseer` activates each relay parent, it starts
a `BitfieldSigningJob` which operates on an extremely simple metric: after
creation, it immediately goes to sleep for 1.5 seconds. On waking, it records
the state of the world pertaining to availability at that moment.
```mermaid
sequenceDiagram
@@ -350,9 +402,10 @@ sequenceDiagram
end
```
`BitfieldDistribution` is, like the other `*Distribution` subsystems, primarily interested in implementing
a peer-to-peer gossip network propagating its particular messages. However, it also serves as an essential
relay passing the message along.
`BitfieldDistribution` is, like the other `*Distribution` subsystems, primarily
interested in implementing a peer-to-peer gossip network propagating its
particular messages. However, it also serves as an essential relay passing the
message along.
```mermaid
sequenceDiagram
@@ -366,12 +419,14 @@ sequenceDiagram
BD ->> NB: SendValidationMessage::BitfieldDistribution::Bitfield
```
We've now seen the message flow to the `Provisioner`: both `CandidateBacking` and `BitfieldDistribution`
contribute provisionable data. Now, let's look at that subsystem.
We've now seen the message flow to the `Provisioner`: both `CandidateBacking`
and `BitfieldDistribution` contribute provisionable data. Now, let's look at
that subsystem.
Much like the `BitfieldSigning` subsystem, the `Provisioner` creates a new job for each newly-activated
leaf, and starts a timer. Unlike `BitfieldSigning`, we won't depict that part of the process, because
the `Provisioner` also has other things going on.
Much like the `BitfieldSigning` subsystem, the `Provisioner` creates a new job
for each newly-activated leaf, and starts a timer. Unlike `BitfieldSigning`, we
won't depict that part of the process, because the `Provisioner` also has other
things going on.
```mermaid
sequenceDiagram
@@ -411,8 +466,9 @@ sequenceDiagram
end
```
In principle, any arbitrary subsystem could send a `RequestInherentData` to the `Provisioner`. In practice,
only the `ParachainsInherentDataProvider` does so.
In principle, any arbitrary subsystem could send a `RequestInherentData` to the
`Provisioner`. In practice, only the `ParachainsInherentDataProvider` does so.
The tuple `(SignedAvailabilityBitfields, BackedCandidates, ParentHeader)` is injected by the `ParachainsInherentDataProvider`
into the inherent data. From that point on, control passes from the node to the runtime.
The tuple `(SignedAvailabilityBitfields, BackedCandidates, ParentHeader)` is
injected by the `ParachainsInherentDataProvider` into the inherent data. From
that point on, control passes from the node to the runtime.
@@ -9,13 +9,20 @@ The two data types:
For each of these data we have pruning rules that determine how long we need to keep that data available.
PoVs hypothetically only need to be kept around until the block where the data was made fully available is finalized. However, disputes can revert finality, so we need to be a bit more conservative and add a delay. We should keep the PoV until the block that finalized its availability has itself been finalized for 1 day + 1 hour.
PoVs hypothetically only need to be kept around until the block where the data was made fully available is finalized.
However, disputes can revert finality, so we need to be a bit more conservative and add a delay. We should keep the
PoV until the block that finalized its availability has itself been finalized for 1 day + 1 hour.
Availability chunks need to be kept available until the dispute period for the corresponding candidate has ended. We can accomplish this by using the same criterion as above. The resulting pruning condition is that the block finalizing availability of the chunk has been final for 1 day + 1 hour.
Availability chunks need to be kept available until the dispute period for the corresponding candidate has ended. We can
accomplish this by using the same criterion as above. The resulting pruning condition is that the block finalizing
availability of the chunk has been final for 1 day + 1 hour.
There is also the case where a validator commits to make a PoV available, but the corresponding candidate is never backed. In this case, we keep the PoV available for 1 hour.
There is also the case where a validator commits to make a PoV available, but the corresponding candidate is never
backed. In this case, we keep the PoV available for 1 hour.
There may be multiple competing blocks all ending the availability phase for a particular candidate. Until finality, it will be unclear which of those is actually the canonical chain, so the pruning records for PoVs and Availability chunks should keep track of all such blocks.
There may be multiple competing blocks all ending the availability phase for a particular candidate. Until finality, it
will be unclear which of those is actually the canonical chain, so the pruning records for PoVs and Availability chunks
should keep track of all such blocks.
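The retention rules above reduce to two time-based deadlines. The sketch below computes them; `Retention`, `prune_at`, and the constant names are hypothetical, timestamps are assumed to be Unix seconds (as in the storage schema), and the `Unfinalized` case is left out because it has no time-based pruning entry.

```rust
use std::time::Duration;

/// Keep data for 1 day + 1 hour after the block finalizing its availability is finalized.
pub const KEEP_AFTER_FINALITY: Duration = Duration::from_secs(24 * 60 * 60 + 60 * 60);
/// Keep data for 1 hour when the candidate is never backed/included.
pub const KEEP_UNAVAILABLE: Duration = Duration::from_secs(60 * 60);

/// Hypothetical, simplified retention state (Unix seconds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Retention {
    /// Data stored at `since`, but the candidate is not available in any block.
    Unavailable { since: u64 },
    /// The block finalizing availability was itself finalized at `at`.
    Finalized { at: u64 },
}

/// Returns the Unix timestamp after which the data may be pruned.
pub fn prune_at(retention: Retention) -> u64 {
    match retention {
        Retention::Unavailable { since } => since + KEEP_UNAVAILABLE.as_secs(),
        Retention::Finalized { at } => at + KEEP_AFTER_FINALITY.as_secs(),
    }
}
```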
## Lifetime of the block data and chunks in storage
@@ -44,7 +51,8 @@ We use an underlying Key-Value database where we assume we have the following op
- `write(key, value)`
- `read(key) -> Option<value>`
- `iter_with_prefix(prefix) -> Iterator<(key, value)>` - gives all keys and values in lexicographical order where the key starts with `prefix`.
- `iter_with_prefix(prefix) -> Iterator<(key, value)>` - gives all keys and values in lexicographical order where the
key starts with `prefix`.
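The three assumed database operations can be mimicked with an ordered map, which keeps keys in lexicographic byte order so that prefix iteration is a range scan that stops at the first non-matching key. This is a minimal in-memory sketch for tests; `MemDb` is a hypothetical name, not the store's actual backend.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the assumed key-value database.
#[derive(Default)]
pub struct MemDb {
    inner: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemDb {
    pub fn write(&mut self, key: &[u8], value: &[u8]) {
        self.inner.insert(key.to_vec(), value.to_vec());
    }

    pub fn read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.inner.get(key).cloned()
    }

    /// Yields all pairs whose key starts with `prefix`, in lexicographic key order.
    pub fn iter_with_prefix<'a>(
        &'a self,
        prefix: &'a [u8],
    ) -> impl Iterator<Item = (&'a [u8], &'a [u8])> + 'a {
        self.inner
            // Start at the first key >= prefix ...
            .range(prefix.to_vec()..)
            // ... and stop as soon as keys no longer share the prefix.
            .take_while(move |(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.as_slice(), v.as_slice()))
    }
}
```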
We use this database to encode the following schema:
@@ -57,7 +65,8 @@ We use this database to encode the following schema:
("prune_by_time", Timestamp, CandidateHash) -> Option<()>
```
Timestamps are wall-clock seconds since the Unix epoch. Timestamps and block numbers are both encoded as big-endian so that lexicographic byte order matches ascending numeric order.
Timestamps are wall-clock seconds since the Unix epoch. Timestamps and block numbers are both encoded as big-endian so
that lexicographic byte order matches ascending numeric order.
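Big-endian encoding matters because the database compares keys byte by byte. A minimal sketch of building a `("prune_by_time", Timestamp, CandidateHash)` key, assuming the prefix is stored as its raw ASCII bytes and the candidate hash is 32 bytes; `prune_by_time_key` is a hypothetical helper and the real codec may lay keys out differently.

```rust
/// Builds a `("prune_by_time", Timestamp, CandidateHash)` key (hypothetical layout).
pub fn prune_by_time_key(timestamp: u64, candidate_hash: [u8; 32]) -> Vec<u8> {
    let mut key = b"prune_by_time".to_vec();
    // Big-endian puts the most significant byte first, so byte order == numeric order.
    key.extend_from_slice(&timestamp.to_be_bytes());
    key.extend_from_slice(&candidate_hash);
    key
}
```

With little-endian encoding, 256 (`[0x00, 0x01, ...]`) would sort before 255 (`[0xff, 0x00, ...]`), breaking the "stop at the first key beyond now" scans.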
The meta information that we track per-candidate is defined as the `CandidateMeta` struct
@@ -80,9 +89,12 @@ enum State {
}
```
We maintain the invariant that if a candidate has a meta entry, its available data exists on disk if `data_available` is true. All chunks mentioned in the meta entry are available.
We maintain the invariant that if a candidate has a meta entry, its available data exists on disk if `data_available` is
true. All chunks mentioned in the meta entry are available.
Additionally, there is exactly one `prune_by_time` entry which holds the candidate hash unless the state is `Unfinalized`. There may be zero, one, or many "unfinalized" keys with the given candidate, and this will correspond to the `state` of the meta entry.
Additionally, there is exactly one `prune_by_time` entry which holds the candidate hash unless the state is
`Unfinalized`. There may be zero, one, or many "unfinalized" keys with the given candidate, and this will correspond to
the `state` of the meta entry.
## Protocol
@@ -96,9 +108,15 @@ Output:
For each head in the `activated` list:
- Load all ancestors of the head back to the finalized block so we don't miss anything if import notifications are missed. If a `StoreChunk` message is received for a candidate which has no entry, then we will prematurely lose the data.
- Note any new candidates backed in the head. Update the `CandidateMeta` for each. If the `CandidateMeta` does not exist, create it as `Unavailable` with the current timestamp. Register a `"prune_by_time"` entry based on the current timestamp + 1 hour.
- Note any new candidates included in the head. Update the `CandidateMeta` for each, performing a transition from `Unavailable` to `Unfinalized` if necessary. That includes removing the `"prune_by_time"` entry. Add the head hash and number to the state, if unfinalized. Add an `"unfinalized"` entry for the block and candidate.
- Load all ancestors of the head back to the finalized block so we don't miss anything if import notifications are
missed. If a `StoreChunk` message is received for a candidate which has no entry, then we will prematurely lose the
data.
- Note any new candidates backed in the head. Update the `CandidateMeta` for each. If the `CandidateMeta` does not
exist, create it as `Unavailable` with the current timestamp. Register a `"prune_by_time"` entry based on the current
timestamp + 1 hour.
- Note any new candidates included in the head. Update the `CandidateMeta` for each, performing a transition from
`Unavailable` to `Unfinalized` if necessary. That includes removing the `"prune_by_time"` entry. Add the head hash and
number to the state, if unfinalized. Add an `"unfinalized"` entry for the block and candidate.
- The `CandidateEvent` runtime API can be used for this purpose.
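The inclusion step above is a small state transition. The sketch below uses a simplified, hypothetical mirror of the `State` enum (the real one may carry more detail) and returns whether the `"prune_by_time"` entry must be removed; writing the `"unfinalized"` key is left to the caller.

```rust
pub type Timestamp = u64;
pub type BlockNumber = u32;
pub type BlockHash = [u8; 32];

/// Simplified, hypothetical mirror of the per-candidate `State`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum State {
    Unavailable(Timestamp),
    Unfinalized(Timestamp, Vec<(BlockNumber, BlockHash)>),
    Finalized(Timestamp),
}

/// Applies "candidate included in an unfinalized head" to a candidate's state.
/// Returns the new state and whether the `"prune_by_time"` entry must be removed.
pub fn note_included(state: State, number: BlockNumber, hash: BlockHash) -> (State, bool) {
    match state {
        // Unavailable -> Unfinalized: time-based pruning no longer applies.
        State::Unavailable(t) => (State::Unfinalized(t, vec![(number, hash)]), true),
        // Already unfinalized: record one more competing including block.
        State::Unfinalized(t, mut blocks) => {
            if !blocks.contains(&(number, hash)) {
                blocks.push((number, hash));
            }
            (State::Unfinalized(t, blocks), false)
        }
        // Finalized candidates are unaffected by new unfinalized heads.
        s @ State::Finalized(_) => (s, false),
    }
}
```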
On `OverseerSignal::BlockFinalized(finalized)` events:
@@ -106,17 +124,22 @@ On `OverseerSignal::BlockFinalized(finalized)` events:
- for each key in `iter_with_prefix("unfinalized")`
- Stop if the key is beyond `("unfinalized", finalized)`
- For each block number `f` that we encounter, load the finalized hash for that block.
- The state of each `CandidateMeta` we encounter here must be `Unfinalized`, since we loaded the candidate from an `"unfinalized"` key.
- The state of each `CandidateMeta` we encounter here must be `Unfinalized`, since we loaded the candidate from an
`"unfinalized"` key.
- For each candidate that we encounter under `f` and the finalized block hash,
- Update the `CandidateMeta` to have `State::Finalized`. Remove all `"unfinalized"` entries from the old `Unfinalized` state.
- Update the `CandidateMeta` to have `State::Finalized`. Remove all `"unfinalized"` entries from the old
`Unfinalized` state.
- Register a `"prune_by_time"` entry for the candidate based on the current time + 1 day + 1 hour.
- For each candidate that we encounter under `f` which is not under the finalized block hash,
- Remove all entries under `f` in the `Unfinalized` state.
- If the `CandidateMeta` has state `Unfinalized` with an empty list of blocks, downgrade to `Unavailable` and re-schedule pruning under the timestamp + 1 hour. We do not prune here as the candidate still may be included in a descendant of the finalized chain.
- If the `CandidateMeta` has state `Unfinalized` with an empty list of blocks, downgrade to `Unavailable` and
re-schedule pruning under the timestamp + 1 hour. We do not prune here as the candidate still may be included in
a descendant of the finalized chain.
- Remove all `"unfinalized"` keys under `f`.
- Update `last_finalized` = finalized.
This is roughly `O(n * m)` where `n` is the number of blocks finalized since the last update, and `m` is the number of parachains.
This is roughly `O(n * m)` where `n` is the number of blocks finalized since the last update, and `m` is the number of
parachains.
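The classification at the heart of the finality handling can be sketched with an ordered set standing in for the `"unfinalized"` keys. This covers only the split into "included in the finalized chain" and "on an abandoned fork"; the follow-up writes (updating `CandidateMeta`, scheduling pruning) are left out. `UnfinalizedIndex`, `process_finalized`, and the `finalized_hash_at` callback (standing in for a Chain API lookup) are hypothetical.

```rust
use std::collections::BTreeSet;

pub type BlockNumber = u32;
pub type Hash = [u8; 32];

/// Hypothetical `"unfinalized"` index: (block number, block hash, candidate hash).
pub type UnfinalizedIndex = BTreeSet<(BlockNumber, Hash, Hash)>;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct FinalityOutcome {
    /// Candidates included in a now-finalized block: move to `State::Finalized`.
    pub finalized: Vec<Hash>,
    /// (candidate, number, block) on abandoned forks: drop that block from `Unfinalized`.
    pub orphaned: Vec<(Hash, BlockNumber, Hash)>,
}

pub fn process_finalized(
    index: &mut UnfinalizedIndex,
    finalized_number: BlockNumber,
    finalized_hash_at: impl Fn(BlockNumber) -> Hash,
) -> FinalityOutcome {
    // Entries above the finalized height stay in the index untouched.
    let above = match finalized_number.checked_add(1) {
        Some(next) => index.split_off(&(next, [0u8; 32], [0u8; 32])),
        None => UnfinalizedIndex::new(),
    };
    let at_or_below = std::mem::replace(index, above);

    let mut out = FinalityOutcome::default();
    // Ascending (number, block, candidate) order, like a prefix scan over the keys.
    for (number, block, candidate) in at_or_below {
        if block == finalized_hash_at(number) {
            out.finalized.push(candidate);
        } else {
            out.orphaned.push((candidate, number, block));
        }
    }
    out
}
```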
On `QueryAvailableData` message:
@@ -139,7 +162,8 @@ On `QueryChunk` message:
On `QueryAllChunks` message:
- Query `("meta", candidate_hash)`. If `None`, send an empty response and return.
- For all `1` bits in the `chunks_stored`, query `("chunk", candidate_hash, index)`. Ignore but warn on errors, and return a vector of all loaded chunks.
- For all `1` bits in the `chunks_stored`, query `("chunk", candidate_hash, index)`. Ignore but warn on errors, and
return a vector of all loaded chunks.
On `QueryChunkAvailability` message:
@@ -149,14 +173,17 @@ On `QueryChunkAvailability` message:
On `StoreChunk` message:
- If there is a `CandidateMeta` under the candidate hash, set the bit of the erasure-chunk in the `chunks_stored` bitfield to `1`. If it was not `1` already, write the chunk under `("chunk", candidate_hash, chunk_index)`.
- If there is a `CandidateMeta` under the candidate hash, set the bit of the erasure-chunk in the `chunks_stored`
bitfield to `1`. If it was not `1` already, write the chunk under `("chunk", candidate_hash, chunk_index)`.
This is `O(n)` in the size of the chunk.
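`StoreChunk` and `QueryAllChunks` both revolve around the `chunks_stored` bitfield. A minimal in-memory sketch, assuming one bit per validator index; `CandidateChunks` is hypothetical, a `HashMap` stands in for the `("chunk", candidate_hash, index)` rows, and out-of-range indices are simply ignored (the text does not specify that case).

```rust
use std::collections::HashMap;

/// Hypothetical per-candidate view of stored chunks.
pub struct CandidateChunks {
    /// One bit per validator index (the `chunks_stored` bitfield).
    pub chunks_stored: Vec<bool>,
    /// Stand-in for the `("chunk", candidate_hash, index)` rows.
    pub chunks: HashMap<usize, Vec<u8>>,
}

impl CandidateChunks {
    pub fn new(n_validators: usize) -> Self {
        Self { chunks_stored: vec![false; n_validators], chunks: HashMap::new() }
    }

    /// `StoreChunk`: set the bit and write the chunk only if the bit was not already set.
    /// Returns `true` if a write happened.
    pub fn store_chunk(&mut self, index: usize, chunk: Vec<u8>) -> bool {
        match self.chunks_stored.get_mut(index) {
            Some(bit) if !*bit => {
                *bit = true;
                self.chunks.insert(index, chunk);
                true
            }
            // Already stored, or index out of range: nothing to write.
            _ => false,
        }
    }

    /// `QueryAllChunks`: load every chunk whose bit is set, skipping missing rows.
    pub fn query_all_chunks(&self) -> Vec<(usize, &[u8])> {
        self.chunks_stored
            .iter()
            .enumerate()
            .filter(|(_, set)| **set)
            .filter_map(|(i, _)| self.chunks.get(&i).map(|c| (i, c.as_slice())))
            .collect()
    }
}
```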
On `StoreAvailableData` message:
- Compute the erasure root of the available data and compare it with `expected_erasure_root`. Return `StoreAvailableDataError::InvalidErasureRoot` on mismatch.
- If there is no `CandidateMeta` under the candidate hash, create it with `State::Unavailable(now)`. Load the `CandidateMeta` otherwise.
- Compute the erasure root of the available data and compare it with `expected_erasure_root`. Return
`StoreAvailableDataError::InvalidErasureRoot` on mismatch.
- If there is no `CandidateMeta` under the candidate hash, create it with `State::Unavailable(now)`. Load the
`CandidateMeta` otherwise.
- Store `data` under `("available", candidate_hash)` and set `data_available` to true.
- Store each chunk under `("chunk", candidate_hash, index)` and set every bit in `chunks_stored` to `1`.
@@ -172,12 +199,13 @@ Every 5 minutes, run a pruning routine:
- For each erasure chunk bit set, remove `("chunk", candidate_hash, bit_index)`.
- If `data_available`, remove `("available", candidate_hash)`
This is `O(n * m)` in the number of candidates and the average size of the data stored. This is probably the most expensive operation, but it does not need
to be run very often.
This is `O(n * m)` in the number of candidates and the average size of the data stored. This is probably the most
expensive operation, but it does not need to be run very often.
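The pruning routine walks `"prune_by_time"` keys in ascending timestamp order and stops at the first key later than now. A minimal sketch, with a `BTreeSet` standing in for the big-endian-encoded key space; `take_due` is a hypothetical name, and deleting the `meta`, `chunk`, and `available` rows for each returned candidate is left to the caller.

```rust
use std::collections::BTreeSet;

pub type Timestamp = u64;
pub type CandidateHash = [u8; 32];

/// Removes and returns every candidate whose `"prune_by_time"` entry is due at `now`.
pub fn take_due(index: &mut BTreeSet<(Timestamp, CandidateHash)>, now: Timestamp) -> Vec<CandidateHash> {
    let mut due = Vec::new();
    // Entries are sorted by timestamp, so the first entry beyond `now` ends the scan.
    while index.first().map_or(false, |&(ts, _)| ts <= now) {
        let (_, hash) = index.pop_first().expect("non-empty: checked above");
        due.push(hash);
    }
    due
}
```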
## Basic scenarios to test
We need to test the correctness of data flow through the state machines described earlier. These tests assume that time is mocked.
We need to test the correctness of data flow through the state machines described earlier. These tests assume that
time is mocked.
- Stored data that is never included is pruned after the necessary timeout
- A block (and/or a chunk) is added to the store.
@@ -2,7 +2,8 @@
This subsystem is responsible for handling candidate validation requests. It is a simple request/response server.
A variety of subsystems want to know if a parachain block candidate is valid. None of them care about the detailed mechanics of how a candidate gets validated, just the results. This subsystem handles those details.
A variety of subsystems want to know if a parachain block candidate is valid. None of them care about the detailed
mechanics of how a candidate gets validated, just the results. This subsystem handles those details.
## Protocol
@@ -12,35 +13,53 @@ Output: Validation result via the provided response side-channel.
## Functionality
This subsystem groups the requests it handles into two categories: *candidate validation* and *PVF pre-checking*.
This subsystem groups the requests it handles into two categories: *candidate validation* and *PVF pre-checking*.
The first category can be further subdivided into two request types: one which draws out validation data from the state, and another which accepts all validation data exhaustively. Validation returns three possible outcomes on the response channel: the candidate is valid, the candidate is invalid, or an internal error occurred.
The first category can be further subdivided into two request types: one which draws out validation data from the
state, and another which accepts all validation data exhaustively. Validation returns three possible outcomes on the
response channel: the candidate is valid, the candidate is invalid, or an internal error occurred.
Parachain candidates are validated against their validation function: a piece of Wasm code that describes the state transition of the parachain. Validation function execution is not metered. This means that an execution which is an infinite loop or simply takes too long must be forcibly exited by some other means. For this reason, we recommend dispatching candidate validation to subprocesses which can be killed if they time out.
Parachain candidates are validated against their validation function: a piece of Wasm code that describes the state
transition of the parachain. Validation function execution is not metered. This means that an execution which is an
infinite loop or simply takes too long must be forcibly exited by some other means. For this reason, we recommend
dispatching candidate validation to subprocesses which can be killed if they time out.
Upon receiving a validation request, the first thing the candidate validation subsystem should do is make sure it has all the necessary parameters to the validation function. These are:
Upon receiving a validation request, the first thing the candidate validation subsystem should do is make sure it has
all the necessary parameters to the validation function. These are:
* The Validation Function itself.
* The [`CandidateDescriptor`](../../types/candidate.md#candidatedescriptor).
* The [`ValidationData`](../../types/candidate.md#validationdata).
* The [`PoV`](../../types/availability.md#proofofvalidity).
The second category is for PVF pre-checking. This is primarily used by the [PVF pre-checker](pvf-prechecker.md) subsystem.
The second category is for PVF pre-checking. This is primarily used by the [PVF pre-checker](pvf-prechecker.md)
subsystem.
### Determining Parameters
For a [`CandidateValidationMessage`][CVM]`::ValidateFromExhaustive`, these parameters are exhaustively provided.
For a [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`, some more work needs to be done. Due to the uncertainty of Availability Cores (implemented in the [`Scheduler`](../../runtime/scheduler.md) module of the runtime), a candidate at a particular relay-parent and for a particular para may have two different valid sets of validation data, depending on what is assumed to happen if the para is occupying a core at the onset of the new block. This is encoded as an `OccupiedCoreAssumption` in the runtime API.
For a [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`, some more work needs to be done. Due to the
uncertainty of Availability Cores (implemented in the [`Scheduler`](../../runtime/scheduler.md) module of the runtime),
a candidate at a particular relay-parent and for a particular para may have two different valid sets of validation
data, depending on what is assumed to happen if the para is occupying a core at the onset of the new block. This is
encoded as an `OccupiedCoreAssumption` in the runtime API.
To determine which assumption the candidate is meant to be executed under, we simply check both possibilities exhaustively against the state of the relay-parent. First we fetch the validation data under the assumption that the block occupying the core becomes available. If the `validation_data_hash` of the `CandidateDescriptor` matches the hash of this validation data, we use that. Otherwise, if the `validation_data_hash` matches the hash of the validation data fetched under the `TimedOut` assumption, we use that. Otherwise, we return a `ValidationResult::Invalid` response and conclude.
To determine which assumption the candidate is meant to be executed under, we simply check both possibilities
exhaustively against the state of the relay-parent. First we fetch the validation data under the assumption that the
block occupying the core becomes available. If the `validation_data_hash` of the `CandidateDescriptor` matches the hash
of this validation data, we use that. Otherwise, if the `validation_data_hash` matches the hash of the validation data
fetched under the `TimedOut` assumption, we use that. Otherwise, we return a `ValidationResult::Invalid` response and
conclude.
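The two-step check above can be expressed as "try each assumption in order and keep the first whose data hash matches". A minimal sketch: the two-variant enum is a local stand-in for the runtime API's `OccupiedCoreAssumption` (not imported here), and `find_assumed_validation_data`, `fetch`, and `hash_of` are hypothetical hooks for the runtime query and the hashing function.

```rust
pub type Hash = [u8; 32];

/// Local stand-in for the two assumptions discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OccupiedCoreAssumption {
    Included,
    TimedOut,
}

/// Returns the first assumption whose validation data hashes to `expected_hash`.
pub fn find_assumed_validation_data<D>(
    expected_hash: Hash,
    fetch: impl Fn(OccupiedCoreAssumption) -> Option<D>,
    hash_of: impl Fn(&D) -> Hash,
) -> Option<(OccupiedCoreAssumption, D)> {
    // Order matters: "the occupying block becomes available" is tried first.
    for assumption in [OccupiedCoreAssumption::Included, OccupiedCoreAssumption::TimedOut] {
        if let Some(data) = fetch(assumption) {
            if hash_of(&data) == expected_hash {
                return Some((assumption, data));
            }
        }
    }
    // Neither matches: the caller answers with `ValidationResult::Invalid`.
    None
}
```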
Then, we can fetch the validation code from the runtime based on which type of candidate this is. This gives us all the parameters. The descriptor and PoV come from the request itself, and the other parameters have been derived from the state.
Then, we can fetch the validation code from the runtime based on which type of candidate this is. This gives us all the
parameters. The descriptor and PoV come from the request itself, and the other parameters have been derived from the
state.
> TODO: This would be a great place for caching to avoid making lots of runtime requests. That would need a job, though.
### Execution of the Parachain Wasm
Once we have all parameters, we can spin up a background task to perform the validation in a way that doesn't hold up the entire event loop. Before invoking the validation function itself, this should first do some basic checks:
Once we have all parameters, we can spin up a background task to perform the validation in a way that doesn't hold up
the entire event loop. Before invoking the validation function itself, this should first do some basic checks:
* The collator signature is valid
* The PoV provided matches the `pov_hash` field of the descriptor
@@ -48,6 +67,8 @@ For more details please see [PVF Host and Workers](pvf-host-and-workers.md).
### Checking Validation Outputs
If we can assume the presence of the relay-chain state (that is, while processing [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`), we can run all the checks that the relay-chain would run at inclusion time, thus confirming that the candidate will be accepted.
If we can assume the presence of the relay-chain state (that is, while processing
[`CandidateValidationMessage`][CVM]`::ValidateFromChainState`), we can run all the checks that the relay-chain would
run at inclusion time, thus confirming that the candidate will be accepted.
[CVM]: ../../types/overseer-protocol.md#validationrequesttype
@@ -1,6 +1,7 @@
# Chain API
The Chain API subsystem is responsible for providing a single point of access to chain state data via a set of pre-determined queries.
The Chain API subsystem is responsible for providing a single point of access to chain state data via a set of
pre-determined queries.
## Protocol
@@ -10,7 +11,8 @@ Output: None
## Functionality
On receipt of `ChainApiMessage`, answer the request and provide the response to the side-channel embedded within the request.
On receipt of `ChainApiMessage`, answer the request and provide the response to the side-channel embedded within the
request.
Currently, the following requests are supported:
* Block hash to number
@@ -1,8 +1,12 @@
# Chain Selection Subsystem
This subsystem implements the necessary metadata for the implementation of the [chain selection](../../protocol-chain-selection.md) portion of the protocol.
This subsystem implements the necessary metadata for the implementation of the [chain
selection](../../protocol-chain-selection.md) portion of the protocol.
The subsystem wraps a database component which maintains a view of the unfinalized chain and records the properties of each block: whether the block is **viable**, whether it is **stagnant**, and whether it is **reverted**. It should also maintain an updated set of active leaves in accordance with this view, which should be cheap to query. Leaves are ordered descending first by weight and then by block number.
The subsystem wraps a database component which maintains a view of the unfinalized chain and records the properties of
each block: whether the block is **viable**, whether it is **stagnant**, and whether it is **reverted**. It should also
maintain an updated set of active leaves in accordance with this view, which should be cheap to query. Leaves are
ordered descending first by weight and then by block number.
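The leaf ordering can be expressed as a single descending sort on a `(weight, number)` key. A minimal sketch; `Leaf` and `sort_leaves` are hypothetical, the `u128` weight type is an assumption, and block number is assumed to break ties in descending order as well (the text does not state the direction).

```rust
use std::cmp::Reverse;

/// Hypothetical leaf record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Leaf {
    pub hash: [u8; 32],
    pub weight: u128,
    pub number: u32,
}

/// Orders leaves descending by weight, then descending by block number.
pub fn sort_leaves(leaves: &mut [Leaf]) {
    leaves.sort_by_key(|l| Reverse((l.weight, l.number)));
}
```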
This subsystem needs to update its information on the unfinalized chain:
* On every leaf-activated signal
@@ -11,32 +15,47 @@ This subsystem needs to update its information on the unfinalized chain:
* On every `ChainSelectionMessage::RevertBlocks`
* Periodically, to detect stagnation.
Simple implementations of these updates do `O(n_unfinalized_blocks)` disk operations. If the number of unfinalized blocks is relatively small, the updates should not take very much time. However, in cases where there are hundreds or thousands of unfinalized blocks, the naive implementations of these update algorithms would have to be replaced with more sophisticated versions.
Simple implementations of these updates do `O(n_unfinalized_blocks)` disk operations. If the number of unfinalized
blocks is relatively small, the updates should not take very much time. However, in cases where there are hundreds or
thousands of unfinalized blocks, the naive implementations of these update algorithms would have to be replaced with
more sophisticated versions.
### `OverseerSignal::ActiveLeavesUpdate`
## `OverseerSignal::ActiveLeavesUpdate`
Determine all new blocks implicitly referenced by any new active leaves and add them to the view. Update the set of viable leaves accordingly. The weights of imported blocks can be determined by the [`ChainApiMessage::BlockWeight`](../../types/overseer-protocol.md#chain-api-message).
Determine all new blocks implicitly referenced by any new active leaves and add them to the view. Update the set of
viable leaves accordingly. The weights of imported blocks can be determined by the
[`ChainApiMessage::BlockWeight`](../../types/overseer-protocol.md#chain-api-message).
### `OverseerSignal::BlockFinalized`
## `OverseerSignal::BlockFinalized`
Delete data for all orphaned chains and update all metadata descending from the new finalized block accordingly, along with the set of viable leaves. Note that finalizing a **reverted** or **stagnant** block means that the descendants of those blocks may lose that status because the definitions of those properties don't include the finalized chain. Update the set of viable leaves accordingly.
Delete data for all orphaned chains and update all metadata descending from the new finalized block accordingly, along
with the set of viable leaves. Note that finalizing a **reverted** or **stagnant** block means that the descendants of
those blocks may lose that status because the definitions of those properties don't include the finalized chain. Update
the set of viable leaves accordingly.
### `ChainSelectionMessage::Approved`
## `ChainSelectionMessage::Approved`
Update the approval status of the referenced block. If the block was stagnant and thus non-viable and is now viable, then the metadata of all of its descendants needs to be updated as well, as they may no longer be stagnant either. Update the set of viable leaves accordingly.
Update the approval status of the referenced block. If the block was stagnant and thus non-viable and is now viable,
then the metadata of all of its descendants needs to be updated as well, as they may no longer be stagnant either.
Update the set of viable leaves accordingly.
### `ChainSelectionMessage::Leaves`
## `ChainSelectionMessage::Leaves`
Gets all leaves of the chain, i.e. block hashes that are suitable to build upon and have no suitable children. Supplies the leaves in descending order by score.
Gets all leaves of the chain, i.e. block hashes that are suitable to build upon and have no suitable children. Supplies
the leaves in descending order by score.
### `ChainSelectionMessage::BestLeafContaining`
## `ChainSelectionMessage::BestLeafContaining`
If the required block is unknown or not viable, then return `None`. Iterate over all leaves in order of descending weight, returning the first leaf containing the required block in its chain, and `None` otherwise.
If the required block is unknown or not viable, then return `None`. Iterate over all leaves in order of descending
weight, returning the first leaf containing the required block in its chain, and `None` otherwise.
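`BestLeafContaining` is a linear scan over the already-sorted leaves with an ancestry walk per leaf. A minimal sketch, assuming a child-to-parent map covering the unfinalized chain and compact `u64` block identifiers; `best_leaf_containing`, `BlockId`, and the `is_viable` callback are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical compact block identifier used only in this sketch.
pub type BlockId = u64;

pub fn best_leaf_containing(
    leaves: &[BlockId],
    parent: &HashMap<BlockId, BlockId>,
    is_viable: impl Fn(BlockId) -> bool,
    required: BlockId,
) -> Option<BlockId> {
    // Unknown or non-viable blocks yield `None` immediately.
    if !is_viable(required) {
        return None;
    }
    // `leaves` is assumed sorted best-first, so the first hit is the answer.
    leaves.iter().copied().find(|&leaf| {
        let mut cur = leaf;
        loop {
            if cur == required {
                return true;
            }
            match parent.get(&cur) {
                Some(&p) => cur = p,
                // Walked past the oldest tracked block without finding `required`.
                None => return false,
            }
        }
    })
}
```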
### `ChainSelectionMessage::RevertBlocks`
This message indicates that a dispute has concluded against a parachain block candidate. The message passes along a vector containing the block number and block hash of each block where the disputed candidate was included. The passed blocks will be marked as reverted, and their descendants will be marked as non-viable.
## `ChainSelectionMessage::RevertBlocks`
This message indicates that a dispute has concluded against a parachain block candidate. The message passes along a
vector containing the block number and block hash of each block where the disputed candidate was included. The passed
blocks will be marked as reverted, and their descendants will be marked as non-viable.
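Marking descendants as non-viable is a graph traversal from each reverted block. A minimal sketch, assuming a parent-to-children map and compact `u64` block identifiers in place of `(number, hash)` pairs; `apply_revert` is hypothetical and returns the reverted blocks together with all their descendants.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the reverted blocks plus all of their descendants (all non-viable).
pub fn apply_revert(children: &HashMap<u64, Vec<u64>>, reverted: &[u64]) -> HashSet<u64> {
    let mut non_viable = HashSet::new();
    let mut stack: Vec<u64> = reverted.to_vec();
    // Depth-first walk; `insert` returning false means the subtree was already visited.
    while let Some(block) = stack.pop() {
        if non_viable.insert(block) {
            if let Some(kids) = children.get(&block) {
                stack.extend(kids.iter().copied());
            }
        }
    }
    non_viable
}
```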
### Periodically
## Periodically
Detect stagnant blocks and apply the stagnant definition to all descendants. Update the set of viable leaves accordingly.
Detect stagnant blocks and apply the stagnant definition to all descendants. Update the set of viable leaves
accordingly.
@@ -1,30 +1,43 @@
# Network Bridge
One of the main features of the overseer/subsystem duality is to avoid shared ownership of resources and to communicate via message-passing. However, implementing each networking subsystem as its own network protocol brings a fair share of challenges.
One of the main features of the overseer/subsystem duality is to avoid shared ownership of resources and to communicate
via message-passing. However, implementing each networking subsystem as its own network protocol brings a fair share of
challenges.
The most notable challenge is coordinating and eliminating race conditions of peer connection and disconnection events. If we have many network protocols that peers are supposed to be connected on, it is difficult to enforce that a peer is indeed connected on all of them or the order in which those protocols receive notifications that peers have connected. This becomes especially difficult when attempting to share peer state across protocols. All of the Parachain-Host's gossip protocols eliminate DoS with a data-dependency on current chain heads. However, it is inefficient and confusing to implement the logic for tracking our current chain heads as well as our peers' on each of those subsystems. Having one subsystem for tracking this shared state and distributing it to the others is an improvement in architecture and efficiency.
The most notable challenge is coordinating and eliminating race conditions of peer connection and disconnection events.
If we have many network protocols that peers are supposed to be connected on, it is difficult to enforce that a peer is
indeed connected on all of them or the order in which those protocols receive notifications that peers have connected.
This becomes especially difficult when attempting to share peer state across protocols. All of the Parachain-Host's
gossip protocols eliminate DoS with a data-dependency on current chain heads. However, it is inefficient and confusing
to implement the logic for tracking our current chain heads as well as our peers' on each of those subsystems. Having
one subsystem for tracking this shared state and distributing it to the others is an improvement in architecture and
efficiency.
One other piece of shared state to track is peer reputation. When peers are found to have provided value or cost, we adjust their reputation accordingly.
One other piece of shared state to track is peer reputation. When peers are found to have provided value or cost, we
adjust their reputation accordingly.
In short, this subsystem acts as a bridge between an actual network component and a subsystem's protocol. The implementation of the underlying network component is beyond the scope of this module. We make certain assumptions about the network component:
* The network allows registering of protocols and multiple versions of each protocol.
* The network handles version negotiation of protocols with peers and only connects the peer on the highest version of the protocol.
* Each protocol has its own peer-set, although there may be some overlap.
* The network provides peer-set management utilities for discovering the peer-IDs of validators and a means of dialing peers with given IDs.
In short, this subsystem acts as a bridge between an actual network component and a subsystem's protocol. The
implementation of the underlying network component is beyond the scope of this module. We make certain assumptions about
the network component:
- The network allows registering protocols, and multiple versions of each protocol.
- The network handles version negotiation of protocols with peers and only connects the peer on the highest version of
the protocol.
- Each protocol has its own peer-set, although there may be some overlap.
- The network provides peer-set management utilities for discovering the peer-IDs of validators and a means of dialing
peers with given IDs.
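The version-negotiation assumption can be illustrated with a small sketch, assuming versions are plain integers: a peer is connected on the highest version both sides support, or not at all. The `PeerSet` enum mirrors the `Validation` and `Collation` peer-sets, and `our_versions` registers only version 1 of each protocol, as on startup; `negotiate` is a hypothetical helper, since the real negotiation lives in the network component.

```rust
/// The two peer-sets the bridge exposes to event producers.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PeerSet {
    Validation,
    Collation,
}

/// Versions we register per peer-set: only version 1 of each protocol.
fn our_versions(peer_set: PeerSet) -> &'static [u32] {
    match peer_set {
        PeerSet::Validation => &[1],
        PeerSet::Collation => &[1],
    }
}

/// Hypothetical helper: the highest version supported by both sides, or `None`
/// if there is no overlap (the peer is then not connected on that protocol).
fn negotiate(ours: &[u32], theirs: &[u32]) -> Option<u32> {
    ours.iter().copied().filter(|v| theirs.contains(v)).max()
}
```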
The network bridge makes use of the peer-set feature, but is not generic over peer-set. Instead, it exposes two
peer-sets that event producers can attach to: `Validation` and `Collation`. More information can be found on the
documentation of the [`NetworkBridgeMessage`][NBM].
## Protocol
Input: [`NetworkBridgeMessage`][NBM]
Output:
- [`ApprovalDistributionMessage`][AppD]`::NetworkBridgeUpdate`
- [`BitfieldDistributionMessage`][BitD]`::NetworkBridgeUpdate`
- [`CollatorProtocolMessage`][CollP]`::NetworkBridgeUpdate`
- [`StatementDistributionMessage`][StmtD]`::NetworkBridgeUpdate`
## Functionality
```rust
enum WireMessage<M> {
	// …
}
```
and instantiates this type twice, once using the [`ValidationProtocolV1`][VP1] message type, and once with the
[`CollationProtocolV1`][CP1] message type.
```rust
type ValidationV1Message = WireMessage<ValidationProtocolV1>;
type CollationV1Message = WireMessage<CollationProtocolV1>;
```
### Startup
On startup, we register two protocols with the underlying network utility. One for validation and one for collation. We
register only version 1 of each of these protocols.
### Main Loop
The bulk of the work done by this subsystem is in responding to network events, signals from the overseer, and messages
from other subsystems.
Each network event is associated with a particular peer-set.
### Overseer Signal: `ActiveLeavesUpdate`
The `activated` and `deactivated` lists determine the evolution of our local view over time. A
`ProtocolMessage::ViewUpdate` is issued to each connected peer on each peer-set, and a
`NetworkBridgeEvent::OurViewChange` is issued to each event handler for each protocol.
We only send view updates if the node has indicated that it has finished major blockchain synchronization.
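A sketch of the local view evolution, assuming a view is just a set of head hashes: deactivated heads are removed, activated ones are added, and the resulting view is only returned for broadcasting once major sync has finished. `ActiveLeavesUpdate` here is a simplified stand-in for the overseer signal, and `on_active_leaves` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a relay-chain block hash.
type Hash = u64;

/// Simplified stand-in for the overseer's `ActiveLeavesUpdate` signal.
struct ActiveLeavesUpdate {
    activated: Vec<Hash>,
    deactivated: Vec<Hash>,
}

/// Always apply the update to our local view. Return the view to send to every
/// connected peer and event handler, or `None` while major sync is in progress.
fn on_active_leaves(
    our_view: &mut BTreeSet<Hash>,
    update: &ActiveLeavesUpdate,
    major_sync_done: bool,
) -> Option<BTreeSet<Hash>> {
    for head in &update.deactivated {
        our_view.remove(head);
    }
    our_view.extend(update.activated.iter().copied());
    if major_sync_done {
        Some(our_view.clone())
    } else {
        None
    }
}
```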
### Overseer Signal: `BlockFinalized`
We update our view's `finalized_number` to the provided one and delay `ProtocolMessage::ViewUpdate` and
`NetworkBridgeEvent::OurViewChange` till the next `ActiveLeavesUpdate`.
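The deferral can be sketched as follows, assuming a simplified `LocalView` and `ViewUpdate` (both hypothetical): `BlockFinalized` only records the number, and the next `ActiveLeavesUpdate` carries it out. Sync gating is left out to keep the sketch focused.

```rust
/// Hypothetical stand-in for the update sent as `ProtocolMessage::ViewUpdate`
/// and `NetworkBridgeEvent::OurViewChange`.
#[derive(Clone, Debug, PartialEq)]
struct ViewUpdate {
    heads: Vec<u64>,
    finalized_number: u32,
}

#[derive(Default)]
struct LocalView {
    heads: Vec<u64>,
    finalized_number: u32,
}

impl LocalView {
    /// `BlockFinalized`: record the new number, but send nothing yet.
    fn on_block_finalized(&mut self, number: u32) -> Option<ViewUpdate> {
        self.finalized_number = number;
        None
    }

    /// `ActiveLeavesUpdate`: the deferred finalized number goes out with this update.
    fn on_active_leaves(&mut self, activated: &[u64], deactivated: &[u64]) -> Option<ViewUpdate> {
        self.heads.retain(|h| !deactivated.contains(h));
        self.heads.extend_from_slice(activated);
        Some(ViewUpdate { heads: self.heads.clone(), finalized_number: self.finalized_number })
    }
}
```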
### Network Event: `PeerConnected`
Issue a `NetworkBridgeEvent::PeerConnected` for each [Event Handler](#event-handlers) of the peer-set and negotiated
protocol version of the peer. Also issue a `NetworkBridgeEvent::PeerViewChange` and send the peer our current view, but
only if the node has indicated that it has finished major blockchain synchronization. Otherwise, we only send the peer
an empty view.
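A minimal sketch of the `PeerConnected` handling, with hypothetical `Action` variants standing in for the event-handler notification and the wire message. The accompanying `NetworkBridgeEvent::PeerViewChange` is omitted for brevity.

```rust
/// Hypothetical stand-ins for real identifier types.
type PeerId = u64;
type Hash = u64;

/// What the bridge does in response to `PeerConnected` (simplified).
#[derive(Debug, PartialEq)]
enum Action {
    /// Stand-in for issuing `NetworkBridgeEvent::PeerConnected` to each event handler.
    NotifyHandlers(PeerId),
    /// Send our view to the peer over the wire.
    SendView(PeerId, Vec<Hash>),
}

fn on_peer_connected(peer: PeerId, our_heads: &[Hash], major_sync_done: bool) -> Vec<Action> {
    // Until major sync has finished, the peer only receives an empty view.
    let view = if major_sync_done { our_heads.to_vec() } else { Vec::new() };
    vec![Action::NotifyHandlers(peer), Action::SendView(peer, view)]
}
```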
### Network Event: `PeerDisconnected`
Issue a `NetworkBridgeEvent::PeerDisconnected` for each [Event Handler](#event-handlers) of the peer-set and negotiated
protocol version of the peer.
### Network Event: `ProtocolMessage`
Map the message onto the corresponding [Event Handler](#event-handlers) based on the peer-set this message was received
on and dispatch it via the overseer.
### Network Event: `ViewUpdate`
- Check that the new view is valid and note it as the most recent view update of the peer on this peer-set.
- Map a `NetworkBridgeEvent::PeerViewChange` onto the corresponding [Event Handler](#event-handlers) based on the
  peer-set this message was received on and dispatch it via the overseer.
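The bookkeeping for `ViewUpdate` can be sketched as a map keyed by peer-set and peer. The validity check is an assumption for illustration: this section does not say what makes a view valid, so the sketch simply bounds the number of heads with a hypothetical `MAX_VIEW_HEADS`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for real identifier types.
type PeerId = u64;
type Hash = u64;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum PeerSet {
    Validation,
    Collation,
}

/// Hypothetical validity bound; the real criteria are not specified here.
const MAX_VIEW_HEADS: usize = 5;

#[derive(Default)]
struct PeerViews {
    /// Most recent view per (peer-set, peer).
    views: HashMap<(PeerSet, PeerId), Vec<Hash>>,
}

impl PeerViews {
    /// On success, note the view and return it for the `PeerViewChange` sent to the
    /// handlers of `set`; on failure, leave the stored view untouched.
    fn on_view_update(&mut self, set: PeerSet, peer: PeerId, view: Vec<Hash>) -> Result<Vec<Hash>, &'static str> {
        if view.len() > MAX_VIEW_HEADS {
            return Err("view has too many heads");
        }
        self.views.insert((set, peer), view.clone());
        Ok(view)
    }
}
```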
### `ReportPeer`
### `NewGossipTopology`
- Map all `AuthorityDiscoveryId`s to `PeerId`s and issue a corresponding `NetworkBridgeUpdate` to all validation
subsystems.
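A sketch of the mapping step, assuming we already hold a lookup from authority IDs to peer IDs (both types are hypothetical integer stand-ins). Authorities that cannot be resolved are skipped, which is an illustrative choice rather than documented behavior.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the real identifier types.
type AuthorityDiscoveryId = u32;
type PeerId = u64;

/// Resolve each authority in the topology to a `PeerId`, preserving order and
/// skipping authorities that are not (yet) known.
fn map_topology(
    authorities: &[AuthorityDiscoveryId],
    known: &HashMap<AuthorityDiscoveryId, PeerId>,
) -> Vec<(AuthorityDiscoveryId, PeerId)> {
    authorities
        .iter()
        .filter_map(|a| known.get(a).map(|p| (*a, *p)))
        .collect()
}
```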
## Event Handlers
Network bridge event handlers are the intended recipients of particular network protocol messages. These are each a
variant of a message to be sent via the overseer.
### Validation V1
- `ApprovalDistributionV1Message -> ApprovalDistributionMessage::NetworkBridgeUpdate`
- `BitfieldDistributionV1Message -> BitfieldDistributionMessage::NetworkBridgeUpdate`
- `StatementDistributionV1Message -> StatementDistributionMessage::NetworkBridgeUpdate`
### Collation V1
- `CollatorProtocolV1Message -> CollatorProtocolMessage::NetworkBridgeUpdate`
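The routing tables above amount to an exhaustive match per peer-set. A minimal sketch, with hypothetical message enums whose byte payloads are placeholders and a hypothetical `Handler` enum naming the subsystem that receives the `NetworkBridgeUpdate`:

```rust
/// Hypothetical wire messages on the validation peer-set (payloads are placeholders).
#[derive(Debug)]
enum ValidationProtocolV1 {
    ApprovalDistribution(Vec<u8>),
    BitfieldDistribution(Vec<u8>),
    StatementDistribution(Vec<u8>),
}

/// Hypothetical wire messages on the collation peer-set.
#[derive(Debug)]
enum CollationProtocolV1 {
    CollatorProtocol(Vec<u8>),
}

/// The subsystem that receives the corresponding `NetworkBridgeUpdate`.
#[derive(Debug, PartialEq)]
enum Handler {
    ApprovalDistribution,
    BitfieldDistribution,
    StatementDistribution,
    CollatorProtocol,
}

fn route_validation(msg: &ValidationProtocolV1) -> Handler {
    match msg {
        ValidationProtocolV1::ApprovalDistribution(_) => Handler::ApprovalDistribution,
        ValidationProtocolV1::BitfieldDistribution(_) => Handler::BitfieldDistribution,
        ValidationProtocolV1::StatementDistribution(_) => Handler::StatementDistribution,
    }
}

fn route_collation(msg: &CollationProtocolV1) -> Handler {
    match msg {
        CollationProtocolV1::CollatorProtocol(_) => Handler::CollatorProtocol,
    }
}
```

Because the matches are exhaustive, adding a new message variant to either peer-set forces the routing to be updated at compile time.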
[NBM]: ../../types/overseer-protocol.md#network-bridge-message
[AppD]: ../../types/overseer-protocol.md#approval-distribution-message
# Provisioner
Relay chain block authorship authority is governed by BABE and is beyond the scope of the Overseer and the rest of the
subsystems. That said, ultimately the block author needs to select a set of backable parachain candidates and other
consensus data, and assemble a block from them. This subsystem is responsible for providing the necessary data to all
potential block authors.
## Provisionable Data
There are several distinct types of provisionable data, but they share this property in common: all should eventually be
included in a relay chain block.
### Backed Candidates
The block author can choose 0 or 1 backed parachain candidates per parachain; the only constraint is that each backable
candidate has the appropriate relay parent. However, the choice of a backed candidate must be the block author's. The
provisioner subsystem is how those block authors make this choice in practice.
### Signed Bitfields
[Signed bitfields](../../types/availability.md#signed-availability-bitfield) are attestations from a particular
validator about which candidates it believes are available. Those will only be provided on fresh leaves.
### Misbehavior Reports
Misbehavior reports are self-contained proofs of misbehavior by a validator or group of validators. For example, it is
very easy to verify a double-voting misbehavior report: the report contains two votes signed by the same key, advocating
different outcomes. Concretely, misbehavior reports become inherents which cause DOTs to be slashed.
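A double-vote check can be sketched as a pure function over two votes. This assumes the signatures were already verified and that both votes concern the same candidate; `SignedVote` and its fields are hypothetical simplifications.

```rust
/// Hypothetical simplified vote; signature verification is assumed to happen elsewhere.
#[derive(Clone, Debug, PartialEq)]
struct SignedVote {
    validator_key: u32,
    candidate: u64,
    valid: bool,
}

/// Self-contained check: same key, same candidate, different outcomes.
fn is_double_vote(a: &SignedVote, b: &SignedVote) -> bool {
    a.validator_key == b.validator_key && a.candidate == b.candidate && a.valid != b.valid
}
```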
Note that there is no mechanism in place which forces a block author to include a misbehavior report which it doesn't
like, for example if it would be slashed by such a report. The chain's defense against this is to have a relatively long
slash period, such that the chain is likely to encounter an honest author before the slash period expires.
### Dispute Inherent
The dispute inherent is similar to a misbehavior report in that it is an attestation of misbehavior on the part of a
validator or group of validators. Unlike a misbehavior report, it is not self-contained: resolution requires coordinated
action by several validators. The canonical example of a dispute inherent involves an approval checker discovering that
a set of validators has improperly approved an invalid parachain block: resolving this requires the entire validator set
to re-validate the block, so that the minority can be slashed.
Dispute resolution is complex and is explained in substantially more detail [here](../../runtime/disputes.md).
- `ActiveLeavesUpdate`:
- For each `activated` head:
- spawn a Block Authorship Provisioning iteration with the given relay parent, storing a bidirectional channel with
that iteration.
- For each `deactivated` head:
- terminate the Block Authorship Provisioning iteration for the given relay parent, if any.
- `Conclude`: Forward `Conclude` to all iterations, waiting a small amount of time for them to join, and then
hard-exiting.
### On `ProvisionerMessage`
Forward the message to the appropriate Block Authorship Provisioning iteration, or discard if no appropriate iteration
is currently active.
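A sketch of the iteration bookkeeping, using `std::sync::mpsc` channels keyed by relay parent. The text describes a bidirectional channel and spawned iterations; this sketch keeps only the forward direction and returns the receiver to the caller instead of spawning a task. `Provisioner` and `IterationMessage` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for a relay-parent hash.
type Hash = u64;

/// Hypothetical messages an iteration may receive.
#[derive(Debug, PartialEq)]
enum IterationMessage {
    /// Placeholder for a forwarded `ProvisionerMessage`.
    Forwarded(u32),
    Conclude,
}

#[derive(Default)]
struct Provisioner {
    /// One sender per active relay parent; the receiver belongs to that iteration.
    iterations: HashMap<Hash, Sender<IterationMessage>>,
}

impl Provisioner {
    /// Activated head: start an iteration (a real one would move `rx` into a task).
    fn activate(&mut self, relay_parent: Hash) -> Receiver<IterationMessage> {
        let (tx, rx) = channel();
        self.iterations.insert(relay_parent, tx);
        rx
    }

    /// Deactivated head: dropping the sender ends the iteration's input stream.
    fn deactivate(&mut self, relay_parent: Hash) {
        self.iterations.remove(&relay_parent);
    }

    /// Forward to the matching iteration; discard (return `false`) if none is active.
    fn forward(&self, relay_parent: Hash, msg: IterationMessage) -> bool {
        match self.iterations.get(&relay_parent) {
            Some(tx) => tx.send(msg).is_ok(),
            None => false,
        }
    }

    /// `Conclude`: tell every iteration to stop, then drop all senders.
    fn conclude(&mut self) {
        for (_, tx) in self.iterations.drain() {
            let _ = tx.send(IterationMessage::Conclude);
        }
    }
}
```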
### Per Provisioning Iteration
Input: [`ProvisionerMessage`](../../types/overseer-protocol.md#provisioner-message). Backed candidates come from the
[Candidate Backing subsystem](../backing/candidate-backing.md), signed bitfields come from the [Bitfield Distribution
subsystem](../availability/bitfield-distribution.md), and disputes come from the [Disputes
Subsystem](../disputes/dispute-coordinator.md). Misbehavior reports are currently sent from the [Candidate Backing
subsystem](../backing/candidate-backing.md) and contain the following misbehaviors:
1. `Misbehavior::ValidityDoubleVote`
2. `Misbehavior::MultipleCandidates`
3. `Misbehavior::UnauthorizedStatement`
4. `Misbehavior::DoubleSign`
But we choose not to punish these forms of misbehavior for the time being. Risks from misbehavior are sufficiently
mitigated at the protocol level via reputation changes. Punitive actions here may become desirable enough to warrant
dedicated work in the future.
At initialization, this subsystem has no outputs.
Block authors request the inherent data they should use for constructing the inherent in the block which contains
parachain execution information.
## Block Production
When a validator is selected by BABE to author a block, it becomes a block producer. The provisioner is the subsystem
best suited to choosing which specific backed candidates and availability bitfields should be assembled into the block.
To engage this functionality, a `ProvisionerMessage::RequestInherentData` is sent; the response is a
[`ParaInherentData`](../../types/runtime.md#parainherentdata). Each relay chain block backs at most one backable
parachain block candidate per parachain. Additionally, no further candidate for that parachain can be backed until the
previous one has either been declared available or expired. If bitfields indicate that candidate A, the predecessor of
B, should be declared available, then B can be backed in the same relay chain block. Appropriate bitfields, as outlined
in the section on [bitfield selection](#bitfield-selection), and any dispute statements should be attached as well.
### Bitfield Selection
Our goal with respect to bitfields is simple: maximize availability. However, it's not quite as simple as always
including all bitfields; there are constraints which still need to be met:
- not more than one bitfield per validator
- each 1 bit must correspond to an occupied core
Beyond that, a semi-arbitrary selection policy is fine. In order to meet the goal of maximizing availability, a
heuristic of picking the bitfield with the greatest number of 1 bits set in the event of conflict is useful.
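A sketch of this policy, assuming a bitfield is a `Vec<bool>` indexed by core and `occupied` marks which cores are occupied. Bitfields that set a bit for an unoccupied core are dropped (one possible way to meet that constraint), and per validator the bitfield with the most 1 bits is kept. `SignedBitfield` is a hypothetical simplification.

```rust
use std::collections::HashMap;

/// Hypothetical simplified signed bitfield: one bool per availability core.
#[derive(Clone, Debug, PartialEq)]
struct SignedBitfield {
    validator_index: u32,
    bits: Vec<bool>,
}

fn count_ones(bits: &[bool]) -> usize {
    bits.iter().filter(|&&b| b).count()
}

/// At most one bitfield per validator; every 1 bit must point at an occupied core;
/// on conflict, prefer the bitfield with more 1 bits.
fn select_bitfields(candidates: &[SignedBitfield], occupied: &[bool]) -> Vec<SignedBitfield> {
    let mut best: HashMap<u32, SignedBitfield> = HashMap::new();
    for bf in candidates {
        let fits = bf.bits.len() == occupied.len()
            && bf.bits.iter().zip(occupied).all(|(&bit, &occ)| !bit || occ);
        if !fits {
            continue;
        }
        let better = match best.get(&bf.validator_index) {
            Some(current) => count_ones(&bf.bits) > count_ones(&current.bits),
            None => true,
        };
        if better {
            best.insert(bf.validator_index, bf.clone());
        }
    }
    let mut selected: Vec<SignedBitfield> = best.into_values().collect();
    selected.sort_by_key(|bf| bf.validator_index);
    selected
}
```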
### Dispute Statement Selection
This is the point at which the block author provides further votes to active disputes or initiates new disputes in the
runtime state.
The block-authoring logic of the runtime has an extra step between handling the inherent-data and producing the actual
inherent call, which we assume performs the work of filtering out disputes which are not relevant to the on-chain state.
Backing votes are always kept in the dispute statement set. This ensures we punish the maximum number of misbehaving
backers.
To select disputes:
- Issue a `DisputeCoordinatorMessage::RecentDisputes` message and wait for the response. This is a set of all disputes
in recent sessions which we are aware of.
### Determining Bitfield Availability
An occupied core has a `CoreAvailability` bitfield. We also have a list of `SignedAvailabilityBitfield`s. We need to
determine from these whether or not a core at a particular index has become available.
The key insight required is that `CoreAvailability` is transverse to the `SignedAvailabilityBitfield`s: if we
conceptualize the list of bitfields as many rows, each bit of which is its own column, then `CoreAvailability` for a
given core index is the vertical slice of bits in the set at that index.
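The transpose can be sketched directly, assuming `bitfields[v]` is validator `v`'s bitfield. The availability threshold is passed in as a parameter because the exact rule is not given in this excerpt; `core_availability` and `is_available` are hypothetical helpers.

```rust
/// Column `core_index` of the bitfield matrix: bit `v` is set iff validator `v`
/// marked that core as available. Missing bits count as unset.
fn core_availability(bitfields: &[Vec<bool>], core_index: usize) -> Vec<bool> {
    bitfields
        .iter()
        .map(|row| row.get(core_index).copied().unwrap_or(false))
        .collect()
}

/// A core is treated as available once at least `threshold` validators attest to it.
fn is_available(column: &[bool], threshold: usize) -> bool {
    column.iter().filter(|&&b| b).count() >= threshold
}
```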
To compute bitfield availability, then:
### Candidate Selection: Prospective Parachains Mode
The state of the provisioner `PerRelayParent` tracks an important setting, `ProspectiveParachainsMode`. This setting
determines which backable candidate selection method the provisioner uses.
`ProspectiveParachainsMode::Disabled` - The provisioner uses its own internal legacy candidate selection.
`ProspectiveParachainsMode::Enabled` - The provisioner requests that [prospective
parachains](../backing/prospective-parachains.md) provide selected candidates.
Candidates selected with `ProspectiveParachainsMode::Enabled` are able to benefit from the increased block production
time that asynchronous backing allows. For this reason, all Polkadot protocol networks will eventually use prospective
parachains candidate selection, after which legacy candidate selection will be removed as obsolete.
### Prospective Parachains Candidate Selection
The goal of candidate selection is to determine which cores are free, and then to the degree possible, pick a candidate
appropriate to each free core. In prospective parachains candidate selection the provisioner handles the former process
while [prospective parachains](../backing/prospective-parachains.md) handles the latter.
To select backable candidates:
    - The core is unscheduled and doesn't need to be provisioned with a candidate.
- On `CoreState::Scheduled`
- The core is unoccupied and scheduled to accept a backed block for a particular `para_id`.
    - The provisioner requests a backable candidate from [prospective parachains](../backing/prospective-parachains.md)
      with the desired relay parent, the core's scheduled `para_id`, and an empty required path.
- On `CoreState::Occupied`
- The availability core is occupied by a parachain block candidate pending availability. A further candidate need
not be provided by the provisioner unless the core will be vacated this block. This is the case when either
bitfields indicate the current core occupant has been made available or a timeout is reached.
- If `bitfields_indicate_availability`
      - If `Some(scheduled_core) = occupied_core.next_up_on_available`, the core will be vacated and in need of a
        provisioned candidate. The provisioner requests a backable candidate from [prospective
        parachains](../backing/prospective-parachains.md) with the core's scheduled `para_id` and a required path with
        one entry. This entry corresponds to the parablock candidate previously occupying this core, which was made
        available and can be built upon even though it hasn't been seen as included in a relay chain block yet. See the
        Required Path section below for more detail.
      - If `occupied_core.next_up_on_available` is `None`, then the core being vacated is unscheduled and doesn't need
        to be provisioned with a candidate.
- Else-if `occupied_core.time_out_at == block_number`
      - If `Some(scheduled_core) = occupied_core.next_up_on_timeout`, the core will be vacated and in need of a
        provisioned candidate. A candidate is requested in exactly the same way as with `CoreState::Scheduled`.
      - Else the core being vacated is unscheduled and doesn't need to be provisioned with a candidate.

The end result of this process is a vector of `CandidateHash`s, sorted in order of their core index.
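The per-core decision above maps naturally onto a `match`. In this sketch `CoreState` and `BackableRequest` are simplified, hypothetical shapes, the relay parent is left implicit, and the returned request stands in for the message sent to prospective parachains.

```rust
/// Hypothetical stand-ins for the real identifier types.
type ParaId = u32;
type CandidateHash = u64;

/// Simplified core states (hypothetical shapes of the real `CoreState`).
enum CoreState {
    Free,
    Scheduled { para_id: ParaId },
    Occupied {
        candidate_hash: CandidateHash,
        next_up_on_available: Option<ParaId>,
        next_up_on_timeout: Option<ParaId>,
        time_out_at: u32,
    },
}

/// What the provisioner asks prospective parachains for.
#[derive(Debug, PartialEq)]
struct BackableRequest {
    para_id: ParaId,
    required_path: Vec<CandidateHash>,
}

fn request_for_core(core: &CoreState, bitfields_indicate_availability: bool, block_number: u32) -> Option<BackableRequest> {
    match *core {
        CoreState::Free => None,
        CoreState::Scheduled { para_id } => Some(BackableRequest { para_id, required_path: vec![] }),
        CoreState::Occupied { candidate_hash, next_up_on_available, next_up_on_timeout, time_out_at } => {
            if bitfields_indicate_availability {
                // Build on the occupant that was just made available: one-entry required path.
                next_up_on_available.map(|para_id| BackableRequest { para_id, required_path: vec![candidate_hash] })
            } else if time_out_at == block_number {
                // Timed out: request exactly as for a scheduled core.
                next_up_on_timeout.map(|para_id| BackableRequest { para_id, required_path: vec![] })
            } else {
                None
            }
        }
    }
}
```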
#### Required Path
Required path is a parameter for `ProspectiveParachainsMessage::GetBackableCandidate`, which the provisioner sends in
candidate selection.
An empty required path indicates that the requested candidate should be a direct child of the most recently included
parablock for the given `para_id` as of the given relay parent.
In contrast, a required path with one or more entries prompts [prospective
parachains](../backing/prospective-parachains.md) to step forward through its fragment tree for the given `para_id` and
relay parent until the desired parablock is reached. We then select a direct child of that parablock to pass to the
provisioner.
The parablocks making up a required path do not need to have been previously seen as included in relay chain blocks.
Thus the ability to provision backable candidates based on a required path effectively decouples backing from inclusion.
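A sketch of the path walk, assuming a heavily simplified fragment tree where each node lists its children and a hypothetical `ROOT` key stands for the most recently included parablock. When several children exist, returning the first one is an arbitrary choice made for this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a candidate hash.
type CandidateHash = u64;

/// Hypothetical key for the most recently included parablock.
const ROOT: CandidateHash = 0;

/// Heavily simplified fragment tree: node -> direct children.
struct FragmentTree {
    children: HashMap<CandidateHash, Vec<CandidateHash>>,
}

impl FragmentTree {
    /// Step through `required_path` from the root, then return a direct child of the
    /// node reached. An empty path yields a child of the root.
    fn select_child(&self, required_path: &[CandidateHash]) -> Option<CandidateHash> {
        let mut node = ROOT;
        for &step in required_path {
            // Each step must be a direct child of the current node.
            if !self.children.get(&node)?.contains(&step) {
                return None;
            }
            node = step;
        }
        self.children.get(&node)?.first().copied()
    }
}
```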
### Legacy Candidate Selection
Legacy candidate selection takes place in the provisioner. Thus the provisioner needs to keep an up-to-date record of
all [backed_candidates](../../types/backing.md#backed-candidate) `PerRelayParent` to pick from.
The goal of candidate selection is to determine which cores are free, and then to the degree possible, pick a candidate
appropriate to each free core.
To determine availability:
- For each core state:
- On `CoreState::Scheduled`, then we can make an `OccupiedCoreAssumption::Free`.
- On `CoreState::Occupied`, then we may be able to make an assumption:
- If the bitfields indicate availability and there is a scheduled `next_up_on_available`, then we can make an
`OccupiedCoreAssumption::Included`.
- If the bitfields do not indicate availability, and there is a scheduled `next_up_on_time_out`, and
`occupied_core.time_out_at == block_number_under_production`, then we can make an
`OccupiedCoreAssumption::TimedOut`.
- If we did not make an `OccupiedCoreAssumption`, then continue on to the next core.
- Now compute the core's `validation_data_hash`: get the `PersistedValidationData` from the runtime, given the known
`ParaId` and `OccupiedCoreAssumption`;
- Find an appropriate candidate for the core.
- There are two constraints: `backed_candidate.candidate.descriptor.para_id == scheduled_core.para_id &&
candidate.candidate.descriptor.validation_data_hash == computed_validation_data_hash`.
- In the event that more than one candidate meets the constraints, selection between the candidates is arbitrary.
However, not more than one candidate can be selected per core.
The end result of this process is a vector of `CandidateHash`s, sorted in order of their core index.
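The per-core decision above can be sketched as a small function. This is a minimal sketch, not the provisioner's actual code: `CoreState`, `OccupiedCore`, and `core_assumption` are simplified, hypothetical stand-ins (para ids as plain `u32`, the bitfield tally reduced to a single `bool`), although the field names follow the text.

```rust
/// Simplified stand-in for the runtime's `OccupiedCoreAssumption`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OccupiedCoreAssumption {
    Included,
    TimedOut,
    Free,
}

/// Hypothetical, reduced view of an occupied availability core.
pub struct OccupiedCore {
    /// Para scheduled next if the occupying candidate becomes available.
    pub next_up_on_available: Option<u32>,
    /// Para scheduled next if the occupying candidate times out.
    pub next_up_on_time_out: Option<u32>,
    /// Block number at which the occupying candidate times out.
    pub time_out_at: u32,
}

/// Hypothetical, reduced `CoreState`.
pub enum CoreState {
    Scheduled { para_id: u32 },
    Occupied(OccupiedCore),
    Free,
}

/// Returns the para to provision for together with the assumption to use,
/// or `None` if no assumption can be made and the core is skipped.
pub fn core_assumption(
    core: &CoreState,
    bitfields_indicate_availability: bool,
    block_number_under_production: u32,
) -> Option<(u32, OccupiedCoreAssumption)> {
    match core {
        CoreState::Scheduled { para_id } => Some((*para_id, OccupiedCoreAssumption::Free)),
        CoreState::Occupied(occupied) => {
            if bitfields_indicate_availability {
                // Available: provision for whatever is scheduled next on availability.
                occupied
                    .next_up_on_available
                    .map(|para_id| (para_id, OccupiedCoreAssumption::Included))
            } else if occupied.time_out_at == block_number_under_production {
                // Not available, and the candidate times out in the block being produced.
                occupied
                    .next_up_on_time_out
                    .map(|para_id| (para_id, OccupiedCoreAssumption::TimedOut))
            } else {
                None
            }
        }
        // Free cores have nothing scheduled, so there is nothing to provision.
        CoreState::Free => None,
    }
}
```

A `None` result corresponds to continuing on to the next core; for `Some((para_id, assumption))`, the caller would then fetch the `PersistedValidationData` for that para and assumption and look for a matching candidate.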
### Retrieving Full `BackedCandidate`s for Selected Hashes
Legacy candidate selection and prospective parachains candidate selection both leave us with a vector of `CandidateHash`s. These are passed to the backing subsystem with `CandidateBackingMessage::GetBackedCandidates`.
Legacy candidate selection and prospective parachains candidate selection both leave us with a vector of
`CandidateHash`s. These are passed to the backing subsystem with `CandidateBackingMessage::GetBackedCandidates`.
The response is a vector of `BackedCandidate`s, sorted in order of their core index and ready to be provisioned to block authoring. The candidate selection and retrieval process should select at maximum one candidate which upgrades the runtime validation code.
The response is a vector of `BackedCandidate`s, sorted in order of their core index and ready to be provisioned to block
authoring. The candidate selection and retrieval process should select at most one candidate which upgrades the
runtime validation code.
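A minimal sketch of the code-upgrade limit, assuming a hypothetical reduced `BackedCandidate` that only records its core index, para id, and whether its commitments carry new validation code. Candidates are kept in core-index order and any code-upgrading candidate after the first is dropped; letting the lowest core index win is an assumption of this sketch.

```rust
/// Hypothetical reduced candidate: only the fields needed for this check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BackedCandidate {
    pub core_index: u32,
    pub para_id: u32,
    /// `true` if the candidate's commitments carry new validation code.
    pub upgrades_code: bool,
}

/// Sorts candidates by core index and admits at most one code upgrade.
pub fn limit_code_upgrades(mut candidates: Vec<BackedCandidate>) -> Vec<BackedCandidate> {
    candidates.sort_by_key(|c| c.core_index);
    let mut seen_upgrade = false;
    candidates
        .into_iter()
        .filter(|c| {
            if !c.upgrades_code {
                return true; // ordinary candidates are always kept
            }
            if seen_upgrade {
                return false; // a second code upgrade is dropped
            }
            seen_upgrade = true;
            true
        })
        .collect()
}
```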
## Glossary
- **Relay-parent:**
- A particular relay-chain block which serves as an anchor and reference point for processes and data which depend on relay-chain state.
- **Active Leaf:**
- A relay chain block which is the head of an active fork of the relay chain.
- **Relay-parent:**
- A particular relay-chain block which serves as an anchor and reference point for processes and data which depend on
relay-chain state.
- **Active Leaf:**
- A relay chain block which is the head of an active fork of the relay chain.
- Block authorship provisioning jobs are spawned per active leaf and concluded for any leaves which become inactive.
- **Candidate Selection:**
- **Candidate Selection:**
- The process by which the provisioner selects backable parachain block candidates to pass to block authoring.
- Two versions, prospective parachains candidate selection and legacy candidate selection. See their respective protocol sections for details.
- **Availability Core:**
- Often referred to simply as "cores", availability cores are an abstraction used for resource management. For the provisioner, availability cores are most relevant in that core states determine which `para_id`s to provision backable candidates for.
- For more on availability cores see [Scheduler Module: Availability Cores](../../runtime/scheduler.md#availability-cores)
  - There are two versions: prospective parachains candidate selection and legacy candidate selection. See their
    respective protocol sections for details.
- **Availability Core:**
- Often referred to simply as "cores", availability cores are an abstraction used for resource management. For the
provisioner, availability cores are most relevant in that core states determine which `para_id`s to provision
backable candidates for.
- For more on availability cores see [Scheduler Module: Availability
Cores](../../runtime/scheduler.md#availability-cores)
- **Availability Bitfield:**
- Often referred to simply as a "bitfield", an availability bitfield represents the view of parablock candidate availability from a particular validator's perspective. Each bit in the bitfield corresponds to a single [availability core](../../runtime-api/availability-cores.md).
- Often referred to simply as a "bitfield", an availability bitfield represents the view of parablock candidate
availability from a particular validator's perspective. Each bit in the bitfield corresponds to a single
[availability core](../../runtime-api/availability-cores.md).
- For more on availability bitfields see [availability](../../types/availability.md)
- **Backable vs. Backed:**
- Note that we sometimes use "backed" to refer to candidates that are "backable", but not yet backed on chain.
- Backable means that a quorum of the candidate's assigned backing group have provided signed affirming statements.
- Backable means that a quorum of the candidate's assigned backing group have provided signed affirming statements.
@@ -1,45 +1,70 @@
# PVF Pre-checker
The PVF pre-checker is a subsystem that is responsible for watching the relay chain for new PVFs that require pre-checking. Head over to [overview] for the PVF pre-checking process overview.
The PVF pre-checker is a subsystem that is responsible for watching the relay chain for new PVFs that require
pre-checking. Head over to [overview] for the PVF pre-checking process overview.
## Protocol
There is no dedicated input mechanism for PVF pre-checker. Instead, PVF pre-checker looks on the `ActiveLeavesUpdate` event stream for work.
There is no dedicated input mechanism for the PVF pre-checker. Instead, the PVF pre-checker watches the
`ActiveLeavesUpdate` event stream for work.
This subsystem does not produce any output messages either. The subsystem will, however, send messages to the [Runtime API] subsystem to query for the pending PVFs and to submit votes. In addition to that, it will also communicate with [Candidate Validation] Subsystem to request PVF pre-check.
This subsystem does not produce any output messages either. It does, however, send messages to the [Runtime API]
subsystem to query for the pending PVFs and to submit votes. In addition, it communicates with the
[Candidate Validation] subsystem to request PVF pre-checks.
## Functionality
If the node is running in a collator mode, this subsystem will be disabled. The PVF pre-checker subsystem keeps track of the PVFs that are relevant for the subsystem.
If the node is running in collator mode, this subsystem is disabled. The PVF pre-checker subsystem keeps track of the
PVFs that are relevant to it.
To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant.
To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_precheck` runtime API][PVF pre-checking
runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be
relevant.
When a PVF just becomes relevant, the subsystem will send a message to the [Candidate Validation] subsystem asking for the pre-check.
As soon as a PVF becomes relevant, the subsystem sends a message to the [Candidate Validation] subsystem asking for
the pre-check.
Upon receiving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored.
Upon receiving a message from the candidate-validation subsystem, the pre-checker records the PVF's judgement, then
signs and submits a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF
pre-checking runtime API]. If a judgement is received for a PVF that is no longer in view, it is ignored.
Since a vote only is valid during [one session][overview], the subsystem will have to resign and submit the statements for the new session. The new session is assumed to be started if at least one of the leaves has a greater session index that was previously observed in any of the leaves.
Since a vote is only valid during [one session][overview], the subsystem has to re-sign and resubmit the statements
for the new session. A new session is assumed to have started if at least one of the leaves has a greater session
index than was previously observed in any of the leaves.
The subsystem tracks all the statements that it submitted within a session. If for some reason a PVF became irrelevant and then becomes relevant again, the subsystem will not submit a new statement for that PVF within the same session.
The subsystem tracks all the statements that it submitted within a session. If a PVF becomes irrelevant and then
relevant again, the subsystem will not submit a new statement for that PVF within the same session.
If the node is not in the active validator set, it will still perform all the checks. However, it will only submit the check statements when the node is in the active validator set.
If the node is not in the active validator set, it will still perform all the checks. However, it will only submit the
check statements when the node is in the active validator set.
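The relevance and session bookkeeping described above can be sketched as a small state machine. This is a hypothetical sketch, not the subsystem's implementation: `PreChecker`, `Action`, and the `u64` PVF identifiers are illustrative, each active leaf is reduced to its session index plus the PVFs that `pvfs_require_precheck` returned for it, and signing is abstracted into an `Action::SubmitStatement` value. Judgements for PVFs that leave the view are dropped, so a PVF that returns is pre-checked again, but no second statement is produced within the same session.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical PVF identifier (a stand-in for the validation code hash).
pub type PvfHash = u64;

/// Side effects the sketch asks its caller to perform.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Ask Candidate Validation to pre-check this PVF.
    RequestPrecheck(PvfHash),
    /// Sign and submit a `PvfCheckStatement` with this verdict for `session`.
    SubmitStatement { pvf: PvfHash, accept: bool, session: u32 },
}

pub struct PreChecker {
    is_active_validator: bool,
    /// Highest session index observed in any active leaf so far.
    session: Option<u32>,
    /// PVFs returned by `pvfs_require_precheck` in at least one active leaf.
    relevant: BTreeSet<PvfHash>,
    /// Judgements received for PVFs that are still relevant.
    judgements: BTreeMap<PvfHash, bool>,
    /// PVFs for which a statement was already submitted in `session`.
    submitted_this_session: BTreeSet<PvfHash>,
}

impl PreChecker {
    pub fn new(is_active_validator: bool) -> Self {
        Self {
            is_active_validator,
            session: None,
            relevant: BTreeSet::new(),
            judgements: BTreeMap::new(),
            submitted_this_session: BTreeSet::new(),
        }
    }

    /// `leaves` holds, per active leaf, its session index and its pending PVFs.
    pub fn on_active_leaves(&mut self, leaves: &[(u32, Vec<PvfHash>)]) -> Vec<Action> {
        let mut actions = Vec::new();
        let now_relevant: BTreeSet<PvfHash> =
            leaves.iter().flat_map(|(_, pvfs)| pvfs.iter().copied()).collect();
        // PVFs that just became relevant are sent for pre-checking.
        for pvf in now_relevant.difference(&self.relevant) {
            actions.push(Action::RequestPrecheck(*pvf));
        }
        // PVFs that left every active leaf are no longer tracked.
        self.judgements.retain(|pvf, _| now_relevant.contains(pvf));
        self.relevant = now_relevant;

        // A higher session index than any seen before starts a new session:
        // earlier statements expire, so already judged PVFs are re-signed.
        let max_session = leaves.iter().map(|(session, _)| *session).max();
        if max_session > self.session {
            self.session = max_session;
            self.submitted_this_session.clear();
            let judged: Vec<(PvfHash, bool)> =
                self.judgements.iter().map(|(pvf, accept)| (*pvf, *accept)).collect();
            for (pvf, accept) in judged {
                actions.extend(self.maybe_submit(pvf, accept));
            }
        }
        actions
    }

    /// Handles a verdict from Candidate Validation.
    pub fn on_judgement(&mut self, pvf: PvfHash, accept: bool) -> Option<Action> {
        if !self.relevant.contains(&pvf) {
            return None; // no longer in view: ignored
        }
        self.judgements.insert(pvf, accept);
        self.maybe_submit(pvf, accept)
    }

    /// At most one statement per PVF per session, and only as an active validator.
    fn maybe_submit(&mut self, pvf: PvfHash, accept: bool) -> Option<Action> {
        let session = self.session?;
        if !self.is_active_validator || !self.submitted_this_session.insert(pvf) {
            return None;
        }
        Some(Action::SubmitStatement { pvf, accept, session })
    }
}
```

A node outside the active validator set (`PreChecker::new(false)`) still emits `RequestPrecheck`, mirroring the rule that checks are performed but statements are withheld.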
### Rejecting failed PVFs
It is possible that the candidate validation was not able to check the PVF, e.g. if it timed out. In that case, the PVF pre-checker will vote against it. This is considered safe, as there is no slashing for being on the wrong side of a pre-check vote.
It is possible that the candidate validation was not able to check the PVF, e.g. if it timed out. In that case, the PVF
pre-checker will vote against it. This is considered safe, as there is no slashing for being on the wrong side of a
pre-check vote.
Rejecting instead of abstaining is better in several ways:
1. Conclusion is reached faster - we have actual votes, instead of relying on a timeout.
1. Being strict in pre-checking makes it safer to be more lenient in preparation errors afterwards. Hence we have more leeway in avoiding raising dubious disputes, without making things less secure.
1. Being strict in pre-checking makes it safer to be more lenient in preparation errors afterwards. Hence we have more
leeway in avoiding raising dubious disputes, without making things less secure.
Also, if we only abstain, an attacker can specially craft a PVF wasm blob so that it will fail on e.g. 50% of the validators. In that case a supermajority will never be reached and the vote will repeat multiple times, most likely with the same result (since all votes are cleared on a session change). This is avoided by rejecting failed PVFs, and by only requiring 1/3 of validators to reject a PVF to reach a decision.
Also, if we only abstained, an attacker could specially craft a PVF wasm blob so that it fails on, e.g., 50% of the
validators. In that case a supermajority would never be reached and the vote would repeat multiple times, most likely
with the same result (since all votes are cleared on a session change). This is avoided by rejecting failed PVFs, and
by only requiring 1/3 of validators to reject a PVF to reach a decision.
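The arithmetic behind the 1/3 figure can be sketched as follows. The threshold formulas are assumptions for illustration: a supermajority is taken to mean strictly more than two thirds (`n - (n - 1) / 3` votes), and a PVF is treated as rejected as soon as enough validators reject it that a supermajority of accepts is no longer possible. The runtime's exact definitions may differ.

```rust
/// Outcome of a PVF pre-check vote in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Accepted,
    Rejected,
    Undecided,
}

/// Assumed supermajority: strictly more than two thirds of `n` validators.
pub fn supermajority_threshold(n: u32) -> u32 {
    n - n.saturating_sub(1) / 3
}

/// Tallies `accepts` and `rejects` out of `n` validators.
pub fn tally(n: u32, accepts: u32, rejects: u32) -> Outcome {
    let accept_threshold = supermajority_threshold(n);
    // With more than `n - accept_threshold` rejections, a supermajority of
    // accepts can no longer be reached, so the PVF is rejected. This works
    // out to roughly one third of the validators.
    let max_rejects_without_decision = n - accept_threshold;
    if accepts >= accept_threshold {
        Outcome::Accepted
    } else if rejects > max_rejects_without_decision {
        Outcome::Rejected
    } else {
        Outcome::Undecided
    }
}
```

With 100 validators, a blob that fails on half of them never reaches the 67 accepts needed if the failing validators abstain, but 50 rejections exceed the 33 that would still leave a supermajority possible, so the vote concludes with a rejection.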
### Note on Disputes
Having a pre-checking phase allows us to make certain assumptions later when preparing the PVF for execution. If a runtime passed pre-checking, then we know that the runtime should be valid, and therefore any issue during preparation for execution can be assumed to be a local problem on the current node.
Having a pre-checking phase allows us to make certain assumptions later when preparing the PVF for execution. If a
runtime passed pre-checking, then we know that the runtime should be valid, and therefore any issue during preparation
for execution can be assumed to be a local problem on the current node.
For this reason, even deterministic preparation errors should not trigger disputes. And since we do not dispute as a result of the pre-checking phase, as stated above, it should be impossible for preparation in general to result in disputes.
For this reason, even deterministic preparation errors should not trigger disputes. And since we do not dispute as a
result of the pre-checking phase, as stated above, it should be impossible for preparation in general to result in
disputes.
[overview]: ../../pvf-prechecking.md
[Runtime API]: runtime-api.md
@@ -1,6 +1,7 @@
# Runtime API
The Runtime API subsystem is responsible for providing a single point of access to runtime state data via a set of pre-determined queries. This prevents shared ownership of a blockchain client resource by providing
The Runtime API subsystem is responsible for providing a single point of access to runtime state data via a set of
pre-determined queries. This prevents shared ownership of a blockchain client resource by providing
## Protocol
@@ -10,8 +11,11 @@ Output: None
## Functionality
On receipt of `RuntimeApiMessage::Request(relay_parent, request)`, answer the request using the post-state of the `relay_parent` provided and provide the response to the side-channel embedded within the request.
On receipt of `RuntimeApiMessage::Request(relay_parent, request)`, answer the request using the post-state of the
`relay_parent` provided and provide the response to the side-channel embedded within the request.
## Jobs
> TODO Don't limit requests based on parent hash, but limit caching. No caching should be done for any requests on `relay_parent`s that are not active based on `ActiveLeavesUpdate` messages. Maybe with some leeway for things that have just been stopped.
> TODO Don't limit requests based on parent hash, but limit caching. No caching should be done for any requests on
> `relay_parent`s that are not active based on `ActiveLeavesUpdate` messages. Maybe with some leeway for things that
> have just been stopped.
@@ -1,19 +1,39 @@
# Approval Process
The Approval Process is the mechanism by which the relay-chain ensures that only valid parablocks are finalized and that backing validators are held accountable for managing to get bad blocks included into the relay chain.
The Approval Process is the mechanism by which the relay-chain ensures that only valid parablocks are finalized and that
backing validators are held accountable for managing to get bad blocks included into the relay chain.
Having a parachain include a bad block into a fork of the relay-chain is not catastrophic as long as the block isn't finalized by the relay-chain's finality gadget, GRANDPA. If the block isn't finalized, that means that the fork of the relay-chain can be reverted in favor of another by means of a dynamic fork-choice rule which leads honest validators to ignore any forks containing that parablock.
Having a parachain include a bad block into a fork of the relay-chain is not catastrophic as long as the block isn't
finalized by the relay-chain's finality gadget, GRANDPA. If the block isn't finalized, that means that the fork of the
relay-chain can be reverted in favor of another by means of a dynamic fork-choice rule which leads honest validators to
ignore any forks containing that parablock.
Dealing with a bad parablock proceeds in these stages:
1. Detection
2. Escalation
3. Consequences
First, the bad block must be detected by an honest party. Second, the honest party must escalate the bad block to be checked by all validators. And last, the correct consequences of a bad block must occur. The first consequence, as mentioned above, is to revert the chain so what full nodes perceive to be best no longer contains the bad parablock. The second consequence is to slash all malicious validators. Note that, if the chain containing the bad block is reverted, that the result of the dispute needs to be transplanted or at least transplantable to all other forks of the chain so that malicious validators are slashed in all possible histories. Phrased alternatively, there needs to be no possible relay-chain in which malicious validators get away cost-free.
First, the bad block must be detected by an honest party. Second, the honest party must escalate the bad block to be
checked by all validators. And last, the correct consequences of a bad block must occur. The first consequence, as
mentioned above, is to revert the chain so that what full nodes perceive to be best no longer contains the bad
parablock. The second consequence is to slash all malicious validators. Note that if the chain containing the bad block
is reverted, the result of the dispute needs to be transplanted, or at least transplantable, to all other forks of the
chain so that malicious validators are slashed in all possible histories. Phrased alternatively, there must be no
possible relay-chain in which malicious validators get away cost-free.
Accepting a parablock is the end result of having passed through the detection stage without dispute, or having passed through the escalation/dispute stage with a positive outcome. For this to work, we need the detection procedure to have the properties that enough honest validators are always selected to check the parablock and that they cannot be interfered with by an adversary. This needs to be balanced with the scaling concern of parachains in general: the easiest way to get the first property is to have everyone check everything, but that is clearly too heavy. So we also have a desired constraint on the other property that we have as few validators as possible check any particular parablock. Our assignment function is the method by which we select validators to do approval checks on parablocks.
Accepting a parablock is the end result of having passed through the detection stage without dispute, or having passed
through the escalation/dispute stage with a positive outcome. For this to work, we need the detection procedure to have
the properties that enough honest validators are always selected to check the parablock and that they cannot be
interfered with by an adversary. This needs to be balanced with the scaling concern of parachains in general: the
easiest way to get the first property is to have everyone check everything, but that is clearly too heavy. So we also
want, as a competing goal, as few validators as possible to check any particular parablock. Our assignment function
is the method by which we select validators to do approval checks on parablocks.
It often makes more sense to think of relay-chain blocks as having been approved or not as opposed to thinking about whether parablocks have been approved. A relay-chain block containing a single bad parablock needs to be reverted, and a relay-chain block that contains only approved parablocks can be called approved, as long as its parent relay-chain block is also approved. It is important that the validity of any particular relay-chain block depend on the validity of its ancestry, so we do not finalize a block which has a bad block in its ancestry.
It often makes more sense to think of relay-chain blocks as having been approved or not as opposed to thinking about
whether parablocks have been approved. A relay-chain block containing a single bad parablock needs to be reverted, and a
relay-chain block that contains only approved parablocks can be called approved, as long as its parent relay-chain block
is also approved. It is important that the validity of any particular relay-chain block depend on the validity of its
ancestry, so we do not finalize a block which has a bad block in its ancestry.
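The recursive approval rule in the previous paragraph can be sketched as a walk up the ancestry. The `RelayBlock` type and the `u64` block hashes are hypothetical simplifications, and treating a block with no recorded parent as the approved base case (standing in for the last finalized block) is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced relay-chain block record.
pub struct RelayBlock {
    /// Parent hash, or `None` for the base of the unfinalized chain.
    pub parent: Option<u64>,
    /// Approval status of each parablock included in this block.
    pub parablocks_approved: Vec<bool>,
}

/// A relay-chain block is approved only if all of its parablocks are
/// approved and its parent is approved as well.
pub fn is_approved(chain: &HashMap<u64, RelayBlock>, mut hash: u64) -> bool {
    loop {
        let block = match chain.get(&hash) {
            Some(block) => block,
            None => return false, // unknown block: cannot be approved
        };
        if !block.parablocks_approved.iter().all(|&ok| ok) {
            return false; // a single bad parablock disqualifies the block
        }
        match block.parent {
            Some(parent) => hash = parent,
            None => return true, // reached the base, e.g. the last finalized block
        }
    }
}
```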
```dot process Approval Process
digraph {
@@ -24,29 +44,56 @@ digraph {
Approval has roughly two parts:
- **Assignments** determines which validators performs approval checks on which candidates. It ensures that each candidate receives enough random checkers, while reducing adversaries' odds for obtaining enough checkers, and limiting adversaries' foreknowledge. It tracks approval votes to identify when "no show" approval check takes suspiciously long, perhaps indicating the node being under attack, and assigns more checks in this case. It tracks relay chain equivocations to determine when adversaries possibly gained foreknowledge about assignments, and adds additional checks in this case.
- **Assignments** determines which validators perform approval checks on which candidates. It ensures that each
  candidate receives enough random checkers, while reducing adversaries' odds of obtaining enough checkers, and
  limiting adversaries' foreknowledge. It tracks approval votes to identify when a "no-show" approval check takes
  suspiciously long, perhaps indicating the node is under attack, and assigns more checks in this case. It tracks
  relay chain equivocations to determine when adversaries possibly gained foreknowledge about assignments, and adds
  additional checks in this case.
- **Approval checks** listens to the assignments subsystem for outgoing assignment notices that we shall check specific candidates. It then performs these checks by first invoking the reconstruction subsystem to obtain the candidate, second invoking the candidate validity utility subsystem upon the candidate, and finally sending out an approval vote, or perhaps initiating a dispute.
- **Approval checks** listens to the assignments subsystem for outgoing assignment notices that we shall check specific
candidates. It then performs these checks by first invoking the reconstruction subsystem to obtain the candidate,
second invoking the candidate validity utility subsystem upon the candidate, and finally sending out an approval vote,
or perhaps initiating a dispute.
These both run first as off-chain consensus protocols using messages gossiped among all validators, and second as an on-chain record of this off-chain protocols' progress after the fact. We need the on-chain protocol to provide rewards for the off-chain protocol.
These both run first as off-chain consensus protocols using messages gossiped among all validators, and second as an
on-chain record of this off-chain protocol's progress after the fact. We need the on-chain protocol to provide rewards
for the off-chain protocol.
Approval requires two gossiped message types, assignment notices created by its assignments subsystem, and approval votes sent by our approval checks subsystem when authorized by the candidate validity utility subsystem.
Approval requires two gossiped message types: assignment notices created by its assignments subsystem, and approval
votes sent by our approval checks subsystem when authorized by the candidate validity utility subsystem.
### Approval keys
## Approval keys
We need two separate keys for the approval subsystem:
- **Approval assignment keys** are sr25519/schnorrkel keys used only for the assignment criteria VRFs. We implicitly sign assignment notices with approval assignment keys by including their relay chain context and additional data in the VRF's extra message, but exclude these from its VRF input.
- **Approval assignment keys** are sr25519/schnorrkel keys used only for the assignment criteria VRFs. We implicitly
sign assignment notices with approval assignment keys by including their relay chain context and additional data in
the VRF's extra message, but exclude these from its VRF input.
- **Approval vote keys** would only sign off on candidate parablock validity and has no natural key type restrictions. There's no need for this to actually embody a new session key type. We just want to make a distinction between assignments and approvals, although distant future node configurations might favor separate roles. We re-use the same keys as are used for parachain backing in practice.
- **Approval vote keys** would only sign off on candidate parablock validity and have no natural key type restrictions.
There's no need for this to actually embody a new session key type. We just want to make a distinction between
assignments and approvals, although distant future node configurations might favor separate roles. We re-use the same
keys as are used for parachain backing in practice.
Approval vote keys could relatively easily be handled by some hardened signer tooling, perhaps even HSMs assuming we select ed25519 for approval vote keys. Approval assignment keys might or might not support hardened signer tooling, but doing so sounds far more complex. In fact, assignment keys determine only VRF outputs that determine approval checker assignments, for which they can only act or not act, so they cannot equivocate, lie, etc. and represent little if any slashing risk for validator operators.
Approval vote keys could relatively easily be handled by some hardened signer tooling, perhaps even HSMs assuming we
select ed25519 for approval vote keys. Approval assignment keys might or might not support hardened signer tooling, but
doing so sounds far more complex. In fact, assignment keys determine only VRF outputs that determine approval checker
assignments, for which they can only act or not act, so they cannot equivocate, lie, etc. and represent little if any
slashing risk for validator operators.
In future, we shall determine which among the several hardening techniques best benefits the network as a whole. We could provide a multi-process multi-machine architecture for validators, perhaps even reminiscent of GNUNet, or perhaps more resembling smart HSM tooling. We might instead design a system that more resembled full systems, like Cosmos' sentry nodes. In either case, approval assignments might be handled by a slightly hardened machine, but not necessarily nearly as hardened as approval votes, but approval votes machines must similarly run foreign WASM code, which increases their risk, so assignments being separate sounds helpful.
In future, we shall determine which among the several hardening techniques best benefits the network as a whole. We
could provide a multi-process multi-machine architecture for validators, perhaps even reminiscent of GNUNet, or perhaps
more resembling smart HSM tooling. We might instead design a system that more closely resembles full systems, like
Cosmos' sentry nodes. In either case, approval assignments might be handled by a slightly hardened machine, but not
necessarily one as hardened as the approval vote machines. Approval vote machines must also run foreign WASM code,
which increases their risk, so keeping assignments separate sounds helpful.
## Assignments
Approval assignment determines on which candidate parachain blocks each validator performs approval checks. An approval session considers only one relay chain block and assigns only those candidates that relay chain block declares available.
Approval assignment determines on which candidate parachain blocks each validator performs approval checks. An approval
session considers only one relay chain block and assigns only those candidates that relay chain block declares
available.
Assignment balances several concerns:
@@ -54,149 +101,286 @@ Assignment balances several concerns:
- ensures enough checkers, and
- distributes assignments relatively equitably.
Assignees determine their own assignments to check specific candidates using two or three assignment criteria. Assignees never reveal their assignments until relevant, and gossip delays assignments sent early, which limits others' foreknowledge. Assignees learn their assignment only with the relay chain block.
Assignees determine their own assignments to check specific candidates using two or three assignment criteria.
Assignees never reveal their assignments until relevant, and gossip delays assignments sent early, which limits others'
foreknowledge. Assignees learn their assignment only with the relay chain block.
All criteria require the validator evaluate a verifiable random function (VRF) using their VRF secret key. All criteria input specific data called "stories" about the session's relay chain block, and output candidates to check and a precedence called a `DelayTranche`.
All criteria require the validator to evaluate a verifiable random function (VRF) using their VRF secret key. All
criteria input specific data called "stories" about the session's relay chain block, and output candidates to check
and a precedence called a `DelayTranche`.
We liberate availability cores when their candidate becomes available of course, but one approval assignment criteria continues associating each candidate with the core number it occupied when it became available.
We liberate availability cores when their candidate becomes available, of course, but one approval assignment
criterion continues associating each candidate with the core number it occupied when it became available.
Assignment operates in loosely timed rounds determined by this `DelayTranche`s, which proceed roughly 12 times faster than six second block production assuming half second gossip times. If a candidate `C` needs more approval checkers by the time we reach round `t` then any validators with an assignment to `C` in delay tranche `t` gossip their send assignment notice for `C`. We continue until all candidates have enough approval checkers assigned. We take entire tranches together if we do not yet have enough, so we expect strictly more than enough checkers. We also take later tranches if some checkers return their approval votes too slow (see no shows below).
Assignment operates in loosely timed rounds determined by these `DelayTranche`s, which proceed roughly 12 times faster
than six-second block production, assuming half-second gossip times. If a candidate `C` needs more approval checkers
by the time we reach round `t`, then any validators with an assignment to `C` in delay tranche `t` gossip their
assignment notice for `C`. We continue until all candidates have enough approval checkers assigned. We take entire
tranches together if we do not yet have enough, so we expect strictly more than enough checkers. We also take later
tranches if some checkers return their approval votes too slowly (see no-shows below).
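The tranche arithmetic and the "take whole tranches" rule can be sketched as follows. This is a simplified, hypothetical model: the tranche length is derived from the 6 s block time and the factor of 12 above (500 ms), `assignments_per_tranche` is assumed to already count the validators assigned to candidate `C` in each tranche, and no-shows are ignored.

```rust
/// Assumed tranche length: six-second blocks divided into 12 tranches.
pub const TRANCHE_MILLIS: u64 = 6_000 / 12;

/// Current delay tranche, given milliseconds elapsed since the relay block's
/// slot started (the choice of reference point is an assumption).
pub fn current_tranche(elapsed_millis: u64) -> u32 {
    (elapsed_millis / TRANCHE_MILLIS) as u32
}

/// `assignments_per_tranche[t]` counts validators assigned to `C` in tranche `t`.
/// Takes whole tranches in order, up to and including `now`, until at least
/// `needed` checkers are assigned. Returns (last tranche taken, checkers assigned).
pub fn tranches_to_take(assignments_per_tranche: &[u32], needed: u32, now: u32) -> (u32, u32) {
    let mut assigned = 0;
    let mut last = 0;
    for (t, &count) in assignments_per_tranche.iter().enumerate() {
        let t = t as u32;
        if t > now {
            break; // later tranches have not started yet
        }
        // A whole tranche is taken at once, so `assigned` may overshoot `needed`.
        assigned += count;
        last = t;
        if assigned >= needed {
            break;
        }
    }
    (last, assigned)
}
```

Taking tranche 2 in full below yields five checkers where three were needed, which illustrates why we expect strictly more than enough checkers.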
Assignment ensures validators check those relay chain blocks for which they have delay tranche zero aka the highest precedence, so that adversaries always face honest checkers equal to the expected number of assignments with delay tranche zero.
Assignment ensures validators check those relay chain blocks for which they have delay tranche zero aka the highest
precedence, so that adversaries always face honest checkers equal to the expected number of assignments with delay
tranche zero.
Among these criteria, the BABE VRF output provides the story for two, which reduces how frequently adversaries could position their own checkers. We have one criterion whose story consists of the candidate's block hash plus external knowledge that a relay chain equivocation exists with a conflicting candidate. It provides unforeseeable assignments when adversaries gain foreknowledge about the other two by committing an equivocation in relay chain block production.
Among these criteria, the BABE VRF output provides the story for two, which reduces how frequently adversaries could
position their own checkers. We have one criterion whose story consists of the candidate's block hash plus external
knowledge that a relay chain equivocation exists with a conflicting candidate. It provides unforeseeable assignments
when adversaries gain foreknowledge about the other two by committing an equivocation in relay chain block production.
## Announcements / Notices
We gossip assignment notices among nodes so that all validators know which validators should check each candidate, and if any candidate requires more checkers.
We gossip assignment notices among nodes so that all validators know which validators should check each candidate, and
if any candidate requires more checkers.
Assignment notices consist of a relay chain context given by a block hash, an assignment criteria, consisting of the criteria identifier and optionally a criteria specific field, an assignee identifier, and a VRF signature by the assignee, which itself consists of a VRF pre-output and a DLEQ proof. Its VRF input consists of the criteria, usually including a criteria specific field, and a "story" about its relay chain context block.
Assignment notices consist of a relay chain context given by a block hash, an assignment criterion, consisting of the
criterion identifier and optionally a criterion-specific field, an assignee identifier, and a VRF signature by the
assignee, which itself consists of a VRF pre-output and a DLEQ proof. Its VRF input consists of the criterion, usually
including a criterion-specific field, and a "story" about its relay chain context block.
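The fields listed above can be sketched as a plain data structure. This is a minimal Python illustration, not the actual Polkadot type. Several details are assumptions: the class names, using a validator index as the assignee identifier, and using a sample number as the criterion-specific field. The VRF pre-output and DLEQ proof are modeled as opaque placeholder bytes. The story is deliberately absent, because recipients reconstruct it themselves.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Criterion(Enum):
    # The three assignment criteria named in this document.
    RELAY_VRF_MODULO = "RelayVRFModulo"
    RELAY_VRF_DELAY = "RelayVRFDelay"
    RELAY_EQUIVOCATION = "RelayEquivocation"


@dataclass(frozen=True)
class VrfSignature:
    # Opaque placeholders; a real implementation would carry schnorrkel types.
    pre_output: bytes
    dleq_proof: bytes


@dataclass(frozen=True)
class AssignmentNotice:
    relay_block_hash: bytes         # relay chain context
    criterion: Criterion            # criterion identifier
    criterion_field: Optional[int]  # optional criterion-specific field (hypothetical: sample number)
    assignee: int                   # assignee identifier (hypothetical: validator index)
    vrf_signature: VrfSignature     # VRF pre-output plus DLEQ proof
    # No story field: every recipient must reconstruct the story from relay_block_hash.


notice = AssignmentNotice(
    relay_block_hash=bytes(32),
    criterion=Criterion.RELAY_VRF_MODULO,
    criterion_field=0,
    assignee=7,
    vrf_signature=VrfSignature(pre_output=bytes(32), dleq_proof=bytes(64)),
)
```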
We never include stories inside the gossip messages containing assignment notices, but require each validator to
reconstruct them. We never care about assignments in the disputes process, so this does not complicate remote disputes.
In a Schnorr VRF, there is an extra signed message distinct from this input, which we set to the relay chain block hash.
As a result, assignment notices are self-signing and can be "politely" gossiped without additional signatures, meaning
only between nodes that can compute the story from the relay chain context. In other words, if we cannot compute the
story required by an assignment notice's VRF part, then our self-signing property fails and we cannot verify its origin.
We could fix this with either another signature layer (64 bytes) or by including the VRF input point computed from the
story (32 bytes), but doing so appears unhelpful.
Any validator could send their assignment notices and/or approval votes too early. We gossip the approval votes early
because they represent a major commitment by the validator. However, we delay gossiping the assignment notices until
they agree with our local clock. We also impose a politeness condition that the recipient knows the relay chain context
used by the assignment notice.
## Stories
We base assignment criteria upon two possible "stories" about the relay chain block `R` that included the candidate,
aka declared the candidate available. All stories have an output that attempts to minimize adversarial influence, which
then acts as the VRF input for an assignment criterion.
We first have a `RelayVRFStory` that outputs the randomness from another VRF output produced by the relay chain block
producer when creating `R`. Among honest nodes, only this one relay chain block producer who creates `R` knew the story
in advance, and even they knew nothing two epochs previously.
In BABE, we create this value by calling `schnorrkel::vrf::VRFInOut::make_bytes` with the context "A&V RC-VRF", using
the `VRFInOut` from either the VRF that authorized block production for primary blocks, or else from the secondary
block VRF for the secondary block type.
In Sassafras, we shall always use the non-anonymized recycling VRF output, never the anonymized ring VRF that authorizes
block production. We do not currently know if Sassafras shall have a separate schnorrkel key, but if it reuses its ring
VRF key there is an equivalent `ring_vrf::VRFInOut::make_bytes`.
We like that `RelayVRFStory` admits relatively few choices, but an adversary who equivocates in relay chain block
production could learn assignments that depend upon the `RelayVRFStory` too early because the same relay chain VRF
appears in multiple blocks.
We therefore provide a secondary `RelayEquivocationStory` that outputs the candidate's block hash, but only for
candidate equivocations. We say a candidate `C` in `R` is an equivocation when there exists another relay chain block
`R1` that equivocates for `R` in the sense that `R` and `R1` have the same `RelayVRFStory`, but `R` contains `C` and
`R1` does not contain `C`.
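The equivocation definition above translates directly into a predicate. This Python sketch assumes a hypothetical `RelayBlock` record that holds each block's hash, its `RelayVRFStory` output, and the set of candidates it includes. How a node actually learns about equivocating blocks is out of scope here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class RelayBlock:
    hash: str
    relay_vrf_story: bytes     # output of RelayVRFStory for this block
    candidates: FrozenSet[str]  # candidates included, aka declared available


def is_candidate_equivocation(candidate: str, r: RelayBlock, known_blocks: Iterable[RelayBlock]) -> bool:
    """True if some other block R1 shares R's RelayVRFStory but does not contain the candidate."""
    if candidate not in r.candidates:
        raise ValueError("candidate must be included in R")
    return any(
        r1.hash != r.hash
        and r1.relay_vrf_story == r.relay_vrf_story
        and candidate not in r1.candidates
        for r1 in known_blocks
    )


r = RelayBlock("R", b"story", frozenset({"C1", "C2"}))
r1 = RelayBlock("R1", b"story", frozenset({"C1"}))  # equivocates for R
blocks = [r, r1]
print(is_candidate_equivocation("C2", r, blocks))  # True: R1 lacks C2
print(is_candidate_equivocation("C1", r, blocks))  # False: R1 also contains C1
```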
We want checkers for candidate equivocations that lie outside our preferred relay chain as well, which represents a
slightly different usage for the assignments module, and might require more information in the gossip messages.
## Assignment criteria
Assignment criteria compute actual assignments using stories and the validators' secret approval assignment key.
Assignment criteria output a `Position` consisting of both a `ParaId` to be checked, as well as a precedence
`DelayTranche` for when the assignment becomes valid.
Assignment criteria come in three flavors, `RelayVRFModulo`, `RelayVRFDelay` and `RelayEquivocation`. Among these, both
`RelayVRFModulo` and `RelayVRFDelay` run a VRF whose input is the output of a `RelayVRFStory`, while `RelayEquivocation`
runs a VRF whose input is the output of a `RelayEquivocationStory`.
Among these, we have two distinct VRF output computations:
`RelayVRFModulo` runs several distinct samples whose VRF input is the `RelayVRFStory` and the sample number. It
computes the VRF output with `schnorrkel::vrf::VRFInOut::make_bytes` using the context "A&V Core", reduces this number
modulo the number of availability cores, and outputs the candidate just declared available by, and included by aka
leaving, that availability core. We drop any samples that return no candidate because no candidate was leaving the
sampled availability core in this relay chain block. We choose three samples initially, but we could make Polkadot more
secure and efficient by increasing this to four or five, and reducing the backing checks accordingly. All successful
`RelayVRFModulo` samples are assigned delay tranche zero.
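The following Python sketch walks through the `RelayVRFModulo` steps. It is illustrative only. An HMAC-SHA256 keyed by a hypothetical per-validator secret stands in for `schnorrkel::vrf::VRFInOut::make_bytes`; a keyed hash cannot replace a VRF in practice because it yields no publicly verifiable proof. Two further choices are assumptions: reading the first four output bytes as a little-endian integer, and collapsing repeated samples that hit the same candidate into a single assignment.

```python
import hashlib
import hmac
from typing import Dict, List, Optional, Tuple


def stand_in_make_bytes(secret_key: bytes, story: bytes, sample: int, context: bytes) -> bytes:
    # Stand-in only: a keyed hash is NOT a VRF, because it produces no publicly verifiable proof.
    msg = context + story + sample.to_bytes(4, "little")
    return hmac.new(secret_key, msg, hashlib.sha256).digest()


def relay_vrf_modulo(
    secret_key: bytes,
    relay_vrf_story: bytes,
    leaving_candidates: List[Optional[str]],  # index = availability core; None = no candidate leaving
    num_samples: int = 3,
) -> List[Tuple[str, int]]:
    """Return sorted (candidate, delay_tranche) pairs; every successful sample lands in tranche 0."""
    n_cores = len(leaving_candidates)
    assignments: Dict[str, int] = {}
    for sample in range(num_samples):
        out = stand_in_make_bytes(secret_key, relay_vrf_story, sample, b"A&V Core")
        core = int.from_bytes(out[:4], "little") % n_cores
        candidate = leaving_candidates[core]
        if candidate is None:
            continue  # drop samples that hit a core with no candidate leaving it
        assignments[candidate] = 0  # RelayVRFModulo assignments are all tranche zero
    return sorted(assignments.items())


cores = ["A", None, "B", None]
print(relay_vrf_modulo(b"validator-secret", b"relay-vrf-story", cores))
```

Raising `num_samples` to four or five corresponds to the suggested trade of more approval checks for fewer backing checks.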
There is no sampling process for `RelayVRFDelay` and `RelayEquivocation`. We instead run them on specific candidates
and they compute a delay from their VRF output. `RelayVRFDelay` runs for all candidates included under, aka declared
available by, a relay chain block, and inputs the associated VRF output via `RelayVRFStory`. `RelayEquivocation` runs
only on candidate block equivocations, and inputs their block hashes via `RelayEquivocationStory`.
`RelayVRFDelay` and `RelayEquivocation` both compute their output with `schnorrkel::vrf::VRFInOut::make_bytes` using the
context "A&V Tranche" and reduce the result modulo `num_delay_tranches + zeroth_delay_tranche_width`, and consolidate
results 0 through `zeroth_delay_tranche_width` to be 0. In this way, they ensure the zeroth delay tranche has
`zeroth_delay_tranche_width+1` times as many assignments as any other tranche.
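The tranche arithmetic above can be checked with a short Python sketch. The raw bytes stand in for `make_bytes` output under the "A&V Tranche" context. Two details are assumptions: reading four little-endian bytes, and shifting results above `zeroth_delay_tranche_width` down by that width so the remaining tranches stay contiguous. The text only fixes how results 0 through the width collapse to 0.

```python
from collections import Counter


def delay_tranche(vrf_bytes: bytes, num_delay_tranches: int, zeroth_delay_tranche_width: int) -> int:
    """Map VRF output bytes (stand-in for make_bytes with context "A&V Tranche") to a delay tranche."""
    wide = int.from_bytes(vrf_bytes[:4], "little") % (num_delay_tranches + zeroth_delay_tranche_width)
    # Results 0 through the width collapse to tranche 0; larger results shift down by the width.
    return max(0, wide - zeroth_delay_tranche_width)


# Enumerate every possible reduced value once to see the resulting distribution.
n, w = 4, 1
counts = Counter(delay_tranche(r.to_bytes(4, "little"), n, w) for r in range(n + w))
print(dict(counts))  # {0: 2, 1: 1, 2: 1, 3: 1}
```

With `w = 1`, tranche zero receives two of the five equally likely values, which is `w + 1` times the share of any other tranche.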
As future work (or TODO?), we should merge assignment notices with the same delay and story using `vrf_merge`. We
cannot merge those with the same delay and different stories because `RelayEquivocationStory`s could change but
`RelayVRFStory` never changes.
## Announcer and Watcher/Tracker
We track all validators' announced approval assignments for each candidate associated to each relay chain block, which
tells us which validators were assigned to which candidates.
We permit at most one assignment per candidate per story per validator, so one validator could be assigned under both
the `RelayVRFDelay` and `RelayEquivocation` criteria, but not under both `RelayVRFModulo` and `RelayVRFDelay` criteria,
since those both use the same story. We permit only one approval vote per candidate per validator, which counts for any
applicable criteria.
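Here is a minimal Python sketch of these uniqueness rules. It assumes assignments and votes arrive as hypothetical `(validator, candidate, criterion)` and `(validator, candidate)` inputs. The key design choice is to key assignments by story rather than by criterion. That lets a `RelayVRFDelay` and a `RelayEquivocation` assignment coexist, while a second `RelayVRFStory`-based assignment is rejected.

```python
from typing import Set, Tuple

# Each criterion's VRF input comes from one story.
STORY_OF = {
    "RelayVRFModulo": "RelayVRFStory",
    "RelayVRFDelay": "RelayVRFStory",
    "RelayEquivocation": "RelayEquivocationStory",
}


class AssignmentTracker:
    """At most one assignment per (validator, candidate, story); one approval per (validator, candidate)."""

    def __init__(self) -> None:
        self.assignments: Set[Tuple[int, str, str]] = set()
        self.approvals: Set[Tuple[int, str]] = set()

    def import_assignment(self, validator: int, candidate: str, criterion: str) -> bool:
        key = (validator, candidate, STORY_OF[criterion])
        if key in self.assignments:
            return False  # a second assignment under the same story is rejected
        self.assignments.add(key)
        return True

    def import_approval(self, validator: int, candidate: str) -> bool:
        key = (validator, candidate)
        if key in self.approvals:
            return False
        self.approvals.add(key)  # a single vote counts for every applicable criterion
        return True


tracker = AssignmentTracker()
print(tracker.import_assignment(1, "C", "RelayVRFDelay"))      # True
print(tracker.import_assignment(1, "C", "RelayEquivocation"))  # True: different story
print(tracker.import_assignment(1, "C", "RelayVRFModulo"))     # False: same RelayVRFStory
```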
We announce, and start checking for, our own assignments when the delay of their tranche is reached, but only if the
tracker says the assignee candidate requires more approval checkers. We never announce an assignment we believe
unnecessary because early announcements give an adversary information. All delay tranche zero assignments always get
announced, which includes all `RelayVRFModulo` assignments.
In other words, if some candidate `C` needs more approval checkers by the time we reach round `t`, then any validators
with an assignment to `C` in delay tranche `t` gossip their assignment notice for `C`, and begin reconstruction and
validation for `C`. If however `C` reached enough assignments, then validators with later assignments skip announcing
their assignments.
We continue until all candidates have enough approval checkers assigned. We never prioritize assignments within
tranches and count all or no assignments for a given tranche together, so we often overshoot the target number of
assigned approval checkers.
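The announcement rule can be sketched as a single decision a validator makes when its tranche arrives. In this Python illustration, `needed` and the per-tranche counts are hypothetical inputs supplied by the tracker, and "no show" replacement is ignored. Summing whole earlier tranches mirrors the all-or-nothing tranche accounting described above.

```python
from typing import List


def should_announce(my_tranche: int, needed: int, announced_per_tranche: List[int]) -> bool:
    """Decide whether to announce our own assignment once tranche `my_tranche` is reached.

    announced_per_tranche[t] is the number of assignments the tracker has seen for tranche t.
    """
    if my_tranche == 0:
        return True  # every tranche zero assignment is announced
    # Whole tranches count together, so sum every earlier tranche in full.
    assigned_so_far = sum(announced_per_tranche[:my_tranche])
    return assigned_so_far < needed


print(should_announce(2, 20, [14, 4]))     # True: 18 < 20
print(should_announce(3, 20, [14, 4, 5]))  # False: 23 >= 20
```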
### No shows
We have a "no show" timeout longer than one relay chain slot, so at least 6 seconds, during which we expect approval
checks should succeed in reconstructing the candidate block, in redoing its erasure coding to check the candidate
receipt, and finally in rechecking the candidate block itself.
We consider a validator a "no show" if they do not approve or dispute within this "no show" timeout from our receiving
their assignment notice. We time this from our receipt of their assignment notice instead of our imagined real time for
their tranche because otherwise receiving late assignment notices creates immediate "no shows" and unnecessary work.
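Here is a Python sketch of the "no show" test, timed from our receipt of the assignment notice as described above. The 6-second slot comes from the text. The 8-second timeout is a hypothetical value chosen only to exceed one slot. Treating any response, even a late one, as clearing the "no show" status is an assumption. It is consistent with a slow checker's eventual approval still counting.

```python
SLOT_SECONDS = 6.0
NO_SHOW_TIMEOUT_SECONDS = 8.0  # hypothetical; only required to exceed one relay chain slot
assert NO_SHOW_TIMEOUT_SECONDS > SLOT_SECONDS


def is_no_show(notice_received_at: float, now: float, has_responded: bool) -> bool:
    """True if the checker has neither approved nor disputed within the timeout,
    measured from when we received its assignment notice, not from its tranche's start."""
    return not has_responded and now > notice_received_at + NO_SHOW_TIMEOUT_SECONDS


# A notice for a tranche starting at t=0 that only reaches us at t=20 is not an instant no-show.
print(is_no_show(notice_received_at=20.0, now=21.0, has_responded=False))  # False
print(is_no_show(notice_received_at=20.0, now=29.0, has_responded=False))  # True
```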
We worry that "no shows" represent a validator under denial of service attack, presumably to prevent it from
reconstructing the candidate, but perhaps also delaying it from gossiping a dispute. We therefore always replace
"no shows" by adding one entire extra delay tranche worth of validators, so such attacks always result in additional
checkers.
As an example, imagine we need 20 checkers, but tranche zero produces only 14 and tranche one only 4; we then take all
5 from tranche two, and thus require 23 checkers for that candidate. If one checker, Charlie, from tranche one or two
does not respond within, say, 8 seconds, then we add all 7 checkers from tranche three. If again one checker, Cindy,
from tranche three does not respond within 8 seconds, then we take all 3 checkers from tranche four. We now have 33
checkers working on the candidate, so this escalated quickly.
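The escalation in this example can be reproduced with a short Python sketch. It assumes each "no show" pulls in exactly one whole additional tranche, in order, as described above. The tranche sizes are the ones from the example. The function raises `IndexError` if the tranches run out.

```python
from typing import List


def checkers_required(needed: int, tranche_sizes: List[int], no_shows: int) -> int:
    """Take whole tranches until `needed` is met, then one whole extra tranche per no-show.

    Raises IndexError if the available tranches run out.
    """
    total = 0
    next_tranche = 0
    while total < needed:
        total += tranche_sizes[next_tranche]  # all assignments of a tranche count together
        next_tranche += 1
    for _ in range(no_shows):
        total += tranche_sizes[next_tranche]  # each no-show escalates by an entire tranche
        next_tranche += 1
    return total


sizes = [14, 4, 5, 7, 3]  # tranches zero through four in the example
print([checkers_required(20, sizes, k) for k in range(3)])  # [23, 30, 33]
```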
We escalated so quickly because we worried that Charlie and Cindy might be the only honest checkers assigned to that
candidate. If therefore either Charlie or Cindy finally returns an approval, then we can conclude approval, and abandon
the checkers from tranche four.
We therefore require the "no show" timeout to be longer than a relay chain slot so that we can witness "no shows"
on-chain. We discuss below how this helps reward validators who replace "no shows".
We avoid slashing for "no shows" by themselves, although being a "no show" could enter into some computation that
punishes repeated poor performance, presumably replacing `ImOnline`, and we could reduce their rewards and further
reward those who filled in.
As future work, we foresee expanding the "no show" scheme to anonymize the additional checkers, for example by using
assignment notices with a new criterion that employs a ring VRF, with all validators then providing cover by
requesting a couple of erasure coded pieces, but such anonymity schemes sound extremely complex and lie far beyond our
initial functionality.
## Assignment postponement
We expect validators could occasionally become overloaded when they randomly acquire too many assignments. All these
fluctuations amortize over multiple blocks fairly well, but this slows down finality.
We therefore permit validators to delay sending their assignment notices intentionally. If nobody knows about their
assignment then they avoid creating "no shows" and the workload progresses normally.
We strongly prefer that postponements come from tranches higher, aka less important, than zero because tranche zero
checks provide somewhat more security.
TODO: When? Is this optimal for the network? etc.
## On-chain verification
We should verify approval on-chain to reward approval checkers. We therefore require the "no show" timeout to be longer
than a relay chain slot so that we can witness "no shows" on-chain, which helps with this goal. The major challenge with
an on-chain record of the off-chain process is adversarial block producers who may either censor votes or publish votes
to the chain which cause other votes to be ignored and unrewarded (reward stealing).
In principle, all validators have some "tranche" at which they're assigned to the parachain candidate, which ensures we
reach enough validators eventually. As noted above, we often retract "no shows" when the slow validator eventually
shows up, so witnessing their initially being a "no show" helps manage rewards.
We expect on-chain verification should work in two phases: We first record assignment notices and approval votes
on-chain in a relay chain block, doing the VRF or regular signature verification again in block verification, and
inserting chain authenticated unsigned notes into the relay chain state that contain the checker, tranche, paraid, and
relay block height for each assignment notice. We then later have another relay chain block that runs some "approved"
intrinsic, which extracts all these notes from the state and feeds them into our approval code.
We now encounter one niche concern in the interaction between postponement and on-chain verification: Any validator
with a tranche zero (or other low) assignment could delay sending an assignment notice, for example because they
postponed their assigned tranche (which is allowed). If they later send this assignment notice right around finality
time, then they race with this "approved" intrinsic: If their announcement gets on-chain (also allowed), then yes it
delays finality. If it does not get on-chain, then yes we have one announcement that the off-chain consensus system
says is valid, but which the chain ignores for being too slow.
We need the chain to win in this case, but doing this requires imposing an annoyingly long overarching delay upon
finality. We might explore limits on postponement too, but this sounds much harder.
## Parameters
We prefer doing approval checker assignments under `RelayVRFModulo` as opposed to `RelayVRFDelay` because
`RelayVRFModulo` avoids giving individual checkers too many assignments and tranche zero assignments benefit security
the most. We suggest assigning at least 16 checkers under `RelayVRFModulo` although assignment levels have never been
properly analyzed.
Our delay criteria `RelayVRFDelay` and `RelayEquivocation` both have two primary parameters: expected checkers per
tranche and the zeroth delay tranche width.
We require expected checkers per tranche to be less than three because otherwise an adversary with 1/3 stake could force
all nodes into checking all blocks. We strongly recommend expected checkers per tranche to be less than two, which
helps avoid both accidental and intentional explosions. We also suggest expected checkers per tranche be larger than
one, which helps prevent adversaries from predicting that advancing one tranche adds only their own validators.
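These bounds can be summarized as a configuration check. This Python sketch is hypothetical and does not imply that such a function exists in Polkadot. It rejects values of three or more and flags values outside the recommended open interval between one and two.

```python
def check_expected_checkers_per_tranche(x: float) -> str:
    """Classify a proposed expected-checkers-per-tranche parameter against the bounds above."""
    if x >= 3:
        # A 1/3-stake adversary could otherwise force every node to check every block.
        raise ValueError("expected checkers per tranche must be less than three")
    if x >= 2:
        return "allowed, but values below two are strongly recommended"
    if x <= 1:
        return "allowed, but values above one are suggested"
    return "recommended"


print(check_expected_checkers_per_tranche(1.5))  # recommended
```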
We improve security more with tranche zero assignments, so `RelayEquivocation` should consolidate its first several
tranches into tranche zero. We describe this as the zeroth delay tranche width, which initially we set to `12` for
`RelayEquivocation` and `1` for `RelayVRFDelay`.
## Why VRFs?
We do assignments with VRFs to give "enough" checkers some meaning beyond merely "expected" checkers:
We could specify a protocol that used only system randomness, which works because our strongest defense is the expected
number of honest checkers who assign themselves. In this, adversaries could trivially flood their own blocks with their
own checkers, so this strong defense becomes our only defense, and delay tranches become useless, so some blocks
actually have zero approval checkers and possibly only one checker overall.
VRFs, though, require adversaries to wait far longer between such attacks, which also helps against adversaries with
little at stake because they compromised validators. VRFs raise user confidence that no such "drive by" attacks
occurred because the delay tranche system ensures at least some minimum number of approval checkers. In this vein,
VRFs permit reducing backing checks and increasing approval checks, which makes Polkadot more efficient.
## Gossip
Any validator could send their assignment notices and/or approval votes too early. We gossip the approval votes because
they represent a major commitment by the validator. We retain but delay gossiping the assignment notices until they
agree with our local clock.
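Here is a minimal Python sketch of this gossip policy. It assumes the caller derives the current delay tranche from its local clock, and that each message carries a hypothetical `kind` and claimed `tranche`. Approval votes pass through immediately. Early assignment notices are retained, not dropped, and released once the clock catches up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    kind: str     # "assignment" or "approval"
    tranche: int  # delay tranche claimed by an assignment notice (ignored for approvals)


def ready_to_gossip(msg: Message, current_tranche: int) -> bool:
    if msg.kind == "approval":
        return True  # approval votes are a major commitment, so forward them right away
    return msg.tranche <= current_tranche  # assignments wait until our clock reaches their tranche


class GossipQueue:
    def __init__(self) -> None:
        self.held: List[Message] = []

    def on_receive(self, msg: Message, current_tranche: int) -> List[Message]:
        """Return messages to forward now; retain (do not drop) early assignment notices."""
        if ready_to_gossip(msg, current_tranche):
            return [msg]
        self.held.append(msg)
        return []

    def on_tick(self, current_tranche: int) -> List[Message]:
        """Release held assignment notices whose tranche our local clock has now reached."""
        ready = [m for m in self.held if ready_to_gossip(m, current_tranche)]
        self.held = [m for m in self.held if not ready_to_gossip(m, current_tranche)]
        return ready
```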
Assignment notices being gossiped too early might create a denial of service vector. If so, we might exploit the
relative time scheme that synchronizes our clocks, which conceivably permits just dropping excessively early
assignments.
## Finality GRANDPA Voting Rule
The relay-chain requires validators to participate in GRANDPA. In GRANDPA, validators submit off-chain votes on what
they believe to be the best block of the chain, and GRANDPA determines the common block contained by a supermajority of
sub-chains. There are also additional constraints on what can be submitted based on results of previous rounds of
voting.
In order to avoid finalizing anything which has not received enough approval votes or is disputed, we will pair the
approval protocol with an alteration to the GRANDPA voting strategy for honest nodes which causes them to vote only on
chains where every parachain candidate within has been approved. Furthermore, the voting rule prevents voting for
chains where there is any live dispute or any dispute has resolved to a candidate being invalid.
Thus, the finalized relay-chain should contain only relay-chain blocks where a majority believe that every block within
has been sufficiently approved.
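The voting rule can be sketched as a walk up the best chain that stops at the first block failing either condition. This is a hypothetical illustration. `BlockInfo` and its flags stand in for whatever approval and dispute state a node actually tracks. The input is assumed to be the unfinalized chain, ordered from the oldest block to the best leaf.

```rust
/// Hypothetical per-block summary derived from approval-voting and dispute state.
#[derive(Debug, Clone)]
struct BlockInfo {
    hash: [u8; 32],
    /// Every parachain candidate included in this block has been approved.
    all_candidates_approved: bool,
    /// Some candidate in this block has a live dispute or lost a dispute.
    disputed: bool,
}

/// `chain` runs from the child of the last finalized block (index 0) to the best leaf.
/// Returns the highest block an honest node may vote for, or `None` if it must stay
/// on the last finalized block.
fn constrain_vote(chain: &[BlockInfo]) -> Option<[u8; 32]> {
    chain
        .iter()
        .take_while(|b| b.all_candidates_approved && !b.disputed)
        .last()
        .map(|b| b.hash)
}
```

The walk stops at the first failing block instead of skipping it because every later block on the chain builds on it.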
### Future work
We could consider additional gossip messages with which nodes claim "slow availability" and/or "slow candidate" to
fine-tune the assignments "no show" system, but sufficiently long "no show" delays probably suffice.
We shall develop more practical experience with UDP once the availability system works using direct UDP connections. In
this, we should discover whether reconstruction performs adequately with a complete graph or benefits from topology
restrictions. At this point, an assignment notice could implicitly request pieces from a random 1/3rd, perhaps
topology restricted, which saves one gossip round. If this preliminary fast reconstruction fails, then nodes request
alternative pieces directly. There is an interesting design space in how this overlaps with "slow availability" claims.
@@ -1,12 +1,21 @@
# Chain Selection
Chain selection processes in blockchains are used for the purpose of selecting blocks to build on and finalize. It is
important for these processes to be consistent among nodes and resilient to a maximum proportion of malicious nodes
which do not obey the chain selection process.
The parachain host uses both a block authoring system and a finality gadget. The chain selection strategy of the
parachain host involves two key components: a _leaf-selection_ rule and a set of _finality constraints_. When it's a
validator's turn to author on a block, they are expected to select the best block via the leaf-selection rule to build
on top of. When a validator is participating in finality, there is a minimum block which can be voted on, which is
usually the finalized block. The validator should select the best chain according to the leaf-selection rule and
subsequently apply the finality constraints to arrive at the actual vote cast by that validator.
Before diving into the particularities of the leaf-selection rule and the finality constraints, it's important to
discuss the goals that these components are meant to achieve. For this it is useful to create the definitions of
_viable_ and _finalizable_ blocks.
## Property Definitions
A block is considered **viable** when all of the following hold:
1. It is or descends from the finalized block
@@ -32,17 +41,27 @@ A block is considered **finalizable** when all of the following hold:
4. It is either finalized or includes no candidates which have unresolved disputes or have lost a dispute.
## The leaf-selection rule
We assume that every block has an implicit weight or score which can be used to compare blocks. In BABE, this is
determined by the number of primary slots included in the chain. In PoW, this is the chain with either the most work or
GHOST weight.
The leaf-selection rule based on our definitions above is simple: we take the maximum-scoring viable leaf we are aware
of. In the case of a tie we select the one with a lower lexicographical block hash.
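Here is a sketch of the leaf-selection rule. It assumes each leaf carries a precomputed `u64` score and a viability flag; the `Leaf` type is hypothetical. The tie-break reverses the hash comparison so that `max_by` prefers the lexicographically lower hash.

```rust
/// Hypothetical view of a leaf: its chain score (e.g. BABE primary-slot count) and hash.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Leaf {
    hash: [u8; 32],
    score: u64,
    viable: bool,
}

/// Pick the maximum-scoring viable leaf; on a score tie prefer the lower hash.
fn select_leaf(leaves: &[Leaf]) -> Option<&Leaf> {
    leaves
        .iter()
        .filter(|l| l.viable)
        // Compare by score first. For equal scores the lower hash must compare as
        // greater so that `max_by` selects it, hence `b.hash.cmp(&a.hash)`.
        .max_by(|a, b| a.score.cmp(&b.score).then_with(|| b.hash.cmp(&a.hash)))
}
```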
## The best-chain-containing rule
Finality gadgets, as mentioned above, will often impose an additional requirement to vote on a chain containing a
specific block, known as the **required** block. Although this is typically the most recently finalized block, it is
possible that it may be a block that is unfinalized. When receiving such a request:
1. If the required block is the best finalized block, then select the best viable leaf.
2. If the required block is unfinalized and non-viable, then select the required block and go no further. This is likely
an indication that something bad will be finalized in the network, which will never happen when approvals & disputes
are functioning correctly. Nevertheless we account for the case here.
3. If the required block is unfinalized and viable, then iterate over the viable leaves in descending order by score and
select the first one which contains the required block in its chain. Backwards iteration is a simple way to check
this, but if unfinalized chains grow long then Merkle Mountain-Ranges will most likely be more efficient.
Once a leaf is selected, the chain should be constrained to the maximum of the required block or the highest
**finalizable** ancestor.
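The three cases and the final constraint can be sketched as follows. The sketch rests on three assumptions:

* block hashes are hypothetical `u32` values;
* viable chains arrive already sorted by descending leaf score;
* a block can only be finalizable if all of its unfinalized ancestors are.

Because of the last assumption, a prefix scan finds the highest finalizable ancestor. The containment check uses plain linear scans rather than Merkle Mountain Ranges.

```rust
use std::collections::HashSet;

/// Hypothetical short block hash, used only to keep the example readable.
type BlockHash = u32;

/// A viable leaf's unfinalized chain, from the child of the finalized block up to the leaf.
struct ViableChain {
    chain: Vec<BlockHash>,
}

/// `chains` must be sorted by descending leaf score, as the leaf-selection rule orders them.
fn best_chain_containing(
    required: BlockHash,
    finalized: BlockHash,
    chains: &[ViableChain],
    non_viable: &HashSet<BlockHash>,
    finalizable: &HashSet<BlockHash>,
) -> BlockHash {
    // Case 2: the required block is unfinalized and non-viable, so vote for it and stop.
    if required != finalized && non_viable.contains(&required) {
        return required;
    }
    // Case 1: the required block is the finalized block, so the best viable leaf qualifies.
    // Case 3: otherwise take the highest-scoring viable chain containing the required block.
    let chain = match chains
        .iter()
        .find(|c| required == finalized || c.chain.contains(&required))
    {
        Some(c) => &c.chain,
        None => return required,
    };
    // Constrain to the maximum of the required block and the highest finalizable ancestor.
    // `Option<usize>` orders `None` below every `Some`, so `max` picks the higher index.
    let required_idx = chain.iter().position(|h| *h == required);
    let finalizable_idx = chain
        .iter()
        .take_while(|h| finalizable.contains(*h))
        .count()
        .checked_sub(1);
    match required_idx.max(finalizable_idx) {
        Some(i) => chain[i],
        None => finalized,
    }
}
```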
@@ -4,14 +4,27 @@ Fast forward to [more detailed disputes requirements](./disputes-flow.md).
## Motivation and Background
All parachain blocks that end up in the finalized relay chain should be valid. This does not apply to blocks that are
only backed, but not included.
We have two primary components for ensuring that nothing invalid ends up in the finalized relay chain:
* Approval Checking, as described [here](./protocol-approval.md) and implemented according to the [Approval
Voting](node/approval/approval-voting.md) subsystem. This protocol can be shown to prevent invalid parachain blocks
from making their way into the finalized relay chain as long as the number of attempts is limited.
* Disputes, this protocol, which ensures that each attempt to include something bad is caught, and the offending
validators are punished.

Disputes differ from the backing and approval processes (and cannot be part of them) in that a dispute is independent
of a particular fork, while both backing and approval operate on particular forks. This distinction is important!
Approval voting stops if an alternative fork which might not contain the currently approved candidate gets finalized.
This is totally fine from the perspective of approval voting, as its sole purpose is to make sure invalid blocks won't
get finalized. For disputes, on the other hand, we have different requirements: even though the "danger" is past and
the adversaries were not able to get their invalid block approved, we still want them to get slashed for the attempt.
Otherwise they would get a free try, which our security model cannot allow: it rests on the assumption that the
probability of getting an invalid block finalized is very low, so an attacker would go bankrupt before it could try
often enough.
Every dispute stems from a disagreement among two or more validators. If a bad actor creates a bad block, but the bad
actor never distributes it to honest validators, then nobody will dispute it. Of course, such a situation is not even an
attack on the network, so we don't need to worry about defending against it.
We are interested in identifying and deterring the following attack scenario:
* A parablock included on a branch of the relay chain is bad
@@ -20,48 +33,101 @@ We are also interested in identifying these additional scenarios:
* A parablock backed on a branch of the relay chain is bad
* A parablock seconded, but not backed on any branch of the relay chain, is bad.
Punishing misbehavior in the latter two scenarios doesn't affect our security guarantees and introduces substantial
technical challenges as described in the `No Disputes for Non Included Candidates` section of [Dispute
Coordinator](./node/disputes/dispute-coordinator.md). We therefore choose to punt on disputes in these cases, instead
favoring the protocol simplicity resulting from only punishing in the first scenario.
As covered in the [protocol overview](./protocol-overview.md), checking a parachain block requires 3 pieces of data: the
parachain validation code, the [`AvailableData`](types/availability.md), and the
[`CandidateReceipt`](types/candidate.md). The validation code is available on-chain, and published ahead of time, so
that no two branches of the relay chain have diverging views of the validation code for a given parachain. Note that
only for the first scenario, where the parablock has been included on a branch of the relay chain, is the data
necessarily available. Thus, dispute processes should begin with an availability process to ensure availability of the
`AvailableData`. This availability process will conclude quickly if the data is already available. If the data is not
already available, then the initiator of the dispute must make it available.
Disputes have both an on-chain and an off-chain component. Slashing and punishment are handled on-chain, so votes by
validators on either side of the dispute must be placed on-chain. Furthermore, a dispute on one branch of the relay
chain should be transposed to all other active branches of the relay chain. The fact that slashing occurs _in all
histories_ is crucial for deterring attempts to attack the network. The attacker should not be able to escape with their
funds because the network has moved on to another branch of the relay chain where no attack was attempted.
In fact, this is why we introduce a distinction between _local_ and _remote_ disputes. We categorize disputes as either
local or remote relative to any particular branch of the relay chain. Local disputes are about dealing with our first
scenario, where a parablock has been included on the specific branch we are looking at. In these cases, the chain is
corrupted all the way back to the point where the parablock was backed and must be discarded. However, as mentioned
before, the dispute must propagate to all other branches of the relay chain. All other disputes are considered _remote_.
For the on-chain component, when handling a dispute for a block which was not included in the current fork of the relay
chain, it is impossible to discern between our attack scenarios. It is possible that the parablock was included
somewhere, or backed somewhere, or wasn't backed anywhere. The on-chain component for handling these cases will be the
same.
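Classifying a dispute relative to one branch reduces to a lookup in that branch's inclusion record. Here is a minimal sketch, assuming a hypothetical map from included candidate hashes to the block number in which each was backed.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers used only for illustration.
type CandidateHash = [u8; 32];
type BlockNumber = u32;

#[derive(Debug, PartialEq, Eq)]
enum DisputeKind {
    /// Included on this branch: the branch is corrupted back to `backed_in` and must be discarded.
    Local { backed_in: BlockNumber },
    /// Not included on this branch. Handled the same way whether the candidate was included
    /// elsewhere, only backed somewhere, or never backed.
    Remote,
}

fn classify_dispute(
    candidate: &CandidateHash,
    included_on_branch: &HashMap<CandidateHash, BlockNumber>,
) -> DisputeKind {
    match included_on_branch.get(candidate) {
        Some(&backed_in) => DisputeKind::Local { backed_in },
        None => DisputeKind::Remote,
    }
}
```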
## Initiation
Disputes are initiated by any validator who finds their opinion on the validity of a parablock in opposition to another
issued statement. As all statements currently gathered by the relay chain imply validity, disputes will be initiated
only by nodes which perceive that the parablock is bad.
The initiation of a dispute begins off-chain. A validator signs a message indicating that it disputes the validity of
the parablock and notifies all other validators, off-chain, of all of the statements it is aware of for the disputed
parablock. These may be backing statements or approval-checking statements. It is worth noting that there is no special
message type for initiating a dispute. It is the same message as is used to participate in a dispute and vote
negatively. As such, there is no consensus required on who initiated a dispute, only on the fact that there is a dispute
in-progress.
In practice, the initiator of a dispute will be either one of the backers or one of the approval checkers for the
parablock. If the result of execution is found to be invalid, the validator will initiate the dispute as described
above. Furthermore, if the dispute occurs during the backing phase, the initiator must make the data available to other
validators. If the dispute occurs during approval checking, the data is already available.
Lastly, for backing disputes, i.e. where the data is not already available among all validators, it is possible
that an adversary may DoS the few parties who are checking the block to prevent them from distributing the data to other
validators participating in the dispute process. Note that this can only occur pre-inclusion for any given parablock, so
the downside of this attack is small and it is not security-critical to address these cases. However, we assume that the
adversary can only prevent the validator from issuing messages for a limited amount of time. We also assume that there
is a side-channel where the relay chain's governance mechanisms can trigger disputes by providing the full PoV and
candidate receipt on-chain manually.
## Dispute Participation
Once becoming aware of a dispute, it is the responsibility of all validators to participate in the dispute. Concretely,
this means:
* Circulate all statements about the candidate that we are aware of - backing statements, approval checking
statements, and dispute statements.
* If we have already issued any type of statement about the candidate, go no further.
* Download the [`AvailableData`](types/availability.md). If possible, this should first be attempted from other
dispute participants or backing validators, and then [(via
erasure-coding)](node/availability/availability-recovery.md) from all validators.
* Extract the Validation Code from any recent relay chain block. Code is guaranteed to be kept available on-chain, so
we don't need to download any particular fork of the chain.
* Execute the block under the validation code, using the `AvailableData`, and check that all outputs are correct,
including the `erasure-root` of the [`CandidateReceipt`](types/candidate.md).
* Issue a dispute participation statement to the effect of the validity of the candidate block.
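The participation steps above can be sketched as a single function. Everything here is hypothetical. The closures stand in for data recovery and block execution. Circulating known statements is assumed to happen separately and unconditionally.

```rust
/// Outcome of checking a disputed candidate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Validity {
    Valid,
    Invalid,
}

/// Returns the statement to issue, or `None` if we must not (or cannot yet) issue one.
fn participate(
    already_issued_statement: bool,
    fetch_from_participants_or_backers: impl FnOnce() -> Option<Vec<u8>>,
    recover_via_erasure_coding: impl FnOnce() -> Option<Vec<u8>>,
    validation_code: &[u8],
    execute: impl FnOnce(&[u8], &[u8]) -> bool,
) -> Option<Validity> {
    // If we already issued any type of statement about the candidate, go no further.
    if already_issued_statement {
        return None;
    }
    // Prefer fetching the `AvailableData` from dispute participants or backers, and fall
    // back to erasure-coded recovery from all validators.
    let available_data =
        fetch_from_participants_or_backers().or_else(recover_via_erasure_coding)?;
    // `execute` stands in for running the block and checking all outputs, including the
    // erasure root of the candidate receipt.
    let valid = execute(validation_code, available_data.as_slice());
    Some(if valid { Validity::Valid } else { Validity::Invalid })
}
```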
Disputes _conclude_ after ⅔ supermajority is reached in either direction.
The on-chain component of disputes can be initiated by providing any two conflicting votes and it also waits for a ⅔
supermajority on either side. The on-chain component also tracks which parablocks have already been disputed so the same
parablock may only be disputed once on any particular branch of the relay chain. Lastly, it also tracks which blocks
have been included on the current branch of the relay chain. When a dispute is initiated for a para, inclusion is halted
for the para until the dispute concludes.
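Here is a sketch of the on-chain bookkeeping described above. It assumes hypothetical `CandidateHash` and `ParaId` types and does not reproduce the runtime's actual storage. It keeps a per-para count of open disputes, so a para with several concurrent disputes stays halted until all of them conclude. The inclusion tracking mentioned above is omitted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical identifiers used only for illustration.
type CandidateHash = [u8; 32];
type ParaId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Vote {
    Valid,
    Invalid,
}

#[derive(Default)]
struct OnChainDisputes {
    /// Candidates already disputed on this branch; each may be disputed only once.
    disputed: HashSet<CandidateHash>,
    /// Number of unconcluded disputes per para; inclusion is halted while non-zero.
    open_per_para: HashMap<ParaId, usize>,
}

impl OnChainDisputes {
    /// Open a dispute from any two conflicting votes. Returns `false` if the votes
    /// agree or the candidate was already disputed on this branch.
    fn initiate(&mut self, candidate: CandidateHash, para: ParaId, a: Vote, b: Vote) -> bool {
        if a == b || !self.disputed.insert(candidate) {
            return false;
        }
        *self.open_per_para.entry(para).or_insert(0) += 1;
        true
    }

    /// Record that one dispute for `para` reached a supermajority in either direction.
    fn conclude(&mut self, para: ParaId) {
        if let Some(open) = self.open_per_para.get_mut(&para) {
            *open -= 1;
            if *open == 0 {
                self.open_per_para.remove(&para);
            }
        }
    }

    /// New candidates for `para` may be included only while it has no open dispute.
    fn can_include(&self, para: ParaId) -> bool {
        !self.open_per_para.contains_key(&para)
    }
}
```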
The author of a relay chain block should initiate the on-chain component of disputes for all disputes which the chain is
not aware of, and provide all statements to the on-chain component as well. This should all be done via _inherents_.
Validators can learn about dispute statements in two ways:
* Receiving them from other validators over gossip
* Scraping them from imported blocks of the relay chain. This is also used for validators to track other types of
statements, such as backing statements.
Validators are rewarded for providing statements to the chain as well as for participating in the dispute, on either
side. However, the losing side of the dispute is slashed.
## Dispute Conclusion
Disputes, roughly, are over when one side reaches a ⅔ supermajority. They may also never conclude without either side
witnessing supermajority, which will only happen if the majority of validators are unable to vote for some reason.
Furthermore, disputes on-chain will stay open for some fixed amount of time even after concluding, to accept new votes.
Late votes, after the dispute already reached a ⅔ supermajority, must be rewarded (albeit a smaller amount) as well.
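To make "⅔ supermajority" concrete, here is a sketch that assumes the usual BFT convention: `n` validators tolerate `f = ⌊(n - 1) / 3⌋` faults, so a supermajority is `n - f` votes, strictly more than two thirds. The function names are illustrative. Late-vote rewards and slashing are not modelled.

```rust
/// Largest number of faulty validators tolerated out of `n`, assuming `n >= 3f + 1`.
fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Smallest number of votes that is strictly more than two thirds of `n`.
fn supermajority_threshold(n: usize) -> usize {
    n - byzantine_threshold(n)
}

#[derive(Debug, PartialEq, Eq)]
enum DisputeState {
    Open,
    ConcludedValid,
    ConcludedInvalid,
}

/// Recomputed as votes arrive; a dispute may stay `Open` if too few validators vote.
fn dispute_state(n_validators: usize, valid_votes: usize, invalid_votes: usize) -> DisputeState {
    let threshold = supermajority_threshold(n_validators);
    if valid_votes >= threshold {
        DisputeState::ConcludedValid
    } else if invalid_votes >= threshold {
        DisputeState::ConcludedInvalid
    } else {
        DisputeState::Open
    }
}
```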
@@ -1,28 +1,61 @@
# Protocol Overview
This section aims to describe, at a high level, the actors and protocols involved in running parachains in Polkadot.
Specifically, we describe how different actors communicate with each other, what data structures they keep both
individually and collectively, and the high-level reasons why they do these things.
Our top-level goal is to carry a parachain block from authoring to secure inclusion, and define a process which can be
carried out repeatedly and in parallel for many different parachains to extend them over time. Understanding of the
high-level approach taken here is important to provide context for the proposed architecture further on. The key parts
of Polkadot relevant to this are the main Polkadot blockchain, known as the relay-chain, and the actors which provide
security and inputs to this blockchain.
First, it's important to go over the main actors we have involved in this protocol.
1. Validators. These nodes are responsible for validating proposed parachain blocks. They do so by checking a
Proof-of-Validity (PoV) of the block and ensuring that the PoV remains available. They put financial capital down as
"skin in the game" which can be slashed (destroyed) if they are proven to have misvalidated.
1. Collators. These nodes are responsible for creating the Proofs-of-Validity that validators know how to check.
Creating a PoV typically requires familiarity with the transaction format and block authoring rules of the parachain,
as well as having access to the full state of the parachain.
This implies a simple pipeline where collators send validators parachain blocks and their requisite PoV to check. Then,
validators validate the block using the PoV, signing statements which describe either the positive or negative outcome,
and with enough positive statements, the block can be noted on the relay-chain. Negative statements are not a veto but
will lead to a dispute, with those on the wrong side being slashed. If another validator later detects that a validator
or group of validators incorrectly signed a statement claiming a block was valid, then those validators will be
_slashed_, with the checker receiving a bounty.
However, there is a problem with this formulation. In order for another validator to check the previous group of
validators' work after the fact, the PoV must remain _available_ so the other validator can fetch it in order to check
the work. The PoVs are expected to be too large to include in the blockchain directly, so we require an alternate _data
availability_ scheme which requires validators to prove that the inputs to their work will remain available, and so
their work can be checked. Empirical tests tell us that many PoVs may be between 1 and 10MB during periods of heavy
load.
Here is a description of the Inclusion Pipeline: the path a parachain block (or parablock, for short) takes from
creation to inclusion:
1. Validators are selected and assigned to parachains by the Validator Assignment routine.
1. A collator produces the parachain block, which is known as a parachain candidate or candidate, along with a PoV for
the candidate.
1. The collator forwards the candidate and PoV to validators assigned to the same parachain via the [Collator
Protocol](node/collators/collator-protocol.md).
1. The validators assigned to a parachain at a given point in time participate in the [Candidate Backing
subsystem](node/backing/candidate-backing.md) to validate candidates that were put forward for validation. Candidates
which gather enough signed validity statements from validators are considered "backable". Their backing is the set of
signed validity statements.
1. A relay-chain block author, selected by BABE, can note up to one (1) backable candidate for each parachain to include
in the relay-chain block alongside its backing. A backable candidate once included in the relay-chain is considered
backed in that fork of the relay-chain.
1. Once backed in the relay-chain, the parachain candidate is considered to be "pending availability". It is not
considered to be included as part of the parachain until it is proven available.
1. In the following relay-chain blocks, validators will participate in the [Availability Distribution
subsystem](node/availability/availability-distribution.md) to ensure availability of the candidate. Information
regarding the availability of the candidate will be noted in the subsequent relay-chain blocks.
1. Once the relay-chain state machine has enough information to consider the candidate's PoV as being available, the
candidate is considered to be part of the parachain and is graduated to being a full parachain block, or parablock
for short.
Note that the candidate can fail to be included in any of the following ways:
@@ -31,21 +64,47 @@ Note that the candidate can fail to be included in any of the following ways:
- The candidate is not selected by a relay-chain block author to be included in the relay chain
- The candidate's PoV is not considered as available within a timeout and is discarded from the relay chain.
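As a minimal sketch, the Inclusion Pipeline can be modeled as a small state machine that a candidate walks through. The
names `InclusionStage`, `InclusionEvent` and `step` are hypothetical and are not types from the implementation. The
sketch covers only the happy path and the failure modes visible above, and it simplifies "not selected by a block
author" into an immediate discard.

```rust
/// Hypothetical stages a candidate passes through in the Inclusion Pipeline.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InclusionStage {
    /// Produced by a collator, together with its PoV.
    Candidate,
    /// Gathered enough signed validity statements from its backing group.
    Backable,
    /// Noted in a relay-chain block alongside its backing; awaiting availability.
    PendingAvailability,
    /// PoV considered available; the candidate is now a parablock.
    Included,
    /// Dropped; the candidate will not be included.
    Discarded,
}

/// Hypothetical events that move a candidate between stages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InclusionEvent {
    EnoughValidityStatements,
    NotedByBlockAuthor,
    NotSelectedByBlockAuthor,
    ProvenAvailable,
    AvailabilityTimeout,
}

/// Apply one event to the current stage.
pub fn step(stage: InclusionStage, event: InclusionEvent) -> InclusionStage {
    use InclusionEvent::*;
    use InclusionStage::*;
    match (stage, event) {
        (Candidate, EnoughValidityStatements) => Backable,
        (Backable, NotedByBlockAuthor) => PendingAvailability,
        (Backable, NotSelectedByBlockAuthor) => Discarded,
        (PendingAvailability, ProvenAvailable) => Included,
        (PendingAvailability, AvailabilityTimeout) => Discarded,
        // Events that do not apply to the current stage leave it unchanged.
        (current, _) => current,
    }
}
```

Feeding the three happy-path events in order takes a candidate from `Candidate` to `Included`, while an availability
timeout while pending availability discards it.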
This process can be broken down further. Steps 2 & 3 relate to the work of the collator in collating and distributing
the candidate to validators via the Collation Distribution Subsystem. Steps 4 & 5 relate to the work of the validators
in the Candidate Backing Subsystem and the block author (itself a validator) to include the block into the relay chain.
Steps 6, 7, and 8 correspond to the logic of the relay-chain state-machine (otherwise known as the Runtime) used to
fully incorporate the block into the chain. Step 7 requires further work on the validators' part to participate in the
Availability Distribution Subsystem and include that information into the relay chain for step 8 to be fully realized.
This brings us to the second part of the process. Once a parablock is considered available and part of the parachain, it
is still "pending approval". At this stage in the pipeline, the parablock has been backed by a majority of validators in
the group assigned to that parachain, and its data has been guaranteed available by the set of validators as a whole.
Once it's considered available, the host will even begin to accept children of that block. At this point, we can
consider the parablock as having been tentatively included in the parachain, although more confirmations are desired.
However, the validators in the parachain-group (known as the "Parachain Validators" for that parachain) are sampled from
a validator set which contains some proportion of Byzantine, or arbitrarily malicious, members. This implies that the
Parachain Validators for some parachain may be majority-dishonest, which means that (secondary) approval checks must be
done on the block before it can be considered approved. This is necessary only because the Parachain Validators for a
given parachain are sampled from an overall validator set which is assumed to be less than 1/3 dishonest, meaning that
there is a chance of randomly sampling Parachain Validators for a parachain that are majority or fully dishonest and can
back a candidate wrongly. The Approval Process allows us to detect such misbehavior after the fact without allocating
more Parachain Validators, which would reduce the throughput of the system. A parablock's failure to pass the approval
process will invalidate the block as well as all of its descendants. However, only the validators who backed the block
in question will be slashed, not the validators who backed the descendants.
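To give a feel for why a backing group can be majority-dishonest even when the overall set is less than 1/3 dishonest,
the sketch below computes that probability under the simplifying assumption that a group is drawn uniformly at random,
without replacement, from the whole validator set (a hypergeometric distribution). The function names and the uniform
sampling model are illustrative assumptions, not a description of the actual Validator Assignment routine.

```rust
/// Binomial coefficient C(n, k) as f64; adequate for the small group sizes used here.
fn choose(n: u64, k: u64) -> f64 {
    if k > n {
        return 0.0;
    }
    let k = k.min(n - k);
    let mut acc = 1.0;
    for i in 0..k {
        acc = acc * (n - i) as f64 / (i + 1) as f64;
    }
    acc
}

/// Probability that a group of `group` validators, drawn uniformly without replacement
/// from `total` validators of which `dishonest` are dishonest, has a strict dishonest majority.
pub fn p_majority_dishonest(total: u64, dishonest: u64, group: u64) -> f64 {
    let honest = total - dishonest;
    // A strict majority means more than half of the group.
    let min_bad = group / 2 + 1;
    let favourable: f64 = (min_bad..=group)
        .map(|bad| choose(dishonest, bad) * choose(honest, group - bad))
        .sum();
    favourable / choose(total, group)
}
```

With illustrative numbers of 1000 validators, 333 of them dishonest, and groups of 5, this comes out at roughly 21%,
which is far from negligible and shows why backing alone is not treated as final.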
The Approval Process, at a glance, looks like this:
1. Parablocks that have been included by the Inclusion Pipeline are pending approval for a time-window known as the
secondary checking window.
1. During the secondary-checking window, validators randomly self-select to perform secondary checks on the parablock.
1. These validators, known in this context as secondary checkers, acquire the parablock and its PoV, and re-run the
validation function.
1. The secondary checkers gossip the result of their checks. Contradictory results lead to escalation, where all
validators are required to check the block. The validators on the losing side of the dispute are slashed.
1. At the end of the Approval Process, the parablock is either Approved or it is rejected. More on the rejection process
later.
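A minimal sketch of how one node might classify the gossiped results of secondary checks for a single parablock. It
assumes, as a simplification, that any invalid verdict contradicts the backing group's validity statements and therefore
escalates, and that approval needs a hypothetical fixed number (`required`) of valid checks. The actual Approval
Process uses more detailed rules that are not modeled here; `ApprovalOutcome` and `approval_outcome` are illustrative.

```rust
/// Hypothetical outcome of secondary checks on one parablock, as seen by one node.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalOutcome {
    /// Not enough checks have come in yet.
    Pending,
    /// Enough checkers re-ran validation and found the parablock valid.
    Approved,
    /// At least one checker contradicts the backing: escalate so all validators check.
    Escalate,
}

/// `verdicts` holds the results gossiped by secondary checkers (`true` = valid).
/// `required` is a hypothetical number of valid checks needed for approval.
pub fn approval_outcome(verdicts: &[bool], required: usize) -> ApprovalOutcome {
    // Any invalid verdict contradicts the backers' validity statements.
    if verdicts.iter().any(|&valid| !valid) {
        return ApprovalOutcome::Escalate;
    }
    if verdicts.len() >= required {
        ApprovalOutcome::Approved
    } else {
        ApprovalOutcome::Pending
    }
}
```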
More information on the Approval Process can be found in the dedicated section on [Approval](protocol-approval.md). More
information on Disputes can be found in the dedicated section on [Disputes](protocol-disputes.md).
These two pipelines sum up the sequence of events necessary to extend and acquire full security on a Parablock. Note
that the Inclusion Pipeline must conclude for a specific parachain before a new block can be accepted on that parachain.
After inclusion, the Approval Process kicks off, and can be running for many parachain blocks at once.
Reiterating the lifecycle of a candidate:
@@ -129,8 +188,11 @@ digraph {
The diagram above shows the happy path of a block from (1) Candidate to the (7) Approved state.
It is also important to take note of the fact that the relay-chain is extended by BABE, which is a forkful algorithm.
That means that different block authors can be chosen at the same time, and may not be building on the same block
parent. Furthermore, the set of validators is not fixed, nor is the set of parachains. And even with the same set of
validators and parachains, the validators' assignments to parachains are flexible. This means that the architecture
proposed in the next chapters must deal with the variability and multiplicity of the network state.
```dot process
digraph {
@@ -169,7 +231,9 @@ digraph {
}
```
In this example, group 1 has received block C while the others have not due to network asynchrony. Now, a validator from
group 2 may be able to build another block on top of B, called `C'`. Assume that afterwards, some validators become
aware of both C and `C'`, while others remain only aware of one.
```dot process
digraph {
@@ -216,4 +280,7 @@ digraph {
}
```
Those validators that are aware of many competing heads must be aware of the work happening on each one. They may
contribute partially or fully to both. It is possible that due to network asynchrony two forks may grow in parallel for
some time, although in the absence of an adversarial network this is unlikely when some validators are aware of both
chain heads.
@@ -2,20 +2,28 @@
## Motivation
A parachain's validation function is described by a wasm module that we refer to as a PVF. Since a PVF is a wasm module,
the typical way of executing it is to compile it to machine code.
Typically an optimizing compiler consists of algorithms that are able to optimize the resulting machine code heavily.
However, while those algorithms perform quite well for typical wasm code produced by standard toolchains (e.g.
rustc/LLVM), they can be abused to consume a lot of resources. Moreover, since those algorithms are rather complex,
there is a lot of room for bugs that can crash the compiler.
If compilation of a Parachain Validation Function (PVF) takes too long or uses too much memory, this can leave a node in
limbo as to whether a candidate of that parachain is valid or not.
The amount of time that a PVF takes to compile is a subjective resource limit and as such PVFs may be maliciously
crafted so that there is e.g. a 50/50 split of validators which can and cannot compile and execute the PVF.
This has the following implications:
- In backing, inclusion may be slow due to backing groups being unable to execute the block
- In approval checking, there may be many no-shows, leading to slow finality
- In disputes, neither side may reach supermajority. Nobody will get slashed and the chain will not be reverted or
finalized.
As a result of this issue we need a fairly hard guarantee that the PVFs of registered parachains/threads can be compiled
within a reasonable amount of time.
## Solution
@@ -23,9 +31,12 @@ The problem is solved by having a pre-checking process.
### Pre-checking
Pre-checking mostly consists of attempting to prepare (compile) the PVF WASM blob. We use stricter limits (e.g.
timeouts) here than for regular preparation for execution. This way, errors during later preparation are likely
unrelated to the PVF itself, as it has already passed pre-checking. We can treat such errors as local node issues.
We also have an additional step where we attempt to instantiate the WASM runtime without running it. This is unrelated
to preparation so we don't time it, but it does help us catch more issues.
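The split between strict pre-checking limits and regular preparation can be sketched as follows. The timeout values are
hypothetical placeholders chosen only to show that pre-checking is stricter; `PrepareKind`, `Blame`,
`preparation_timeout` and `attribute_failure` are illustrative names, not the node's API.

```rust
use std::time::Duration;

/// Why preparation (compilation) of a PVF was requested.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrepareKind {
    /// Pre-checking a newly registered or upgraded PVF.
    Precheck,
    /// Preparing an artifact because a candidate must be executed.
    Execute,
}

/// Who is most likely at fault when preparation fails.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Blame {
    /// The PVF itself (e.g. too expensive to compile): vote to reject.
    Pvf,
    /// The local node (e.g. overloaded): do not blame the parachain.
    LocalNode,
}

/// Hypothetical timeouts: pre-checking is deliberately stricter than execution.
pub fn preparation_timeout(kind: PrepareKind) -> Duration {
    match kind {
        PrepareKind::Precheck => Duration::from_secs(60),
        PrepareKind::Execute => Duration::from_secs(360),
    }
}

/// A PVF that reaches execution already passed the stricter pre-checking limits,
/// so a later preparation failure most likely points at the local node.
pub fn attribute_failure(kind: PrepareKind) -> Blame {
    match kind {
        PrepareKind::Precheck => Blame::Pvf,
        PrepareKind::Execute => Blame::LocalNode,
    }
}
```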
### Protocol
@@ -34,27 +45,41 @@ Pre-checking is run when a new validation code is included in the chain. A new P
- A new parachain is registered.
- An existing parachain signalled an upgrade of its validation code.
Before any of those operations finish, the PVF pre-checking vote is initiated. The PVF pre-checking vote is identified
by the PVF code hash that is being voted on. If a pre-checking vote for that code hash is already running, no new PVF
pre-checking vote will be started. Instead, the operation just subscribes to the existing vote.
The pre-checking vote can be concluded either by obtaining a threshold of votes for a decision, or if it expires. The
threshold to accept is a supermajority of 2/3 of validators. We reject once a supermajority is no longer possible.
Each validator checks the list of PVFs available for voting. The vote is binary, i.e. accept or reject a given PVF. As
soon as the threshold of votes is collected for one of the sides of the vote, the voting is concluded in that direction
and the effects of the voting are enacted.
Only validators from the active set can participate in the vote. The set of active validators can change each session.
That's why we reset the votes each session. A vote that remains open for a certain number of sessions is rejected.
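A minimal sketch of the vote bookkeeping described above: binary votes, acceptance at a 2/3 supermajority, rejection
once a supermajority is no longer reachable, and a reset at each session boundary with expiry after a number of
sessions. The exact threshold formula (`n - (n - 1) / 3`, which gives `2f + 1` for `n = 3f + 1`), the `max_sessions`
parameter and all names are assumptions for illustration; signature checks and duplicate-vote detection are omitted.

```rust
/// Outcome of a PVF pre-checking vote.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VoteOutcome {
    Pending,
    Accepted,
    Rejected,
}

/// Smallest vote count treated as a 2/3 supermajority in this sketch (assumed formula).
pub fn supermajority_threshold(n: usize) -> usize {
    n - n.saturating_sub(1) / 3
}

/// Hypothetical per-PVF vote state.
#[derive(Debug)]
pub struct PvfVote {
    n_validators: usize,
    ayes: usize,
    nays: usize,
    sessions_observed: u32,
    max_sessions: u32,
}

impl PvfVote {
    pub fn new(n_validators: usize, max_sessions: u32) -> Self {
        Self { n_validators, ayes: 0, nays: 0, sessions_observed: 0, max_sessions }
    }

    /// Record one binary vote and report the resulting state of the vote.
    pub fn vote(&mut self, accept: bool) -> VoteOutcome {
        if accept {
            self.ayes += 1;
        } else {
            self.nays += 1;
        }
        self.outcome()
    }

    pub fn outcome(&self) -> VoteOutcome {
        let threshold = supermajority_threshold(self.n_validators);
        if self.ayes >= threshold {
            VoteOutcome::Accepted
        } else if self.n_validators.saturating_sub(self.nays) < threshold {
            // Even if every remaining validator accepted, the threshold is unreachable.
            VoteOutcome::Rejected
        } else {
            VoteOutcome::Pending
        }
    }

    /// The active set may change, so votes are reset; after `max_sessions` the vote expires.
    pub fn on_new_session(&mut self, n_validators: usize) -> VoteOutcome {
        self.sessions_observed += 1;
        if self.sessions_observed >= self.max_sessions {
            return VoteOutcome::Rejected;
        }
        self.n_validators = n_validators;
        self.ayes = 0;
        self.nays = 0;
        VoteOutcome::Pending
    }
}
```

With 4 validators the threshold is 3: three accepting votes conclude the vote, while two rejecting votes already make a
supermajority impossible.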
The effects of accepting a PVF depend on the operations that requested it:
1. All onboardings subscribed to the approved PVF pre-checking process will get scheduled and after passing 2 session
boundaries they will be onboarded.
1. All upgrades subscribed to the approved PVF pre-checking process will get scheduled very similarly to the existing
process. Upgrades with pre-checking are really the same process that is just delayed by the time required for
pre-checking voting. In case of instant approval the mechanism is exactly the same.
If the PVF pre-checking process concludes with rejection, all the operations subscribed to the rejected PVF
pre-checking process are cancelled, i.e. the onboarding or upgrade does not take place.
The logic described above is implemented by the [paras] module.
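The subscription model can be illustrated with a small registry keyed by code hash. This is a hypothetical sketch, not
the paras module's code: `PrecheckRegistry`, `Operation` and the `u32` para identifiers are assumptions. A request for a
hash with no running vote starts one; later requests for the same hash only subscribe, and concluding the vote hands
every subscriber back to be either scheduled or cancelled.

```rust
use std::collections::HashMap;

type CodeHash = [u8; 32];

/// Hypothetical operations that can wait on a pre-checking vote (para id as `u32`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Operation {
    Onboard(u32),
    Upgrade(u32),
}

#[derive(Default)]
pub struct PrecheckRegistry {
    subscribers: HashMap<CodeHash, Vec<Operation>>,
}

impl PrecheckRegistry {
    /// Returns `true` if this request started a new vote, `false` if it subscribed to a running one.
    pub fn request(&mut self, code_hash: CodeHash, op: Operation) -> bool {
        let is_new = !self.subscribers.contains_key(&code_hash);
        self.subscribers.entry(code_hash).or_default().push(op);
        is_new
    }

    /// Conclude the vote and return `(scheduled, cancelled)` operations.
    pub fn conclude(&mut self, code_hash: &CodeHash, accepted: bool) -> (Vec<Operation>, Vec<Operation>) {
        let ops = self.subscribers.remove(code_hash).unwrap_or_default();
        if accepted {
            (ops, Vec::new())
        } else {
            (Vec::new(), ops)
        }
    }
}
```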
### Subsystem
On the node-side, there is a PVF pre-checking [subsystem][pvf-prechecker-subsystem] that scans the chain for new PVFs
using [runtime APIs][pvf-runtime-api]. Upon finding a new PVF, the subsystem will initiate a PVF pre-checking request
and wait for the result. Once the result is obtained, the subsystem will use the [runtime API][pvf-runtime-api] to
submit a vote for the PVF. The vote is an unsigned transaction. The vote will be distributed via gossip, similarly to a
normal transaction. Eventually a block producer will include the vote into the block where it will be handled by the
[runtime][paras].
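A minimal sketch of the node-side loop, assuming a hypothetical `PrecheckRuntimeApi` trait that stands in for the real
runtime API calls; the trait, its method names and the `Prechecker` type are illustrative. The subsystem remembers which
PVFs it has already judged so that each one is pre-checked and voted on once. Because votes are reset each session, a
real subsystem would presumably also need to re-submit its votes after a session change; that is omitted here.

```rust
use std::collections::HashSet;

type CodeHash = [u8; 32];

/// Hypothetical abstraction over the runtime API used by the subsystem.
pub trait PrecheckRuntimeApi {
    /// PVFs currently awaiting pre-checking votes.
    fn pvfs_awaiting_vote(&self) -> Vec<CodeHash>;
    /// Submit this validator's vote as an unsigned transaction.
    fn submit_vote(&mut self, code_hash: CodeHash, accept: bool);
}

/// Hypothetical pre-checker state; `precheck` stands in for the actual preparation attempt.
pub struct Prechecker<F: Fn(&CodeHash) -> bool> {
    voted: HashSet<CodeHash>,
    precheck: F,
}

impl<F: Fn(&CodeHash) -> bool> Prechecker<F> {
    pub fn new(precheck: F) -> Self {
        Self { voted: HashSet::new(), precheck }
    }

    /// Called on each new relay-chain head.
    pub fn on_new_head<A: PrecheckRuntimeApi>(&mut self, api: &mut A) {
        for code_hash in api.pvfs_awaiting_vote() {
            // `insert` returns false if this PVF was already judged.
            if self.voted.insert(code_hash) {
                let accept = (self.precheck)(&code_hash);
                api.submit_vote(code_hash, accept);
            }
        }
    }
}
```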
## Summary
@@ -62,11 +87,15 @@ Parachains' validation function is described by a wasm module that we refer to a
In order to make the PVF usable for candidate validation it has to be registered on-chain.
As part of the registration process, it has to go through pre-checking. Pre-checking is a game of attempting preparation
and additional checks, and reporting the results back on-chain.
We define preparation as a process that validates the consistency of the wasm binary (aka prevalidation) and compiles
the wasm module into machine code (referred to as an artifact).
Besides pre-checking, preparation can also be triggered by execution, since a compiled artifact is needed for the
execution. If an artifact already exists, execution will skip preparation. If execution does need to prepare, it uses a
more lenient timeout than pre-checking, to avoid the situation where honest validators fail on valid, pre-checked PVFs.
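The rule that execution skips preparation when an artifact already exists amounts to a cache keyed by code hash. In
this sketch the `prepare` closure stands in for the actual compilation (which, when triggered by execution, would run
under the more lenient timeout); `ArtifactCache`, `Artifact` and the `preparations` counter are hypothetical names.

```rust
use std::collections::HashMap;

type CodeHash = [u8; 32];

/// Hypothetical compiled artifact (machine code) for a PVF.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Artifact(pub Vec<u8>);

#[derive(Default)]
pub struct ArtifactCache {
    artifacts: HashMap<CodeHash, Artifact>,
    /// How many times preparation actually ran; cache hits do not increment it.
    pub preparations: usize,
}

impl ArtifactCache {
    /// Return the artifact for `code_hash`, running `prepare` only if none is cached.
    pub fn get_or_prepare<P>(&mut self, code_hash: CodeHash, prepare: P) -> Result<Artifact, String>
    where
        P: FnOnce() -> Result<Artifact, String>,
    {
        if let Some(existing) = self.artifacts.get(&code_hash) {
            return Ok(existing.clone());
        }
        self.preparations += 1;
        let artifact = prepare()?;
        self.artifacts.insert(code_hash, artifact.clone());
        Ok(artifact)
    }
}
```

A failed preparation is not cached in this sketch, so the next execution request would try again.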
[paras]: runtime/paras.md
[pvf-runtime-api]: runtime-api/pvf-prechecking.md
@@ -2,9 +2,16 @@
Runtime APIs are the means by which the node-side code extracts information from the state of the runtime.
Every block in the relay-chain contains a *state root* which is the root hash of a state trie encapsulating all storage
of runtime modules after execution of the block. This is a cryptographic commitment to a unique state. We use the
terminology of accessing the *state at* a block to refer to accessing the state referred to by the state root of that
block.
Although Runtime APIs are often used for simple storage access, they are actually empowered to do arbitrary computation.
The implementation of the Runtime APIs lives within the Runtime as Wasm code and exposes `extern` functions that can be
invoked with arguments and have a return value. Runtime APIs have access to a variety of host functions, which are
contextual functions provided by the Wasm execution context that allow them to carry out many different types of
behaviors.
Abilities provided by host functions include:
@@ -14,16 +21,25 @@ Abilities provided by host functions includes:
* Optimized versions of cryptographic functions
* More
So it is clear that Runtime APIs are a versatile and powerful tool to leverage the state of the chain. In general, we
will use Runtime APIs for these purposes:
* Access of a storage item
* Access of a bundle of related storage items
* Deriving a value from storage based on arguments
* Submitting misbehavior reports
More broadly, we have the goal of using Runtime APIs to write Node-side code that fulfills the requirements set by the
Runtime, in particular the constraints set forth by the [Scheduler](../runtime/scheduler.md) and
[Inclusion](../runtime/inclusion.md) modules. These modules are responsible for advancing paras with a two-phase
protocol where validators are first chosen to validate and back a candidate and then required to ensure availability of
referenced data. In the second phase, validators are meant to attest to those para-candidates for which they hold their
availability chunk. As the Node-side code needs to generate the inputs into these two phases, the runtime API needs to
transmit information from the runtime that is aware of the Availability Cores model instantiated by the Scheduler and
Inclusion modules.
Node-side code is also responsible for detecting and reporting misbehavior performed by other validators, and the set of
Runtime APIs needs to provide methods for observing live disputes and submitting reports as transactions.
The next sections will contain information on specific runtime APIs. The format is this:
@@ -38,9 +54,16 @@ The next sections will contain information on specific runtime APIs. The format
fn some_runtime_api(at: Block, arg1: Type1, arg2: Type2, ...) -> ReturnValue;
```
Certain runtime APIs concerning the state of a para require the caller to provide an `OccupiedCoreAssumption`. This
indicates how the result of the runtime API should be computed if there is a candidate from the para occupying an
availability core in the [Inclusion Module](../runtime/inclusion.md).
The choices of assumption are whether the candidate occupying that core should be assumed to have been made available
and included or timed out and discarded, along with a third option to assert that the core was not occupied. This choice
affects the parent head-data, the validation code, and the state of message queues. Typically, users will assume that
either the core was free or that the occupying candidate was included, as timeouts are expected only in adversarial
circumstances and, even then, only in a small minority of blocks directly following validator set rotations.
```rust
/// An assumption being made about the state of an occupied core.
@@ -52,4 +75,4 @@ enum OccupiedCoreAssumption {
/// The core was not occupied to begin with.
Free,
}
```
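A minimal sketch of how the three assumptions change the parent head-data a caller sees. It is illustrative only: the
variant names `Included` and `TimedOut` are assumed (the hunk above only shows `Free`), head-data is modelled as raw
bytes, and `parent_head_for` is a hypothetical helper. It follows the rule documented for `persisted_validation_data`:
asserting `Free` for a para that occupies a core yields `None`.

```rust
/// An assumption being made about the state of an occupied core (variant names assumed for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OccupiedCoreAssumption {
    /// The candidate occupying the core was made available and included.
    Included,
    /// The candidate occupying the core timed out and was discarded.
    TimedOut,
    /// The core was not occupied to begin with.
    Free,
}

/// Hypothetical helper: the parent head-data a new candidate for the para should build on.
///
/// `current_head` is the para head recorded on-chain; `pending_head` is the output head of the
/// candidate occupying the para's core, if there is one.
pub fn parent_head_for(
    current_head: &[u8],
    pending_head: Option<&[u8]>,
    assumption: OccupiedCoreAssumption,
) -> Option<Vec<u8>> {
    match (assumption, pending_head) {
        // Assume the pending candidate was included: build on its output head.
        (OccupiedCoreAssumption::Included, Some(head)) => Some(head.to_vec()),
        // Assume the pending candidate timed out: it is discarded, so the on-chain head stands.
        (OccupiedCoreAssumption::TimedOut, Some(_)) => Some(current_head.to_vec()),
        // Asserting the core was free is wrong when a candidate occupies it.
        (OccupiedCoreAssumption::Free, Some(_)) => None,
        // No occupying candidate: every assumption resolves to the on-chain head.
        (_, None) => Some(current_head.to_vec()),
    }
}
```

Modelling the result as an `Option` lets a wrong `Free` assertion surface as "no answer" rather than stale data.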
@@ -1,14 +1,26 @@
# Availability Cores
Yields information on all availability cores. Cores are either free or occupied. Free cores can have paras assigned to
them. Occupied cores don't, but they can become available part-way through a block due to bitfields and then have
something scheduled on them. To allow optimistic validation of candidates, the occupied cores are accompanied by
information on what is upcoming. This information can be leveraged when validators perceive that there is a high
likelihood of a core becoming available based on bitfields seen, and then optimistically validate something that would
become scheduled based on that, although there is no guarantee on what the block producer will actually include in the
block.
See also the [Scheduler Module](../runtime/scheduler.md) for a high-level description of what an availability core is
and why it exists.
```rust
fn availability_cores(at: Block) -> Vec<CoreState>;
```
This is all the information that a validator needs about scheduling for the current block. It includes all information
on [Scheduler](../runtime/scheduler.md) core-assignments and [Inclusion](../runtime/inclusion.md) state of blocks
occupying availability cores. It includes data necessary to determine not only which paras are assigned now, but which
cores are likely to become freed after processing bitfields, and exactly which bitfields would be necessary to make them
so. The implementation of this runtime API should invoke `Scheduler::clear` and `Scheduler::schedule(Vec::new(),
current_block_number + 1)` to ensure that scheduling is accurate.
```rust
struct OccupiedCore {
@@ -1,6 +1,7 @@
# Candidate Pending Availability
Get the receipt of a candidate pending availability. This returns `Some` for any paras assigned to occupied cores in
`availability_cores` and `None` otherwise.
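The relationship between this API and `availability_cores` can be sketched as a lookup over the core states. The
`CoreState` type here is a deliberately simplified, hypothetical stand-in (receipts are `String`s, paras are `u32` ids),
and the sketch takes the core list directly instead of a block.

```rust
/// Simplified, hypothetical model of an availability core for this sketch.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum CoreState {
    /// A candidate for `para_id` is pending availability on this core.
    Occupied { para_id: u32, receipt: String },
    /// The core is scheduled for `para_id` but holds no candidate yet.
    Scheduled { para_id: u32 },
    /// Nothing is assigned to the core.
    Free,
}

/// Returns the receipt pending availability for `para_id`, or `None` if the para occupies no core.
pub fn candidate_pending_availability(cores: &[CoreState], para_id: u32) -> Option<String> {
    cores.iter().find_map(|core| match core {
        CoreState::Occupied { para_id: p, receipt } if *p == para_id => Some(receipt.clone()),
        _ => None,
    })
}
```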
```rust
fn candidate_pending_availability(at: Block, ParaId) -> Option<CommittedCandidateReceipt>;
@@ -1,6 +1,9 @@
# Disputes Info
Get information about all disputes known by the chain as well as information about which validators the disputes
subsystem will accept disputes from. These disputes may be either live or concluded. The
[`DisputeState`](../types/disputes.md#disputestate) can be used to determine whether the dispute still accepts votes, as
well as which validators' votes may be included.
```rust
struct Dispute {
@@ -1,6 +1,8 @@
# Persisted Validation Data
Yields the [`PersistedValidationData`](../types/candidate.md#persistedvalidationdata) for the given
[`ParaId`](../types/candidate.md#paraid) along with an assumption that should be used if the para currently occupies a
core:
```rust
/// Returns the persisted validation data for the given para and occupied core assumption.
@@ -8,4 +10,4 @@ Yields the [`PersistedValidationData`](../types/candidate.md#persistedvalidation
/// Returns `None` if either the para is not registered or the assumption is `Free`
/// and the para already occupies a core.
fn persisted_validation_data(at: Block, ParaId, OccupiedCoreAssumption) -> Option<PersistedValidationData>;
```
@@ -4,18 +4,18 @@
There are two main runtime APIs to work with PVF pre-checking.
The first runtime API is designed to fetch all PVFs that require pre-checking voting. The PVFs are identified by their
code hashes. As soon as a PVF gains the required support, the runtime API will no longer return it.
```rust
fn pvfs_require_precheck() -> Vec<ValidationCodeHash>;
```
The second runtime API is needed to submit the judgement for a PVF, whether it is approved or not. The voting process
uses unsigned transactions. The [`PvfCheckStatement`](../types/pvf-prechecking.md) is circulated through the network via
gossip, similarly to a normal transaction. At some point a validator will include the statement in a block, where it
will be processed by the runtime. If that was the last vote before gaining the super-majority, this PVF will not be
returned by `pvfs_require_precheck` anymore.
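The vote lifecycle described above can be modelled as a tally per code hash. This is a simplified sketch, not the
runtime's logic: signatures and session checks are omitted, `CodeHash` stands in for `ValidationCodeHash`, and the
threshold for a verdict (for either side) is assumed to be strictly more than two-thirds of validators. A PVF drops out
of `pvfs_require_precheck` as soon as a verdict is reached.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `ValidationCodeHash`.
pub type CodeHash = [u8; 32];

/// Simplified model of PVF pre-checking votes, for illustration only.
pub struct PvfVoting {
    n_validators: usize,
    /// For each PVF still under vote: validator index -> accept (true) / reject (false).
    votes: BTreeMap<CodeHash, BTreeMap<usize, bool>>,
}

impl PvfVoting {
    pub fn new(n_validators: usize, pending: &[CodeHash]) -> Self {
        let votes = pending.iter().map(|hash| (*hash, BTreeMap::new())).collect();
        Self { n_validators, votes }
    }

    /// Assumed threshold: strictly more than two-thirds of validators.
    fn supermajority(&self) -> usize {
        self.n_validators - self.n_validators.saturating_sub(1) / 3
    }

    /// Mirrors `pvfs_require_precheck`: PVFs that have not reached a verdict yet.
    pub fn pvfs_require_precheck(&self) -> Vec<CodeHash> {
        self.votes.keys().copied().collect()
    }

    /// Records one validator's statement. Returns `Some(accepted)` once a side reaches the
    /// supermajority (the PVF is then removed), and `None` while undecided or for unknown PVFs.
    pub fn submit(&mut self, code: CodeHash, validator: usize, accept: bool) -> Option<bool> {
        let threshold = self.supermajority();
        let ballot = self.votes.get_mut(&code)?;
        ballot.insert(validator, accept);
        let ayes = ballot.values().filter(|vote| **vote).count();
        let nays = ballot.len() - ayes;
        let verdict = if ayes >= threshold {
            Some(true)
        } else if nays >= threshold {
            Some(false)
        } else {
            None
        };
        if verdict.is_some() {
            self.votes.remove(&code);
        }
        verdict
    }
}
```

Keying votes by validator index means a repeated statement from the same validator overwrites rather than double-counts.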
```rust
fn submit_pvf_check_statement(stmt: PvfCheckStatement, signature: ValidatorSignature);
@@ -2,7 +2,8 @@
Get the session index that is expected at the child of a block.
In the [`Initializer`](../runtime/initializer.md) module, session changes are buffered by one block. The session index
of the child of any relay block is always predictable by that block's state.
This session index can be used to derive a [`SigningContext`](../types/candidate.md#signing-context).
@@ -1,6 +1,7 @@
# Validator Groups
Yields the validator groups used during the current session. The validators in the groups are referred to by their index
into the validator-set, and this is assumed to be as of the child of the block whose state is being queried.
```rust
/// A helper data-type for tracking validator-group rotations.
@@ -1,6 +1,7 @@
# Validators
Yields the validator-set at the state of a given block. This validator set is always the one responsible for backing
parachains in the child of the provided block.
```rust
fn validators(at: Block) -> Vec<ValidatorId>;
@@ -1,42 +1,83 @@
# Runtime Architecture
It's clear that we want to separate different aspects of the runtime logic into different modules. Modules define their
own storage, routines, and entry-points. They also define initialization and finalization logic.
Due to the (lack of) guarantees provided by a particular blockchain-runtime framework, there is no defined or dependable
order in which modules' initialization or finalization logic will run. Supporting this blockchain-runtime framework is
important enough to include that same uncertainty in our model of runtime modules in this guide. Furthermore,
initialization logic of modules can trigger the entry-points or routines of other modules. This is one architectural
pressure against dividing the runtime logic into multiple modules. However, in this case the benefits of splitting
things up outweigh the costs, provided that we take certain precautions against initialization and entry-point races.
We also expect, although it's beyond the scope of this guide, that these runtime modules will exist alongside various
other modules. This has two facets to consider. First, even if the modules that we describe here don't invoke each
other's entry points or routines during initialization, we still have to protect against those other modules doing that.
Second, some of those modules are expected to provide governance capabilities for the chain. Configuration exposed by
parachain-host modules is mostly for the benefit of these governance modules, to allow the operators or community of the
chain to tweak parameters.
The runtime's primary role is to manage scheduling and updating of parachains, as well as handling misbehavior reports
and slashing. This guide doesn't focus on how parachains are registered, only that they are. Also, this runtime
description assumes that validator sets are selected somehow, but doesn't assume any details other than a periodic
_session change_ event. Session changes give information about the incoming validator set and the validator set of the
following session.
The runtime also serves another role, which is to make data available to the Node-side logic via Runtime APIs. These
Runtime APIs should be sufficient for the Node-side code to author blocks correctly.
There is some functionality of the relay chain relating to parachains that we also consider beyond the scope of this
document. In particular, all modules related to how parachains are registered aren't part of this guide, although we do
provide routines that should be called by the registration process.
We will split the logic of the runtime up into these modules:
- Initializer: manages initialization order of the other modules.
- Shared: manages shared storage and configurations for other modules.
- Configuration: manages configuration and configuration updates in a non-racy manner.
- Paras: manages chain-head and validation code for parachains.
- Scheduler: manages parachain scheduling as well as validator assignments.
- Inclusion: handles the inclusion and availability of scheduled parachains.
- SessionInfo: manages various session keys of validators and other params stored per session.
- Disputes: handles dispute resolution for included, available parablocks.
- Slashing: handles slashing logic for concluded disputes.
- HRMP: handles horizontal messages between paras.
- UMP: handles upward messages from a para to the relay chain.
- DMP: handles downward messages from the relay chain to the para.
The [Initializer module](initializer.md) is special - it's responsible for handling the initialization logic of the
other modules to ensure that the correct initialization order and related invariants are maintained. The other modules
won't specify any on-initialize logic, but will instead expose a special semi-private routine that the initialization
module will call. The other modules are relatively straightforward and perform the roles described above.
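The pattern of exposing a routine for the initializer, rather than an on-initialize hook, can be sketched as follows.
The `HostModule` trait, `initializer_initialize`, and `initialize_all` are hypothetical names chosen for this
illustration, not the runtime's API; the point is only that the call order is decided by the initializer.

```rust
/// Hypothetical interface for this sketch: instead of its own on-initialize hook, each
/// parachain-host module exposes a semi-private routine that only the initializer calls.
pub trait HostModule {
    fn name(&self) -> &'static str;
    fn initializer_initialize(&mut self, now: u32);
}

/// Toy module that records the block numbers at which it was initialized.
pub struct Recorder {
    pub name: &'static str,
    pub initialized_at: Vec<u32>,
}

impl HostModule for Recorder {
    fn name(&self) -> &'static str {
        self.name
    }

    fn initializer_initialize(&mut self, now: u32) {
        self.initialized_at.push(now);
    }
}

/// The initializer alone fixes the order: modules are initialized exactly in slice order,
/// regardless of the order in which the framework would have run their own hooks.
pub fn initialize_all(modules: &mut [&mut dyn HostModule], now: u32) -> Vec<&'static str> {
    let mut order = Vec::with_capacity(modules.len());
    for module in modules.iter_mut() {
        module.initializer_initialize(now);
        order.push(module.name());
    }
    order
}
```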
The Parachain Host operates under a changing set of validators. Time is split up into periodic sessions, where each
session brings a potentially new set of validators. Sessions are buffered by one, meaning that the validators of the
upcoming session `n+1` are determined at the end of session `n-1`, right before session `n` starts. Parachain Host
runtime modules need to react to changes in the validator set, as it will affect the runtime logic for processing
candidate backing, availability bitfields, and misbehavior reports. The Parachain Host modules can't determine
ahead-of-time exactly when session change notifications are going to happen within the block (note: this depends on
module initialization order again - better to put session before parachains modules).
The relay chain is intended to use BABE or SASSAFRAS, which both have the property that a session changing at a block is
determined not by the number of the block but instead by the time the block is authored. In some sense, sessions change
in-between blocks, not at blocks. This has the side effect that the session of a child block cannot be determined solely
by the parent block's identifier. Being able to unilaterally determine the validator-set at a specific block based on
its parent hash would make a lot of Node-side logic much simpler.
In order to regain the property that the validator set of a block is predictable by its parent block, we delay session
changes' application to Parachains by 1 block. This means that if there is a session change at block X, that session
change will be stored and applied during initialization of direct descendants of X. The principal side effect of this
change is that the Parachains runtime can disagree with session or consensus modules about which session it currently
is. Misbehavior reporting routines in particular will be affected by this, although not severely. The parachains runtime
might believe it is the last block of the session while the system is really in the first block of the next session. In
such cases, a historical validator-set membership proof will need to accompany any misbehavior report, although such
proofs are typically not needed for current-session misbehavior reports.
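The one-block delay can be modelled with a small state machine. This is a toy sketch under stated assumptions: session
indices are plain `u32`s, `on_new_session` stands in for the notification from the session/consensus modules, and the
names `Initializer`, `session_index_for_child`, and `initialize_block` are illustrative. It shows why the child's
session is predictable from the parent's state while the parachains runtime still lags by one block.

```rust
/// Toy model of the initializer's one-block delay for session changes (illustration only).
#[derive(Default)]
pub struct Initializer {
    /// Session index the parachains runtime currently believes it is in.
    pub current_session: u32,
    /// Session change observed during this block, to be applied by the child block.
    buffered: Option<u32>,
}

impl Initializer {
    /// Called when the session/consensus modules report a new session during this block.
    pub fn on_new_session(&mut self, new_session: u32) {
        self.buffered = Some(new_session);
    }

    /// Session the child of this block will use; fully determined by this block's state.
    pub fn session_index_for_child(&self) -> u32 {
        self.buffered.unwrap_or(self.current_session)
    }

    /// Run while initializing the next block: apply any buffered session change.
    pub fn initialize_block(&mut self) {
        if let Some(session) = self.buffered.take() {
            self.current_session = session;
        }
    }
}
```

In block X the runtime still reports the old session (the disagreement described above), yet the child's session is
already known from X's state.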
So the other role of the initializer module is to forward session change notifications to modules in the initialization
order. Session change is also the point at which the [Configuration Module](configuration.md) updates the configuration.
Most of the other modules will handle changes in the configuration during their session change operation, so the
initializer should provide both the old and new configuration to all the other modules alongside the session change
notification. This means that a session change notification should consist of the following data:
```rust
struct SessionChangeNotification {
@@ -1,6 +1,11 @@
# Configuration Pallet
This module is responsible for managing all configuration of the parachain host in-flight. It provides a central point
for configuration updates to prevent races between configuration changes and parachain-processing logic. Configuration
can only change during the session change routine, and as this module handles the session change notification first, it
provides an invariant that the configuration does not change throughout the entire session. Both the
[scheduler](scheduler.md) and [inclusion](inclusion.md) modules rely on this invariant to ensure proper behavior of the
scheduler.
The configuration that we will be tracking is the [`HostConfiguration`](../types/runtime.md#host-configuration) struct.
@@ -23,7 +28,8 @@ The session change routine works as follows:
- If there are no pending configurations, then return early.
- Take all pending configurations that are less than or equal to the current session index.
- Get the pending configuration with the highest session index and apply it to the current configuration. Discard the
earlier ones if any.
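The session change routine above can be sketched with an ordered map from session index to pending configuration.
`HostConfiguration` is reduced to a single hypothetical field here, and `apply_pending` is an illustrative name; the
sketch only models the three listed steps.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified host configuration for this sketch.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct HostConfiguration {
    pub max_code_size: u32,
}

/// Session change routine: `pending` maps a session index to the configuration scheduled for it;
/// entries at or before `current_session` are consumed.
pub fn apply_pending(
    active: &mut HostConfiguration,
    pending: &mut BTreeMap<u32, HostConfiguration>,
    current_session: u32,
) {
    // No pending configurations: return early.
    if pending.is_empty() {
        return;
    }
    // Keep everything scheduled after the current session; what remains in `due` is due now.
    let future = pending.split_off(&(current_session + 1));
    let due = std::mem::replace(pending, future);
    // Apply the due configuration with the highest session index; the earlier ones are dropped.
    if let Some((_, config)) = due.into_iter().next_back() {
        *active = config;
    }
}
```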
## Routines
@@ -41,17 +47,17 @@ pub fn configuration() -> HostConfiguration {
Configuration::get()
}
/// Schedules updating the host configuration. The update is given by the `updater` closure. The
/// closure takes the current version of the configuration and returns the new version.
/// Returns an `Err` if the closure returns a broken configuration. However, there are a couple of
/// exceptions:
///
/// - if the configuration that was passed in the closure is already broken, then it will pass the
/// update: you cannot break something that is already broken.
/// - If the `BypassConsistencyCheck` flag is set, then the checks will be skipped.
///
/// The changes made by this function will always be scheduled at session X, where X is the current session index + 2.
/// If there is already a pending update for X, then the closure will receive the already pending configuration for
/// session X.
///
/// If there is already a pending update for the current session index + 1, then it won't be touched. Otherwise,
@@ -61,4 +67,6 @@ fn schedule_config_update(updater: impl FnOnce(&mut HostConfiguration<BlockNumbe
## Entry-points
The Configuration module exposes an entry point for each configuration member. These entry-points accept calls only from
governance origins. These entry-points will use the `update_configuration` routine to update the specific configuration
field.
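The scheduling rules from the `schedule_config_update` comments, combined with a per-field entry point, can be sketched
as follows. Everything here is a simplified model under assumptions: `HostConfiguration` has a single field, the
consistency check is invented for the example, governance is reduced to a boolean, and because the source elides what
happens when no update is pending for session X, the sketch assumes the most recent earlier pending configuration (or
else the active one) is the starting point. `set_max_code_size` is a hypothetical entry point.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct HostConfiguration {
    pub max_code_size: u32,
}

impl HostConfiguration {
    /// Hypothetical consistency check for this sketch.
    fn is_consistent(&self) -> bool {
        self.max_code_size > 0
    }
}

pub struct Configuration {
    pub current_session: u32,
    pub active: HostConfiguration,
    pub pending: BTreeMap<u32, HostConfiguration>,
    pub bypass_consistency_check: bool,
}

impl Configuration {
    pub fn schedule_config_update(
        &mut self,
        updater: impl FnOnce(&mut HostConfiguration),
    ) -> Result<(), &'static str> {
        // Updates always land at session X = current session index + 2.
        let target = self.current_session + 2;
        // Start from the update pending for X, else the latest earlier pending one, else the active one.
        let base = self
            .pending
            .range(..=target)
            .next_back()
            .map(|(_, config)| config.clone())
            .unwrap_or_else(|| self.active.clone());
        let mut updated = base.clone();
        updater(&mut updated);
        // A base that is already broken may pass; otherwise reject broken results unless bypassed.
        if !self.bypass_consistency_check && base.is_consistent() && !updated.is_consistent() {
            return Err("update would break the configuration");
        }
        // Only the entry for X is written; an entry for the current session index + 1 is untouched.
        self.pending.insert(target, updated);
        Ok(())
    }

    /// Hypothetical per-field entry point; only governance origins are accepted.
    pub fn set_max_code_size(&mut self, is_governance: bool, new: u32) -> Result<(), &'static str> {
        if !is_governance {
            return Err("bad origin");
        }
        self.schedule_config_update(|config| config.max_code_size = new)
    }
}
```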
@@ -1,29 +1,47 @@
# Disputes Pallet
After a backed candidate is made available, it is included and proceeds into an acceptance period during which
validators are randomly selected to do (secondary) approval checks of the parablock. Any reports disputing the validity
of the candidate will cause escalation, where even more validators are requested to check the block, and so on, until
either the parablock is determined to be invalid or valid. Those on the wrong side of the dispute are slashed and, if
the parablock is deemed invalid, the relay chain is rolled back to a point before that block was included.
However, this isn't the end of the story. We are working in a forkful blockchain environment, which carries three
important considerations:
1. For security, validators that misbehave shouldn't only be slashed on one fork, but on all possible forks. Validators
that misbehave shouldn't be able to create a new fork of the chain when caught and get away with their misbehavior.
1. It is possible (and likely) that the parablock being contested has not appeared on all forks.
1. If a block author believes that there is a disputed parablock on a specific fork that will resolve to a reversion of
the fork, that block author has more incentive to build on a different fork which does not include that parablock.
This means that disputes may well be started on one fork of the relay chain, and as soon as the dispute resolution
process starts to indicate that the parablock is indeed invalid, that fork of the relay chain will be abandoned and the
dispute will never be fully resolved on that chain.
Even if this doesn't happen, there is the possibility that there are two disputes underway, and one resolves leading to
a reversion of the chain before the other has concluded. In this case, we want to transplant both the concluded and the
unconcluded dispute onto other forks of the chain.
We account for these requirements by having the disputes module handle two kinds of disputes.
1. Local disputes: those contesting the validity of the current fork by disputing a parablock included within it.
1. Remote disputes: a dispute that has partially or fully resolved on another fork which is transplanted to the local
fork for completion and eventual slashing.
When a local dispute concludes negatively, the chain needs to be abandoned and reverted to a block where the state
does not contain the bad parablock. We expect that due to the [Approval Checking Protocol](../protocol-approval.md), the
currently executing block should not be finalized. So we do two things when a local dispute concludes negatively:
1. Freeze the state of parachains so nothing further is backed or included.
1. Issue a digest in the header of the block that signals to nodes that this branch of the chain is to be abandoned.
If, as is expected, the chain is unfinalized, the freeze will have no effect as no honest validator will attempt to
build on the frozen chain. However, if the approval checking protocol has failed and the bad parablock is finalized, the
freeze serves to put the chain into a governance-only mode.
The storage of this module is designed around tracking [`DisputeState`s](../types/disputes.md#disputestate), updating
them with votes, and tracking blocks included by this branch of the relay chain. It also contains a `Frozen` parameter
designed to freeze the state of all parachains.
## Storage
@@ -44,15 +62,18 @@ Included: double_map (SessionIndex, CandidateHash) -> Option<BlockNumber>,
Frozen: Option<BlockNumber>,
```
> `byzantine_threshold` refers to the maximum number `f` of validators which may be byzantine. The total number of
> validators is `n = 3f + e` where `e in { 1, 2, 3 }`.
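Inverting `n = 3f + e` with `e` in `{ 1, 2, 3 }` gives `f = (n - 1) / 3` under integer division. The Rust sketch below shows this. The function names are hypothetical. Treating a supermajority as `n - f` votes is an assumption made for illustration, because this section does not define it.

```rust
/// Maximum number `f` of Byzantine validators among `n`, from `n = 3f + e`, `e` in {1, 2, 3}.
pub fn byzantine_threshold(n: usize) -> usize {
    // `saturating_sub` keeps `n = 0` from underflowing.
    n.saturating_sub(1) / 3
}

/// Assumed supermajority size: every validator except the `f` possibly Byzantine ones.
pub fn supermajority_threshold(n: usize) -> usize {
    n - byzantine_threshold(n)
}
```

For example, `n = 10 = 3 * 3 + 1` gives `f = 3`, and `n = 12 = 3 * 3 + 3` still gives `f = 3`.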
## Session Change
1. If the current session is not greater than `config.dispute_period + 1`, nothing to do here.
1. Set `pruning_target = current_session - config.dispute_period - 1`. We add the extra `1` because we want to keep
things for `config.dispute_period` _full_ sessions: data from the end of the most recent session has been around for
just over 0 sessions, not just over 1.
1. If `LastPrunedSession` is `None`, then set `LastPrunedSession` to `Some(pruning_target)` and return.
1. Otherwise, clear out all disputes and included candidates entries in the range `last_pruned..=pruning_target` and set
`LastPrunedSession` to `Some(pruning_target)`.
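The session-change pruning steps reduce to a small pure function. Below is a minimal Rust sketch. The name `prune_on_session_change` is hypothetical, and it is an illustrative choice to return the new `LastPrunedSession` value together with the range of sessions to clear.

```rust
use std::ops::RangeInclusive;

/// Hypothetical helper: returns (new `LastPrunedSession`, sessions to clear, if any).
pub fn prune_on_session_change(
    current_session: u32,
    dispute_period: u32,
    last_pruned: Option<u32>,
) -> (Option<u32>, Option<RangeInclusive<u32>>) {
    // Nothing to do until the current session exceeds `dispute_period + 1`.
    if current_session <= dispute_period.saturating_add(1) {
        return (last_pruned, None);
    }
    // The extra `1` keeps `dispute_period` full sessions of data around.
    let pruning_target = current_session - dispute_period - 1;
    match last_pruned {
        // First pruning: only record the target.
        None => (Some(pruning_target), None),
        // Otherwise clear `last_pruned..=pruning_target` and advance the marker.
        Some(last) => (Some(pruning_target), Some(last..=pruning_target)),
    }
}
```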
## Block Initialization
@@ -61,11 +82,10 @@ This is currently a `no op`.
## Routines
* `filter_multi_dispute_data(MultiDisputeStatementSet) -> MultiDisputeStatementSet`:
1. Takes a `MultiDisputeStatementSet` and filters it down to a `MultiDisputeStatementSet` that satisfies all the
criteria of `provide_multi_dispute_data`. That is, eliminating ancient votes, duplicates and unconfirmed disputes.
This can be used by block authors to create the final submission in a block which is guaranteed to pass the
`provide_multi_dispute_data` checks.
* `provide_multi_dispute_data(MultiDisputeStatementSet) -> Vec<(SessionIndex, Hash)>`:
1. Pass on each dispute statement set to `provide_dispute_data`, propagating failure.
@@ -76,46 +96,55 @@ This is currently a `no op`.
1. `SessionInfo` is used to check statement signatures and this function should fail if any signatures are invalid.
1. If there is no dispute under `Disputes`, create a new `DisputeState` with blank bitfields.
1. If `concluded_at` is `Some` and `concluded_at + config.post_conclusion_acceptance_period < now`, return false.
1. Import all statements into the dispute. This should fail if any statements are duplicate or if the corresponding
bit for the corresponding validator is set in the dispute already.
1. If `concluded_at` is `None`, reward all statements.
1. If `concluded_at` is `Some`, reward all statements slightly less.
1. If either side now has a supermajority and did not previously, slash the other side. This may be both sides, and we
support this possibility in code, but note that this requires validators to participate on both sides, which has
negative expected value. Set `concluded_at` to `Some(now)` if it was `None`.
1. If just concluded against the candidate and the `Included` map contains `(session, candidate)`: invoke
`revert_and_freeze` with the stored block number.
1. Return true if just initiated, false otherwise.
* `disputes() -> Vec<(SessionIndex, CandidateHash, DisputeState)>`: Get a list of all disputes and info about dispute
state.
1. Iterate over all disputes in `Disputes` and collect into a vector.
* `note_included(SessionIndex, CandidateHash, included_in: BlockNumber)`:
1. Add `(SessionIndex, CandidateHash)` to the `Included` map with `included_in - 1` as the value.
1. If there is a dispute under `(SessionIndex, CandidateHash)` that has concluded against the candidate, invoke
`revert_and_freeze` with the stored block number.
* `concluded_invalid(SessionIndex, CandidateHash) -> bool`: Returns whether a candidate has already concluded a dispute
in the negative.
* `is_frozen()`: Load the value of `Frozen` from storage. Return true if `Some` and false if `None`.
* `last_valid_block()`: Load the value of `Frozen` from storage and return it. `None` indicates that all blocks in the
chain are potentially valid.
* `revert_and_freeze(BlockNumber)`:
1. If `is_frozen()` return.
1. Set `Frozen` to `Some(BlockNumber)` to indicate a rollback to the block number.
1. Issue a `Revert(BlockNumber + 1)` log to indicate a rollback of the block's child in the header chain, which is the
same as a rollback to the block number.
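The core of `provide_dispute_data`, `note_included` and `revert_and_freeze` can be sketched with in-memory maps standing in for storage. Everything here is hypothetical. The types are simplified: `Vec<bool>` replaces the bitfields, and signature checks, rewards and slashing are left out. A `reverts` vector stands in for the `Revert` header digest. The supermajority rule `n - (n - 1) / 3` is assumed, because this section does not state it.

```rust
use std::collections::HashMap;

// Hypothetical simplified types: no signatures, bitfields or session info.
type SessionIndex = u32;
type CandidateHash = [u8; 32];
type BlockNumber = u32;
type Key = (SessionIndex, CandidateHash);

#[derive(Debug, Clone)]
pub struct DisputeState {
    pub validators_for: Vec<bool>,
    pub validators_against: Vec<bool>,
    pub concluded_at: Option<BlockNumber>,
}

#[derive(Debug, PartialEq)]
pub enum Error { DuplicateStatement, ValidatorIndexOutOfBounds }

#[derive(Default)]
pub struct Disputes {
    pub disputes: HashMap<Key, DisputeState>,
    pub included: HashMap<Key, BlockNumber>,
    pub frozen: Option<BlockNumber>,
    pub reverts: Vec<BlockNumber>, // stand-in for `Revert(..)` header digests
}

// Assumed supermajority rule: `n - f` where `f = (n - 1) / 3`.
fn supermajority(n: usize) -> usize { n - n.saturating_sub(1) / 3 }

fn count(bits: &[bool]) -> usize { bits.iter().filter(|b| **b).count() }

impl Disputes {
    /// Imports `(validator_index, valid)` votes; `Ok(true)` means this call initiated the dispute.
    pub fn provide_dispute_data(
        &mut self,
        key: Key,
        n_validators: usize,
        statements: &[(usize, bool)],
        now: BlockNumber,
        post_conclusion_acceptance_period: BlockNumber,
    ) -> Result<bool, Error> {
        let fresh = !self.disputes.contains_key(&key);
        // Work on a copy so that a failed import leaves "storage" untouched.
        let mut state = self.disputes.get(&key).cloned().unwrap_or(DisputeState {
            validators_for: vec![false; n_validators],
            validators_against: vec![false; n_validators],
            concluded_at: None,
        });
        if let Some(at) = state.concluded_at {
            if at.saturating_add(post_conclusion_acceptance_period) < now {
                return Ok(false);
            }
        }
        let threshold = supermajority(state.validators_for.len());
        let had_for = count(&state.validators_for) >= threshold;
        let had_against = count(&state.validators_against) >= threshold;
        for &(validator, valid) in statements {
            let side = if valid { &mut state.validators_for } else { &mut state.validators_against };
            let bit = side.get_mut(validator).ok_or(Error::ValidatorIndexOutOfBounds)?;
            if *bit {
                return Err(Error::DuplicateStatement);
            }
            *bit = true;
        }
        // Rewards and slashing of the losing side are omitted here.
        let just_for = !had_for && count(&state.validators_for) >= threshold;
        let just_against = !had_against && count(&state.validators_against) >= threshold;
        if (just_for || just_against) && state.concluded_at.is_none() {
            state.concluded_at = Some(now);
        }
        self.disputes.insert(key, state);
        if just_against {
            if let Some(&block) = self.included.get(&key) {
                self.revert_and_freeze(block);
            }
        }
        Ok(fresh)
    }

    pub fn note_included(&mut self, key: Key, included_in: BlockNumber) {
        let stored = included_in.saturating_sub(1);
        self.included.insert(key, stored);
        let against = self.disputes.get(&key).map_or(false, |s| {
            s.concluded_at.is_some()
                && count(&s.validators_against) >= supermajority(s.validators_against.len())
        });
        if against {
            self.revert_and_freeze(stored);
        }
    }

    pub fn revert_and_freeze(&mut self, block: BlockNumber) {
        if self.frozen.is_some() {
            return;
        }
        self.frozen = Some(block);
        // `Revert(block + 1)` rolls back the block's child, i.e. back to `block`.
        self.reverts.push(block + 1);
    }
}
```

With 4 validators, `f = 1` and a supermajority is 3 votes. The third vote against the candidate therefore concludes the dispute. If the candidate was noted as included, that vote also freezes the chain at the stored block.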
# Disputes filtering
All disputes delivered to the runtime by the client are filtered before the actual import. In this context, actual
import means persisting to the runtime storage. The filtering has two purposes:
* Limit the amount of data saved onchain.
* Prevent persisting malicious dispute data onchain.
*Implementation note*: Filtering is performed in function `filter_dispute_data` from `Disputes` pallet.
The filtering is performed on the whole statement set which is about to be imported onchain. The following filters are
applied:
1. Remove ancient disputes: if a dispute concluded before the block number indicated by the `OLDEST_ACCEPTED`
parameter, it is removed from the set. `OLDEST_ACCEPTED` is a runtime configuration option. *Implementation note*:
`dispute_post_conclusion_acceptance_period` from `HostConfiguration` is used in the current Polkadot/Kusama
implementation.
2. Remove votes from unknown validators: if a vote comes from a validator that wasn't an authority in the session where
the dispute was raised, the vote is removed. Note that this step removes only individual votes rather than the whole
dispute.
@@ -138,4 +167,4 @@ inconclusive disputes are not slashed. Thanks to the applied filtering (describe
confident that there are no spam disputes in the runtime. So if a validator is not voting it is due to another reason
(e.g. being under DoS attack). There is no reason to punish such validators with a slash.
*Implementation note*: Slashing is performed in `process_checked_dispute_data` from `Disputes` pallet.
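The first two filters listed under *Disputes filtering* can be expressed as an iterator pipeline. This is a hypothetical sketch. `filter_dispute_set` and `DisputeSet` are illustrative names; the real entry point is `filter_dispute_data`. Validator IDs are plain integers here. All disputes are assumed to belong to one session, so that a single authority set applies.

```rust
use std::collections::HashSet;

// Hypothetical simplified types for this sketch.
type BlockNumber = u32;
type ValidatorId = u32; // stand-in for a real validator public key

/// One dispute's statements; all disputes are assumed to come from a single session.
#[derive(Debug, Clone, PartialEq)]
pub struct DisputeSet {
    pub concluded_at: Option<BlockNumber>,
    /// `(validator, valid)` votes.
    pub votes: Vec<(ValidatorId, bool)>,
}

pub fn filter_dispute_set(
    disputes: Vec<DisputeSet>,
    oldest_accepted: BlockNumber,
    session_authorities: &HashSet<ValidatorId>,
) -> Vec<DisputeSet> {
    disputes
        .into_iter()
        // Filter 1: drop whole disputes that concluded before `oldest_accepted`.
        .filter(|d| d.concluded_at.map_or(true, |at| at >= oldest_accepted))
        // Filter 2: drop only the individual votes from non-authorities.
        .map(|mut d| {
            d.votes.retain(|(v, _)| session_authorities.contains(v));
            d
        })
        .collect()
}
```

Following the implementation note, a caller would presumably derive `oldest_accepted` as the current block number minus `dispute_post_conclusion_acceptance_period`. That derivation is an assumption here.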
@@ -28,7 +28,8 @@ No initialization routine runs for this module.
Candidate Acceptance Function:
* `check_processed_downward_messages(P: ParaId, relay_parent_number: BlockNumber, processed_downward_messages: u32)`:
1. Checks that `processed_downward_messages` is at least 1 if `DownwardMessageQueues` for `P` is not empty at the
given `relay_parent_number`.
1. Checks that `DownwardMessageQueues` for `P` is at least `processed_downward_messages` long.
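The two acceptance checks can be written as one pure function over the queue length. This is a minimal sketch. `queue_len` is a hypothetical parameter standing in for the length of `DownwardMessageQueues` for `P` at `relay_parent_number`, and the error variants are illustrative.

```rust
#[derive(Debug, PartialEq)]
pub enum DmpError {
    NothingProcessed,
    ProcessedTooMany,
}

pub fn check_processed_downward_messages(
    queue_len: u32,
    processed_downward_messages: u32,
) -> Result<(), DmpError> {
    // A non-empty queue must be drained by at least one message.
    if queue_len > 0 && processed_downward_messages == 0 {
        return Err(DmpError::NothingProcessed);
    }
    // A para cannot claim to have processed more messages than were queued.
    if processed_downward_messages > queue_len {
        return Err(DmpError::ProcessedTooMany);
    }
    Ok(())
}
```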
Candidate Enactment:
@@ -38,11 +39,11 @@ Candidate Enactment:
Utility routines.
`queue_downward_message(P: ParaId, M: DownwardMessage)`:

1. Check if the size of `M` exceeds the `config.max_downward_message_size`. If so, return an error.
1. Wrap `M` into `InboundDownwardMessage` using the current block number for `sent_at`.
1. Obtain a new MQC link for the resulting `InboundDownwardMessage` and replace `DownwardMessageQueueHeads` for `P` with
   the resulting hash.
1. Add the resulting `InboundDownwardMessage` into `DownwardMessageQueues` for `P`.
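The four steps of `queue_downward_message` can be sketched in Rust. This is a minimal sketch with several assumptions:

* The `Dmp` struct, its fields and the error type are hypothetical.
* std's `DefaultHasher` stands in for the cryptographic hash a real chain would use, and it is not collision resistant.
* Computing the MQC link as `H(prev_head, sent_at, H(msg))` is an assumption, because this section does not state the chaining rule.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type ParaId = u32;
type BlockNumber = u32;
// Stand-in for a 32-byte cryptographic hash.
type Head = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct InboundDownwardMessage {
    pub sent_at: BlockNumber,
    pub msg: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub struct ExceedsMaxMessageSize;

#[derive(Default)]
pub struct Dmp {
    pub max_downward_message_size: usize,
    pub queues: HashMap<ParaId, Vec<InboundDownwardMessage>>,
    pub heads: HashMap<ParaId, Head>,
}

// NOT collision resistant: illustration only.
fn hash_of<T: Hash>(value: &T) -> Head {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl Dmp {
    pub fn queue_downward_message(
        &mut self,
        para: ParaId,
        msg: Vec<u8>,
        now: BlockNumber,
    ) -> Result<(), ExceedsMaxMessageSize> {
        // 1. Reject messages larger than the configured maximum.
        if msg.len() > self.max_downward_message_size {
            return Err(ExceedsMaxMessageSize);
        }
        // 2. Wrap the message, using the current block number as `sent_at`.
        let inbound = InboundDownwardMessage { sent_at: now, msg };
        // 3. Extend the MQC: new head = H(prev_head, sent_at, H(msg)) (assumed link layout).
        let prev_head = self.heads.get(&para).copied().unwrap_or_default();
        let new_head = hash_of(&(prev_head, inbound.sent_at, hash_of(&inbound.msg)));
        self.heads.insert(para, new_head);
        // 4. Append the wrapped message to the para's queue.
        self.queues.entry(para).or_default().push(inbound);
        Ok(())
    }
}
```

Because each head commits to the previous one, a para can verify the whole queue history from a single head value.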
## Session Change
@@ -1,6 +1,7 @@
# HRMP Pallet
A module responsible for Horizontally Relay-routed Message Passing (HRMP). See [Messaging Overview](../messaging.md) for
more details.
## Storage
@@ -132,7 +133,8 @@ Candidate Acceptance Function:
1. or in `HrmpChannelDigests` for `P` an entry with the block number should exist
* `check_outbound_hrmp(sender: ParaId, Vec<OutboundHrmpMessage>)`:
1. Checks that there are at most `config.hrmp_max_message_num_per_candidate` messages.
1. Checks that horizontal messages are sorted by ascending recipient `ParaId` and that no two horizontal messages have
the same recipient.
1. For each horizontal message `M` with the channel `C` identified by `(sender, M.recipient)` check:
1. exists
1. `M`'s payload size doesn't exceed a preconfigured limit `C.max_message_size`
@@ -143,42 +145,48 @@ Candidate Enactment:
* `queue_outbound_hrmp(sender: ParaId, Vec<OutboundHrmpMessage>)`:
1. For each horizontal message `HM` with the channel `C` identified by `(sender, HM.recipient)`:
1. Append `HM` into `HrmpChannelContents` that corresponds to `C` with `sent_at` equal to the current block
number.
1. Locate or create an entry in `HrmpChannelDigests` for `HM.recipient` and append `sender` into the entry's
list.
1. Increment `C.msg_count`
1. Increment `C.total_size` by `HM`'s payload size
1. Append a new link to the MQC and save the new head in `C.mqc_head`. Note that the current block number as of
enactment is used for the link.
* `prune_hrmp(recipient, new_hrmp_watermark)`:
1. From `HrmpChannelDigests` for `recipient` remove all entries up to an entry with block number equal to
`new_hrmp_watermark`.
1. From the removed digests construct a set of paras that sent new messages within the interval between the old and
new watermarks.
1. For each `sender` in that set, prune the messages of the channel `C` identified by `(sender, recipient)` up to
the `new_hrmp_watermark`.
1. For each pruned message `M` from channel `C`:
1. Decrement `C.msg_count`
1. Decrement `C.total_size` by `M`'s payload size.
1. Set `HrmpWatermarks` for `recipient` to be equal to `new_hrmp_watermark`.
> NOTE: Collecting digests can be inefficient, and the time it takes grows very fast. Thanks to the aggressive
> parameterization this shouldn't be a big deal. If it becomes a problem, consider introducing an extra dictionary that
> records at which block a given sender sent a message to the recipient.
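The ordering rule in `check_outbound_hrmp` requires ascending recipients with no two messages to the same recipient. Both conditions together are equivalent to recipients being strictly increasing. Below is a minimal Rust sketch of that check plus the `hrmp_max_message_num_per_candidate` limit. The function and error names are hypothetical. The per-channel checks (channel exists, message size and capacity limits) are left out.

```rust
type ParaId = u32;

/// Hypothetical simplified outbound message.
pub struct OutboundHrmpMessage {
    pub recipient: ParaId,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum OutboundHrmpError {
    TooManyMessages,
    /// The message at `index` does not go to a strictly greater recipient than the one before it.
    NotSortedOrDuplicate { index: usize },
}

pub fn check_outbound_order(
    messages: &[OutboundHrmpMessage],
    max_message_num_per_candidate: usize,
) -> Result<(), OutboundHrmpError> {
    if messages.len() > max_message_num_per_candidate {
        return Err(OutboundHrmpError::TooManyMessages);
    }
    // Strictly ascending recipients == sorted ascending AND no repeated recipient.
    for (i, pair) in messages.windows(2).enumerate() {
        if pair[0].recipient >= pair[1].recipient {
            return Err(OutboundHrmpError::NotSortedOrDuplicate { index: i + 1 });
        }
    }
    Ok(())
}
```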
## Entry-points
The following entry-points are meant to be used for HRMP channel management.
These entry-points are meant to be called from a parachain. `origin` is defined as the `ParaId` of the parachain that
executed the message.
* `hrmp_init_open_channel(recipient, proposed_max_capacity, proposed_max_message_size)`:
1. Check that the `origin` is not `recipient`.
1. Check that `proposed_max_capacity` is less than or equal to `config.hrmp_channel_max_capacity` and greater than
zero.
1. Check that `proposed_max_message_size` is less than or equal to `config.hrmp_channel_max_message_size` and
greater than zero.
1. Check that `recipient` is a valid para.
1. Check that there is no existing channel for `(origin, recipient)` in `HrmpChannels`.
1. Check that there is no existing open channel request (`origin`, `recipient`) in `HrmpOpenChannelRequests`.
1. Check that the sum of the number of already opened HRMP channels by the `origin` (the size of the set found in
`HrmpEgressChannelsIndex` for `origin`) and the number of open requests by the `origin` (the value from
`HrmpOpenChannelRequestCount` for `origin`) doesn't exceed the limit of channels
(`config.hrmp_max_parachain_outbound_channels` or `config.hrmp_max_parathread_outbound_channels`) minus 1.
1. Check that `origin`'s balance is greater than or equal to `config.hrmp_sender_deposit`
1. Reserve the deposit for the `origin` according to `config.hrmp_sender_deposit`
1. Increase `HrmpOpenChannelRequestCount` by 1 for `origin`.
@@ -189,27 +197,26 @@ the parachain executed the message.
1. Set `max_message_size` to `proposed_max_message_size`
1. Set `max_total_size` to `config.hrmp_channel_max_total_size`
1. Send a downward message to `recipient` notifying about an inbound HRMP channel request.
* The DM is sent using `queue_downward_message`.
* The DM is represented by the `HrmpNewChannelOpenRequest` XCM message.
* `sender` is set to `origin`,
* `max_message_size` is set to `proposed_max_message_size`,
* `max_capacity` is set to `proposed_max_capacity`.
* `hrmp_accept_open_channel(sender)`:
1. Check that there is an existing request between (`sender`, `origin`) in `HrmpOpenChannelRequests`
1. Check that it is not confirmed.
1. Check that the sum of the number of inbound HRMP channels opened to `origin` (the size of the set found in
`HrmpIngressChannelsIndex` for `origin`) and the number of accepted open requests by the `origin` (the value from
`HrmpAcceptedChannelRequestCount` for `origin`) doesn't exceed the limit of channels
(`config.hrmp_max_parachain_inbound_channels` or `config.hrmp_max_parathread_inbound_channels`) minus 1.
1. Check that `origin`'s balance is greater than or equal to `config.hrmp_recipient_deposit`.
1. Reserve the deposit for the `origin` according to `config.hrmp_recipient_deposit`
1. For the request in `HrmpOpenChannelRequests` identified by `(sender, origin)`, set the `confirmed` flag to `true`.
1. Increase `HrmpAcceptedChannelRequestCount` by 1 for `origin`.
1. Send a downward message to `sender` notifying that the channel request was accepted.
* The DM is sent using `queue_downward_message`.
* The DM is represented by the `HrmpChannelAccepted` XCM message.
* `recipient` is set to `origin`.
* `hrmp_cancel_open_request(ch)`:
1. Check that `origin` is either `ch.sender` or `ch.recipient`
1. Check that the open channel request `ch` exists.
@@ -221,15 +228,15 @@ the parachain executed the message.
1. Check that `origin` is either `ch.sender` or `ch.recipient`
1. Check that `HrmpChannels` for `ch` exists.
1. Check that `ch` is not in the `HrmpCloseChannelRequests` set.
1. If not already there, insert a new entry `Some(())` to `HrmpCloseChannelRequests` for `ch` and append `ch` to
`HrmpCloseChannelRequestsList`.
1. Send a downward message to the opposite party notifying about the channel closing.
* The DM is sent using `queue_downward_message`.
* The DM is represented by the `HrmpChannelClosing` XCM message with:
* `initiator` is set to `origin`,
* `sender` is set to `ch.sender`,
* `recipient` is set to `ch.recipient`.
* The opposite party is `ch.sender` if `origin` is `ch.recipient` and `ch.recipient` if `origin` is `ch.sender`.
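The purely numeric preconditions of `hrmp_init_open_channel` can be sketched as one function. This is a hypothetical sketch:

* The structs, field names and error variants are illustrative.
* The caller is assumed to have read `HrmpEgressChannelsIndex`, `HrmpOpenChannelRequestCount` and the balance beforehand, and to have picked the parachain or parathread outbound limit.
* The checks that need storage (valid para, no existing channel or request) are omitted.

```rust
type ParaId = u32;
type Balance = u128;

/// Hypothetical subset of the host configuration used by this sketch.
pub struct HrmpConfig {
    pub channel_max_capacity: u32,
    pub channel_max_message_size: u32,
    pub max_outbound_channels: u32,
    pub sender_deposit: Balance,
}

/// Hypothetical values the caller reads from storage for `origin`.
pub struct SenderState {
    pub open_channels: u32,
    pub pending_requests: u32,
    pub free_balance: Balance,
}

#[derive(Debug, PartialEq)]
pub enum OpenError { OpenToSelf, InvalidCapacity, InvalidMessageSize, TooManyChannels, InsufficientDeposit }

pub fn check_init_open_channel(
    origin: ParaId,
    recipient: ParaId,
    proposed_max_capacity: u32,
    proposed_max_message_size: u32,
    sender: &SenderState,
    config: &HrmpConfig,
) -> Result<(), OpenError> {
    if origin == recipient {
        return Err(OpenError::OpenToSelf);
    }
    if proposed_max_capacity == 0 || proposed_max_capacity > config.channel_max_capacity {
        return Err(OpenError::InvalidCapacity);
    }
    if proposed_max_message_size == 0 || proposed_max_message_size > config.channel_max_message_size {
        return Err(OpenError::InvalidMessageSize);
    }
    // "sum <= limit - 1" is written as "sum < limit", which also avoids underflow when the limit is 0.
    if sender.open_channels.saturating_add(sender.pending_requests) >= config.max_outbound_channels {
        return Err(OpenError::TooManyChannels);
    }
    if sender.free_balance < config.sender_deposit {
        return Err(OpenError::InsufficientDeposit);
    }
    Ok(())
}
```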
## Session Change
@@ -241,13 +248,15 @@ the parachain executed the message.
1. Remove `HrmpOpenChannelRequests` and `HrmpOpenChannelRequestsList` for `(P, _)` and `(_, P)`.
1. For each removed channel request `C`:
1. Unreserve the sender's deposit if the sender is not present in `outgoing_paras`
1. Unreserve the recipient's deposit if `C` is confirmed and the recipient is not present in
`outgoing_paras`
1. For each channel designator `D` in `HrmpOpenChannelRequestsList` we query the request `R` from
`HrmpOpenChannelRequests`:
1. if `R.confirmed = true`,
1. if both `D.sender` and `D.recipient` are not offboarded.
1. create a new channel `C` between `(D.sender, D.recipient)`.
1. Initialize the `C.sender_deposit` with `R.sender_deposit` and `C.recipient_deposit` with the value
found in the configuration `config.hrmp_recipient_deposit`.
1. Insert `sender` into the set `HrmpIngressChannelsIndex` for the `recipient`.
1. Insert `recipient` into the set `HrmpEgressChannelsIndex` for the `sender`.
1. decrement `HrmpOpenChannelRequestCount` for `D.sender` by 1.
@@ -1,6 +1,7 @@
# Inclusion Pallet
The inclusion module is responsible for inclusion and availability of scheduled parachains. It also manages the UMP
dispatch queue of each parachain.
## Storage
@@ -37,11 +38,9 @@ PendingAvailabilityCommitments: map ParaId => CandidateCommitments;
## Config Dependencies
* `MessageQueue`: The message queue provides general queueing and processing functionality. Currently it replaces the
old `UMP` dispatch queue. Other use-cases can be implemented as well by adding new variants to
`AggregateMessageOrigin`. Normally it should be set to an instance of the `MessageQueue` pallet.
## Session Change
@@ -49,11 +48,13 @@ PendingAvailabilityCommitments: map ParaId => CandidateCommitments;
1. Clear out all validator bitfields.
Optional:
1. The UMP queue of all outgoing paras can be "swept". This would prevent the dispatch queue from automatically being
serviced. It is a consideration for the chain and specific behaviour is not defined.
## Initialization
No initialization routine runs for this module. However, the initialization of the `MessageQueue` pallet will attempt to
process any pending UMP messages.
## Routines
@@ -63,20 +64,19 @@ All failed checks should lead to an unrecoverable error making the block invalid
* `process_bitfields(expected_bits, Bitfields, core_lookup: Fn(CoreIndex) -> Option<ParaId>)`:
1. Call `sanitize_bitfields<true>` and use the sanitized `signed_bitfields` from now on.
1. Call `sanitize_backed_candidates<true>` and use the sanitized `backed_candidates` from now on.
1. Apply each bit of bitfield to the corresponding pending candidate, looking up on-demand parachain cores using the
`core_lookup`. Disregard bitfields that have a `1` bit for any free cores.
1. For each applied bit of each availability-bitfield, set the bit for the validator in the
`CandidatePendingAvailability`'s `availability_votes` bitfield. Track all candidates that now have >2/3 of bits set
in their `availability_votes`. These candidates are now available and can be enacted.
1. For all now-available candidates, invoke the `enact_candidate` routine with the candidate and relay-parent number.
1. Return a list of `(CoreIndex, CandidateHash)` from freed cores consisting of the cores where candidates have become
available.
* `sanitize_bitfields<T: crate::inclusion::Config>(unchecked_bitfields: UncheckedSignedAvailabilityBitfields,
disputed_bitfield: DisputedBitfield, expected_bits: usize, parent_hash: T::Hash, session_index: SessionIndex,
validators: &[ValidatorId], full_check: FullCheck)`:
1. check that `disputed_bitfield` has the same number of bits as `expected_bits`; if not, return early with an empty
vec.
1. each of the checks below applies to each bitfield. If a check does not pass, the bitfield is skipped.
1. check that there are no bits set that reference a disputed candidate.
1. check that the number of bits is equal to `expected_bits`.
@@ -84,41 +84,58 @@ All failed checks should lead to an unrecoverable error making the block invalid
1. check that the validator bit index is not out of bounds.
1. check the validator's signature, iff `full_check=FullCheck::Yes`.
* `sanitize_backed_candidates<T: crate::inclusion::Config, F: FnMut(usize, &BackedCandidate<T::Hash>) -> bool>(mut
backed_candidates: Vec<BackedCandidate<T::Hash>>, candidate_has_concluded_invalid_dispute: F,
scheduled: &[CoreAssignment])`:
1. filter out any backed candidates that have concluded invalid.
1. filter backed candidates, keeping only those whose para ID was scheduled according to the provided `scheduled`
parameter.
1. sort the remaining candidates by the core index assigned to them.
* `process_candidates(allowed_relay_parents, BackedCandidates, scheduled: Vec<CoreAssignment>, group_validators:
Fn(GroupIndex) -> Option<Vec<ValidatorIndex>>)`:
> For details on `AllowedRelayParentsTracker` see documentation for [Shared](./shared.md) module.
1. check that each candidate corresponds to a scheduled core and that they are ordered in the same order the cores
appear in assignments in `scheduled`.
1. check that `scheduled` is sorted ascending by `CoreIndex`, without duplicates.
1. check that the relay-parent from each candidate receipt is one of the allowed relay-parents.
1. check that there is no candidate pending availability for any scheduled `ParaId`.
1. check that each candidate's `validation_data_hash` corresponds to a `PersistedValidationData` computed from the
state of the context block.
1. If the core assignment includes a specific collator, ensure the backed candidate is issued by that collator.
1. Ensure that any code upgrade scheduled by the candidate does not happen within `config.validation_upgrade_cooldown`
of `Paras::last_code_upgrade(para_id, true)`, if any, comparing against the value of `Paras::FutureCodeUpgrades`
for the given para ID.
1. Check the collator's signature on the candidate data.
1. check the backing of the candidate using the signatures and the bitfields, comparing against the validators
assigned to the groups, fetched with the `group_validators` lookup, while group indices are computed by `Scheduler`
according to group rotation info.
1. call `check_upward_messages(config, para, commitments.upward_messages)` to check that the upward messages are
valid.
1. call `Dmp::check_processed_downward_messages(para, commitments.processed_downward_messages)` to check that the DMQ
is properly drained.
1. call `Hrmp::check_hrmp_watermark(para, commitments.hrmp_watermark)` for each candidate to check rules of processing
the HRMP watermark.
1. using `Hrmp::check_outbound_hrmp(sender, commitments.horizontal_messages)`, ensure that each candidate sent a
valid set of horizontal messages.
1. create an entry in the `PendingAvailability` map for each backed candidate with a blank `availability_votes`
bitfield.
1. create a corresponding entry in the `PendingAvailabilityCommitments` with the commitments.
1. Return a `Vec<CoreIndex>` of all scheduled cores of the list of passed assignments that a candidate was
successfully backed for, sorted ascending by `CoreIndex`.
* `enact_candidate(relay_parent_number: BlockNumber, CommittedCandidateReceipt)`:
1. If the receipt contains a code upgrade, call
`Paras::schedule_code_upgrade(para_id, code, relay_parent_number, config)`.
> TODO: Note that this is safe as long as we never enact candidates where the relay parent is across a session
> boundary. In that case, which we should be careful to avoid with contextual execution, the configuration might
> have changed and the para may de-sync from the host's understanding of it.
1. Reward all backing validators of each candidate, contained within the `backers` field.
1. call `receive_upward_messages` for each backed candidate, using the
[`UpwardMessage`s](../types/messages.md#upward-message) from the
[`CandidateCommitments`](../types/candidate.md#candidate-commitments).
1. call `Dmp::prune_dmq` with the para id of the candidate and the candidate's `processed_downward_messages`.
1. call `Hrmp::prune_hrmp` with the para id of the candidate and the candidate's `hrmp_watermark`.
1. call `Hrmp::queue_outbound_hrmp` with the para id of the candidate and the list of horizontal messages taken from
the commitments.
1. Call `Paras::note_new_head` using the `HeadData` from the receipt and `relay_parent_number`.
* `collect_pending`:
@@ -130,21 +147,28 @@ All failed checks should lead to an unrecoverable error making the block invalid
// return a vector of cleaned-up core IDs.
}
```
* `force_enact(ParaId)`: Forcibly enact the candidate with the given ID as though it had been deemed available by
bitfields. Is a no-op if there is no candidate pending availability for this para-id. This should generally not be
used but it is useful during execution of Runtime APIs, where the changes to the state are expected to be discarded
directly after.
* `candidate_pending_availability(ParaId) -> Option<CommittedCandidateReceipt>`: returns the `CommittedCandidateReceipt`
pending availability for the para provided, if any.
* `pending_availability(ParaId) -> Option<CandidatePendingAvailability>`: returns the metadata around the candidate
pending availability for the para, if any.
* `collect_disputed(disputed: Vec<CandidateHash>) -> Vec<CoreIndex>`: Sweeps through all paras pending availability. If
the candidate hash is one of the disputed candidates, then clean up the corresponding storage for that candidate and
the commitments. Return a vector of cleaned-up core IDs.
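The `collect_disputed` sweep can be sketched as follows. This is a minimal illustration, not the runtime code: the storage maps are modeled as plain `BTreeMap`s, and `CandidatePendingAvailability` is reduced to a hypothetical two-field struct; the commitments value is a placeholder `Vec<u8>`.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified stand-ins for the runtime types.
type ParaId = u32;
type CoreIndex = u32;
type CandidateHash = [u8; 32];

/// Reduced version of the pending-availability metadata: only what the sweep needs.
struct CandidatePendingAvailability {
    core: CoreIndex,
    hash: CandidateHash,
}

/// Remove every pending candidate whose hash is disputed, together with its
/// commitments, and return the cores that were freed.
fn collect_disputed(
    pending: &mut BTreeMap<ParaId, CandidatePendingAvailability>,
    commitments: &mut BTreeMap<ParaId, Vec<u8>>, // placeholder for the commitments value
    disputed: &[CandidateHash],
) -> Vec<CoreIndex> {
    let disputed: BTreeSet<CandidateHash> = disputed.iter().copied().collect();
    // First find the affected paras, then mutate, to avoid borrowing `pending` twice.
    let to_clean: Vec<ParaId> = pending
        .iter()
        .filter(|(_, c)| disputed.contains(&c.hash))
        .map(|(para, _)| *para)
        .collect();
    let mut cleaned = Vec::with_capacity(to_clean.len());
    for para in to_clean {
        if let Some(candidate) = pending.remove(&para) {
            commitments.remove(&para);
            cleaned.push(candidate.core);
        }
    }
    cleaned
}
```

Collecting the affected para ids first keeps the sketch simple; a real implementation would also account for weight per removed entry.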
These functions were formerly part of the UMP pallet:
* `check_upward_messages(P: ParaId, Vec<UpwardMessage>)`:
1. Checks that the parachain is not currently offboarding and errors otherwise.
1. Checks that there are at most `config.max_upward_message_num_per_candidate` messages to be enqueued.
1. Checks that no message exceeds `config.max_upward_message_size`.
1. Checks that the total resulting queue size would not exceed `co`.
1. Verify that queuing up the messages could not result in exceeding the queue's footprint according to the config
items `config.max_upward_queue_count` and `config.max_upward_queue_size`. The queue's current footprint is provided
in `well_known_keys` in order to facilitate oraclisation onto the para.
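A minimal sketch of the `check_upward_messages` checks above, assuming the queue's current footprint is passed in as two plain numbers and that messages are raw byte vectors. `HostConfiguration` here contains only the four config items named in the text, and the error enum `UmpCheckError` is a hypothetical name for illustration, not the runtime's actual error type.

```rust
/// Simplified stand-in for the host configuration: only the fields named above.
struct HostConfiguration {
    max_upward_message_num_per_candidate: u32,
    max_upward_message_size: u32,
    max_upward_queue_count: u32,
    max_upward_queue_size: u32,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum UmpCheckError {
    IsOffboarding,
    TooManyMessages { sent: u32, permitted: u32 },
    MessageTooLarge { idx: u32, msg_size: u32, max_size: u32 },
    QueueCountExceeded { count: u64, limit: u64 },
    QueueSizeExceeded { total_size: u64, limit: u64 },
}

/// `queue_count` / `queue_size` describe the queue's current footprint
/// (the information the text says is exposed via `well_known_keys`).
fn check_upward_messages(
    config: &HostConfiguration,
    is_offboarding: bool,
    queue_count: u64,
    queue_size: u64,
    msgs: &[Vec<u8>],
) -> Result<(), UmpCheckError> {
    if is_offboarding {
        return Err(UmpCheckError::IsOffboarding);
    }
    let sent = msgs.len() as u32;
    let permitted = config.max_upward_message_num_per_candidate;
    if sent > permitted {
        return Err(UmpCheckError::TooManyMessages { sent, permitted });
    }
    for (idx, msg) in msgs.iter().enumerate() {
        let msg_size = msg.len() as u32;
        let max_size = config.max_upward_message_size;
        if msg_size > max_size {
            return Err(UmpCheckError::MessageTooLarge { idx: idx as u32, msg_size, max_size });
        }
    }
    // Footprint after enqueuing: the existing queue plus the new messages.
    let count = queue_count + msgs.len() as u64;
    let limit = u64::from(config.max_upward_queue_count);
    if count > limit {
        return Err(UmpCheckError::QueueCountExceeded { count, limit });
    }
    let total_size = queue_size + msgs.iter().map(|m| m.len() as u64).sum::<u64>();
    let limit = u64::from(config.max_upward_queue_size);
    if total_size > limit {
        return Err(UmpCheckError::QueueSizeExceeded { total_size, limit });
    }
    Ok(())
}
```

The per-candidate checks run before the queue-footprint checks, so a candidate that violates the per-message limits is rejected regardless of how full the queue is.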
Candidate Enactment:
@@ -1,6 +1,7 @@
# Initializer Pallet
This module is responsible for initializing the other modules in a deterministic order. It also has one other purpose as
described in the overview of the runtime: accepting and forwarding session change notifications.
## Storage
@@ -15,7 +16,9 @@ BufferedSessionChanges: Vec<(BlockNumber, ValidatorSet, ValidatorSet)>;
## Initialization
Before initializing modules, remove all changes from the `BufferedSessionChanges` with number less than or equal to the
current block number, and apply the last one. The session change is applied to all modules in the same order as
initialization.
The other parachains modules are initialized in this order:
@@ -30,16 +33,24 @@ The other parachains modules are initialized in this order:
1. UMP
1. HRMP
The [Configuration Module](configuration.md) is first, since all other modules need to operate under the same
configuration as each other. Then the [Shared](shared.md) module is invoked, which determines the set of active
validators. It would lead to inconsistency if, for example, the scheduler ran first and then the configuration was
updated before the Inclusion module.
Set `HasInitialized` to true.
## Session Change
Store the session change information in `BufferedSessionChanges` along with the block number at which it was submitted,
plus one. Although the expected operational parameters of the block authorship system should prevent more than one
change from being buffered at any time, it may occur. Regardless, we always need to track the block number at which the
session change can be applied so as to remain flexible over session change notifications being issued before or after
initialization of the current block.
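The buffering described here, together with the draining step from the Initialization section, can be sketched as below. This is a simplified model under stated assumptions: `ValidatorSet` is a hypothetical `Vec<u32>` stand-in, the method names `note_session_change` and `take_session_change` are illustrative only, and applying the returned change to the other modules is left to the caller.

```rust
type BlockNumber = u32;
/// Hypothetical stand-in for a validator set.
type ValidatorSet = Vec<u32>;

#[derive(Default)]
struct Initializer {
    /// Mirrors `BufferedSessionChanges: Vec<(BlockNumber, ValidatorSet, ValidatorSet)>`.
    buffered_session_changes: Vec<(BlockNumber, ValidatorSet, ValidatorSet)>,
}

impl Initializer {
    /// Session change notification received while `now` is the current block:
    /// buffer it tagged with `now + 1`, the first block at which it may be applied.
    fn note_session_change(&mut self, now: BlockNumber, validators: ValidatorSet, queued: ValidatorSet) {
        self.buffered_session_changes.push((now + 1, validators, queued));
    }

    /// Block initialization: remove every buffered change with number <= `now`
    /// and return only the last one, which the caller then applies to all modules.
    fn take_session_change(&mut self, now: BlockNumber) -> Option<(ValidatorSet, ValidatorSet)> {
        let (ready, pending): (Vec<_>, Vec<_>) = std::mem::take(&mut self.buffered_session_changes)
            .into_iter()
            .partition(|(at, _, _)| *at <= now);
        self.buffered_session_changes = pending;
        ready.into_iter().last().map(|(_, validators, queued)| (validators, queued))
    }
}
```

Because every entry carries its own block number, a notification arriving before or after the current block was initialized is handled the same way: it simply becomes eligible at the next block.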
## Finalization
Finalization order is less important in this case than initialization order, so we finalize the modules in the reverse
order from initialization.
Set `HasInitialized` to false.
@@ -1,15 +1,27 @@
# `ParaInherent`
This module is responsible for providing all data given to the runtime by the block author to the various parachains
modules. The entry-point is mandatory, in that it must be invoked exactly once within every block, and it is also
"inherent", in that it is provided with no origin by the block author. The data within it carries its own
authentication; i.e. the data takes the form of signed statements by validators. Invalid data will be filtered and not
applied.
This module does not have the same initialization/finalization concerns as the others, as it only requires that entry
points be triggered after all modules have initialized and that finalization happens after entry points are triggered.
Both of these are assumptions we have already made about the runtime's order of operations, so this module doesn't need
to be initialized or finalized by the `Initializer`.
There are a couple of important notes to the operations in this inherent as they relate to disputes.
1. We don't accept bitfields or backed candidates if the chain is in "governance-only" mode as a result of a local
dispute concluding on this fork.
1. When disputes are initiated, we remove the block from pending availability. This allows us to roll back chains to the
block before blocks are included as opposed to backing. It's important to do this before processing bitfields.
1. `Inclusion::collect_disputed` is relatively expensive, so it's important to gate it on whether there are actually any
new disputes; in the normal case there should be none.
1. We don't accept parablocks that have open disputes or disputes that have concluded against the candidate. It's
important to import dispute statements before backing, but this is already the case as disputes are imported before
processing bitfields.
## Storage
@@ -32,26 +44,19 @@ OnChainVotes: Option<ScrapedOnChainVotes>,
* `enter`: This entry-point accepts one parameter: [`ParaInherentData`](../types/runtime.md#ParaInherentData).
* `create_inherent`: This entry-point accepts one parameter: `InherentData`.
Both entry points share mostly the same code. `create_inherent` will meaningfully limit inherent data to adhere to the
weight limit, in addition to sanitizing any inputs and filtering out invalid data. Conceptually it is part of block
production. The `enter` call, on the other hand, is part of block import and consumes/imports the data previously
produced by `create_inherent`.
In practice both calls process inherent data and apply it to the state. Block production and block import should arrive
at the same new state. Hence we re-use the same logic to ensure this is the case.
The only real difference between the two is that on `create_inherent` we actually need the processed and filtered
inherent data to build the block, while on `enter` the processed data should, first, be identical to the incoming
inherent data (assuming honest block producers) and, second, is irrelevant, as we are not building a block but just
processing it, so the processed inherent data is simply dropped.
This also means that the `enter` function keeps data around for no good reason. This seems acceptable, though, as the
size of a block is rather limited. Nevertheless, if we ever wanted to optimize this, we could easily implement an
inherent collector that has two implementations, where one clones and stores the data and the other just passes it on.
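The optional optimization mentioned above could look roughly like the sketch below. Everything in it is hypothetical: the trait `InherentCollector`, the two collectors, and `process_inherent` are illustrative names, and the "keep only even items" rule merely stands in for the real sanitization and filtering logic.

```rust
/// Hypothetical trait: receives each item of processed inherent data.
trait InherentCollector<T> {
    fn collect(&mut self, item: &T);
}

/// Would be used by `create_inherent`: clones and stores the data to build the block.
struct StoringCollector<T> {
    items: Vec<T>,
}

impl<T: Clone> InherentCollector<T> for StoringCollector<T> {
    fn collect(&mut self, item: &T) {
        self.items.push(item.clone());
    }
}

/// Would be used by `enter`: passes the data on without keeping a copy.
struct DiscardingCollector;

impl<T> InherentCollector<T> for DiscardingCollector {
    fn collect(&mut self, _item: &T) {}
}

/// Shared processing logic, generic over the collector, so block production and
/// block import apply exactly the same state changes.
fn process_inherent<C: InherentCollector<u32>>(input: &[u32], state: &mut u64, collector: &mut C) {
    // Placeholder filter standing in for the real sanitization rules.
    for item in input.iter().filter(|x| **x % 2 == 0) {
        *state += u64::from(*item);
        collector.collect(item);
    }
}
```

Since the processing function is generic, the discarding variant compiles down to no extra allocation, while both paths still reach the same state.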
@@ -1,14 +1,12 @@
# Paras Pallet
The Paras module is responsible for storing information on parachains. Registered parachains cannot change except at
session boundaries and after at least a full session has passed. This is primarily to ensure that the number and meaning
of bits required for the availability bitfields do not change except at session boundaries.
It's also responsible for:
- managing parachain validation code upgrades as well as maintaining availability of old parachain code and its pruning.
- vetting PVFs by means of the PVF pre-checking mechanism.
## Storage
@@ -102,8 +100,8 @@ struct PvfCheckActiveVoteState {
#### Para Lifecycle
Because the state changes of parachains are delayed, we track the specific state of the para using the `ParaLifecycle`
enum.
```
None Parathread (on-demand parachain) Parachain
@@ -132,8 +130,8 @@ None Parathread (on-demand parachain) Parachain
+ + +
```
Note that if PVF pre-checking is enabled, onboarding of a para may potentially be delayed. This can happen due to PVF
pre-checking voting concluding late.
During the transition period, the para object is still considered in its existing state.
@@ -210,7 +208,7 @@ UpcomingUpgrades: Vec<(ParaId, BlockNumberFor<T>)>;
ActionsQueue: map SessionIndex => Vec<ParaId>;
/// Upcoming paras instantiation arguments.
///
/// NOTE that after PVF pre-checking is enabled the para genesis arg will have its code set
/// to empty. Instead, the code will be saved into the storage right away via `CodeByHash`.
UpcomingParasGenesis: map ParaId => Option<ParaGenesisArgs>;
/// The number of references on the validation code in `CodeByHash` storage.
@@ -223,12 +221,13 @@ CodeByHash: map ValidationCodeHash => Option<ValidationCode>
1. Execute all queued actions for paralifecycle changes:
1. Clean up outgoing paras.
1. This means removing the entries under `Heads`, `CurrentCode`, `FutureCodeUpgrades`, `FutureCode` and
`MostRecentContext`. A corresponding entry should be added to `PastCode`, `PastCodeMeta`, and `PastCodePruning`
using the outgoing `ParaId` and removed `CurrentCode` value. This is because any outdated validation code must
remain available on-chain for a determined number of blocks, and validation code outdated by de-registering the
para is still subject to that invariant.
1. Apply all incoming paras by initializing the `Heads` and `CurrentCode` using the genesis parameters, as well as
setting `MostRecentContext` to `0`.
1. Amend the `Parachains` list and `ParaLifecycle` to reflect changes in registered parachains.
1. Amend the `ParaLifecycle` set to reflect changes in registered on-demand parachains.
1. Upgrade all on-demand parachains that should become lease holding parachains, updating the `Parachains` list and
@@ -239,40 +238,50 @@ CodeByHash: map ValidationCodeHash => Option<ValidationCode>
1. Go over all active PVF pre-checking votes:
1. Increment `age` of the vote.
1. If `age` reached `cfg.pvf_voting_ttl`, then enact PVF rejection and remove the vote from the active list.
1. Otherwise, reinitialize the ballots.
1. Resize the `votes_accept`/`votes_reject` to have the same length as the incoming validator set.
1. Zero all the votes.
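The session-change handling of active PVF pre-checking votes can be sketched as follows. This is a simplified model, assuming ballots are plain `Vec<bool>` rather than bitfields and that the active votes live in a `BTreeMap` keyed by a hypothetical `CodeHash` alias; `age_pvf_votes` is an illustrative name, and actually enacting the rejection is left to the caller.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `ValidationCodeHash`.
type CodeHash = [u8; 32];

/// Simplified `PvfCheckActiveVoteState`: ballots as plain `Vec<bool>`.
struct PvfCheckActiveVoteState {
    votes_accept: Vec<bool>,
    votes_reject: Vec<bool>,
    age: u32,
}

/// Age every active vote. Returns the hashes whose vote reached `pvf_voting_ttl`;
/// the caller would enact PVF rejection for those.
fn age_pvf_votes(
    active: &mut BTreeMap<CodeHash, PvfCheckActiveVoteState>,
    pvf_voting_ttl: u32,
    new_validator_count: usize,
) -> Vec<CodeHash> {
    let mut expired = Vec::new();
    active.retain(|hash, vote| {
        vote.age += 1;
        if vote.age >= pvf_voting_ttl {
            expired.push(*hash);
            false // remove from the active list
        } else {
            // Reinitialize the ballots: resize to the incoming validator set, all zero.
            vote.votes_accept = vec![false; new_validator_count];
            vote.votes_reject = vec![false; new_validator_count];
            true
        }
    });
    expired
}
```

Resetting the ballots is necessary because validator indices from the previous session no longer refer to the same validators.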
## Initialization
1. Do pruning based on all entries in `PastCodePruning` with `BlockNumber <= now`. Update the corresponding
`PastCodeMeta` and `PastCode` accordingly.
1. Toggle the upgrade related signals
1. Collect all `(para_id, expected_at)` from `UpcomingUpgrades` where `expected_at <= now` and prune them. For each
pruned para, set `UpgradeGoAheadSignal` to `GoAhead`. Reserve weight for the state modification to upgrade each
pruned para.
1. Collect all `(para_id, next_possible_upgrade_at)` from `UpgradeCooldowns` where `next_possible_upgrade_at <= now`.
For each para obtained this way, reserve weight to remove its `UpgradeRestrictionSignal` on finalization.
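A minimal sketch of the two collection steps above, assuming `UpcomingUpgrades` and `UpgradeCooldowns` are modeled as plain vectors of `(ParaId, BlockNumber)` pairs and the go-ahead signals as a `BTreeMap`. The function names are hypothetical, and the count returned by `toggle_go_ahead` is only a proxy for the weight that would be reserved.

```rust
use std::collections::BTreeMap;

type ParaId = u32;
type BlockNumber = u32;

/// Simplified model of the go-ahead signal values.
#[derive(Clone, Copy, Debug, PartialEq)]
enum UpgradeGoAhead {
    Abort,
    GoAhead,
}

/// Prune every entry of `upcoming_upgrades` with `expected_at <= now` and signal `GoAhead`
/// for it. Returns how many paras were pruned (a stand-in for reserved weight).
fn toggle_go_ahead(
    now: BlockNumber,
    upcoming_upgrades: &mut Vec<(ParaId, BlockNumber)>,
    go_ahead: &mut BTreeMap<ParaId, UpgradeGoAhead>,
) -> usize {
    let mut pruned = 0;
    upcoming_upgrades.retain(|&(para, expected_at)| {
        if expected_at <= now {
            go_ahead.insert(para, UpgradeGoAhead::GoAhead);
            pruned += 1;
            false
        } else {
            true
        }
    });
    pruned
}

/// Paras whose cooldown has expired; during initialization they are only collected,
/// and their `UpgradeRestrictionSignal` is removed on finalization.
fn expired_cooldowns(now: BlockNumber, cooldowns: &[(ParaId, BlockNumber)]) -> Vec<ParaId> {
    cooldowns.iter().filter(|(_, at)| *at <= now).map(|(para, _)| *para).collect()
}
```

Splitting the cooldown handling into "collect now, remove on finalization" lets the weight be accounted for at initialization while the signal stays visible for the whole block.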
## Routines
- `schedule_para_initialize(ParaId, ParaGenesisArgs)`: Schedule a para to be initialized at the next session. No-op if
the para is already registered in the system with some `ParaLifecycle`.
- `schedule_para_cleanup(ParaId)`: Schedule a para to be cleaned up after the next full session.
- `schedule_parathread_upgrade(ParaId)`: Schedule a parathread (on-demand parachain) to be upgraded to a parachain.
- `schedule_parachain_downgrade(ParaId)`: Schedule a parachain to be downgraded from lease holding to on-demand.
- `schedule_code_upgrade(ParaId, new_code, relay_parent: BlockNumber, HostConfiguration)`: Schedule a future code
upgrade of the given parachain. If PVF pre-checking is disabled, or the new code is already present in storage, the
upgrade will be applied after inclusion of a block of the same parachain executed in the context of a relay-chain
block with number >= `relay_parent + config.validation_upgrade_delay`. If the upgrade is scheduled,
`UpgradeRestrictionSignal` is set and it will remain set until `relay_parent + config.validation_upgrade_cooldown`.
If PVF pre-checking is enabled and the new code is not already present in storage, then a PVF pre-checking run will
be scheduled for that validation code. If the pre-checking concludes with rejection, the upgrade is canceled.
Otherwise, after pre-checking concludes, the upgrade will be scheduled and enacted as described above.
- `note_new_head(ParaId, HeadData, BlockNumber)`: Note that a para has progressed to a new head, where the new head was
executed in the context of a relay-chain block with the given number; that block number is inserted into the
`MostRecentContext` mapping. This will apply pending code upgrades based on the block number provided. If an upgrade
took place, it will clear the `UpgradeGoAheadSignal`.
- `lifecycle(ParaId) -> Option<ParaLifecycle>`: Return the `ParaLifecycle` of a para.
- `is_parachain(ParaId) -> bool`: Returns true if the para ID references any live lease holding parachain, including
those which may be transitioning to an on-demand parachain in the future.
- `is_parathread(ParaId) -> bool`: Returns true if the para ID references any live parathread (on-demand parachain),
including those which may be transitioning to a lease holding parachain in the future.
- `is_valid_para(ParaId) -> bool`: Returns true if the para ID references either a live on-demand parachain or live
lease holding parachain.
- `can_upgrade_validation_code(ParaId) -> bool`: Returns true if the given para can signal code upgrade right now.
- `pvfs_require_prechecking() -> Vec<ValidationCodeHash>`: Returns the list of PVF validation code hashes that require
PVF pre-checking votes.
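The scheduling decision in `schedule_code_upgrade` can be sketched as a pure function. This is an illustration under assumptions: `HostConfiguration` keeps only the two config items named in the text, `plan_code_upgrade` and `UpgradeScheduling` are hypothetical names, and the storage writes (setting `UpgradeRestrictionSignal`, starting the pre-checking vote) are left out.

```rust
type BlockNumber = u32;

/// Simplified stand-in: only the two config items used by the scheduling rule.
struct HostConfiguration {
    validation_upgrade_delay: BlockNumber,
    validation_upgrade_cooldown: BlockNumber,
}

#[derive(Debug, PartialEq)]
enum UpgradeScheduling {
    /// Applied by the first para block whose relay parent number is >= `expected_at`;
    /// `UpgradeRestrictionSignal` stays set until `restricted_until`.
    Scheduled { expected_at: BlockNumber, restricted_until: BlockNumber },
    /// A PVF pre-checking vote must conclude (and accept) before scheduling.
    AwaitingPrecheck,
}

fn plan_code_upgrade(
    config: &HostConfiguration,
    relay_parent: BlockNumber,
    prechecking_enabled: bool,
    code_already_stored: bool,
) -> UpgradeScheduling {
    if !prechecking_enabled || code_already_stored {
        UpgradeScheduling::Scheduled {
            expected_at: relay_parent + config.validation_upgrade_delay,
            restricted_until: relay_parent + config.validation_upgrade_cooldown,
        }
    } else {
        UpgradeScheduling::AwaitingPrecheck
    }
}
```

The two branches are exact complements: pre-checking is skipped when it is disabled or when the code is already known, and required only when it is enabled and the code is new.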
## Finalization
Collect all `(para_id, next_possible_upgrade_at)` from `UpgradeCooldowns` where `next_possible_upgrade_at <= now` and
prune them. For each pruned para, remove its `UpgradeRestrictionSignal`.
