Convert guide from single markdown file to mdbook (#1247)

* move old implementers' guide, add skeleton of new
* Split the old implementers' guide into the new one's sections

  This is mostly a straightforward copying operation, moving the appropriate sections from the old guide to the new. However, there are certain differences between the old text and the new:

  - removed horizontal rules between the sections
  - promoted headers appropriately within each section
  - deleted certain sections which were in the old guide's ToC but which were not actually present in the old guide.
  - added Peer Set Manager to the new ToC
* remove description headers

  It is redundant and unnecessary. Descriptions fall directly under the top-level header for any given section.
* add stub description of the backing module
* add stub description for the availability module
* add stub description for collators
* add stub description for validity
* add stub description for utility
* highlight TODO and REVIEW comments
* add guide readme describing how to use mdbook
* fix markdownlint lints
* re-title parachains overview
* internal linking for types
* module and subsystem internal links
* .gitignore should have a trailing newline
* node does not have modules, just subsystems
2026-05-31 01:41:03 +00:00 · 2020-06-11 17:04:23 +02:00
parent 41ef46e60b
commit 053bfc2d0c
44 changed files with 1816 additions and 1947 deletions
@@ -0,0 +1,13 @@
+# Node Architecture
+
+## Design Goals
+
+* Modularity: Components of the system should be as self-contained as possible. Communication boundaries between components should be well-defined and mockable. This is key to creating testable, easily reviewable code.
+* Minimizing side effects: Components of the system should aim to minimize side effects and to communicate with other components via message-passing.
+* Operational Safety: The software will be managing signing keys, where conflicting messages can lead to large amounts of value being slashed. Care should be taken to ensure that no messages are signed incorrectly or in conflict with each other.
+
+The architecture of the node-side behavior aims to embody the Rust principles of ownership and message-passing to create clean, isolatable code. Each resource should have a single owner, with minimal sharing where unavoidable.
+
+Many operations that need to be carried out involve the network, which is asynchronous. This asynchrony affects all core subsystems that rely on the network as well. The approach of hierarchical state machines is well-suited to this kind of environment.
+
+We introduce a hierarchy of state machines consisting of an overseer supervising subsystems, where subsystems can contain their own internal hierarchy of jobs. This is elaborated on in the next section on Subsystems.
@@ -0,0 +1,3 @@
+# Availability Subsystems
+
+The availability subsystems are responsible for ensuring that Proofs of Validity of backed candidates are widely available within the validator set, without requiring every node to retain a full copy. They accomplish this by broadly distributing erasure-coded chunks of the PoV, keeping track of which validator has which chunk by means of signed bitfields. They are also responsible for reassembling a complete PoV when required, e.g. when a fisherman reports a potentially invalid block.
@@ -0,0 +1,41 @@
+# Availability Distribution
+
+Distribute availability erasure-coded chunks to validators.
+
+After a candidate is backed, the availability of the PoV block must be confirmed by 2/3+ of all validators. Successfully validating a candidate and contributing to its backing leads to the PoV and its erasure-coding being stored in the [Availability Store](/node/utility/availability-store.html).
+
+## Protocol
+
+`ProtocolId`: `b"avad"`
+
+Input:
+
+- `NetworkBridgeUpdate(update)`
+
+Output:
+
+- `NetworkBridge::RegisterEventProducer(ProtocolId)`
+- `NetworkBridge::SendMessage([PeerId], ProtocolId, Bytes)`
+- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)`
+- `AvailabilityStore::QueryPoV(candidate_hash, response_channel)`
+- `AvailabilityStore::StoreChunk(candidate_hash, chunk_index, inclusion_proof, chunk_data)`
+
+## Functionality
+
+On startup, register an event producer with `NetworkBridge::RegisterEventProducer`.
+
+For each relay-parent in our local view update, look at all backed candidates pending availability. Gossip to peers all erasure chunks that we have for those candidates.
+
+We define an operation `live_candidates(relay_heads) -> Set<AbridgedCandidateReceipt>` which, given a set of relay-chain heads, returns the set of candidates whose availability chunks should currently be gossiped. This is defined as all candidates pending availability in any of those relay-chain heads or any of their last `K` ancestors. We assume that state is not pruned within `K` blocks of the chain-head.
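
A minimal sketch of `live_candidates`, assuming the node can look up each block's parent and the candidates pending availability at each block. `Hash`, `CandidateHash`, and `ChainView` are hypothetical stand-ins for the real types and runtime queries, and `K` is passed in as `k` because its value is still open (see the TODO at the end of this section).

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for relay-chain block hashes and candidate identifiers.
type Hash = u64;
type CandidateHash = u64;

// Hypothetical view of relay-chain state: parent links and the candidates
// pending availability at each block (runtime queries in a real node).
struct ChainView {
    parents: HashMap<Hash, Hash>,
    pending_availability: HashMap<Hash, Vec<CandidateHash>>,
}

impl ChainView {
    /// All candidates pending availability in any of `relay_heads`
    /// or in any of their last `k` ancestors.
    fn live_candidates(&self, relay_heads: &[Hash], k: usize) -> HashSet<CandidateHash> {
        let mut live = HashSet::new();
        for &head in relay_heads {
            let mut block = Some(head);
            // Visit the head itself plus up to `k` ancestors.
            for _ in 0..=k {
                let Some(b) = block else { break };
                if let Some(pending) = self.pending_availability.get(&b) {
                    live.extend(pending.iter().copied());
                }
                block = self.parents.get(&b).copied();
            }
        }
        live
    }
}
```

Walking parent links per head keeps the sketch simple; heads that share ancestors are visited more than once, which a real implementation might avoid.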
+
+We will send any erasure chunks that correspond to candidates in `live_candidates(peer_most_recent_view_update)`. Likewise, we only accept and forward messages pertaining to a candidate in `live_candidates(current_heads)`. Each erasure chunk should be accompanied by a Merkle proof that it is committed to by the erasure trie root in the candidate receipt, and this gossip system is responsible for checking such proofs.
+
+We re-attempt to send anything live to a peer upon any view update from that peer.
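
One way to implement the re-send rule, sketched under assumptions: the node remembers, per peer, which `(candidate, chunk index)` pairs it has already sent, and on each view update from that peer sends every chunk it holds for a candidate in the peer's live set that has not been sent yet. The de-duplication is an assumption of this sketch rather than something the text specifies, and Merkle proofs and the actual `NetworkBridge::SendMessage` call are omitted. `PeerId`, `CandidateHash`, `ChunkIndex`, and `ChunkGossip` are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = u32;        // hypothetical
type CandidateHash = u64; // hypothetical
type ChunkIndex = u32;    // hypothetical

#[derive(Default)]
struct ChunkGossip {
    // Chunks we hold locally, keyed by candidate.
    our_chunks: HashMap<CandidateHash, Vec<ChunkIndex>>,
    // Chunks already sent to each peer, so re-attempts do not resend them.
    sent: HashMap<PeerId, HashSet<(CandidateHash, ChunkIndex)>>,
}

impl ChunkGossip {
    /// Called on every view update from `peer`; `peer_live` is
    /// `live_candidates(peer_most_recent_view_update)`. Returns the chunks to send now.
    fn on_peer_view_update(
        &mut self,
        peer: PeerId,
        peer_live: &HashSet<CandidateHash>,
    ) -> Vec<(CandidateHash, ChunkIndex)> {
        let sent = self.sent.entry(peer).or_default();
        let mut to_send = Vec::new();
        for (candidate, chunks) in &self.our_chunks {
            if !peer_live.contains(candidate) {
                continue;
            }
            for &index in chunks {
                // `insert` returns true only for chunks not yet sent to this peer.
                if sent.insert((*candidate, index)) {
                    to_send.push((*candidate, index));
                }
            }
        }
        to_send.sort(); // deterministic order
        to_send
    }
}
```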
+
+On our view change, for all live candidates, we will check if we have the PoV by issuing a `QueryPoV` message and waiting for the response. If the query returns `Some`, we will perform the erasure-coding and distribute all messages to peers that will accept them.
+
+If we are operating as a validator, we note our index `i` in the validator set and keep the `i`th availability chunk for any live candidate as we receive it. We keep the chunk and its Merkle proof in the [Availability Store](/node/utility/availability-store.html) by sending a `StoreChunk` command. This includes chunks and proofs generated as the result of a successful `QueryPoV`.
+
+> TODO: back-and-forth is kind of ugly but drastically simplifies the pruning in the availability store, as it creates an invariant that chunks are only stored if the candidate was actually backed
+>
+> K=3?
@@ -0,0 +1,25 @@
+# Bitfield Distribution
+
+Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based on a 2/3+ quorum.
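
As an illustration of the quorum check, the sketch below counts, for each bit position, how many validators set it and compares that against the validator count. It assumes bitfields have already been de-duplicated to one per validator and reads "2/3+" as strictly more than two-thirds; both are assumptions, and `Bitfield` and `available_by_quorum` are hypothetical names.

```rust
/// Hypothetical bitfield payload: one bit per candidate slot.
type Bitfield = Vec<bool>;

/// For each bit position, whether strictly more than 2/3 of `n_validators`
/// have set that bit across the received bitfields.
fn available_by_quorum(bitfields: &[Bitfield], n_validators: usize) -> Vec<bool> {
    let width = bitfields.iter().map(|b| b.len()).max().unwrap_or(0);
    (0..width)
        .map(|i| {
            let votes = bitfields
                .iter()
                .filter(|b| b.get(i).copied().unwrap_or(false))
                .count();
            // votes > 2/3 * n, computed in integers to avoid rounding.
            votes * 3 > n_validators * 2
        })
        .collect()
}
```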
+
+## Protocol
+
+`ProtocolId`: `b"bitd"`
+
+Input:
+
+- `DistributeBitfield(relay_parent, SignedAvailabilityBitfield)`: distribute a bitfield via gossip to other validators.
+- `NetworkBridgeUpdate(NetworkBridgeUpdate)`
+
+Output:
+
+- `NetworkBridge::RegisterEventProducer(ProtocolId)`
+- `NetworkBridge::SendMessage([PeerId], ProtocolId, Bytes)`
+- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)`
+- `BlockAuthorshipProvisioning::Bitfield(relay_parent, SignedAvailabilityBitfield)`
+
+## Functionality
+
+This is implemented as a gossip system. Register a [network bridge](/node/utility/network-bridge.html) event producer on startup and track peer connection, view change, and disconnection events. Only accept bitfields relevant to our current view and only distribute bitfields to other peers when relevant to their most recent view. Check bitfield signatures in this subsystem and accept and distribute only one bitfield per validator.
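
The acceptance rule can be sketched as a small tracker. This is a sketch under assumptions: signature checking is assumed to have happened before `accept` is called, and relay-parents are identified by a hypothetical `u64` hash. `BitfieldTracker`, `accept`, and `set_view` are illustrative names, not the subsystem's actual API.

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;           // hypothetical relay-parent hash
type ValidatorIndex = u32; // hypothetical

#[derive(Default)]
struct BitfieldTracker {
    // Relay-parents currently in our view.
    view: HashSet<Hash>,
    // Validators we already accepted a bitfield from, per relay-parent.
    received: HashMap<Hash, HashSet<ValidatorIndex>>,
}

impl BitfieldTracker {
    /// True if the (already signature-checked) bitfield should be accepted and
    /// forwarded: its relay-parent is in our view and it is the first one we have
    /// seen from this validator for that relay-parent.
    fn accept(&mut self, relay_parent: Hash, validator: ValidatorIndex) -> bool {
        if !self.view.contains(&relay_parent) {
            return false;
        }
        self.received.entry(relay_parent).or_default().insert(validator)
    }

    /// On a view change, forget state for relay-parents that left the view.
    fn set_view(&mut self, new_view: HashSet<Hash>) {
        self.received.retain(|rp, _| new_view.contains(rp));
        self.view = new_view;
    }
}
```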
+
+When receiving a bitfield either from the network or from a `DistributeBitfield` message, forward it along to the block authorship (provisioning) subsystem for potential inclusion in a block.
@@ -0,0 +1,25 @@
+# Bitfield Signing
+
+Validators vote on the availability of a backed candidate by issuing signed bitfields, where each bit corresponds to a single candidate. These bitfields can be used to compactly determine which backed candidates are available or not based on a 2/3+ quorum.
+
+## Protocol
+
+Output:
+
+- `BitfieldDistribution::DistributeBitfield`: distribute a locally signed bitfield
+- `AvailabilityStore::QueryChunk(CandidateHash, validator_index, response_channel)`
+
+## Functionality
+
+When a new relay-chain head is activated by `StartWork`, launch a bitfield signing job for that head. Stop the job on `StopWork`.
+
+## Bitfield Signing Job
+
+Localized to a specific relay-parent `r`. If not running as a validator, do nothing.
+
+- Determine our validator index `i`, the set of backed candidates pending availability in `r`, and which bit of the bitfield each corresponds to.
+- > TODO: wait T time for availability distribution?
+- Start with an empty bitfield. For each bit in the bitfield, if there is a candidate pending availability, query the [Availability Store](/node/utility/availability-store.html) for whether we have the availability chunk for our validator index.
+- For all chunks we have, set the corresponding bit in the bitfield.
+- Sign the bitfield and dispatch a `BitfieldDistribution::DistributeBitfield` message.
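
The bitfield construction steps can be sketched as a pure function. It assumes the job already knows, for each bit, which candidate (if any) is pending availability, and models the Availability Store `QueryChunk` round-trip as a closure; signing and dispatch are left out. `build_bitfield` and `have_chunk` are hypothetical names.

```rust
/// Hypothetical candidate identifier.
type CandidateHash = u64;

/// Build the unsigned bitfield for a relay-parent.
/// `pending[i]` is the candidate pending availability on bit `i`, if any;
/// `have_chunk` stands in for the Availability Store `QueryChunk` round-trip.
fn build_bitfield<F>(pending: &[Option<CandidateHash>], validator_index: u32, mut have_chunk: F) -> Vec<bool>
where
    F: FnMut(CandidateHash, u32) -> bool,
{
    pending
        .iter()
        .map(|slot| match slot {
            // Set the bit only if we hold our own chunk for that candidate.
            Some(candidate) => have_chunk(*candidate, validator_index),
            // No candidate pending on this bit: leave it unset.
            None => false,
        })
        .collect()
}
```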
@@ -0,0 +1,10 @@
+# Backing Subsystems
+
+The backing subsystems, when conceived as a black box, receive an arbitrary quantity of parablock candidates and associated proofs of validity from arbitrary untrusted collators. From these, they produce a bounded quantity of backable candidates which relay chain block authors may choose to include in a subsequent block.
+
+In broad strokes, the flow operates like this:
+
+- **Candidate Selection** winnows the field of parablock candidates, selecting up to one of them to second.
+- **Candidate Backing** ensures that a candidate selected for seconding is valid, then generates the appropriate `Statement`. It also keeps track of which candidates have received the backing of a quorum of other validators.
+- **Statement Distribution** is the networking component which ensures that all validators receive each others' statements.
+- **PoV Distribution** is the networking component which ensures that validators considering a candidate can get the appropriate PoV.
@@ -0,0 +1,92 @@
+# Candidate Backing
+
+The Candidate Backing subsystem ensures every parablock considered for relay block inclusion has been seconded by at least one validator, and approved by a quorum. Parablocks for which no validator will assert correctness are discarded. If the block later proves invalid, the initial backers are slashable; this gives Polkadot a rational threat model during subsequent stages.
+
+Its role is to produce backable candidates for inclusion in new relay-chain blocks. It does so by issuing signed [`Statement`s](/type-definitions.html#statement-type) and tracking received statements signed by other validators. Once enough statements are received, they can be combined into backing for specific candidates.
+
+Note that though the candidate backing subsystem attempts to produce as many backable candidates as possible, it does _not_ attempt to choose a single authoritative one. The choice of which actually gets included is ultimately up to the block author, by whatever metrics it may use; those are opaque to this subsystem.
+
+Once a sufficient quorum has agreed that a candidate is valid, this subsystem notifies the [Overseer](/node/overseer.html), which in turn engages block production mechanisms to include the parablock.
+
+## Protocol
+
+The [Candidate Selection subsystem](/node/backing/candidate-selection.html) is the primary source of non-overseer messages into this subsystem. That subsystem generates appropriate [`CandidateBackingMessage`s](/type-definitions.html#candidate-backing-message), and passes them to this subsystem.
+
+This subsystem validates the candidates and generates an appropriate [`Statement`](/type-definitions.html#statement-type). All `Statement`s are then passed on to the [Statement Distribution subsystem](/node/backing/statement-distribution.html) to be gossiped to peers. When this subsystem decides that a candidate is invalid, and it was recommended to us to second by our own Candidate Selection subsystem, a message is sent to the Candidate Selection subsystem with the candidate's hash so that the collator which recommended it can be penalized.
+
+## Functionality
+
+The subsystem should maintain a set of handles to Candidate Backing Jobs that are currently live, as well as the relay-parent to which they correspond.
+
+### On Overseer Signal
+
+* If the signal is an [`OverseerSignal`](/type-definitions.html#overseer-signal)`::StartWork(relay_parent)`, spawn a Candidate Backing Job with the given relay parent, storing a bidirectional channel with the Candidate Backing Job in the set of handles.
+* If the signal is an [`OverseerSignal`](/type-definitions.html#overseer-signal)`::StopWork(relay_parent)`, cease the Candidate Backing Job under that relay parent, if any.
+
+### On `CandidateBackingMessage`
+
+* If the message corresponds to a particular relay-parent, forward the message to the Candidate Backing Job for that relay-parent, if any is live.
+
+> big TODO: "contextual execution"
+>
+> * At the moment we only allow inclusion of _new_ parachain candidates validated by _current_ validators.
+> * Allow inclusion of _old_ parachain candidates validated by _current_ validators.
+> * Allow inclusion of _old_ parachain candidates validated by _old_ validators.
+>
+> This will probably blur the lines between jobs, will probably require inter-job communication and a short-term memory of recently backable, but not backed candidates.
+
+## Candidate Backing Job
+
+The Candidate Backing Job represents the work a node does for backing candidates with respect to a particular relay-parent.
+
+The goal of a Candidate Backing Job is to produce as many backable candidates as possible. This is done via signed [`Statement`s](/type-definitions.html#statement-type) by validators. If a candidate receives a majority of supporting Statements from the Parachain Validators currently assigned, then that candidate is considered backable.
+
+### On Startup
+
+* Fetch the current validator set and the validator -> parachain assignments from the runtime API.
+* Determine if the node controls a key in the current validator set. Call this the local key if so.
+* If the local key exists, extract the parachain head and validation function for the parachain the local key is assigned to.
+
+### On Receiving New Signed Statement
+
+```rust
+if let Statement::Seconded(candidate) = signed.statement {
+  if candidate is unknown and in local assignment {
+    spawn_validation_work(candidate, parachain head, validation function)
+  }
+}
+
+// add `Seconded` statements and `Valid` statements to a quorum. If quorum reaches validator-group
+// majority, send a `BlockAuthorshipProvisioning::BackableCandidate(relay_parent, Candidate, Backing)` message.
+```
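
The quorum bookkeeping described in the comment can be sketched as a table from candidate to the set of validators that issued `Seconded` or `Valid` statements for it. This is a minimal sketch: the simplified `Statement` enum, `BackingTable`, and the strict-majority threshold over the assigned group are assumptions, and signature checks and validation are assumed to happen elsewhere.

```rust
use std::collections::{HashMap, HashSet};

type CandidateHash = u64;  // hypothetical
type ValidatorIndex = u32; // hypothetical

// Simplified stand-in for the real `Statement` type.
enum Statement {
    Seconded(CandidateHash),
    Valid(CandidateHash),
    Invalid(CandidateHash),
}

struct BackingTable {
    group_size: usize,
    support: HashMap<CandidateHash, HashSet<ValidatorIndex>>,
}

impl BackingTable {
    fn new(group_size: usize) -> Self {
        Self { group_size, support: HashMap::new() }
    }

    /// Record a statement from `from`. Returns `Some(candidate)` the first time
    /// that candidate's supporters form a strict majority of the group.
    fn import(&mut self, from: ValidatorIndex, statement: Statement) -> Option<CandidateHash> {
        let candidate = match statement {
            Statement::Seconded(c) | Statement::Valid(c) => c,
            // `Invalid` is not support; dispute handling is out of scope here.
            Statement::Invalid(_) => return None,
        };
        let supporters = self.support.entry(candidate).or_default();
        let was_backable = supporters.len() * 2 > self.group_size;
        supporters.insert(from); // duplicate votes from one validator count once
        let is_backable = supporters.len() * 2 > self.group_size;
        (is_backable && !was_backable).then_some(candidate)
    }
}
```

Reporting only on the transition to backable means the caller would send `BlockAuthorshipProvisioning::BackableCandidate` once per candidate rather than on every additional vote.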
+
+### Spawning Validation Work
+
+```rust
+fn spawn_validation_work(candidate, parachain head, validation function) {
+  asynchronously {
+    let pov = (fetch pov block).await
+
+    // dispatched to sub-process (OS process) pool.
+    let valid = validate_candidate(candidate, validation function, parachain head, pov).await;
+    if valid {
+      // make PoV available for later distribution. Send data to the availability store to keep.
+      // sign and dispatch `valid` statement to network if we have not seconded the given candidate.
+    } else {
+      // sign and dispatch `invalid` statement to network.
+    }
+  }
+}
+```
+
+### Fetch PoV Block
+
+Create a `(sender, receiver)` pair.
+Dispatch a `PovFetchSubsystemMessage(relay_parent, candidate_hash, sender)` and listen on the receiver for a response.
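
The request/response pattern above can be sketched with the standard library's `std::sync::mpsc` channels standing in for the node's message types; an async node would more likely use a oneshot channel for the response. The fields of `PovFetchSubsystemMessage`, `fetch_pov`, and the in-memory `serve` responder are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

type Hash = u64;          // hypothetical
type CandidateHash = u64; // hypothetical
type PoV = Vec<u8>;       // hypothetical

/// Hypothetical shape of `PovFetchSubsystemMessage(relay_parent, candidate_hash, sender)`.
struct PovFetchSubsystemMessage {
    relay_parent: Hash,
    candidate_hash: CandidateHash,
    sender: Sender<Option<PoV>>,
}

/// Job side: create a `(sender, receiver)` pair, dispatch the request, wait on the receiver.
fn fetch_pov(
    to_subsystem: &Sender<PovFetchSubsystemMessage>,
    relay_parent: Hash,
    candidate_hash: CandidateHash,
) -> Option<PoV> {
    let (sender, receiver) = channel();
    to_subsystem
        .send(PovFetchSubsystemMessage { relay_parent, candidate_hash, sender })
        .ok()?;
    // If the responder drops the sender without answering, treat it as "no PoV".
    receiver.recv().ok().flatten()
}

/// Responder side, standing in for whichever subsystem serves PoVs.
fn serve(requests: Receiver<PovFetchSubsystemMessage>, store: HashMap<(Hash, CandidateHash), PoV>) {
    for req in requests {
        let pov = store.get(&(req.relay_parent, req.candidate_hash)).cloned();
        // The requester may have given up; ignore send errors.
        let _ = req.sender.send(pov);
    }
}
```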
+
+### On Receiving `CandidateBackingMessage`
+
+* If the message is a `CandidateBackingMessage::RegisterBackingWatcher`, register the watcher and trigger it each time a new candidate is backable. Also trigger it once initially if there are any backable candidates at the time of receipt.
+* If the message is a `CandidateBackingMessage::Second`, sign and dispatch a `Seconded` statement only if we have not seconded any other candidate and have not signed a `Valid` statement for the requested candidate. Signing both a `Seconded` and a `Valid` statement for the same candidate is a double-voting misbehavior with a heavy penalty, and this could occur if another validator has seconded the same candidate and we've received their message before the internal seconding request.
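
The guard against double-voting can be sketched as local bookkeeping consulted before signing. This assumes a single job per relay-parent holding its own state; `LocalVotes`, `try_second`, and `try_sign_valid` are hypothetical names.

```rust
use std::collections::HashSet;

type CandidateHash = u64; // hypothetical

#[derive(Default)]
struct LocalVotes {
    // The one candidate this job has seconded, if any.
    seconded: Option<CandidateHash>,
    // Candidates for which this job has signed a `Valid` statement.
    signed_valid: HashSet<CandidateHash>,
}

impl LocalVotes {
    /// On `CandidateBackingMessage::Second(candidate)`: true if we may sign
    /// `Seconded(candidate)`; records the decision.
    fn try_second(&mut self, candidate: CandidateHash) -> bool {
        if self.seconded.is_some() || self.signed_valid.contains(&candidate) {
            return false;
        }
        self.seconded = Some(candidate);
        true
    }

    /// Before signing `Valid(candidate)`: never for a candidate we seconded
    /// ourselves, and never twice for the same candidate.
    fn try_sign_valid(&mut self, candidate: CandidateHash) -> bool {
        if self.seconded == Some(candidate) {
            return false;
        }
        self.signed_valid.insert(candidate)
    }
}
```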
+
+> TODO: send statements to Statement Distribution subsystem, handle shutdown signal from candidate backing subsystem
@@ -0,0 +1,39 @@
+# Candidate Selection
+
+The Candidate Selection Subsystem is run by validators, and is responsible for interfacing with Collators to select a candidate, along with its PoV, to second during the backing process relative to a specific relay parent.
+
+This subsystem includes networking code for communicating with collators, and tracks which collations specific collators have submitted. This subsystem is responsible for disconnecting and blacklisting collators whose collations have been found invalid by other subsystems.
+
+This subsystem is only ever interested in parablocks assigned to the particular parachain which this validator is currently handling.
+
+New parablock candidates may arrive from a potentially unbounded set of collators. This subsystem chooses either 0 or 1 of them per relay parent to second. If it chooses to second a candidate, it sends an appropriate message to the [Candidate Backing subsystem](/node/backing/candidate-backing.html) to generate an appropriate [`Statement`](/type-definitions.html#statement-type).
+
+In the event that a parablock candidate proves invalid, this subsystem will receive a message back from the Candidate Backing subsystem indicating so. If that parablock candidate originated from a collator, this subsystem will blacklist that collator. If that parablock candidate originated from a peer, this subsystem generates a report for the [Misbehavior Arbitration subsystem](/node/utility/misbehavior-arbitration.html).
+
+## Protocol
+
+Input: None
+
+Output:
+
+- Validation requests to Validation subsystem
+- [`CandidateBackingMessage`](/type-definitions.html#candidate-backing-message)`::Second`
+- Peer set manager: report peers (collators who have misbehaved)
+
+## Functionality
+
+An overarching network protocol, plus a job for every relay-parent.
+
+> TODO The Candidate Selection network protocol is currently intentionally unspecified pending further discussion.
+
+Several approaches have been considered, but all have some issues:
+
+- The most straightforward approach is for this subsystem to simply second the first valid parablock candidate which it sees per relay head. However, that protocol is vulnerable to a single collator which, as an attack or simply through chance, gets its block candidate to the node more often than its fair share of the time.
+- It may be possible to do some BABE-like selection algorithm to choose an "Official" collator for the round, but that is tricky because the collator which produces the PoV does not necessarily actually produce the block.
+- We could use relay-chain BABE randomness to generate some delay `D` on the order of 1 second, ± 1 second. The validator would then second the first valid parablock which arrives after `D`, or, in case none has arrived by `2*D`, the last valid parablock which has arrived. This makes it very hard for a collator to game the system to always get its block nominated, but it reduces the maximum throughput of the system by introducing delay into an already tight schedule.
+- A variation of that scheme would be to randomly choose a number `I`, and have a fixed acceptance window `D` for parablock candidates. At the end of the period `D`, count `C`: the number of parablock candidates received. Second the one with index `I % C`. Its drawback is the same: it must wait the full `D` period before seconding any of its received candidates, reducing throughput.
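
The `I % C` variant can be sketched as follows, assuming `random` is already derived from relay-chain randomness and `candidates` holds the valid candidates received during the window, in arrival order. `select_after_window` is a hypothetical name; the guard for `C = 0` avoids a division by zero when nothing arrived.

```rust
/// Pick the candidate to second at the end of the acceptance window `D`.
/// `candidates` are the valid candidates received during the window (arrival
/// order) and `random` plays the role of `I`.
fn select_after_window<T>(candidates: &[T], random: u64) -> Option<&T> {
    let count = candidates.len() as u64; // C
    if count == 0 {
        return None; // nothing to second this round
    }
    candidates.get((random % count) as usize)
}
```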
+
+## Candidate Selection Job
+
+- Aware of validator key and assignment
+- One job for each relay-parent, which selects up to one collation for the Candidate Backing Subsystem
@@ -0,0 +1,13 @@
+# PoV Distribution
+
+This subsystem is responsible for distributing PoV blocks. For now, it is unified with the [Statement Distribution subsystem](/node/backing/statement-distribution.html).
+
+## Protocol
+
+Handle requests for PoV block by candidate hash and relay-parent.
+
+## Functionality
+
+Implemented as a gossip system, where `PoV`s are not accepted unless we know a `Seconded` message.
+
+> TODO: this requires a lot of cross-contamination with statement distribution even if we don't implement this as a gossip system. In a point-to-point implementation, we still have to know _who to ask_, which means tracking who's submitted `Seconded`, `Valid`, or `Invalid` statements, by validator and by peer. One approach is to have the Statement gossip system just send us this information, and then we can separate the systems from the beginning instead of combining them.
@@ -0,0 +1,54 @@
+# Statement Distribution
+
+The Statement Distribution Subsystem is responsible for distributing statements about seconded candidates between validators.
+
+## Protocol
+
+`ProtocolId`: `b"stmd"`
+
+Input:
+
+- `NetworkBridgeUpdate(update)`
+
+Output:
+
+- `NetworkBridge::RegisterEventProducer(ProtocolId)`
+- `NetworkBridge::SendMessage([PeerId], ProtocolId, Bytes)`
+- `NetworkBridge::ReportPeer(PeerId, cost_or_benefit)`
+
+## Functionality
+
+Implemented as a gossip protocol. Register a network event producer on startup. Handle updates to our view and peers' views. Neighbor packets are used to inform peers which chain heads we are interested in data for.
+
+Statement Distribution is the only backing subsystem which has any notion of peer nodes, who are any full nodes on the network. Validators will also act as peer nodes.
+
+It is responsible for signing statements that we have generated and forwarding them, and for detecting a variety of Validator misbehaviors for reporting to [Misbehavior Arbitration](/node/utility/misbehavior-arbitration.html). During the Backing stage of the inclusion pipeline, it's the main point of contact with peer nodes, who distribute statements by validators. On receiving a signed statement from a peer, assuming the peer receipt state machine is in an appropriate state, it sends the Candidate Receipt to the [Candidate Backing subsystem](/node/backing/candidate-backing.html) to handle the validator's statement.
+
+Track equivocating validators and stop accepting information from them. Forward double-vote proofs to the double-vote reporting system. Establish a data-dependency order:
+
+- In order to receive a `Seconded` message, we must have the corresponding chain head in our view.
+- In order to receive an `Invalid` or `Valid` message we must have received the corresponding `Seconded` message.
+
+And respect this data-dependency order from our peers by respecting their views. This subsystem is responsible for checking message signatures.
+
+The Statement Distribution subsystem sends statements to peer nodes and detects double-voting by validators. When validators conflict with each other or themselves, the Misbehavior Arbitration system is notified.
+
+## Peer Receipt State Machine
+
+There is a very simple state machine which governs which messages we are willing to receive from peers. Not depicted in the state machine: on initial receipt of any [`SignedStatement`](/type-definitions.html#signed-statement-type), validate that the provided signature does in fact sign the included data. Note that each individual parablock candidate gets its own instance of this state machine; it is perfectly legal to receive a `Valid(X)` before a `Seconded(Y)`, as long as a `Seconded(X)` has been received.
+
+- A: Initial State. Receive `SignedStatement(Statement::Seconded)`: extract `Statement`, forward to Candidate Backing, proceed to B. Receive any other `SignedStatement` variant: drop it.
+- B: Receive any `SignedStatement`: extract `Statement`, forward to Candidate Backing. Receive `OverseerSignal::StopWork`: proceed to C.
+- C: Receive any message for this block: drop it.
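
The three states can be expressed as a transition function. This is a sketch under assumptions: inputs are simplified to the statement kind (signatures already checked), `StopWork` while still in state A is not specified above and is assumed here to move to C, and `ReceiptState`, `Input`, and `step` are hypothetical names.

```rust
/// Per-candidate receipt state, following states A, B, and C.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReceiptState {
    AwaitingSeconded, // A
    Accepting,        // B
    Stopped,          // C
}

/// Simplified inputs; signature checks are assumed to have passed already.
enum Input {
    Seconded,
    Valid,
    Invalid,
    StopWork,
}

/// Returns the next state and whether the statement is forwarded to Candidate Backing.
fn step(state: ReceiptState, input: &Input) -> (ReceiptState, bool) {
    use ReceiptState::*;
    match (state, input) {
        (AwaitingSeconded, Input::Seconded) => (Accepting, true),
        // Assumption: not specified in the text.
        (AwaitingSeconded, Input::StopWork) => (Stopped, false),
        // Any other statement before `Seconded` is dropped.
        (AwaitingSeconded, _) => (AwaitingSeconded, false),
        (Accepting, Input::StopWork) => (Stopped, false),
        (Accepting, _) => (Accepting, true),
        (Stopped, _) => (Stopped, false),
    }
}
```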
+
+## Peer Knowledge Tracking
+
+The peer receipt state machine implies that for parsimony of network resources, we should model the knowledge of our peers, and help them out. For example, let's consider a case with peers A, B, and C, validators X and Y, and candidate M. A sends us a `Statement::Seconded(M)` signed by X. We've double-checked it, and it's valid. While we're checking it, we receive a copy of X's `Statement::Seconded(M)` from B, along with a `Statement::Valid(M)` signed by Y.
+
+Our response to A is just the `Statement::Valid(M)` signed by Y. However, we haven't heard anything about this from C. Therefore, we send it everything we have: first a copy of X's `Statement::Seconded`, then Y's `Statement::Valid`.
+
+This system implies a certain level of duplication of messages (we received X's `Statement::Seconded` from both A and B, and C may experience the same), but it minimizes the degree to which messages are simply dropped.
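
The peer-knowledge bookkeeping in this example can be sketched as a per-peer set of known statements. Validators and peers are single characters purely for illustration, and the simplified `Statement`, `PeerKnowledge`, `note_known`, and `to_send` are hypothetical names. Ordering `Seconded` statements first follows the peer receipt state machine, so the peer will not drop what follows.

```rust
use std::collections::{HashMap, HashSet};

type PeerId = char;      // hypothetical
type ValidatorId = char; // hypothetical

// Simplified statement: the payload is a candidate identifier.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Statement {
    Seconded(u64),
    Valid(u64),
}

// A statement together with the validator that signed it.
type Signed = (ValidatorId, Statement);

#[derive(Default)]
struct PeerKnowledge {
    known: HashMap<PeerId, HashSet<Signed>>,
}

impl PeerKnowledge {
    /// Note that `peer` already has `stmt` (e.g. it sent it to us).
    fn note_known(&mut self, peer: PeerId, stmt: Signed) {
        self.known.entry(peer).or_default().insert(stmt);
    }

    /// Everything in `ours` that `peer` is not known to have, `Seconded` first;
    /// the returned statements are then marked as known to that peer.
    fn to_send(&mut self, peer: PeerId, ours: &[Signed]) -> Vec<Signed> {
        let known = self.known.entry(peer).or_default();
        let mut out: Vec<Signed> = ours.iter().copied().filter(|s| !known.contains(s)).collect();
        // Stable sort: `Seconded` (key false) before everything else (key true).
        out.sort_by_key(|(_, stmt)| !matches!(stmt, Statement::Seconded(_)));
        known.extend(out.iter().copied());
        out
    }
}
```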
+
+There are no jobs: `StartWork` and `StopWork` pulses are used to control neighbor packets and what we are currently accepting.
@@ -0,0 +1,3 @@
+# Collators
+
+Collators are special nodes which bridge a parachain to the relay chain. They are simultaneously full nodes of the parachain, and at least light clients of the relay chain. Their overall contribution to the system is the generation of Proofs of Validity for parachain candidates.
@@ -0,0 +1,9 @@
+# Collation Distribution
+
+> TODO
+
+## Protocol
+
+## Functionality
+
+## Jobs, if any
@@ -0,0 +1,9 @@
+# Collation Generation
+
+> TODO
+
+## Protocol
+
+## Functionality
+
+## Jobs, if any
@@ -0,0 +1,92 @@
+# Overseer
+
+The overseer is responsible for these tasks:
+
+1. Setting up, monitoring, and handling failure for overseen subsystems.
+1. Providing a "heartbeat" of which relay-parents subsystems should be working on.
+1. Acting as a message bus between subsystems.
+
+The hierarchy of subsystems:
+
+```text
+--------------+      +------------------+    +--------------------+
+|              |      |                  |---->   Subsystem A      |
+| Block Import |      |                  |    +--------------------+
+|    Events    |------>                  |    +--------------------+
+--------------+      |                  |---->   Subsystem B      |
+                      |   Overseer       |    +--------------------+
+--------------+      |                  |    +--------------------+
+|              |      |                  |---->   Subsystem C      |
+| Finalization |------>                  |    +--------------------+
+|    Events    |      |                  |    +--------------------+
+|              |      |                  |---->   Subsystem D      |
+--------------+      +------------------+    +--------------------+
+
+```
+
+The overseer determines work to do based on block import events and block finalization events. It does this by keeping track of the set of relay-parents for which work is currently being done. This is known as the "active leaves" set. It determines an initial set of active leaves on startup based on the data on-disk, and uses events about blockchain import to update the active leaves. Updates lead to [`OverseerSignal`](/type-definitions.html#overseer-signal)`::StartWork` being sent for new relay-parents and [`OverseerSignal`](/type-definitions.html#overseer-signal)`::StopWork` being sent for relay-parents that should no longer be considered. Block import events inform the overseer of leaves that no longer need to be built on, now that they have children, and inform us to begin building on those children. Block finalization events inform us when we can stop focusing on blocks that appear to have been orphaned.
+
+The overseer's logic can be described with these functions:
+
+## On Startup
+
+* Start all subsystems
+* Determine all blocks of the blockchain that should be built on. This should typically be the head of the best fork of the chain we are aware of. Sometimes add recent forks as well.
+* For each of these blocks, send an `OverseerSignal::StartWork` to all subsystems.
+* Begin listening for block import and finality events
+
+## On Block Import Event
+
+* Apply the block import event to the active leaves. A new block should lead to its addition to the active leaves set and its parent being deactivated.
+* For any deactivated leaves send an `OverseerSignal::StopWork` message to all subsystems.
+* For any activated leaves send an `OverseerSignal::StartWork` message to all subsystems.
+* Ensure all `StartWork` messages are flushed before resuming activity as a message router.
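
The import steps can be sketched as an update to the active-leaves set that returns the signals to broadcast. This is a simplified sketch: `Hash` is a hypothetical `u64`, the on-disk initialization is omitted, and emitting `StopWork` before `StartWork` within one import is an assumption of the sketch.

```rust
use std::collections::BTreeSet;

type Hash = u64; // hypothetical

#[derive(Debug, PartialEq, Eq)]
enum OverseerSignal {
    StartWork(Hash),
    StopWork(Hash),
}

#[derive(Default)]
struct ActiveLeaves {
    leaves: BTreeSet<Hash>,
}

impl ActiveLeaves {
    /// Apply a block import: the new block becomes an active leaf and its
    /// parent is deactivated. Returns the signals to send to all subsystems.
    fn on_block_import(&mut self, block: Hash, parent: Hash) -> Vec<OverseerSignal> {
        let mut signals = Vec::new();
        if self.leaves.remove(&parent) {
            signals.push(OverseerSignal::StopWork(parent));
        }
        if self.leaves.insert(block) {
            signals.push(OverseerSignal::StartWork(block));
        }
        signals
    }
}
```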
+
+> TODO: in the future, we may want to avoid building on too many sibling blocks at once. the notion of a "preferred head" among many competing sibling blocks would imply changes in our "active leaves" update rules here
+
+## On Finalization Event
+
+* Note the height `h` of the newly finalized block `B`.
+* Prune all leaves from the active leaves which have height `<= h` and are not `B`.
+* Issue `OverseerSignal::StopWork` for all deactivated leaves.
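
A sketch of finalization pruning, assuming each active leaf's height is stored alongside it (the `BTreeMap` representation, `ActiveLeaves`, and `on_finalized` are hypothetical). The returned leaves are the ones that would receive `OverseerSignal::StopWork`.

```rust
use std::collections::BTreeMap;

type Hash = u64; // hypothetical

/// Active leaves mapped to their block heights.
struct ActiveLeaves {
    leaves: BTreeMap<Hash, u32>,
}

impl ActiveLeaves {
    /// On finalization of `finalized` at height `h`, deactivate every leaf
    /// with height <= h other than `finalized` itself. Returns the pruned leaves.
    fn on_finalized(&mut self, finalized: Hash, h: u32) -> Vec<Hash> {
        let mut pruned = Vec::new();
        self.leaves.retain(|&hash, &mut height| {
            let keep = height > h || hash == finalized;
            if !keep {
                pruned.push(hash);
            }
            keep
        });
        pruned // in key order, since `BTreeMap::retain` visits keys in order
    }
}
```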
+
+## On Subsystem Failure
+
+Subsystems are essential tasks meant to run as long as the node does. Subsystems can spawn ephemeral work in the form of jobs, but the subsystems themselves should not go down. If a subsystem goes down, it will be because of a critical error that should take the entire node down as well.
+
+## Communication Between Subsystems
+
+When a subsystem wants to communicate with another subsystem, or, more typically, a job within a subsystem wants to communicate with its counterpart under another subsystem, that communication must happen via the overseer. Consider this example where a job on subsystem A wants to send a message to its counterpart under subsystem B. This is a realistic scenario, where you can imagine that both jobs correspond to work under the same relay-parent.
+
+```text
+     +--------+                                                           +--------+
+     |        |                                                           |        |
+     |Job A-1 | (sends message)                       (receives message)  |Job B-1 |
+     |        |                                                           |        |
+     +----|---+                                                           +----^---+
+          |                  +------------------------------+                  ^
+          v                  |                              |                  |
+---------v---------+        |                              |        +---------|---------+
+|                   |        |                              |        |                   |
+| Subsystem A       |        |       Overseer / Message     |        | Subsystem B       |
+|                   -------->>                  Bus         -------->>                   |
+|                   |        |                              |        |                   |
+-------------------+        |                              |        +-------------------+
+                             |                              |
+                             +------------------------------+
+```
+
+First, the subsystem that spawned a job is responsible for handling the first step of the communication. The overseer is not aware of the hierarchy of tasks within any given subsystem and is only responsible for subsystem-to-subsystem communication. So the sending subsystem must pass on the message via the overseer to the receiving subsystem, in such a way that the receiving subsystem can further address the communication to one of its internal tasks, if necessary.
+
+This communication prevents a certain class of race conditions. When the Overseer determines that it is time for subsystems to begin working on top of a particular relay-parent, it will dispatch a `StartWork` message to all subsystems to do so, and those messages will be handled asynchronously by those subsystems. Some subsystems will receive those messages before others, and it is important that a message sent by subsystem A after receiving its `StartWork` message will arrive at subsystem B after B's `StartWork` message. If subsystem A maintained an independent channel with subsystem B to communicate, it would be possible for subsystem B to handle the side message before the `StartWork` message, but it wouldn't have any logical course of action to take with the side message, leading to it being discarded or improperly handled. Well-architected state machines should have a single source of inputs, so that is what we do here.
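
The ordering argument can be illustrated with one FIFO queue per subsystem that carries both signals and routed messages. This sketch uses `std::sync::mpsc` as a stand-in for the real overseer channels; `Overseer`, `FromOverseer`, `start_work`, and `route` are hypothetical names. Because `StartWork` is enqueued on every subsystem before any message derived from it can be routed, the receiving subsystem always dequeues the signal first.

```rust
use std::sync::mpsc::Sender;

type Hash = u64; // hypothetical

/// Everything a subsystem receives arrives on one queue, in overseer order.
#[derive(Debug, PartialEq)]
enum FromOverseer {
    StartWork(Hash),
    Message { relay_parent: Hash, payload: &'static str },
}

/// Hypothetical overseer holding one inbox sender per subsystem.
struct Overseer {
    subsystems: Vec<Sender<FromOverseer>>,
}

impl Overseer {
    /// Enqueue `StartWork` on every subsystem's inbox.
    fn start_work(&self, relay_parent: Hash) {
        for tx in &self.subsystems {
            tx.send(FromOverseer::StartWork(relay_parent)).unwrap();
        }
    }

    /// Route a message to subsystem `to`; it lands behind any `StartWork`
    /// already queued there.
    fn route(&self, to: usize, relay_parent: Hash, payload: &'static str) {
        self.subsystems[to]
            .send(FromOverseer::Message { relay_parent, payload })
            .unwrap();
    }
}
```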
+
+One exception is reasonable to make for responses to requests. A request should be made via the overseer in order to ensure that it arrives after any relevant `StartWork` message. A subsystem issuing a request as a result of a `StartWork` message can safely receive the response via a side-channel for two reasons:
+
+1. It's impossible for a request to be answered before it arrives, so any response to a request provably obeys the same ordering constraint.
+1. The request was sent as a result of handling a `StartWork` message, so there is no possible future in which the `StartWork` message has not been handled by the time the response is received.
+
+So, as a single exception to the rule that all communication must happen via the overseer, we allow the receipt of responses to requests via a side-channel, which may be established for that purpose. This simplifies any cases where the outside world desires to make a request to a subsystem, as the outside world can then establish a side-channel to receive the response on.
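+
+A minimal sketch of this side-channel exception, assuming the request message itself carries the sending half of a reply channel. The `Request::Square` variant and the squaring "service" are hypothetical placeholders; only the shape matters: the request travels over the serving subsystem's single inbound queue, while the response comes back on a channel created by the requester.
+
+```rust
+use std::sync::mpsc::{channel, Receiver, Sender};
+
+/// Hypothetical request type: the requester embeds a sender for the reply.
+pub enum Request {
+    Square { value: u64, respond_to: Sender<u64> },
+}
+
+/// The serving subsystem reads requests from its single inbound queue
+/// and answers each one on the side-channel carried inside the request.
+pub fn serve(inbound: &Receiver<Request>) {
+    for req in inbound.try_iter() {
+        match req {
+            Request::Square { value, respond_to } => {
+                // The requester may have dropped its receiver; ignore send errors.
+                let _ = respond_to.send(value * value);
+            }
+        }
+    }
+}
+
+/// Issue a request over the (overseer-routed) queue and return the reply channel.
+pub fn request_square(to_server: &Sender<Request>, value: u64) -> Receiver<u64> {
+    let (tx, rx) = channel();
+    to_server.send(Request::Square { value, respond_to: tx }).unwrap();
+    rx
+}
+```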
+
+It's important to note that the overseer is not aware of the internals of subsystems, and this extends to the jobs that they spawn. The overseer isn't aware of the existence or definition of those jobs, and is only aware of the outer subsystems with which it interacts. This gives subsystem implementations leeway to define internal jobs as they see fit, and to wrap a more complex hierarchy of state machines than having a single layer of jobs for relay-parent-based work. Likewise, subsystems aren't required to spawn jobs. Certain types of subsystems, such as those for shared storage or networking resources, won't perform block-based work but would still benefit from being on the Overseer's message bus. These subsystems can just ignore the overseer's signals for block-based work.
+
+Furthermore, the protocols by which subsystems communicate with each other should be well-defined irrespective of the implementation of the subsystem. In other words, their interface should be distinct from their implementation. This will prevent subsystems from accessing aspects of each other that are beyond the scope of the communication boundary.
@@ -0,0 +1,11 @@
+# Subsystems and Jobs
+
+In this section we define the notions of Subsystems and Jobs. These are guidelines for how we will employ an architecture of hierarchical state machines. We'll have a top-level state machine which oversees the next level of state machines which oversee another layer of state machines and so on. The next sections will lay out these guidelines for what we've called subsystems and jobs, since this model applies to many of the tasks that the Node-side behavior needs to encompass, but these are only guidelines and some Subsystems may have deeper hierarchies internally.
+
+Subsystems are long-lived worker tasks that are in charge of performing some particular kind of work. All subsystems can communicate with each other via a well-defined protocol. Subsystems can't generally communicate directly, but must coordinate communication through an [Overseer](/node/overseer.html), which is responsible for relaying messages, handling subsystem failures, and dispatching work signals.
+
+Most work that happens on the Node-side is related to building on top of a specific relay-chain block, which is contextually known as the "relay parent". We call it the relay parent to explicitly denote that it is a block in the relay chain and not on a parachain. We refer to the parent because when we are in the process of building a new block, we don't know what that new block is going to be. The parent block is our only stable point of reference, even though it is usually only useful when it is not yet a parent but in fact a leaf of the block-DAG expected to soon become a parent (because validators are authoring on top of it). Furthermore, we are assuming a forkful blockchain-extension protocol, which means that there may be multiple possible children of the relay-parent. Even if the relay parent has multiple child blocks, the parent of those children is the same, and the context in which those children are authored should be the same. The parent block is the best and most stable reference to use for defining the scope of work items and messages, and is typically referred to by its cryptographic hash.
+
+Since this goal of determining when to start and conclude work relative to a specific relay-parent is common to most, if not all, subsystems, it is logically the job of the Overseer to distribute those signals, rather than having each subsystem duplicate that effort and potentially fall out of synchronization with the others. Subsystem A should be able to expect that subsystem B is working on the same relay-parents as it is. One of the Overseer's tasks is to provide this heartbeat, or synchronized rhythm, to the system.
+
+The work that subsystems spawn to be done on a specific relay-parent is known as a job. Subsystems should set up and tear down jobs according to the signals received from the overseer. Subsystems may share or cache state between jobs.
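+
+The job lifecycle described above might look like the following sketch, which keeps one job per relay-parent in a `HashMap`. `Signal`, `Job`, and `Subsystem` are hypothetical names, `Hash` is a stand-in 32-byte array, and a real job would be a spawned task rather than a plain struct.
+
+```rust
+use std::collections::HashMap;
+
+/// Stand-in relay-parent identifier (a block hash in practice).
+pub type Hash = [u8; 32];
+
+/// Overseer signals relevant to job management.
+pub enum Signal {
+    StartWork(Hash),
+    StopWork(Hash),
+}
+
+/// Placeholder for per-relay-parent work.
+pub struct Job {
+    pub relay_parent: Hash,
+    pub messages_handled: usize,
+}
+
+/// A subsystem that sets up and tears down one job per active relay-parent.
+#[derive(Default)]
+pub struct Subsystem {
+    pub jobs: HashMap<Hash, Job>,
+}
+
+impl Subsystem {
+    pub fn handle_signal(&mut self, signal: Signal) {
+        match signal {
+            Signal::StartWork(rp) => {
+                // A duplicate StartWork keeps the existing job.
+                self.jobs.entry(rp).or_insert(Job { relay_parent: rp, messages_handled: 0 });
+            }
+            Signal::StopWork(rp) => {
+                // Dropping the job tears down its state.
+                self.jobs.remove(&rp);
+            }
+        }
+    }
+
+    /// Forward a message to the job for `rp`, if any; returns whether a job existed.
+    pub fn forward(&mut self, rp: &Hash) -> bool {
+        match self.jobs.get_mut(rp) {
+            Some(job) => {
+                job.messages_handled += 1;
+                true
+            }
+            None => false,
+        }
+    }
+}
+```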
@@ -0,0 +1,3 @@
+# Utility Subsystems
+
+The utility subsystems are an assortment of subsystems that don't have a natural home in another subsystem collection.
@@ -0,0 +1,56 @@
+# Availability Store
+
+This is a utility subsystem responsible for keeping certain data available and for pruning that data.
+
+It stores two types of data:
+
+- Full PoV blocks of candidates we have validated
+- Availability chunks of candidates that were backed and noted available on-chain.
+
+For each of these data types, we have pruning rules that determine how long we need to keep that data available.
+
+PoVs hypothetically only need to be kept around until the block where the data was made fully available is finalized. However, disputes can revert finality, so we need to be a bit more conservative. We should keep a PoV until a block that finalized its availability has been finalized for 1 day.
+
+> TODO: arbitrary, but extracting `acceptance_period` is kind of hard here...
+
+Availability chunks need to be kept available until the dispute period for the corresponding candidate has ended. We can accomplish this by using the same criterion as above, plus a delay. This gives us the pruning condition that the block finalizing availability of the chunk must have been final for 1 day + 1 hour.
+
+> TODO: again, concrete acceptance-period would be nicer here, but complicates things
+
+There is also the case where a validator commits to make a PoV available, but the corresponding candidate is never backed. In this case, we keep the PoV available for 1 hour.
+
+> TODO: ideally would be an upper bound on how far back contextual execution is OK.
+
+There may be multiple competing blocks all ending the availability phase for a particular candidate. Until (and slightly beyond) finality, it will be unclear which of those is actually the canonical chain, so the pruning records for PoVs and Availability chunks should keep track of all such blocks.
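+
+The pruning rules above can be summarized as a single predicate. This is a sketch under simplifying assumptions: times are `Duration`s since an arbitrary epoch, `Kind`, `CandidateState`, and `can_prune` are hypothetical names, and the 1-day and 1-hour constants are the placeholder values from the text rather than a concrete `acceptance_period`. Every block that ended the availability phase is tracked, and the data becomes prunable once whichever of them is finalized has been final for long enough.
+
+```rust
+use std::time::Duration;
+
+const DAY: Duration = Duration::from_secs(24 * 60 * 60);
+const HOUR: Duration = Duration::from_secs(60 * 60);
+
+/// Which kind of stored data a pruning record refers to.
+pub enum Kind {
+    PoV,
+    Chunk,
+}
+
+/// Pruning-relevant state of a candidate (hypothetical simplification).
+pub enum CandidateState {
+    /// We committed to the PoV, but the candidate was never backed.
+    NeverBacked { stored_at: Duration },
+    /// One entry per competing block that ended the availability phase:
+    /// `Some(t)` if that block was finalized at time `t`, `None` otherwise.
+    Available { finalized_at: Vec<Option<Duration>> },
+}
+
+/// Returns whether the data may be pruned at time `now`.
+pub fn can_prune(kind: Kind, state: &CandidateState, now: Duration) -> bool {
+    match state {
+        CandidateState::NeverBacked { stored_at } => match kind {
+            // PoVs of never-backed candidates are kept for 1 hour.
+            Kind::PoV => now >= *stored_at + HOUR,
+            // Chunks are only stored for candidates noted available; keep conservatively.
+            Kind::Chunk => false,
+        },
+        CandidateState::Available { finalized_at } => {
+            let keep_for = match kind {
+                Kind::PoV => DAY,
+                Kind::Chunk => DAY + HOUR,
+            };
+            // Prunable once a finalized availability-ending block has been final long enough.
+            finalized_at.iter().flatten().any(|&t| now >= t + keep_for)
+        }
+    }
+}
+```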
+
+## Protocol
+
+Input:
+
+- QueryPoV(candidate_hash, response_channel)
+- QueryChunk(candidate_hash, validator_index, response_channel)
+- StoreChunk(candidate_hash, validator_index, inclusion_proof, chunk_data)
+
+## Functionality
+
+On `StartWork`:
+
+- Note any new candidates backed in the block. Update pruning records for any stored `PoVBlock`s.
+- Note any newly-included candidates backed in the block. Update pruning records for any stored availability chunks.
+
+On block finality events:
+
+- > TODO: figure out how we get block finality events from overseer
+- Handle all pruning based on the newly-finalized block.
+
+On `QueryPoV` message:
+
+- Return the PoV block, if any, for that candidate hash.
+
+On `QueryChunk` message:
+
+- Determine if we have the chunk indicated by the parameters and return it via the response channel if so.
+
+On `StoreChunk` message:
+
+- Store the chunk along with its inclusion proof under the candidate hash and validator index.
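+
+The message handling above could be sketched with a pair of in-memory maps. The message enum, the `Chunk` struct, and the type aliases are hypothetical stand-ins for the real types. Responses are sent as `Option`s, with `None` meaning the store does not hold the requested data. Persistence and pruning are omitted.
+
+```rust
+use std::collections::HashMap;
+use std::sync::mpsc::Sender;
+
+pub type CandidateHash = [u8; 32];
+pub type ValidatorIndex = u32;
+pub type PoV = Vec<u8>;
+
+/// A stored chunk together with its (opaque) inclusion proof.
+#[derive(Clone, Debug, PartialEq)]
+pub struct Chunk {
+    pub inclusion_proof: Vec<u8>,
+    pub data: Vec<u8>,
+}
+
+pub enum AvailabilityStoreMessage {
+    QueryPoV(CandidateHash, Sender<Option<PoV>>),
+    QueryChunk(CandidateHash, ValidatorIndex, Sender<Option<Chunk>>),
+    /// Candidate hash, validator index, inclusion proof, chunk data.
+    StoreChunk(CandidateHash, ValidatorIndex, Vec<u8>, Vec<u8>),
+}
+
+#[derive(Default)]
+pub struct AvailabilityStore {
+    povs: HashMap<CandidateHash, PoV>,
+    chunks: HashMap<(CandidateHash, ValidatorIndex), Chunk>,
+}
+
+impl AvailabilityStore {
+    pub fn handle(&mut self, msg: AvailabilityStoreMessage) {
+        match msg {
+            AvailabilityStoreMessage::QueryPoV(hash, tx) => {
+                // Ignore the error if the requester has gone away.
+                let _ = tx.send(self.povs.get(&hash).cloned());
+            }
+            AvailabilityStoreMessage::QueryChunk(hash, idx, tx) => {
+                let _ = tx.send(self.chunks.get(&(hash, idx)).cloned());
+            }
+            AvailabilityStoreMessage::StoreChunk(hash, idx, inclusion_proof, data) => {
+                self.chunks.insert((hash, idx), Chunk { inclusion_proof, data });
+            }
+        }
+    }
+}
+```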
@@ -0,0 +1,13 @@
+# Candidate Validation
+
+This subsystem is responsible for handling candidate validation requests. It is a simple request/response server.
+
+## Protocol
+
+Input:
+
+- [`CandidateValidationMessage`](/type-definitions.html#validation-request-type)
+
+## Functionality
+
+Given a candidate, its validation code, and its PoV, determine whether the candidate is valid. There are a few different situations this code will be called in, and this will lead to variance in where the parameters originate. Determining the parameters is beyond the scope of this subsystem.
@@ -0,0 +1,7 @@
+# Misbehavior Arbitration
+
+The Misbehavior Arbitration subsystem collects reports of validator misbehavior, and slashes the stake of both misbehaving validator nodes and false accusers.
+
+> TODO: It is not yet fully specified; that problem is postponed to a future PR.
+
+One policy question has been decided even so: in the event that Misbehavior Arbitration has to call on all validators to check a block about which some validators disagree, the minority voters all get slashed, and the majority voters all get rewarded. Validators who abstain receive a minor slash penalty, but probably not of the same order of magnitude as the penalty for voting wrong.
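+
+The voting policy can be expressed as a small classification function. This sketch is hypothetical: it treats "majority" as the larger side among validators who actually voted, returns `None` on a tie (a case the text does not address), and only classifies outcomes without assigning slash or reward amounts.
+
+```rust
+/// A validator's vote when all validators are called to check a disputed block.
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub enum Vote {
+    Valid,
+    Invalid,
+    Abstain,
+}
+
+#[derive(Clone, Copy, Debug, PartialEq)]
+pub enum Outcome {
+    Reward,
+    Slash,
+    MinorSlash,
+}
+
+/// Majority voters are rewarded, minority voters slashed, abstainers minorly slashed.
+pub fn arbitrate(votes: &[Vote]) -> Option<Vec<Outcome>> {
+    let valid = votes.iter().filter(|&&v| v == Vote::Valid).count();
+    let invalid = votes.iter().filter(|&&v| v == Vote::Invalid).count();
+    if valid == invalid {
+        // No majority (including the all-abstain case): unspecified, so no verdict.
+        return None;
+    }
+    let majority = if valid > invalid { Vote::Valid } else { Vote::Invalid };
+    let outcomes: Vec<Outcome> = votes
+        .iter()
+        .map(|&v| match v {
+            Vote::Abstain => Outcome::MinorSlash,
+            v if v == majority => Outcome::Reward,
+            _ => Outcome::Slash,
+        })
+        .collect();
+    Some(outcomes)
+}
+```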
@@ -0,0 +1,65 @@
+# Network Bridge
+
+One of the main features of the overseer/subsystem duality is to avoid shared ownership of resources and to communicate via message-passing. However, implementing each networking subsystem as its own network protocol brings a fair share of challenges.
+
+The most notable challenge is coordinating and eliminating race conditions of peer connection and disconnection events. If we have many network protocols that peers are supposed to be connected on, it is difficult to enforce that a peer is indeed connected on all of them or the order in which those protocols receive notifications that peers have connected. This becomes especially difficult when attempting to share peer state across protocols. All of the Parachain-Host's gossip protocols eliminate DoS with a data-dependency on current chain heads. However, it is inefficient and confusing to implement the logic for tracking our current chain heads as well as our peers' on each of those subsystems. Having one subsystem for tracking this shared state and distributing it to the others is an improvement in architecture and efficiency.
+
+One other piece of shared state to track is peer reputation. When peers are found to have provided value or cost, we adjust their reputation accordingly.
+
+So in short, this Subsystem acts as a bridge between an actual network component and a subsystem's protocol.
+
+## Protocol
+
+> REVIEW: I am designing this using dynamic dispatch based on a ProtocolId discriminant rather than doing static dispatch to specific subsystems based on a concrete network message type. The reason for this is that doing static dispatch might break the property that Subsystem implementations can be swapped out for others. So this is actually implementing a subprotocol multiplexer. Pierre tells me this is OK for our use-case ;). One caveat is that now all network traffic will also flow through the overseer, but this overhead is probably OK.
+
+```rust
+// `Hash`, `PeerId`, and `Bytes` are assumed to be in scope from the node's dependencies.
+use sc_network::ObservedRole;
+
+/// Up to `N` (5?) chain heads.
+struct View(Vec<Hash>);
+
+enum NetworkBridgeEvent {
+    /// The role is one of Full, Light, OurGuardedAuthority, OurSentry.
+    PeerConnected(PeerId, ObservedRole),
+    PeerDisconnected(PeerId),
+    PeerMessage(PeerId, Bytes),
+    /// Guaranteed to come after the peer-connected event.
+    PeerViewChange(PeerId, View),
+    OurViewChange(View),
+}
+```
+
+Input:
+
+- RegisterEventProducer(`ProtocolId`, `Fn(NetworkBridgeEvent) -> AllMessages`): call on startup.
+- ReportPeer(PeerId, cost_or_benefit)
+- SendMessage(`[PeerId]`, `ProtocolId`, Bytes): send a message to multiple peers.
+
+## Functionality
+
+Track a set of all Event Producers, each associated with a 4-byte protocol ID.
+There are two types of network messages this subsystem sends and receives:
+
+- ProtocolMessage(ProtocolId, Bytes)
+- ViewUpdate(View)
+
+`StartWork` and `StopWork` determine our local view. When it changes, a `ViewUpdate` is issued to each connected peer, and a `NetworkBridgeEvent::OurViewChange` is issued for each registered event producer.
+
+On `RegisterEventProducer`:
+
+- Add the event producer to the set of event producers. If there is a competing entry, ignore the request.
+
+On `ProtocolMessage` arrival:
+
+- If the protocol ID matches an event producer, produce the message from `NetworkBridgeEvent::PeerMessage(sender, bytes)`; otherwise, ignore the message and slightly reduce the peer's reputation.
+- Dispatch the produced message via the overseer.
+
+On `ViewUpdate` arrival:
+
+- Do validity checks and note the most recent view update of the peer.
+- For each event producer, dispatch the result of a `NetworkBridgeEvent::PeerViewChange(peer, view)` via the overseer.
+
+On `ReportPeer` message:
+
+- Adjust peer reputation according to the cost or benefit provided.
+
+On `SendMessage` message:
+
+- Issue a corresponding `ProtocolMessage` to each listed peer with given protocol ID and bytes.
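+
+The dispatch rules above can be sketched as follows. The sketch replaces `PeerId` with a `u64`, `AllMessages` with a hypothetical `OverseerMessage` tuple, and models "dispatch via overseer" by pushing onto a `dispatched` vector. The reputation cost of `-1` for an unknown protocol ID is an arbitrary placeholder for "reduce peer reputation slightly", and view updates are omitted.
+
+```rust
+use std::collections::HashMap;
+
+/// 4-byte protocol discriminant, as in the text.
+pub type ProtocolId = [u8; 4];
+/// Stand-in for the real network peer identifier.
+pub type PeerId = u64;
+
+/// The subset of `NetworkBridgeEvent` needed for this sketch.
+#[derive(Debug, PartialEq)]
+pub enum Event {
+    PeerMessage(PeerId, Vec<u8>),
+}
+
+/// Hypothetical stand-in for `AllMessages`.
+pub type OverseerMessage = (ProtocolId, Event);
+/// A registered event producer.
+pub type Producer = Box<dyn Fn(Event) -> OverseerMessage>;
+
+#[derive(Default)]
+pub struct NetworkBridge {
+    producers: HashMap<ProtocolId, Producer>,
+    pub reputation: HashMap<PeerId, i32>,
+    /// Messages that would be dispatched via the overseer.
+    pub dispatched: Vec<OverseerMessage>,
+}
+
+impl NetworkBridge {
+    /// `RegisterEventProducer`: a competing entry for the same ID is ignored.
+    pub fn register(&mut self, id: ProtocolId, producer: Producer) {
+        self.producers.entry(id).or_insert(producer);
+    }
+
+    /// `ProtocolMessage` arrival from peer `from`.
+    pub fn on_protocol_message(&mut self, from: PeerId, id: ProtocolId, bytes: Vec<u8>) {
+        if let Some(produce) = self.producers.get(&id) {
+            let msg = produce(Event::PeerMessage(from, bytes));
+            self.dispatched.push(msg);
+        } else {
+            // Unknown protocol: ignore the message and slightly reduce reputation.
+            self.report_peer(from, -1);
+        }
+    }
+
+    /// `ReportPeer`: adjust reputation by a cost (negative) or benefit (positive).
+    pub fn report_peer(&mut self, peer: PeerId, cost_or_benefit: i32) {
+        *self.reputation.entry(peer).or_insert(0) += cost_or_benefit;
+    }
+}
+```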
@@ -0,0 +1,9 @@
+# Peer Set Manager
+
+> TODO
+
+## Protocol
+
+## Functionality
+
+## Jobs, if any
@@ -0,0 +1,30 @@
+# Provisioner
+
+This subsystem is responsible for providing data to an external block authorship service beyond the scope of the [Overseer](/node/overseer.html) so that the block authorship service can author blocks containing data produced by various subsystems.
+
+In particular, the data to provide:
+
+- backable candidates and their backings
+- signed bitfields
+- misbehavior reports
+- dispute inherent
+    > TODO: needs fleshing out in validity module, related to blacklisting
+
+## Protocol
+
+Input:
+
+- Bitfield(relay_parent, signed_bitfield)
+- BackableCandidate(relay_parent, candidate_receipt, backing)
+- RequestBlockAuthorshipData(relay_parent, response_channel)
+
+## Functionality
+
+Use `StartWork` and `StopWork` to manage a set of jobs for relay-parents we might be building upon.
+Forward all messages to the corresponding job, if any.
+
+## Block Authorship Provisioning Job
+
+Track all signed bitfields and all backable candidates received. Provide them to the `RequestBlockAuthorshipData` requester via the `response_channel`. If more than one backable candidate exists for a given `Para`, provide the first one received.
+
+> TODO: better candidate-choice rules.
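+
+A Block Authorship Provisioning Job following the "first one received" rule might look like this sketch. All types are hypothetical opaque stand-ins, and `authorship_data` returns the data directly where a real job would send it on the `response_channel`. Candidates are sorted by para ID only to make the output deterministic.
+
+```rust
+use std::collections::HashMap;
+
+pub type ParaId = u32;
+
+#[derive(Clone, Debug, PartialEq)]
+pub struct SignedBitfield(pub Vec<u8>);
+
+#[derive(Clone, Debug, PartialEq)]
+pub struct BackedCandidate {
+    pub para: ParaId,
+    pub receipt: Vec<u8>,
+    pub backing: Vec<u8>,
+}
+
+#[derive(Debug, PartialEq)]
+pub struct BlockAuthorshipData {
+    pub bitfields: Vec<SignedBitfield>,
+    pub candidates: Vec<BackedCandidate>,
+}
+
+/// Per-relay-parent provisioning state.
+#[derive(Default)]
+pub struct ProvisioningJob {
+    bitfields: Vec<SignedBitfield>,
+    candidates: HashMap<ParaId, BackedCandidate>,
+}
+
+impl ProvisioningJob {
+    pub fn note_bitfield(&mut self, bitfield: SignedBitfield) {
+        self.bitfields.push(bitfield);
+    }
+
+    /// Keep only the first backable candidate received for each para.
+    pub fn note_candidate(&mut self, candidate: BackedCandidate) {
+        self.candidates.entry(candidate.para).or_insert(candidate);
+    }
+
+    /// Data for a `RequestBlockAuthorshipData` request.
+    pub fn authorship_data(&self) -> BlockAuthorshipData {
+        let mut candidates: Vec<BackedCandidate> = self.candidates.values().cloned().collect();
+        candidates.sort_by_key(|c| c.para);
+        BlockAuthorshipData { bitfields: self.bitfields.clone(), candidates }
+    }
+}
+```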
@@ -0,0 +1,3 @@
+# Validity
+
+The node validity subsystems exist to support the runtime [Validity module](/runtime/validity.html). Their behavior and specifications are as-yet undefined.