Add PVF module documentation (#6293)

* Add PVF module documentation

TODO (once the PRs land):

- [ ] Document executor parametrization.

- [ ] Document CPU time measurement of timeouts.

* Update node/core/pvf/src/lib.rs

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

* Clarify meaning of PVF acronym

* Move PVF doc to implementer's guide

* Clean up implementer's guide a bit

* Add page for PVF types

* pvf: Better separation between crate docs and implementer's guide

* ci: Add "prevalidating" to the dictionary

* ig: Remove types/chain.md

The types contained therein did not exist and the file was not referenced
anywhere.

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
This commit is contained in:
Marcin S
2022-11-23 08:20:25 -05:00
committed by GitHub
parent 17ad9f5694
commit 1dec2433ae
13 changed files with 91 additions and 51 deletions
@@ -48,4 +48,39 @@ Once we have all parameters, we can spin up a background task to perform the val
If we can assume the presence of the relay-chain state (that is, during processing [`CandidateValidationMessage`][CVM]`::ValidateFromChainState`) we can run all the checks that the relay-chain would run at the inclusion time thus confirming that the candidate will be accepted.
### PVF Host
The PVF host is responsible for handling requests to prepare and execute PVF
code blobs.
One high-level goal is to make PVF operations as deterministic as possible, to
reduce the rate of disputes. Disputes can happen due to e.g. a job timing out on
one machine, but not another. While we do not yet have full determinism, there
are some dispute reduction mechanisms in place right now.
#### Retrying execution requests
If the execution request fails during **preparation**, we will retry if it is
possible that the preparation error was transient (e.g. if the error was a panic
or time out). We will only retry preparation if another request comes in after
15 minutes, to ensure any potential transient conditions had time to be
resolved. We will retry up to 5 times.
If the actual **execution** of the artifact fails, we will retry once if it was
an ambiguous error after a 1 second delay, to allow any potential transient
conditions to clear.
#### Preparation timeouts
We use timeouts for both preparation and execution jobs to limit the amount of
time they can take. As the time for a job can vary depending on the machine and
load on the machine, this can potentially lead to disputes where some validators
successfuly execute a PVF and others don't.
One mitigation we have in place is a more lenient timeout for preparation during
execution than during pre-checking. The rationale is that the PVF has already
passed pre-checking, so we know it should be valid, and we allow it to take
longer than expected, as this is likely due to an issue with the machine and not
the PVF.
[CVM]: ../../types/overseer-protocol.md#validationrequesttype
@@ -12,11 +12,11 @@ This subsytem does not produce any output messages either. The subsystem will, h
If the node is running in a collator mode, this subsystem will be disabled. The PVF pre-checker subsystem keeps track of the PVFs that are relevant for the subsystem.
To be relevant for the subsystem, a PVF must be returned by `pvfs_require_precheck` [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant.
To be relevant for the subsystem, a PVF must be returned by the [`pvfs_require_precheck` runtime API][PVF pre-checking runtime API] in any of the active leaves. If the PVF is not present in any of the active leaves, it ceases to be relevant.
When a PVF just becomes relevant, the subsystem will send a message to the [Candidate Validation] subsystem asking for the pre-check.
Upon receving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements.
Upon receving a message from the candidate-validation subsystem, the pre-checker will note down that the PVF has its judgement and will also sign and submit a [`PvfCheckStatement`][PvfCheckStatement] via the [`submit_pvf_check_statement` runtime API][PVF pre-checking runtime API]. In case, a judgement was received for a PVF that is no longer in view it is ignored. It is possible that the candidate validation was not able to check the PVF. In that case, the PVF pre-checker will abstain and won't submit any check statements.
Since a vote only is valid during [one session][overview], the subsystem will have to resign and submit the statements for for the new session. The new session is assumed to be started if at least one of the leaves has a greater session index that was previously observed in any of the leaves.
@@ -28,4 +28,4 @@ If the node is not in the active validator set, it will still perform all the ch
[Runtime API]: runtime-api.md
[PVF pre-checking runtime API]: ../../runtime-api/pvf-prechecking.md
[Candidate Validation]: candidate-validation.md
[`PvfCheckStatement`]: ../../types/pvf-prechecking.md
[PvfCheckStatement]: ../../types/pvf-prechecking.md#pvfcheckstatement