* Fix a couple of typos
* Retry failed PVF execution
PVF execution that fails due to AmbiguousWorkerDeath should be retried once.
This should reduce the occurrence of failures due to transient conditions.
Closes#6195
* Address a couple of nits
* Write tests; refactor (add `validate_candidate_with_retry`)
* Update node/core/candidate-validation/src/lib.rs
Co-authored-by: Andronik <write@reusable.software>
Co-authored-by: eskimor <eskimor@users.noreply.github.com>
Co-authored-by: Andronik <write@reusable.software>
* Rename timeout consts and timeout parameter; bump leniency
* Update implementor's guide with info about PVFs
* Make glossary a bit easier to read
* Add a note to LENIENT_PREPARATION_TIMEOUT
* Remove PVF-specific section from glossary
* Fix some typos
* Bump crate versions
* Bump spec_version to 9280 for kusama
* Bump spec_version to 9280 for polkadot
* Bump spec_version to 9280 for rococo
* Bump spec_version to 9280 for westend
* update Cargo.lock
Co-authored-by: parity-processbot <>
The PVF host is designed to avoid spawning tasks to minimize knowledge
of outer code. Using `async_std::task::spawn` (or Tokio's counterpart)
deemed unacceptable, `SpawnNamed` undesirable. Instead there is only one
task returned that is spawned by the candidate-validation subsystem.
The tasks from the sub-components are polled by that root task.
However, the way the tasks are bundled was incorrect. There was a giant
select that was polling those tasks. Particularly, that implies that as soon as
one of the arms of that select goes into await those sub-tasks stop
getting polled. This is a recipe for a deadlock which indeed happened
here.
Specifically, the deadlock happened during sending messages to the
execute queue by calling
[`send_execute`](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L601).
When the channel to the queue reaches the capacity, the control flow is
suspended until the queue handles those messages. Since this code is
essentially reached from [one of the select
arms](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L371),
the queue won't be given the control and thus no further progress can be
made.
This problem is solved by bundling the tasks one level higher instead,
by `selecting` over those long-running tasks.
We also stop treating returning from those long-running tasks as error
conditions, since that can happen during legit shutdown.
* Rename to BagError
* Additional parameter for 'revert' command
* Set aux revert param to None
* Align to changes in how the WASM executor is configured in `substrate`
* update lockfile for {"substrate"}
* update lockfile for {"substrate"}
* Update substrate
* Update substrate
Co-authored-by: Keith Yeung <kungfukeith11@gmail.com>
Co-authored-by: Davide Galassi <davxy@datawok.net>
Co-authored-by: Shawn Tabrizi <shawntabrizi@gmail.com>
Co-authored-by: parity-processbot <>
* merge master (do not compile)
* fix
* lock
* update lock
* Update to refactoring.
* runtime version
* fmt
* remove trie patch
* remove patch
* No layout alias for bridge proof.
* update depupdate depss
* No switch until migration.
* master lock
* test
* test
* Revert "test"
This reverts commit 57325ef73332bf4b054aa4a667bb716fcf8a0d89.
* Revert "test"
This reverts commit ce74d0e2062806f72c0e9e9ca07b14165f43521e.
* rename feature
* state version as parameter, use the feature only on runtimes.
* update
* update to state version in runtime
* state version from storage
* update lockfile for substrate
Co-authored-by: parity-processbot <>
We wanted to change niceness to accomodate the fact that some of the
preparation tasks are low priority. For example, when a node sees that
there is a new para was onboarded the node may start preparing right
away. Since all other activities are more important, such as network I/O
or validation of the backed candidates and preparation of the
immediatelly needed PVFs.
However, it turned out that this approach does not work: generally
non-root processes can only decrease niceness and they cannot increase
it to the previous value, as was assumed by the code.
Apart from that, https://github.com/paritytech/polkadot/pull/4123
assumes all PVFs are prepared in the same way. Specifically, that if a
PVF preparation failed before, then PVF pre-checking will also report
that it was failed, even though it could happen that preparation failed
due to being low-priority. In order to avoid such cases, we decided to
simplify the whole preparation model. Preparation under low priority
does not work well with that.
Closes https://github.com/paritytech/polkadot/issues/4520
* Companion PR for removing Prometheus metrics prefix
* Was missing some metrics
* Fix missing renames
* Fix test
* Fixes
* Update test
* Update Substrate
* Second time
* remove prefix from intergration test for zombienet
* update zombienet image
* Update Substrate
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Javier Viola <pepoviola@gmail.com>
Closes https://github.com/paritytech/polkadot/issues/4293
This PR changes the way how we treat a certain subset of PVF preparation
errors. Specifically, now only the deterministic errors are treated as
invalid candidates. That is, the errors that are easily
attributable to either the the PVF contents or the wasmtime code, but
not e.g. I/O errors that could be triggered by the OS (insufficient
memory, disk failure, too much load, etc). The latter are treated as
internal errors and thus do not trigger the disputes.