**Update:** Pushed additional changes based on the review comments.
**This pull request fixes various spelling mistakes in this
repository.**
Most of the changes are contained in the first **3** commits:
- `Fix spelling mistakes in comments and docs`
- `Fix spelling mistakes in test names`
- `Fix spelling mistakes in error messages, panic messages, logs and
tracing`
Other source code spelling mistakes are separated into individual
commits for easier reviewing:
- `Fix the spelling of 'authority'`
- `Fix the spelling of 'REASONABLE_HEADERS_IN_JUSTIFICATION_ANCESTRY'`
- `Fix the spelling of 'prev_enqueud_messages'`
- `Fix the spelling of 'endpoint'`
- `Fix the spelling of 'children'`
- `Fix the spelling of 'PenpalSiblingSovereignAccount'`
- `Fix the spelling of 'PenpalSudoAccount'`
- `Fix the spelling of 'insufficient'`
- `Fix the spelling of 'PalletXcmExtrinsicsBenchmark'`
- `Fix the spelling of 'subtracted'`
- `Fix the spelling of 'CandidatePendingAvailability'`
- `Fix the spelling of 'exclusive'`
- `Fix the spelling of 'until'`
- `Fix the spelling of 'discriminator'`
- `Fix the spelling of 'nonexistent'`
- `Fix the spelling of 'subsystem'`
- `Fix the spelling of 'indices'`
- `Fix the spelling of 'committed'`
- `Fix the spelling of 'topology'`
- `Fix the spelling of 'response'`
- `Fix the spelling of 'beneficiary'`
- `Fix the spelling of 'formatted'`
- `Fix the spelling of 'UNKNOWN_PROOF_REQUEST'`
- `Fix the spelling of 'succeeded'`
- `Fix the spelling of 'reopened'`
- `Fix the spelling of 'proposer'`
- `Fix the spelling of 'InstantiationNonce'`
- `Fix the spelling of 'depositor'`
- `Fix the spelling of 'expiration'`
- `Fix the spelling of 'phantom'`
- `Fix the spelling of 'AggregatedKeyValue'`
- `Fix the spelling of 'randomness'`
- `Fix the spelling of 'defendant'`
- `Fix the spelling of 'AquaticMammal'`
- `Fix the spelling of 'transactions'`
- `Fix the spelling of 'PassingTracingSubscriber'`
- `Fix the spelling of 'TxSignaturePayload'`
- `Fix the spelling of 'versioning'`
- `Fix the spelling of 'descendant'`
- `Fix the spelling of 'overridden'`
- `Fix the spelling of 'network'`
Let me know if this structure is adequate.
**Note:** The usage of the words `Merkle`, `Merkelize`, `Merklization`,
`Merkelization`, `Merkleization`, is somewhat inconsistent but I left it
as it is.
~~**Note:** In some places the term `Receival` is used to refer to
message reception, IMO `Reception` is the correct word here, but I left
it as it is.~~
~~**Note:** In some places the term `Overlayed` is used instead of the
more acceptable version `Overlaid` but I also left it as it is.~~
~~**Note:** In some places the term `Applyable` is used instead of the
correct version `Applicable` but I also left it as it is.~~
**Note:** Some usage of British vs American english e.g. `judgement` vs
`judgment`, `initialise` vs `initialize`, `optimise` vs `optimize` etc.
are both present in different places, but I suppose that's
understandable given the number of contributors.
~~**Note:** There is a spelling mistake in `.github/CODEOWNERS` but it
triggers errors in CI when I make changes to it, so I left it as it
is.~~
This PR aims to channel the backpressure of the PVF host's preparation
and execution queues to the candidate validation subsystem consumers.
Related: #708
resolve#2157
- [x] fix broken doc links
- [x] fix codec macro typo
https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/core/pvf/common/src/error.rs#L81
(see the comment below)
- [x] refactor `ValidationError`, `PrepareError` and related error types
to use `thiserror` crate
## `codec` issue
`codec` macro was mistakenly applied two times to `Kernel` error (so it
was encoded with 10 instead of 11 and the same as `JobDied`). The PR
changes it to 11 because
- it was an initial goal of the code author
- Kernel is less frequent than JobDied so in case of existing error
encoding it is more probable to have 10 as JobDied than Kernel
See https://github.com/paritytech/parity-scale-codec/issues/555
----
polkadot address: 13zCyRG2a1W2ih5SioL8byqmQ6mc8vkgFwQgVzJSdRUUmp46
---------
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
Fixes a potential memory leak.
`PR_SET_PDEATHSIG` is used to terminate children when the parent dies.
Note that this is subject to a race. There seems to be a raceless
alternative [here](https://stackoverflow.com/a/42498370/6085242), but
the concern is small enough that a bit more complexity doesn't seem
worth it. Left a bit more info in the code comment.
closes#695
Could potentially be helpful to preserving caches when applicable, as
discussed in #685
kusama address: FvpsvV1GQAAbwqX6oyRjemgdKV11QU5bXsMg9xsonD1FLGK
closes#622
Pros:
* simpler interface, just functions:
`create_runtime_from_artifact_bytes()` and `execute_artifact()`
Cons:
* extra overhead of constructing executor semantics each time
I could make it a combination of
* `create_runtime_config(params)` (such that we could clone the
constructed semantics)
* `create_runtime(blob, config)`
* `execute_artifact(blob, config, params)`
Not sure if it's worth it though.
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
# Description
In a couple of cases, there were links pointing to the w3f github pages
domain. In other instances, there were links pointing to the old
polkadot repo's github pages. Both of these are now pointing to the
relevant links in
https://paritytech.github.io/polkadot-sdk/book/index.html.
These changes were made specifically because the w3f github pages
returns a 404, and while fixing the links, the old polkadot repo links
were touched up as well even if they do redirect properly.
This shouldn't affect anything as these are documentation link changes
only.
Closes#583
After the separation of PVF worker binaries, dedicated puppet workers
are not needed for tests anymore. The production workers can be used
instead, avoiding some code duplication and decreasing complexity.
The changes also make it possible to further refactor the code to
isolate workers completely.
* Rename `polkadot-parachain` to `polkadot-parachain-primitives`
While doing this it also fixes some last `rustdoc` issues and fixes
another Cargo warning related to `pallet-paged-list`.
* Fix compilation
* ".git/.scripts/commands/fmt/fmt.sh"
* Fix XCM docs
---------
Co-authored-by: command-bot <>
* pvf: use test-utils feature to export test only
* adding comment to test-utils feature
* make prepare-worker and execute-worker as optional dependencies and add comments to test-utils
* remove doc hidden from pvf testing
* add prepare worker and execute worker entrypoints to test-utils feature
* pvf: add sp_tracing as optional dependency of test-utils
* add test-utils for polkadot and malus
* add test-utils feature to prepare and execute workers script
* remove required features from prepare and executing
* Try to trigger CI again to fix broken jobs
---------
Co-authored-by: Marcin S <marcin@realemail.net>
* [WIP] PVF: Split out worker binaries
* Address compilation problems and re-design a bit
* Reorganize once more, fix tests
* Reformat with new nightly to make `cargo fmt` test happy
* Address `clippy` warnings
* Add temporary trace to debug zombienet tests
* Fix zombienet node upgrade test
* Fix malus and its CI
* Fix building worker binaries with malus
* More fixes for malus
* Remove unneeded cli subcommands
* Support placing auxiliary binaries to `/usr/libexec`
* Fix spelling
* Spelling
Co-authored-by: Marcin S. <marcin@realemail.net>
* Implement review comments (mostly nits)
* Fix worker node version flag
* Rework getting the worker paths
* Address a couple of review comments
* Minor restructuring
* Fix CI error
* Add tests for worker binaries detection
* Improve tests; try to fix CI
* Move workers module into separate file
* Try to fix failing test and workers not printing latest version
- Tests were not finding the worker binaries
- Workers were not being rebuilt when the version changed
- Made some errors easier to read
* Make a bunch of fixes
* Rebuild nodes on version change
* Fix more issues
* Fix tests
* Pass node version from node into dependencies to avoid recompiles
- [X] get version in CLI
- [X] pass it in to service
- [X] pass version along to PVF
- [X] remove rerun from service
- [X] add rerun to CLI
- [X] don’t rerun pvf/worker’s (these should be built by nodes which have rerun enabled)
* Some more improvements for smoother tests
- [X] Fix tests
- [X] Make puppet workers pass None for version and remove rerun
- [X] Make test collators self-contained
* Add back rerun to PVF workers
* Move worker binaries into files in cli crate
As a final optimization I've separated out each worker binary from its own crate
into the CLI crate. Before, the worker bin shared a crate with the worker lib,
so when the binaries got recompiled so did the libs and everything transitively
depending on the libs. This commit fixes this regression that was causing
recompiles after every commit.
* Fix bug (was passing worker version for node version)
* Move workers out of cli into root src/bin/ dir
- [X] Pass in node version from top-level (polkadot)
- [X] Add build.rs with rerun-git-head to root dir
* Add some sanity checks for workers to dockerfiles
* Update malus
+ [X] Make it self-contained
+ [X] Undo multiple binary changes
* Try to fix clippy errors
* Address `cargo run` issue
- [X] Add default-run for polkadot
- [X] Add note about installation to error
* Update readme (installation instructions)
* Allow disabling external workers for local/testing setups
+ [X] cli flag to enable single-binary mode
+ [X] Add message to error
* Revert unnecessary Cargo.lock changes
* Remove unnecessary build scripts from collators
* Add back missing malus commands (should fix failing ZN job)
* Some minor fixes
* Update Cargo.lock
* Fix some build errors
* Undo self-contained binaries; cli flag to disable version check
+ [X] Remove --dont-run-external-workers
+ [X] Add --disable-worker-version-check
+ [X] Remove PVF subcommands
+ [X] Redo malus changes
* Try to fix failing job and add some docs for local tests
---------
Co-authored-by: Dmitry Sinyavin <dmitry.sinyavin@parity.io>
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
Co-authored-by: parity-processbot <>
* Begin adding landlock + test
* Move PVF implementer's guide section to own page, document security
* Implement test
* Add some docs
* Do some cleanup
* Fix typo
* Warn on host startup if landlock is not supported
* Clarify docs a bit
* Minor improvements
* Add some docs about determinism
* Address review comments (mainly add warning on landlock error)
* Update node/core/pvf/src/host.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Update node/core/pvf/src/host.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Fix unused fn
* Update ABI docs to reflect latest discussions
* Remove outdated notes
* Try to trigger new test-linux-oldkernel-stable job
Job introduced in https://github.com/paritytech/polkadot/pull/7371.
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* PVF: Refactor workers into separate crates, remove host dependency
* Fix compile error
* Remove some leftover code
* Fix compile errors
* Update Cargo.lock
* Remove worker main.rs files
I accidentally copied these from the other PR. This PR isn't intended to
introduce standalone workers yet.
* Address review comments
* cargo fmt
* Update a couple of comments
* Update log targets
* PVF: Remove `rayon` and some uses of `tokio`
1. We were using `rayon` to spawn a superfluous thread to do execution, so it was removed.
2. We were using `rayon` to set a threadpool-specific thread stack size, and AFAIK we couldn't do that with `tokio` (it's possible [per-runtime](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.thread_stack_size) but not per-thread). Since we want to remove `tokio` from the workers [anyway](https://github.com/paritytech/polkadot/issues/7117), I changed it to spawn threads with the `std::thread` API instead of `tokio`.[^1]
[^1]: NOTE: This PR does not totally remove the `tokio` dependency just yet.
3. Since `std::thread` API is not async, we could no longer `select!` on the threads as futures, so the `select!` was changed to a naive loop.
4. The order of thread selection was flipped to make (3) sound (see note in code).
I left some TODO's related to panics which I'm going to address soon as part of https://github.com/paritytech/polkadot/issues/7045.
* PVF: Vote invalid on panics in execution thread (after a retry)
Also make sure we kill the worker process on panic errors and internal errors to
potentially clear any error states independent of the candidate.
* Address a couple of TODOs
Addresses a couple of follow-up TODOs from
https://github.com/paritytech/polkadot/pull/7153.
* Add some documentation to implementer's guide
* Fix compile error
* Fix compile errors
* Fix compile error
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Address comments + couple other changes (see message)
- Measure the CPU time in the prepare thread, so the observed time is not
affected by any delays in joining on the thread.
- Measure the full CPU time in the execute thread.
* Implement proper thread synchronization
Use condvars i.e. `Arc::new((Mutex::new(true), Condvar::new()))` as per the std
docs.
Considered also using a condvar to signal the CPU thread to end, in place of an
mpsc channel. This was not done because `Condvar::wait_timeout_while` is
documented as being imprecise, and `mpsc::Receiver::recv_timeout` is not
documented as such. Also, we would need a separate condvar, to avoid this case:
the worker thread finishes its job, notifies the condvar, the CPU thread returns
first, and we join on it and not the worker thread. So it was simpler to leave
this part as is.
* Catch panics in threads so we always notify condvar
* Use `WaitOutcome` enum instead of bool condition variable
* Fix retry timeouts to depend on exec timeout kind
* Address review comments
* Make the API for condvars in workers nicer
* Add a doc
* Use condvar for memory stats thread
* Small refactor
* Enumerate internal validation errors in an enum
* Fix comment
* Add a log
* Fix test
* Update variant naming
* Address a missed TODO
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* PVF: Don't dispute on missing artifact
A dispute should never be raised if the local cache doesn't provide a certain
artifact. You can not dispute based on this reason, as it is a local hardware
issue and not related to the candidate to check.
Design:
Currently we assume that if we prepared an artifact, it remains there on-disk
until we prune it, i.e. we never check again if it's still there.
We can change it so that instead of artifact-not-found triggering a dispute, we
retry once (like we do for AmbiguousWorkerDeath, except we don't dispute if it
still doesn't work). And when enqueuing an execute job, we check for the
artifact on-disk, and start preparation if not found.
Changes:
- [x] Integration test (should fail without the following changes)
- [x] Check if artifact exists when executing, prepare if not
- [x] Return an internal error when file is missing
- [x] Retry once on internal errors
- [x] Document design (update impl guide)
* Add some context to wasm error message (it is quite long)
* Fix impl guide
* Add check for missing/inaccessible file
* Add comment referencing Substrate issue
* Add test for retrying internal errors
---------
Co-authored-by: parity-processbot <>
* Happy New Year!
* Remove year entierly
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Remove years from copyright notice in the entire repo
---------
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Move version check to `worker_event_loop`
* More minor refactors
- More consistent use of `format_invalid` and `format_internal`.
- Fix a doc error.
- Fix some poorly-named local variables.
* Check spawned worker version vs node version before PVF preparation
* Address discussions
* Propagate errors and shutdown preparation and execution pipelines properly
* Add logs; Fix execution worker checks
* Revert "Propagate errors and shutdown preparation and execution pipelines properly"
This reverts commit b96cc3160ff58db5ff001d8ca0bfea9bd4bdd0f2.
* Don't try to shut down; report the condition and exit worker
* Get rid of `VersionMismatch` preparation error
* Merge master
* Add docs; Fix tests
* Update Cargo.lock
* Kill again, but only the main node process
* Move unsafe code to a common safe function
* Fix libc dependency error on MacOS
* pvf spawning: Add some logging, add a small integration test
* Minor fixes
* Restart CI
---------
Co-authored-by: Marcin S <marcin@realemail.net>