closes#695
Could potentially be helpful to preserving caches when applicable, as
discussed in #685
kusama address: FvpsvV1GQAAbwqX6oyRjemgdKV11QU5bXsMg9xsonD1FLGK
closes#622
Pros:
* simpler interface, just functions:
`create_runtime_from_artifact_bytes()` and `execute_artifact()`
Cons:
* extra overhead of constructing executor semantics each time
I could make it a combination of
* `create_runtime_config(params)` (such that we could clone the
constructed semantics)
* `create_runtime(blob, config)`
* `execute_artifact(blob, config, params)`
Not sure if it's worth it though.
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
# Description
In a couple of cases, there were links pointing to the w3f github pages
domain. In other instances, there were links pointing to the old
polkadot repo's github pages. Both of these are now pointing to the
relevant links in
https://paritytech.github.io/polkadot-sdk/book/index.html.
These changes were made specifically because the w3f github pages
returns a 404, and while fixing the links, the old polkadot repo links
were touched up as well even if they do redirect properly.
This shouldn't affect anything as these are documentation link changes
only.
It seems the old strategy have been depracted more than one year.
So maybe it's time to clean up old strategy for wasm executor.
---
polkadot address: 15ouFh2SHpGbHtDPsJ6cXQfes9Cx1gEFnJJsJVqPGzBSTudr
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Koute <koute@users.noreply.github.com>
Closes#583
After the separation of PVF worker binaries, dedicated puppet workers
are not needed for tests anymore. The production workers can be used
instead, avoiding some code duplication and decreasing complexity.
The changes also make it possible to further refactor the code to
isolate workers completely.
* Rename `polkadot-parachain` to `polkadot-parachain-primitives`
While doing this it also fixes some last `rustdoc` issues and fixes
another Cargo warning related to `pallet-paged-list`.
* Fix compilation
* ".git/.scripts/commands/fmt/fmt.sh"
* Fix XCM docs
---------
Co-authored-by: command-bot <>
* Fix polkadot-node-core-pvf-prepare-worker build with jemalloc
The jemalloc feature on polkadot-node-core-pvf-prepare-worker depended
on some feature gated code in polkadot-node-core-pvf-common but there
way no way to enable this feature gate.
This commit adds the feature and makes prepare-worker enable it.
* More jemalloc-allocator fixes
* Fix jemalloc-allocator feature dep
* Run `zepter format features`
---------
Co-authored-by: Marcin S <marcin@realemail.net>
* PVF worker: random fixes
- Fixes possible panic due to non-UTF-8 env vars
(https://github.com/paritytech/polkadot/pull/7330#discussion_r1300101716)
- Very small refactor of some duplicated code
* Don't need `to_str()` for comparison between OsString and str
* Check edge cases that can cause env::remove_var to panic
In case of a key or value that would cause env::remove_var to panic, we first
log a warning and then proceed to attempt to remove the env var.
* Make warning message clearer for end users
* Backslash was unescaped, but can just remove it from error messages
* Remove superflous parameter `overseer_enable_anyways`
We don't need this flag, as we don't need the overseer enabled when the
node isn't a collator or validator.
* Rename `IsCollator` to `IsParachainNode`
`IsParachainNode` is more expressive and also encapsulates the state of
the parachain node being a full node. Some functionality like the
overseer needs to run always when the node runs alongside a parachain
node. The parachain node needs the overseer to e.g. recover PoVs. Other
things like candidate validation or pvf checking are only required for
when the node is running as validator.
* FMT
* Fix CI
* pvf: use test-utils feature to export test only
* adding comment to test-utils feature
* make prepare-worker and execute-worker as optional dependencies and add comments to test-utils
* remove doc hidden from pvf testing
* add prepare worker and execute worker entrypoints to test-utils feature
* pvf: add sp_tracing as optional dependency of test-utils
* add test-utils for polkadot and malus
* add test-utils feature to prepare and execute workers script
* remove required features from prepare and executing
* Try to trigger CI again to fix broken jobs
---------
Co-authored-by: Marcin S <marcin@realemail.net>
* [WIP] PVF: Split out worker binaries
* Address compilation problems and re-design a bit
* Reorganize once more, fix tests
* Reformat with new nightly to make `cargo fmt` test happy
* Address `clippy` warnings
* Add temporary trace to debug zombienet tests
* Fix zombienet node upgrade test
* Fix malus and its CI
* Fix building worker binaries with malus
* More fixes for malus
* Remove unneeded cli subcommands
* Support placing auxiliary binaries to `/usr/libexec`
* Fix spelling
* Spelling
Co-authored-by: Marcin S. <marcin@realemail.net>
* Implement review comments (mostly nits)
* Fix worker node version flag
* Rework getting the worker paths
* Address a couple of review comments
* Minor restructuring
* Fix CI error
* Add tests for worker binaries detection
* Improve tests; try to fix CI
* Move workers module into separate file
* Try to fix failing test and workers not printing latest version
- Tests were not finding the worker binaries
- Workers were not being rebuilt when the version changed
- Made some errors easier to read
* Make a bunch of fixes
* Rebuild nodes on version change
* Fix more issues
* Fix tests
* Pass node version from node into dependencies to avoid recompiles
- [X] get version in CLI
- [X] pass it in to service
- [X] pass version along to PVF
- [X] remove rerun from service
- [X] add rerun to CLI
- [X] don’t rerun pvf/worker’s (these should be built by nodes which have rerun enabled)
* Some more improvements for smoother tests
- [X] Fix tests
- [X] Make puppet workers pass None for version and remove rerun
- [X] Make test collators self-contained
* Add back rerun to PVF workers
* Move worker binaries into files in cli crate
As a final optimization I've separated out each worker binary from its own crate
into the CLI crate. Before, the worker bin shared a crate with the worker lib,
so when the binaries got recompiled so did the libs and everything transitively
depending on the libs. This commit fixes this regression that was causing
recompiles after every commit.
* Fix bug (was passing worker version for node version)
* Move workers out of cli into root src/bin/ dir
- [X] Pass in node version from top-level (polkadot)
- [X] Add build.rs with rerun-git-head to root dir
* Add some sanity checks for workers to dockerfiles
* Update malus
+ [X] Make it self-contained
+ [X] Undo multiple binary changes
* Try to fix clippy errors
* Address `cargo run` issue
- [X] Add default-run for polkadot
- [X] Add note about installation to error
* Update readme (installation instructions)
* Allow disabling external workers for local/testing setups
+ [X] cli flag to enable single-binary mode
+ [X] Add message to error
* Revert unnecessary Cargo.lock changes
* Remove unnecessary build scripts from collators
* Add back missing malus commands (should fix failing ZN job)
* Some minor fixes
* Update Cargo.lock
* Fix some build errors
* Undo self-contained binaries; cli flag to disable version check
+ [X] Remove --dont-run-external-workers
+ [X] Add --disable-worker-version-check
+ [X] Remove PVF subcommands
+ [X] Redo malus changes
* Try to fix failing job and add some docs for local tests
---------
Co-authored-by: Dmitry Sinyavin <dmitry.sinyavin@parity.io>
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
Co-authored-by: parity-processbot <>
* add tests to worker common thread
* fix formatting
* move worker commons unit test from integration tests to worker file and do some improvements
* fix import on it/worker_common
* move worker commons unit test to test module
* cargo fmt
* move cpu_time_monitor_loop to test outside of thread module
* change worker thread unit test to use assert_eq
* fix formatting
* adding new methods to WaitOucome, fix pvf worker unit test
* fix formatting
* remove is_finished and is_timeout methods from WaitOutcome
* fix wait_for_threads_with_timeout_returns_outcome test
* ".git/.scripts/commands/fmt/fmt.sh"
* add common worker cond_notify_on_done_should_update_wait_outcome_when_panic test
---------
Co-authored-by: Marcin S <marcin@realemail.net>
Co-authored-by: command-bot <>
* Begin adding landlock + test
* Move PVF implementer's guide section to own page, document security
* Implement test
* Add some docs
* Do some cleanup
* Fix typo
* Warn on host startup if landlock is not supported
* Clarify docs a bit
* Minor improvements
* Add some docs about determinism
* Address review comments (mainly add warning on landlock error)
* Update node/core/pvf/src/host.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Update node/core/pvf/src/host.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Fix unused fn
* Update ABI docs to reflect latest discussions
* Remove outdated notes
* Try to trigger new test-linux-oldkernel-stable job
Job introduced in https://github.com/paritytech/polkadot/pull/7371.
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* PVF: Refactor workers into separate crates, remove host dependency
* Fix compile error
* Remove some leftover code
* Fix compile errors
* Update Cargo.lock
* Remove worker main.rs files
I accidentally copied these from the other PR. This PR isn't intended to
introduce standalone workers yet.
* Address review comments
* cargo fmt
* Update a couple of comments
* Update log targets
* PVF: Remove `rayon` and some uses of `tokio`
1. We were using `rayon` to spawn a superfluous thread to do execution, so it was removed.
2. We were using `rayon` to set a threadpool-specific thread stack size, and AFAIK we couldn't do that with `tokio` (it's possible [per-runtime](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.thread_stack_size) but not per-thread). Since we want to remove `tokio` from the workers [anyway](https://github.com/paritytech/polkadot/issues/7117), I changed it to spawn threads with the `std::thread` API instead of `tokio`.[^1]
[^1]: NOTE: This PR does not totally remove the `tokio` dependency just yet.
3. Since `std::thread` API is not async, we could no longer `select!` on the threads as futures, so the `select!` was changed to a naive loop.
4. The order of thread selection was flipped to make (3) sound (see note in code).
I left some TODO's related to panics which I'm going to address soon as part of https://github.com/paritytech/polkadot/issues/7045.
* PVF: Vote invalid on panics in execution thread (after a retry)
Also make sure we kill the worker process on panic errors and internal errors to
potentially clear any error states independent of the candidate.
* Address a couple of TODOs
Addresses a couple of follow-up TODOs from
https://github.com/paritytech/polkadot/pull/7153.
* Add some documentation to implementer's guide
* Fix compile error
* Fix compile errors
* Fix compile error
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Address comments + couple other changes (see message)
- Measure the CPU time in the prepare thread, so the observed time is not
affected by any delays in joining on the thread.
- Measure the full CPU time in the execute thread.
* Implement proper thread synchronization
Use condvars i.e. `Arc::new((Mutex::new(true), Condvar::new()))` as per the std
docs.
Considered also using a condvar to signal the CPU thread to end, in place of an
mpsc channel. This was not done because `Condvar::wait_timeout_while` is
documented as being imprecise, and `mpsc::Receiver::recv_timeout` is not
documented as such. Also, we would need a separate condvar, to avoid this case:
the worker thread finishes its job, notifies the condvar, the CPU thread returns
first, and we join on it and not the worker thread. So it was simpler to leave
this part as is.
* Catch panics in threads so we always notify condvar
* Use `WaitOutcome` enum instead of bool condition variable
* Fix retry timeouts to depend on exec timeout kind
* Address review comments
* Make the API for condvars in workers nicer
* Add a doc
* Use condvar for memory stats thread
* Small refactor
* Enumerate internal validation errors in an enum
* Fix comment
* Add a log
* Fix test
* Update variant naming
* Address a missed TODO
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>