* pvf: use test-utils feature to export test only
* adding comment to test-utils feature
* make prepare-worker and execute-worker as optional dependencies and add comments to test-utils
* remove doc hidden from pvf testing
* add prepare worker and execute worker entrypoints to test-utils feature
* pvf: add sp_tracing as optional dependency of test-utils
* add test-utils for polkadot and malus
* add test-utils feature to prepare and execute workers script
* remove required features from prepare and executing
* Try to trigger CI again to fix broken jobs
---------
Co-authored-by: Marcin S <marcin@realemail.net>
* [WIP] PVF: Split out worker binaries
* Address compilation problems and re-design a bit
* Reorganize once more, fix tests
* Reformat with new nightly to make `cargo fmt` test happy
* Address `clippy` warnings
* Add temporary trace to debug zombienet tests
* Fix zombienet node upgrade test
* Fix malus and its CI
* Fix building worker binaries with malus
* More fixes for malus
* Remove unneeded cli subcommands
* Support placing auxiliary binaries to `/usr/libexec`
* Fix spelling
* Spelling
Co-authored-by: Marcin S. <marcin@realemail.net>
* Implement review comments (mostly nits)
* Fix worker node version flag
* Rework getting the worker paths
* Address a couple of review comments
* Minor restructuring
* Fix CI error
* Add tests for worker binaries detection
* Improve tests; try to fix CI
* Move workers module into separate file
* Try to fix failing test and workers not printing latest version
- Tests were not finding the worker binaries
- Workers were not being rebuilt when the version changed
- Made some errors easier to read
* Make a bunch of fixes
* Rebuild nodes on version change
* Fix more issues
* Fix tests
* Pass node version from node into dependencies to avoid recompiles
- [X] get version in CLI
- [X] pass it in to service
- [X] pass version along to PVF
- [X] remove rerun from service
- [X] add rerun to CLI
- [X] don’t rerun pvf/workers (these should be built by nodes which have rerun enabled)
* Some more improvements for smoother tests
- [X] Fix tests
- [X] Make puppet workers pass None for version and remove rerun
- [X] Make test collators self-contained
* Add back rerun to PVF workers
* Move worker binaries into files in cli crate
As a final optimization I've separated out each worker binary from its own crate
into the CLI crate. Before, the worker bin shared a crate with the worker lib,
so when the binaries got recompiled so did the libs and everything transitively
depending on the libs. This commit fixes this regression that was causing
recompiles after every commit.
* Fix bug (was passing worker version for node version)
* Move workers out of cli into root src/bin/ dir
- [X] Pass in node version from top-level (polkadot)
- [X] Add build.rs with rerun-git-head to root dir
* Add some sanity checks for workers to dockerfiles
* Update malus
+ [X] Make it self-contained
+ [X] Undo multiple binary changes
* Try to fix clippy errors
* Address `cargo run` issue
- [X] Add default-run for polkadot
- [X] Add note about installation to error
* Update readme (installation instructions)
* Allow disabling external workers for local/testing setups
+ [X] cli flag to enable single-binary mode
+ [X] Add message to error
* Revert unnecessary Cargo.lock changes
* Remove unnecessary build scripts from collators
* Add back missing malus commands (should fix failing ZN job)
* Some minor fixes
* Update Cargo.lock
* Fix some build errors
* Undo self-contained binaries; cli flag to disable version check
+ [X] Remove --dont-run-external-workers
+ [X] Add --disable-worker-version-check
+ [X] Remove PVF subcommands
+ [X] Redo malus changes
* Try to fix failing job and add some docs for local tests
---------
Co-authored-by: Dmitry Sinyavin <dmitry.sinyavin@parity.io>
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
Co-authored-by: parity-processbot <>
* add tests to worker common thread
* fix formatting
* move worker commons unit test from integration tests to worker file and do some improvements
* fix import on it/worker_common
* move worker commons unit test to test module
* cargo fmt
* move cpu_time_monitor_loop to test outside of thread module
* change worker thread unit test to use assert_eq
* fix formatting
* adding new methods to WaitOutcome, fix pvf worker unit test
* fix formatting
* remove is_finished and is_timeout methods from WaitOutcome
* fix wait_for_threads_with_timeout_returns_outcome test
* ".git/.scripts/commands/fmt/fmt.sh"
* add common worker cond_notify_on_done_should_update_wait_outcome_when_panic test
---------
Co-authored-by: Marcin S <marcin@realemail.net>
Co-authored-by: command-bot <>
* Begin adding landlock + test
* Move PVF implementer's guide section to own page, document security
* Implement test
* Add some docs
* Do some cleanup
* Fix typo
* Warn on host startup if landlock is not supported
* Clarify docs a bit
* Minor improvements
* Add some docs about determinism
* Address review comments (mainly add warning on landlock error)
* Update node/core/pvf/src/host.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Update node/core/pvf/src/host.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Fix unused fn
* Update ABI docs to reflect latest discussions
* Remove outdated notes
* Try to trigger new test-linux-oldkernel-stable job
Job introduced in https://github.com/paritytech/polkadot/pull/7371.
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Warn if participated in the losing side of a dispute
* Update naming
* Additionally filter by candidate hash
* Debug zombienet tests
* Update 0002-parachains-disputes.zndsl
* Debug zombienet
* Update node/core/dispute-coordinator/src/initialized.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Add checking to zombienet tests
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Move vstaging to production (and thus past session slashing).
WIP: test-runtime still needs to be fixed.
* Fix test-runtime.
---------
Co-authored-by: eskimor <eskimor@no-such-url.com>
* av-store: Move prune on a separate thread
There are situations where pruning of the data could take more than a few
seconds and that might make the whole subsystem unresponsive. To avoid this just
move the prune process to a separate thread.
See: https://github.com/paritytech/polkadot/issues/7237, for more details.
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
* av-store: Add log that pruning started
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
* av-store: modify log severity
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
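
As an illustration of the av-store change above, here is a minimal sketch of moving a potentially slow prune onto its own thread so the caller stays responsive. The `prune_in_background` helper, the `Vec<u64>` data model, and the `cutoff` parameter are hypothetical and only stand in for the real store; the actual subsystem may spawn the task differently.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: prunes every key below `cutoff` on a separate thread
/// and reports the number of removed entries through a channel, so the caller
/// can keep handling other work while the prune runs.
pub fn prune_in_background(mut keys: Vec<u64>, cutoff: u64) -> mpsc::Receiver<usize> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let before = keys.len();
        // The slow part happens here, off the caller's thread.
        keys.retain(|k| *k >= cutoff);
        // The receiver may already be gone; ignoring the error is fine here.
        let _ = tx.send(before - keys.len());
    });
    rx
}
```

The caller can poll the returned receiver with `try_recv` between other tasks, or block on `recv` once it actually needs the result.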
* runtime/vstaging: unapplied_slashes runtime API
* runtime/vstaging: key_ownership_proof runtime API
* runtime/ParachainHost: submit_report_dispute_lost
* fix key_ownership_proof API
* runtime: submit_report_dispute_lost runtime API
* nits
* Update node/subsystem-types/src/messages.rs
Co-authored-by: Marcin S. <marcin@bytedude.com>
* revert unrelated fmt changes
* dispute-coordinator: past session dispute slashing
* encapsulate runtime api call for submitting report
* prettify: extract into a function
* do not exit on runtime api error
* fix tests
* try initial zombienet test
* try something
* fix a typo
* try cumulus-based collator
* fix clippy
* build polkadot-debug images with fast-runtime enabled
* wip
* runtime/inclusion: fix availability_threshold
* fix wip
* fix wip II
* revert native provider
* propagate tx submission error
* DEBUG: sync=trace
* print key ownership proof len
* panic repro
* log validator index in panic message
* post merge fixes
* replace debug assertion with a log
* fix compilation
* Let's log the dispatch info in validate block.
* fix double encoding
* Revert "Let's log the dispatch info in validate block."
This reverts commit a70fbc51b464d7f4355dbada5e16cd83cf71eab4.
* fix compilation
* update to latest zombienet and fix test
* lower finality lag to 11
* bump zombienet again
* add a workaround, but still does not work
* Update .gitlab-ci.yml
bump zombienet.
* add a comment and search logs on all nodes
---------
Co-authored-by: Marcin S. <marcin@bytedude.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Javier Viola <javier@parity.io>
* Replace `RollingSessionWindow` with `RuntimeInfo` - initial commit
* Fix tests in import
* Fix the rest of the tests
* Remove dead code
* Fix todos
* Simplify session caching
* Comments for `SessionInfoProvider`
* Separate `SessionInfoProvider` from `State`
* `cache_session_info_for_head` becomes freestanding function
* Remove unneeded `mut` usage
* fn session_info -> fn get_session_info() to avoid name clashes. The function also tries to initialize `SessionInfoProvider`
* Fix SessionInfo retrieval
* Code cleanup
* Don't wrap `SessionInfoProvider` in an `Option`
* Remove `earliest_session()`
* Remove pre-caching -> wip
* Fix some tests and code cleanup
* Fix all tests
* Fixes in tests
* Fix comments, variable names and small style changes
* Fix a warning
* impl From<SessionWindowSize> for NonZeroUsize
* Fix logging for `get_session_info` - remove redundant logs and decrease log level to DEBUG
* Code review feedback
* Storage migration removing `COL_SESSION_WINDOW_DATA` from parachains db
* Remove `col_session_data` usages
* Storage migration clearing columns w/o removing them
* Remove session data column usages from `approval-voting` and `dispute-coordinator` tests
* Add some test cases from `RollingSessionWindow` to `dispute-coordinator` tests
* Fix formatting in initialized.rs
* Fix a corner case in `SessionInfo` caching for `dispute-coordinator`
* Remove `RollingSessionWindow` ;(
* Revert "Fix formatting in initialized.rs"
This reverts commit 0f94664ec9f3a7e3737a30291195990e1e7065fc.
* v2 to v3 migration drops `COL_DISPUTE_COORDINATOR_DATA` instead of clearing it
* Fix `NUM_COLUMNS` in `approval-voting`
* Use `columns::v3::NUM_COLUMNS` when opening db
* Update node/service/src/parachains_db/upgrade.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Don't write in `COL_DISPUTE_COORDINATOR_DATA` for `test_rocksdb_migrate_2_to_3`
* Fix `NUM_COLUMNS` in approval_voting
* Fix formatting
* Fix columns usage
* Clarification comments about the different db versions
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* PVF: Refactor workers into separate crates, remove host dependency
* Fix compile error
* Remove some leftover code
* Fix compile errors
* Update Cargo.lock
* Remove worker main.rs files
I accidentally copied these from the other PR. This PR isn't intended to
introduce standalone workers yet.
* Address review comments
* cargo fmt
* Update a couple of comments
* Update log targets
* Make `issue_explicit_statement_with_index` regular function
* Make `issue_backing_statement_with_index` regular function
* Issue `RevertBlocks` as soon as a dispute has `byzantine threshold + 1` invalid votes.
* Remove a comment
* Fix `has_fresh_byzantine_threshold_against()`
* Extend `informs_chain_selection_when_dispute_concluded_against` test
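
For readers unfamiliar with the `byzantine threshold + 1` condition mentioned above, a small sketch of the arithmetic follows. It assumes the usual definition of the byzantine threshold as the largest `f` with `3f < n`; the `should_revert` helper is hypothetical and is not the dispute-coordinator's actual function.

```rust
/// Largest number of faulty validators `f` such that `3 * f < n`.
pub fn byzantine_threshold(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Hypothetical check: true once the invalid votes exceed the byzantine
/// threshold, i.e. at least one honest validator must have voted invalid.
pub fn should_revert(invalid_votes: usize, n_validators: usize) -> bool {
    invalid_votes >= byzantine_threshold(n_validators) + 1
}
```

With 10 validators the threshold is 3, so under these assumptions the fourth invalid vote would be the first to trigger the revert.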
* PVF: Remove `rayon` and some uses of `tokio`
1. We were using `rayon` to spawn a superfluous thread to do execution, so it was removed.
2. We were using `rayon` to set a threadpool-specific thread stack size, and AFAIK we couldn't do that with `tokio` (it's possible [per-runtime](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.thread_stack_size) but not per-thread). Since we want to remove `tokio` from the workers [anyway](https://github.com/paritytech/polkadot/issues/7117), I changed it to spawn threads with the `std::thread` API instead of `tokio`.[^1]
[^1]: NOTE: This PR does not totally remove the `tokio` dependency just yet.
3. Since `std::thread` API is not async, we could no longer `select!` on the threads as futures, so the `select!` was changed to a naive loop.
4. The order of thread selection was flipped to make (3) sound (see note in code).
I left some TODOs related to panics which I'm going to address soon as part of https://github.com/paritytech/polkadot/issues/7045.
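
Points (2) and (3) above can be sketched roughly as follows: a thread is spawned through `std::thread::Builder` with an explicit stack size, and the caller waits for it in a plain polling loop instead of `select!`. The function name, thread name, and polling interval are assumptions for illustration only; the real worker code may be structured differently.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `job` on a thread with the given stack size and
/// waits for it in a naive loop. Returns `None` on timeout or if `job` panics.
pub fn run_with_stack_size<T, F>(stack_size: usize, timeout: Duration, job: F) -> Option<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let handle = thread::Builder::new()
        .name("execute-sketch".into())
        .stack_size(stack_size) // per-thread stack size
        .spawn(job)
        .expect("failed to spawn thread");

    let deadline = Instant::now() + timeout;
    // Threads are not futures, so poll instead of selecting on them.
    while !handle.is_finished() {
        if Instant::now() >= deadline {
            // The thread is left detached here; a real worker would
            // more likely terminate the whole process on timeout.
            return None;
        }
        thread::sleep(Duration::from_millis(1));
    }
    handle.join().ok()
}
```

Polling is wasteful compared with blocking on a notification, which is the kind of gap the later condvar-based commits appear to address.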
* PVF: Vote invalid on panics in execution thread (after a retry)
Also make sure we kill the worker process on panic errors and internal errors to
potentially clear any error states independent of the candidate.
* Address a couple of TODOs
Addresses a couple of follow-up TODOs from
https://github.com/paritytech/polkadot/pull/7153.
* Add some documentation to implementer's guide
* Fix compile error
* Fix compile errors
* Fix compile error
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Address comments + couple other changes (see message)
- Measure the CPU time in the prepare thread, so the observed time is not
affected by any delays in joining on the thread.
- Measure the full CPU time in the execute thread.
* Implement proper thread synchronization
Use condvars i.e. `Arc::new((Mutex::new(true), Condvar::new()))` as per the std
docs.
Considered also using a condvar to signal the CPU thread to end, in place of an
mpsc channel. This was not done because `Condvar::wait_timeout_while` is
documented as being imprecise, and `mpsc::Receiver::recv_timeout` is not
documented as such. Also, we would need a separate condvar, to avoid this case:
the worker thread finishes its job, notifies the condvar, the CPU thread returns
first, and we join on it and not the worker thread. So it was simpler to leave
this part as is.
* Catch panics in threads so we always notify condvar
* Use `WaitOutcome` enum instead of bool condition variable
* Fix retry timeouts to depend on exec timeout kind
* Address review comments
* Make the API for condvars in workers nicer
* Add a doc
* Use condvar for memory stats thread
* Small refactor
* Enumerate internal validation errors in an enum
* Fix comment
* Add a log
* Fix test
* Update variant naming
* Address a missed TODO
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
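
The condvar-based synchronization described in the commits above (an outcome enum behind a `Mutex`, paired with a `Condvar`, and panics caught so the condvar is always notified) might look roughly like this. This is a sketch under assumptions: the `WaitOutcome` variants, `spawn_notifying`, and `wait_with_timeout` are illustrative names and do not reproduce the actual worker API.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical outcome enum, loosely modelled on the `WaitOutcome` named above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    Pending,
    Finished,
    TimedOut,
}

pub type Cond = Arc<(Mutex<WaitOutcome>, Condvar)>;

/// Runs `job` on a new thread and always records an outcome,
/// even if `job` panics, so waiters are never left hanging.
pub fn spawn_notifying<F>(cond: Cond, job: F) -> thread::JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        // Catch the panic before taking the lock so the mutex is not poisoned.
        let _ = catch_unwind(AssertUnwindSafe(job));
        let (lock, cvar) = &*cond;
        *lock.lock().unwrap() = WaitOutcome::Finished;
        cvar.notify_all();
    })
}

/// Blocks until some thread records an outcome or `timeout` elapses.
pub fn wait_with_timeout(cond: &Cond, timeout: Duration) -> WaitOutcome {
    let (lock, cvar) = &**cond;
    let guard = lock.lock().unwrap();
    let (guard, result) = cvar
        .wait_timeout_while(guard, timeout, |o| *o == WaitOutcome::Pending)
        .unwrap();
    if result.timed_out() {
        WaitOutcome::TimedOut
    } else {
        *guard
    }
}
```

Using an enum rather than a `bool` lets several threads share one condvar while the waiter can still tell which of them finished first, which seems to be the motivation for the `WaitOutcome` commit.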
* Replace `RollingSessionWindow` with `RuntimeInfo` - initial commit
* Fix tests in import
* Fix the rest of the tests
* Remove dead code
* Fix todos
* Simplify session caching
* Comments for `SessionInfoProvider`
* Separate `SessionInfoProvider` from `State`
* `cache_session_info_for_head` becomes freestanding function
* Remove unneeded `mut` usage
* fn session_info -> fn get_session_info() to avoid name clashes. The function also tries to initialize `SessionInfoProvider`
* Fix SessionInfo retrieval
* Code cleanup
* Don't wrap `SessionInfoProvider` in an `Option`
* Remove `earliest_session()`
* Remove pre-caching -> wip
* Fix some tests and code cleanup
* Fix all tests
* Fixes in tests
* Fix comments, variable names and small style changes
* Fix a warning
* impl From<SessionWindowSize> for NonZeroUsize
* Fix logging for `get_session_info` - remove redundant logs and decrease log level to DEBUG
* Code review feedback
* Pass `SessionInfo` directly to `CandidateEnvironment::new` otherwise it should be an async function
* Replace calls to `RollingSessionWindow` with `RuntimeInfo`
Adjust `dispute-coordinator` initialization to use `RuntimeInfo`
* Modify `dispute-coordinator` initialization
* Pass `Hash` to `process_on_chain_votes` so that `RuntimeInfo` calls can be made
Remove some fixmes
* Pass `Hash` to `handle_import_statements` to perform `RuntimeInfo` calls
* remove todo comments
* Remove `error` from `Initialized`
Rework new session handling code
* Remove db code which is no longer used
* Update stale comment and remove unneeded type specification
* Cache SessionInfo on startup
* Use `DISPUTE_WINDOW` from primitives
* Fix caching in `process_active_leaves_update`
* handle_import_statements: leaf_hash -> block_hash
* Restore `ensure_available_session_info`
* Don't interrupt `process_on_chain_votes` if SessionInfo can't be fetched
* Small style improvements in logging
* process_on_chain_votes: leaf_hash -> block_hash
* Restore `note_earliest_session` - it is required to prune disputes and votes
* Cache new sessions only when there is an actual session change
* Fix tests
* `CandidateEnvironment::new` gets `session_idx` and fetches SessionInfo by itself to avoid the invariant where the input SessionIndex and SessionInfo parameters don't match
* Fix handling of missing session info
* Move sessions caching in `handle_startup` and fix tests
* Load `relay_parent` from db in `handle_import_statements` instead of passing it as a parameter via two functions
* Don't do two db reads
* Fix the twisted logic in `handle_import_statements`
* fixup
* Small style fix
* Decrease log levels for caching errors to debug and fix a typo
* Update outdated comment
* Remove `ensure_available_session_info`
* Load relay parent from db in `process_on_chain_votes`
* Revert "Load relay parent from db in `process_on_chain_votes`"
This reverts commit 978ad4f223d517faa7a7fbad96e3f8de4fa17501.
* Keep track of highest seen session and last session cached without gaps.
* Apply suggestions from code review
Co-authored-by: ordian <write@reusable.software>
* Handle session caching failure on startup correctly
* Update node/core/dispute-coordinator/src/initialized.rs
Co-authored-by: ordian <write@reusable.software>
* Simplify session caching retries
* Update stale comment
* Fix lower bound calculation for session caching
---------
Co-authored-by: ordian <write@reusable.software>
* PVF: Don't dispute on missing artifact
A dispute should never be raised because the local cache fails to provide an
artifact. This cannot be grounds for a dispute, as it is a local hardware
issue unrelated to the candidate being checked.
Design:
Currently we assume that if we prepared an artifact, it remains there on-disk
until we prune it, i.e. we never check again if it's still there.
We can change it so that instead of artifact-not-found triggering a dispute, we
retry once (like we do for AmbiguousWorkerDeath, except we don't dispute if it
still doesn't work). And when enqueuing an execute job, we check for the
artifact on-disk, and start preparation if not found.
Changes:
- [x] Integration test (should fail without the following changes)
- [x] Check if artifact exists when executing, prepare if not
- [x] Return an internal error when file is missing
- [x] Retry once on internal errors
- [x] Document design (update impl guide)
* Add some context to wasm error message (it is quite long)
* Fix impl guide
* Add check for missing/inaccessible file
* Add comment referencing Substrate issue
* Add test for retrying internal errors
---------
Co-authored-by: parity-processbot <>
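
The retry-once design from the missing-artifact change above can be sketched as below. The `Outcome` enum and both helpers are hypothetical simplifications: the real host distinguishes more error kinds and re-prepares the artifact between attempts.

```rust
/// Hypothetical, simplified result of one execution attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Valid,
    /// The candidate itself is bad.
    Invalid(String),
    /// A local problem, e.g. the artifact file is missing on disk.
    InternalError(String),
}

/// Runs `attempt` and retries exactly once if it reports an internal error.
pub fn execute_with_retry<F>(mut attempt: F) -> Outcome
where
    F: FnMut() -> Outcome,
{
    match attempt() {
        Outcome::InternalError(_) => attempt(),
        other => other,
    }
}

/// Only a verdict about the candidate may lead to a dispute;
/// internal errors never do, even after the retry fails.
pub fn should_dispute(outcome: &Outcome) -> bool {
    matches!(outcome, Outcome::Invalid(_))
}
```

Keeping internal errors in their own variant makes it harder to accidentally treat a local failure as evidence against the candidate.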
* Happy New Year!
* Remove year entirely
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Remove years from copyright notice in the entire repo
---------
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Onchain scraper in `dispute-coordinator` will scrape `SCRAPED_FINALIZED_BLOCKS_COUNT` blocks before finality
The purpose is to make the availability of a `CandidateReceipt` for finalized candidates more likely.
For details see: https://github.com/paritytech/polkadot/issues/7009
* Fix off by one error
* Replace `SCRAPED_FINALIZED_BLOCKS_COUNT` with `DISPUTE_CANDIDATE_LIFETIME_AFTER_FINALIZATION`
* Move version check to `worker_event_loop`
* More minor refactors
- More consistent use of `format_invalid` and `format_internal`.
- Fix a doc error.
- Fix some poorly-named local variables.
* Pass the PerLeafSpan as mutable reference to handle_new_head function
* cargo +nightly fmt --all
* Add mock span for test
* cargo +nightly fmt --all
* add new-blocks-hashes to span
* ref span in match statement, set span to disabled if not passed
* remove second match clause, make handle_new_head_span mutable
* cargo +nightly fmt --all
* improve tag on error and warning
* add imported blocks and info span
* cargo +nightly fmt --all
* Improve error for imported_blocks_and_info trace
* format tags on get_header_span
* add lost-to-finality tag
* add missing bracket
* - Add bitfield child span
- Add block db insertion span
* - fix update-bitfield span tag
* - Fix type conversion to u64
- Add missing argument
* - Cargo fmt
* - Test add_follows_from
* - Revert as relationship between spans not working correctly
* - use drop to test if parent-child relationship can be re-established
* - remove bitfield span, check if parent-child relationship can be reestablished
* - Remove dangling bitfield span which is not used, to see if parent-child relationship can be re-established
* Another dangling bitfield span
* cargo fmt
* - add imported blocks and info span
- add candidate span per candidate
* add tags before moving block_header to push scope
* - Add db-insertion span
* cargo fmt
* fix types
* * Pass mutable reference to span in handle_new_head
* Change get-header-span tags in handle_new_head
* Create cache-session-info span in handle_new_head
* Create optional argument in determine_new_blocks
* Pass mutable reference to handle_new_head_span in determine_new_blocks in handle_new_head function
* Add candidate-hash, candidate-number, lost-to-finality tags to candidate_span in handle_new_head function
* Manually drop db_insertion_span and remove superfluous tags to it, only keeping approved-bitfields tag
* Add ApprovalVoting stage in jaeger
* * Pass mutable reference to jaeger::Span instead of PerLeafSpan
* Add block-import span
* * Pass optional_span (optional argument) to determine_new_blocks util function
* * Add num-candidates int tag to block_import_span
* * Add head tag to cache_session_span
* * Create PerLeafSpan in handle_from_overseer (this is required to establish parent-child relationship between approval-voting span, and leaf-activated root span)
* * Add candidate-import-span as child of block-import-span
* Add candidate-hash and num-approval tags to candidate-import-span
* * Fix num-candidate tag to bitvec-len tag in candidate-import-span
* * Fix imported_blocks_and_info span to create new-block-span as not dealing with candidates
* Consider the future::select! block
* Use HashMap<Hash, jaeger::PerLeafSpan>
* Remove Stage 9
* Add missing spans
* cargo +nightly fmt --all
* Remove optional span argument for determine_new_blocks
* * Remove no-longer needed default PerLeafSpan implementation
* Remove no-longer necessary mock span given re-factoring of handle_new_head() no longer needing mutable span
* Split validation-result and request-data (availability and validation code) spans into two by dropping request_validation_data_spans
* Remove drop statements for cache_session_info_span
* Remove unnecessary span
* Remove another excessively spammy span
* Add missing spans from State in import tests
* Use functional approach to get spans
* - Add functional approach for the approval-voting span
- Add doc on block_numbers given labelling ambiguity
- Add span pruning logic
- Use .add_para_id on validation_result_span
* Replace for hash_set in hash_set_iter with map closure
* cargo +nightly fmt --all
* Change from unconsumed `map` to `.for_each`
* cargo +nightly fmt --all
* Refactor add_para_id to validation_result_span
* cargo +nightly fmt --all
* Remove duplicate tag
* Add missing tag to handle-approved-ancestor span
* Refactor span pruning to only invoke retain once
* Typo in span name
* - Replace unwrap_or with unwrap_or_else due to lazy evaluation of trace-identifier in polkadot_node_jaeger
- Remove some redundant spans
* Add approval-distribution spans
* - Add unwrap_or_else on note-approved-in-chain-selection
- Use child_with_trace_id to add traceID string tag on span (note this does not change the traceID, but just adds a tag)
* cargo +nightly fmt --all
* - Add traceID tags where necessary in approval-voting and availability-distribution
- Always use block-hash tag instead of relay-parent tag in approval-distribution
* Remove schedule-wakeup span as it will duplicate spans on existing wakeups (which should be a no-op)
* Remove a couple of warnings related to mutability
* Fix failing tests in availability distribution
* Add traceID tag to launch-approval and validation-result
* Reshuffle the validation and validation result spans to where more appropriate and add block-hash tag
* - Add tranche and should-trigger tag to process-wakeup span
- Add candidate-hash and traceID to check-and-import-approval span
* cargo fmt
* - Adjustments after PR comments
* Move span pruning after other pruning logic
* Remove DerefMut - no longer needed
* Relabel request-chunk spans
* - Fix typo in span label
- Add docs for drops
* Add new approval-voting span pruning logic
* Undo removal of !
* cargo fmt
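
The switch from `unwrap_or` to `unwrap_or_else` noted in the span-related commits above relies on the fact that `unwrap_or` evaluates its argument eagerly, while `unwrap_or_else` only runs its closure when the value is `None`. The sketch below counts how often a stand-in "expensive" fallback runs in each case; the helper is hypothetical and unrelated to the jaeger API.

```rust
use std::cell::Cell;

/// Returns how many times the fallback was computed with
/// `unwrap_or` (first) and with `unwrap_or_else` (second).
pub fn count_fallback_calls(value: Option<u32>) -> (u32, u32) {
    let eager_calls = Cell::new(0u32);
    let lazy_calls = Cell::new(0u32);

    // Stand-in for an expensive computation such as building an identifier.
    let expensive = |calls: &Cell<u32>| {
        calls.set(calls.get() + 1);
        0u32
    };

    // The argument is evaluated before `unwrap_or` is even called.
    let _ = value.unwrap_or(expensive(&eager_calls));
    // The closure runs only if `value` is `None`.
    let _ = value.unwrap_or_else(|| expensive(&lazy_calls));

    (eager_calls.get(), lazy_calls.get())
}
```

For `Some(7)` this yields `(1, 0)`: the eager fallback is computed even though it is discarded, while the lazy one never runs.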