* Make `issue_explicit_statement_with_index` regular function
* Make `issue_backing_statement_with_index` regular function
* Issue `RevertBlocks` as soon as a dispute has `byzantine threshold + 1` invalid votes.
* Remove a comment
* Fix `has_fresh_byzantine_threshold_against()`
* Extend `informs_chain_selection_when_dispute_concluded_against` test
* PVF: Remove `rayon` and some uses of `tokio`
1. We were using `rayon` to spawn a superfluous thread to do execution, so it was removed.
2. We were using `rayon` to set a threadpool-specific thread stack size, and AFAIK we couldn't do that with `tokio` (it's possible [per-runtime](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html#method.thread_stack_size) but not per-thread). Since we want to remove `tokio` from the workers [anyway](https://github.com/paritytech/polkadot/issues/7117), I changed it to spawn threads with the `std::thread` API instead of `tokio`.[^1]
[^1]: NOTE: This PR does not totally remove the `tokio` dependency just yet.
3. Since `std::thread` API is not async, we could no longer `select!` on the threads as futures, so the `select!` was changed to a naive loop.
4. The order of thread selection was flipped to make (3) sound (see note in code).
I left some TODO's related to panics which I'm going to address soon as part of https://github.com/paritytech/polkadot/issues/7045.
* PVF: Vote invalid on panics in execution thread (after a retry)
Also make sure we kill the worker process on panic errors and internal errors to
potentially clear any error states independent of the candidate.
* Address a couple of TODOs
Addresses a couple of follow-up TODOs from
https://github.com/paritytech/polkadot/pull/7153.
* Add some documentation to implementer's guide
* Fix compile error
* Fix compile errors
* Fix compile error
* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Address comments + couple other changes (see message)
- Measure the CPU time in the prepare thread, so the observed time is not
affected by any delays in joining on the thread.
- Measure the full CPU time in the execute thread.
* Implement proper thread synchronization
Use condvars i.e. `Arc::new((Mutex::new(true), Condvar::new()))` as per the std
docs.
Considered also using a condvar to signal the CPU thread to end, in place of an
mpsc channel. This was not done because `Condvar::wait_timeout_while` is
documented as being imprecise, and `mpsc::Receiver::recv_timeout` is not
documented as such. Also, we would need a separate condvar, to avoid this case:
the worker thread finishes its job, notifies the condvar, the CPU thread returns
first, and we join on it and not the worker thread. So it was simpler to leave
this part as is.
* Catch panics in threads so we always notify condvar
* Use `WaitOutcome` enum instead of bool condition variable
* Fix retry timeouts to depend on exec timeout kind
* Address review comments
* Make the API for condvars in workers nicer
* Add a doc
* Use condvar for memory stats thread
* Small refactor
* Enumerate internal validation errors in an enum
* Fix comment
* Add a log
* Fix test
* Update variant naming
* Address a missed TODO
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Replace `RollingSessionWindow` with `RuntimeInfo` - initial commit
* Fix tests in import
* Fix the rest of the tests
* Remove dead code
* Fix todos
* Simplify session caching
* Comments for `SessionInfoProvider`
* Separate `SessionInfoProvider` from `State`
* `cache_session_info_for_head` becomes freestanding function
* Remove unneeded `mut` usage
* fn session_info -> fn get_session_info() to avoid name clashes. The function also tries to initialize `SessionInfoProvider`
* Fix SessionInfo retrieval
* Code cleanup
* Don't wrap `SessionInfoProvider` in an `Option`
* Remove `earliest_session()`
* Remove pre-caching -> wip
* Fix some tests and code cleanup
* Fix all tests
* Fixes in tests
* Fix comments, variable names and small style changes
* Fix a warning
* impl From<SessionWindowSize> for NonZeroUsize
* Fix logging for `get_session_info` - remove redundant logs and decrease log level to DEBUG
* Code review feedback
There is a deny(clippy::dbg_macro) in the crate root, so newer
Clippy fails here since tests use dbg.
But dbg in tests are fine IMHO.
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* companion for #13384
* adjust parsing RPC address from process output
* update rpc cli
* update lockfile for {"substrate"}
* bump zombienet v1.3.48
* bump zombienet version
* allow zombienet-tests-misc-upgrade-node to fail
* add comment and issue link to allowed_failure
* grumbles: disable failed job
* disabled the correct test
---------
Co-authored-by: parity-processbot <>
Co-authored-by: Javier Viola <javier@parity.io>
* Pass `SessionInfo` directly to `CandidateEnvironment::new` otherwise it should be an async function
* Replace calls to `RollingSessionWindow` with `RuntimeInfo`
Adjust `dispute-coordinator` initialization to use `RuntimeInfo`
* Modify `dispute-coordinator` initialization
* Pass `Hash` to `process_on_chain_votes` so that `RuntimeInfo` calls can be made
Remove some fixmes
* Pass `Hash` to `handle_import_statements` to perform `RuntimeInfo` calls
* remove todo comments
* Remove `error` from `Initialized`
Rework new session handling code
* Remove db code which is no longer used
* Update stale comment and remove unneeded type specification
* Cache SessionInfo on startup
* Use `DISPUTE_WINDOW` from primitives
* Fix caching in `process_active_leaves_update`
* handle_import_statements: leaf_hash -> block_hash
* Restore `ensure_available_session_info`
* Don't interrupt `process_on_chain_votes` if SessionInfo can't be fetched
* Small style improvements in logging
* process_on_chain_votes: leaf_hash -> block_hash
* Restore `note_earliest_session` - it is required to prune disputes and votes
* Cache new sessions only when there is an actual session change
* Fix tests
* `CandidateEnvironment::new` gets `session_idx` and fetches SessionInfo by itself to avoid the invariant where the input SessionIndex and SessionInfo parameters don't match
* Fix handling of missing session info
* Move sessions caching in `handle_startup` and fix tests
* Load `relay_parent` from db in `handle_import_statements` instead of passing it as a parameter via two functions
* Don't do two db reads
* Fix the twisted logic in `handle_import_statements`
* fixup
* Small style fix
* Decrease log levels for caching errors to debug and fix a typo
* Update outdated comment
* Remove `ensure_available_session_info`
* Load relay parent from db in `process_on_chain_votes`
* Revert "Load relay parent from db in `process_on_chain_votes`"
This reverts commit 978ad4f223d517faa7a7fbad96e3f8de4fa17501.
* Keep track of highest seen session and last session cached without gaps.
* Apply suggestions from code review
Co-authored-by: ordian <write@reusable.software>
* Handle session caching failure on startup correctly
* Update node/core/dispute-coordinator/src/initialized.rs
Co-authored-by: ordian <write@reusable.software>
* Simplify session caching retries
* Update stale comment
* Fix lower bound calculation for session caching
---------
Co-authored-by: ordian <write@reusable.software>
* PVF: Don't dispute on missing artifact
A dispute should never be raised if the local cache doesn't provide a certain
artifact. You can not dispute based on this reason, as it is a local hardware
issue and not related to the candidate to check.
Design:
Currently we assume that if we prepared an artifact, it remains there on-disk
until we prune it, i.e. we never check again if it's still there.
We can change it so that instead of artifact-not-found triggering a dispute, we
retry once (like we do for AmbiguousWorkerDeath, except we don't dispute if it
still doesn't work). And when enqueuing an execute job, we check for the
artifact on-disk, and start preparation if not found.
Changes:
- [x] Integration test (should fail without the following changes)
- [x] Check if artifact exists when executing, prepare if not
- [x] Return an internal error when file is missing
- [x] Retry once on internal errors
- [x] Document design (update impl guide)
* Add some context to wasm error message (it is quite long)
* Fix impl guide
* Add check for missing/inaccessible file
* Add comment referencing Substrate issue
* Add test for retrying internal errors
---------
Co-authored-by: parity-processbot <>
* Switch to DNS name based bootnodes for Rococo
Signed-off-by: bakhtin <a@bakhtin.net>
* Switch Rococo bootnodes from WS to WSS
Signed-off-by: bakhtin <a@bakhtin.net>
---------
Signed-off-by: bakhtin <a@bakhtin.net>
* globally upgrade syn to 1.0.109
* globally upgrade quote to 1.0.26
* globally upgrade proc-macro2 to 1.0.56
* globally bump syn to v2.0.13
* update expander to v1.0.0
* temporary commit to prove new version of expander works
(new version hasn't been released yet so using git)
* use expander 2.0.0
* upgrade to syn 2.0.14
* update lock file
* Happy New Year!
* Remove year entierly
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Remove years from copyright notice in the entire repo
---------
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Onchain scraper in `dispute-coordinator` will scrape `SCRAPED_FINALIZED_BLOCKS_COUNT` blocks before finality
The purpose is to make the availability of a `CandidateReceipt` for finalized candidates more likely.
For details see: https://github.com/paritytech/polkadot/issues/7009
* Fix off by one error
* Replace `SCRAPED_FINALIZED_BLOCKS_COUNT` with `DISPUTE_CANDIDATE_LIFETIME_AFTER_FINALIZATION`
* Move version check to `worker_event_loop`
* More minor refactors
- More consistent use of `format_invalid` and `format_internal`.
- Fix a doc error.
- Fix some poorly-named local variables.
* Pass the PerLeafSpan as mutable reference to handle_new_head function
* cargo +nightly fmt --all
* Add mock span for test
* cargo +nightly fmt --all
* add new-blocks-hashes to span
* ref span in match statement, set span to disabled if not passed
* remove second match clause, make handle_new_head_span mutable
* cargo +nightly fmt --all
* improve tag on error and warning
* add imported blocks and info span
* cargo +nightly fmt --all
* Improve error for imported_blocks_and_info trace
* format tags on get_header_span
* add lost-to-finality tag
* add missing bracket
* - Add bitfield child span
- Add block db insertion span
* - fix update-bitfield span tag
* - Fix type conversion to u64
- Add missing argument
* - Cargo fmt
* - Test add_follows_from
* - Revert as relationship between spans not working correctly
* - use drop to test if parent-child relationship can be re-established
* - remove bitfield span, check if parent-child relationship can be reestablished
* - Remove dangling bitfield span which is not used, to see if parent-child relationship can be re-established
* Another dangling bitfield span
* cargo fmt
* - add imported blocks and info span
- add candidate span per candidate
* add tags before moving block_header to push scope
* - Add db-insertion span
* cargo fmt
* fix types
* * Pass mutable reference to span in handle_new_head
* Change get-header-span tags in handle_new_head
* Create cache-session-info span in handle_new_head
* Create optional argument in determine_new_blocks
* Pass mutable reference to handle_new_head_span in determine_new_blocks in handle_new_head function
* Add candidate-hash, candidate-number, lost-to-finality tags to candidate_span in handle_new_head function
* Manually drop db_insertion_span and remove superfluous tags to it, only keeping approved-bitfields tag
* Add ApprovalVoting stage in jaeger
* * Pass mutable reference to jaeger::Span in stead of PerLeafSpan
* Add block-import span
* *Pass optional_span (optional argument) to determine_new_blocks util function
* * Add num-candidates int tag to block_import_span
* * Add head tag to cache_session_span
* * Create PerLeafSpan in handle_from_overseer (this is required to establish parent-child relationship between approval-voting span, and leaf-activated root span)
* * Add candidate-import-span as child of block-import-span
* Add candidate-hash and num-approval tags to candidate-import-span
* * Fix num-candidate tag to bitvec-len tag in candidate-import-span
* *Fix imported_blocKs_and_info span to create new-block-span as not dealing with candidates
* Consider the future::select! block
* Use HashMap<Hash, jaeger::PerLeafSpan>
* Remove Stage 9
* Add missing spans
* cargo +nightly fmt --all
* Remove optional span argument for determine_new_blocks
* * Remove no-longer needed default PerLeafSpan implementation
* Remove no-longer necessary mock span given re-factoring of handle_new_head() no longer neeing mutable span
* Split validation-result and request-data (availability and validation code) spans into two by dropping request_validation_data_spans
* Remove drop statements for cache_session_info_span
*
* Remove unnecessary span
* Remove another excessively spammy span
* Add missing spans from State in import tests
* Use functional approach to get spans
* - Add functional approach for the approval-voting span
- Add doc on block_numbers given labelling ambiguity
- Add span pruning logic
- Use .add_para_id on validation_result_span
* Replace for hash_set in hash_set_iter with map closure
* cargo +nightly fmt --all
* Change from unconsumed `map` to `.for_each`
* cargo +nightly fmt --all
* Refactor add_para_id to validation_result_span
* cargo +nightly fmt --all
* Remove duplicate tag
* Add missing tag to handle-approved-ancestor span
* Refactor span pruning to only invoke retain once
* Typo in span name
* - Replace unwrap_or with unwrap_or_else due to lazy evaluation of trace-identifier in polkadot_node_jaeger
- Remove some redundant spans
* Add approval-distribution spans
* - Add unwrap_or_else on note-approved-in-chain-selection
- Use child_with_trace_id to add traceID string tag on span (note this does not change the traceID, but just adds a tag)
* cargo +nightly fmt --all
* - Add traceID tags were necessary in approval-voting and availability-distribution
- Always use block-hash tag in stead of relay-parent tag in approval-distribution
* Remove schedule-wakeup span as it will duplicate spans on existing wakeups (which should be a no-op)
* Remove a couple of warnings related to mutability
* Fix failing tests in availability distribution
* Add traceID tag to launch-approval and validation-result
* Reshuffle the validation and validation result spans to where more appropriate and add block-hash tag
* - Add tranche and should-trigger tag to process-wakeup span
- Add candidate-hash and traceID to check-and-import-approval span
* cargo fmt
* - Adjustments after PR comments
* Move span pruning after other pruning logic
* Remove DerefMut - no longer needed
* Relabel request-chunk spans
* - Fix typo in span label
- Add docs for drops
* Add new approval-voting span pruning logic
* Undo removal of !
* cargo fmt