* Replace `RollingSessionWindow` with `RuntimeInfo` - initial commit
* Fix tests in import
* Fix the rest of the tests
* Remove dead code
* Fix todos
* Simplify session caching
* Comments for `SessionInfoProvider`
* Separate `SessionInfoProvider` from `State`
* `cache_session_info_for_head` becomes freestanding function
* Remove unneeded `mut` usage
* fn session_info -> fn get_session_info() to avoid name clashes. The function also tries to initialize `SessionInfoProvider`
* Fix SessionInfo retrieval
* Code cleanup
* Don't wrap `SessionInfoProvider` in an `Option`
* Remove `earliest_session()`
* Remove pre-caching -> wip
* Fix some tests and code cleanup
* Fix all tests
* Fixes in tests
* Fix comments, variable names and small style changes
* Fix a warning
* impl From<SessionWindowSize> for NonZeroUsize
* Fix logging for `get_session_info` - remove redundant logs and decrease log level to DEBUG
* Code review feedback
* Pass `SessionInfo` directly to `CandidateEnvironment::new` otherwise it should be an async function
* Replace calls to `RollingSessionWindow` with `RuntimeInfo`
Adjust `dispute-coordinator` initialization to use `RuntimeInfo`
* Modify `dispute-coordinator` initialization
* Pass `Hash` to `process_on_chain_votes` so that `RuntimeInfo` calls can be made
Remove some fixmes
* Pass `Hash` to `handle_import_statements` to perform `RuntimeInfo` calls
* remove todo comments
* Remove `error` from `Initialized`
Rework new session handling code
* Remove db code which is no longer used
* Update stale comment and remove unneeded type specification
* Cache SessionInfo on startup
* Use `DISPUTE_WINDOW` from primitives
* Fix caching in `process_active_leaves_update`
* handle_import_statements: leaf_hash -> block_hash
* Restore `ensure_available_session_info`
* Don't interrupt `process_on_chain_votes` if SessionInfo can't be fetched
* Small style improvements in logging
* process_on_chain_votes: leaf_hash -> block_hash
* Restore `note_earliest_session` - it is required to prune disputes and votes
* Cache new sessions only when there is an actual session change
* Fix tests
* `CandidateEnvironment::new` gets `session_idx` and fetches SessionInfo by itself to avoid the invariant where the input SessionIndex and SessionInfo parameters don't match
* Fix handling of missing session info
* Move sessions caching in `handle_startup` and fix tests
* Load `relay_parent` from db in `handle_import_statements` instead of passing it as a parameter via two functions
* Don't do two db reads
* Fix the twisted logic in `handle_import_statements`
* fixup
* Small style fix
* Decrease log levels for caching errors to debug and fix a typo
* Update outdated comment
* Remove `ensure_available_session_info`
* Load relay parent from db in `process_on_chain_votes`
* Revert "Load relay parent from db in `process_on_chain_votes`"
This reverts commit 978ad4f223d517faa7a7fbad96e3f8de4fa17501.
* Keep track of highest seen session and last session cached without gaps.
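
A minimal sketch of the bookkeeping described in the bullet above, assuming two counters: the highest session index seen so far and the last session cached with no gaps before it. `SessionCacheTracker`, `note_session`, and the `cache` callback are hypothetical names for illustration, not the dispute-coordinator's actual code.

```rust
/// Hypothetical tracker; session indices are plain `u32` values here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SessionCacheTracker {
    highest_seen: u32,
    last_consecutive_cached: u32,
}

impl SessionCacheTracker {
    /// Starts with `start` assumed to be both seen and cached.
    pub fn new(start: u32) -> Self {
        Self { highest_seen: start, last_consecutive_cached: start }
    }

    /// Records `session` and tries to cache every session between the last
    /// gap-free cached one and the highest seen one. Caching stops at the
    /// first failure, so a later call retries from the same point.
    pub fn note_session<F>(&mut self, session: u32, mut cache: F)
    where
        F: FnMut(u32) -> bool,
    {
        self.highest_seen = self.highest_seen.max(session);
        while self.last_consecutive_cached < self.highest_seen {
            let next = self.last_consecutive_cached + 1;
            if !cache(next) {
                break;
            }
            self.last_consecutive_cached = next;
        }
    }

    pub fn highest_seen(&self) -> u32 {
        self.highest_seen
    }

    pub fn last_consecutive_cached(&self) -> u32 {
        self.last_consecutive_cached
    }
}
```

Keeping the two values separate means a failed fetch does not hide the fact that a newer session exists, and a retry only has to cover the missing range.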
* Apply suggestions from code review
Co-authored-by: ordian <write@reusable.software>
* Handle session caching failure on startup correctly
* Update node/core/dispute-coordinator/src/initialized.rs
Co-authored-by: ordian <write@reusable.software>
* Simplify session caching retries
* Update stale comment
* Fix lower bound calculation for session caching
---------
Co-authored-by: ordian <write@reusable.software>
* PVF: Don't dispute on missing artifact
A dispute should never be raised if the local cache doesn't provide a certain
artifact. You cannot dispute based on this reason, as it is a local hardware
issue and not related to the candidate to check.
Design:
Currently we assume that if we prepared an artifact, it remains there on-disk
until we prune it, i.e. we never check again if it's still there.
We can change it so that instead of artifact-not-found triggering a dispute, we
retry once (like we do for AmbiguousWorkerDeath, except we don't dispute if it
still doesn't work). And when enqueuing an execute job, we check for the
artifact on-disk, and start preparation if not found.
Changes:
- [x] Integration test (should fail without the following changes)
- [x] Check if artifact exists when executing, prepare if not
- [x] Return an internal error when file is missing
- [x] Retry once on internal errors
- [x] Document design (update impl guide)
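
The design above (retry once on an internal error, never dispute because of it) could be sketched roughly as follows. `ExecOutcome`, `execute_with_retry`, and `should_dispute` are hypothetical names chosen for this illustration and are not the PVF host's real types.

```rust
/// Hypothetical outcome of one execution attempt.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExecOutcome {
    /// The candidate executed and was found valid.
    Valid,
    /// The candidate itself is bad: the only outcome that may lead to a dispute.
    Invalid(String),
    /// A local problem (e.g. the artifact file is missing): never disputable.
    InternalError(String),
}

/// Runs `attempt` and retries exactly once if the first try hit an internal
/// error. A second internal error is returned unchanged, so the caller can
/// abstain instead of disputing.
pub fn execute_with_retry<F>(mut attempt: F) -> ExecOutcome
where
    F: FnMut() -> ExecOutcome,
{
    match attempt() {
        ExecOutcome::InternalError(_) => attempt(),
        other => other,
    }
}

/// Only candidate-caused failures justify raising a dispute.
pub fn should_dispute(outcome: &ExecOutcome) -> bool {
    matches!(outcome, ExecOutcome::Invalid(_))
}
```

In this sketch the retry is where a missing artifact would get re-prepared; the important property is that `InternalError` can never reach the dispute path.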
* Add some context to wasm error message (it is quite long)
* Fix impl guide
* Add check for missing/inaccessible file
* Add comment referencing Substrate issue
* Add test for retrying internal errors
---------
Co-authored-by: parity-processbot <>
* Happy New Year!
* Remove year entirely
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Remove years from copyright notice in the entire repo
---------
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Onchain scraper in `dispute-coordinator` will scrape `SCRAPED_FINALIZED_BLOCKS_COUNT` blocks before finality
The purpose is to make the availability of a `CandidateReceipt` for finalized candidates more likely.
For details see: https://github.com/paritytech/polkadot/issues/7009
* Fix off by one error
* Replace `SCRAPED_FINALIZED_BLOCKS_COUNT` with `DISPUTE_CANDIDATE_LIFETIME_AFTER_FINALIZATION`
* Move version check to `worker_event_loop`
* More minor refactors
- More consistent use of `format_invalid` and `format_internal`.
- Fix a doc error.
- Fix some poorly-named local variables.
* Pass the PerLeafSpan as mutable reference to handle_new_head function
* cargo +nightly fmt --all
* Add mock span for test
* cargo +nightly fmt --all
* add new-blocks-hashes to span
* ref span in match statement, set span to disabled if not passed
* remove second match clause, make handle_new_head_span mutable
* cargo +nightly fmt --all
* improve tag on error and warning
* add imported blocks and info span
* cargo +nightly fmt --all
* Improve error for imported_blocks_and_info trace
* format tags on get_header_span
* add lost-to-finality tag
* add missing bracket
* - Add bitfield child span
- Add block db insertion span
* - fix update-bitfield span tag
* - Fix type conversion to u64
- Add missing argument
* - Cargo fmt
* - Test add_follows_from
* - Revert as relationship between spans not working correctly
* - use drop to test if parent-child relationship can be re-established
* - remove bitfield span, check if parent-child relationship can be reestablished
* - Remove dangling bitfield span which is not used, to see if parent-child relationship can be re-established
* Another dangling bitfield span
* cargo fmt
* - add imported blocks and info span
- add candidate span per candidate
* add tags before moving block_header to push scope
* - Add db-insertion span
* cargo fmt
* fix types
* * Pass mutable reference to span in handle_new_head
* Change get-header-span tags in handle_new_head
* Create cache-session-info span in handle_new_head
* Create optional argument in determine_new_blocks
* Pass mutable reference to handle_new_head_span in determine_new_blocks in handle_new_head function
* Add candidate-hash, candidate-number, lost-to-finality tags to candidate_span in handle_new_head function
* Manually drop db_insertion_span and remove superfluous tags to it, only keeping approved-bitfields tag
* Add ApprovalVoting stage in jaeger
* * Pass mutable reference to jaeger::Span instead of PerLeafSpan
* Add block-import span
* * Pass optional_span (optional argument) to determine_new_blocks util function
* * Add num-candidates int tag to block_import_span
* * Add head tag to cache_session_span
* * Create PerLeafSpan in handle_from_overseer (this is required to establish parent-child relationship between approval-voting span, and leaf-activated root span)
* * Add candidate-import-span as child of block-import-span
* Add candidate-hash and num-approval tags to candidate-import-span
* * Fix num-candidate tag to bitvec-len tag in candidate-import-span
* * Fix imported_blocks_and_info span to create new-block-span as not dealing with candidates
* Consider the future::select! block
* Use HashMap<Hash, jaeger::PerLeafSpan>
* Remove Stage 9
* Add missing spans
* cargo +nightly fmt --all
* Remove optional span argument for determine_new_blocks
* * Remove no-longer needed default PerLeafSpan implementation
* Remove no-longer necessary mock span given re-factoring of handle_new_head() no longer needing mutable span
* Split validation-result and request-data (availability and validation code) spans into two by dropping request_validation_data_spans
* Remove drop statements for cache_session_info_span
* Remove unnecessary span
* Remove another excessively spammy span
* Add missing spans from State in import tests
* Use functional approach to get spans
* - Add functional approach for the approval-voting span
- Add doc on block_numbers given labelling ambiguity
- Add span pruning logic
- Use .add_para_id on validation_result_span
* Replace for hash_set in hash_set_iter with map closure
* cargo +nightly fmt --all
* Change from unconsumed `map` to `.for_each`
* cargo +nightly fmt --all
* Refactor add_para_id to validation_result_span
* cargo +nightly fmt --all
* Remove duplicate tag
* Add missing tag to handle-approved-ancestor span
* Refactor span pruning to only invoke retain once
* Typo in span name
* - Replace unwrap_or with unwrap_or_else due to lazy evaluation of trace-identifier in polkadot_node_jaeger
- Remove some redundant spans
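
The difference between `unwrap_or` and `unwrap_or_else` mentioned above comes down to when the fallback is evaluated. A small sketch, where `expensive_default` is a hypothetical stand-in for a costly fallback such as deriving a trace identifier:

```rust
use std::cell::Cell;

/// Hypothetical expensive fallback; counts how often it runs.
pub fn expensive_default(calls: &Cell<u32>) -> u64 {
    calls.set(calls.get() + 1);
    0
}

/// `unwrap_or` evaluates its argument before the call,
/// even when `value` is `Some`.
pub fn eager(value: Option<u64>, calls: &Cell<u32>) -> u64 {
    value.unwrap_or(expensive_default(calls))
}

/// `unwrap_or_else` runs the closure only when `value` is `None`.
pub fn lazy(value: Option<u64>, calls: &Cell<u32>) -> u64 {
    value.unwrap_or_else(|| expensive_default(calls))
}
```

With `Some(_)` as input, `eager` still pays for the fallback while `lazy` does not, which matters when the fallback has side effects or a noticeable cost.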
* Add approval-distribution spans
* - Add unwrap_or_else on note-approved-in-chain-selection
- Use child_with_trace_id to add traceID string tag on span (note this does not change the traceID, but just adds a tag)
* cargo +nightly fmt --all
* - Add traceID tags where necessary in approval-voting and availability-distribution
- Always use block-hash tag instead of relay-parent tag in approval-distribution
* Remove schedule-wakeup span as it will duplicate spans on existing wakeups (which should be a no-op)
* Remove a couple of warnings related to mutability
* Fix failing tests in availability distribution
* Add traceID tag to launch-approval and validation-result
* Reshuffle the validation and validation result spans to where more appropriate and add block-hash tag
* - Add tranche and should-trigger tag to process-wakeup span
- Add candidate-hash and traceID to check-and-import-approval span
* cargo fmt
* - Adjustments after PR comments
* Move span pruning after other pruning logic
* Remove DerefMut - no longer needed
* Relabel request-chunk spans
* - Fix typo in span label
- Add docs for drops
* Add new approval-voting span pruning logic
* Undo removal of !
* cargo fmt
* Check spawned worker version vs node version before PVF preparation
* Address discussions
* Propagate errors and shutdown preparation and execution pipelines properly
* Add logs; Fix execution worker checks
* Revert "Propagate errors and shutdown preparation and execution pipelines properly"
This reverts commit b96cc3160ff58db5ff001d8ca0bfea9bd4bdd0f2.
* Don't try to shut down; report the condition and exit worker
* Get rid of `VersionMismatch` preparation error
* Merge master
* Add docs; Fix tests
* Update Cargo.lock
* Kill again, but only the main node process
* Move unsafe code to a common safe function
* Fix libc dependency error on MacOS
* pvf spawning: Add some logging, add a small integration test
* Minor fixes
* Restart CI
---------
Co-authored-by: Marcin S <marcin@realemail.net>
* Added participation and queue sizes metrics
* First draft of all metric code
* Tests pass
* Changed Metrics to field on participation + queues
* fmt
* Improving naming
* Refactor, placing timer in ParticipationRequest
* fmt
* Final cleanup
* Revert "Final cleanup"
This reverts commit 02e5608df64b2e0f7810905e4508673b2037d351.
* Changing metric names
* Implementing Eq only for unit tests
* fmt
* Removing Clone trait from ParticipationRequest
* fmt
* Moved clone functionality to tests helper
* fmt
* Fixing dropped timers on repeat requests
* Keep older best effort timers
* Removing comment redundancy and explaining better
* Updating queue() to use single mem read
* fmt
* Additional tracing in `provisioner`, `vote_selection`
* Add `fetched_onchain_disputes` metric to provisioner
* Some tracelines in dispute-coordinator
TODO: cherry pick this in the initial branch!!!
* Remove spammy logs
* Remove some trace lines
* Change `MaxMemorySize` to `MaxMemoryPages`
We should set the max memory for the executor in pages (64KiB) and not in bytes.
The wasm memory is always a multiple of a page and we should use the
same terminology.
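
As an illustration of the page-based terminology, here is a small sketch that converts between Wasm pages and bytes. The newtype and its methods are assumptions made for this example, not the executor's actual definition; the 64 KiB page size is the one stated above.

```rust
/// Size of one Wasm linear-memory page in bytes (64 KiB).
pub const WASM_PAGE_SIZE: u64 = 64 * 1024;

/// Hypothetical newtype: a memory limit expressed in Wasm pages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaxMemoryPages(pub u32);

impl MaxMemoryPages {
    /// Largest page count that fits in `bytes`; a partial page is rounded
    /// down, and the result saturates at `u32::MAX` pages.
    pub fn from_bytes_floor(bytes: u64) -> Self {
        let pages = (bytes / WASM_PAGE_SIZE).min(u64::from(u32::MAX));
        MaxMemoryPages(pages as u32)
    }

    /// The limit in bytes, always a whole multiple of the page size.
    pub fn as_bytes(self) -> u64 {
        u64::from(self.0) * WASM_PAGE_SIZE
    }
}
```

Expressing the limit in pages makes it impossible to configure a value that is not a multiple of the page size.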
* FMT
* Fix warning
* Use a `BoundedVec` in `ValidationResult`
> Use a `BoundedVec` for `upward_messages` and `horizontal_messages` in order to
> limit the number of individual messages/memory allocations right at decoding
> time. The reason for this is that the `ValidationResult` may contain a code
> upgrade (including a full PVF binary), so the total size limit can't be set
> too low and this limit will still allow several millions of upward messages,
> which will (due to the memory allocator overhead) already have a
> non-negligible memory footprint in decoded form.
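
The idea of limiting the number of items while they are being read, rather than after everything has been allocated, can be sketched with a std-only stand-in. `Bounded` and `try_collect` are hypothetical and deliberately do not mirror the `bounded-collections` API.

```rust
/// Stand-in for a bounded vector; NOT the `bounded-collections` API.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Bounded<T, const MAX: usize>(Vec<T>);

impl<T, const MAX: usize> Bounded<T, MAX> {
    /// Collects items from `iter`, bailing out as soon as the bound would be
    /// exceeded, so the vector never holds more than `MAX` items.
    pub fn try_collect<I>(iter: I) -> Result<Self, &'static str>
    where
        I: IntoIterator<Item = T>,
    {
        let mut items = Vec::new();
        for item in iter {
            if items.len() == MAX {
                return Err("too many items");
            }
            items.push(item);
        }
        Ok(Bounded(items))
    }

    /// Number of items currently stored.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}
```

Because the check happens per item, an oversized input is rejected after at most `MAX` messages have been materialized, independent of any total byte limit.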
* List all fields when hashing so we don't miss one
* Define types for `BoundedVec`s of messages
* Fix test compile errors
* Depend on `bounded-collections` 0.1.4 (fixes allocation issue)
* Fix compilation issue
* Derive `Hash` instead of manual `impl`
* Avoid use of unwrap
* Re-apply changes without Diener, rebase to the latest master
* Cache pruning
* Bit-pack InstantiationStrategy
* Move ExecutorParams version inside the structure itself
* Rework runtime API and executor parameters storage
* Pass executor parameters through backing subsystem
* Update Cargo.lock
* Introduce `ExecutorParams` to approval voting subsys
* Introduce `ExecutorParams` to dispute coordinator
* `cargo fmt`
* Simplify requests from backing subsys
* Fix tests
* Replace manual config cloning with `.clone()`
* Move constants to module
* Parametrize executor performing PVF pre-check
* Fix Malus
* Fix test runtime
* Introduce session executor params as a constant defined by session info pallet
* Use Parity SCALE codec instead of hand-crafted binary encoding
* Get rid of constants; Add docs
* Get rid of constants
* Minor typo
* Fix Malus after rebase
* `cargo fmt`
* Use transparent SCALE encoding instead of explicit
* Clean up
* Get rid of relay parent to session index mapping
* Join environment type and version in a single enum element
* Use default execution parameters if running an old runtime
* `unwrap()` -> `expect()`
* Correct API version
* Constants are back in town
* Use constants for execution environment types
* Artifact separation, first try
* Get rid of explicit version
* PVF execution queue worker separation
* Worker handshake
* Global renaming
* Minor fixes resolving discussions
* Two-stage requesting of executor params to make use of runtime API cache
* Proper error handling in pvf-checker
* Executor params storage bootstrapping
* Propagate migration to v3 network runtimes
* Fix storage versioning
* Ensure `ExecutorParams` serialization determinism; Add comments
* Rename constants to make things a bit more deterministic
Get rid of stale code
* Tidy up a structure of active PVFs
* Minor formatting
* Fix comment
* Add try-runtime hooks
* Add storage version write on upgrade
Co-authored-by: Andronik <write@reusable.software>
* Add pre- and post-upgrade assertions
* Require to specify environment type; Remove redundant `impl`s
* Add `ExecutorParamHash` creation from `H256`
* Fix candidate validation subsys tests
* Return splittable error from executor params request fn
* Revert "Return splittable error from executor params request fn"
This reverts commit a0b274177d8bb2f6e13c066741892ecd2e72a456.
* Decompose approval voting metrics
* Use more relevant errors
* Minor formatting fix
* Assert a valid environment type instead of checking
* Fix `try-runtime` hooks
* After-merge fixes
* Add migration logs
* Remove dead code
* Fix tests
* Fix tests
* Back to the strongly typed implementation
* Promote strong types to executor interface
* Remove stale comment
* Move executor params to `SessionInfo`: primitives and runtime
* Move executor params to `SessionInfo`: node
* Try to bump primitives and API version
* Get rid of `MallocSizeOf`
* Bump target API version to v4
* Make use of session index already in place
* Back to v3
* Fix all the tests
* Add migrations to all the runtimes
* Make use of existing `SessionInfo` in approval voting subsys
* Rename `TARGET` -> `LOG_TARGET`
* Bump all the primitives to v3
* Fix Rococo ParachainHost API version
* Use `RollingSessionWindow` to acquire `ExecutorParams` in disputes
* Fix nits from discussions; add comments
* Re-evaluate queue logic
* Rework job assignment in execution queue
* Add documentation
* Use `RuntimeInfo` to obtain `SessionInfo` (with blackjack and caching)
* Couple `Pvf` with `ExecutorParams` wherever possible
* Put members of `PvfWithExecutorParams` under `Arc` for cheap cloning
* Fix comment
* Fix CI tests
* Fix clippy warnings
* Address nits from discussions
* Add a placeholder for raw data
* Fix non exhaustive match
* Remove redundant reexports and fix imports
* Keep only necessary semantic features, as discussed
* Rework `RuntimeInfo` to support mock implementation for tests
* Remove unneeded bound
* `cargo fmt`
* Revert "Remove unneeded bound"
This reverts commit 932463f26b00ce290e1e61848eb9328632ef8a61.
* Fix PVF host tests
* Fix PVF checker tests
* Fix overseer declarations
* Simplify tests
* `MAX_KEEP_WAITING` timeout based on `BACKING_EXECUTION_TIMEOUT`
* Add a unit test for varying executor parameters
* Minor fixes from discussions
* Add prechecking max. memory parameter (see paritytech/srlabs_findings#110)
* Fix and improve a test
* Remove `ExecutionEnvironment` and `RawData`
* New primitives versioning in parachain host API
* `disputes()` implementation for Kusama and Polkadot
* Move `ExecutorParams` from `vstaging` to stable primitives
* Move disputes from `vstaging` to stable implementation
* Fix `try-runtime`
* Fixes after merge
* Move `ExecutorParams` to the bottom of `SessionInfo`
* Revert "Move executor params to `SessionInfo`: primitives and runtime"
This reverts commit dfcfb85fefd1c5be6c8a8f72dc09fd1809cfa9ce.
* Always use fresh activated live hash in pvf precheck
(re-apply 34b09a4c20de17e7926ed942cd0d657d18f743fa)
* Fixing tests (broken commit)
* Fix candidate validation tests
* Fix PVF host test
* Minor fixes
* Address discussions
* Restore migration
* Fix `use` to only include what is needed instead of `*`
* Add comment to never touch `DEFAULT_CONFIG`
* Update migration to set default `ExecutorParams` for `dispute_period` sessions back
* Use `earliest_stored_session` instead of calculations
* Nit
* Add logs
* Treat any runtime error as `NotSupported` again
* Always return default executor params if not available
* Revert "Always return default executor params if not available"
This reverts commit b58ac4482ef444c67a9852d5776550d08e312f30.
* Add paritytech/substrate#9997 workaround
* `cargo fmt`
* Remove migration (again!)
* Bump executor params to API v4 (backport from #6698)
---------
Co-authored-by: Andronik <write@reusable.software>
* Refactor PVF preparation memory stats
The original purpose of this change was to gate metrics that are unsupported by
some systems behind conditional compilation directives (#[cfg]); see
https://github.com/paritytech/polkadot/pull/6675#discussion_r1099996209.
Then I started doing some random cleanups and simplifications and got a bit
carried away. 🙈 The code should be overall tidier than before.
Changes:
- Don't register unsupported metrics (e.g. `max_rss` on non-Linux systems)
- Introduce `PrepareStats` struct as an abstraction over the `Ok` values of
`PrepareResult`. It is cleaner, and can be easily modified in the future.
- Other small changes
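
Gating a platform-specific metric behind conditional compilation might look roughly like the sketch below. The field names and `supported_metrics` are assumptions for illustration; only the `PrepareStats` name and the `max_rss` example come from the description above.

```rust
/// Hypothetical stats gathered while preparing an artifact.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PrepareStats {
    pub cpu_time_ms: u64,
    /// Only compiled in on Linux, where this sketch assumes it is available.
    #[cfg(target_os = "linux")]
    pub max_rss_kib: Option<i64>,
}

/// Names of the metrics that should be registered on this target.
pub fn supported_metrics() -> Vec<&'static str> {
    let mut names = vec!["cpu_time_ms"];
    // `cfg!` expands to a compile-time boolean, so unsupported targets
    // never register the metric.
    if cfg!(target_os = "linux") {
        names.push("max_rss_kib");
    }
    names
}
```

Wrapping the successful result in one struct keeps the gated field in a single place instead of spreading `#[cfg]` attributes over every call site.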
* Minor fixes to comments
* Fix compile errors
* Try to fix some Linux errors
* Mep
* Fix candidate-validation tests
* Update docstring