* Refactor PVF preparation memory stats
The original purpose of this change was to gate metrics that are unsupported by
some systems behind conditional compilation directives (#[cfg]); see
https://github.com/paritytech/polkadot/pull/6675#discussion_r1099996209.
Then I started doing some random cleanups and simplifications and got a bit
carried away. 🙈 The code should be overall tidier than before.
Changes:
- Don't register unsupported metrics (e.g. `max_rss` on non-Linux systems)
- Introduce `PrepareStats` struct as an abstraction over the `Ok` values of
`PrepareResult`. It is cleaner, and can be easily modified in the future.
- Other small changes
* Minor fixes to comments
* Fix compile errors
* Try to fix some Linux errors
* Mep
* Fix candidate-validation tests
* Update docstring
* Introduce jemalloc-stats feature flag
* remove unneeded space
* Update node/overseer/src/lib.rs
Co-authored-by: Marcin S. <marcin@bytedude.com>
* Update Cargo.toml
Co-authored-by: Marcin S. <marcin@bytedude.com>
* revert making tikv-jemallocator depend on jemalloc-stats
* conditionally import memory_stats instead of using dead_code
* fix test via explicit import
* Add jemalloc-stats feature to crates, propagate it from root
* Apply `jemalloc-stats` feature to prepare mem stats; small refactor
* effect changes recommended on PR
* Update node/overseer/src/metrics.rs
Co-authored-by: Marcin S. <marcin@bytedude.com>
* fix compile error in pipeline for Linux: missing import
* Update node/overseer/src/lib.rs
Co-authored-by: Bastian Köcher <git@kchr.de>
* revert to defining collect_memory_stats inline
---------
Co-authored-by: Marcin S. <marcin@bytedude.com>
Co-authored-by: Marcin S <marcin@realemail.net>
Co-authored-by: Bastian Köcher <git@kchr.de>
* Add getrusage and memory tracker for precheck preparation
* Log memory stats metrics after prechecking
* Fix tests
* Try to fix errors (linux-only so I'm relying on CI here)
* Try to fix CI
* Add module docs for `prepare/memory_stats.rs`; fix CI error
* Report memory stats for all preparation jobs
* Use `RUSAGE_SELF` instead of `RUSAGE_THREAD`
Not sure why I did that -- was a brainfart on my end.
* Revert last commit (RUSAGE_THREAD is correct)
* Use exponential buckets
* Use `RUSAGE_SELF` for `getrusage`; enable `max_rss` metric for MacOS
* Increase poll interval
* Revert "Use `RUSAGE_SELF` for `getrusage`; enable `max_rss` metric for MacOS"
This reverts commit becf7a815409ab530fc61370abffcd1b97b9a777.
* Introduce a job scanning for licenses and ensuring they are present
* Showcase a red test
* Add missing licenses
* Cleanup
* Extend the check
* Add missing licenses
* CI trigger
* Setting up new ChainSelectionMessage
* Partial first pass
* Got dispute conclusion data to provisioner
* Finished first draft for 4804 code
* A bit of polish and code comments
* cargo fmt
* Implementers guide and code comments
* More formatting, and naming issues
* Wrote test for ChainSelection side of change
* Added dispute coordinator side test
* FMT
* Addressing Marcin's comments
* fmt
* Addressing a further comment from Marcin
* Removing unnecessary test line
* Rough draft addressing Robert's changes
* Clean up and test modification
* Majorly refactored scraper change
* Minor fixes for ChainSelection
* Polish and fmt
* Condensing inclusions per candidate logic
* Addressing Tsveto's comments
* Addressing Robert's Comments
* Altered inclusions struct to use nested BTreeMaps
* Naming fix
* Fixing inclusions struct comments
* Update node/core/dispute-coordinator/src/scraping/mod.rs
Add comment to split_off() use
Co-authored-by: Marcin S. <marcin@bytedude.com>
* Optimizing removal at block height for inclusions
* fmt
* Using copy trait
Co-authored-by: Marcin S. <marcin@bytedude.com>
* Remove lifetime from `KeyIterator`
* Remove the lifetime from the method too
* update lockfile for {"substrate"}
Co-authored-by: parity-processbot <>
* pre-checking: Reject failed PVFs
* paras: immediately reject any PVF that cannot reach a supermajority
* Make the `quorum` reject condition a bit more clear semantically
* Add comment
* Update implementer's guide
* Update a link
Not related to the rest of the PR, but I randomly noticed and fixed this.
* Update runtime/parachains/src/paras/tests.rs
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
* Remove unneeded loop
* Log PVF retries using `info!`
* Change retry logs to `warn!` and add preparation failure log
* Log PVF execution failure
* Clarify why we reject failed PVFs
* Fix PVF reject runtime benchmarks
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
* Replace async-std with tokio in PVF subsystem
* Rework workers to use `select!` instead of a mutex
The improvement in code readability is more important than the thread overhead.
* Remove unnecessary `fuse`
* Add explanation for `expect()`
* Update node/core/pvf/src/worker_common.rs
Co-authored-by: Bastian Köcher <info@kchr.de>
* Update node/core/pvf/src/worker_common.rs
Co-authored-by: Bastian Köcher <info@kchr.de>
* Address some review comments
* Shutdown tokio runtime
* Run cargo fmt
* Add a small note about retries
* Fix up merge
* Rework `cpu_time_monitor_loop` to return when other thread finishes
* Add error string to PrepareError::IoErr variant
* Log when artifacts fail to prepare
* Fix `cpu_time_monitor_loop`; fix test
* Fix text
* Fix a couple of potential minor data races.
The first data race was due to logging in the CPU monitor thread even if the
job (the other thread) had finished. The job can technically finish before or after the log.
Maybe best would be to move this log to the `select!`s, where we are guaranteed
to have chosen the timed-out branch, although there would be a bit of
duplication.
Also, it was possible for this thread to complete before we executed
`finished_tx.send` in the other thread, which would trigger an error as the
receiver has already been dropped. And right now, such a spurious error from
`send` would be returned even if the job otherwise succeeded.
* Update Cargo.lock
Co-authored-by: Bastian Köcher <info@kchr.de>
https://github.com/paritytech/polkadot/pull/6494 updates disputes
participation priority on Active Leaves update. This operation might
trigger participation in some cases and as a result some of the message
ordering is not as nice as it used to be.
As a side effect of this, `resume_dispute_without_local_statement` was
failing occasionally. The solution is not to expect that the `BlockNumber`,
`CandidateEvents`, `FetchOnChainVotes` and `Ancestors` messages are
executed after `FinalizedBlockNumber`, or in any specific order.
This should be okay as the code is in a helper function and doesn't affect
the actual test behaviour.
Fixes https://github.com/paritytech/polkadot/issues/6514
* BlockId removal: refactor: BlockBackend::block|block_status
It changes the arguments of the following methods from `BlockId<Block>` to `Block::Hash`:
- `BlockBackend::block`
- `BlockBackend::block_status`
This PR is part of BlockId::Number refactoring analysis (paritytech/substrate#11292)
* update lockfile for {"substrate"}
* ".git/.scripts/fmt.sh"
Co-authored-by: parity-processbot <>
* Passed candidate events from scraper to participation
* First draft PR 5875
* Added support for timestamp in changes
* Some necessary refactoring
* Removed SessionIndex from unconfirmed_disputes key
* Removed duplicate logic in import statements
* Replaced queue_participation call with re-prio
* Simplifying refactor. Backed were already handled
* Removed unneeded spam slots logic
* Implementers guide edits
* Undid the spam slots refactor
* Added comments and implementers guide edit
* Added test for participation upon backing
* Round of fixes + ran fmt
* Round of changes + fmt
* Error handling draft
* Changed errors to bubble up from reprioritization
* Starting to construct new test
* Clarifying participation function rename
* Reprio test draft
* Very rough bump to priority queue test draft
* Improving logging
* Most concise reproduction of error on third import
* Add `handle_approval_vote_request`
* Removing reprioritization on included event test
* Removing unneeded test config
* cargo fmt
* Test works
* Fixing final nits
* Tweaks to test Tsveto figured out
Co-authored-by: eskimor <eskimor@no-such-url.com>
Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>
* BlockId removal: refactor: HeaderBackend::status
It changes the argument of the `HeaderBackend::status` method from `BlockId<Block>` to `Block::Hash`
This PR is part of BlockId::Number refactoring analysis (paritytech/substrate#11292)
* Update node/core/chain-api/src/tests.rs
Co-authored-by: Adrian Catangiu <adrian@parity.io>
* unused import removed
* update lockfile for {"substrate"}
Co-authored-by: Adrian Catangiu <adrian@parity.io>
Co-authored-by: parity-processbot <>
* PVF preparation: do not conflate errors
+ Adds some more granularity to the prepare errors.
+ Better distinguish whether errors occur on the host side or the worker.
+ Do not kill the worker if the error happened on the host side.
+ Do not retry preparation if the error was `Panic`.
+ Removes unnecessary indirection with `Selected` type.
* Add missing docs, resolve TODOs
* Address review comments and remove TODOs
* Fix error in CI
* Undo unnecessary change
* Update couple of comments
* Don't return error for stream shutdown
* Update node/core/pvf/src/worker_common.rs
* BlockId removal: refactor: HeaderBackend::header
It changes the arguments of the following methods from `BlockId<Block>` to `Block::Hash`:
- `HeaderBackend::header`
- `Client::header`
This PR is part of BlockId::Number refactoring analysis (paritytech/substrate#11292)
* missed fixes
* BlockId removal: refactor: HeaderBackend::expect_header
It changes the argument of the `HeaderBackend::expect_header` method from `BlockId<Block>` to `Block::Hash`
* update lockfile for {"substrate"}
* misspelling fixed
Co-authored-by: parity-processbot <>