* Introduce jemalloc-stats feature flag
* remove unneeded space
* Update node/overseer/src/lib.rs
Co-authored-by: Marcin S. <marcin@bytedude.com>
* Update Cargo.toml
Co-authored-by: Marcin S. <marcin@bytedude.com>
* revert making tikv-jemallocator depend on jemalloc-stats
* conditionally import memory_stats instead of using dead_code
* fix test via expllicit import
* Add jemalloc-stats feature to crates, propagate it from root
* Apply `jemalloc-stats` feature to prepare mem stats; small refactor
* effect changes recommended on PR
* Update node/overseer/src/metrics.rs
Co-authored-by: Marcin S. <marcin@bytedude.com>
* fix compile error on in pipeline for linux. missing import
* Update node/overseer/src/lib.rs
Co-authored-by: Bastian Köcher <git@kchr.de>
* revert to defining collect_memory_stats inline
---------
Co-authored-by: Marcin S. <marcin@bytedude.com>
Co-authored-by: Marcin S <marcin@realemail.net>
Co-authored-by: Bastian Köcher <git@kchr.de>
* Add getrusage and memory tracker for precheck preparation
* Log memory stats metrics after prechecking
* Fix tests
* Try to fix errors (linux-only so I'm relying on CI here)
* Try to fix CI
* Add module docs for `prepare/memory_stats.rs`; fix CI error
* Report memory stats for all preparation jobs
* Use `RUSAGE_SELF` instead of `RUSAGE_THREAD`
Not sure why I did that -- was a brainfart on my end.
* Revert last commit (RUSAGE_THREAD is correct)
* Use exponential buckets
* Use `RUSAGE_SELF` for `getrusage`; enable `max_rss` metric for MacOS
* Increase poll interval
* Revert "Use `RUSAGE_SELF` for `getrusage`; enable `max_rss` metric for MacOS"
This reverts commit becf7a815409ab530fc61370abffcd1b97b9a777.
* Replace async-std with tokio in PVF subsystem
* Rework workers to use `select!` instead of a mutex
The improvement in code readability is more important than the thread overhead.
* Remove unnecessary `fuse`
* Add explanation for `expect()`
* Update node/core/pvf/src/worker_common.rs
Co-authored-by: Bastian Köcher <info@kchr.de>
* Update node/core/pvf/src/worker_common.rs
Co-authored-by: Bastian Köcher <info@kchr.de>
* Address some review comments
* Shutdown tokio runtime
* Run cargo fmt
* Add a small note about retries
* Fix up merge
* Rework `cpu_time_monitor_loop` to return when other thread finishes
* Add error string to PrepareError::IoErr variant
* Log when artifacts fail to prepare
* Fix `cpu_time_monitor_loop`; fix test
* Fix text
* Fix a couple of potential minor data races.
First data race was due to logging in the CPU monitor thread even if the
job (other thread) finished. It can technically finish before or after the log.
Maybe best would be to move this log to the `select!`s, where we are guaranteed
to have chosen the timed-out branch, although there would be a bit of
duplication.
Also, it was possible for this thread to complete before we executed
`finished_tx.send` in the other thread, which would trigger an error as the
receiver has already been dropped. And right now, such a spurious error from
`send` would be returned even if the job otherwise succeeded.
* Update Cargo.lock
Co-authored-by: Bastian Köcher <info@kchr.de>
* PVF preparation: do not conflate errors
+ Adds some more granularity to the prepare errors.
+ Better distinguish whether errors occur on the host side or the worker.
+ Do not kill the worker if the error happened on the host side.
+ Do not retry preparation if the error was `Panic`.
+ Removes unnecessary indirection with `Selected` type.
* Add missing docs, resolve TODOs
* Address review comments and remove TODOs
* Fix error in CI
* Undo unnecessary change
* Update couple of comments
* Don't return error for stream shutdown
* Update node/core/pvf/src/worker_common.rs
* Put in skeleton logic for CPU-time-preparation
Still needed:
- Flesh out logic
- Refactor some spots
- Tests
* Continue filling in logic for prepare worker CPU time changes
* Fix compiler errors
* Update lenience factor
* Fix some clippy lints for PVF module
* Fix compilation errors
* Address some review comments
* Add logging
* Add another log
* Address some review comments; change Mutex to AtomicBool
* Refactor handling response bytes
* Add CPU clock timeout logic for execute jobs
* Properly handle AtomicBool flag
* Use `Ordering::Relaxed`
* Refactor thread coordination logic
* Fix bug
* Add some timing information to execute tests
* Add section about the mitigation to the IG
* minor: Change more `Ordering`s to `Relaxed`
* candidate-validation: Fix build errors
* Add clippy config and remove .cargo from gitignore
* first fixes
* Clippyfied
* Add clippy CI job
* comment out rusty-cachier
* minor
* fix ci
* remove DAG from check-dependent-project
* add DAG to clippy
Co-authored-by: alvicsam <alvicsam@gmail.com>
* Rename timeout consts and timeout parameter; bump leniency
* Update implementor's guide with info about PVFs
* Make glossary a bit easier to read
* Add a note to LENIENT_PREPARATION_TIMEOUT
* Remove PVF-specific section from glossary
* Fix some typos
We wanted to change niceness to accomodate the fact that some of the
preparation tasks are low priority. For example, when a node sees that
there is a new para was onboarded the node may start preparing right
away. Since all other activities are more important, such as network I/O
or validation of the backed candidates and preparation of the
immediatelly needed PVFs.
However, it turned out that this approach does not work: generally
non-root processes can only decrease niceness and they cannot increase
it to the previous value, as was assumed by the code.
Apart from that, https://github.com/paritytech/polkadot/pull/4123
assumes all PVFs are prepared in the same way. Specifically, that if a
PVF preparation failed before, then PVF pre-checking will also report
that it was failed, even though it could happen that preparation failed
due to being low-priority. In order to avoid such cases, we decided to
simplify the whole preparation model. Preparation under low priority
does not work well with that.
Closes https://github.com/paritytech/polkadot/issues/4520
* pvf host: store only compiled artifacts on disk
* Correctly handle failed artifacts
* Serialize result of PVF preparation uniquely
* Set the artifact state depending on the result
* Return the result of PVF preparation directly
* Move PrepareError to the error module
* Update doc comments
* Update misleading comment
* pvf host: turn off parallel compilation
* pvf host: implement precheck requests
* Fix warnings
* Unnecessary clone
* Add a note about timed out outcome
* Revert the pool outcome handling behavior
* Move the prepare result type into error mod
* Test prepare done
* fmt
* Add an explanation to wasmtime config
* Split pvf host test
* Add precheck to dictionary
Co-authored-by: Sergei Shulepov <sergei@parity.io>
* pvf host: store only compiled artifacts on disk
* Correctly handle failed artifacts
* Serialize result of PVF preparation uniquely
* Set the artifact state depending on the result
* Return the result of PVF preparation directly
* Move PrepareError to the error module
* Update doc comments
* Update misleading comment
* Cleanup docs
* Conclude a test job with an error
Co-authored-by: Sergei Shulepov <sergei@parity.io>
* CI: add spellcheck
* revert me
* CI: explicit command for spellchecker
* spellcheck: edit misspells
* CI: run spellcheck on diff
* spellcheck: edits
* spellcheck: edit misspells
* spellcheck: add rules
* spellcheck: mv configs
* spellcheck: more edits
* spellcheck: chore
* spellcheck: one more thing
* spellcheck: and another one
* spellcheck: seems like it doesn't get to an end
* spellcheck: new words after rebase
* spellcheck: new words appearing out of nowhere
* chore
* review edits
* more review edits
* more edits
* wonky behavior
* wonky behavior 2
* wonky behavior 3
* change git behavior
* spellcheck: another bunch of new edits
* spellcheck: new words are koming out of nowhere
* CI: finding the master
* CI: fetching master implicitly
* CI: undebug
* new errors
* a bunch of new edits
* and some more
* Update node/core/approval-voting/src/approval_db/v1/mod.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update xcm/xcm-executor/src/assets.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Apply suggestions from code review
Co-authored-by: Andronik Ordian <write@reusable.software>
* Suggestions from the code review
* CI: scan only changed files
Co-authored-by: Andronik Ordian <write@reusable.software>
* Implement PVF validation host
* WIP: Diener
* Increase the alloted compilation time
* Add more comments
* Minor clean up
* Apply suggestions from code review
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Fix pruning artifact removal
* Fix formatting and newlines
* Fix the thread pool
* Update node/core/pvf/src/executor_intf.rs
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Remove redundant test declaration
* Don't convert the path into an intermediate string
* Try to workaround the test failure
* Use the puppet_worker trick again
* Fix a blip
* Move `ensure_wasmtime_version` under the tests mod
* Add a macro for puppet_workers
* fix build for not real-overseer
* Rename the puppet worker for adder collator
* play it safe with the name of adder puppet worker
* Typo: triggered
* Add more comments
* Do not kill exec worker on every error
* Plumb Duration for timeouts
* typo: critical
* Add proofs
* Clean unused imports
* Revert "WIP: Diener"
This reverts commit b9f54e513366c7a6dfdd117ac19fbdc46b900b4d.
* Sync version of wasmtime
* Update cargo.lock
* Update Substrate
* Merge fixes still
* Update wasmtime version in test
* bastifmt
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Squash spaces
* Trailing new line for testing.rs
* Remove controversial code
* comment about biasing
* Fix suggestion
* Add comments
* make it more clear why unwrap_err
* tmpfile retry
* proper proofs for claim_idle
* Remove mutex from ValidationHost
* Add some more logging
* Extract exec timeout into a constant
* Add some clarifying logging
* Use blake2_256
* Clean up the merge
Specifically the leftovers after removing real-overseer
* Update parachain/test-parachains/adder/collator/Cargo.toml
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>