Commit Graph

26 Commits

Author SHA1 Message Date
Anthony Alaribe b8eaf25040 Introduce jemalloc-allocator feature flag (#6675)
* Introduce jemalloc-stats feature flag

* remove unneeded space

* Update node/overseer/src/lib.rs

Co-authored-by: Marcin S. <marcin@bytedude.com>

* Update Cargo.toml

Co-authored-by: Marcin S. <marcin@bytedude.com>

* revert making tikv-jemallocator depend on jemalloc-stats

* conditionally import memory_stats instead of using dead_code

* fix test via explicit import

* Add jemalloc-stats feature to crates, propagate it from root

* Apply `jemalloc-stats` feature to prepare mem stats; small refactor

* effect changes recommended on PR

* Update node/overseer/src/metrics.rs

Co-authored-by: Marcin S. <marcin@bytedude.com>

* fix compile error in pipeline for Linux: missing import

* Update node/overseer/src/lib.rs

Co-authored-by: Bastian Köcher <git@kchr.de>

* revert to defining collect_memory_stats inline

---------

Co-authored-by: Marcin S. <marcin@bytedude.com>
Co-authored-by: Marcin S <marcin@realemail.net>
Co-authored-by: Bastian Köcher <git@kchr.de>
2023-02-09 09:09:10 +00:00
Marcin S f317115b99 pvf: Log memory metrics from preparation (#6565)
* Add getrusage and memory tracker for precheck preparation

* Log memory stats metrics after prechecking

* Fix tests

* Try to fix errors (linux-only so I'm relying on CI here)

* Try to fix CI

* Add module docs for `prepare/memory_stats.rs`; fix CI error

* Report memory stats for all preparation jobs

* Use `RUSAGE_SELF` instead of `RUSAGE_THREAD`

Not sure why I did that -- was a brainfart on my end.

* Revert last commit (RUSAGE_THREAD is correct)

* Use exponential buckets

* Use `RUSAGE_SELF` for `getrusage`; enable `max_rss` metric for MacOS

* Increase poll interval

* Revert "Use `RUSAGE_SELF` for `getrusage`; enable `max_rss` metric for MacOS"

This reverts commit becf7a815409ab530fc61370abffcd1b97b9a777.
2023-02-06 11:17:21 +00:00
Marcin S 44fd95661c Replace async-std with tokio in PVF subsystem (#6419)
* Replace async-std with tokio in PVF subsystem

* Rework workers to use `select!` instead of a mutex

The improvement in code readability is more important than the thread overhead.

* Remove unnecessary `fuse`

* Add explanation for `expect()`

* Update node/core/pvf/src/worker_common.rs

Co-authored-by: Bastian Köcher <info@kchr.de>

* Update node/core/pvf/src/worker_common.rs

Co-authored-by: Bastian Köcher <info@kchr.de>

* Address some review comments

* Shutdown tokio runtime

* Run cargo fmt

* Add a small note about retries

* Fix up merge

* Rework `cpu_time_monitor_loop` to return when other thread finishes

* Add error string to PrepareError::IoErr variant

* Log when artifacts fail to prepare

* Fix `cpu_time_monitor_loop`; fix test

* Fix text

* Fix a couple of potential minor data races.

The first data race was due to logging in the CPU monitor thread even if the
job (the other thread) had already finished. The job can technically finish
either before or after the log.

It might be best to move this log into the `select!`s, where we are guaranteed
to have chosen the timed-out branch, although that would duplicate a bit of
code.

Also, it was possible for this thread to complete before we executed
`finished_tx.send` in the other thread, which would trigger an error because
the receiver had already been dropped. Before this fix, such a spurious error
from `send` would be returned even if the job otherwise succeeded.

* Update Cargo.lock

Co-authored-by: Bastian Köcher <info@kchr.de>
2023-01-10 10:51:13 +01:00
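The second race described above (a `send` on `finished_tx` failing because the waiting side has already timed out and dropped its receiver) can be illustrated with a minimal std-only Rust sketch. `run_with_timeout` and `Outcome` are hypothetical names, and an `mpsc` channel with `recv_timeout` stands in for the actual tokio/`select!` machinery; this is not the PVF worker code. The sending side ignores the expected send error, and the outcome is decided only in the branch that actually observed it.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Result of running a job under a timeout (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(u64),
    TimedOut,
    Panicked,
}

/// Runs `job` on a worker thread and waits at most `timeout` for its result.
fn run_with_timeout<F>(job: F, timeout: Duration) -> Outcome
where
    F: FnOnce() -> u64 + Send + 'static,
{
    let (finished_tx, finished_rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let result = job();
        // The receiver may already be dropped if we timed out. That `send`
        // error is expected, so it is ignored instead of being reported as a
        // failure of a job that otherwise succeeded.
        let _ = finished_tx.send(result);
    });

    // Decide the outcome (and, in a real host, log it) only in the branch
    // that actually observed it, so the log cannot race with job completion.
    let outcome = match finished_rx.recv_timeout(timeout) {
        Ok(result) => Outcome::Finished(result),
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
        // The sender was dropped without sending: the job panicked.
        Err(RecvTimeoutError::Disconnected) => Outcome::Panicked,
    };
    drop(finished_rx);

    // Joining only keeps this sketch deterministic; a panic (if any) is
    // already reflected in `outcome`, so the join result is ignored.
    let _ = handle.join();
    outcome
}
```

In the timeout case the worker sends after the receiver is gone; with `.unwrap()` instead of `let _ =` on that `send`, the worker thread would panic. A real host would abandon or kill a timed-out worker rather than join it.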
Marcin S e0a0475a05 PVF preparation: do not conflate errors (#6384)
* PVF preparation: do not conflate errors

+ Adds some more granularity to the prepare errors.
+ Better distinguish whether errors occur on the host side or the worker.
+ Do not kill the worker if the error happened on the host side.
+ Do not retry preparation if the error was `Panic`.
+ Removes unnecessary indirection with `Selected` type.

* Add missing docs, resolve TODOs

* Address review comments and remove TODOs

* Fix error in CI

* Undo unnecessary change

* Update a couple of comments

* Don't return error for stream shutdown

* Update node/core/pvf/src/worker_common.rs
2022-12-20 08:32:12 -05:00
Marcin S ab090ab7d5 Let the PVF host kill the worker on timeout (#6381)
* Let the PVF host kill the worker on timeout

* Fix comment

* Fix inaccurate comments; add missing return statement

* Fix a comment

* Fix comment
2022-12-06 13:03:18 -05:00
Marcin S 28a4e90912 Use CPU clock timeout for PVF jobs (#6282)
* Put in skeleton logic for CPU-time-preparation

Still needed:
- Flesh out logic
- Refactor some spots
- Tests

* Continue filling in logic for prepare worker CPU time changes

* Fix compiler errors

* Update lenience factor

* Fix some clippy lints for PVF module

* Fix compilation errors

* Address some review comments

* Add logging

* Add another log

* Address some review comments; change Mutex to AtomicBool

* Refactor handling response bytes

* Add CPU clock timeout logic for execute jobs

* Properly handle AtomicBool flag

* Use `Ordering::Relaxed`

* Refactor thread coordination logic

* Fix bug

* Add some timing information to execute tests

* Add section about the mitigation to the IG

* minor: Change more `Ordering`s to `Relaxed`

* candidate-validation: Fix build errors
2022-11-30 13:17:31 +01:00
alexgparity 9ea14e66c8 Clippyfy (#6341)
* Add clippy config and remove .cargo from gitignore

* first fixes

* Clippyfied

* Add clippy CI job

* comment out rusty-cachier

* minor

* fix ci

* remove DAG from check-dependent-project

* add DAG to clippy

Co-authored-by: alvicsam <alvicsam@gmail.com>
2022-11-30 08:34:06 +00:00
Marcin S 1f8219767e PVF timeouts follow-up (#6151)
* Rename timeout consts and timeout parameter; bump leniency

* Update implementor's guide with info about PVFs

* Make glossary a bit easier to read

* Add a note to LENIENT_PREPARATION_TIMEOUT

* Remove PVF-specific section from glossary

* Fix some typos
2022-11-01 10:59:53 -04:00
Marcin S 17730b85be Separate preparation timeouts for PVF prechecking and execution (#6139)
* Add some documentation

* Add `compilation_timeout` parameter for PVF preparation job

* Update buckets in prometheus metrics

* Update prepare/queue tests

* Update pvf-prechecking overview in implementer docs

* Fix some CI checks
2022-10-13 11:00:57 +00:00
Sergei Shulepov 94a85eeac7 pvf: ensure enough stack space (#5712)
* pvf: ensure enough stack space

* fix typos

Co-authored-by: Andronik <write@reusable.software>

* Use rayon to cache the thread

Co-authored-by: Andronik <write@reusable.software>
2022-06-24 13:16:36 +02:00
Koute d9eff4ecd4 Switch to pooling copy-on-write instantiation strategy for WASM (companion for Substrate#11232) (#5337)
* Switch to pooling copy-on-write instantiation strategy for WASM

* Fix compilation of `polkadot-test-service`

* Update comments

* Move `max_memory_size` to `Semantics`

* Rename `WasmInstantiationStrategy` to `WasmtimeInstantiationStrategy`

* Update a safety comment

* update lockfile for {"substrate"}

Co-authored-by: parity-processbot <>
2022-05-19 13:06:34 +02:00
Bernhard Schuster d631f1dea8 observability: tracing gum, automatically cross ref traceID (#5079)
* add some gum

* bump expander

* gum

* fix all remaining issues

* last fixup

* Update node/gum/proc-macro/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* change

* network

* fixins

* chore

* allow optional fmt str + args, prep for expr as kv field

* tracing -> gum rename fallout

* restrict further

* allow multiple levels of field accesses

* another round of docs and a slip of the pen

* update ADR

* fix up lock file

* use target: instead of target=

* minors

* fix

* chore

* Update node/gum/README.md

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
2022-03-15 11:05:16 +00:00
Sergei Shulepov 1493fed1ed PVF validation host: do not alter niceness (#4525)
We wanted to change niceness to accommodate the fact that some of the
preparation tasks are low priority. For example, when a node sees that
a new para was onboarded, the node may start preparing right away.
That preparation is low priority, since all other activities are more
important, such as network I/O, validation of the backed candidates, and
preparation of the immediately needed PVFs.

However, it turned out that this approach does not work: generally,
non-root processes can only increase their niceness (lower their priority);
they cannot decrease it back to the previous value, as was assumed by the code.

Apart from that, https://github.com/paritytech/polkadot/pull/4123
assumes all PVFs are prepared in the same way. Specifically, if a
PVF preparation failed before, then PVF pre-checking will also report
that it failed, even though the earlier failure may have been caused only
by running at low priority. To avoid such cases, we decided to
simplify the whole preparation model, and preparation under low priority
does not work well with that.

Closes https://github.com/paritytech/polkadot/issues/4520
2021-12-14 17:17:45 +01:00
Bernhard Schuster 4adb8466a3 dev-comment spelling mistakes (#4434) 2021-12-06 15:20:29 +01:00
Sergei Shulepov bd422af092 prepare worker: Catch unexpected unwinds (#4304)
* prepare worker: Catch unexpected unwinds

* Use more specific wording for unknown panic payload
2021-11-18 19:11:13 +01:00
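The two bullets above only name the change. A minimal std-only Rust sketch of the general technique, catching an unexpected unwind with `std::panic::catch_unwind` and describing its payload, might look like the following. `prepare_catching_unwinds` and `panic_payload_to_string` are hypothetical names, not the prepare worker's actual functions, and the fallback wording is an assumption.

```rust
use std::any::Any;
use std::panic;

/// Turns a panic payload into a readable message. `&str` and `String` are
/// tried first; any other payload type gets a specific fallback string
/// instead of being silently dropped.
fn panic_payload_to_string(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload type".to_string()
    }
}

/// Runs a (hypothetical) preparation step, converting an unexpected unwind
/// into an error value instead of letting it tear down the worker.
fn prepare_catching_unwinds<F>(prepare: F) -> Result<Vec<u8>, String>
where
    F: FnOnce() -> Vec<u8> + panic::UnwindSafe,
{
    panic::catch_unwind(prepare).map_err(|payload| panic_payload_to_string(&*payload))
}
```

Payloads produced by `panic!` are typically a `&'static str` (no formatting arguments) or a `String` (formatted message), which is why those two types are checked before falling back.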
Sergei Shulepov 2769066136 Increase preparation timeout (#4270)
* Increase preparation-timeout to 60 seconds

* Adapt `pvf_preparation_time` metric to the new value
2021-11-15 12:48:00 +01:00
Chris Sosnin f5fbaa139f PVF host prechecking support v2 (#4123)
* pvf host: store only compiled artifacts on disk

* Correctly handle failed artifacts

* Serialize result of PVF preparation uniquely

* Set the artifact state depending on the result

* Return the result of PVF preparation directly

* Move PrepareError to the error module

* Update doc comments

* Update misleading comment

* pvf host: turn off parallel compilation

* pvf host: implement precheck requests

* Fix warnings

* Unnecessary clone

* Add a note about timed out outcome

* Revert the pool outcome handling behavior

* Move the prepare result type into error mod

* Test prepare done

* fmt

* Add an explanation to wasmtime config

* Split pvf host test

* Add precheck to dictionary

Co-authored-by: Sergei Shulepov <sergei@parity.io>
2021-11-13 17:25:59 +01:00
Chris Sosnin 182667830f Move artifacts states into memory in PVF validation host (#3907)
* pvf host: store only compiled artifacts on disk

* Correctly handle failed artifacts

* Serialize result of PVF preparation uniquely

* Set the artifact state depending on the result

* Return the result of PVF preparation directly

* Move PrepareError to the error module

* Update doc comments

* Update misleading comment

* Cleanup docs

* Conclude a test job with an error

Co-authored-by: Sergei Shulepov <sergei@parity.io>
2021-10-22 16:37:58 +00:00
Sergei Shulepov ad0e42537d Introduce metrics into PVF validation host (#3603) 2021-08-20 11:50:47 +02:00
Sergei Shulepov 9d6ed7ecae Add logging to PVF and other related parts (#3596)
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2021-08-08 19:39:16 +02:00
Sergei Shulepov 68c03f66f3 Mass replace ,); pattern (#3580)
This is an artifact left by rustfmt, which, being conservative, does not
dare to remove the comma.
2021-08-05 19:53:17 +02:00
Shawn Tabrizi ff5d56fb76 cargo +nightly fmt (#3540)
* cargo +nightly fmt

* add cargo-fmt check to ci

* update ci

* fmt

* fmt

* skip macro

* ignore bridges
2021-08-02 10:47:33 +00:00
Denis Pisarev fc253e6e4d WIP: CI: add spellcheck (#3421)
* CI: add spellcheck

* revert me

* CI: explicit command for spellchecker

* spellcheck: edit misspells

* CI: run spellcheck on diff

* spellcheck: edits

* spellcheck: edit misspells

* spellcheck: add rules

* spellcheck: mv configs

* spellcheck: more edits

* spellcheck: chore

* spellcheck: one more thing

* spellcheck: and another one

* spellcheck: seems like it doesn't get to an end

* spellcheck: new words after rebase

* spellcheck: new words appearing out of nowhere

* chore

* review edits

* more review edits

* more edits

* wonky behavior

* wonky behavior 2

* wonky behavior 3

* change git behavior

* spellcheck: another bunch of new edits

* spellcheck: new words are koming out of nowhere

* CI: finding the master

* CI: fetching master implicitly

* CI: undebug

* new errors

* a bunch of new edits

* and some more

* Update node/core/approval-voting/src/approval_db/v1/mod.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update xcm/xcm-executor/src/assets.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Apply suggestions from code review

Co-authored-by: Andronik Ordian <write@reusable.software>

* Suggestions from the code review

* CI: scan only changed files

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-07-14 19:22:58 +02:00
Sergei Shulepov b7b2276555 PVF: unresponsive worker doesn't mean the candidate is bad (#3418)
* PVF: unresponsive worker doesn't mean the candidate is bad

* s/if let Some/.is_some
2021-07-07 11:28:07 +03:00
Sergei Shulepov 20ab68270f Put WIP artifacts next to ready ones (#3057)
* Put WIP artifacts next to ready ones

Fixes #3044

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2021-05-21 09:53:03 +02:00
Sergei Shulepov 59b4d6511f New PVF validation host (#2710)
* Implement PVF validation host

* WIP: Diener

* Increase the allotted compilation time

* Add more comments

* Minor clean up

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Fix pruning artifact removal

* Fix formatting and newlines

* Fix the thread pool

* Update node/core/pvf/src/executor_intf.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Remove redundant test declaration

* Don't convert the path into an intermediate string

* Try to workaround the test failure

* Use the puppet_worker trick again

* Fix a blip

* Move `ensure_wasmtime_version` under the tests mod

* Add a macro for puppet_workers

* fix build without real-overseer

* Rename the puppet worker for adder collator

* play it safe with the name of adder puppet worker

* Typo: triggered

* Add more comments

* Do not kill exec worker on every error

* Plumb Duration for timeouts

* typo: critical

* Add proofs

* Clean unused imports

* Revert "WIP: Diener"

This reverts commit b9f54e513366c7a6dfdd117ac19fbdc46b900b4d.

* Sync version of wasmtime

* Update cargo.lock

* Update Substrate

* Merge fixes still

* Update wasmtime version in test

* bastifmt

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Squash spaces

* Trailing new line for testing.rs

* Remove controversial code

* comment about biasing

* Fix suggestion

* Add comments

* make it more clear why unwrap_err

* tmpfile retry

* proper proofs for claim_idle

* Remove mutex from ValidationHost

* Add some more logging

* Extract exec timeout into a constant

* Add some clarifying logging

* Use blake2_256

* Clean up the merge

Specifically the leftovers after removing real-overseer

* Update parachain/test-parachains/adder/collator/Cargo.toml

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-04-09 00:09:56 +02:00