mirror of
https://github.com/pezkuwichain/pezkuwi-subxt.git
synced 2026-06-12 10:01:17 +00:00
PVF: Don't dispute on missing artifact (#7011)
* PVF: Don't dispute on missing artifact

  A dispute should never be raised if the local cache doesn't provide a certain artifact. You can not dispute based on this reason, as it is a local hardware issue and not related to the candidate to check.

  Design:

  Currently we assume that if we prepared an artifact, it remains there on-disk until we prune it, i.e. we never check again if it's still there. We can change it so that instead of artifact-not-found triggering a dispute, we retry once (like we do for AmbiguousWorkerDeath, except we don't dispute if it still doesn't work). And when enqueuing an execute job, we check for the artifact on-disk, and start preparation if not found.

  Changes:

  - [x] Integration test (should fail without the following changes)
  - [x] Check if artifact exists when executing, prepare if not
  - [x] Return an internal error when file is missing
  - [x] Retry once on internal errors
  - [x] Document design (update impl guide)

* Add some context to wasm error message (it is quite long)
* Fix impl guide
* Add check for missing/inaccessible file
* Add comment referencing Substrate issue
* Add test for retrying internal errors

---------

Co-authored-by: parity-processbot <>
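The "check for the artifact on-disk, and start preparation if not found" step from the design notes can be illustrated with a minimal Rust sketch. The names `NextStep` and `next_step_for_execute` are hypothetical and are not taken from the actual PVF host code; the sketch only assumes that the artifact is a single file at a known path, and that an inaccessible file is handled the same way as a missing one.

```rust
use std::path::Path;

/// What to do with an incoming execute request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum NextStep {
    /// The artifact is present on disk; execution can be enqueued directly.
    Execute,
    /// The artifact is missing or inaccessible; prepare it again first.
    Prepare,
}

/// Decide how to handle an execute request based on the on-disk state of the artifact.
/// Assumption: an I/O error while checking is treated like a missing file.
fn next_step_for_execute(artifact_path: &Path) -> NextStep {
    match artifact_path.try_exists() {
        Ok(true) => NextStep::Execute,
        Ok(false) | Err(_) => NextStep::Prepare,
    }
}
```

The check is done per request rather than cached, since the point of the change is to stop assuming that a prepared artifact stays on disk until it is pruned.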
@@ -23,7 +23,7 @@ Upon receiving a validation request, the first thing the candidate validation su
 * The [`CandidateDescriptor`](../../types/candidate.md#candidatedescriptor).
 * The [`ValidationData`](../../types/candidate.md#validationdata).
 * The [`PoV`](../../types/availability.md#proofofvalidity).

 The second category is for PVF pre-checking. This is primarily used by the [PVF pre-checker](pvf-prechecker.md) subsystem.

 ### Determining Parameters
@@ -67,8 +67,20 @@ or time out). We will only retry preparation if another request comes in after
 resolved. We will retry up to 5 times.

 If the actual **execution** of the artifact fails, we will retry once if it was
-an ambiguous error after a brief delay, to allow any potential transient
-conditions to clear.
+a possibly transient error, to allow the conditions that led to the error to
+hopefully resolve. We use a more brief delay here (1 second as opposed to 15
+minutes for preparation (see above)), because a successful execution must happen
+in a short amount of time.
+
+We currently know of at least two specific cases that will lead to a retried
+execution request:
+
+1. **OOM:** The host might have been temporarily low on memory due to other
+   processes running on the same machine. **NOTE:** This case will lead to
+   voting against the candidate (and possibly a dispute) if the retry is still
+   not successful.
+2. **Artifact missing:** The prepared artifact might have been deleted due to
+   operator error or some bug in the system. We will re-create it on retry.

 #### Preparation timeouts
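The execution retry policy documented in this hunk can be sketched in Rust as follows. This is an illustration under stated assumptions, not the code from the commit: `ExecError`, `Outcome`, and `validate_with_retry` are hypothetical names, the error categories are simplified to three, and the delay is injected as a closure so the sketch does not have to block. Only the 1 second delay, the single retry, and the different treatment of OOM and missing-artifact failures come from the text.

```rust
use std::time::Duration;

/// Delay before the single execution retry (the guide states 1 second).
const EXECUTION_RETRY_DELAY: Duration = Duration::from_secs(1);

/// Simplified, hypothetical error categories for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecError {
    /// Possibly transient and attributable to the candidate if it persists (e.g. OOM).
    PossiblyTransient,
    /// Local problem unrelated to the candidate (e.g. missing artifact).
    Internal,
    /// Deterministic failure; retrying would not help.
    Invalid,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Valid,
    /// Vote against the candidate (which may lead to a dispute).
    VoteAgainst,
    /// Report a local error without voting against the candidate.
    NoVote,
}

/// Run `execute`, retrying once after `EXECUTION_RETRY_DELAY` on retryable errors.
fn validate_with_retry(
    mut execute: impl FnMut() -> Result<(), ExecError>,
    mut sleep: impl FnMut(Duration),
) -> Outcome {
    match execute() {
        Ok(()) => return Outcome::Valid,
        Err(ExecError::Invalid) => return Outcome::VoteAgainst,
        Err(_) => {} // retryable: fall through to the single retry
    }
    sleep(EXECUTION_RETRY_DELAY);
    match execute() {
        Ok(()) => Outcome::Valid,
        // A persistent internal error never counts against the candidate.
        Err(ExecError::Internal) => Outcome::NoVote,
        Err(_) => Outcome::VoteAgainst,
    }
}
```

In a real caller, `sleep` would be a timer provided by the async runtime and `execute` would re-enqueue the job, so that a missing artifact gets re-created before the second attempt.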