Introduce trie level cache and remove state cache (#11407)

* trie state cache

* Also cache missing access on read.

* fix comp

* bis

* fix

* use has_lru

* remove local storage cache on size 0.

* No cache.

* local cache only

* trie cache and local cache

* storage cache (with local)

* trie cache no local cache

* Add state access benchmark

* Remove warnings etc

* Add trie cache benchmark

* No extra "clone" required

* Change benchmark to use multiple blocks

* Use patches

* Integrate shitty implementation

* More stuff

* Revert "Merge branch 'master' into trie_state_cache"

This reverts commit 947cd8e6d43fced10e21b76d5b92ffa57b57c318, reversing
changes made to 29ff036463.

* Improve benchmark

* Adapt to latest changes

* Adapt to changes in trie

* Add a test that uses iterator

* Start fixing it

* Remove obsolete file

* Make it compile

* Start rewriting the trie node cache

* More work on the cache

* More docs and code etc

* Make data cache an optional

* Tests

* Remove debug stuff

* Recorder

* Some docs and a simple test for the recorder

* Compile fixes

* Make it compile

* More fixes

* More fixes

* Fix fix fix

* Make sure cache and recorder work together for basic stuff

* Test that data caching and recording works

* Test `TrieDBMut` with caching

* Try something

* Fixes, fixes, fixes

* Forward the recorder

* Make it compile

* Use recorder in more places

* Switch to new `with_optional_recorder` fn

* Refactor and cleanups

* Move `ProvingBackend` tests

* Simplify

* Move over all functionality to the essence

* Fix compilation

* Implement estimate encoded size for StorageProof

* Start using the `cache` everywhere

* Use the cache everywhere

* Fix compilation

* Fix tests

* Adds `TrieBackendBuilder` and enhances the tests

* Ensure that recorder drain checks that values are found as expected

* Switch over to `TrieBackendBuilder`

* Start fixing the problem with child tries and recording

* Fix recording of child tries

* Make it compile

* Overwrite `storage_hash` in `TrieBackend`

* Add `storage_cache` to  the benchmarks

* Fix `no_std` build

* Speed up cache lookup

* Extend the state access benchmark to also hash a runtime

* Fix build

* Fix compilation

* Rewrite value cache

* Add lru cache

* Ensure that the cache lru works

* Value cache should not be optional

* Add support for keeping the shared node cache in its bounds

* Make the cache configurable

* Check that the cache respects the bounds

* Adds a new test

* Fixes

* Docs and some renamings

* More docs

* Start using the new recorder

* Fix more code

* Take `self` argument

* Remove warnings

* Fix benchmark

* Fix accounting

* Rip off the state cache

* Start fixing fallout after removing the state cache

* Make it compile after trie changes

* Fix test

* Add some logging

* Some docs

* Some fixups and clean ups

* Fix benchmark

* Remove unneeded file

* Use git for patching

* Make CI happy

* Update primitives/trie/Cargo.toml

Co-authored-by: Koute <koute@users.noreply.github.com>

* Update primitives/state-machine/src/trie_backend.rs

Co-authored-by: cheme <emericchevalier.pro@gmail.com>

* Introduce new `AsTrieBackend` trait

* Make the LocalTrieCache not clonable

* Make it work in no_std and add docs

* Remove duplicate dependency

* Switch to ahash for better performance

* Speedup value cache merge

* Output errors on underflow

* Ensure the internal LRU map doesn't grow too much

* Use const fn to calculate the value cache element size

* Remove cache configuration

* Fix

* Clear the cache in between for more testing

* Try to come up with a failing test case

* Make the test fail

* Fix the child trie recording

* Make everything compile after the changes to trie

* Adapt to latest trie-db changes

* Fix on stable

* Update primitives/trie/src/cache.rs

Co-authored-by: cheme <emericchevalier.pro@gmail.com>

* Fix wrong merge

* Docs

* Fix warnings

* Cargo.lock

* Bump pin-project

* Fix warnings

* Switch to released crate version

* More fixes

* Make clippy and rustdocs happy

* More clippy

* Print error when using deprecated `--state-cache-size`

* 🤦

* Fixes

* Fix storage_hash linkings

* Update client/rpc/src/dev/mod.rs

Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>

* Review feedback

* encode bound

* Rework the shared value cache

Instead of using a `u64` to represent the key we now use an `Arc<[u8]>`. This arc is also stored in
some extra `HashSet`. We store the key are in an extra `HashSet` to de-duplicate the keys accross
different storage roots. When the latest key usage is dropped in the lru, we also remove the key
from the `HashSet`.

* Improve of the cache by merging the old and new solution

* FMT

* Please stop coming back all the time :crying:

* Update primitives/trie/src/cache/shared_cache.rs

Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>

* Fixes

* Make clippy happy

* Ensure we don't deadlock

* Only use one lock to simplify the code

* Do not depend on `Hasher`

* Fix tests

* FMT

* Clippy 🤦

Co-authored-by: cheme <emericchevalier.pro@gmail.com>
Co-authored-by: Koute <koute@users.noreply.github.com>
Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>
This commit is contained in:
Bastian Köcher
2022-08-18 20:59:22 +02:00
committed by GitHub
parent d46f6f0d34
commit 73d9ae3284
55 changed files with 3977 additions and 1344 deletions
+284
View File
@@ -0,0 +1,284 @@
// This file is part of Substrate.
// Copyright (C) 2021 Parity Technologies (UK) Ltd.
// SPDX-License-Identifier: Apache-2.0
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Trie recorder
//!
//! Provides an implementation of the [`TrieRecorder`](trie_db::TrieRecorder) trait. It can be used
//! to record storage accesses to the state to generate a [`StorageProof`].
use crate::{NodeCodec, StorageProof};
use codec::Encode;
use hash_db::Hasher;
use parking_lot::Mutex;
use std::{
collections::HashMap,
marker::PhantomData,
mem,
ops::DerefMut,
sync::{
atomic::{AtomicUsize, Ordering},
Arc,
},
};
use trie_db::{RecordedForKey, TrieAccess};
const LOG_TARGET: &str = "trie-recorder";
/// The internals of [`Recorder`].
struct RecorderInner<H> {
/// The keys for that we have recorded the trie nodes and if we have recorded up to the value.
recorded_keys: HashMap<Vec<u8>, RecordedForKey>,
/// The encoded nodes we accessed while recording.
accessed_nodes: HashMap<H, Vec<u8>>,
}
impl<H> Default for RecorderInner<H> {
fn default() -> Self {
Self { recorded_keys: Default::default(), accessed_nodes: Default::default() }
}
}
/// The trie recorder.
///
/// It can be used to record accesses to the trie and then to convert them into a [`StorageProof`].
pub struct Recorder<H: Hasher> {
inner: Arc<Mutex<RecorderInner<H::Out>>>,
/// The estimated encoded size of the storage proof this recorder will produce.
///
/// We store this in an atomic to be able to fetch the value while the `inner` is may locked.
encoded_size_estimation: Arc<AtomicUsize>,
}
impl<H: Hasher> Default for Recorder<H> {
fn default() -> Self {
Self { inner: Default::default(), encoded_size_estimation: Arc::new(0.into()) }
}
}
impl<H: Hasher> Clone for Recorder<H> {
fn clone(&self) -> Self {
Self {
inner: self.inner.clone(),
encoded_size_estimation: self.encoded_size_estimation.clone(),
}
}
}
impl<H: Hasher> Recorder<H> {
/// Returns the recorder as [`TrieRecorder`](trie_db::TrieRecorder) compatible type.
pub fn as_trie_recorder(&self) -> impl trie_db::TrieRecorder<H::Out> + '_ {
TrieRecorder::<H, _> {
inner: self.inner.lock(),
encoded_size_estimation: self.encoded_size_estimation.clone(),
_phantom: PhantomData,
}
}
/// Drain the recording into a [`StorageProof`].
///
/// While a recorder can be cloned, all share the same internal state. After calling this
/// function, all other instances will have their internal state reset as well.
///
/// If you don't want to drain the recorded state, use [`Self::to_storage_proof`].
///
/// Returns the [`StorageProof`].
pub fn drain_storage_proof(self) -> StorageProof {
let mut recorder = mem::take(&mut *self.inner.lock());
StorageProof::new(recorder.accessed_nodes.drain().map(|(_, v)| v))
}
/// Convert the recording to a [`StorageProof`].
///
/// In contrast to [`Self::drain_storage_proof`] this doesn't consumes and doesn't clears the
/// recordings.
///
/// Returns the [`StorageProof`].
pub fn to_storage_proof(&self) -> StorageProof {
let recorder = self.inner.lock();
StorageProof::new(recorder.accessed_nodes.iter().map(|(_, v)| v.clone()))
}
/// Returns the estimated encoded size of the proof.
///
/// The estimation is based on all the nodes that were accessed until now while
/// accessing the trie.
pub fn estimate_encoded_size(&self) -> usize {
self.encoded_size_estimation.load(Ordering::Relaxed)
}
/// Reset the state.
///
/// This discards all recorded data.
pub fn reset(&self) {
mem::take(&mut *self.inner.lock());
self.encoded_size_estimation.store(0, Ordering::Relaxed);
}
}
/// The [`TrieRecorder`](trie_db::TrieRecorder) implementation.
struct TrieRecorder<H: Hasher, I> {
inner: I,
encoded_size_estimation: Arc<AtomicUsize>,
_phantom: PhantomData<H>,
}
impl<H: Hasher, I: DerefMut<Target = RecorderInner<H::Out>>> trie_db::TrieRecorder<H::Out>
for TrieRecorder<H, I>
{
fn record<'b>(&mut self, access: TrieAccess<'b, H::Out>) {
let mut encoded_size_update = 0;
match access {
TrieAccess::NodeOwned { hash, node_owned } => {
tracing::trace!(
target: LOG_TARGET,
hash = ?hash,
"Recording node",
);
self.inner.accessed_nodes.entry(hash).or_insert_with(|| {
let node = node_owned.to_encoded::<NodeCodec<H>>();
encoded_size_update += node.encoded_size();
node
});
},
TrieAccess::EncodedNode { hash, encoded_node } => {
tracing::trace!(
target: LOG_TARGET,
hash = ?hash,
"Recording node",
);
self.inner.accessed_nodes.entry(hash).or_insert_with(|| {
let node = encoded_node.into_owned();
encoded_size_update += node.encoded_size();
node
});
},
TrieAccess::Value { hash, value, full_key } => {
tracing::trace!(
target: LOG_TARGET,
hash = ?hash,
key = ?sp_core::hexdisplay::HexDisplay::from(&full_key),
"Recording value",
);
self.inner.accessed_nodes.entry(hash).or_insert_with(|| {
let value = value.into_owned();
encoded_size_update += value.encoded_size();
value
});
self.inner
.recorded_keys
.entry(full_key.to_vec())
.and_modify(|e| *e = RecordedForKey::Value)
.or_insert(RecordedForKey::Value);
},
TrieAccess::Hash { full_key } => {
tracing::trace!(
target: LOG_TARGET,
key = ?sp_core::hexdisplay::HexDisplay::from(&full_key),
"Recorded hash access for key",
);
// We don't need to update the `encoded_size_update` as the hash was already
// accounted for by the recorded node that holds the hash.
self.inner
.recorded_keys
.entry(full_key.to_vec())
.or_insert(RecordedForKey::Hash);
},
TrieAccess::NonExisting { full_key } => {
tracing::trace!(
target: LOG_TARGET,
key = ?sp_core::hexdisplay::HexDisplay::from(&full_key),
"Recorded non-existing value access for key",
);
// Non-existing access means we recorded all trie nodes up to the value.
// Not the actual value, as it doesn't exist, but all trie nodes to know
// that the value doesn't exist in the trie.
self.inner
.recorded_keys
.entry(full_key.to_vec())
.and_modify(|e| *e = RecordedForKey::Value)
.or_insert(RecordedForKey::Value);
},
};
self.encoded_size_estimation.fetch_add(encoded_size_update, Ordering::Relaxed);
}
fn trie_nodes_recorded_for_key(&self, key: &[u8]) -> RecordedForKey {
self.inner.recorded_keys.get(key).copied().unwrap_or(RecordedForKey::None)
}
}
#[cfg(test)]
mod tests {
use trie_db::{Trie, TrieDBBuilder, TrieDBMutBuilder, TrieHash, TrieMut};
type MemoryDB = crate::MemoryDB<sp_core::Blake2Hasher>;
type Layout = crate::LayoutV1<sp_core::Blake2Hasher>;
type Recorder = super::Recorder<sp_core::Blake2Hasher>;
const TEST_DATA: &[(&[u8], &[u8])] =
&[(b"key1", b"val1"), (b"key2", b"val2"), (b"key3", b"val3"), (b"key4", b"val4")];
fn create_trie() -> (MemoryDB, TrieHash<Layout>) {
let mut db = MemoryDB::default();
let mut root = Default::default();
{
let mut trie = TrieDBMutBuilder::<Layout>::new(&mut db, &mut root).build();
for (k, v) in TEST_DATA {
trie.insert(k, v).expect("Inserts data");
}
}
(db, root)
}
#[test]
fn recorder_works() {
let (db, root) = create_trie();
let recorder = Recorder::default();
{
let mut trie_recorder = recorder.as_trie_recorder();
let trie = TrieDBBuilder::<Layout>::new(&db, &root)
.with_recorder(&mut trie_recorder)
.build();
assert_eq!(TEST_DATA[0].1.to_vec(), trie.get(TEST_DATA[0].0).unwrap().unwrap());
}
let storage_proof = recorder.drain_storage_proof();
let memory_db: MemoryDB = storage_proof.into_memory_db();
// Check that we recorded the required data
let trie = TrieDBBuilder::<Layout>::new(&memory_db, &root).build();
assert_eq!(TEST_DATA[0].1.to_vec(), trie.get(TEST_DATA[0].0).unwrap().unwrap());
}
}