Semi-Stateless Initial Sync Experiment – Igor Mandrigin

To test the semi-stateless approach, we need to measure two things:

how much additional space/bandwidth this approach requires, and whether it is any better than the fully stateless approach;
how much faster it makes the initial sync.

In this article, we will focus on the disk space.

Setting up the experiment

Maximum size of the trie (Merkle trie): 1.000.000 nodes. When the number of nodes exceeds this value, the least recently used (LRU) nodes are evicted to free up memory. This keeps RAM usage under control.
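The eviction rule can be illustrated with a toy cache. This is a minimal sketch and not turbo-geth code: the class name `LRUNodeCache` and its methods are hypothetical, and real trie nodes are replaced by plain values.

```python
from collections import OrderedDict


class LRUNodeCache:
    """Toy stand-in for a size-capped trie (hypothetical, not turbo-geth code)."""

    def __init__(self, max_nodes):
        self.max_nodes = max_nodes
        self._nodes = OrderedDict()  # key -> node, least recently used first

    def get(self, key):
        node = self._nodes.get(key)
        if node is not None:
            self._nodes.move_to_end(key)  # mark as most recently used
        return node

    def put(self, key, node):
        """Store a node and return the keys evicted to stay under the cap."""
        self._nodes[key] = node
        self._nodes.move_to_end(key)
        evicted = []
        while len(self._nodes) > self.max_nodes:
            old_key, _ = self._nodes.popitem(last=False)  # drop the LRU entry
            evicted.append(old_key)
        return evicted


cache = LRUNodeCache(max_nodes=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")               # "a" becomes the most recently used entry
evicted = cache.put("c", 3)  # the cap is exceeded, so "b" is evicted
```

In the experiment the cap is 1.000.000 nodes rather than 2; the sketch only shows which entry is dropped when the cap is exceeded.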
The partial witnesses are stored in a db (our fork of boltdb). Each entry has the following structure:

key: [12]byte // block number + max number of nodes in the trie
value: []byte // witnesses, serialized as described in this doc
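One way a 12-byte key like this could be built is sketched below. The exact layout is an assumption, since the article only says the key combines the block number and the maximum trie size: here the block number takes 8 bytes and the trie size limit 4 bytes, both big-endian. The helper names `encode_key` and `decode_key` are hypothetical.

```python
import struct

# Assumed layout (not confirmed by the article): 8-byte big-endian block number
# followed by a 4-byte big-endian trie size limit, 12 bytes in total.
KEY_FORMAT = ">QI"


def encode_key(block_number, max_trie_nodes):
    """Pack the two numbers into a 12-byte key."""
    return struct.pack(KEY_FORMAT, block_number, max_trie_nodes)


def decode_key(key):
    """Return (block_number, max_trie_nodes) from a 12-byte key."""
    return struct.unpack(KEY_FORMAT, key)


key = encode_key(6_169_246, 1_000_000)
```

With big-endian encoding, keys compare in the same order as block numbers, which suits a store that keeps keys sorted by bytes.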

We don’t store the contract code in the witnesses (that is a limitation of the current architecture).

How the data was collected (required a synced turbo-geth).

(in the turbo-geth repository)

make state
./build/bin/state stateless \
  --chaindata ~/nvme1/mainnet/mainnet/geth/chaindata \
  --statefile semi_stateless.statefile \
  --snapshotInterval 1000000 \
  --snapshotFrom 10000000 \
  --statsfile new_witness.stats.compressed.2.csv \
  --witnessDbFile semi_stateless_witnesses.db \
  --statelessResolver \
  --triesize 1000000

Total Storage

The witness DB (bolt db) needed to sync 6.169.246 blocks from scratch takes 99 GB.
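As a quick sanity check, the total can be turned into an average storage cost per block. This sketch assumes the figure means decimal gigabytes (10^9 bytes); the variable names are illustrative.

```python
DB_SIZE_BYTES = 99 * 10**9  # 99 GB, assuming decimal gigabytes
BLOCKS = 6_169_246          # number of blocks covered by the witness DB

avg_bytes_per_block = DB_SIZE_BYTES / BLOCKS
avg_kb_per_block = avg_bytes_per_block / 1000
print(f"{avg_kb_per_block:.1f} KB per block on average")
```

This comes to roughly 16 KB per block. It is an average over the size of the database on disk, so it may not be directly comparable with per-block witness sizes taken from the separate stats file.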

Quantile analysis

python quantile-analysis.py cache_1_000_000/semi_stateless_witnesses.db.stats.1.csv

mean   0.038 MB
median 0.028 MB
p90    0.085 MB
p95    0.102 MB
p99    0.146 MB
max    2.350 MB
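The script `quantile-analysis.py` is not shown here, so the following is only a sketch of how such a summary could be computed from a list of witness sizes. The function names, the nearest-rank percentile definition, and the sample values are all assumptions made for illustration.

```python
import math
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list, for 0 < p <= 100."""
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(sizes_mb):
    """Return the same statistics as the table above for a list of sizes in MB."""
    ordered = sorted(sizes_mb)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p90": percentile(ordered, 90),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


# Made-up sizes in MB, for illustration only.
sample = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 2.0]
stats = summarize(sample)
```

A single large outlier pulls the mean well above the median in this sample, which is the same pattern as in the measured data, where the maximum (2.350 MB) is far above p99 (0.146 MB).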

Full Data

python absolute_values_plot.py cache_1_000_000/semi_stateless_witnesses.db.stats.1.csv

Witness sizes for blocks 1 to 6.100.000, capped at 1.0 MB. Sliding average, window 1024.
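The smoothing used for the plot can be sketched as a trailing moving average. The plotting script itself is not shown, so this is an assumption about how it works: values are clipped to the cap before averaging, and the window looks backwards rather than being centered. The function name `sliding_average` is hypothetical.

```python
from collections import deque


def sliding_average(values, window=1024, cap=None):
    """Trailing moving average; values above `cap` are clipped first if cap is set."""
    recent = deque()      # the values currently inside the window
    running_sum = 0.0
    averaged = []
    for value in values:
        if cap is not None:
            value = min(value, cap)
        recent.append(value)
        running_sum += value
        if len(recent) > window:
            running_sum -= recent.popleft()  # drop the oldest value
        averaged.append(running_sum / len(recent))
    return averaged


# Tiny made-up example with a window of 2 and a cap of 1.0.
smoothed = sliding_average([0.5, 1.5, 3.0, 0.5], window=2, cap=1.0)
```

The first few outputs average over fewer than `window` values, so the start of the curve is noisier than the rest.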

Normalized Data (after DDoSes)

python absolute_values_plot.py cache_1_000_000/semi_stateless_witnesses.db.stats.1.csv 3000000

Witness sizes after the DDoS attacks, sliding average, window 1024.

DDoS Zoom In

python ddos_zoom.py cache_1_000_000/semi_stateless_witnesses.db.stats.1.csv

Zoomed-in view of the influence of the DDoS attacks on witness sizes (raw data).

Because of the DDoS attacks around blocks 2.3M-2.5M and 2.65M-2.75M, the witnesses in those ranges are significantly larger.
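To put a number on this effect, the mean witness size inside the two block ranges can be compared with the mean everywhere else. This is a minimal sketch: the input is assumed to be (block number, size in MB) pairs, the range bounds are treated as inclusive, and the sample rows are made up.

```python
# Block ranges quoted in the text (treating the bounds as inclusive is an assumption).
DDOS_RANGES = [(2_300_000, 2_500_000), (2_650_000, 2_750_000)]


def in_ddos_range(block_number):
    return any(start <= block_number <= end for start, end in DDOS_RANGES)


def mean_sizes_by_period(rows):
    """rows: iterable of (block_number, witness_size_mb) pairs."""
    totals = {True: [0.0, 0], False: [0.0, 0]}  # in-range flag -> [sum, count]
    for block_number, size_mb in rows:
        bucket = totals[in_ddos_range(block_number)]
        bucket[0] += size_mb
        bucket[1] += 1
    return {
        "ddos": totals[True][0] / totals[True][1] if totals[True][1] else None,
        "other": totals[False][0] / totals[False][1] if totals[False][1] else None,
    }


# Made-up rows, for illustration only.
rows = [(2_000_000, 0.25), (2_400_000, 1.0), (2_700_000, 0.5), (3_000_000, 0.25)]
result = mean_sizes_by_period(rows)
```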

Full vs Semi

python full_vs_semi.py cache_1_000_000/semi_stateless_witnesses.db.stats.1.csv

Full witness sizes are adjusted for the missing code components.

As the chart shows, the semi-stateless approach saves a lot of data compared to the fully stateless approach.

Having a stateless resolver adds around 0.04 MB of additional information per block that needs to be transferred or stored (the mean witness size measured above is 0.038 MB). That is significantly less data than a full witness per block, even when we adjust for code (you can see some charts in my previous post).

If the performance is good, this could be a good mode for speeding up the initial sync while requiring less data than the fully stateless approach.

Published at Wed, 12 Feb 2020 07:46:59 +0000
