Live Ethereum and Bitcoin Data in Google BigQuery and Pub/Sub
The crypto_ethereum and crypto_bitcoin datasets in BigQuery are now updated in near real-time. You can also subscribe to public Pub/Sub topics that feed those tables (see below).
The overall architecture is depicted below:
The following blockchains are covered:
- Ethereum
- Bitcoin
- ZCash
- Litecoin
- Dogecoin
- Dash
We added a delay for each blockchain to avoid streaming orphaned blocks resulting from chain reorganisations. You can look up how many blocks we lag behind the tip of each chain in the LAG_BLOCKS parameter in the configuration files in the GitHub repository https://github.com/blockchain-etl-streaming. Those values were calculated from the longest orphaned chains observed within the last year, multiplied by a safety factor. For Ethereum this works out to an offset equivalent to approximately a 4-minute lag; for Bitcoin, a 30-minute lag.
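As an illustration of how a time-based safety margin translates into a block count, here is a hypothetical helper (not from the repository), assuming average block times of roughly 15 seconds for Ethereum and 10 minutes for Bitcoin:

```python
import math

def lag_blocks(target_lag_seconds, avg_block_time_seconds):
    """Number of blocks to stay behind the chain tip for a desired time lag."""
    return math.ceil(target_lag_seconds / avg_block_time_seconds)

# Ethereum: ~15 s block time, ~4-minute lag -> 16 blocks behind the tip
print(lag_blocks(4 * 60, 15))    # 16
# Bitcoin: ~10 min block time, ~30-minute lag -> 3 blocks behind the tip
print(lag_blocks(30 * 60, 600))  # 3
```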
We are currently working on a solution that will handle chain reorganisations using additional message types and will allow subscribing to the most up-to-date data.
Subscribing to Live Data Feeds
Install Google Cloud SDK:
> curl https://sdk.cloud.google.com | bash
> exec -l $SHELL
> gcloud init
Create a Pub/Sub subscription for Ethereum internal transactions:
> gcloud pubsub subscriptions create crypto_ethereum.traces.test --topic=crypto_ethereum.traces --topic-project=blockchain-etl
Read a single message from the subscription to test it works:
> gcloud pubsub subscriptions pull crypto_ethereum.traces.test
Now you can run a subscriber and process the messages in the subscription, using this Python script:
subscribe.py
import time

from google.cloud import pubsub_v1

# TODO: set your Google Cloud project ID here
project_id = "Set Your Google Cloud Project ID Here"
subscription_name = "crypto_ethereum.traces.test"

subscriber = pubsub_v1.SubscriberClient()

# The `subscription_path` method creates a fully qualified identifier
# in the form `projects/{project_id}/subscriptions/{subscription_name}`
subscription_path = subscriber.subscription_path(
    project_id, subscription_name)

def callback(message):
    print('Received message: {}'.format(message))
    message.ack()

subscriber.subscribe(subscription_path, callback=callback)

# The subscriber is non-blocking. We must keep the main thread from
# exiting to allow it to process messages asynchronously in the background.
print('Listening for messages on {}'.format(subscription_path))
while True:
    time.sleep(60)
Install the dependencies and run the script:
> pip install google-cloud-pubsub==1.0.1
> python subscribe.py
Listening for messages...
Received message: Message {
  ...
}
Received message: Message {
  ...
}
...
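Each message carries the record as JSON in its data payload, mirroring the schema of the corresponding BigQuery table. Below is a minimal sketch of decoding a traces payload inside the callback; the sample bytes are fabricated for illustration, and the field names assume the crypto_ethereum.traces schema:

```python
import json

# Fabricated sample payload; real messages contain the full traces schema.
sample_data = (b'{"block_number": 8700000, "from_address": "0xfeed...", '
               b'"to_address": "0xbeef...", "value": 1500000000000000000}')

trace = json.loads(sample_data)
value_eth = trace["value"] / 1e18  # value is denominated in wei
print(trace["block_number"], value_eth)  # 8700000 1.5
```

In the subscriber script above, the same decoding would apply to `message.data` before calling `message.ack()`.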
You can also use Go, Java, Node.js or C#: https://cloud.google.com/pubsub/docs/pull.
The first 10 GB of data in Pub/Sub is free; after that you'll pay ~$40/TB. There is on average 40 GB of Ethereum traces per month, which works out to roughly $1.6/month.
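A quick back-of-the-envelope check of that estimate, using the approximate figures above (not current price quotes):

```python
price_per_tb_usd = 40     # approximate Pub/Sub price after the free tier
monthly_traces_gb = 40    # approximate volume of Ethereum traces per month

monthly_cost = monthly_traces_gb / 1024 * price_per_tb_usd
print(round(monthly_cost, 2))  # 1.56
```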
You can also subscribe to topics with Bitcoin blocks and transactions and Ethereum blocks, transactions, logs, contracts, and tokens. The topic names follow the naming convention of the BigQuery tables so you can easily locate them: projects/blockchain-etl/topics/crypto_<chain>.<table_name>, where:
- chain can be one of ethereum, bitcoin, zcash, litecoin, dogecoin, or dash.
- table_name can be one of blocks or transactions. Additionally for Ethereum: logs, token_transfers, traces, contracts, and tokens.
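Based on that convention, a small helper can enumerate the full topic paths; this is a sketch that simply restates the chain and table lists above:

```python
def topic_path(chain, table_name):
    """Build a full Pub/Sub topic path following the naming convention above."""
    return 'projects/blockchain-etl/topics/crypto_{}.{}'.format(chain, table_name)

COMMON_TABLES = ['blocks', 'transactions']
ETHEREUM_EXTRA_TABLES = ['logs', 'token_transfers', 'traces', 'contracts', 'tokens']
CHAINS = ['ethereum', 'bitcoin', 'zcash', 'litecoin', 'dogecoin', 'dash']

for chain in CHAINS:
    tables = COMMON_TABLES + (ETHEREUM_EXTRA_TABLES if chain == 'ethereum' else [])
    for table in tables:
        print(topic_path(chain, table))
```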
Next up is an article about a Twitter bot that posts anomalous transactions using Dataflow and Cloud Functions.
Follow us on Twitter: https://twitter.com/BlockchainETL
Published at Mon, 07 Oct 2019 12:16:36 +0000
