Bitcoin ETL

Join the chat at https://gitter.im/ethereum-eth Build Status Join Telegram Group

Install Bitcoin ETL:

pip install eos-etl

Export blocks and transactions (Schema, Reference):

> eosetl export_blocks_and_transactions --start-block 0 --end-block 500000 \
--provider-uri http://user:pass@localhost:8332 --chain bitcoin \
 --blocks-output blocks.json --transactions-output transactions.json

Supported chains:

  • bitcoin
  • bitcoin_cash
  • dogecoin
  • litecoin
  • dash
  • zcash

Stream blockchain data continually to console:

> pip install eos-etl[streaming]
> eosetl stream -p http://user:pass@localhost:8332 --start-block 500000

Stream blockchain data continually to Google Pub/Sub:

> export GOOGLE_APPLICATION_CREDENTIALS=/path_to_credentials_file.json
> eosetl stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/bitcoin_blockchain

For the latest version, check out the repo and call

> pip install -e .[streaming] 
> python eosetl.py

Schema

blocks.json

| Field | Type |
|---|---|
| hash | hex_string |
| size | bigint |
| stripped_size | bigint |
| weight | bigint |
| number | bigint |
| version | bigint |
| merkle_root | hex_string |
| timestamp | bigint |
| nonce | hex_string |
| bits | hex_string |
| coinbase_param | hex_string |
| transaction_count | bigint |

transactions.json

| Field | Type |
|---|---|
| hash | hex_string |
| size | bigint |
| virtual_size | bigint |
| version | bigint |
| lock_time | bigint |
| block_number | bigint |
| block_hash | hex_string |
| block_timestamp | bigint |
| is_coinbase | boolean |
| inputs | []transaction_input |
| outputs | []transaction_output |
| input_count | bigint |
| output_count | bigint |
| input_value | bigint |
| output_value | bigint |
| fee | bigint |

transaction_input

| Field | Type |
|---|---|
| index | bigint |
| spent_transaction_hash | hex_string |
| spent_output_index | bigint |
| script_asm | string |
| script_hex | hex_string |
| sequence | bigint |
| required_signatures | bigint |
| type | string |
| addresses | []string |
| value | bigint |

transaction_output

| Field | Type |
|---|---|
| index | bigint |
| script_asm | string |
| script_hex | hex_string |
| required_signatures | bigint |
| type | string |
| addresses | []string |
| value | bigint |

You can find column descriptions in schemas
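
To illustrate how the aggregate fields of a transaction relate to its nested inputs and outputs, here is a minimal Python sketch. It assumes each exported transaction is a JSON object with the fields listed above, that the fee of a non-coinbase transaction is input_value minus output_value, and that a coinbase transaction has a fee of zero. The sample record is made up, and `summarize_transaction` is a hypothetical helper, not part of the tool.

```python
import json


def summarize_transaction(tx):
    """Recompute the aggregate fields of a transaction from its inputs and outputs."""
    # Input values can be empty (None) before enrichment, so count them as 0.
    input_value = sum(i.get("value") or 0 for i in tx["inputs"])
    output_value = sum(o.get("value") or 0 for o in tx["outputs"])
    # Assumption: a coinbase transaction pays no fee.
    fee = 0 if tx.get("is_coinbase") else input_value - output_value
    return {
        "input_count": len(tx["inputs"]),
        "output_count": len(tx["outputs"]),
        "input_value": input_value,
        "output_value": output_value,
        "fee": fee,
    }


# Made-up record that follows the schema above; not real chain data.
sample_line = json.dumps({
    "hash": "ab" * 32,
    "is_coinbase": False,
    "inputs": [{"index": 0, "value": 50000}],
    "outputs": [{"index": 0, "value": 30000}, {"index": 1, "value": 19000}],
})

summary = summarize_transaction(json.loads(sample_line))
print(summary)
```

Comparing such a recomputed summary with the exported input_value, output_value, and fee columns can serve as a quick sanity check on an export.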

Notes:

  1. Output values returned by the Dogecoin API lost precision in clients prior to version 1.14. This is caused by the issue dogecoin/dogecoin#1558.
    Explorers that used those older versions to export the data may show incorrect address balances and transaction amounts.

  2. For Zcash, the vjoinsplit and valueBalance fields are converted to inputs and outputs with type 'shielded'. See https://zcash-rpc.github.io/getrawtransaction.html and https://zcash.readthedocs.io/en/latest/rtd_pages/zips/zip-0243.html

Exporting the Blockchain

  1. Install Python 3.5.3+ https://www.python.org/downloads/

  2. Install Bitcoin node https://hackernoon.com/a-complete-beginners-guide-to-installing-a-bitcoin-full-node-on-linux-2018-edition-cb8e384479ea

  3. Start Bitcoin. Make sure it has downloaded the blocks that you need by executing $ bitcoin-cli getblockchaininfo in the terminal. You can export any block below the blocks value reported by that command; there is no need to wait for the full sync.

  4. Install Bitcoin ETL:

    > pip install eos-etl
  5. Export blocks & transactions:

    > eosetl export_all --start 0 --end 499999  \
    --partition-batch-size 100 \
    --provider-uri http://user:pass@localhost:8332 --chain bitcoin

    The result will be in the output subdirectory, partitioned in Hive style:

    output/blocks/start_block=00000000/end_block=00000099/blocks_00000000_00000099.csv
    output/blocks/start_block=00000100/end_block=00000199/blocks_00000100_00000199.csv
    ...
    output/transactions/start_block=00000000/end_block=00000099/transactions_00000000_00000099.csv
    ...

    If the eosetl command is not available in PATH, use python -m eosetl instead.
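
The Hive-style layout shown in step 5 can be reproduced with a few lines of Python, which is handy when locating the file that holds a given block range. This is a sketch based only on the example paths in the listing: the 8-digit zero padding and the csv extension are read off that listing, and `partition_ranges` and `partition_path` are hypothetical helpers, not functions of the tool.

```python
def partition_ranges(start, end, batch_size):
    """Yield inclusive (start_block, end_block) pairs that cover start..end."""
    for batch_start in range(start, end + 1, batch_size):
        yield batch_start, min(batch_start + batch_size - 1, end)


def partition_path(entity, start_block, end_block, output_dir="output", ext="csv"):
    """Build a path in the Hive-style layout, zero-padding block numbers to 8 digits."""
    s = "{:08d}".format(start_block)
    e = "{:08d}".format(end_block)
    return "{out}/{ent}/start_block={s}/end_block={e}/{ent}_{s}_{e}.{ext}".format(
        out=output_dir, ent=entity, s=s, e=e, ext=ext
    )


# Paths for the first three partitions of 100 blocks each.
paths = [partition_path("blocks", s, e) for s, e in partition_ranges(0, 299, 100)]
for path in paths:
    print(path)
```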

Running in Docker

  1. Install Docker https://docs.docker.com/install/

  2. Build a docker image

    > docker build -t eos-etl:latest .
    > docker image ls
  3. Run a container out of the image

    > MSYS_NO_PATHCONV=1 docker run -v $HOME/output:/eos-etl/output eos-etl:latest \
        export_blocks_and_transactions --max-workers 50 --start-block 30000000 \
        --end-block 30000100 --provider-uri http://your_eos_node:node_port \
        --blocks-output ./output/blocks.csv --transactions-output ./output/transactions.csv \
        --actions-output ./output/actions.csv
  4. Run streaming to console or Pub/Sub

    > MSYS_NO_PATHCONV=1 docker build -t eos-etl:latest-streaming -f Dockerfile_with_streaming .
    > echo "Stream to console"
    > MSYS_NO_PATHCONV=1 docker run eos-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000
    > echo "Stream to Pub/Sub"
    > MSYS_NO_PATHCONV=1 docker run -v /path_to_credentials_file/:/eos-etl/ --env GOOGLE_APPLICATION_CREDENTIALS=/eos-etl/credentials_file.json eos-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_eos
  5. Refer to https://github.com/blockchain-etl/blockchain-etl-streaming for deploying the streaming app to Google Kubernetes Engine.

Command Reference

All the commands accept -h parameter for help, e.g.:

> python eosetl.py export_blocks_and_transactions --help
Usage: eosetl.py export_blocks_and_transactions [OPTIONS]

  Export blocks and transactions.

Options:
  -s, --start-block INTEGER   Start block
  -e, --end-block INTEGER     End block  [required]
  -p, --provider-uri TEXT     The URI of the remote Bitcoin node
  -w, --max-workers INTEGER   The maximum number of workers.
  --blocks-output TEXT        The output file for blocks. If not provided
                              blocks will not be exported. Use "-" for stdout
  --transactions-output TEXT  The output file for transactions. If not
                              provided transactions will not be exported. Use
                              "-" for stdout
  --actions-output TEXT       The output file for actions. If not provided
                              transactions will not be exported. Use "-"
                              for stdout
  --help                      Show this message and exit.

The --blocks-output, --transactions-output, and --actions-output parameters support the json format. The format is inferred from the output file name.

export_blocks_and_transactions

> python eosetl.py export_blocks_and_transactions --start-block 0 --end-block 500000 \
  --provider-uri http://user:pass@localhost:8332 \
  --blocks-output blocks.json --transactions-output transactions.json

Each of the --blocks-output, --transactions-output, and --actions-output options is optional: omit an option to skip exporting that entity.

You can tune --max-workers for performance.

Note that the required_signatures, type, addresses, and value fields will be empty in transaction inputs. Use enrich_transactions to populate those fields.
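
Conceptually, an input refers to an earlier output through spent_transaction_hash and spent_output_index, so the empty fields can be filled by copying them from that output. The following Python sketch shows the idea with an in-memory lookup table. It is not the implementation of enrich_transactions (which takes a --provider-uri and so presumably obtains previous outputs from the node); `enrich_inputs` and the sample data are hypothetical.

```python
ENRICHED_FIELDS = ("required_signatures", "type", "addresses", "value")


def enrich_inputs(transaction, outputs_by_outpoint):
    """Return a copy of the transaction whose inputs carry the fields of the outputs they spend.

    outputs_by_outpoint maps (transaction hash, output index) to an output record.
    """
    enriched_inputs = []
    for tx_input in transaction["inputs"]:
        key = (tx_input.get("spent_transaction_hash"), tx_input.get("spent_output_index"))
        spent_output = outputs_by_outpoint.get(key)
        enriched = dict(tx_input)  # copy, so the original record is left untouched
        if spent_output is not None:
            for field in ENRICHED_FIELDS:
                enriched[field] = spent_output.get(field)
        enriched_inputs.append(enriched)
    result = dict(transaction)
    result["inputs"] = enriched_inputs
    return result


# Made-up data for illustration only.
previous_outputs = {
    ("aa" * 32, 1): {
        "index": 1,
        "required_signatures": 1,
        "type": "pubkeyhash",
        "addresses": ["example-address"],
        "value": 70000,
    },
}
tx = {
    "hash": "bb" * 32,
    "inputs": [
        {"index": 0, "spent_transaction_hash": "aa" * 32, "spent_output_index": 1, "value": None}
    ],
    "outputs": [],
}

enriched_tx = enrich_inputs(tx, previous_outputs)
print(enriched_tx["inputs"][0])
```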

enrich_transactions

> python eosetl.py enrich_transactions  \
  --provider-uri http://user:pass@localhost:8332 \
  --transactions-input transactions.json --transactions-output enriched_transactions.json

You can tune --batch-size, --max-workers for performance.

get_block_range_for_date

> python eosetl.py get_block_range_for_date --provider-uri http://user:pass@localhost:8332 --date=2017-03-01
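
As the command name suggests, the task is to map a calendar date to the first and last block produced on that day. A rough Python sketch of one way to do this is given below. It assumes the date is interpreted in UTC and that block timestamps are available as a list sorted in ascending order, where the list index is the block number; real block timestamps are not guaranteed to be strictly increasing. `day_bounds_utc` and `block_range_for_date` are hypothetical helpers and do not describe how the command is implemented.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta, timezone


def day_bounds_utc(date_string):
    """Return the first and last Unix timestamp (inclusive) of a YYYY-MM-DD day in UTC."""
    day_start = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    day_end = day_start + timedelta(days=1) - timedelta(seconds=1)
    return int(day_start.timestamp()), int(day_end.timestamp())


def block_range_for_date(block_timestamps, date_string):
    """Return (first_block, last_block) whose timestamps fall on the given day, or None.

    Assumption: block_timestamps[i] is the timestamp of block i, sorted ascending.
    """
    start_ts, end_ts = day_bounds_utc(date_string)
    first = bisect_left(block_timestamps, start_ts)
    last = bisect_right(block_timestamps, end_ts) - 1
    if first > last:
        return None
    return first, last


# Made-up timestamps around 2017-03-01 (UTC).
timestamps = [1488326000, 1488326400, 1488370000, 1488412799, 1488412800]
block_range = block_range_for_date(timestamps, "2017-03-01")
print(block_range)
```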

export_all

> python eosetl.py export_all --provider-uri http://user:pass@localhost:8332 --start 2018-01-01 --end 2018-01-02

You can tune --export-batch-size, --max-workers for performance.

stream

> python eosetl.py stream --provider-uri http://user:pass@localhost:8332 --start-block 500000
  • This command outputs blocks and transactions to the console by default.
  • Use the --output option to specify the Google Pub/Sub topic to publish blockchain data to, e.g. projects/your-project/topics/bitcoin_blockchain.
  • The command periodically saves the last synced block number to the last_synced_block.txt file.
  • Specify either the --start-block or the --last-synced-block-file option. --last-synced-block-file should point to the file that stores the block number from which to start streaming the blockchain data.
  • Use the --lag option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to handle chain reorganizations: they are less likely the further a block is from the head.
  • Use the --chain option to specify the type of the chain, e.g. bitcoin, litecoin, dash, zcash, etc.
  • You can tune --period-seconds, --batch-size, --max-workers for performance.
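
The interplay between the saved state, the lag, and the batch size described in the options above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the state file is assumed to hold a single block number, starting at block N is modeled as a last synced block of N - 1, and `next_batch`, `read_last_synced_block`, and `write_last_synced_block` are hypothetical helpers.

```python
import os
import tempfile


def read_last_synced_block(path):
    """Read the last synced block number from the state file."""
    with open(path) as f:
        return int(f.read().strip())


def write_last_synced_block(path, block_number):
    """Persist the last synced block number to the state file."""
    with open(path, "w") as f:
        f.write(str(block_number) + "\n")


def next_batch(last_synced_block, head_block, lag, batch_size):
    """Return the inclusive (start, end) range of blocks to sync next, or None.

    Blocks closer than `lag` to the head are skipped for now, because they are
    the ones most likely to be replaced in a chain reorganization.
    """
    target = head_block - lag
    start = last_synced_block + 1
    if start > target:
        return None
    return start, min(target, start + batch_size - 1)


# Use a temporary directory so the example does not touch real state.
state_file = os.path.join(tempfile.mkdtemp(), "last_synced_block.txt")
write_last_synced_block(state_file, 499999)  # models starting at block 500000

batch = next_batch(read_last_synced_block(state_file), head_block=500020, lag=6, batch_size=10)
print(batch)
write_last_synced_block(state_file, batch[1])  # record progress after exporting the batch
```

A real streaming loop would repeat this step, sleeping between iterations whenever `next_batch` returns None.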

Running Tests

> pip install -e .[dev]
> echo "The below variables are optional"
> export eosetl_BITCOIN_PROVIDER_URI=http://user:pass@localhost:8332
> export eosetl_LITECOIN_PROVIDER_URI=http://user:pass@localhost:8331
> export eosetl_DOGECOIN_PROVIDER_URI=http://user:pass@localhost:8330
> export eosetl_BITCOIN_CASH_PROVIDER_URI=http://user:pass@localhost:8329
> export eosetl_DASH_PROVIDER_URI=http://user:pass@localhost:8328
> export eosetl_ZCASH_PROVIDER_URI=http://user:pass@localhost:8327
> pytest -vv

Running Tox Tests

> pip install tox
> tox

Public Datasets in BigQuery

https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them