feat: adds libraries for data processing #28

Open
wants to merge 19 commits into
base: main
Changes from 18 commits
813b9c9
merged updated block extractor library from local misc package
0x6861746366574 Feb 28, 2022
1373e98
merged block delegates lib from local misc package
0x6861746366574 Feb 28, 2022
7366515
merged block harvester lib from local misc package
0x6861746366574 Feb 28, 2022
8564989
merged nember nft lib from local misc package
0x6861746366574 Feb 28, 2022
1546048
import fix for block extractor modules
0x6861746366574 Feb 28, 2022
0df377c
removed integration test stubs for unit test development
0x6861746366574 Feb 28, 2022
dd80570
updated gitignore and README to reflect block library additions
0x6861746366574 Feb 28, 2022
c63eef0
added unit tests for module-level functions in process.py
0x6861746366574 Feb 28, 2022
e970466
added integration tests for extractor and processor
0x6861746366574 Mar 23, 2022
a4d23d3
added large fixtures to enable full extractor integration tests
0x6861746366574 Mar 27, 2022
7cde8ea
minor formatting changes and lint cleanup
0x6861746366574 May 11, 2022
ddce65c
added dev requirements file
0x6861746366574 May 11, 2022
4aa6ebc
suppress too many locals, branches, statements
May 12, 2022
262215c
missing packages
May 12, 2022
eff24ff
fix generators inside any
May 12, 2022
b82ea1f
missing encodings + silence consider-using-with
May 19, 2022
479d6b4
disable similarities checker under block dir
May 19, 2022
3091831
erm, fixed wrong order
May 19, 2022
8e2f1f9
refactored variable names and argument parsers; added example account…
0x6861746366574 May 30, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
*~
*.pch
*.pyc
*.ipynb
__pycache__/
.idea
.vscode/
103 changes: 103 additions & 0 deletions README.md
@@ -51,6 +51,109 @@ Example: check accounts in `account/samples/verify_ownership.yaml`.
python -m account.verify_ownership --input account/samples/verify_ownership.yaml
```

## block

Running block extraction scripts requires the installation of the local **block** package. This can be accomplished as follows:

Review comment (Member):

btw, it should be possible (and it is already - assuming you pip install all requirement files), to run the tools like
PYTHONPATH=. python3 block/delegates/find_delegates.py

```sh
pip install -e .
```

### extractor/extract

_extracts chain data from node files and produces compact output for applications_

The extractor is the first step in processing raw block data with the block-level scripts, whether the goal is to drive visualizations or to support chain analysis.
There are two output types:
- blocks
- statements

Example: extract data from node files stored in `block/data`.
Default output dir is `block/resources`.

```sh
python extractor/extract.py --input data --output resources
```
gimre-xymcity marked this conversation as resolved.

### extractor/process

_processes extracted chain data to generate useful/readable representation of chain state_

The processor streams data output by the extractor and builds human-readable representations of the block headers
as well as a rich, indexable representation of the chain state.
There are two output types:
- block headers
- chain state

Example: process data from extractor output stored in `block/resources`.
Default output dir is `block/resources`.

```sh
python extractor/process.py --input resources --output resources
```

### delegates/find_delegates

_finds current delegates associated with one or more nodes using serialized state data_

This script requires a JSON file containing accounts similar to what is received from the `/node/info` API endpoint; see the example in `resources/accounts.json`.

Review comment (Member):

this file is not present (?)

As long as node URLs/names are available, the script will attempt to get any missing information from the nodes.

Example: find all delegates from nodes listed in `resources/accounts.json` using chain state from `resources/state_map.msgpack`.
Default output dir is `block/delegates/output`.

```sh
python delegates/find_delegates.py --input resources/accounts.json --state_path resources/state_map.msgpack
```
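
Judging from the loader in `find_delegates.py`, the input JSON carries an `accounts` list, and `find_delegates` reads each entry's `address`, `name`, and optional `nodePublicKey`. The following is a minimal sketch of building such a file under that assumption; the host names, addresses, and key value are placeholders invented for illustration, not real accounts.

```python
import json
import tempfile
from pathlib import Path

# Placeholder entries: every value below is made up for illustration only.
accounts = {
    'accounts': [
        {
            # host name; queried at http://<name>:3000/node/info when no key is given
            'name': 'node.example.com',
            'address': 'PLACEHOLDER-MAIN-ADDRESS',
            # optional hex-encoded node public key (dummy 32-byte value here)
            'nodePublicKey': '00' * 32,
        },
        {
            # no nodePublicKey: the script would try to fetch it from the node
            'name': 'other.example.com',
            'address': 'PLACEHOLDER-OTHER-ADDRESS',
        },
    ]
}

# write to a temporary directory so the sketch does not touch the repository
path = Path(tempfile.mkdtemp()) / 'accounts.json'
path.write_text(json.dumps(accounts, indent=4), encoding='utf8')

# read it back the same way the script does and list entries lacking a key
loaded = json.loads(path.read_text(encoding='utf8'))['accounts']
missing_keys = [acc['name'] for acc in loaded if 'nodePublicKey' not in acc]
print(missing_keys)
```

Entries without `nodePublicKey` are the ones for which the script falls back to the node API, so they need a reachable `name`.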

### harvester/get_harvester_stats

_aggregates harvesting statistics using serialized state data_

This script requires a JSON file containing harvester addresses; see the example in `resources/accounts.json`.
Stats are aggregated for the full chain history and binned based on provided frequencies.
The output falls into three categories:
- blocks harvested
- fees collected
- total XYM balance

Example: get stats for harvesters listed in `resources/accounts.json` using chain state from `resources/state_map.msgpack` and `resources/block_header_df.pkl`.
Default output dir is `block/harvester/output`.

```sh
python harvester/get_harvester_stats.py --input resources/accounts.json --state_path resources/state_map.msgpack --headers_path resources/block_header_df.pkl
```
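
To illustrate what binning by frequency means here, the sketch below groups a few hypothetical block records into daily buckets and counts blocks and fees per harvester. It uses only the standard library and is not the script's implementation (which is not shown in this diff); the record layout, the harvester labels, and the helper name `bin_by_day` are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone

JAN_1_2022 = 1640995200  # 2022-01-01 00:00:00 UTC as a unix timestamp

# hypothetical block records: (unix timestamp, harvester label, fee in micro-XYM)
blocks = [
    (JAN_1_2022 + 3600, 'HARVESTER-A', 150),
    (JAN_1_2022 + 7200, 'HARVESTER-A', 50),
    (JAN_1_2022 + 86400 + 60, 'HARVESTER-B', 75),
    (JAN_1_2022 + 86400 + 120, 'HARVESTER-A', 25),
]


def bin_by_day(records):
    """Aggregate block count and fees per (UTC day, harvester)."""
    stats = defaultdict(lambda: {'blocks': 0, 'fees': 0})
    for timestamp, harvester, fee in records:
        day = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%Y-%m-%d')
        stats[(day, harvester)]['blocks'] += 1
        stats[(day, harvester)]['fees'] += fee
    return dict(stats)


daily = bin_by_day(blocks)
for (day, harvester), values in sorted(daily.items()):
    print(day, harvester, values)
```

Other frequencies (weekly, monthly) would only change how the bucket key is derived from the timestamp.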

### nft/nember_extract

_extracts transactions corresponding to minting of nember NFTs_

Produces two types of output:
- NFT descriptions
- transactions involving NFTs after minting

Example: extract nember data from chain data in `resources/block_data.msgpack`.
Default output dir is `block/nft/output`.

```sh
python nft/nember_extract.py --input resources/block_data.msgpack --output nft/output
```

### nft/nember_scrape

_scrapes transactions corresponding to minting of nember NFTs from API nodes_

Produces two types of output:
- NFT descriptions
- transactions involving NFTs after minting

Example: scrape all transactions corresponding to nember NFTs (takes at least a couple of hours).
Default output dir is `block/nft/output`.

```sh
python nft/nember_scrape.py
```


## health

### check_nem_balances
1 change: 1 addition & 0 deletions block/block/__init__.py
@@ -0,0 +1 @@
__all__ = ['extractor', 'extractor.util', 'extractor.state', 'extractor.format']
3 changes: 3 additions & 0 deletions block/block/delegates/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from block.delegates.delegates import find_delegates

__all__ = ['find_delegates']
42 changes: 42 additions & 0 deletions block/block/delegates/delegates.py
@@ -0,0 +1,42 @@
"""Symbol delegate mapping utilities"""

from binascii import unhexlify

import requests

from block.extractor import public_key_to_address


def find_delegates(accounts, state_map):
    """Find current delegates for each node based on chain state at final height"""

    accounts = accounts.copy()
    for acc in accounts:
        if 'nodePublicKey' in acc:
            node_address = public_key_to_address(unhexlify(acc['nodePublicKey']))
        else:
            print('No node public key present, trying to collect from API')
            try:
                node_key = requests.get(f'http://{acc["name"]}:3000/node/info').json()['nodePublicKey']
                node_address = public_key_to_address(unhexlify(node_key))
            except requests.exceptions.ConnectionError:
                print(f'Failed to connect, skipping node: {acc["name"]}')
                continue

        # initialize delegates with node address
        valid_delegates = [acc['address']]
        invalid_delegates = []

        for key, val in state_map.items():
            if node_address in val['node_key_link']:
                if val['node_key_link'][node_address][-1][1] == float('inf'):
                    if sum(val['xym_balance'].values()) >= (10000 * 1e6):
                        valid_delegates.append(key)
                    else:
                        invalid_delegates.append(key)
        acc.update({
            'node_address': node_address,
            'valid_delegates': valid_delegates,
            'invalid_delegates': invalid_delegates
        })
    return accounts
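
The inner loop of `find_delegates` can be read as a two-step filter: keep accounts whose most recent link to the node is still open-ended, then split them by whether their balance meets the threshold used in the code (10000 * 1e6 micro-units). Below is a standalone sketch of that rule on a toy state map. The helper name `classify_delegates`, the account and node labels, and the exact shapes of `node_key_link` and `xym_balance` are assumptions inferred from how the function indexes them, not taken from `XYMStateMap` itself.

```python
# threshold used in find_delegates: 10,000 XYM expressed in micro-XYM
MIN_BALANCE = 10000 * 10**6


def classify_delegates(node_address, state_map):
    """Split accounts linked to node_address into (valid, invalid) lists."""
    valid, invalid = [], []
    for address, state in state_map.items():
        links = state['node_key_link'].get(node_address)
        if not links:
            continue  # never linked to this node
        # assumed layout: (start_height, end_height); inf marks a still-active link
        if links[-1][1] != float('inf'):
            continue  # link was closed, no longer delegating
        # assumed layout: height -> balance change, so the sum is the current balance
        if sum(state['xym_balance'].values()) >= MIN_BALANCE:
            valid.append(address)
        else:
            invalid.append(address)
    return valid, invalid


# toy data, invented for illustration
toy_state = {
    'ACCOUNT-A': {'node_key_link': {'NODE-1': [(100, float('inf'))]}, 'xym_balance': {100: 12000 * 10**6}},
    'ACCOUNT-B': {'node_key_link': {'NODE-1': [(50, float('inf'))]}, 'xym_balance': {50: 500 * 10**6}},
    'ACCOUNT-C': {'node_key_link': {'NODE-1': [(10, 90)]}, 'xym_balance': {10: 20000 * 10**6}},
}

valid, invalid = classify_delegates('NODE-1', toy_state)
print(valid, invalid)
```

In this toy data, `ACCOUNT-C` is dropped entirely because its link ended at height 90, even though its balance would qualify.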
33 changes: 33 additions & 0 deletions block/block/delegates/find_delegates.py
@@ -0,0 +1,33 @@
#!/usr/bin/env python3
"""Symbol delegate identification script"""

import argparse
import json

from block.delegates.delegates import find_delegates
from block.extractor import XYMStateMap

if __name__ == '__main__':

    parser = argparse.ArgumentParser()
    parser.add_argument('--input', type=str, default='resources/accounts.json', help='path to load node information from')
    parser.add_argument('--output', type=str, default='delegates/output/node_delegates.json', help='path to write delegates json')
    parser.add_argument('--state_path', type=str, default='resources/state_map.msgpack', help='path to load state map from')

    args = parser.parse_args()

    print(f'Reading state from {args.state_path}')
    state_map = XYMStateMap.read_msgpack(args.state_path)

    print(f'Reading nodes from {args.input}')
    with open(args.input, 'r', encoding='utf8') as f:
        accounts = json.loads(f.read())['accounts']

    print('Identifying delegates . . .')
    delegate_accounts = find_delegates(accounts, state_map)

    print(f'All accounts processed, writing output to {args.output}')
    with open(args.output, 'w', encoding='utf8') as f:
        f.write(json.dumps(delegate_accounts, indent=4))

    print('Delegate analysis complete!')
15 changes: 15 additions & 0 deletions block/block/extractor/__init__.py
@@ -0,0 +1,15 @@
from block.extractor.state import XYMStateMap
from block.extractor.util import encode_address, fmt_unpack, public_key_to_address

__all__ = [
    'state',
    'format',
    'util',
    'statements',
    'body',
    'process',
    'XYMStateMap',
    'fmt_unpack',
    'encode_address',
    'public_key_to_address'
]