Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds libraries for data processing #28

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

0x6861746366574
Copy link

This pull request carves off individual portions of the extractor tool into their own python modules for better maintainability.

This includes:

  • extractor/extract: for pulling raw block and statement data from .blk files.
  • extractor/process: for streaming data output from /extract into block headers and chain states.
  • delegates/find_delegates: for quickly searching for delegates associated with one or more nodes during a specific period of time.
  • harvester/get_harvester_stats: for collecting harvesting data of an individual node.
  • nft/nember_extract: for extracting NFT descriptions and transactions related to NEMberArt NFTs
  • nft/nember_scrape: for pulling NEMBerArt transactions directly from API nodes.

Copy link
Member

@gimre-xymcity gimre-xymcity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general comments:

  1. we usually do not shortcut things if there's no need, so:
    • account rather than acc,
    • transaction rather than tx - although I do this one all the time as well,
    • rcpt -> recipient,
    • b_series/f_series -> balance_series/fee_series)


_finds current delegates associated with one or more nodes using serialized state data_

This script requires a JSON containing accounts similar to what is receieved from the /node/info API endpoint; see example in `resources/accounts.json`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file is not present (?)


block_format_pattern = re.compile('[0-9]{5}'+args.block_extension)
block_paths = glob.glob(os.path.join(args.input, '**', '*'+args.block_extension), recursive=True)
block_paths = tqdm(sorted(list(filter(lambda x: block_format_pattern.match(os.path.basename(x)), block_paths))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're getting extra points for tqdm ;), heck we should be throwing it everywhere

README.md Show resolved Hide resolved
Comment on lines 219 to 221
parser.add_argument('--block_save_path', type=str, default='block_data.msgpack', help='file to write the extracted block data to')
parser.add_argument('--statement_save_path', type=str, default='stmt_data.msgpack', help='file to write extracted statement data to')
parser.add_argument('--state_save_path', type=str, default='state_map.msgpack', help='file to write the extracted chain state data to')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, but would probably drop those options and use hardcoded filenames - given that you can set output directory

(or could hide like this https://stackoverflow.com/questions/37303960/show-hidden-option-using-argparse)

for chunk in tx_chunks:
filtered.append(filter_transactions(chunk, address, tx_types, start_datetime, end_datetime))
return pd.concat(filtered, axis=0)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of process_tx_file, filter_transactions, guarded_convert are used here, so would move to some other file?

@@ -51,6 +51,109 @@ Example: check accounts in `account/samples/verify_ownership.yaml`.
python -m account.verify_ownership --input account/samples/verify_ownership.yaml
```

## block

Running block extraction scripts requires the installaton of the local **block** package. This can be accomplished as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, it should be possible (and it is already - assuming you pip install all requirement files), to run the tools like
PYTHONPATH=. python3 block/delegates/find_delegates.py

block/block/nft/nember_scrape.py Outdated Show resolved Hide resolved
escapechar='\\',
quoting=csv.QUOTE_MINIMAL)

unpacker = msgpack.Unpacker(open(args.input, 'rb'), unicode_errors=None, raw=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is unicode_errors actually needed? msgpack dock have this scary warning:

This option should be used only when you have msgpack data which contains invalid UTF-8 string.

# pylint: disable=too-many-nested-blocks, too-many-branches

with open(args.input, 'rb') as file:
blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried running extract, but it fails for me during unpack - not sure what I did wrong:

raceback (most recent call last):
  File "block/nft/nember_extract.py", line 90, in <module>
    main(parsed_args)
  File "block/nft/nember_extract.py", line 22, in main
    blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
  File "/usr/local/lib/python3.8/dist-packages/msgpack/__init__.py", line 58, in unpack
    return unpackb(data, **kwargs)
  File "msgpack/_unpacker.pyx", line 208, in msgpack._unpacker.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Comment on lines 41 to 43
gen_tx = gen_tx[0]
meta_tx = meta_tx[0]
supply_tx = supply_tx[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually don't reuse variables like this.
things like blocks = sorted(blocks) are fine - does not change the type

this one changes from array to single entity (all 3 gen_tx, meta_tx, supply_tx)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants