feat: adds libraries for data processing #28
base: main
Conversation
Force-pushed from 6c77252 to 7cde8ea

```
# Conflicts:
#	block/block/extractor/extract.py
#	block/block/extractor/process.py
```
General comments:

- we usually do not shortcut things if there's no need, so: `account` rather than `acc`, `transaction` rather than `tx`
- although I do this one all the time as well: `rcpt` -> `recipient`, `b_series`/`f_series` -> `balance_series`/`fee_series`
> _finds current delegates associated with one or more nodes using serialized state data_
>
> This script requires a JSON containing accounts similar to what is received from the `/node/info` API endpoint; see the example in `resources/accounts.json`.
this file is not present (?)
```python
block_format_pattern = re.compile('[0-9]{5}' + args.block_extension)
block_paths = glob.glob(os.path.join(args.input, '**', '*' + args.block_extension), recursive=True)
block_paths = tqdm(sorted(list(filter(lambda x: block_format_pattern.match(os.path.basename(x)), block_paths))))
```
you're getting extra points for tqdm ;), heck we should be throwing it everywhere
block/block/extractor/extract.py
Outdated
```python
parser.add_argument('--block_save_path', type=str, default='block_data.msgpack', help='file to write the extracted block data to')
parser.add_argument('--statement_save_path', type=str, default='stmt_data.msgpack', help='file to write extracted statement data to')
parser.add_argument('--state_save_path', type=str, default='state_map.msgpack', help='file to write the extracted chain state data to')
```
not sure, but I would probably drop those options and use hardcoded filenames, given that you can already set the output directory
(or you could hide them like this: https://stackoverflow.com/questions/37303960/show-hidden-option-using-argparse)
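The hiding approach from that link can be sketched roughly as follows, using the standard library's `argparse.SUPPRESS` (a minimal sketch; the `--output` option and the parsed argument values are illustrative, not from this PR):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--output', type=str, default='.', help='output directory')
# hidden option: still parsed and still has its default,
# but omitted from the --help listing
parser.add_argument('--block_save_path', type=str,
                    default='block_data.msgpack', help=argparse.SUPPRESS)

args = parser.parse_args(['--output', 'extracted'])
print(args.block_save_path)  # default still applies even though hidden
```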
```python
for chunk in tx_chunks:
    filtered.append(filter_transactions(chunk, address, tx_types, start_datetime, end_datetime))
return pd.concat(filtered, axis=0)
```
none of `process_tx_file`, `filter_transactions`, `guarded_convert` are used here, so would move to some other file?
````diff
@@ -51,6 +51,109 @@ Example: check accounts in `account/samples/verify_ownership.yaml`.
 python -m account.verify_ownership --input account/samples/verify_ownership.yaml
 ```

+## block
+
+Running block extraction scripts requires the installation of the local **block** package. This can be accomplished as follows:
````
btw, it should be possible (and it already is, assuming you pip install all requirements files) to run the tools like:

```shell
PYTHONPATH=. python3 block/delegates/find_delegates.py
```
```python
                escapechar='\\',
                quoting=csv.QUOTE_MINIMAL)

unpacker = msgpack.Unpacker(open(args.input, 'rb'), unicode_errors=None, raw=True)
```
is `unicode_errors` actually needed? The msgpack docs have this scary warning:

> This option should be used only when you have msgpack data which contains invalid UTF-8 string.
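For well-formed UTF-8 payloads the defaults should suffice; a small round-trip sketch (synthetic data, not from this repo) showing that `raw=True` alone controls whether strings come back as bytes, with no `unicode_errors` involved:

```python
import msgpack

packed = msgpack.packb({'hash': 'abc123'})

# raw=True: strings stay as undecoded bytes
as_bytes = msgpack.unpackb(packed, raw=True)
# default: strings are decoded as UTF-8
as_str = msgpack.unpackb(packed)

print(as_bytes)  # {b'hash': b'abc123'}
print(as_str)    # {'hash': 'abc123'}
```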
```python
# pylint: disable=too-many-nested-blocks, too-many-branches

with open(args.input, 'rb') as file:
    blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
```
I've tried running extract, but it fails for me during unpack - not sure what I did wrong:

```
Traceback (most recent call last):
  File "block/nft/nember_extract.py", line 90, in <module>
    main(parsed_args)
  File "block/nft/nember_extract.py", line 22, in main
    blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
  File "/usr/local/lib/python3.8/dist-packages/msgpack/__init__.py", line 58, in unpack
    return unpackb(data, **kwargs)
  File "msgpack/_unpacker.pyx", line 208, in msgpack._unpacker.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
```
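For context on this error: `ExtraData` is what `msgpack.unpack` raises when the file holds more than one top-level object, since it expects exactly one. A minimal sketch (synthetic data, not this repo's block format) of reading such a multi-object stream with `msgpack.Unpacker`, as the `Unpacker` usage elsewhere in this PR does:

```python
import io
import msgpack

# a stream containing two serialized objects back-to-back
buf = io.BytesIO()
buf.write(msgpack.packb([1, 2]))
buf.write(msgpack.packb([3, 4]))
buf.seek(0)

# msgpack.unpack(buf) would raise ExtraData after the first object;
# Unpacker iterates over every object in the stream instead
blocks = list(msgpack.Unpacker(buf, raw=True))
print(blocks)  # [[1, 2], [3, 4]]
```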
block/block/nft/nember_extract.py
Outdated
```python
gen_tx = gen_tx[0]
meta_tx = meta_tx[0]
supply_tx = supply_tx[0]
```
we usually don't reuse variables like this.

Things like `blocks = sorted(blocks)` are fine - the type does not change.

This one changes from array to single entity (all 3: `gen_tx`, `meta_tx`, `supply_tx`)
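The suggestion can be sketched as follows (illustrative placeholder data; the plural name `gen_txs` is a hypothetical rename, not from the PR):

```python
# a list of matching transactions (illustrative placeholder data)
gen_txs = [{'type': 'genesis'}]

# discouraged: reusing the name changes its type from list to dict
# gen_txs = gen_txs[0]

# preferred: a distinct name keeps each variable's type stable
gen_tx = gen_txs[0]
print(gen_tx['type'])  # genesis
```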
…s.json using old NGL nodes
This pull request carves off individual portions of the extractor tool into their own python modules for better maintainability.

This includes:

- `extractor/extract`: for pulling raw block and statement data from .blk files.
- `extractor/process`: for streaming data output from `/extract` into block headers and chain states.
- `delegates/find_delegates`: for quickly searching for delegates associated with one or more nodes during a specific period of time.
- `harvester/get_harvester_stats`: for collecting harvesting data of an individual node.
- `nft/nember_extract`: for extracting NFT descriptions and transactions related to NEMberArt NFTs.
- `nft/nember_scrape`: for pulling NEMberArt transactions directly from API nodes.