feat: adds libraries for data processing #28

0x6861746366574 · 2022-05-09T23:06:33Z

This pull request carves off individual portions of the extractor tool into their own Python modules for better maintainability.

This includes:

- extractor/extract: for pulling raw block and statement data from .blk files.
- extractor/process: for streaming data output from /extract into block headers and chain states.
- delegates/find_delegates: for quickly searching for delegates associated with one or more nodes during a specific period of time.
- harvester/get_harvester_stats: for collecting harvesting data of an individual node.
- nft/nember_extract: for extracting NFT descriptions and transactions related to NEMberArt NFTs.
- nft/nember_scrape: for pulling NEMberArt transactions directly from API nodes.

# Conflicts: # block/block/extractor/extract.py # block/block/extractor/process.py

gimre-xymcity

general comments:

we usually don't abbreviate names when there's no need, so:
- account rather than acc,
- transaction rather than tx (although I do this one all the time as well),
- recipient rather than rcpt,
- balance_series/fee_series rather than b_series/f_series

gimre-xymcity · 2022-05-20T14:08:53Z

README.md

+
+_finds current delegates associated with one or more nodes using serialized state data_
+
+This script requires a JSON containing accounts similar to what is received from the /node/info API endpoint; see example in `resources/accounts.json`.


this file is not present (?)

gimre-xymcity · 2022-05-20T14:20:38Z

block/block/extractor/extract.py

+
+    block_format_pattern = re.compile('[0-9]{5}'+args.block_extension)
+    block_paths = glob.glob(os.path.join(args.input, '**', '*'+args.block_extension), recursive=True)
+    block_paths = tqdm(sorted(list(filter(lambda x: block_format_pattern.match(os.path.basename(x)), block_paths))))


you're getting extra points for tqdm ;), heck we should be throwing it everywhere
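
For reference, a stdlib-only sketch of the same path filtering. It assumes block files are named with exactly five digits followed by the extension (for example `00001.blk`); `find_block_paths` is a hypothetical helper, not something in this PR. Unlike the diff above, it escapes the extension so the `.` is matched literally and uses `fullmatch` instead of `match`, which is slightly stricter. The returned list could still be wrapped in `tqdm(...)`.

```python
import re
from pathlib import Path


def find_block_paths(input_dir, block_extension='.blk'):
    """Return sorted paths of files named <5 digits><extension> under input_dir.

    Hypothetical helper for illustration; not part of the PR.
    """
    # re.escape keeps the '.' in the extension from matching any character
    block_format_pattern = re.compile('[0-9]{5}' + re.escape(block_extension))

    # rglob walks input_dir recursively, like glob.glob(..., recursive=True)
    candidates = Path(input_dir).rglob('*' + block_extension)
    return sorted(
        str(path) for path in candidates
        if block_format_pattern.fullmatch(path.name)
    )
```

Calling `find_block_paths(args.input, args.block_extension)` would then give a sorted list with a known length, so a progress bar can show a total.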

README.md

gimre-xymcity · 2022-05-20T15:02:13Z

block/block/extractor/extract.py

+    parser.add_argument('--block_save_path', type=str, default='block_data.msgpack', help='file to write the extracted block data to')
+    parser.add_argument('--statement_save_path', type=str, default='stmt_data.msgpack', help='file to write extracted statement data to')
+    parser.add_argument('--state_save_path', type=str, default='state_map.msgpack', help='file to write the extracted chain state data to')


not sure, but would probably drop those options and use hardcoded filenames - given that you can set output directory

(or could hide like this https://stackoverflow.com/questions/37303960/show-hidden-option-using-argparse)
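
A minimal sketch of the hiding approach, assuming the options are kept but left out of `--help`. `build_parser` and the `--output` option name are hypothetical stand-ins for whatever the script actually uses; `help=argparse.SUPPRESS` is the standard argparse mechanism for this.

```python
import argparse


def build_parser():
    """Build an illustrative parser; option names other than --block_save_path are assumptions."""
    parser = argparse.ArgumentParser(description='extract block data (illustrative parser only)')
    parser.add_argument('--output', type=str, default='.', help='directory to write extracted data to')

    # help=argparse.SUPPRESS keeps the option usable on the command line
    # but omits it from the usage line and the --help listing
    parser.add_argument('--block_save_path', type=str, default='block_data.msgpack', help=argparse.SUPPRESS)
    return parser
```

With this, `--block_save_path custom.msgpack` is still accepted, but casual users only see the output directory option.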

gimre-xymcity · 2022-05-20T15:14:48Z

block/block/extractor/process.py

+    for chunk in tx_chunks:
+        filtered.append(filter_transactions(chunk, address, tx_types, start_datetime, end_datetime))
+    return pd.concat(filtered, axis=0)
+


none of process_tx_file, filter_transactions, guarded_convert are used here, so would move to some other file?

gimre-xymcity · 2022-05-20T20:32:49Z

README.md

@@ -51,6 +51,109 @@ Example: check accounts in `account/samples/verify_ownership.yaml`.
 python -m account.verify_ownership --input account/samples/verify_ownership.yaml
 ```

+## block
+
+Running block extraction scripts requires the installation of the local **block** package. This can be accomplished as follows:


btw, it should already be possible (assuming you pip install all the requirements files) to run the tools like this:
PYTHONPATH=. python3 block/delegates/find_delegates.py

block/block/nft/nember_scrape.py

gimre-xymcity · 2022-05-20T20:39:29Z

block/block/extractor/process.py

+        escapechar='\\',
+        quoting=csv.QUOTE_MINIMAL)
+
+    unpacker = msgpack.Unpacker(open(args.input, 'rb'), unicode_errors=None, raw=True)


is unicode_errors actually needed? the msgpack docs have this scary warning:

This option should be used only when you have msgpack data which contains invalid UTF-8 string.

gimre-xymcity · 2022-05-20T20:40:51Z

block/block/nft/nember_extract.py

+    # pylint: disable=too-many-nested-blocks, too-many-branches
+
+    with open(args.input, 'rb') as file:
+        blocks = msgpack.unpack(file, unicode_errors=None, raw=True)


I've tried running extract, but it fails for me during unpack - not sure what I did wrong:

Traceback (most recent call last):
  File "block/nft/nember_extract.py", line 90, in <module>
    main(parsed_args)
  File "block/nft/nember_extract.py", line 22, in main
    blocks = msgpack.unpack(file, unicode_errors=None, raw=True)
  File "/usr/local/lib/python3.8/dist-packages/msgpack/__init__.py", line 58, in unpack
    return unpackb(data, **kwargs)
  File "msgpack/_unpacker.pyx", line 208, in msgpack._unpacker.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

gimre-xymcity · 2022-05-20T20:46:30Z

block/block/nft/nember_extract.py

+                    gen_tx = gen_tx[0]
+                    meta_tx = meta_tx[0]
+                    supply_tx = supply_tx[0]


we usually don't reuse variables like this.
things like blocks = sorted(blocks) are fine, since the type does not change.

this one changes from an array to a single entity (all three: gen_tx, meta_tx, supply_tx)
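
One way to avoid the reuse, sketched with made-up data. The plural names (`gen_txs`, `meta_txs`, `supply_txs`) and the dictionary contents are hypothetical and only illustrate the naming; the actual lookups in nember_extract.py are not shown here.

```python
# hypothetical lookup results: each is a list expected to hold a single match
gen_txs = [{'name': 'gen', 'height': 100}]
meta_txs = [{'name': 'meta', 'height': 101}]
supply_txs = [{'name': 'supply', 'height': 102}]

# plural names keep the lists and singular names hold the single entity,
# so no variable changes type partway through the function
gen_tx = gen_txs[0]
meta_tx = meta_txs[0]
supply_tx = supply_txs[0]
```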

0x6861746366574 requested a review from Jaguar0625 May 9, 2022 23:13

Jaguar0625 requested a review from gimre-xymcity May 9, 2022 23:16

0x6861746366574 added 11 commits May 10, 2022 19:31

merged updated block extractor library from local misc package

813b9c9

merged block delegates lib from local misc package

1373e98

merged block harvester lib from local misc package

7366515

merged nember nft lib from local misc package

8564989

import fix for block extractor modules

1546048

removed integration test stubs for unit test development

0df377c

updated gitignore and README to reflect block library additions

dd80570

added unit tests for module-level functions in process.py

c63eef0

added integration tests for extractor and processor

e970466

added large fixtures to enable full extractor integration tests

a4d23d3

minor formatting changes and lint cleanup

7cde8ea

0x6861746366574 force-pushed the block_lib branch from 6c77252 to 7cde8ea Compare May 11, 2022 02:54

0x6861746366574 and others added 7 commits May 10, 2022 22:12

added dev requirements file

ddce65c

suppress too many locals, branches, statements

4aa6ebc

# Conflicts: # block/block/extractor/extract.py # block/block/extractor/process.py

missing packages

262215c

fix generators inside any

eff24ff

missing encodings + silence consider-using-with

b82ea1f

disable similarities checker under block dir

479d6b4

erm, fixed wrong order

3091831

gimre-xymcity reviewed May 20, 2022

refactored variable names and argument parsers; added example account…

8e2f1f9

…s.json using old NGL nodes

0x6861746366574 requested a review from gimre-xymcity May 31, 2022 18:13

gimre-xymcity approved these changes Nov 30, 2022
