192 Improve `get_code_dependency` (Code Parser) #201

m7pr · 2023-11-23T13:30:31Z

Closes #192

Overview

The Code Parser feature accepts code in text form and outputs all necessary code to recreate any object from the original input.

If the code contains side effects that do not explicitly name the object(s) they influence, users can add a comment with the tag `# @linksto object_name1 object_name2` to identify the affected objects. This ensures that the side effects are included when generating the code needed to create those objects. This is also why the input must be a character rather than an expression or language object: comments are preserved in character input and dropped from expression and language objects.
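A minimal sketch of the comment-preservation point, using only base R. The `code` string is a made-up example, not taken from the package tests.

```r
# Hypothetical input: a side effect tagged with `# @linksto`.
code <- "set.seed(1) # @linksto x\nx <- rnorm(3)"

# Parsing character input with keep.source = TRUE keeps comments as COMMENT tokens.
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
comments <- pd$text[pd$token == "COMMENT"]

# Once the code is an expression without source references, the comment is gone:
# deparsing each call gives back only the calls themselves.
expr <- parse(text = code, keep.source = FALSE)
deparsed <- vapply(expr, function(e) paste(deparse(e), collapse = ""), character(1))

print(comments)
print(deparsed)
```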

Historical Background

The CodeDepends package was evaluated to check if it provided the features required to accomplish our goals (#238).
- However, CodeDepends was found to have excessive dependencies and was deemed more than what we needed.
- Consequently, we developed the current solution, named Code Parser.
  - It is based on the graph dependency concept in CodeDepends and uses utils::getParseData.
  - utils::getParseData relies on source references (srcref) and returns a data.frame.
  - srcref is created by calling attr(parse(text = code, keep.source = TRUE), 'srcref').
  - That is why code is limited to a character or an expression with a srcref attribute.
The Code Parser was initially introduced in teal.code (PR #146), designed to take the qenv class as input.
Subsequently, we adopted a more general approach where the input is assumed to be character. This allows passing teal_data with code in the teal_data@code slot, which led to the migration of this functionality to teal.data (PR #194).
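To illustrate the srcref requirement mentioned above, here is a small base R sketch. The `code` string is a hypothetical example.

```r
code <- "a <- 1\nb <- a + 1"

with_src <- parse(text = code, keep.source = TRUE)
without_src <- parse(text = code, keep.source = FALSE)

# The srcref attribute holds one source reference per top-level call.
n_srcref <- length(attr(with_src, "srcref"))

# With source references, getParseData returns a data.frame of tokens.
has_parse_data <- is.data.frame(utils::getParseData(with_src))

# Without source references there is no parse data to work with.
no_parse_data <- is.null(utils::getParseData(without_src))

print(c(n_srcref = n_srcref, has_parse_data = has_parse_data, no_parse_data = no_parse_data))
```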

Implementation Plan

Code Parser consists of:

Code Graph - The code_dependency / code_graph function constructs the structure of dependencies between objects and their occurrences in specific calls derived from the code.
Graph Parser - The get_code_dependency / get_object_code function takes a code graph and the names of objects that exist in the code as input. It returns the code, including all necessary dependencies, needed to recreate the specified objects and their influencers.

Pseudo code / algorithm

Code Graph

Take code as input (a character or an expression with a srcref attribute).
Pass the code to utils::getParseData to extract information about the parsed code using built-in functions.
utils::getParseData creates a data.frame structure (pd) enumerating each call and each object/symbol within the calls. Each object/symbol has token metadata specifying how it is treated by R (e.g., SYMBOL, LEFT_ASSIGN, SYMBOL_FUNCTION_CALL, SYMBOL_FORMALS).
Thanks to pd, we can bind the elements of every call in the input code into a list (calls_pd) whose length equals the number of calls.
Within each call, we extract objects by their token metadata: we look for SYMBOL and SYMBOL_FUNCTION_CALL tokens and grep for ASSIGN operators to determine which object is influenced by which other objects in that call.
We also check for COMMENT tokens that contain the @linksto tag to determine whether some calls should be assigned as influencers of other objects.
With the above information, we need to build a structure that presents:
- which objects exist in which calls
- which objects influence other objects and in which calls
- which side-effects influence which objects in which calls
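The Code Graph steps above could be sketched roughly as follows. This is not the implementation from this PR: `build_code_graph` and the `affects` / `uses` fields are hypothetical names, the sketch assumes every top-level call starts on its own line, and it only handles `<-` and `=` assignments to a plain symbol.

```r
# Hypothetical helper: one list element per top-level call.
build_code_graph <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  # Top-level calls are the non-terminal rows without a parent.
  top <- pd[pd$parent == 0 & !pd$terminal, ]
  top <- top[order(top$line1), ]
  lapply(seq_len(nrow(top)), function(i) {
    # Terminal tokens (including trailing comments) on the lines of this call.
    tokens <- pd[pd$terminal & pd$line1 >= top$line1[i] & pd$line1 <= top$line2[i], ]
    tokens <- tokens[order(tokens$line1, tokens$col1), ]
    is_sym <- tokens$token %in% c("SYMBOL", "SYMBOL_FUNCTION_CALL")
    assign_at <- which(tokens$token %in% c("LEFT_ASSIGN", "EQ_ASSIGN"))
    if (length(assign_at) > 0) {
      before <- seq_len(nrow(tokens)) < assign_at[1]
      target <- head(tokens$text[is_sym & before], 1)
      uses <- tokens$text[is_sym & !before]
    } else {
      target <- character(0)
      uses <- tokens$text[is_sym]
    }
    # Object names listed after `@linksto` in a comment of this call.
    tagged <- grep("@linksto", tokens$text[tokens$token == "COMMENT"], value = TRUE)
    linked <- as.character(unlist(strsplit(trimws(sub(".*@linksto", "", tagged)), "\\s+")))
    list(affects = c(target, linked), uses = unique(uses))
  })
}

code <- "a <- 1\nset.seed(1) # @linksto b\nb <- a + rnorm(1)\nd <- 2"
graph <- build_code_graph(code)
str(graph)
```

In this sketch the `set.seed(1)` call ends up with `affects = "b"`, so a later lookup for `b` can pick it up even though the call contains no assignment.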

⚠️ the above structure and its creation is a part of this PR which simplifies the current implementation

Graph Parser

Given the Code Graph, we take an object name (or multiple names) and:

(1) look for the call in which this object was created (call X),
(2) limit the input calls_pd to the calls up to call X (calls_pd_x),
(3) identify all influencers and side effects of the object in calls_pd_x.
We repeat (1)-(3) for the influencers found in (3) with a new calls_pd_x, until none of the considered objects has further influencers.
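A rough sketch of this traversal, under the assumption that the graph is a list with one element per call holding hypothetical `affects` and `uses` fields. The function name `calls_for` is made up for illustration and the PR's actual functions may differ.

```r
# Hand-written graph for the calls:
#   a <- 1
#   set.seed(1) # @linksto b
#   b <- a + rnorm(1)
#   d <- 2
graph <- list(
  list(affects = "a", uses = character(0)),
  list(affects = "b", uses = "set.seed"),
  list(affects = "b", uses = c("a", "rnorm")),
  list(affects = "d", uses = character(0))
)

# Return the indices of the calls needed to recreate `name`.
calls_for <- function(graph, name, upto = length(graph)) {
  ids <- seq_len(upto)
  # Calls, among the first `upto`, that create `name` or are linked to it.
  hits <- ids[vapply(graph[ids], function(node) name %in% node$affects, logical(1))]
  found <- hits
  for (id in hits) {
    # Influencers must come from calls that precede the call using them.
    for (dep in graph[[id]]$uses) {
      found <- c(found, calls_for(graph, dep, upto = id - 1))
    }
  }
  sort(unique(found))
}

print(calls_for(graph, "b"))
print(calls_for(graph, "d"))
```

Because each recursive step only looks at calls strictly before the current one, the search terminates even when an object is redefined in terms of itself.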

⚠️ simplifying the above process is also a part of this PR, since side_effects are detected by Code Parser and could be detected by Code Graph
⚠️ the above process will also be simplified in this PR, as it is based on both object names and call indices but could be merged into a process that uses only one of the two

Notes

The relationship between objects is assumed to be conveyed through the <-, =, or -> assignment operators. No other object creation methods (such as assign, <<-, or any non-standard-evaluation method) are supported directly; such cases are handled with the # @linksto tag.
We do not assume any non-standard operations or evaluation in data processing code; however, if someone needs to create objects by means other than the assignment operators, the # @linksto tag is meant for that.
Any side effect that should be returned with a specific object should be tagged with # @linksto at the end of the line where the side effect is created.
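As an illustration of these notes, data preparation code tagged this way might look like the following. The object names `rows` and `iris_sample` are made up for the example.

```r
# A side effect that should be returned together with `rows`.
set.seed(123) # @linksto rows
# `assign()` creates `rows` without an assignment operator, so the call is tagged.
assign("rows", sample(nrow(iris), 5)) # @linksto rows
# A regular assignment needs no tag: `iris_sample` depends on `rows` and `iris`.
iris_sample <- iris[rows, ]
```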

m7pr · 2023-11-23T16:26:38Z

Pushed a small alternative 3224a48 but this is not finished yet

m7pr · 2023-11-27T14:53:29Z

Hey @insightsengineering/nest-core-dev, I prepared a curated version of Code Parser. I would appreciate your review!
For now there are a few utility functions, like assert_classes, assert_code, assert_names or is_empty, that are here just to make the code review and code readability easier from a high-level perspective. Those should be incorporated into the main get_code_dependencies function. I also divided code_graph into 2 smaller functions, extract_occurence and extract_side_effects, so that we can track pieces of code responsible for a single purpose. There are also a couple of smaller functions that make the code review easier and allow naming pieces of the code. This made it easier to move things around in the prototyping phase, and I thought it would still be helpful on the review side.

If anybody is willing to dive deeper into the code, I think the biggest help I need is simplifying the extract_occurence function and writing more edge-case tests.

Lastly, I decided not to export the get_code_dependencies function, as it is limited to our cases of simple data preparation and I don't think it will have wider application to more sophisticated R code.

comments addressed

gogonzo · 2023-12-05T14:45:43Z

Good job @m7pr @chlebowa and Me :D

m7pr · 2023-12-11T08:13:50Z

Hey @chlebowa thanks for the final review and a huge documentation cleanup! You are da man! Thanks @gogonzo for all the feedback related to implementation. It looks like code parser is way simpler than what we had in the first attempt.

Empty-Commit

b1cf8e7

m7pr added the core label Nov 23, 2023

m7pr added 2 commits November 23, 2023 15:53

rename object_names to names

3e3da2e

alternative approach for code parser

3224a48

Merge branch 'refactor' into 192_improve_code_parser@main

3942498

m7pr changed the base branch from main to refactor November 24, 2023 11:12

m7pr added 20 commits November 24, 2023 12:31

rewerite tests from get_code_dependency into get_code(datanames)

d5566e2

rename test file

50f9de1

add assertion functions

acddea9

skip parents in influencers detection

ea4d726

include elements of code_graph

cc20126

curate code_parser

5c19b7b

one more thing to be fixed in which(influencers_deps) : argument to 'which'

cecdc15

rename append to prepend

3f64511

add documentation

6ab563d

reorder

f02a6df

fix influencers ids extraction

0897296

remove old code_dependency

0b35554

mark tests to be fixed

02734bb

rename files

fb37a00

fix few more tests

a4c5003

fix multiple @linktso tags apperance

c77c956

last TODO: apply used_in_function in extract_occurence

58354be

fix cases for parameters used in functions

a26b24e

fix functions edge-case for code parser

efcaf23

create remove_graph_dyplicates function

9908e3c

m7pr marked this pull request as ready for review November 27, 2023 14:47

gogonzo self-assigned this Nov 28, 2023

m7pr and others added 14 commits December 1, 2023 15:06

Update R/utils-get_code_dependency.R

e459689

Co-authored-by: Aleksander Chlebowski <[email protected]> Signed-off-by: Marcin <[email protected]>

Update R/utils-get_code_dependency.R

4143e00

Co-authored-by: Aleksander Chlebowski <[email protected]> Signed-off-by: Marcin <[email protected]>

change notes to details

543e302

Merge branch '192_improve_code_parser@main' of https://github.com/insightsengineering/teal.data into 192_improve_code_parser@main

28a37c9

add a statement about get_code_dependency to get_code documentation

d8a1eaa

comment out 2 tests for now

28d22ad

change ":" to "<-"

91327a9

remove skip

0891a1f

add any(occurence) to a check for influencers

bb5bb39

bring back old fix_comments

07a000b

Signed-off-by: Marcin <[email protected]>

remove remove_graph_duplicates

5a5cfc2

Merge branch '192_improve_code_parser@main' of https://github.com/insightsengineering/teal.data into 192_improve_code_parser@main

98bd6e8

extend value in get_code_dependency

548b3b0

unskip 2 passing tests

652b0e1

m7pr requested a review from chlebowa December 1, 2023 15:26

Aleksander Chlebowski added 9 commits December 4, 2023 16:13

update documentation for get_code

ac95af8

update documentation for get_code_dependency

de012a5

linter

62321a3

update documentation for code_graph

2c98525

update documentation for extract_side_effects

96217c8

update documentation for graph_parser

00888de

rename influecer/affected to dependdependent//dependency

6be4aa3

tweak code

317665e

style code

7b01a2f

gogonzo merged commit f00cffe into refactor Dec 5, 2023

gogonzo deleted the 192_improve_code_parser@main branch December 5, 2023 14:45

m7pr mentioned this pull request Jan 24, 2024

[Question]: Should we have a separate vignette or at least a GitHub issue explaining implementation of get_code_dependency #278

Open
