Clarifications about which Joern output nodes are being used #9

davidhin · 2021-05-11T02:36:11Z

In the following code:

ReVeal/data_processing/create_ggnn_data.py

Lines 301 to 303 in bef6c92

    
    cfgNode = node['isCFGNode'].strip()
    if not cfg_only and (cfgNode == '' or cfgNode == 'False'):
        continue

It seems like you're only using CFG nodes from Joern's output, and discarding the rest. Is this correct?

davidhin · 2021-05-12T22:27:02Z

Upon closer inspection, the cfg_only boolean appears to be the reverse of what it should be. We would expect that:

if cfg_only is true, we keep only CFG nodes.
if cfg_only is false, we keep non-CFG nodes as well.

However, because of the not cfg_only condition, the code actually does the reverse. Is this intended behaviour? As a result, graph_input_full is missing many nodes, since it keeps only CFG nodes, which is not what "full" suggests.

ReVeal/data_processing/create_ggnn_data.py

Lines 411 to 412 in bef6c92

    
    graph_input_full = inputGeneration(
        nodes_path, edges_path, label, model, edgeType_full, False)

To check this, compare the output of the above call with cfg_only = True against the output with cfg_only = False.
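The comparison can be reproduced without running the full pipeline. Below is a minimal sketch that replays only the condition from lines 301 to 303 on a few toy nodes. The helper name kept_node_ids, the 'key' field, and the toy node values are assumptions made for illustration; they are not taken from the repository or from real Joern output.

```python
def kept_node_ids(nodes, cfg_only):
    """Replay the filter from lines 301 to 303 and return the ids of the kept nodes.

    `nodes` is a list of dicts with an 'isCFGNode' string field. The 'key'
    field is a hypothetical id used only for this illustration.
    """
    kept = []
    for node in nodes:
        cfgNode = node['isCFGNode'].strip()
        # Same condition as in create_ggnn_data.py: non-CFG nodes are skipped
        # only when cfg_only is False.
        if not cfg_only and (cfgNode == '' or cfgNode == 'False'):
            continue
        kept.append(node['key'])
    return kept


# Hypothetical toy nodes, not real Joern output.
toy_nodes = [
    {'key': '1', 'isCFGNode': 'True'},
    {'key': '2', 'isCFGNode': 'False'},
    {'key': '3', 'isCFGNode': ''},
    {'key': '4', 'isCFGNode': 'True '},
]

print(kept_node_ids(toy_nodes, cfg_only=False))  # ['1', '4']
print(kept_node_ids(toy_nodes, cfg_only=True))   # ['1', '2', '3', '4']
```

With cfg_only = False, which is the value passed for graph_input_full, only the CFG nodes survive; with cfg_only = True, every node is kept.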

tl;dr: which ggnn input should we generate to replicate the results? In the provided data, for example, data/ggnn_input/devign has cfg, cfg_dfg, and dfg, i.e. no AST. But in the paper, you mention using the CPG, which includes the AST.

for-just-we · 2021-08-02T02:32:57Z

what

I presume he uses the Python clang API to parse each statement (node) in the CFG into an AST, because the AST nodes are to some extent independent of the CFG and PDG edges. I don't know whether my assumption is right.
