Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser #63

Open
wants to merge 4 commits into
base: parser
Choose a base branch
from
Open

Parser #63

wants to merge 4 commits into from

Conversation

BranBerAVI
Copy link

Added functions to the parser service that can be used to calculate the fanin, fanout, and npath metrics from contents in a file.

Isolated parser service. Made other function schemas in other services compatible with parser function schema
@nuthanmunaiah nuthanmunaiah self-requested a review June 7, 2021 18:10
@nuthanmunaiah nuthanmunaiah added the enhancement New feature or request label Jun 11, 2021
Copy link
Member

@nuthanmunaiah nuthanmunaiah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall - Round 1

  • Changes to metrics/functionchurn/functionchurn/schemas/parser.py and metrics/loc/loc/schemas/parser.py must not be a part of this pull request.
  • Run your changes through pylint using the attached .pylintrc configuration file.
  • Use find or findall from the ElementTree API to walk the XML tree generated by SrcML.
  • Where did you get the list of library functions, streams, and reserved words of C in parser/constants.py from? Add a documentation comment to the beginning of the file with some context on the source of these lists.
  • What about the list of library functions, streams, and reserved words for C++?
  • Enhance parser.models.Function model to include attributes being introduced. Model defines the data structure and schema makes it easier to serialize and deserialize the structured data.
  • Use _get_span(element) to get the position of an element.
  • [nit] Follow conventions from How to Write a Git Commit Message.
  • [nit] Use ' (single quote) instead of " (double quote) when defining strings.
  • [nit] Add a whitespace after the comma when defining list elements. For example, do a = ['one', 'two', 'three'] and not a = ['one','two','three'].

Comment on lines +61 to +63
@rpc
def get_language(self, name):
return self.inferer.infer(name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of the get_language service method?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to get a better understanding of nameko I made my own function that gets the language to test the service out. However, I forgot to remove it.

@@ -2,6 +2,7 @@

from nameko.dependency_providers import Config
from nameko.rpc import rpc
from nameko.testing.services import worker_factory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this import left here by mistake?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot to remove it when trying to write tests for the parser service

"long", "signed", "switch", "void", "else", "for", "case",
"default", "register", "sizeof", "typedef", "volatile",
"enum", "goto", "char", "do", "return", "static",
"union", "while", "extern", "if"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always end text files (including source files) with a newline character. If you are using VS Code, set files.insertFinalNewline to true in the settings.json file.

See https://thoughtbot.com/blog/no-newline-at-end-of-file for more information and historical context for the convention.

@@ -1,7 +1,7 @@
from .srcmlparser import SrcMLParser
from ..exceptions import NoParser

PARSERS = {'C': SrcMLParser, 'C++': SrcMLParser}
PARSERS = {'C': SrcMLParser, 'C++': SrcMLParser, 'Java': SrcMLParser}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SrcMLParser, as it stands today, has not been explicitly tested to parse Java source yet. Issue #23 has been specifically created to extend the capabilities of the parser beyond C and C++. As a result, this change must be undone.


parent_structure_name = fields.String(allow_none=True)
parent_structure_type = fields.String(allow_none=True)
file_name = fields.String(allow_none=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this field necessary since the name of the file is the same that that with which service method was invoked?

Copy link
Author

@BranBerAVI BranBerAVI Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not necessary since the file name is being passed in through the service method; that field is removed in the latest commit.

@@ -101,9 +122,849 @@ def _get_srcml(contents, language):
)
return process.stdout
except subprocess.CalledProcessError as error:
logger.exception(error)
print(error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logger.exception, not print


function_dict = {}
for child in list(root_element):
if re.search(rf"{{{SRC_NS}}}class|struct|namespace|unit", child.tag):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a unit? The SrcML documentation on elements does not include an element of type <unit>.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the parent element for all the other elements that SrcMl returns. SrcML Unit

name_txt = name.text
else:
for n_txt in name.itertext():
name_txt += n_txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use str.join() instead of += when concatenating strings.

Context

Strings, in most languages (including Python), are immutable; so, adding two strings using 'foo' + 'bar' will result in the creation of three strings—'foo', 'bar', and 'foobar'—but 'foo' and 'bar' are discarded after 'foobar' is created. Each time += is used, a new string object is created with all characters from the original string copied over which can get incredibly inefficient as the string that you appending to gets larger.

However, Python's implementation of += for strings is reportedly efficient but only in the Cython implementation of the Python specification. If we were to switch over from Cython to PyPy, we would have to deal with the inefficient (albeit, according to the specification) implementation of += for strings. PEP8 recommends:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

Comment on lines 129 to 137
def _get_name_from_nested_name(name):
if name is not None and name.text is not None:
return name
else:
next_elname = name.find(rf"{{{SRC_NS}}}name")
if next_elname is None:
return next_elname

return _get_name_from_nested_name(next_elname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is recursion necessary? Recursion always contributes to an increased space with the use of a stack to persist the recursive calls. The increased space is inconsequential in this context but you must be aware of this overhead.

Would the following implementation of the function work?

def _get_name_from_nested_name(name):
    curr, prev = name, None
    while curr is not None:
        prev, curr = curr, curr.find(rf'{{{SRC_NS}}}name')
    return prev

if name is not None and name.text is not None:
return name
else:
next_elname = name.find(rf"{{{SRC_NS}}}name")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If name is None, the control would flow to the else and name.find will raise a AttributeError.

This commit implements the changes suggested by the reviewer, Nuthan Munaiah, in the pull request for this branch to merge into the development branch. This mainly consists of minor improvements in code efficiency, sources of constant values, and a stronger enforcecment of pylint rules. The exact improvements can be seen in the conversations had on the parser branch pull request.
The original additions to the function schema and function model made it so that other metrics using the parser function schema were not compatible with the newly added fields. Without an extra schema, we would have to modify the function schemas of other metrics that rely on the parser service. Other changes include removing the functionality that parses macro calls in C due to the limitations of srcml, and doing away with the FunctionCollector class and moving the functions from the FunctionCollector class into the SrcMlParser class. It didn't make sense keeping two separate classes that serve a similar purpose when all it takes is one class to manage a single state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants