-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser #63
base: parser
Are you sure you want to change the base?
Parser #63
Conversation
Isolated parser service. Made other function schemas in other services compatible with parser function schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall - Round 1
- Changes to
metrics/functionchurn/functionchurn/schemas/parser.py
andmetrics/loc/loc/schemas/parser.py
must not be a part of this pull request. - Run your changes through
pylint
using the attached.pylintrc
configuration file. - Use
find
orfindall
from the ElementTree API to walk the XML tree generated by SrcML. - Where did you get the list of library functions, streams, and reserved words of C in
parser/constants.py
from? Add a documentation comment to the beginning of the file with some context on the source of these lists. - What about the list of library functions, streams, and reserved words for C++?
- Enhance
parser.models.Function
model to include attributes being introduced. Model defines the data structure and schema makes it easier to serialize and deserialize the structured data. - Use
_get_span(element)
to get the position of an element. - [nit] Follow conventions from How to Write a Git Commit Message.
- [nit] Use
'
(single quote) instead of"
(double quote) when defining strings. - [nit] Add a whitespace after the comma when defining list elements. For example, do
a = ['one', 'two', 'three']
and nota = ['one','two','three']
.
@rpc | ||
def get_language(self, name): | ||
return self.inferer.infer(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the get_language
service method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to get a better understanding of nameko I made my own function that gets the language to test the service out. However, I forgot to remove it.
parser/parser/service.py
Outdated
@@ -2,6 +2,7 @@ | |||
|
|||
from nameko.dependency_providers import Config | |||
from nameko.rpc import rpc | |||
from nameko.testing.services import worker_factory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this import left here by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I forgot to remove it when trying to write tests for the parser service
parser/parser/constants.py
Outdated
"long", "signed", "switch", "void", "else", "for", "case", | ||
"default", "register", "sizeof", "typedef", "volatile", | ||
"enum", "goto", "char", "do", "return", "static", | ||
"union", "while", "extern", "if"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always end text files (including source files) with a newline character. If you are using VS Code, set files.insertFinalNewline
to true
in the settings.json
file.
See https://thoughtbot.com/blog/no-newline-at-end-of-file for more information and historical context for the convention.
parser/parser/parsers/__init__.py
Outdated
@@ -1,7 +1,7 @@ | |||
from .srcmlparser import SrcMLParser | |||
from ..exceptions import NoParser | |||
|
|||
PARSERS = {'C': SrcMLParser, 'C++': SrcMLParser} | |||
PARSERS = {'C': SrcMLParser, 'C++': SrcMLParser, 'Java': SrcMLParser} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SrcMLParser
, as it stands today, has not been explicitly tested to parse Java source yet. Issue #23 has been specifically created to extend the capabilities of the parser
beyond C and C++. As a result, this change must be undone.
parser/parser/schemas.py
Outdated
|
||
parent_structure_name = fields.String(allow_none=True) | ||
parent_structure_type = fields.String(allow_none=True) | ||
file_name = fields.String(allow_none=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this field necessary since the name of the file is the same that that with which service method was invoked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not necessary since the file name is being passed in through the service method; that field is removed in the latest commit.
parser/parser/parsers/srcmlparser.py
Outdated
@@ -101,9 +122,849 @@ def _get_srcml(contents, language): | |||
) | |||
return process.stdout | |||
except subprocess.CalledProcessError as error: | |||
logger.exception(error) | |||
print(error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logger.exception
, not print
parser/parser/parsers/srcmlparser.py
Outdated
|
||
function_dict = {} | ||
for child in list(root_element): | ||
if re.search(rf"{{{SRC_NS}}}class|struct|namespace|unit", child.tag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a unit
? The SrcML documentation on elements does not include an element of type <unit>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the parent element for all the other elements that SrcMl returns. SrcML Unit
parser/parser/parsers/srcmlparser.py
Outdated
name_txt = name.text | ||
else: | ||
for n_txt in name.itertext(): | ||
name_txt += n_txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use str.join()
instead of +=
when concatenating strings.
Context
Strings, in most languages (including Python), are immutable; so, adding two strings using 'foo' + 'bar'
will result in the creation of three strings—'foo'
, 'bar'
, and 'foobar'
—but 'foo'
and 'bar'
are discarded after 'foobar'
is created. Each time +=
is used, a new string object is created with all characters from the original string copied over which can get incredibly inefficient as the string that you appending to gets larger.
However, Python's implementation of +=
for strings is reportedly efficient but only in the Cython implementation of the Python specification. If we were to switch over from Cython to PyPy, we would have to deal with the inefficient (albeit, according to the specification) implementation of +=
for strings. PEP8 recommends:
Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).
For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
parser/parser/parsers/srcmlparser.py
Outdated
def _get_name_from_nested_name(name): | ||
if name is not None and name.text is not None: | ||
return name | ||
else: | ||
next_elname = name.find(rf"{{{SRC_NS}}}name") | ||
if next_elname is None: | ||
return next_elname | ||
|
||
return _get_name_from_nested_name(next_elname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is recursion necessary? Recursion always contributes to an increased space with the use of a stack to persist the recursive calls. The increased space is inconsequential in this context but you must be aware of this overhead.
Would the following implementation of the function work?
def _get_name_from_nested_name(name):
curr, prev = name, None
while curr is not None:
prev, curr = curr, curr.find(rf'{{{SRC_NS}}}name')
return prev
parser/parser/parsers/srcmlparser.py
Outdated
if name is not None and name.text is not None: | ||
return name | ||
else: | ||
next_elname = name.find(rf"{{{SRC_NS}}}name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If name
is None
, the control would flow to the else
and name.find
will raise a AttributeError
.
This commit implements the changes suggested by the reviewer, Nuthan Munaiah, in the pull request for this branch to merge into the development branch. This mainly consists of minor improvements in code efficiency, sources of constant values, and a stronger enforcecment of pylint rules. The exact improvements can be seen in the conversations had on the parser branch pull request.
The original additions to the function schema and function model made it so that other metrics using the parser function schema were not compatible with the newly added fields. Without an extra schema, we would have to modify the function schemas of other metrics that rely on the parser service. Other changes include removing the functionality that parses macro calls in C due to the limitations of srcml, and doing away with the FunctionCollector class and moving the functions from the FunctionCollector class into the SrcMlParser class. It didn't make sense keeping two separate classes that serve a similar purpose when all it takes is one class to manage a single state.
Added functions to the parser service that can be used to calculate the fanin, fanout, and npath metrics from contents in a file.