Use Lark with its cache feature, instead of creating a standalone parser #53

Merged 6 commits on Jan 13, 2023
Changes from 2 commits
52 changes: 9 additions & 43 deletions hcl2/parser.py
@@ -8,46 +8,12 @@

 from hcl2.transformer import DictTransformer

-PARSER_FILE = os.path.join(dirname(__file__), 'lark_parser.py')
-
-PARSER_FILE_TEMPLATE = """
-from lark import Lark
-
-DATA = (%s)
-MEMO = (%s)
-
-def Lark_StandAlone(**kwargs):
-    return Lark._load_from_dict(DATA, MEMO, **kwargs)
-"""
-
-
-def create_parser_file():
-    """
-    Parsing the Lark grammar takes about 0.5 seconds. In order to improve performance we can cache the parser
-    file. The below code caches the entire python file which is generated by Lark's standalone parser feature
-    See: https://github.com/lark-parser/lark/blob/master/lark/tools/standalone.py
-
-    Lark also supports serializing the parser config but the deserialize function did not work for me.
-    The lark state contains dicts with numbers as keys which is not supported by json so the serialized
-    state can't be written to a json file. Exporting to other file types would have required
-    adding additional dependencies or writing a lot more code. Lark's standalone parser
-    feature works great but it expects to be run as a separate shell command
-    The below code copies some of the standalone parser generator code in a way that we can use
-    """
-    lark_file = os.path.join(dirname(__file__), 'hcl2.lark')
-    with open(lark_file, 'r') as lark_file, open(PARSER_FILE, 'w') as parser_file:
-        lark_inst = Lark(lark_file.read(), parser="lalr", lexer="standard")
-
-        data, memo = lark_inst.memo_serialize([TerminalDef, Rule])
-
-        print(PARSER_FILE_TEMPLATE % (data, memo), file=parser_file)
-
-
-if not exists(PARSER_FILE):
-    create_parser_file()
-
-# pylint: disable=wrong-import-position
-# Lark_StandAlone needs to be imported after the above block of code because lark_parser.py might not exist
-from hcl2.lark_parser import Lark_StandAlone
-
-hcl2 = Lark_StandAlone(transformer=DictTransformer())
+PARSER_FILE = os.path.join(dirname(__file__), '.lark_cache.bin')
+
+hcl2 = Lark.open(
+    'hcl2.lark',
+    parser='lalr',
+    cache=PARSER_FILE,  # Disable/Delete file to effect changes to the grammar
+    rel_to=__file__,
+    transformer=DictTransformer()
+)
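
The removed docstring gives two reasons for the hand-rolled parser file: JSON cannot faithfully hold dicts with numeric keys, and the parser should be built once and reused. The sketch below illustrates both points with the standard library only. It is not Lark's implementation; `load_or_build` and `slow_build` are hypothetical names, and the digest check is an assumed design choice that rebuilds a stale cache instead of requiring the file to be deleted by hand.

```python
import hashlib
import json
import os
import pickle
import tempfile

# JSON turns integer dict keys into strings, so a round trip changes the data.
# pickle keeps the keys as integers.
state = {0: "shift", 1: "reduce"}
json_round_trip = json.loads(json.dumps(state))
pickle_round_trip = pickle.loads(pickle.dumps(state))


def load_or_build(source, cache_path, build):
    """Return build(source), reusing cache_path when it matches the source.

    Hypothetical helper: the stored digest lets a stale cache rebuild itself.
    """
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            cached_digest, cached_value = pickle.load(cache_file)
        if cached_digest == digest:
            return cached_value
    value = build(source)
    with open(cache_path, "wb") as cache_file:
        pickle.dump((digest, value), cache_file)
    return value


calls = []


def slow_build(source):
    # Stand-in for an expensive grammar compilation step.
    calls.append(source)
    return {index: token for index, token in enumerate(source.split())}


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "cache.bin")
    first = load_or_build("a b c", cache_path, slow_build)    # built and cached
    second = load_or_build("a b c", cache_path, slow_build)   # served from cache
    third = load_or_build("a b c d", cache_path, slow_build)  # source changed, rebuilt
```

Only load pickle files your own code wrote; unpickling untrusted data can execute arbitrary code.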
2 changes: 1 addition & 1 deletion requirements.pip
@@ -1,3 +1,3 @@
 # Place dependencies in this file, following the distutils format:
 # http://docs.python.org/2/distutils/setupscript.html#relationships-between-distributions-and-packages
-lark-parser>=0.10.0,<0.11.0
+lark-parser>=0.11.0,<0.12.0
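
To make the effect of the new pin concrete, here is a simplified sketch of what a `>=lower,<upper` range accepts. `parse_version` and `in_range` are hypothetical helpers written for illustration; they handle plain numeric release segments only and are not a substitute for pip's real version handling (pre-releases, epochs and local versions are ignored).

```python
def parse_version(text, width=3):
    """Turn '0.11.3' into (0, 11, 3), padding short versions with zeros."""
    parts = tuple(int(part) for part in text.split("."))
    return parts + (0,) * (width - len(parts))


def in_range(version, lower, upper):
    """True when lower <= version < upper, as in '>=lower,<upper'."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The new constraint accepts 0.11.x releases and nothing else.
accepts_0_11_3 = in_range("0.11.3", "0.11.0", "0.12.0")
accepts_0_10_4 = in_range("0.10.4", "0.11.0", "0.12.0")
accepts_0_12_0 = in_range("0.12.0", "0.11.0", "0.12.0")
```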
4 changes: 1 addition & 3 deletions test/unit/test_load.py
@@ -1,11 +1,11 @@
""" Test parsing a variety of hcl files"""

import json
import os
from os.path import dirname
from unittest import TestCase

import hcl2
from hcl2.parser import PARSER_FILE, create_parser_file

HCL2_DIR = 'terraform-config'
JSON_DIR = 'terraform-config-json'
@@ -21,8 +21,6 @@ def setUp(self):
    def test_load_terraform(self):
        """Test parsing a set of hcl2 files and force recreating the parser file"""
        # delete the parser file to force it to be recreated
        os.remove(os.path.join(dirname(hcl2.__file__), PARSER_FILE))
        create_parser_file()
        self._load_test_files()

    def test_load_terraform_from_cache(self):
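
The two tests above cover a cold start (cache file removed) and a warm start (cache file present). A minimal, self-contained sketch of that test structure follows. `load_with_cache` is a hypothetical stand-in for a cached parser constructor, not the project's code, and the cache lives in a temporary directory so the test never touches the package directory.

```python
import os
import pickle
import tempfile
import unittest


def load_with_cache(cache_path, build):
    """Hypothetical stand-in: load from cache_path, or build and store."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    value = build()
    with open(cache_path, "wb") as cache_file:
        pickle.dump(value, cache_file)
    return value


class CacheTest(unittest.TestCase):
    def setUp(self):
        self.tmp_dir = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp_dir.cleanup)
        self.cache_path = os.path.join(self.tmp_dir.name, "cache.bin")
        self.builds = 0

    def _build(self):
        # Counts how often the expensive step actually runs.
        self.builds += 1
        return {"rules": ["start"]}

    def test_rebuilds_when_cache_is_missing(self):
        load_with_cache(self.cache_path, self._build)
        os.remove(self.cache_path)  # force recreation
        load_with_cache(self.cache_path, self._build)
        self.assertEqual(self.builds, 2)
        self.assertTrue(os.path.exists(self.cache_path))

    def test_reuses_existing_cache(self):
        first = load_with_cache(self.cache_path, self._build)
        second = load_with_cache(self.cache_path, self._build)
        self.assertEqual(self.builds, 1)
        self.assertEqual(first, second)
```

Run it with `python -m unittest` from the directory containing the file.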