
Get schema by id to use constant-time lookup #162

Open
mpkocher opened this issue Feb 8, 2021 · 1 comment

Comments


mpkocher commented Feb 8, 2021

The core method get_schema_by_id performs an O(N) linear scan over all schemas on every call:

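```python
class ESSE(object):
    """
    Exabyte Source of Schemas and Examples class.
    """

    def __init__(self):
        # SCHEMAS and EXAMPLES are module-level lists defined elsewhere in esse (not shown).
        self.schemas = SCHEMAS
        self.examples = EXAMPLES

    def get_schema_by_id(self, schemaId):
        # Linear scan: may visit every entry in SCHEMAS on each call, i.e. O(N).
        return next((s for s in SCHEMAS if s.get("schemaId") == schemaId), None)
```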
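To put the linear-scan cost in perspective, here is a rough microbenchmark sketch. It does not use esse's real data: SCHEMAS below is a hypothetical list of 200 synthetic entries, and linear_lookup / dict_lookup are illustrative names. It times a worst-case lookup (the last schema) with a linear scan and with a dict index; actual timings depend on the machine and Python version.

```python
import timeit

# Hypothetical stand-in for esse's SCHEMAS: 200 dicts, each carrying a "schemaId".
SCHEMAS = [{"schemaId": f"schema-{i}", "title": f"Schema {i}"} for i in range(200)]
# Index built once, up front.
INDEX = {s["schemaId"]: s for s in SCHEMAS}


def linear_lookup(schema_id):
    # O(N): walks the list until a match is found (worst case: last item or a miss).
    return next((s for s in SCHEMAS if s.get("schemaId") == schema_id), None)


def dict_lookup(schema_id):
    # O(1) on average: a single hash-table probe.
    return INDEX.get(schema_id)


if __name__ == "__main__":
    target = "schema-199"  # worst case for the linear scan
    for fn in (linear_lookup, dict_lookup):
        elapsed = timeit.timeit(lambda: fn(target), number=10_000)
        print(f"{fn.__name__}: {elapsed:.4f} s for 10,000 lookups")
```

Both functions return the same object for a known id and None for an unknown one; only the per-call cost differs, and that cost is paid once per property during serialization.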

Libraries such as Exabyte's express are probably bound by file-parsing I/O, and N is small here (~200), but get_schema_by_id is called from serialize_and_validate on every property. With a minor change, the lookup can be made O(1) on average by indexing the schemas by schemaId once, at construction time:

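```python
class ESSE(object):
    def __init__(self):
        self.schemas = SCHEMAS
        # Build a schemaId -> schema index once, so each lookup is a single dict probe.
        self._schemas = {s['schemaId']: s for s in self.schemas if s.get('schemaId') is not None}
        self.examples = EXAMPLES

    def get_schema_by_id(self, schemaId):
        # O(1) average-case lookup; still returns None for an unknown id.
        return self._schemas.get(schemaId)
```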
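
One behavioral detail may be worth checking before switching: if two entries ever share a schemaId, the original next(...) scan returns the first match, whereas the dict comprehension keeps the last one. (Relatedly, get_schema_by_id(None) previously returned the first entry lacking a schemaId, while the indexed version returns None.) Whether esse's schema list can contain duplicate ids is an assumption not established here. The sketch below uses a hypothetical build_schema_index helper and made-up data to build the index with dict.setdefault, so the first occurrence wins, as in the original scan.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]


def build_schema_index(schemas: List[Schema]) -> Dict[str, Schema]:
    """Map schemaId -> schema, keeping the FIRST schema seen for a repeated id.

    This mirrors the original next(...) scan, which returns the first match.
    """
    index: Dict[str, Schema] = {}
    for s in schemas:
        schema_id = s.get("schemaId")
        if schema_id is not None:
            # setdefault only inserts when the key is absent, so the first one wins.
            index.setdefault(schema_id, s)
    return index


# Usage with hypothetical data containing a duplicated id.
schemas = [
    {"schemaId": "a", "v": 1},
    {"schemaId": "a", "v": 2},
    {"title": "entry without a schemaId"},
]
index = build_schema_index(schemas)
last_wins = {s["schemaId"]: s for s in schemas if s.get("schemaId") is not None}
print(index["a"]["v"])      # 1 (same result as the linear scan)
print(last_wins["a"]["v"])  # 2 (the dict comprehension keeps the last duplicate)
```

If duplicates are impossible by construction, the one-line comprehension in the proposal is equivalent and simpler.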

timurbazhirov (Member) commented:

Just a quick note: thanks for this helpful suggestion, Michael! We'll review it and plan to include it in the next release (later in Q1 or early Q2).
