Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index Support #51

Closed
wants to merge 5 commits into from
Closed

Index Support #51

wants to merge 5 commits into from

Conversation

kiril
Copy link
Contributor

@kiril kiril commented Jul 7, 2015

Hi @tleach !

I couldn't help but want to fix #1 , because I am trying to use Mongothon for a side project during vacation, and I want to be able to create indexes in a good way like this.

I'm opening this pull request before it's done, because I want to start the dialogue here.

  • this has explicit indexes support in mongothon.schema.Schema
  • I am considering adding "index":True support to the Schema too
  • I have a basic set of tests for checking and applying indexes
  • I have very basic index validation

TODO:

  • need to create an IndexSpec class and use those rather than dicts DONE
  • need to do better validation that only the normal index types are currently supported DONE
  • document this functionality

Open to feedback!

def __init__(self, doc_spec, **kwargs):
super(Schema, self).__init__(doc_spec, **kwargs)
super(Schema, self).__init__(doc_spec, **{k:v for k,v in kwargs.iteritems() if k != 'indexes'})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aware this is hackish... is there a better way to do it? I want to pass an extra kwarg here that the underlying class doesn't support.

@kiril
Copy link
Contributor Author

kiril commented Jul 7, 2015

@tleach I know kids and stuff, but I will keep working on this if you give me feedback. 😗

@tleach
Copy link
Contributor

tleach commented Jul 7, 2015

Sorry got sidetracked by selenium bullshit. Will take a look tonight if I can. 

Tom LeachPrincipal Engineer, GameChanger Media

On Tue, Jul 7, 2015 at 6:20 PM, Kiril Savino [email protected]
wrote:

@tleach I know kids and stuff, but I will keep working on this if you give me feedback. 😗

Reply to this email directly or view it on GitHub:
#51 (comment)

def __init__(self, doc_spec, **kwargs):
indexes = []

def __init__(self, doc_spec, indexes=[], **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tleach
Copy link
Contributor

tleach commented Jul 8, 2015

I was pondering this on the way home. I've realized there is a problem with adding indexes to the Schema class itself.

Though it conceptually makes sense to declare indexes with the Schema as that's where the fields are declared, the problem is that Schemas are intended to be composable. i.e. You should be able to declare a Schema once and reuse/embed it at multiple levels of the hierarchy.

The use cases are obvious (roles and hist keys in user_role for example). It would arguably make the API less clear if the top level entity on which indexes are declared could also be embedded all over the place as a sub-document schema.

Furthermore, this is a useful simplifying assumption which Mongothon inherits from Schemer - i.e. all Schema objects at any level of the hierarchy are treated the same, there are no "special variables".

So, what's the alternative?

We could declare them with the Model class. I'm not jazzed about that either as it arguably has too many responsibilities already. It's also ignorant of Schema-related concerns by design. Leaking field-related responsibilities into it is not appealing.

One possibility is some kind of special IndexedSchema wrapper class which can only be used at the top level of a hierarchy, wrapping a vanilla Schema you give it in a constructor:

my_thing_schema = IndexedSchema(Schema({
    'name':  {'type': basestring},
    ...
}))

my_thing_schema.add_index({'name': 1})

MyThing = create_model(db.my_thing, my_thing_schema)

Nice things about this:

  • You can still separately declare and reuse Schema instances and selectively wrap them as needed.
  • You don't have to use IndexedSchema, it's opt-in.
  • The indexing logic is decoupled from the main Schema class.
  • IndexedSchema would only be valid at the root, not as a child node schema declaration, so there would be no confusion around when to apply/ignore index semantics.

We could even make the constructor take a dict or a Schema for added conciseness:

my_thing_schema_1 = IndexedSchema(Schema({
    'name':  {'type': basestring},
}))

my_thing_schema_2 = IndexedSchema({  # creates a wrapped `Schema` instance internally
    'name':  {'type': basestring},
})

So, yeah. Probably want to mull this over a little more, but I think that seems like a better direction.

Thoughts?

@thieman
Copy link
Contributor

thieman commented Jul 8, 2015

Why do the indexes need to be coupled to Schema at all? Based off of the example in Tom's last comment it seems like the indexes have no real relation to the data in the Schema other than you need to pass them to the model constructor somehow. Would it not be simpler to add a separate IndexSpec class or something, then pass that instead?

@tleach
Copy link
Contributor

tleach commented Jul 8, 2015

I think there is something to be gained by schemas and index declarations being co-located.

Indexes do indirectly depend on the data model declared in a schema. In my mind, declaring indexes alongside a collection-level schema is a light-touch way to prod the user to keep the two consistent. If you declare your schema in one place and your indexes somewhere else then I can see the two diverging over time.

I may even be veering towards mandating that we actually validate index specs against their associated schemas to ensure they make sense (though could probably be persuaded that that's overkill).

So yeah - definitely don't want schemas to know about indexes, but I think it's desirable for indexes to know about schemas.

@kiril
Copy link
Contributor Author

kiril commented Jul 8, 2015

I have an idea.

Sent from my Enormous iPhone

On Jul 8, 2015, at 10:15 AM, Tom Leach [email protected] wrote:

I think there is something to be gained by schemas and index declarations
being co-located.

Indexes do indirectly depend on the data model declared in a schema. In my
mind, declaring indexes alongside a collection-level schema is a
light-touch way to prod the user to keep the two consistent. If you declare
your schema in one place and your indexes somewhere else then I can see the
two diverging over time.

I may even be veering towards mandating that we actually validate index
specs against their associated schemas to ensure they make sense (though
could probably be persuaded that that's overkill).

So yeah - definitely don't want schemas to know about indexes, but I think
it's desirable for indexes to know about schemas.


Reply to this email directly or view it on GitHub
#51 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support for declaring indexes
4 participants