Commit
Merge branch 'feature/twidi/contrib-multi-indexes' into develop
twidi committed Jan 29, 2018
2 parents 08dade3 + f70fd28 commit 6fb81bd
Showing 8 changed files with 1,245 additions and 134 deletions.
53 changes: 53 additions & 0 deletions doc/collections.rst
@@ -233,6 +233,59 @@ And, of course, you can use fields with different indexes in the same query:
>>> Person.collection(birth_year__gte=1960, lastname='Doe', nickname__startswith='S').instances()
[<[4] Susan "Sue" Doe (1960)>]
If you want to use an index with a different behavior, you can use the `configure` class method of the index. (You could also create a new index class yourself, but this method is provided for convenience.)
It accepts one or more arguments (`prefix`, `transform` and `handle_uniqueness`) and returns a new index class to be passed to the `indexes` argument of the field.
About the `prefix` argument:
If you use two indexes accepting the same suffix, for example `eq`, you can specify which one to use on the collection by assigning a prefix to the index:
.. code:: python
>>> class MyModel(model.RedisModel):
... myfield = fields.StringField(indexable=True, indexes=[
... EqualIndex,
... MyOtherIndex.configure(prefix='foo')
... ])
Then to query:
.. code:: python
>>> MyModel.collection(myfield='bar') # will use EqualIndex
>>> MyModel.collection(myfield__foo='bar') # will use MyOtherIndex
About the `transform` argument:
If you want to index on a value different than the one stored on the field, you can transform it by assigning a transform function to the index.
This function accepts a value as its argument ("normalized", i.e. converted to a string, or to a float for `NumberRangeIndex`) and should return the value to store.
.. code:: python
>>> def reverse_value(value):
... return value[::-1]
>>> class MyModel(model.RedisModel):
... myfield = fields.StringField(indexable=True, indexes=[EqualIndex.configure(transform=reverse_value)])
Then you query with the expected transformed value:
.. code:: python
>>> MyModel.collection(myfield='rab')
Note that the argument of this function must be named `value`. If you need this function to behave like a method of the index class, you can make it accept
two arguments, `self` and `value`.
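As an illustration only, a minimal sketch of such a method-like transform (here `self` is the index instance, which gives access to things like `self.field`):

.. code:: python

>>> def reverse_value_method(self, value):
...     # `self` is the index instance; `value` is the normalized value to transform
...     return value[::-1]

>>> class MyModel(model.RedisModel):
...     myfield = fields.StringField(indexable=True, indexes=[
...         EqualIndex.configure(transform=reverse_value_method)
...     ])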
About the `handle_uniqueness` argument:
It simply overrides the default value set on the index class. This is useful if your `transform` function makes the value unsuitable for checking uniqueness, in which case you can set it to `False`.
Note that if your field is marked as `unique`, you'll need to have at least one index capable of handling uniqueness.
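For example, a sketch combining both points: a truncating `transform` that can no longer guarantee uniqueness, next to a plain index that still handles it:

.. code:: python

>>> class MyModel(model.RedisModel):
...     myfield = fields.StringField(indexable=True, unique=True, indexes=[
...         EqualIndex,  # this one still handles uniqueness
...         EqualIndex.configure(prefix='short',
...                              transform=lambda value: value[:3],
...                              handle_uniqueness=False),
...     ])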
Laziness
========
171 changes: 170 additions & 1 deletion doc/contrib.rst
@@ -545,7 +545,7 @@ All of these new capabilities are described below:


Retrieving values
=================
-----------------

If you don't want only primary keys, but instances are too much, or too slow, you can ask the collection to return values with two methods: `values` and `values_list` (inspired by Django).
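A quick sketch of how these might look in practice (field names are illustrative; see the rest of this section for the exact behavior):

.. code:: python

>>> MyModel.collection(foo='bar').values('name', 'created_at')    # list of dicts
>>> MyModel.collection(foo='bar').values_list('name', flat=True)  # flat list of values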

@@ -753,5 +753,174 @@ An example to show all of this, based on the previous example (see `Sort by scor
>>> my_database.connection.persist(store_key)
Multi-indexes
=============
If you find yourself adding the same indexes to many different fields, the `MultiIndexes` class provided in `limpyd.contrib.indexes` can be useful.
Its aim is to let the field declare a single index while, behind the scenes, many indexes are managed. The `DateTimeIndex` presented later in this document is a good example of this, and of the different accepted arguments.
Usage
-----
This works by composition: you compose one index out of many others. Simply call the `compose` class method of the `MultiIndexes` class:
.. code:: python
>>> EqualAndRangeIndex = MultiIndexes.compose([EqualIndex, TextRangeIndex])
You can pass some arguments to change the behavior:
name
""""
The call to `MultiIndexes.compose` creates a new class. The `name` argument sets the name of this new class (instead of `MultiIndexes`).
key
"""
If you have many indexes based on the same index class (for example `TextRangeIndex`) and they are not prefixed, they will share the same index key. This collision is generally not wanted.
So pass the `key` argument to `compose` with any string you want.
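For example, a sketch in the spirit of the `DateSimpleTimeIndex` shown below, where each `TextRangeIndex` gets its own `key` (the names and transforms here are purely illustrative):

.. code:: python

>>> SplitRangeIndex = MultiIndexes.compose([
...     TextRangeIndex.configure(key='full', name='FullRangeIndex'),
...     TextRangeIndex.configure(key='head', prefix='head',
...                              transform=lambda value: value[:5],
...                              name='HeadRangeIndex'),
... ], name='SplitRangeIndex')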
transform
"""""""""
Each index can accept a `transform` argument (a callable), and so can the multi-index. The one passed to `compose` will be applied before the ones defined on the indexes it contains.
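For instance, a sketch where the multi-index's own `transform` lower-cases the value before each contained index applies its own:

.. code:: python

>>> LowerIndex = MultiIndexes.compose([
...     EqualIndex,
...     EqualIndex.configure(prefix='reversed', transform=lambda value: value[::-1]),
... ], name='LowerIndex', transform=lambda value: value.lower())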
DateTimeIndex
-------------
The `limpyd.contrib.indexes` module provides a `DateTimeIndex` (and other friends). In this section we'll explain how it is constructed, using only the `configure` method of the normal indexes and
the `compose` method of `MultiIndexes`.
Goal
""""
We'll store dates+times in the format `YYYY-MM-DD HH:MM:SS`.
We want to be able to:
- filter on an exact date+time
- filter on ranges on the date+time
- filter on dates
- filter on times
- filter on dates parts (year, month, day)
- filter on times parts (hour, minute, second)
Date and time parts
"""""""""""""""""""
Let's separate the date and the time into `YYYY-MM-DD` and `HH:MM:SS`.
To filter only on the year of a date, we want to extract the first 4 characters and filter them as a number, using `NumberRangeIndex`.
Also, we don't want uniqueness on this index, and we want to prefix the part to be able to filter with `myfield__year=`.
So this part could be:
.. code:: python
>>> NumberRangeIndex.configure(prefix='year', transform=lambda value: value[:4], handle_uniqueness=False, name='YearIndex')
Doing the same for the month and day, and composing a multi-indexes with the three, we have:
.. code:: python
>>> DateIndexParts = MultiIndexes.compose([
... NumberRangeIndex.configure(prefix='year', transform=lambda value: value[:4], handle_uniqueness=False, name='YearIndex'),
... NumberRangeIndex.configure(prefix='month', transform=lambda value: value[5:7], handle_uniqueness=False, name='MonthIndex'),
... NumberRangeIndex.configure(prefix='day', transform=lambda value: value[8:10], handle_uniqueness=False, name='DayIndex'),
... ], name='DateIndexParts')
If we do the same for the time only (assuming a time field without date), we have:
.. code:: python
>>> TimeIndexParts = MultiIndexes.compose([
... NumberRangeIndex.configure(prefix='hour', transform=lambda value: value[0:2], handle_uniqueness=False, name='HourIndex'),
... NumberRangeIndex.configure(prefix='minute', transform=lambda value: value[3:5], handle_uniqueness=False, name='MinuteIndex'),
... NumberRangeIndex.configure(prefix='second', transform=lambda value: value[6:8], handle_uniqueness=False, name='SecondIndex'),
... ], name='TimeIndexParts')
Range indexes
"""""""""""""
If we want to filter not only on parts but also on the full date with a `TextRangeIndex`, to be able to do `date_field__gt=2015`, we'll need another index.
We don't want to use a prefix, but if we have another `TextRangeIndex` on the field, we need a key:
.. code:: python
>>> DateRangeIndex = TextRangeIndex.configure(key='date', transform=lambda value: value[:10], name='DateRangeIndex')
The same for the time:
.. code:: python
>>> TimeRangeIndex = TextRangeIndex.configure(key='time', transform=lambda value: value[:8], name='TimeRangeIndex')
We keep these two indexes apart from the `DateIndexParts` and `TimeIndexParts` because we'll need them independently later, to prefix them when they are used together.
Full indexes
""""""""""""
If we want full indexes for dates and times, including the range and the parts, we can easily compose them:
.. code:: python
>>> DateIndex = MultiIndexes.compose([DateRangeIndex, DateIndexParts], name='DateIndex')
>>> TimeIndex = MultiIndexes.compose([TimeRangeIndex, TimeIndexParts], name='TimeIndex')
Now that we have all that is needed for fields that manage a date OR a time, we'll combine them. Three things to take into consideration:
- we'll have two `TextRangeIndex`, one for the date, one for the time. So we need to explicitly prefix the filters, to be able to do `datetime_field__date__gt=2015` and `datetime_field__time__gt='15:'`.
- we'll have to extract the date and the time separately
- we'll need a `TextRangeIndex` to filter on the whole datetime, to be able to do `datetime_field__gt='2015-12-21 15:'`
At first, we'll want an index without the time parts: it allows filtering on the three "ranges" (full, date, and time) and on the date parts, but not on the time parts. This can be useful if you know you won't have to search on those.
So, to summarize, we need:
- a `TextRangeIndex` for the full datetime
- the `DateRangeIndex`, prefixed
- the `DateIndexParts`
- the `TimeRangeIndex`, prefixed
Which gives us:
.. code:: python
>>> DateSimpleTimeIndex = MultiIndexes.compose([
... TextRangeIndex.configure(key='full', name='FullDateTimeRangeIndex'),
... DateRangeIndex.configure(prefix='date'),
... DateIndexParts,
... TimeRangeIndex.configure(prefix='time', transform=lambda value: value[11:]) # pass only time
... ], name='DateSimpleTimeIndex', transform=lambda value: value[:19]) # restrict on date+time
And to have the same with the time parts, simply compose a new index with the last one and the `TimeIndexParts`:
.. code:: python
>>> DateTimeIndex = MultiIndexes.compose([
... DateSimpleTimeIndex,
... TimeIndexParts.configure(transform=lambda value: value[11:]), # pass only time
... ], name='DateTimeIndex')
For simpler cases, let's make a ``SimpleDateTimeIndex`` that doesn't contain the parts:
.. code:: python
>>> SimpleDateTimeIndex = MultiIndexes.compose([
... TextRangeIndex.configure(key='full', name='FullDateTimeRangeIndex'),
... DateRangeIndex.configure(prefix='date'),
... TimeRangeIndex.configure(prefix='time', transform=lambda value: value[11:]) # pass only time
... ], name='SimpleDateTimeIndex', transform=lambda value: value[:19]) # restrict on date+time
And we're done!
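To tie everything together, here is a sketch (the model, field name and values are purely illustrative) of how such an index could be attached to a field and queried, matching the goals listed at the beginning of this section:

.. code:: python

>>> class Event(model.RedisModel):
...     created_at = fields.StringField(indexable=True, indexes=[DateTimeIndex])

>>> Event.collection(created_at='2015-12-21 15:30:00')             # exact date+time
>>> Event.collection(created_at__date='2015-12-21')                # exact date
>>> Event.collection(created_at__year=2015, created_at__hour=15)   # date and time parts
>>> Event.collection(created_at__gt='2015-12-21 15:')              # range on the full value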
.. _Redis: http://redis.io
.. _redis-py: https://github.com/andymccurdy/redis-py
24 changes: 12 additions & 12 deletions limpyd/collection.py
@@ -245,18 +245,18 @@ def _prepare_sets(self, sets):
final_sets.add(set_)
elif isinstance(set_, ParsedFilter):

index_key, key_type, is_tmp = set_.index.get_filtered_key(
set_.suffix,
accepted_key_types=self._accepted_key_types,
*(set_.extra_field_parts + [set_.value])
)
if key_type not in self._accepted_key_types:
raise ValueError('The index key returned by the index %s is not valid' % (
set_.index.__class__.__name__
))
final_sets.add(index_key)
if is_tmp:
tmp_keys.add(index_key)
for index_key, key_type, is_tmp in set_.index.get_filtered_keys(
set_.suffix,
accepted_key_types=self._accepted_key_types,
*(set_.extra_field_parts + [set_.value])
):
if key_type not in self._accepted_key_types:
raise ValueError('The index key returned by the index %s is not valid' % (
set_.index.__class__.__name__
))
final_sets.add(index_key)
if is_tmp:
tmp_keys.add(index_key)
else:
raise ValueError('Invalid filter type')

45 changes: 26 additions & 19 deletions limpyd/contrib/collection.py
@@ -104,7 +104,7 @@ def _prepare_sets(self, sets):

all_sets = set()
tmp_keys = set()
only_one_set = len(sets) == 1
lists = []

def add_key(key, key_type=None, is_tmp=False):
if not key_type:
@@ -115,14 +115,9 @@ def add_key(key, key_type=None, is_tmp=False):
all_sets.add(key)
self._has_sortedsets = True
elif key_type == 'list':
if only_one_set:
# we only have this list, use it directly
all_sets.add(key)
else:
# many sets, convert the list to a simple redis set
tmp_key = self._unique_key()
self._list_to_set(key, tmp_key)
add_key(tmp_key, 'set', True)
# if only one list, and no sets, at the end we'll directly use the list
# else lists will be converted to sets
lists.append(key)
elif key_type == 'none':
# considered as an empty set
all_sets.add(key)
@@ -148,16 +143,16 @@ def add_key(key, key_type=None, is_tmp=False):
elif isinstance(value, RedisField):
raise ValueError(u'Invalid filter value for %s: %s' % (set_.index.field.name, value))

index_key, key_type, is_tmp = set_.index.get_filtered_key(
set_.suffix,
accepted_key_types=self._accepted_key_types,
*(set_.extra_field_parts + [value])
)
if key_type not in self._accepted_key_types:
raise ValueError('The index key returned by the index %s is not valid' % (
set_.index.__class__.__name__
))
add_key(index_key, key_type, is_tmp)
for index_key, key_type, is_tmp in set_.index.get_filtered_keys(
set_.suffix,
accepted_key_types=self._accepted_key_types,
*(set_.extra_field_parts + [value])
):
if key_type not in self._accepted_key_types:
raise ValueError('The index key returned by the index %s is not valid' % (
set_.index.__class__.__name__
))
add_key(index_key, key_type, is_tmp)

elif isinstance(set_, SetField):
# Use the set key. If we need to intersect, we'll use
@@ -177,6 +172,18 @@ def add_key(key, key_type=None, is_tmp=False):
else:
raise ValueError('Invalid filter type')

if lists:
if not len(all_sets) and len(lists) == 1:
# only one list, nothing else, we can return the list key
all_sets = {lists[0]}
else:
# we have many sets/lists, we need to convert them to sets
for list_key in lists:
# many sets, convert the list to a simple redis set
tmp_key = self._unique_key()
self._list_to_set(list_key, tmp_key)
add_key(tmp_key, 'set', True)

return all_sets, tmp_keys

def filter(self, **filters):