AQL: Filter Search #14

Closed
cw00dw0rd opened this issue Feb 2, 2021 · 5 comments
Labels
AQL AQL queries that are needed

Comments

@cw00dw0rd
Collaborator

The AQL query that supports #13 by allowing keywords to restrict the results shown. This may be a search over the rental description or whatever makes the most sense for the dataset.

@cw00dw0rd cw00dw0rd added the AQL AQL queries that are needed label Feb 2, 2021
@cw00dw0rd cw00dw0rd added this to the Initial Release milestone Feb 10, 2021
@Simran-B
Collaborator

Group and count amenities (this is pretty fast, ~500ms):

RETURN MERGE(
  FOR doc IN arangobnb
    FOR a IN doc.amenities
      COLLECT amenity = a WITH COUNT INTO count
      RETURN { [amenity]: count }
)

The amenities field is apparently free text, so I'm not sure how suitable it is for auto-complete. We could still display a list of everything that has a count of over 20 or so and provide auto-complete for that. To implement this server-side, we would need another View and collection (updated periodically with a Foxx job?).
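As a rough illustration of the "count of over 20" idea, here is a minimal Python sketch that takes an amenity-to-count mapping (the shape the MERGE query above returns), keeps only the frequent entries, and offers prefix-based suggestions from them. The function names, the sample data, and the case-insensitive prefix matching are assumptions made for illustration; nothing here is part of the project.

```python
def frequent_amenities(counts, min_count=20):
    """Return amenities whose count is strictly greater than min_count,
    ordered by count (descending), then by name for stable output."""
    kept = [(name, n) for name, n in counts.items() if n > min_count]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in kept]


def suggest(prefix, amenities):
    """Case-insensitive prefix match over the list of frequent amenities."""
    p = prefix.casefold()
    return [a for a in amenities if a.casefold().startswith(p)]


# Hypothetical counts; real values would come from the query result.
sample_counts = {
    "Wifi": 120,
    "Heating": 95,
    "Hot tub": 21,
    "Washer": 20,             # dropped: not strictly over 20
    "my cat lives here": 1,   # free-text noise, dropped
}

frequent = frequent_amenities(sample_counts)
suggestions = suggest("h", frequent)
```

With the sample data, the one-off free-text entry is dropped before it can reach the auto-complete list.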

@cw00dw0rd
Collaborator Author

cw00dw0rd commented Mar 10, 2021

Why the merge? This seems to provide the expected results and is ~150ms:

FOR doc IN arangobnb
  FOR amenity IN doc.amenities
    COLLECT item = amenity WITH COUNT INTO c
    SORT c DESC
    LIMIT 20
    RETURN item

This could have a LIMIT and then we could have a separate query for text search. WDYT?
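For readers less familiar with COLLECT ... WITH COUNT INTO, the same group, count, sort, and limit steps can be sketched in plain Python. This is only a model of what the query computes, not how ArangoDB executes it; the function name and sample documents are hypothetical.

```python
from collections import Counter


def top_amenities(docs, limit=20):
    """Count every amenity across all documents and return the `limit`
    most frequent names, most frequent first."""
    counts = Counter(
        amenity
        for doc in docs
        # Assumption: a missing amenities field is treated as an empty list.
        for amenity in doc.get("amenities", [])
    )
    return [item for item, _ in counts.most_common(limit)]


# Hypothetical documents standing in for the arangobnb collection.
docs = [
    {"amenities": ["Wifi", "Heating", "Kitchen"]},
    {"amenities": ["Wifi", "Heating"]},
    {"amenities": ["Wifi"]},
]

top_two = top_amenities(docs, limit=2)
```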

@Simran-B
Collaborator

MERGE() is merely used to create a mapping, amenity to count, instead of returning an object per amenity with two keys. It's just for displaying the result in a more compact way here.
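To make the difference in result shape concrete, here is a small Python sketch of what merging an array of single-key objects does. It only mirrors the idea (later keys overwrite earlier ones); the helper name and the sample counts are made up for illustration.

```python
def merge(objects):
    """Combine a list of dicts into one dict; later keys overwrite earlier ones."""
    merged = {}
    for obj in objects:
        merged.update(obj)
    return merged


# One object per amenity, as the inner RETURN { [amenity]: count } yields them.
per_amenity = [{"Wifi": 3}, {"Heating": 2}, {"Kitchen": 1}]

# A single compact mapping from amenity to count.
mapping = merge(per_amenity)
```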

150ms is still not particularly fast for production purposes if we consider that this is Berlin only and that query caching would potentially not help in a real-world application where the data changes quite often. It's fine for demonstration purposes, but this seems like an important point if we want to show scalability.

@cw00dw0rd
Collaborator Author

We can get this down to around 1ms if we combine it with the mapResults query, but that means we would need to adjust the markers being displayed to keep them consistent with the filters. This shouldn't be a problem, but it will likely result in fewer markers on the map.

I would need to test the performance of loading the increased number of markers with each map drag, instead of only adding new ones.

Increasing the LIMIT on the listings returned does not add much to the query time; it would just be a matter of response time.

LET listings = (
  FOR listing IN arangobnb
    SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(@poly), listing.location), "geo")
    LIMIT 20
    RETURN listing
)

LET filters = (
  FOR doc IN listings
    FOR amenity IN doc.amenities
      COLLECT item = amenity WITH COUNT INTO c
      SORT c DESC
      LIMIT 20
      RETURN item
)

RETURN { listings, filters }

@cw00dw0rd
Collaborator Author

Returning the filters with the results, increasing the number of listings returned, and displaying more markers on-screen has a negligible performance impact.
Here is what I have done so far:

  • Added a sort to the listings based on ratings. This keeps results relevant, but keeping it performant required adding primarySort. I decided to sort on the number of reviews and the review rating; open to suggestions.
  • Updated the results query to also, optionally, consider the price range, amenities, and room type (similar to here)
  • Added some basic Vue elements to interact with this (see PR)

This results in a fast query that is able to return the filters found in the results as well as filter the results based on user-selected criteria. This also takes advantage of more ArangoSearch optimization with primarySort.

However, the new issue that has arisen is the need to refactor the map markers to handle the higher volume and update when filters are selected.
Here is the query with some preset values.

LET listings = (
  FOR listing IN arangobnb
    SEARCH ANALYZER(GEO_CONTAINS(GEO_POLYGON(@poly), listing.location), "geo")
      AND ANALYZER(["Private room", "Entire home/apt"] ANY IN listing.room_type, "identity")
      AND ANALYZER(["Wifi", "Heating"] ALL IN listing.amenities, "identity")
      AND IN_RANGE(listing.price, 30, 50, true, true)
    SORT listing.number_of_reviews DESC, listing.review_scores_rating DESC
    LIMIT 100
    RETURN listing
)

LET filters = (
  FOR doc IN listings
    FOR amenity IN doc.amenities
      COLLECT item = amenity WITH COUNT INTO c
      SORT c DESC
      LIMIT 100
      RETURN item
)

RETURN { listings, filters }
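The non-geo part of the SEARCH condition can be read as three checks joined by AND. The Python sketch below models that reading, followed by the sort and limit; it leaves out the GEO_CONTAINS test entirely and assumes room_type is a single string and both price bounds are inclusive (as the two true flags of IN_RANGE indicate). The function names and sample listings are hypothetical.

```python
def matches(listing, room_types, required_amenities, min_price, max_price):
    """Model of the non-geo SEARCH conditions."""
    # ANY IN: the listing's room type is one of the selected types.
    room_ok = listing["room_type"] in room_types
    # ALL IN: every selected amenity is present on the listing.
    amenities_ok = set(required_amenities) <= set(listing["amenities"])
    # IN_RANGE(..., true, true): both bounds are inclusive.
    price_ok = min_price <= listing["price"] <= max_price
    return room_ok and amenities_ok and price_ok


def search(listings, room_types, required_amenities, min_price, max_price, limit=100):
    """Filter, then sort by review count and rating (both descending), then limit."""
    hits = [
        l for l in listings
        if matches(l, room_types, required_amenities, min_price, max_price)
    ]
    hits.sort(key=lambda l: (-l["number_of_reviews"], -l["review_scores_rating"]))
    return hits[:limit]


# Hypothetical listings for illustration only.
sample = [
    {"name": "A", "room_type": "Private room", "amenities": ["Wifi", "Heating"],
     "price": 30, "number_of_reviews": 10, "review_scores_rating": 90},
    {"name": "B", "room_type": "Entire home/apt", "amenities": ["Wifi", "Heating", "Kitchen"],
     "price": 50, "number_of_reviews": 40, "review_scores_rating": 95},
    {"name": "C", "room_type": "Shared room", "amenities": ["Wifi", "Heating"],
     "price": 40, "number_of_reviews": 99, "review_scores_rating": 99},
    {"name": "D", "room_type": "Private room", "amenities": ["Wifi"],
     "price": 40, "number_of_reviews": 50, "review_scores_rating": 80},
    {"name": "E", "room_type": "Private room", "amenities": ["Wifi", "Heating"],
     "price": 51, "number_of_reviews": 70, "review_scores_rating": 85},
]

names = [
    l["name"]
    for l in search(sample, ["Private room", "Entire home/apt"], ["Wifi", "Heating"], 30, 50)
]
```

In the sample, C fails the room type check, D lacks "Heating", and E is just above the price range, so only B and A remain, ordered by review count.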

cw00dw0rd added a commit that referenced this issue Mar 19, 2021
Adds Filter endpoint and frontend controls.
Fixes #14 and #13