Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2: main page/search #249

Closed
5 tasks done
josh-chamberlain opened this issue Apr 19, 2024 · 22 comments
Closed
5 tasks done

v2: main page/search #249

josh-chamberlain opened this issue Apr 19, 2024 · 22 comments
Assignees
Labels
api fixed_in_dev This is merged into the dev environment and waiting to be merged into main front end

Comments

@josh-chamberlain
Copy link
Contributor

josh-chamberlain commented Apr 19, 2024

part of #248

see wireframe

Front end

  • Search/main page replaces data-sources.pdap.io and the gold box on pdap.io
    • search options are now a bit refined; coarse selection of record types instead of an entry field
  • use typeahead to help disambiguate place names. Treating places as objects people can predictably search, not strings we have to try to match.
    • we can source the typeahead list from our states and counties table, with locality from the agencies table.
    • ideally this is one magical search box which could help people select a state, county, or locality
    • trigger after 2 characters (https://en.wikipedia.org/wiki/Ai,_Ohio)
    • debouncing as discussed below
    • order:
      • matches that start with the string, followed by contains (if the endpoint supports that)
        • e.g. search for Al yields Alabama before Topal County
      • alphabetically
      • state, county, locality (if any share the same name)
    • if there are no typeahead results: the option should be No results found. Search for "string"? and load quick search results

API changes

@josh-chamberlain josh-chamberlain moved this to Needs Refinement in Open issues & Roadmap Apr 19, 2024
@josh-chamberlain josh-chamberlain changed the title v2 view: main page v2: main page Apr 19, 2024
@josh-chamberlain josh-chamberlain changed the title v2: main page v2: main page/search Apr 19, 2024
@mbodeantor
Copy link
Contributor

This will be a simplified version of the current quick_search_query. All the potential fields for trying to match a search term in the where clause will be replaced by record_type = '{coarse_record_type}'

@mbodeantor mbodeantor moved this from Needs Refinement to Todo in Open issues & Roadmap Apr 19, 2024
@maxachis maxachis self-assigned this Apr 25, 2024
@maxachis
Copy link
Contributor

maxachis commented Apr 25, 2024

This will be a simplified version of the current quick_search_query. All the potential fields for trying to match a search term in the where clause will be replaced by record_type = '{coarse_record_type}'

@josh-chamberlain @mbodeantor So this will be replacing the quick search and associated API endpoint?

Additionally, will users only be able to search for one coarse record type at a time, or multiple?

@mbodeantor
Copy link
Contributor

@maxachis In order to maintain backwards compatibility, I would suggest a new endpoint with similar functionality

@josh-chamberlain
Copy link
Contributor Author

@maxachis let's let them multi-select coarse record types, why not? Even if the UI doesn't support it. Agreed about keeping the quick search endpoint—we are likely still going to have that option

@maxachis
Copy link
Contributor

maxachis commented Apr 26, 2024

@josh-chamberlain @mbodeantor PR has been created, incorporating both the API changes described here and #258 (which as I indicated there, is a very trivial add and is quick to reverse if nonfunctional). Next questions:

  1. Should I also develop the frontend component as well, or have someone else do that part (it will take me time to figure out how the frontend works)
  2. There is a quick_search_query_logs table. Should there be a search_query_logs table as well, or a table that consolidates both Quick Search and Search?
  3. How to test the api call is working as expected? I know it was discussed that there were ways to point the api towards a dev environment, but I don't know how to do that.

@josh-chamberlain
Copy link
Contributor Author

@maxachis

  1. Joshua G is the front end lead, so I would like to let him do that part.
  2. I don't have a strong opinion. Adding to the same table is simple in a way, but we should keep track of which endpoint they're hitting so we know the difference between quick and other searches.
  3. I know about this in general terms, but can't give you super specific instructions about routing

@mbodeantor
Copy link
Contributor

@maxachis Any code you merge into the dev branch will get deployed in the dev instance of the API on DigitalOcean. It uses the same database, but is otherwise separate from the live prod app

@maxachis
Copy link
Contributor

@mbodeantor Once I get further along in setting up the dev database, it may be prudent to switch the dev instance to point to the dev database. That probably also encourages us to copy the data as well as the schema to the dev database, to better emulate it.

@joshuagraber
Copy link
Contributor

joshuagraber commented Jun 10, 2024

Hey @josh-chamberlain I'm pretty close to starting on the updated UI here. A couple questions:

  • Are the backend changes being handled by someone else, or should I plan to work on those here, too?

we should structure the URL like /search/{record_type}/{state}/{county}/{muni}

  • Would it make more sense to pass record_type, state, county, and municipality, as well as searchId, as data in the request body, rather than trying to add all of them to the URL string? Particularly since all of those fields could be null when we pass a saved search searchId, it seems less complex to use the request body here.

For instance, if we hit /search/ohio or /search/9b7ddde7-5871-4f45-a8a1-3dce52847736, the API would need to check the value against some static list of states to know whether state or searchId were being passed here, no?

@josh-chamberlain
Copy link
Contributor Author

@joshuagraber nice!

  • yes, @maxachis is working on the back end, potentially with help from others. If you would like to be involved, that can work, as the front end is the primary consumer of the API. It's mostly happening on this fork: https://github.com/Police-Data-Accessibility-Project/data-sources-app-v2

  • Good point about the URL structure.

    • For the record, the "saved searches" concept is not part of v2, so we don't need to account for complex searches with an ID attached. If we did, maybe we could use search/id/123?
    • Ideally, we'd still surface the location in the URL, especially when someone's just viewing the page of a given geography. The URLs should ideally be simple and shareable, like search/oh/hamilton/cincinnati.
    • Passing it via the request body makes sense to me conceptually, and it's OK if we end up going that way, or even mixing strategies: search/oh/hamilton/cincinnati?=arrests
    • If you have a muni, the state and county are just inferred—so it's really just record type and location. It won't be possible to create other kinds of searches yet.

@joshuagraber
Copy link
Contributor

Ah, gotcha @josh-chamberlain, that all makes sense to me.

If you have a muni, the state and county are just inferred—so it's really just record type and location. It won't be possible to create other kinds of searches yet.

I think we will have edge cases here. I just did a basic filtering of all data source records and it shows that we have duplicate municipality names in different states. But I suppose as long as we're okay with that, it's nbd.

As far as flexibility of API and sharability of links, the easiest path forward here, to me, is to use query params.

So that way/search?state=oh&county=franklin&muni=cincinnati&term=arrests, /search?muni=cincinnati&term=arrests, /search?term=arrests could all be valid requests, and we don't have to worry about ugliness like /search/undefined/undefined/cincinatti/arrests or w/e

@josh-chamberlain
Copy link
Contributor Author

@joshuagraber yes, the duplicate muni names are why you would need the /state/county as a disambiguation.

To be clear: The goal of the typeahead is to treat places as objects and stop trying to match on strings. I would like to not allow people to create /undefined/undefined/cincinnati—we should only let people find a municipality object, which would require having the state and county. One problem we found with quick search was that people would search for a term like "Dallas" and get results from every "Dallas" in the country, rather than the typeahead helping them choose between Dallases.

That said, you quickly get into trouble if you let someone search for multiple record types (or, more accurate to the wireframe, record type categories). So far, I think the cleanest way is still /state/county/muni?category={record_type_category}.

@maxachis
Copy link
Contributor

@joshuagraber @josh-chamberlain From the backend perspective, creating a typeahead endpoint will be fun! But it will be fun in the same way running the Pittsburgh Half Marathon was fun.

I've never developed a typeahead endpoint, which automatically adds to the fun.

Additional points of fun:

  1. How do we handle possible misspellings? Cincinnati is a great example, because I would probably get it wrong on my first attempt. We may want to ignore that for now, as it adds a lot of additional dimensions.
  2. When should typeahead kick in? My guess is at 3 characters in, at least, because otherwise suggestions will be quite broad, and I suspect the number of people searching for Ai, Ohio will be relatively slim.
  3. We may need to implement rate limiting, because otherwise we might send a lot of requests.
  4. Also, we probably want to implement debouncing, which will make sure we wait for some delay before we start sending requests.
  5. I'll need to know the format in which typeahead suggestions should be returned, as well as the format which the final search result will be in.

@joshuagraber
Copy link
Contributor

The goal of the typeahead is to treat places as objects and stop trying to match on strings.

@josh-chamberlain gotcha, that makes sense. Sorry, I think the design is a bit misleading to me in this respect, because it appears that the user can select state / county / municipality, then search.

That said, you quickly get into trouble if you let someone search for multiple record types

Hmm, I think I also misread the design here. So rather than checkboxes, these will need to be radio buttons with only one selectable, no? I'll need to make another update to design-system. Will create an issue for that shortly.

How do we handle possible misspellings?

@maxachis would this library be potentially helpful? I know this is more or less Google's strategy as well, although their algorithm is almost certainly proprietary.

Also, we probably want to implement debouncing, which will make sure we wait for some delay before we start sending requests.

Yes, debouncing is a best practice in client engineering for this type of work. I won't bother implementing it though, because we use lodash utils, and their debounce function is well tested and consistently maintained.

When should typeahead kick in?

I think we should just wait for the debounced function to fire a request, rather than set a hardcoded number of characters.

We may need to implement rate limiting, because otherwise we might send a lot of requests.

Unless this endpoint will be available publicly, we shouldn't strictly need rate limiting if we're debouncing properly on the client. But if you feel like doing it, I suppose it's a good performance guardrail anyways.

@josh-chamberlain
Copy link
Contributor Author

@joshuagraber

the design is a bit misleading

Yes—the design has not yet been updated to reflect that we want to do a typeahead. Sorry about that. Vikram has responded and will be back online starting ~Monday to make edits

multiple record types

Sorry again for the confusion! I think the design is correct, and I was just thinking out loud—we should allow multi-select. Current favorite answer is a url path for /search/state/county/muni/?type={args} so that we have a nice solid URL for geography (most common search facet) with room for flexible args. Hopefully that works...I updated the issue.

misspellings

IMO, typeahead helps here—if someone gets a "no locations found" message, that's a clue that they mistyped. We can get better at this over time.

I think Joshua covered the rest. Thanks for helping think it through, y'all

@joshuagraber
Copy link
Contributor

@josh-chamberlain Ah, gotcha, that makes more sense now, thank you for explaining.

Current favorite answer is a url path for /search/state/county/muni/?type={args} so that we have a nice solid URL for geography (most common search facet) with room for flexible args

I like that. We could just pass an array of record type values: /search/state/county/muni/?types={serialized record_type[]}

IMO, typeahead helps here—if someone gets a "no locations found" message, that's a clue that they mistyped. We can get better at this over time.

True enough, and if we collect a db of search queries by session, it could be an interesting problem to throw at ML.

@maxachis
Copy link
Contributor

maxachis commented Jul 2, 2024

Putting together potentially useful articles on the matter:

@maxachis
Copy link
Contributor

maxachis commented Jul 2, 2024

@josh-chamberlain Regarding pulling locality from the agencies table, what column would that be? municipality? name?

@josh-chamberlain
Copy link
Contributor Author

@maxachis yep, municipality is what we currently call that. switching to locality is generally more inclusive; for now, we can have a front end and back end terminology adjustment, which is not the end of the world, but creates tech/information debt we should still fix: #332

@maxachis
Copy link
Contributor

maxachis commented Jul 6, 2024

@joshuagraber @josh-chamberlain Just to confirm -- debouncing is done on the frontend through the Iodash library that Joshua mentioned, correct? So no need to do that on the backend?

@joshuagraber
Copy link
Contributor

@maxachis @josh-chamberlain Yes, we'll debounce client-side, but IMO it might not hurt to add some rate limiting on the API side as well (for all endpoints, really) if it's not too big of a lift, just as a protective measure against a surprise hosting bill.

@josh-chamberlain
Copy link
Contributor Author

rate limiting: #353

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api fixed_in_dev This is merged into the dev environment and waiting to be merged into main front end
Projects
Status: Done
Development

No branches or pull requests

4 participants