v2: main page/search #249

josh-chamberlain · 2024-04-19T17:31:03Z

part of #248

see wireframe

Front end

Search/main page replaces data-sources.pdap.io and the gold box on pdap.io
- search options are now a bit refined; coarse selection of record types instead of an entry field
use typeahead to help disambiguate place names. Treating places as objects people can predictably search, not strings we have to try to match.
- we can source the typeahead list from our states and counties table, with locality from the agencies table.
- ideally this is one magical search box which could help people select a state, county, or locality
- trigger after 2 characters (https://en.wikipedia.org/wiki/Ai,_Ohio)
- debouncing as discussed below
- order:
  - matches that start with the string, followed by contains (if the endpoint supports that)
    - e.g. search for Al yields Alabama before Topal County
  - alphabetically
  - state, county, locality (if any share the same name)
- if there are no typeahead results: the option should be No results found. Search for "string"? and load quick search results

API changes

Create typeahead endpoint which generates a list of all of our localities, counties, and states. #345
- rate limiting as discussed below
Main search endpoint #352
Use parameterized queries to reduce risk of SQL Injection #212

The text was updated successfully, but these errors were encountered:

mbodeantor · 2024-04-19T20:37:44Z

This will be a simplified version of the current quick_search_query. All the potential fields for trying to match a search term in the where clause will be replaced by record_type = '{coarse_record_type}'

maxachis · 2024-04-25T19:51:04Z

This will be a simplified version of the current quick_search_query. All the potential fields for trying to match a search term in the where clause will be replaced by record_type = '{coarse_record_type}'

@josh-chamberlain @mbodeantor So this will be replacing the quick search and associated API endpoint?

Additionally, will users only be able to search for one coarse record type at a time, or multiple?

mbodeantor · 2024-04-25T19:53:58Z

@maxachis In order to maintain backwards compatibility, I would suggest a new endpoint with similar functionality

josh-chamberlain · 2024-04-26T16:56:41Z

@maxachis let's let them multi-select coarse record types, why not? Even if the UI doesn't support it. Agreed about keeping the quick search endpoint—we are likely still going to have that option

maxachis · 2024-04-26T22:13:15Z

@josh-chamberlain @mbodeantor PR has been created, incorporating both the API changes described here and #258 (which as I indicated there, is a very trivial add and is quick to reverse if nonfunctional). Next questions:

Should I also develop the frontend component as well, or have someone else do that part (it will take me time to figure out how the frontend works)
There is a quick_search_query_logs table. Should there be a search_query_logs table as well, or a table that consolidates both Quick Search and Search?
How to test the api call is working as expected? I know it was discussed that there were ways to point the api towards a dev environment, but I don't know how to do that.

josh-chamberlain · 2024-04-29T15:41:12Z

@maxachis

Joshua G is the front end lead, so I would like to let him do that part.
I don't have a strong opinion. Adding to the same table is simple in a way, but we should keep track of which endpoint they're hitting so we know the difference between quick and other searches.
I know about this in general terms, but can't give you super specific instructions about routing

mbodeantor · 2024-04-29T17:46:32Z

@maxachis Any code you merge into the dev branch will get deployed in the dev instance of the API on DigitalOcean. It uses the same database, but is otherwise separate from the live prod app

maxachis · 2024-04-29T19:48:27Z

@mbodeantor Once I get further along in setting up the dev database, it may be prudent to switch the dev instance to point to the dev database. That probably also encourages us to copy the data as well as the schema to the dev database, to better emulate it.

joshuagraber · 2024-06-10T11:32:12Z

Hey @josh-chamberlain I'm pretty close to starting on the updated UI here. A couple questions:

Are the backend changes being handled by someone else, or should I plan to work on those here, too?

we should structure the URL like /search/{record_type}/{state}/{county}/{muni}

Would it make more sense to pass record_type, state, county, and municipality, as well as searchId, as data in the request body, rather than trying to add all of them to the URL string? Particularly since all of those fields could be null when we pass a saved search searchId, it seems less complex to use the request body here.

For instance, if we hit /search/ohio or /search/9b7ddde7-5871-4f45-a8a1-3dce52847736, the API would need to check the value against some static list of states to know whether state or searchId were being passed here, no?

josh-chamberlain · 2024-06-10T20:54:50Z

@joshuagraber nice!

yes, @maxachis is working on the back end, potentially with help from others. If you would like to be involved, that can work, as the front end is the primary consumer of the API. It's mostly happening on this fork: https://github.com/Police-Data-Accessibility-Project/data-sources-app-v2
Good point about the URL structure.
- For the record, the "saved searches" concept is not part of v2, so we don't need to account for complex searches with an ID attached. If we did, maybe we could use search/id/123?
- Ideally, we'd still surface the location in the URL, especially when someone's just viewing the page of a given geography. The URLs should ideally be simple and shareable, like search/oh/hamilton/cincinnati.
- Passing it via the request body makes sense to me conceptually, and it's OK if we end up going that way, or even mixing strategies: search/oh/hamilton/cincinnati?=arrests
- If you have a muni, the state and county are just inferred—so it's really just record type and location. It won't be possible to create other kinds of searches yet.

joshuagraber · 2024-06-11T20:32:31Z

Ah, gotcha @josh-chamberlain, that all makes sense to me.

If you have a muni, the state and county are just inferred—so it's really just record type and location. It won't be possible to create other kinds of searches yet.

I think we will have edge cases here. I just did a basic filtering of all data source records and it shows that we have duplicate municipality names in different states. But I suppose as long as we're okay with that, it's nbd.

As far as flexibility of API and sharability of links, the easiest path forward here, to me, is to use query params.

So that way/search?state=oh&county=franklin&muni=cincinnati&term=arrests, /search?muni=cincinnati&term=arrests, /search?term=arrests could all be valid requests, and we don't have to worry about ugliness like /search/undefined/undefined/cincinatti/arrests or w/e

josh-chamberlain · 2024-06-12T16:36:22Z

@joshuagraber yes, the duplicate muni names are why you would need the /state/county as a disambiguation.

To be clear: The goal of the typeahead is to treat places as objects and stop trying to match on strings. I would like to not allow people to create /undefined/undefined/cincinnati—we should only let people find a municipality object, which would require having the state and county. One problem we found with quick search was that people would search for a term like "Dallas" and get results from every "Dallas" in the country, rather than the typeahead helping them choose between Dallases.

That said, you quickly get into trouble if you let someone search for multiple record types (or, more accurate to the wireframe, record type categories). So far, I think the cleanest way is still /state/county/muni?category={record_type_category}.

maxachis · 2024-06-12T21:23:58Z

@joshuagraber @josh-chamberlain From the backend perspective, creating a typeahead endpoint will be fun! But it will be fun in the same way running the Pittsburgh Half Marathon was fun.

I've never developed a typeahead endpoint, which automatically adds to the fun.

Additional points of fun:

How do we handle possible misspellings? Cincinnati is a great example, because I would probably get it wrong on my first attempt. We may want to ignore that for now, as it adds a lot of additional dimensions.
When should typeahead kick in? My guess is at 3 characters in, at least, because otherwise suggestions will be quite broad, and I suspect the number of people searching for Ai, Ohio will be relatively slim.
We may need to implement rate limiting, because otherwise we might send a lot of requests.
Also, we probably want to implement debouncing, which will make sure we wait for some delay before we start sending requests.
I'll need to know the format in which typeahead suggestions should be returned, as well as the format which the final search result will be in.

joshuagraber · 2024-06-13T12:59:04Z

The goal of the typeahead is to treat places as objects and stop trying to match on strings.

@josh-chamberlain gotcha, that makes sense. Sorry, I think the design is a bit misleading to me in this respect, because it appears that the user can select state / county / municipality, then search.

That said, you quickly get into trouble if you let someone search for multiple record types

Hmm, I think I also misread the design here. So rather than checkboxes, these will need to be radio buttons with only one selectable, no? I'll need to make another update to design-system. Will create an issue for that shortly.

How do we handle possible misspellings?

@maxachis would this library be potentially helpful? I know this is more or less Google's strategy as well, although their algorithm is almost certainly proprietary.

Also, we probably want to implement debouncing, which will make sure we wait for some delay before we start sending requests.

Yes, debouncing is a best practice in client engineering for this type of work. I won't bother implementing it though, because we use lodash utils, and their debounce function is well tested and consistently maintained.

When should typeahead kick in?

I think we should just wait for the debounced function to fire a request, rather than set a hardcoded number of characters.

We may need to implement rate limiting, because otherwise we might send a lot of requests.

Unless this endpoint will be available publicly, we shouldn't strictly need rate limiting if we're debouncing properly on the client. But if you feel like doing it, I suppose it's a good performance guardrail anyways.

josh-chamberlain · 2024-06-13T15:28:47Z

@joshuagraber

the design is a bit misleading

Yes—the design has not yet been updated to reflect that we want to do a typeahead. Sorry about that. Vikram has responded and will be back online starting ~Monday to make edits

multiple record types

Sorry again for the confusion! I think the design is correct, and I was just thinking out loud—we should allow multi-select. Current favorite answer is a url path for /search/state/county/muni/?type={args} so that we have a nice solid URL for geography (most common search facet) with room for flexible args. Hopefully that works...I updated the issue.

misspellings

IMO, typeahead helps here—if someone gets a "no locations found" message, that's a clue that they mistyped. We can get better at this over time.

I think Joshua covered the rest. Thanks for helping think it through, y'all

joshuagraber · 2024-06-13T19:20:22Z

@josh-chamberlain Ah, gotcha, that makes more sense now, thank you for explaining.

Current favorite answer is a url path for /search/state/county/muni/?type={args} so that we have a nice solid URL for geography (most common search facet) with room for flexible args

I like that. We could just pass an array of record type values: /search/state/county/muni/?types={serialized record_type[]}

IMO, typeahead helps here—if someone gets a "no locations found" message, that's a clue that they mistyped. We can get better at this over time.

True enough, and if we collect a db of search queries by session, it could be an interesting problem to throw at ML.

maxachis · 2024-07-02T00:28:00Z

Putting together potentially useful articles on the matter:

Implementing Auto Complete with Postgres and Python

maxachis · 2024-07-02T00:48:20Z

@josh-chamberlain Regarding pulling locality from the agencies table, what column would that be? municipality? name?

josh-chamberlain · 2024-07-02T17:00:23Z

@maxachis yep, municipality is what we currently call that. switching to locality is generally more inclusive; for now, we can have a front end and back end terminology adjustment, which is not the end of the world, but creates tech/information debt we should still fix: #332

maxachis · 2024-07-06T14:13:39Z

@joshuagraber @josh-chamberlain Just to confirm -- debouncing is done on the frontend through the Iodash library that Joshua mentioned, correct? So no need to do that on the backend?

joshuagraber · 2024-07-08T10:38:35Z

@maxachis @josh-chamberlain Yes, we'll debounce client-side, but IMO it might not hurt to add some rate limiting on the API side as well (for all endpoints, really) if it's not too big of a lift, just as a protective measure against a surprise hosting bill.

josh-chamberlain · 2024-07-08T17:24:20Z

rate limiting: #353

josh-chamberlain added the front end label Apr 19, 2024

josh-chamberlain added this to Open issues & Roadmap Apr 19, 2024

josh-chamberlain removed the front end label Apr 19, 2024

josh-chamberlain moved this to Needs Refinement in Open issues & Roadmap Apr 19, 2024

josh-chamberlain changed the title ~~v2 view: main page~~ v2: main page Apr 19, 2024

josh-chamberlain mentioned this issue Apr 19, 2024

data sources app v2 #248

Open

josh-chamberlain added front end api labels Apr 19, 2024

josh-chamberlain changed the title ~~v2: main page~~ v2: main page/search Apr 19, 2024

mbodeantor moved this from Needs Refinement to Todo in Open issues & Roadmap Apr 19, 2024

maxachis self-assigned this Apr 25, 2024

maxachis mentioned this issue Apr 26, 2024

Add search API functionality #266

Closed

josh-chamberlain assigned maxachis and joshuagraber and unassigned maxachis Apr 29, 2024

maxachis mentioned this issue May 7, 2024

Add search API functionality Police-Data-Accessibility-Project/data-sources-app-v2#2

Closed

josh-chamberlain mentioned this issue Jun 20, 2024

v2: remove search tokens #322

Closed

3 tasks

josh-chamberlain mentioned this issue Jul 2, 2024

v1 quicksearch string matching bug #346

Closed

1 task

maxachis mentioned this issue Jul 3, 2024

Create Database Client to Encapsulate Database Interactions #337

Closed

7 tasks

josh-chamberlain mentioned this issue Aug 2, 2024

Main search endpoint #352

Closed

3 tasks

maxachis moved this from Todo to In Progress in Open issues & Roadmap Aug 4, 2024

joshuagraber mentioned this issue Aug 7, 2024

feature: main search route Police-Data-Accessibility-Project/data-sources-app-v2#70

Merged

1 task

josh-chamberlain added the fixed_in_dev This is merged into the dev environment and waiting to be merged into main label Aug 14, 2024

josh-chamberlain moved this from In Progress to Done in Open issues & Roadmap Aug 14, 2024

maxachis mentioned this issue Aug 21, 2024

Merge Dev into Main Police-Data-Accessibility-Project/data-sources-app-v2#82

Merged

josh-chamberlain mentioned this issue Aug 21, 2024

geographic grouping for v2 search results #409

Closed

2 tasks

josh-chamberlain closed this as completed Aug 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v2: main page/search #249

v2: main page/search #249

josh-chamberlain commented Apr 19, 2024 •

edited

Loading

mbodeantor commented Apr 19, 2024

maxachis commented Apr 25, 2024 •

edited

Loading

mbodeantor commented Apr 25, 2024

josh-chamberlain commented Apr 26, 2024

maxachis commented Apr 26, 2024 •

edited

Loading

josh-chamberlain commented Apr 29, 2024

mbodeantor commented Apr 29, 2024

maxachis commented Apr 29, 2024

joshuagraber commented Jun 10, 2024 •

edited

Loading

josh-chamberlain commented Jun 10, 2024

joshuagraber commented Jun 11, 2024

josh-chamberlain commented Jun 12, 2024

maxachis commented Jun 12, 2024

joshuagraber commented Jun 13, 2024

josh-chamberlain commented Jun 13, 2024

joshuagraber commented Jun 13, 2024

maxachis commented Jul 2, 2024

maxachis commented Jul 2, 2024

josh-chamberlain commented Jul 2, 2024

maxachis commented Jul 6, 2024

joshuagraber commented Jul 8, 2024

josh-chamberlain commented Jul 8, 2024

v2: main page/search #249

v2: main page/search #249

Comments

josh-chamberlain commented Apr 19, 2024 • edited Loading

Front end

API changes

mbodeantor commented Apr 19, 2024

maxachis commented Apr 25, 2024 • edited Loading

mbodeantor commented Apr 25, 2024

josh-chamberlain commented Apr 26, 2024

maxachis commented Apr 26, 2024 • edited Loading

josh-chamberlain commented Apr 29, 2024

mbodeantor commented Apr 29, 2024

maxachis commented Apr 29, 2024

joshuagraber commented Jun 10, 2024 • edited Loading

josh-chamberlain commented Jun 10, 2024

joshuagraber commented Jun 11, 2024

josh-chamberlain commented Jun 12, 2024

maxachis commented Jun 12, 2024

joshuagraber commented Jun 13, 2024

josh-chamberlain commented Jun 13, 2024

joshuagraber commented Jun 13, 2024

maxachis commented Jul 2, 2024

maxachis commented Jul 2, 2024

josh-chamberlain commented Jul 2, 2024

maxachis commented Jul 6, 2024

joshuagraber commented Jul 8, 2024

josh-chamberlain commented Jul 8, 2024

josh-chamberlain commented Apr 19, 2024 •

edited

Loading

maxachis commented Apr 25, 2024 •

edited

Loading

maxachis commented Apr 26, 2024 •

edited

Loading

joshuagraber commented Jun 10, 2024 •

edited

Loading