[10.x] Add Typesense engine #773

Merged
taylorotwell merged 37 commits into laravel:10.x on Jan 9, 2024

Conversation

jasonbosco
Contributor

@jasonbosco jasonbosco commented Oct 17, 2023

@taylorotwell @driesvints Following up from our conversation, this PR adds native support for Typesense (an open source alternative to Algolia) to Scout.

It follows a code structure similar to the Algolia engine to maintain consistency.

CC: @karakhanyans @arayiksmbatyan

P.S. This PR supersedes #772.


Edit: Adding documentation for the Typesense Scout engine, similar to the docs here

Typesense

Typesense is an open source, typo-tolerant search engine optimized for instant sub-50ms searches, while providing an intuitive developer experience.

When using the Typesense driver you will need to install the Typesense PHP SDK via the Composer package manager:

composer require typesense/typesense-php

Then, set the SCOUT_DRIVER environment variable as well as your Typesense host and API key credentials within your application's .env file:

SCOUT_DRIVER=typesense
TYPESENSE_API_KEY=xyz
TYPESENSE_HOST=localhost

You can also set the port, path, and protocol if your installation needs them:

TYPESENSE_PORT=8108
TYPESENSE_PATH=ts
TYPESENSE_PROTOCOL=https
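
A minimal sketch of how these four values might combine into a node's base URL. The helper name `typesenseNodeUrl` is hypothetical and is not part of Scout or typesense-php; trimming slashes around the path is an assumption of this sketch.

```php
<?php

/**
 * Build a base URL from a node definition (hypothetical helper).
 *
 * @param  array{host: string, port: string|int, path?: string, protocol?: string}  $node
 */
function typesenseNodeUrl(array $node): string
{
    $protocol = $node['protocol'] ?? 'http';
    $path = trim($node['path'] ?? '', '/');

    $url = sprintf('%s://%s:%s', $protocol, $node['host'], $node['port']);

    // Only append the path segment when one is configured.
    return $path === '' ? $url : $url.'/'.$path;
}

echo typesenseNodeUrl([
    'host' => 'localhost',
    'port' => '8108',
    'path' => 'ts',
    'protocol' => 'https',
]), PHP_EOL;
```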

For more information regarding Typesense, please consult the Typesense documentation.

toSearchableArray

To make your searchable data compatible with Typesense, cast your model's id to a string and created_at to an integer UNIX timestamp, as shown below.

<?php

namespace App;

use Illuminate\Database\Eloquent\Model;
use Laravel\Scout\Searchable;

class Todo extends Model
{
    use Searchable;
    
    /**
     * Get the indexable data array for the model.
     *
     * @return array
     */
    public function toSearchableArray()
    {
        return array_merge(
            $this->toArray(),
            [
                // Cast id to a string and created_at to an integer UNIX timestamp
                // to stay compatible with the Typesense collection schema below.
                'id' => (string) $this->id,
                'created_at' => $this->created_at->timestamp,
            ]
        );
    }
}
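
The same two casts can be sketched outside of Eloquent as a plain function, which makes them easy to check in isolation. `normalizeForTypesense` is a hypothetical name used only for this illustration; it assumes `created_at` is either a `DateTimeInterface` or a date string that `strtotime` understands.

```php
<?php

/**
 * Normalize a record for indexing (hypothetical helper, not part of Scout).
 */
function normalizeForTypesense(array $attributes): array
{
    $createdAt = $attributes['created_at'];

    return array_merge($attributes, [
        // Typesense document ids are strings.
        'id' => (string) $attributes['id'],
        // Store the creation date as an integer UNIX timestamp.
        'created_at' => $createdAt instanceof DateTimeInterface
            ? $createdAt->getTimestamp()
            : (int) strtotime((string) $createdAt),
    ]);
}

$document = normalizeForTypesense([
    'id' => 42,
    'title' => 'Buy milk',
    'created_at' => new DateTimeImmutable('2024-01-09 00:00:00 UTC'),
]);

var_dump($document['id'], $document['created_at']);
```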

Collection Schemas and Search Parameters

To set up Typesense collection schemas, find the typesense driver configuration in config/scout.php and modify the model-settings array as shown in the example below.

To change which attributes a model is searched by, set query_by under search-parameters in model-settings.

Note: query_by is a required parameter.

use App\Models\User;

    /*
    |--------------------------------------------------------------------------
    | Typesense Configuration
    |--------------------------------------------------------------------------
    |
    | Here you may configure your Typesense settings. Typesense is an open
    | source search engine with minimal configuration. Below, you can state
    | the host and key information for your own Typesense installation.
    |
    | See: https://typesense.org/docs/0.25.1/api/authentication.html
    |
    */

    'typesense' => [
        'client-settings' => [
            'api_key' => env('TYPESENSE_API_KEY', 'xyz'),
            'nodes' => [
                [
                    'host' => env('TYPESENSE_HOST', 'localhost'),
                    'port' => env('TYPESENSE_PORT', '8108'),
                    'path' => env('TYPESENSE_PATH', ''),
                    'protocol' => env('TYPESENSE_PROTOCOL', 'http'),
                ],
            ],
            'nearest_node' => [
                'host' => env('TYPESENSE_HOST', 'localhost'),
                'port' => env('TYPESENSE_PORT', '8108'),
                'path' => env('TYPESENSE_PATH', ''),
                'protocol' => env('TYPESENSE_PROTOCOL', 'http'),
            ],
            'connection_timeout_seconds' => env('TYPESENSE_CONNECTION_TIMEOUT_SECONDS', 2),
            'healthcheck_interval_seconds' => env('TYPESENSE_HEALTHCHECK_INTERVAL_SECONDS', 30),
            'num_retries' => env('TYPESENSE_NUM_RETRIES', 3),
            'retry_interval_seconds' => env('TYPESENSE_RETRY_INTERVAL_SECONDS', 1),
        ],
        'model-settings' => [
            User::class => [
                'collection-schema' => [
                    'fields' => [
                        [
                            'name' => 'name',
                            'type' => 'string',
                        ],
                        [
                            'name' => 'created_at',
                            'type' => 'int64',
                        ],
                        [
                            // When scout.soft_delete is set to true, this field is set to 1 if the record is deleted.
                            'name' => '__soft_deleted',
                            'type' => 'int32',
                            'optional' => true,
                        ],
                    ],
                    'default_sorting_field' => 'created_at',
                ],
                'search-parameters' => [
                    'query_by' => 'name', // required; each field listed must exist in the schema above
                ],
            ],
        ],
    ],
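
Since every field named in query_by is expected to be declared in the collection schema, a small pre-flight check can catch mismatches before indexing. This is a sketch only: `missingQueryByFields` is a hypothetical helper that Scout does not ship, and it does not account for wildcard or regex field names.

```php
<?php

/**
 * Return the query_by fields that are not declared in the collection schema.
 * Hypothetical helper for illustration only.
 */
function missingQueryByFields(array $modelSettings): array
{
    $declared = array_column($modelSettings['collection-schema']['fields'] ?? [], 'name');

    // query_by is a comma-separated string such as 'name, title'.
    $queryBy = $modelSettings['search-parameters']['query_by'] ?? '';
    $requested = array_filter(array_map('trim', explode(',', $queryBy)), 'strlen');

    return array_values(array_diff($requested, $declared));
}

$settings = [
    'collection-schema' => [
        'fields' => [
            ['name' => 'name', 'type' => 'string'],
            ['name' => 'created_at', 'type' => 'int64'],
        ],
    ],
    'search-parameters' => ['query_by' => 'name, title'],
];

print_r(missingQueryByFields($settings));
```

With the settings above, the helper reports `title`, because only `name` and `created_at` are declared.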

Search Parameters On The Fly

Typesense supports the ability to set almost all search parameters on the fly, if needed.

To modify search parameters on the fly, you can use the setSearchParameters method of the Typesense engine.

use App\Models\Todo;

Todo::search('Do groceries')->setSearchParameters(['query_by' => 'title, description'])->get();

Here are some example search parameters:

'highlight_start_tag' => '<mark>',
'highlight_end_tag' => '</mark>',
'snippet_threshold' => 30,
'exhaustive_search' => false,
'use_cache' => false,
'cache_ttl' => 60,
'prioritize_exact_match' => true,
'enable_overrides' => true,
'highlight_affix_num_tokens' => 4,

The full list of search parameters and their descriptions can be found in the Typesense documentation.
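
One way to picture the relationship between the configured search-parameters and the per-query values is a simple merge in which the per-query values win. This merge order is an assumption made for illustration, not a description of the engine's internals, and `resolveSearchParameters` is a hypothetical name.

```php
<?php

/**
 * Merge configured defaults with per-query overrides (hypothetical helper).
 */
function resolveSearchParameters(array $configured, array $onTheFly): array
{
    // Later arrays win in array_merge, so per-query values take precedence.
    $parameters = array_merge($configured, $onTheFly);

    if (empty($parameters['query_by'])) {
        throw new InvalidArgumentException('query_by is a required parameter.');
    }

    return $parameters;
}

print_r(resolveSearchParameters(
    ['query_by' => 'name'],
    ['query_by' => 'title, description', 'use_cache' => false]
));
```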

Member

@driesvints driesvints left a comment


Hi @jasonbosco, thanks for your PR. There are still quite a few styling improvements needed here. Can you make sure the code style of this PR adheres to the rest of the library? When done we can mark it as ready for Taylor to review. Thanks

config/scout.php Outdated Show resolved Hide resolved
config/scout.php Outdated Show resolved Hide resolved
src/EngineManager.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Outdated Show resolved Hide resolved
src/Engines/TypesenseEngine.php Show resolved Hide resolved
@driesvints driesvints marked this pull request as draft October 17, 2023 14:28
@jasonbosco
Contributor Author

@driesvints Thank you for the quick review!

@karakhanyans has fixed the linting issues and the continuous-integration/styleci/pr check now passes. Could you take a look now?

@jasonbosco jasonbosco marked this pull request as ready for review October 17, 2023 15:56
@jasonbosco jasonbosco requested a review from driesvints October 17, 2023 15:57
@driesvints
Member

@jasonbosco a lot of the DocBlocks are still missing their first line comment. Could you add those?

@driesvints driesvints requested review from taylorotwell and removed request for driesvints October 19, 2023 12:33
@driesvints
Member

Thanks all 👍

@taylorotwell
Member

taylorotwell commented Oct 26, 2023

@jasonbosco You're calling a few model methods I've never heard of:

getCollectionSchema
typesenseQueryBy

Can you explain both of these methods? What are they supposed to do? Examples? More detail in general please.

I generally need full documentation for how to use this from start to finish. Please mark as ready for review when posted.

@taylorotwell taylorotwell marked this pull request as draft October 26, 2023 21:32
@jasonbosco
Contributor Author

getCollectionSchema
typesenseQueryBy

Can you explain both of these methods? What are they supposed to do? Examples? More detail in general please.

@taylorotwell In the current engine we have, these are methods that needed to be defined in the model. But @karakhanyans and I just discussed this, and we're going to take this opportunity to simplify this interface, and move it into the engine configuration, so these methods won't be required any more. Will share detailed docs once we make these changes.

@taylorotwell
Member

Thanks

@jasonbosco
Contributor Author

@taylorotwell -

@karakhanyans has fixed the root cause of the issue in typesense-php. The latest code should now pull the correct version of typesense-php and fix the issue you saw.

@jasonbosco jasonbosco marked this pull request as ready for review December 5, 2023 05:54
@taylorotwell
Member

taylorotwell commented Dec 13, 2023

@jasonbosco how do I handle soft deleted records? What do I add to my schema? Furthermore, if I'm using Typesense how can I delete all my collections for a fresh start?

Please mark as ready for review when questions are answered.

@taylorotwell taylorotwell marked this pull request as draft December 13, 2023 20:08
@jasonbosco
Contributor Author

@taylorotwell

how do I handle soft deleted records? What do I add to my schema?

Soft deletes are handled here.

Essentially if soft_delete is set to true in the Scout config, then the engine syncs the soft delete metadata generated by Scout into Typesense.

So you'd want to add a field called __soft_deleted to the Typesense Collection schema to handle soft deletes (see comment below):

...

'typesense' => [
       ...
        'model-settings' => [
            User::class => [
                'collection-schema' => [
                    'fields' => [
                        [
                            'name' => 'name',
                            'type' => 'string',
                        ],
                        [
                            'name' => 'created_at',
                            'type' => 'int64',
                        ],
                        // Add this field for soft deletes.
                        [
                            'name' => '__soft_deleted',
                            'type' => 'int32',
                            'optional' => true,
                        ],
                    ],
                    'default_sorting_field' => 'created_at',
                ],
                'search-parameters' => [
                    'query_by' => 'name',
                ],
            ],
        ],
    ],

I'll update the docs in the description with this information shortly.

Furthermore, if I'm using Typesense how can I delete all my collections for a fresh start?

If you're running on dev, to get a clean slate, you'd want to stop the Typesense server, delete the contents of the --data-dir which you specified when starting the process, and then start the process again.

In a production environment, you'd want to call DELETE /collections/<name> on each collection to drop each collection. Docs
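
A rough sketch of that production approach using PHP's curl extension: list the collections with GET /collections, then issue DELETE /collections/<name> for each one. The function names are hypothetical, the base URL and API key are placeholders, and error handling is left out.

```php
<?php

/**
 * Send a request to a Typesense node and decode the JSON response.
 * Hypothetical helper; requires the curl extension.
 */
function typesenseRequest(string $method, string $url, string $apiKey): array
{
    $handle = curl_init($url);
    curl_setopt_array($handle, [
        CURLOPT_CUSTOMREQUEST => $method,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => ['X-TYPESENSE-API-KEY: '.$apiKey],
    ]);
    $body = curl_exec($handle);

    return json_decode($body ?: '[]', true) ?? [];
}

/**
 * Build the URL of a single collection.
 */
function collectionUrl(string $baseUrl, string $name): string
{
    return rtrim($baseUrl, '/').'/collections/'.rawurlencode($name);
}

/**
 * Drop every collection on the node. This is destructive.
 */
function dropAllCollections(string $baseUrl, string $apiKey): void
{
    $collections = typesenseRequest('GET', rtrim($baseUrl, '/').'/collections', $apiKey);

    foreach ($collections as $collection) {
        typesenseRequest('DELETE', collectionUrl($baseUrl, $collection['name']), $apiKey);
    }
}

// Example call (placeholders): dropAllCollections('http://localhost:8108', 'xyz');
```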

@jasonbosco jasonbosco marked this pull request as ready for review December 13, 2023 20:36
@taylorotwell
Member

@jasonbosco another bug related to soft delete.

If I do this:

return User::search('Delpha')->withTrashed()->get();

I get a result (as expected because Delpha is trashed).

But if I do this:

return User::search('Delpha')->onlyTrashed()->get();

I don't get any results. I would expect to still get my result.
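
For reference, the expected behavior can be sketched as a mapping from Scout's where constraints to a Typesense filter_by string. This is an illustration under assumptions, not the engine's actual code: it assumes onlyTrashed constrains `__soft_deleted` to 1, withTrashed leaves it unconstrained, and `buildFilterBy` is a hypothetical name.

```php
<?php

/**
 * Turn simple equality constraints into a Typesense filter_by string
 * (hypothetical helper for illustration).
 */
function buildFilterBy(array $wheres): string
{
    $clauses = [];

    foreach ($wheres as $field => $value) {
        // Render booleans as true/false; everything else is cast to string.
        $rendered = is_bool($value) ? var_export($value, true) : (string) $value;
        $clauses[] = $field.':='.$rendered;
    }

    return implode(' && ', $clauses);
}

echo buildFilterBy(['__soft_deleted' => 1]), PHP_EOL; // onlyTrashed()
echo buildFilterBy([]), PHP_EOL;                      // withTrashed(): no soft delete filter
```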

@taylorotwell
Member

@jasonbosco drafting this again pending response on above.

@taylorotwell taylorotwell marked this pull request as draft December 23, 2023 15:34
@jasonbosco
Contributor Author

jasonbosco commented Jan 5, 2024

@taylorotwell - could you give it a shot now?

@karakhanyans has fixed the soft delete issue and also written a test for it.

@jasonbosco jasonbosco marked this pull request as ready for review January 5, 2024 22:00
@taylorotwell
Member

@jasonbosco @karakhanyans

OK - thanks. Another question. Let's imagine I have a schema for my user model. I later add another column to the database and want to add it to my schema. I update my Scout configuration file's Typesense schema configuration... then what? I assume the schema isn't automatically updated in Typesense? What is the user story here?

@jasonbosco
Contributor Author

@taylorotwell

I update my Scout configuration file's Typesense schema configuration... then what? I assume the schema isn't automatically updated in Typesense?

That's correct. Schema changes need to be done in Typesense directly using the alter schema endpoints.
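
A sketch of how the body for such an alter request (PATCH /collections/<name>) could be derived by comparing the fields currently in Typesense with the fields in the updated Scout configuration. `buildAlterPayload` is a hypothetical helper, not part of Scout; it assumes a changed field is handled by dropping it and adding it again.

```php
<?php

/**
 * Build an alter-schema payload from the current and desired field lists
 * (hypothetical helper for illustration).
 */
function buildAlterPayload(array $currentFields, array $desiredFields): array
{
    // Index both lists by field name for easy comparison.
    $current = array_column($currentFields, null, 'name');
    $desired = array_column($desiredFields, null, 'name');
    $changes = [];

    foreach ($current as $name => $field) {
        // Drop fields that were removed or whose definition changed.
        if (! isset($desired[$name]) || $desired[$name] != $field) {
            $changes[] = ['name' => $name, 'drop' => true];
        }
    }

    foreach ($desired as $name => $field) {
        // Add fields that are new or whose definition changed.
        if (! isset($current[$name]) || $current[$name] != $field) {
            $changes[] = $field;
        }
    }

    return ['fields' => $changes];
}

$current = [
    ['name' => 'name', 'type' => 'string'],
    ['name' => 'created_at', 'type' => 'int64'],
];
$desired = array_merge($current, [['name' => 'email', 'type' => 'string']]);

echo json_encode(buildAlterPayload($current, $desired)), PHP_EOL;
```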

Alternatively, there's also an auto-schema detection mode in Typesense, where users can define field names using regex in the collection schema, so at least new field additions matching a pattern do not need schema alterations, but will be auto-added by Typesense when a document is indexed with that field name pattern.
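
As a config fragment, auto-schema detection might look like the following inside a model's collection-schema. The `_facet` suffix is only an example pattern chosen for this sketch; the catch-all variant is commented out because it can index fields that are never searched.

```php
'collection-schema' => [
    'fields' => [
        ['name' => 'name', 'type' => 'string'],
        ['name' => 'created_at', 'type' => 'int64'],
        // Example pattern: any field whose name ends in "_facet" is detected automatically.
        ['name' => '.*_facet', 'type' => 'auto', 'facet' => true],
        // Catch-all alternative; may index fields you never search on.
        // ['name' => '.*', 'type' => 'auto'],
    ],
    'default_sorting_field' => 'created_at',
],
```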

We can probably add a note to the configuration section in Scout saying that the schema defined here is only used when the collection is created the first time to make this clear?

@taylorotwell
Member

@jasonbosco would it be reasonable to make the default config in Scout reflect auto-schema detection?

@jasonbosco
Contributor Author

jasonbosco commented Jan 8, 2024

would it be reasonable to make the default config in Scout reflect auto-schema detection?

@taylorotwell Setting that as the default could lead to unnecessary fields being indexed and taking up RAM without the user realizing. For example, I've commonly seen URL fields getting indexed accidentally, when typically they are only used for display and not for full-text search.

Maybe we can call out auto-schema detection as a commented-out field in the example schema, with a link to the docs, so users are aware of its existence?

@taylorotwell
Member

@jasonbosco is there any kind of admin dashboard or something where users can make schema changes? It just feels cumbersome to have to manually craft HTTP calls to update the schema.

@jasonbosco
Contributor Author

jasonbosco commented Jan 9, 2024

@taylorotwell

The dashboard that the core Typesense team maintains is available as part of Typesense Cloud. Happy to set you up with a free account. If you want to sign up and let me know, I can provision a free cluster for you to use.

There's a Postman collection available here.

There's a community-maintained dashboard here, but it looks like it doesn't support the alter schema endpoint yet.

@taylorotwell taylorotwell merged commit 6611bd7 into laravel:10.x Jan 9, 2024
11 checks passed
@taylorotwell
Member

Thanks 👍

5 participants