Single preview or draft rule publishes #127

Open
pbartusch opened this issue Mar 1, 2023 · 9 comments

@pbartusch
Collaborator

pbartusch commented Mar 1, 2023

Search managers have no easy way to preview the impact of single rules they are currently working on. SMUI only enables the deployment of whole rules.txt files that belong to a rules collection. This is disadvantageous, e.g.:

  • Rule deployments might take too long, resulting in unacceptable wait times for the managers.
  • Rule deployments in the middle of their evolution are risky, because they might impact the end user search experience in unexpected ways (when done on LIVE). Also, a rule deployment triggered by one search manager always deploys all rules maintained in the system at that time, including rules from other managers.

Note: Rules and spellings are treated the same in the context of this improvement proposal; for simplicity, "rules.txt" is used to include spellings as well.

Deployments to optionally configured PRELIVE instances might be as time consuming as LIVE deployments. Also, PRELIVE instances often do not hold data that is in sync with their LIVE equivalent (product data especially).

Goal: Enable publishing single rules in a draft status - robust & fast!

The (querqy) search tech stack:

Assume the following setup:

A (1) search frontend processing the user (URL) query,
and a (2) search engine (like Solr or Elasticsearch) responding to the query, with a querqy Rewriter-Chain that is connected to the SMUI rules.txt deployment process.

A typical search query might enter the system like this:

https://example.com/search?query=laptop

The corresponding querqy Rewriter-Chain might enhance the user query like this:

common --> hyphen --> replace --> wordbreak

  • common and replace, in this example, receive rules.txt input from SMUI via the described deployment process.
  • hyphen and wordbreak work without SMUI input.
  • Also, the SMUI deployment process might add additional business logic, like semantic checks of the resulting rules.txt files, regression tests with simulated searches, or other kinds of plausibility verification.
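As a minimal sketch, the end user request forwarded by the frontend to the engine could look like the following. This assumes the querqy query parser for Solr and a per-request rewriter selection parameter (querqy.rewriters); the collection name and the availability/name of the parameter depend on the concrete setup and querqy version:

    /solr/products/select?defType=querqy&q=laptop&querqy.rewriters=common,hyphen,replace,wordbreak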

Single preview or draft rules should not be part of the above productive Rewriter-Chain for the search end user. On the other hand, their impact shall be previewable in the LIVE context by the search managers working on these rules.

Acceptance Criteria:

  • Search managers instantly see the impact of the rules they are editing in the LIVE context.
  • The status of rules (draft vs. LIVE) is transparent to the search managers.
  • Individual search managers might work on different rule drafts at the same time.
  • Entitled search managers might trigger a deployment of rules.txt files at any time, making all finalized draft rules of all search managers accessible to the search end users.
  • To prevent search manager A from accidentally publishing an unfinished rule draft of B, the draft status of a rule needs to be explicitly closed; only then will the rule be considered by SMUI's deployment process.

Side criterion: The final rules.txt deployment, which has the actual impact on the search end user experience, should still be testable as suggested above (plausibility/regression tests).

Note: There is also the possibility of two search managers concurrently working on the same rule draft.

Introducing drafts to the (querqy) search tech stack:

Challenge / search manager UX hypothesis: While using the preview, search managers want to experience the impact of all rules that end users experience, plus the additional flavor of their draft rules. More specifically, this means:

  • In general: All Rewriters applying to end users, also apply to preview searches for managers.
  • Exception: All draft rules specific to the search manager apply additionally, while the corresponding end user rule (which may be outdated from the manager's perspective) is suppressed.

To ensure a clean separation of (single) draft rules vs. the whole set of productive rules.txt files, draft Rewriters are suggested, e.g.:

common_drafts --> common --> hyphen --> replace_drafts --> replace --> wordbreak

This way, the integrity of rules.txt deployments for search end users can be ensured with every (possibly time consuming) process necessary, while technically it only needs to be ensured that draft Rewriters never apply to requests from end users (as opposed to requests from search managers). This can also be verified by test automation in the (querqy) search tech stack.
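As a sketch, the frontend could then select the chain per request, again assuming a per-request rewriter selection mechanism such as querqy.rewriters (the parameter name is an assumption and may differ per setup/version):

    End user request:       querqy.rewriters=common,hyphen,replace,wordbreak
    Search manager request: querqy.rewriters=common_drafts,common,hyphen,replace_drafts,replace,wordbreak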

Request flow for the search end user:
--> search frontend: https://example.com/search?query=laptop
--> The common Rewriter applies the following rule:

laptop =>
    SYNONYM: notebook

Request flow for the search manager:
--> search frontend: https://example.com/search?query=laptop&smui.user=paul_bartusch
--> The common_drafts Rewriter applies the following rule:

laptop =>
    SYNONYM: notebook
    @{
        "smui.user" : "paul_bartusch"
    }@

The Rewriter containing the draft rules can also be created dynamically.

The smui.user URL parameter can be part of the preview_link as introduced with v3.15.

Security note: Making draft rules accessible via the search frontend by adding a URL parameter containing the clear text username of the search manager can easily be replicated (and spoofed) by an outside party. This either has to be accepted or prevented by a more advanced security architecture (e.g. VPN access or other encryption/signing techniques).
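One purely illustrative way to harden the parameter without a VPN would be to let SMUI sign it and let the search frontend verify the signature before activating the draft Rewriters. This is only a sketch; the parameter names (smui.user, smui.sig) and the shared secret handling are assumptions, not an existing SMUI feature:

    import hmac
    import hashlib

    SHARED_SECRET = b"rotate-me-regularly"  # shared between SMUI and the search frontend (assumption)

    def sign_user(user: str) -> str:
        # SMUI side: append e.g. &smui.user=paul_bartusch&smui.sig=<signature> to the preview_link
        return hmac.new(SHARED_SECRET, user.encode("utf-8"), hashlib.sha256).hexdigest()

    def is_valid_preview_request(user: str, sig: str) -> bool:
        # Frontend side: only add the *_drafts Rewriters to the chain if the signature matches
        return hmac.compare_digest(sign_user(user), sig)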

(?) Discussion:

  • Do you see an alternative frontend / (querqy) search engine and Rewriter setup that also fulfills the requirements expressed in this improvement proposal? Some thoughts: The smui.user querqy annotations could also be part of the main common and replace rules.txt input, as second versions of the inputs (making dedicated draft Rewriters obsolete; see the sketch after this list). But this way, the whole rules.txt deployment process, including its time consuming steps, would still be part of every draft deployment.
  • Do you see a way to suppress the laptop rules in the common Rewriter when a "laptop" search frontend query is triggered by the manager (instead of the end user)? The way it is suggested here seems to carry the risk of overlapping rule application (draft vs. LIVE).
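Sketch of the alternative from the first discussion point: the productive common rules.txt would carry a second, annotated version of the input. The draft content and the "status" property are made up for illustration; whether such rules could then be filtered per request by their properties (e.g. via querqy's rule selection criteria, if available in the deployed version) would need to be verified:

    laptop =>
        SYNONYM: notebook

    laptop =>
        SYNONYM: notebook 15 inch
        @{
            "smui.user" : "paul_bartusch",
            "status" : "draft"
        }@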

(!) Next: Concept evaluation of the search manager experience for (single) draft preview rules in SMUI.

@renekrie
Collaborator

renekrie commented Mar 1, 2023

I wonder how much sense it makes to try and deploy single rules. A query might trigger more than one rule (e.g. if you have a rule for laptop and a rule for sleeve and want to optimise the query laptop sleeve). Which rules would you then deploy?

  • I'm trying to understand the root cause of this feature request. Is it that a publish to prelive would be too slow?! (How can that happen?)

@dobestler
Contributor

We are publishing single rules (or smaller rules collections) already because we want Search Managers to have a quick way of testing and maybe tuning a rule before it is deployed to all users. We do not use any testing/staging environment because we don't want to invest too much into having a production like environment regarding data. So we only use production.

Valid point though regarding multiple rules that might be triggered. For now we are generously ignoring it and have not suffered from it (yet). Mainly because our rules collections are not that large and usually a single rule is triggered, which is the new one.
I believe this problem could easily be addressed, e.g. by deploying all rules as preview together with the new one to ensure the same context. If performance is an issue, such a "test draft rule within all rules context" deploy could be an optional step, whereas the default is to deploy only the preview/draft rule.

@epugh
Contributor

epugh commented Mar 2, 2023

I've noticed in my various projects with SMUI that there is a continuum of folks, from the "Let's treat Rules like code and put them through a fuller testing cycle" camp at one end to the "This is an activity that I do all the time in production and it's okay" camp at the other ;-).

I've been wondering if we could get to a place where we have real time analytics/alerting to let us know when we make changes in production that really change things, so that we can have more confidence in creating rules in production.

@dobestler
Contributor

I think the information should all be there to do something like that. We make use of the querqy info logging feature to segment queries that triggered a rule to monitor their metrics.

Just to clarify because my comment was not clear enough: We are not directly deploying rules to all users but rather deploy them to a separate "draft rewriter" that can be applied e.g. using URL Params/Headers. Once the SERP looks good to the Search Manager, the rules are deployed to the rewriter serving all user queries.

@pbartusch
Collaborator Author

Hi, thanks so much for your input. I see some common themes:

@renekrie :

A query might trigger more than one rule

That is what I meant here with:

Challenge / search manager UX hypothesis: While using the preview, search managers want to experience the impact of all rules end users experience, but with the additional flavor of draft rules.

@dobestler :

I almost assumed you are ignoring this for now. I would be interested in feedback/input from your SMs as your rule collection evolves and grows (other installations might already have a larger "rule stock", but I don't know for sure ;-)).

@epugh :

Thanks for sharing this experience / perspective.

Rules should not be treated like code, but rather like payload or editorial data - I agree. On the other hand, this (rule) data has the power to massively influence the search algorithm, and thereby the experience. IMHO automatic regression tests are not necessarily constrained to code changes, but apply to all aspects of product quality. E.g., for an ML ranking model you probably also want to introduce automated checks, plausibility tests (top ranked products on top queries, etc.) and the like before that data artefact goes into production, too.

Also, I am a big fan of more data driven search management with monitoring - maybe even alerting - aspects, but I think this is a different story (though I like the impulse!).

@pbartusch
Collaborator Author

I think the information should all be there to do something like that. We make use of the querqy info logging feature to segment queries that triggered a rule to monitor their metrics.

That is possible (at least with Solr). SMUI log statements have been specifically designed to connect the triggering of rules (and which rules) to the analytics system on the frontend (like Google Analytics).

There are super interesting analyses and business-relevant insights you can draw from such a setup, but I also think we should discuss this in a separate topic (as it is only remotely connected).

@pbartusch
Collaborator Author

Side note regarding this topic:

Some time ago there was a discussion around #86 , also incl. you, @dobestler. Was the idea for this PR also motivated by the idea of draft rules? Or was this about tenants?

@pbartusch
Collaborator Author

Let me try to summarize, so that we eventually get to a nice concept:

The primary motivation for this feature is to enable Search Managers to immediately see the impact of rules while they are drafting (editing) them in a realistic (LIVE) setup (product data especially).

Performance is a concern of this feature request - not its root cause. Also, it is required that drafting a rule has no immediate search end user impact, while the manager still sees its impact like an end user would (and this should be possible instantly).

What is understood so far?

The performance of the whole rule pipeline, from SMUI (the Search Manager at the frontend) to the querqy Rewriters that will eventually enhance/modify the query, depends on various "moving parts". I see:

  • SMUI rendering (probably super fast, but I have never measured it)
  • SMUI "handover" & rules.txt deployment (highly variable, especially in custom setups)
  • querqy rules.txt “takeover”
    (taking request performance out of the equation for now)

I assume your point, @renekrie (and partially also yours, @epugh), is that the performance of the querqy takeover - even of many and large rules.txt files - should not reach a humanly perceptible region.

Let’s assume an extreme case:

  • SMUI no longer has a Save button,
  • Saving of rules happens instantly while they are edited (on exit of the corresponding input fields),
  • Every rule save also triggers a full rules.txt export for the whole rules collection. This export represents a draft status of the rules.txt files, but it still contains all the rules.

I also believe that the rules.txt deployment, after the SMUI handover, might be the performance bottleneck - especially in the case of a full rules.txt export. Custom deployment implementations might have added additional business logic, like regression testing, to the deployment, which also slows down the process.

The search manager should see the impact of the following rules:

  • All rules as they are currently live (maybe one for sleeve),
  • All manager specific draft rules that have not been published yet (maybe laptop). They might exist on LIVE in a pre-draft state for the end user - or not exist at all. Either way, the manager sees the impact of the draft version.

Now, I see 3 alternatives to reflect this technically with the querqy Rewriter setup:

(1) "Straightforward solution": The Rewriter setup is not touched. Additional smui.user (or equivalent) annotations of rules ensure that the frontend can select or filter rules for a specific query, depending on whether it comes from a search manager or from an end user.
(+) Pro: "Straightforward solution" :-)
(-) Con: Constant deployment of draft rules to the production Rewriter for end users while search managers maintain rule drafts. Performance for the search managers must be sufficient (this becomes problematic when additional business logic is involved during deployment; whether that is legitimate is disputable in itself).

(2) *_draft Rewriters containing all rules: Search managers interact with draft Rewriters, while end users interact with their productive equivalent. The rules.txt files are mostly the same for both Rewriters, but differ in the draft rules.
(+) Pro: Dedicated Rewriters enable dedicated deployment processes with different reliability requirements.
(-) Con: Increased complexity. Different deployment “tracks” for draft vs. productive rules (“fast lane” vs. regular track). Fast enough?

(3) *_draft Rewriters containing only draft rules.
(+) Pro: Clear technical separation of (often only single) draft vs. the whole set of productive rules. This also enables dedicated deployment processes.
(-) Con:

  • Increased complexity with different deployment “tracks”.
  • Either missing or potentially overlapping query rewritings, e.g. draft rules for laptop followed by production rules for laptop. How does querqy actually react in such cases (e.g. common_drafts applies "laptop" --> common applies "laptop")? See the sketch below.
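To illustrate the overlap risk (rule content is made up for illustration; how chained common rules Rewriters actually behave on the same input would need to be verified):

    # common_drafts (draft rule the manager is working on)
    laptop =>
        SYNONYM: notebook 15 inch

    # common (rule currently live for end users)
    laptop =>
        SYNONYM: notebook

A manager preview query for "laptop" passes through both Rewriters, so presumably both synonyms would be applied unless the live rule is suppressed.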

🤯

Let me know if you see more or other alternatives that I might have missed here. Please feel free to also share your thoughts on the alternatives and the Pros/Cons especially.

@dobestler
Contributor

Side note regarding this topic:

Some time ago there was a discussion around #86 , also incl. you, @dobestler. Was the idea for this PR also motivated by the idea of draft rules? Or was this about tenants?

It was about draft rules - the same goal that we can now achieve with the new Copy feature in an IMHO better way, for the time being.
