Description
A web page containing QUnit.test('add', shows up in search result snippets as QUnit . test ( ' add ' , assert. Note the unexpected spaces around virtually every symbol. I believe this is most likely a side effect of the characters in question being wrapped in <span> elements in the source code. However, there are no spaces in the source code around most of these characters.
Steps to reproduce
I'm evaluating Typesense for use on https://api.jquery.com, https://qunitjs.com and other OpenJS sites. I've used typesense/docsearch-scraper via GitHub Actions, and docsearch is configured with "text": "p,li,tr,pre" among the selectors. The above code is part of a regular paragraph or PRE tag.
source: typense.yaml
source: /docsearch.config.json
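To illustrate the suspected cause, here is a minimal sketch using only the Python standard library. It is not the scraper's actual code; it assumes the extractor collects each text node separately and joins them with a space. The HIGHLIGHTED markup and the names TextCollector and extract_text are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called once per run of text between tags.
        self.chunks.append(data)


def extract_text(html, separator):
    """Join all text nodes of `html` using `separator`."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return separator.join(collector.chunks)


# Hypothetical syntax-highlighted markup: every token sits in its own <span>,
# with no whitespace between the spans.
HIGHLIGHTED = (
    "<pre><span>QUnit</span><span>.</span><span>test</span>"
    "<span>(</span><span>'add'</span><span>,</span></pre>"
)

print(extract_text(HIGHLIGHTED, " "))  # QUnit . test ( 'add' ,
print(extract_text(HIGHLIGHTED, ""))   # QUnit.test('add',
```

Joining with a space reproduces the kind of output seen in the snippets, while joining with an empty string keeps the text as it appears on the page.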
Expected Behavior
Inline elements like <span>, <em>, <code>, and <strong> should not cause additional spaces to be injected into the indexed text. It is not uncommon for prose to emphasize, underline, strike, superscript, or otherwise wrap only part of a word in markup. This is probably most common in content with syntax-highlighted source code.
Metadata
Typesense Version: 0.24.1
OS: Debian 11 Bullseye