How to Create a Custom Content Parser

Admittedly, creating a custom content parser is a bit cumbersome. However, once it's configured, it shouldn't need to change often.

Inspect the DOM

Right-click the part of the page containing the content you want to parse and select "Inspect". This opens the "Elements" tab of the Chrome Developer Tools with that element selected.

Find the Containing `<div>`

An effective approach is to find the innermost (or outermost, depending on your use case) `<div>` that contains the selected element and has a unique id or class distinguishing it from other elements on the page.

Note: The container doesn't have to be a `<div>`. Any tag can serve as the container (for example, `<article>`, `<main>`, or `<section>`).
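A quick way to confirm that a candidate container is distinguishable is to count its matches in the DevTools console, e.g. `document.querySelectorAll("#elementId").length`. A count of exactly 1 means the selector identifies a single container. The TypeScript sketch below wraps that check in a hypothetical helper, `isDistinctSelector`, typed against a minimal `QueryRoot` interface (also an illustrative name, not part of Lumos) so it can run outside a browser. In the browser you would pass `document`.

```typescript
// Minimal structural type. The browser's `document` (or any Element) satisfies it.
// `QueryRoot` is a hypothetical name used only for this sketch.
interface QueryRoot {
  querySelectorAll(selectors: string): ArrayLike<unknown>;
}

// Hypothetical helper: returns true when `selector` matches exactly one
// element under `root`, i.e. it distinguishes a single container.
function isDistinctSelector(root: QueryRoot, selector: string): boolean {
  return root.querySelectorAll(selector).length === 1;
}

// Usage in a browser context:
// isDistinctSelector(document, "#elementId");
```

If the count is greater than 1, try a more specific id or class, or combine the container's tag with its class (e.g. `div.className`).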

Determine Selector Query

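Determine the selector queries and add them to the existing content parser configuration on the Lumos Options page. Queries under `selectors` correspond to querySelector(), which returns the first matching element, and queries under `selectorsAll` correspond to querySelectorAll(), which returns all matching elements. See the documentation for querySelector() and querySelectorAll() for the full range of supported selectors and more examples.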

Example queries:

- Select element by tag name: `tagName`
- Select element by id (leading `#`): `#elementId`
- Select element by class name (leading `.`): `.className`

Both querySelector() and querySelectorAll() also accept complex selectors, such as combinators (`article > p`), and negation with `:not()` (`p:not(.ad)`).
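Complex selectors are easy to mistype, and a malformed one makes querySelector()/querySelectorAll() throw a `SyntaxError` DOMException instead of returning results. The sketch below checks a list of candidate selectors before you paste them into the Options page, reporting whether each one parses and how many elements it matches. `checkSelectors`, `SelectorReport`, and `QueryRoot` are hypothetical names for illustration only.

```typescript
// Minimal structural type satisfied by `document`. Hypothetical name for this sketch.
interface QueryRoot {
  querySelectorAll(selectors: string): ArrayLike<unknown>;
}

interface SelectorReport {
  selector: string;
  valid: boolean;  // false if the selector string failed to parse
  matches: number; // number of matching elements (0 when invalid)
}

// Runs each candidate selector and reports how many elements it matches.
// In a browser, querySelectorAll() throws a "SyntaxError" DOMException for
// malformed selectors (e.g. an unclosed ":not("), which is caught here.
function checkSelectors(root: QueryRoot, candidates: string[]): SelectorReport[] {
  return candidates.map((selector) => {
    try {
      return { selector, valid: true, matches: root.querySelectorAll(selector).length };
    } catch {
      return { selector, valid: false, matches: 0 };
    }
  });
}

// Usage in a browser context, mixing a combinator, negation, and a typo:
// checkSelectors(document, ["article > p:not(.ad)", "div:not("]);
```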

Example config for a single domain:

{
  "domain.com": {
    "chunkSize": 500,
    "chunkOverlap": 0,
    "selectors": [
      "tagName",
      "#elementId"
    ],
    "selectorsAll": [
      ".className"
    ]
  }
}
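To make the roles of `selectors` and `selectorsAll` concrete, here is a minimal sketch of how a config entry like the one above could be applied to a page. It is an illustration under stated assumptions, not Lumos's actual implementation: it assumes the config is looked up by exact hostname, that each `selectors` query contributes only its first match while each `selectorsAll` query contributes every match, and it ignores `chunkSize` and `chunkOverlap`. `extractContent`, `DomainParserConfig`, `HasText`, and `QueryRoot` are hypothetical names.

```typescript
// Shape of one domain entry, mirroring the keys in the example config.
interface DomainParserConfig {
  chunkSize: number;    // not used in this sketch
  chunkOverlap: number; // not used in this sketch
  selectors: string[];
  selectorsAll: string[];
}

type ContentParserConfig = Record<string, DomainParserConfig>;

// Minimal structural types satisfied by `document` and DOM elements.
interface HasText {
  textContent: string | null;
}
interface QueryRoot {
  querySelector(selectors: string): HasText | null;
  querySelectorAll(selectors: string): ArrayLike<HasText>;
}

// Hypothetical helper: collects page text for `hostname` using its config entry.
// Assumption: `selectors` -> first match only, `selectorsAll` -> every match.
function extractContent(root: QueryRoot, hostname: string, config: ContentParserConfig): string {
  const entry = config[hostname];
  if (!entry) return ""; // no custom parser configured for this domain

  const parts: string[] = [];
  for (const selector of entry.selectors) {
    const el = root.querySelector(selector);
    if (el?.textContent) parts.push(el.textContent.trim());
  }
  for (const selector of entry.selectorsAll) {
    const list = root.querySelectorAll(selector);
    for (let i = 0; i < list.length; i++) {
      const text = list[i].textContent;
      if (text) parts.push(text.trim());
    }
  }
  return parts.join("\n");
}
```

Keeping the two lists separate lets one config grab a single unique container (such as a main article) while also collecting repeated elements (such as individual comments).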
