[KB] Add documentation packaging workflow #193473

Closed
pgayvallet opened this issue Sep 19, 2024 · 2 comments · Fixed by #194379
Labels
Feature:AI Product Docs Product Documentation for AI workflows Team:AI Infra AppEx AI Infrastructure Team

Comments

@pgayvallet
Contributor

pgayvallet commented Sep 19, 2024

For #192031, we need a CI task or workflow that would:

  • Retrieve the subset of documentation articles we are interested in from the innovation team's cluster
  • Generate embeddings for them (unless we decide to reuse their embeddings)
  • Re-export the documents with their embeddings
  • Build the fleet package containing those documents and the corresponding index creation instructions

Embedding generation could be done by indexing the documents in some cluster with the fields we want embeddings for mapped as `semantic_text`, waiting for the embedding generation to complete, and then re-exporting the documents for the next steps.
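A minimal sketch of the indexing side of that step. The field names (`title`, `url`, `content_body`) and the inference endpoint id are hypothetical; only the `semantic_text` field type, its `inference_id` parameter and the NDJSON layout of the bulk API come from Elasticsearch itself. Actually sending the requests (e.g. with the Elasticsearch client) is left out.

```typescript
// Hypothetical document shape for the temporary indexing cluster.
interface ProductDoc {
  title: string;
  url: string;
  content_body: string;
}

// Hypothetical inference endpoint id; must match an endpoint that exists on the cluster.
const INFERENCE_ID = "my-elser-endpoint";

// Mapping for the temporary index: the semantic_text field gets its embeddings
// generated by the referenced inference endpoint when documents are indexed.
const mappings = {
  properties: {
    title: { type: "text" },
    url: { type: "keyword" },
    content_body: { type: "semantic_text", inference_id: INFERENCE_ID },
  },
};

// Build an NDJSON bulk body: one action line plus one source line per document,
// every line (including the last) terminated by a newline.
function toBulkBody(index: string, docs: ProductDoc[]): string {
  return docs
    .flatMap((doc) => [JSON.stringify({ index: { _index: index } }), JSON.stringify(doc)])
    .map((line) => line + "\n")
    .join("");
}
```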

The last step is the one that is unclear to me: I'm not sure at the moment how exactly fleet packages are built and added to the package registry / images.

@botelastic botelastic bot added the needs-team Issues missing a team label label Sep 19, 2024
@pgayvallet pgayvallet added the Team:AI Infra AppEx AI Infrastructure Team label Sep 19, 2024
@botelastic botelastic bot removed the needs-team Issues missing a team label label Sep 19, 2024
@pgayvallet
Contributor Author

I created a POC (#193847) to show what the documentation extraction script would be responsible for.

What the script does:

  • connect to the source cluster containing the documentation, and extract the subset that we are interested in
  • set up an index with the right mappings on a local cluster and index the documentation there, generating the embeddings
  • store the documents with embeddings on disk, in JSON format.
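The three steps can be sketched as a small pipeline. This is not the POC's code: the document shape, the sparse `token -> weight` embedding format and the `extract` / `embed` functions are hypothetical stand-ins for the queries against the source cluster and the local indexing cluster.

```typescript
// Hypothetical shape of a documentation article in the source cluster.
interface SourceDoc {
  product: string;
  version: string;
  title: string;
  body: string;
}

// Same article plus its embedding (assumed sparse: token -> weight).
interface DocWithEmbedding extends SourceDoc {
  embedding: Record<string, number>;
}

// Stand-ins for the two clusters: the source is queried, the local one embeds.
type ExtractFn = (product: string, version: string) => Promise<SourceDoc[]>;
type EmbedFn = (text: string) => Promise<Record<string, number>>;

async function buildArtifact(
  extract: ExtractFn,
  embed: EmbedFn,
  product: string,
  version: string,
): Promise<string> {
  // 1. Extract the subset of documentation we are interested in.
  const docs = await extract(product, version);
  // 2. Generate embeddings (in the POC, by indexing into a local cluster).
  const withEmbeddings: DocWithEmbedding[] = [];
  for (const doc of docs) {
    withEmbeddings.push({ ...doc, embedding: await embed(doc.body) });
  }
  // 3. Serialize as JSON, one document per line, ready to be written to disk.
  return withEmbeddings.map((d) => JSON.stringify(d)).join("\n");
}
```

Passing the cluster access in as functions keeps the pipeline testable without a running cluster.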

I tried it with the Kibana 8.15 documentation, which is ~600 files, and the zipped output is around 12 MB. I'd say most of that comes from the embeddings.

I also tested semantic-search-based documentation retrieval, which seems to work reasonably well, e.g.:

search term: 'How to enable TLS for Kibana?'

top 3 results:
- Encrypt TLS communications in Kibana | Kibana Guide [8.15] | Elastic
- Security production considerations | Kibana Guide [8.15] | Elastic
- Mutual TLS authentication between Kibana and Elasticsearch | Kibana Guide [8.15] | Elastic

See the `performSemanticSearch` function in the PR for details.
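For illustration only, a search like the one above could be expressed with Elasticsearch's `semantic` query, which targets `semantic_text` fields. This is not the PR's `performSemanticSearch` implementation; the index name, the `content_body` field and the returned `_source` fields are assumptions.

```typescript
// Request shape for a top-N semantic search over a semantic_text field.
interface SemanticSearchRequest {
  index: string;
  size: number;
  query: { semantic: { field: string; query: string } };
  _source: string[];
}

// Hypothetical helper: build the request for a free-text question.
function buildSemanticSearchRequest(
  index: string,
  field: string,
  searchTerm: string,
  topN = 3,
): SemanticSearchRequest {
  return {
    index,
    size: topN,
    query: { semantic: { field, query: searchTerm } },
    // Return only what is needed to display results, not the embeddings.
    _source: ["title", "url"],
  };
}
```

For example, `buildSemanticSearchRequest("kb-docs", "content_body", "How to enable TLS for Kibana?")` yields a request for the top 3 matches.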

@pgayvallet
Contributor Author

I think we will need to make progress on #193849 before going further with this issue, as we need more clarity on the exact format of our "KB packages" and their documents.

pgayvallet added a commit that referenced this issue Oct 7, 2024
## Summary

Related #193473

Add an initial implementation of the knowledge base artifact builder.
This PR only introduces the builder script; it does not handle any
automation.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
pgayvallet added a commit to pgayvallet/kibana that referenced this issue Oct 14, 2024
## Summary

Related elastic#193473

Add an initial implementation of the knowledge base artifact builder.
This PR only introduces the builder script; it does not handle any
automation.

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 1ab1add)

# Conflicts:
#	.github/CODEOWNERS
@legrego legrego added the Feature:AI Product Docs Product Documentation for AI workflows label Nov 13, 2024
pgayvallet added a commit to pgayvallet/kibana that referenced this issue Nov 19, 2024
## Summary

Close elastic#193473
Close elastic#193474

This PR uses the documentation packages built with the tool
introduced by elastic#193847, making it possible to install them in
Kibana, and exposes documentation retrieval as an LLM task that AI
assistants (or other consumers) can call.

Users can now choose to install the Elastic documentation from the
assistant's configuration screen, which exposes a new tool to the
assistant, `retrieve_documentation` (only implemented for the o11y
assistant in this PR; the security assistant will be wired as a follow-up).
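A rough sketch of the idea that the tool only becomes available once the documentation is installed. Only the tool name `retrieve_documentation` comes from this PR; the status values, the parameter schema and the `availableTools` helper are hypothetical and do not reflect the assistant's actual registration API.

```typescript
// Hypothetical installation states for the product documentation.
type InstallationStatus = "uninstalled" | "installing" | "installed" | "error";

// Generic, JSON-schema-like tool description handed to the LLM.
interface ToolDefinition {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
}

const retrieveDocumentationTool: ToolDefinition = {
  name: "retrieve_documentation",
  description: "Search the installed Elastic product documentation.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "What to search the documentation for" },
    },
    required: ["query"],
  },
};

// Only expose the tool to the LLM once the documentation is installed.
function availableTools(status: InstallationStatus, baseTools: ToolDefinition[]): ToolDefinition[] {
  return status === "installed" ? [...baseTools, retrieveDocumentationTool] : baseTools;
}
```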

For more information, please refer to the self-review.

## General architecture

<img width="1118" alt="Screenshot 2024-10-17 at 09 22 32"
src="https://github.com/user-attachments/assets/3df8c30a-9ccc-49ab-92ce-c204b96d6fc4">

## What this PR does

Adds two plugins:
- `productDocBase`: contains all the logic related to product
documentation installation, status, and search. It is meant to be a
"low level" component responsible only for this specific part.
- `llmTasks`: a higher level plugin that will contain various LLM tasks
to be used by assistants and genAI consumers. The intent is not to have
a single place for all LLM tasks, but rather a default place from which
new tasks can be introduced (fwiw, the `nlToEsql` task will probably be
moved to that plugin).

- Add a `retrieve_documentation` tool registration for the o11y
assistant
- Add a component on the o11y assistant configuration page to install
the product doc

(wiring the feature to the o11y assistant was done mostly for testing
purposes; any additions, changes, or enhancements should be made by the
owning team, either in this PR or as a follow-up)
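The split between a low-level and a higher-level plugin can be sketched roughly as follows. All interfaces, names and the truncation policy here are hypothetical and only illustrate the layering described above: the low-level part knows about installation and search, while the task layer shapes the results for an LLM.

```typescript
// Hypothetical search hit returned by the low-level layer.
interface DocSearchResult {
  title: string;
  url: string;
  content: string;
}

// Low level (productDocBase-like): installation status and raw search only.
interface ProductDocBaseLike {
  isInstalled(): Promise<boolean>;
  search(query: string, max: number): Promise<DocSearchResult[]>;
}

// Higher level (llmTasks-like): turns raw search into something an LLM can consume.
async function retrieveDocumentation(
  docBase: ProductDocBaseLike,
  query: string,
  maxDocs = 3,
  maxCharsPerDoc = 1000,
): Promise<{ success: boolean; documents: DocSearchResult[] }> {
  if (!(await docBase.isInstalled())) {
    return { success: false, documents: [] };
  }
  const results = await docBase.search(query, maxDocs);
  return {
    success: true,
    // Truncate each document to keep the LLM context small (hypothetical policy).
    documents: results.map((r) => ({ ...r, content: r.content.slice(0, maxCharsPerDoc) })),
  };
}
```

Keeping the truncation in the task layer means other consumers of the low-level search still get full documents.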

## What is NOT included in this PR:

- Wiring the product doc base feature to the security assistant (should be
done by the owning team as a follow-up):
  - installation
  - usage as a tool

- FTR tests: this is somewhat blocked by the same things we need to
figure out for elastic/kibana-team#1271

## Screenshots

### Installation from o11y assistant configuration page

<img width="1476" alt="Screenshot 2024-10-17 at 09 41 24"
src="https://github.com/user-attachments/assets/31daa585-9fb2-400a-a2d1-5917a262367a">

### Example of output

#### Without product documentation installed

<img width="739" alt="Screenshot 2024-10-10 at 09 59 41"
src="https://github.com/user-attachments/assets/993fb216-6c9a-433f-bf44-f6e383d20d9d">

#### With product documentation installed

<img width="718" alt="Screenshot 2024-10-10 at 09 55 38"
src="https://github.com/user-attachments/assets/805ea4ca-8bc9-4355-a434-0ba81f8228a9">

---------

Co-authored-by: kibanamachine <[email protected]>
Co-authored-by: Alex Szabo <[email protected]>
Co-authored-by: Matthias Wilhelm <[email protected]>
Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 455c781)

# Conflicts:
#	.github/CODEOWNERS
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this issue Nov 26, 2024
CAWilson94 pushed a commit to CAWilson94/kibana that referenced this issue Dec 12, 2024

2 participants