Update Docs #11

Merged 1 commit on Aug 24, 2024
27 changes: 13 additions & 14 deletions README.md
@@ -1,5 +1,5 @@

## DocsScraper: "A document scraping and parsing tool used to create a custom RAG database for AIHelpMe.jl"
## DocsScraper: "Efficient RAG knowledge pack creator from online Julia documentation"
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://juliagenai.github.io/DocsScraper.jl/dev/) [![Build Status](https://github.com/JuliaGenAI/DocsScraper.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/JuliaGenAI/DocsScraper.jl/actions/workflows/CI.yml?query=branch%3Amain) [![Coverage](https://codecov.io/gh/JuliaGenAI/DocsScraper.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaGenAI/DocsScraper.jl) [![Aqua](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)


@@ -15,27 +15,27 @@ It scrapes and parses the URLs and with the help of PromptingTools.jl, creates a

## Installation

To install DocsScraper, use the Julia package manager and the package name:
To install DocsScraper, use the Julia package manager and the package name (it's not registered yet):

```julia
using Pkg
Pkg.add("DocsScraper")
Pkg.add(url="https://github.com/JuliaGenAI/DocsScraper.jl")
```


**Prerequisites:**

- Julia (version 1.10 or later).
- Internet connection for API access.
- OpenAI API keys with available credits. See [How to Obtain API Keys](#how-to-obtain-api-keys).
- OpenAI API keys with available credits. See [How to Obtain API Keys](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions#Creating-OpenAI-API-Key).
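
The prerequisites above can be checked from a Julia session before building an index. This is a minimal sketch, assuming the OpenAI key is supplied through the `OPENAI_API_KEY` environment variable (the usual convention for PromptingTools.jl; the diff itself does not say how the key is read):

```julia
# Check the Julia version requirement listed in the prerequisites.
julia_ok = VERSION >= v"1.10"

# Assumption: the API key is provided via the OPENAI_API_KEY environment variable.
key_set = !isempty(get(ENV, "OPENAI_API_KEY", ""))

println("Julia >= 1.10: ", julia_ok)
println("OPENAI_API_KEY set: ", key_set)
```

This only confirms that a key is present; it does not verify that the key is valid or has credits.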


## Building the Index
```julia
crawlable_urls = ["https://juliagenai.github.io/DocsScraper.jl/dev/home/"]

index_path = make_knowledge_packs(crawlable_urls;
index_name = "docsscraper", embedding_dimension = 1024, embedding_bool = true, target_path=joinpath(pwd(), "knowledge_packs"))
index_name = "docsscraper", embedding_dimension = 1024, embedding_bool = true, target_path="knowledge_packs")
```
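
A note on the `target_path` change in this hunk: Julia resolves a relative path against the current working directory, so `"knowledge_packs"` and `joinpath(pwd(), "knowledge_packs")` should name the same directory. That assumes `make_knowledge_packs` does not treat relative paths in some other way, which the diff does not show. A small check using only Base functions:

```julia
# Compare the old and new `target_path` values from the diff.
old_target = joinpath(pwd(), "knowledge_packs")  # explicit absolute path
new_target = "knowledge_packs"                   # relative path

# `abspath` resolves a relative path against the current working directory.
same_location = abspath(new_target) == old_target

println("new path is relative: ", !isabspath(new_target))
println("both resolve to the same directory: ", same_location)
```

The practical difference is that the relative form follows whatever directory the session is in when the call runs.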
```julia
[ Info: robots.txt unavailable for https://juliagenai.github.io:/DocsScraper.jl/dev/home/
@@ -73,14 +73,12 @@ a docsscraper__v20240823__textembedding3large-1024-Bool__v1.0.hdf5

```julia
using AIHelpMe
using AIHelpMe: pprint, load_index!

# Either use the index explicitly
aihelp(index_path, "what is DocsScraper.jl?")
# set it as the "default" index, then it will be automatically used for every question
load_index!(index_path)

# or set it as the "default" index, then it will be automatically used for every question
AIHelpMe.load_index!(index_path)

pprint(aihelp("what is DocsScraper.jl?"))
aihelp("what is DocsScraper.jl?") |> pprint
```
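
The switch from `pprint(aihelp(...))` to `aihelp(...) |> pprint` looks purely stylistic: Julia's `|>` operator applies the function on its right to the value on its left. A sketch with Base functions standing in for `aihelp` and `pprint` (the stand-ins are illustrative only and are not AIHelpMe's API):

```julia
# Stand-ins for illustration only: `uppercase` plays the role of `pprint`,
# and a plain string plays the role of the value returned by `aihelp`.
answer = "what is DocsScraper.jl?"

nested = uppercase(answer)    # call style used before this change
piped  = answer |> uppercase  # pipe style used after this change

println(nested == piped)
```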
```julia
[ Info: Updated RAG pipeline to `:bronze` (Configuration key: "textembedding3large-1024-Bool").
@@ -96,8 +94,9 @@ PromptingTools.jl, creates a vector store that can be utilized in RAG (Retrieval
AIHelpMe.jl and PromptingTools.jl to provide efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
```

Tip: Use `pprint` for nicer outputs with sources
Tip: Use `pprint` for nicer outputs with sources and `last_result` for more detailed outputs (with sources).
```julia
using AIHelpMe: pprint, last_result
print(last_result)
using AIHelpMe: last_result
# last_result() returns the last result from the RAG pipeline, ie, same as running aihelp(; return_all=true)
print(last_result())
```
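
The change from `print(last_result)` to `print(last_result())` is a bug fix rather than a style change: the first form prints the function object itself, while the second calls the function and prints what it returns. A sketch with a hypothetical stand-in (`fake_last_result` is not part of AIHelpMe):

```julia
# Hypothetical stand-in for `AIHelpMe.last_result`; returns a fixed string.
fake_last_result() = "answer with sources"

without_call = sprint(print, fake_last_result)    # text form of the function object
with_call    = sprint(print, fake_last_result())  # text form of the returned value

println(with_call)
println("same output: ", without_call == with_call)
```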
25 changes: 12 additions & 13 deletions docs/src/index.md
@@ -1,5 +1,6 @@

## DocsScraper: "A document scraping and parsing tool used to create a custom RAG database for AIHelpMe.jl"
# DocsScraper

DocsScraper is a package designed to create "knowledge packs" from online documentation sites for the Julia language.

It scrapes and parses the URLs and with the help of PromptingTools.jl, creates an index of chunks and their embeddings that can be used in RAG applications. It integrates with AIHelpMe.jl and PromptingTools.jl to offer highly efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
@@ -12,19 +13,19 @@ It scrapes and parses the URLs and with the help of PromptingTools.jl, creates a

## Installation

To install DocsScraper, use the Julia package manager and the package name:
To install DocsScraper, use the Julia package manager and the package name (it's not registered yet):

```julia
using Pkg
Pkg.add("DocsScraper")
Pkg.add(url="https://github.com/JuliaGenAI/DocsScraper.jl")
```


**Prerequisites:**

- Julia (version 1.10 or later).
- Internet connection for API access.
- OpenAI API keys with available credits. See [How to Obtain API Keys](#how-to-obtain-api-keys).
- OpenAI API keys with available credits. See [How to Obtain API Keys](https://svilupp.github.io/PromptingTools.jl/dev/frequently_asked_questions#Creating-OpenAI-API-Key).


## Building the Index
@@ -70,14 +71,12 @@ a docsscraper__v20240823__textembedding3large-1024-Bool__v1.0.hdf5

```julia
using AIHelpMe
using AIHelpMe: pprint, load_index!

# Either use the index explicitly
aihelp(index_path, "what is DocsScraper.jl?")

# or set it as the "default" index, then it will be automatically used for every question
AIHelpMe.load_index!(index_path)
# set it as the "default" index, then it will be automatically used for every question
load_index!(index_path)

pprint(aihelp("what is DocsScraper.jl?"))
aihelp("what is DocsScraper.jl?") |> pprint
```
```julia
[ Info: Updated RAG pipeline to `:bronze` (Configuration key: "textembedding3large-1024-Bool").
@@ -93,8 +92,8 @@ PromptingTools.jl, creates a vector store that can be utilized in RAG (Retrieval
AIHelpMe.jl and PromptingTools.jl to provide efficient and relevant query retrieval, ensuring that the responses generated by the system are specific to the content in the created database.
```

Tip: Use `pprint` for nicer outputs with sources
Tip: Use `pprint` for nicer outputs with sources and `last_result` for more detailed outputs (with sources).
```julia
using AIHelpMe: pprint, last_result
print(last_result)
using AIHelpMe: last_result
print(last_result())
```
1 change: 0 additions & 1 deletion docs/src/working.md

This file was deleted.

16 changes: 11 additions & 5 deletions examples/scripts/using_with_AIHelpMe.jl
@@ -4,16 +4,22 @@ Pkg.add(url = "https://github.com/JuliaGenAI/DocsScraper.jl/")
Pkg.add("AIHelpMe")
using DocsScraper
using AIHelpMe
using AIHelpMe: pprint
using AIHelpMe: pprint, last_result

# Creating the index
crawlable_urls = ["https://juliagenai.github.io/DocsScraper.jl/dev/home/"]
index_path = make_knowledge_packs(crawlable_urls;
index_name = "docsscraper", embedding_dimension = 1024, embedding_bool = true,
target_path = joinpath(pwd(), "knowledge_packs"))
target_path = "knowledge_packs")

# Using the index with AIHelpMe
# Using the index with AIHelpMe, load it as the default index
AIHelpMe.load_index!(index_path)

pprint(aihelp("what is DocsScraper.jl?"))
pprint(aihelp("how do I install DocsScraper?"))
# Ask questions // pprint is optional
aihelp("what is DocsScraper.jl?") |> pprint

aihelp("how do I install DocsScraper?") |> pprint

# Get more detailed outputs with sources for the last answer
# Identical to running aihelp(; return_all=true)
last_result() |> pprint