
Commit

Merge pull request mayooear#165 from mayooear/feat/retriever
Feat/retriever
mayooear authored Apr 13, 2023
2 parents 6db8ba8 + aff71aa commit f1ee996
Showing 11 changed files with 3,674 additions and 3,883 deletions.
25 changes: 16 additions & 9 deletions README.md
@@ -12,19 +12,27 @@ The visual guide of this repo and tutorial is in the `visual guide` folder.

**If you run into errors, please review the troubleshooting section further down this page.**

Prelude: Please make sure you have already installed Node.js on your system and that the version is 18 or greater.

## Development

1. Clone the repo
1. Clone the repo or download the ZIP

```
git clone [github https url]
```


2. Install packages

First run `npm install yarn -g` to install yarn globally (if you haven't already).

Then run:

```
pnpm install
yarn install
```
After installation, you should now see a `node_modules` folder.

3. Set up your `.env` file

@@ -44,9 +44,9 @@ PINECONE_INDEX_NAME=
- Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve your API key and insert it into your `.env` file.
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard.
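
For reference, here is a minimal `.env` sketch. The variable names assume the ones in the repo's `.env.example`; every value below is a placeholder, not a real key:

```
# Placeholder values only; replace each one with your own credentials.
OPENAI_API_KEY=your-openai-api-key
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=your-pinecone-environment
PINECONE_INDEX_NAME=your-pinecone-index-name
```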

4. In the `config` folder, replace the `PINECONE_NAME_SPACE` with a `namespace` where you'd like to store your embeddings on Pinecone when you run `pnpm run ingest`. This namespace will later be used for queries and retrieval.
4. In the `config` folder, replace the `PINECONE_NAME_SPACE` with a `namespace` where you'd like to store your embeddings on Pinecone when you run `npm run ingest`. This namespace will later be used for queries and retrieval.
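
For illustration, a minimal sketch of what `config/pinecone.ts` might look like (the actual file in the repo may differ; `'pdf-test'` is just an example namespace):

```typescript
// Sketch of config/pinecone.ts, illustrative only; the shipped file may differ.
if (!process.env.PINECONE_INDEX_NAME) {
  throw new Error('Missing PINECONE_INDEX_NAME in .env file');
}

const PINECONE_INDEX_NAME = process.env.PINECONE_INDEX_NAME;

// Any string works as a namespace; it just has to match between `npm run ingest` and the chat API.
const PINECONE_NAME_SPACE = 'pdf-test'; // example value, choose your own

export { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE };
```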

5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAIChat` to `gpt-3.5-turbo`, if you don't have access to `gpt-4`. Please verify outside this repo that you have access to `gpt-4`, otherwise the application will not work with it.
5. In the `utils/makechain.ts` chain, change the `QA_PROMPT` for your own use case. Change `modelName` in `new OpenAI` to `gpt-4` if you have access to the `gpt-4` API. Please verify outside this repo that you have access to the `gpt-4` API; otherwise the application will not work.
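
As a rough guide, the model setup inside `utils/makechain.ts` looks roughly like the sketch below. This is not the exact file; the import path assumes the LangChain 0.0.55 entrypoints used elsewhere in this PR:

```typescript
import { OpenAI } from 'langchain/llms/openai';

// Sketch only; adjust to match the actual utils/makechain.ts in your checkout.
const model = new OpenAI({
  temperature: 0,
  // 'gpt-3.5-turbo' works on any paid OpenAI key; switch to 'gpt-4' only if your key has gpt-4 API access.
  modelName: 'gpt-3.5-turbo',
});
```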

## Convert your PDF files to embeddings

@@ -60,7 +68,7 @@

## Run the app

Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `pnpm run dev` to launch the local dev environment, and then type a question in the chat interface.
Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment, and then type a question in the chat interface.
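
For example:

```
npm run dev
```

Next.js serves the dev build at `http://localhost:3000` by default (unless the port is overridden).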

## Troubleshooting

@@ -73,11 +81,10 @@ In general, keep an eye out in the `issues` and `discussions` section of this repo
- `console.log` the `env` variables and make sure they are exposed (see the sketch after this list).
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an `.env` file that contains your valid (and working) API keys, environment and index name.
- If you change `modelName` in `OpenAIChat` note that the correct name of the alternative model is `gpt-3.5-turbo`
- Make sure you have access to `gpt-4` if you decide to use. Test your openAI keys outside the repo and make sure it works and that you have enough API credits.
- If you change `modelName` in `OpenAI`, make sure you have access to the api for the appropriate model.
- Make sure you have enough OpenAI credits and a valid card on your billing account.
- Check that you don't have multiple `OPENAI_API_KEY` entries in your global environment. If you do, the local `env` file from the project will be overridden by the system's `env` variable.
- Try to hard code your API keys into the `process.env` variables.

- Try to hard code your API keys into the `process.env` variables if there are still issues.
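
A minimal sketch of that `console.log` sanity check, assuming the variable names from the `.env` step above (temporary code; remove it once the values show up, and never log or commit real keys):

```typescript
// Temporary sanity check: drop into utils/pinecone-client.ts or pages/api/chat.ts, for example.
// Logs only whether the keys are set, not their values.
console.log('OPENAI_API_KEY set?', Boolean(process.env.OPENAI_API_KEY));
console.log('PINECONE_API_KEY set?', Boolean(process.env.PINECONE_API_KEY));
console.log('PINECONE_ENVIRONMENT:', process.env.PINECONE_ENVIRONMENT);
console.log('PINECONE_INDEX_NAME:', process.env.PINECONE_INDEX_NAME);
```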

**Pinecone errors**

Binary file removed docs/finance/turingfinance.pdf
Binary file removed docs/law/MorseVsFrederick.pdf
7 changes: 2 additions & 5 deletions package.json
@@ -2,9 +2,6 @@
"name": "gpt4-langchain-pdf-chatbot",
"version": "0.1.0",
"private": true,
"engines": {
"node": ">=18"
},
"license": "MIT",
"author": "Mayooear<twitter:@mayowaoshin>",
"type": "module",
@@ -19,11 +16,11 @@
},
"dependencies": {
"@microsoft/fetch-event-source": "^2.0.1",
"@pinecone-database/pinecone": "^0.0.10",
"@pinecone-database/pinecone": "0.0.12",
"@radix-ui/react-accordion": "^1.1.1",
"clsx": "^1.2.1",
"dotenv": "^16.0.3",
"langchain": "0.0.41",
"langchain": "0.0.55",
"lucide-react": "^0.125.0",
"next": "13.2.3",
"pdf-parse": "1.1.1",
65 changes: 28 additions & 37 deletions pages/api/chat.ts
@@ -1,6 +1,6 @@
import type { NextApiRequest, NextApiResponse } from 'next';
import { OpenAIEmbeddings } from 'langchain/embeddings';
import { PineconeStore } from 'langchain/vectorstores';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { makeChain } from '@/utils/makechain';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
@@ -11,54 +11,45 @@ export default async function handler(
) {
const { question, history } = req.body;

console.log('question', question);

//only accept post requests
if (req.method !== 'POST') {
res.status(405).json({ error: 'Method not allowed' });
return;
}

if (!question) {
return res.status(400).json({ message: 'No question in the request' });
}
// OpenAI recommends replacing newlines with spaces for best results
const sanitizedQuestion = question.trim().replaceAll('\n', ' ');

const index = pinecone.Index(PINECONE_INDEX_NAME);

/* create vectorstore*/
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings({}),
{
pineconeIndex: index,
textKey: 'text',
namespace: PINECONE_NAME_SPACE,
},
);

res.writeHead(200, {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache, no-transform',
Connection: 'keep-alive',
});

const sendData = (data: string) => {
res.write(`data: ${data}\n\n`);
};

sendData(JSON.stringify({ data: '' }));

//create chain
const chain = makeChain(vectorStore, (token: string) => {
sendData(JSON.stringify({ data: token }));
});

try {
//Ask a question
const index = pinecone.Index(PINECONE_INDEX_NAME);

/* create vectorstore*/
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings({}),
{
pineconeIndex: index,
textKey: 'text',
namespace: PINECONE_NAME_SPACE, //namespace comes from your config folder
},
);

//create chain
const chain = makeChain(vectorStore);
//Ask a question using chat history
const response = await chain.call({
question: sanitizedQuestion,
chat_history: history || [],
});

console.log('response', response);
sendData(JSON.stringify({ sourceDocs: response.sourceDocuments }));
} catch (error) {
res.status(200).json(response);
} catch (error: any) {
console.log('error', error);
} finally {
sendData('[DONE]');
res.end();
res.status(500).json({ error: error.message || 'Something went wrong' });
}
}
134 changes: 38 additions & 96 deletions pages/index.tsx
@@ -1,8 +1,7 @@
import { useRef, useState, useEffect, useMemo, useCallback } from 'react';
import { useRef, useState, useEffect } from 'react';
import Layout from '@/components/layout';
import styles from '@/styles/Home.module.css';
import { Message } from '@/types/chat';
import { fetchEventSource } from '@microsoft/fetch-event-source';
import Image from 'next/image';
import ReactMarkdown from 'react-markdown';
import LoadingDots from '@/components/ui/LoadingDots';
@@ -17,7 +16,6 @@ import {
export default function Home() {
const [query, setQuery] = useState<string>('');
const [loading, setLoading] = useState<boolean>(false);
const [sourceDocs, setSourceDocs] = useState<Document[]>([]);
const [error, setError] = useState<string | null>(null);
const [messageState, setMessageState] = useState<{
messages: Message[];
@@ -32,10 +30,9 @@ export default function Home() {
},
],
history: [],
pendingSourceDocs: [],
});

const { messages, pending, history, pendingSourceDocs } = messageState;
const { messages, history } = messageState;

const messageListRef = useRef<HTMLDivElement>(null);
const textAreaRef = useRef<HTMLTextAreaElement>(null);
@@ -66,17 +63,13 @@ export default function Home() {
message: question,
},
],
pending: undefined,
}));

setLoading(true);
setQuery('');
setMessageState((state) => ({ ...state, pending: '' }));

const ctrl = new AbortController();

try {
fetchEventSource('/api/chat', {
const response = await fetch('/api/chat', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
@@ -85,40 +78,32 @@
question,
history,
}),
signal: ctrl.signal,
onmessage: (event) => {
if (event.data === '[DONE]') {
setMessageState((state) => ({
history: [...state.history, [question, state.pending ?? '']],
messages: [
...state.messages,
{
type: 'apiMessage',
message: state.pending ?? '',
sourceDocs: state.pendingSourceDocs,
},
],
pending: undefined,
pendingSourceDocs: undefined,
}));
setLoading(false);
ctrl.abort();
} else {
const data = JSON.parse(event.data);
if (data.sourceDocs) {
setMessageState((state) => ({
...state,
pendingSourceDocs: data.sourceDocs,
}));
} else {
setMessageState((state) => ({
...state,
pending: (state.pending ?? '') + data.data,
}));
}
}
},
});
const data = await response.json();
console.log('data', data);

if (data.error) {
setError(data.error);
} else {
setMessageState((state) => ({
...state,
messages: [
...state.messages,
{
type: 'apiMessage',
message: data.text,
sourceDocs: data.sourceDocuments,
},
],
history: [...state.history, [question, data.text]],
}));
}
console.log('messageState', messageState);

setLoading(false);

//scroll to bottom
messageListRef.current?.scrollTo(0, messageListRef.current.scrollHeight);
} catch (error) {
setLoading(false);
setError('An error occurred while fetching the data. Please try again.');
@@ -127,38 +112,13 @@
}

//prevent empty submissions
const handleEnter = useCallback(
(e: any) => {
if (e.key === 'Enter' && query) {
handleSubmit(e);
} else if (e.key == 'Enter') {
e.preventDefault();
}
},
[query],
);

const chatMessages = useMemo(() => {
return [
...messages,
...(pending
? [
{
type: 'apiMessage',
message: pending,
sourceDocs: pendingSourceDocs,
},
]
: []),
];
}, [messages, pending, pendingSourceDocs]);

//scroll to bottom of chat
useEffect(() => {
if (messageListRef.current) {
messageListRef.current.scrollTop = messageListRef.current.scrollHeight;
const handleEnter = (e: any) => {
if (e.key === 'Enter' && query) {
handleSubmit(e);
} else if (e.key == 'Enter') {
e.preventDefault();
}
}, [chatMessages]);
};

return (
<>
@@ -170,12 +130,13 @@
<main className={styles.main}>
<div className={styles.cloud}>
<div ref={messageListRef} className={styles.messagelist}>
{chatMessages.map((message, index) => {
{messages.map((message, index) => {
let icon;
let className;
if (message.type === 'apiMessage') {
icon = (
<Image
key={index}
src="/bot-image.png"
alt="AI"
width="40"
@@ -188,6 +149,7 @@
} else {
icon = (
<Image
key={index}
src="/usericon.png"
alt="Me"
width="30"
@@ -198,7 +160,7 @@
);
// The latest message sent by the user will be animated while waiting for a response
className =
loading && index === chatMessages.length - 1
loading && index === messages.length - 1
? styles.usermessagewaiting
: styles.usermessage;
}
@@ -245,26 +207,6 @@
</>
);
})}
{sourceDocs.length > 0 && (
<div className="p-5">
<Accordion type="single" collapsible className="flex-col">
{sourceDocs.map((doc, index) => (
<div key={`SourceDocs-${index}`}>
<AccordionItem value={`item-${index}`}>
<AccordionTrigger>
<h3>Source {index + 1}</h3>
</AccordionTrigger>
<AccordionContent>
<ReactMarkdown linkTarget="_blank">
{doc.pageContent}
</ReactMarkdown>
</AccordionContent>
</AccordionItem>
</div>
))}
</Accordion>
</div>
)}
</div>
</div>
<div className={styles.center}>
