upgrade langchain, add customPDFLoader
mayooear committed Mar 27, 2023
1 parent fccd3b0 commit 10c66b0
Showing 11 changed files with 162 additions and 1,002 deletions.
22 changes: 13 additions & 9 deletions README.md
@@ -37,28 +37,30 @@ OPENAI_API_KEY=
PINECONE_API_KEY=
PINECONE_ENVIRONMENT=
PINECONE_INDEX_NAME=
```

- Visit [openai](https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key) to retrieve API keys and insert into your `.env` file.
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys.
- Visit [pinecone](https://pinecone.io/) to create and retrieve your API keys, and also retrieve your environment and index name from the dashboard.

4. In the `config` folder, replace the `PINECONE_INDEX_NAME` and `PINECONE_NAME_SPACE` with your own details from your pinecone dashboard.
4. In the `config` folder, replace the `PINECONE_NAME_SPACE` with a `namespace` where you'd like to store your embeddings on Pinecone when you run `pnpm run ingest`. This namespace will later be used for queries and retrieval.

5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAIChat` to a different api model if you don't have access to `gpt-4`. See [the OpenAI docs](https://platform.openai.com/docs/models/model-endpoint-compatibility) for a list of supported `modelName`s. For example you could use `gpt-3.5-turbo` if you do not have access to `gpt-4`, yet.
5. In `utils/makechain.ts` chain change the `QA_PROMPT` for your own usecase. Change `modelName` in `new OpenAIChat` to `gpt-3.5-turbo`, if you don't have access to `gpt-4`. Please verify outside this repo that you have access to `gpt-4`, otherwise the application will not work with it.

## Convert your PDF to embeddings

1. In `docs` folder replace the pdf with your own pdf doc.

2. In `scripts/ingest-data.ts` replace `filePath` with `docs/{yourdocname}.pdf`

3. Run the script `npm run ingest` to 'ingest' and embed your docs
3. Run the script `npm run ingest` to 'ingest' and embed your docs. If you run into errors troubleshoot below.

4. Check Pinecone dashboard to verify your namespace and vectors have been added.

## Run the app

Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment and then type a question in the chat interface.
Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `pnpm run dev` to launch the local dev environment, and then type a question in the chat interface.

## Troubleshooting

@@ -68,15 +70,17 @@ In general, keep an eye out in the `issues` and `discussions` section of this re

- Make sure you're running the latest Node version. Run `node -v`
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an `.env` file that contains your valid (and working) API keys.
- Check that you've created an `.env` file that contains your valid (and working) API keys, environment and index name.
- If you change `modelName` in `OpenAIChat` note that the correct name of the alternative model is `gpt-3.5-turbo`
- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
- Make sure you have access to `gpt-4` if you decide to use. Test your openAI keys outside the repo and make sure it works and that you have enough API credits.

**Pinecone errors**

- Make sure your pinecone dashboard `environment` and `index` matches the one in your `config` folder.
- Make sure your pinecone dashboard `environment` and `index` matches the one in the `pinecone.ts` and `.env` files.
- Check that you've set the vector dimensions to `1536`.
- Switch your Environment in pinecone to `us-east1-gcp` if the other environment is causing issues.
- Make sure your pinecone namespace is in lowercase.
- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter.
- Retry with a new Pinecone index.

If you're stuck after trying all these steps, delete `node_modules`, restart your computer, then `pnpm install` again.
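
The `.env` check from the troubleshooting list could be sketched as a small helper. This is an illustration, not code from the commit: `missingEnvKeys` and `REQUIRED_KEYS` are hypothetical names, and the key list is taken from the `.env` template shown in the README.

```typescript
// Hypothetical helper (not part of this commit): lists required .env keys
// that are unset or blank. Key names come from the README's .env template.
type Env = Record<string, string | undefined>;

const REQUIRED_KEYS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
] as const;

function missingEnvKeys(env: Env): string[] {
  // A key counts as missing when it is absent or contains only whitespace.
  return REQUIRED_KEYS.filter((key) => (env[key] ?? '').trim() === '');
}
```

Calling `missingEnvKeys(process.env)` once the `.env` file has been loaded would return an empty array when all four values are present.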

2 changes: 1 addition & 1 deletion components/layout.tsx
@@ -14,7 +14,7 @@ export default function Layout({ children }: LayoutProps) {
 </nav>
 </div>
 </header>
-<div className="container">
+<div>
 <main className="flex w-full flex-1 flex-col overflow-hidden">
 {children}
 </main>
8 changes: 6 additions & 2 deletions config/pinecone.ts
@@ -1,8 +1,12 @@
 /**
- * Change the index and namespace to your own
+ * Change the namespace to the namespace on Pinecone you'd like to store your embeddings.
  */

-const PINECONE_INDEX_NAME = 'langchainjsfundamentals';
+if (!process.env.PINECONE_INDEX_NAME) {
+  throw new Error('Missing Pinecone index name in .env file');
+}
+
+const PINECONE_INDEX_NAME = process.env.PINECONE_INDEX_NAME ?? '';

 const PINECONE_NAME_SPACE = 'pdf-test'; //namespace is optional for your vectors
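
Because the guard in this hunk throws before the constant is assigned, the `?? ''` fallback is never reached at runtime. One way to express the same check as a single step is sketched below. `requireEnv` is a hypothetical name that does not appear in the commit, and the environment is passed in as a plain record so the sketch stays self-contained.

```typescript
// Hypothetical helper (not part of this commit): returns a required value
// or throws, so the caller gets a plain `string` with no fallback needed.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string,
): string {
  const value = env[name];
  if (!value) {
    // Covers both an unset variable and an empty string.
    throw new Error(`Missing ${name} in .env file`);
  }
  return value;
}
```

With such a helper, the constant could be written as `requireEnv(process.env, 'PINECONE_INDEX_NAME')`.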

5 changes: 5 additions & 0 deletions declarations/pdf-parse.d.ts
@@ -0,0 +1,5 @@
+declare module 'pdf-parse/lib/pdf-parse.js' {
+  import pdf from 'pdf-parse';
+
+  export default pdf;
+}
5 changes: 4 additions & 1 deletion package.json
@@ -20,7 +20,7 @@
 "@radix-ui/react-accordion": "^1.1.1",
 "clsx": "^1.2.1",
 "dotenv": "^16.0.3",
-"langchain": "0.0.33",
+"langchain": "0.0.41",
 "lucide-react": "^0.125.0",
 "next": "13.2.3",
 "pdf-parse": "1.1.1",
@@ -43,6 +43,9 @@
 "tsx": "^3.12.3",
 "typescript": "^4.9.5"
 },
+"engines": {
+"node": ">=18"
+},
 "keywords": [
 "starter",
 "gpt4",
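
The new `engines` entry declares Node 18 or later. A minimal sketch of the same check applied to a version string, such as the output of `node -v`, follows. `isSupportedNode` is a hypothetical name used for illustration and is not part of the repo.

```typescript
// Hypothetical helper (not part of this commit): checks a version string
// such as the output of `node -v` against a minimum major version.
function isSupportedNode(version: string, minMajor = 18): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (!match) {
    return false; // Unrecognized format: treat as unsupported.
  }
  return Number(match[1]) >= minMajor;
}
```

Passing `process.version` would check the running interpreter instead of a pasted string.
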
8 changes: 5 additions & 3 deletions pages/api/chat.ts
@@ -21,10 +21,12 @@ export default async function handler(

 /* create vectorstore*/
 const vectorStore = await PineconeStore.fromExistingIndex(
-index,
 new OpenAIEmbeddings({}),
-'text',
-PINECONE_NAME_SPACE, //optional
+{
+pineconeIndex: index,
+textKey: 'text',
+namespace: PINECONE_NAME_SPACE,
+},
 );

 res.writeHead(200, {
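
This hunk moves the `fromExistingIndex` call from positional arguments to a single options object. The general pattern can be sketched as below. `StoreOptions` and `describeStore` are hypothetical stand-ins written for illustration; they are not LangChain's types or API, a plain string stands in for the index object, and the `'text'` default is an assumption of this sketch.

```typescript
// Hypothetical stand-in types for illustration only; NOT LangChain's API.
interface StoreOptions {
  pineconeIndex: string; // Assumption: a string stands in for the index object.
  textKey?: string;
  namespace?: string;
}

function describeStore({
  pineconeIndex,
  textKey = 'text', // Assumed default, chosen for this sketch only.
  namespace,
}: StoreOptions): string {
  // Named fields let callers omit optional values without placeholder arguments.
  const target = namespace ? `${pineconeIndex}/${namespace}` : pineconeIndex;
  return `${target} (textKey=${textKey})`;
}
```

With named fields, a caller that needs only `namespace` does not have to pass `textKey` just to reach the later parameter.
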
9 changes: 5 additions & 4 deletions pages/index.tsx
@@ -36,8 +36,6 @@ export default function Home() {

 const { messages, pending, history, pendingSourceDocs } = messageState;

-console.log('messageState', messageState);
-
 const messageListRef = useRef<HTMLDivElement>(null);
 const textAreaRef = useRef<HTMLTextAreaElement>(null);

@@ -201,7 +199,10 @@ export default function Home() {
 </div>
 </div>
 {message.sourceDocs && (
-<div className="p-5">
+<div
+className="p-5"
+key={`sourceDocsAccordion-${index}`}
+>
 <Accordion
 type="single"
 collapsible
@@ -234,7 +235,7 @@ export default function Home() {
 <div className="p-5">
 <Accordion type="single" collapsible className="flex-col">
 {sourceDocs.map((doc, index) => (
-<div key={index}>
+<div key={`SourceDocs-${index}`}>
 <AccordionItem value={`item-${index}`}>
 <AccordionTrigger>
 <h3>Source {index + 1}</h3>