add chunk loop to prevent pinecone upsert errors
mayooear committed Mar 21, 2023
1 parent 4f373b4 commit 8f4c7da
Showing 5 changed files with 41 additions and 23 deletions.
22 changes: 22 additions & 0 deletions README.md
@@ -10,6 +10,8 @@ Tech stack used includes LangChain, Pinecone, Typescript, Openai, and Next.js. L

The visual guide of this repo and tutorial is in the `visual guide` folder.

**If you run into errors, please review the troubleshooting section further down this page.**

## Development

1. Clone the repo
@@ -58,6 +60,26 @@ PINECONE_ENVIRONMENT=

Once you've verified that the embeddings and content have been successfully added to your Pinecone, you can run the app `npm run dev` to launch the local dev environment and then type a question in the chat interface.

## Troubleshooting

In general, check the `issues` and `discussions` sections of this repo for solutions.

**General errors**

- Make sure you're running the latest Node version. Run `node -v` to check.
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an `.env` file that contains your valid (and working) API keys.
- If you change `modelName` in `OpenAIChat`, note that the correct name of the alternative model is `gpt-3.5-turbo`.
- Pinecone indexes of users on the Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the inactivity counter.
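The `.env` check in the list above can be partly automated with a start-up guard that reports missing keys (it cannot tell whether a key is valid). This is a minimal sketch, not code from this repo: `findMissingEnvVars` is a hypothetical helper, and the names `OPENAI_API_KEY` and `PINECONE_API_KEY` are assumptions; only `PINECONE_ENVIRONMENT` appears in this diff.

```typescript
// Hypothetical start-up guard: lists required environment variables that are
// absent or blank. OPENAI_API_KEY and PINECONE_API_KEY are assumed names;
// adjust the list to match the keys your .env file actually uses.
const REQUIRED_ENV_VARS = [
  'OPENAI_API_KEY', // assumed name
  'PINECONE_API_KEY', // assumed name
  'PINECONE_ENVIRONMENT',
] as const;

function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  // A variable counts as missing if it is undefined, empty, or only whitespace.
  return required.filter((name) => !env[name] || env[name]!.trim() === '');
}

// Possible usage, after dotenv has loaded the .env file:
// const missing = findMissingEnvVars(process.env);
// if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```

Failing at start-up with the list of missing names is usually easier to diagnose than an authentication error from the OpenAI or Pinecone client later on.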

**Pinecone errors**

- Make sure the `environment` and `index` in your Pinecone dashboard match the ones in your `config` folder.
- Check that you've set the vector dimensions to `1536`.
- Switch your environment in Pinecone to `us-east1-gcp` if another environment is causing issues.
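The `1536` dimension in the list above matches the output size of OpenAI's `text-embedding-ada-002` model, presumably the model `OpenAIEmbeddings` uses here. A mismatch between the index dimension and the embeddings can be caught before anything is sent to Pinecone. A minimal sketch, where `findBadDimensions` is a hypothetical helper and vectors are assumed to be plain `number[]` arrays:

```typescript
// Hypothetical pre-upsert check: returns the positions of vectors whose length
// differs from the dimension configured on the Pinecone index.
const EXPECTED_DIMENSION = 1536; // must equal the index's dimension setting

function findBadDimensions(
  vectors: number[][],
  expected: number = EXPECTED_DIMENSION,
): number[] {
  const bad: number[] = [];
  vectors.forEach((vector, i) => {
    if (vector.length !== expected) bad.push(i);
  });
  return bad;
}
```

Throwing when the result is non-empty surfaces the mismatch locally, with the offending positions, instead of as a rejected upsert request.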

If you're stuck after trying all these steps, delete `node_modules`, restart your computer, then `pnpm install` again.

## Credit

Frontend of this repo is inspired by [langchain-chat-nextjs](https://github.com/zahidkhawaja/langchain-chat-nextjs)
2 changes: 1 addition & 1 deletion config/pinecone.ts
@@ -4,6 +4,6 @@

const PINECONE_INDEX_NAME = 'langchainjsfundamentals';

-const PINECONE_NAME_SPACE = 'demo'; //namespace is optional for your vectors
+const PINECONE_NAME_SPACE = 'pdf-test'; //namespace is optional for your vectors

export { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE };
2 changes: 1 addition & 1 deletion package.json
@@ -20,7 +20,7 @@
"@radix-ui/react-accordion": "^1.1.1",
"clsx": "^1.2.1",
"dotenv": "^16.0.3",
-"langchain": "^0.0.33",
+"langchain": "0.0.33",
"lucide-react": "^0.125.0",
"next": "13.2.3",
"pdf-parse": "1.1.1",
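Under npm's semver rules, a caret range on a `0.0.x` version is already exact: `^0.0.33` means `>=0.0.33 <0.0.34`. Switching to `"0.0.33"` therefore does not narrow the allowed versions; it makes the pin explicit. The rule is illustrated by the simplified sketch below. `satisfiesCaret` is a hypothetical helper that only handles plain `MAJOR.MINOR.PATCH` strings and ignores prerelease tags and other range syntax, so use the `semver` package for real checks.

```typescript
type Version = [number, number, number];

// Parses a plain "MAJOR.MINOR.PATCH" string; prerelease/build tags are not supported.
function parseVersion(v: string): Version {
  const parts = v.split('.').map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Unsupported version: ${v}`);
  }
  return [parts[0], parts[1], parts[2]];
}

// Returns a negative, zero, or positive number, like a sort comparator.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Simplified caret semantics: the leftmost non-zero component of `base`
// is held fixed, so ^0.0.p only matches 0.0.p itself.
function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const [major, minor, patch] = parseVersion(base);
  let upper: Version;
  if (major > 0) upper = [major + 1, 0, 0];
  else if (minor > 0) upper = [0, minor + 1, 0];
  else upper = [0, 0, patch + 1];
  return compareVersions(v, [major, minor, patch]) >= 0 && compareVersions(v, upper) < 0;
}
```

For example, `satisfiesCaret('0.0.34', '0.0.33')` is `false`, while `satisfiesCaret('1.4.0', '1.2.3')` is `true`.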
16 changes: 2 additions & 14 deletions pnpm-lock.yaml

Some generated files are not rendered by default.

22 changes: 15 additions & 7 deletions scripts/ingest-data.ts
@@ -30,14 +30,22 @@ export const run = async () => {
/*create and store the embeddings in the vectorStore*/
const embeddings = new OpenAIEmbeddings();
const index = pinecone.Index(PINECONE_INDEX_NAME); //change to your own index name

//embed the PDF documents
-await PineconeStore.fromDocuments(
-index,
-docs,
-embeddings,
-'text',
-PINECONE_NAME_SPACE,
-);
+
+/* Pinecone recommends a limit of 100 vectors per upsert request to avoid errors*/
+const chunkSize = 50;
+for (let i = 0; i < docs.length; i += chunkSize) {
+const chunk = docs.slice(i, i + chunkSize);
+console.log('chunk', i, chunk);
+await PineconeStore.fromDocuments(
+index,
+chunk,
+embeddings,
+'text',
+PINECONE_NAME_SPACE,
+);
+}
} catch (error) {
console.log('error', error);
throw new Error('Failed to ingest your data');
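The chunk loop in `scripts/ingest-data.ts` slices `docs` into batches of 50 so that each `PineconeStore.fromDocuments` call stays under the 100-vectors-per-upsert limit mentioned in its comment. The same pattern can be factored into a reusable helper. This is a sketch under assumptions: `chunkArray` and `processInBatches` are hypothetical names, and the `handler` callback stands in for whatever performs the upsert (for example the `fromDocuments` call in this commit).

```typescript
// Splits an array into consecutive slices of at most `size` items.
function chunkArray<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Runs `handler` on each batch sequentially, awaiting each call before the
// next one starts, as the loop in ingest-data.ts does. `offset` is the index
// of the batch's first item in the original array.
async function processInBatches<T>(
  items: T[],
  size: number,
  handler: (batch: T[], offset: number) => Promise<unknown>,
): Promise<void> {
  const batches = chunkArray(items, size);
  for (let b = 0; b < batches.length; b++) {
    await handler(batches[b], b * size);
  }
}

// Possible usage, mirroring the call in this commit:
// await processInBatches(docs, 50, (batch) =>
//   PineconeStore.fromDocuments(index, batch, embeddings, 'text', PINECONE_NAME_SPACE),
// );
```

Awaiting each batch in turn keeps at most one upsert request in flight, which is slower than firing them in parallel but avoids adding concurrency on top of the batch-size limit.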
