for i, record in enumerate(tqdm(data)):
    # first get metadata fields for this record
    metadata = {
        'wiki-id': str(record['id']),
        'source': record['url'],
        'title': record['title']
    }
    # now we create chunks from the record text
    record_texts = text_splitter.split_text(record['text'])
    # create individual metadata dicts for each chunk
    record_metadatas = [{
        "chunk": j, "text": text, **metadata
    } for j, text in enumerate(record_texts)]
    # append these to current batches
    texts.extend(record_texts)
    metadatas.extend(record_metadatas)
    # if we have reached the batch_limit we can add texts
    if len(texts) >= batch_limit:
        ids = [str(uuid4()) for _ in range(len(texts))]
        embeds = embed.embed_documents(texts)
        index.upsert(vectors=zip(ids, embeds, metadatas))
        texts = []
        metadatas = []
Activate a conda environment that uses Python 3.8 (for compatibility with tiktoken).
Run the code in a Jupyter notebook.
The error occurs when this part of the code is inside the for-loop:
```python
if len(texts) >= batch_limit:
    ids = [str(uuid4()) for _ in range(len(texts))]
    embeds = embed.embed_documents(texts)
    index.upsert(vectors=zip(ids, embeds, metadatas))
    texts = []
    metadatas = []
```
I think the error originates around this call, but I might be wrong:
```
PineconeException                         Traceback (most recent call last)
Cell In[28], line 21
     19 ids = [str(uuid4()) for _ in range(len(texts))]
     20 embeds = embed.embed_documents(texts)
---> 21 index.upsert(vectors=zip(ids, embeds, metadatas))
     22 texts = []
     23 metadatas = []
```
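The traceback stops at the `index.upsert` call, which receives a `zip` object. The exception message itself is not shown, so the following is only a guess at one thing worth checking: `zip` returns a lazy, one-shot iterator rather than a list, and a client that expects a list of `(id, values, metadata)` tuples may reject it. The sketch below uses made-up toy values and does not call Pinecone at all. Whether this resolves the error in this environment is an untested assumption.

```python
# Toy stand-ins for the values built inside the loop (hypothetical data).
ids = ["a", "b"]
embeds = [[0.1, 0.2], [0.3, 0.4]]
metadatas = [{"chunk": 0}, {"chunk": 1}]

# zip() returns a lazy iterator: it is not a list and has no len().
lazy = zip(ids, embeds, metadatas)
is_list = isinstance(lazy, list)

# Materializing it gives a plain list of (id, values, metadata) tuples.
vectors = list(zip(ids, embeds, metadatas))

# In the loop from this report, the call would then read
# (assumption, not verified against the installed client version):
# index.upsert(vectors=vectors)
```

If the client accepts the list form, the only change needed in the loop is wrapping the `zip(...)` call in `list(...)`.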
Is this a new bug?
Current Behavior
When the code above runs, the `index.upsert` call raises a `PineconeException`.
https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/ is the reference example that runs into this issue.
The implementation uses Python 3.8 on an Intel macOS machine.
Expected Behavior
The indexing process should iterate through the data we'd like to add to our knowledge base, creating IDs, embeddings, and metadata, and then add these to the index in batches.
This description is from: https://www.pinecone.io/learn/series/langchain/langchain-retrieval-augmentation/
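The batching behavior described here can be sketched without any external service. In the sketch below, `FakeIndex`, `fake_embed`, `flush`, and `index_in_batches` are hypothetical names made up for illustration; they stand in for the Pinecone index and the embedding model and are not part of either library. The flush after the loop is not shown in the snippet in this report; without it, a final partial batch smaller than `batch_limit` would never reach the index.

```python
from uuid import uuid4


class FakeIndex:
    """Hypothetical in-memory stand-in for a vector index."""

    def __init__(self):
        self.store = {}

    def upsert(self, vectors):
        # Each item is an (id, values, metadata) tuple.
        for vec_id, values, metadata in vectors:
            self.store[vec_id] = (values, metadata)


def fake_embed(texts):
    """Hypothetical embedder: one 2-dimensional vector per text."""
    return [[float(len(t)), 0.0] for t in texts]


def flush(index, texts, metadatas):
    """Create IDs and embeddings for one batch and add them to the index."""
    ids = [str(uuid4()) for _ in texts]
    embeds = fake_embed(texts)
    index.upsert(vectors=list(zip(ids, embeds, metadatas)))


def index_in_batches(index, chunks, batch_limit=100):
    texts, metadatas = [], []
    for j, text in enumerate(chunks):
        texts.append(text)
        metadatas.append({"chunk": j, "text": text})
        # Once the batch is full, send it and start a new one.
        if len(texts) >= batch_limit:
            flush(index, texts, metadatas)
            texts, metadatas = [], []
    # Send whatever is left over after the loop ends.
    if texts:
        flush(index, texts, metadatas)


index = FakeIndex()
index_in_batches(index, [f"chunk {n}" for n in range(7)], batch_limit=3)
```

With seven chunks and a batch limit of three, the loop sends two full batches and the trailing flush sends the remaining chunk.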
Steps To Reproduce
Relevant log output
No response
Environment
Additional Context
I am doing this while connected to a VPN.