Oraclevs integration #7333

Open · wants to merge 34 commits into base: main

Commits (34)
e37525f
refresh to current version
skmishraoracle Jul 11, 2024
b722003
Add doc loader files
hackerdave Sep 19, 2024
d93610c
Add docs
hackerdave Sep 19, 2024
c648518
Update dependencies
hackerdave Sep 19, 2024
00abea4
change metadata column type to JSON & add support for connection pool
skmishraoracle Sep 22, 2024
2f2f6b3
documentation in cookbook - oraclevs.md
skmishraoracle Sep 23, 2024
afc56fe
Add entry points
hackerdave Sep 24, 2024
d649ca8
Update import paths in doc
hackerdave Sep 24, 2024
c72d99c
Combine doc loader and text splitter files
hackerdave Sep 24, 2024
7fe39a0
Move oracle text splitter to langchain-textsplitters
hackerdave Sep 29, 2024
9b2778c
Update docs for oracle text splitter
hackerdave Sep 29, 2024
1478731
Update import paths
hackerdave Sep 29, 2024
b84f98e
Change imports oracleai to oracle
hackerdave Sep 29, 2024
6cdd334
Handle escaped double quotes
hackerdave Oct 21, 2024
c2dd804
config changes changes to imports
skmishraoracle Nov 8, 2024
d719938
checking in files to gen pr...
skmishraoracle Dec 9, 2024
74bfaad
generate pr-3
skmishraoracle Dec 9, 2024
49163ce
Add doc loader files
hackerdave Sep 19, 2024
e3e8c19
Add docs
hackerdave Sep 19, 2024
21cf483
Update dependencies
hackerdave Sep 19, 2024
581893e
change metadata column type to JSON & add support for connection pool
skmishraoracle Sep 22, 2024
486050c
documentation in cookbook - oraclevs.md
skmishraoracle Sep 23, 2024
c9ccb8f
Add entry points
hackerdave Sep 24, 2024
5b1e0dd
Update import paths in doc
hackerdave Sep 24, 2024
a79a934
Combine doc loader and text splitter files
hackerdave Sep 24, 2024
462de8a
Move oracle text splitter to langchain-textsplitters
hackerdave Sep 29, 2024
9b1460c
Update docs for oracle text splitter
hackerdave Sep 29, 2024
d37038b
Update import paths
hackerdave Sep 29, 2024
0d6c42f
Change imports oracleai to oracle
hackerdave Sep 29, 2024
9cb2b44
Handle escaped double quotes
hackerdave Oct 21, 2024
b680622
generate pr - 4
skmishraoracle Dec 9, 2024
cb6bc78
generate pr - 5
skmishraoracle Dec 9, 2024
49afa7d
generate pr-7
skmishraoracle Dec 9, 2024
87a172d
generate pr-7
skmishraoracle Dec 9, 2024
2 changes: 1 addition & 1 deletion .yarnrc.yml
@@ -17,4 +17,4 @@ supportedArchitectures:
- darwin
- linux

- yarnPath: .yarn/releases/yarn-3.5.1.cjs
+ yarnPath: ./.yarn/releases/yarn-3.5.1.cjs
535 changes: 535 additions & 0 deletions cookbook/oracleai.mdx

Large diffs are not rendered by default.

273 changes: 273 additions & 0 deletions cookbook/oraclevs.md
@@ -0,0 +1,273 @@
# Oracle AI Vector Search with LangchainJS Integration

## Introduction

Oracle AI Vector Search enables semantic search on unstructured data while simultaneously providing relational search capabilities on business data, all within a unified system. This approach eliminates the need for a separate vector database, reducing data fragmentation and improving efficiency.

By integrating Oracle AI Vector Search with Langchain, you can build a powerful pipeline for Retrieval Augmented Generation (RAG), leveraging Oracle's robust database features.

## Key Advantages of Oracle Database

Oracle AI Vector Search is built on top of the Oracle Database, providing several key features:

- Partitioning Support
- Real Application Clusters (RAC) Scalability
- Exadata Smart Scans
- Geographically Distributed Shard Processing
- Transactional Capabilities
- Parallel SQL
- Disaster Recovery
- Advanced Security
- Oracle Machine Learning
- Oracle Graph Database
- Oracle Spatial and Graph
- Oracle Blockchain
- JSON Support

## Guide Overview
This guide demonstrates how to integrate Oracle AI Vector Search with Langchain to create an end-to-end RAG pipeline. You'll learn how to do the following (a combined sketch follows the list):

- Load documents from different sources using OracleDocLoader.
- Summarize documents inside or outside the database using OracleSummary.
- Generate embeddings either inside or outside the database using OracleEmbeddings.
- Chunk documents based on specific needs using OracleTextSplitter.
- Store, index, and query data using OracleVS.
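
Before diving into each component, here is a minimal sketch of how the pieces fit together. It is an outline rather than a definitive program: the `OracleVS` and embeddings import paths are assumptions (check the package entry points added in this PR), the table names are illustrative, and `conn`/`pool` are the connection objects created in the "Connecting to Oracle Database" section below.

```typescript
// Minimal pipeline sketch. The OracleVS and embeddings import paths are assumptions;
// the loader and splitter paths come from the document-processing guide in this PR.
import oracledb from "oracledb";
import { Document } from "@langchain/core/documents";
import { OracleDocLoader } from "@langchain/community/document_loaders/fs/oracle";
import { OracleTextSplitter } from "@langchain/textsplitters/oracle";
import { OracleVS } from "@langchain/community/vectorstores/oraclevs"; // assumed path
import { HuggingFaceTransformersEmbeddings } from "@langchain/community/embeddings/hf_transformers"; // assumed path

async function ragIngestSketch(conn: oracledb.Connection, pool: oracledb.Pool) {
  // 1. Load documents, here from a database table (a file or directory works too).
  const loader = new OracleDocLoader(conn, { owner: "testuser", tablename: "demo_tab", colname: "data" });
  const docs = await loader.load();

  // 2. Split each document into chunks, preserving the source metadata.
  const splitter = new OracleTextSplitter(conn, { normalize: "all" });
  const chunks: Document[] = [];
  for (const doc of docs) {
    for (const chunk of await splitter.splitText(doc.pageContent)) {
      chunks.push(new Document({ pageContent: chunk, metadata: doc.metadata }));
    }
  }

  // 3. Embed the chunks and store them in Oracle AI Vector Search.
  const embeddings = new HuggingFaceTransformersEmbeddings();
  const vectorStore = await OracleVS.fromDocuments(chunks, embeddings, {
    client: pool,
    tableName: "demo_vs_tab", // illustrative table name
  });

  // 4. Query by semantic similarity.
  const results = await vectorStore.similaritySearch("What are the salient features of OracleDB?", 5);
  console.log(results);
}
```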
## Getting Started

If you're new to Oracle Database, consider using the free Oracle Database 23ai to get started.

## Best Practices

- **User Management**: For security and control, create dedicated users for your Oracle Database projects instead of using the system user. See the end-to-end guide for more details, and the sketch below for one way to do this from JavaScript.
- **User Privileges**: Manage user privileges carefully to maintain database security. You can find more information in the official Oracle documentation.
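
The following is a minimal sketch of creating a dedicated user from Node.js, assuming you can connect as an administrative user. The user name, password handling, and granted role (`DB_DEVELOPER_ROLE`, available in Oracle Database 23ai) are illustrative and should be adapted to your own security policy.

```typescript
import oracledb from "oracledb";

// Sketch only: run once as an administrative user, then connect as the new user
// for the rest of this guide. Names and quota are illustrative.
async function createProjectUser(adminConn: oracledb.Connection): Promise<void> {
  await adminConn.execute(`CREATE USER testuser IDENTIFIED BY "ChangeMe_123" QUOTA UNLIMITED ON users`);
  // DB_DEVELOPER_ROLE bundles common development privileges in Oracle Database 23ai;
  // grant narrower privileges if your policy requires it.
  await adminConn.execute(`GRANT DB_DEVELOPER_ROLE TO testuser`);
  console.log("Dedicated project user created");
}
```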
## Prerequisites

To get started, install the Oracle JavaScript client driver:

```bash
npm install oracledb
```
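
The examples in this guide also import from the LangChain packages. Assuming you are following along in a fresh project, you will likely want those installed as well (package names as published on npm):

```bash
npm install @langchain/core @langchain/community @langchain/textsplitters
```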

## Document Preparation

Assuming you have documents stored in a file system that you want to use with Oracle AI Vector Search and Langchain, these documents need to be instances of `Document` from `@langchain/core/documents`.

### Example: Ingesting JSON Documents

The following TypeScript example demonstrates how to ingest documents from JSON files:

```typescript
import { promises as fs } from "node:fs";
import { Document } from "@langchain/core/documents";

// Shape of one record in the JSON file.
interface DataRow {
  id: string;
  link: string;
  text: string;
}

// Small helper class (name is illustrative) that turns JSON records into LangChain Documents.
class JsonDocsLoader {
  constructor(private docsDir: string, private filename: string) {}

  private createDocument(row: DataRow): Document {
    const metadata = {
      id: row.id,
      link: row.link,
    };
    return new Document({ pageContent: row.text, metadata });
  }

  public async ingestJson(): Promise<Document[]> {
    try {
      const filePath = `${this.docsDir}${this.filename}`;
      const fileContent = await fs.readFile(filePath, { encoding: "utf8" });
      const jsonData: DataRow[] = JSON.parse(fileContent);
      return jsonData.map((row) => this.createDocument(row));
    } catch (error) {
      console.error("An error occurred while ingesting JSON:", error);
      throw error; // Rethrow for the calling function to handle
    }
  }
}
```

## Langchain and Oracle Integration

The Oracle AI Vector Search Langchain library offers a rich set of APIs for document processing, including loading, chunking, summarizing, and embedding generation. Here's how to set up a connection and integrate Oracle with Langchain.

### Connecting to Oracle Database

Below is an example of how to connect to an Oracle Database using either a direct connection or a connection pool:

```typescript
import oracledb from "oracledb";

// Create a single standalone connection.
async function dbConnect(): Promise<oracledb.Connection> {
  const connection = await oracledb.getConnection({
    user: "****",
    password: "****",
    connectString: "***.**.***.**:1521/****",
  });
  console.log("Connection created...");
  return connection;
}

// Create a connection pool; individual connections are acquired from it as needed.
async function dbPool(): Promise<oracledb.Pool> {
  const pool = await oracledb.createPool({
    user: "****",
    password: "****",
    connectString: "***.**.***.**:1521/****",
  });
  console.log("Connection pool started...");
  return pool;
}
```
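
When using the pool, individual connections are borrowed and released per operation. The snippet below is a small usage sketch built on the standard `node-oracledb` pool API; the query is only an example.

```typescript
// Acquire a connection from the pool, use it, and release it back.
const pool = await dbPool();
const connection = await pool.getConnection();
try {
  const result = await connection.execute(`SELECT banner FROM v$version`);
  console.log(result.rows);
} finally {
  await connection.close(); // returns the connection to the pool
}
// When the application shuts down, drain and close the pool.
await pool.close(0);
```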

### Testing the Integration

Here, we demonstrate how to create a test class, `TestsOracleVS`, to explore various features of Oracle Vector Store and its integration with Langchain.

#### Example Test Class

```typescript
// Imports are omitted here for brevity: OracleVS, createIndex, and DistanceStrategy come from the
// OracleVS entry point added in this PR; Document, DocumentInterface, Callbacks,
// MaxMarginalRelevanceSearchOptions, and Embeddings come from @langchain/core; dbConnect/dbPool
// are the connection helpers defined above.
class TestsOracleVS {
  client: any | null = null;
  embeddingFunction: HuggingFaceTransformersEmbeddings;
  dbConfig: Record<string, any> = {};
  oraclevs!: OracleVS;

  // filename identifies the JSON document set read by testIngestJson(), which is assumed to wrap
  // the ingestJson() example from the "Document Preparation" section.
  constructor(
    private filename: string,
    embeddingFunction: HuggingFaceTransformersEmbeddings
  ) {
    this.embeddingFunction = embeddingFunction;
  }

  async init(): Promise<void> {
    this.client = await dbPool();
    this.dbConfig = {
      client: this.client,
      tableName: "some_tablenm",
      distanceStrategy: DistanceStrategy.DOT_PRODUCT,
      query: "What are the salient features of OracleDB?",
    };
    this.oraclevs = new OracleVS(this.embeddingFunction, this.dbConfig);
  }

  public async testCreateIndex(): Promise<void> {
    const connection: oracledb.Connection = await dbConnect();
    await createIndex(connection, this.oraclevs, {
      idxName: "IVF",
      idxType: "IVF",
      neighborPart: 64,
      accuracy: 90,
    });
    console.log("Index created successfully");
    await connection.close();
  }

  // similaritySearchVectorWithScore takes an embedding (a number array), a k value, and an
  // optional filter, and returns documents ordered by distance.
  public async testSimilaritySearchByVector(
    embedding: number[],
    k: number,
    filter?: OracleVS["FilterType"]
  ): Promise<[DocumentInterface, number][]> {
    return this.oraclevs.similaritySearchVectorWithScore(embedding, k, filter);
  }

  // Same as above, except that it also returns the stored embeddings.
  public async testSimilaritySearchByVectorReturningEmbeddings(
    embedding: number[],
    k: number = 4,
    filter?: OracleVS["FilterType"]
  ): Promise<[Document, number, Float32Array | number[]][]> {
    return await this.oraclevs.similaritySearchByVectorReturningEmbeddings(embedding, k, filter);
  }

  // Exercises maxMarginalRelevanceSearch; the callbacks argument is reserved for future use.
  public async testMaxMarginalRelevanceSearch(
    query: string,
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    // @ts-ignore
    return this.oraclevs.maxMarginalRelevanceSearch(query, options, _callbacks);
  }

  // Same as above, except that it takes a vector instead of a query string.
  public async testMaxMarginalRelevanceSearchByVector(
    query: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<DocumentInterface[]> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs!.maxMarginalRelevanceSearchByVector(query, options, _callbacks);
  }

  // Same as above, except that it returns each document together with its score.
  public async testMaxMarginalRelevanceSearchWithScoreByVector(
    embedding: number[],
    options?: MaxMarginalRelevanceSearchOptions<OracleVS["FilterType"]>,
    _callbacks?: Callbacks | undefined
  ): Promise<Array<{ document: Document; score: number }>> {
    if (!options) {
      options = { k: 10, fetchK: 20 }; // Default values for the options
    }
    return this.oraclevs.maxMarginalRelevanceSearchWithScoreByVector(embedding, options, _callbacks);
  }

  // Exercises the delete feature.
  testDelete(params: { ids?: string[]; deleteAll?: boolean }): Promise<void> {
    return this.oraclevs.delete(params);
  }
}

// runTestsOracleVS is the driver that exercises each of the calls above.
async function runTestsOracleVS() {
  // Initialize dotenv to load environment variables
  dotenv.config();
  const query = "What is the language used by Oracle database";

  // Set up the embedding function (default model: "Xenova/all-MiniLM-L6-v2")
  const embeddingFunction = new HuggingFaceTransformersEmbeddings();
  if (!(embeddingFunction instanceof Embeddings)) {
    console.error("Embedding function is not an instance of Embeddings.");
    return;
  }
  console.log("Embedding function initialized successfully");

  // Initialize the TestsOracleVS class
  const testsOracleVS = new TestsOracleVS("concepts23c_small.json", embeddingFunction);

  // Initialize connection and other setup
  await testsOracleVS.init();

  // Ingest JSON data to create documents, then load them into the vector store
  const documents = await testsOracleVS.testIngestJson();
  await OracleVS.fromDocuments(
    documents,
    testsOracleVS.embeddingFunction,
    testsOracleVS.dbConfig
  );

  // Create an index
  await testsOracleVS.testCreateIndex();

  // Perform a similarity search by vector
  const embedding = await embeddingFunction.embedQuery(query);
  const similaritySearchByVector = await testsOracleVS.testSimilaritySearchByVector(embedding, 5);
  console.log("Similarity Search Results:", similaritySearchByVector);

  // Perform a similarity search by vector, returning the stored embeddings as well
  const similaritySearchByEmbeddings =
    await testsOracleVS.testSimilaritySearchByVectorReturningEmbeddings(embedding, 5);
  console.log("Similarity Search With Embeddings:", similaritySearchByEmbeddings);

  const maxMarginalRelevanceSearch =
    await testsOracleVS.testMaxMarginalRelevanceSearch(query);
  console.log("Max Marginal Relevance Search:", maxMarginalRelevanceSearch);

  const maxMarginalRelevanceSearchByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchByVector(embedding);
  console.log("Max Marginal Relevance Search By Vector:", maxMarginalRelevanceSearchByVector);

  const maxMarginalRelevanceSearchWithScoreByVector =
    await testsOracleVS.testMaxMarginalRelevanceSearchWithScoreByVector(embedding);
  console.log("Max Marginal Relevance Search With Score By Vector:", maxMarginalRelevanceSearchWithScoreByVector);
}

// Run the driver.
runTestsOracleVS().catch(console.error);
```
That is all for now.
1 change: 0 additions & 1 deletion docs/api_refs/package.json
@@ -23,7 +23,6 @@
    "@types/react": "^18",
    "@types/react-dom": "^18",
    "autoprefixer": "^10.0.1",
-   "eslint": "^8",
"eslint-config-next": "14.0.1",
"glob": "^10.3.10",
"postcss": "^8",
@@ -0,0 +1,79 @@
# Oracle AI Vector Search: Document Processing

## Load Documents

Users have the flexibility to load documents from either the Oracle Database, a file system, or both, by appropriately configuring the loader parameters. For comprehensive details on these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-73397E89-92FB-48ED-94BB-1AD960C4EA1F).

A significant advantage of utilizing OracleDocLoader is its capability to process over 150 distinct file formats, eliminating the need for multiple loaders for different document types. For a complete list of the supported formats, please refer to the [Oracle Text Supported Document Formats](https://docs.oracle.com/en/database/oracle/oracle-database/23/ccref/oracle-text-supported-document-formats.html).

Below is a sample code snippet that demonstrates how to use `OracleDocLoader`:

```typescript
import { OracleDocLoader } from "@langchain/community/document_loaders/fs/oracle";

// `conn` is an open oracledb connection (see the connection examples in the cookbook).

/*
// loading a local file
loader_params = {"file": "<file>"};

// loading from a local directory
loader_params = {"dir": "<directory>"};
*/

// loading from Oracle Database table
// make sure you have the table with this specification
const loader_params = {
  "owner": "testuser",
  "tablename": "demo_tab",
  "colname": "data",
};

// load the docs
const loader = new OracleDocLoader(conn, loader_params);
const docs = await loader.load();

// verify
console.log(`Number of docs loaded: ${docs.length}`);
//console.log(`Document-0: ${docs[0].pageContent}`);
```

## Split Documents

The documents may vary in size, ranging from small to very large. Users often prefer to chunk their documents into smaller sections to facilitate the generation of embeddings. A wide array of customization options is available for this splitting process. For comprehensive details regarding these parameters, please consult the [Oracle AI Vector Search Guide](https://docs.oracle.com/en/database/oracle/oracle-database/23/arpls/dbms_vector_chain1.html#GUID-4E145629-7098-4C7C-804F-FC85D1F24240).

Below is a sample code illustrating how to implement this:

```typescript
import { OracleTextSplitter } from "@langchain/textsplitters/oracle";

/*
// Some examples
// split by chars, max 500 chars
splitter_params = {"split": "chars", "max": 500, "normalize": "all"};

// split by words, max 100 words
splitter_params = {"split": "words", "max": 100, "normalize": "all"};

// split by sentence, max 20 sentences
splitter_params = {"split": "sentence", "max": 20, "normalize": "all"};
*/

// split by default parameters
const splitter_params = {"normalize": "all"};

// get the splitter instance
const splitter = new OracleTextSplitter(conn, splitter_params);

let list_chunks = [];
for (const [, doc] of docs.entries()) {
  const chunks = await splitter.splitText(doc.pageContent);
  list_chunks.push(chunks);
}

// verify
console.log(`Number of Chunks: ${list_chunks.length}`);
//console.log(`Chunk-0: ${list_chunks[0]}`); // content
```
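
Each entry in `list_chunks` is the array of chunks produced from one source document. Before storing chunks in a vector store, you will typically want to turn them back into `Document` instances that carry over the source metadata. The following is a small sketch of that step, continuing from the snippet above (variable names are illustrative):

```typescript
import { Document } from "@langchain/core/documents";

// Pair each chunk with the metadata of the document it came from.
const chunkDocs: Document[] = [];
for (const [i, doc] of docs.entries()) {
  for (const chunk of list_chunks[i]) {
    chunkDocs.push(new Document({ pageContent: chunk, metadata: { ...doc.metadata } }));
  }
}

console.log(`Number of chunk documents: ${chunkDocs.length}`);
```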

## End to End Demo

Please refer to our complete demo guide, the [Oracle AI Vector Search End-to-End Demo Guide](https://github.com/langchain-ai/langchainjs/tree/main/cookbook/oracleai.mdx), to build an end-to-end RAG pipeline with the help of Oracle AI Vector Search.