docs: Data connector docs (#214)
* update

* update

* fix: Add graphql to data-connector

* update

* update

* feat: Add mysql data connector

* update

* update

* feat: SQLite data connector

* update

* docs: Data connector docs
nadeesha authored Dec 3, 2024
1 parent 5dd661f commit b146453
Showing 6 changed files with 587 additions and 7 deletions.
37 changes: 30 additions & 7 deletions data-connector/README.md
@@ -17,14 +17,10 @@ Inferable Data Connector is a bridge between your data systems and Inferable. Co
- [x] [OpenAPI](./src/open-api/open-api.ts)
- [x] [GraphQL](./src/graphql/graphql.ts)
- [x] [MySQL](./src/mysql/mysql.ts)
- [x] [SQLite](./src/sqlite/sqlite.ts)
- [ ] [MongoDB](./src/mongodb/mongodb.ts)
- [ ] [BigQuery](./src/big-query/big-query.ts)
- [ ] [Google Sheets](./src/google-sheets/google-sheets.ts)

## Quick Start

@@ -138,16 +134,37 @@ Each connector is defined in the `config.connectors` array.

</details>

<details>
<summary>GraphQL Connector Configuration</summary>

- `config.connectors[].schemaUrl`: The URL to your GraphQL schema. Must be publicly accessible.
- `config.connectors[].endpoint`: The endpoint to use. (e.g. `https://api.inferable.ai`)
- `config.connectors[].defaultHeaders`: The default headers to use. (e.g. `{"Authorization": "Bearer <token>"}`)

</details>

<details>
<summary>MySQL Connector Configuration</summary>

- `config.connectors[].connectionString`: The connection string to your database. (e.g. `mysql://root:mysql@localhost:3306/mysql`)
- `config.connectors[].schema`: The schema to use. (e.g. `mysql`)

</details>

<details>
<summary>SQLite Connector Configuration</summary>

- `config.connectors[].filePath`: The path to your SQLite database file. (e.g. `/path/to/your/database.sqlite`)

</details>

### config.privacyMode

When enabled (`config.privacyMode=1`), raw data is never sent to the model. Instead:
@@ -202,6 +219,12 @@ A: By default, yes, but enabling `config.privacyMode` ensures that only database
**Q: Where do the queries execute?**
A: All queries execute within your dockerized environment. Neither the model nor the Inferable Control Plane has direct query execution capabilities.

## Failure Modes

**Context Window Limitations**: The connector may face challenges with large database schemas, large OpenAPI specs, or large GraphQL schemas. In such cases, you may need to provide multiple subsets of the schema to the model via multiple `config.connectors` entries.

**Return Data Limitations**: The connector may face latency issues with large data sets. In such cases, turn on `config.privacyMode`: the model no longer sees the raw data, which is returned directly to the user instead.
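
For the context-window case, one way to provide "multiple subsets" is to give each MySQL schema its own entry in `config.connectors`. The helper below is a minimal sketch under that assumption; `buildMysqlConnectors` and the `mysql_<schema>` naming scheme are made up for illustration and are not part of the connector.

```typescript
// Field names follow the MySQL connector configuration described above.
type MySQLConnector = {
  type: "mysql";
  name: string;
  connectionString: string;
  schema: string;
};

// Hypothetical helper: emit one connector entry per schema so that each
// entry only exposes a subset of the tables to the model.
function buildMysqlConnectors(
  connectionString: string,
  schemas: string[]
): MySQLConnector[] {
  return schemas.map((schema) => ({
    type: "mysql" as const,
    name: `mysql_${schema}`,
    connectionString,
    schema,
  }));
}

// Placeholder connection string and schema names.
const splitConnectors = buildMysqlConnectors(
  "mysql://user:pass@localhost:3306/database",
  ["billing", "crm"]
);
```

The resulting array could then be serialized into the `connectors` section of the configuration file.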

## Contributing

We welcome contributions! To add support for a new database:
134 changes: 134 additions & 0 deletions data-connector/src/graphql/README.md
@@ -0,0 +1,134 @@
# GraphQL Data Connector

The GraphQL Data Connector enables LLMs to interact with GraphQL APIs through Inferable by automatically generating functions from GraphQL schemas and providing schema introspection capabilities.

## Configuration

Configure the connector in your `config.json`:

```json
{
  "type": "graphql",
  "name": "myGraphql",
  "schemaUrl": "process.env.GRAPHQL_SCHEMA_URL",
  "endpoint": "process.env.GRAPHQL_ENDPOINT",
  "defaultHeaders": {
    "Authorization": "process.env.GRAPHQL_AUTH_HEADER"
  }
}
```
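
The string values in this example look like references to environment variables rather than literal values. How the connector resolves them is not shown in this README; the sketch below illustrates one plausible approach, assuming any string that starts with `process.env.` is replaced by the matching variable. `resolveEnvRefs` is a hypothetical name.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const ENV_PREFIX = "process.env.";

// Hypothetical resolver: walks a parsed config value and substitutes
// "process.env.NAME" strings with the value of NAME from `env`.
function resolveEnvRefs(
  value: Json,
  env: Record<string, string | undefined>
): Json {
  if (typeof value === "string") {
    if (value.indexOf(ENV_PREFIX) !== 0) return value;
    const name = value.slice(ENV_PREFIX.length);
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Missing environment variable: ${name}`);
    }
    return resolved;
  }
  if (Array.isArray(value)) {
    return value.map((item) => resolveEnvRefs(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = resolveEnvRefs(value[key], env);
    }
    return out;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}
```

In a Node.js process this would typically be called as `resolveEnvRefs(config, process.env)`. Failing fast on a missing variable is a design choice of the sketch, not documented behavior.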

## How It Works

The connector operates in three main phases:

1. **Schema Discovery**: When initialized, it fetches and parses the GraphQL schema to generate callable functions
2. **Type Introspection**: Provides detailed type information to help construct valid queries
3. **Query Execution**: Executes GraphQL operations while respecting privacy and security settings

```mermaid
sequenceDiagram
    participant LLM as LLM Agent
    participant Connector as GraphQL Connector
    participant API as GraphQL API

    %% Schema Loading
    LLM->>Connector: Initialize
    Connector->>API: Fetch Schema
    API-->>Connector: Return Schema
    Connector-->>LLM: Generated functions

    %% Type Introspection
    LLM->>Connector: searchGraphQLDefinition
    Connector-->>LLM: Type definitions

    %% Query Execution
    LLM->>Connector: Execute operation
    alt Paranoid Mode
        Connector->>Human: Request approval
        Human-->>Connector: Approve request
    end
    Connector->>API: GraphQL request
    API-->>Connector: Response
    alt Privacy Mode
        Connector-->>User: Direct data transfer
        Connector-->>LLM: Confirmation only
    else Normal Mode
        Connector-->>LLM: API response
    end
```
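
The query-execution branch of the diagram can be sketched as a small function. This illustrates the control flow only and is not the connector's code: `Hooks`, `requestApproval`, `sendToUser`, the `Outcome` shape, and the handling of a rejected approval are all assumptions.

```typescript
type Modes = { paranoidMode: boolean; privacyMode: boolean };

// Hypothetical callbacks standing in for the human, the API, and the user.
type Hooks<T> = {
  requestApproval: () => Promise<boolean>;
  execute: () => Promise<T>;
  sendToUser: (data: T) => Promise<void>;
};

type Outcome<T> =
  | { kind: "rejected" }
  | { kind: "confirmation"; message: string }
  | { kind: "data"; data: T };

async function runOperation<T>(
  modes: Modes,
  hooks: Hooks<T>
): Promise<Outcome<T>> {
  // Paranoid mode: nothing is executed until a human approves.
  if (modes.paranoidMode) {
    const approved = await hooks.requestApproval();
    if (!approved) return { kind: "rejected" };
  }

  const data = await hooks.execute();

  // Privacy mode: the data goes to the user; the LLM only gets a confirmation.
  if (modes.privacyMode) {
    await hooks.sendToUser(data);
    return { kind: "confirmation", message: "Data sent directly to the user" };
  }

  // Normal mode: the response is returned to the LLM.
  return { kind: "data", data };
}
```

Keeping the two modes as independent checks mirrors the diagram, where approval happens before the request and the privacy decision happens after the response.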

## Features

- **Schema Introspection**: Provides detailed type information for constructing queries
- **Automatic Function Generation**: Creates Inferable functions from GraphQL operations
- **Privacy Mode**: Prevents sensitive API responses from passing through the LLM
- **Paranoid Mode**: Requires human approval for API requests
- **Type Validation**: Ensures queries match the GraphQL schema

## Important Considerations

### Schema Introspection

The connector provides a special `searchGraphQLDefinition` function to explore types:

```typescript
// Example introspection request
const request = {
  operation: "query",
  fieldName: "user",
};

// Example response
const response = {
  operation: "query",
  fieldName: "user",
  inputTypes: {
    id: {
      type: "ID!",
      definition: { type: "scalar", name: "ID" },
    },
  },
  outputType: {
    type: "User!",
    definition: {
      type: "object",
      fields: {
        id: { type: "ID!" },
        name: { type: "String!" },
        email: { type: "String" },
      },
    },
  },
};
```

### Query Construction

Use the type information to construct valid queries:

```graphql
query user($id: ID!) {
  user(id: $id) {
    id
    name
    email
  }
}
```
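
As a sketch of how the introspection response maps onto the query above, the helper below derives the variable declarations and the field selection from `inputTypes` and `outputType`. `buildOperation` is a hypothetical helper, it only handles flat fields, and the response shape is assumed to match the example in the Schema Introspection section.

```typescript
type FieldDefinition = {
  operation: "query" | "mutation";
  fieldName: string;
  inputTypes: Record<string, { type: string }>;
  outputType: { definition: { fields: Record<string, { type: string }> } };
};

// Hypothetical helper: turn an introspection response into an operation string.
function buildOperation(def: FieldDefinition): string {
  const args = Object.keys(def.inputTypes);
  // "$id: ID!" style declarations, one per input.
  const declarations = args
    .map((name) => `$${name}: ${def.inputTypes[name].type}`)
    .join(", ");
  // "id: $id" style arguments passed to the field.
  const callArgs = args.map((name) => `${name}: $${name}`).join(", ");
  const fields = Object.keys(def.outputType.definition.fields).join(" ");

  const head =
    args.length > 0
      ? `${def.operation} ${def.fieldName}(${declarations})`
      : `${def.operation} ${def.fieldName}`;
  const call = args.length > 0 ? `${def.fieldName}(${callArgs})` : def.fieldName;
  return `${head} { ${call} { ${fields} } }`;
}

const userOperation = buildOperation({
  operation: "query",
  fieldName: "user",
  inputTypes: { id: { type: "ID!" } },
  outputType: {
    definition: {
      fields: {
        id: { type: "ID!" },
        name: { type: "String!" },
        email: { type: "String" },
      },
    },
  },
});
```

For the `user` example this produces a single-line equivalent of the query shown above. Nested object fields would need their own sub-selections, which this sketch does not attempt.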

### Security Considerations

- **Authentication**: Configure through `defaultHeaders`
- **Privacy Mode**: Prevents sensitive API responses from reaching the LLM
- **Paranoid Mode**: Requires approval before making requests
- **Schema Validation**: Ensures queries match the GraphQL schema
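
To make the authentication point concrete: GraphQL servers conventionally accept a JSON `POST` body with `query` and `variables`, and the configured `defaultHeaders` would be merged into that request. The sketch below only builds the request description; `buildGraphQLRequest` is a hypothetical helper, and the connector's actual transport code may differ.

```typescript
type GraphQLRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Hypothetical helper: combine endpoint, default headers, and operation.
function buildGraphQLRequest(
  endpoint: string,
  defaultHeaders: Record<string, string>,
  query: string,
  variables: Record<string, unknown>
): GraphQLRequest {
  return {
    url: endpoint,
    method: "POST",
    // Default headers are spread last so they can carry the Authorization value.
    headers: { "Content-Type": "application/json", ...defaultHeaders },
    body: JSON.stringify({ query, variables }),
  };
}

// Placeholder endpoint and token.
const graphqlRequest = buildGraphQLRequest(
  "https://example.com/graphql",
  { Authorization: "Bearer <token>" },
  "query user($id: ID!) { user(id: $id) { id name email } }",
  { id: "1" }
);
```

The result can be handed to any HTTP client, for example `fetch(graphqlRequest.url, graphqlRequest)` in a runtime that provides `fetch`.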

### GraphQL-Specific Features

- **Type System**: Full support for GraphQL's type system
- **Operation Types**: Supports both queries and mutations
- **Field Selection**: Allows precise selection of required fields
- **Variables**: Supports GraphQL variables for dynamic queries
- **Nested Types**: Handles complex nested object types
- **Schema Exploration**: Built-in type introspection capabilities
108 changes: 108 additions & 0 deletions data-connector/src/mysql/README.md
@@ -0,0 +1,108 @@
# MySQL Data Connector

The MySQL Data Connector enables LLMs to interact with MySQL databases through Inferable by providing schema understanding and query execution capabilities.

## Request Configuration

Configure the connector in your `config.json`:

```json
{
  "type": "mysql",
  "name": "myMysql",
  "connectionString": "process.env.MYSQL_URL",
  "schema": "process.env.MYSQL_SCHEMA"
}
```

## How It Works

The connector operates in two main phases:

1. **Schema Discovery**: When initialized, it analyzes the database structure and provides context to the LLM
2. **Query Execution**: Based on the schema understanding, it can execute SQL queries while respecting privacy and security settings

```mermaid
sequenceDiagram
    participant LLM as LLM Agent
    participant Connector as MySQL Connector
    participant DB as MySQL DB

    %% Schema Discovery
    LLM->>Connector: getMysqlContext()
    Connector->>DB: Query table structure
    DB-->>Connector: Return schema info
    Connector-->>LLM: Tables & columns context

    %% Query Execution
    LLM->>Connector: executeMysqlQuery()
    alt Paranoid Mode
        Connector->>Human: Request approval
        Human-->>Connector: Approve query
    end
    Connector->>DB: Execute SQL query
    DB-->>Connector: Return results
    alt Privacy Mode
        Connector-->>User: Direct data transfer
        Connector-->>LLM: Confirmation only
    else Normal Mode
        Connector-->>LLM: Query results
    end
```

## Features

- **Schema Analysis**: Automatically maps database structure for LLM context
- **Privacy Mode**: Prevents sensitive data from passing through the LLM
- **Paranoid Mode**: Requires human approval for query execution
- **Sample Data**: Provides example rows to help LLM understand data patterns

## Important Considerations

### Context Window Limitations

The connector may face challenges with large database schemas, because every table adds an entry like the following to the LLM's context:

```typescript
// Example context structure
const context = {
  tableName: "users",
  columns: ["id", "name", "email"],
  sampleData: ["1", "John Doe", "[email protected]"],
};
```
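
A context entry of this shape could be assembled from rows of MySQL's `information_schema.columns` view, which exposes `TABLE_NAME` and `COLUMN_NAME` for every column. The grouping below is a sketch, not the connector's implementation; `ColumnRow`, `TableContext`, and `groupColumns` are illustrative names, and sample data is left out.

```typescript
// One row per column, as returned by a query against information_schema.columns.
type ColumnRow = { TABLE_NAME: string; COLUMN_NAME: string };
type TableContext = { tableName: string; columns: string[] };

// Hypothetical helper: group column rows into one context entry per table.
function groupColumns(rows: ColumnRow[]): TableContext[] {
  const byTable = new Map<string, TableContext>();
  for (const row of rows) {
    let entry = byTable.get(row.TABLE_NAME);
    if (!entry) {
      entry = { tableName: row.TABLE_NAME, columns: [] };
      byTable.set(row.TABLE_NAME, entry);
    }
    entry.columns.push(row.COLUMN_NAME);
  }
  // Map iteration preserves the order in which tables were first seen.
  return Array.from(byTable.values());
}

const tableContexts = groupColumns([
  { TABLE_NAME: "users", COLUMN_NAME: "id" },
  { TABLE_NAME: "users", COLUMN_NAME: "name" },
  { TABLE_NAME: "orders", COLUMN_NAME: "id" },
]);
```

Since the output grows with the number of tables and columns, this also shows why a schema with hundreds of tables can exhaust the context window.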

**Solution**: If you have many tables, create a refined schema focusing on relevant tables:

```sql
CREATE DATABASE llm_visible;
GRANT ALL PRIVILEGES ON llm_visible.* TO 'your_user'@'localhost';
-- Create views of only the necessary tables
CREATE VIEW llm_visible.users AS SELECT * FROM main_database.users;
```
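
If more than a handful of tables need to be exposed, the `CREATE VIEW` statements can be generated rather than written by hand. The helper below is a sketch; `buildViewStatements` and `quoteIdentifier` are hypothetical names, and the database names are the ones from the SQL example above.

```typescript
// MySQL identifiers are quoted with backticks; embedded backticks are doubled.
function quoteIdentifier(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// Hypothetical helper: one CREATE VIEW statement per table to expose.
function buildViewStatements(
  sourceDb: string,
  targetDb: string,
  tables: string[]
): string[] {
  return tables.map((table) => {
    const view = `${quoteIdentifier(targetDb)}.${quoteIdentifier(table)}`;
    const source = `${quoteIdentifier(sourceDb)}.${quoteIdentifier(table)}`;
    return `CREATE VIEW ${view} AS SELECT * FROM ${source};`;
  });
}

const viewStatements = buildViewStatements("main_database", "llm_visible", [
  "users",
  "orders",
]);
```

The connector's `schema` setting would then point at `llm_visible`, so only the selected tables contribute to the context.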

### Data Privacy

Large result sets passing through the LLM can:

- Consume excessive tokens
- Expose sensitive data
- Cause context overflow

**Solution**: Enable privacy mode to send data directly to the user:

```typescript
new MySQLClient({
  connectionString: "mysql://user:pass@localhost:3306/database",
  schema: "your_schema",
  privacyMode: true,
});
```

### Connection Management

The connector automatically handles:

- Connection initialization and verification
- Graceful shutdown on SIGTERM signals
- Connection pooling and reuse
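
The graceful-shutdown behavior can be pictured roughly as follows. This is a sketch of the general pattern rather than the connector's code: `SignalSource`, `Closable`, and `closeOnSigterm` are hypothetical names, and the pool is assumed to expose an `end()` method that returns a promise.

```typescript
// Minimal shapes so the sketch does not depend on a specific runtime or driver.
type SignalSource = { once(event: "SIGTERM", handler: () => void): unknown };
type Closable = { end(): Promise<void> };

// Hypothetical helper: close the pool once when SIGTERM arrives, then notify.
function closeOnSigterm(
  source: SignalSource,
  pool: Closable,
  onClosed: () => void
): void {
  source.once("SIGTERM", () => {
    // onClosed runs whether or not closing the pool succeeded,
    // so the process is never left hanging on shutdown.
    pool.end().then(onClosed, onClosed);
  });
}
```

In a Node.js process this might be wired up as `closeOnSigterm(process, pool, () => process.exit(0))`.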