Commit

feat!: V2 Java client, named vectors
BREAKING
Anush008 committed Feb 28, 2024
1 parent c41f7f7 commit f5b77c6
Showing 18 changed files with 629 additions and 262 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/release.yml

```diff
@@ -23,11 +23,11 @@ jobs:
           echo "AUTHOR_EMAIL=$AUTHOR_EMAIL" >> $GITHUB_OUTPUT
         id: author_info

-      - name: Set up Java 17
+      - name: Set up Java 8
        uses: actions/setup-java@v3
        with:
-          distribution: 'oracle'
-          java-version: '17'
+          java-version: "8"
+          distribution: temurin
          server-id: ossrh
          server-username: OSSRH_JIRA_USERNAME
          server-password: OSSRH_JIRA_PASSWORD
@@ -51,7 +51,7 @@ jobs:

       - name: Semantic Release
         run: |
-          bun install @conveyal/maven-semantic-release semantic-release
+          bun install @conveyal/maven-semantic-release semantic-release @semantic-release/git conventional-changelog-conventionalcommits
           bun x semantic-release --prepare @conveyal/maven-semantic-release --publish @semantic-release/github,@conveyal/maven-semantic-release --verify-conditions @semantic-release/github,@conveyal/maven-semantic-release --verify-release @conveyal/maven-semantic-release
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
4 changes: 2 additions & 2 deletions .github/workflows/test.yml

```diff
@@ -16,9 +16,9 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-java@v3
         with:
-          java-version: "17"
+          java-version: "8"
           distribution: temurin
       - name: Run the Maven tests
         run: mvn test
       - name: Generate assembly fat JAR
-        run: mvn clean package -Passembly
+        run: mvn clean package
```
135 changes: 135 additions & 0 deletions .releaserc

```json
{
  "branches": ["master"],
  "plugins": [
    [
      "@semantic-release/commit-analyzer",
      {
        "preset": "conventionalcommits",
        "releaseRules": [
          { "breaking": true, "release": "major" },
          { "type": "feat", "release": "minor" },
          { "type": "fix", "release": "patch" },
          { "type": "perf", "release": "patch" },
          { "type": "revert", "release": "patch" },
          { "type": "docs", "release": "patch" },
          { "type": "style", "release": "patch" },
          { "type": "refactor", "release": "patch" },
          { "type": "test", "release": "patch" },
          { "type": "build", "release": "patch" },
          { "type": "ci", "release": "patch" },
          { "type": "chore", "release": "patch" }
        ]
      }
    ],
    "@semantic-release/release-notes-generator",
    [
      "@semantic-release/release-notes-generator",
      {
        "preset": "conventionalcommits",
        "parserOpts": {
          "noteKeywords": ["BREAKING CHANGE", "BREAKING CHANGES", "BREAKING"]
        },
        "writerOpts": {
          "commitsSort": ["subject", "scope"]
        },
        "presetConfig": {
          "types": [
            { "type": "feat", "section": "πŸ• Features" },
            { "type": "feature", "section": "πŸ• Features" },
            { "type": "fix", "section": "πŸ› Bug Fixes" },
            { "type": "perf", "section": "πŸ”₯ Performance Improvements" },
            { "type": "revert", "section": "⏩ Reverts" },
            { "type": "docs", "section": "πŸ“ Documentation" },
            { "type": "style", "section": "🎨 Styles" },
            { "type": "refactor", "section": "πŸ§‘β€πŸ’» Code Refactoring" },
            { "type": "test", "section": "βœ… Tests" },
            { "type": "build", "section": "πŸ€– Build System" },
            { "type": "ci", "section": "πŸ” Continuous Integration" },
            { "type": "chore", "section": "🧹 Chores" }
          ]
        }
      }
    ]
  ]
}
```
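To see what the `releaseRules` above resolve to for a given commit, the lookup can be sketched in a few lines of Python. This is an illustrative simplification (first-match semantics, which may differ in detail from the real `@semantic-release/commit-analyzer`), not semantic-release code:

```python
# Simplified model of the releaseRules above: a breaking change releases a
# major version, "feat" a minor, and every other listed type a patch.
RELEASE_RULES = [
    {"breaking": True, "release": "major"},
] + [
    {"type": t, "release": "minor" if t == "feat" else "patch"}
    for t in ["feat", "fix", "perf", "revert", "docs", "style",
              "refactor", "test", "build", "ci", "chore"]
]

def release_for(commit_type, breaking=False):
    """Return the release level the rules assign, or None if no rule matches."""
    for rule in RELEASE_RULES:
        if "breaking" in rule:
            if breaking:
                return rule["release"]
        elif rule.get("type") == commit_type:
            return rule["release"]
    return None

print(release_for("feat", breaking=True))  # major
print(release_for("feat"))                 # minor
print(release_for("ci"))                   # patch
```

A commit type not listed in the rules (e.g. a plain `merge` commit) yields no release at all.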
25 changes: 13 additions & 12 deletions README.md

````diff
@@ -5,19 +5,19 @@
 ## Installation πŸš€

 > [!IMPORTANT]
-> Requires Java 17 or above.
+> Requires Java 8 or above.

 ### GitHub Releases πŸ“¦

-The packaged `jar` file releases can be found [here](https://github.com/qdrant/qdrant-spark/releases).
+The packaged `jar` file can be found [here](https://github.com/qdrant/qdrant-spark/releases).

 ### Building from source πŸ› οΈ

-To build the `jar` from source, you need [JDK@17](https://www.oracle.com/java/technologies/javase/jdk17-archive-downloads.html) and [Maven](https://maven.apache.org/) installed.
+To build the `jar` from source, you need [JDK@8](https://www.azul.com/downloads/#zulu) and [Maven](https://maven.apache.org/) installed.
 Once the requirements have been satisfied, run the following command in the project root. πŸ› οΈ

 ```bash
-mvn package -P assembly
+mvn package
 ```

 This will build and store the fat JAR in the `target` directory by default.
````
```diff
@@ -30,7 +30,7 @@ For use with Java and Scala projects, the package can be found [here](https://ce
 <dependency>
     <groupId>io.qdrant</groupId>
     <artifactId>spark</artifactId>
-    <version>1.12.1</version>
+    <version>2.0</version>
 </dependency>
```
```diff
@@ -43,7 +43,7 @@ from pyspark.sql import SparkSession

 spark = SparkSession.builder.config(
     "spark.jars",
-    "spark-1.12.1-jar-with-dependencies.jar", # specify the downloaded JAR file
+    "spark-2.0.jar", # specify the downloaded JAR file
 )
 .master("local[*]")
 .appName("qdrant")
```
```diff
@@ -58,7 +58,7 @@ To load data into Qdrant, a collection has to be created beforehand with the app
 <pyspark.sql.DataFrame>
    .write
    .format("io.qdrant.spark.Qdrant")
-   .option("qdrant_url", <QDRANT_URL>)
+   .option("qdrant_url", <QDRANT_GRPC_URL>)
    .option("collection_name", <QDRANT_COLLECTION_NAME>)
    .option("embedding_field", <EMBEDDING_FIELD_NAME>) # Expected to be a field of type ArrayType(FloatType)
    .option("schema", <pyspark.sql.DataFrame>.schema.json())
```
```diff
@@ -70,31 +70,32 @@ To load data into Qdrant, a collection has to be created beforehand with the app
 - An API key can be set using the `api_key` option to make authenticated requests.

 ## Databricks

 You can use the `qdrant-spark` connector as a library in Databricks to ingest data into Qdrant.

 - Go to the `Libraries` section in your cluster dashboard.
 - Select `Install New` to open the library installation modal.
-- Search for `io.qdrant:spark:1.12.1` in the Maven packages and click `Install`.
+- Search for `io.qdrant:spark:2.0` in the Maven packages and click `Install`.

 <img width="1064" alt="Screenshot 2024-01-05 at 17 20 01 (1)" src="https://github.com/qdrant/qdrant-spark/assets/46051506/d95773e0-c5c6-4ff2-bf50-8055bb08fd1b">

 ## Datatype support πŸ“‹

-Qdrant supports all the Spark data types, and the appropriate types are mapped based on the provided `schema`.
+Qdrant supports all the Spark data types. The appropriate types are mapped based on the provided `schema`.

 ## Options and Spark types πŸ› οΈ

 | Option            | Description                                                                | DataType               | Required |
 | :---------------- | :------------------------------------------------------------------------ | :--------------------- | :------- |
-| `qdrant_url`      | REST URL of the Qdrant instance                                            | `StringType`           | βœ…       |
+| `qdrant_url`      | GRPC URL of the Qdrant instance. Eg: <http://localhost:6334>               | `StringType`           | βœ…       |
 | `collection_name` | Name of the collection to write data into                                  | `StringType`           | βœ…       |
 | `embedding_field` | Name of the field holding the embeddings                                   | `ArrayType(FloatType)` | βœ…       |
 | `schema`          | JSON string of the dataframe schema                                        | `StringType`           | βœ…       |
 | `mode`            | Write mode of the dataframe. Supports "append".                            | `StringType`           | βœ…       |
 | `id_field`        | Name of the field holding the point IDs. Default: Generates a random UUID  | `StringType`           | ❌       |
 | `batch_size`      | Max size of the upload batch. Default: 100                                 | `IntType`              | ❌       |
 | `retries`         | Number of upload retries. Default: 3                                       | `IntType`              | ❌       |
 | `api_key`         | Qdrant API key to be sent in the header. Default: null                     | `StringType`           | ❌       |
+| `vector_name`     | Name of the vector in the collection. Default: null                        | `StringType`           | ❌       |

 ## LICENSE πŸ“œ
```
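The options in the table lend themselves to a quick sanity check before a write. The helper below is hypothetical (not part of the qdrant-spark connector); it simply mirrors the required/optional split and defaults listed in the table:

```python
# Hypothetical pre-flight check mirroring the options table above;
# not part of the connector itself.
REQUIRED = {"qdrant_url", "collection_name", "embedding_field", "schema", "mode"}
DEFAULTS = {
    "id_field": None,   # connector generates a random UUID when unset
    "batch_size": 100,
    "retries": 3,
    "api_key": None,
    "vector_name": None,  # new in 2.0, for named vectors
}

def validate_options(options):
    """Raise on missing/invalid options; return options merged with defaults."""
    missing = REQUIRED - options.keys()
    if missing:
        raise ValueError(f"missing required options: {sorted(missing)}")
    if options["mode"] != "append":
        raise ValueError('only "append" mode is supported')
    return {**DEFAULTS, **options}

opts = validate_options({
    "qdrant_url": "http://localhost:6334",  # gRPC port, per the table
    "collection_name": "demo",
    "embedding_field": "embedding",
    "schema": "{}",  # df.schema.json() in a real job
    "mode": "append",
    "vector_name": "dense",
})
```

In a real job these are the same key/value pairs passed via `.option(...)` on the DataFrame writer.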