diff --git a/docs/develop/java/getting-started/index.md b/docs/develop/java/getting-started/index.md index d3abc4b39d8..7d7cce1d20b 100644 --- a/docs/develop/java/getting-started/index.md +++ b/docs/develop/java/getting-started/index.md @@ -14,17 +14,7 @@ Find tutorials, examples and technical articles that will help you to develop wi ## Getting Started -Java community has built many client libraries that you can find [here](https://redis.io/clients#java). For your first steps with Java and Redis, this article will show how to use the two main libraries: [Jedis](https://github.com/redis/jedis) and [Lettuce](https://lettuce.io/). - -The blog post “[Jedis vs. Lettuce: An Exploration](https://redis.com/blog/jedis-vs-lettuce-an-exploration/)” will help you to select the best for your application; keeping in mind that both are available in Spring & SpringBoot framework. - - - +Java community has built many client libraries that you can find [here](https://redis.io/clients#java). For your first steps with Java and Redis, this article will show how to use [Jedis](https://github.com/redis/jedis), the supported Redis client for Java. Redis is an open source, in-memory, key-value data store most commonly used as a primary database, cache, message broker, and queue. Redis delivers sub-millisecond response times, enabling fast and powerful real-time applications in industries such as gaming, fintech, ad-tech, social media, healthcare, and IoT. @@ -106,53 +96,6 @@ Once you have access to the connection pool you can now get a Jedis instance and Find more information about Java & Redis connections in the "[Redis Connect](https://github.com/redis-developer/redis-connect/tree/master/java/jedis)". - - - -## Using Lettuce - -### Step 1. Add dependencies Jedis dependency to your Maven (or Gradle) project file: - -```xml - - io.lettuce - lettuce-corea - 6.0.1.RELEASE - -``` - -### Step 2. 
Import the Jedis classes - -```java - import io.lettuce.core.RedisClient; - import io.lettuce.core.api.StatefulRedisConnection; - import io.lettuce.core.api.sync.RedisCommands; -``` - -### Step 3. Write your application code - -```java - RedisClient redisClient = RedisClient.create("redis://localhost:6379/"); - StatefulRedisConnection connection = redisClient.connect(); - RedisCommands syncCommands = connection.sync(); - - syncCommands.set("mykey", "Hello from Lettuce!"); - String value = syncCommands.get("mykey"); - System.out.println( value ); - - syncCommands.zadd("vehicles", 0, "car"); - syncCommands.zadd("vehicles", 0, "bike"); - List vehicles = syncCommands.zrange("vehicles", 0, -1); - System.out.println( vehicles ); - - connection.close(); - redisClient.shutdown(); -``` - -Find more information about Java & Redis connections in the "[Redis Connect](https://github.com/redis-developer/redis-connect/tree/master/java/lettuce)". - - - ### Redis Launchpad Redis Launchpad is like an “App Store” for Redis sample apps. You can easily find apps for your preferred frameworks and languages. @@ -248,12 +191,6 @@ As developer you can use the Java client library directly in your application, o ### More developer resources -
- -
- -#### Sample Code - **[Brewdis - Product Catalog (Spring)](https://github.com/redis-developer/brewdis)** See how to use Redis and Spring to build a product catalog with streams, hashes and Search @@ -263,27 +200,6 @@ See how to use Spring to create multiple producer and consumers with Redis Strea **[Rate Limiting with Vert.x](https://github.com/redis-developer/vertx-rate-limiting-redis)** See how to use Redis Sorted Set with Vert.x to build a rate limiting service. -**[Redis Java Samples with Lettuce](https://github.com/redis-developer/vertx-rate-limiting-redis)** -Run Redis Commands from Lettuce - -
-
-
- -
- -#### Technical Articles - -**[Getting Started with Redis Streams and Java (Lettuce)](https://redis.com/blog/getting-started-with-redis-streams-and-java/)** - -**[Jedis vs. Lettuce: An Exploration](https://redis.com/blog/jedis-vs-lettuce-an-exploration/)** - -
- -
- ---- - ### Redis University ### [Redis for Java Developers](https://university.redis.com/courses/ru102j/) @@ -294,7 +210,6 @@ Redis for Java Developers teaches you how to build robust Redis client applicati -##
Redis Launchpad diff --git a/docs/develop/java/spring/rate-limiting/fixed-window/index-fixed-window-reactive-gears.mdx b/docs/develop/java/spring/rate-limiting/fixed-window/index-fixed-window-reactive-gears.mdx index 2f3e36f2755..fda5e8febd5 100644 --- a/docs/develop/java/spring/rate-limiting/fixed-window/index-fixed-window-reactive-gears.mdx +++ b/docs/develop/java/spring/rate-limiting/fixed-window/index-fixed-window-reactive-gears.mdx @@ -5,6 +5,12 @@ sidebar_label: Atomicity with Gears slug: /develop/java/spring/rate-limiting/fixed-window/reactive-gears --- +:::warning LETTUCE + +This tutorial uses Lettuce, which is an unsupported Redis library. For production applications, we recommend using [**Jedis**](https://github.com/redis/jedis) + +::: + ## Improving atomicity and performance with RedisGears ### What is RedisGears? diff --git a/docs/develop/java/spring/redis-and-spring-course/lesson_7/index-lesson_7.mdx b/docs/develop/java/spring/redis-and-spring-course/lesson_7/index-lesson_7.mdx index f505cb6d1b1..e63512be1f9 100644 --- a/docs/develop/java/spring/redis-and-spring-course/lesson_7/index-lesson_7.mdx +++ b/docs/develop/java/spring/redis-and-spring-course/lesson_7/index-lesson_7.mdx @@ -6,6 +6,12 @@ slug: /develop/java/redis-and-spring-course/lesson_7 authors: [bsb] --- +:::warning LETTUCE + +This tutorial uses Lettuce, which is an unsupported Redis library. For production applications, we recommend using [**Jedis**](https://github.com/redis/jedis) + +::: + import Authors from '@theme/Authors'; diff --git a/docs/explore/redisinsight/streams/index-streams.mdx b/docs/explore/redisinsight/streams/index-streams.mdx index 486c9d6d5a0..758b5592add 100644 --- a/docs/explore/redisinsight/streams/index-streams.mdx +++ b/docs/explore/redisinsight/streams/index-streams.mdx @@ -20,7 +20,7 @@ Redis Streams lets you build “Kafka-like” applications, which can: In addition, Redis Streams has the concept of a consumer group. 
Redis Streams consumer groups, like the similar concept in [Apache Kafka](https://kafka.apache.org/), allows client applications to consume messages in a distributed fashion (multiple clients), making it easy to scale and create highly available systems. -Let’s dive under the covers and see [Redis Streams](https://redis.io/topics/streams-intro) through the lens of RedisInsight. You will see how to use the [Lettuce Java client](https://developer.redis.com/develop/java/#using-lettuce) to publish and consume messages using consumer group.This is the first basic example that uses a single consumer. +Let’s dive under the covers and see [Redis Streams](https://redis.io/topics/streams-intro) through the lens of RedisInsight. You will see how to use Redis to publish and consume messages using a consumer group. This is the first basic example that uses a single consumer. ## Prerequisite: diff --git a/docs/howtos/antipatterns/index-antipatterns.mdx b/docs/howtos/antipatterns/index-antipatterns.mdx index 715d1c8abc6..e3956ad7507 100644 --- a/docs/howtos/antipatterns/index-antipatterns.mdx +++ b/docs/howtos/antipatterns/index-antipatterns.mdx @@ -39,60 +39,33 @@ Let us look at the redis-py that uses a connection pool to manage connections to [Learn more about redis-py](/develop/python/) -### Example #2 - Lettuce - -Lettuce provides generic connection pool support.Lettuce connections are designed to be thread-safe so one connection can be shared amongst multiple threads and Lettuce connections auto-reconnection by default. While connection pooling is not necessary in most cases it can be helpful in certain use cases. Lettuce provides generic connection pooling support. 
- -```java - RedisClient client = RedisClient.create(RedisURI.create(host, port)); - - GenericObjectPool> pool = ConnectionPoolSupport - .createGenericObjectPool(() -> client.connect(), new GenericObjectPoolConfig()); - - // executing work - try (StatefulRedisConnection connection = pool.borrowObject()) { - - RedisCommands commands = connection.sync(); - commands.multi(); - commands.set("key", "value"); - commands.set("key2", "value2"); - commands.exec(); - } - - // terminating - pool.close(); - client.shutdown(); -``` - -[Learn more about Lettuce](/develop/java/?s=lettuce) - -### 3. Connecting directly to Redis instances +### 2. Connecting directly to Redis instances With a large number of clients, a reconnect flood will be able to simply overwhelm a single threaded Redis process and force a failover. Hence, it is recommended that you should use the right tool that allows you to reduce the number of open connections to your Redis server. [Redis Enterprise DMC proxy](https://docs.redis.com/latest/rs/administering/designing-production/networking/multiple-active-proxy/) allows you to reduce the number of connections to your cache server by acting as a proxy. There are other 3rd party tool like [Twemproxy](https://github.com/twitter/twemproxy). It is a fast and lightweight proxy server that allows you to reduce the number of open connections to your Redis server. It was built primarily to reduce the number of connections to the caching servers on the backend. This, together with protocol pipelining and sharding enables you to horizontally scale your distributed caching architecture. -### 4. More than one secondary shard (Redis OSS) +### 3. More than one secondary shard (Redis OSS) Redis OSS uses a shard-based quorum. It's advised to use at least 3 copies of the data (2 replica shards per master shard) in order to be protected from split-brain situations. In nutshell, Redis OSS solves the quorum challenge by having an odd number of shards (primary + 2 replicas). 
Redis Enterprise solves the quorum challenge with an odd number of nodes. Redis Enterprise avoids a split-brain situation with only 2 copies of the data, which is more cost-efficient. In addition, the so-called ‘quorum-only node' can be used to bring a cluster up to an odd number of nodes if an additional, not necessary data node would be too expensive. -### 5. Performing single operation +### 4. Performing single operation Performing several operations serially increases connection overhead. Instead, use [Redis Pipelining](https://redis.io/topics/pipelining). Pipelining is the process of sending multiple messages down the pipe without waiting on the reply from each - and (typically) processing the replies later when they come in. Pipelining is completely a client side implementation. It is aimed at solving response latency issues in high network latency environments. So, the lesser the amount of time spent over the network in sending commands and reading responses, the better. This is effectively achieved by buffering. The client may (or may not) buffer the commands at the TCP stack (as mentioned in other answers) before they are sent to the server. Once they are sent to the server, the server executes them and buffers them on the server side. The benefit of the pipelining is a drastically improved protocol performance. The speedup gained by pipelining ranges from a factor of five for connections to localhost up to a factor of at least one hundred over slower internet connections. -### 6. Caching keys without TTL +### 5. Caching keys without TTL Redis functions primarily as a key-value store. It is possible to set timeout values on these keys. Said that, a timeout expiration automatically deletes the key. Additionally, when we use commands that delete or overwrite the contents of the key, it will clear the timeout. Redis TTL command is used to get the remaining time of the key expiry in seconds. TTL returns the remaining time to live of a key that has a timeout. 
This introspection capability allows a Redis client to check how many seconds a given key will continue to be part of the dataset.Keys will accumulate and end up being evicted. Hence, it is recommended to set TTLs on all caching keys. -### 7. Endless Redis Replication Loop +### 6. Endless Redis Replication Loop When attempting to replicate a very large active database over a slow or saturated link, replication never finishes due to the continuous updates. Hence, it is recommended to tune the slave and client buffers to allow for slower replication. Check out [this detailed blog](https://redis.com/blog/the-endless-redis-replication-loop-what-why-and-how-to-solve-it/). -### 8. Hot Keys +### 7. Hot Keys Redis can easily become the core of your app’s operational data, holding valuable and frequently accessed information. However, if you centralize the access down to a few pieces of data accessed constantly, you create what is known as a hot-key problem. In a Redis cluster, the key is actually what determines where in the cluster that data is stored. The data is stored in one single, primary location based off of hashing that key. So, when you access a single key over and over again, you’re actually accessing a single node/shard over and over again. Let’s put it another way—if you have a cluster of 99 nodes and you have a single key that gets a million requests in a second, all million of those requests will be going to a single node, not spread across the other 98 nodes. @@ -104,7 +77,7 @@ Redis even provides tools to find where your hot keys are located. Use redis-cli When possible, the best defence is to avoid the development pattern that is creating the situation. Writing the data to multiple keys that reside in different shards will allow you to access the same data more frequently. In nutshell, having specific keys that are accessed with every client operation. Hence, it's recommended to shard out hot keys using hashing algorithms. 
You can set policy to LFU and run redis-cli --hotkeys to determine. -### 9. Using Keys command +### 8. Using Keys command In Redis, the KEYS command can be used to perform exhaustive pattern matching on all stored keys. This is not advisable, as running this on an instance with a large number of keys could take a long time to complete, and will slow down the Redis instance in the process. In the relational world, this is equivalent to running an unbound query (SELECT...FROM without a WHERE clause). Execute this type of operation with care, and take necessary measures to ensure that your tenants are not performing a KEYS operation from within their application code. Use SCAN, which spreads the iteration over many calls, not tying up your whole server at one time. @@ -115,17 +88,17 @@ Scaning keyspace by keyname is an extremely slow operation and will run O(N) wit 2SQL: SELECT * FROM orders WHERE make=ford AND model=explorer" ``` -### 10. Running Ephemeral Redis as a primary database +### 9. Running Ephemeral Redis as a primary database Redis is often used as a primary storage engine for applications. Unlike using Redis as a cache, using Redis as a primary database requires two extra features to be effective. Any primary database should really be highly available. If a cache goes down, then generally your application is in a brown-out state. If a primary database goes down, your application also goes down. Similarly, if a cache goes down and you restart it empty, that’s no big deal. For a primary database, though, that’s a huge deal. Redis can handle these situations easily, but they generally require a different configuration than running as a cache. Redis as a primary database is great, but you’ve got to support it by turning on the right features. With Redis open source, you need to set up Redis Sentinel for high availability. In Redis Enterprise, it’s a core feature that you just need to turn on when creating the database. 
As for durability, both Redis Enterprise and open source Redis provide durability through AOF or snapshotting so your instance(s) start back up the way you left them. -### 11. Storing JSON blobs in a string +### 10. Storing JSON blobs in a string Microservices written in several languages may not marshal/unmarshal JSON in a consistent manner. Application logic will be required to lock/watch a key for atomic updates. JSON manipulation is often a very compute costly operation. Hence, it is recommended to use HASH data structure and also Redis JSON. -### 12. Translating a table or JSON to a HASH without considering query pattern +### 11. Translating a table or JSON to a HASH without considering query pattern The only query mechanism is a SCAN which requires reading the data structure and limits filtering to the MATCH directive. It is recommended to store the table or JSON as a string. Break out the indexes into reverse indexes using a SET or SORTED SET and point back to the key for the string. 
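The reverse-index approach above can be sketched without a running server. The following is a minimal in-memory illustration, not Redis client code: a `Map` stands in for Redis string keys and `Set` objects stand in for Redis SETs. The key names (`order:1`, `idx:make:ford`) and the helpers `saveOrder` and `findOrders` are hypothetical and chosen only for this example. Against a real instance, the same steps would roughly correspond to `SET` for the JSON string, `SADD` for each index entry, and `SINTER` to combine filters.

```js
// In-memory stand-ins (assumption): `strings` plays the role of Redis string
// keys, `sets` plays the role of Redis SET keys.
const strings = new Map();
const sets = new Map();

// Store the whole order as one JSON string and add its key to one
// reverse-index set per indexed field.
function saveOrder(id, order, indexedFields) {
  const key = `order:${id}`;
  strings.set(key, JSON.stringify(order));
  for (const field of indexedFields) {
    const indexKey = `idx:${field}:${order[field]}`;
    if (!sets.has(indexKey)) sets.set(indexKey, new Set());
    sets.get(indexKey).add(key);
  }
}

// Intersect the index sets for every filter (the role SINTER would play),
// then read the matching JSON strings back by key.
function findOrders(filters) {
  const indexSets = Object.entries(filters).map(
    ([field, value]) => sets.get(`idx:${field}:${value}`) ?? new Set(),
  );
  if (indexSets.length === 0) return [];
  const [first, ...rest] = indexSets;
  const keys = [...first].filter((key) => rest.every((s) => s.has(key)));
  return keys.map((key) => JSON.parse(strings.get(key)));
}

saveOrder(1, { make: 'ford', model: 'explorer', year: 2020 }, ['make', 'model']);
saveOrder(2, { make: 'ford', model: 'focus', year: 2018 }, ['make', 'model']);
saveOrder(3, { make: 'toyota', model: 'corolla', year: 2021 }, ['make', 'model']);

// Rough equivalent of: SELECT * FROM orders WHERE make=ford AND model=explorer
const matches = findOrders({ make: 'ford', model: 'explorer' });
console.log(matches);
```

The lookup touches only the index sets named in the filter and the matching string keys, so no scan over the keyspace is needed. The trade-off is that every write has to keep the index sets in step with the stored string.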
Using SELECT command and multiple databases inside one Redis instance diff --git a/docs/howtos/solutions/vector/getting-started-vector/chart/common.js b/docs/howtos/solutions/vector/getting-started-vector/chart/common.js new file mode 100644 index 00000000000..c5f990909c5 --- /dev/null +++ b/docs/howtos/solutions/vector/getting-started-vector/chart/common.js @@ -0,0 +1,30 @@ +// Sample data +const products = [ + { + name: 'Puma Men Race Black Watch', + price: 150, + quality: 5, + popularity: 8, + }, + { + name: 'Puma Men Top Fluctuation Red Black Watch', + price: 180, + quality: 7, + popularity: 6, + }, + { + name: 'Inkfruit Women Behind Cream Tshirt', + price: 5, + quality: 9, + popularity: 7, + }, +]; + +const dataWithAttributes = products.map((product) => ({ + x: product.price, + y: product.quality, + label: product.name, +})); + +const product1Point = { x: products[0].price, y: products[0].quality }; +const product2Point = { x: products[1].price, y: products[1].quality }; diff --git a/docs/howtos/solutions/vector/getting-started-vector/chart/cosine.js b/docs/howtos/solutions/vector/getting-started-vector/chart/cosine.js new file mode 100644 index 00000000000..2035affa807 --- /dev/null +++ b/docs/howtos/solutions/vector/getting-started-vector/chart/cosine.js @@ -0,0 +1,95 @@ +/* eslint-disable @typescript-eslint/no-unsafe-return */ +/* eslint-disable @typescript-eslint/no-unsafe-call */ +/* eslint-disable @typescript-eslint/no-unsafe-argument */ + +const ctxCosine = document + .getElementById('productChartCosine') + .getContext('2d'); + +function cosineSimilarity(point1, point2) { + let dotProduct = point1.x * point2.x + point1.y * point2.y; + let magnitudePoint1 = Math.sqrt( + Math.pow(point1.x, 2) + Math.pow(point1.y, 2), + ); + let magnitudePoint2 = Math.sqrt( + Math.pow(point2.x, 2) + Math.pow(point2.y, 2), + ); + return dotProduct / (magnitudePoint1 * magnitudePoint2); +} + +const cosineSim = cosineSimilarity(product1Point, product2Point); + +const 
scatterChartCosine = new Chart(ctxCosine, { + type: 'scatter', + data: { + datasets: [ + { + label: 'Products', + data: dataWithAttributes, + pointBackgroundColor: ['black', 'red', 'bisque'], + pointRadius: 5, + }, + { + label: 'Vector for Product-1', + data: [{ x: 0, y: 0 }, product1Point], + showLine: true, + fill: false, + borderColor: 'black', + pointRadius: [0, 5], + lineTension: 0, + }, + { + label: 'Vector for Product-2', + data: [{ x: 0, y: 0 }, product2Point], + showLine: true, + fill: false, + borderColor: 'red', + pointRadius: [0, 5], + lineTension: 0, + }, + ], + }, + options: { + responsive: true, + plugins: { + legend: { + position: 'top', + }, + title: { + display: true, + text: `Cosine Similarity between Product-1 and Product-2 is ${cosineSim}`, + }, + }, + scales: { + x: { + type: 'linear', + position: 'bottom', + title: { + display: true, + text: 'Price ($)', + }, + ticks: { + beginAtZero: true, + }, + min: 0, // Ensure it starts from 0 + }, + y: { + title: { + display: true, + text: 'Quality (1-10)', + }, + ticks: { + beginAtZero: true, + }, + min: 0, // Ensure it starts from 0 + }, + }, + tooltips: { + callbacks: { + title: function (tooltipItem, data) { + return data.datasets[0].data[tooltipItem[0].index].label; + }, + }, + }, + }, +}); diff --git a/docs/howtos/solutions/vector/getting-started-vector/chart/euclidean.js b/docs/howtos/solutions/vector/getting-started-vector/chart/euclidean.js new file mode 100644 index 00000000000..15fce10c435 --- /dev/null +++ b/docs/howtos/solutions/vector/getting-started-vector/chart/euclidean.js @@ -0,0 +1,78 @@ +/* eslint-disable @typescript-eslint/no-unsafe-call */ +// eslint-disable-next-line @typescript-eslint/no-unsafe-call +const ctx = document.getElementById('productChartEuclidean').getContext('2d'); + +function euclideanDistance(point1, point2) { + return Math.sqrt( + Math.pow(point1.x - point2.x, 2) + Math.pow(point1.y - point2.y, 2), + ); +} + +const distance = euclideanDistance(product1Point, 
product2Point); + +const scatterChart = new Chart(ctx, { + type: 'scatter', + data: { + datasets: [ + { + label: 'Products', + data: dataWithAttributes, + pointBackgroundColor: ['black', 'red', 'bisque'], + pointRadius: 5, + }, + { + label: `Euclidean Distance: ${distance.toFixed(2)}`, + data: [product1Point, product2Point], + showLine: true, + fill: false, + borderColor: 'green', + pointRadius: 0, + lineTension: 0, + }, + ], + }, + options: { + responsive: true, + plugins: { + legend: { + position: 'top', + }, + title: { + display: true, + text: `Euclidean Distance between Product-1 and Product-2`, + }, + }, + scales: { + x: { + type: 'linear', + position: 'bottom', + title: { + display: true, + text: 'Price ($)', + }, + ticks: { + beginAtZero: true, + }, + min: 0, // Ensure it starts from 0 + }, + y: { + title: { + display: true, + text: 'Quality (1-10)', + }, + ticks: { + beginAtZero: true, + }, + min: 0, // Ensure it starts from 0 + }, + }, + tooltips: { + callbacks: { + title: function (tooltipItem, data) { + // eslint-disable-next-line @typescript-eslint/no-unsafe-return + return data.datasets[0].data[tooltipItem[0].index].label; + }, + }, + }, + }, +}); diff --git a/docs/howtos/solutions/vector/getting-started-vector/chart/index.html b/docs/howtos/solutions/vector/getting-started-vector/chart/index.html new file mode 100644 index 00000000000..c089045aa21 --- /dev/null +++ b/docs/howtos/solutions/vector/getting-started-vector/chart/index.html @@ -0,0 +1,28 @@ + + + + + Product Attributes Visualization + + + + + +
+ +
+
+
+
+ +
+ + + + + + diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/11001.jpg b/docs/howtos/solutions/vector/getting-started-vector/images/11001.jpg new file mode 100644 index 00000000000..6366a8dda81 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/11001.jpg differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/cosine-chart.png b/docs/howtos/solutions/vector/getting-started-vector/images/cosine-chart.png new file mode 100644 index 00000000000..073ef9709b9 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/cosine-chart.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/cosine-formula.png b/docs/howtos/solutions/vector/getting-started-vector/images/cosine-formula.png new file mode 100644 index 00000000000..51b7ab52438 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/cosine-formula.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/cosine-sample.png b/docs/howtos/solutions/vector/getting-started-vector/images/cosine-sample.png new file mode 100644 index 00000000000..cdcbb0beef4 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/cosine-sample.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-chart.png b/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-chart.png new file mode 100644 index 00000000000..7e8d13b29f0 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-chart.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-formula.png b/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-formula.png new file mode 100644 index 00000000000..8cbaa051809 Binary files /dev/null and 
b/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-formula.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-sample.png b/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-sample.png new file mode 100644 index 00000000000..10c4103d325 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/euclidean-distance-sample.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/ip-formula.png b/docs/howtos/solutions/vector/getting-started-vector/images/ip-formula.png new file mode 100644 index 00000000000..2f1912d7c99 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/ip-formula.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/ip-sample.png b/docs/howtos/solutions/vector/getting-started-vector/images/ip-sample.png new file mode 100644 index 00000000000..5c29c8ba6e3 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/ip-sample.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/images/products-data-gui.png b/docs/howtos/solutions/vector/getting-started-vector/images/products-data-gui.png new file mode 100644 index 00000000000..df6021442c9 Binary files /dev/null and b/docs/howtos/solutions/vector/getting-started-vector/images/products-data-gui.png differ diff --git a/docs/howtos/solutions/vector/getting-started-vector/index-getting-started-vector.mdx b/docs/howtos/solutions/vector/getting-started-vector/index-getting-started-vector.mdx new file mode 100644 index 00000000000..8c89906485f --- /dev/null +++ b/docs/howtos/solutions/vector/getting-started-vector/index-getting-started-vector.mdx @@ -0,0 +1,760 @@ +--- +id: index-getting-started-vector +title: Getting Started with Vector Search Using Redis in NodeJS +sidebar_label: Getting Started with Vector Search Using Redis in NodeJS 
+slug: /howtos/solutions/vector/getting-started-vector
+authors: [prasan, will]
+---
+
+import Authors from '@theme/Authors';
+import SampleWatchImage from './images/11001.jpg';
+import EuclideanDistanceFormulaImage from './images/euclidean-distance-formula.png';
+import EuclideanDistanceSampleImage from './images/euclidean-distance-sample.png';
+import CosineFormulaImage from './images/cosine-formula.png';
+import CosineSampleImage from './images/cosine-sample.png';
+import IpFormulaImage from './images/ip-formula.png';
+import IpSampleImage from './images/ip-sample.png';
+
+
+
+:::tip GITHUB CODE
+
+Below is a command to clone the source code used in this tutorial:
+
+git clone https://github.com/redis-developer/redis-vector-nodejs-solutions.git
+:::
+
+## What is a vector in machine learning?
+
+In the context of machine learning, a vector is a mathematical representation of data. It is an ordered list of numbers that encodes the features or attributes of a piece of data.
+
+Vectors can be thought of as points in a multi-dimensional space where each dimension corresponds to a feature.
+**For example**, consider a simple dataset about ecommerce `products`. Each product might have features such as `price`, `quality`, and `popularity`.
+
+| Id  | Product                                  | Price ($) | Quality (1 - 10) | Popularity (1 - 10) |
+| --- | ---------------------------------------- | --------- | ---------------- | ------------------- |
+| 1   | Puma Men Race Black Watch                | 150       | 5                | 8                   |
+| 2   | Puma Men Top Fluctuation Red Black Watch | 180       | 7                | 6                   |
+| 3   | Inkfruit Women Behind Cream Tshirt       | 5         | 9                | 7                   |
+
+Now, product 1 `Puma Men Race Black Watch` might be represented as the vector `[150, 5, 8]`.
+
+In a more complex scenario, like natural language processing (NLP), words or entire sentences can be converted into dense vectors (often referred to as embeddings) that capture the semantic meaning of the text. Vectors play a foundational role in many machine learning algorithms, particularly those that involve distance measurements, such as clustering and classification algorithms.
+
+## What is a vector database?
+
+A vector database is a specialized database optimized for storing and searching vectors efficiently. Vector databases are often used to power vector search applications, such as recommendation systems, image search, and textual content retrieval. Vector databases are also referred to as vector stores, vector indexes, or vector search engines. Vector databases use vector similarity algorithms to search for vectors that are similar to a given query vector.
+
+:::tip
+
+[**Redis Cloud**](https://redis.com/try-free) is a popular choice for vector databases, as it offers a rich set of data structures and commands that are well-suited for vector storage and search. Redis Cloud allows you to index vectors and perform vector similarity search in a few different ways outlined further in this tutorial. It also maintains a high level of performance and scalability.
+
+:::
+
+## What is vector similarity?
+
+Vector similarity is a measure that quantifies how alike two vectors are, typically by evaluating the `distance` or `angle` between them in a multi-dimensional space.
+When vectors represent data points, such as texts or images, the similarity score can indicate how similar the underlying data points are in terms of their features or content.
+
+### Use cases for vector similarity
+
+- **Recommendation Systems**: If you have vectors representing user preferences or item profiles, you can quickly find items that are most similar to a user's preference vector.
+- **Image Search**: Store vectors representing image features, and then retrieve images most similar to a given image's vector.
+- **Textual Content Retrieval**: Store vectors representing textual content (e.g., articles, product descriptions) and find the most relevant texts for a given query vector.
+
+:::tip CALCULATING VECTOR SIMILARITY
+
+If you're interested in learning more about the mathematics behind vector similarity, scroll down to the [**How to calculate vector similarity?**](#how-to-calculate-vector-similarity) section.
+
+:::
+
+## Generating vectors
+
+In our scenario, our focus revolves around generating sentence (product description) and image (product image) embeddings or vectors. There's an abundance of AI model repositories, like GitHub, where AI models are pre-trained, maintained, and shared.
+
+For sentence embeddings, we'll employ a model from [Hugging Face Model Hub](https://huggingface.co/models), and for image embeddings we'll use one from [TensorFlow Hub](https://tfhub.dev/) for variety.
+
+:::tip GITHUB CODE
+
+Below is a command to clone the source code used in this tutorial:
+
+git clone https://github.com/redis-developer/redis-vector-nodejs-solutions.git
+:::
+
+### Sentence vector
+
+To generate sentence embeddings, we'll make use of a Hugging Face model titled [Xenova/all-distilroberta-v1](https://huggingface.co/Xenova/all-distilroberta-v1). It's a compatible version of [sentence-transformers/all-distilroberta-v1](https://huggingface.co/sentence-transformers/all-distilroberta-v1) for transformers.js with ONNX weights.
+
+:::info
+
+[Hugging Face Transformers](https://huggingface.co/docs/transformers.js/index)
+is a renowned open-source tool for Natural Language Processing (NLP) tasks.
+It simplifies the use of cutting-edge NLP models.
+
+The transformers.js library is essentially the JavaScript version of Hugging Face's popular Python library.
+
+:::
+
+:::info
+
+[ONNX (Open Neural Network eXchange)](https://onnx.ai) is an open standard
+that defines a common set of operators and a common file format to represent deep
+learning models in a wide variety of frameworks, including PyTorch and TensorFlow.
+
+:::
+
+Below, you'll find a Node.js code snippet that illustrates how to create vector embeddings for any provided `sentence`:
+
+```sh
+npm install @xenova/transformers
+```
+
+```js title="src/text-vector-gen.ts"
+import * as transformers from '@xenova/transformers';
+
+async function generateSentenceEmbeddings(_sentence): Promise<number[]> {
+  let modelName = 'Xenova/all-distilroberta-v1';
+  let pipe = await transformers.pipeline('feature-extraction', modelName);
+
+  let vectorOutput = await pipe(_sentence, {
+    pooling: 'mean',
+    normalize: true,
+  });
+
+  const embeddings: number[] = Object.values(vectorOutput?.data);
+  return embeddings;
+}
+
+export { generateSentenceEmbeddings };
+```
+
+Here's a glimpse of the vector output for a sample text:
+
+```js title="sample output"
+const embeddings = await generateSentenceEmbeddings('I Love Redis !');
+console.log(embeddings);
+/*
+ 768 dim vector output
+ embeddings = [
+    -0.005076113156974316, -0.006047232076525688, -0.03189406543970108,
+    -0.019677048549056053, 0.05152582749724388, -0.035989608615636826,
+    -0.009754283353686333, 0.002385444939136505, -0.04979122802615166,
+    ....]
+*/ +``` + +### Image vector + +To obtain image embeddings, we'll leverage the [mobilenet](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet) model from TensorFlow. + +Below, you'll find a Node.js code snippet that illustrates how to create vector embeddings for any provided `image`: + +```sh +npm i @tensorflow/tfjs @tensorflow/tfjs-node @tensorflow-models/mobilenet jpeg-js +``` + +```js title="src/image-vector-gen.ts" +import * as tf from '@tensorflow/tfjs-node'; +import * as mobilenet from '@tensorflow-models/mobilenet'; + +import * as jpeg from 'jpeg-js'; + +import * as path from 'path'; +import { fileURLToPath } from 'url'; +import * as fs from 'fs/promises'; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = path.dirname(__filename); + +async function decodeImage(imagePath) { + imagePath = path.join(__dirname, imagePath); + + const buffer = await fs.readFile(imagePath); + const rawImageData = jpeg.decode(buffer); + const imageTensor = tf.browser.fromPixels(rawImageData); + return imageTensor; +} + +async function generateImageEmbeddings(imagePath: string) { + const imageTensor = await decodeImage(imagePath); + + // Load MobileNet model + const model = await mobilenet.load(); + + // Classify and predict what the image is + const prediction = await model.classify(imageTensor); + console.log(`${imagePath} prediction`, prediction); + + // Preprocess the image and get the intermediate activation. + const activation = model.infer(imageTensor, true); + + // Convert the tensor to a regular array. + const vectorOutput = await activation.data(); + + imageTensor.dispose(); // Clean up tensor + + return vectorOutput; //DIM 1024 +} +``` + +
+ +
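The embedding returned above comes from a tensor's `data()` call, which resolves to a typed array (a `Float32Array` for float32 tensors) rather than a plain array. As a minimal sketch, assuming the vector is later serialized with `JSON.stringify` (for example, when it is stored as JSON), the snippet below shows why converting it with `Array.from` first may be needed. The sample values are made up for illustration.

```ts
// Stand-in for the typed array a tensor's data() call resolves to (hypothetical values).
const typedVector = new Float32Array([0.5, 1.5, 2]);

// A typed array serializes as an object keyed by index, not as a JSON array.
const asTypedArray = JSON.stringify(typedVector);
console.log(asTypedArray); // {"0":0.5,"1":1.5,"2":2}

// Array.from copies the values into a plain number[], which serializes as a JSON array.
const plainVector: number[] = Array.from(typedVector);
const asPlainArray = JSON.stringify(plainVector);
console.log(asPlainArray); // [0.5,1.5,2]
```

Whether this conversion is required depends on how the client library serializes values, so treat it as a precaution to verify rather than a rule.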
+ +:::tip Image classification model + +We are using the [mobilenet model](https://github.com/tensorflow/tfjs-models/tree/master/mobilenet), which is trained on only a small [set of image classes](https://github.com/tensorflow/tfjs-examples/blob/master/mobilenet/imagenet_classes.js). The choice of an image classification model depends on various factors, such as the dataset size, dataset diversity, computational resources, and the specific requirements of your application. There are various alternative image classification models, such as EfficientNet, ResNets, and Vision Transformers (ViT), that you can select based on your needs. +::: + +Below is an illustration of the vector output for a sample watch image: + +
+ +*(Image: ecommerce watch)* + +
+ +```js title="sample output" +//watch image +const imageEmbeddings = await generateImageEmbeddings('images/11001.jpg'); +console.log(imageEmbeddings); +/* + 1024 dim vector output + imageEmbeddings = [ + 0.013823275454342365, 0.33256298303604126, 0, + 2.2764432430267334, 0.14010703563690186, 0.972867488861084, + 1.2307443618774414, 2.254523992538452, 0.44696325063705444, + ....] + + images/11001.jpg (mobilenet model) prediction [ + { className: 'digital watch', probability: 0.28117117285728455 }, + { className: 'spotlight, spot', probability: 0.15369531512260437 }, + { className: 'digital clock', probability: 0.15267866849899292 } +] +*/ +``` + +## Database setup + +:::tip GITHUB CODE + +Below is a command to clone the source code used in this tutorial: + +git clone https://github.com/redis-developer/redis-vector-nodejs-solutions.git +::: + +### Sample Data seeding + +For the purposes of this tutorial, let's consider a simplified e-commerce context. The `products` JSON below is the sample data used by the vector search examples that follow. + +```js title="src/data.ts" +const products = [ + { + _id: '1', + price: 4950, + productDisplayName: 'Puma Men Race Black Watch', + brandName: 'Puma', + ageGroup: 'Adults-Men', + gender: 'Men', + masterCategory: 'Accessories', + subCategory: 'Watches', + imageURL: 'images/11002.jpg', + productDescription: + '

This watch from puma comes in a heavy duty design. The asymmetric dial and chunky casing gives this watch a tough appearance perfect for navigating the urban jungle.

Dial shape
: Round
Case diameter: 32 cm
Warranty: 2 Years

Stainless steel case with a fixed bezel for added durability, style and comfort
Leather straps with a tang clasp for comfort and style
Black dial with cat logo on the 12 hour mark
Date aperture at the 3 hour mark
Analog time display
Solid case back made of stainless steel for enhanced durability
Water resistant upto 100 metres

', + }, + { + _id: '2', + price: 5450, + productDisplayName: 'Puma Men Top Fluctuation Red Black Watches', + brandName: 'Puma', + ageGroup: 'Adults-Men', + gender: 'Men', + masterCategory: 'Accessories', + subCategory: 'Watches', + imageURL: 'images/11001.jpg', + productDescription: + '

This watch from puma comes in a clean sleek design. This active watch is perfect for urban wear and can serve you well in the gym or a night of clubbing.

Case diameter
: 40 mm

', + }, + + { + _id: '3', + price: 499, + productDisplayName: 'Inkfruit Women Behind Cream Tshirts', + brandName: 'Inkfruit', + ageGroup: 'Adults-Women', + gender: 'Women', + masterCategory: 'Apparel', + subCategory: 'Topwear', + imageURL: 'images/11008.jpg', + productDescription: + '

Composition
Yellow round neck t-shirt made of 100% cotton, has short sleeves and graphic print on the front

Fitting
Comfort

Wash care
Hand wash separately in cool water at 30 degrees
Do not scrub
Do not bleach
Turn inside out and dry flat in shade
Warm iron on reverse
Do not iron on print

Flaunt your pretty, long legs in style with this inkfruit t-shirt. The graphic print oozes sensuality, while the cotton fabric keeps you fresh and comfortable all day. Team this with a short denim skirt and high-heeled sandals and get behind the wheel in style.

Model statistics
The model wears size M in t-shirts
Height: 5\'7", Chest: 33"

', + }, +]; +``` + +Below is the sample code to seed `products` data as JSON in Redis. The data also includes vectors of both product descriptions and images. + +```js title="src/index.ts" +async function addProductWithEmbeddings(_products) { + // getNodeRedisClient is async, so wait for the connected client + const nodeRedisClient = await getNodeRedisClient(); + + if (_products && _products.length) { + for (let product of _products) { + console.log( + `generating description embeddings for product ${product._id}`, + ); + const sentenceEmbedding = await generateSentenceEmbeddings( + product.productDescription, + ); + product['productDescriptionEmbeddings'] = sentenceEmbedding; + + console.log(`generating image embeddings for product ${product._id}`); + const imageEmbedding = await generateImageEmbeddings(product.imageURL); + // the image embedding is a typed array; store a plain array so it serializes as a JSON array + product['productImageEmbeddings'] = Array.from(imageEmbedding); + + await nodeRedisClient.json.set(`products:${product._id}`, '$', { + ...product, + }); + console.log(`product ${product._id} added to redis`); + } + } +} +``` + +You can view the products JSON data in RedisInsight: + +![products data in RedisInsight](./images/products-data-gui.png) + +:::tip + +Download [RedisInsight](https://redis.com/redis-enterprise/redis-insight/) to visually explore your Redis data or to engage with raw Redis commands in the workbench. Dive deeper into RedisInsight with these [tutorials](/explore/redisinsight/). + +::: + +### Create vector index + +Before JSON fields in Redis can be searched, they must be indexed. The code below shows how to index different types of fields, including vector fields such as `productDescriptionEmbeddings` and `productImageEmbeddings`. 
+ +```ts title="src/redis-index.ts" +import { + createClient, + SchemaFieldTypes, + VectorAlgorithms, + RediSearchSchema, +} from 'redis'; + +const PRODUCTS_KEY_PREFIX = 'products'; +const PRODUCTS_INDEX_KEY = 'idx:products'; +const REDIS_URI = 'redis://localhost:6379'; +let nodeRedisClient = null; + +const getNodeRedisClient = async () => { + if (!nodeRedisClient) { + nodeRedisClient = createClient({ url: REDIS_URI }); + await nodeRedisClient.connect(); + } + return nodeRedisClient; +}; + +const createRedisIndex = async () => { + /* (RAW COMMAND) + FT.CREATE idx:products + ON JSON + PREFIX 1 "products:" + SCHEMA + "$.productDisplayName" as productDisplayName TEXT NOSTEM SORTABLE + "$.brandName" as brandName TEXT NOSTEM SORTABLE + "$.price" as price NUMERIC SORTABLE + "$.masterCategory" as "masterCategory" TAG + "$.subCategory" as subCategory TAG + "$.productDescriptionEmbeddings" as productDescriptionEmbeddings VECTOR "FLAT" 10 + "TYPE" FLOAT32 + "DIM" 768 + "DISTANCE_METRIC" "L2" + "INITIAL_CAP" 111 + "BLOCK_SIZE" 111 + "$.productDescription" as productDescription TEXT NOSTEM SORTABLE + "$.imageURL" as imageURL TEXT NOSTEM + "$.productImageEmbeddings" as productImageEmbeddings VECTOR "HNSW" 8 + "TYPE" FLOAT32 + "DIM" 1024 + "DISTANCE_METRIC" "COSINE" + "INITIAL_CAP" 111 + + */ + const nodeRedisClient = await getNodeRedisClient(); + + const schema: RediSearchSchema = { + '$.productDisplayName': { + type: SchemaFieldTypes.TEXT, + NOSTEM: true, + SORTABLE: true, + AS: 'productDisplayName', + }, + '$.brandName': { + type: SchemaFieldTypes.TEXT, + NOSTEM: true, + SORTABLE: true, + AS: 'brandName', + }, + '$.price': { + type: SchemaFieldTypes.NUMERIC, + SORTABLE: true, + AS: 'price', + }, + '$.masterCategory': { + type: SchemaFieldTypes.TAG, + AS: 'masterCategory', + }, + '$.subCategory': { + type: SchemaFieldTypes.TAG, + AS: 'subCategory', + }, + '$.productDescriptionEmbeddings': { + type: SchemaFieldTypes.VECTOR, + TYPE: 'FLOAT32', + ALGORITHM: VectorAlgorithms.FLAT, 
+ DIM: 768, + DISTANCE_METRIC: 'L2', + INITIAL_CAP: 111, + BLOCK_SIZE: 111, + AS: 'productDescriptionEmbeddings', + }, + '$.productDescription': { + type: SchemaFieldTypes.TEXT, + NOSTEM: true, + SORTABLE: true, + AS: 'productDescription', + }, + '$.imageURL': { + type: SchemaFieldTypes.TEXT, + NOSTEM: true, + AS: 'imageURL', + }, + '$.productImageEmbeddings': { + type: SchemaFieldTypes.VECTOR, + TYPE: 'FLOAT32', + ALGORITHM: VectorAlgorithms.HNSW, //Hierarchical Navigable Small World graphs + DIM: 1024, + DISTANCE_METRIC: 'COSINE', + INITIAL_CAP: 111, + AS: 'productImageEmbeddings', + }, + }; + + try { + // drop any existing index first; this throws if the index does not exist yet + await nodeRedisClient.ft.dropIndex(PRODUCTS_INDEX_KEY); + } catch (indexErr) { + console.error(indexErr); + } + await nodeRedisClient.ft.create(PRODUCTS_INDEX_KEY, schema, { + ON: 'JSON', + PREFIX: PRODUCTS_KEY_PREFIX, + }); + console.log(`index ${PRODUCTS_INDEX_KEY} created`); +}; +``` + +:::info FLAT VS HNSW indexing + +FLAT: When vectors are indexed in a "FLAT" structure, they're stored in their original form without any added hierarchy. A search against a FLAT index requires the algorithm to scan each vector linearly to find the most similar matches. While this is accurate, it's computationally intensive and slower, so it is best suited to smaller datasets. + +HNSW (Hierarchical Navigable Small World): HNSW is a graph-centric method tailored for indexing high-dimensional data. With larger datasets, linear comparisons against every vector in the index become time-consuming. HNSW employs a probabilistic approach that returns results faster, with a slight trade-off in accuracy. + +::: + +:::info INITIAL_CAP and BLOCK_SIZE parameters + +Both INITIAL_CAP and BLOCK_SIZE are configuration parameters that control how vectors are stored and indexed. + +INITIAL_CAP defines the initial capacity of the vector index. It helps in pre-allocating space for the index. 
+ +BLOCK_SIZE defines the size of each block of the vector index. 
As more vectors are added, Redis will allocate memory in chunks, with each chunk being the size of the BLOCK_SIZE. It helps in optimizing the memory allocations during index growth. + +::: + +## What is vector KNN query? + +KNN, or k-Nearest Neighbors, is an algorithm used in both classification and regression tasks, but when referring to "KNN Search," we're typically discussing the task of finding the "k" points in a dataset that are closest (most similar) to a given query point. In the context of vector search, this means identifying the "k" vectors in our database that are most similar to a given query vector, usually based on some distance metric like cosine similarity or Euclidean distance. + +### Vector KNN query with Redis + +Redis allows you to index and then search for vectors [using the KNN approach](https://redis.io/docs/stack/search/reference/vectors/#pure-knn-queries). + +Below, you'll find a Node.js code snippet that illustrates how to perform `KNN query` for any provided `search text`: + +```ts title="src/knn-query.ts" +const float32Buffer = (arr) => { + const floatArray = new Float32Array(arr); + const float32Buffer = Buffer.from(floatArray.buffer); + return float32Buffer; +}; +const queryProductDescriptionEmbeddingsByKNN = async ( + _searchTxt, + _resultCount, +) => { + //A KNN query will give us the top n documents that best match the query vector. + + /* sample raw query + + FT.SEARCH idx:products + "*=>[KNN 5 @productDescriptionEmbeddings $searchBlob AS score]" + RETURN 4 score brandName productDisplayName imageURL + SORTBY score + PARAMS 2 searchBlob "6\xf7\..." + DIALECT 2 + + */ + //https://redis.io/docs/interact/search-and-query/query/ + + console.log(`queryProductDescriptionEmbeddingsByKNN started`); + let results = {}; + if (_searchTxt) { + _resultCount = _resultCount ?? 
5; + + // getNodeRedisClient is async, so wait for the connected client + const nodeRedisClient = await getNodeRedisClient(); + const searchTxtVectorArr = await generateSentenceEmbeddings(_searchTxt); + + const searchQuery = `*=>[KNN ${_resultCount} @productDescriptionEmbeddings $searchBlob AS score]`; + + results = await nodeRedisClient.ft.search(PRODUCTS_INDEX_KEY, searchQuery, { + PARAMS: { + searchBlob: float32Buffer(searchTxtVectorArr), + }, + RETURN: ['score', 'brandName', 'productDisplayName', 'imageURL'], + SORTBY: { + BY: 'score', + // DIRECTION: "DESC" + }, + DIALECT: 2, + }); + } else { + throw new Error('Search text cannot be empty'); + } + + return results; +}; +``` + +Below is the output of a KNN query in Redis. **A lower score (distance) in the output signifies a higher degree of similarity.** + +```js title="sample output" +const result = await queryProductDescriptionEmbeddingsByKNN( + 'Puma watch with cat', //search text + 3, //max number of results expected +); +console.log(JSON.stringify(result, null, 4)); + +/* +{ + "total": 3, + "documents": [ + { + "id": "products:1", + "value": { + "score": "0.762174725533", + "brandName": "Puma", + "productDisplayName": "Puma Men Race Black Watch", + "imageURL": "images/11002.jpg" + } + }, + { + "id": "products:2", + "value": { + "score": "0.825711071491", + "brandName": "Puma", + "productDisplayName": "Puma Men Top Fluctuation Red Black Watches", + "imageURL": "images/11001.jpg" + } + }, + { + "id": "products:3", + "value": { + "score": "1.79949247837", + "brandName": "Inkfruit", + "productDisplayName": "Inkfruit Women Behind Cream Tshirts", + "imageURL": "images/11008.jpg" + } + } + ] +} +*/ +``` + +:::note +KNN queries can be combined with standard Redis search functionalities using [hybrid knn queries](https://redis.io/docs/interact/search-and-query/search/vectors/#hybrid-knn-queries). +::: + +## What is vector range query? 
+ +Range queries retrieve data that falls within a specified range of values. 
+For vectors, a "range query" typically refers to retrieving all vectors within a certain distance of a target vector. The "range" in this context is a radius in the vector space. + +### Vector range query with Redis + +Below, you'll find a Node.js code snippet that illustrates how to perform a vector `range query` for any provided range (radius/distance): + +```js title="src/range-query.ts" +const queryProductDescriptionEmbeddingsByRange = async (_searchTxt, _range) => { + /* sample raw query + + FT.SEARCH idx:products + "@productDescriptionEmbeddings:[VECTOR_RANGE $searchRange $searchBlob]=>{$YIELD_DISTANCE_AS: score}" + RETURN 4 score brandName productDisplayName imageURL + SORTBY score + PARAMS 4 searchRange 0.685 searchBlob "A=\xe1\xbb\x8a\xad\x...." + DIALECT 2 + */ + + console.log(`queryProductDescriptionEmbeddingsByRange started`); + let results = {}; + if (_searchTxt) { + _range = _range ?? 1.0; + + // getNodeRedisClient is async, so wait for the connected client + const nodeRedisClient = await getNodeRedisClient(); + + const searchTxtVectorArr = await generateSentenceEmbeddings(_searchTxt); + + const searchQuery = + '@productDescriptionEmbeddings:[VECTOR_RANGE $searchRange $searchBlob]=>{$YIELD_DISTANCE_AS: score}'; + + results = await nodeRedisClient.ft.search(PRODUCTS_INDEX_KEY, searchQuery, { + PARAMS: { + searchBlob: float32Buffer(searchTxtVectorArr), + searchRange: _range, + }, + RETURN: ['score', 'brandName', 'productDisplayName', 'imageURL'], + SORTBY: { + BY: 'score', + // DIRECTION: "DESC" + }, + DIALECT: 2, + }); + } else { + throw new Error('Search text cannot be empty'); + } + + return results; +}; +``` + +Below is the output of a range query in Redis: + +```js title="sample output" +const result2 = await queryProductDescriptionEmbeddingsByRange( + 'Puma watch with cat', //search text + 1.0, //within range or distance +); +console.log(JSON.stringify(result2, null, 4)); +/* +{ + "total": 2, + "documents": [ + { + "id": "products:1", + "value": { 
+ "score": "0.762174725533", + "brandName": "Puma", + "productDisplayName": "Puma 
Men Race Black Watch", + "imageURL": "images/11002.jpg" + } + }, + { + "id": "products:2", + "value": { + "score": "0.825711071491", + "brandName": "Puma", + "productDisplayName": "Puma Men Top Fluctuation Red Black Watches", + "imageURL": "images/11001.jpg" + } + } + ] +} +*/ +``` + +:::info Image vs text vector query +The syntax for KNN/range vector queries remains consistent whether you're dealing with image vectors or text vectors. +::: + +## How to calculate vector similarity? + +Several techniques are available to assess vector similarity, with some of the most prevalent ones being: + +### Euclidean Distance (L2 norm) + +**Euclidean Distance (L2 norm)** calculates the straight-line distance between two points within a multi-dimensional space. Lower values indicate closer proximity, and hence higher similarity. + +*(Formula image: d(p, q) = √(Σᵢ (pᵢ − qᵢ)²))* + +For illustration purposes, let's assess `product 1` and `product 2` from the earlier ecommerce dataset and determine the `Euclidean Distance` considering all features. + +*(Image: Euclidean distance sample for product 1 and product 2)* + +As an example, we will use a 2D chart made with [chart.js](https://www.chartjs.org/) comparing the `Price vs. Quality` features of our products, focusing solely on these two attributes to compute the `Euclidean Distance`. + +![chart](./images/euclidean-distance-chart.png) + +### Cosine Similarity + +**Cosine Similarity** measures the cosine of the angle between two vectors. The cosine similarity value ranges between -1 and 1. A value closer to 1 implies a smaller angle and higher similarity, while a value closer to -1 implies a larger angle and lower similarity. Cosine similarity is particularly popular in NLP when dealing with text vectors. + +*(Formula image: cos(θ) = (A · B) / (‖A‖ ‖B‖))* + +:::note +If two vectors are pointing in the same direction, the cosine of the angle between them is 1. 
If they're orthogonal, the cosine is 0, and if they're pointing in opposite directions, the cosine is -1. 
+::: + +Again, consider `product 1` and `product 2` from the previous dataset and calculate the `Cosine Similarity` for all features. + +![sample](./images/cosine-sample.png) + +Using [chart.js](https://www.chartjs.org/), we've crafted a 2D chart of `Price vs. Quality` features. It visualizes the `Cosine Similarity` solely based on these attributes. + +![chart](./images/cosine-chart.png) + +### Inner Product + +The **Inner Product (dot product)** isn't a distance metric in the traditional sense but can be used to calculate similarity, especially when vectors are normalized (have a magnitude of 1). It's the sum of the products of the corresponding entries of the two vectors. + +*(Formula image: A · B = Σᵢ AᵢBᵢ)* + +:::note +The inner product can be thought of as a measure of how much two vectors "align" +in a given vector space. Higher values indicate higher similarity. However, the raw +values can be large for vectors with large magnitudes; hence, normalization is recommended for better +interpretation. If the vectors are normalized, their dot product will be `1 if they are identical` and `0 if they are orthogonal` (uncorrelated). +::: + +Considering our `product 1` and `product 2`, let's compute the `Inner Product` across all features. + +![sample](./images/ip-sample.png) + +:::tip +Vectors can also be stored in databases in **binary formats** to save space. In practical applications, it's crucial to strike a balance between the dimensionality of the vectors (which impacts storage and computational costs) and the quality or granularity of the information they capture. +:::
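The three measures described in this section can be sketched in a few lines of TypeScript. This is a minimal illustration rather than how Redis computes them internally; the function names and the two sample vectors are assumptions made for the example.

```ts
// Illustrative helpers; the names are hypothetical and not part of any library.

// Inner product: the sum of the products of corresponding entries.
function innerProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Euclidean (L2) distance: square root of the summed squared differences.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let squaredSum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    squaredSum += diff * diff;
  }
  return Math.sqrt(squaredSum);
}

// Cosine similarity: inner product divided by the product of the magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  const magnitudeA = Math.sqrt(innerProduct(a, a));
  const magnitudeB = Math.sqrt(innerProduct(b, b));
  if (magnitudeA === 0 || magnitudeB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return innerProduct(a, b) / (magnitudeA * magnitudeB);
}

// Two made-up 2D vectors, each with magnitude 5.
const vectorA = [3, 4];
const vectorB = [4, 3];

console.log(euclideanDistance(vectorA, vectorB)); // 1.4142135623730951
console.log(innerProduct(vectorA, vectorB)); // 24
console.log(cosineSimilarity(vectorA, vectorB)); // 0.96
```

The raw inner product (24) reflects the magnitudes of the vectors, while the cosine similarity (0.96) depends only on the angle between them, which is why normalization makes inner product values easier to interpret.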