diff --git a/README.md b/README.md index 773b2194f6..99f27d44f1 100644 --- a/README.md +++ b/README.md @@ -36,45 +36,45 @@ Features without links are part of our future roadmap. ### Model I/O -**Language Models:** A foundational feature is a common client API for interacting with various Large Language Models (LLMs). +**AI Models:** A foundational feature is a common client API for interacting with generative AI Models. A common API enables you to develop an application targeting OpenAI's ChatGPT HTTP interface and easily switch to Azure's OpenAI service, as an example. -**Prompts:** At the center of LLM interaction is the Prompt - a set of instructions for the LLM to respond to. +**Prompts:** At the center of the AI model interaction is the Prompt - a set of instructions for the AI model to respond to. Creating an effective Prompt is part art and part science, giving rise to the discipline of Prompt Engineering. Prompts utilize a templating engine, enabling easy replacement of data within prompt text placeholders. -**Output Parsers:** LLM responses are typically a raw `java.lang.String`. Output Parsers transform the raw String into structured formats like CSV or JSON, to make the output usable in a programming environment. +**Output Parsers:** The AI responses are typically a raw `java.lang.String`. Output Parsers transform the raw String into structured formats like CSV or JSON, to make the output usable in a programming environment. Output Parsers may also do additional post-processing on the response String. ### Incorporating your data -**Data Management:** A significant innovation in Generative AI involves enabling LLMs to understand your proprietary data without having to retrain the model's weights. Retraining a model is a complex and compute-intensive task. +**Data Management:** A significant innovation in Generative AI involves enabling the model to understand your proprietary data without having to retrain the model's weights. 
Retraining a model is a complex and compute-intensive task. Recent Generative AI models have billions of parameters that require specialized hard-to-find hardware making it practically impossible to retrain the largest of models. Instead, the 'In-context' learning technique lets you more easily incorporate your data into the pre-trained model. This data can be from text files, HTML, database results, etc. -Effectively incorporating your data in a LLM requires specific techniques critical for developing successful solutions. +Effectively incorporating your data in an AI model requires specific techniques critical for developing successful solutions. -**Vector Stores:** A widely used technique to incorporate your data in a LLM is using Vector Databases. -Vector Databases help to classify which part of your documents are most relevant for the LLM to use in creating a response. +**Vector Stores:** A widely used technique to incorporate your data in an AI model is using Vector Databases. +Vector Databases help to classify which parts of your documents are most relevant for the AI model to use in creating a response. Examples of Vector Databases are Chroma, Pinecone, Weaviate, Mongo Atlas, and RediSearch. Spring IO abstracts these databases, allowing easy swapping of implementations. -### Chaining together multiple LLM interactions +### Chaining together multiple AI model interactions -**Chains:** Many AI solutions require multiple LLM interactions to respond to a single user input. +**Chains:** Many AI solutions require multiple AI interactions to respond to a single user input. "Chains" organize these interactions, offering modular AI workflows that promote reusability. While you can create custom Chains tailored to your specific use case, pre-configured use-case-specific Chains are provided to accelerate your development. Use-cases such as Question-Answering, Text Generation, and Summarization are examples.
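The output-of-one-call-feeds-the-next idea behind Chains can be sketched with plain `java.util.function` composition. This is a hypothetical illustration, not the Spring AI API; the `summarize` and `translate` lambdas are fake stand-ins for model calls so the sketch runs without an API key.

```java
import java.util.function.Function;

// Hypothetical sketch of a two-step chain; each Function stand-in
// takes a prompt string and returns a model response string.
public class ChainSketch {

    // Compose two model calls: the first call's output becomes
    // the second call's input.
    public static Function<String, String> chain(Function<String, String> first,
                                                 Function<String, String> second) {
        return first.andThen(second);
    }

    public static void main(String[] args) {
        // Fake "model calls" standing in for real AI interactions.
        Function<String, String> summarize = text -> "summary(" + text + ")";
        Function<String, String> translate = text -> "translation(" + text + ")";

        Function<String, String> pipeline = chain(summarize, translate);
        System.out.println(pipeline.apply("user input"));
        // prints: translation(summary(user input))
    }
}
```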
### Memory -**Memory:** To support multiple LLM interactions, your application must recall the previous inputs and outputs. +**Memory:** To support multiple AI model interactions, your application must recall the previous inputs and outputs. A variety of algorithms are available for different scenarios, often backed by databases like Redis, Cassandra, MongoDB, Postgres, and other database technologies. ### Agents Beyond Chains, Agents represent the next level of sophistication. -Agents use the LLM to determine the techniques and steps to respond to a user's query. +Agents use the AI models themselves to determine the techniques and steps to respond to a user's query. Agents might even dynamically access external data sources to retrieve information necessary for responding to a user. It's getting a bit funky, isn't it? @@ -85,3 +85,24 @@ It's getting a bit funky, isn't it? * Documentation * [JavaDocs](https://docs.spring.io/spring-ai/docs/current-SNAPSHOT/) +## Building + +To build with only unit tests + +```shell +./mvnw clean package +``` + +To build including integration tests. +You will need to set environment variables for your OpenAI API keys. + +```shell +./mvnw clean package -Pintegration-tests +``` + +To build the docs +```shell +./mvnw -pl spring-ai-docs antora +``` + +The docs are then available at `spring-ai-docs/target/antora/site/index.html` \ No newline at end of file diff --git a/spring-ai-docs/concepts-staging.adoc b/spring-ai-docs/concepts-staging.adoc new file mode 100644 index 0000000000..b0ed2e7087 --- /dev/null +++ b/spring-ai-docs/concepts-staging.adoc @@ -0,0 +1,126 @@ + +== Prompts + +Prompts serve as the foundation for language-based inputs that guide an AI model to produce specific outputs. +While this might seem intuitive considering our interactions with ChatGPT, crafting effective prompts involves both an art and a science.
+The wording of the language utilized significantly impacts the AI model's responses, and there are specific patterns and words recognized by the model that guide responses in the intended direction. + +The importance of this skill has led to the emergence of "Prompt Engineering." +When an effective prompt for a particular use case is identified, it is often shared within the community. + +== Prompt Templates + +Creating effective prompts involves establishing the context of the request and substituting parts of the request with values specific to the user's input. + +This process utilizes traditional text-based Template engines for prompt creation and management. +Spring AI employs the OSS library, StringTemplate, for this purpose. + +For instance, consider the simple prompt template: Tell me a {adjective} joke about {content}. + +In Spring AI, Prompt Templates can be likened to the 'View' in Spring MVC architecture. +A model object, typically a java.util.Map, is provided to populate placeholders within the template. +The 'rendered' string becomes the content of the Prompt supplied to the AI model. + +There is considerable variability in the specific data format of the Prompt sent to the model. +Initially starting as simple strings, prompts have evolved to include multiple messages, where each string in each message represents a distinct role for the model. + +== Tokens + +Tokens serve as the building blocks of how an AI model works. +On input, Models convert words to tokens, and on output, they convert tokens back to words. + +In English, one token roughly corresponds to 75% of a word. For reference, Shakespeare's complete works, totaling around 900,000 words, translates to approximately 1.2 million tokens. + +Perhaps more important is that Tokens = *`$`*. + +In the context of hosted AI models, your charges are determined by the number of tokens utilized. Both input and output contribute to the overall token count.
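To make the arithmetic above concrete, here is a small sketch that estimates token count and cost from a word count, using the 75%-of-a-word heuristic stated above. The per-token price is an illustrative placeholder, not a real OpenAI rate.

```java
// Rough token and cost estimate using the heuristic above:
// one token corresponds to roughly 75% of an English word.
public class TokenEstimate {

    private static final double WORDS_PER_TOKEN = 0.75;

    // Estimated tokens for a given number of English words.
    public static long estimateTokens(long words) {
        return Math.round(words / WORDS_PER_TOKEN);
    }

    // Estimated cost, given a placeholder price per 1,000 tokens.
    public static double estimateCost(long tokens, double pricePer1kTokens) {
        return tokens / 1000.0 * pricePer1kTokens;
    }

    public static void main(String[] args) {
        // Shakespeare's complete works: ~900,000 words.
        long tokens = estimateTokens(900_000);
        System.out.println(tokens + " tokens"); // ~1.2 million, matching the figure above
        // Hypothetical rate of $0.002 per 1K tokens, for illustration only.
        System.out.println("$" + estimateCost(tokens, 0.002));
    }
}
```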
+ +Also, models are subject to token limits, which restrict the amount of text processed in a single API call. +This threshold is often referred to as the 'context window'. The model won't process any text exceeding this limit. + +For instance, ChatGPT3 has a 4K token limit, while GPT4 offers varying options, such as 8K, 16K, and 32K. +Anthropic's Claude AI model features a 100K token limit, and Meta's recent research yielded a 1M token limit model. + +To summarize the collected works of Shakespeare with GPT4, you need to devise software engineering strategies to chop up the data and present the data within the model's context window limits. +This is an area that the Spring AI project helps you with. + +== Output Parsing + +The output of AI models traditionally arrives as a `java.util.String`, even if you ask for the reply to be in JSON. +It may be the correct JSON, but it isn't a JSON data structure. It is just a string. +Also, asking "for JSON" as part of the prompt isn't 100% accurate. + +This intricacy has led to the emergence of a specialized field involving the creation of prompts to yield the intended output, followed by parsing the resulting simple string into a usable data structure for application integration. + +Output parsing employs meticulously crafted prompts, often necessitating multiple interactions with the model to achieve the desired formatting. + +This challenge has prompted OpenAI to introduce 'OpenAI Functions' as a means to specify the desired output format from the model precisely. + +== Chaining Calls + +A Chain is the concept that represents a series of calls to an AI model. +It uses the output from one call as the input to another. + +By chaining calls together, you can support complex use-cases by composing pipelines of multiple chains. + +== Customizing Models: Integrating Your Data + +How can you equip the AI model with information it hasn't been trained on?
+ +It's important to note that the GPT 3.5/4.0 dataset extends only until September 2021. +Consequently, the model will say that it doesn't know the answer for questions that require knowledge beyond that date. +An interesting bit of trivia is that this dataset is around ~650GB. + +Two techniques exist for customizing the AI model to incorporate your data: + +Fine Tuning: This traditional Machine Learning technique involves tailoring the model and changing its internal weighting. +However, it's a challenging process even for Machine Learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option. + +Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model's token limits, techniques are required to present relevant data within the model's context window. +This approach is colloquially referred to as 'stuffing the prompt'. + +The Spring AI library helps you implement solutions based on the 'stuffing of the prompt' technique. + + +== Retrieval Augmented Generation + +A technique termed Retrieval Augmented Generation has emerged to address the challenge of incorporating relevant data into prompts for accurate AI model responses. + +The approach involves extracting data from your source and segmenting it into smaller units, each within the model's token limit. These pieces are then stored in a database. +When a user's request is received, the most pertinent document fragments are retrieved from the database to enrich the prompt, aiding the AI model's response accuracy. + +Data Loaders play a pivotal role in this process, reading and formatting your data into fragments suitable for database storage. +For optimal retrieval of related documents, a Vector Database is the type of database best suited for this task.
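The retrieve-then-stuff flow described above can be sketched as follows. A naive word-overlap score stands in for the embedding-based similarity a real Vector Database computes, and none of these class or method names belong to the Spring AI API.

```java
import java.util.*;
import java.util.stream.Collectors;

// Naive sketch of Retrieval Augmented Generation: score stored fragments
// against the question, keep the best ones, and stuff them into the prompt.
public class RagSketch {

    // Word-overlap score, a crude stand-in for embedding similarity.
    static long score(String fragment, String question) {
        Set<String> questionWords =
                new HashSet<>(Arrays.asList(question.toLowerCase().split("\\s+")));
        return Arrays.stream(fragment.toLowerCase().split("\\s+"))
                .filter(questionWords::contains)
                .count();
    }

    // Build a prompt from the topK most relevant fragments.
    public static String buildPrompt(List<String> fragments, String question, int topK) {
        String context = fragments.stream()
                .sorted(Comparator.comparingLong((String f) -> score(f, question)).reversed())
                .limit(topK)
                .collect(Collectors.joining("\n"));
        return "Using only the context below, answer the question.\n"
                + "Context:\n" + context + "\n"
                + "Question: " + question;
    }

    public static void main(String[] args) {
        List<String> fragments = List.of(
                "Spring AI supports prompt templates",
                "The cafeteria opens at nine",
                "Prompt templates use placeholders like {adjective}");
        System.out.println(buildPrompt(fragments, "how do prompt templates work", 2));
    }
}
```

In a real system the scoring and storage would be delegated to a Vector Database; only the "select the most relevant fragments that fit the context window" shape carries over.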
+ +Data Loaders and Vector Databases are the fundamental building blocks for solving use cases such as "Q&A over my documentation". + + + +=== Data Loaders + +TBD + +=== Vector Databases + +== Evaluating AI responses + +Effectively evaluating the output of an AI system in response to user requests is very important to ensuring the accuracy and usefulness of the final application. +Several emerging techniques enable the use of the pretrained model itself for this purpose. + +This evaluation process involves analyzing whether the generated response aligns with the user's intent and the context of the query. Metrics such as relevance, coherence, and factual correctness are used to gauge the quality of the AI-generated response. + +One approach involves presenting both the user's request and the AI model's response to the model, querying whether the response aligns with the provided data. + +Furthermore, leveraging the information stored in the Vector Database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance. + +The Spring AI project currently provides some very basic examples of how you can evaluate the responses in the form of prompts to include in a JUnit test.
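The kind of self-evaluation prompt described above might look like the following sketch. The class and method names are hypothetical, not Spring AI API; sending the prompt to a model is left out.

```java
// Hypothetical sketch of a self-evaluation prompt: the user's request and
// the model's response are presented back to the model, which is asked
// for a YES/NO relevance judgment.
public class EvaluationSketch {

    public static String evaluationPrompt(String userRequest, String aiResponse) {
        return "You are evaluating an AI response.\n"
                + "Request: " + userRequest + "\n"
                + "Response: " + aiResponse + "\n"
                + "Does the response answer the request accurately and relevantly? Answer YES or NO.";
    }

    public static void main(String[] args) {
        System.out.println(evaluationPrompt(
                "What is the capital of France?",
                "The capital of France is Paris."));
        // In a JUnit test, you would send this prompt to the model and
        // assert that the returned judgment is "YES".
    }
}
```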
+ + + + + + + + + + diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc index 338e20e272..a2965c0121 100644 --- a/spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/nav.adoc @@ -1,6 +1,6 @@ * xref:index.adoc[Overview] -* xref:domain/index.adoc[] -** xref:domain/prompt.adoc[] +* xref:concepts.adoc[AI Concepts] +* xref:getting-started.adoc[Getting Started] * xref:prompt/index.adoc[] * xref:client/index.adoc[] ** xref:client/usage.adoc[] diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc new file mode 100644 index 0000000000..0055e7b2f0 --- /dev/null +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/concepts.adoc @@ -0,0 +1,173 @@ += AI Concepts + +== Models + +AI models are algorithms designed to process and generate information, often mimicking human cognitive functions. +By learning patterns and insights from large datasets, these models can make predictions, generate text, images, or other outputs, enhancing various applications across industries. + +There are many different types of AI models, each suited for a specific use case. +While ChatGPT and its generative AI capabilities have captivated users through text input and output, many models and companies offer diverse inputs and outputs. +Before ChatGPT, many people were fascinated by text-to-image generation models such as Midjourney and Stable Diffusion. + +The following table categorizes several models based on their input and output types. 
+ + +[cols=3*, options=header] +|=== +|Input +|Output +|Examples + +|Language/Code/Images (Multi Modal) +|Language/Code +|GPT4 - OpenAI + +|Language/Code +|Language/Code +|GPT 3.5 - OpenAI-Azure OpenAI, Google Bard, Meta Llama + +|Language +|Image +|Dall-E - OpenAI + Azure, Deep AI + +|Language/Image +|Image +|Midjourney, Stable Diffusion, RunwayML + +|Text +|Numbers +|Many, a.k.a. Embeddings +|=== + +The initial focus of Spring AI is on models that process language input and provide language output, initially OpenAI + Azure OpenAI. +The last row in the previous table, which accepts text as input and outputs numbers, is more commonly known as text Embedding and represents the internal data structures used in an AI model. +Spring AI has support for Embeddings to support more advanced use-cases. + +What sets models like GPT apart is their pre-trained nature, as indicated by the "P" in GPT - Chat Generative Pre-Trained Transformer. +This pre-training feature transforms AI into a general developer tool that doesn't necessitate an extensive machine learning or model training background. + + +== Prompts + +Prompts serve as the foundation for language-based inputs that guide an AI model to produce specific outputs. +While this might seem intuitive considering our interactions with ChatGPT, crafting effective prompts involves both an art and a science. +The wording of the language utilized significantly impacts the AI model's responses, and there are specific patterns and words recognized by the model that guide responses in the intended direction. + +The importance of this skill has led to the emergence of "Prompt Engineering." +When an effective prompt for a particular use case is identified, it is often shared within the community.
+ +This process utilizes traditional text-based Template engines for prompt creation and management. +Spring AI employs the OSS library, StringTemplate, for this purpose. + +For instance, consider the simple prompt template: + +``` +Tell me a {adjective} joke about {content}. +``` + +In Spring AI, Prompt Templates can be likened to the 'View' in Spring MVC architecture. +A model object, typically a java.util.Map, is provided to populate placeholders within the template. +The 'rendered' string becomes the content of the Prompt supplied to the AI model. + +There is considerable variability in the specific data format of the Prompt sent to the model. +Initially starting as simple strings, prompts have evolved to include multiple messages, where each string in each message represents a distinct role for the model. + + +== Tokens + +Tokens serve as the building blocks of how an AI model works. +On input, Models convert words to tokens, and on output, they convert tokens back to words. + +In English, one token roughly corresponds to 75% of a word. For reference, Shakespeare's complete works, totaling around 900,000 words, translates to approximately 1.2 million tokens. + +Perhaps more important is that Tokens = *`$`*. + +In the context of hosted AI models, your charges are determined by the number of tokens utilized. Both input and output contribute to the overall token count. + +Also, models are subject to token limits, which restrict the amount of text processed in a single API call. +This threshold is often referred to as the 'context window'. The model won't process any text exceeding this limit. + +For instance, ChatGPT3 has a 4K token limit, while GPT4 offers varying options, such as 8K, 16K, and 32K. +Anthropic's Claude AI model features a 100K token limit, and Meta's recent research yielded a 1M token limit model. 
+ +To summarize the collected works of Shakespeare with GPT4, you need to devise software engineering strategies to chop up the data and present the data within the model's context window limits. +This is an area that the Spring AI project helps you with. + +== Output Parsing + +The output of AI models traditionally arrives as a `java.util.String`, even if you ask for the reply to be in JSON. +It may be the correct JSON, but it isn't a JSON data structure. It is just a string. +Also, asking "for JSON" as part of the prompt isn't 100% accurate. + +This intricacy has led to the emergence of a specialized field involving the creation of prompts to yield the intended output, followed by parsing the resulting simple string into a usable data structure for application integration. + +Output parsing employs meticulously crafted prompts, often necessitating multiple interactions with the model to achieve the desired formatting. + +This challenge has prompted OpenAI to introduce 'OpenAI Functions' as a means to specify the desired output format from the model precisely. + +== Chaining Calls + +A Chain is the concept that represents a series of calls to an AI model. +It uses the output from one call as the input to another. + +By chaining calls together, you can support complex use-cases by composing pipelines of multiple chains. + +== Customizing Models: Integrating Your Data + +How can you equip the AI model with information it hasn't been trained on? + +It's important to note that the GPT 3.5/4.0 dataset extends only until September 2021. +Consequently, the model will say that it doesn't know the answer for questions that require knowledge beyond that date. +An interesting bit of trivia is that this dataset is around ~650GB. + +Two techniques exist for customizing the AI model to incorporate your data: + +Fine Tuning: This traditional Machine Learning technique involves tailoring the model and changing its internal weighting.
+However, it's a challenging process even for Machine Learning experts and extremely resource-intensive for models like GPT due to their size. Additionally, some models might not offer this option. + +Prompt Stuffing: A more practical alternative involves embedding your data within the prompt provided to the model. Given a model's token limits, techniques are required to present relevant data within the model's context window. +This approach is colloquially referred to as 'stuffing the prompt'. + +The Spring AI library helps you implement solutions based on the 'stuffing of the prompt' technique. + + +== Retrieval Augmented Generation + +A technique termed Retrieval Augmented Generation has emerged to address the challenge of incorporating relevant data into prompts for accurate AI model responses. + +The approach involves extracting data from your source and segmenting it into smaller units, each within the model's token limit. These pieces are then stored in a database. +When a user's request is received, the most pertinent document fragments are retrieved from the database to enrich the prompt, aiding the AI model's response accuracy. + +Data Loaders play a pivotal role in this process, reading and formatting your data into fragments suitable for database storage. +For optimal retrieval of related documents, a Vector Database is the type of database best suited for this task. + +Data Loaders and Vector Databases are the fundamental building blocks for solving use cases such as "Q&A over my documentation". + + + +=== Data Loaders + +TBD + +=== Vector Databases + +== Evaluating AI responses + +Effectively evaluating the output of an AI system in response to user requests is very important to ensuring the accuracy and usefulness of the final application. +Several emerging techniques enable the use of the pretrained model itself for this purpose.
+This evaluation process involves analyzing whether the generated response aligns with the user's intent and the context of the query. Metrics such as relevance, coherence, and factual correctness are used to gauge the quality of the AI-generated response. + +One approach involves presenting both the user's request and the AI model's response to the model, querying whether the response aligns with the provided data. + +Furthermore, leveraging the information stored in the Vector Database as supplementary data can enhance the evaluation process, aiding in the determination of response relevance. + +The Spring AI project currently provides some very basic examples of how you can evaluate the responses in the form of prompts to include in a JUnit test. + + + diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/domain/index.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/domain/index.adoc deleted file mode 100644 index ad23dac83f..0000000000 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/domain/index.adoc +++ /dev/null @@ -1,3 +0,0 @@ -= The Domain Language of AI - -TBD diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/domain/prompt.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/domain/prompt.adoc deleted file mode 100644 index 0740ed88ee..0000000000 --- a/spring-ai-docs/src/main/antora/modules/ROOT/pages/domain/prompt.adoc +++ /dev/null @@ -1,7 +0,0 @@ -= Prompt - -TBD - -== AiClient - -TBD diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc new file mode 100644 index 0000000000..34a4eaa6ec --- /dev/null +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/getting-started.adoc @@ -0,0 +1,2 @@ += Getting Started + diff --git a/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc b/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc index f20a922c3a..f7693b5e57 100644 ---
a/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc +++ b/spring-ai-docs/src/main/antora/modules/ROOT/pages/index.adoc @@ -1,3 +1,23 @@ = Spring AI -Reference Documentation +== Introduction + +The Spring AI project aims to streamline the development of applications that incorporate artificial intelligence functionality without unnecessary complexity. + +The project draws inspiration from notable Python projects such as LangChain and LlamaIndex, but Spring AI is not a direct port of those projects. +The project was founded with the belief that the next wave of Generative AI applications will not be for Python developers only, but will be ubiquitous across many programming languages. + +At its core, Spring AI provides abstractions that serve as the foundation for developing AI applications. +These abstractions have multiple implementations, enabling easy component swapping with minimal code changes. +For example, Spring AI introduces the AiClient interface with implementations for OpenAI and Azure OpenAI. + +In addition to these core abstractions, Spring AI aims to provide higher-level functionalities to address common use cases such as "Q&A over your documentation" or "Chat with your documentation." +As the complexity of the use cases increases, the Spring AI project will integrate with other projects in the Spring Ecosystem such as Spring Integration, Spring Batch, and Spring Data. + +To simplify setup, Spring Boot Starters are available to help set up essential dependencies and classes. +There is also a collection of sample applications to help you explore the project's features. +Lastly, the new Spring CLI project also enables you to get started quickly using the command `spring boot new ai` for new projects or `spring boot add ai` for adding AI capabilities to your existing application. + +The next section provides a high-level overview of AI concepts and their representation in Spring AI.
+The Getting Started section shows you how to create your first AI application. +Subsequent sections delve into each component and common use cases with a code-focused approach.