# sitemap.yaml
README.md:
hash: 8b7d5c4796e0b327b7d85d4a9918c2f9
summary: Dria is a comprehensive synthetic data infrastructure designed to support
diverse AI projects by offering scalable data pipelines and a multi-agent network.
It enables the creation, management, and orchestration of synthetic data from
both web and siloed sources, empowering AI development across tasks such as classification,
instruction following, and dialogue generation. Key features include massive parallelization,
compute offloading, flexible custom pipelines, and an extensive toolset derived
from verified research. Dria supports the creation of high-quality, diverse datasets
that mirror real-life distribution, without requiring personal GPU infrastructure.
cookbook/eval.md:
hash: 5fe5d63be37a0ee5eecda5f50d9eba3c
summary: This guide provides a comprehensive approach to evaluating Retrieval-Augmented
Generation (RAG) systems using synthetic data, focusing on improving AI-powered
question-answering applications. It outlines the setup, implementation, and evaluation
of RAG pipelines, emphasizing parameters such as embedding model choice, retrieval
methods, and answer generation strategies. By utilizing tools like RAGatouille
for retrieval and the instructor library for OpenAI API interactions, the guide
demonstrates generating diverse synthetic datasets for effective evaluation. Key
keywords include RAG systems, synthetic data, question answering, AI evaluation,
and data science. The complete implementation script is also linked for practical
reference.
cookbook/function_calling.md:
hash: c584cd8c66d1ee607dd84e706b350676
summary: The content explores function calling in programming, focusing on techniques,
best practices, and real-world applications. It aims to enhance coding efficiency
by providing insights into various programming techniques and software development
practices. Key points include effective API usage, optimizing function calls for
better performance, and adhering to best practices to improve software maintainability
and scalability. Essential keywords are function calling, programming techniques,
coding best practices, software development, and API usage.
cookbook/nemotron_qa.md:
hash: 92b57a83e52e37be73a20647ec416a31
summary: 'This guide outlines the implementation of Nvidia''s Preference Data Pipeline
using Dria, focusing on synthetic data generation and reward modeling techniques.
The process involves synthetic response generation for domain-specific queries
using Meta''s Llama 3.1 405B Instruct, followed by a scoring phase with Nemotron-4
340B Reward for alignment training via NeMo Aligner. The tutorial details step-by-step
instructions, including the use of a defined folder structure and implementation
of specific prompts and callback methods. The key steps include generating subtopics,
questions, and responses. The pipeline is constructed using Dria''s PipelineBuilder
and executed through an example Python script. Keywords: Nvidia, Dria, Preference
Data Pipeline, synthetic data, reward modeling, Llama 3.1, NeMo Aligner, GPT4O,
machine learning.'
cookbook/patient_dialogues.md:
hash: b900cb12d723828949dd6f7bf2512372
summary: 'The content focuses on enhancing understanding in healthcare through effective
patient dialogues and interactions. Key areas include patient care, healthcare
communication, and patient engagement. It provides examples of medical interactions
aimed at improving the quality of communication between healthcare providers and
patients. The objective is to showcase best practices in dialogue that contribute
to improved patient experience and outcomes. Keywords: patient care, healthcare
communication, patient engagement, medical interaction, dialogue examples.'
cookbook/preference_data.md:
hash: c0b34f5245dc5b2545e6b9f409082734
summary: Explore the innovative techniques of synthetic preference data generation
using Dria to enhance AI models and decision-making processes. This approach focuses
on generating high-quality synthetic data related to preference modeling, offering
valuable insights for AI applications. Key elements include synthetic data, preference
modeling, Dria, and data generation, making it a critical resource for optimizing
AI performance and analysis.
example_run.md:
hash: d22a5939591e3baa07b93227453f442b
summary: The provided script illustrates how to execute parallel model operations
using Python with Dria, MagPie, and asyncio for enhanced performance in machine
learning tasks. It sets up a `ParallelSingletonExecutor` to run multiple models
such as GPT4O_MINI, GEMINI_15_FLASH, and MIXTRAL_8_7B concurrently, processing
different prompts simultaneously. Key terms include Python, Dria, MagPie, asyncio,
machine learning, parallel model execution, and the specific models used. The
objective is to efficiently handle concurrent model execution with minimal blocking,
leveraging asynchronous programming techniques.
factory/clair.md:
hash: 0ad232cd84373a386069ad2009ca063d
summary: Clair is an AI-powered tool designed for automated evaluation of student
solutions, providing corrections and detailed feedback for improvement. It acts
as a solution checker and feedback provider for educational tasks, using AI to
assess student responses and offer corrected versions with comprehensive explanations.
Key features include AI Feedback, Automated Evaluation, Solution Checking, and
Education Technology. Clair can be integrated into datasets for generating automated
feedback, enhancing student learning experiences. The tool uses advanced AI models,
such as "gemma2:9b-instruct-fp16," to deliver precise and insightful evaluations.
factory/code_generation.md:
hash: 9ce059372df70ff187ef64349d1399fc
summary: The document describes a software engineering solution for code generation
using Singleton classes, specifically `GenerateCode` and `IterateCode`. These
classes generate and iterate over code based on instructions in specified programming
languages using AI models optimized for coding tasks, such as `Model.CODER` and
`Model.QWEN2_5_CODER_1_5B`. Key features include specifying instructions and the
programming language to generate code, which is returned along with the chosen
AI model. This tool is beneficial for tasks involving code generation, software
engineering, AI-driven programming, and understanding the Singleton design pattern.
factory/complexity_scorer.md:
hash: b83910aa425dc40abda6d8096306a678
summary: ScoreComplexity is a tool designed to evaluate and assign complexity scores
to a list of instructions, helping streamline data generation tasks in fields
like AI and machine learning. It analyzes instructions to provide a numerical
complexity score and identifies the AI model used for scoring, such as "llama3.1:8b-instruct-fp16."
This process can help in instruction analysis, complexity scoring, and data generation,
contributing to better data selection and alignment. Noteworthy elements include
its application in creating datasets for complexity scoring and its contribution
to improving AI models. Core keywords include complexity scoring, instruction
analysis, data generation, AI models, and machine learning.
factory/csv_extender.md:
hash: 34233ecd4c84fb699023e953edb89ff5
summary: 'The `CSVExtenderPipeline` class is designed to enhance CSV data processing
by automatically generating new rows from existing entries, facilitating improved
data analysis and organization. This pipeline works by adding new subcategories
to current categories, with the number of additional rows determined by user-specified
parameters: `num_values` and `num_rows`. The result is an extended dataset, increasing
the CSV''s utility in data-driven environments. Key focus areas include data extension,
CSV automation, and enhanced data processing, with applications in fields requiring
structured data expansion.'
factory/evaluate.md:
hash: 864b36f9a12a32ba2ca25a6b76dedd17
summary: EvaluatePrediction is a specialized AI tool designed to assess the quality
and accuracy of predicted answers in relation to specific questions and context.
This tool provides detailed feedback rather than a simple true or false result,
enhancing the evaluation process. Key features include insights into the prediction's
validity and the AI model used for assessment. Essential keywords include applied
AI, evaluation, predictions, feedback, and data generation. Suitable for incorporation
into data generation processes, the tool aims to improve the reliability of AI
predictions by offering comprehensive evaluation metrics.
factory/evolve_complexity.md:
hash: 52df1707b67c781bf548cbf9b2b278eb
summary: EvolveComplexity is a tool designed to transform simple instructions into
more complex versions while retaining their core meaning and intent, making it
ideal for instruction variations and data generation tasks. This process is particularly
beneficial for generating more intricate and enriched data variations in workflows
involving AI language models. The tool outputs the evolved instruction alongside
the original input and the AI model utilized for the transformation. Key concepts
include instruction evolution, complexity transformation, and AI-driven content
generation. Notable references include EvolComplexity Distilabel and WizardLM,
which explore similar themes in the realm of empowering large language models.
factory/graph_builder.md:
hash: cdc925edd3af4c7f07c9cda25e6e3c96
summary: 'GenerateGraph is a tool designed for extracting ontological relationships
from text, transforming concepts and their interconnections into a structured
graph. Key features include graph generation, ontology extraction, and the ability
to reveal AI relationships through a machine learning-driven approach. It processes
contextual input to deliver outputs that map out connections between concepts,
identified as node pairs with descriptive edges. Popular applications involve
constructing detailed graphs of related terms, aiding in data structure comprehension
and enhancing machine learning datasets. The model utilized, such as "qwen2.5:32b-instruct-fp16,"
underpins the generation process. Keywords: graph generation, ontology extraction,
machine learning, AI relationships, data structure.'
factory/instruction_backtranslation.md:
hash: 4bf4829f8a5d6f22bb3c7caa0ec3bcec
summary: 'InstructionBacktranslation is a specialized AI evaluation tool used for
analyzing instruction-generation pairs. It provides detailed scoring and reasoning
to ensure accurate AI responses. Key features include its ability to echo original
instructions and generated text while scoring responses for accuracy and relevance.
Important concepts involve instruction backtranslation, AI response scoring, instruction-generation
evaluation, and applied AI. This method enhances data generation by improving
the accuracy and quality of AI-generated text. Keywords: InstructionBacktranslation,
AI evaluation, data generation, instruction-management, machine learning.'
factory/instruction_evolution.md:
hash: 2ec3bfe35639eab5ecc3a71f0b6d7d0b
summary: EvolveInstruct is a versatile tool designed to evolve AI prompts through
diverse mutation strategies, including FRESH_START, ADD_CONSTRAINTS, DEEPEN, CONCRETIZE,
INCREASE_REASONING, and SWITCH_TOPIC. Its primary objective is to transform original
prompts into new versions while maintaining core intent, thereby enhancing complexity
and specificity. Ideal for data generation in machine learning, EvolveInstruct
allows users to add constraints, increase reasoning requirements, or change topics
with comparable difficulty levels. It supports various fields, such as AI prompts
and mutation strategies, and uses models like "gemma2:9b-instruct-fp16" to generate
transformed prompts.
factory/iterate_code.md:
hash: e5672d2a5b07add21e6da0fb6c7dc3b1
summary: IterateCode Singleton is a tool designed for code optimization and enhancement
using AI-generated instructions across various programming languages. By inputting
code, specific instructions, and the desired programming language, users can obtain
an improved version of their code. The tool provides outputs including the original
instruction, programming language, original code, and iterated code, along with
the AI model used. Key concepts include applied AI, code optimization, and software
development, with a particular focus on practical improvements in languages like
Python.
factory/list_extender.md:
hash: e70524882e8d281a8e1c5af466237316
summary: ListExtender is a powerful data generation tool designed for AI model applications,
operating as a singleton class that extends lists by generating unique, related
items. It inputs an initial list and outputs an expanded, unique list by employing
AI-driven techniques, specified by the included AI model. Its primary function
is to enhance and diversify datasets for various applications, including software
development. Key features include its emphasis on maintaining the uniqueness of
items and its use in generating comprehensive extended lists suitable for applications
like dataset enrichment and AI training. Important keywords for SEO include List
Generation, Data Generation, AI Tools, and Software Development.
factory/magpie.md:
hash: 3fe5f28769368c37b197924bfa42d24c
summary: MagPie is a versatile template for generating dialogues between AI personas,
focusing on efficient conversation turn management between an instructor and a
responder. It's designed to handle configurable conversation turns and is useful
for creating dialogue datasets for natural language processing tasks. Key features
include customizable personas, dialogue generation, and a focus on bias mitigation
and fairness in AI. The outputs consist of detailed dialogue exchanges and the
model used, such as "gemma2:9b-instruct-fp16". MagPie is relevant for those interested
in AI dialogue generation, conversation management, and responsible AI development.
factory/multihopqa.md:
hash: 4b68ce1970aadf6e6def41b6f082adc4
summary: 'The "MultiHopQuestion" template is designed to generate multi-hop questions
using exactly three document chunks, producing questions of increasing complexity:
1-hop, 2-hop, and 3-hop. This approach is beneficial for data generation in the
fields of question answering and document processing. By leveraging AI models,
it provides answers based on the interconnected information from the documents.
Key features include a detailed schema for input and output, ensuring that questions
require different levels of reasoning. This is particularly useful for enhancing
AI models in multi-hop question answering tasks. Keywords associated with this
template include multi-hop questions, data generation, AI models, and document
processing.'
factory/persona.md:
hash: 8cb910a39f1dab99d602a31d86e7074f
summary: The document provides an overview of "Persona," a pipeline composed of
four singletons designed to generate character bios and backstories based on persona
traits and simulation context. The primary components, PersonaBio and PersonaBackstory,
produce short bios and extended narratives, respectively. These tools are ideal
for use in simulations or creative scenarios requiring detailed character development.
By utilizing specified AI models, such as "meta-llama/llama-3.1-8b-instruct,"
the pipeline offers structured data generation for varied settings like medieval
villages or futuristic cities. Key terms include character generation, AI narratives,
creative writing, persona traits, and simulation development.
factory/qa.md:
hash: 612b016ffa04cb8a87a443e173544a1c
summary: The "QuestionAnswer" pipeline leverages AI models to generate contextually
relevant answers by adopting specific personas. Designed for applications in natural
language processing, this workflow processes inputs such as context, persona,
and the number of questions to create structured, reliable responses. Utilizing
JSON Schemas ensures model-generated outputs adhere to required formats, thus
enhancing the accuracy and reliability of answers. Core components include question
answering, AI models, data generation, and persona adoption. This approach is
particularly useful for developing datasets and ensuring structured, context-driven
AI interactions.
factory/quality_evolution.md:
hash: 023124d647010d0481eccb5ecf3172e7
summary: EvolveQuality is a template designed to enhance text responses by focusing
on specific quality dimensions such as helpfulness, relevance, depth, creativity,
and detail level. It improves the original response text through various evolution
methods, making it a valuable tool for generating high-quality data. This process
involves using AI models, like "gemma2:9b-instruct-fp16," to rewrite or refine
responses, ensuring they are more informative and tailored to the user's needs.
Key concepts include response quality evolution, AI enhancement, and text generation.
The project is detailed in resources like the [EvolInstruct Distilabel] and related
scientific studies on data selection for instruction tuning.
factory/search.md:
hash: ad5f5d75064bdcf33b3f20aa3c33d0a9
summary: The documentation for the SearchWeb singleton implementation outlines its
functionality as a web search template designed to generate structured and localized
search results in multiple languages. It operates using the singleton pattern
and allows users to perform searches with specified queries and language preferences
while controlling the number of results returned. Each search result includes
details such as the original query, link, snippet, and title. This implementation
is particularly useful for data generation tasks, providing a consistent and scalable
method to access web data. Keywords associated with this implementation include
web search, singleton pattern, structured data, data generation, and Python.
factory/self_instruct.md:
hash: 03e50792c805009e2732a5b0bca7e62d
summary: SelfInstruct is an AI-driven template designed to automate the generation
of user queries for AI applications, focusing on specific criteria and context.
This process assists in creating instructions or queries useful for testing or
training AI systems, enhancing efficiency in data generation. Key features include
the ability to specify the number of queries, criteria for generation, application
description, and applicable context. The output provides a list of generated instructions
and the AI model employed. Keywords include AI, query generation, data generation,
instructions, and self-instruct.
factory/semantic_triplet.md:
hash: 68c2795b52b59d063e3c224aebf8020d
summary: The "SemanticTriplet" is a Python tool designed to generate semantic triplets,
which are JSON objects containing three textual units with specified semantic
similarity scores. This task is useful for natural language processing (NLP) and
educational content development. The input parameters include the type of textual
unit (e.g., sentence or paragraph), language, desired similarity scores, and the
educational difficulty level. The output is a JSON object featuring three related
but distinct textual pieces, useful in evaluating and comparing semantic similarities
for AI models. Core keywords include NLP, semantic triplet, JSON generation, Python,
and data processing.
factory/simple.md:
hash: 9071c17442f43b92e104ad0b7a7dfd5e
summary: The content describes "Simple," a singleton template designed for basic
text generation tasks. It utilizes specified AI models to generate text based
on an input prompt, offering a streamlined workflow for automated writing and
dataset generation. The template's key features include ease of use and efficient
text generation, making it suitable for workflows requiring simple templates,
dataset generation, and AI model integration. This tool can be particularly beneficial
for tasks involving text generation, simple template implementation, and automated
writing.
factory/subtopic.md:
hash: 168e9d24337d7ef08b4f81a2535477c3
summary: GenerateSubtopics is an AI-driven template designed to enhance content
creation by breaking down a main topic into relevant subtopics. It uses AI generation
to create structured workflows that facilitate data generation and content organization.
Key features include specifying the main topic as input and providing a list of
generated subtopics along with the AI model used. This tool is particularly useful
for topics like "rust language," where it generates informative subtopics such
as "Ownership and Borrowing Concepts" and "Memory Safety Without Garbage Collection."
Core keywords include AI generation, subtopics, content creation, data generation,
and workflow.
factory/validate.md:
hash: b559a2c786213ffc84bff9e6a9cd6858
summary: 'ValidatePrediction is a singleton class designed for assessing the accuracy
of AI predictions through contextual and semantic comparison. It evaluates whether
a predicted answer aligns with the correct answer, providing a boolean validation
result alongside the original prediction and correct answer. Key features include
prediction validation, semantic comparison, and contextual evaluation, employing
AI models for precise validation. This class is particularly useful in data generation
and AI model evaluation, ensuring accuracy and reliability in AI predictions.
Keywords: prediction validation, semantic comparison, AI model, contextual evaluation,
data generation.'
factory/web_multi_choice.md:
hash: 14746cba2bb4e9ea2a446ddc84363f3a
summary: WebMultiChoice is a specialized task designed to effectively answer multiple-choice
questions by leveraging web search and AI evaluation. It works by generating a
search query from the question and possible answers, then selecting a relevant
URL, scraping and analyzing the content to identify the best answer. This process
is particularly useful for educational evaluations and involves models like the
`QWEN2_5_7B_FP16`. Key features include utilizing AI models, web search capabilities,
and a focus on accuracy for educational purposes.
how-to/batches.md:
hash: 97e4629a57399db53b64e4ff692ecac4
summary: 'The provided content explains how to utilize Batches with the `ParallelSingletonExecutor`
in Dria for running multiple instructions concurrently, enhancing task efficiency.
It involves setting up a `Dria` client, a `Singleton` task, and a `ParallelSingletonExecutor`
object to manage parallel execution of tasks using models like `Model.QWEN2_5_7B_OR`
and `Model.LLAMA3_2_1B`. The guide includes a Python code example demonstrating
the setup, emphasizing key concepts of async programming, parallel execution,
and task management in Dria. This approach is especially beneficial when dealing
with a large number of instructions that need to be processed simultaneously.
Key terms: Batches, Parallel Execution, Dria, Async Programming, Task Management.'
how-to/data_enrichment.md:
hash: a73df5f52feeaecd1f23540736249cb7
summary: The content explains the data enrichment capabilities of the `enrich` method
using Dria, which enhances datasets by adding new fields for richer data representation
in analytics and machine learning. Key steps include defining a Pydantic schema
for output fields, creating a prompt to guide data transformation, and running
the enrichment process. A basic text summarization example is presented, where
a dataset gains a "summary" field through a defined schema and prompt. It includes
a comprehensive demonstration of enriching customer reviews with additional analyses
like sentiment and keywords. Use cases highlighted include enriching customer
feedback, enhancing text corpora with metadata, and improving workflows by categorizing
and filtering large volumes of unstructured data. Core keywords featured are data
enrichment, Dria, schema, prompt, text summarizing, sentiment analysis, and key
insights.
how-to/data_generators.md:
hash: 1a7f4b3a1012254217b5ef21d5dce05c
summary: The Dataset Generator in Dria is a robust tool designed for efficient dataset
generation and transformation, offering both prompt-based and singleton-based
workflows. Key features include parallel execution, automatic schema validation,
support for multiple AI models, search capabilities, and sequential workflow processing.
Users can employ prompt-based generation via the `Prompt` class and instructions,
or utilize custom workflow classes called singletons. Model configurations allow
for flexible use of single or multiple AI models, catering to various data generation
needs. Important keywords include Dataset Generation, Data Transformation, Prompt-based
Generation, Singletons, and AI Models.
how-to/dria_datasets.md:
hash: 17f267b3fe2bf7ecfa40f21b5163cee2
summary: The Dria Dataset, facilitated by the `DriaDataset` class, is an advanced
framework for data generation and management. Core features include data persistence,
flexible initialization, and robust data management with schema validation. Users
can create datasets from scratch or initialize them from existing formats like
JSON, CSV, or even import data from Hugging Face datasets. This ensures compatibility
and consistency in handling complex datasets while offering multiple import/export
options. The framework is tailored for applications in data generation, schema
validation, and comprehensive data management, making it a versatile tool for
developers and data scientists.
how-to/dria_datasets_exports.md:
hash: e372f345a42a5f72fda51db21b8b557e
summary: The document provides a detailed guide on exporting and formatting data
from the DriaDataset using the Formatter class and integrating it with HuggingFace's
TRL framework. It explains how to export data into various formats such as pandas
DataFrame, JSON, and JSONL, and emphasizes preparing data for different training
setups. The guide covers the supported format types like Standard and Conversational,
along with subtypes such as LANGUAGE_MODELING and PROMPT_COMPLETION. Additionally,
it demonstrates converting data into HuggingFace TRL-compatible formats for seamless
use with various trainers, using examples like the CONVERSATIONAL_PROMPT_COMPLETION
format. Keywords include DriaDataset, data export, data formatting, HuggingFace
TRL, and machine learning.
how-to/formatting.md:
hash: 7d27581e5b87c20076f3ae969e92a50d
summary: The content provides a comprehensive guide on the `Formatter` class, a
tool designed for converting datasets into training-ready formats compatible with
Dria Network and HuggingFace's TRL framework. The `Formatter` supports various
format types, including Standard and Conversational, with subtypes such as LANGUAGE_MODELING
and PROMPT_COMPLETION. Key components include `FieldMapping` and `ConversationMapping`,
which map original data keys to formatted ones. The guide also details how Dria
Network can transform generated data for seamless integration with TRL's trainers
like BCOTrainer, PPOTrainer, and SFTTrainer, highlighting its plug-n-play capabilities.
Key terms include Formatter, Data Formatting, Dria Network, HuggingFace TRL, Machine
Learning, and Dataset Transformation.
how-to/functions.md:
hash: e5a9a97043f059a0cfc83481f1fa67ab
summary: Dria Nodes offers extensive workflow automation capabilities through built-in
and custom functions, enabling seamless integration and execution of tasks. It
supports custom workflow tools such as `CustomTool`, which allows for defining
unique operations, and `HttpRequestTool`, which facilitates HTTP requests. Key
features include the ability to perform generative steps using these tools, and
executing tasks like financial queries for stock prices or cryptocurrency data
from APIs. Users can expand functionality by adding custom functions to workflows
using the `WorkflowBuilder`. This empowers efficient automation for processes
involving HTTP requests, operations on integers, or real-time data retrieval,
crucial for optimizing workflow automation and enhancing functionality. Keywords
include Dria Nodes, workflow automation, custom functions, HTTP requests, and
generative steps.
how-to/models.md:
hash: 1551d580678398d738e3f2eb1fb3862e
summary: Explore the diverse range of AI models available in the Dria Network, featuring
popular models like Nous's Hermes, Microsoft's Phi3, and Meta's Llama series.
Key offerings include quantized versions, varying parameter counts, and specialized
instructions suited for specific tasks, with models from leading tech giants like
Google, Alibaba, and OpenAI. AI models offered include Llama versions by Meta,
Qwen models by Alibaba, and OpenAI's GPT-4 among others, described with specifics
like parameter size, quantization details, and task specializations. This catalog
serves as a comprehensive guide to the available AI models, helping users select
the right model for their needs based on attributes such as parameter scale, context
length, and precision settings.
how-to/pipelines.md:
hash: 1ce47a066ab5e4e15b74c01a640630c6
summary: The guide explains how to create and implement efficient pipelines for
executing workflows in parallel using Dria, focusing on asynchronous processing.
It emphasizes building pipelines through a sequence of interconnected workflows,
each comprising multiple steps that process outputs into subsequent inputs. A
detailed example is provided, illustrating a pipeline for generating question-answer
pairs. Core implementation involves using classes like `Pipeline`, `PipelineBuilder`,
and `StepTemplate` to define and execute steps with callbacks like `scatter` for
parallel task execution. The content highlights the importance of customizing
workflows and effectively using Dria's asynchronous capabilities for enhanced
data processing, with key terms including pipelines, workflows, Dria, and asynchronous
processing.
how-to/prompters.md:
hash: fac964842dd69cb73c69c38045f80267
summary: The guide explains how to use the `Prompt` class with Pydantic models in
Python to create structured prompts with schema validation. It covers the essential
steps including defining an output schema using Pydantic, creating a dataset with
`DriaDataset`, initializing a `Prompt` with template text, and generating data
through `DatasetGenerator`. The tutorial showcases a complete example of generating
tweets on various topics, leveraging Python's asyncio and models like Model.LLAMA_3_1_8B_OR.
Key concepts include structured prompt generation, schema validation, and integration
with language models using tools like Pydantic and Python.
how-to/selecting_models.md:
hash: c91d6b866f8ee61e810e762517eef3a4
summary: The article provides an overview of model selection and task assignment
in the Dria Network, a platform that utilizes a network of LLMs (Large Language
Models) for AI-driven task execution. It highlights how tasks can be specifically
assigned to models using the `Model` enum in the Dria SDK, and provides examples
of task distribution among available nodes, such as using the `LLAMA3_1_8B_FP16`
model. Features include handling task execution asynchronously across multiple
nodes, enabling comparison of results by assigning a single task to multiple models,
and the ability to select specific model providers like OpenAI and Gemini. Key
concepts include the Dria Network, LLM Models, task assignment, and model selection.
how-to/singletons.md:
hash: 8c7d2274ee071b20688e4f9e5dd187d9
summary: 'This guide on using and creating Singletons in Dria outlines how these
pre-built task templates streamline task handling with Pydantic validation,
offering efficiency in workflows. The text explains how to utilize ready-made
Singletons from Dria’s Factory, like the `Simple` singleton, which executes
tasks using specified prompts. It further delves into crafting custom Singletons
for tailored needs, detailing their core components: input fields, output schema,
and workflow methods. By leveraging structured outputs and workflows, users
can efficiently manage data generation and task validation. Keywords: Singletons,
Dria, Pydantic, Task Management, Workflow, Data Generation.'
how-to/structured_outputs.md:
hash: a1022bc9a23283121f07adc104a9759b
summary: The content discusses implementing structured outputs in AI workflows using
the Dria SDK for reliable JSON Schema compliance. It highlights how structured
outputs ensure models adhere to specified JSON schemas, preventing issues like
missing required keys or incorrect enum values. The feature is supported by providers
such as OpenAI, Gemini, and Ollama, but is limited to models capable of function
calling. The process involves attaching an OutputSchema to a workflow using the
WorkflowBuilder instance, as demonstrated in a Python code example. Key terms
include structured outputs, Dria SDK, AI workflows, JSON Schema, and function
calling.
how-to/tasks.md:
hash: 07bad9c830d85a863b68a0f8441b1887
summary: 'The Dria network utilizes tasks as fundamental units of work, enabling
efficient distributed computing. Tasks, which consist of workflows and models,
are executed asynchronously by nodes within the network. Key features include
model selection, asynchronous execution, scalability, and result retrieval. The
task lifecycle involves creation, publication, execution, result retrieval, and
completion, making it crucial for scalable operations in environments that leverage
Dria''s distributed computing capabilities. Keywords: Dria network, tasks, distributed
computing, asynchronous execution, workflows, scalability.'
how-to/workflows.md:
hash: ea5a2fdfb08fe5fcd1c1335f6767d18f
summary: 'The article titled "Custom Workflows within Dria Network" provides a comprehensive
guide on creating custom workflows using the Dria SDK, particularly through the
`dria_workflows` package. It focuses on constructing workflows that involve Large
Language Models (LLMs) and memory operations for efficient task execution. The
guide outlines key components such as configuration settings, steps, flow, and
memory operations, with examples on how to set parameters like `max_steps`, `max_time`,
and `max_tokens`. It explains the types of steps, like `generative_step` for text
generation, and how memory operations facilitate data transfer. Additionally,
it includes practical examples, such as creating a workflow for random variable
generation and validation, highlighting the integration of LLMs and conditional
logic for decision-making. Keywords: workflows, Dria SDK, LLM integration, custom
workflows, memory operations, task execution, Dria nodes.'
installation.md:
hash: 904fef2b131a370d84bbf4082c310c5d
summary: Learn how to install the Dria SDK for Python, designed for Python 3.10
or higher, by following a straightforward setup process that includes creating
a new conda environment and using pip to install the package. Address potential
installation issues by separately installing the coincurve library and resolving
GCC-related problems with tools like brew and xcode-select. The Dria Network,
currently in alpha, is freely accessible for data generation, and users can contribute
by running a node, enhancing network scalability and throughput. Access examples
and guides to start using the SDK and build synthetic data pipelines efficiently.
modules/structrag.md:
hash: 25a7ed3b2c516366cb1c1349894765e6
summary: 'StructRAG is a retrieval-augmented generation (RAG) framework designed
to enhance large language models (LLMs) for complex, knowledge-intensive reasoning
tasks. It addresses issues of scattered and noisy information by restructuring
documents using cognitive-inspired techniques. The framework includes three main
components: StructRAGSynthesize, which organizes initial documents into structured
knowledge units; StructRAGSimulate, which creates simulations based on the structured
data; and StructRAGJudge, which evaluates the relevance and correctness of the
solutions. StructRAG has shown state-of-the-art results in various tasks, leveraging
models like Qwen2.5-7B-Instruct and others available on platforms such as Hugging
Face. Key concepts include document structuring, knowledge reasoning, and language
models, with a focus on improving accuracy and reasoning capabilities.'
modules/structrag2.md:
hash: 0e7e709a338a279efacbbe557f86188d
summary: 'StructRAG is a methodology aimed at enhancing the reasoning capabilities
of Large Language Models (LLMs) through hybrid information structuring during
assessments. It leverages knowledge restructuring via a Hybrid Router to determine
the format of structured information, optimizing machine learning inference. The
provided Python code illustrates its usage, utilizing StructRAG components to
score the complexity of various tasks. Key concepts include LLMs, hybrid information
structuring, knowledge restructuring, and machine learning inference. For further
details, refer to the research paper, "StructRAG: Boosting Knowledge Intensive
Reasoning of LLMs via Inference-time Hybrid Information Structurization."'
node.md:
hash: 827fab98192f10e974b90a98781c9596
summary: The guide provides a quick start for setting up a node on the Dria decentralized
AI network, developed by FirstBatch. This setup requires no wallet activity and
takes only a few minutes. Users can find node requirements on GitHub and follow
simple steps, including downloading the launcher, running it, and entering an
Ethereum wallet private key. Additional options for serving models and API tool
integration are available. macOS users might need to bypass security warnings.
Post-setup involves filling out a form for a Discord role and engaging with Dria's
online community. Key terms include decentralized network, AI collaboration, node
setup, Dria, and FirstBatch.
quickstart.md:
hash: 0f8dcb5fc8556469addb10472d55c30e
summary: This guide provides a quick start for using the Dria SDK to generate datasets
with tweets by leveraging large language models (LLMs). Key steps include creating
a dataset, attaching a dataset generator, defining instructions and prompts, and
executing the process to store results locally. The core components include using
Python, Dria SDK, and models like GPT-4. Important keywords are Dria SDK, data
generation, LLMs, Python, and tweet datasets. The guide emphasizes Dria SDK's
simplicity and efficiency in setting up and generating data, although it notes
current limitations in network capacity and data generation volumes.