# sitemap.yaml
README.md:
hash: 8b7d5c4796e0b327b7d85d4a9918c2f9
summary: Dria is a comprehensive synthetic data infrastructure designed to support
diverse AI projects by offering scalable data pipelines and a multi-agent network.
It enables the creation, management, and orchestration of synthetic data from
both web and siloed sources, empowering AI development across tasks such as classification,
instruction following, and dialogue generation. Key features include massive parallelization,
compute offloading, flexible custom pipelines, and an extensive toolset derived
from verified research. Dria supports the creation of high-quality, diverse datasets
that mirror real-life distribution, without requiring personal GPU infrastructure.
cookbook/eval.md:
hash: 5fe5d63be37a0ee5eecda5f50d9eba3c
summary: This guide provides a comprehensive approach to evaluating Retrieval-Augmented
Generation (RAG) systems using synthetic data, focusing on improving AI-powered
question-answering applications. It outlines the setup, implementation, and evaluation
of RAG pipelines, emphasizing parameters such as embedding model choice, retrieval
methods, and answer generation strategies. By utilizing tools like RAGatouille
for retrieval and the instructor library for OpenAI API interactions, the guide
demonstrates generating diverse synthetic datasets for effective evaluation. Key
keywords include RAG systems, synthetic data, question answering, AI evaluation,
and data science. The complete implementation script is also linked for practical
reference.
cookbook/function_calling.md:
hash: c584cd8c66d1ee607dd84e706b350676
summary: The content explores function calling in programming, focusing on techniques,
best practices, and real-world applications. It aims to enhance coding efficiency
by providing insights into various programming techniques and software development
practices. Key points include effective API usage, optimizing function calls for
better performance, and adhering to best practices to improve software maintainability
and scalability. Essential keywords are function calling, programming techniques,
coding best practices, software development, and API usage.
cookbook/nemotron_qa.md:
hash: 92b57a83e52e37be73a20647ec416a31
summary: 'This guide outlines the implementation of Nvidia''s Preference Data Pipeline
using Dria, focusing on synthetic data generation and reward modeling techniques.
The process involves synthetic response generation for domain-specific queries
using Meta''s Llama 3.1 405B Instruct, followed by a scoring phase with Nemotron-4
340B Reward for alignment training via NeMo Aligner. The tutorial details step-by-step
instructions, including the use of a defined folder structure and implementation
of specific prompts and callback methods. The key steps include generating subtopics,
questions, and responses. The pipeline is constructed using Dria''s PipelineBuilder
and executed through an example Python script. Keywords: Nvidia, Dria, Preference
Data Pipeline, synthetic data, reward modeling, Llama 3.1, NeMo Aligner, GPT4O,
machine learning.'
cookbook/patient_dialogues.md:
hash: b900cb12d723828949dd6f7bf2512372
summary: 'The content focuses on enhancing understanding in healthcare through effective
patient dialogues and interactions. Key areas include patient care, healthcare
communication, and patient engagement. It provides examples of medical interactions
aimed at improving the quality of communication between healthcare providers and
patients. The objective is to showcase best practices in dialogue that contribute
to improved patient experience and outcomes. Keywords: patient care, healthcare
communication, patient engagement, medical interaction, dialogue examples.'
cookbook/preference_data.md:
hash: c0b34f5245dc5b2545e6b9f409082734
summary: Explore the innovative techniques of synthetic preference data generation
using Dria to enhance AI models and decision-making processes. This approach focuses
on generating high-quality synthetic data related to preference modeling, offering
valuable insights for AI applications. Key elements include synthetic data, preference
modeling, Dria, and data generation, making it a critical resource for optimizing
AI performance and analysis.
example_run.md:
hash: d22a5939591e3baa07b93227453f442b
summary: The provided script illustrates how to execute parallel model operations
using Python with Dria, MagPie, and asyncio for enhanced performance in machine
learning tasks. It sets up a `ParallelSingletonExecutor` to run multiple models
such as GPT4O_MINI, GEMINI_15_FLASH, and MIXTRAL_8_7B concurrently, processing
different prompts simultaneously. Key terms include Python, Dria, MagPie, asyncio,
machine learning, parallel model execution, and the specific models used. The
objective is to efficiently handle concurrent model execution with minimal blocking,
leveraging asynchronous programming techniques.
factory/clair.md:
hash: 0ad232cd84373a386069ad2009ca063d
summary: Clair is an AI-powered tool designed for automated evaluation of student
solutions, providing corrections and detailed feedback for improvement. It acts
as a solution checker and feedback provider for educational tasks, using AI to
assess student responses and offer corrected versions with comprehensive explanations.
Key features include AI Feedback, Automated Evaluation, Solution Checking, and
Education Technology. Clair can be integrated into datasets for generating automated
feedback, enhancing student learning experiences. The tool uses advanced AI models,
such as "gemma2:9b-instruct-fp16," to deliver precise and insightful evaluations.
factory/code_generation.md:
hash: 9ce059372df70ff187ef64349d1399fc
summary: The document describes a software engineering solution for code generation
using Singleton classes, specifically `GenerateCode` and `IterateCode`. These
classes generate and iterate over code based on instructions in specified programming
languages using AI models optimized for coding tasks, such as `Model.CODER` and
`Model.QWEN2_5_CODER_1_5B`. Key features include specifying instructions and the
programming language to generate code, which is returned along with the chosen
AI model. This tool is beneficial for tasks involving code generation, software
engineering, AI-driven programming, and understanding the Singleton design pattern.
factory/complexity_scorer.md:
hash: b83910aa425dc40abda6d8096306a678
summary: ScoreComplexity is a tool designed to evaluate and assign complexity scores
to a list of instructions, helping streamline data generation tasks in fields
like AI and machine learning. It analyzes instructions to provide a numerical
complexity score and identifies the AI model used for scoring, such as "llama3.1:8b-instruct-fp16."
This process can help in instruction analysis, complexity scoring, and data generation,
contributing to better data selection and alignment. Noteworthy elements include
its application in creating datasets for complexity scoring and its contribution
to improving AI models. Core keywords include complexity scoring, instruction
analysis, data generation, AI models, and machine learning.
factory/csv_extender.md:
hash: 34233ecd4c84fb699023e953edb89ff5
summary: 'The `CSVExtenderPipeline` class is designed to enhance CSV data processing
by automatically generating new rows from existing entries, facilitating improved
data analysis and organization. This pipeline works by adding new subcategories
to current categories, with the number of additional rows determined by user-specified
parameters: `num_values` and `num_rows`. The result is an extended dataset, increasing
the CSV''s utility in data-driven environments. Key focus areas include data extension,
CSV automation, and enhanced data processing, with applications in fields requiring
structured data expansion.'
factory/evaluate.md:
hash: 864b36f9a12a32ba2ca25a6b76dedd17
summary: EvaluatePrediction is a specialized AI tool designed to assess the quality
and accuracy of predicted answers in relation to specific questions and context.
This tool provides detailed feedback rather than a simple true or false result,
enhancing the evaluation process. Key features include insights into the prediction's
validity and the AI model used for assessment. Essential keywords include applied
AI, evaluation, predictions, feedback, and data generation. Suitable for incorporation
into data generation processes, the tool aims to improve the reliability of AI
predictions by offering comprehensive evaluation metrics.
factory/evolve_complexity.md:
hash: 52df1707b67c781bf548cbf9b2b278eb
summary: EvolveComplexity is a tool designed to transform simple instructions into
more complex versions while retaining their core meaning and intent, making it
ideal for instruction variations and data generation tasks. This process is particularly
beneficial for generating more intricate and enriched data variations in workflows
involving AI language models. The tool outputs the evolved instruction alongside
the original input and the AI model utilized for the transformation. Key concepts
include instruction evolution, complexity transformation, and AI-driven content
generation. Notable references include EvolComplexity Distilabel and WizardLM,
which explore similar themes in the realm of empowering large language models.
factory/graph_builder.md:
hash: cdc925edd3af4c7f07c9cda25e6e3c96
summary: 'GenerateGraph is a tool designed for extracting ontological relationships
from text, transforming concepts and their interconnections into a structured
graph. Key features include graph generation, ontology extraction, and the ability
to reveal AI relationships through a machine learning-driven approach. It processes
contextual input to deliver outputs that map out connections between concepts,
identified as node pairs with descriptive edges. Popular applications involve
constructing detailed graphs of related terms, aiding in data structure comprehension
and enhancing machine learning datasets. The model utilized, such as "qwen2.5:32b-instruct-fp16,"
underpins the generation process. Keywords: graph generation, ontology extraction,
machine learning, AI relationships, data structure.'
factory/instruction_backtranslation.md:
hash: 4bf4829f8a5d6f22bb3c7caa0ec3bcec
summary: 'InstructionBacktranslation is a specialized AI evaluation tool used for
analyzing instruction-generation pairs. It provides detailed scoring and reasoning
to ensure accurate AI responses. Key features include its ability to echo original
instructions and generated text while scoring responses for accuracy and relevance.
Important concepts involve instruction backtranslation, AI response scoring, instruction-generation
evaluation, and applied AI. This method enhances data generation by improving
the accuracy and quality of AI-generated text. Keywords: InstructionBacktranslation,
AI evaluation, data generation, instruction-management, machine learning.'
factory/instruction_evolution.md:
hash: 2ec3bfe35639eab5ecc3a71f0b6d7d0b
summary: EvolveInstruct is a versatile tool designed to evolve AI prompts through
diverse mutation strategies, including FRESH_START, ADD_CONSTRAINTS, DEEPEN, CONCRETIZE,
INCREASE_REASONING, and SWITCH_TOPIC. Its primary objective is to transform original
prompts into new versions while maintaining core intent, thereby enhancing complexity
and specificity. Ideal for data generation in machine learning, EvolveInstruct
allows users to add constraints, increase reasoning requirements, or change topics
with comparable difficulty levels. It supports various fields, such as AI prompts
and mutation strategies, and uses models like "gemma2:9b-instruct-fp16" to generate
transformed prompts.
factory/iterate_code.md:
hash: e5672d2a5b07add21e6da0fb6c7dc3b1
summary: IterateCode Singleton is a tool designed for code optimization and enhancement
using AI-generated instructions across various programming languages. By inputting
code, specific instructions, and the desired programming language, users can obtain
an improved version of their code. The tool provides outputs including the original
instruction, programming language, original code, and iterated code, along with
the AI model used. Key concepts include applied AI, code optimization, and software
development, with a particular focus on practical improvements in languages like
Python.
factory/list_extender.md:
hash: e70524882e8d281a8e1c5af466237316
summary: ListExtender is a powerful data generation tool designed for AI model applications,
operating as a singleton class that extends lists by generating unique, related
items. It inputs an initial list and outputs an expanded, unique list by employing
AI-driven techniques, specified by the included AI model. Its primary function
is to enhance and diversify datasets for various applications, including software
development. Key features include its emphasis on maintaining the uniqueness of
items and its use in generating comprehensive extended lists suitable for applications
like dataset enrichment and AI training. Important keywords for SEO include List
Generation, Data Generation, AI Tools, and Software Development.
factory/magpie.md:
hash: 3fe5f28769368c37b197924bfa42d24c
summary: MagPie is a versatile template for generating dialogues between AI personas,
focusing on efficient conversation turn management between an instructor and a
responder. It's designed to handle configurable conversation turns and is useful
for creating dialogue datasets for natural language processing tasks. Key features
include customizable personas, dialogue generation, and a focus on bias mitigation
and fairness in AI. The outputs consist of detailed dialogue exchanges and the
model used, such as "gemma2:9b-instruct-fp16". MagPie is relevant for those interested
in AI dialogue generation, conversation management, and responsible AI development.
factory/multihopqa.md:
hash: 4b68ce1970aadf6e6def41b6f082adc4
summary: 'The "MultiHopQuestion" template is designed to generate multi-hop questions
using exactly three document chunks, producing questions of increasing complexity:
1-hop, 2-hop, and 3-hop. This approach is beneficial for data generation in the
fields of question answering and document processing. By leveraging AI models,
it provides answers based on the interconnected information from the documents.
Key features include a detailed schema for input and output, ensuring that questions
require different levels of reasoning. This is particularly useful for enhancing
AI models in multi-hop question answering tasks. Keywords associated with this
template include multi-hop questions, data generation, AI models, and document
processing.'
factory/persona.md:
hash: 8cb910a39f1dab99d602a31d86e7074f
summary: The document provides an overview of "Persona," a pipeline composed of
four singletons designed to generate character bios and backstories based on persona
traits and simulation context. The primary components, PersonaBio and PersonaBackstory,
produce short bios and extended narratives, respectively. These tools are ideal
for use in simulations or creative scenarios requiring detailed character development.
By utilizing specified AI models, such as "meta-llama/llama-3.1-8b-instruct,"
the pipeline offers structured data generation for varied settings like medieval
villages or futuristic cities. Key terms include character generation, AI narratives,
creative writing, persona traits, and simulation development.
factory/qa.md:
hash: 612b016ffa04cb8a87a443e173544a1c
summary: The "QuestionAnswer" pipeline leverages AI models to generate contextually
relevant answers by adopting specific personas. Designed for applications in natural
language processing, this workflow processes inputs such as context, persona,
and the number of questions to create structured, reliable responses. Utilizing
JSON Schemas ensures model-generated outputs adhere to required formats, thus
enhancing the accuracy and reliability of answers. Core components include question
answering, AI models, data generation, and persona adoption. This approach is
particularly useful for developing datasets and ensuring structured, context-driven
AI interactions.
factory/quality_evolution.md:
hash: 023124d647010d0481eccb5ecf3172e7
summary: EvolveQuality is a template designed to enhance text responses by focusing
on specific quality dimensions such as helpfulness, relevance, depth, creativity,
and detail level. It improves the original response text through various evolution
methods, making it a valuable tool for generating high-quality data. This process
involves using AI models, like "gemma2:9b-instruct-fp16," to rewrite or refine
responses, ensuring they are more informative and tailored to the user's needs.
Key concepts include response quality evolution, AI enhancement, and text generation.
The project is detailed in resources like the [EvolInstruct Distilabel] and related
scientific studies on data selection for instruction tuning.
factory/search.md:
hash: ad5f5d75064bdcf33b3f20aa3c33d0a9
summary: The documentation for the SearchWeb singleton implementation outlines its
functionality as a web search template designed to generate structured and localized
search results in multiple languages. It operates using the singleton pattern
and allows users to perform searches with specified queries and language preferences
while controlling the number of results returned. Each search result includes
details such as the original query, link, snippet, and title. This implementation
is particularly useful for data generation tasks, providing a consistent and scalable
method to access web data. Keywords associated with this implementation include
web search, singleton pattern, structured data, data generation, and Python.
factory/self_instruct.md:
hash: 03e50792c805009e2732a5b0bca7e62d
summary: SelfInstruct is an AI-driven template designed to automate the generation
of user queries for AI applications, focusing on specific criteria and context.
This process assists in creating instructions or queries useful for testing or
training AI systems, enhancing efficiency in data generation. Key features include
the ability to specify the number of queries, criteria for generation, application
description, and applicable context. The output provides a list of generated instructions
and the AI model employed. Keywords include AI, query generation, data generation,
instructions, and self-instruct.
factory/semantic_triplet.md:
hash: 68c2795b52b59d063e3c224aebf8020d
summary: The "SemanticTriplet" is a Python tool designed to generate semantic triplets,
which are JSON objects containing three textual units with specified semantic
similarity scores. This task is useful for natural language processing (NLP) and
educational content development. The input parameters include the type of textual
unit (e.g., sentence or paragraph), language, desired similarity scores, and the
educational difficulty level. The output is a JSON object featuring three related
but distinct textual pieces, useful in evaluating and comparing semantic similarities
for AI models. Core keywords include NLP, semantic triplet, JSON generation, Python,
and data processing.
factory/simple.md:
hash: 9071c17442f43b92e104ad0b7a7dfd5e
summary: The content describes "Simple," a singleton template designed for basic
text generation tasks. It utilizes specified AI models to generate text based
on an input prompt, offering a streamlined workflow for automated writing and
dataset generation. The template's key features include ease of use and efficient
text generation, making it suitable for workflows requiring simple templates,
dataset generation, and AI model integration. This tool can be particularly beneficial
for tasks involving text generation, simple template implementation, and automated
writing.
factory/subtopic.md:
hash: 168e9d24337d7ef08b4f81a2535477c3
summary: GenerateSubtopics is an AI-driven template designed to enhance content
creation by breaking down a main topic into relevant subtopics. It uses AI generation
to create structured workflows that facilitate data generation and content organization.
Key features include specifying the main topic as input and providing a list of
generated subtopics along with the AI model used. This tool is particularly useful
for topics like "rust language," where it generates informative subtopics such
as "Ownership and Borrowing Concepts" and "Memory Safety Without Garbage Collection."
Core keywords include AI generation, subtopics, content creation, data generation,
and workflow.
factory/validate.md:
hash: b559a2c786213ffc84bff9e6a9cd6858
summary: 'ValidatePrediction is a singleton class designed for assessing the accuracy
of AI predictions through contextual and semantic comparison. It evaluates whether
a predicted answer aligns with the correct answer, providing a boolean validation
result alongside the original prediction and correct answer. Key features include
prediction validation, semantic comparison, and contextual evaluation, employing
AI models for precise validation. This class is particularly useful in data generation
and AI model evaluation, ensuring accuracy and reliability in AI predictions.
Keywords: prediction validation, semantic comparison, AI model, contextual evaluation,
data generation.'
factory/web_multi_choice.md:
hash: 14746cba2bb4e9ea2a446ddc84363f3a
summary: WebMultiChoice is a specialized task designed to effectively answer multiple-choice
questions by leveraging web search and AI evaluation. It works by generating a
search query from the question and possible answers, then selecting a relevant
URL, scraping and analyzing the content to identify the best answer. This process
is particularly useful for educational evaluations and involves models like the
`QWEN2_5_7B_FP16`. Key features include utilizing AI models, web search capabilities,
and a focus on accuracy for educational purposes.
how-to/batches.md:
hash: 97e4629a57399db53b64e4ff692ecac4
summary: 'The provided content explains how to utilize Batches with the `ParallelSingletonExecutor`
in Dria for running multiple instructions concurrently, enhancing task efficiency.
It involves setting up a `Dria` client, a `Singleton` task, and a `ParallelSingletonExecutor`
object to manage parallel execution of tasks using models like `Model.QWEN2_5_7B_OR`
and `Model.LLAMA3_2_1B`. The guide includes a Python code example demonstrating
the setup, emphasizing key concepts of async programming, parallel execution,
and task management in Dria. This approach is especially beneficial when dealing
with a large number of instructions that need to be processed simultaneously.
Key terms: Batches, Parallel Execution, Dria, Async Programming, Task Management.'
how-to/data_enrichment.md:
hash: a73df5f52feeaecd1f23540736249cb7
summary: The content explains the data enrichment capabilities of the `enrich` method
using Dria, which enhances datasets by adding new fields for richer data representation
in analytics and machine learning. Key steps include defining a Pydantic schema
for output fields, creating a prompt to guide data transformation, and running
the enrichment process. A basic text summarization example is presented, where
a dataset gains a "summary" field through a defined schema and prompt. It includes
a comprehensive demonstration of enriching customer reviews with additional analyses
like sentiment and keywords. Use cases highlighted include enriching customer
feedback, enhancing text corpora with metadata, and improving workflows by categorizing
and filtering large volumes of unstructured data. Core keywords featured are data
enrichment, Dria, schema, prompt, text summarizing, sentiment analysis, and key
insights.
how-to/data_generators.md:
hash: 1a7f4b3a1012254217b5ef21d5dce05c
summary: The Dataset Generator in Dria is a robust tool designed for efficient dataset
generation and transformation, offering both prompt-based and singleton-based
workflows. Key features include parallel execution, automatic schema validation,
support for multiple AI models, search capabilities, and sequential workflow processing.
Users can employ prompt-based generation via the `Prompt` class and instructions,
or utilize custom workflow classes called singletons. Model configurations allow
for flexible use of single or multiple AI models, catering to various data generation
needs. Important keywords include Dataset Generation, Data Transformation, Prompt-based
Generation, Singletons, and AI Models.
how-to/dria_datasets.md:
hash: 17f267b3fe2bf7ecfa40f21b5163cee2
summary: The Dria Dataset, facilitated by the `DriaDataset` class, is an advanced
framework for data generation and management. Core features include data persistence,
flexible initialization, and robust data management with schema validation. Users
can create datasets from scratch or initialize them from existing formats like
JSON, CSV, or even import data from Hugging Face datasets. This ensures compatibility
and consistency in handling complex datasets while offering multiple import/export
options. The framework is tailored for applications in data generation, schema
validation, and comprehensive data management, making it a versatile tool for
developers and data scientists.
how-to/dria_datasets_exports.md:
hash: e372f345a42a5f72fda51db21b8b557e
summary: The document provides a detailed guide on exporting and formatting data
from the DriaDataset using the Formatter class and integrating it with HuggingFace's
TRL framework. It explains how to export data into various formats such as pandas
DataFrame, JSON, and JSONL, and emphasizes preparing data for different training
setups. The guide covers the supported format types like Standard and Conversational,
along with subtypes such as LANGUAGE_MODELING and PROMPT_COMPLETION. Additionally,
it demonstrates converting data into HuggingFace TRL-compatible formats for seamless
use with various trainers, using examples like the CONVERSATIONAL_PROMPT_COMPLETION
format. Keywords include DriaDataset, data export, data formatting, HuggingFace
TRL, and machine learning.
how-to/formatting.md:
hash: 7d27581e5b87c20076f3ae969e92a50d
summary: The content provides a comprehensive guide on the `Formatter` class, a
tool designed for converting datasets into training-ready formats compatible with
Dria Network and HuggingFace's TRL framework. The `Formatter` supports various
format types, including Standard and Conversational, with subtypes such as LANGUAGE_MODELING
and PROMPT_COMPLETION. Key components include `FieldMapping` and `ConversationMapping`,
which map original data keys to formatted ones. The guide also details how Dria
Network can transform generated data for seamless integration with TRL's trainers
like BCOTrainer, PPOTrainer, and SFTTrainer, highlighting its plug-n-play capabilities.
Key terms include Formatter, Data Formatting, Dria Network, HuggingFace TRL, Machine
Learning, and Dataset Transformation.
how-to/functions.md:
hash: e5a9a97043f059a0cfc83481f1fa67ab
summary: Dria Nodes offers extensive workflow automation capabilities through built-in
and custom functions, enabling seamless integration and execution of tasks. It
supports custom workflow tools such as `CustomTool`, which allows for defining
unique operations, and `HttpRequestTool`, which facilitates HTTP requests. Key
features include the ability to perform generative steps using these tools, and
executing tasks like financial queries for stock prices or cryptocurrency data
from APIs. Users can expand functionality by adding custom functions to workflows
using the `WorkflowBuilder`. This empowers efficient automation for processes
involving HTTP requests, operations on integers, or real-time data retrieval,
crucial for optimizing workflow automation and enhancing functionality. Keywords
include Dria Nodes, workflow automation, custom functions, HTTP requests, and
generative steps.
how-to/models.md:
hash: 1551d580678398d738e3f2eb1fb3862e
summary: Explore the diverse range of AI models available in the Dria Network, featuring
popular models like Nous's Hermes, Microsoft's Phi3, and Meta's Llama series.
Key offerings include quantized versions, varying parameter counts, and specialized
instructions suited for specific tasks, with models from leading tech giants like
Google, Alibaba, and OpenAI. AI models offered include Llama versions by Meta,
Qwen models by Alibaba, and OpenAI's GPT-4 among others, described with specifics
like parameter size, quantization details, and task specializations. This catalog
serves as a comprehensive guide to the available AI models, helping users select
the right model for their needs based on attributes such as parameter scale, context
length, and precision settings.
how-to/pipelines.md:
hash: 1ce47a066ab5e4e15b74c01a640630c6
summary: The guide explains how to create and implement efficient pipelines for
executing workflows in parallel using Dria, focusing on asynchronous processing.
It emphasizes building pipelines through a sequence of interconnected workflows,
each comprising multiple steps that process outputs into subsequent inputs. A
detailed example is provided, illustrating a pipeline for generating question-answer
pairs. Core implementation involves using classes like `Pipeline`, `PipelineBuilder`,
and `StepTemplate` to define and execute steps with callbacks like `scatter` for
parallel task execution. The content highlights the importance of customizing
workflows and effectively using Dria's asynchronous capabilities for enhanced
data processing, with key terms including pipelines, workflows, Dria, and asynchronous
processing.
how-to/prompters.md:
hash: fac964842dd69cb73c69c38045f80267
summary: The guide explains how to use the `Prompt` class with Pydantic models in
Python to create structured prompts with schema validation. It covers the essential
steps including defining an output schema using Pydantic, creating a dataset with
`DriaDataset`, initializing a `Prompt` with template text, and generating data
through `DatasetGenerator`. The tutorial showcases a complete example of generating
tweets on various topics, leveraging Python's asyncio and models like Model.LLAMA_3_1_8B_OR.
Key concepts include structured prompt generation, schema validation, and integration
with language models using tools like Pydantic and Python.
how-to/selecting_models.md:
hash: c91d6b866f8ee61e810e762517eef3a4
summary: The article provides an overview of model selection and task assignment
in the Dria Network, a platform that utilizes a network of LLMs (Large Language
Models) for AI-driven task execution. It highlights how tasks can be specifically
assigned to models using the `Model` enum in the Dria SDK, and provides examples
of task distribution among available nodes, such as using the `LLAMA3_1_8B_FP16`
model. Features include handling task execution asynchronously across multiple
nodes, enabling comparison of results by assigning a single task to multiple models,
and the ability to select specific model providers like OpenAI and Gemini. Key
concepts include the Dria Network, LLM Models, task assignment, and model selection.
how-to/singletons.md:
hash: 8c7d2274ee071b20688e4f9e5dd187d9
summary: 'This guide on using and creating Singletons in Dria outlines how these
pre-built task templates streamline task handling with Pydantic validation,
offering efficiency in workflows. The text explains how to utilize ready-made
Singletons from Dria’s Factory, like the `Simple` singleton, which executes
tasks using specified prompts. It further delves into crafting custom Singletons
for tailored needs, detailing their core components: input fields, output schema,
and workflow methods. By leveraging structured outputs and workflows, users
can efficiently manage data generation and task validation. Keywords: Singletons,
Dria, Pydantic, Task Management, Workflow, Data Generation.'
how-to/structured_outputs.md:
hash: a1022bc9a23283121f07adc104a9759b
summary: The content discusses implementing structured outputs in AI workflows using
the Dria SDK for reliable JSON Schema compliance. It highlights how structured
outputs ensure models adhere to specified JSON schemas, preventing issues like
missing required keys or incorrect enum values. The feature is supported by providers
such as OpenAI, Gemini, and Ollama, but is limited to models capable of function
calling. The process involves attaching an OutputSchema to a workflow using the
WorkflowBuilder instance, as demonstrated in a Python code example. Key terms
include structured outputs, Dria SDK, AI workflows, JSON Schema, and function
calling.
how-to/tasks.md:
hash: 07bad9c830d85a863b68a0f8441b1887
summary: 'The Dria network utilizes tasks as fundamental units of work, enabling
efficient distributed computing. Tasks, which consist of workflows and models,
are executed asynchronously by nodes within the network. Key features include
model selection, asynchronous execution, scalability, and result retrieval. The
task lifecycle involves creation, publication, execution, result retrieval, and
completion, making it crucial for scalable operations in environments that leverage
Dria''s distributed computing capabilities. Keywords: Dria network, tasks, distributed
computing, asynchronous execution, workflows, scalability.'
how-to/workflows.md:
hash: ea5a2fdfb08fe5fcd1c1335f6767d18f
summary: 'The article titled "Custom Workflows within Dria Network" provides a comprehensive
guide on creating custom workflows using the Dria SDK, particularly through the
`dria_workflows` package. It focuses on constructing workflows that involve Large
Language Models (LLMs) and memory operations for efficient task execution. The
guide outlines key components such as configuration settings, steps, flow, and
memory operations, with examples on how to set parameters like `max_steps`, `max_time`,
and `max_tokens`. It explains the types of steps, like `generative_step` for text
generation, and how memory operations facilitate data transfer. Additionally,
it includes practical examples, such as creating a workflow for random variable
generation and validation, highlighting the integration of LLMs and conditional
logic for decision-making. Keywords: workflows, Dria SDK, LLM integration, custom
workflows, memory operations, task execution, Dria nodes.'
installation.md:
hash: 904fef2b131a370d84bbf4082c310c5d
summary: Learn how to install the Dria SDK for Python, designed for Python 3.10
or higher, by following a straightforward setup process that includes creating
a new conda environment and using pip to install the package. Address potential
installation issues by separately installing the coincurve library and resolving
GCC-related problems with tools like brew and xcode-select. The Dria Network,
currently in alpha, is freely accessible for data generation, and users can contribute
by running a node, enhancing network scalability and throughput. Access examples
and guides to start using the SDK and build synthetic data pipelines efficiently.
modules/structrag.md:
hash: 25a7ed3b2c516366cb1c1349894765e6
summary: 'StructRAG is a retrieval-augmented generation (RAG) framework designed
to enhance large language models (LLMs) for complex, knowledge-intensive reasoning
tasks. It addresses issues of scattered and noisy information by restructuring
documents using cognitive-inspired techniques. The framework includes three main
components: StructRAGSynthesize, which organizes initial documents into structured
knowledge units; StructRAGSimulate, which creates simulations based on the structured
data; and StructRAGJudge, which evaluates the relevance and correctness of the
solutions. StructRAG has shown state-of-the-art results in various tasks, leveraging
models like Qwen2.5-7B-Instruct and others available on platforms such as Hugging
Face. Key concepts include document structuring, knowledge reasoning, and language
models, with a focus on improving accuracy and reasoning capabilities.'
modules/structrag2.md:
hash: 0e7e709a338a279efacbbe557f86188d
summary: 'StructRAG is a methodology aimed at enhancing the reasoning capabilities
of Large Language Models (LLMs) through hybrid information structuring during
assessments. It leverages knowledge restructuring via a Hybrid Router to determine
the format of structured information, optimizing machine learning inference. The
provided Python code illustrates its usage, utilizing StructRAG components to
score the complexity of various tasks. Key concepts include LLMs, hybrid information
structuring, knowledge restructuring, and machine learning inference. For further
details, refer to the research paper, "StructRAG: Boosting Knowledge Intensive
Reasoning of LLMs via Inference-time Hybrid Information Structurization."'
node.md:
hash: 827fab98192f10e974b90a98781c9596
summary: The guide provides a quick start for setting up a node on the Dria decentralized
AI network, developed by FirstBatch. This setup requires no wallet activity and
takes only a few minutes. Users can find node requirements on GitHub and follow
simple steps, including downloading the launcher, running it, and entering an
Ethereum wallet private key. Additional options for serving models and API tool
integration are available. macOS users might need to bypass security warnings.
Post-setup involves filling out a form for a Discord role and engaging with Dria's
online community. Key terms include decentralized network, AI collaboration, node
setup, Dria, and FirstBatch.
quickstart.md:
hash: 0f8dcb5fc8556469addb10472d55c30e
summary: This guide provides a quick start for using the Dria SDK to generate datasets
with tweets by leveraging large language models (LLMs). Key steps include creating
a dataset, attaching a dataset generator, defining instructions and prompts, and
executing the process to store results locally. The core components include using
Python, Dria SDK, and models like GPT-4. Important keywords are Dria SDK, data
generation, LLMs, Python, and tweet datasets. The guide emphasizes Dria SDK's
simplicity and efficiency in setting up and generating data, although it notes
current limitations in network capacity and data generation volumes.