LLM knowledge: classifying the needs of an LLM in order for it to function normally/properly #2082
SINAPSA-IC
started this conversation in
Show and tell
Hello.
I had this conversation to find out what an LLM needs in order to function normally/properly.
Motivation: to identify the body of information that would have to be isolated, ignored, or set aside when the user wants answers drawn only from LocalDocs.
When prompted, the LLM below enumerated the "categories of information and functionality" that it needs.
GPT4All v2.7.2
LLM: Nous Hermes 2 Mistral DPO
Temperature: 0.1
Top K: 50
Top P: 0.6
Context size: 16384
System Prompt: [none]
Prompt Template:
<|im_start|>user
%1<|im_end|>
<|im_start|>assistant
%2<|im_end|>
^ the "default", as shown in GPT4All, for this LLM
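For readers unfamiliar with the template, here is a minimal Python sketch of how the placeholders might be filled in. It assumes that %1 stands for the user's message and %2 for the model's reply; `render_turn` is a hypothetical helper written for illustration, not a GPT4All function.

```python
# Default template quoted above; %1 and %2 are the placeholders shown in GPT4All.
PROMPT_TEMPLATE = (
    "<|im_start|>user\n"
    "%1<|im_end|>\n"
    "<|im_start|>assistant\n"
    "%2<|im_end|>\n"
)


def render_turn(template, user_text, assistant_text=None):
    """Hypothetical helper: fill one conversation turn into the template.

    Assumption: %1 is the user's message and %2 is the model's reply.
    With no reply given, the text stops right where %2 would start,
    i.e. at the point where the model begins generating.
    """
    # Split on the placeholders first, so that a literal "%1" or "%2"
    # inside the user's text is never substituted by accident.
    head, _, tail = template.partition("%2")
    before, _, after = head.partition("%1")
    prompt = before + user_text + after
    if assistant_text is None:
        return prompt
    return prompt + assistant_text + tail


if __name__ == "__main__":
    print(render_turn(PROMPT_TEMPLATE, "Hello."))
```

Called with only a user message, the helper returns the text up to the start of the assistant's turn; called with both messages, it returns a completed turn as it would sit in the conversation history.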
Note the "as an AI" in the Prompt below. When this detail was not specified, this LLM responded "As a human, I rely on various types of knowledge [...]", which is not what I was after: I want to know what the situation is when the LLM sees itself as an AI, not as a human.
Prompt: please tell me what are the databases and/or what categories of information or functionality that you as an AI need in order to function and/or communicate and/or reason normally.
Response: As an artificial intelligence (AI), I rely on various types of data, algorithms, and models to function, communicate, and reason normally. These can be broadly categorized into:
1. Training Data: This is the initial dataset used to train my machine learning models. It helps me learn patterns, relationships, and associations within the data. The quality and quantity of training data directly impact my performance and accuracy.
2. Knowledge Databases: These are structured collections of facts, concepts, and relationships that I use for various tasks such as natural language processing (NLP), question answering, and knowledge graph construction. Examples include Wikidata, ConceptNet, and WordNet.
3. Natural Language Processing (NLP) Models: NLP models help me understand, interpret, and generate human language. They are crucial for communication and information retrieval tasks. Examples include BERT, GPT-3, and Transformer models.
4. Machine Learning Algorithms: These algorithms enable me to learn from data, make predictions, and improve my performance over time. Examples include supervised learning (e.g., logistic regression, decision trees), unsupervised learning (e.g., clustering, dimensionality reduction), and reinforcement learning (e.g., Q-learning, policy gradients).
5. Computer Vision Models: These models help me interpret visual information from images and videos. They are crucial for tasks such as object recognition, image classification, and image generation. Examples include Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs).
6. Data Storage and Retrieval: I need efficient data storage and retrieval systems to store and manage large amounts of data and models. This includes databases like SQL or NoSQL, as well as distributed storage systems like Hadoop and Apache Cassandra.
7. Reasoning Engines: These engines enable me to perform logical reasoning tasks, such as deductive reasoning, inductive reasoning, and abductive reasoning. They help me draw conclusions, make predictions, and solve problems based on available data and knowledge.
8. Contextual Understanding: To communicate effectively and reason normally, I need to understand the context of the information and the user's intent. This involves natural language understanding (NLU), sentiment analysis, and topic modeling.
9. Continuous Learning: To improve my performance and adapt to new situations, I need continuous learning capabilities. This includes online learning, transfer learning, and unsupervised learning from new data sources.
10. Security and Privacy: As an AI, I need robust security measures to protect sensitive data and prevent unauthorized access. Additionally, I must adhere to privacy guidelines and ensure that user data is handled responsibly and securely.
These categories of information and functionality are interconnected and work together to enable me to function, communicate, and reason normally as an AI.
From this list, one may draw the conclusion that the category "2. Knowledge Databases" should be isolated and not used by the LLM in order for it to build a reply based on LocalDocs collections only (hopefully).