Adding reasoning to your AI? These resources may help you on your way.
AGI / CAUSALITY / FORMAL GRAMMAR | ||
---|---|---|
DeepMind Chomsky Hierarchy | problems crafted for FSM/PDA/TM | [1] |
automata | a neurallambda tool to generate data from grammars | [1] |
I am a Strange Dataset | tough for LLMs because of self-reference | [1] |
DiagGSM8k | natural-language reasoning benchmark | [1] |
CLadder | causal reasoning | [1] |
Cause-Effect Pairs | 108 datasets of two-variable dynamics (not natural language) | [1] |
MNLI Entailment | sentence parsing + entailment | [1] |
AGENT/TOOL | ||
---|---|---|
THUDM AgentInstruct | long-form dialogs | [1] |
WANG AgentInstruct | GPT-3-synthesized instructions | [1] |
KnowLM Tool | prompt + tool call + answer | [1] |
Glaive Tool Usage | system prompt listing available tools + prompt + answer | [1] |
opentoolformer retrieval | prompt + tool call | [1] |
CODE | ||
---|---|---|
rosetta | the same program in many different languages | [1] |
EvoEval Tool Use | 100 prompts + code + tests | [1] |
MATH/LOGIC | ||
---|---|---|
gsm8k | Grade School Math 8k | [1] |
MetaMath | one-shot math | [1] |
MetaMathFewShot | few-shot math | [1] |
MathPile | 9B tokens filtered from the internet | [1] |
LogiQA | natural-language multiple choice; requires abstraction | [1] |
Logic-LM | a model combining automated theorem provers and LLMs | [1] |
Coq Facts | 270k Coq theorem-prover programs | [1] |
NATURAL LANGUAGE | ||
---|---|---|
UltraInteract_sft | GPT-generated iterated reasoning dialogs | [1] |
MUD videogames | (various; could serve as training data) | |
Winogrande | ambiguous sentences, fill in 1 word | [1] |
Winograd_wsc | ambiguous sentences, choose the right word | [1] |
Contradiction | 2 phrases, do they contradict | [1] |
Recognizing Textual Entailment | 2 phrases, do they entail each other | [1] |
Textual Entailment Pool | more entailment | [1] |
Answer Validation | 2 phrases, does the answer solve question | [1] |
Monotonicity Entailment | x is true, does y follow | [1] |
entailment | passage, question -> T/F | [1] |
Commonsense QA | multiple-choice QA | [1] |
GLUE | several datasets | [1] |
custom multi-hop | use Wikipedia's graph of articles (see the sketch below) |
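
For the custom multi-hop row: one way to bootstrap such data is to walk Wikipedia's article link graph via the public MediaWiki query API, collect a chain A → B → C, and then write a question whose answer requires composing both hops. A minimal sketch; the endpoint and parameters are the standard MediaWiki API, while the helper names and the chain-to-question step are my own:

```python
import random
import requests

API = "https://en.wikipedia.org/w/api.php"

def outgoing_links(title: str, limit: int = 50):
    """Titles linked from one Wikipedia article (one hop in the graph)."""
    params = {"action": "query", "titles": title, "prop": "links",
              "pllimit": limit, "format": "json"}
    pages = requests.get(API, params=params, timeout=10).json()["query"]["pages"]
    page = next(iter(pages.values()))
    return [link["title"] for link in page.get("links", [])]

def sample_chain(start: str, hops: int = 2):
    """Random walk the link graph: each extra hop is one more reasoning step."""
    chain = [start]
    for _ in range(hops):
        links = outgoing_links(chain[-1])
        if not links:
            break
        chain.append(random.choice(links))
    return chain

print(sample_chain("Alan Turing"))  # e.g. a seed chain for a 2-hop question
```
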
TOY PROBLEMS | ||
---|---|---|
Big Bench Hard | 23 challenges (only 6k datapoints) | [1] |
logical entailment dataset | logic strings by DeepMind | [1] |
logical entailment dataset code | (generate it yourself) | [1] |
FSM Game | generate strings according to a grammar (see the FSM sketch after this table) |
Adaptive Grammar | grammar rules may change during generation |
String/Graph Rewriting | string_rewriting.py |
LibraryOfLogic | generate NL from multiple games | [1] |
AB-XY Game | ||
word ladder | transform one word into another, changing one letter at a time ||
parser | ||
longest common subsequence | find the longest subsequence shared by two strings ||
string reversal | ||
wisconsin card sorting | infer a hidden sorting rule that changes without warning ||
anagram | ||
palindrome | ||
permutation composition | compose permutations and predict the resulting mapping ||
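
For the FSM Game and Adaptive Grammar rows above, data is cheap to generate: encode a regular grammar as FSM transitions, random-walk it to emit strings, and for the adaptive variant mutate a rule between samples. A minimal sketch with an illustrative two-state grammar of my own choosing:

```python
import random

# Regular grammar as an FSM: state -> [(emitted symbol, next state)].
# None marks halting. The grammar itself is illustrative.
FSM = {
    "S": [("a", "S"), ("b", "T")],
    "T": [("b", "T"), ("a", None)],
}

def generate(fsm, state="S", max_len=20):
    """Random-walk the FSM, emitting symbols until it halts or hits max_len."""
    out = []
    while state is not None and len(out) < max_len:
        symbol, state = random.choice(fsm[state])
        out.append(symbol)
    return "".join(out)

before = [generate(FSM) for _ in range(3)]
FSM["T"] = [("c", "T"), ("a", None)]   # adaptive variant: a rule changes
after = [generate(FSM) for _ in range(3)]
print(before, after)
```
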
TOKEN AUGMENTED REASONING | ||
---|---|---|
Reasoning tokens | Self-Reasoning Tokens, teaching models to think ahead | [1] |
Quiet-STaR | LLMs Can Teach Themselves to Think Before Speaking | [1] |
Multi-token Prediction | Multi-token prediction is favorable for the development of induction heads and algorithmic reasoning capabilities | https://arxiv.org/abs/2404.19737 |
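
The multi-token prediction idea is simple enough to sketch: instead of one next-token head, a shared trunk feeds k independent heads, where head i predicts the token i+1 positions ahead. A minimal PyTorch sketch reduced to the loss plumbing; the class and function names are mine, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTokenHeads(nn.Module):
    """k independent output heads over a shared trunk state;
    head i predicts the token i+1 positions ahead."""
    def __init__(self, d_model: int, vocab: int, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def forward(self, h):  # h: (batch, seq, d_model) from any decoder trunk
        return [head(h) for head in self.heads]

def multi_token_loss(logits_list, tokens):
    """Average cross-entropy over the k shifted targets."""
    total = 0.0
    for i, logits in enumerate(logits_list):
        shift = i + 1
        pred = logits[:, :-shift, :]   # positions that still have a target
        tgt = tokens[:, shift:]        # the token shift steps ahead
        total = total + F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                                        tgt.reshape(-1))
    return total / len(logits_list)
```
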
INDIRECT REASONING (IR) | ||
---|---|---|
Contrapositive and Contradiction for Automated Reasoning | uses the logic of contrapositives and contradictions for factual reasoning and mathematical proofs (see the equivalence check below) | https://arxiv.org/pdf/2402.03667
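
The core identity behind indirect reasoning is that an implication is equivalent to its contrapositive: proving ~q -> ~p indirectly establishes p -> q. A truth-table check in plain Python (this verifies the logical identity only, not the paper's pipeline):

```python
from itertools import product

def implies(a: bool, b: bool) -> bool:
    return (not a) or b

# p -> q holds exactly when its contrapositive ~q -> ~p does,
# for every truth assignment.
assert all(implies(p, q) == implies(not q, not p)
           for p, q in product([False, True], repeat=2))
print("p -> q  <=>  ~q -> ~p")
```
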
DIRECT REASONING (DR) | ||
---|---|---|
Graph of Thoughts (GoT) | models the information generated by an LLM as an arbitrary graph | https://arxiv.org/abs/2308.09687
Self-Consistency | self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer (see the majority-vote sketch after this table) | https://arxiv.org/abs/2203.11171
Chain of Thoughts | chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning | https://arxiv.org/abs/2201.11903 |
Chain of thoughts without prompting | CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process | https://arxiv.org/abs/2402.10200
Iterative Reasoning Preference Optimization | Iterated DPO, but for CoT, repeated until performance saturates on reasoning tasks | https://arxiv.org/pdf/2404.19733 |
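
Self-consistency (referenced above) reduces to: sample several chain-of-thought completions at nonzero temperature, extract each final answer, and return the mode. A minimal sketch; `sample_fn` is my stand-in for any stochastic LLM call that returns a (reasoning, answer) pair, not an API from the paper:

```python
import random
from collections import Counter

def self_consistent_answer(sample_fn, question: str, n: int = 10) -> str:
    """Majority-vote over n sampled (reasoning, answer) completions."""
    answers = [sample_fn(question)[1] for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def toy_sampler(question: str):
    # Toy stand-in for an LLM: correct ~70% of the time, noisy otherwise.
    answer = "42" if random.random() < 0.7 else str(random.randint(0, 9))
    return ("...chain of thought...", answer)

print(self_consistent_answer(toy_sampler, "What is 6 * 7?"))
```
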