<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description"
content="We introduce the first autonomous LLM agent, which we call &quot;reason for future, act for now&quot; (RAFA), with provable regret guarantees and strong empirical performance.">
<meta name="keywords" content="Large Language Model, Agent">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/favicon.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="https://scholar.google.com/citations?user=uEl_TtkAAAAJ&hl=en">Zhihan Liu</a><sup>*1</sup>,</span>
<span class="author-block">
<a href="http://mousehu.cn/">Hao Hu</a><sup>*2</sup>,</span>
<span class="author-block">
<a href="https://shenao-zhang.github.io/">Shenao Zhang</a><sup>*1</sup>,</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=bzPCv_8AAAAJ&hl=en">Hongyi Guo</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://openreview.net/profile?id=~Shuqi_Ke1">Shuqi Ke</a><sup>3</sup>,
</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=1G8RH_YAAAAJ&hl=en">Boyi Liu</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="https://zhaoranwang.github.io/">Zhaoran Wang</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>Northwestern University,</span>
<span class="author-block"><sup>2</sup>Tsinghua University,</span>
<span class="author-block"><sup>3</sup>The Chinese University of Hong Kong</span><br>
<small>(* indicates equal contribution)</small><br>
<small>ICML 2024</small><br>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2309.17382.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2309.17382"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span>
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/agentification/RAFA_code"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="hero teaser">
<div class="container is-max-desktop">
<div class="hero-body">
<h2 class="subtitle has-text-centered">
<b>Overview:</b> We introduce the first autonomous LLM agent, which we call "reason for future, act for now" (<span class="dnerf">RAFA</span>), with provable regret guarantees and strong empirical performance.
</h2>
<img src="./static/images/intro.svg" width="100%"/>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered">
<div class="column is-four-fifths">
<div class="columns is-centered">
<h2 class="title is-3">Abstract</h2>
</div>
<div class="content has-text-left">
<p>Large language models (LLMs) demonstrate impressive reasoning abilities, but translating reasoning into actions in the real world remains challenging. In particular, it remains unclear how to complete a given task provably within a minimum number of interactions with the external environment, e.g., through an internal mechanism of reasoning. To this end, we propose a principled framework with provable regret guarantees to orchestrate reasoning and acting, which we call "reason for future, act for now" (<span class="dnerf">RAFA</span>). Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future"). At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
</p>
<p>The key idea is to cast reasoning in LLMs as learning and planning in Bayesian adaptive Markov decision processes (MDPs). Correspondingly, we prompt LLMs to form an updated posterior of the unknown environment from the memory buffer (learning) and generate an optimal trajectory for multiple future steps that maximizes a value function (planning). The learning and planning subroutines are performed in an "in-context" manner to emulate the actor-critic update for MDPs. Our theoretical analysis proves that the novel combination of long-term reasoning and short-term acting achieves a √T regret. In particular, the regret bound highlights an intriguing interplay between the prior knowledge obtained through pretraining and the uncertainty reduction achieved by reasoning and acting. Our empirical validation shows that it outperforms various existing frameworks and achieves nearly perfect scores on a few benchmarks. By incorporating “classical” MDP techniques, RAFA introduces the first autonomous LLM agent with provable regret guarantees. Notably, LLMs do not function as actors, critics, or learned world models, but rather as an internal mechanism that improves them iteratively.
</p>
</div>
</div>
</div>
<!--/ Abstract. -->
</section>
<section class="section">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column is-full-width">
<h2 class="title is-3">Method</h2>
<div class="columns is-centered">
<figure>
<img src="./static/images/algo1.jpg"/>
</figure>
</div>
<div class="content has-text-left">
<p>At the t-th step of RAFA (Algorithm 1), the LLM agent invokes the reasoning routine, which learns from the memory buffer and plans a future trajectory over a long horizon ("reason for future" in Line 6), takes the initial action of the planned trajectory ("act for now" in Line 7), and stores the collected feedback (state, action, and reward) in the memory buffer (Line 8). Upon the state transition of the external environment, the LLM agent reinvokes the reasoning routine to replan another future trajectory from the new state (Line 6 following Line 9). To ensure learning and planning stability, we impose a switching condition (Line 10) that decides whether to incorporate the newest chunk of history into the information state, which the reasoning routine uses as context. For different concrete settings, we use different implementations of the LLM learner-planner. Please see our paper for more details.
</p>
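The loop above can be sketched as follows (a minimal schematic, not the paper's implementation; `env`, `llm_learner_planner`, and `switching_condition` are hypothetical placeholders for the environment interface, the LLM learner-planner, and the switching test):

```python
def rafa(env, llm_learner_planner, switching_condition, T):
    """Schematic of the RAFA loop (Algorithm 1): reason for future, act for now."""
    memory = []       # memory buffer of (state, action, reward) feedback
    info_state = []   # information state used as context by the reasoning routine
    state = env.reset()
    for _ in range(T):
        # "Reason for future": learn from memory and plan a long-horizon trajectory.
        trajectory = llm_learner_planner(info_state, state)
        # "Act for now": take only the initial action of the planned trajectory.
        action = trajectory[0]
        next_state, reward = env.step(action)
        # Store the collected feedback in the memory buffer.
        memory.append((state, action, reward))
        # Switching condition: fold the newest chunk of history into the
        # information state only when the condition triggers.
        if switching_condition(info_state, memory):
            info_state = list(memory)
        state = next_state
    return memory, info_state
```

Replanning from every new state keeps the agent responsive to feedback, while the switching condition limits how often the reasoning context changes.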
<div class="columns is-centered">
<figure>
<img src="./static/images/intro.gif"/>
</figure>
</div>
<h2 class="title is-3">Experimental Results</h2>
<div class="content has-text-left">
<p>Our empirical validation shows that RAFA outperforms various existing frameworks in interactive decision-making tasks, including ALFWorld, BlocksWorld, Game of 24, and a new benchmark based on Tic-Tac-Toe. On a few benchmarks, it achieves nearly perfect scores.</p>
</div>
</div>
<h3 class="title is-4">(A) Game of 24</h3>
<div class="content has-text-left">
<p>Game of 24 is a mathematical puzzle in which the goal is to obtain 24 from four natural numbers through basic arithmetic operations. RAFA uses the beam-search planner (Algorithm 4 in our paper) on Game of 24.</p>
</div>
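To illustrate the puzzle itself (this is not RAFA's beam-search planner), a small exhaustive search can check whether a given instance is solvable; `solvable_24` is a hypothetical helper written for this page:

```python
from itertools import permutations

def solvable_24(nums, target=24, eps=1e-6):
    """Exhaustively check whether four numbers can reach the target via +, -, *, /."""
    def search(vals):
        if len(vals) == 1:
            return abs(vals[0] - target) < eps
        # Pick an ordered pair of remaining values and combine them with each operation.
        for i, j in permutations(range(len(vals)), 2):
            rest = [vals[k] for k in range(len(vals)) if k not in (i, j)]
            a, b = vals[i], vals[j]
            results = [a + b, a - b, a * b]
            if abs(b) > eps:  # skip division by (near-)zero
                results.append(a / b)
            if any(search(rest + [r]) for r in results):
                return True
        return False
    return search([float(n) for n in nums])
```

For example, (3, 3, 8, 8) is solvable via 8 / (3 - 8/3) = 24, while (1, 1, 1, 1) is not.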
<div class="columns is-centered" align="center">
<figure>
<img src="./static/images/demo_24.gif" height="120%" width="120%"/>
</figure>
</div>
<figure>
<img src="./static/images/result_24.jpg" width="100%"/>
<figcaption style="text-align: center; color: #888;">
Results on Game of 24.
</figcaption>
</figure>
<h3 class="title is-4">(B) ALFWorld</h3>
<div class="content has-text-left">
<p>ALFWorld is an interactive environment for embodied agent simulations, encompassing 134 household tasks in six categories. RAFA uses the tree-search planner (Algorithm 2 in our paper) on ALFWorld.</p>
</div>
<div class="columns is-centered" align="center">
<figure>
<img src="./static/images/demo_alf.gif" height="100%" width="100%"/>
</figure>
</div>
<figure>
<img src="./static/images/combine_alf.svg" width="100%"/>
<figcaption style="text-align: center; color: #888;">
Results on ALFWorld.
</figcaption>
</figure>
<h3 class="title is-4">(C) BlocksWorld</h3>
<div class="content has-text-left">
<p>BlocksWorld contains tasks that require arranging blocks into specific configurations. RAFA uses the MCTS planner (Algorithm 5 in our paper) on BlocksWorld.</p>
</div>
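MCTS planners like the one referenced above typically balance exploration and exploitation at each tree node with a UCT-style rule; a generic sketch of that selection step (not the paper's exact Algorithm 5, and `uct_select` is a hypothetical helper):

```python
import math

def uct_select(children, c=1.4):
    """Select the child maximizing the UCT score: mean value + exploration bonus."""
    total = sum(child["visits"] for child in children)
    def score(child):
        if child["visits"] == 0:
            return float("inf")  # unvisited children are expanded first
        mean = child["value"] / child["visits"]
        return mean + c * math.sqrt(math.log(total) / child["visits"])
    return max(children, key=score)
```

The constant `c` trades off exploiting high-value children against exploring rarely visited ones.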
<div class="columns is-centered" align="center">
<figure>
<img src="./static/images/demo_blocks.gif" height="80%" width="80%"/>
</figure>
</div>
<figure>
<img src="./static/images/result_blocks1.jpg" width="100%"/>
<img src="./static/images/result_blocks2.jpg" width="100%"/>
<figcaption style="text-align: center; color: #888;">
Results on BlocksWorld.
</figcaption>
</figure>
<h3 class="title is-4">(D) Tic-Tac-Toe</h3>
<div class="content has-text-left">
<p>Tic-Tac-Toe is a competitive game in which the X and O sides take turns placing marks. RAFA uses the MCTS planner (Algorithm 5 in our paper) on Tic-Tac-Toe.</p>
</div>
<div class="columns is-centered" align="center">
<figure>
<img src="./static/images/illu_ttt.gif" height="100%" width="100%"/>
</figure>
</div>
<div class="columns is-centered" align="center">
<figure>
<img src="./static/images/demo_ttt.gif" height="50%" width="50%"/>
</figure>
</div>
<figure>
<img src="./static/images/combine_ttt.svg" width="100%"/>
<figcaption style="text-align: center; color: #888;">
Results on Tic-Tac-Toe.
</figcaption>
</figure>
</div>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@article{liu2023reason,
title={Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency},
author={Liu, Zhihan and Hu, Hao and Zhang, Shenao and Guo, Hongyi and Ke, Shuqi and Liu, Boyi and Wang, Zhaoran},
journal={arXiv preprint arXiv:2309.17382},
year={2023}
}</code></pre>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="columns is-centered">
<div class="content has-text-justified">
<p>
We acknowledge <a
href="https://github.com/nerfies/nerfies.github.io">Nerfies</a> for the website template.
</p>
</div>
</div>
</div>
</footer>
</body>
</html>