Replies: 1 comment
Yes, we're looking at using a multi-agent framework approach to serve as a "user proxy".
---
Gist:
Imagine you have a large, powerful model capable of solving complex puzzles. Normally, a human might guide this model when it encounters challenges, offering subtle hints or guiding the model’s reasoning. Now, what if we shift that interaction "down the ranks"? Instead of a human guiding the large model, we have the large model guiding a smaller, weaker model through a problem it can't solve on its own. The large model offers guidance at key moments, and we use this interaction to create a dataset that captures the process of problem-solving. Later, we replace the guidance (hints) with inner reflections, creating a dataset that can teach models not just answers, but how to approach problems.
Overview
This method was inspired by David Shapiro's video, which discusses some interesting ideas about solving complex tasks. The question stuck with us: how could we automate this process of guiding models through challenges, without relying on humans?
In many tasks, a human guides a frontier model, offering hints or feedback when it stumbles on difficult parts of a problem. But we wondered: what if we replaced the human with another model? Instead of a person guiding the model, we shift down a level, letting a large "frontier" model take over the role of guiding a weaker one. This interaction, where the frontier model helps the weaker model solve the puzzle, becomes the foundation for a dataset designed to teach problem-solving skills.
Here's the process, step by step, as I envision it.
Step 1: Generating Complex Puzzles and Solutions
The first step involves using a frontier model (a large, powerful model) to generate complex puzzles that are difficult but solvable. These puzzles should be hard enough that a smaller model won’t be able to solve them in one shot. The frontier model also generates the solution to the puzzle.
For example, the frontier model might produce a multi-step logic puzzle, say a scheduling or river-crossing problem, together with a worked, step-by-step solution.
The goal is to create puzzles that are easy enough for the frontier model to generate and solve, but hard enough that a smaller model won’t solve them on its first try.
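As a rough sketch, Step 1 could be automated along these lines. The `frontier_complete` helper here is hypothetical, a stand-in for whatever frontier-model API you use, and the prompt and JSON schema are illustrative rather than prescriptive:

```python
import json

def frontier_complete(prompt: str) -> str:
    """Hypothetical stand-in for your frontier model's completion API."""
    raise NotImplementedError("wire this up to your model client")

PUZZLE_PROMPT = (
    "Invent a self-contained logic puzzle that a small (~7B) model is "
    "unlikely to solve in one attempt, then solve it yourself. "
    'Return JSON: {"puzzle": "...", "solution": "..."}'
)

def generate_puzzle() -> dict:
    """Ask the frontier model for a puzzle plus its reference solution."""
    item = json.loads(frontier_complete(PUZZLE_PROMPT))  # assumes valid JSON back
    assert {"puzzle", "solution"} <= item.keys(), "unexpected schema"
    return item
```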
Step 2: Shifting Down the Guidance
Now comes the shift in roles. Originally, we might think of the interaction as a human guiding a frontier model through a complex task. The human provides hints, offers corrections, and ultimately helps the model reach the solution. But instead of this human-guided process, we "shift down" the interaction.
Here’s what this looks like: the frontier model now becomes the guide, and the lesser model (e.g., a 7B model) becomes the subject that needs help solving the puzzle. The lesser model attempts the puzzle on its own and, by design, gets it wrong at first. The frontier model then steps in with a hint, and that guidance helps the lesser model improve its next answer.
The process of guidance and iteration continues until the lesser model eventually arrives at the correct solution. What we’re capturing here is the interaction between the models—one guiding the other—replacing the role that a human would traditionally have.
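One way the guidance loop might be wired up, again as a sketch: `weak_complete` and `frontier_complete` are the same kind of hypothetical model wrappers as above, and the yes/no grading prompt is just one simple way to check correctness.

```python
def solve_with_guidance(puzzle: str, solution: str,
                        weak_complete, frontier_complete,
                        max_rounds: int = 5) -> list[dict]:
    """Weak model attempts the puzzle; the frontier model checks each
    attempt and, if it is wrong, supplies a hint for the next round."""
    transcript = [{"role": "puzzle", "content": puzzle}]
    context = puzzle
    for _ in range(max_rounds):
        attempt = weak_complete(context)
        transcript.append({"role": "attempt", "content": attempt})
        # Frontier model grades the attempt against the reference solution.
        verdict = frontier_complete(
            f"Puzzle: {puzzle}\nReference solution: {solution}\n"
            f"Attempt: {attempt}\nIs the attempt correct? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            break  # solved; stop hinting
        hint = frontier_complete(
            f"Puzzle: {puzzle}\nReference solution: {solution}\n"
            f"Wrong attempt: {attempt}\n"
            "Give one subtle hint that nudges the solver forward "
            "without revealing the answer."
        )
        transcript.append({"role": "hint", "content": hint})
        context += f"\n\nHint: {hint}\nTry again."
    return transcript
```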
Step 3: Replacing Guidance with Inner Reflections
Now that the puzzle has been solved using hints, we take the process one step further. We go back through the interaction and replace the hints with inner reflections. These reflections simulate the thought process of the model as it moves from one step to the next, guiding itself through the problem.
For each hint, we generate a reflection that represents the model’s internal reasoning, as though it were pausing to think before its next attempt. The hint turns are then swapped out for these reflections, so the full conversation reads as a single model working through the problem on its own.
These inner reflections are generated by the frontier model and replace the explicit hints that were guiding the lesser model, creating a more introspective narrative of problem-solving.
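A sketch of how that swap could be implemented, built on the same hypothetical `frontier_complete` wrapper and the transcript format from the loop above:

```python
def hints_to_reflections(transcript: list[dict], frontier_complete) -> list[dict]:
    """Rewrite every hint turn as a first-person inner reflection, so the
    conversation reads as the model guiding itself."""
    rewritten = []
    for turn in transcript:
        if turn["role"] != "hint":
            rewritten.append(turn)  # keep puzzle and attempt turns as-is
            continue
        reflection = frontier_complete(
            "Rewrite the following hint as a short first-person thought, "
            "as if the solver paused to reflect before its next attempt. "
            f"Do not mention receiving a hint.\nHint: {turn['content']}"
        )
        rewritten.append({"role": "reflection", "content": reflection})
    return rewritten
```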
Step 4: Building a Dataset of Problem-Solving Conversations
Once we have the full conversation—starting with the puzzle, then the initial attempts, the reflections, and finally the correct answer—we package this into a dataset. The dataset captures the entire process of problem-solving rather than just the answer itself.
Here’s what a full entry in the dataset might look like; the field names below are illustrative, one possible schema rather than a fixed format:
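```python
# An illustrative entry; the field names are assumptions, not a fixed schema.
entry = {
    "puzzle": "...",                                # the generated puzzle
    "turns": [
        {"role": "attempt",    "content": "..."},  # first (wrong) try
        {"role": "reflection", "content": "..."},  # rewritten hint
        {"role": "attempt",    "content": "..."},  # improved try
    ],
    "final_answer": "...",                          # matches the reference solution
}
```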
The key idea here is that we’re not just teaching models what the answer is; we’re teaching them how to approach and refine their problem-solving process. The inner reflections offer a window into how a model might guide itself to better solutions, similar to the process demonstrated in David Shapiro's video.
Closing Thoughts
This approach came about after watching David Shapiro's video, and it’s built on the idea of shifting the interaction down from human-guided to model-guided. By having a frontier model guide a weaker model, we can fully automate the generation of these guided problem-solving traces. The dataset that emerges isn’t just about the solutions; it’s about the journey of reflecting, iterating, and improving.
I'm still experimenting with this idea, and I’d love to hear from others who might have thoughts or ideas. Have you tried something similar? Do you think this approach could be useful in other contexts? Feel free to comment below or submit a pull request to join the conversation.