Research Log - 2024-09-15 - Generative infinite high quality questions #37
Great spec sheet so far. Just curious: would human-in-the-loop aided backtracking, to find an optimal prompt or set of thinking steps per domain for generating the dataset, be feasible?
-
Video documentation: https://youtu.be/JZfzo4SrPIs?si=M9RerSV08rC4_pBu
Primary result: question generator: https://github.com/daveshap/raspberry_experiments/blob/main/generate_many_questions.py
Research Log: Raspberry Project - Automated Question Generation
Overview
In this project, I developed an automated question generation system for the Raspberry project. My goal was to create a system capable of generating complex, domain-specific questions suitable for AI benchmarking and training.
Process Development
I began by outlining the requirements for the question generation system. I wanted questions that were answerable without external resources, required multiple reasoning steps, spanned diverse fields, and targeted difficulty levels ranging from graduate student to world expert.
To achieve this, I designed a multi-step process for question generation. This included creating lists of main topics, generating subtopics, defining question parameters, and formulating the final questions. I created four key lists to provide randomization and diversity: main topics, difficulty levels, problem types, and conceptual connectors.
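Roughly, the parameter randomization step looks like the sketch below. The file names (topics.txt, difficulty.txt, problem_types.txt, connectors.txt) are illustrative placeholders rather than the exact names used in the repository.

```python
# Minimal sketch of the parameter randomization step. File names are
# placeholders; the real scripts read their own list files.
import random

def read_list(path: str) -> list[str]:
    """Read one list item per line, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def generate_question_parameters() -> dict:
    """Randomly combine one entry from each of the four lists."""
    return {
        "topic": random.choice(read_list("topics.txt")),
        "difficulty": random.choice(read_list("difficulty.txt")),
        "problem_type": random.choice(read_list("problem_types.txt")),
        "connector": random.choice(read_list("connectors.txt")),
    }
```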
Prompt Engineering
I developed two main prompts: one for subtopic generation and another for final question generation. I designed these prompts to guide Claude in creating specific, challenging subtopics and questions based on randomly selected parameters from my lists.
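The templates below are only a rough approximation of the shape of those two prompts; the wording in the actual scripts is longer and more detailed. The curly-brace fields map to the randomly selected parameters and the subtopic chosen in the first step.

```python
# Illustrative prompt templates only; not the exact wording in the repo.
SUBTOPIC_PROMPT = (
    "Within the field of {topic}, list ten specific, challenging subtopics "
    "that could each support a {difficulty}-level question."
)

QUESTION_PROMPT = (
    "Write one {difficulty} {problem_type} question about {subtopic}, "
    "connecting it to {connector}. The question must be answerable without "
    "external resources and require multiple reasoning steps."
)

def format_question_prompt(params: dict, subtopic: str) -> str:
    """Fill the final-question template from the sampled parameters."""
    return QUESTION_PROMPT.format(subtopic=subtopic, **params)
```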
API Integration and Script Development
I integrated the Anthropic API to interact with Claude for question generation. This involved setting up API calls, handling responses, and implementing error checking. I developed two main Python scripts: generate_question.py for generating a single question, and generate_many_questions.py for generating multiple questions in batch.
In these scripts, I included functions for reading lists from files, generating question parameters, formatting prompts, querying Claude, and saving the generated questions and logs. I also implemented error handling and added delays between API calls to avoid rate limiting.
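A condensed sketch of that query loop is below. The model name, retry policy, delay lengths, and output path are assumptions for illustration; the real scripts also write separate log files.

```python
# Hedged sketch of the Claude call loop with retries and rate-limit delays.
import time
from anthropic import Anthropic, APIError

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def query_claude(prompt: str, retries: int = 3) -> str:
    """Send one prompt to Claude and return the text of the reply."""
    for attempt in range(retries):
        try:
            message = client.messages.create(
                model="claude-3-5-sonnet-20240620",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return message.content[0].text
        except APIError as exc:
            print(f"API error on attempt {attempt + 1}: {exc}")
            time.sleep(5)  # back off before retrying
    raise RuntimeError("Claude query failed after all retries")

def generate_many_questions(prompts: list[str], out_path: str = "questions.txt") -> None:
    """Batch loop that saves each question and pauses between API calls."""
    with open(out_path, "a", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(query_claude(prompt).strip() + "\n\n")
            time.sleep(2)  # delay between calls to avoid rate limiting
```

The delay between calls in the batch loop is what keeps a long run under the API rate limit.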
Testing and Refinement
I went through several iterations of testing and refinement. I encountered and resolved issues such as API refusals and formatting problems. To improve the randomization process, I added random seed initialization to the scripts.
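The checks and reseeding added during refinement look roughly like this; the refusal markers and minimum length are illustrative simplifications, not the exact rules used.

```python
# Simplified sketch of output validation and random seed initialization.
import random
import time

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i apologize")

def looks_valid(question: str) -> bool:
    """Reject Claude refusals and obviously malformed or empty output."""
    text = question.strip().lower()
    if len(text) < 40:
        return False  # formatting problem: too short to be a real question
    return not any(marker in text for marker in REFUSAL_MARKERS)

# Random seed initialization so each run samples different parameters.
random.seed(time.time_ns())
```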
Results
The system successfully generated complex, domain-specific questions. I was particularly pleased with examples like a question about counterfactual European power dynamics involving Liechtenstein and another about reframing global power dynamics through fractal patterns in political science.
Future Directions
Looking ahead, I've identified several next steps for the project. I believe this work is crucial to solving the data problem for AI training, and I see parallels between my approach and methods potentially used by organizations like OpenAI. I'm excited to continue refining and expanding this system to generate even more sophisticated and diverse questions.