P2M-Self-Distill
We present P2M-Self-Distill, a project aimed at enhancing our existing framework by improving input generation, output annotation, quality evaluation, model execution, and training.
Input generation: create new inputs based on instructions and a selected dataset (a sketch follows this list).
- Utilizes diverse prompts, incorporating examples from the dataset for relevance.
- Gradually raises the sampling temperature across rounds; this temperature scaling is a key focus.
- Importantly, it continuously adds new examples from the selected dataset without any fine-tuning.
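A minimal sketch of what such an input generator could look like, assuming a Hugging Face text-generation pipeline; the class name `InputGenerator` and parameters such as `temperature_step` are illustrative assumptions, not the project's actual API.

```python
import random
from transformers import pipeline

class InputGenerator:
    """Generates new task inputs from an instruction plus few-shot examples.

    Illustrative sketch: the example pool is continuously extended with
    fresh examples from the selected dataset (no fine-tuning involved),
    and the sampling temperature is raised gradually across rounds.
    """

    def __init__(self, model_name="gpt2", n_shots=3,
                 start_temperature=0.7, temperature_step=0.1):
        self.generator = pipeline("text-generation", model=model_name)
        self.examples = []                     # few-shot example pool
        self.n_shots = n_shots
        self.temperature = start_temperature
        self.temperature_step = temperature_step

    def add_examples(self, dataset_inputs):
        """Continuously feed in new examples from the selected dataset."""
        self.examples.extend(dataset_inputs)

    def generate(self, instruction, n_new=5):
        """Sample n_new inputs, then raise the temperature for the next round."""
        shots = random.sample(self.examples, min(self.n_shots, len(self.examples)))
        prompt = (instruction + "\n"
                  + "\n".join(f"Input: {s}" for s in shots)
                  + "\nInput:")
        outputs = self.generator(
            prompt,
            do_sample=True,
            temperature=self.temperature,
            max_new_tokens=64,
            num_return_sequences=n_new,
            return_full_text=False,
        )
        self.temperature += self.temperature_step  # gradual temperature scaling
        return [o["generated_text"].strip() for o in outputs]
```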
Similar to traditional few-shot learning, the OutputAnnotator generates multiple outputs for each input (a sketch follows this list).
- Unlike the input generator, it is continuously fine-tuned across rounds.
- Candidate fine-tuning strategies: fine-tune only on each round's new examples, fine-tune on the whole dataset, or cut off low-score examples and fine-tune only on the high-quality ones.
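A sketch of the OutputAnnotator's sampling loop, under the same assumptions as above (Hugging Face pipeline, illustrative names); the fine-tuning step is deliberately left out here since the strategy is still an open choice.

```python
from transformers import pipeline

class OutputAnnotator:
    """Few-shot-style annotator: samples k candidate outputs per input.

    Illustrative sketch; in the real pipeline the underlying model would
    be continuously fine-tuned between rounds using one of the strategies
    listed above.
    """

    def __init__(self, model_name="gpt2", k=5, temperature=0.8):
        self.generator = pipeline("text-generation", model=model_name)
        self.k = k
        self.temperature = temperature

    def annotate(self, task_input):
        """Return k sampled candidate outputs for one input."""
        outputs = self.generator(
            task_input,
            do_sample=True,
            temperature=self.temperature,
            max_new_tokens=128,
            num_return_sequences=self.k,
            return_full_text=False,
        )
        return [o["generated_text"].strip() for o in outputs]
```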
Quality evaluation: evaluate, score, and rank the input-output pairs for each specific input (a sketch follows this list).
- Two ways of evaluation: “self-consistency as the score” and “model evaluation.”
- Candidate evaluators for “model evaluation”: the base model itself, a larger open-source model, or even a ChatGPT evaluator for the worst cases.
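To make the “self-consistency as the score” option concrete, here is a minimal sketch: each candidate output is scored by the fraction of the k samples that agree with it after normalization, so the majority answer ranks first. The `normalize` argument is an assumption; real tasks would need task-specific answer extraction.

```python
from collections import Counter

def self_consistency_scores(candidates, normalize=str.strip):
    """Score each candidate by the fraction of samples that agree with it
    after normalization (majority vote turned into a soft score)."""
    normed = [normalize(c) for c in candidates]
    counts = Counter(normed)
    k = len(candidates)
    return [counts[n] / k for n in normed]

def rank_pairs(task_input, candidates):
    """Return (input, output, score) triples, best first."""
    scores = self_consistency_scores(candidates)
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [(task_input, out, score) for out, score in ranked]
```

For example, with candidates ["4", "4", "5"] the two agreeing outputs each score 2/3 and the outlier scores 1/3. “Model evaluation” would instead replace `self_consistency_scores` with a call to the chosen evaluator model.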
An advanced version incorporating quantization.
Combine RLAIF and LoRA (a sketch of the combined setup follows).
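Quantization and LoRA compose cleanly in the Hugging Face stack; the sketch below shows a QLoRA-style setup (4-bit base model plus trainable LoRA adapters) that an RLAIF loop could then train. The model name is a placeholder, and the target modules assume a typical decoder-only architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model (QLoRA-style setup).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",  # placeholder for the project's base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Small trainable LoRA adapters on top of the quantized weights; only
# these adapter parameters would be updated during RLAIF training,
# with the evaluator's scores acting as the AI-feedback reward.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```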
The project plan comprises four steps:
- Initialize Framework: establish base and mock classes (a sketch follows this list).
- Implement New Components: build the input generator, output annotator, and evaluator described above.
- Set Up Benchmarks: create benchmarks to evaluate the new system's performance.
- Set Up Contrast Experiments: run controlled experiments on factors such as “temperature scaling” and “self-consistency vs. model evaluation,” among others.
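For the framework-initialization step, one plausible shape for the base and mock classes is sketched below; all names are assumptions, meant only to show how mocks let the pipeline run end to end before the real components land.

```python
from abc import ABC, abstractmethod

class BaseInputGenerator(ABC):
    @abstractmethod
    def generate(self, instruction: str, n_new: int) -> list[str]: ...

class BaseOutputAnnotator(ABC):
    @abstractmethod
    def annotate(self, task_input: str) -> list[str]: ...

class BaseEvaluator(ABC):
    @abstractmethod
    def score(self, task_input: str, outputs: list[str]) -> list[float]: ...

class MockEvaluator(BaseEvaluator):
    """Fixed-score stand-in so the loop can be tested end to end before
    the real evaluators (self-consistency, model evaluation) exist."""
    def score(self, task_input, outputs):
        return [1.0] * len(outputs)
```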
Our dedicated team comprises:
- Vijay
- Chenyang
- Mingdao
- Sherry
- Graham
- TBD
This proposal outlines the ambitious goals of the P2M-Self-Distill project and the essential components driving its success.