mx54039q/UniDM

UniDM: A Unified Framework for Data Manipulation with Large Language Models

This is the official repo for UniDM: A Unified Framework for Data Manipulation with Large Language Models.

Install

Download and unpack the dataset:

mkdir dataset
wget https://fm-data-tasks.s3.us-west-1.amazonaws.com/datasets.tar.gz -P dataset
tar xvf dataset/datasets.tar.gz -C dataset/

Prerequisites

pip install -r requirements.txt

# Manifest
git clone git@github.com:HazyResearch/manifest.git
cd manifest
pip install -e .

Run

To run inference, use

python inference.py --help

Some examples follow. The API key can be obtained by registering with the LLM provider; for instance, to run inference with the OpenAI API models, create an OpenAI account.

To run the data imputation task with 3 examples selected by the auto-retrieve module and added to the prompt, using both the adaptive data parsing and prompt engineering modules:

python inference.py \
    --api_key <YOUR API KEY> \
    --data_dir <DATA DIR> \
    --task data_imputation \
    --instance_num 3 \
    --metadata_wise \
    --instance_wise \
    --data_parsing \
    --prompt_engineering

To run the entity resolution task with 3 examples added to the prompt, using both the adaptive data parsing and prompt engineering modules:

python inference.py \
    --api_key <YOUR API KEY> \
    --data_dir <DATA DIR> \
    --task entity_resolution \
    --context_num 3 \
    --metadata_wise \
    --instance_wise \
    --data_parsing \
    --prompt_engineering

To run the data transformation task with 3 examples added to the prompt, using both the adaptive data parsing and prompt engineering modules:

python inference.py \
    --api_key <YOUR API KEY> \
    --data_dir <DATA DIR> \
    --task data_transformation \
    --context_num 3 \
    --data_parsing \
    --prompt_engineering
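
The three commands above differ only in the task name and a few flags, so they are easy to script. The following is a minimal sketch and not part of the repository: `build_inference_command` and the `UNIDM_API_KEY` environment variable are hypothetical names introduced here, and the flags are copied from the examples above rather than from the `inference.py` argument parser, so check them against `python inference.py --help` first.

```python
import os
import sys

# Flags copied verbatim from the README examples above; this table is an
# illustration, not an authoritative list of what inference.py accepts.
TASK_FLAGS = {
    "data_imputation": ["--instance_num", "3", "--metadata_wise",
                        "--instance_wise", "--data_parsing",
                        "--prompt_engineering"],
    "entity_resolution": ["--context_num", "3", "--metadata_wise",
                          "--instance_wise", "--data_parsing",
                          "--prompt_engineering"],
    "data_transformation": ["--context_num", "3", "--data_parsing",
                            "--prompt_engineering"],
}


def build_inference_command(task, data_dir, api_key=None):
    """Return the argv list for one README example (hypothetical helper)."""
    if task not in TASK_FLAGS:
        raise ValueError(f"unknown task: {task!r}")
    # UNIDM_API_KEY is an assumed variable name, used so the key does not
    # have to be typed on the command line.
    key = api_key or os.environ.get("UNIDM_API_KEY")
    if not key:
        raise RuntimeError("pass api_key or set UNIDM_API_KEY")
    return [sys.executable, "inference.py",
            "--api_key", key,
            "--data_dir", data_dir,
            "--task", task,
            *TASK_FLAGS[task]]
```

From the repository root, the resulting list can be passed to `subprocess.run(..., check=True)` to launch one task at a time.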

Notes

Data retrieval may take some time. During inference, the intermediate retrieval scores are stored in the ret_score folder. We provide the retrieval scores for the examples above for quick verification.
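
Before a long run, it can help to check whether retrieval scores are already present. The sketch below only lists whatever files exist under `ret_score`. The helper name `list_cached_scores` is hypothetical, and nothing is assumed about the file names or formats the repository actually writes there.

```python
from pathlib import Path


def list_cached_scores(ret_dir="ret_score"):
    """Return sorted relative paths of files under ret_dir, or [] if missing."""
    root = Path(ret_dir)
    if not root.is_dir():
        return []
    # Every regular file is listed; no naming scheme is assumed.
    return sorted(p.relative_to(root).as_posix()
                  for p in root.rglob("*") if p.is_file())
```

An empty result means retrieval has not been cached yet, so the first run will compute the scores.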
