Add ViperGPT: A Novel Framework for Programmatic Visual Reasoning #94

45 changes: 45 additions & 0 deletions vipergpt.md
@@ -0,0 +1,45 @@
# ViperGPT: Programmatic Visual Reasoning Framework

## Resource Information
- **Paper**: [ViperGPT: Visual Inference via Python Execution for Reasoning](https://arxiv.org/abs/2303.08128)
- **Published**: March 2023
- **Authors**: Dídac Surís, Sachit Menon, Carl Vondrick
- **Organization**: Columbia University

## Overview
ViperGPT answers visual queries by having a code-generation model write an executable Python program that composes specialized vision and language models through a provided API. Instead of relying on an end-to-end black-box model, it produces an interpretable program whose intermediate steps explicitly show the reasoning process.
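
The generate-then-execute idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `ToyPatch`, `SceneObject`, `run_program`, and the hand-written `GENERATED_PROGRAM` string are hypothetical stand-ins, and the "image" is a list of hand-labeled objects rather than pixels processed by real vision models.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    color: str
    x: float  # horizontal center, in pixels


class ToyPatch:
    """Toy stand-in for an image: wraps hand-written annotations, not pixels."""

    def __init__(self, objects):
        self.objects = list(objects)

    def find(self, name, color=None):
        """Return objects matching the name ("object" matches any) and color."""
        return [
            o for o in self.objects
            if (name == "object" or o.name == name)
            and (color is None or o.color == color)
        ]


# Hand-written text standing in for what a code-generating model might return
# for: "How many red objects are to the left of the blue sphere?"
GENERATED_PROGRAM = '''
def execute_command(patch):
    spheres = patch.find("sphere", color="blue")
    if not spheres:
        return 0
    anchor = spheres[0]
    red_objects = patch.find("object", color="red")
    return sum(1 for o in red_objects if o.x < anchor.x)
'''


def run_program(source, patch):
    """Compile the generated source and call its entry point on the patch."""
    namespace = {}
    exec(source, namespace)  # only safe for trusted or sandboxed source
    return namespace["execute_command"](patch)


scene = ToyPatch([
    SceneObject("cube", "red", 40.0),
    SceneObject("cylinder", "red", 120.0),
    SceneObject("sphere", "blue", 200.0),
    SceneObject("cube", "red", 310.0),
])
answer = run_program(GENERATED_PROGRAM, scene)
print(answer)  # → 2
```

Because the program is ordinary Python source, each intermediate value (the located sphere, the list of red objects) can be inspected, which is the interpretability property described above.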

## Technical Implementation
```python
# Illustrative interface only: the `viper` package, classes, and methods below
# are placeholders, not a confirmed API of the official ViperGPT code release.
from PIL import Image

from viper import ViperGPT
from viper.models import VisionModel, LanguageModel

# Initialize ViperGPT with available models
viper = ViperGPT(
    vision_model=VisionModel(),
    language_model=LanguageModel(),
)


def process_visual_query(image, query):
    """Generate a program for the query, run it on the image, return both."""
    # Generate Python code for the query
    program = viper.generate_program(query)

    # Execute the generated program
    result = viper.execute(program, image)

    return {
        'result': result,
        'reasoning_steps': program.get_steps(),
        'executed_code': program.get_source(),
    }


# Usage example ("scene.jpg" is a placeholder path)
image = Image.open("scene.jpg")
query = "How many red objects are to the left of the blue sphere?"
results = process_visual_query(image, query)
```

## Why It Matters
ViperGPT demonstrates how code generation can bridge specialized AI models,
enabling complex visual reasoning with explicit, verifiable steps. Because each
model is invoked as a separate module, individual components can be replaced as
better models become available, without changing the overall approach.
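
The modularity point can be sketched in a few lines. This is a hypothetical illustration, not code from ViperGPT: `exact_detector`, `synonym_detector`, and `count_program` are invented names, and the "detectors" operate on text labels rather than images. The program refers to a module only by name, so a stronger module can be swapped in without touching the program.

```python
from typing import Callable, Dict, List

# A "detector" here counts how many labels match a target (toy assumption).
Detector = Callable[[List[str], str], int]


def exact_detector(labels: List[str], target: str) -> int:
    """Baseline module: counts labels that match the target exactly."""
    return sum(1 for label in labels if label == target)


def synonym_detector(labels: List[str], target: str) -> int:
    """'Upgraded' module: also accepts entries from a small synonym table."""
    synonyms = {"sofa": {"sofa", "couch"}}
    accepted = synonyms.get(target, {target})
    return sum(1 for label in labels if label in accepted)


def count_program(modules: Dict[str, Detector], labels: List[str]) -> int:
    """Stands in for a generated program; it only knows the module's name."""
    return modules["detect"](labels, "sofa")


labels = ["couch", "sofa", "lamp"]
before = count_program({"detect": exact_detector}, labels)
after = count_program({"detect": synonym_detector}, labels)
print(before, after)  # → 1 2
```

The program text is identical in both calls; only the module bound to `"detect"` changes, which is the sense in which improvements to underlying models carry over without rewriting the reasoning code.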