Flux.1-dev on 24GB VRAM OOM #1971

Open
CarstenHoyer opened this issue Sep 25, 2024 · 0 comments
I have this predict function:

from typing import Any

import torch
from diffusers import FluxPipeline


def predict(self) -> Any:
    """Run a single prediction on the model."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Report total VRAM in GiB (prints 23 on this RTX 4090).
    vram = int(torch.cuda.get_device_properties(0).total_memory / (1024 * 1024 * 1024))
    print("VRAM", vram)

    # flux_path points at the Flux.1-dev checkpoint (defined elsewhere in the predictor).
    pipe = FluxPipeline.from_pretrained(flux_path, torch_dtype=torch.bfloat16).to(device)
    pipe.enable_model_cpu_offload()

    prompt = "A cat holding a sign that says hello world"
    image = pipe(
        prompt,
        height=1024,
        width=1024,
        guidance_scale=3.5,
        num_inference_steps=50,
        max_sequence_length=512,
        generator=torch.Generator("cpu").manual_seed(0),
    ).images[0]
    image.save("flux-dev.png")
    return "flux-dev.png"

I have 24 GB of VRAM (the vram variable reports 23) on an NVIDIA GeForce RTX 4090.

But when I run sudo cog predict --setup-timeout 3600, I get an out-of-memory error. Flux should be able to run in about 22 GB, so I wonder if it is something related to cog/WSL/Docker?
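
One thing I am not sure about: calling .to(device) may move the entire bfloat16 pipeline onto the GPU before enable_model_cpu_offload() can take effect, so the offload would never actually happen. A minimal sketch of a loading path I could try instead, assuming that is the cause (the checkpoint id here is just an example in place of my flux_path):

import torch
from diffusers import FluxPipeline

# Example checkpoint id; substitute the actual flux_path.
flux_path = "black-forest-labs/FLUX.1-dev"

# Load on CPU; do NOT call .to("cuda") when using CPU offload.
pipe = FluxPipeline.from_pretrained(flux_path, torch_dtype=torch.bfloat16)

# Let diffusers move each sub-model to the GPU only while it runs,
# returning it to CPU RAM afterwards, which lowers peak VRAM.
pipe.enable_model_cpu_offload()

# If that still OOMs, sequential offload streams weights piece by
# piece for an even smaller footprint, at a large speed cost:
# pipe.enable_sequential_cpu_offload()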
