
How to configure accelerate on 2 Mac machines #3356

Open
hsoftxl opened this issue Jan 20, 2025 · 4 comments

Comments

@hsoftxl

hsoftxl commented Jan 20, 2025

https://huggingface.co/docs/accelerate/usage_guides/distributed_inference

I used accelerate config, and when I run the model it hangs and then fails with an error saying it cannot connect to the IP and port.
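
For context, the multi-machine section of the accelerate config file (default_config.yaml) typically contains fields like the following — a sketch with placeholder values; the exact fields depend on your accelerate version:

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_CPU    # on Macs, multi-node setups typically fall back to the CPU/gloo backend
num_machines: 2
machine_rank: 0                # 0 on the head node, 1 on the other machine
main_process_ip: 192.168.1.10  # placeholder: the head node's IP, reachable from both machines
main_process_port: 29500
num_processes: 2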

Can anyone help me?

@BenjaminBossan
Member

With so little information, we cannot figure out the issue. Please follow the instructions for reporting bugs and provide the missing information.

@hsoftxl
Author

hsoftxl commented Jan 21, 2025

System Info

- macOS Sequoia 15.2
- Mac Studio M2 Ultra, 192 GB RAM

Information
I have 4 Mac computers, each with 192GB of memory. I want to use these 4 Macs to run the Falcon 180B model. I configured distributed training using accelerate config, but when I run the script, each machine always loads all the layers of the model. How can I load different parts of the model on different machines?

My script

from accelerate import Accelerator
from accelerate.utils import gather_object
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch, time

accelerator = Accelerator()

# 10*10 prompts. Source: https://www.penguin.co.uk/articles/2022/04/best-first-lines-in-books
prompts_all = [
    "The King is dead. Long live the Queen.",
    "Once there were four children whose names were Peter, Susan, Edmund, and Lucy.",
    "The story so far: in the beginning, the universe was created.",
    "It was a bright cold day in April, and the clocks were striking thirteen.",
    "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.",
    "The sweat wis lashing oafay Sick Boy; he wis trembling.",
    "124 was spiteful. Full of Baby's venom.",
    "As Gregor Samsa awoke one morning from uneasy dreams he found himself transformed in his bed into a gigantic insect.",
    "I write this sitting in the kitchen sink.",
    "We were somewhere around Barstow on the edge of the desert when the drugs began to take hold.",
] * 10

# load a base model and tokenizer
model_path = "tiiuae/falcon-180B-chat"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map={"": accelerator.process_index},
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# sync all processes and start the timer
accelerator.wait_for_everyone()
start = time.time()

# divide the prompt list among the available processes
with accelerator.split_between_processes(prompts_all) as prompts:
    # store output of generations in a dict
    results = dict(outputs=[], num_tokens=0)

    # have each process run inference, prompt by prompt
    for prompt in prompts:
        prompt_tokenized = tokenizer(prompt, return_tensors="pt").to("mps")
        output_tokenized = model.generate(**prompt_tokenized, max_new_tokens=100)[0]

        # remove the prompt from the output
        output_tokenized = output_tokenized[len(prompt_tokenized["input_ids"][0]):]

        # store outputs and number of tokens in results
        results["outputs"].append(tokenizer.decode(output_tokenized))
        results["num_tokens"] += len(output_tokenized)

    results = [results]  # wrap in a list, otherwise gather_object() will not collect correctly

# collect results from all processes
results_gathered = gather_object(results)

if accelerator.is_main_process:
    timediff = time.time() - start
    num_tokens = sum(r["num_tokens"] for r in results_gathered)

    print(f"tokens/sec: {num_tokens // timediff}, time {timediff}, total tokens {num_tokens}, total prompts {len(prompts_all)}")

Tasks

accelerate launch test.py
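
For a multi-machine run, each machine typically needs matching launch flags — a sketch, with 192.168.1.10 as a placeholder for the head node's IP:

# on machine 0 (the head node)
accelerate launch --num_machines 2 --machine_rank 0 \
    --main_process_ip 192.168.1.10 --main_process_port 29500 \
    --num_processes 2 test.py

# on machine 1
accelerate launch --num_machines 2 --machine_rank 1 \
    --main_process_ip 192.168.1.10 --main_process_port 29500 \
    --num_processes 2 test.py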

@hsoftxl
Author

hsoftxl commented Jan 21, 2025

@BenjaminBossan thanks

@BenjaminBossan
Member

Thanks for the additional info, we're still missing the accelerate env output.

From what you shared, this looks like multi-node inference to me. AFAIK, this is not supported out of the box. Typically, people also use some framework to handle multiple nodes, but it's not clear to me whether those work with Macs (I'm not a Mac user).

As to the specific problem of avoiding loading the whole model on each node, did you check out the docs for big model inference?
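
For reference, the big model inference API on a single machine looks roughly like this — a sketch; device_map="auto" spreads the layers across the devices and memory of one machine, it does not shard the model across nodes:

import torch
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate place the model's layers across the
# available devices and RAM of a single machine, offloading to disk if
# needed; it does not split the model across multiple machines.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-180B-chat",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)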
