[Feature request] Compatibility with transformers>=4.43.2 #65

Open
SiriuslySirius opened this issue Jul 31, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@SiriuslySirius

Hello, I am currently working with the new LLaMA 3.1 models from Meta, and they require newer versions of transformers, optimum, and accelerate. I ran into compatibility issues with XTTS regarding the transformers version.

I personally use the inference streaming feature, and that's where I am having issues.

Here is an error log I got:

Traceback (most recent call last):
  File "C:\Users\eyein\OneDrive\Desktop\Files\Discord Bots\JenEva-3.0\cogs\rt_tts_cog.py", line 501, in text_to_speech
    for j, chunk in enumerate(chunks):
  File "C:\Users\eyein\miniconda3\envs\JenEva\Lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
    response = gen.send(None)
               ^^^^^^^^^^^^^^
  File "C:\Users\eyein\miniconda3\envs\JenEva\Lib\site-packages\TTS\tts\models\xtts.py", line 657, in inference_stream
    gpt_generator = self.gpt.get_generator(
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\eyein\miniconda3\envs\JenEva\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 602, in get_generator
    return self.gpt_inference.generate_stream(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\eyein\miniconda3\envs\JenEva\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\eyein\miniconda3\envs\JenEva\Lib\site-packages\TTS\tts\layers\xtts\stream_generator.py", line 117, in generate
    - [~generation.BeamSampleDecoderOnlyOutput]
                             ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\eyein\miniconda3\envs\JenEva\Lib\site-packages\transformers\generation\utils.py", line 489, in _prepare_attention_mask_for_generation
    torch.isin(elements=inputs, test_elements=pad_token_id).any()
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: isin() received an invalid combination of arguments - got (elements=Tensor, test_elements=int, ), but expected one of:
 * (Tensor elements, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Number element, Tensor test_elements, *, bool assume_unique, bool invert, Tensor out)
 * (Tensor elements, Number test_element, *, bool assume_unique, bool invert, Tensor out)

ERROR: None
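
For context, here is a minimal sketch of the failure as I read it from the TypeError above (so treat the diagnosis as an assumption): torch.isin only accepts a scalar under the keyword test_element, so passing a plain int as test_elements= matches none of the overloads, while wrapping the pad token id in a tensor works.

import torch

inputs = torch.tensor([[1, 2, 0, 0]])
pad_token_id = 0

# Fails exactly as in the traceback above:
# torch.isin(elements=inputs, test_elements=pad_token_id)

# Works: pass the pad token id as a tensor instead of a plain int.
mask = torch.isin(elements=inputs, test_elements=torch.tensor(pad_token_id))
print(mask)  # tensor([[False, False,  True,  True]])
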
@eginhard added the bug label on Jul 31, 2024
@eginhard
Member

Yes, also reported in #59 (comment). The streaming code unfortunately relies a lot on internals of the transformers library, so it can break at any time. Best would probably be to pin a specific version that works.
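
For the pinning route, a minimal sketch of what that constraint could look like (the exact bound is a guess based on versions reported elsewhere in this thread, where 4.42.x still works and 4.43+ breaks the streaming path):

transformers>=4.42,<4.43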

Could you share which exact package requires the latest transformers version?

@SiriuslySirius
Author

SiriuslySirius commented Jul 31, 2024

> Yes, also reported in #59 (comment). The streaming code unfortunately relies a lot on internals of the transformers library, so it can break at any time. Best would probably be to pin a specific version that works.
>
> Could you share which exact package requires the latest transformers version?

It's not a specific package so much as a requirement for running Meta's latest LLaMA release, LLaMA 3.1, which uses the transformers library; the latest version of transformers is recommended for it. Right now I am running the newest version that Coqui TTS allows, which works fine, but I get a lot of warnings about deprecated implementations in the transformers library.

@SiriuslySirius
Author

Yeah, I'm currently trying out Google's Gemma 2 LLM, and this is going to be an issue for those who are doing LLM + XTTS. Gemma 2 requires a newer version of transformers because version 4.40.2 doesn't recognize it. So we're left with a choice: be less flexible about which LLMs we can use, or drop XTTS completely.

@eginhard
Member

eginhard commented Aug 1, 2024

It would be helpful if you shared what package/repo/code you're running, so we're aware of how Coqui is used and how it is affected by external changes. But for this kind of use case, the best solution is probably to put the TTS and the LLM into separate environments, so that their dependencies don't affect each other.
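
To illustrate the separate-environments idea, here is a rough sketch (not from this repo; the tts_worker.py script and the interpreter path are hypothetical placeholders): run XTTS in a worker process that uses the TTS environment's own Python interpreter and stream raw audio bytes back over stdout, so the LLM environment never has to share a transformers version with the TTS.

import subprocess

TTS_PYTHON = "/path/to/tts-venv/bin/python"  # interpreter of the TTS-only environment

def stream_tts_audio(text: str, chunk_size: int = 4096):
    """Yield raw audio bytes from a hypothetical tts_worker.py running in the TTS venv."""
    proc = subprocess.Popen(
        [TTS_PYTHON, "tts_worker.py", "--text", text],
        stdout=subprocess.PIPE,
    )
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        proc.wait()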

@SiriuslySirius
Author

SiriuslySirius commented Aug 1, 2024

For my current use case, I am using Nextcord for my Discord bot, and I have the TTS and the LLM running in the same "cog", which is Nextcord's way of isolating groups of bot features into their own modules for the sake of modularity. Separating XTTS from my LLM would require a bit of an architectural change to my private codebase, and it would add some latency between the two modules, which is not ideal for a real-time application. Everything runs locally on my machine.

The issue is mainly incompatibility between the versions of transformers required to run newer local open-source LLMs and XTTS.

I'm using inference streaming normally by passing text into the text input parameter as written in the docs for XTTS V2.
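
For reference, this is roughly the streaming pattern from the XTTS v2 docs that I follow (checkpoint paths and the reference wav are placeholders):

import torch
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/")
model.cuda()

# Conditioning latents computed from a reference speaker clip.
gpt_cond_latent, speaker_embedding = model.get_conditioning_latents(audio_path=["reference.wav"])

# inference_stream yields audio chunks as they are generated.
chunks = model.inference_stream(
    "Text to speak goes here.",
    "en",
    gpt_cond_latent,
    speaker_embedding,
)
wav = torch.cat([chunk for chunk in chunks], dim=0)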

@SiriuslySirius
Author

SiriuslySirius commented Aug 3, 2024

> Yes, also reported in #59 (comment). The streaming code unfortunately relies a lot on internals of the transformers library, so it can break at any time. Best would probably be to pin a specific version that works.
>
> Could you share which exact package requires the latest transformers version?

I tried the patch (https://github.com/h2oai/h2ogpt/blob/52923ac21a1532983c72b45a8e0785f6689dc770/docs/xtt.patch) mentioned in that thread and it worked.

@timwillhack

Just throwing this in here because I ran into another set of models that relies on 4.43: Microsoft Phi-3.5-mini-instruct, which is apparently very decent for how small it is. I spent a day having GPT-4o help me make Coqui streaming work with transformers 4.43, and it did: I got it to output voice from text! But it added code that caused my VRAM to spike, and I'm not familiar enough with neural net code to figure out what it did wrong. Python is also not my strong suit!

@SiriuslySirius
Author

> Just throwing this in here because I ran into another set of models that relies on 4.43: Microsoft Phi-3.5-mini-instruct, which is apparently very decent for how small it is. I spent a day having GPT-4o help me make Coqui streaming work with transformers 4.43, and it did: I got it to output voice from text! But it added code that caused my VRAM to spike, and I'm not familiar enough with neural net code to figure out what it did wrong. Python is also not my strong suit!

It would help to see your streaming implementation, to tell whether it's the problem. It could also be the LLM if you are running it locally; some LLMs do spike in VRAM usage as you use them, especially if you feed them context like a chat history.

@timwillhack

I'm just using the xtts/stream_generator.py script. I haven't tried Phi-3.5 because it relies on transformers 4.43, but Coqui only works up to 4.42.4 or so right now. When I ran the GPT-modified script (with transformers 4.43 installed) it wasn't using any other models, so the spike in VRAM was just related to the changes it made (I'm guessing). It was pretty ugly looking, to be honest.

@ajkessel

ajkessel commented Oct 8, 2024

+1 for this

There is some transformers code that breaks on the Mac M1 family, specifically this:

        if inputs.device.type == "mps":
            # mps does not support torch.isin (https://github.com/pytorch/pytorch/issues/77764)
            raise ValueError(
                "Can't infer missing attention mask on `mps` device. Please provide an `attention_mask` or use a different device."
            )

This appears to be fixed in more recent transformers releases but can't be leveraged by coqui-ai-tts due to incompatibility.
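
As a generic workaround at the transformers level (an assumption on my part, not something the XTTS streaming path exposes directly), the mask can be built without torch.isin and passed in explicitly as attention_mask, so nothing has to be inferred on the mps device:

import torch

def explicit_attention_mask(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # 1 for real tokens, 0 for padding; a plain comparison works on mps, unlike torch.isin.
    return (input_ids != pad_token_id).long()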

@DrewThomasson

I would also greatly appreciate mps (Apple Silicon) speedup for XTTS inference 🥺
