Building multimodal systems #1536
Closed
AmirHussein96
started this conversation in
Ideas
It shouldn't be too hard, since Lhotse recently added support for loading text and multimodal data. See lhotse-speech/lhotse#1295 for details.
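For illustration only, here is a rough sketch of what batching speech, text, and image data together involves. It is plain Python and does not use the Lhotse or Icefall APIs; `MultimodalExample`, `pad`, and `collate` are hypothetical names made up for this example, and a real recipe would likely rely on the data loading described in the PR above instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MultimodalExample:
    """Hypothetical container; not a Lhotse or Icefall class."""
    audio: List[float]            # mono waveform samples
    text: str                     # transcript or caption
    image: Optional[List[float]]  # flattened pixel values, None if missing


def pad(seq: List[float], length: int, value: float = 0.0) -> List[float]:
    """Right-pad seq with value until it has the requested length."""
    return seq + [value] * (length - len(seq))


def collate(batch: List[MultimodalExample]) -> Dict[str, list]:
    """Pad each modality to the longest item and record what is real."""
    max_audio = max(len(ex.audio) for ex in batch)
    max_image = max(len(ex.image or []) for ex in batch)
    return {
        "audio": [pad(ex.audio, max_audio) for ex in batch],
        "audio_lens": [len(ex.audio) for ex in batch],
        "text": [ex.text for ex in batch],
        "image": [pad(ex.image or [], max_image) for ex in batch],
        # False marks examples that have no image at all.
        "image_mask": [ex.image is not None for ex in batch],
    }


batch = collate([
    MultimodalExample(audio=[0.1, 0.2, 0.3], text="a cat", image=[1.0, 0.5]),
    MultimodalExample(audio=[0.4], text="a dog", image=None),
])
```

The lengths and the mask are kept alongside the padded data so that a model can ignore the padding and skip examples where the image modality is absent.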
How difficult would it be to build a multimodal system that integrates text, speech, and images using Icefall? Icefall and Lhotse currently support speech and text; how much work would it take to extend them to handle images as well?