Check what to do to bring in LLM models #940
I've looked into building each runtime for Android, and here's what I found out:

**TFLite**

Google's warning about AI-Edge-Torch being experimental is quite the understatement. After much struggling I ended up installing the nightly versions of ai-edge-torch and ai-edge-quantizer, which were released the same day as the last edit to the llama example was made. That was the only way to get the thing to function at all.

**Edit:** I managed to get a model loaded by using …

**ExecuTorch**

ExecuTorch seemed to have the most support and the most robust examples behind it.

**Edit:** I was able to get ExecuTorch working. The CMake error was resolved by removing my local gflags package; it seems it was conflicting with ExecuTorch's gflags in the venv. ExecuTorch has quite a few steps to get the … Getting the models wasn't very straightforward either, since they needed to be converted from … Running the app required upgrading Gradle to 8.5, which Android Studio thankfully made quite straightforward; once that was done, the app compiled and ran without issue. I attempted to use the unquantized model first. Upon pressing …

This was done using the XNNPACK backend; ExecuTorch also supports the Qualcomm and MediaTek AI engines, which I did not test.

**llama.cpp**

Working with llama.cpp has been the most straightforward: the app compiled and launched without issue, a couple of line changes let me download whichever custom GGUF model I wanted, and the models actually functioned (they were, however, extremely slow on my Galaxy S8). There was an issue where manually copying the models onto my phone left them unreadable, but that was fixed by having the phone download the models itself instead of transferring them from my PC.

**Conclusion**

My priority was TFLite because it's what we already use, but it seems that both MediaPipe and AI-Edge-Torch are still in their infancy and could cause problems when we try to integrate them into the mobile app.
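For reference, the ExecuTorch export path looks roughly like this. This is a minimal sketch, assuming the `torch.export` → `to_edge` → XNNPACK-partitioner pipeline from the ExecuTorch docs; `TinyModel` is a placeholder for the real LLM graph, and the exact module paths may differ between versions:

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge

# Placeholder model standing in for the actual LLM.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = TinyModel().eval()
sample_inputs = (torch.randn(1, 16),)

# 1. Capture the model graph with torch.export.
exported = torch.export.export(model, sample_inputs)

# 2. Lower to the Edge dialect and delegate supported ops to XNNPACK.
edge = to_edge(exported).to_backend(XnnpackPartitioner())

# 3. Serialize to the .pte file that the Android runtime loads.
with open("tiny_model.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)
```

The resulting `.pte` file is what the Android demo app loads at runtime.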
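For llama.cpp, a quick way to sanity-check a GGUF model on desktop before pushing it to the phone is the llama-cpp-python bindings. This is the Python wrapper rather than the Android app's own code, and the file name below is a placeholder:

```python
from llama_cpp import Llama

# "model.q4_0.gguf" is a placeholder; any GGUF file should work.
llm = Llama(model_path="model.q4_0.gguf", n_ctx=512)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])
```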
For model conversion with AI-Edge-Torch, see my notes at google-ai-edge/ai-edge-torch#269 (comment).
I tried getting AI-Edge-Torch to work again, this time inside a Docker container. The conversion seemed to run without issue until the process was killed, apparently for lack of RAM (my machine only has 32 GB). With this setup a local CPU can be used to convert the models; it's just extremely finicky and easily prone to breaking.
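For what it's worth, the conversion call itself is small; it's capturing and lowering the graph that eats the RAM. A minimal sketch, assuming the `convert`/`export` entry points from the ai-edge-torch README, with a small torchvision model as a stand-in for the LLM:

```python
import torch
import torchvision
import ai_edge_torch

# Small stand-in model; the llama example goes through the same
# convert/export path, but its much larger graph is where the RAM
# pressure comes from.
model = torchvision.models.resnet18(weights=None).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

edge_model = ai_edge_torch.convert(model, sample_inputs)
edge_model.export("resnet18.tflite")
```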
@Mostelk: there is a GPT-2 Android TFLite app: https://github.com/huggingface/tflite-android-transformers/tree/master/gpt2
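Before wiring a converted model into an Android app like that one, it can be sanity-checked on desktop with the TFLite Python interpreter. A minimal sketch; `gpt2.tflite` is a placeholder file name:

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="gpt2.tflite")  # placeholder path
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input shaped and typed to whatever the model declares.
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```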
What can we do for the default backend?