Exploring efficient inference with ONNX models for practical applications and pipelines
- onnxHelpers/onnxBenchmark.py = script to convert a PyTorch model to ONNX, quantize FP32 ONNX models to INT8, and run benchmark inference on an AMD Ryzen AI processor
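
The benchmarking step can be sketched roughly as below. This is not the code in onnxBenchmark.py, only a minimal illustration of timing repeated inference calls after a warm-up. The helper names (`benchmark`, `make_onnx_runner`), the default provider, and the iteration counts are assumptions made for the example; `onnxruntime` is imported lazily so the timing helper also works on its own.

```python
import statistics
import time


def benchmark(run_once, warmup=5, iters=50):
    """Time a zero-argument callable and summarise its latency in milliseconds."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    # Warm-up runs are discarded so one-time setup costs do not skew the numbers.
    for _ in range(warmup):
        run_once()
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    # Simple index-based approximation of the 95th percentile.
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "iters": iters,
        "mean_ms": statistics.fmean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
    }


def make_onnx_runner(model_path, feed, providers=("CPUExecutionProvider",)):
    """Hypothetical helper: wrap an ONNX Runtime session as a zero-argument callable.

    `feed` maps input names to numpy arrays. The provider list is an assumption;
    pass whichever execution provider the target device exposes.
    """
    import onnxruntime as ort  # lazy import: only needed for real models

    session = ort.InferenceSession(model_path, providers=list(providers))
    return lambda: session.run(None, feed)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a model file.
    print(benchmark(lambda: sum(range(10_000)), warmup=2, iters=20))
```

To compare an FP32 model with its INT8 counterpart, one would call `benchmark(make_onnx_runner(path, feed))` once per model file with the same `feed` and compare the reported latencies; the file paths and input feed are left to the reader.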
Implemented a custom AI Recall feature, similar to the Microsoft Windows AI Recall feature, that runs locally: the Phi-3 Vision model describes and analyses screenshots, and the Phi-3 Mini model renames the screenshots based on the image description generated by the vision model.
The filenames and the chunked descriptions are stored in a simple database for Retrieval-Augmented Generation (RAG). Given a user query, the descriptions most similar to the query are retrieved along with the filenames of the associated screenshots. The Phi-3 models have been tested on the CPU.
Example Run 1:
-
Best Result:
-
Once the descriptions have been added to the database, subsequent retrievals are quick (the test screenshots and the database are saved in results/aiRecall/snapshots; these screenshots are not very diverse).
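
The store-and-retrieve flow described above can be illustrated with a small self-contained sketch. It is not the project's implementation: the `SnapshotStore` class, the chunk size, and the example filenames are hypothetical, and a word-count vector with cosine similarity stands in for whatever embedding model and database the real scripts use.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split a description into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SnapshotStore:
    """In-memory 'database' of (filename, chunk, vector) rows."""

    def __init__(self):
        self.rows = []

    def add(self, filename, description, max_words=40):
        # Each chunk keeps a reference to the screenshot it came from.
        for chunk in chunk_text(description, max_words):
            self.rows.append((filename, chunk, embed(chunk)))

    def search(self, query, top_k=3):
        """Return up to top_k (score, filename, chunk) tuples, best match first."""
        query_vec = embed(query)
        scored = [(cosine(query_vec, vec), filename, chunk)
                  for filename, chunk, vec in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    store = SnapshotStore()
    store.add("editor_python_traceback.png",
              "A code editor showing a Python traceback in the integrated terminal")
    store.add("browser_weather_forecast.png",
              "A web browser displaying a weekly weather forecast")
    for score, filename, chunk in store.search("python error in terminal", top_k=1):
        print(filename, chunk)
```

Because descriptions are embedded once when they are added, a later query only has to embed the query text and score it against the stored vectors, which is consistent with retrievals being quick after the initial indexing.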
- Scripts to run a Stable Diffusion pipeline, currently running on DirectML-supported devices