Vraspar/phi 3 ios update #467

Open · wants to merge 2 commits into main
Conversation

@vraspar (Contributor) commented Oct 3, 2024

Update the Phi-3 iOS application:

  • Modify the README and add a sample screenshot
  • Add logging and display token generation stats
  • Use user input instead of a fixed prompt
  • UI improvements

@edgchen1 (Contributor) left a comment

nice improvements!

@@ -1,27 +1,124 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
Contributor

should we keep the copyright notice?

let totalTime = userInfo["totalTime"] as? Int,
let firstTokenTime = userInfo["firstTokenTime"] as? Int,
let tokenCount = userInfo["tokenCount"] as? Int {
stats = "Generated \(tokenCount) tokens in \(totalTime) ms. First token in \(firstTokenTime) ms."
Contributor

could we also include something like this:

token generation rate (tokens/second) = (tokenCount - 1) * 1000 / (totalTimeInMs - firstTokenTimeInMs)

the token generation and prompt processing rates may be different, and it might be useful to get a sense of both.
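In code, that could look something like this (sketch only; the variable names are assumed to match the millisecond/count values this PR already collects and passes to the UI):

// Sketch: generation rate excluding the first token, whose latency is
// dominated by prompt processing rather than steady-state decoding.
double generationTokensPerSec = 0.0;
if (tokenCount > 1 && totalTimeMs > firstTokenTimeMs) {
    generationTokensPerSec =
        (tokenCount - 1) * 1000.0 / static_cast<double>(totalTimeMs - firstTokenTimeMs);
}
NSLog(@"Generation rate: %.2f tokens/s", generationTokensPerSec);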

Contributor

Would be nice to have prompt tokens/s as well, given that prompt and generation tokens/s are the usual metrics that get compared. That would require something to return the number of tokens in the prompt (probably the length of the input sequence post-tokenization), since IIRC it's not 1:1 with the number of words.
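As a rough sketch (assuming OgaSequences::SequenceCount returns the encoded length, and that firstTokenTimeMs is the value already collected in this PR):

// After the existing tokenizer->Encode(prompt, *sequences) call:
size_t promptTokenCount = sequences->SequenceCount(0);
// Time to first token is roughly the prompt-processing time, so an
// approximate prompt rate would be:
double promptTokensPerSec =
    firstTokenTimeMs > 0 ? promptTokenCount * 1000.0 / firstTokenTimeMs : 0.0;
NSLog(@"Prompt: %zu tokens, ~%.2f tokens/s", promptTokenCount, promptTokensPerSec);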


auto sequences = OgaSequences::Create();
tokenizer->Encode(prompt, *sequences);
typedef std::chrono::high_resolution_clock Clock;
Contributor

nit: consider steady_clock

https://en.cppreference.com/w/cpp/chrono/steady_clock
"This clock is not related to wall clock time (for example, it can be time since last reboot), and is most suitable for measuring intervals."

// Log model creation
NSLog(@"Creating model ...");
auto model = OgaModel::Create(modelPath);
if (!model) {
Contributor

the C++ API throws exceptions.

https://github.com/microsoft/onnxruntime-genai/blob/bcf55a6dc563bc8b356128b47504d59a21c5ef2f/src/ort_genai.h#L54-L60

https://github.com/microsoft/onnxruntime-genai/blob/bcf55a6dc563bc8b356128b47504d59a21c5ef2f/src/ort_genai.h#L65

might be simplest to just put most of this method into a try {}. propagating the error back to the UI would be a nice touch.
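Something along these lines (sketch only; how the error reaches the UI depends on whatever notification/callback mechanism the app already uses):

try {
    auto model = OgaModel::Create(modelPath);
    auto tokenizer = OgaTokenizer::Create(*model);
    // ... encode the prompt, run the generation loop, post stats ...
} catch (const std::exception& e) {
    // The C++ API throws on failure rather than returning null.
    NSLog(@"Generation failed: %s", e.what());
    // e.g. post a notification carrying e.what() so the SwiftUI layer can show an alert.
}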


// Log model creation
NSLog(@"Creating model ...");
auto model = OgaModel::Create(modelPath);
Contributor

performance-wise, it's probably nicer to not re-create the model every time.

Contributor

it should definitely be nicer as the first request is usually the slowest (and typically when doing perf testing we do a warmup query first and exclude that from timing data).
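A sketch of that warmup idea; runGeneration is a hypothetical stand-in for this app's existing generation entry point:

// Run one short, untimed generation at startup so one-time costs (weight
// loading, allocator warmup, etc.) don't land on the first user request.
static std::once_flag warmupOnce;
std::call_once(warmupOnce, [] {
    runGeneration(/*prompt=*/"hello", /*reportStats=*/false);  // hypothetical helper
});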

Contributor

❤️

Comment on lines +29 to +34
NSLog(@"Creating tokenizer...");
auto tokenizer = OgaTokenizer::Create(*model);
if (!tokenizer) {
NSLog(@"Failed to create tokenizer.");
return;
}
Contributor

The tokenizer could also be created once and re-used as I believe it's tied to the model not the prompt so can be re-used.
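One way that could look (sketch): keep both as lazily created statics, since they depend only on the model files and not on the prompt:

// Created once, reused for every request.
static std::unique_ptr<OgaModel> model;
static std::unique_ptr<OgaTokenizer> tokenizer;
if (!model) {
    model = OgaModel::Create(modelPath);
    tokenizer = OgaTokenizer::Create(*model);
}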

Comment on lines +66 to +67
NSLog(@"First token generated.");
firstTokenTime = Clock::now();
Contributor

nit: logging takes time so we want to record firstTokenTime prior to logging.

might also be interesting to include per-token timing in the log as that would help get a picture of performance throughout the generation phase. e.g. is the time per token consistent? if not is the variability random or does it gradually increase/decrease? that can provide hints as to potential causes of performance issues (if there are any).

if we do that, we might want to accumulate per-token times in a list and log them at the end, as inserting log calls for every token inside the loop could significantly affect the overall time taken (esp. if the log call is synchronous, which apparently NSLog is).
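Roughly (sketch, reusing the Clock typedef from this file):

// Record the timestamp before any logging, and accumulate per-token times
// so NSLog only runs after generation finishes.
std::vector<double> tokenTimesMs;
auto lastTokenTime = Clock::now();

// inside the generation loop, immediately after a token is produced:
auto now = Clock::now();
tokenTimesMs.push_back(std::chrono::duration<double, std::milli>(now - lastTokenTime).count());
lastTokenTime = now;

// after the loop:
for (size_t i = 0; i < tokenTimesMs.size(); ++i) {
    NSLog(@"token %zu: %.2f ms", i, tokenTimesMs[i]);
}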

break;
}

NSLog(@"Decoded token: %s", decode_tokens);
Contributor

nit: might want a setting to control whether we log inside the loop (if we're after the best perf numbers possible we probably don't want to do that), or a way to exclude time in calls to NSLog from the total.
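For example (sketch; kLogEachToken is a made-up name):

// Compile-time (or user-facing) switch so benchmark runs can skip per-token logging.
static constexpr bool kLogEachToken = false;
if (kLogEachToken) {
    NSLog(@"Decoded token: %s", decode_tokens);
}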
