Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support payload input in GAP #219

Open
wants to merge 7 commits into
base: feature-request-level-timing
Choose a base branch
from

Conversation

debermudez
Copy link
Contributor

@debermudez debermudez commented Dec 13, 2024

This PR creates a

  • new input retreiver PayloadInputRetriever to handle payload data
  • modifies existing converters to handle timestamp and other optional_data from the payload
  • build --schedule arg to pass to PA
  • Update existing unit tests to handle payload input case

@debermudez debermudez force-pushed the dbermudez-add-timing-retriever branch from 10148f4 to ae3759c Compare January 3, 2025 23:58
@lkomali lkomali force-pushed the dbermudez-add-timing-retriever branch from ae3759c to 82a3726 Compare January 7, 2025 00:13
@lkomali lkomali force-pushed the dbermudez-add-timing-retriever branch 2 times, most recently from 82a3726 to 7f47060 Compare January 7, 2025 00:41
@lkomali lkomali force-pushed the dbermudez-add-timing-retriever branch from b5c5d71 to 3f6e35c Compare January 7, 2025 20:30
@lkomali lkomali marked this pull request as ready for review January 7, 2025 21:35
@lkomali lkomali changed the title add timing retriever GAP: Support payload input Jan 7, 2025
@lkomali lkomali requested a review from dyastremsky January 7, 2025 21:40
@lkomali lkomali changed the title GAP: Support payload input Support payload input in GAP Jan 7, 2025
Copy link
Contributor

@dyastremsky dyastremsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, Harshini! I appreciate you cleaning things up as you go too.

Left some comments.

genai-perf/genai_perf/export_data/json_exporter.py Outdated Show resolved Hide resolved
with open(str(filename), "w") as f:
f.write(json.dumps(self._stats_and_args, indent=2))

def _prepare_args_for_export(self) -> None:
self._args.pop("func", None)
self._args.pop("output_format", None)
self._args.pop("input_file", None)
self._args.pop("payload_input_file", None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside to the team: I didn't know this was here. We should create a cleaner way to do this. Like having a separate function with a list of args that are excluded, then popping them all. We could get refactor the code to get rid of all of the args that aren't necessary for the JSON exporter to make it cleaner.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

genai-perf/genai_perf/inputs/converters/base_converter.py Outdated Show resolved Hide resolved
@@ -56,6 +56,11 @@ def convert(
payload = {
"input": [{"type": "image_url", "url": img} for img in row.images]
}
request_body["data"].append({"payload": [payload]})
self._add_payload_params(payload, row.optional_data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this approach. Is there a way to wrap this and make it cleaner? This is going to make the converters less straightforward to write. You'll also need to update the customizable frontends tutorial to reflect any changes here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean for this converter? OR are you making a general statement about the way this was implemented? Just trying to understand the scope of the ask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the converters. I think this is the "hot path" for adding a new converter (i.e. customizable frontend). We want it to be as simple, concise, and intuitive as possible.

Any changes to this process also need to be documented in customizable_frontends.md.

if not filename.exists():
raise FileNotFoundError(f"The file '{filename}' does not exist.")

def _get_content_from_input_file(self, filename: Path) -> Union[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to organize these into the implemented ones vs the non-implemented ones? And perhaps alphabetize them within each of those sections for easy navigation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. This was also my mess. I agree that it needs to be cleaned up more.

genai-perf/genai_perf/wrapper.py Outdated Show resolved Hide resolved

if args.prompt_source == ic.PromptSource.PAYLOAD:
if args.concurrency or args.request_rate:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could you return this as a parser.error? That's more native to ArgParse and consistent with the other errors returned.


Filename: TypeAlias = str
TypeOfData: TypeAlias = str
ListOfData: TypeAlias = List[str]
DataRowDict: TypeAlias = Dict[TypeOfData, ListOfData]
DataRowDict: TypeAlias = Dict[str, Union[List[str], Dict[str, Any], str]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe loop in our alias whiz, @nv-braf? My guess though is that if it's this complex, we're doing some overloading with typing that's really unclean and masking other issues. I also would have no idea what's going on here from a quick review of the code.

I can't speak to the complexity over time, which we have tried to reduce, but if you look at the before/after, one of these is relatively straightforward and the other is relatively messy. Having three possible datatypes values for a DataRowDict (or anything) doesn't sit right with me. That's too much overloading. We could fix the underlying issue in a future refactor, theoretically, but it looks like this PR is creating the above issue so we should fix it now.

"""

files_data: Dict[str, FileData] = {}
input_file = cast(Path, self.config.payload_input_filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so it sounds like we need to clean this up somewhere. This is using latent knowledge of what the types should be, which is dangerous and hard to navigate as a code reader.

We can leave this given the other code already does this, but whenever we have a major refactor (e.g. for GA), we should look at removing these casts wherever possible. Ryan pointed this out to me in one of my reviews that I cleaned up: there was a type that was optional to make the code flow easier and he asked whether it ever should be optional; the answer was no, so even though it required more effort to think through the default, I swapped it to have the correct data type and only that type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants