
Write JSON state #997

Open · wants to merge 15 commits into master
Conversation

KaspariK (Member)

What

We store job and job run state as pickles. We would like to not do that. This is part of the path to not doing that by writing state data as JSON alongside our current pickles. Restoring from JSON will follow in another PR.

Why

In DAR-2328, removing the mesos-related code reset Tron's job state, causing job runs to restart from "0". The pickles couldn't be unpickled because unpickling relies on classes that were deleted along with the mesos code.
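The failure mode is worth pinning down: pickle serializes instances by reference (module path plus class name) and re-imports the class on load, so deleting a class breaks every existing pickle that points at it. A minimal sketch, using a hypothetical stand-in class name:

```python
import pickle

# Hypothetical stand-in for one of the deleted mesos-era classes.
class MesosClusterRepository:
    pass

blob = pickle.dumps(MesosClusterRepository())

# Simulate a deploy that removes the class from the codebase.
del MesosClusterRepository

try:
    restored = pickle.loads(blob)
except AttributeError as exc:
    # pickle stores "module.ClassName" and looks it up again on load,
    # so the lookup fails once the class is gone.
    error = str(exc)
    print(f"restore failed: {error}")
```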

nemacysts and others added 9 commits September 17, 2024 07:17
Right now we make at most 2N calls to the Tron API during config
deployments: N to get the current configs and at most N if all services
have changes.

To start, I'd like to reduce this to N by allowing GET /api/config to
return all the configs so that the only requests needed are POSTs for
changed configs.

Depending on how this goes, we can look into batching up the POSTs so
that we can also do that in a single request.

In terms of speed, it looks like loading all the configs from pnw-prod
(on my devbox) with this new behavior takes ~3s - which isn't great, but
there's a decent bit of file IO going on here :(
…d to_json methods for classes in DynamoDB restore flow. Write an additional attribute to DynamoDB to capture non-pickled state_data.
@KaspariK KaspariK marked this pull request as ready for review September 17, 2024 16:59
@nemacysts nemacysts (Member) left a comment

ty for adding all these types as well!

Comment on lines +204 to +211:

```python
@staticmethod
def to_json(state_data: dict) -> str:
    return json.dumps(
        {
            "status_path": state_data["status_path"],
            "exec_path": state_data["exec_path"],
        }
    )
```

should these to_json() functions be normal instance methods so that it's easier to add additional data to be serialized in the future? as-is, we'd need to track down any callers of to_json() for any modified classes and ensure that the state_data dict we build for those calls has any new fields

e.g.,

Suggested change:

```diff
-    @staticmethod
-    def to_json(state_data: dict) -> str:
-        return json.dumps(
-            {
-                "status_path": state_data["status_path"],
-                "exec_path": state_data["exec_path"],
-            }
-        )
+    def to_json(self) -> str:
+        return json.dumps(
+            {
+                "status_path": self.status_path,
+                "exec_path": self.exec_path,
+            }
+        )
```

i'm also debating whether or not we should have this return a dict and we call json.dumps() before saving, but that's probably not too big a change if we wanna do that later - and the current approach means that if something cannot be serialized to json, we'll get a better traceback :)
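The traceback point is easy to demonstrate: json.dumps() fails loudly at serialization time, naming the offending type directly (illustrative state_data below containing a non-serializable datetime):

```python
import datetime
import json

# Illustrative state_data containing a value json can't serialize.
state_data = {
    "status_path": "/tmp/status",
    "start_time": datetime.datetime(2024, 9, 17, 7, 17),
}

try:
    json.dumps(state_data)
except TypeError as exc:
    # Failing here, at save time, names the bad type directly instead
    # of surfacing later when the stored string is read back.
    msg = str(exc)
    print(msg)
```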


oh, i see - a lot of these have state_data() properties! i think that makes this less pressing since the calls will (i assume) look like `SomeClass.to_json(some_object.state_data)`

but it might still be nicer to switch since for things that do implement state_data() this would look like

Suggested change:

```diff
-    @staticmethod
-    def to_json(state_data: dict) -> str:
+    def to_json(self) -> str:
+        state_data = self.state_data
         return json.dumps(
             {
                 "status_path": state_data["status_path"],
                 "exec_path": state_data["exec_path"],
             }
         )
```

and for classes that don't have a state_data property we'd be in the scenario described above and would benefit from this no longer being a staticmethod :)
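Sketching what that instance-method shape could look like (the class and field names here are hypothetical, patterned on the snippet above):

```python
import json

class StatusFileRunner:  # hypothetical class, for illustration only
    def __init__(self, status_path: str, exec_path: str) -> None:
        self.status_path = status_path
        self.exec_path = exec_path

    @property
    def state_data(self) -> dict:
        return {"status_path": self.status_path, "exec_path": self.exec_path}

    def to_json(self) -> str:
        # Instance method: new fields added to state_data are picked up
        # here, without hunting down external call sites.
        state_data = self.state_data
        return json.dumps(
            {
                "status_path": state_data["status_path"],
                "exec_path": state_data["exec_path"],
            }
        )

runner = StatusFileRunner("/status", "/bin/runner")
print(runner.to_json())
```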


(another benefit would also be avoiding any potential circular imports in the future since we won't need to import classes just to call to_json() :)


oh, i see - when this gets called in tron/serialize/runstate/dynamodb_state_store.py we just have a key and a dict ;_;


hmm, maybe this is better for a post-pickle-deletion refactor where we update the simplified code to pass around the actual objects rather than state_data dicts

```python
end_time: Optional[datetime.datetime] = None,
run_state: str = SCHEDULED,
exit_status: Optional[int] = None,
attempts: Optional[list] = None,  # TODO: list of...ActionCommandConfig?
```

i think this is `Optional[List[ActionRunAttempt]]`

```python
run_state: str = SCHEDULED,
exit_status: Optional[int] = None,
attempts: Optional[list] = None,  # TODO: list of...ActionCommandConfig?
action_runner: Optional[Union[NoActionRunnerFactory, SubprocessActionRunnerFactory]] = None,
```

maybe one day we'll either remove NoActionRunnerFactory (i don't think we use it outside of tests) or we'll add an abstract base class or something with the expected interface so that we don't need a union here :p
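A rough sketch of the abstract-base-class idea, with a hypothetical create() interface (the real factories' interface may differ):

```python
from abc import ABC, abstractmethod
from typing import Optional

class ActionRunnerFactory(ABC):  # hypothetical shared interface
    @abstractmethod
    def create(self, run_id: str) -> Optional[str]:
        ...

class SubprocessActionRunnerFactory(ActionRunnerFactory):
    def create(self, run_id: str) -> Optional[str]:
        return f"subprocess runner for {run_id}"

class NoActionRunnerFactory(ActionRunnerFactory):
    def create(self, run_id: str) -> Optional[str]:
        return None

def build_runner(action_runner: ActionRunnerFactory, run_id: str) -> Optional[str]:
    # The parameter no longer needs
    # Union[NoActionRunnerFactory, SubprocessActionRunnerFactory].
    return action_runner.create(run_id)

print(build_runner(SubprocessActionRunnerFactory(), "job.0.action"))
```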

```diff
@@ -134,6 +154,7 @@ def _merge_items(self, first_items, remaining_items) -> dict:
         return deserialized_items

     def save(self, key_value_pairs) -> None:
+        log.debug(f"Adding to save queue: {key_value_pairs}")
```

would this be too noisy in production?

```diff
-        if val is not None:
-            self[key] = pickle.dumps(val)
+        if pickled_val is not None:
+            self.__setitem__(key, pickle.dumps(pickled_val), json_val)
```

just curious: how come we need __setitem__ now?


ah, i see - i think we probably want to keep the existing signature since __setitem__ is an established "dunder" method in python

what we'd probably need to do here instead is to type that the value for __setitem__ is going to be a tuple :)
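One way to keep the standard dunder signature is to make the value a (pickled_val, json_val) tuple; a minimal sketch with a hypothetical store class:

```python
import pickle
from typing import Tuple

class StateStore(dict):  # hypothetical store, for illustration only
    def __setitem__(self, key: str, value: Tuple[bytes, str]) -> None:
        # Keep the established __setitem__(key, value) signature; the
        # value is typed as a (pickled_val, json_val) tuple instead of
        # adding a third positional argument.
        super().__setitem__(key, value)

store = StateStore()
state = {"status_path": "/status"}
store["job_run_state myjob.0"] = (pickle.dumps(state), '{"status_path": "/status"}')
pickled_val, json_val = store["job_run_state myjob.0"]
print(pickle.loads(pickled_val), json_val)
```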

```python
def get_type_from_key(self, key: str) -> str:
    return key.split()[0]

def _serialize_item(self, key: str, state: Dict[str, Any]) -> str:
```

if this only accepts two values for key, i think we can write key: Literal[runstate.JOB_STATE, runstate.JOB_RUN_STATE] rather than key: str

Comment on lines +67 to +68:

```python
return sorted_groups  # type: ignore
return sorted_groups  # type: ignore
```

could we add comments for these ignores?

"json_val": {
"S": json_val[index * OBJECT_SIZE : min(index * OBJECT_SIZE + OBJECT_SIZE, len(json_val))]
},
"num_json_val_partitions": {

i always thought that this num_partitions stuff was something that dynamo required 🤣 - i guess this is just for our usage (to know how many partitions we're using per-item?)?
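For context: DynamoDB caps a single item at 400 KB, so a large state blob has to be split across items, and num_json_val_partitions tells the read path how many chunks to fetch and reassemble. The chunking arithmetic from the snippet above, sketched with a tiny OBJECT_SIZE:

```python
import math

OBJECT_SIZE = 4  # tiny chunk size for illustration; the real constant is far larger

def partition(json_val: str):
    """Split json_val into fixed-size chunks, mirroring the save path above."""
    num_partitions = max(1, math.ceil(len(json_val) / OBJECT_SIZE))
    chunks = [
        json_val[i * OBJECT_SIZE : min(i * OBJECT_SIZE + OBJECT_SIZE, len(json_val))]
        for i in range(num_partitions)
    ]
    return num_partitions, chunks

num, chunks = partition('{"a": 1}')
print(num, chunks)  # 2 ['{"a"', ': 1}']
```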
