Feature Request: Support for State Schema Versioning & Migration in LangGraph.js #536

Open
benjamincburns opened this issue Sep 30, 2024 · 4 comments

@benjamincburns

benjamincburns commented Sep 30, 2024

Background

Today, LangGraph lets me persist application state via Checkpointers and Stores. Once an application is deployed to real users, a history of persisted checkpoints will accumulate. However, LangGraph currently provides no built-in functionality for detecting or managing incompatible changes in the structure of this state over time.

As applications mature, it's common for the structure of the state to evolve. These changes can range from adding new fields and changing field types to restructuring entire objects. They may cause older persisted states to become incompatible with newer versions of the application, leading to failures when resuming workflows from checkpoints.

It's also common for users to store objects in their application state that aren't defined by their application. As one trivial example, most applications have a messages channel in their state that is of type BaseMessage[]. Over time, application maintainers will need to update their dependencies, which may in turn modify the structure of their persisted state. Ideally this will happen in a controlled manner, but occasionally these updates will need to be done with urgency, for example due to newly discovered security vulnerabilities in the dependencies that define these portions of the application state.

Problem Statement

There are two primary challenges that applications face when managing changes to persisted state:

  1. Identifying Incompatibility: Detecting when a persisted state no longer matches the structure expected by the running application. For example, if a new required field is added to the messages state, older checkpoints will lack this field, leading to runtime errors.

  2. Migrating Incompatible States: Providing a mechanism to transform these incompatible states into a structure that is compatible with the current application schema. This transformation might need to occur lazily, at the time the state is loaded, or proactively, when the schema changes.
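To make the first challenge concrete, here is a minimal sketch of a structural check run against a loaded state before it is used. The `ExpectedState` shape, the `userId` field, and the function names are all hypothetical and are not part of LangGraph.js; a real application would more likely use a schema validation library.

```typescript
// Hypothetical shape that the currently deployed application expects.
interface ExpectedState {
  messages: unknown[];
  userId: string; // assumed to be a newly added required field
}

// Returns a list of human-readable problems; an empty list means compatible.
function findIncompatibilities(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["state is not an object"];
  }
  const state = raw as Record<string, unknown>;
  const problems: string[] = [];
  if (!Array.isArray(state.messages)) {
    problems.push("messages must be an array");
  }
  if (typeof state.userId !== "string") {
    problems.push("userId must be a string");
  }
  return problems;
}

// Type guard built on the check above.
function isExpectedState(raw: unknown): raw is ExpectedState {
  return findIncompatibilities(raw).length === 0;
}
```

A checkpoint written before `userId` existed would fail this check at load time, which is the point at which a migration could be triggered instead of letting the graph fail later at runtime.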

These concerns are relevant not only for future versions of an application but also for cases where an application rollback occurs, which could introduce older versions of code interacting with newer states.

Proposed Solution

I believe there are lightweight ways LangGraph.js could address this:

  • Version-Tagged States: LangGraph.js could allow developers to tag either channels or state objects with a version identifier at the time of persistence. When the state is retrieved, LangGraph could expose lifecycle hooks or interceptors, allowing developers to implement custom migration logic before the state is used in the application.

  • Lazy Online Migration: By using these version tags, LangGraph could support a "roll-forward-only" migration model, where older states are updated to the latest version when accessed. This could be optional, allowing applications that don't need it to ignore the feature.

  • Schema Change Detection: Expose a mechanism that warns or throws an error when a checkpoint’s state doesn’t match the expected structure. This would give developers an opportunity to trigger appropriate migration logic.
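The three ideas above could fit together roughly as follows. This is only a sketch under assumptions: `VersionedState`, `CURRENT_VERSION`, the `migrations` table, and `migrateOnLoad` are hypothetical names, and the example field changes are invented for illustration. None of it is an existing LangGraph.js API.

```typescript
// State tagged with a schema version at persistence time (hypothetical shape).
interface VersionedState {
  version: number;
  data: Record<string, unknown>;
}

// Each migration upgrades data from version N to version N + 1.
type Migration = (data: Record<string, unknown>) => Record<string, unknown>;

const CURRENT_VERSION = 3;

const migrations: Record<number, Migration> = {
  // v1 -> v2: add a required `userId` field with a placeholder default.
  1: (data) => ({ ...data, userId: "unknown" }),
  // v2 -> v3: rename `history` to `messages`.
  2: ({ history, ...rest }) => ({ ...rest, messages: history ?? [] }),
};

// Roll-forward-only migration, applied lazily when a state is loaded.
function migrateOnLoad(saved: VersionedState): VersionedState {
  if (saved.version > CURRENT_VERSION) {
    // Schema change detection for the rollback case: older code, newer state.
    throw new Error(
      `State version ${saved.version} is newer than supported version ${CURRENT_VERSION}`,
    );
  }
  let { version, data } = saved;
  while (version < CURRENT_VERSION) {
    const step = migrations[version];
    if (!step) {
      throw new Error(`No migration registered from version ${version}`);
    }
    data = step(data);
    version += 1;
  }
  return { version, data };
}
```

Chaining single-step migrations keeps each change small and lets a state saved at any older version reach the current one, while a state that is already current passes through unchanged.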

While it's understandable that LangGraph.js might not want to impose prescriptive migration strategies (given the flexibility of its checkpointer system), these small, optional enhancements could benefit the majority of applications without constraining the flexibility that more advanced users might need.

Why This Matters

As applications scale and evolve, breaking changes to application state structures become inevitable. Without explicit support for detecting and handling these changes, developers must implement their own ad-hoc solutions, which can introduce bugs and operational complexity. Many other systems that persist application state—such as databases and event-sourcing architectures—include tools to help manage schema evolution. I believe adding even minimal support for this in LangGraph.js would greatly improve its suitability for long-lived, production-grade applications.

It's also important to think about the ecosystem that you want to grow up around LangGraph. Treating state schema versioning as a "first class citizen" in the API will mean that third party authors of generic checkpointers and stores will be more likely to provide a consistent level of support for these concerns, as state schema versioning will feel more like a required thing for these components.

@hgoona

hgoona commented Sep 30, 2024

I agree with all of this and have been thinking along similar lines: as useful as Checkpointers are, I can't help but feel they are a system for "play through" rather than "storage", i.e. I think the latter concern should be separable from the former, and reconnectable at will.

Specifically, if we've previously stored message history as individual messages (and threads containing messages), I feel there should be an on-ramp to take these previous "raw" or "unformatted" LLM messages and translate/upgrade them to the latest Checkpointer/State schema upon re-entering the graph at a specific node. Upgradability would obviously need addressing here too.

I don't believe this is already possible, is it??

@benjamincburns
Author

I don't believe this is already possible, is it??

The Checkpointer interface is fairly lightweight, and if you're authoring your own you can separate concerns however you like. There's nothing to say that you can't write a generic checkpointer with support for pluggable storage backends, interceptors/middleware, etc - if that's what you want to do.

I haven't attempted it yet, but from what I can see from reading the code there's also nothing that makes it impossible today for checkpointer authors to implement support for version tagging, lazy online migrations, etc (or various other versioning strategies). It's just that these things aren't explicitly supported or facilitated by the structure of the API, so everyone who implements their own checkpointer is going to need to figure out their own abstractions for things, like how to define version tags, how to plumb through the state schema definitions, where state schema versions are stored in the serialized metadata, etc.
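As a rough illustration of the kind of abstraction each checkpointer author currently has to invent, here is a sketch of a wrapper that writes a version tag into metadata on save and calls a migration hook on load. It is written against a hypothetical, simplified `SimpleStore` contract, synchronous for brevity; the real LangGraph.js Checkpointer interface is different, and `schemaVersion`, `withVersionTagging`, and `inMemoryStore` are assumed names.

```typescript
// Hypothetical minimal storage contract, NOT the LangGraph.js interface.
interface StoredRecord {
  metadata: Record<string, unknown>;
  state: Record<string, unknown>;
}

interface SimpleStore {
  get(threadId: string): StoredRecord | undefined;
  put(threadId: string, record: StoredRecord): void;
}

// Application-supplied hook that upgrades a state saved at `fromVersion`.
type MigrateHook = (
  state: Record<string, unknown>,
  fromVersion: number,
) => Record<string, unknown>;

// Wraps any store so that saves are tagged and loads are migrated lazily.
function withVersionTagging(
  inner: SimpleStore,
  currentVersion: number,
  migrate: MigrateHook,
): SimpleStore {
  return {
    put(threadId, record) {
      inner.put(threadId, {
        ...record,
        metadata: { ...record.metadata, schemaVersion: currentVersion },
      });
    },
    get(threadId) {
      const record = inner.get(threadId);
      if (record === undefined) return undefined;
      const tag = record.metadata.schemaVersion;
      // Records saved before tagging existed are treated as version 0.
      const fromVersion = typeof tag === "number" ? tag : 0;
      if (fromVersion === currentVersion) return record;
      return {
        metadata: { ...record.metadata, schemaVersion: currentVersion },
        state: migrate(record.state, fromVersion),
      };
    },
  };
}

// Simple in-memory backend used to exercise the wrapper.
function inMemoryStore(): SimpleStore {
  const records = new Map<string, StoredRecord>();
  return {
    get: (threadId) => records.get(threadId),
    put: (threadId, record) => {
      records.set(threadId, record);
    },
  };
}

// Usage: a legacy, untagged record is upgraded when it is read.
const backing = inMemoryStore();
backing.put("thread-1", { metadata: {}, state: { history: ["hi"] } });
const store = withVersionTagging(backing, 1, (state, fromVersion) =>
  fromVersion === 0 ? { messages: state.history ?? [] } : state,
);
const loaded = store.get("thread-1");
```

The open design questions from the paragraph above show up directly here: where the tag lives (`metadata.schemaVersion`), how untagged records are interpreted, and what the hook should do when the stored version is newer than the running code.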

It really just comes down to whether the LangGraph team intends for application authors to write custom checkpointers and stores for their applications, or whether they want an ecosystem of generic (but production-ready) third-party checkpointer and store libraries to emerge. If the latter, they'd do well to build some degree of versioning support into the structure of the Checkpointer itself, as that will make it more likely that the authors of these third-party components will provide a consistent level of support for state schema versioning. That's not the only way to make that happen, but it's definitely a very reliable way.

@hgoona

hgoona commented Oct 1, 2024

these things aren't explicitly supported or facilitated by the structure of the API, so everyone who implements their own checkpointer is going to need to figure out their own abstractions for things, like how to define version tags, how to plumb through the state schema definitions, where state schema versions are stored in the serialized metadata, etc.

Exactly this ☝🏾 is the challenge I'm facing in my own attempts to build a custom Checkpointer.

At present, I don't believe I fully understand the anatomy of the data that is being stored: its shape, and why and where it is being used. It feels extremely verbose, and it is hard to decipher where and what each part is needed for (unless I'm mistaken - is this sort of thing documented somewhere, @benjamincburns?)

I'd greatly appreciate a breakdown of it if someone has documented it, but from my inspections, there is duplication of data mixed in with meta tags, all compounding to make a very big object with deeply nested keys that don't make much sense on initial observation - to my mind at least.

@benjamincburns
Author

I'd greatly appreciate a breakdown of it if someone has documented it

I raised #541 to address this.
