Replies: 2 comments 3 replies
-
Thanks very much for kicking off discussion! I'll reply in three parts: high-level thoughts, questions, and in-the-weeds thoughts.

High level

I think in terms of consistency with the rest of Keras, and the overall identity of our package, I would have a preference for the Alternative listed above. (It also reads very nicely!) There are specifics we can get into there (e.g. Keras does not use separate classmethod constructors anywhere), but by and large I think that is a discussion that will need to happen separately from the merits of the API in a vacuum. It is more stylistic and high-level. Worth noting that we can always achieve what is effectively a static constructor by splitting one class-level API symbol into two.

Another high-level call out is that we are really seeing two needs pitted against each other. A single string identifier for a checkpoint is easy to discover and copy-paste, but is less readable.

```python
# More readable.
BertClassifier(size="base", pretraining_data="wiki_books_en", finetuning_data="sst2", lowercase=True)
# Easier to discover and sub in a desired checkpoint.
BertClassifier("sst_base_uncased_en")
```

So we are considering deliberately forgoing some code readability for discoverability and simplicity of our API. That makes sense to me! Just worth calling out.

Questions

All looking at the Alternative proposal:
In the weeds thoughts

Really thinking about checkpoints and general parameters, I think we have three camps. I've been listing these out:
- preprocessing
- model
- high-level classification object (if we do a "Pipeline" approach)

We don't need to cover all combinations, but there is some mixing of checkpoint ids and args we have to allow, I think. This is mainly to say that it seems incorrect to say that we can fold every constructor into a binary decision of either one string or a list of arguments as a general strategy. But maybe there's a way I'm not seeing!
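To make that mixing concrete, here is a hypothetical sketch (class and argument names are my assumptions, not the keras-nlp API) of a preprocessing object whose constructor takes a checkpoint-style string id and a free configuration argument at the same time:

```python
# Hypothetical sketch: a preprocessor whose constructor mixes a
# checkpoint-style string id with a free configuration argument.
class BertPreprocessor:
    def __init__(self, vocabulary="bert_base_uncased_en", sequence_length=128):
        # `vocabulary` is a string checkpoint id; `sequence_length` stays
        # freely configurable regardless of which id is chosen.
        self.vocabulary = vocabulary
        self.sequence_length = sequence_length

# The id and the free arg vary independently, so this constructor cannot
# be reduced to "either one string or a list of arguments".
preprocessor = BertPreprocessor(sequence_length=512)
```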
-
Thanks for the feedback @mattdangerw! I have prototyped a version of the "Alternative" proposal incorporating many of your suggestions in #363 and #361. Please take a look.
-
Intro
We now offer a large number of pretrained BERT encoders using a `weights` argument (e.g., #331). The current signature combines two constructor modalities, random graphs and pretrained checkpoints, in one API, requiring input validation between `weights` and `vocabulary_size` (gh).

When we start to consider offering checkpoints for fine-tuned encoder+head pairs, such as SST for sentiment, CamemBERT for NER, or SQuAD for question answering, the mixed modalities get significantly more cumbersome. In the case of sequence classification, the natural extension of our current API would be a single constructor that accepts both a full-graph checkpoint and per-component arguments.
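A minimal sketch of such a signature, assuming the argument names discussed just below (`weights`, `base_model`, `num_classes`) and hand-written validation:

```python
# Hypothetical sketch of the mixed-modality signature: three `None`
# defaults, with cross-argument validation in the constructor.
class BertClassifier:
    def __init__(self, weights=None, base_model=None, num_classes=None):
        if weights is not None and (base_model is not None or num_classes is not None):
            raise ValueError("Pass `weights` alone, or `base_model` and `num_classes`.")
        if weights is None and (base_model is None or num_classes is None):
            raise ValueError("Pass `weights`, or both `base_model` and `num_classes`.")
        self.weights = weights
        self.base_model = base_model
        self.num_classes = num_classes
```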
Now we have three `None` default arguments and need to communicate to the user that either `weights` OR both `base_model` AND `num_classes` must be specified. If `base_model` is used, then the user also has to navigate the `BertXXX` argument validation discussed above.

Proposal
If the root cause is mixing multiple construction modalities in one signature, creating multiple constructors could reduce user confusion and improve the future extensibility of the package.
Here the API becomes self-describing: either supply a `base_model` (possibly with its own pretrained weights) and `num_classes` to get a randomly initialized head, or supply a single `weights` string specifying a checkpoint for the entire graph. `None` defaults with accompanying explanations in the docstring are not required.

Here are potential calls to specify the classifiers with different types of initialization:
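As a hypothetical sketch, with stub classes standing in for the real models (the class names, the `from_checkpoint` classmethod, and the checkpoint ids are all assumptions):

```python
class BertBase:
    """Stub encoder; `weights` optionally names a pretrained checkpoint."""
    def __init__(self, weights=None):
        self.weights = weights

class BertClassifier:
    def __init__(self, base_model, num_classes):
        # Base constructor: randomly initialized head on a given encoder.
        self.base_model = base_model
        self.num_classes = num_classes
        self.weights = None

    @classmethod
    def from_checkpoint(cls, weights):
        # Classmethod constructor: one string restores the entire graph.
        # (Real checkpoint loading is elided; the stub just records the id.)
        obj = cls(base_model=BertBase(), num_classes=2)
        obj.weights = weights
        return obj

# Random encoder, random head.
classifier = BertClassifier(base_model=BertBase(), num_classes=2)
# Pretrained encoder, randomly initialized head.
classifier = BertClassifier(base_model=BertBase(weights="bert_base_uncased_en"), num_classes=2)
# Entire graph restored from a single checkpoint id.
classifier = BertClassifier.from_checkpoint("bert_base_uncased_en_sst2")
```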
The extensibility of this approach becomes even more apparent if we want to specify a checkpoint with multiple strings. Rather than validating 4+ arguments for conflicts, we simply add another arg to the relevant signature:
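For instance (hypothetical names again; `finetuning_data` echoes the keyword example earlier in the thread), a multi-string checkpoint spec only touches the one constructor that needs it:

```python
class BertClassifier:
    def __init__(self, base_model=None, num_classes=None):
        # Base constructor is untouched by the multi-string checkpoint spec.
        self.base_model = base_model
        self.num_classes = num_classes

    @classmethod
    def from_checkpoint(cls, base_weights, finetuning_data=None):
        # The extra identifier lives only in this signature, so there are
        # no new argument conflicts to validate in the base constructor.
        obj = cls()
        obj.checkpoint = (base_weights, finetuning_data)
        return obj

classifier = BertClassifier.from_checkpoint("bert_base_uncased_en", finetuning_data="sst2")
```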
Single string identifiers for checkpoints may be the right way to go for `keras-nlp`, but it is worth mentioning that `keras-cv` has already taken the multi-arg approach for `RetinaNet` (gh).
(gh).Potential issues
Functionality must be synchronized between constructors. In general, the `classmethod` constructor will call the base constructor and return the result. For this to be seamless, the `classmethod` construction should be a simple superset of the base construction. If this is not true, we may need to add args to the base constructor which are only expected to be used by the `classmethod` constructor.

Alternative
If this proposal is considered too complicated or too divergent from Keras style, I would encourage developing a single string identifier for each model, so that the API is always `weights` alone or no `weights` at all (except for vanilla args like `name`). In this way users don't have to learn complex interaction patterns between the args and can develop a simple expectation for how loading pretrained checkpoints works.

For example:
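Calls under the single-id scheme might look like the following sketch (the stub class and the checkpoint id, borrowed from earlier in the thread, are illustrative assumptions):

```python
class BertClassifier:
    """Stub: pretrained state is controlled by a single `weights` id."""
    def __init__(self, weights=None, name=None):
        # `weights=None` means a fully configurable, randomly initialized
        # model; a string id names a checkpoint for the whole graph.
        self.weights = weights
        self.name = name

# Entire pretrained classifier from one string id.
classifier = BertClassifier(weights="sst_base_uncased_en")
# Randomly initialized; no argument interaction patterns to learn.
scratch = BertClassifier(weights=None, name="my_classifier")
```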
Other advantages of a single id are that it makes clear which cross products have coverage, and that it is easy to override with a fully configurable class or the user's own filepath. In practice most of the cross products will not have coverage; in fact we may only support one cross product for many tasks. Guessing which combinations are supported could be quite frustrating for users.
One drawback of this approach is that we need to disambiguate between the string id for the `base_model` and the one for the entire graph.