Replies: 2 comments 3 replies
-
Thanks very much for kicking off discussion! I'll reply in three parts: high-level thoughts, questions, and in-the-weeds thoughts.

High level

I think in terms of consistency with the rest of Keras, and the overall identity of our package, I would have a preference for the Alternative listed above. (It also reads very nicely!) There are specifics we can get into there (e.g. Keras does not use separate classmethod constructors anywhere), but by and large I think that is a discussion that will need to happen separately from the merits of the API in a vacuum. It is more stylistic and high-level. Worth noting that we can always achieve what is effectively a static constructor by splitting one class-level API symbol into two.

Another high-level call out is that we are really seeing two needs pitted against each other. A single string identifier for a checkpoint is easy to discover and copy-paste, but is less readable.

```python
# More readable.
BertClassifier(size="base", pretraining_data="wiki_books_en", finetuning_data="sst2", lowercase=True)
# Easier to discover and sub in a desired checkpoint.
BertClassifier("sst_base_uncased_en")
```

So we are considering deliberately forgoing some code readability for discoverability and simplicity of our API. That makes sense to me! Just worth calling out.

Questions

All looking at the Alternative proposal:
In the weeds thoughts

Really thinking about checkpoints and general parameters, I think we have three camps. I've been listing these out:
- preprocessing
- model
- high-level classification object (if we do a "Pipeline" approach)

We don't need to cover all combinations, but there is some mixing of checkpoint ids and args we have to allow, I think. This is mainly to say that it seems incorrect to say that we can fold every constructor into a binary decision of either one string or a list of arguments as a general strategy. But maybe there's a way I'm not seeing!
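To make that mixing concrete, here is a hypothetical sketch (class and argument names are my assumptions, not the keras-nlp API) of a preprocessing object whose constructor takes a checkpoint-style string id and a free configuration argument at the same time:

```python
# Hypothetical sketch: a preprocessor whose constructor mixes a
# checkpoint-style string id with a free configuration argument.
class BertPreprocessor:
    def __init__(self, vocabulary="bert_base_uncased_en", sequence_length=128):
        # `vocabulary` is a string checkpoint id; `sequence_length` stays
        # freely configurable regardless of which id is chosen.
        self.vocabulary = vocabulary
        self.sequence_length = sequence_length

# The id and the free arg vary independently, so this constructor cannot
# be reduced to "either one string or a list of arguments".
preprocessor = BertPreprocessor(sequence_length=512)
```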
-
Thanks for the feedback @mattdangerw! I have prototyped a version of the "Alternative" proposal incorporating many of your suggestions in #363 and #361. Please take a look.
-
Intro
We now offer a large number of pretrained BERT encoders using a `weights` argument (e.g., #331). The current signature combines two constructor modalities, random graphs and pretrained checkpoints, in one API, requiring input validation between `weights` and `vocabulary_size` (gh).

When we start to consider offering checkpoints for fine-tuned encoder+head pairs, such as SST for sentiment, CamemBERT for NER, or SQuAD for question answering, the mixed modalities get significantly more cumbersome. In the case of sequence classification, the natural extension of our current API would be a single constructor that accepts both a full-graph checkpoint and per-component arguments.
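A minimal sketch of such a signature, assuming the argument names discussed just below (`weights`, `base_model`, `num_classes`) and hand-written validation:

```python
# Hypothetical sketch of the mixed-modality signature: three `None`
# defaults, with cross-argument validation in the constructor.
class BertClassifier:
    def __init__(self, weights=None, base_model=None, num_classes=None):
        if weights is not None and (base_model is not None or num_classes is not None):
            raise ValueError("Pass `weights` alone, or `base_model` and `num_classes`.")
        if weights is None and (base_model is None or num_classes is None):
            raise ValueError("Pass `weights`, or both `base_model` and `num_classes`.")
        self.weights = weights
        self.base_model = base_model
        self.num_classes = num_classes
```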
Now we have three `None` default arguments and need to communicate to the user that either `weights` OR both `base_model` AND `num_classes` must be specified. If `base_model` is used, then the user also has to navigate the `BertXXX` argument validation discussed above.

Proposal
If the root cause is mixing multiple construction modalities in one signature, creating multiple constructors could reduce user confusion and improve the future extensibility of the package.
Here the API becomes self-describing: either supply a `base_model` (possibly with its own pretrained weights) and `num_classes` to get a randomly initialized head, or supply a single `weights` string specifying a checkpoint for the entire graph. `None` defaults with accompanying explanations in the docstring are not required.

Here are potential calls to specify the classifiers with different types of initialization:
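As a hypothetical sketch, with stub classes standing in for the real models (the class names, the `from_checkpoint` classmethod, and the checkpoint ids are all assumptions):

```python
class BertBase:
    """Stub encoder; `weights` optionally names a pretrained checkpoint."""
    def __init__(self, weights=None):
        self.weights = weights

class BertClassifier:
    def __init__(self, base_model, num_classes):
        # Base constructor: randomly initialized head on a given encoder.
        self.base_model = base_model
        self.num_classes = num_classes
        self.weights = None

    @classmethod
    def from_checkpoint(cls, weights):
        # Classmethod constructor: one string restores the entire graph.
        # (Real checkpoint loading is elided; the stub just records the id.)
        obj = cls(base_model=BertBase(), num_classes=2)
        obj.weights = weights
        return obj

# Random encoder, random head.
classifier = BertClassifier(base_model=BertBase(), num_classes=2)
# Pretrained encoder, randomly initialized head.
classifier = BertClassifier(base_model=BertBase(weights="bert_base_uncased_en"), num_classes=2)
# Entire graph restored from a single checkpoint id.
classifier = BertClassifier.from_checkpoint("bert_base_uncased_en_sst2")
```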
The extensibility of this approach becomes even more apparent if we want to specify a checkpoint with multiple strings. Rather than validating 4+ arguments for conflicts, we simply add another arg to the relevant signature:
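For instance (hypothetical names again; `finetuning_data` echoes the keyword example earlier in the thread), a multi-string checkpoint spec only touches the one constructor that needs it:

```python
class BertClassifier:
    def __init__(self, base_model=None, num_classes=None):
        # Base constructor is untouched by the multi-string checkpoint spec.
        self.base_model = base_model
        self.num_classes = num_classes

    @classmethod
    def from_checkpoint(cls, base_weights, finetuning_data=None):
        # The extra identifier lives only in this signature, so there are
        # no new argument conflicts to validate in the base constructor.
        obj = cls()
        obj.checkpoint = (base_weights, finetuning_data)
        return obj

classifier = BertClassifier.from_checkpoint("bert_base_uncased_en", finetuning_data="sst2")
```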
Single string identifiers for checkpoints may be the right way to go for `keras-nlp`, but it is worth mentioning that `keras-cv` has already taken the multi-arg approach for `RetinaNet` (gh).
(gh).Potential issues
Functionality must be synchronized between constructors. In general, the `classmethod` constructor will call the base constructor and return the result. For this to be seamless, the `classmethod` construction should be a simple superset of the base construction. If this is not true, we may need to add args to the base constructor which are only expected to be used by the `classmethod` constructor.

Alternative
If this proposal is considered too complicated or too divergent from Keras style, I would encourage developing a single string identifier for each model, so that the API is always `weights` alone or no `weights` at all (except for vanilla args like `name`). In this way users don't have to learn complex interaction patterns between the args and can develop a simple expectation for how loading pretrained checkpoints works.

For example:
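Calls under the single-id scheme might look like the following sketch (the stub class and the checkpoint id, borrowed from earlier in the thread, are illustrative assumptions):

```python
class BertClassifier:
    """Stub: pretrained state is controlled by a single `weights` id."""
    def __init__(self, weights=None, name=None):
        # `weights=None` means a fully configurable, randomly initialized
        # model; a string id names a checkpoint for the whole graph.
        self.weights = weights
        self.name = name

# Entire pretrained classifier from one string id.
classifier = BertClassifier(weights="sst_base_uncased_en")
# Randomly initialized; no argument interaction patterns to learn.
scratch = BertClassifier(weights=None, name="my_classifier")
```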
Other advantages of a single id are that it makes clear which cross products have coverage, and that it is easy to override with a fully configurable class or the user's own filepath. In practice most of the cross products will not have coverage; in fact we may only support one cross product for many tasks. Guessing which combinations are supported could be quite frustrating for users.
One drawback of this approach is that we need to disambiguate between the string id for the `base_model` and the one for the entire graph.