This repository has been archived by the owner on Oct 15, 2019. It is now read-only.

Possible improvements about the wrapper #160

Open
wddabc opened this issue Mar 13, 2017 · 11 comments


@wddabc

wddabc commented Mar 13, 2017

While minpy is a great tool for building dynamic nets, I found the wrapper (Solver, ModelBase, ...) a little hard to use, and I had to do a lot of hacks to make it work. Here are some thoughts on possible improvements at the current stage:

  1. Lots of things should be passed into Solver.train rather than the constructor, such as the data iterator, the optimizer, etc. In fact, Solver.train should support more, such as whether to do cross-validation, and callback functions. I think a good template is the .fit function in Keras.
  2. As pointed out in Why solver._step() shouldn't be called manually? #147, users should have the flexibility to write their own training loop, so I think a public step is necessary.
  3. The definition of the final training loss is inflexible.
 def loss_func(*params):  # pylint: disable=unused-argument
     """Loss function that calculates the loss."""
     # It seems that params are not used in the forward function, but since we
     # pass model.params as arguments, we are OK here.
     predict = self.model.forward_batch(batch, mode='train')
     return self.model.loss_batch(batch, predict)

Assuming the data is x and the response is y, it is not necessarily the case that the final loss decomposes into loss(model.predict(x), y). In some cases it is just a single loss model.loss(x, y), meaning no prediction is made at training time. This is a common situation in structured prediction problems, such as conditional random fields.
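To make the point concrete, here is a minimal sketch (all names hypothetical, plain numpy rather than the minpy API) of a model that exposes only a joint loss, and a loss_func wrapper that delegates to it without ever calling predict:

```python
import numpy as np

class JointLossModel:
    """A model (e.g. a CRF) whose training loss is not predict-then-compare."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def loss_batch(self, batch):
        # The loss couples x and y directly; no standalone prediction exists.
        x, y = batch
        scores = x @ self.w
        # A toy squared-error objective standing in for a structured loss.
        return float(np.mean((scores - y) ** 2))

def make_loss_func(model, batch):
    # The proposed wrapper: delegate entirely to the model.
    def loss_func(*params):
        return model.loss_batch(batch)
    return loss_func

model = JointLossModel(dim=3)
batch = (np.ones((4, 3)), np.zeros(4))
print(make_loss_func(model, batch)())  # 0.0 since w is zero-initialized
```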

@sneakerkg
Member

@wddabc Thanks a lot for the suggestion.
We are going to refine the nn package to make it more convenient for users.
For the loss_func part, what do you think about pre-defining some loss or criterion templates for users?

@wddabc
Author

wddabc commented Mar 13, 2017

I think directly returning self.model.loss_batch(batch) should be fine. Something looks like:

def loss_func(*params):
    return self.model.loss_batch(batch)

Users can define whatever they like as the implementation of model.loss_batch(batch), for example loss(model.predict(x), y). model.predict is optional; users can define it or not depending on whether they want to make predictions. For example, a user might just use the model to pretrain some word embeddings, in which case model.predict is unnecessary.

One thing to be careful about is that a wrapper on top of loss_batch is still necessary to control whether the dropout/regularizer is switched on. Might it be defined inside loss_func? I'm not quite sure.

The dropout/regularizer should be declared by the user as arguments of .add_param() and applied before calling self.params[name]. In fact, I'd suggest changing self.params[name] to self.params(name), where self.params() decides whether to apply the dropout based on whether it is in the training stage.
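A minimal sketch of this proposal (all names hypothetical; plain numpy, not the actual minpy API): a callable parameter store that applies inverted dropout only while training.

```python
import numpy as np

class ParamStore:
    """Sketch: self.params(name) applies dropout only in training mode."""
    def __init__(self):
        self._params = {}
        self._dropout = {}
        self.training = True

    def add_param(self, name, value, dropout=0.0):
        self._params[name] = value
        self._dropout[name] = dropout
        return self  # allow chained declarations, as in ModelBase

    def __call__(self, name):
        w = self._params[name]
        p = self._dropout[name]
        if self.training and p > 0:
            # Inverted dropout: scale kept entries so the expectation is unchanged.
            mask = (np.random.rand(*w.shape) >= p) / (1.0 - p)
            return w * mask
        return w

params = ParamStore().add_param('Wx', np.ones((2, 2)), dropout=0.5)
params.training = False
print(params('Wx'))  # evaluation mode: dropout is a no-op
```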

Here are some other issues that should be considered in the new design (I think I'm basically proposing something Keras-like):

  1. ModelBase looks like a standalone module that doesn't support calling other models. That is, you have to define one huge graph within a single ModelBase.

  2. Also, jointly training different models with shared parameters is not supported in the current design. For this case, I think the right design is for the parameters of each sub-module to be globally transparent to one another.

@sneakerkg
Member

@wddabc Thanks a lot for the suggestions. We totally agree that a flag like "isTrain" is needed when calling loss_batch;
And:

  1. For the ModelBase problem, we now have the concept of a Module in https://github.com/dmlc/minpy/blob/master/minpy/nn/model_builder.py . We can have multiple sub-modules in a whole model;
  2. Thanks for pointing out the shared-parameter issue. Could I bother you to give us some suggestions on the interface for shared-parameter functionality?
  3. I don't quite understand why we need to apply dropout to the params. Is it a common trick in NLP tasks? For vision and audio tasks, dropout is usually applied to the activations.

@wddabc
Author

wddabc commented Mar 17, 2017 via email

@wddabc
Author

wddabc commented Mar 17, 2017

I think there is still a flaw in the dry-run idea: I forgot that users might want to build the graph themselves, and they shouldn't have to implement this. This looks like a hard problem, as it needs static analysis of a dynamic graph. But the lazy-init approach still seems viable. Or we could just let it go and let users take care of the shapes.
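The lazy-init approach mentioned above could look something like the following sketch (hypothetical names, plain numpy): the layer declares only its output size, and the weight shape is inferred from the first batch it sees.

```python
import numpy as np

class LazyLinear:
    """Sketch of lazy init: only out_features is declared up front;
    in_features (and hence the weight shape) is inferred from data."""
    def __init__(self, out_features):
        self.out_features = out_features
        self.weight = None

    def forward(self, x):
        if self.weight is None:
            # First call: infer in_features from the incoming batch.
            in_features = x.shape[1]
            self.weight = np.random.randn(in_features, self.out_features) * 0.01
        return x @ self.weight

layer = LazyLinear(out_features=8)
y = layer.forward(np.zeros((4, 16)))
print(layer.weight.shape, y.shape)  # (16, 8) (4, 8)
```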

@GaiYu0
Collaborator

GaiYu0 commented Mar 17, 2017

@wddabc Thank you very much for your suggestion! (I should have participated in discussion earlier)

The module Parallel has not been implemented because, in the framework of model_builder, it is difficult to provide a flexible interface for users to specify an operation on the multiple outputs of Parallel. The difficulty arises because a graph created by model_builder is essentially a "list", as you said. I agree that model_builder should support more general graph composition, and once general graph composition is implemented, it will be simple for users to manipulate the outputs of Parallel.

@GaiYu0
Collaborator

GaiYu0 commented Mar 17, 2017

Resolving parameter sharing might be easier. I am implementing an initializer that enables users to index initialization configuration by parameter. For example, initializer[convolution0.weight].std = 0.1 specifies that the weight of convolution layer 0 should be sampled from a distribution with a standard deviation of 0.1. On this basis it is possible to implement an interface of the form initializer.share(parameters), which binds those parameters at initialization. No other modification is required for forward functions. What do you think?
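The interface described above might be sketched roughly as follows (hypothetical names, plain numpy; not the actual implementation): per-parameter config via indexing, plus a share() that binds parameters to the same underlying array.

```python
import numpy as np

class InitConfig:
    """Per-parameter initialization configuration."""
    def __init__(self):
        self.std = 0.01  # default standard deviation

class Initializer:
    """Sketch of indexable init configuration plus share()."""
    def __init__(self):
        self._configs = {}
        self._shared = []

    def __getitem__(self, param_name):
        # initializer['convolution0.weight'].std = 0.1
        return self._configs.setdefault(param_name, InitConfig())

    def share(self, param_names):
        # Bind these parameters together at initialization time.
        self._shared.append(list(param_names))

    def initialize(self, shapes):
        arrays = {}
        for name, shape in shapes.items():
            arrays[name] = np.random.randn(*shape) * self[name].std
        for group in self._shared:
            first = arrays[group[0]]
            for name in group[1:]:
                arrays[name] = first  # shared: same underlying array
        return arrays

init = Initializer()
init['convolution0.weight'].std = 0.1
init.share(['convolution0.weight', 'convolution1.weight'])
arrays = init.initialize({'convolution0.weight': (3, 3),
                          'convolution1.weight': (3, 3)})
print(arrays['convolution0.weight'] is arrays['convolution1.weight'])  # True
```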

@GaiYu0
Collaborator

GaiYu0 commented Mar 17, 2017

Also, model_builder only supports static graph composition for the time being, i.e. statically declaring a graph and using a function to convert the graph into a model. Dry-running is going to be an issue when model_builder supports dynamic graph composition in the future, as PyTorch does. PyTorch forces users to declare in advance all layers that are going to be used in dynamic graph composition, so that PyTorch can register related info, e.g. parameters. I am not sure whether PyTorch figures out parameter shapes dynamically; it appears that the information provided in a PyTorch layer declaration is sufficient to figure out shapes in advance. For example, the fully-connected layer in PyTorch is:

[screenshot: PyTorch's fully-connected (Linear) layer declaration]

The shape of weight is (in_features, out_features), which can be inferred from the layer declaration alone. model_builder only requires out_features, so the shape of weight must be inferred from data. Perhaps PyTorch-style dynamic graph composition is more feasible in terms of implementation difficulty?
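The contrast with the lazy approach can be sketched in a few lines (hypothetical names, plain numpy): with a PyTorch-style declaration, both dimensions are given up front, so the weight exists before any data arrives.

```python
import numpy as np

class Linear:
    """PyTorch-style declaration: both in_features and out_features are
    given up front, so the weight shape is known before any data arrives."""
    def __init__(self, in_features, out_features):
        self.weight = np.random.randn(in_features, out_features) * 0.01

    def forward(self, x):
        return x @ self.weight

fc = Linear(in_features=16, out_features=8)
print(fc.weight.shape)  # (16, 8): known from the declaration alone
```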

@wddabc
Author

wddabc commented Mar 18, 2017

I agree, as I said in the last email. Maybe it is fine to just let users take care of the shapes, as PyTorch does. The dynamic graph by itself is easy enough.

Regarding the shared parameters, I also agree that it is easy to fix. For the initialization, did you mean you want a separate initializer to take care of initialization? I think specifying the initialization method as an argument of the parameter declaration is good enough, just as the current implementation does. As a user, I'm thinking about something like this:

class RNNNet(ModelBase):
    def __init__(self,
                 batch_size=100,
                 input_size=2,
                 hidden_size=64,
                 num_classes=1):
        super(RNNNet, self).__init__()
        # 'Wx': when using a random init, the shape must be given.
        # 'Wh': when an actual np ndarray is given, the shape can be omitted.
        # 'b': a shared parameter; the shape can also be omitted. 'b' becomes a
        #      local alias of 'convolution0:W', which is a global name.
        #      (global_params is a hypothetical global registry; 'global' itself
        #      is a reserved word in Python.)
        (self.add_param(name='Wx', shape=(input_size, hidden_size), init='uniform')
             .add_param(name='Wh', init=np.zeros((hidden_size, hidden_size)))
             .add_param(name='b', share=global_params.parameter('convolution0:W')))

This is just from a user's perspective.

@zzhang-cn
Collaborator

zzhang-cn commented Mar 18, 2017 via email

@wddabc
Author

wddabc commented Mar 19, 2017

Agreed. I think one important thing is figuring out what functionality a Module should have (for example, load(), save(), fit(), predict(), ...) and how these could be implemented in the context of dynamic graphs. Starting from a sequential network is a good way to begin.
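A rough sketch of what such a Module surface might look like (entirely hypothetical; the methods are the ones named above, with pickle standing in for a real serialization format):

```python
import os
import pickle
import tempfile

class Module:
    """Sketch of the discussed Module interface: load/save/fit/predict."""
    def __init__(self):
        self.params = {}

    def save(self, path):
        with open(path, 'wb') as f:
            pickle.dump(self.params, f)

    def load(self, path):
        with open(path, 'rb') as f:
            self.params = pickle.load(f)

    def fit(self, data_iter, optimizer, epochs=1):
        # A training loop built on a public step(), per the discussion above.
        raise NotImplementedError

    def predict(self, x):
        # Optional: some models (e.g. embedding pretraining) never predict.
        raise NotImplementedError

# Save/load round trip.
m = Module()
m.params = {'w': 1.0}
path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
m.save(path)
m2 = Module()
m2.load(path)
print(m2.params)  # {'w': 1.0}
```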
