Possible improvements about the wrapper #160
@wddabc Thanks a lot for the suggestion. |
I think directly returning [...]
User can define whatever they like as the implementation of [...]. One thing to be careful about is a wrapper on top of [...]. The dropout/regularizer should be declared by the user as arguments of [...]. Here are some other issues that should be considered for the new design (I think I'm basically proposing something Keras-like):
|
@wddabc Thanks a lot for the suggestions. We totally agree that a flag like "isTrain" is needed when calling loss_batch;
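For concreteness, a rough sketch of threading such a flag through (illustrative names only, not the current minpy API):

import numpy as np

# Hypothetical sketch: pass an is_train flag from the training loop into the
# model so stochastic layers (e.g. dropout) can switch behaviour at test time.
class MLP(object):
    def __init__(self, w1, w2):
        self.w1, self.w2 = w1, w2

    def loss(self, x, y, is_train=True):
        h = np.maximum(np.dot(x, self.w1), 0)
        if is_train:
            # inverted dropout, applied only during training
            h = h * (np.random.rand(*h.shape) > 0.5) / 0.5
        scores = np.dot(h, self.w2)
        return np.sum((scores - y) ** 2)

def loss_batch(model, x, y, is_train=True):
    # the solver would call this with is_train=True while fitting and
    # is_train=False when evaluating on validation data
    return model.loss(x, y, is_train=is_train)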
|
On Thu, Mar 16, 2017 at 7:49 PM, Tianjun Xiao wrote:
> @wddabc Thanks a lot for the suggestions. We totally agree that a flag like "isTrain" is needed when calling loss_batch;
> And:
> 1. For the modelBase problem, we now have the concept of a Module in https://github.com/dmlc/minpy/blob/master/minpy/nn/model_builder.py. We can have multiple sub-modules in a whole model;
It looks nice. I found there is a module called Parallel which hasn't been implemented yet. Is this a version where you support appending multiple inputs? I'm actually thinking about making this more general by building a *graph* rather than a *list* (I mean Sequential). This is more flexible, I think. Also, although it is a minor issue, I might consider renaming forward to __call__, as a layer is basically a function of its input. This design is very close to the Functional API in Keras (https://keras.io/getting-started/functional-api-guide/), which I like a lot. One major difference between minpy and Keras is in __call__: in Keras, __call__ builds a static graph and initializes the weights after shape inference. Minpy is different, as we are doing real computation! I think one idea is to initialize the weights lazily at the first __call__, when you will know the input shape, then set a flag (say built) to true and skip this initialization in the following passes. This sounds a little bit hacky, as the initialization shouldn't depend on the first input; for example, what if I just want to build the model and save it before passing data?
A less hacky way, though I think still hacky, is adding a build function and asking the user to call it before running. The build function does nothing but a dry run through the graph. The idea is to use a global private flag to indicate whether it is a dry run; if true, use the size of the input to initialize the weights, ignore the computation, and just output a random value with the same size as the declared output. The reason we want to ignore the computation is 1) it might introduce startup cost, and 2) more seriously, the user might define the graph on specific types of data, and our random value might crash the program (for example, the user might define a division, and we pass a zero-valued dummy matrix).
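Something like the following, as a rough sketch of both the lazy-init flag and the dry-run build (illustrative code, not minpy's model_builder):

import numpy as np

_DRY_RUN = False   # module-private flag consulted by every layer

class Dense(object):
    def __init__(self, num_hidden):
        self.num_hidden = num_hidden
        self.built = False

    def __call__(self, x):
        if not self.built:
            # Lazy initialization: the input shape is only known at the first
            # call, so allocate the weights here and never again.
            self.W = np.random.randn(x.shape[1], self.num_hidden) * 0.01
            self.built = True
        if _DRY_RUN:
            # Dry run: skip the real computation and return a random value of
            # the declared output shape so downstream layers can build too.
            return np.random.randn(x.shape[0], self.num_hidden)
        return np.dot(x, self.W)

def build(model, dummy_input):
    """Dry-run the graph once so every layer initializes its weights."""
    global _DRY_RUN
    _DRY_RUN = True
    try:
        model(dummy_input)
    finally:
        _DRY_RUN = False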
> 2. Thanks for pointing out the shared-parameter issue. Could I bother you to give us some suggestions on the interface for the shared-parameter functionality?
I think all the parameters are declared in the submodules but actually stored in a global dictionary. There are two ways to access these parameters: the first is by local name, by calling a local method; the other is to look up the global dictionary directly using global names. The global name is something like a concatenation of the module name and the local name. If the user would like to involve shared parameters, they should explicitly look them up by global name. I think this is flexible enough.
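Roughly, the scheme I have in mind looks like this (illustrative sketch, not the existing model_builder code):

import numpy as np

params = {}   # one global dictionary holding every parameter in the model

class Module(object):
    def __init__(self, name):
        self.name = name

    def add_param(self, local_name, value):
        # Global name = module name + local name.
        params['%s:%s' % (self.name, local_name)] = value
        return self

    def param(self, local_name):
        # Local access: resolve a local name against this module's prefix.
        return params['%s:%s' % (self.name, local_name)]

conv = Module('convolution0').add_param('W', np.random.randn(8, 8))
fc = Module('fullyconnected0')
# Sharing is an explicit lookup by global name.
shared_W = params['convolution0:W']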
> 3. I don't quite understand why we need to apply dropout on the params. Is it a common trick in NLP tasks? For vision and audio tasks, dropout is usually applied on the activations.
This is called DropConnect (http://www.matthewzeiler.com/pubs/icml2013/icml2013.pdf), right? Instead of dropping out the hidden outputs, you drop out some connections (weights). Well, this is not very frequent in NLP either, but I remember I once did dropout on the entire word embedding to solve some out-of-vocabulary issue.
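For reference, the two variants side by side in plain NumPy (training-time behaviour only):

import numpy as np

def dropout(h, p=0.5):
    # Standard dropout: zero out activations, rescale to keep the expectation.
    mask = (np.random.rand(*h.shape) > p) / (1.0 - p)
    return h * mask

def dropconnect_dense(x, W, p=0.5):
    # DropConnect: zero out individual connections (weights) instead.
    mask = (np.random.rand(*W.shape) > p) / (1.0 - p)
    return np.dot(x, W * mask)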
--
Dingquan Wang
|
I think there is still a flaw in the dry-run idea: I forgot that the user might want to build the graph themselves and wouldn't implement this. Well, this looks like a hard problem, as it needs static analysis on the dynamic graph. But the lazy-init approach still seems viable. Or we just let it go and let the user take care of the shapes. |
@wddabc Thank you very much for your suggestion! (I should have participated in the discussion earlier.) The module |
Resolving parameter sharing might be easier. I am implementing an initializer that enables the user to index initialization configuration by parameter. For example, |
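A hypothetical sketch of what indexing initialization configuration by parameter could look like (made-up names, not the actual implementation):

import numpy as np

# Initialization rules indexed by (global) parameter name, with a fallback.
init_configs = {
    'convolution0:W':    lambda shape: np.random.randn(*shape) * 0.01,
    'fullyconnected0:b': lambda shape: np.zeros(shape),
}

def initialize(name, shape):
    default = lambda s: np.zeros(s)
    return init_configs.get(name, default)(shape)

W = initialize('convolution0:W', (8, 8))
b = initialize('fullyconnected0:b', (16,))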
Also, the shape of the weight is |
I agree, as I said in the last email. Maybe it is OK to just let users take care of the shapes, just as PyTorch does. The dynamic graph by itself is easy enough. Regarding the shared parameters, I also agree that it is easy to fix. For the initialization, did you mean you want a separate initializer to take care of the initialization? I think specifying the initialization method as an argument of the parameter declaration is good enough, just as the current implementation does. As a user, I'm thinking about something like this:

class RNNNet(ModelBase):
    def __init__(self,
                 batch_size=100,
                 input_size=2,
                 hidden_size=64,
                 num_classes=1):
        super(RNNNet, self).__init__()
        # When using a random init, the shape should be given.
        self.add_param(name='Wx', shape=(input_size, hidden_size), init='uniform')
        # When an actual np ndarray is given, the shape can be ignored.
        self.add_param(name='Wh', init=np.zeros((hidden_size, hidden_size)))
        # Shared parameter: the shape can also be ignored. 'b' will be a local
        # alias of 'convolution0:W', which is a global name.
        self.add_param(name='b', share=global.parameter('convolution0:W'))

This is just from a user's perspective. |
Hi all,
So far, we have focused on building a sequential network, while adding per-layer init and optimization rules, each of which can be changed dynamically. This adds flexibility and simplifies the logic.
I suggest we build upon these principles and study Keras (https://keras.io/getting-started/functional-api-guide/) to complete the functionality incrementally. Things like:
- Multiple inputs
- Multiple outputs
- Shared layers (or modules)
- Recurrent
- Dynamic layers (new layers may be inserted dynamically, and their weights
might be computed online).
There's no rush to implement them right away; we can put all the prototypes/signatures in a Google doc and hash them out. What do you all think?
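As a quick illustration of the multi-input and shared-layer items, in the Keras functional API one layer instance can be applied to several inputs:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Two inputs share the same Dense instance (and therefore its weights).
input_a = Input(shape=(32,))
input_b = Input(shape=(32,))
shared = Dense(16, activation='relu')
merged = concatenate([shared(input_a), shared(input_b)])
output = Dense(1, activation='sigmoid')(merged)

model = Model(inputs=[input_a, input_b], outputs=output)
model.compile(optimizer='sgd', loss='binary_crossentropy')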
-zz
|
Agree. I think one important thing is figuring out what functionality a Module should have (for example, load(), save(), fit(), predict(), ...), and how these could be implemented in the context of a dynamic graph. Starting from a sequential network is a good way. |
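A possible starting point for that discussion, with illustrative method names only:

class Module(object):
    """Sketch of a Module interface for the dynamic-graph setting."""

    def __call__(self, *inputs, **kwargs):
        # Real computation happens here; the dynamic graph is just this code.
        raise NotImplementedError

    def params(self):
        # Parameters owned by this module, possibly shared with others.
        raise NotImplementedError

    def save(self, path):
        # Only parameters need serializing; the graph is defined by __call__.
        raise NotImplementedError

    def load(self, path):
        raise NotImplementedError

    def fit(self, train_iter, optimizer, num_epochs=1):
        # Training loop; would need an is_train-style flag as discussed above.
        raise NotImplementedError

    def predict(self, x):
        raise NotImplementedError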
While minpy is a great tool for building dynamic nets, I found the wrapper (Solver, ModelBase, ...) a little bit hard to use; I had to do a lot of hacks to make things work. Here are some thoughts about possible improvements at the current stage:
[...] the .fit function in Keras. Assuming the data is x and the response is y, it is not necessarily the case that the final loss can be decomposed into loss(model.predict(x), y). In some cases there is just a single loss model.loss(x, y), which means no prediction is made at training time. This is a common case in structured prediction problems, such as conditional random fields.
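To make the point concrete, here is a toy linear-chain CRF in plain NumPy (made-up names, not the Solver/ModelBase API) whose training objective is a single loss(x, y) with no predict() step involved:

import numpy as np

def logsumexp(a, axis=None):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)))

class LinearChainCRF(object):
    def __init__(self, num_feats, num_tags):
        self.W = np.zeros((num_feats, num_tags))   # emission weights
        self.T = np.zeros((num_tags, num_tags))    # transition weights

    def loss(self, x, y):
        # Negative log-likelihood of tag sequence y given feature matrix x:
        # log partition function minus the gold path score. Training never
        # calls a predict() step; decoding would only matter at test time.
        emissions = x.dot(self.W)                                    # (seq_len, num_tags)
        gold = emissions[np.arange(len(y)), y].sum() + self.T[y[:-1], y[1:]].sum()
        alpha = emissions[0]
        for t in range(1, len(y)):
            # forward algorithm for the log partition function
            alpha = emissions[t] + logsumexp(alpha[:, None] + self.T, axis=0)
        return logsumexp(alpha) - gold

x = np.random.randn(5, 3)         # 5 timesteps, 3 features
y = np.array([0, 1, 1, 0, 1])     # gold tags
print(LinearChainCRF(num_feats=3, num_tags=2).loss(x, y))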