Results quite unsatisfactory #91
@virxxiii thanks for your feedback. I too am interested in getting good results out of this project and comparing it to other solutions. Just merged PR #73 which gives you the default values for all of the parameters which is a start. Hopefully I'll start a table in the README to give a brief description of each parameter and a link to a journal article for more details etc. Also hope to add a Perplexity calculation which might be useful in comparing results across various architectures / models. I've had good results anecdotally with this system training on a baby names set using something e.g.:
I then let it train for about 12 hours on my GeForce GTX 1070 until the You can pick the number of epochs based on the speed of training to essentially "train for X hours". Finally, we just added
Though it's not been tested much, so YMMV. It'd be great to start collecting some real results / Tensorboard graphs / CLI parameter examples too. Just some thoughts, thanks! |
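The "train for X hours" idea above comes down to simple arithmetic. Here is a minimal sketch, assuming you have timed a few batches yourself. The function name and the example numbers are hypothetical and not part of this repo:

```python
import math


def epochs_for_budget(hours, seconds_per_batch, batches_per_epoch):
    """Return the largest whole number of epochs that fits in the time budget.

    All three inputs are measurements or choices you supply yourself.
    At least one epoch is always returned.
    """
    seconds_per_epoch = seconds_per_batch * batches_per_epoch
    return max(1, math.floor(hours * 3600 / seconds_per_epoch))


# Hypothetical measurements: 0.25 s per batch, 400 batches per epoch.
print(epochs_for_budget(hours=12, seconds_per_batch=0.25, batches_per_epoch=400))
```

Time a few dozen batches after the first epoch has started, since the very first batches are often slower than the steady state.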
Hi, I'm struggling to get good results as well. This is my first time using an LSTM RNN and I'm relatively new (cliche response), but I've gotten human-readable responses after training with my dataset using other RNNs. I get only garbage here. A couple of things I've noticed that are definitely bad, but I don't know why yet:
I think it's converging too fast as you said in previous posts. How would you suggest I get better output? Thanks. |
Your setup sounds good (I use the same hardware / TF-gpu). Here are my suggestions:
Also, the dropout feature is new and, from what I can tell, it mainly helps against overfitting, which is not the current problem. So I'd suggest starting with the basic settings and seeing how that goes, e.g.:
The defaults are pretty good for your amount of data. ( The next thing I'd consider increasing is perhaps Please report back with your findings! |
Also, people have had good success with this specific repo, here is an example of generating music in ABC notation to midi: https://maraoz.com/2016/02/02/abc-rnn/ Tuning your models is kind of a "dark art" at this point. My best advice in general is:
Happy searching! |
You're right about the size. I actually had 4.3MB in my corpus but I removed a big chunk of it because I thought it was too different from the rest of my corpus. I will remove the dropout to see if it gets better results. I will definitely increase the sequence length. Will that drastically increase the iteration calculation time? One of the nagging questions in my head is: what is clean data for this LSTM? Should all the TXT files be in the same format, or can it vary? E.g. can movie scripts be used together with whitepapers or technical specifications, or not? |
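On the sequence-length question, a rough back-of-the-envelope sketch may help. It assumes the common scheme where each batch consumes `batch_size * seq_length` characters of the corpus, and that time per batch grows roughly linearly with `seq_length`. The batch size of 50 is an assumed value, not one reported in this thread:

```python
def batches_per_epoch(corpus_chars, batch_size, seq_length):
    """Number of full batches in one pass over the corpus.

    Assumes each batch holds batch_size sequences of seq_length characters.
    """
    return corpus_chars // (batch_size * seq_length)


# Roughly the 4.3MB corpus mentioned above, assuming one byte per character.
corpus_chars = 4_300_000

for seq_length in (50, 100, 200):
    n = batches_per_epoch(corpus_chars, batch_size=50, seq_length=seq_length)
    print(seq_length, n)
```

Under those assumptions, doubling the sequence length halves the number of batches per epoch while each batch takes about twice as long, so the time per epoch stays roughly flat. What grows is the memory needed per batch.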
@hoomanNo5 another interesting post I read about mixing input from various authors: https://www.gwern.net/RNN%20metadata Technically you can mix anything you want into a big If you use a lot of different formats in one training, then the model may "waste some space" learning to mimic each of the different formatting styles. So the most "efficient system" would have consistent style of input assuming you want consistent style of output. |
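If you do decide to mix several sources, a small helper like the following can build a single training file. This is only a sketch: the function name, the file names, and the blank-line separator are hypothetical choices, not something this repo prescribes:

```python
from pathlib import Path


def combine_texts(paths, out_path, separator="\n\n"):
    """Concatenate several UTF-8 text files into one training file.

    Each file is stripped of leading/trailing whitespace and the pieces
    are joined with `separator`. Returns the number of characters written.
    """
    parts = [Path(p).read_text(encoding="utf-8").strip() for p in paths]
    combined = separator.join(parts) + "\n"
    Path(out_path).write_text(combined, encoding="utf-8")
    return len(combined)


# Example call with hypothetical file names:
# combine_texts(["scripts.txt", "whitepapers.txt"], "combined.txt")
```

Normalizing the files to one style before combining them (same line wrapping, same quoting, same header conventions) is one way to avoid the "wasted space" described above.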
Thanks everybody for your input! I have finally managed to get waaaay better (actually really good) results by reducing the num_layers to 3. It seems to me that anything above 3 results in drastically worse results. I don't know why this is, I'm just telling you what I observed. It might help you. As for the dropout: I'd really appreciate if you could give me a short explanation about how it will affect training? |
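On the dropout question, the basic idea can be shown in a few lines. This is a generic sketch of "inverted" dropout in plain Python, not this repo's implementation; the function name and the `keep_prob` parameter name are assumptions made for illustration:

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Randomly zero each activation with probability 1 - keep_prob.

    Surviving values are scaled by 1 / keep_prob so the expected value
    of each unit is unchanged. At sampling time (training=False) the
    input is returned untouched.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training or keep_prob == 1.0:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5, rng=rng))
print(dropout([1.0, 2.0, 3.0, 4.0], keep_prob=0.5, rng=rng, training=False))
```

Because a different random subset of units is silenced on every step, the network cannot lean on any single unit, which tends to reduce overfitting. The training loss usually falls more slowly as a result, so dropout is unlikely to help a model that is not yet fitting the data at all.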
@ubergarm So here is what I did:
A couple of questions come to mind:
|
I was just reading this and was about to point this out: most people strongly recommend against using >3 layers in RNNs, and even 1 layer often works well, because, as you discovered, deeper stacks work horribly. (For example, I think Karpathy mentions this in his RNN post.) It's either too much computation per character or it's a depth problem, the same way you can't train regular feedforward or CNN NNs more than 20 layers or so without problems like totally diverging on you unless you have special tricks like residual layers. (An RNN can be seen as an extremely deep feedforward NN, after all.) For RNNs, this problem hasn't been solved yet AFAIK; presumably someone will figure out at some point how to reuse residual layers or batchnorm or initialization in just the right way to fix it, but not yet. |
Guys, I too am having unsatisfactory results while using an LSTM. I did not use this code, but I looked through this and Andrej's code before trying my own implementation. My config is: My error is down to 4 but it's still not constructing proper words or sentences: epoch 582 iteration 100 loss 4.223689n nd t thedathearot ould the, ooonth, ife! it cath oubu is ands, So I wanted to ask: should I wait for the loss to go below 1, should I change some parameters, or is there a problem with my code? |
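One way to judge a loss value like the one above is to convert it to perplexity and compare it with a uniform guess. This sketch assumes the reported loss is the mean per-character cross-entropy in natural-log units. If the implementation sums the loss over a whole sequence instead, divide by the sequence length first. The vocabulary size of 68 is a made-up example:

```python
import math


def perplexity(mean_loss_nats):
    """Perplexity for a mean per-character cross-entropy given in nats."""
    return math.exp(mean_loss_nats)


def uniform_baseline_loss(vocab_size):
    """Loss of a model that assigns equal probability to every character."""
    return math.log(vocab_size)


loss = 4.223689  # value from the log line quoted above
print(round(perplexity(loss), 1))
print(round(uniform_baseline_loss(68), 3))  # hypothetical 68-character vocabulary
```

Under that assumption, a loss near 4.22 corresponds to a perplexity of about 68, meaning the model is roughly as uncertain as picking evenly among 68 characters. If your vocabulary is about that size, a per-character loss this high would mean the model has learned very little, which points to checking how the loss is normalized and whether gradients are being applied correctly before waiting longer.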
First and foremost thanks to everybody involved in this. I really appreciate the work you are putting into this.
Previously I was using Karpathy's char-rnn, but I couldn't get Torch running with my GPU after updating my hardware, so I was looking for a different solution and that has brought me here. Using Karpathy's rnn I was getting beautiful results with even very small datasets (around 1MB). With your TensorFlow implementation the results are not so good and I wonder why. I tried fiddling around with the parameters (rnn_size, num_layers, etc.) but the improvements were small or nonexistent.
It would be really cool if you could add some explanatory comments for the different parameters, i.e. how they affect the result. Since I'm relatively new to NNs, this would help a lot in getting better results.
Thanks again for your efforts!