
add a script to compute the perplexity of test data #56

Open · wants to merge 2 commits into base: master
Conversation

@ajaech commented Oct 27, 2016

The eval.py script can be used to compute the perplexity of test data.

This adds eval.py, with updates to util.py and models.py, to allow calculating the perplexity of test files.

I also modified the vocabulary to include start, end, and unknown character tokens.
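Perplexity is the exponentiated average negative log-likelihood per character. A minimal sketch of the quantity eval.py computes (the helper name and inputs here are illustrative, not the PR's actual code):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    avg_nll = -sum(log_probs) / len(log_probs)
    return math.exp(avg_nll)

# A model that assigns probability 0.5 to every character has perplexity 2.
print(perplexity([math.log(0.5)] * 10))  # ≈ 2.0
```

The same average can be taken over natural-log probabilities from any model; only the per-character log-likelihoods matter.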
    count_pairs = sorted(counter.items(), key=lambda x: -x[1])
    self.chars, _ = zip(*count_pairs)
    self.vocab_size = len(self.chars)
    self.vocab = dict(zip(self.chars, range(len(self.chars))))
    with open(vocab_file, 'wb') as f:
        cPickle.dump(self.chars, f)
-   self.tensor = np.array(list(map(self.vocab.get, data)))
+   self.tensor = np.array(list(map(self.vocab.get, ['<S>'] + list(data) + ['</S>'])))
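Note that `self.vocab.get` with no default returns `None` for out-of-vocabulary characters. A hedged sketch of the same frequency-sorted vocabulary build with an explicit `<UNK>` fallback (the token names and data are assumptions for illustration, not the PR's exact code):

```python
from collections import Counter
import numpy as np

data = "hello world"
counter = Counter(data)
# Frequency-sorted vocabulary, as in the PR, with special tokens prepended.
count_pairs = sorted(counter.items(), key=lambda x: -x[1])
chars = ('<S>', '</S>', '<UNK>') + tuple(c for c, _ in count_pairs)
vocab = {c: i for i, c in enumerate(chars)}
unk = vocab['<UNK>']
# vocab.get with a default maps unseen characters to <UNK> instead of None.
tensor = np.array([vocab.get(c, unk) for c in ['<S>'] + list(data) + ['</S>']])
print(tensor[0], tensor[-1])  # ids of <S> and </S>
```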
@martiansideofthemoon commented Nov 19, 2016
Do you think it would be a better idea to write this after line 59, self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length], since the truncation makes it unlikely that the </S> character will survive?
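The concern can be illustrated with that truncation step (the batch and sequence sizes here are made up for the example):

```python
import numpy as np

batch_size, seq_length = 2, 5
data = np.arange(17)  # pretend the final id, 16, is the appended </S> token
# Keep only whole batches, as the truncation on line 59 does.
num_batches = len(data) // (batch_size * seq_length)
truncated = data[:num_batches * batch_size * seq_length]
# The trailing 7 ids, including the </S> at the very end, are cut off:
print(len(truncated), truncated[-1])  # → 10 9
```

Appending </S> before truncating means it sits at the very end of the tensor, exactly the region the slice discards whenever the data length is not a multiple of batch_size * seq_length.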

@@ -58,6 +58,29 @@ def loop(prev, _):
     optimizer = tf.train.AdamOptimizer(self.lr)
     self.train_op = optimizer.apply_gradients(zip(grads, tvars))

+    def eval(self, sess, chars, vocab, text):
+        batch_size = 200


seq_length you mean?

def eval(self, sess, chars, vocab, text):


It's probably better to move this to eval.py

@hugovk (Contributor) commented Feb 16, 2017

@ajaech This PR has merge conflicts.
