From 71c87119708b6466574eb2cb11097c6b38b86fc9 Mon Sep 17 00:00:00 2001 From: Funtowicz Morgan Date: Wed, 4 Mar 2020 16:45:57 +0000 Subject: [PATCH] Adding Docker images for transformers + notebooks (#3051) * Added transformers-pytorch-cpu and gpu Docker images Signed-off-by: Morgan Funtowicz * Added automatic jupyter launch for Docker image. Signed-off-by: Morgan Funtowicz * Move image from alpine to Ubuntu to align with NVidia container images. Signed-off-by: Morgan Funtowicz * Added TRANSFORMERS_VERSION argument to Dockerfile. Signed-off-by: Morgan Funtowicz * Added Pytorch-GPU based Docker image Signed-off-by: Morgan Funtowicz * Added Tensorflow images. Signed-off-by: Morgan Funtowicz * Use python 3.7 as Tensorflow doesnt provide 3.8 compatible wheel. Signed-off-by: Morgan Funtowicz * Remove double FROM instructions on transformers-pytorch-cpu image. Signed-off-by: Morgan Funtowicz * Added transformers-tensorflow-gpu Docker image. Signed-off-by: Morgan Funtowicz * use the correct ubuntu version for tensorflow-gpu Signed-off-by: Morgan Funtowicz * Added pipelines example notebook Signed-off-by: Morgan Funtowicz * Added transformers-cpu and transformers-gpu (including both PyTorch and TensorFlow) images. Signed-off-by: Morgan Funtowicz * Docker images doesnt start jupyter notebook by default. Signed-off-by: Morgan Funtowicz * Tokenizers notebook Signed-off-by: Morgan Funtowicz * Update images links Signed-off-by: Morgan Funtowicz * Update Docker images to python 3.7.6 and transformers 2.5.1 Signed-off-by: Morgan Funtowicz * Added 02-transformers notebook. Signed-off-by: Morgan Funtowicz * Trying to realign 02-transformers notebook ? Signed-off-by: Morgan Funtowicz * Added Transformer image schema * Some tweaks on tokenizers notebook * Removed old notebooks. Signed-off-by: Morgan Funtowicz * Attempt to provide table of content for each notebooks Signed-off-by: Morgan Funtowicz * Second attempt. Signed-off-by: Morgan Funtowicz * Reintroduce transformer image. Signed-off-by: Morgan Funtowicz * Keep trying Signed-off-by: Morgan Funtowicz * It's going to fly ! Signed-off-by: Morgan Funtowicz * Remaining of the Table of Content Signed-off-by: Morgan Funtowicz * Fix inlined elements for the table of content Signed-off-by: Morgan Funtowicz * Removed anaconda dependencies for Docker images. Signed-off-by: Morgan Funtowicz * Removing notebooks ToC Signed-off-by: Morgan Funtowicz * Added LABEL to each docker image. Signed-off-by: Morgan Funtowicz * Removed old Dockerfile Signed-off-by: Morgan Funtowicz * Directly use the context and include transformers from here. Signed-off-by: Morgan Funtowicz * Reduce overall size of compiled Docker images. Signed-off-by: Morgan Funtowicz * Install jupyter by default and use CMD for easier launching of the images. Signed-off-by: Morgan Funtowicz * Reduce number of layers in the images. Signed-off-by: Morgan Funtowicz * Added README.md for notebooks. Signed-off-by: Morgan Funtowicz * Fix notebooks link in README Signed-off-by: Morgan Funtowicz * Fix some wording issues. Signed-off-by: Morgan Funtowicz * Added blog notebooks too. Signed-off-by: Morgan Funtowicz * Addressing spelling errors in review comments. 
Signed-off-by: Morgan Funtowicz Co-authored-by: MOI Anthony --- docker/Dockerfile | 7 - docker/transformers-cpu/Dockerfile | 26 + docker/transformers-gpu/Dockerfile | 26 + docker/transformers-pytorch-cpu/Dockerfile | 25 + docker/transformers-pytorch-gpu/Dockerfile | 25 + docker/transformers-tensorflow-cpu/Dockerfile | 25 + docker/transformers-tensorflow-gpu/Dockerfile | 25 + notebooks/01-training-tokenizers.ipynb | 366 ++ notebooks/02-transformers.ipynb | 502 ++ notebooks/03-pipelines.ipynb | 594 ++ notebooks/Comparing-PT-and-TF-models.ipynb | 1630 ------ .../Comparing-TF-and-PT-models-MLM-NSP.ipynb | 4815 ----------------- .../Comparing-TF-and-PT-models-SQuAD.ipynb | 1644 ------ notebooks/Comparing-TF-and-PT-models.ipynb | 1318 ----- notebooks/README.md | 17 + 15 files changed, 1631 insertions(+), 9414 deletions(-) delete mode 100644 docker/Dockerfile create mode 100644 docker/transformers-cpu/Dockerfile create mode 100644 docker/transformers-gpu/Dockerfile create mode 100644 docker/transformers-pytorch-cpu/Dockerfile create mode 100644 docker/transformers-pytorch-gpu/Dockerfile create mode 100644 docker/transformers-tensorflow-cpu/Dockerfile create mode 100644 docker/transformers-tensorflow-gpu/Dockerfile create mode 100644 notebooks/01-training-tokenizers.ipynb create mode 100644 notebooks/02-transformers.ipynb create mode 100644 notebooks/03-pipelines.ipynb delete mode 100644 notebooks/Comparing-PT-and-TF-models.ipynb delete mode 100644 notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb delete mode 100644 notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb delete mode 100644 notebooks/Comparing-TF-and-PT-models.ipynb create mode 100644 notebooks/README.md diff --git a/docker/Dockerfile b/docker/Dockerfile deleted file mode 100644 index fed834ff88e89e..00000000000000 --- a/docker/Dockerfile +++ /dev/null @@ -1,7 +0,0 @@ -FROM pytorch/pytorch:latest - -RUN git clone https://github.com/NVIDIA/apex.git && cd apex && python setup.py install --cuda_ext --cpp_ext - -RUN pip install transformers - -WORKDIR /workspace \ No newline at end of file diff --git a/docker/transformers-cpu/Dockerfile b/docker/transformers-cpu/Dockerfile new file mode 100644 index 00000000000000..0d22039a481f0d --- /dev/null +++ b/docker/transformers-cpu/Dockerfile @@ -0,0 +1,26 @@ +FROM ubuntu:18.04 +LABEL maintainer="Hugging Face" +LABEL repository="transformers" + +RUN apt update && \ + apt install -y bash \ + build-essential \ + git \ + curl \ + ca-certificates \ + python3 \ + python3-pip && \ + rm -rf /var/lib/apt/lists + +RUN python3 -m pip install --no-cache-dir --upgrade pip && \ + python3 -m pip install --no-cache-dir \ + jupyter \ + tensorflow-cpu \ + torch + +WORKDIR /workspace +COPY . transformers/ +RUN cd transformers/ && \ + python3 -m pip install --no-cache-dir . + +CMD ["/bin/bash"] \ No newline at end of file diff --git a/docker/transformers-gpu/Dockerfile b/docker/transformers-gpu/Dockerfile new file mode 100644 index 00000000000000..6d68d2e4809757 --- /dev/null +++ b/docker/transformers-gpu/Dockerfile @@ -0,0 +1,26 @@ +FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 +LABEL maintainer="Hugging Face" +LABEL repository="transformers" + +RUN apt update && \ + apt install -y bash \ + build-essential \ + git \ + curl \ + ca-certificates \ + python3 \ + python3-pip && \ + rm -rf /var/lib/apt/lists + +RUN python3 -m pip install --no-cache-dir --upgrade pip && \ + python3 -m pip install --no-cache-dir \ + jupyter \ + tensorflow \ + torch + +WORKDIR /workspace +COPY . 
transformers/ +RUN cd transformers/ && \ + python3 -m pip install --no-cache-dir . + +CMD ["/bin/bash"] \ No newline at end of file diff --git a/docker/transformers-pytorch-cpu/Dockerfile b/docker/transformers-pytorch-cpu/Dockerfile new file mode 100644 index 00000000000000..d1759d650b84fd --- /dev/null +++ b/docker/transformers-pytorch-cpu/Dockerfile @@ -0,0 +1,25 @@ +FROM ubuntu:18.04 +LABEL maintainer="Hugging Face" +LABEL repository="transformers" + +RUN apt update && \ + apt install -y bash \ + build-essential \ + git \ + curl \ + ca-certificates \ + python3 \ + python3-pip && \ + rm -rf /var/lib/apt/lists + +RUN python3 -m pip install --no-cache-dir --upgrade pip && \ + python3 -m pip install --no-cache-dir \ + jupyter \ + torch + +WORKDIR /workspace +COPY . transformers/ +RUN cd transformers/ && \ + python3 -m pip install --no-cache-dir . + +CMD ["/bin/bash"] \ No newline at end of file diff --git a/docker/transformers-pytorch-gpu/Dockerfile b/docker/transformers-pytorch-gpu/Dockerfile new file mode 100644 index 00000000000000..4beff57dc9f694 --- /dev/null +++ b/docker/transformers-pytorch-gpu/Dockerfile @@ -0,0 +1,25 @@ +FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 +LABEL maintainer="Hugging Face" +LABEL repository="transformers" + +RUN apt update && \ + apt install -y bash \ + build-essential \ + git \ + curl \ + ca-certificates \ + python3 \ + python3-pip && \ + rm -rf /var/lib/apt/lists + +RUN python3 -m pip install --no-cache-dir --upgrade pip && \ + python3 -m pip install --no-cache-dir \ + mkl \ + torch + +WORKDIR /workspace +COPY . transformers/ +RUN cd transformers/ && \ + python3 -m pip install --no-cache-dir . + +CMD ["/bin/bash"] \ No newline at end of file diff --git a/docker/transformers-tensorflow-cpu/Dockerfile b/docker/transformers-tensorflow-cpu/Dockerfile new file mode 100644 index 00000000000000..e4af2b84bdeb34 --- /dev/null +++ b/docker/transformers-tensorflow-cpu/Dockerfile @@ -0,0 +1,25 @@ +FROM ubuntu:18.04 +LABEL maintainer="Hugging Face" +LABEL repository="transformers" + +RUN apt update && \ + apt install -y bash \ + build-essential \ + git \ + curl \ + ca-certificates \ + python3 \ + python3-pip && \ + rm -rf /var/lib/apt/lists + +RUN python3 -m pip install --no-cache-dir --upgrade pip && \ + python3 -m pip install --no-cache-dir \ + mkl \ + tensorflow-cpu + +WORKDIR /workspace +COPY . transformers/ +RUN cd transformers/ && \ + python3 -m pip install --no-cache-dir . + +CMD ["/bin/bash"] \ No newline at end of file diff --git a/docker/transformers-tensorflow-gpu/Dockerfile b/docker/transformers-tensorflow-gpu/Dockerfile new file mode 100644 index 00000000000000..3277434c9f0a60 --- /dev/null +++ b/docker/transformers-tensorflow-gpu/Dockerfile @@ -0,0 +1,25 @@ +FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 +LABEL maintainer="Hugging Face" +LABEL repository="transformers" + +RUN apt update && \ + apt install -y bash \ + build-essential \ + git \ + curl \ + ca-certificates \ + python3 \ + python3-pip && \ + rm -rf /var/lib/apt/lists + +RUN python3 -m pip install --no-cache-dir --upgrade pip && \ + python3 -m pip install --no-cache-dir \ + mkl \ + tensorflow + +WORKDIR /workspace +COPY . transformers/ +RUN cd transformers/ && \ + python3 -m pip install --no-cache-dir . 
+ +CMD ["/bin/bash"] \ No newline at end of file diff --git a/notebooks/01-training-tokenizers.ipynb b/notebooks/01-training-tokenizers.ipynb new file mode 100644 index 00000000000000..554d25d3ff70e1 --- /dev/null +++ b/notebooks/01-training-tokenizers.ipynb @@ -0,0 +1,366 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "## Tokenization doesn't have to be slow !\n", + "\n", + "### Introduction\n", + "\n", + "Before going deep into any Machine Learning or Deep Learning Natural Language Processing models, every practitioner\n", + "should find a way to map raw input strings to a representation understandable by a trainable model.\n", + "\n", + "One very simple approach would be to split inputs over every space and assign an identifier to each word. This approach\n", + "would look similar to the code below in python\n", + "\n", + "```python\n", + "s = \"very long corpus...\"\n", + "words = s.split(\" \") # Split over space\n", + "vocabulary = dict(enumerate(set(words))) # Map storing the word to it's corresponding id\n", + "```\n", + "\n", + "This approach might work well if your vocabulary remains small as it would store every word (or **token**) present in your original\n", + "input. Moreover, word variations like \"cat\" and \"cats\" would not share the same identifiers even if their meaning is \n", + "quite close.\n", + "\n", + "![tokenization_simple](https://cdn.analyticsvidhya.com/wp-content/uploads/2019/11/tokenization.png)\n", + "\n", + "### Subtoken Tokenization\n", + "\n", + "To overcome the issues described above, recent works have been done on tokenization, leveraging \"subtoken\" tokenization.\n", + "**Subtokens** extends the previous splitting strategy to furthermore explode a word into grammatically logicial sub-components learned\n", + "from the data.\n", + "\n", + "Taking our previous example of the words __cat__ and __cats__, a sub-tokenization of the word __cats__ would be [cat, ##s]. Where the prefix _\"##\"_ indicates a subtoken of the initial input. \n", + "Such training algorithms might extract sub-tokens such as _\"##ing\"_, _\"##ed\"_ over English corpus.\n", + "\n", + "As you might think of, this kind of sub-tokens construction leveraging compositions of _\"pieces\"_ overall reduces the size\n", + "of the vocabulary you have to carry to train a Machine Learning model. On the other side, as one token might be exploded\n", + "into multiple subtokens, the input of your model might increase and become an issue on model with non-linear complexity over the input sequence's length. 
\n", + " \n", + "![subtokenization](https://nlp.fast.ai/images/multifit_vocabularies.png)\n", + " \n", + "Among all the tokenization algorithms, we can highlight a few subtokens algorithms used in Transformers-based SoTA models : \n", + "\n", + "- [Byte Pair Encoding (BPE) - Neural Machine Translation of Rare Words with Subword Units (Sennrich et al., 2015)](https://arxiv.org/abs/1508.07909)\n", + "- [Word Piece - Japanese and Korean voice search (Schuster, M., and Nakajima, K., 2015)](https://research.google/pubs/pub37842/)\n", + "- [Unigram Language Model - Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates (Kudo, T., 2018)](https://arxiv.org/abs/1804.10959)\n", + "- [Sentence Piece - A simple and language independent subword tokenizer and detokenizer for Neural Text Processing (Taku Kudo and John Richardson, 2018)](https://arxiv.org/abs/1808.06226)\n", + "\n", + "Going through all of them is out of the scope of this notebook, so we will just highlight how you can use them.\n", + "\n", + "### @huggingface/tokenizers library \n", + "Along with the transformers library, we @huggingface provide a blazing fast tokenization library\n", + "able to train, tokenize and decode dozens of Gb/s of text on a common multi-core machine.\n", + "\n", + "The library is written in Rust allowing us to take full advantage of multi-core parallel computations in a native and memory-aware way, on-top of which \n", + "we provide bindings for Python and NodeJS (more bindings may be added in the future). \n", + "\n", + "We designed the library so that it provides all the required blocks to create end-to-end tokenizers in an interchangeable way. In that sense, we provide\n", + "these various components: \n", + "\n", + "- **Normalizer**: Executes all the initial transformations over the initial input string. For example when you need to\n", + "lowercase some text, maybe strip it, or even apply one of the common unicode normalization process, you will add a Normalizer. \n", + "- **PreTokenizer**: In charge of splitting the initial input string. That's the component that decides where and how to\n", + "pre-segment the origin string. The simplest example would be like we saw before, to simply split on spaces.\n", + "- **Model**: Handles all the sub-token discovery and generation, this part is trainable and really dependant\n", + " of your input data.\n", + "- **Post-Processor**: Provides advanced construction features to be compatible with some of the Transformers-based SoTA\n", + "models. For instance, for BERT it would wrap the tokenized sentence around [CLS] and [SEP] tokens.\n", + "- **Decoder**: In charge of mapping back a tokenized input to the original string. The decoder is usually chosen according\n", + "to the `PreTokenizer` we used previously.\n", + "- **Trainer**: Provides training capabilities to each model.\n", + "\n", + "For each of the components above we provide multiple implementations:\n", + "\n", + "- **Normalizer**: Lowercase, Unicode (NFD, NFKD, NFC, NFKC), Bert, Strip, ...\n", + "- **PreTokenizer**: ByteLevel, WhitespaceSplit, CharDelimiterSplit, Metaspace, ...\n", + "- **Model**: WordLevel, BPE, WordPiece\n", + "- **Post-Processor**: BertProcessor, ...\n", + "- **Decoder**: WordLevel, BPE, WordPiece, ...\n", + "\n", + "All of these building blocks can be combined to create working tokenization pipelines. \n", + "In the next section we will go over our first pipeline." 
+ ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n", + "is_executing": false + } + } + }, + { + "cell_type": "markdown", + "source": [ + "Alright, now we are ready to implement our first tokenization pipeline through `tokenizers`. \n", + "\n", + "For this, we will train a Byte-Pair Encoding (BPE) tokenizer on a quite small input for the purpose of this notebook.\n", + "We will work with [the file from peter Norving](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=2ahUKEwjYp9Ppru_nAhUBzIUKHfbUAG8QFjAAegQIBhAB&url=https%3A%2F%2Fnorvig.com%2Fbig.txt&usg=AOvVaw2ed9iwhcP1RKUiEROs15Dz).\n", + "This file contains around 130.000 lines of raw text that will be processed by the library to generate a working tokenizer." + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "execution_count": 2, + "outputs": [], + "source": [ + "BIG_FILE_URL = 'https://raw.githubusercontent.com/dscape/spell/master/test/resources/big.txt'\n", + "\n", + "# Let's download the file and save it somewhere\n", + "from requests import get\n", + "with open('big.txt', 'wb') as big_f:\n", + " response = get(BIG_FILE_URL, )\n", + " \n", + " if response.status_code == 200:\n", + " big_f.write(response.content)\n", + " else:\n", + " print(\"Unable to get the file: {}\".format(response.reason))\n" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% code\n", + "is_executing": false + } + } + }, + { + "cell_type": "markdown", + "source": [ + " \n", + "Now that we have our training data we need to create the overall pipeline for the tokenizer\n", + " " + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n", + "is_executing": false + } + } + }, + { + "cell_type": "code", + "execution_count": 10, + "outputs": [], + "source": [ + "# For the user's convenience `tokenizers` provides some very high-level classes encapsulating\n", + "# the overall pipeline for various well-known tokenization algorithm. \n", + "# Everything described below can be replaced by the ByteLevelBPETokenizer class. \n", + "\n", + "from tokenizers import Tokenizer\n", + "from tokenizers.decoders import ByteLevel as ByteLevelDecoder\n", + "from tokenizers.models import BPE\n", + "from tokenizers.normalizers import Lowercase, NFKC, Sequence\n", + "from tokenizers.pre_tokenizers import ByteLevel\n", + "\n", + "# First we create an empty Byte-Pair Encoding model (i.e. not trained model)\n", + "tokenizer = Tokenizer(BPE.empty())\n", + "\n", + "# Then we enable lower-casing and unicode-normalization\n", + "# The Sequence normalizer allows us to combine multiple Normalizer, that will be\n", + "# executed in sequence.\n", + "tokenizer.normalizer = Sequence([\n", + " NFKC(),\n", + " Lowercase()\n", + "])\n", + "\n", + "# Out tokenizer also needs a pre-tokenizer responsible for converting the input to a ByteLevel representation.\n", + "tokenizer.pre_tokenizer = ByteLevel()\n", + "\n", + "# And finally, let's plug a decoder so we can recover from a tokenized input to the original one\n", + "tokenizer.decoder = ByteLevelDecoder()" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% code\n", + "is_executing": false + } + } + }, + { + "cell_type": "markdown", + "source": [ + "The overall pipeline is now ready to be trained on the corpus we downloaded earlier in this notebook." 
+ ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "execution_count": 11, + "outputs": [ + { + "name": "stdout", + "text": [ + "Trained vocab size: 25000\n" + ], + "output_type": "stream" + } + ], + "source": [ + "from tokenizers.trainers import BpeTrainer\n", + "\n", + "# We initialize our trainer, giving him the details about the vocabulary we want to generate\n", + "trainer = BpeTrainer(vocab_size=25000, show_progress=True, initial_alphabet=ByteLevel.alphabet())\n", + "tokenizer.train(trainer, [\"big.txt\"])\n", + "\n", + "print(\"Trained vocab size: {}\".format(tokenizer.get_vocab_size()))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% code\n", + "is_executing": false + } + } + }, + { + "cell_type": "markdown", + "source": [ + "Et voilà ! You trained your very first tokenizer from scratch using `tokenizers`. Of course, this \n", + "covers only the basics, and you may want to have a look at the `add_special_tokens` or `special_tokens` parameters\n", + "on the `Trainer` class, but the overall process should be very similar.\n", + "\n", + "We can save the content of the model to reuse it later." + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "execution_count": 12, + "outputs": [ + { + "data": { + "text/plain": "['./vocab.json', './merges.txt']" + }, + "metadata": {}, + "output_type": "execute_result", + "execution_count": 12 + } + ], + "source": [ + "# You will see the generated files in the output.\n", + "tokenizer.model.save('.')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% code\n", + "is_executing": false + } + } + }, + { + "cell_type": "markdown", + "source": [ + "Now, let load the trained model and start using out newly trained tokenizer" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "execution_count": 13, + "outputs": [ + { + "name": "stdout", + "text": [ + "Encoded string: ['Ġthis', 'Ġis', 'Ġa', 'Ġsimple', 'Ġin', 'put', 'Ġto', 'Ġbe', 'Ġtoken', 'ized']\n", + "Decoded string: this is a simple input to be tokenized\n" + ], + "output_type": "stream" + } + ], + "source": [ + "# Let's tokenizer a simple input\n", + "tokenizer.model = BPE.from_files('vocab.json', 'merges.txt')\n", + "encoding = tokenizer.encode(\"This is a simple input to be tokenized\")\n", + "\n", + "print(\"Encoded string: {}\".format(encoding.tokens))\n", + "\n", + "decoded = tokenizer.decode(encoding.ids)\n", + "print(\"Decoded string: {}\".format(decoded))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% code\n", + "is_executing": false + } + } + }, + { + "cell_type": "markdown", + "source": [ + "The Encoding structure exposes multiple properties which are useful when working with transformers models\n", + "\n", + "- normalized_str: The input string after normalization (lower-casing, unicode, stripping, etc.)\n", + "- original_str: The input string as it was provided\n", + "- tokens: The generated tokens with their string representation\n", + "- input_ids: The generated tokens with their integer representation\n", + "- attention_mask: If your input has been padded by the tokenizer, then this would be a vector of 1 for any non padded token and 0 for padded ones.\n", + "- special_token_mask: If your input contains special tokens such as [CLS], [SEP], [MASK], [PAD], then this would be a vector with 1 in places where a 
+ ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } + } + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.6" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "source": [], + "metadata": { + "collapsed": false + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/notebooks/02-transformers.ipynb b/notebooks/02-transformers.ipynb new file mode 100644 index 00000000000000..fcd9db55cd9cf0 --- /dev/null +++ b/notebooks/02-transformers.ipynb @@ -0,0 +1,502 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "collapsed": true, + "pycharm": { + "is_executing": false, + "name": "#%% md\n" + } + }, + "source": [ + "## Introduction\n", + "The transformers library is an open-source, community-based repository to train, use and share models based on \n", + "the Transformer architecture [(Vaswani & al., 2017)](https://arxiv.org/abs/1706.03762) such as Bert [(Devlin & al., 2018)](https://arxiv.org/abs/1810.04805),\n", + "Roberta [(Liu & al., 2019)](https://arxiv.org/abs/1907.11692), GPT2 [(Radford & al., 2019)](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf),\n", + "XLNet [(Yang & al., 2019)](https://arxiv.org/abs/1906.08237), etc. \n", + "\n", + "Along with the models, the library contains multiple variations of each of them for a large variety of \n", + "downstream tasks like **Named Entity Recognition (NER)**, **Sentiment Analysis**, \n", + "**Language Modeling**, **Question Answering** and so on.\n", + "\n", + "## Before Transformer\n", + "\n", + "Back in 2017, most people using neural networks for Natural Language Processing relied on \n", + "sequential processing of the input through [Recurrent Neural Networks (RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network).\n", + "\n", + "![rnn](http://colah.github.io/posts/2015-09-NN-Types-FP/img/RNN-general.png) \n", + "\n", + "RNNs performed well on a large variety of tasks involving sequential dependency over the input sequence. \n", + "However, this sequentially-dependent process had issues modeling very long-range dependencies and \n", + "was not well suited for the kind of hardware we currently leverage, due to poor parallelization capabilities.\n",
\n", + "\n", + "Some extensions were provided by the academic community, such as Bidirectional RNN ([Schuster & Paliwal., 1997](https://www.researchgate.net/publication/3316656_Bidirectional_recurrent_neural_networks), [Graves & al., 2005](https://mediatum.ub.tum.de/doc/1290195/file.pdf)), \n", + "which can be seen as a concatenation of two sequential process, on going forward, the other one going backward over the sequence input.\n", + "\n", + "![birnn](https://miro.medium.com/max/764/1*6QnPUSv_t9BY9Fv8_aLb-Q.png)\n", + "\n", + "\n", + "And also, the Attention mechanism, which introduced a good improvement over \"raw\" RNNs by giving \n", + "a learned, weighted-importance to each element in the sequence, allowing the model to focus on important elements.\n", + "\n", + "![attention_rnn](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2017/08/Example-of-Attention.png) \n", + "\n", + "## Then comes the Transformer \n", + "\n", + "The Transformers era originally started from the work of [(Vaswani & al., 2017)](https://arxiv.org/abs/1706.03762) who\n", + "demonstrated its superiority over [Recurrent Neural Network (RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network)\n", + "on translation tasks but it quickly extended to almost all the tasks RNNs were State-of-the-Art at that time.\n", + "\n", + "One advantage of Transformer over its RNN counterpart was its non sequential attention model. Remember, the RNNs had to\n", + "iterate over each element of the input sequence one-by-one and carry an \"updatable-state\" between each hop. With Transformer\n", + "the, the model is able to look at every position in the sequence, at the same time, in one operation.\n", + "\n", + "For a deep-dive into the Transformer architecture, [The Annotated Transformer](https://nlp.seas.harvard.edu/2018/04/03/attention.html#encoder-and-decoder-stacks) \n", + "will drive you along all the details of the paper.\n", + "\n", + "![transformer-encoder-decoder](https://nlp.seas.harvard.edu/images/the-annotated-transformer_14_0.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Getting started with transformers\n", + "\n", + "For the rest of this notebook, we will use a BERT model, as it's the most simple and there are plenty of content about it\n", + "over the internet, it will be easy to dig more over this architecture if you want to.\n", + "\n", + "The transformers library allows you to benefits from large, pretrained language models without requiring a huge and costly computational\n", + "infrastructure. Most of the State-of-the-Art models are provided directly by their author and made available in the library \n", + "in PyTorch and TensorFlow in a transparent and interchangeable way. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import torch\n", + "from transformers import AutoModel, AutoTokenizer, BertTokenizer\n", + "\n", + "torch.set_grad_enabled(False)" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [], + "source": [ + "# Store the model we want to use\n", + "MODEL_NAME = \"bert-base-cased\"\n", + "\n", + "# We need to create the model and tokenizer\n", + "model = AutoModel.from_pretrained(MODEL_NAME)\n", + "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME) " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "With only the above two lines of code, you're ready to use a BERT pre-trained model. \n", + "The tokenizers will allow us to map a raw textual input to a sequence of integers representing our textual input\n", + "in a way the model can manipulate." + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tokens: ['[CLS]', 'This', 'is', 'an', 'input', 'example', '[SEP]']\n", + "Tokens id: [101, 1188, 1110, 1126, 7758, 1859, 102]\n", + "Tokens PyTorch: tensor([[ 101, 1188, 1110, 1126, 7758, 1859, 102]])\n", + "Tokenwise output: torch.Size([1, 7, 768]), Pooled output: torch.Size([1, 768])\n" + ] + } + ], + "source": [ + "# Tokens comes from a process that splits the input into sub-entities with interesting linguistic properties. \n", + "tokens = tokenizer.tokenize(\"This is an input example\")\n", + "print(\"Tokens: {}\".format(tokens))\n", + "\n", + "# This is not sufficient for the model, as it requires integers as input, \n", + "# not a problem, let's convert tokens to ids.\n", + "tokens_ids = tokenizer.convert_tokens_to_ids(tokens)\n", + "print(\"Tokens id: {}\".format(tokens_ids))\n", + "\n", + "# We need to convert to a Deep Learning framework specific format, let's use PyTorch for now.\n", + "tokens_pt = torch.tensor([tokens_ids])\n", + "print(\"Tokens PyTorch: {}\".format(tokens_pt))\n", + "\n", + "# Now we're ready to go through BERT with out input\n", + "outputs, pooled = model(tokens_pt)\n", + "print(\"Tokenwise output: {}, Pooled output: {}\".format(outputs.shape, pooled.shape))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "As you can see, BERT outputs two tensors:\n", + " - One with the generated representation for every token in the input `(1, NB_TOKENS, REPRESENTATION_SIZE)`\n", + " - One with an aggregated representation for the whole input `(1, REPRESENTATION_SIZE)`\n", + " \n", + "The first, token-based, representation can be leveraged if your task requires to keep the sequence representation and you\n", + "want to operate at a token-level. This is particularly useful for Named Entity Recognition and Question-Answering.\n", + "\n", + "The second, aggregated, representation is especially useful if you need to extract the overall context of the sequence and don't\n", + "require a fine-grained token-leven. 
This is the case for Sentiment Analysis of the sequence or Information Retrieval." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "The code you saw in the previous section introduced all the steps required to do a simple model invocation.\n", + "For more day-to-day usage, transformers provides higher-level methods that will make your NLP journey easier.\n", + "Let's improve our previous example." + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "input_ids:\n", + "\ttensor([[ 101, 1188, 1110, 1126, 7758, 1859, 102]])\n", + "token_type_ids:\n", + "\ttensor([[0, 0, 0, 0, 0, 0, 0]])\n", + "attention_mask:\n", + "\ttensor([[1, 1, 1, 1, 1, 1, 1]])\n", + "Difference with previous code: (0.0, 0.0)\n" + ] + } + ], + "source": [ + "# tokens = tokenizer.tokenize(\"This is an input example\")\n", + "# tokens_ids = tokenizer.convert_tokens_to_ids(tokens)\n", + "# tokens_pt = torch.tensor([tokens_ids])\n", + "\n", + "# This code can be factored into one line as follows:\n", + "tokens_pt2 = tokenizer.encode_plus(\"This is an input example\", return_tensors=\"pt\")\n", + "\n", + "for key, value in tokens_pt2.items():\n", + "    print(\"{}:\\n\\t{}\".format(key, value))\n", + "\n", + "outputs2, pooled2 = model(**tokens_pt2)\n", + "print(\"Difference with previous code: ({}, {})\".format((outputs2 - outputs).sum(), (pooled2 - pooled).sum()))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As you can see above, the method `encode_plus` provides a convenient way to generate all the required parameters\n", + "that will go through the model. \n", + "\n", + "Moreover, you might have noticed it generated some additional tensors: \n", + "\n", + "- token_type_ids: This tensor maps every token to its corresponding segment (see below).\n", + "- attention_mask: This tensor is used to \"mask\" padded values in a batch of sequences with different lengths (see below)."
+ ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Single segment token (str): ['[CLS]', 'This', 'is', 'a', 'sample', 'input', '[SEP]']\n", + "Single segment token (int): [101, 1188, 1110, 170, 6876, 7758, 102]\n", + "Single segment type : [0, 0, 0, 0, 0, 0, 0]\n", + "\n", + "Multi segment token (str): ['[CLS]', 'This', 'is', 'segment', 'A', '[SEP]', 'This', 'is', 'segment', 'B', '[SEP]']\n", + "Multi segment token (int): [101, 1188, 1110, 6441, 138, 102, 1188, 1110, 6441, 139, 102]\n", + "Multi segment type : [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]\n" + ] + } + ], + "source": [ + "# Single segment input\n", + "single_seg_input = tokenizer.encode_plus(\"This is a sample input\")\n", + "\n", + "# Multiple segment input\n", + "multi_seg_input = tokenizer.encode_plus(\"This is segment A\", \"This is segment B\")\n", + "\n", + "print(\"Single segment token (str): {}\".format(tokenizer.convert_ids_to_tokens(single_seg_input['input_ids'])))\n", + "print(\"Single segment token (int): {}\".format(single_seg_input['input_ids']))\n", + "print(\"Single segment type : {}\".format(single_seg_input['token_type_ids']))\n", + "\n", + "# Segments are concatenated in the input to the model, with a [SEP] token in between.\n", + "print()\n", + "print(\"Multi segment token (str): {}\".format(tokenizer.convert_ids_to_tokens(multi_seg_input['input_ids'])))\n", + "print(\"Multi segment token (int): {}\".format(multi_seg_input['input_ids']))\n", + "print(\"Multi segment type : {}\".format(multi_seg_input['token_type_ids']))" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Tokens (int) : [101, 1188, 1110, 170, 6876, 102, 0, 0]\n", + "Tokens (str) : ['[CLS]', 'This', 'is', 'a', 'sample', '[SEP]', '[PAD]', '[PAD]']\n", + "Tokens (attn_mask): [1, 1, 1, 1, 1, 1, 0, 0]\n", + "\n", + "Tokens (int) : [101, 1188, 1110, 1330, 2039, 6876, 3087, 102]\n", + "Tokens (str) : ['[CLS]', 'This', 'is', 'another', 'longer', 'sample', 'text', '[SEP]']\n", + "Tokens (attn_mask): [1, 1, 1, 1, 1, 1, 1, 1]\n", + "\n" + ] + } + ], + "source": [ + "# Padding highlight\n", + "tokens = tokenizer.batch_encode_plus(\n", + "    [\"This is a sample\", \"This is another longer sample text\"], \n", + "    pad_to_max_length=True  # First sentence will have some PADDED tokens to match second sequence length\n", + ")\n", + "\n", + "for i in range(2):\n", + "    print(\"Tokens (int)      : {}\".format(tokens['input_ids'][i]))\n", + "    print(\"Tokens (str)      : {}\".format([tokenizer.convert_ids_to_tokens(s) for s in tokens['input_ids'][i]]))\n", + "    print(\"Tokens (attn_mask): {}\".format(tokens['attention_mask'][i]))\n", + "    print()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Frameworks interoperability\n", + "\n", + "One of the most powerful features of transformers is its ability to seamlessly move from PyTorch to TensorFlow\n", + "without pain for the user.\n", + "\n", + "We provide some convenient methods to load TensorFlow pretrained weights inside a PyTorch model and vice versa.\n",
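+ "\n",
+ "For example, here is a minimal sketch of loading PyTorch weights into a TensorFlow model (it reuses the `model_pt` created in the next cell, and the local directory name is just an illustration):\n",
+ "\n",
+ "```python\n",
+ "# Save the PyTorch weights locally, then load them into the TensorFlow counterpart.\n",
+ "model_pt.save_pretrained('./bert-base-cased-pt')\n",
+ "model_tf_from_pt = TFBertModel.from_pretrained('./bert-base-cased-pt', from_pt=True)\n",
+ "```"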
+ ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import TFBertModel, BertModel\n", + "\n", + "# Let's load a BERT model for TensorFlow and PyTorch\n", + "model_tf = TFBertModel.from_pretrained('bert-base-cased')\n", + "model_pt = BertModel.from_pretrained('bert-base-cased')" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "output differences: 2.971128560602665e-05\n", + "pooled differences: -8.576549589633942e-06\n" + ] + } + ], + "source": [ + "# transformers generates a ready-to-use dictionary with all the required parameters for the specific framework.\n", + "input_tf = tokenizer.encode_plus(\"This is a sample input\", return_tensors=\"tf\")\n", + "input_pt = tokenizer.encode_plus(\"This is a sample input\", return_tensors=\"pt\")\n", + "\n", + "# Let's compare the outputs\n", + "output_tf, output_pt = model_tf(input_tf), model_pt(**input_pt)\n", + "\n", + "# The model outputs 2 values (the representation for each token, and the pooled representation of the input sentence).\n", + "# Here we compare the output differences between PyTorch and TensorFlow.\n", + "for name, o_tf, o_pt in zip([\"output\", \"pooled\"], output_tf, output_pt):\n", + "    print(\"{} differences: {}\".format(name, (o_tf.numpy() - o_pt.numpy()).sum()))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Want it lighter? Faster? Let's talk distillation!\n", + "\n", + "One of the main concerns when using these Transformer-based models is the computational power they require. Throughout this notebook we are using a BERT model, as it can be run on common machines, but that's not the case for all of the models.\n", + "\n", + "For example, Google released **T5** a few months ago, an Encoder/Decoder architecture based on the Transformer and available in `transformers`, with no fewer than 11 billion parameters. Microsoft also recently entered the game with **Turing-NLG** using 17 billion parameters. These kinds of models require tens of gigabytes to store the weights and a tremendous compute infrastructure to run, which makes them impractical for most people!\n", + "\n", + "![transformers-parameters](https://lh5.googleusercontent.com/NRdXzEcgZV3ooykjIaTm9uvbr9QnSjDQHHAHb2kk_Lm9lIF0AhS-PJdXGzpcBDztax922XAp386hyNmWZYsZC1lUN2r4Ip5p9v-PHO19-jevRGg4iQFxgv5Olq4DWaqSA_8ptep7)\n", + "\n", + "With the goal of making Transformer-based NLP accessible to everyone, we @huggingface developed models that take advantage of a training process called **Distillation**, which allows us to drastically reduce the resources needed to run such models with almost zero drop in performance.\n", + "\n", + "Going over the whole Distillation process is out of the scope of this notebook, but if you want more information on the subject you may refer to [this Medium article written by my colleague Victor SANH, author of the DistilBERT paper](https://medium.com/huggingface/distilbert-8cf3380435b5); you might also want to have a look directly at the paper [(Sanh & al., 2019)](https://arxiv.org/abs/1910.01108).\n", + "\n", + "Of course, in `transformers` we have distilled some models and made them available directly in the library!"
" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 57.1 ms, sys: 2.44 ms, total: 59.5 ms\n", + "Wall time: 35.5 ms\n", + "CPU times: user 98.8 ms, sys: 725 µs, total: 99.5 ms\n", + "Wall time: 50 ms\n" + ] + } + ], + "source": [ + "from transformers import DistilBertModel\n", + "\n", + "bert_distil = DistilBertModel.from_pretrained('distilbert-base-cased')\n", + "input_pt = tokenizer.encode_plus(\n", + " 'This is a sample input to demonstrate performance of distiled models especially inference time', \n", + " return_tensors=\"pt\"\n", + ")\n", + "\n", + "\n", + "%time _ = bert_distil(input_pt['input_ids'])\n", + "%time _ = model_pt(input_pt['input_ids'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Community provided models\n", + "\n", + "Last but not least, earlier in this notebook we introduced Hugging Face `transformers` as a repository for the NLP community to exchange pretrained models. We wanted to highlight this features and all the possibilities it offers for the end-user.\n", + "\n", + "To leverage community pretrained models, just provide the organisation name and name of the model to `from_pretrained` and it will do all the magic for you ! \n", + "\n", + "\n", + "We currently have more 50 models provided by the community and more are added every day, don't hesitate to give it a try !" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": {}, + "outputs": [], + "source": [ + "# Let's load German BERT from the Bavarian State Library\n", + "de_bert = BertModel.from_pretrained(\"dbmdz/bert-base-german-cased\")\n", + "de_tokenizer = BertTokenizer.from_pretrained(\"dbmdz/bert-base-german-cased\")\n", + "\n", + "de_input = de_tokenizer.encode_plus(\n", + " \"Hugging Face ist einen französische Firma Mitarbeitern in New-York.\",\n", + " return_tensors=\"pt\"\n", + ")\n", + "output_de, pooled_de = de_bert(**de_input)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "source": [], + "metadata": { + "collapsed": false + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} \ No newline at end of file diff --git a/notebooks/03-pipelines.ipynb b/notebooks/03-pipelines.ipynb new file mode 100644 index 00000000000000..9a5b3f7c4f10be --- /dev/null +++ b/notebooks/03-pipelines.ipynb @@ -0,0 +1,594 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## How can I leverage State-of-the-Art Natural Language Models with only one line of code ?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "Newly introduced in transformers v2.3.0, **pipelines** provides a high-level, easy to use,\n", + "API for doing inference over a variety of downstream-tasks, including: \n", + "\n", + "- Sentence Classification (Sentiment Analysis): Indicate if the overall sentence is either positive or negative. 
_(Binary Classification task or Logistic Regression task)_\n", + "- Token Classification (Named Entity Recognition, Part-of-Speech tagging): For each sub-entity _(**token**)_ in the input, assign it a label _(Classification task)_.\n", + "- Question-Answering: Provided a tuple (question, context), the model should find the span of text in **context** answering the **question**.\n", + "- Mask-Filling: Suggests possible word(s) to fill the masked input with respect to the provided **context**.\n", + "- Feature Extraction: Maps the input to a higher, multi-dimensional space learned from the data.\n", + "\n", + "Pipelines encapsulate the overall process of every NLP task:\n", + " \n", + " 1. Tokenization: Split the initial input into multiple sub-entities with ... properties (i.e. tokens).\n", + " 2. Inference: Maps every token into a more meaningful representation. \n", + " 3. Decoding: Use the above representation to generate and/or extract the final output for the underlying task.\n", + "\n", + "The overall API is exposed to the end-user through the `pipeline()` method with the following \n", + "structure:\n", + "\n", + "```python\n", + "from transformers import pipeline\n", + "\n", + "# Using default model and tokenizer for the task\n", + "pipeline(\"<task-name>\")\n", + "\n", + "# Using a user-specified model\n", + "pipeline(\"<task-name>\", model=\"<model-name>\")\n", + "\n", + "# Using custom model/tokenizer as str\n", + "pipeline('<task-name>', model='<model-name>', tokenizer='<tokenizer-name>')\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code \n" + } + }, + "outputs": [], + "source": [ + "from __future__ import print_function\n", + "import numpy as np\n", + "from ipywidgets import interact, interactive, fixed, interact_manual\n", + "import ipywidgets as widgets\n", + "from transformers import pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## 1. 
Sentence Classification - Sentiment Analysis" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6aeccfdf51994149bdd1f3d3533e380f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "[{'label': 'POSITIVE', 'score': 0.800251},\n", + " {'label': 'NEGATIVE', 'score': 1.2489903}]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nlp_sentence_classif = pipeline('sentiment-analysis')\n", + "nlp_sentence_classif(['Such a nice weather outside !', 'This movie was kind of boring.'])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## 2. Token Classification - Named Entity Recognition" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "b5549c53c27346a899af553c977f00bc", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "[{'word': 'Hu', 'score': 0.9970937967300415, 'entity': 'I-ORG'},\n", + " {'word': '##gging', 'score': 0.9345750212669373, 'entity': 'I-ORG'},\n", + " {'word': 'Face', 'score': 0.9787060022354126, 'entity': 'I-ORG'},\n", + " {'word': 'French', 'score': 0.9981995820999146, 'entity': 'I-MISC'},\n", + " {'word': 'New', 'score': 0.9983047246932983, 'entity': 'I-LOC'},\n", + " {'word': '-', 'score': 0.8913455009460449, 'entity': 'I-LOC'},\n", + " {'word': 'York', 'score': 0.9979523420333862, 'entity': 'I-LOC'}]" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nlp_token_class = pipeline('ner')\n", + "nlp_token_class('Hugging Face is a French company based in New-York.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
Question Answering" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "6e56a8edcef44ec2ae838711ecd22d3a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 53.05it/s]\n", + "add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 2673.23it/s]\n" + ] + }, + { + "data": { + "text/plain": [ + "{'score': 0.9632966867654424, 'start': 42, 'end': 50, 'answer': 'New-York.'}" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nlp_qa = pipeline('question-answering')\n", + "nlp_qa(context='Hugging Face is a French company based in New-York.', question='Where is based Hugging Face ?')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Text Generation - Mask Filling" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "1930695ea2d24ca98c6d7c13842d377f", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "[{'sequence': ' Hugging Face is a French company based in Paris',\n", + " 'score': 0.25288480520248413,\n", + " 'token': 2201},\n", + " {'sequence': ' Hugging Face is a French company based in Lyon',\n", + " 'score': 0.07639515399932861,\n", + " 'token': 12790},\n", + " {'sequence': ' Hugging Face is a French company based in Brussels',\n", + " 'score': 0.055500105023384094,\n", + " 'token': 6497},\n", + " {'sequence': ' Hugging Face is a French company based in Geneva',\n", + " 'score': 0.04264815151691437,\n", + " 'token': 11559},\n", + " {'sequence': ' Hugging Face is a French company based in France',\n", + " 'score': 0.03868963569402695,\n", + " 'token': 1470}]" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nlp_fill = pipeline('fill-mask')\n", + "nlp_fill('Hugging Face is a French company based in ')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. 
Projection - Feature Extraction" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "92fa4d67290f49a3943dc0abd7529892", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=230.0, style=ProgressStyle(description_…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "(1, 12, 768)" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "nlp_features = pipeline('feature-extraction')\n", + "output = nlp_features('Hugging Face is a French company based in Paris')\n", + "np.array(output).shape  # (Samples, Tokens, Vector Size)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "Alright! Now you have a nice picture of what is possible through transformers' pipelines, and there is more\n", + "to come in future releases. \n", + "\n", + "In the meantime, you can try the different pipelines with your own inputs." + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% code\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "261ae9fa30e84d1d84a3b0d9682ac477", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Dropdown(description='Task:', index=1, options=('sentiment-analysis', 'ner', 'fill_mask'), value='ner')" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "ddc51b71c6eb40e5ab60998664e6a857", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Text(value='', description='Your input:', placeholder='Enter something')" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[{'word': 'Paris', 'score': 0.9991844296455383, 'entity': 'I-LOC'}]\n", + "[{'sequence': ' I\'m from Paris.\"', 'score': 0.224044069647789, 'token': 72}, {'sequence': \" I'm from Paris.)\", 'score': 0.16959427297115326, 'token': 1592}, {'sequence': \" I'm from Paris.]\", 'score': 0.10994981974363327, 'token': 21838}, {'sequence': ' I\'m from Paris!\"', 'score': 0.0706234946846962, 'token': 2901}, {'sequence': \" I'm from Paris.\", 'score': 0.0698278620839119, 'token': 4}]\n", + "[{'sequence': \" I'm from Paris and London\", 'score': 0.12238534539937973, 'token': 928}, {'sequence': \" I'm from Paris and Brussels\", 'score': 0.07107886672019958, 'token': 6497}, {'sequence': \" I'm from Paris and Belgium\", 'score': 0.040912602096796036, 'token': 7320}, {'sequence': \" I'm from Paris and Berlin\", 'score': 0.039884064346551895, 'token': 5459}, {'sequence': \" I'm from Paris and Melbourne\", 'score': 0.038133684545755386, 'token': 5703}]\n", + "[{'sequence': ' I like go to sleep', 'score': 0.08942786604166031, 'token': 3581}, {'sequence': ' I like go to bed', 'score': 0.07789064943790436, 'token': 3267}, {'sequence': ' I like go to concerts', 'score': 0.06356740742921829, 'token': 12858}, {'sequence': ' I like go to school', 'score': 0.03660670667886734, 
'token': 334}, {'sequence': ' I like go to dinner', 'score': 0.032155368477106094, 'token': 3630}]\n" + ] + } + ], + "source": [ + "task = widgets.Dropdown(\n", + "    options=['sentiment-analysis', 'ner', 'fill_mask'],\n", + "    value='ner',\n", + "    description='Task:',\n", + "    disabled=False\n", + ")\n", + "\n", + "input = widgets.Text(\n", + "    value='',\n", + "    placeholder='Enter something',\n", + "    description='Your input:',\n", + "    disabled=False\n", + ")\n", + "\n", + "def forward(_):\n", + "    if len(input.value) > 0: \n", + "        if task.value == 'ner':\n", + "            output = nlp_token_class(input.value)\n", + "        elif task.value == 'sentiment-analysis':\n", + "            output = nlp_sentence_classif(input.value)\n", + "        else:\n", + "            if input.value.find('<mask>') == -1:\n", + "                output = nlp_fill(input.value + ' <mask>')\n", + "            else:\n", + "                output = nlp_fill(input.value)\n", + "        print(output)\n", + "\n", + "input.on_submit(forward)\n", + "display(task, input)" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": { + "pycharm": { + "is_executing": false, + "name": "#%% Question Answering\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "5ae68677bd8a41f990355aa43840d3f8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Textarea(value='Einstein is famous for the general theory of relativity', description='Context:', placeholder=…" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "14bcfd9a2c5a47e6b1383989ab7632c8", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + "Text(value='Why is Einstein famous for ?', description='Question:', placeholder='Enter something')" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "convert squad examples to features: 100%|██████████| 1/1 [00:00<00:00, 168.83it/s]\n", + "add example index and unique id: 100%|██████████| 1/1 [00:00<00:00, 1919.59it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'score': 0.40340670623875496, 'start': 27, 'end': 54, 'answer': 'general theory of relativity'}\n" + ] + } + ], + "source": [ + "context = widgets.Textarea(\n", + "    value='Einstein is famous for the general theory of relativity',\n", + "    placeholder='Enter something',\n", + "    description='Context:',\n", + "    disabled=False\n", + ")\n", + "\n", + "query = widgets.Text(\n", + "    value='Why is Einstein famous for ?',\n", + "    placeholder='Enter something',\n", + "    description='Question:',\n", + "    disabled=False\n", + ")\n", + "\n", + "def forward(_):\n", + "    if len(context.value) > 0 and len(query.value) > 0: \n", + "        output = nlp_qa(question=query.value, context=context.value) \n", + "        print(output)\n", + "\n", + "query.on_submit(forward)\n", + "display(context, query)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "source": [], + "metadata": { + "collapsed": false + } + } + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} \ No newline at end of file diff --git a/notebooks/Comparing-PT-and-TF-models.ipynb 
b/notebooks/Comparing-PT-and-TF-models.ipynb deleted file mode 100644 index 321c2ebe30e215..00000000000000 --- a/notebooks/Comparing-PT-and-TF-models.ipynb +++ /dev/null @@ -1,1630 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Pytorch to Tensorflow Conversion Test Notebook\n", - "\n", - "To run this notebook follow these steps, modifying the **Config** section as necessary:\n", - "\n", - "1. Point `pt_model_dir` to your local directory containing the pytorch Bert model to be converted.\n", - "2. Point `tf_bert_dir` to your clone of Google's Bert implementation which can be found here: https://github.com/google-research/bert.\n", - "\n", - "Note: \n", - "1. This feature currently only supports the base BERT models (uncased/cased).\n", - "2. Tensorflow model will be dumped in `tf_model_dir`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Config" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import sys\n", - "\n", - "model_cls = 'BertModel'\n", - "model_typ = 'bert-base-uncased'\n", - "token_cls = 'BertTokenizer'\n", - "max_seq = 12\n", - "CLS = \"[CLS]\"\n", - "SEP = \"[SEP]\"\n", - "MASK = \"[MASK]\"\n", - "CLS_IDX = 0\n", - "layer_idxs = tuple(range(12))\n", - "input_text = \"jim henson was a puppeteer\"\n", - "\n", - "pt_model_dir = \"/home/ubuntu/.pytorch-pretrained-BERT-cache/{}\".format(model_typ)\n", - "tf_bert_dir = \"/home/ubuntu/bert\"\n", - "\n", - "pt_vocab_file = os.path.join(pt_model_dir, \"vocab.txt\")\n", - "pt_init_ckpt = os.path.join(pt_model_dir, model_typ.replace(\"-\", \"_\") + \".bin\")\n", - "tf_model_dir = os.path.join(pt_model_dir, 'tf')\n", - "tf_vocab_file = os.path.join(tf_model_dir, \"vocab.txt\")\n", - "tf_init_ckpt = os.path.join(tf_model_dir, model_typ.replace(\"-\", \"_\") + \".ckpt\")\n", - "tf_config_file = os.path.join(tf_model_dir, \"bert_config.json\")\n", - "\n", - "if not os.path.isdir(tf_model_dir): \n", - " os.makedirs(tf_model_dir, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Tokenization" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "def tokenize(text, tokenizer):\n", - " text = text.strip().lower()\n", - " tok_ids = tokenizer.tokenize(text)\n", - " if len(tok_ids) > max_seq - 2:\n", - " tok_ids = tok_ids[:max_seq - 2]\n", - " tok_ids.insert(CLS_IDX, CLS)\n", - " tok_ids.append(SEP)\n", - " input_ids = tokenizer.convert_tokens_to_ids(tok_ids)\n", - " mask_ids = [1] * len(input_ids)\n", - " seg_ids = [0] * len(input_ids)\n", - " padding = [0] * (max_seq - len(input_ids))\n", - " input_ids += padding\n", - " mask_ids += padding\n", - " seg_ids += padding\n", - " return input_ids, mask_ids, seg_ids" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pytorch execution" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|██████████| 231508/231508 [00:00<00:00, 41092464.26B/s]\n", - "100%|██████████| 407873900/407873900 [00:07<00:00, 58092479.52B/s]\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Pytorch embedding shape: (1, 768)\n" - ] - } - ], - "source": [ - "import numpy as np\n", - "import torch\n", - "from pytorch_pretrained_bert import (BertConfig,\n", - " BertModel, \n", - " BertTokenizer, \n", - " 
BertForSequenceClassification)\n", - "\n", - "# Save Vocab\n", - "pt_tokenizer = BertTokenizer.from_pretrained(\n", - " pretrained_model_name_or_path=model_typ, \n", - " cache_dir=pt_model_dir)\n", - "pt_tokenizer.save_vocabulary(pt_model_dir)\n", - "pt_tokenizer.save_vocabulary(tf_model_dir)\n", - "\n", - "# Save Model\n", - "pt_model = BertModel.from_pretrained(\n", - " pretrained_model_name_or_path=model_typ, \n", - " cache_dir=pt_model_dir).to('cpu')\n", - "pt_model.eval()\n", - "pt_model.config.hidden_dropout_prob = 0.0\n", - "pt_model.config.attention_probs_dropout_prob = 0.0\n", - "pt_model.config.to_json_file(tf_config_file)\n", - "torch.save(pt_model.state_dict(), pt_init_ckpt)\n", - "\n", - "# Inputs\n", - "input_ids_pt, mask_ids_pt, seg_ids_pt = tokenize(input_text, pt_tokenizer)\n", - "\n", - "# PT Embedding\n", - "tok_tensor = torch.tensor(input_ids_pt).to('cpu').unsqueeze(0)\n", - "seg_tensor = torch.tensor(seg_ids_pt).to('cpu').unsqueeze(0)\n", - "msk_tensor = torch.tensor(mask_ids_pt).to('cpu').unsqueeze(0)\n", - "attn_blks, nsp_logits = pt_model(tok_tensor, seg_tensor, msk_tensor)\n", - "pt_embedding = nsp_logits.detach().numpy() \n", - "print(\"Pytorch embedding shape: {}\".format(pt_embedding.shape))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pytorch → Tensorflow conversion" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/nlp/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Colocations handled automatically by placer.\n", - "bert/embeddings/word_embeddings initialized\n", - "bert/embeddings/position_embeddings initialized\n", - "bert/embeddings/token_type_embeddings initialized\n", - "bert/embeddings/LayerNorm/gamma initialized\n", - "bert/embeddings/LayerNorm/beta initialized\n", - "bert/encoder/layer_0/attention/self/query/kernel initialized\n", - "bert/encoder/layer_0/attention/self/query/bias initialized\n", - "bert/encoder/layer_0/attention/self/key/kernel initialized\n", - "bert/encoder/layer_0/attention/self/key/bias initialized\n", - "bert/encoder/layer_0/attention/self/value/kernel initialized\n", - "bert/encoder/layer_0/attention/self/value/bias initialized\n", - "bert/encoder/layer_0/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_0/attention/output/dense/bias initialized\n", - "bert/encoder/layer_0/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_0/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_0/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_0/intermediate/dense/bias initialized\n", - "bert/encoder/layer_0/output/dense/kernel initialized\n", - "bert/encoder/layer_0/output/dense/bias initialized\n", - "bert/encoder/layer_0/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_0/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_1/attention/self/query/kernel initialized\n", - "bert/encoder/layer_1/attention/self/query/bias initialized\n", - "bert/encoder/layer_1/attention/self/key/kernel initialized\n", - "bert/encoder/layer_1/attention/self/key/bias initialized\n", - "bert/encoder/layer_1/attention/self/value/kernel initialized\n", - "bert/encoder/layer_1/attention/self/value/bias 
initialized\n", - "bert/encoder/layer_1/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_1/attention/output/dense/bias initialized\n", - "bert/encoder/layer_1/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_1/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_1/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_1/intermediate/dense/bias initialized\n", - "bert/encoder/layer_1/output/dense/kernel initialized\n", - "bert/encoder/layer_1/output/dense/bias initialized\n", - "bert/encoder/layer_1/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_1/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_2/attention/self/query/kernel initialized\n", - "bert/encoder/layer_2/attention/self/query/bias initialized\n", - "bert/encoder/layer_2/attention/self/key/kernel initialized\n", - "bert/encoder/layer_2/attention/self/key/bias initialized\n", - "bert/encoder/layer_2/attention/self/value/kernel initialized\n", - "bert/encoder/layer_2/attention/self/value/bias initialized\n", - "bert/encoder/layer_2/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_2/attention/output/dense/bias initialized\n", - "bert/encoder/layer_2/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_2/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_2/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_2/intermediate/dense/bias initialized\n", - "bert/encoder/layer_2/output/dense/kernel initialized\n", - "bert/encoder/layer_2/output/dense/bias initialized\n", - "bert/encoder/layer_2/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_2/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_3/attention/self/query/kernel initialized\n", - "bert/encoder/layer_3/attention/self/query/bias initialized\n", - "bert/encoder/layer_3/attention/self/key/kernel initialized\n", - "bert/encoder/layer_3/attention/self/key/bias initialized\n", - "bert/encoder/layer_3/attention/self/value/kernel initialized\n", - "bert/encoder/layer_3/attention/self/value/bias initialized\n", - "bert/encoder/layer_3/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_3/attention/output/dense/bias initialized\n", - "bert/encoder/layer_3/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_3/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_3/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_3/intermediate/dense/bias initialized\n", - "bert/encoder/layer_3/output/dense/kernel initialized\n", - "bert/encoder/layer_3/output/dense/bias initialized\n", - "bert/encoder/layer_3/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_3/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_4/attention/self/query/kernel initialized\n", - "bert/encoder/layer_4/attention/self/query/bias initialized\n", - "bert/encoder/layer_4/attention/self/key/kernel initialized\n", - "bert/encoder/layer_4/attention/self/key/bias initialized\n", - "bert/encoder/layer_4/attention/self/value/kernel initialized\n", - "bert/encoder/layer_4/attention/self/value/bias initialized\n", - "bert/encoder/layer_4/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_4/attention/output/dense/bias initialized\n", - "bert/encoder/layer_4/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_4/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_4/intermediate/dense/kernel initialized\n", - 
"bert/encoder/layer_4/intermediate/dense/bias initialized\n", - "bert/encoder/layer_4/output/dense/kernel initialized\n", - "bert/encoder/layer_4/output/dense/bias initialized\n", - "bert/encoder/layer_4/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_4/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_5/attention/self/query/kernel initialized\n", - "bert/encoder/layer_5/attention/self/query/bias initialized\n", - "bert/encoder/layer_5/attention/self/key/kernel initialized\n", - "bert/encoder/layer_5/attention/self/key/bias initialized\n", - "bert/encoder/layer_5/attention/self/value/kernel initialized\n", - "bert/encoder/layer_5/attention/self/value/bias initialized\n", - "bert/encoder/layer_5/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_5/attention/output/dense/bias initialized\n", - "bert/encoder/layer_5/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_5/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_5/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_5/intermediate/dense/bias initialized\n", - "bert/encoder/layer_5/output/dense/kernel initialized\n", - "bert/encoder/layer_5/output/dense/bias initialized\n", - "bert/encoder/layer_5/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_5/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_6/attention/self/query/kernel initialized\n", - "bert/encoder/layer_6/attention/self/query/bias initialized\n", - "bert/encoder/layer_6/attention/self/key/kernel initialized\n", - "bert/encoder/layer_6/attention/self/key/bias initialized\n", - "bert/encoder/layer_6/attention/self/value/kernel initialized\n", - "bert/encoder/layer_6/attention/self/value/bias initialized\n", - "bert/encoder/layer_6/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_6/attention/output/dense/bias initialized\n", - "bert/encoder/layer_6/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_6/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_6/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_6/intermediate/dense/bias initialized\n", - "bert/encoder/layer_6/output/dense/kernel initialized\n", - "bert/encoder/layer_6/output/dense/bias initialized\n", - "bert/encoder/layer_6/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_6/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_7/attention/self/query/kernel initialized\n", - "bert/encoder/layer_7/attention/self/query/bias initialized\n", - "bert/encoder/layer_7/attention/self/key/kernel initialized\n", - "bert/encoder/layer_7/attention/self/key/bias initialized\n", - "bert/encoder/layer_7/attention/self/value/kernel initialized\n", - "bert/encoder/layer_7/attention/self/value/bias initialized\n", - "bert/encoder/layer_7/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_7/attention/output/dense/bias initialized\n", - "bert/encoder/layer_7/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_7/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_7/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_7/intermediate/dense/bias initialized\n", - "bert/encoder/layer_7/output/dense/kernel initialized\n", - "bert/encoder/layer_7/output/dense/bias initialized\n", - "bert/encoder/layer_7/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_7/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_8/attention/self/query/kernel initialized\n", - 
"bert/encoder/layer_8/attention/self/query/bias initialized\n", - "bert/encoder/layer_8/attention/self/key/kernel initialized\n", - "bert/encoder/layer_8/attention/self/key/bias initialized\n", - "bert/encoder/layer_8/attention/self/value/kernel initialized\n", - "bert/encoder/layer_8/attention/self/value/bias initialized\n", - "bert/encoder/layer_8/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_8/attention/output/dense/bias initialized\n", - "bert/encoder/layer_8/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_8/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_8/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_8/intermediate/dense/bias initialized\n", - "bert/encoder/layer_8/output/dense/kernel initialized\n", - "bert/encoder/layer_8/output/dense/bias initialized\n", - "bert/encoder/layer_8/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_8/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_9/attention/self/query/kernel initialized\n", - "bert/encoder/layer_9/attention/self/query/bias initialized\n", - "bert/encoder/layer_9/attention/self/key/kernel initialized\n", - "bert/encoder/layer_9/attention/self/key/bias initialized\n", - "bert/encoder/layer_9/attention/self/value/kernel initialized\n", - "bert/encoder/layer_9/attention/self/value/bias initialized\n", - "bert/encoder/layer_9/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_9/attention/output/dense/bias initialized\n", - "bert/encoder/layer_9/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_9/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_9/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_9/intermediate/dense/bias initialized\n", - "bert/encoder/layer_9/output/dense/kernel initialized\n", - "bert/encoder/layer_9/output/dense/bias initialized\n", - "bert/encoder/layer_9/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_9/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_10/attention/self/query/kernel initialized\n", - "bert/encoder/layer_10/attention/self/query/bias initialized\n", - "bert/encoder/layer_10/attention/self/key/kernel initialized\n", - "bert/encoder/layer_10/attention/self/key/bias initialized\n", - "bert/encoder/layer_10/attention/self/value/kernel initialized\n", - "bert/encoder/layer_10/attention/self/value/bias initialized\n", - "bert/encoder/layer_10/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_10/attention/output/dense/bias initialized\n", - "bert/encoder/layer_10/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_10/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_10/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_10/intermediate/dense/bias initialized\n", - "bert/encoder/layer_10/output/dense/kernel initialized\n", - "bert/encoder/layer_10/output/dense/bias initialized\n", - "bert/encoder/layer_10/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_10/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_11/attention/self/query/kernel initialized\n", - "bert/encoder/layer_11/attention/self/query/bias initialized\n", - "bert/encoder/layer_11/attention/self/key/kernel initialized\n", - "bert/encoder/layer_11/attention/self/key/bias initialized\n", - "bert/encoder/layer_11/attention/self/value/kernel initialized\n", - "bert/encoder/layer_11/attention/self/value/bias initialized\n", - 
"bert/encoder/layer_11/attention/output/dense/kernel initialized\n", - "bert/encoder/layer_11/attention/output/dense/bias initialized\n", - "bert/encoder/layer_11/attention/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_11/attention/output/LayerNorm/beta initialized\n", - "bert/encoder/layer_11/intermediate/dense/kernel initialized\n", - "bert/encoder/layer_11/intermediate/dense/bias initialized\n", - "bert/encoder/layer_11/output/dense/kernel initialized\n", - "bert/encoder/layer_11/output/dense/bias initialized\n", - "bert/encoder/layer_11/output/LayerNorm/gamma initialized\n", - "bert/encoder/layer_11/output/LayerNorm/beta initialized\n", - "bert/pooler/dense/kernel initialized\n", - "bert/pooler/dense/bias initialized\n" - ] - } - ], - "source": [ - "from pytorch_pretrained_bert.convert_pytorch_checkpoint_to_tf import main\n", - "\n", - "main([\n", - " '--model_name', model_typ, \n", - " '--pytorch_model_path', pt_init_ckpt,\n", - " '--tf_cache_dir', tf_model_dir,\n", - " '--cache_dir', pt_model_dir\n", - "])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tensorflow execution" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.\n", - "For more information, please see:\n", - " * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n", - " * https://github.com/tensorflow/addons\n", - "If you depend on functionality not listed there, please file an issue.\n", - "\n", - "WARNING:tensorflow:From /home/ubuntu/bert/modeling.py:671: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use keras.layers.dense instead.\n", - "WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/nlp/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\n", - "Instructions for updating:\n", - "Use standard file APIs to check for files with this prefix.\n", - "INFO:tensorflow:Restoring parameters from /home/ubuntu/.pytorch-pretrained-BERT-cache/bert-base-uncased/tf/bert_base_uncased.ckpt\n", - "Tensorflow embedding shape: (1, 768)\n" - ] - } - ], - "source": [ - "import tensorflow as tf\n", - "sys.path.insert(0, tf_bert_dir)\n", - "import modeling\n", - "import tokenization\n", - "\n", - "tf.reset_default_graph()\n", - "\n", - "# Process text\n", - "tf_tokenizer = tokenization.FullTokenizer(vocab_file=tf_vocab_file)\n", - "\n", - "# Graph inputs\n", - "input_ids_tf, mask_ids_tf, seg_ids_tf = tokenize(input_text, tf_tokenizer)\n", - "config = modeling.BertConfig.from_json_file(\n", - " os.path.join(tf_model_dir, 'bert_config.json'))\n", - "input_tensor = tf.placeholder(\n", - " dtype=tf.int32,\n", - " shape=[1, None],\n", - " name='input_ids')\n", - "mask_tensor = tf.placeholder(\n", - " dtype=tf.int32,\n", - " shape=[1, None],\n", - " name='mask_ids')\n", - "seg_tensor = tf.placeholder(\n", - " dtype=tf.int32,\n", - " shape=[1, None],\n", - " name='seg_ids')\n", - "tf_model = modeling.BertModel(\n", - " config=config,\n", - " is_training=False,\n", - " input_ids=input_tensor,\n", - " input_mask=mask_tensor,\n", - " token_type_ids=seg_tensor,\n", - " use_one_hot_embeddings=False)\n", - "output_layer = 
tf_model.get_pooled_output()\n", - "\n", - "# Load tf model\n", - "session = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))\n", - "vars_to_load = [v for v in tf.global_variables()]\n", - "session.run(tf.variables_initializer(var_list=vars_to_load))\n", - "saver = tf.train.Saver(vars_to_load)\n", - "saver.restore(session, save_path=tf_init_ckpt)\n", - "\n", - "# TF Embedding\n", - "fetches = output_layer\n", - "feed_dict = {\n", - " input_tensor: [input_ids_tf],\n", - " mask_tensor: [mask_ids_tf],\n", - " seg_tensor: [seg_ids_tf]\n", - "}\n", - "tf_embedding = session.run(fetches=fetches, feed_dict=feed_dict)\n", - "print(\"Tensorflow embedding shape: {}\".format(tf_embedding.shape))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Compare Tokenization" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "TOKEN_IDS_PT: [101, 3958, 27227, 2001, 1037, 13997, 11510, 102, 0, 0, 0, 0]\n", - "TOKEN_IDS_TF: [101, 3958, 27227, 2001, 1037, 13997, 11510, 102, 0, 0, 0, 0]\n", - "SEG_IDS_PT: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "SEG_IDS_TF: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "MASK_IDS_PT: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]\n", - "MASK_IDS_TF: [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]\n" - ] - } - ], - "source": [ - "print(\"TOKEN_IDS_PT: {}\".format(input_ids_pt))\n", - "print(\"TOKEN_IDS_TF: {}\".format(input_ids_tf))\n", - "print(\"SEG_IDS_PT: {}\".format(seg_ids_pt))\n", - "print(\"SEG_IDS_TF: {}\".format(seg_ids_tf))\n", - "print(\"MASK_IDS_PT: {}\".format(mask_ids_pt))\n", - "print(\"MASK_IDS_TF: {}\".format(mask_ids_tf))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Compare Model Weights" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "bert/embeddings/word_embeddings\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (30522, 768) values: [-0.01018257 -0.06154883 -0.02649689 -0.0420608 0.00116716]\n", - "TF: shape: (30522, 768) values: [-0.01018257 -0.06154883 -0.02649689 -0.0420608 0.00116716]\n", - "\n", - "bert/embeddings/token_type_embeddings\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (2, 768) values: [0.00043164 0.01098826 0.00370439 0.00150542 0.00057812]\n", - "TF: shape: (2, 768) values: [0.00043164 0.01098826 0.00370439 0.00150542 0.00057812]\n", - "\n", - "bert/embeddings/position_embeddings\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (512, 768) values: [ 0.01750538 -0.02563101 -0.03664156 -0.02528613 0.00797095]\n", - "TF: shape: (512, 768) values: [ 0.01750538 -0.02563101 -0.03664156 -0.02528613 0.00797095]\n", - "\n", - "bert/embeddings/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.02591471 -0.0195513 0.02423946 0.08904593 -0.06281059]\n", - "TF: shape: (768,) values: [-0.02591471 -0.0195513 0.02423946 0.08904593 -0.06281059]\n", - "\n", - "bert/embeddings/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.9260566 0.8851115 0.85807985 0.8616906 0.8937205 ]\n", - "TF: shape: (768,) values: [0.9260566 0.8851115 0.85807985 0.8616906 0.8937205 ]\n", - "\n", - "bert/encoder/layer_0/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01640572 -0.03257025 0.01046295 -0.04442816 -0.02256124]\n", - "TF: shape: (768, 768) values: [-0.01640572 
-0.03257025 0.01046295 -0.04442816 -0.02256124]\n", - "\n", - "bert/encoder/layer_0/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.58488506 -0.3312432 -0.43010172 0.37446147 -0.29811692]\n", - "TF: shape: (768,) values: [ 0.58488506 -0.3312432 -0.43010172 0.37446147 -0.29811692]\n", - "\n", - "bert/encoder/layer_0/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.00807745 0.02652155 -0.01866494 0.01797846 0.00450485]\n", - "TF: shape: (768, 768) values: [ 0.00807745 0.02652155 -0.01866494 0.01797846 0.00450485]\n", - "\n", - "bert/encoder/layer_0/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00104306 0.00035106 -0.0024626 -0.00010567 -0.00119283]\n", - "TF: shape: (768,) values: [ 0.00104306 0.00035106 -0.0024626 -0.00010567 -0.00119283]\n", - "\n", - "bert/encoder/layer_0/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.01144261 -0.02663044 0.01911472 -0.02206182 -0.00287949]\n", - "TF: shape: (768, 768) values: [ 0.01144261 -0.02663044 0.01911472 -0.02206182 -0.00287949]\n", - "\n", - "bert/encoder/layer_0/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.01184616 -0.01596605 -0.00251847 0.01736802 0.00449983]\n", - "TF: shape: (768,) values: [-0.01184616 -0.01596605 -0.00251847 0.01736802 0.00449983]\n", - "\n", - "bert/encoder/layer_0/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.00581949 0.03170148 -0.06135742 -0.01706108 -0.00759045]\n", - "TF: shape: (768, 768) values: [ 0.00581949 0.03170148 -0.06135742 -0.01706108 -0.00759045]\n", - "\n", - "bert/encoder/layer_0/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00511063 -0.0166625 0.02812938 -0.01166061 0.01942627]\n", - "TF: shape: (768,) values: [ 0.00511063 -0.0166625 0.02812938 -0.01166061 0.01942627]\n", - "\n", - "bert/encoder/layer_0/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.25779155 -0.03077853 -0.2772697 -0.38847703 0.36841765]\n", - "TF: shape: (768,) values: [ 0.25779155 -0.03077853 -0.2772697 -0.38847703 0.36841765]\n", - "\n", - "bert/encoder/layer_0/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.9803408 0.959969 0.96368986 0.9603653 0.9801324 ]\n", - "TF: shape: (768,) values: [0.9803408 0.959969 0.96368986 0.9603653 0.9801324 ]\n", - "\n", - "bert/encoder/layer_0/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.01010427 -0.060398 -0.01468864 0.00311493 0.02862451]\n", - "TF: shape: (768, 3072) values: [-0.01010427 -0.060398 -0.01468864 0.00311493 0.02862451]\n", - "\n", - "bert/encoder/layer_0/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.11498757 -0.09629171 -0.12399033 -0.129036 -0.06369043]\n", - "TF: shape: (3072,) values: [-0.11498757 -0.09629171 -0.12399033 -0.129036 -0.06369043]\n", - "\n", - "bert/encoder/layer_0/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.03710171 0.0648794 0.00758566 -0.05224452 -0.04348791]\n", - "TF: shape: (3072, 768) values: [-0.03710171 0.0648794 0.00758566 -0.05224452 -0.04348791]\n", - "\n", - "bert/encoder/layer_0/output/dense/bias\n", - 
"|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.04801027 0.19766568 0.02154854 0.02880666 0.0444298 ]\n", - "TF: shape: (768,) values: [-0.04801027 0.19766568 0.02154854 0.02880666 0.0444298 ]\n", - "\n", - "bert/encoder/layer_0/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.10142924 -0.00499344 0.04274083 0.09324206 -0.10700516]\n", - "TF: shape: (768,) values: [-0.10142924 -0.00499344 0.04274083 0.09324206 -0.10700516]\n", - "\n", - "bert/encoder/layer_0/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.7835125 0.8072406 0.7670588 0.73706394 0.76303864]\n", - "TF: shape: (768,) values: [0.7835125 0.8072406 0.7670588 0.73706394 0.76303864]\n", - "\n", - "bert/encoder/layer_1/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.03132744 -0.01340016 -0.07761582 0.0655639 -0.00337808]\n", - "TF: shape: (768, 768) values: [ 0.03132744 -0.01340016 -0.07761582 0.0655639 -0.00337808]\n", - "\n", - "bert/encoder/layer_1/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.27827993 0.17387655 -0.2497937 -0.8809636 0.41262135]\n", - "TF: shape: (768,) values: [-0.27827993 0.17387655 -0.2497937 -0.8809636 0.41262135]\n", - "\n", - "bert/encoder/layer_1/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.03353037 0.04007257 0.05320328 -0.02166729 -0.03581231]\n", - "TF: shape: (768, 768) values: [-0.03353037 0.04007257 0.05320328 -0.02166729 -0.03581231]\n", - "\n", - "bert/encoder/layer_1/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00504407 0.00136887 -0.00394336 0.00646125 -0.00148919]\n", - "TF: shape: (768,) values: [-0.00504407 0.00136887 -0.00394336 0.00646125 -0.00148919]\n", - "\n", - "bert/encoder/layer_1/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00464159 0.06674305 -0.00970626 -0.0276653 -0.01597566]\n", - "TF: shape: (768, 768) values: [-0.00464159 0.06674305 -0.00970626 -0.0276653 -0.01597566]\n", - "\n", - "bert/encoder/layer_1/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00381288 0.02650839 -0.0059689 -0.00508269 -0.01293722]\n", - "TF: shape: (768,) values: [ 0.00381288 0.02650839 -0.0059689 -0.00508269 -0.01293722]\n", - "\n", - "bert/encoder/layer_1/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01390745 -0.01100563 0.01303005 -0.01969771 0.0125082 ]\n", - "TF: shape: (768, 768) values: [-0.01390745 -0.01100563 0.01303005 -0.01969771 0.0125082 ]\n", - "\n", - "bert/encoder/layer_1/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.02946591 0.05715097 0.01293636 0.01920356 0.00805334]\n", - "TF: shape: (768,) values: [0.02946591 0.05715097 0.01293636 0.01920356 0.00805334]\n", - "\n", - "bert/encoder/layer_1/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.08583715 0.14199966 -0.0856637 -0.18797271 0.21056814]\n", - "TF: shape: (768,) values: [ 0.08583715 0.14199966 -0.0856637 -0.18797271 0.21056814]\n", - "\n", - "bert/encoder/layer_1/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.896962 0.87148863 0.8531161 0.8690647 0.9488987 ]\n", - "TF: 
shape: (768,) values: [0.896962 0.87148863 0.8531161 0.8690647 0.9488987 ]\n", - "\n", - "bert/encoder/layer_1/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [ 0.01841293 -0.02650284 -0.09708428 -0.01734244 -0.05529237]\n", - "TF: shape: (768, 3072) values: [ 0.01841293 -0.02650284 -0.09708428 -0.01734244 -0.05529237]\n", - "\n", - "bert/encoder/layer_1/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.15203774 -0.10449131 -0.08440229 -0.09323178 -0.08511415]\n", - "TF: shape: (3072,) values: [-0.15203774 -0.10449131 -0.08440229 -0.09323178 -0.08511415]\n", - "\n", - "bert/encoder/layer_1/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.02372648 0.03326349 0.08291997 -0.01519038 0.01868557]\n", - "TF: shape: (3072, 768) values: [-0.02372648 0.03326349 0.08291997 -0.01519038 0.01868557]\n", - "\n", - "bert/encoder/layer_1/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.02514724 0.09868994 -0.027811 0.03749462 0.01086514]\n", - "TF: shape: (768,) values: [-0.02514724 0.09868994 -0.027811 0.03749462 0.01086514]\n", - "\n", - "bert/encoder/layer_1/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.07662535 -0.10506564 0.03191236 0.07633785 -0.11187791]\n", - "TF: shape: (768,) values: [-0.07662535 -0.10506564 0.03191236 0.07633785 -0.11187791]\n", - "\n", - "bert/encoder/layer_1/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.9017883 0.8868776 0.8862677 0.85865664 0.87496454]\n", - "TF: shape: (768,) values: [0.9017883 0.8868776 0.8862677 0.85865664 0.87496454]\n", - "\n", - "bert/encoder/layer_2/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.08433672 0.09580533 0.07543895 -0.01126779 -0.01354045]\n", - "TF: shape: (768, 768) values: [ 0.08433672 0.09580533 0.07543895 -0.01126779 -0.01354045]\n", - "\n", - "bert/encoder/layer_2/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.0371241 0.03406003 0.27713948 -0.21613775 -0.05275448]\n", - "TF: shape: (768,) values: [ 0.0371241 0.03406003 0.27713948 -0.21613775 -0.05275448]\n", - "\n", - "bert/encoder/layer_2/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.04794507 0.02517631 -0.01319554 -0.02094732 0.09073472]\n", - "TF: shape: (768, 768) values: [ 0.04794507 0.02517631 -0.01319554 -0.02094732 0.09073472]\n", - "\n", - "bert/encoder/layer_2/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00037404 -0.00125881 -0.00114734 -0.00157741 0.00037122]\n", - "TF: shape: (768,) values: [-0.00037404 -0.00125881 -0.00114734 -0.00157741 0.00037122]\n", - "\n", - "bert/encoder/layer_2/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01119406 -0.01488636 -0.02960914 0.04746444 0.00428481]\n", - "TF: shape: (768, 768) values: [-0.01119406 -0.01488636 -0.02960914 0.04746444 0.00428481]\n", - "\n", - "bert/encoder/layer_2/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.02728729 0.04979054 0.08326469 0.04150949 0.600959 ]\n", - "TF: shape: (768,) values: [-0.02728729 0.04979054 0.08326469 0.04150949 0.600959 ]\n", - "\n", - 
"bert/encoder/layer_2/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.00517425 0.01197957 0.0393172 -0.0063884 -0.02673388]\n", - "TF: shape: (768, 768) values: [ 0.00517425 0.01197957 0.0393172 -0.0063884 -0.02673388]\n", - "\n", - "bert/encoder/layer_2/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.01754025 0.1226335 -0.05733554 0.06844623 0.00879776]\n", - "TF: shape: (768,) values: [ 0.01754025 0.1226335 -0.05733554 0.06844623 0.00879776]\n", - "\n", - "bert/encoder/layer_2/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.1490809 0.12386955 -0.19382021 -0.26515856 0.32723007]\n", - "TF: shape: (768,) values: [ 0.1490809 0.12386955 -0.19382021 -0.26515856 0.32723007]\n", - "\n", - "bert/encoder/layer_2/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8983343 0.88877076 0.86283594 0.8584952 0.9587886 ]\n", - "TF: shape: (768,) values: [0.8983343 0.88877076 0.86283594 0.8584952 0.9587886 ]\n", - "\n", - "bert/encoder/layer_2/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.01619919 0.00662888 0.01492284 -0.01280748 0.01318596]\n", - "TF: shape: (768, 3072) values: [-0.01619919 0.00662888 0.01492284 -0.01280748 0.01318596]\n", - "\n", - "bert/encoder/layer_2/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.08474881 -0.12850781 -0.11550345 -0.09513011 -0.02519853]\n", - "TF: shape: (3072,) values: [-0.08474881 -0.12850781 -0.11550345 -0.09513011 -0.02519853]\n", - "\n", - "bert/encoder/layer_2/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.07225161 -0.0129784 0.00618811 -0.01593373 -0.02160194]\n", - "TF: shape: (3072, 768) values: [-0.07225161 -0.0129784 0.00618811 -0.01593373 -0.02160194]\n", - "\n", - "bert/encoder/layer_2/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.06319264 0.06169628 -0.03041368 0.00924282 0.06277442]\n", - "TF: shape: (768,) values: [-0.06319264 0.06169628 -0.03041368 0.00924282 0.06277442]\n", - "\n", - "bert/encoder/layer_2/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.1139038 -0.11665309 0.07883061 0.07796711 -0.14219187]\n", - "TF: shape: (768,) values: [-0.1139038 -0.11665309 0.07883061 0.07796711 -0.14219187]\n", - "\n", - "bert/encoder/layer_2/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8813261 0.85744697 0.8511922 0.85261875 0.8329574 ]\n", - "TF: shape: (768,) values: [0.8813261 0.85744697 0.8511922 0.85261875 0.8329574 ]\n", - "\n", - "bert/encoder/layer_3/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.05855456 -0.00111438 -0.00828963 0.04117409 -0.07591715]\n", - "TF: shape: (768, 768) values: [ 0.05855456 -0.00111438 -0.00828963 0.04117409 -0.07591715]\n", - "\n", - "bert/encoder/layer_3/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.09740101 -0.19290674 0.04332267 0.17937997 -0.08023558]\n", - "TF: shape: (768,) values: [ 0.09740101 -0.19290674 0.04332267 0.17937997 -0.08023558]\n", - "\n", - "bert/encoder/layer_3/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.02562077 
0.02507281 -0.03361562 0.05613289 -0.05435724]\n", - "TF: shape: (768, 768) values: [ 0.02562077 0.02507281 -0.03361562 0.05613289 -0.05435724]\n", - "\n", - "bert/encoder/layer_3/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00188639 -0.00379197 -0.01020415 0.00969649 -0.00094182]\n", - "TF: shape: (768,) values: [ 0.00188639 -0.00379197 -0.01020415 0.00969649 -0.00094182]\n", - "\n", - "bert/encoder/layer_3/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00539032 0.00959642 0.01325458 0.00490616 0.0129908 ]\n", - "TF: shape: (768, 768) values: [-0.00539032 0.00959642 0.01325458 0.00490616 0.0129908 ]\n", - "\n", - "bert/encoder/layer_3/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.04573824 0.05405985 0.00681163 0.00655945 0.01141771]\n", - "TF: shape: (768,) values: [0.04573824 0.05405985 0.00681163 0.00655945 0.01141771]\n", - "\n", - "bert/encoder/layer_3/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.01850341 0.03148198 0.02705758 -0.0004669 0.01367511]\n", - "TF: shape: (768, 768) values: [ 0.01850341 0.03148198 0.02705758 -0.0004669 0.01367511]\n", - "\n", - "bert/encoder/layer_3/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.01981483 0.03566506 -0.05016088 0.02958186 0.04989756]\n", - "TF: shape: (768,) values: [ 0.01981483 0.03566506 -0.05016088 0.02958186 0.04989756]\n", - "\n", - "bert/encoder/layer_3/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.09815404 0.00063774 -0.01257733 -0.26485074 0.22568701]\n", - "TF: shape: (768,) values: [ 0.09815404 0.00063774 -0.01257733 -0.26485074 0.22568701]\n", - "\n", - "bert/encoder/layer_3/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.91457725 0.88453823 0.8340887 0.84203583 0.95247847]\n", - "TF: shape: (768,) values: [0.91457725 0.88453823 0.8340887 0.84203583 0.95247847]\n", - "\n", - "bert/encoder/layer_3/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.02733567 0.03307878 -0.01331292 -0.00032527 0.03252084]\n", - "TF: shape: (768, 3072) values: [-0.02733567 0.03307878 -0.01331292 -0.00032527 0.03252084]\n", - "\n", - "bert/encoder/layer_3/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.11436842 -0.15038085 -0.07842971 0.01335877 -0.09492484]\n", - "TF: shape: (3072,) values: [-0.11436842 -0.15038085 -0.07842971 0.01335877 -0.09492484]\n", - "\n", - "bert/encoder/layer_3/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.01751153 0.01631314 -0.02660011 0.03569947 -0.01394763]\n", - "TF: shape: (3072, 768) values: [-0.01751153 0.01631314 -0.02660011 0.03569947 -0.01394763]\n", - "\n", - "bert/encoder/layer_3/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03873252 0.08414765 -0.0399323 0.01997361 0.12924597]\n", - "TF: shape: (768,) values: [-0.03873252 0.08414765 -0.0399323 0.01997361 0.12924597]\n", - "\n", - "bert/encoder/layer_3/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.08049371 -0.06923949 -0.03357155 0.05231095 -0.09717073]\n", - "TF: shape: (768,) values: [-0.08049371 -0.06923949 -0.03357155 0.05231095 
-0.09717073]\n", - "\n", - "bert/encoder/layer_3/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.827748 0.83012533 0.82399255 0.81772 0.80794513]\n", - "TF: shape: (768,) values: [0.827748 0.83012533 0.82399255 0.81772 0.80794513]\n", - "\n", - "bert/encoder/layer_4/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.08296382 0.02076941 0.06525186 -0.02659729 0.03491377]\n", - "TF: shape: (768, 768) values: [ 0.08296382 0.02076941 0.06525186 -0.02659729 0.03491377]\n", - "\n", - "bert/encoder/layer_4/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.07045844 -0.13412629 -0.0514146 0.00061329 0.1248519 ]\n", - "TF: shape: (768,) values: [ 0.07045844 -0.13412629 -0.0514146 0.00061329 0.1248519 ]\n", - "\n", - "bert/encoder/layer_4/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.06941643 0.08133814 -0.0453992 0.0668715 -0.06014847]\n", - "TF: shape: (768, 768) values: [ 0.06941643 0.08133814 -0.0453992 0.0668715 -0.06014847]\n", - "\n", - "bert/encoder/layer_4/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00588725 -0.00235185 0.00281131 0.00173088 -0.00546653]\n", - "TF: shape: (768,) values: [-0.00588725 -0.00235185 0.00281131 0.00173088 -0.00546653]\n", - "\n", - "bert/encoder/layer_4/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.06889665 0.06645385 0.01232084 0.0132611 -0.01595679]\n", - "TF: shape: (768, 768) values: [ 0.06889665 0.06645385 0.01232084 0.0132611 -0.01595679]\n", - "\n", - "bert/encoder/layer_4/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.01126871 -0.02704018 0.0301532 0.02332082 -0.04233487]\n", - "TF: shape: (768,) values: [-0.01126871 -0.02704018 0.0301532 0.02332082 -0.04233487]\n", - "\n", - "bert/encoder/layer_4/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.02285513 -0.04172142 -0.0146292 0.04862929 -0.0442014 ]\n", - "TF: shape: (768, 768) values: [ 0.02285513 -0.04172142 -0.0146292 0.04862929 -0.0442014 ]\n", - "\n", - "bert/encoder/layer_4/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.03054528 0.00479777 -0.02729505 -0.0325212 -0.00525727]\n", - "TF: shape: (768,) values: [ 0.03054528 0.00479777 -0.02729505 -0.0325212 -0.00525727]\n", - "\n", - "bert/encoder/layer_4/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00903359 0.0052285 -0.02841488 -0.22355485 0.28281343]\n", - "TF: shape: (768,) values: [ 0.00903359 0.0052285 -0.02841488 -0.22355485 0.28281343]\n", - "\n", - "bert/encoder/layer_4/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8849676 0.86927813 0.8114595 0.80269504 0.94864094]\n", - "TF: shape: (768,) values: [0.8849676 0.86927813 0.8114595 0.80269504 0.94864094]\n", - "\n", - "bert/encoder/layer_4/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.00639783 0.06198016 -0.03184223 0.00485356 -0.02453273]\n", - "TF: shape: (768, 3072) values: [-0.00639783 0.06198016 -0.03184223 0.00485356 -0.02453273]\n", - "\n", - "bert/encoder/layer_4/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: 
(3072,) values: [-0.08770327 -0.11779705 -0.11764182 -0.00192611 -0.1335473 ]\n", - "TF: shape: (3072,) values: [-0.08770327 -0.11779705 -0.11764182 -0.00192611 -0.1335473 ]\n", - "\n", - "bert/encoder/layer_4/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.05421264 0.0221118 -0.02674172 0.03672203 -0.02399626]\n", - "TF: shape: (3072, 768) values: [-0.05421264 0.0221118 -0.02674172 0.03672203 -0.02399626]\n", - "\n", - "bert/encoder/layer_4/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.05068972 0.04838871 0.01156022 0.05381602 0.08857913]\n", - "TF: shape: (768,) values: [-0.05068972 0.04838871 0.01156022 0.05381602 0.08857913]\n", - "\n", - "bert/encoder/layer_4/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.04338909 -0.0781464 -0.01518662 0.04936362 -0.12378412]\n", - "TF: shape: (768,) values: [-0.04338909 -0.0781464 -0.01518662 0.04936362 -0.12378412]\n", - "\n", - "bert/encoder/layer_4/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8734387 0.8576282 0.8339444 0.8450325 0.8105372]\n", - "TF: shape: (768,) values: [0.8734387 0.8576282 0.8339444 0.8450325 0.8105372]\n", - "\n", - "bert/encoder/layer_5/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00858843 -0.03920127 0.02552994 -0.02786552 0.02436485]\n", - "TF: shape: (768, 768) values: [-0.00858843 -0.03920127 0.02552994 -0.02786552 0.02436485]\n", - "\n", - "bert/encoder/layer_5/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00859117 -0.01642405 -0.04391079 0.01085692 0.02925887]\n", - "TF: shape: (768,) values: [-0.00859117 -0.01642405 -0.04391079 0.01085692 0.02925887]\n", - "\n", - "bert/encoder/layer_5/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.00352847 0.02330176 -0.00369894 -0.03904612 0.00294574]\n", - "TF: shape: (768, 768) values: [ 0.00352847 0.02330176 -0.00369894 -0.03904612 0.00294574]\n", - "\n", - "bert/encoder/layer_5/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.01087186 -0.01176561 0.00016575 -0.01163023 0.00946616]\n", - "TF: shape: (768,) values: [-0.01087186 -0.01176561 0.00016575 -0.01163023 0.00946616]\n", - "\n", - "bert/encoder/layer_5/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.06134222 0.04238288 0.02796064 -0.01284983 0.03683741]\n", - "TF: shape: (768, 768) values: [ 0.06134222 0.04238288 0.02796064 -0.01284983 0.03683741]\n", - "\n", - "bert/encoder/layer_5/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.05061118 -0.02954445 -0.0034053 -0.00025261 0.0437019 ]\n", - "TF: shape: (768,) values: [ 0.05061118 -0.02954445 -0.0034053 -0.00025261 0.0437019 ]\n", - "\n", - "bert/encoder/layer_5/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00739815 0.0533964 -0.03736389 -0.04999201 0.01693069]\n", - "TF: shape: (768, 768) values: [-0.00739815 0.0533964 -0.03736389 -0.04999201 0.01693069]\n", - "\n", - "bert/encoder/layer_5/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.0021682 0.01711399 -0.04201518 0.01605333 0.00552063]\n", - "TF: shape: (768,) values: [-0.0021682 0.01711399 
-0.04201518 0.01605333 0.00552063]\n", - "\n", - "bert/encoder/layer_5/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.06841327 -0.0146848 0.09792476 -0.23284538 0.2785602 ]\n", - "TF: shape: (768,) values: [-0.06841327 -0.0146848 0.09792476 -0.23284538 0.2785602 ]\n", - "\n", - "bert/encoder/layer_5/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8908311 0.87884724 0.81637293 0.8047641 0.96539867]\n", - "TF: shape: (768,) values: [0.8908311 0.87884724 0.81637293 0.8047641 0.96539867]\n", - "\n", - "bert/encoder/layer_5/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.03246041 0.07251058 -0.08201726 0.00772481 0.02532209]\n", - "TF: shape: (768, 3072) values: [-0.03246041 0.07251058 -0.08201726 0.00772481 0.02532209]\n", - "\n", - "bert/encoder/layer_5/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.09689714 -0.27696273 -0.13047501 -0.10892326 -0.1057625 ]\n", - "TF: shape: (3072,) values: [-0.09689714 -0.27696273 -0.13047501 -0.10892326 -0.1057625 ]\n", - "\n", - "bert/encoder/layer_5/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [ 0.0642072 -0.01738782 -0.05095377 0.00523853 0.04425264]\n", - "TF: shape: (3072, 768) values: [ 0.0642072 -0.01738782 -0.05095377 0.00523853 0.04425264]\n", - "\n", - "bert/encoder/layer_5/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.0007217 0.06006297 0.0016595 0.03848181 0.06703516]\n", - "TF: shape: (768,) values: [-0.0007217 0.06006297 0.0016595 0.03848181 0.06703516]\n", - "\n", - "bert/encoder/layer_5/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00278729 -0.05594506 -0.0631047 0.06023621 -0.18672828]\n", - "TF: shape: (768,) values: [-0.00278729 -0.05594506 -0.0631047 0.06023621 -0.18672828]\n", - "\n", - "bert/encoder/layer_5/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8621183 0.8515807 0.82654256 0.81729776 0.7985204 ]\n", - "TF: shape: (768,) values: [0.8621183 0.8515807 0.82654256 0.81729776 0.7985204 ]\n", - "\n", - "bert/encoder/layer_6/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.02527807 -0.01429243 0.01467054 0.08624706 -0.00188593]\n", - "TF: shape: (768, 768) values: [-0.02527807 -0.01429243 0.01467054 0.08624706 -0.00188593]\n", - "\n", - "bert/encoder/layer_6/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.17319514 0.27564248 0.16801168 -0.10946485 0.1643271 ]\n", - "TF: shape: (768,) values: [-0.17319514 0.27564248 0.16801168 -0.10946485 0.1643271 ]\n", - "\n", - "bert/encoder/layer_6/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.05886372 0.00706217 0.0398422 0.00882155 -0.04571463]\n", - "TF: shape: (768, 768) values: [ 0.05886372 0.00706217 0.0398422 0.00882155 -0.04571463]\n", - "\n", - "bert/encoder/layer_6/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00424696 -0.0001192 0.0046079 -0.00315606 0.00434314]\n", - "TF: shape: (768,) values: [-0.00424696 -0.0001192 0.0046079 -0.00315606 0.00434314]\n", - "\n", - "bert/encoder/layer_6/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 
768) values: [-0.01720381 0.01170722 0.02346902 -0.02284313 -0.03173028]\n", - "TF: shape: (768, 768) values: [-0.01720381 0.01170722 0.02346902 -0.02284313 -0.03173028]\n", - "\n", - "bert/encoder/layer_6/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03492057 0.01813157 -0.00182878 -0.01420629 -0.00508944]\n", - "TF: shape: (768,) values: [-0.03492057 0.01813157 -0.00182878 -0.01420629 -0.00508944]\n", - "\n", - "bert/encoder/layer_6/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.0323688 -0.00689882 0.07379091 0.01121114 -0.02059202]\n", - "TF: shape: (768, 768) values: [ 0.0323688 -0.00689882 0.07379091 0.01121114 -0.02059202]\n", - "\n", - "bert/encoder/layer_6/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00648672 -0.05935453 -0.05673229 -0.01152384 -0.02766573]\n", - "TF: shape: (768,) values: [-0.00648672 -0.05935453 -0.05673229 -0.01152384 -0.02766573]\n", - "\n", - "bert/encoder/layer_6/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.06793639 0.03157783 0.15647687 -0.15025291 0.14727171]\n", - "TF: shape: (768,) values: [-0.06793639 0.03157783 0.15647687 -0.15025291 0.14727171]\n", - "\n", - "bert/encoder/layer_6/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8882361 0.8704905 0.80289173 0.77365315 0.92333615]\n", - "TF: shape: (768,) values: [0.8882361 0.8704905 0.80289173 0.77365315 0.92333615]\n", - "\n", - "bert/encoder/layer_6/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [ 0.04492201 0.05160861 0.09041415 -0.00742628 0.048133 ]\n", - "TF: shape: (768, 3072) values: [ 0.04492201 0.05160861 0.09041415 -0.00742628 0.048133 ]\n", - "\n", - "bert/encoder/layer_6/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.09301704 -0.158612 -0.10633879 -0.09706812 -0.17319229]\n", - "TF: shape: (3072,) values: [-0.09301704 -0.158612 -0.10633879 -0.09706812 -0.17319229]\n", - "\n", - "bert/encoder/layer_6/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.00085372 -0.00974195 0.00684915 0.00038686 0.06610142]\n", - "TF: shape: (3072, 768) values: [-0.00085372 -0.00974195 0.00684915 0.00038686 0.06610142]\n", - "\n", - "bert/encoder/layer_6/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03254414 0.05681704 0.03720434 0.01936359 0.09134153]\n", - "TF: shape: (768,) values: [-0.03254414 0.05681704 0.03720434 0.01936359 0.09134153]\n", - "\n", - "bert/encoder/layer_6/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.0117129 -0.03209404 -0.08646043 0.03760341 -0.13841423]\n", - "TF: shape: (768,) values: [-0.0117129 -0.03209404 -0.08646043 0.03760341 -0.13841423]\n", - "\n", - "bert/encoder/layer_6/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8674175 0.8657014 0.8151861 0.82301307 0.8305737 ]\n", - "TF: shape: (768,) values: [0.8674175 0.8657014 0.8151861 0.82301307 0.8305737 ]\n", - "\n", - "bert/encoder/layer_7/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00075523 -0.01501983 0.04090893 0.01884826 0.04670674]\n", - "TF: shape: (768, 768) values: [-0.00075523 -0.01501983 
0.04090893 0.01884826 0.04670674]\n", - "\n", - "bert/encoder/layer_7/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.0010344 -0.00423982 0.3117479 0.04494623 -0.01260845]\n", - "TF: shape: (768,) values: [ 0.0010344 -0.00423982 0.3117479 0.04494623 -0.01260845]\n", - "\n", - "bert/encoder/layer_7/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.02781927 -0.00906972 0.02121989 0.0298591 0.05854786]\n", - "TF: shape: (768, 768) values: [ 0.02781927 -0.00906972 0.02121989 0.0298591 0.05854786]\n", - "\n", - "bert/encoder/layer_7/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00074918 0.00731079 0.00089338 0.00345652 0.00043817]\n", - "TF: shape: (768,) values: [-0.00074918 0.00731079 0.00089338 0.00345652 0.00043817]\n", - "\n", - "bert/encoder/layer_7/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01080035 -0.03468366 0.03167168 0.01583073 0.0327719 ]\n", - "TF: shape: (768, 768) values: [-0.01080035 -0.03468366 0.03167168 0.01583073 0.0327719 ]\n", - "\n", - "bert/encoder/layer_7/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.02824226 0.01605172 0.00067929 -0.04553111 0.0076044 ]\n", - "TF: shape: (768,) values: [-0.02824226 0.01605172 0.00067929 -0.04553111 0.0076044 ]\n", - "\n", - "bert/encoder/layer_7/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.05496112 0.01006968 0.02206531 -0.01873116 0.02149118]\n", - "TF: shape: (768, 768) values: [-0.05496112 0.01006968 0.02206531 -0.01873116 0.02149118]\n", - "\n", - "bert/encoder/layer_7/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00349772 -0.05831751 -0.0594084 -0.0342187 0.02965918]\n", - "TF: shape: (768,) values: [ 0.00349772 -0.05831751 -0.0594084 -0.0342187 0.02965918]\n", - "\n", - "bert/encoder/layer_7/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.02826844 0.04427591 0.05678326 -0.0475907 0.16136196]\n", - "TF: shape: (768,) values: [-0.02826844 0.04427591 0.05678326 -0.0475907 0.16136196]\n", - "\n", - "bert/encoder/layer_7/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8742141 0.870608 0.79147685 0.7595279 0.9223656 ]\n", - "TF: shape: (768,) values: [0.8742141 0.870608 0.79147685 0.7595279 0.9223656 ]\n", - "\n", - "bert/encoder/layer_7/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [ 0.03598932 -0.12225644 0.03019998 0.05691092 0.03717208]\n", - "TF: shape: (768, 3072) values: [ 0.03598932 -0.12225644 0.03019998 0.05691092 0.03717208]\n", - "\n", - "bert/encoder/layer_7/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.12465011 -0.08639494 -0.06206005 -0.08012587 -0.08773767]\n", - "TF: shape: (3072,) values: [-0.12465011 -0.08639494 -0.06206005 -0.08012587 -0.08773767]\n", - "\n", - "bert/encoder/layer_7/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.02190432 -0.02279165 0.03279508 0.01011065 -0.07793335]\n", - "TF: shape: (3072, 768) values: [-0.02190432 -0.02279165 0.03279508 0.01011065 -0.07793335]\n", - "\n", - "bert/encoder/layer_7/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 
0.0\n", - "PT: shape: (768,) values: [-0.04282642 0.03700675 0.06142357 -0.04787201 0.02958163]\n", - "TF: shape: (768,) values: [-0.04282642 0.03700675 0.06142357 -0.04787201 0.02958163]\n", - "\n", - "bert/encoder/layer_7/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03142036 -0.04358427 -0.05132087 -0.01788123 -0.16399944]\n", - "TF: shape: (768,) values: [-0.03142036 -0.04358427 -0.05132087 -0.01788123 -0.16399944]\n", - "\n", - "bert/encoder/layer_7/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.83858097 0.8179645 0.80693793 0.81225365 0.7844832 ]\n", - "TF: shape: (768,) values: [0.83858097 0.8179645 0.80693793 0.81225365 0.7844832 ]\n", - "\n", - "bert/encoder/layer_8/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [0.0448719 0.02289526 0.03083764 0.03048073 0.02436891]\n", - "TF: shape: (768, 768) values: [0.0448719 0.02289526 0.03083764 0.03048073 0.02436891]\n", - "\n", - "bert/encoder/layer_8/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.25132924 -0.23753347 0.02581017 0.00901509 0.18424493]\n", - "TF: shape: (768,) values: [-0.25132924 -0.23753347 0.02581017 0.00901509 0.18424493]\n", - "\n", - "bert/encoder/layer_8/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01999719 0.00711403 0.03949134 -0.0102224 0.03152475]\n", - "TF: shape: (768, 768) values: [-0.01999719 0.00711403 0.03949134 -0.0102224 0.03152475]\n", - "\n", - "bert/encoder/layer_8/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 5.5668897e-05 3.4638541e-03 -1.7605867e-03 -6.1321147e-03\n", - " -4.4074579e-04]\n", - "TF: shape: (768,) values: [ 5.5668897e-05 3.4638541e-03 -1.7605867e-03 -6.1321147e-03\n", - " -4.4074579e-04]\n", - "\n", - "bert/encoder/layer_8/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00736056 -0.01795213 0.00104576 -0.00034653 0.03190543]\n", - "TF: shape: (768, 768) values: [-0.00736056 -0.01795213 0.00104576 -0.00034653 0.03190543]\n", - "\n", - "bert/encoder/layer_8/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.02892835 0.00642501 -0.03608712 0.00264269 -0.0245198 ]\n", - "TF: shape: (768,) values: [ 0.02892835 0.00642501 -0.03608712 0.00264269 -0.0245198 ]\n", - "\n", - "bert/encoder/layer_8/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.03971623 0.05307067 -0.01298818 0.00946693 -0.00121235]\n", - "TF: shape: (768, 768) values: [ 0.03971623 0.05307067 -0.01298818 0.00946693 -0.00121235]\n", - "\n", - "bert/encoder/layer_8/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.01468131 -0.05406622 -0.06289103 0.004484 0.0240819 ]\n", - "TF: shape: (768,) values: [ 0.01468131 -0.05406622 -0.06289103 0.004484 0.0240819 ]\n", - "\n", - "bert/encoder/layer_8/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.06004262 0.0457275 0.08688109 -0.14416659 -0.05500487]\n", - "TF: shape: (768,) values: [-0.06004262 0.0457275 0.08688109 -0.14416659 -0.05500487]\n", - "\n", - "bert/encoder/layer_8/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8907534 0.89116573 0.811639 0.7810443 
0.9045574 ]\n", - "TF: shape: (768,) values: [0.8907534 0.89116573 0.811639 0.7810443 0.9045574 ]\n", - "\n", - "bert/encoder/layer_8/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.01962814 -0.01482586 -0.02292624 0.03397145 0.02457482]\n", - "TF: shape: (768, 3072) values: [-0.01962814 -0.01482586 -0.02292624 0.03397145 0.02457482]\n", - "\n", - "bert/encoder/layer_8/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.08129632 -0.1691108 -0.10681771 -0.10392351 -0.13120006]\n", - "TF: shape: (3072,) values: [-0.08129632 -0.1691108 -0.10681771 -0.10392351 -0.13120006]\n", - "\n", - "bert/encoder/layer_8/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.04683433 -0.02690669 0.02979059 0.02223369 -0.00130287]\n", - "TF: shape: (3072, 768) values: [-0.04683433 -0.02690669 0.02979059 0.02223369 -0.00130287]\n", - "\n", - "bert/encoder/layer_8/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.09155537 -0.04465394 0.05649116 -0.09628641 0.11875238]\n", - "TF: shape: (768,) values: [-0.09155537 -0.04465394 0.05649116 -0.09628641 0.11875238]\n", - "\n", - "bert/encoder/layer_8/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.06043394 -0.06657387 -0.05341128 -0.00374733 -0.10855272]\n", - "TF: shape: (768,) values: [-0.06043394 -0.06657387 -0.05341128 -0.00374733 -0.10855272]\n", - "\n", - "bert/encoder/layer_8/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.84467345 0.84421015 0.82582206 0.84553087 0.8207573 ]\n", - "TF: shape: (768,) values: [0.84467345 0.84421015 0.82582206 0.84553087 0.8207573 ]\n", - "\n", - "bert/encoder/layer_9/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.08004542 -0.0143706 -0.04219061 -0.05175152 -0.01147588]\n", - "TF: shape: (768, 768) values: [ 0.08004542 -0.0143706 -0.04219061 -0.05175152 -0.01147588]\n", - "\n", - "bert/encoder/layer_9/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.14508031 0.40926442 -0.3281781 -0.02869792 -0.26104516]\n", - "TF: shape: (768,) values: [-0.14508031 0.40926442 -0.3281781 -0.02869792 -0.26104516]\n", - "\n", - "bert/encoder/layer_9/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01337681 0.00615428 -0.0455939 0.03379053 -0.01992556]\n", - "TF: shape: (768, 768) values: [-0.01337681 0.00615428 -0.0455939 0.03379053 -0.01992556]\n", - "\n", - "bert/encoder/layer_9/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.0051302 0.0083288 0.00377641 0.00928865 -0.00418182]\n", - "TF: shape: (768,) values: [-0.0051302 0.0083288 0.00377641 0.00928865 -0.00418182]\n", - "\n", - "bert/encoder/layer_9/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.02485976 -0.0301923 0.00984638 -0.02495162 0.01074037]\n", - "TF: shape: (768, 768) values: [-0.02485976 -0.0301923 0.00984638 -0.02495162 0.01074037]\n", - "\n", - "bert/encoder/layer_9/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.04229928 -0.02636711 0.0060447 0.00222829 0.04979481]\n", - "TF: shape: (768,) values: [-0.04229928 -0.02636711 0.0060447 0.00222829 0.04979481]\n", - "\n", - 
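A caveat on the quantity repeated throughout this dump: `|sum(pt_wts - tf_wts)| = 0.0` is the absolute value of the *summed* elementwise differences, so a zero on its own could hide positive and negative deviations that cancel out. A stricter standalone check is the maximum absolute difference (illustrative sketch only, not part of the patch):

    import numpy as np

    # Two arrays that differ, yet whose differences sum to zero:
    a = np.array([1.0, 2.0])
    b = np.array([2.0, 1.0])

    assert np.sum(a - b) == 0.0        # the summed-difference test passes anyway
    assert np.abs(a - b).max() == 1.0  # the max-abs test exposes the mismatch

np.allclose(a, b, atol=...) is the analogous tolerance-based check when two frameworks are not expected to match bit-for-bit.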
"bert/encoder/layer_9/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.01258144 0.00871274 0.00482882 -0.00675888 -0.04390825]\n", - "TF: shape: (768, 768) values: [-0.01258144 0.00871274 0.00482882 -0.00675888 -0.04390825]\n", - "\n", - "bert/encoder/layer_9/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.02457753 0.05051134 -0.06890804 -0.00962795 0.00864793]\n", - "TF: shape: (768,) values: [ 0.02457753 0.05051134 -0.06890804 -0.00962795 0.00864793]\n", - "\n", - "bert/encoder/layer_9/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.08963391 -0.06362236 0.0676669 -0.09895685 0.08318913]\n", - "TF: shape: (768,) values: [-0.08963391 -0.06362236 0.0676669 -0.09895685 0.08318913]\n", - "\n", - "bert/encoder/layer_9/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.85100883 0.82569736 0.7927931 0.7660444 0.8912934 ]\n", - "TF: shape: (768,) values: [0.85100883 0.82569736 0.7927931 0.7660444 0.8912934 ]\n", - "\n", - "bert/encoder/layer_9/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [ 0.06290598 0.0203122 -0.05384256 0.05442941 0.00484769]\n", - "TF: shape: (768, 3072) values: [ 0.06290598 0.0203122 -0.05384256 0.05442941 0.00484769]\n", - "\n", - "bert/encoder/layer_9/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.10818483 -0.00169527 -0.08962701 -0.10280421 -0.14310956]\n", - "TF: shape: (3072,) values: [-0.10818483 -0.00169527 -0.08962701 -0.10280421 -0.14310956]\n", - "\n", - "bert/encoder/layer_9/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [ 0.05487705 0.01644666 0.00436198 -0.00490768 -0.03238423]\n", - "TF: shape: (3072, 768) values: [ 0.05487705 0.01644666 0.00436198 -0.00490768 -0.03238423]\n", - "\n", - "bert/encoder/layer_9/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.08755219 -0.01910074 -0.02988298 -0.08150438 0.09897955]\n", - "TF: shape: (768,) values: [-0.08755219 -0.01910074 -0.02988298 -0.08150438 0.09897955]\n", - "\n", - "bert/encoder/layer_9/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.04136161 -0.02113917 -0.07581077 -0.00809791 -0.09790538]\n", - "TF: shape: (768,) values: [-0.04136161 -0.02113917 -0.07581077 -0.00809791 -0.09790538]\n", - "\n", - "bert/encoder/layer_9/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8250572 0.83477134 0.7794141 0.81264955 0.7827918 ]\n", - "TF: shape: (768,) values: [0.8250572 0.83477134 0.7794141 0.81264955 0.7827918 ]\n", - "\n", - "bert/encoder/layer_10/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.00071212 -0.00853064 0.01776993 0.03189976 0.02183623]\n", - "TF: shape: (768, 768) values: [ 0.00071212 -0.00853064 0.01776993 0.03189976 0.02183623]\n", - "\n", - "bert/encoder/layer_10/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03667567 -0.01449654 -0.03822913 0.00118343 -0.05489838]\n", - "TF: shape: (768,) values: [-0.03667567 -0.01449654 -0.03822913 0.00118343 -0.05489838]\n", - "\n", - "bert/encoder/layer_10/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: 
[-0.0494106 0.05531096 -0.02459413 -0.06019118 -0.02829785]\n", - "TF: shape: (768, 768) values: [-0.0494106 0.05531096 -0.02459413 -0.06019118 -0.02829785]\n", - "\n", - "bert/encoder/layer_10/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00692997 0.00855893 0.00670777 -0.0052475 -0.00017074]\n", - "TF: shape: (768,) values: [-0.00692997 0.00855893 0.00670777 -0.0052475 -0.00017074]\n", - "\n", - "bert/encoder/layer_10/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.01911842 0.04858809 -0.02608485 0.00794924 -0.02246636]\n", - "TF: shape: (768, 768) values: [ 0.01911842 0.04858809 -0.02608485 0.00794924 -0.02246636]\n", - "\n", - "bert/encoder/layer_10/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.0133503 -0.01224133 -0.0051834 -0.00232528 0.00148614]\n", - "TF: shape: (768,) values: [-0.0133503 -0.01224133 -0.0051834 -0.00232528 0.00148614]\n", - "\n", - "bert/encoder/layer_10/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.05904732 0.02616 0.00794104 -0.02889086 -0.03692576]\n", - "TF: shape: (768, 768) values: [-0.05904732 0.02616 0.00794104 -0.02889086 -0.03692576]\n", - "\n", - "bert/encoder/layer_10/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.02089205 0.01458059 0.05217785 0.0324267 0.00907548]\n", - "TF: shape: (768,) values: [0.02089205 0.01458059 0.05217785 0.0324267 0.00907548]\n", - "\n", - "bert/encoder/layer_10/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.10986238 -0.04332284 0.02603893 -0.06236923 0.14469369]\n", - "TF: shape: (768,) values: [-0.10986238 -0.04332284 0.02603893 -0.06236923 0.14469369]\n", - "\n", - "bert/encoder/layer_10/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8515822 0.81392974 0.836747 0.78040504 0.88091415]\n", - "TF: shape: (768,) values: [0.8515822 0.81392974 0.836747 0.78040504 0.88091415]\n", - "\n", - "bert/encoder/layer_10/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [-0.07061081 0.06997397 0.01433633 0.04150929 0.02865192]\n", - "TF: shape: (768, 3072) values: [-0.07061081 0.06997397 0.01433633 0.04150929 0.02865192]\n", - "\n", - "bert/encoder/layer_10/intermediate/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.13879126 -0.06401426 -0.1408043 -0.15043251 -0.10193057]\n", - "TF: shape: (3072,) values: [-0.13879126 -0.06401426 -0.1408043 -0.15043251 -0.10193057]\n", - "\n", - "bert/encoder/layer_10/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [ 0.02918765 0.02609882 -0.02259856 0.01636725 -0.00038442]\n", - "TF: shape: (3072, 768) values: [ 0.02918765 0.02609882 -0.02259856 0.01636725 -0.00038442]\n", - "\n", - "bert/encoder/layer_10/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.01799502 0.10970547 -0.02384165 -0.03350981 0.10491351]\n", - "TF: shape: (768,) values: [-0.01799502 0.10970547 -0.02384165 -0.03350981 0.10491351]\n", - "\n", - "bert/encoder/layer_10/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.00999107 -0.0217309 -0.0854177 -0.01109101 -0.07902174]\n", - "TF: shape: (768,) values: [ 0.00999107 -0.0217309 
-0.0854177 -0.01109101 -0.07902174]\n", - "\n", - "bert/encoder/layer_10/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.8272796 0.8597452 0.79116803 0.81267637 0.8273501 ]\n", - "TF: shape: (768,) values: [0.8272796 0.8597452 0.79116803 0.81267637 0.8273501 ]\n", - "\n", - "bert/encoder/layer_11/attention/self/query/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.04141425 -0.06491017 -0.03202523 0.06226195 0.02193764]\n", - "TF: shape: (768, 768) values: [-0.04141425 -0.06491017 -0.03202523 0.06226195 0.02193764]\n", - "\n", - "bert/encoder/layer_11/attention/self/query/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.0501296 0.11886728 0.2186807 0.08720991 -0.20476632]\n", - "TF: shape: (768,) values: [ 0.0501296 0.11886728 0.2186807 0.08720991 -0.20476632]\n", - "\n", - "bert/encoder/layer_11/attention/self/key/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.02634268 -0.01357682 -0.06076496 0.04210597 0.01783857]\n", - "TF: shape: (768, 768) values: [ 0.02634268 -0.01357682 -0.06076496 0.04210597 0.01783857]\n", - "\n", - "bert/encoder/layer_11/attention/self/key/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.0007798 -0.00065806 -0.00010521 0.00119144 -0.00180091]\n", - "TF: shape: (768,) values: [-0.0007798 -0.00065806 -0.00010521 0.00119144 -0.00180091]\n", - "\n", - "bert/encoder/layer_11/attention/self/value/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.03520973 -0.00678078 -0.02883583 -0.01011515 0.04519828]\n", - "TF: shape: (768, 768) values: [ 0.03520973 -0.00678078 -0.02883583 -0.01011515 0.04519828]\n", - "\n", - "bert/encoder/layer_11/attention/self/value/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.01502306 -0.00530942 0.00023572 0.00205218 -0.00578036]\n", - "TF: shape: (768,) values: [ 0.01502306 -0.00530942 0.00023572 0.00205218 -0.00578036]\n", - "\n", - "bert/encoder/layer_11/attention/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [ 0.02361419 0.03112707 -0.00063031 0.04209773 -0.02434015]\n", - "TF: shape: (768, 768) values: [ 0.02361419 0.03112707 -0.00063031 0.04209773 -0.02434015]\n", - "\n", - "bert/encoder/layer_11/attention/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [ 0.02566087 0.0028438 -0.00475678 0.02149458 -0.01755187]\n", - "TF: shape: (768,) values: [ 0.02566087 0.0028438 -0.00475678 0.02149458 -0.01755187]\n", - "\n", - "bert/encoder/layer_11/attention/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03134411 0.01207957 -0.04636396 -0.03013046 0.07944281]\n", - "TF: shape: (768,) values: [-0.03134411 0.01207957 -0.04636396 -0.03013046 0.07944281]\n", - "\n", - "bert/encoder/layer_11/attention/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.85203767 0.8020145 0.8554237 0.8150477 0.8441815 ]\n", - "TF: shape: (768,) values: [0.85203767 0.8020145 0.8554237 0.8150477 0.8441815 ]\n", - "\n", - "bert/encoder/layer_11/intermediate/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 3072) values: [ 0.05871898 -0.01124212 0.00206979 -0.04366514 -0.00716808]\n", - "TF: shape: (768, 3072) values: [ 0.05871898 -0.01124212 0.00206979 -0.04366514 -0.00716808]\n", - "\n", - "bert/encoder/layer_11/intermediate/dense/bias\n", 
- "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072,) values: [-0.09762521 -0.06175711 -0.05153917 -0.08580919 -0.09734315]\n", - "TF: shape: (3072,) values: [-0.09762521 -0.06175711 -0.05153917 -0.08580919 -0.09734315]\n", - "\n", - "bert/encoder/layer_11/output/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (3072, 768) values: [-0.022382 0.01073206 -0.01357213 0.02484621 0.01403091]\n", - "TF: shape: (3072, 768) values: [-0.022382 0.01073206 -0.01357213 0.02484621 0.01403091]\n", - "\n", - "bert/encoder/layer_11/output/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.06574099 0.04207807 0.01201084 0.00229322 0.05551811]\n", - "TF: shape: (768,) values: [-0.06574099 0.04207807 0.01201084 0.00229322 0.05551811]\n", - "\n", - "bert/encoder/layer_11/output/LayerNorm/beta\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.00634605 -0.01989403 0.04628465 0.01585056 -0.04256899]\n", - "TF: shape: (768,) values: [-0.00634605 -0.01989403 0.04628465 0.01585056 -0.04256899]\n", - "\n", - "bert/encoder/layer_11/output/LayerNorm/gamma\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [0.6384234 0.6300364 0.66570055 0.6126921 0.63756436]\n", - "TF: shape: (768,) values: [0.6384234 0.6300364 0.66570055 0.6126921 0.63756436]\n", - "\n", - "bert/pooler/dense/kernel\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768, 768) values: [-0.00127425 0.00199868 -0.03863145 -0.00139355 0.00691627]\n", - "TF: shape: (768, 768) values: [-0.00127425 0.00199868 -0.03863145 -0.00139355 0.00691627]\n", - "\n", - "bert/pooler/dense/bias\n", - "|sum(pt_wts - tf_wts)| = 0.0\n", - "PT: shape: (768,) values: [-0.03597581 -0.00389536 0.05181352 0.02224747 -0.00493723]\n", - "TF: shape: (768,) values: [-0.03597581 -0.00389536 0.05181352 0.02224747 -0.00493723]\n", - "\n" - ] - } - ], - "source": [ - "tensors_to_transopse = (\n", - " \"dense.weight\",\n", - " \"attention.self.query\",\n", - " \"attention.self.key\",\n", - " \"attention.self.value\"\n", - ")\n", - "var_map = (\n", - " ('layer.', 'layer_'),\n", - " ('word_embeddings.weight', 'word_embeddings'),\n", - " ('position_embeddings.weight', 'position_embeddings'),\n", - " ('token_type_embeddings.weight', 'token_type_embeddings'),\n", - " ('.', '/'),\n", - " ('LayerNorm/weight', 'LayerNorm/gamma'),\n", - " ('LayerNorm/bias', 'LayerNorm/beta'),\n", - " ('weight', 'kernel')\n", - ")\n", - "\n", - "def to_tf_var_name(name:str):\n", - " for patt, repl in iter(var_map):\n", - " name = name.replace(patt, repl)\n", - " return 'bert/{}'.format(name)\n", - "\n", - "tf_vars = {v.name: session.run(fetches=v) for v in tf.global_variables()}\n", - "pt_vars = {}\n", - "for v, T in pt_model.state_dict().items():\n", - " T = T.detach().numpy()\n", - " if any([x in v for x in tensors_to_transopse]):\n", - " T = T.T\n", - " pt_vars.update({to_tf_var_name(v): T})\n", - "\n", - "for var_name in tf_vars:\n", - " \n", - " pt = pt_vars[var_name.strip(\":0\")]\n", - " tf = tf_vars[var_name]\n", - "\n", - " print(var_name.strip(\":0\"))\n", - " \n", - " # Assert equivalence\n", - " print(\"|sum(pt_wts - tf_wts)| = {}\".format(\n", - " np.abs(np.sum(pt - tf, keepdims=False))\n", - " ))\n", - " assert not np.sum(pt - tf, keepdims=False)\n", - " \n", - " if len(pt.shape) == 2:\n", - " print(\"PT: shape: {0} values: {1}\".format(pt.shape, pt[0, :5]))\n", - " print(\"TF: shape: {0} values: {1}\".format(tf.shape, tf[0, :5]))\n", - " else:\n", - " print(\"PT: shape: {0} values: 
{1}\".format(pt.shape, pt[:5]))\n", - " print(\"TF: shape: {0} values: {1}\".format(tf.shape, tf[:5]))\n", - " print()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Compare Layer-12 Projections" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "MSE: 2.7155439966009e-05\n", - "PT-values: [-0.876663 -0.41088238 -0.12200808 0.44941 0.19445966]\n", - "TF-values: [-0.8742865 -0.40621698 -0.10585472 0.444904 0.1825743 ]\n" - ] - } - ], - "source": [ - "# Mean Squared Error (MSE) between last projection of each model\n", - "MSE = np.mean((pt_embedding - tf_embedding) ** 2, keepdims=False)\n", - "print(\"MSE: {}\".format(MSE))\n", - "print(\"PT-values: {}\".format(pt_embedding[0, :5]))\n", - "print(\"TF-values: {}\".format(tf_embedding[0, :5]))" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "nlp", - "language": "python", - "name": "nlp" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb b/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb deleted file mode 100644 index 809f6ea6e0f326..00000000000000 --- a/notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb +++ /dev/null @@ -1,4815 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Comparing TensorFlow (original) and PyTorch models\n", - "\n", - "You can use this small notebook to check the conversion of the model's weights from the TensorFlow model to the PyTorch model. In the following, we compare the weights of the last layer on a simple example (in `input.txt`) but both models returns all the hidden layers so you can check every stage of the model.\n", - "\n", - "To run this notebook, follow these instructions:\n", - "- make sure that your Python environment has both TensorFlow and PyTorch installed,\n", - "- download the original TensorFlow implementation,\n", - "- download a pre-trained TensorFlow model as indicaded in the TensorFlow implementation readme,\n", - "- run the script `convert_tf_checkpoint_to_pytorch.py` as indicated in the `README` to convert the pre-trained TensorFlow model to PyTorch.\n", - "\n", - "If needed change the relative paths indicated in this notebook (at the beggining of Sections 1 and 2) to point to the relevent models and code." 
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:26.999106Z", - "start_time": "2018-11-16T10:02:26.985709Z" - } - }, - "outputs": [], - "source": [ - "import os\n", - "os.chdir('../')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1/ TensorFlow code" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:27.664528Z", - "start_time": "2018-11-16T10:02:27.651019Z" - } - }, - "outputs": [], - "source": [ - "original_tf_inplem_dir = \"./tensorflow_code/\"\n", - "model_dir = \"../google_models/uncased_L-12_H-768_A-12/\"\n", - "\n", - "vocab_file = model_dir + \"vocab.txt\"\n", - "bert_config_file = model_dir + \"bert_config.json\"\n", - "init_checkpoint = model_dir + \"bert_model.ckpt\"\n", - "\n", - "input_file = \"./samples/input.txt\"\n", - "max_seq_length = 128\n", - "max_predictions_per_seq = 20\n", - "\n", - "masked_lm_positions = [6]" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:30.202182Z", - "start_time": "2018-11-16T10:02:28.112570Z" - } - }, - "outputs": [], - "source": [ - "import importlib.util\n", - "import sys\n", - "import tensorflow as tf\n", - "import pytorch_transformers as ppb\n", - "\n", - "def del_all_flags(FLAGS):\n", - " flags_dict = FLAGS._flags() \n", - " keys_list = [keys for keys in flags_dict] \n", - " for keys in keys_list:\n", - " FLAGS.__delattr__(keys)\n", - "\n", - "del_all_flags(tf.flags.FLAGS)\n", - "import tensorflow_code.extract_features as ef\n", - "del_all_flags(tf.flags.FLAGS)\n", - "import tensorflow_code.modeling as tfm\n", - "del_all_flags(tf.flags.FLAGS)\n", - "import tensorflow_code.tokenization as tft\n", - "del_all_flags(tf.flags.FLAGS)\n", - "import tensorflow_code.run_pretraining as rp\n", - "del_all_flags(tf.flags.FLAGS)\n", - "import tensorflow_code.create_pretraining_data as cpp" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:30.238027Z", - "start_time": "2018-11-16T10:02:30.204943Z" - }, - "code_folding": [ - 15 - ] - }, - "outputs": [], - "source": [ - "import re\n", - "class InputExample(object):\n", - " \"\"\"A single instance example.\"\"\"\n", - "\n", - " def __init__(self, tokens, segment_ids, masked_lm_positions,\n", - " masked_lm_labels, is_random_next):\n", - " self.tokens = tokens\n", - " self.segment_ids = segment_ids\n", - " self.masked_lm_positions = masked_lm_positions\n", - " self.masked_lm_labels = masked_lm_labels\n", - " self.is_random_next = is_random_next\n", - " def __repr__(self):\n", - " return '\\n'.join(k + \":\" + str(v) for k, v in self.__dict__.items())\n", - "\n", - "\n", - "def read_examples(input_file, tokenizer, masked_lm_positions):\n", - " \"\"\"Read a list of `InputExample`s from an input file.\"\"\"\n", - " examples = []\n", - " unique_id = 0\n", - " with tf.gfile.GFile(input_file, \"r\") as reader:\n", - " while True:\n", - " line = reader.readline()\n", - " if not line:\n", - " break\n", - " line = line.strip()\n", - " text_a = None\n", - " text_b = None\n", - " m = re.match(r\"^(.*) \\|\\|\\| (.*)$\", line)\n", - " if m is None:\n", - " text_a = line\n", - " else:\n", - " text_a = m.group(1)\n", - " text_b = m.group(2)\n", - " tokens_a = tokenizer.tokenize(text_a)\n", - " tokens_b = None\n", - " if text_b:\n", - " tokens_b = tokenizer.tokenize(text_b)\n", - " tokens 
= tokens_a + (tokens_b or [])  # tokens_b is None when the line has no ' ||| ' separator\n", - " masked_lm_labels = []\n", - " for m_pos in masked_lm_positions:\n", - " masked_lm_labels.append(tokens[m_pos])\n", - " tokens[m_pos] = '[MASK]'\n", - " examples.append(\n", - " InputExample(\n", - " tokens=tokens,\n", - " segment_ids=[0] * len(tokens_a) + [1] * len(tokens_b or []),\n", - " masked_lm_positions=masked_lm_positions,\n", - " masked_lm_labels=masked_lm_labels,\n", - " is_random_next=False))\n", - " unique_id += 1\n", - " return examples" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:30.304018Z", - "start_time": "2018-11-16T10:02:30.240189Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tokens:['who', 'was', 'jim', 'henson', '?', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer']\n", - "segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]\n", - "masked_lm_positions:[6]\n", - "masked_lm_labels:['henson']\n", - "is_random_next:False\n" - ] - } - ], - "source": [ - "bert_config = tfm.BertConfig.from_json_file(bert_config_file)\n", - "tokenizer = ppb.BertTokenizer(\n", - " vocab_file=vocab_file, do_lower_case=True)\n", - "examples = read_examples(input_file, tokenizer, masked_lm_positions=masked_lm_positions)\n", - "\n", - "print(examples[0])" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:33.324167Z", - "start_time": "2018-11-16T10:02:33.291909Z" - }, - "code_folding": [ - 16 - ] - }, - "outputs": [], - "source": [ - "class InputFeatures(object):\n", - " \"\"\"A single set of features of data.\"\"\"\n", - "\n", - " def __init__(self, input_ids, input_mask, segment_ids, masked_lm_positions,\n", - " masked_lm_ids, masked_lm_weights, next_sentence_label):\n", - " self.input_ids = input_ids\n", - " self.input_mask = input_mask\n", - " self.segment_ids = segment_ids\n", - " self.masked_lm_positions = masked_lm_positions\n", - " self.masked_lm_ids = masked_lm_ids\n", - " self.masked_lm_weights = masked_lm_weights\n", - " self.next_sentence_labels = next_sentence_label\n", - "\n", - " def __repr__(self):\n", - " return '\\n'.join(k + \":\" + str(v) for k, v in self.__dict__.items())\n", - "\n", - "def pretraining_convert_examples_to_features(instances, tokenizer, max_seq_length,\n", - " max_predictions_per_seq):\n", - " \"\"\"Create `InputFeatures` from `TrainingInstance`s.\"\"\"\n", - " features = []\n", - " for (inst_index, instance) in enumerate(instances):\n", - " input_ids = tokenizer.convert_tokens_to_ids(instance.tokens)\n", - " input_mask = [1] * len(input_ids)\n", - " segment_ids = list(instance.segment_ids)\n", - " assert len(input_ids) <= max_seq_length\n", - "\n", - " while len(input_ids) < max_seq_length:\n", - " input_ids.append(0)\n", - " input_mask.append(0)\n", - " segment_ids.append(0)\n", - "\n", - " assert len(input_ids) == max_seq_length\n", - " assert len(input_mask) == max_seq_length\n", - " assert len(segment_ids) == max_seq_length\n", - "\n", - " masked_lm_positions = list(instance.masked_lm_positions)\n", - " masked_lm_ids = tokenizer.convert_tokens_to_ids(instance.masked_lm_labels)\n", - " masked_lm_weights = [1.0] * len(masked_lm_ids)\n", - "\n", - " while len(masked_lm_positions) < max_predictions_per_seq:\n", - " masked_lm_positions.append(0)\n", - " masked_lm_ids.append(0)\n", - " masked_lm_weights.append(0.0)\n", - "\n", - " next_sentence_label = 1 if instance.is_random_next else 0\n", - "\n", - " features.append(\n", - 
InputFeatures(input_ids, input_mask, segment_ids,\n", - " masked_lm_positions, masked_lm_ids,\n", - " masked_lm_weights, next_sentence_label))\n", - "\n", - " if inst_index < 5:\n", - " tf.logging.info(\"*** Example ***\")\n", - " tf.logging.info(\"tokens: %s\" % \" \".join(\n", - " [str(x) for x in instance.tokens]))\n", - " tf.logging.info(\"features: %s\" % str(features[-1]))\n", - " return features" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:34.185367Z", - "start_time": "2018-11-16T10:02:34.155046Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:*** Example ***\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:34 - INFO - tensorflow - *** Example ***\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:tokens: who was jim henson ? jim [MASK] was a puppet ##eer\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:34 - INFO - tensorflow - tokens: who was jim henson ? jim [MASK] was a puppet ##eer\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "input_mask:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "masked_lm_positions:[6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "masked_lm_ids:[27227, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "masked_lm_weights:[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n", - "next_sentence_labels:0\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:34 - INFO - tensorflow - features: input_ids:[2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997, 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "input_mask:[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "segment_ids:[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "masked_lm_positions:[6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "masked_lm_ids:[27227, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", - "masked_lm_weights:[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]\n", - "next_sentence_labels:0\n" - ] - } - ], - "source": [ - "features = pretraining_convert_examples_to_features(\n", - " instances=examples, max_seq_length=max_seq_length, \n", - " max_predictions_per_seq=max_predictions_per_seq, tokenizer=tokenizer)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:34.912005Z", - "start_time": "2018-11-16T10:02:34.882111Z" - } - }, - "outputs": [], - "source": [ - "def input_fn_builder(features, seq_length, max_predictions_per_seq, tokenizer):\n", - " \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n", - "\n", - " all_input_ids = []\n", - " all_input_mask = []\n", - " all_segment_ids = []\n", - " all_masked_lm_positions = []\n", - " all_masked_lm_ids = []\n", - " all_masked_lm_weights = []\n", - " all_next_sentence_labels = []\n", - "\n", - " for feature in features:\n", - " all_input_ids.append(feature.input_ids)\n", - " all_input_mask.append(feature.input_mask)\n", - " all_segment_ids.append(feature.segment_ids)\n", - " all_masked_lm_positions.append(feature.masked_lm_positions)\n", - " all_masked_lm_ids.append(feature.masked_lm_ids)\n", - " all_masked_lm_weights.append(feature.masked_lm_weights)\n", - " all_next_sentence_labels.append(feature.next_sentence_labels)\n", - "\n", - " def input_fn(params):\n", - " \"\"\"The actual input function.\"\"\"\n", - " batch_size = params[\"batch_size\"]\n", - "\n", - " num_examples = len(features)\n", - "\n", - " # This is for demo purposes and does NOT scale to large data sets. We do\n", - " # not use Dataset.from_generator() because that uses tf.py_func which is\n", - " # not TPU compatible. 
The right way to load data is with TFRecordReader.\n", - " d = tf.data.Dataset.from_tensor_slices({\n", - " \"input_ids\":\n", - " tf.constant(\n", - " all_input_ids, shape=[num_examples, seq_length],\n", - " dtype=tf.int32),\n", - " \"input_mask\":\n", - " tf.constant(\n", - " all_input_mask,\n", - " shape=[num_examples, seq_length],\n", - " dtype=tf.int32),\n", - " \"segment_ids\":\n", - " tf.constant(\n", - " all_segment_ids,\n", - " shape=[num_examples, seq_length],\n", - " dtype=tf.int32),\n", - " \"masked_lm_positions\":\n", - " tf.constant(\n", - " all_masked_lm_positions,\n", - " shape=[num_examples, max_predictions_per_seq],\n", - " dtype=tf.int32),\n", - " \"masked_lm_ids\":\n", - " tf.constant(\n", - " all_masked_lm_ids,\n", - " shape=[num_examples, max_predictions_per_seq],\n", - " dtype=tf.int32),\n", - " \"masked_lm_weights\":\n", - " tf.constant(\n", - " all_masked_lm_weights,\n", - " shape=[num_examples, max_predictions_per_seq],\n", - " dtype=tf.float32),\n", - " \"next_sentence_labels\":\n", - " tf.constant(\n", - " all_next_sentence_labels,\n", - " shape=[num_examples, 1],\n", - " dtype=tf.int32),\n", - " })\n", - "\n", - " d = d.batch(batch_size=batch_size, drop_remainder=False)\n", - " return d\n", - "\n", - " return input_fn\n" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:35.671603Z", - "start_time": "2018-11-16T10:02:35.626167Z" - }, - "code_folding": [ - 64, - 77 - ] - }, - "outputs": [], - "source": [ - "def model_fn_builder(bert_config, init_checkpoint, learning_rate,\n", - " num_train_steps, num_warmup_steps, use_tpu,\n", - " use_one_hot_embeddings):\n", - " \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n", - "\n", - " def model_fn(features, labels, mode, params): # pylint: disable=unused-argument\n", - " \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n", - "\n", - " tf.logging.info(\"*** Features ***\")\n", - " for name in sorted(features.keys()):\n", - " tf.logging.info(\" name = %s, shape = %s\" % (name, features[name].shape))\n", - "\n", - " input_ids = features[\"input_ids\"]\n", - " input_mask = features[\"input_mask\"]\n", - " segment_ids = features[\"segment_ids\"]\n", - " masked_lm_positions = features[\"masked_lm_positions\"]\n", - " masked_lm_ids = features[\"masked_lm_ids\"]\n", - " masked_lm_weights = features[\"masked_lm_weights\"]\n", - " next_sentence_labels = features[\"next_sentence_labels\"]\n", - "\n", - " is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n", - "\n", - " model = tfm.BertModel(\n", - " config=bert_config,\n", - " is_training=is_training,\n", - " input_ids=input_ids,\n", - " input_mask=input_mask,\n", - " token_type_ids=segment_ids,\n", - " use_one_hot_embeddings=use_one_hot_embeddings)\n", - "\n", - " (masked_lm_loss,\n", - " masked_lm_example_loss, masked_lm_log_probs) = rp.get_masked_lm_output(\n", - " bert_config, model.get_sequence_output(), model.get_embedding_table(),\n", - " masked_lm_positions, masked_lm_ids, masked_lm_weights)\n", - "\n", - " (next_sentence_loss, next_sentence_example_loss,\n", - " next_sentence_log_probs) = rp.get_next_sentence_output(\n", - " bert_config, model.get_pooled_output(), next_sentence_labels)\n", - "\n", - " total_loss = masked_lm_loss + next_sentence_loss\n", - "\n", - " tvars = tf.trainable_variables()\n", - "\n", - " initialized_variable_names = {}\n", - " scaffold_fn = None\n", - " if init_checkpoint:\n", - " (assignment_map,\n", - " initialized_variable_names) = 
tfm.get_assigment_map_from_checkpoint(\n", - " tvars, init_checkpoint)\n", - " if use_tpu:\n", - "\n", - " def tpu_scaffold():\n", - " tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n", - " return tf.train.Scaffold()\n", - "\n", - " scaffold_fn = tpu_scaffold\n", - " else:\n", - " tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n", - "\n", - " tf.logging.info(\"**** Trainable Variables ****\")\n", - " for var in tvars:\n", - " init_string = \"\"\n", - " if var.name in initialized_variable_names:\n", - " init_string = \", *INIT_FROM_CKPT*\"\n", - " tf.logging.info(\" name = %s, shape = %s%s\", var.name, var.shape,\n", - " init_string)\n", - "\n", - " output_spec = None\n", - " if mode == tf.estimator.ModeKeys.TRAIN:\n", - " masked_lm_positions = features[\"masked_lm_positions\"]\n", - " masked_lm_ids = features[\"masked_lm_ids\"]\n", - " masked_lm_weights = features[\"masked_lm_weights\"]\n", - " next_sentence_labels = features[\"next_sentence_labels\"]\n", - " train_op = optimization.create_optimizer(\n", - " total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n", - "\n", - " output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n", - " mode=mode,\n", - " loss=total_loss,\n", - " train_op=train_op,\n", - " scaffold_fn=scaffold_fn)\n", - " elif mode == tf.estimator.ModeKeys.EVAL:\n", - " masked_lm_positions = features[\"masked_lm_positions\"]\n", - " masked_lm_ids = features[\"masked_lm_ids\"]\n", - " masked_lm_weights = features[\"masked_lm_weights\"]\n", - " next_sentence_labels = features[\"next_sentence_labels\"]\n", - "\n", - " def metric_fn(masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n", - " masked_lm_weights, next_sentence_example_loss,\n", - " next_sentence_log_probs, next_sentence_labels):\n", - " \"\"\"Computes the loss and accuracy of the model.\"\"\"\n", - " masked_lm_log_probs = tf.reshape(masked_lm_log_probs,\n", - " [-1, masked_lm_log_probs.shape[-1]])\n", - " masked_lm_predictions = tf.argmax(\n", - " masked_lm_log_probs, axis=-1, output_type=tf.int32)\n", - " masked_lm_example_loss = tf.reshape(masked_lm_example_loss, [-1])\n", - " masked_lm_ids = tf.reshape(masked_lm_ids, [-1])\n", - " masked_lm_weights = tf.reshape(masked_lm_weights, [-1])\n", - " masked_lm_accuracy = tf.metrics.accuracy(\n", - " labels=masked_lm_ids,\n", - " predictions=masked_lm_predictions,\n", - " weights=masked_lm_weights)\n", - " masked_lm_mean_loss = tf.metrics.mean(\n", - " values=masked_lm_example_loss, weights=masked_lm_weights)\n", - "\n", - " next_sentence_log_probs = tf.reshape(\n", - " next_sentence_log_probs, [-1, next_sentence_log_probs.shape[-1]])\n", - " next_sentence_predictions = tf.argmax(\n", - " next_sentence_log_probs, axis=-1, output_type=tf.int32)\n", - " next_sentence_labels = tf.reshape(next_sentence_labels, [-1])\n", - " next_sentence_accuracy = tf.metrics.accuracy(\n", - " labels=next_sentence_labels, predictions=next_sentence_predictions)\n", - " next_sentence_mean_loss = tf.metrics.mean(\n", - " values=next_sentence_example_loss)\n", - "\n", - " return {\n", - " \"masked_lm_accuracy\": masked_lm_accuracy,\n", - " \"masked_lm_loss\": masked_lm_mean_loss,\n", - " \"next_sentence_accuracy\": next_sentence_accuracy,\n", - " \"next_sentence_loss\": next_sentence_mean_loss,\n", - " }\n", - "\n", - " eval_metrics = (metric_fn, [\n", - " masked_lm_example_loss, masked_lm_log_probs, masked_lm_ids,\n", - " masked_lm_weights, next_sentence_example_loss,\n", - " next_sentence_log_probs, next_sentence_labels\n", - " ])\n", - " 
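# eval_metrics is the (metric_fn, tensors) pair TPUEstimatorSpec expects: the\n", - " # estimator gathers the listed tensors and runs metric_fn over them on the\n", - " # host to produce the metrics dictionary defined above.\n", - " 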
output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n", - " mode=mode,\n", - " loss=total_loss,\n", - " eval_metrics=eval_metrics,\n", - " scaffold_fn=scaffold_fn)\n", - " elif mode == tf.estimator.ModeKeys.PREDICT:\n", - " masked_lm_log_probs = tf.reshape(masked_lm_log_probs,\n", - " [-1, masked_lm_log_probs.shape[-1]])\n", - " masked_lm_predictions = tf.argmax(\n", - " masked_lm_log_probs, axis=-1, output_type=tf.int32)\n", - "\n", - " next_sentence_log_probs = tf.reshape(\n", - " next_sentence_log_probs, [-1, next_sentence_log_probs.shape[-1]])\n", - " next_sentence_predictions = tf.argmax(\n", - " next_sentence_log_probs, axis=-1, output_type=tf.int32)\n", - "\n", - " masked_lm_predictions = tf.reshape(masked_lm_predictions,\n", - " [1, masked_lm_positions.shape[-1]])\n", - " next_sentence_predictions = tf.reshape(next_sentence_predictions,\n", - " [1, 1])\n", - "\n", - " predictions = {\n", - " \"masked_lm_predictions\": masked_lm_predictions,\n", - " \"next_sentence_predictions\": next_sentence_predictions\n", - " }\n", - "\n", - " output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n", - " mode=mode, predictions=predictions, scaffold_fn=scaffold_fn)\n", - " return output_spec\n", - " else:\n", - " raise ValueError(\"Only TRAIN, EVAL and PREDICT modes are supported: %s\" % (mode))\n", - "\n", - " return output_spec\n", - "\n", - " return model_fn" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:40.328700Z", - "start_time": "2018-11-16T10:02:36.289676Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Estimator's model_fn (.model_fn at 0x12a864ae8>) includes params argument, but params are not passed to Estimator.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - WARNING - tensorflow - Estimator's model_fn (.model_fn at 0x12a864ae8>) includes params argument, but params are not passed to Estimator.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - WARNING - tensorflow - Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n", - "graph_options {\n", - " rewrite_options {\n", - " meta_optimizer_iterations: ONE\n", - " }\n", - "}\n", - ", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), 
'_cluster': None}\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n", - "graph_options {\n", - " rewrite_options {\n", - " meta_optimizer_iterations: ONE\n", - " }\n", - "}\n", - ", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - WARNING - tensorflow - Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:_TPUContext: eval_on_tpu True\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - _TPUContext: eval_on_tpu True\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - WARNING - tensorflow - eval_on_tpu ignored because use_tpu is False.\n" - ] - } - ], - "source": [ - "is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n", - "run_config = tf.contrib.tpu.RunConfig(\n", - " master=None,\n", - " tpu_config=tf.contrib.tpu.TPUConfig(\n", - " num_shards=1,\n", - " per_host_input_for_training=is_per_host))\n", - "\n", - "model_fn = model_fn_builder(\n", - " bert_config=bert_config,\n", - " init_checkpoint=init_checkpoint,\n", - " learning_rate=0,\n", - " num_train_steps=1,\n", - " num_warmup_steps=1,\n", - " use_tpu=False,\n", - " use_one_hot_embeddings=False)\n", - "\n", - "# If TPU is not available, this will fall back to normal Estimator on CPU\n", - "# or GPU.\n", - "estimator = tf.contrib.tpu.TPUEstimator(\n", - " use_tpu=False,\n", - " model_fn=model_fn,\n", - " config=run_config,\n", - " predict_batch_size=1)\n", - "\n", - "input_fn = input_fn_builder(\n", - " features=features, seq_length=max_seq_length, max_predictions_per_seq=max_predictions_per_seq,\n", - "tokenizer=tokenizer)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:46.596956Z", - "start_time": "2018-11-16T10:02:40.331008Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Could not find trained model in model_dir: 
/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d, running initialization to predict.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmp4x8r3x3d, running initialization to predict.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Calling model_fn.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - Calling model_fn.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Running infer on CPU\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - Running infer on CPU\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:*** Features ***\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - *** Features ***\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = input_ids, shape = (?, 128)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = input_ids, shape = (?, 128)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = input_mask, shape = (?, 128)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = input_mask, shape = (?, 128)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = masked_lm_ids, shape = (?, 20)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_ids, shape = (?, 20)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = masked_lm_positions, shape = (?, 20)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_positions, shape = (?, 20)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = masked_lm_weights, shape = (?, 20)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = masked_lm_weights, shape = (?, 20)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = next_sentence_labels, shape = (?, 1)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = next_sentence_labels, shape = (?, 1)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = segment_ids, shape = (?, 128)\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:40 - INFO - tensorflow - name = segment_ids, shape = (?, 128)\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:**** Trainable Variables ****\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - **** Trainable Variables ****\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = 
bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*\n" - ] - },
[... deleted notebook output elided: the cell repeats the same pair of stream entries — an "INFO:tensorflow: name = <variable>, shape = <shape>, *INIT_FROM_CKPT*" line on stdout, immediately duplicated on stderr with a "11/16/2018 11:02:43 - INFO - tensorflow" logger prefix — for every checkpoint variable: the embeddings (word_embeddings (30522, 768), token_type_embeddings (2, 768), position_embeddings (512, 768), LayerNorm beta/gamma (768,)) and then, for each encoder layer 0 through 10 in this span, the attention self query/key/value kernels (768, 768) and biases (768,), attention output dense kernel/bias and LayerNorm beta/gamma, intermediate dense kernel (768, 3072) and bias (3072,), and output dense kernel (3072, 768) and bias plus LayerNorm beta/gamma ...]
- { - "name": "stderr", -
"output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_10/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/query/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/key/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/self/value/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - 
] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, 
shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/transform/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/predictions/output_bias:0, shape = (30522,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/seq_relationship/output_weights:0, shape = (2, 768), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow: name = cls/seq_relationship/output_bias:0, shape = (2,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - name = cls/seq_relationship/output_bias:0, shape = 
(2,), *INIT_FROM_CKPT*\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Done calling model_fn.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:43 - INFO - tensorflow - Done calling model_fn.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Graph was finalized.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:44 - INFO - tensorflow - Graph was finalized.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Running local_init_op.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:45 - INFO - tensorflow - Running local_init_op.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Done running local_init_op.\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:45 - INFO - tensorflow - Done running local_init_op.\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:prediction_loop marked as finished\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:46 - INFO - tensorflow - prediction_loop marked as finished\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:prediction_loop marked as finished\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:02:46 - INFO - tensorflow - prediction_loop marked as finished\n" - ] - } - ], - "source": [ - "tensorflow_all_out = []\n", - "for result in estimator.predict(input_fn, yield_single_examples=True):\n", - " tensorflow_all_out.append(result)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:46.634304Z", - "start_time": "2018-11-16T10:02:46.598800Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n", - "2\n", - "dict_keys(['masked_lm_predictions', 'next_sentence_predictions'])\n", - "masked_lm_predictions [27227 1010 1010 1010 1010 1010 1010 1010 1010 1010 1010 1010\n", - " 1010 1010 1010 1010 1010 1010 1010 1010]\n", - "predicted token ['henson', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',', ',']\n" - ] - } - ], - "source": [ - "print(len(tensorflow_all_out))\n", - "print(len(tensorflow_all_out[0]))\n", - "print(tensorflow_all_out[0].keys())\n", - "print(\"masked_lm_predictions\", tensorflow_all_out[0]['masked_lm_predictions'])\n", - "print(\"predicted token\", tokenizer.convert_ids_to_tokens(tensorflow_all_out[0]['masked_lm_predictions']))" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:02:46.671229Z", - "start_time": "2018-11-16T10:02:46.637102Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tensorflow_output: ['henson']\n" - ] - } - ], - "source": [ - "tensorflow_outputs = tokenizer.convert_ids_to_tokens(tensorflow_all_out[0]['masked_lm_predictions'])[:len(masked_lm_positions)]\n", - "print(\"tensorflow_output:\", tensorflow_outputs)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2/ PyTorch code" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": { - "ExecuteTime": { - "end_time": 
"2018-11-16T10:03:03.556557Z", - "start_time": "2018-11-16T10:03:03.519654Z" - } - }, - "outputs": [], - "source": [ - "from examples import extract_features\n", - "from examples.extract_features import *" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:03:03.952710Z", - "start_time": "2018-11-16T10:03:03.921917Z" - } - }, - "outputs": [], - "source": [ - "init_checkpoint_pt = \"../google_models/uncased_L-12_H-768_A-12/pytorch_model.bin\"" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:03:12.307673Z", - "start_time": "2018-11-16T10:03:04.439317Z" - }, - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/16/2018 11:03:05 - INFO - pytorch_transformers.modeling_bert - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz from cache at /Users/thomaswolf/.pytorch_transformers/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba\n", - "11/16/2018 11:03:05 - INFO - pytorch_transformers.modeling_bert - extracting archive file /Users/thomaswolf/.pytorch_transformers/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba to temp dir /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmpaqgsm566\n", - "11/16/2018 11:03:08 - INFO - pytorch_transformers.modeling_bert - Model config {\n", - " \"attention_probs_dropout_prob\": 0.1,\n", - " \"hidden_act\": \"gelu\",\n", - " \"hidden_dropout_prob\": 0.1,\n", - " \"hidden_size\": 768,\n", - " \"initializer_range\": 0.02,\n", - " \"intermediate_size\": 3072,\n", - " \"max_position_embeddings\": 512,\n", - " \"num_attention_heads\": 12,\n", - " \"num_hidden_layers\": 12,\n", - " \"type_vocab_size\": 2,\n", - " \"vocab_size\": 30522\n", - "}\n", - "\n" - ] - }, - { - "data": { - "text/plain": [ - "BertForPreTraining(\n", - " (bert): BertModel(\n", - " (embeddings): BertEmbeddings(\n", - " (word_embeddings): Embedding(30522, 768)\n", - " (position_embeddings): Embedding(512, 768)\n", - " (token_type_embeddings): Embedding(2, 768)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (encoder): BertEncoder(\n", - " (layer): ModuleList(\n", - " (0): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (1): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): 
Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " [... remainder of layer (1) and layers (2)-(8), identical in structure to layer (0), elided ...]\n", - " (9): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", 
- " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (10): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (11): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " )\n", - " )\n", - " (pooler): BertPooler(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (activation): Tanh()\n", - " )\n", - " )\n", - " (cls): BertPreTrainingHeads(\n", - " (predictions): BertLMPredictionHead(\n", - " (transform): BertPredictionHeadTransform(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " )\n", - " (decoder): Linear(in_features=768, out_features=30522, bias=False)\n", - " )\n", - " (seq_relationship): Linear(in_features=768, out_features=2, bias=True)\n", - " )\n", - ")" - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "device = torch.device(\"cpu\")\n", - "model = ppb.BertForPreTraining.from_pretrained('bert-base-uncased')\n", - "model.to(device)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:03:12.351625Z", - "start_time": "2018-11-16T10:03:12.310736Z" - }, - "code_folding": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "BertForPreTraining(\n", - " (bert): BertModel(\n", - " (embeddings): BertEmbeddings(\n", - " (word_embeddings): Embedding(30522, 768)\n", - " (position_embeddings): Embedding(512, 768)\n", - " (token_type_embeddings): Embedding(2, 768)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (encoder): BertEncoder(\n", - " 
(layer): ModuleList(\n", - " [... 12 BertLayer blocks, identical to the printout of the previous cell, elided ...]\n", - " )\n", - " )\n", - " (pooler): BertPooler(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (activation): Tanh()\n", - " )\n", - " )\n", - " (cls): BertPreTrainingHeads(\n", - " (predictions): BertLMPredictionHead(\n", - " (transform): BertPredictionHeadTransform(\n", - " (dense): Linear(in_features=768, 
out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " )\n", - " (decoder): Linear(in_features=768, out_features=30522, bias=False)\n", - " )\n", - " (seq_relationship): Linear(in_features=768, out_features=2, bias=True)\n", - " )\n", - ")" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)\n", - "all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)\n", - "all_segment_ids = torch.tensor([f.segment_ids for f in features], dtype=torch.long)\n", - "all_masked_lm_positions = torch.tensor([f.masked_lm_positions for f in features], dtype=torch.long)\n", - "\n", - "eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_masked_lm_positions)\n", - "eval_sampler = SequentialSampler(eval_data)\n", - "eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=1)\n", - "\n", - "model.eval()" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:03:12.792741Z", - "start_time": "2018-11-16T10:03:12.354253Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tensor([[ 2040, 2001, 3958, 27227, 1029, 3958, 103, 2001, 1037, 13997,\n", - " 11510, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0]])\n", - "tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0]])\n", - "tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0]])\n", - "(1, 20, 30522)\n", - "[27227, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010, 1010]\n" - ] - } - ], - "source": [ - "import numpy as np\n", - "pytorch_all_out = []\n", - "for input_ids, input_mask, segment_ids, tensor_masked_lm_positions in eval_dataloader:\n", - " print(input_ids)\n", - " print(input_mask)\n", - " print(segment_ids)\n", - " input_ids = input_ids.to(device)\n", - " input_mask = input_mask.to(device)\n", - " segment_ids = segment_ids.to(device)\n", - "\n", - " prediction_scores, _ = model(input_ids, token_type_ids=segment_ids, attention_mask=input_mask)\n", - " prediction_scores = prediction_scores[0, tensor_masked_lm_positions].detach().cpu().numpy()\n", - " print(prediction_scores.shape)\n", - " masked_lm_predictions = np.argmax(prediction_scores, 
axis=-1).squeeze().tolist()\n", - " print(masked_lm_predictions)\n", - " pytorch_all_out.append(masked_lm_predictions)" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-16T10:03:12.828439Z", - "start_time": "2018-11-16T10:03:12.795420Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "pytorch_output: ['henson']\n", - "tensorflow_output: ['henson']\n" - ] - } - ], - "source": [ - "pytorch_outputs = tokenizer.convert_ids_to_tokens(pytorch_all_out[0])[:len(masked_lm_positions)]\n", - "print(\"pytorch_output:\", pytorch_outputs)\n", - "print(\"tensorflow_output:\", tensorflow_outputs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "hide_input": false, - "kernelspec": { - "display_name": "Python [default]", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - }, - "toc": { - "colors": { - "hover_highlight": "#DAA520", - "running_highlight": "#FF0000", - "selected_highlight": "#FFD700" - }, - "moveMenuLeft": true, - "nav_menu": { - "height": "48px", - "width": "252px" - }, - "navigate_menu": true, - "number_sections": true, - "sideBar": true, - "threshold": 4, - "toc_cell": false, - "toc_section_display": "block", - "toc_window_display": false - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb b/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb deleted file mode 100644 index a75e052643f59b..00000000000000 --- a/notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb +++ /dev/null @@ -1,1644 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Comparing TensorFlow (original) and PyTorch model on the SQuAD task\n", - "\n", - "You can use this small notebook to check that the loss computed by the TensorFlow model matches the loss computed by the PyTorch model. In the following, we compare the total loss computed by the models starting from identical initializations (position prediction linear layers with weights at 1 and bias at 0).\n", - "\n", - "To run this notebook, follow these instructions:\n", - "- make sure that your Python environment has both TensorFlow and PyTorch installed,\n", - "- download the original TensorFlow implementation,\n", - "- download a pre-trained TensorFlow model as indicated in the TensorFlow implementation readme,\n", - "- run the script `convert_tf_checkpoint_to_pytorch.py` as indicated in the `README` to convert the pre-trained TensorFlow model to PyTorch.\n", - "\n", - "If needed, change the relative paths indicated in this notebook (at the beginning of Sections 1 and 2) to point to the relevant models and code."
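A minimal sketch of the final agreement check this notebook builds up to, assuming the two total losses have already been computed; `losses_match`, `tf_loss` and `pt_loss` are illustrative placeholders, not identifiers from the notebook:

```python
import numpy as np

def losses_match(tf_loss, pt_loss, tol=1e-5):
    # Maximum absolute deviation between the TensorFlow and PyTorch results.
    diff = float(np.max(np.abs(np.asarray(tf_loss) - np.asarray(pt_loss))))
    print("max abs diff: %.2e" % diff)
    return diff <= tol

# Dummy stand-ins for the losses computed in Sections 1 and 2 below:
assert losses_match([5.2134], [5.2134])
```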
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:33.636911Z", - "start_time": "2018-11-06T10:11:33.623091Z" - } - }, - "outputs": [], - "source": [ - "import os\n", - "os.chdir('../')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 1/ TensorFlow code" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:33.651792Z", - "start_time": "2018-11-06T10:11:33.638984Z" - } - }, - "outputs": [], - "source": [ - "original_tf_inplem_dir = \"./tensorflow_code/\"\n", - "model_dir = \"../google_models/uncased_L-12_H-768_A-12/\"\n", - "\n", - "vocab_file = model_dir + \"vocab.txt\"\n", - "bert_config_file = model_dir + \"bert_config.json\"\n", - "init_checkpoint = model_dir + \"bert_model.ckpt\"\n", - "\n", - "input_file = \"../data/squad_data/train-v1.1.json\"\n", - "max_seq_length = 384\n", - "outside_pos = max_seq_length + 10\n", - "doc_stride = 128\n", - "max_query_length = 64\n", - "max_answer_length = 30\n", - "output_dir = \"/tmp/squad_base/\"\n", - "learning_rate = 3e-5" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:35.165788Z", - "start_time": "2018-11-06T10:11:33.653401Z" - } - }, - "outputs": [], - "source": [ - "import importlib.util\n", - "import sys\n", - "\n", - "spec = importlib.util.spec_from_file_location('*', original_tf_inplem_dir + '/modeling.py')\n", - "module = importlib.util.module_from_spec(spec)\n", - "spec.loader.exec_module(module)\n", - "sys.modules['modeling_tensorflow'] = module\n", - "\n", - "spec = importlib.util.spec_from_file_location('*', original_tf_inplem_dir + '/run_bert_squad.py')\n", - "module = importlib.util.module_from_spec(spec)\n", - "spec.loader.exec_module(module)\n", - "sys.modules['run_squad_tensorflow'] = module\n", - "import modeling_tensorflow\n", - "from run_squad_tensorflow import *" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:37.494391Z", - "start_time": "2018-11-06T10:11:35.168615Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000000\n", - "INFO:tensorflow:example_index: 0\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] to whom did the virgin mary allegedly appear in 1858 in lou ##rdes france ? [SEP] architectural ##ly , the school has a catholic character . atop the main building ' s gold dome is a golden statue of the virgin mary . immediately in front of the main building and facing it , is a copper statue of christ with arms up ##rai ##sed with the legend \" ve ##ni ##te ad me om ##nes \" . next to the main building is the basilica of the sacred heart . immediately behind the basilica is the gr ##otto , a marian place of prayer and reflection . it is a replica of the gr ##otto at lou ##rdes , france where the virgin mary reputed ##ly appeared to saint bern ##ade ##tte so ##ub ##iro ##us in 1858 . at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ) , is a simple , modern stone statue of mary . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 17:0 18:0 19:0 20:1 21:2 22:3 23:4 24:5 25:6 26:6 27:7 28:8 29:9 30:10 31:10 32:10 33:11 34:12 35:13 36:14 37:15 38:16 39:17 40:18 41:19 42:20 43:20 44:21 45:22 46:23 47:24 48:25 49:26 50:27 51:28 52:29 53:30 54:30 55:31 56:32 57:33 58:34 59:35 60:36 61:37 62:38 63:39 64:39 65:39 66:40 67:41 68:42 69:43 70:43 71:43 72:43 73:44 74:45 75:46 76:46 77:46 78:46 79:47 80:48 81:49 82:50 83:51 84:52 85:53 86:54 87:55 88:56 89:57 90:58 91:58 92:59 93:60 94:61 95:62 96:63 97:64 98:65 99:65 100:65 101:66 102:67 103:68 104:69 105:70 106:71 107:72 108:72 109:73 110:74 111:75 112:76 113:77 114:78 115:79 116:79 117:80 118:81 119:81 120:81 121:82 122:83 123:84 124:85 125:86 126:87 127:87 128:88 129:89 130:90 131:91 132:91 133:91 134:92 135:92 136:92 137:92 138:93 139:94 140:94 141:95 142:96 143:97 144:98 145:99 146:100 147:101 148:102 149:102 150:103 151:104 152:105 153:106 154:107 155:108 156:109 157:110 158:111 159:112 160:113 161:114 162:115 163:115 164:115 165:116 166:117 167:118 168:118 169:119 170:120 171:121 172:122 173:123 174:123\n", - "INFO:tensorflow:token_is_max_context: 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True 169:True 170:True 171:True 172:True 173:True 174:True\n", - "INFO:tensorflow:input_ids: 101 2000 3183 2106 1996 6261 2984 9382 3711 1999 8517 1999 10223 26371 2605 1029 102 6549 2135 1010 1996 2082 2038 1037 3234 2839 1012 10234 1996 2364 2311 1005 1055 2751 8514 2003 1037 3585 6231 1997 1996 6261 2984 1012 3202 1999 2392 1997 1996 2364 2311 1998 5307 2009 1010 2003 1037 6967 6231 1997 4828 2007 2608 2039 14995 6924 2007 1996 5722 1000 2310 3490 2618 4748 2033 18168 5267 1000 1012 2279 2000 1996 2364 2311 2003 1996 13546 1997 1996 6730 2540 1012 3202 2369 1996 13546 2003 1996 24665 23052 1010 1037 14042 2173 1997 7083 1998 9185 1012 2009 2003 1037 15059 1997 1996 24665 23052 2012 10223 26371 1010 2605 2073 1996 6261 2984 22353 2135 2596 2000 3002 16595 9648 4674 2061 12083 9711 2271 1999 8517 1012 2012 1996 2203 1997 1996 2364 3298 1006 1998 1999 1037 3622 2240 2008 8539 2083 1017 11342 1998 1996 2751 8514 1007 1010 2003 1037 3722 1010 2715 2962 6231 1997 2984 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 130\n", - "INFO:tensorflow:end_position: 137\n", - "INFO:tensorflow:answer: saint bern ##ade ##tte so ##ub ##iro ##us\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000001\n", - "INFO:tensorflow:example_index: 1\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] what is in front of the notre dame main building ? [SEP] architectural ##ly , the school has a catholic character . atop the main building ' s gold dome is a golden statue of the virgin mary . immediately in front of the main building and facing it , is a copper statue of christ with arms up ##rai ##sed with the legend \" ve ##ni ##te ad me om ##nes \" . next to the main building is the basilica of the sacred heart . immediately behind the basilica is the gr ##otto , a marian place of prayer and reflection . it is a replica of the gr ##otto at lou ##rdes , france where the virgin mary reputed ##ly appeared to saint bern ##ade ##tte so ##ub ##iro ##us in 1858 . at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ) , is a simple , modern stone statue of mary . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 13:0 14:0 15:0 16:1 17:2 18:3 19:4 20:5 21:6 22:6 23:7 24:8 25:9 26:10 27:10 28:10 29:11 30:12 31:13 32:14 33:15 34:16 35:17 36:18 37:19 38:20 39:20 40:21 41:22 42:23 43:24 44:25 45:26 46:27 47:28 48:29 49:30 50:30 51:31 52:32 53:33 54:34 55:35 56:36 57:37 58:38 59:39 60:39 61:39 62:40 63:41 64:42 65:43 66:43 67:43 68:43 69:44 70:45 71:46 72:46 73:46 74:46 75:47 76:48 77:49 78:50 79:51 80:52 81:53 82:54 83:55 84:56 85:57 86:58 87:58 88:59 89:60 90:61 91:62 92:63 93:64 94:65 95:65 96:65 97:66 98:67 99:68 100:69 101:70 102:71 103:72 104:72 105:73 106:74 107:75 108:76 109:77 110:78 111:79 112:79 113:80 114:81 115:81 116:81 117:82 118:83 119:84 120:85 121:86 122:87 123:87 124:88 125:89 126:90 127:91 128:91 129:91 130:92 131:92 132:92 133:92 134:93 135:94 136:94 137:95 138:96 139:97 140:98 141:99 142:100 143:101 144:102 145:102 146:103 147:104 148:105 149:106 150:107 151:108 152:109 153:110 154:111 155:112 156:113 157:114 158:115 159:115 160:115 161:116 162:117 163:118 164:118 165:119 166:120 167:121 168:122 169:123 170:123\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:token_is_max_context: 13:True 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True 169:True 170:True\n", - "INFO:tensorflow:input_ids: 101 2054 2003 1999 2392 1997 1996 10289 8214 2364 2311 1029 102 6549 2135 1010 1996 2082 2038 1037 3234 2839 1012 10234 1996 2364 2311 1005 1055 2751 8514 2003 1037 3585 6231 1997 1996 6261 2984 1012 3202 1999 2392 1997 1996 2364 2311 1998 5307 2009 1010 2003 1037 6967 6231 1997 4828 2007 2608 2039 14995 6924 2007 1996 5722 1000 2310 3490 2618 4748 2033 18168 5267 1000 1012 2279 2000 1996 2364 2311 2003 1996 13546 1997 1996 6730 2540 1012 3202 2369 1996 13546 2003 1996 24665 23052 1010 1037 14042 2173 1997 7083 1998 9185 1012 2009 2003 1037 15059 1997 1996 24665 23052 2012 10223 26371 1010 2605 2073 1996 6261 2984 22353 2135 2596 2000 3002 16595 9648 4674 2061 12083 9711 2271 1999 8517 1012 2012 1996 2203 1997 1996 2364 3298 1006 1998 1999 1037 3622 2240 2008 8539 2083 1017 11342 1998 1996 2751 8514 1007 1010 2003 1037 3722 1010 2715 2962 6231 1997 2984 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 52\n", - "INFO:tensorflow:end_position: 56\n", - "INFO:tensorflow:answer: a copper statue of christ\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000002\n", - "INFO:tensorflow:example_index: 2\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] the basilica of the sacred heart at notre dame is beside to which structure ? [SEP] architectural ##ly , the school has a catholic character . atop the main building ' s gold dome is a golden statue of the virgin mary . immediately in front of the main building and facing it , is a copper statue of christ with arms up ##rai ##sed with the legend \" ve ##ni ##te ad me om ##nes \" . next to the main building is the basilica of the sacred heart . immediately behind the basilica is the gr ##otto , a marian place of prayer and reflection . it is a replica of the gr ##otto at lou ##rdes , france where the virgin mary reputed ##ly appeared to saint bern ##ade ##tte so ##ub ##iro ##us in 1858 . at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ) , is a simple , modern stone statue of mary . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 17:0 18:0 19:0 20:1 21:2 22:3 23:4 24:5 25:6 26:6 27:7 28:8 29:9 30:10 31:10 32:10 33:11 34:12 35:13 36:14 37:15 38:16 39:17 40:18 41:19 42:20 43:20 44:21 45:22 46:23 47:24 48:25 49:26 50:27 51:28 52:29 53:30 54:30 55:31 56:32 57:33 58:34 59:35 60:36 61:37 62:38 63:39 64:39 65:39 66:40 67:41 68:42 69:43 70:43 71:43 72:43 73:44 74:45 75:46 76:46 77:46 78:46 79:47 80:48 81:49 82:50 83:51 84:52 85:53 86:54 87:55 88:56 89:57 90:58 91:58 92:59 93:60 94:61 95:62 96:63 97:64 98:65 99:65 100:65 101:66 102:67 103:68 104:69 105:70 106:71 107:72 108:72 109:73 110:74 111:75 112:76 113:77 114:78 115:79 116:79 117:80 118:81 119:81 120:81 121:82 122:83 123:84 124:85 125:86 126:87 127:87 128:88 129:89 130:90 131:91 132:91 133:91 134:92 135:92 136:92 137:92 138:93 139:94 140:94 141:95 142:96 143:97 144:98 145:99 146:100 147:101 148:102 149:102 150:103 151:104 152:105 153:106 154:107 155:108 156:109 157:110 158:111 159:112 160:113 161:114 162:115 163:115 164:115 165:116 166:117 167:118 168:118 169:119 170:120 171:121 172:122 173:123 174:123\n", - "INFO:tensorflow:token_is_max_context: 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True 169:True 170:True 171:True 172:True 173:True 174:True\n", - "INFO:tensorflow:input_ids: 101 1996 13546 1997 1996 6730 2540 2012 10289 8214 2003 3875 2000 2029 3252 1029 102 6549 2135 1010 1996 2082 2038 1037 3234 2839 1012 10234 1996 2364 2311 1005 1055 2751 8514 2003 1037 3585 6231 1997 1996 6261 2984 1012 3202 1999 2392 1997 1996 2364 2311 1998 5307 2009 1010 2003 1037 6967 6231 1997 4828 2007 2608 2039 14995 6924 2007 1996 5722 1000 2310 3490 2618 4748 2033 18168 5267 1000 1012 2279 2000 1996 2364 2311 2003 1996 13546 1997 1996 6730 2540 1012 3202 2369 1996 13546 2003 1996 24665 23052 1010 1037 14042 2173 1997 7083 1998 9185 1012 2009 2003 1037 15059 1997 1996 24665 23052 2012 10223 26371 1010 2605 2073 1996 6261 2984 22353 2135 2596 2000 3002 16595 9648 4674 2061 12083 9711 2271 1999 8517 1012 2012 1996 2203 1997 1996 2364 3298 1006 1998 1999 1037 3622 2240 2008 8539 2083 1017 11342 1998 1996 2751 8514 1007 1010 2003 1037 3722 1010 2715 2962 6231 1997 2984 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 81\n", - "INFO:tensorflow:end_position: 83\n", - "INFO:tensorflow:answer: the main building\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000003\n", - "INFO:tensorflow:example_index: 3\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] what is the gr ##otto at notre dame ? [SEP] architectural ##ly , the school has a catholic character . atop the main building ' s gold dome is a golden statue of the virgin mary . immediately in front of the main building and facing it , is a copper statue of christ with arms up ##rai ##sed with the legend \" ve ##ni ##te ad me om ##nes \" . next to the main building is the basilica of the sacred heart . immediately behind the basilica is the gr ##otto , a marian place of prayer and reflection . it is a replica of the gr ##otto at lou ##rdes , france where the virgin mary reputed ##ly appeared to saint bern ##ade ##tte so ##ub ##iro ##us in 1858 . at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ) , is a simple , modern stone statue of mary . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 11:0 12:0 13:0 14:1 15:2 16:3 17:4 18:5 19:6 20:6 21:7 22:8 23:9 24:10 25:10 26:10 27:11 28:12 29:13 30:14 31:15 32:16 33:17 34:18 35:19 36:20 37:20 38:21 39:22 40:23 41:24 42:25 43:26 44:27 45:28 46:29 47:30 48:30 49:31 50:32 51:33 52:34 53:35 54:36 55:37 56:38 57:39 58:39 59:39 60:40 61:41 62:42 63:43 64:43 65:43 66:43 67:44 68:45 69:46 70:46 71:46 72:46 73:47 74:48 75:49 76:50 77:51 78:52 79:53 80:54 81:55 82:56 83:57 84:58 85:58 86:59 87:60 88:61 89:62 90:63 91:64 92:65 93:65 94:65 95:66 96:67 97:68 98:69 99:70 100:71 101:72 102:72 103:73 104:74 105:75 106:76 107:77 108:78 109:79 110:79 111:80 112:81 113:81 114:81 115:82 116:83 117:84 118:85 119:86 120:87 121:87 122:88 123:89 124:90 125:91 126:91 127:91 128:92 129:92 130:92 131:92 132:93 133:94 134:94 135:95 136:96 137:97 138:98 139:99 140:100 141:101 142:102 143:102 144:103 145:104 146:105 147:106 148:107 149:108 150:109 151:110 152:111 153:112 154:113 155:114 156:115 157:115 158:115 159:116 160:117 161:118 162:118 163:119 164:120 165:121 166:122 167:123 168:123\n", - "INFO:tensorflow:token_is_max_context: 11:True 12:True 13:True 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True\n", - "INFO:tensorflow:input_ids: 101 2054 2003 1996 24665 23052 2012 10289 8214 1029 102 6549 2135 1010 1996 2082 2038 1037 3234 2839 1012 10234 1996 2364 2311 1005 1055 2751 8514 2003 1037 3585 6231 1997 1996 6261 2984 1012 3202 1999 2392 1997 1996 2364 2311 1998 5307 2009 1010 2003 1037 6967 6231 1997 4828 2007 2608 2039 14995 6924 2007 1996 5722 1000 2310 3490 2618 4748 2033 18168 5267 1000 1012 2279 2000 1996 2364 2311 2003 1996 13546 1997 1996 6730 2540 1012 3202 2369 1996 13546 2003 1996 24665 23052 1010 1037 14042 2173 1997 7083 1998 9185 1012 2009 2003 1037 15059 1997 1996 24665 23052 2012 10223 26371 1010 2605 2073 1996 6261 2984 22353 2135 2596 2000 3002 16595 9648 4674 2061 12083 9711 2271 1999 8517 1012 2012 1996 2203 1997 1996 2364 3298 1006 1998 1999 1037 3622 2240 2008 8539 2083 1017 11342 1998 1996 2751 8514 1007 1010 2003 1037 3722 1010 2715 2962 6231 1997 2984 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 95\n", - "INFO:tensorflow:end_position: 101\n", - "INFO:tensorflow:answer: a marian place of prayer and reflection\n", - "INFO:tensorflow:*** Example ***\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:unique_id: 1000000004\n", - "INFO:tensorflow:example_index: 4\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] what sits on top of the main building at notre dame ? [SEP] architectural ##ly , the school has a catholic character . atop the main building ' s gold dome is a golden statue of the virgin mary . immediately in front of the main building and facing it , is a copper statue of christ with arms up ##rai ##sed with the legend \" ve ##ni ##te ad me om ##nes \" . next to the main building is the basilica of the sacred heart . immediately behind the basilica is the gr ##otto , a marian place of prayer and reflection . it is a replica of the gr ##otto at lou ##rdes , france where the virgin mary reputed ##ly appeared to saint bern ##ade ##tte so ##ub ##iro ##us in 1858 . at the end of the main drive ( and in a direct line that connects through 3 statues and the gold dome ) , is a simple , modern stone statue of mary . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 14:0 15:0 16:0 17:1 18:2 19:3 20:4 21:5 22:6 23:6 24:7 25:8 26:9 27:10 28:10 29:10 30:11 31:12 32:13 33:14 34:15 35:16 36:17 37:18 38:19 39:20 40:20 41:21 42:22 43:23 44:24 45:25 46:26 47:27 48:28 49:29 50:30 51:30 52:31 53:32 54:33 55:34 56:35 57:36 58:37 59:38 60:39 61:39 62:39 63:40 64:41 65:42 66:43 67:43 68:43 69:43 70:44 71:45 72:46 73:46 74:46 75:46 76:47 77:48 78:49 79:50 80:51 81:52 82:53 83:54 84:55 85:56 86:57 87:58 88:58 89:59 90:60 91:61 92:62 93:63 94:64 95:65 96:65 97:65 98:66 99:67 100:68 101:69 102:70 103:71 104:72 105:72 106:73 107:74 108:75 109:76 110:77 111:78 112:79 113:79 114:80 115:81 116:81 117:81 118:82 119:83 120:84 121:85 122:86 123:87 124:87 125:88 126:89 127:90 128:91 129:91 130:91 131:92 132:92 133:92 134:92 135:93 136:94 137:94 138:95 139:96 140:97 141:98 142:99 143:100 144:101 145:102 146:102 147:103 148:104 149:105 150:106 151:107 152:108 153:109 154:110 155:111 156:112 157:113 158:114 159:115 160:115 161:115 162:116 163:117 164:118 165:118 166:119 167:120 168:121 169:122 170:123 171:123\n", - "INFO:tensorflow:token_is_max_context: 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True 169:True 170:True 171:True\n", - "INFO:tensorflow:input_ids: 101 2054 7719 2006 2327 1997 1996 2364 2311 2012 10289 8214 1029 102 6549 2135 1010 1996 2082 2038 1037 3234 2839 1012 10234 1996 2364 2311 1005 1055 2751 8514 2003 1037 3585 6231 1997 1996 6261 2984 1012 3202 1999 2392 1997 1996 2364 2311 1998 5307 2009 1010 2003 1037 6967 6231 1997 4828 2007 2608 2039 14995 6924 2007 1996 5722 1000 2310 3490 2618 4748 2033 18168 5267 1000 1012 2279 2000 1996 2364 2311 2003 1996 13546 1997 1996 6730 2540 1012 3202 2369 1996 13546 2003 1996 24665 23052 1010 1037 14042 2173 1997 7083 1998 9185 1012 2009 2003 1037 15059 1997 1996 24665 23052 2012 10223 26371 1010 2605 2073 1996 6261 2984 22353 2135 2596 2000 3002 16595 9648 4674 2061 12083 9711 2271 1999 8517 1012 2012 1996 2203 1997 1996 2364 3298 1006 1998 1999 1037 3622 2240 2008 8539 2083 1017 11342 1998 1996 2751 8514 1007 1010 2003 1037 3722 1010 2715 2962 6231 1997 2984 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 33\n", - "INFO:tensorflow:end_position: 39\n", - "INFO:tensorflow:answer: a golden statue of the virgin mary\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000005\n", - "INFO:tensorflow:example_index: 5\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] when did the scholastic magazine of notre dame begin publishing ? [SEP] as at most other universities , notre dame ' s students run a number of news media outlets . the nine student - run outlets include three newspapers , both a radio and television station , and several magazines and journals . begun as a one - page journal in september 1876 , the scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the united states . the other magazine , the jug ##gler , is released twice a year and focuses on student literature and artwork . the dome yearbook is published annually . the newspapers have varying publication interests , with the observer published daily and mainly reporting university and other news , and staffed by students from both notre dame and saint mary ' s college . unlike scholastic and the dome , the observer is an independent publication and does not have a faculty advisor or any editorial oversight from the university . in 1987 , when some students believed that the observer began to show a conservative bias , a liberal newspaper , common sense was published . likewise , in 2003 , when other students believed that the paper showed a liberal bias , the conservative paper irish rover went into production . 
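These logs document the standard BERT SQuAD input encoding: question and context are
WordPiece-tokenized, packed as "[CLS] question [SEP] context [SEP]", and zero-padded.
A minimal sketch of that packing (not the deleted notebooks' exact code; it assumes the
bert-base-uncased vocabulary, which matches the logged IDs 101 for [CLS] and 102 for [SEP],
and uses a question/context pair taken from example 1000000004 above):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    question = "What sits on top of the Main Building at Notre Dame?"
    context = "Atop the Main Building's gold dome is a golden statue of the Virgin Mary."
    max_seq_length = 64  # the notebooks pad to a larger maximum; 64 keeps the example short

    q_tokens = tokenizer.tokenize(question)
    c_tokens = tokenizer.tokenize(context)

    # [CLS] question [SEP] context [SEP]; segment 0 = question, segment 1 = context
    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + c_tokens + ["[SEP]"]
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(c_tokens) + 1)

    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_mask = [1] * len(input_ids)  # 1 for real tokens, 0 for padding

    # Zero-pad all three sequences to max_seq_length, as in the logs above
    padding = [0] * (max_seq_length - len(input_ids))
    input_ids, input_mask, segment_ids = (
        input_ids + padding, input_mask + padding, segment_ids + padding)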
- [condensed deleted cell output: the same feature-conversion logs for examples
-  1000000005-1000000008, whose shared context is the "As at most other universities..."
-  passage about Notre Dame's student media. The logged question/answer pairs are:
-    "when did the scholastic magazine of notre dame begin publishing ?" -> "september 1876"
-    "how often is notre dame ' s the jug ##gler published ?" -> "twice"
-    "what is the daily student paper at notre dame called ?" -> "the observer"
-    "how many student news papers are found at notre dame ?" -> (answer truncated in this excerpt)
-  The output ends midway through example 1000000008's segment_ids.]
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 39\n", - "INFO:tensorflow:end_position: 39\n", - "INFO:tensorflow:answer: three\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000009\n", - "INFO:tensorflow:example_index: 9\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] in what year did the student paper common sense begin publication at notre dame ? [SEP] as at most other universities , notre dame ' s students run a number of news media outlets . the nine student - run outlets include three newspapers , both a radio and television station , and several magazines and journals . begun as a one - page journal in september 1876 , the scholastic magazine is issued twice monthly and claims to be the oldest continuous collegiate publication in the united states . the other magazine , the jug ##gler , is released twice a year and focuses on student literature and artwork . the dome yearbook is published annually . the newspapers have varying publication interests , with the observer published daily and mainly reporting university and other news , and staffed by students from both notre dame and saint mary ' s college . unlike scholastic and the dome , the observer is an independent publication and does not have a faculty advisor or any editorial oversight from the university . in 1987 , when some students believed that the observer began to show a conservative bias , a liberal newspaper , common sense was published . likewise , in 2003 , when other students believed that the paper showed a liberal bias , the conservative paper irish rover went into production . neither paper is published as often as the observer ; however , all three are distributed to all students . finally , in spring 2008 an undergraduate journal for political science research , beyond politics , made its debut . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 17:0 18:1 19:2 20:3 21:4 22:4 23:5 24:6 25:6 26:6 27:7 28:8 29:9 30:10 31:11 32:12 33:13 34:14 35:14 36:15 37:16 38:17 39:17 40:17 41:18 42:19 43:20 44:21 45:21 46:22 47:23 48:24 49:25 50:26 51:27 52:27 53:28 54:29 55:30 56:31 57:32 58:32 59:33 60:34 61:35 62:36 63:36 64:36 65:37 66:38 67:39 68:40 69:40 70:41 71:42 72:43 73:44 74:45 75:46 76:47 77:48 78:49 79:50 80:51 81:52 82:53 83:54 84:55 85:56 86:57 87:58 88:59 89:60 90:60 91:61 92:62 93:63 94:63 95:64 96:65 97:65 98:65 99:66 100:67 101:68 102:69 103:70 104:71 105:72 106:73 107:74 108:75 109:76 110:77 111:77 112:78 113:79 114:80 115:81 116:82 117:83 118:83 119:84 120:85 121:86 122:87 123:88 124:89 125:89 126:90 127:91 128:92 129:93 130:94 131:95 132:96 133:97 134:98 135:99 136:100 137:101 138:101 139:102 140:103 141:104 142:105 143:106 144:107 145:108 146:109 147:110 148:111 149:112 150:112 151:112 152:113 153:113 154:114 155:115 156:116 157:117 158:118 159:118 160:119 161:120 162:121 163:122 164:123 165:124 166:125 167:126 168:127 169:128 170:129 171:130 172:131 173:132 174:133 175:134 176:135 177:136 178:137 179:138 180:138 181:139 182:140 183:140 184:141 185:142 186:143 187:144 188:145 189:146 190:147 191:148 192:149 193:150 194:151 195:152 196:153 197:153 198:154 199:155 200:156 201:156 202:157 203:158 204:159 205:160 206:160 207:161 208:161 209:162 210:163 211:163 212:164 213:165 214:166 215:167 216:168 217:169 218:170 219:171 220:172 221:173 222:174 223:174 224:175 225:176 226:177 227:178 228:179 229:180 230:181 231:182 232:182 233:183 234:184 235:185 236:186 237:187 238:188 239:189 240:190 241:191 242:191 243:192 244:192 245:193 246:194 247:195 248:196 249:197 250:198 251:199 252:199 253:200 254:200 255:201 256:202 257:203 258:204 259:205 260:206 261:207 262:208 263:209 264:210 265:210 266:211 267:212 268:212 269:213 270:214 271:215 272:215\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:token_is_max_context: 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True 160:True 161:True 162:True 163:True 164:True 165:True 166:True 167:True 168:True 169:True 170:True 171:True 172:True 173:True 174:True 175:True 176:True 177:True 178:True 179:True 180:True 181:True 182:True 183:True 184:True 185:True 186:True 187:True 188:True 189:True 190:True 191:True 192:True 193:True 194:True 195:True 196:True 197:True 198:True 
199:True 200:True 201:True 202:True 203:True 204:True 205:True 206:True 207:True 208:True 209:True 210:True 211:True 212:True 213:True 214:True 215:True 216:True 217:True 218:True 219:True 220:True 221:True 222:True 223:True 224:True 225:True 226:True 227:True 228:True 229:True 230:True 231:True 232:True 233:True 234:True 235:True 236:True 237:True 238:True 239:True 240:True 241:True 242:True 243:True 244:True 245:True 246:True 247:True 248:True 249:True 250:True 251:True 252:True 253:True 254:True 255:True 256:True 257:True 258:True 259:True 260:True 261:True 262:True 263:True 264:True 265:True 266:True 267:True 268:True 269:True 270:True 271:True 272:True\n", - "INFO:tensorflow:input_ids: 101 1999 2054 2095 2106 1996 3076 3259 2691 3168 4088 4772 2012 10289 8214 1029 102 2004 2012 2087 2060 5534 1010 10289 8214 1005 1055 2493 2448 1037 2193 1997 2739 2865 11730 1012 1996 3157 3076 1011 2448 11730 2421 2093 6399 1010 2119 1037 2557 1998 2547 2276 1010 1998 2195 7298 1998 9263 1012 5625 2004 1037 2028 1011 3931 3485 1999 2244 7326 1010 1996 24105 2932 2003 3843 3807 7058 1998 4447 2000 2022 1996 4587 7142 9234 4772 1999 1996 2142 2163 1012 1996 2060 2932 1010 1996 26536 17420 1010 2003 2207 3807 1037 2095 1998 7679 2006 3076 3906 1998 8266 1012 1996 8514 24803 2003 2405 6604 1012 1996 6399 2031 9671 4772 5426 1010 2007 1996 9718 2405 3679 1998 3701 7316 2118 1998 2060 2739 1010 1998 21121 2011 2493 2013 2119 10289 8214 1998 3002 2984 1005 1055 2267 1012 4406 24105 1998 1996 8514 1010 1996 9718 2003 2019 2981 4772 1998 2515 2025 2031 1037 4513 8619 2030 2151 8368 15709 2013 1996 2118 1012 1999 3055 1010 2043 2070 2493 3373 2008 1996 9718 2211 2000 2265 1037 4603 13827 1010 1037 4314 3780 1010 2691 3168 2001 2405 1012 10655 1010 1999 2494 1010 2043 2060 2493 3373 2008 1996 3259 3662 1037 4314 13827 1010 1996 4603 3259 3493 13631 2253 2046 2537 1012 4445 3259 2003 2405 2004 2411 2004 1996 9718 1025 2174 1010 2035 2093 2024 5500 2000 2035 2493 1012 2633 1010 1999 3500 2263 2019 8324 3485 2005 2576 2671 2470 1010 3458 4331 1010 2081 2049 2834 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 182\n", - "INFO:tensorflow:end_position: 182\n", - "INFO:tensorflow:answer: 1987\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000010\n", - "INFO:tensorflow:example_index: 10\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] where is the headquarters of the congregation of the holy cross ? [SEP] the university is the major seat of the congregation of holy cross ( albeit not its official headquarters , which are in rome ) . its main seminary , more ##au seminary , is located on the campus across st . joseph lake from the main building . old college , the oldest building on campus and located near the shore of st . mary lake , houses undergraduate seminar ##ians . retired priests and brothers reside in fatima house ( a former retreat center ) , holy cross house , as well as col ##umb ##a hall near the gr ##otto . the university through the more ##au seminary has ties to theologian frederick bu ##ech ##ner . while not catholic , bu ##ech ##ner has praised writers from notre dame and more ##au seminary created a bu ##ech ##ner prize for preaching . [SEP]\n", - "INFO:tensorflow:token_to_orig_map: 14:0 15:1 16:2 17:3 18:4 19:5 20:6 21:7 22:8 23:9 24:10 25:11 26:12 27:12 28:13 29:14 30:15 31:16 32:16 33:17 34:18 35:19 36:20 37:20 38:20 39:21 40:22 41:23 42:23 43:24 44:24 45:25 46:25 47:26 48:27 49:28 50:29 51:30 52:31 53:32 54:32 55:33 56:34 57:35 58:36 59:37 60:38 61:38 62:39 63:40 64:40 65:41 66:42 67:43 68:44 69:45 70:46 71:47 72:48 73:49 74:50 75:51 76:52 77:52 78:53 79:54 80:54 81:55 82:56 83:57 84:57 85:57 86:58 87:59 88:60 89:61 90:62 91:63 92:64 93:65 94:66 95:66 96:67 97:68 98:69 99:69 100:69 101:70 102:71 103:72 104:72 105:73 106:74 107:75 108:76 109:76 110:76 111:77 112:78 113:79 114:80 115:80 116:80 117:81 118:82 119:83 120:84 121:85 122:85 123:86 124:87 125:88 126:89 127:90 128:91 129:92 130:92 131:92 132:92 133:93 134:94 135:95 136:95 137:96 138:96 139:96 140:97 141:98 142:99 143:100 144:101 145:102 146:103 147:104 148:104 149:105 150:106 151:107 152:108 153:108 154:108 155:109 156:110 157:111 158:111\n", - "INFO:tensorflow:token_is_max_context: 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 
136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:input_ids: 101 2073 2003 1996 4075 1997 1996 7769 1997 1996 4151 2892 1029 102 1996 2118 2003 1996 2350 2835 1997 1996 7769 1997 4151 2892 1006 12167 2025 2049 2880 4075 1010 2029 2024 1999 4199 1007 1012 2049 2364 8705 1010 2062 4887 8705 1010 2003 2284 2006 1996 3721 2408 2358 1012 3312 2697 2013 1996 2364 2311 1012 2214 2267 1010 1996 4587 2311 2006 3721 1998 2284 2379 1996 5370 1997 2358 1012 2984 2697 1010 3506 8324 18014 7066 1012 3394 8656 1998 3428 13960 1999 27596 2160 1006 1037 2280 7822 2415 1007 1010 4151 2892 2160 1010 2004 2092 2004 8902 25438 2050 2534 2379 1996 24665 23052 1012 1996 2118 2083 1996 2062 4887 8705 2038 7208 2000 17200 5406 20934 15937 3678 1012 2096 2025 3234 1010 20934 15937 3678 2038 5868 4898 2013 10289 8214 1998 2062 4887 8705 2580 1037 20934 15937 3678 3396 2005 17979 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 36\n", - "INFO:tensorflow:end_position: 36\n", - "INFO:tensorflow:answer: rome\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000011\n", - "INFO:tensorflow:example_index: 11\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] what is the primary seminary of the congregation of 
the holy cross ? [SEP] the university is the major seat of the congregation of holy cross ( albeit not its official headquarters , which are in rome ) . its main seminary , more ##au seminary , is located on the campus across st . joseph lake from the main building . old college , the oldest building on campus and located near the shore of st . mary lake , houses undergraduate seminar ##ians . retired priests and brothers reside in fatima house ( a former retreat center ) , holy cross house , as well as col ##umb ##a hall near the gr ##otto . the university through the more ##au seminary has ties to theologian frederick bu ##ech ##ner . while not catholic , bu ##ech ##ner has praised writers from notre dame and more ##au seminary created a bu ##ech ##ner prize for preaching . [SEP]\n", - "INFO:tensorflow:token_to_orig_map: 15:0 16:1 17:2 18:3 19:4 20:5 21:6 22:7 23:8 24:9 25:10 26:11 27:12 28:12 29:13 30:14 31:15 32:16 33:16 34:17 35:18 36:19 37:20 38:20 39:20 40:21 41:22 42:23 43:23 44:24 45:24 46:25 47:25 48:26 49:27 50:28 51:29 52:30 53:31 54:32 55:32 56:33 57:34 58:35 59:36 60:37 61:38 62:38 63:39 64:40 65:40 66:41 67:42 68:43 69:44 70:45 71:46 72:47 73:48 74:49 75:50 76:51 77:52 78:52 79:53 80:54 81:54 82:55 83:56 84:57 85:57 86:57 87:58 88:59 89:60 90:61 91:62 92:63 93:64 94:65 95:66 96:66 97:67 98:68 99:69 100:69 101:69 102:70 103:71 104:72 105:72 106:73 107:74 108:75 109:76 110:76 111:76 112:77 113:78 114:79 115:80 116:80 117:80 118:81 119:82 120:83 121:84 122:85 123:85 124:86 125:87 126:88 127:89 128:90 129:91 130:92 131:92 132:92 133:92 134:93 135:94 136:95 137:95 138:96 139:96 140:96 141:97 142:98 143:99 144:100 145:101 146:102 147:103 148:104 149:104 150:105 151:106 152:107 153:108 154:108 155:108 156:109 157:110 158:111 159:111\n", - "INFO:tensorflow:token_is_max_context: 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True 157:True 158:True 159:True\n", - "INFO:tensorflow:input_ids: 101 2054 2003 1996 3078 8705 1997 1996 7769 1997 1996 4151 2892 1029 102 1996 2118 2003 1996 2350 2835 1997 1996 7769 1997 4151 2892 1006 12167 2025 2049 2880 4075 1010 2029 2024 1999 4199 1007 1012 2049 2364 8705 1010 2062 4887 8705 1010 2003 2284 2006 1996 3721 2408 2358 1012 3312 2697 2013 1996 2364 2311 1012 2214 2267 1010 1996 4587 2311 2006 3721 1998 2284 2379 1996 5370 1997 2358 1012 2984 2697 1010 3506 8324 18014 7066 1012 3394 8656 1998 3428 13960 1999 27596 2160 1006 
1037 2280 7822 2415 1007 1010 4151 2892 2160 1010 2004 2092 2004 8902 25438 2050 2534 2379 1996 24665 23052 1012 1996 2118 2083 1996 2062 4887 8705 2038 7208 2000 17200 5406 20934 15937 3678 1012 2096 2025 3234 1010 20934 15937 3678 2038 5868 4898 2013 10289 8214 1998 2062 4887 8705 2580 1037 20934 15937 3678 3396 2005 17979 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 44\n", - "INFO:tensorflow:end_position: 46\n", - "INFO:tensorflow:answer: more ##au seminary\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000012\n", - "INFO:tensorflow:example_index: 12\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] what is the oldest structure at notre dame ? [SEP] the university is the major seat of the congregation of holy cross ( albeit not its official headquarters , which are in rome ) . its main seminary , more ##au seminary , is located on the campus across st . joseph lake from the main building . old college , the oldest building on campus and located near the shore of st . mary lake , houses undergraduate seminar ##ians . retired priests and brothers reside in fatima house ( a former retreat center ) , holy cross house , as well as col ##umb ##a hall near the gr ##otto . the university through the more ##au seminary has ties to theologian frederick bu ##ech ##ner . 
while not catholic , bu ##ech ##ner has praised writers from notre dame and more ##au seminary created a bu ##ech ##ner prize for preaching . [SEP]\n", - "INFO:tensorflow:token_to_orig_map: 11:0 12:1 13:2 14:3 15:4 16:5 17:6 18:7 19:8 20:9 21:10 22:11 23:12 24:12 25:13 26:14 27:15 28:16 29:16 30:17 31:18 32:19 33:20 34:20 35:20 36:21 37:22 38:23 39:23 40:24 41:24 42:25 43:25 44:26 45:27 46:28 47:29 48:30 49:31 50:32 51:32 52:33 53:34 54:35 55:36 56:37 57:38 58:38 59:39 60:40 61:40 62:41 63:42 64:43 65:44 66:45 67:46 68:47 69:48 70:49 71:50 72:51 73:52 74:52 75:53 76:54 77:54 78:55 79:56 80:57 81:57 82:57 83:58 84:59 85:60 86:61 87:62 88:63 89:64 90:65 91:66 92:66 93:67 94:68 95:69 96:69 97:69 98:70 99:71 100:72 101:72 102:73 103:74 104:75 105:76 106:76 107:76 108:77 109:78 110:79 111:80 112:80 113:80 114:81 115:82 116:83 117:84 118:85 119:85 120:86 121:87 122:88 123:89 124:90 125:91 126:92 127:92 128:92 129:92 130:93 131:94 132:95 133:95 134:96 135:96 136:96 137:97 138:98 139:99 140:100 141:101 142:102 143:103 144:104 145:104 146:105 147:106 148:107 149:108 150:108 151:108 152:109 153:110 154:111 155:111\n", - "INFO:tensorflow:token_is_max_context: 11:True 12:True 13:True 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True\n", - "INFO:tensorflow:input_ids: 101 2054 2003 1996 4587 3252 2012 10289 8214 1029 102 1996 2118 2003 1996 2350 2835 1997 1996 7769 1997 4151 2892 1006 12167 2025 2049 2880 4075 1010 2029 2024 1999 4199 1007 1012 2049 2364 8705 1010 2062 4887 8705 1010 2003 2284 2006 1996 3721 2408 2358 1012 3312 2697 2013 1996 2364 2311 1012 2214 2267 1010 1996 4587 2311 2006 3721 1998 2284 2379 1996 5370 1997 2358 1012 2984 2697 1010 3506 8324 18014 7066 1012 3394 8656 1998 3428 13960 1999 27596 2160 1006 1037 2280 7822 2415 1007 1010 4151 2892 2160 1010 2004 2092 2004 8902 25438 2050 2534 2379 1996 24665 23052 1012 1996 2118 2083 1996 2062 4887 8705 2038 7208 2000 17200 5406 20934 15937 3678 1012 2096 2025 3234 1010 20934 15937 3678 2038 5868 4898 2013 10289 8214 1998 2062 4887 8705 2580 1037 20934 15937 3678 3396 2005 17979 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 59\n", - "INFO:tensorflow:end_position: 60\n", - "INFO:tensorflow:answer: old college\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000013\n", - "INFO:tensorflow:example_index: 13\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] what individuals live at fatima house at notre dame ? [SEP] the university is the major seat of the congregation of holy cross ( albeit not its official headquarters , which are in rome ) . its main seminary , more ##au seminary , is located on the campus across st . joseph lake from the main building . old college , the oldest building on campus and located near the shore of st . mary lake , houses undergraduate seminar ##ians . retired priests and brothers reside in fatima house ( a former retreat center ) , holy cross house , as well as col ##umb ##a hall near the gr ##otto . the university through the more ##au seminary has ties to theologian frederick bu ##ech ##ner . while not catholic , bu ##ech ##ner has praised writers from notre dame and more ##au seminary created a bu ##ech ##ner prize for preaching . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 12:0 13:1 14:2 15:3 16:4 17:5 18:6 19:7 20:8 21:9 22:10 23:11 24:12 25:12 26:13 27:14 28:15 29:16 30:16 31:17 32:18 33:19 34:20 35:20 36:20 37:21 38:22 39:23 40:23 41:24 42:24 43:25 44:25 45:26 46:27 47:28 48:29 49:30 50:31 51:32 52:32 53:33 54:34 55:35 56:36 57:37 58:38 59:38 60:39 61:40 62:40 63:41 64:42 65:43 66:44 67:45 68:46 69:47 70:48 71:49 72:50 73:51 74:52 75:52 76:53 77:54 78:54 79:55 80:56 81:57 82:57 83:57 84:58 85:59 86:60 87:61 88:62 89:63 90:64 91:65 92:66 93:66 94:67 95:68 96:69 97:69 98:69 99:70 100:71 101:72 102:72 103:73 104:74 105:75 106:76 107:76 108:76 109:77 110:78 111:79 112:80 113:80 114:80 115:81 116:82 117:83 118:84 119:85 120:85 121:86 122:87 123:88 124:89 125:90 126:91 127:92 128:92 129:92 130:92 131:93 132:94 133:95 134:95 135:96 136:96 137:96 138:97 139:98 140:99 141:100 142:101 143:102 144:103 145:104 146:104 147:105 148:106 149:107 150:108 151:108 152:108 153:109 154:110 155:111 156:111\n" - ] - }, - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:token_is_max_context: 12:True 13:True 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True 156:True\n", - "INFO:tensorflow:input_ids: 101 2054 3633 2444 2012 27596 2160 2012 10289 8214 1029 102 1996 2118 2003 1996 2350 2835 1997 1996 7769 1997 4151 2892 1006 12167 2025 2049 2880 4075 1010 2029 2024 1999 4199 1007 1012 2049 2364 8705 1010 2062 4887 8705 1010 2003 2284 2006 1996 3721 2408 2358 1012 3312 2697 2013 1996 2364 2311 1012 2214 2267 1010 1996 4587 2311 2006 3721 1998 2284 2379 1996 5370 1997 2358 1012 2984 2697 1010 3506 8324 18014 7066 1012 3394 8656 1998 3428 13960 1999 27596 2160 1006 1037 2280 7822 2415 1007 1010 4151 2892 2160 1010 2004 2092 2004 8902 25438 2050 2534 2379 1996 24665 23052 1012 1996 2118 2083 1996 2062 4887 8705 2038 7208 2000 17200 5406 20934 15937 3678 1012 2096 2025 3234 1010 20934 15937 3678 2038 5868 4898 2013 10289 8214 1998 2062 4887 8705 2580 1037 20934 15937 3678 3396 2005 17979 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 84\n", - "INFO:tensorflow:end_position: 87\n", - "INFO:tensorflow:answer: retired priests and brothers\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000014\n", - "INFO:tensorflow:example_index: 14\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] which prize did frederick bu ##ech ##ner create ? [SEP] the university is the major seat of the congregation of holy cross ( albeit not its official headquarters , which are in rome ) . its main seminary , more ##au seminary , is located on the campus across st . joseph lake from the main building . old college , the oldest building on campus and located near the shore of st . mary lake , houses undergraduate seminar ##ians . retired priests and brothers reside in fatima house ( a former retreat center ) , holy cross house , as well as col ##umb ##a hall near the gr ##otto . the university through the more ##au seminary has ties to theologian frederick bu ##ech ##ner . while not catholic , bu ##ech ##ner has praised writers from notre dame and more ##au seminary created a bu ##ech ##ner prize for preaching . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 11:0 12:1 13:2 14:3 15:4 16:5 17:6 18:7 19:8 20:9 21:10 22:11 23:12 24:12 25:13 26:14 27:15 28:16 29:16 30:17 31:18 32:19 33:20 34:20 35:20 36:21 37:22 38:23 39:23 40:24 41:24 42:25 43:25 44:26 45:27 46:28 47:29 48:30 49:31 50:32 51:32 52:33 53:34 54:35 55:36 56:37 57:38 58:38 59:39 60:40 61:40 62:41 63:42 64:43 65:44 66:45 67:46 68:47 69:48 70:49 71:50 72:51 73:52 74:52 75:53 76:54 77:54 78:55 79:56 80:57 81:57 82:57 83:58 84:59 85:60 86:61 87:62 88:63 89:64 90:65 91:66 92:66 93:67 94:68 95:69 96:69 97:69 98:70 99:71 100:72 101:72 102:73 103:74 104:75 105:76 106:76 107:76 108:77 109:78 110:79 111:80 112:80 113:80 114:81 115:82 116:83 117:84 118:85 119:85 120:86 121:87 122:88 123:89 124:90 125:91 126:92 127:92 128:92 129:92 130:93 131:94 132:95 133:95 134:96 135:96 136:96 137:97 138:98 139:99 140:100 141:101 142:102 143:103 144:104 145:104 146:105 147:106 148:107 149:108 150:108 151:108 152:109 153:110 154:111 155:111\n", - "INFO:tensorflow:token_is_max_context: 11:True 12:True 13:True 14:True 15:True 16:True 17:True 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True 154:True 155:True\n", - "INFO:tensorflow:input_ids: 101 2029 3396 2106 5406 20934 15937 3678 3443 1029 102 1996 2118 2003 1996 2350 2835 1997 1996 7769 1997 4151 2892 1006 12167 2025 2049 2880 4075 1010 2029 2024 1999 4199 1007 1012 2049 2364 8705 1010 2062 4887 8705 1010 2003 2284 2006 1996 3721 2408 2358 1012 3312 2697 2013 1996 2364 2311 1012 2214 2267 1010 1996 4587 2311 2006 3721 1998 2284 2379 1996 5370 1997 2358 1012 2984 2697 1010 3506 8324 18014 7066 1012 3394 8656 1998 3428 13960 1999 27596 2160 1006 1037 2280 7822 2415 1007 1010 4151 2892 2160 1010 2004 2092 2004 8902 25438 2050 2534 2379 1996 24665 23052 1012 1996 2118 2083 1996 2062 4887 8705 2038 7208 2000 17200 5406 20934 15937 3678 1012 2096 2025 3234 1010 20934 15937 3678 2038 5868 4898 2013 10289 8214 1998 2062 4887 8705 2580 1037 20934 15937 3678 3396 2005 17979 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n" - ] - }, - { - "name": 
"stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 149\n", - "INFO:tensorflow:end_position: 154\n", - "INFO:tensorflow:answer: bu ##ech ##ner prize for preaching\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000015\n", - "INFO:tensorflow:example_index: 15\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] how many bs level degrees are offered in the college of engineering at notre dame ? [SEP] the college of engineering was established in 1920 , however , early courses in civil and mechanical engineering were a part of the college of science since the 1870s . today the college , housed in the fitzpatrick , cu ##shing , and st ##ins ##on - re ##mic ##k halls of engineering , includes five departments of study – aerospace and mechanical engineering , chemical and bio ##mo ##le ##cular engineering , civil engineering and geological sciences , computer science and engineering , and electrical engineering – with eight b . s . degrees offered . additionally , the college offers five - year dual degree programs with the colleges of arts and letters and of business awarding additional b . a . and master of business administration ( mba ) degrees , respectively . 
[SEP]\n", - "INFO:tensorflow:token_to_orig_map: 18:0 19:1 20:2 21:3 22:4 23:5 24:6 25:7 26:7 27:8 28:8 29:9 30:10 31:11 32:12 33:13 34:14 35:15 36:16 37:17 38:18 39:19 40:20 41:21 42:22 43:23 44:24 45:25 46:26 47:26 48:27 49:28 50:29 51:29 52:30 53:31 54:32 55:33 56:33 57:34 58:34 59:34 60:35 61:36 62:36 63:36 64:36 65:36 66:36 67:36 68:37 69:38 70:39 71:39 72:40 73:41 74:42 75:43 76:44 77:45 78:46 79:47 80:48 81:49 82:49 83:50 84:51 85:52 86:52 87:52 88:52 89:53 90:53 91:54 92:55 93:56 94:57 95:58 96:58 97:59 98:60 99:61 100:62 101:62 102:63 103:64 104:65 105:66 106:67 107:68 108:69 109:69 110:69 111:69 112:70 113:71 114:71 115:72 116:72 117:73 118:74 119:75 120:76 121:76 122:76 123:77 124:78 125:79 126:80 127:81 128:82 129:83 130:84 131:85 132:86 133:87 134:88 135:89 136:90 137:91 138:92 139:92 140:92 141:92 142:93 143:94 144:95 145:96 146:97 147:98 148:98 149:98 150:99 151:99 152:100 153:100\n", - "INFO:tensorflow:token_is_max_context: 18:True 19:True 20:True 21:True 22:True 23:True 24:True 25:True 26:True 27:True 28:True 29:True 30:True 31:True 32:True 33:True 34:True 35:True 36:True 37:True 38:True 39:True 40:True 41:True 42:True 43:True 44:True 45:True 46:True 47:True 48:True 49:True 50:True 51:True 52:True 53:True 54:True 55:True 56:True 57:True 58:True 59:True 60:True 61:True 62:True 63:True 64:True 65:True 66:True 67:True 68:True 69:True 70:True 71:True 72:True 73:True 74:True 75:True 76:True 77:True 78:True 79:True 80:True 81:True 82:True 83:True 84:True 85:True 86:True 87:True 88:True 89:True 90:True 91:True 92:True 93:True 94:True 95:True 96:True 97:True 98:True 99:True 100:True 101:True 102:True 103:True 104:True 105:True 106:True 107:True 108:True 109:True 110:True 111:True 112:True 113:True 114:True 115:True 116:True 117:True 118:True 119:True 120:True 121:True 122:True 123:True 124:True 125:True 126:True 127:True 128:True 129:True 130:True 131:True 132:True 133:True 134:True 135:True 136:True 137:True 138:True 139:True 140:True 141:True 142:True 143:True 144:True 145:True 146:True 147:True 148:True 149:True 150:True 151:True 152:True 153:True\n", - "INFO:tensorflow:input_ids: 101 2129 2116 18667 2504 5445 2024 3253 1999 1996 2267 1997 3330 2012 10289 8214 1029 102 1996 2267 1997 3330 2001 2511 1999 4444 1010 2174 1010 2220 5352 1999 2942 1998 6228 3330 2020 1037 2112 1997 1996 2267 1997 2671 2144 1996 14896 1012 2651 1996 2267 1010 7431 1999 1996 26249 1010 12731 12227 1010 1998 2358 7076 2239 1011 2128 7712 2243 9873 1997 3330 1010 2950 2274 7640 1997 2817 1516 13395 1998 6228 3330 1010 5072 1998 16012 5302 2571 15431 3330 1010 2942 3330 1998 9843 4163 1010 3274 2671 1998 3330 1010 1998 5992 3330 1516 2007 2809 1038 1012 1055 1012 5445 3253 1012 5678 1010 1996 2267 4107 2274 1011 2095 7037 3014 3454 2007 1996 6667 1997 2840 1998 4144 1998 1997 2449 21467 3176 1038 1012 1037 1012 1998 3040 1997 2449 3447 1006 15038 1007 5445 1010 4414 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:start_position: 107\n", - "INFO:tensorflow:end_position: 107\n", - "INFO:tensorflow:answer: eight\n", - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 1000000016\n", - "INFO:tensorflow:example_index: 16\n", - "INFO:tensorflow:doc_span_index: 0\n", - "INFO:tensorflow:tokens: [CLS] in what year was the college of engineering at notre dame formed ? [SEP] the college of engineering was established in 1920 , however , early courses in civil and mechanical engineering were a part of the college of science since the 1870s . today the college , housed in the fitzpatrick , cu ##shing , and st ##ins ##on - re ##mic ##k halls of engineering , includes five departments of study – aerospace and mechanical engineering , chemical and bio ##mo ##le ##cular engineering , civil engineering and geological sciences , computer science and engineering , and electrical engineering – with eight b . s . degrees offered . additionally , the college offers five - year dual degree programs with the colleges of arts and letters and of business awarding additional b . a . and master of business administration ( mba ) degrees , respectively . 
[SEP]\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "INFO:tensorflow:(per-example token_to_orig_map, token_is_max_context, input_ids, input_mask and segment_ids dumps elided; every feature is padded out to the 384-token max_seq_length)\n",
-      "INFO:tensorflow:start_position: 22\n",
-      "INFO:tensorflow:end_position: 22\n",
-      "INFO:tensorflow:answer: 1920\n",
-      "INFO:tensorflow:*** Example *** unique_id: 1000000017, example_index: 17, question: 'before the creation of the college of engineering similar studies were carried out at which notre dame college ?', answer: 'the college of science' (start_position: 43, end_position: 46)\n",
-      "INFO:tensorflow:*** Example *** unique_id: 1000000018, example_index: 18, question: 'how many departments are within the stinson-remick hall of engineering ?', answer: 'five' (start_position: 74, end_position: 74)\n",
-      "INFO:tensorflow:*** Example *** unique_id: 1000000019, example_index: 19, question: 'the college of science began to offer civil engineering courses beginning at what time at notre dame ?', answer: 'the 1870s' (start_position: 47, end_position: 48)\n"
-     ]
-    }
-   ],
-   "source": [
-    "bert_config = modeling_tensorflow.BertConfig.from_json_file(bert_config_file)\n",
-    "tokenizer = tokenization.BertTokenizer(\n",
-    "    vocab_file=vocab_file, do_lower_case=True)\n",
-    "\n",
-    "# Convert the first 16 SQuAD training examples into padded model features\n",
-    "eval_examples = read_squad_examples(\n",
-    "    input_file=input_file, is_training=True, max_num=16)\n",
-    "\n",
-    "eval_features = convert_examples_to_features(\n",
-    "    examples=eval_examples,\n",
-    "    tokenizer=tokenizer,\n",
-    "    max_seq_length=max_seq_length,\n",
-    "    doc_stride=doc_stride,\n",
-    "    max_query_length=max_query_length,\n",
-    "    is_training=True)\n",
-    "\n",
-    "# You can use this to test the behavior of the models when targets are outside of the model input sequence\n",
-    "# for feature in eval_features:\n",
-    "#     feature.start_position = outside_pos\n",
-    "#     feature.end_position = outside_pos"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2018-11-06T10:11:37.525632Z",
-     "start_time": "2018-11-06T10:11:37.498695Z"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Index features by unique_id so prediction results can be mapped back to their feature\n",
-    "eval_unique_id_to_feature = {}\n",
-    "for eval_feature in eval_features:\n",
-    "  eval_unique_id_to_feature[eval_feature.unique_id] = eval_feature"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2018-11-06T10:11:37.558325Z",
-     "start_time": "2018-11-06T10:11:37.527972Z"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def input_fn_builder(features, seq_length, drop_remainder):\n",
-    "  \"\"\"Creates an `input_fn` closure to be passed to TPUEstimator.\"\"\"\n",
-    "\n",
-    "  all_unique_ids = []\n",
-    "  all_input_ids = []\n",
-    "  all_input_mask = []\n",
-    "  all_segment_ids = []\n",
-    "  all_start_positions = []\n",
-    "  all_end_positions = []\n",
-    "\n",
-    "  for feature in features:\n",
-    "    all_unique_ids.append(feature.unique_id)\n",
-    "    all_input_ids.append(feature.input_ids)\n",
-    "    
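# input_mask flags real tokens (1) versus padding (0), and segment_ids\n",
-    "    # distinguish the question (0) from the passage (1)\n",
-    "    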
all_input_mask.append(feature.input_mask)\n",
-    "    all_segment_ids.append(feature.segment_ids)\n",
-    "    all_start_positions.append(feature.start_position)\n",
-    "    all_end_positions.append(feature.end_position)\n",
-    "\n",
-    "  def input_fn(params):\n",
-    "    \"\"\"The actual input function.\"\"\"\n",
-    "    batch_size = params[\"batch_size\"]\n",
-    "\n",
-    "    num_examples = len(features)\n",
-    "\n",
-    "    # This is for demo purposes and does NOT scale to large data sets. We do\n",
-    "    # not use Dataset.from_generator() because that uses tf.py_func which is\n",
-    "    # not TPU compatible. The right way to load data is with TFRecordReader.\n",
-    "    feature_map = {\n",
-    "        \"unique_ids\":\n",
-    "            tf.constant(all_unique_ids, shape=[num_examples], dtype=tf.int32),\n",
-    "        \"input_ids\":\n",
-    "            tf.constant(\n",
-    "                all_input_ids, shape=[num_examples, seq_length],\n",
-    "                dtype=tf.int32),\n",
-    "        \"input_mask\":\n",
-    "            tf.constant(\n",
-    "                all_input_mask,\n",
-    "                shape=[num_examples, seq_length],\n",
-    "                dtype=tf.int32),\n",
-    "        \"segment_ids\":\n",
-    "            tf.constant(\n",
-    "                all_segment_ids,\n",
-    "                shape=[num_examples, seq_length],\n",
-    "                dtype=tf.int32),\n",
-    "        \"start_positions\":\n",
-    "            tf.constant(\n",
-    "                all_start_positions,\n",
-    "                shape=[num_examples],\n",
-    "                dtype=tf.int32),\n",
-    "        \"end_positions\":\n",
-    "            tf.constant(\n",
-    "                all_end_positions,\n",
-    "                shape=[num_examples],\n",
-    "                dtype=tf.int32),\n",
-    "    }\n",
-    "\n",
-    "    d = tf.data.Dataset.from_tensor_slices(feature_map)\n",
-    "    # NB: repeat() makes the dataset infinite, so the prediction loop in a\n",
-    "    # later cell has to break out explicitly.\n",
-    "    d = d.repeat()\n",
-    "    d = d.batch(batch_size=batch_size, drop_remainder=drop_remainder)\n",
-    "    return d\n",
-    "\n",
-    "  return input_fn"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2018-11-06T10:11:37.601666Z",
-     "start_time": "2018-11-06T10:11:37.560082Z"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def model_fn_builder(bert_config, init_checkpoint, learning_rate,\n",
-    "                     num_train_steps, num_warmup_steps, use_tpu,\n",
-    "                     use_one_hot_embeddings):\n",
-    "  \"\"\"Returns `model_fn` closure for TPUEstimator.\"\"\"\n",
-    "\n",
-    "  def model_fn(features, labels, mode, params):  # pylint: disable=unused-argument\n",
-    "    \"\"\"The `model_fn` for TPUEstimator.\"\"\"\n",
-    "\n",
-    "    tf.logging.info(\"*** Features ***\")\n",
-    "    for name in sorted(features.keys()):\n",
-    "      tf.logging.info(\"  name = %s, shape = %s\" % (name, features[name].shape))\n",
-    "\n",
-    "    unique_ids = features[\"unique_ids\"]\n",
-    "    input_ids = features[\"input_ids\"]\n",
-    "    input_mask = features[\"input_mask\"]\n",
-    "    segment_ids = features[\"segment_ids\"]\n",
-    "\n",
-    "    is_training = (mode == tf.estimator.ModeKeys.TRAIN)\n",
-    "\n",
-    "    (start_logits, end_logits) = create_model(\n",
-    "        bert_config=bert_config,\n",
-    "        is_training=is_training,\n",
-    "        input_ids=input_ids,\n",
-    "        input_mask=input_mask,\n",
-    "        segment_ids=segment_ids,\n",
-    "        use_one_hot_embeddings=use_one_hot_embeddings)\n",
-    "\n",
-    "    tvars = tf.trainable_variables()\n",
-    "\n",
-    "    initialized_variable_names = {}\n",
-    "    scaffold_fn = None\n",
-    "    if init_checkpoint:\n",
-    "      (assignment_map,\n",
-    "       initialized_variable_names) = modeling_tensorflow.get_assigment_map_from_checkpoint(\n",
-    "          tvars, init_checkpoint)\n",
-    "      if use_tpu:\n",
-    "\n",
-    "        def tpu_scaffold():\n",
-    "          tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n",
-    "          return tf.train.Scaffold()\n",
-    "\n",
-    "        scaffold_fn = tpu_scaffold\n",
-    "      else:\n",
-    "        tf.train.init_from_checkpoint(init_checkpoint, assignment_map)\n",
-    "        # Only variables matched by assignment_map are restored from the\n",
-    "        # checkpoint; unmatched ones (here the cls/squad output layer) keep\n",
-    "        # their fresh initializers -- see the *INIT_FROM_CKPT* tags in the\n",
-    "        # variable log below.\n",
-    "\n",
-    "    tf.logging.info(\"**** Trainable Variables ****\")\n",
-    "    for var in tvars:\n",
-    "      init_string = \"\"\n",
-    "      if var.name in initialized_variable_names:\n",
-    "        init_string = \", *INIT_FROM_CKPT*\"\n",
-    "      tf.logging.info(\"  name = %s, shape = %s%s\", var.name, var.shape,\n",
-    "                      init_string)\n",
-    "\n",
-    "    output_spec = None\n",
-    "    if mode == tf.estimator.ModeKeys.TRAIN:\n",
-    "      seq_length = modeling_tensorflow.get_shape_list(input_ids)[1]\n",
-    "\n",
-    "      def compute_loss(logits, positions):\n",
-    "        one_hot_positions = tf.one_hot(\n",
-    "            positions, depth=seq_length, dtype=tf.float32)\n",
-    "        log_probs = tf.nn.log_softmax(logits, axis=-1)\n",
-    "        loss = -tf.reduce_mean(\n",
-    "            tf.reduce_sum(one_hot_positions * log_probs, axis=-1))\n",
-    "        return loss\n",
-    "\n",
-    "      start_positions = features[\"start_positions\"]\n",
-    "      end_positions = features[\"end_positions\"]\n",
-    "\n",
-    "      start_loss = compute_loss(start_logits, start_positions)\n",
-    "      end_loss = compute_loss(end_logits, end_positions)\n",
-    "\n",
-    "      total_loss = (start_loss + end_loss) / 2.0\n",
-    "\n",
-    "      train_op = optimization.create_optimizer(\n",
-    "          total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)\n",
-    "\n",
-    "      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n",
-    "          mode=mode,\n",
-    "          loss=total_loss,\n",
-    "          train_op=train_op,\n",
-    "          scaffold_fn=scaffold_fn)\n",
-    "    elif mode == tf.estimator.ModeKeys.PREDICT:\n",
-    "      batch_size = modeling_tensorflow.get_shape_list(start_logits)[0]\n",
-    "      seq_length = modeling_tensorflow.get_shape_list(input_ids)[1]\n",
-    "\n",
-    "      # The span losses are recomputed at predict time and exported as extra\n",
-    "      # prediction outputs, so they can be compared with the PyTorch model's.\n",
-    "      def compute_loss(logits, positions):\n",
-    "        one_hot_positions = tf.one_hot(\n",
-    "            positions, depth=seq_length, dtype=tf.float32)\n",
-    "        log_probs = tf.nn.log_softmax(logits, axis=-1)\n",
-    "        loss = -tf.reduce_mean(\n",
-    "            tf.reduce_sum(one_hot_positions * log_probs, axis=-1))\n",
-    "        return loss\n",
-    "\n",
-    "      start_positions = features[\"start_positions\"]\n",
-    "      end_positions = features[\"end_positions\"]\n",
-    "\n",
-    "      start_loss = compute_loss(start_logits, start_positions)\n",
-    "      end_loss = compute_loss(end_logits, end_positions)\n",
-    "\n",
-    "      total_loss = (start_loss + end_loss) / 2.0\n",
-    "\n",
-    "      predictions = {\n",
-    "          \"unique_ids\": unique_ids,\n",
-    "          \"start_logits\": start_logits,\n",
-    "          \"end_logits\": end_logits,\n",
-    "          \"total_loss\": tf.reshape(total_loss, [batch_size, 1]),\n",
-    "          \"start_loss\": tf.reshape(start_loss, [batch_size, 1]),\n",
-    "          \"end_loss\": tf.reshape(end_loss, [batch_size, 1]),\n",
-    "      }\n",
-    "      output_spec = tf.contrib.tpu.TPUEstimatorSpec(\n",
-    "          mode=mode, predictions=predictions, scaffold_fn=scaffold_fn)\n",
-    "    else:\n",
-    "      raise ValueError(\n",
-    "          \"Only TRAIN and PREDICT modes are supported: %s\" % (mode))\n",
-    "\n",
-    "    return output_spec\n",
-    "\n",
-    "  return model_fn"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2018-11-06T10:11:41.104542Z",
-     "start_time": "2018-11-06T10:11:37.603474Z"
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "WARNING:tensorflow:Estimator's model_fn (<function model_fn_builder.<locals>.model_fn at 0x120df3f28>) includes params argument, but params are not passed to Estimator.\n",
-      "INFO:tensorflow:Using config: {'_model_dir': '/tmp/squad_base/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true\n",
-      "graph_options {\n",
-      "  rewrite_options {\n",
-      "    meta_optimizer_iterations: ONE\n",
-      "  }\n",
-      "}\n",
-      ", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x...>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n",
-      "INFO:tensorflow:_TPUContext: eval_on_tpu True\n",
-      "WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.\n"
-     ]
-    }
-   ],
-   "source": [
-    "is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n",
-    "# use_tpu=False throughout this cell: TPUEstimator falls back to CPU, so\n",
-    "# the TPU settings below are inert\n",
-    "run_config = tf.contrib.tpu.RunConfig(\n",
-    "    cluster=None,\n",
-    "    master=None,\n",
-    "    model_dir=output_dir,\n",
-    "    save_checkpoints_steps=1000,\n",
-    "    tpu_config=tf.contrib.tpu.TPUConfig(\n",
-    "        iterations_per_loop=1000,\n",
-    "        num_shards=8,\n",
-    "        per_host_input_for_training=is_per_host))\n",
-    "\n",
-    "model_fn = model_fn_builder(\n",
-    "    bert_config=bert_config,\n",
-    "    init_checkpoint=init_checkpoint,\n",
-    "    learning_rate=learning_rate,\n",
-    "    num_train_steps=None,\n",
-    "    num_warmup_steps=None,\n",
-    "    use_tpu=False,\n",
-    "    use_one_hot_embeddings=False)\n",
-    "\n",
-    "estimator = tf.contrib.tpu.TPUEstimator(\n",
-    "    use_tpu=False,\n",
-    "    model_fn=model_fn,\n",
-    "    config=run_config,\n",
-    "    train_batch_size=12,\n",
-    "    predict_batch_size=1)\n",
-    "\n",
-    "predict_input_fn = input_fn_builder(\n",
-    "    features=eval_features,\n",
-    "    seq_length=max_seq_length,\n",
-    "    drop_remainder=True)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {
-    "ExecuteTime": {
-     "end_time": "2018-11-06T10:11:47.857601Z",
-     "start_time": "2018-11-06T10:11:41.106219Z"
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "INFO:tensorflow:Could not find trained model in model_dir: /tmp/squad_base/, running initialization to predict.\n",
-      "INFO:tensorflow:Calling model_fn.\n",
-      "INFO:tensorflow:Running infer on CPU\n",
-      "INFO:tensorflow:*** Features ***\n",
-      "INFO:tensorflow:  name = end_positions, shape = (1,)\n",
-      "INFO:tensorflow:  name = input_ids, shape = (1, 384)\n",
-      "INFO:tensorflow:  name = input_mask, shape = (1, 384)\n",
-      "INFO:tensorflow:  name = segment_ids, shape = (1, 384)\n",
-      "INFO:tensorflow:  name = start_positions, shape = (1,)\n",
-      "INFO:tensorflow:  name = unique_ids, shape = (1,)\n",
-      "INFO:tensorflow:**** Trainable Variables ****\n",
-      "INFO:tensorflow:  name = bert/embeddings/word_embeddings:0, shape = (30522, 768), *INIT_FROM_CKPT*\n",
-      "INFO:tensorflow:  name = bert/embeddings/token_type_embeddings:0, shape = (2, 768), *INIT_FROM_CKPT*\n",
-      "INFO:tensorflow:  name = bert/embeddings/position_embeddings:0, shape = (512, 768), *INIT_FROM_CKPT*\n",
-      "INFO:tensorflow:  name = bert/embeddings/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n",
-      "INFO:tensorflow:  name = bert/embeddings/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n",
-      "INFO:tensorflow:  (encoder variable log condensed -- each of the twelve layers bert/encoder/layer_0 ... layer_11 lists the same 16 variables: attention self query/key/value kernel and bias, attention output dense kernel and bias plus LayerNorm beta and gamma, intermediate dense kernel and bias, and output dense kernel and bias plus LayerNorm beta and gamma, all tagged *INIT_FROM_CKPT*; only the tail of layer_11 is kept below)\n",
-      "INFO:tensorflow:  name = 
bert/encoder/layer_11/attention/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/attention/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*\n", - "INFO:tensorflow: name = cls/squad/output_weights:0, shape = (2, 768)\n", - "INFO:tensorflow: name = cls/squad/output_bias:0, shape = (2,)\n", - "INFO:tensorflow:Done calling model_fn.\n", - "INFO:tensorflow:Graph was finalized.\n", - "INFO:tensorflow:Running local_init_op.\n", - "INFO:tensorflow:Done running local_init_op.\n", - "INFO:tensorflow:prediction_loop marked as finished\n" - ] - } - ], - "source": [ - "tensorflow_all_out = []\n", - "tensorflow_all_results = []\n", - "for result in estimator.predict(predict_input_fn, yield_single_examples=True):\n", - " unique_id = int(result[\"unique_ids\"])\n", - " eval_feature = eval_unique_id_to_feature[unique_id]\n", - " start_logits = result[\"start_logits\"]\n", - " end_logits = result[\"end_logits\"]\n", - " total_loss = result[\"total_loss\"]\n", - " start_loss = result[\"start_loss\"]\n", - " end_loss = result[\"end_loss\"]\n", - "\n", - " output_json = collections.OrderedDict()\n", - " output_json[\"linex_index\"] = unique_id\n", - " output_json[\"tokens\"] = [token for (i, token) in enumerate(eval_feature.tokens)]\n", - " output_json[\"start_logits\"] = [round(float(x), 6) for x in start_logits.flat]\n", - " output_json[\"end_logits\"] = [round(float(x), 6) for x in end_logits.flat]\n", - " output_json[\"total_loss\"] = [round(float(x), 6) for x in total_loss.flat]\n", - " output_json[\"start_loss\"] = [round(float(x), 6) for x in start_loss.flat]\n", - " output_json[\"end_loss\"] = [round(float(x), 6) for x in end_loss.flat]\n", - " tensorflow_all_out.append(output_json)\n", - " tensorflow_all_results.append(RawResult(\n", - " unique_id=unique_id,\n", - " start_logits=start_logits,\n", - " end_logits=end_logits))\n", - " break" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:47.912836Z", - "start_time": "2018-11-06T10:11:47.859679Z" - }, - "code_folding": [] - }, - "outputs": [], - "source": [ - "def _get_best_indexes(logits, n_best_size):\n", - " \"\"\"Get the n-best logits from a list.\"\"\"\n", - " index_and_score = sorted(enumerate(logits), key=lambda x: x[1], reverse=True)\n", - "\n", - " best_indexes = []\n", - " for i in range(len(index_and_score)):\n", - " if i >= n_best_size:\n", - " break\n", - " best_indexes.append(index_and_score[i][0])\n", - " return best_indexes\n", - "\n", - "def _compute_softmax(scores):\n", - " \"\"\"Compute softmax probability over raw logits.\"\"\"\n", - " if 
not scores:\n", - " return []\n", - "\n", - " max_score = None\n", - " for score in scores:\n", - " if max_score is None or score > max_score:\n", - " max_score = score\n", - "\n", - " exp_scores = []\n", - " total_sum = 0.0\n", - " for score in scores:\n", - " x = math.exp(score - max_score)\n", - " exp_scores.append(x)\n", - " total_sum += x\n", - "\n", - " probs = []\n", - " for score in exp_scores:\n", - " probs.append(score / total_sum)\n", - " return probs\n", - "\n", - "\n", - "def compute_predictions(all_examples, all_features, all_results, n_best_size,\n", - " max_answer_length, do_lower_case):\n", - " \"\"\"Compute final predictions.\"\"\"\n", - " example_index_to_features = collections.defaultdict(list)\n", - " for feature in all_features:\n", - " example_index_to_features[feature.example_index].append(feature)\n", - "\n", - " unique_id_to_result = {}\n", - " for result in all_results:\n", - " unique_id_to_result[result.unique_id] = result\n", - "\n", - " _PrelimPrediction = collections.namedtuple( # pylint: disable=invalid-name\n", - " \"PrelimPrediction\",\n", - " [\"feature_index\", \"start_index\", \"end_index\", \"start_logit\", \"end_logit\"])\n", - "\n", - " all_predictions = collections.OrderedDict()\n", - " all_nbest_json = collections.OrderedDict()\n", - " for (example_index, example) in enumerate(all_examples):\n", - " features = example_index_to_features[example_index]\n", - "\n", - " prelim_predictions = []\n", - " for (feature_index, feature) in enumerate(features):\n", - " result = unique_id_to_result[feature.unique_id]\n", - "\n", - " start_indexes = _get_best_indexes(result.start_logits, n_best_size)\n", - " end_indexes = _get_best_indexes(result.end_logits, n_best_size)\n", - " for start_index in start_indexes:\n", - " for end_index in end_indexes:\n", - " # We could hypothetically create invalid predictions, e.g., predict\n", - " # that the start of the span is in the question. 
We throw out all\n", - " # invalid predictions.\n", - " if start_index >= len(feature.tokens):\n", - " continue\n", - " if end_index >= len(feature.tokens):\n", - " continue\n", - " if start_index not in feature.token_to_orig_map:\n", - " continue\n", - " if end_index not in feature.token_to_orig_map:\n", - " continue\n", - " if not feature.token_is_max_context.get(start_index, False):\n", - " continue\n", - " if end_index < start_index:\n", - " continue\n", - " length = end_index - start_index + 1\n", - " if length > max_answer_length:\n", - " continue\n", - " prelim_predictions.append(\n", - " _PrelimPrediction(\n", - " feature_index=feature_index,\n", - " start_index=start_index,\n", - " end_index=end_index,\n", - " start_logit=result.start_logits[start_index],\n", - " end_logit=result.end_logits[end_index]))\n", - "\n", - " prelim_predictions = sorted(\n", - " prelim_predictions,\n", - " key=lambda x: (x.start_logit + x.end_logit),\n", - " reverse=True)\n", - "\n", - " _NbestPrediction = collections.namedtuple( # pylint: disable=invalid-name\n", - " \"NbestPrediction\", [\"text\", \"start_logit\", \"end_logit\"])\n", - "\n", - " seen_predictions = {}\n", - " nbest = []\n", - " for pred in prelim_predictions:\n", - " if len(nbest) >= n_best_size:\n", - " break\n", - " feature = features[pred.feature_index]\n", - "\n", - " tok_tokens = feature.tokens[pred.start_index:(pred.end_index + 1)]\n", - " orig_doc_start = feature.token_to_orig_map[pred.start_index]\n", - " orig_doc_end = feature.token_to_orig_map[pred.end_index]\n", - " orig_tokens = example.doc_tokens[orig_doc_start:(orig_doc_end + 1)]\n", - " tok_text = \" \".join(tok_tokens)\n", - "\n", - " # De-tokenize WordPieces that have been split off.\n", - " tok_text = tok_text.replace(\" ##\", \"\")\n", - " tok_text = tok_text.replace(\"##\", \"\")\n", - "\n", - " # Clean whitespace\n", - " tok_text = tok_text.strip()\n", - " tok_text = \" \".join(tok_text.split())\n", - " orig_text = \" \".join(orig_tokens)\n", - "\n", - " final_text = get_final_text(tok_text, orig_text, do_lower_case)\n", - " if final_text in seen_predictions:\n", - " continue\n", - "\n", - " seen_predictions[final_text] = True\n", - " nbest.append(\n", - " _NbestPrediction(\n", - " text=final_text,\n", - " start_logit=pred.start_logit,\n", - " end_logit=pred.end_logit))\n", - "\n", - " # In very rare edge cases we could have no valid predictions. 
So we\n", - " # just create a nonce prediction in this case to avoid failure.\n", - " if not nbest:\n", - " nbest.append(\n", - " _NbestPrediction(text=\"empty\", start_logit=0.0, end_logit=0.0))\n", - "\n", - " assert len(nbest) >= 1\n", - "\n", - " total_scores = []\n", - " for entry in nbest:\n", - " total_scores.append(entry.start_logit + entry.end_logit)\n", - "\n", - " probs = _compute_softmax(total_scores)\n", - "\n", - " nbest_json = []\n", - " for (i, entry) in enumerate(nbest):\n", - " output = collections.OrderedDict()\n", - " output[\"text\"] = entry.text\n", - " output[\"probability\"] = probs[i]\n", - " output[\"start_logit\"] = entry.start_logit\n", - " output[\"end_logit\"] = entry.end_logit\n", - " nbest_json.append(output)\n", - "\n", - " assert len(nbest_json) >= 1\n", - "\n", - " all_predictions[example.qas_id] = nbest_json[0][\"text\"]\n", - " all_nbest_json[example.qas_id] = nbest_json\n", - "\n", - " return all_predictions, all_nbest_json" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:47.953205Z", - "start_time": "2018-11-06T10:11:47.914751Z" - } - }, - "outputs": [], - "source": [ - "all_predictions, all_nbest_json = compute_predictions(eval_examples[:1], eval_features[:1], tensorflow_all_results, 20, max_answer_length, True)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:47.994647Z", - "start_time": "2018-11-06T10:11:47.955015Z" - } - }, - "outputs": [ - { - "data": { - "text/plain": [ - "OrderedDict([('5733be284776f41900661182',\n", - " [OrderedDict([('text', 'empty'),\n", - " ('probability', 1.0),\n", - " ('start_logit', 0.0),\n", - " ('end_logit', 0.0)])])])" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_nbest_json" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:48.028473Z", - "start_time": "2018-11-06T10:11:47.996311Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n", - "7\n", - "odict_keys(['linex_index', 'tokens', 'start_logits', 'end_logits', 'total_loss', 'start_loss', 'end_loss'])\n", - "number of tokens 176\n", - "number of start_logits 384\n", - "shape of end_logits 384\n" - ] - } - ], - "source": [ - "print(len(tensorflow_all_out))\n", - "print(len(tensorflow_all_out[0]))\n", - "print(tensorflow_all_out[0].keys())\n", - "print(\"number of tokens\", len(tensorflow_all_out[0]['tokens']))\n", - "print(\"number of start_logits\", len(tensorflow_all_out[0]['start_logits']))\n", - "print(\"shape of end_logits\", len(tensorflow_all_out[0]['end_logits']))" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:48.060658Z", - "start_time": "2018-11-06T10:11:48.030289Z" - } - }, - "outputs": [], - "source": [ - "tensorflow_outputs = [tensorflow_all_out[0]['start_logits'], tensorflow_all_out[0]['end_logits'],\n", - " tensorflow_all_out[0]['total_loss'], tensorflow_all_out[0]['start_loss'],\n", - " tensorflow_all_out[0]['end_loss']]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2/ PyTorch code" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:48.478814Z", - "start_time": "2018-11-06T10:11:48.062585Z" - } - }, - "outputs": [], - 
"source": [ - "import modeling\n", - "from run_squad import *" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:48.512607Z", - "start_time": "2018-11-06T10:11:48.480729Z" - } - }, - "outputs": [], - "source": [ - "init_checkpoint_pt = \"../google_models/uncased_L-12_H-768_A-12/pytorch_model.bin\"" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:51.023405Z", - "start_time": "2018-11-06T10:11:48.514306Z" - }, - "scrolled": true - }, - "outputs": [ - { - "data": { - "text/plain": [ - "tensor([0., 0.])" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "device = torch.device(\"cpu\")\n", - "model = modeling.BertForQuestionAnswering(bert_config)\n", - "model.bert.load_state_dict(torch.load(init_checkpoint_pt, map_location='cpu'))\n", - "model.to(device)\n", - "model.qa_outputs.weight.data.fill_(1.0)\n", - "model.qa_outputs.bias.data.zero_()" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:51.079364Z", - "start_time": "2018-11-06T10:11:51.028228Z" - }, - "code_folding": [] - }, - "outputs": [], - "source": [ - "all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long)\n", - "all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long)\n", - "all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long)\n", - "all_example_index = torch.arange(all_input_ids.size(0), dtype=torch.long)\n", - "all_start_positions = torch.tensor([[f.start_position] for f in eval_features], dtype=torch.long)\n", - "all_end_positions = torch.tensor([[f.end_position] for f in eval_features], dtype=torch.long)\n", - "\n", - "eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids,\n", - " all_start_positions, all_end_positions, all_example_index)\n", - "eval_sampler = SequentialSampler(eval_data)\n", - "eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=1)\n", - "\n", - "model.eval()\n", - "None" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:51.114686Z", - "start_time": "2018-11-06T10:11:51.081474Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "[torch.Size([1, 384]), torch.Size([1, 384]), torch.Size([1, 384]), torch.Size([1, 1]), torch.Size([1, 1]), torch.Size([1])]\n" - ] - }, - { - "data": { - "text/plain": [ - "torch.Size([1, 1])" - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "batch = iter(eval_dataloader).next()\n", - "input_ids, input_mask, segment_ids, start_positions, end_positions, example_index = batch\n", - "print([t.shape for t in batch])\n", - "start_positions.size()" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-06T10:11:52.298367Z", - "start_time": "2018-11-06T10:11:51.116219Z" - } - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "Evaluating: 0%| | 0/270 [00:00\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mspec\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mutil\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mspec_from_file_location\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'*'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0moriginal_tf_inplem_dir\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m'/extract_features_tensorflow.py'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mmodule\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mimportlib\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mutil\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodule_from_spec\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mspec\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mspec\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mloader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexec_module\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmodule\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0msys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodules\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'extract_features_tensorflow'\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmodule\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/importlib/_bootstrap_external.py\u001b[0m in \u001b[0;36mexec_module\u001b[0;34m(self, module)\u001b[0m\n", - "\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/importlib/_bootstrap.py\u001b[0m in \u001b[0;36m_call_with_frames_removed\u001b[0;34m(f, *args, **kwds)\u001b[0m\n", - "\u001b[0;32m~/Documents/Thomas/Code/HF/BERT/pytorch-pretrained-BERT/tensorflow_code/extract_features_tensorflow.py\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 32\u001b[0m \u001b[0mFLAGS\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mFLAGS\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 33\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 34\u001b[0;31m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDEFINE_string\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"input_file\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 35\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 36\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDEFINE_string\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"output_file\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/site-packages/tensorflow/python/platform/flags.py\u001b[0m in \u001b[0;36mwrapper\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 56\u001b[0m \u001b[0;34m'Use of the keyword argument names (flag_name, default_value, '\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 57\u001b[0m 'docstring) is deprecated, please use (name, default, help) instead.')\n\u001b[0;32m---> 58\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0moriginal_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 59\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 60\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mtf_decorator\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmake_decorator\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moriginal_function\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mwrapper\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/site-packages/absl/flags/_defines.py\u001b[0m in \u001b[0;36mDEFINE_string\u001b[0;34m(name, default, help, flag_values, **args)\u001b[0m\n\u001b[1;32m 239\u001b[0m \u001b[0mparser\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_argument_parser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mArgumentParser\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 240\u001b[0m \u001b[0mserializer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0m_argument_parser\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mArgumentSerializer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 241\u001b[0;31m \u001b[0mDEFINE\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mparser\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdefault\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhelp\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflag_values\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mserializer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 242\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 243\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/site-packages/absl/flags/_defines.py\u001b[0m in \u001b[0;36mDEFINE\u001b[0;34m(parser, name, default, help, flag_values, serializer, module_name, **args)\u001b[0m\n\u001b[1;32m 80\u001b[0m \"\"\"\n\u001b[1;32m 81\u001b[0m DEFINE_flag(_flag.Flag(parser, serializer, name, default, help, **args),\n\u001b[0;32m---> 82\u001b[0;31m flag_values, module_name)\n\u001b[0m\u001b[1;32m 83\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 84\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/site-packages/absl/flags/_defines.py\u001b[0m in \u001b[0;36mDEFINE_flag\u001b[0;34m(flag, flag_values, module_name)\u001b[0m\n\u001b[1;32m 102\u001b[0m \u001b[0;31m# Copying the reference to flag_values prevents pychecker warnings.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 103\u001b[0m \u001b[0mfv\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mflag_values\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 104\u001b[0;31m \u001b[0mfv\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mflag\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mflag\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 105\u001b[0m \u001b[0;31m# Tell flag_values who's defining the flag.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 106\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mmodule_name\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - 
"\u001b[0;32m~/miniconda3/envs/bert/lib/python3.6/site-packages/absl/flags/_flagvalues.py\u001b[0m in \u001b[0;36m__setitem__\u001b[0;34m(self, name, flag)\u001b[0m\n\u001b[1;32m 427\u001b[0m \u001b[0;31m# module is simply being imported a subsequent time.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 428\u001b[0m \u001b[0;32mreturn\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 429\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0m_exceptions\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDuplicateFlagError\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfrom_flag\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 430\u001b[0m \u001b[0mshort_name\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mflag\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshort_name\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 431\u001b[0m \u001b[0;31m# If a new flag overrides an old one, we need to cleanup the old flag's\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mDuplicateFlagError\u001b[0m: The flag 'input_file' is defined twice. First from *, Second from *. Description from first occurrence: (no help available)" - ] - } - ], - "source": [ - "import importlib.util\n", - "import sys\n", - "\n", - "spec = importlib.util.spec_from_file_location('*', original_tf_inplem_dir + '/extract_features_tensorflow.py')\n", - "module = importlib.util.module_from_spec(spec)\n", - "spec.loader.exec_module(module)\n", - "sys.modules['extract_features_tensorflow'] = module\n", - "\n", - "from extract_features_tensorflow import *" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T14:58:05.650987Z", - "start_time": "2018-11-15T14:58:05.541620Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:*** Example ***\n", - "INFO:tensorflow:unique_id: 0\n", - "INFO:tensorflow:tokens: [CLS] who was jim henson ? 
[SEP] jim henson was a puppet ##eer [SEP]\n", - "INFO:tensorflow:input_ids: 101 2040 2001 3958 27227 1029 102 3958 27227 2001 1037 13997 11510 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", - "INFO:tensorflow:input_type_ids: 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n" - ] - } - ], - "source": [ - "layer_indexes = list(range(12))\n", - "bert_config = modeling.BertConfig.from_json_file(bert_config_file)\n", - "tokenizer = tokenization.FullTokenizer(\n", - " vocab_file=vocab_file, do_lower_case=True)\n", - "examples = read_examples(input_file)\n", - "\n", - "features = convert_examples_to_features(\n", - " examples=examples, seq_length=max_seq_length, tokenizer=tokenizer)\n", - "unique_id_to_feature = {}\n", - "for feature in features:\n", - " unique_id_to_feature[feature.unique_id] = feature" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T14:58:11.562443Z", - "start_time": "2018-11-15T14:58:08.036485Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "WARNING:tensorflow:Estimator's model_fn (.model_fn at 0x11ea7f1e0>) includes params argument, but params are not passed to Estimator.\n", - "WARNING:tensorflow:Using temporary folder as model directory: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmphs4_nsq9\n", - "INFO:tensorflow:Using config: {'_model_dir': '/var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmphs4_nsq9', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true\n", - "graph_options {\n", - " rewrite_options {\n", - " meta_optimizer_iterations: ONE\n", - " }\n", - "}\n", - ", '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=2, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}\n", - "WARNING:tensorflow:Setting TPUConfig.num_shards==1 is an unsupported behavior. 
Please fix as soon as possible (leaving num_shards as None.\n", - "INFO:tensorflow:_TPUContext: eval_on_tpu True\n", - "WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.\n" - ] - } - ], - "source": [ - "is_per_host = tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2\n", - "run_config = tf.contrib.tpu.RunConfig(\n", - " master=None,\n", - " tpu_config=tf.contrib.tpu.TPUConfig(\n", - " num_shards=1,\n", - " per_host_input_for_training=is_per_host))\n", - "\n", - "model_fn = model_fn_builder(\n", - " bert_config=bert_config,\n", - " init_checkpoint=init_checkpoint,\n", - " layer_indexes=layer_indexes,\n", - " use_tpu=False,\n", - " use_one_hot_embeddings=False)\n", - "\n", - "# If TPU is not available, this will fall back to normal Estimator on CPU\n", - "# or GPU.\n", - "estimator = tf.contrib.tpu.TPUEstimator(\n", - " use_tpu=False,\n", - " model_fn=model_fn,\n", - " config=run_config,\n", - " predict_batch_size=1)\n", - "\n", - "input_fn = input_fn_builder(\n", - " features=features, seq_length=max_seq_length)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T14:58:21.736543Z", - "start_time": "2018-11-15T14:58:16.723829Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "INFO:tensorflow:Could not find trained model in model_dir: /var/folders/yx/cw8n_njx3js5jksyw_qlp8p00000gn/T/tmphs4_nsq9, running initialization to predict.\n", - "INFO:tensorflow:Calling model_fn.\n", - "INFO:tensorflow:Running infer on CPU\n", - "INFO:tensorflow:Done calling model_fn.\n", - "INFO:tensorflow:Graph was finalized.\n", - "INFO:tensorflow:Running local_init_op.\n", - "INFO:tensorflow:Done running local_init_op.\n", - "extracting layer 0\n", - "extracting layer 1\n", - "extracting layer 2\n", - "extracting layer 3\n", - "extracting layer 4\n", - "extracting layer 5\n", - "extracting layer 6\n", - "extracting layer 7\n", - "extracting layer 8\n", - "extracting layer 9\n", - "extracting layer 10\n", - "extracting layer 11\n", - "INFO:tensorflow:prediction_loop marked as finished\n", - "INFO:tensorflow:prediction_loop marked as finished\n" - ] - } - ], - "source": [ - "tensorflow_all_out = []\n", - "for result in estimator.predict(input_fn, yield_single_examples=True):\n", - " unique_id = int(result[\"unique_id\"])\n", - " feature = unique_id_to_feature[unique_id]\n", - " output_json = collections.OrderedDict()\n", - " output_json[\"linex_index\"] = unique_id\n", - " tensorflow_all_out_features = []\n", - " # for (i, token) in enumerate(feature.tokens):\n", - " all_layers = []\n", - " for (j, layer_index) in enumerate(layer_indexes):\n", - " print(\"extracting layer {}\".format(j))\n", - " layer_output = result[\"layer_output_%d\" % j]\n", - " layers = collections.OrderedDict()\n", - " layers[\"index\"] = layer_index\n", - " layers[\"values\"] = layer_output\n", - " all_layers.append(layers)\n", - " tensorflow_out_features = collections.OrderedDict()\n", - " tensorflow_out_features[\"layers\"] = all_layers\n", - " tensorflow_all_out_features.append(tensorflow_out_features)\n", - "\n", - " output_json[\"features\"] = tensorflow_all_out_features\n", - " tensorflow_all_out.append(output_json)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T14:58:23.970714Z", - "start_time": "2018-11-15T14:58:23.931930Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n", - "2\n", - 
"odict_keys(['linex_index', 'features'])\n", - "number of tokens 1\n", - "number of layers 12\n" - ] - }, - { - "data": { - "text/plain": [ - "(128, 768)" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(len(tensorflow_all_out))\n", - "print(len(tensorflow_all_out[0]))\n", - "print(tensorflow_all_out[0].keys())\n", - "print(\"number of tokens\", len(tensorflow_all_out[0]['features']))\n", - "print(\"number of layers\", len(tensorflow_all_out[0]['features'][0]['layers']))\n", - "tensorflow_all_out[0]['features'][0]['layers'][0]['values'].shape" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T14:58:25.547012Z", - "start_time": "2018-11-15T14:58:25.516076Z" - } - }, - "outputs": [], - "source": [ - "tensorflow_outputs = list(tensorflow_all_out[0]['features'][0]['layers'][t]['values'] for t in layer_indexes)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 2/ PyTorch code" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "os.chdir('./examples')" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:03:49.528679Z", - "start_time": "2018-11-15T15:03:49.497697Z" - } - }, - "outputs": [], - "source": [ - "import extract_features\n", - "import pytorch_transformers as ppb\n", - "from extract_features import *" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:18.001177Z", - "start_time": "2018-11-15T15:21:17.970369Z" - } - }, - "outputs": [], - "source": [ - "init_checkpoint_pt = \"../../google_models/uncased_L-12_H-768_A-12/\"" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:20.893669Z", - "start_time": "2018-11-15T15:21:18.786623Z" - }, - "scrolled": true - }, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "11/15/2018 16:21:18 - INFO - pytorch_transformers.modeling_bert - loading archive file ../../google_models/uncased_L-12_H-768_A-12/\n", - "11/15/2018 16:21:18 - INFO - pytorch_transformers.modeling_bert - Model config {\n", - " \"attention_probs_dropout_prob\": 0.1,\n", - " \"hidden_act\": \"gelu\",\n", - " \"hidden_dropout_prob\": 0.1,\n", - " \"hidden_size\": 768,\n", - " \"initializer_range\": 0.02,\n", - " \"intermediate_size\": 3072,\n", - " \"max_position_embeddings\": 512,\n", - " \"num_attention_heads\": 12,\n", - " \"num_hidden_layers\": 12,\n", - " \"type_vocab_size\": 2,\n", - " \"vocab_size\": 30522\n", - "}\n", - "\n" - ] - }, - { - "data": { - "text/plain": [ - "BertModel(\n", - " (embeddings): BertEmbeddings(\n", - " (word_embeddings): Embedding(30522, 768)\n", - " (position_embeddings): Embedding(512, 768)\n", - " (token_type_embeddings): Embedding(2, 768)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (encoder): BertEncoder(\n", - " (layer): ModuleList(\n", - " (0): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): 
Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (1): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (2): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (3): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (4): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): 
Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (5): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (6): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (7): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (8): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, 
out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (9): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (10): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (11): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " )\n", - " )\n", - " (pooler): BertPooler(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (activation): Tanh()\n", - " )\n", - ")" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "device = torch.device(\"cpu\")\n", - "model = ppb.BertModel.from_pretrained(init_checkpoint_pt)\n", - "model.to(device)" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:26.963427Z", - "start_time": "2018-11-15T15:21:26.922494Z" - }, - "code_folding": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "BertModel(\n", - " (embeddings): BertEmbeddings(\n", - " (word_embeddings): 
Embedding(30522, 768)\n", - " (position_embeddings): Embedding(512, 768)\n", - " (token_type_embeddings): Embedding(2, 768)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (encoder): BertEncoder(\n", - " (layer): ModuleList(\n", - " (0): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (1): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (2): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (3): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): 
Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (4): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (5): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (6): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (7): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): 
Dropout(p=0.1)\n", - " )\n", - " )\n", - " (8): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (9): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (10): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (11): BertLayer(\n", - " (attention): BertAttention(\n", - " (self): BertSelfAttention(\n", - " (query): Linear(in_features=768, out_features=768, bias=True)\n", - " (key): Linear(in_features=768, out_features=768, bias=True)\n", - " (value): Linear(in_features=768, out_features=768, bias=True)\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " (output): BertSelfOutput(\n", - " (dense): Linear(in_features=768, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " (intermediate): BertIntermediate(\n", - " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", - " )\n", - " (output): BertOutput(\n", - " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", - " (LayerNorm): BertLayerNorm()\n", - " (dropout): Dropout(p=0.1)\n", - " )\n", - " )\n", - " )\n", - " )\n", - " (pooler): BertPooler(\n", - " (dense): 
Linear(in_features=768, out_features=768, bias=True)\n", - " (activation): Tanh()\n", - " )\n", - ")" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)\n", - "all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)\n", - "all_input_type_ids = torch.tensor([f.input_type_ids for f in features], dtype=torch.long)\n", - "all_example_index = torch.arange(all_input_ids.size(0), dtype=torch.long)\n", - "\n", - "eval_data = TensorDataset(all_input_ids, all_input_mask, all_input_type_ids, all_example_index)\n", - "eval_sampler = SequentialSampler(eval_data)\n", - "eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=1)\n", - "\n", - "model.eval()" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:30.718724Z", - "start_time": "2018-11-15T15:21:30.329205Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "tensor([[ 101, 2040, 2001, 3958, 27227, 1029, 102, 3958, 27227, 2001,\n", - " 1037, 13997, 11510, 102, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0]])\n", - "tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n", - " 0, 0, 0, 0, 0, 0, 0, 0]])\n", - "tensor([0])\n", - "layer 0 0\n", - "layer 1 1\n", - "layer 2 2\n", - "layer 3 3\n", - "layer 4 4\n", - "layer 5 5\n", - "layer 6 6\n", - "layer 7 7\n", - "layer 8 8\n", - "layer 9 9\n", - "layer 10 10\n", - "layer 11 11\n" - ] - } - ], - "source": [ - "layer_indexes = list(range(12))\n", - "\n", - "pytorch_all_out = []\n", - "for input_ids, input_mask, input_type_ids, example_indices in eval_dataloader:\n", - " print(input_ids)\n", - " print(input_mask)\n", - " print(example_indices)\n", - " input_ids = input_ids.to(device)\n", - " input_mask = input_mask.to(device)\n", - "\n", - " all_encoder_layers, _ = model(input_ids, token_type_ids=input_type_ids, attention_mask=input_mask)\n", - "\n", - " for b, example_index in enumerate(example_indices):\n", - " feature = features[example_index.item()]\n", - " unique_id = int(feature.unique_id)\n", - " # feature = unique_id_to_feature[unique_id]\n", - " output_json = collections.OrderedDict()\n", - " output_json[\"linex_index\"] = unique_id\n", - " all_out_features = []\n", - " # for (i, token) in enumerate(feature.tokens):\n", - " all_layers = []\n", - " for (j, layer_index) in enumerate(layer_indexes):\n", - " print(\"layer\", j, layer_index)\n", - " layer_output = all_encoder_layers[int(layer_index)].detach().cpu().numpy()\n", - " layer_output = layer_output[b]\n", - " layers = collections.OrderedDict()\n", - " layers[\"index\"] = layer_index\n", - " layer_output = layer_output\n", - " 
layers[\"values\"] = layer_output if not isinstance(layer_output, (int, float)) else [layer_output]\n", - " all_layers.append(layers)\n", - "\n", - " out_features = collections.OrderedDict()\n", - " out_features[\"layers\"] = all_layers\n", - " all_out_features.append(out_features)\n", - " output_json[\"features\"] = all_out_features\n", - " pytorch_all_out.append(output_json)" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:35.703615Z", - "start_time": "2018-11-15T15:21:35.666150Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n", - "2\n", - "odict_keys(['linex_index', 'features'])\n", - "number of tokens 1\n", - "number of layers 12\n", - "hidden_size 128\n" - ] - }, - { - "data": { - "text/plain": [ - "(128, 768)" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(len(pytorch_all_out))\n", - "print(len(pytorch_all_out[0]))\n", - "print(pytorch_all_out[0].keys())\n", - "print(\"number of tokens\", len(pytorch_all_out))\n", - "print(\"number of layers\", len(pytorch_all_out[0]['features'][0]['layers']))\n", - "print(\"hidden_size\", len(pytorch_all_out[0]['features'][0]['layers'][0]['values']))\n", - "pytorch_all_out[0]['features'][0]['layers'][0]['values'].shape" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:36.999073Z", - "start_time": "2018-11-15T15:21:36.966762Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(128, 768)\n", - "(128, 768)\n" - ] - } - ], - "source": [ - "pytorch_outputs = list(pytorch_all_out[0]['features'][0]['layers'][t]['values'] for t in layer_indexes)\n", - "print(pytorch_outputs[0].shape)\n", - "print(pytorch_outputs[1].shape)" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:37.936522Z", - "start_time": "2018-11-15T15:21:37.905269Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(128, 768)\n", - "(128, 768)\n" - ] - } - ], - "source": [ - "print(tensorflow_outputs[0].shape)\n", - "print(tensorflow_outputs[1].shape)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 3/ Comparing the standard deviation on the last layer of both models" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:39.437137Z", - "start_time": "2018-11-15T15:21:39.406150Z" - } - }, - "outputs": [], - "source": [ - "import numpy as np" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": { - "ExecuteTime": { - "end_time": "2018-11-15T15:21:40.181870Z", - "start_time": "2018-11-15T15:21:40.137023Z" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "shape tensorflow layer, shape pytorch layer, standard deviation\n", - "((128, 768), (128, 768), 1.5258875e-07)\n", - "((128, 768), (128, 768), 2.342731e-07)\n", - "((128, 768), (128, 768), 2.801949e-07)\n", - "((128, 768), (128, 768), 3.5904986e-07)\n", - "((128, 768), (128, 768), 4.2842768e-07)\n", - "((128, 768), (128, 768), 5.127951e-07)\n", - "((128, 768), (128, 768), 6.14668e-07)\n", - "((128, 768), (128, 768), 7.063922e-07)\n", - "((128, 768), (128, 768), 7.906173e-07)\n", - "((128, 768), (128, 768), 8.475192e-07)\n", - "((128, 768), (128, 768), 
8.975489e-07)\n", - "((128, 768), (128, 768), 4.1671223e-07)\n" - ] - } - ], - "source": [ - "print('shape tensorflow layer, shape pytorch layer, standard deviation')\n", - "print('\\n'.join(list(str((np.array(tensorflow_outputs[i]).shape,\n", - " np.array(pytorch_outputs[i]).shape, \n", - " np.sqrt(np.mean((np.array(tensorflow_outputs[i]) - np.array(pytorch_outputs[i]))**2.0)))) for i in range(12))))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "hide_input": false, - "kernelspec": { - "display_name": "Python [default]", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - }, - "toc": { - "colors": { - "hover_highlight": "#DAA520", - "running_highlight": "#FF0000", - "selected_highlight": "#FFD700" - }, - "moveMenuLeft": true, - "nav_menu": { - "height": "48px", - "width": "252px" - }, - "navigate_menu": true, - "number_sections": true, - "sideBar": true, - "threshold": 4, - "toc_cell": false, - "toc_section_display": "block", - "toc_window_display": false - } - }, - "nbformat": 4, - "nbformat_minor": 2 -}
diff --git a/notebooks/README.md b/notebooks/README.md
new file mode 100644
index 00000000000000..894b2e7c823f00
--- /dev/null
+++ b/notebooks/README.md
@@ -0,0 +1,66 @@
+# Transformers Notebooks
+
+Here you can find a list of the official notebooks provided by Hugging Face.
+
+We would also like to list interesting content created by the community here.
+If you wrote a notebook that leverages transformers and would like it to be listed, please open a
+Pull Request and we'll review it for inclusion.
+
+
+## Hugging Face's notebooks :hugs:
+
+| Notebook | Description | |
+|:----------|:-------------:|------:|
+| [Getting Started Tokenizers](01-training-tokenizers.ipynb) | How to train and use your very own tokenizer | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/transformers/blob/docker-notebooks/notebooks/01-training-tokenizers.ipynb) |
+| [Getting Started Transformers](02-transformers.ipynb) | How to easily start using transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/transformers/blob/docker-notebooks/notebooks/02-transformers.ipynb) |
+| [How to use Pipelines](03-pipelines.ipynb) | A simple and efficient way to use state-of-the-art models on downstream tasks through transformers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/transformers/blob/docker-notebooks/notebooks/03-pipelines.ipynb) |
+| [How to train a language model](https://github.com/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb) | Highlights all the steps to effectively train a Transformer model on custom data | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb) |
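+
+### Quick sketches
+
+To give a taste of what the notebooks above cover, here are a few minimal sketches. They are
+illustrations only, not excerpts from the notebooks: file names such as `my-corpus.txt` are
+placeholders, and the exact APIs may differ slightly across `tokenizers`/`transformers` releases.
+
+Training your own WordPiece tokenizer with the `tokenizers` library looks roughly like this:
+
+```python
+# Sketch: train a WordPiece tokenizer from scratch on a local corpus.
+# `my-corpus.txt` is a placeholder path, not a file shipped with this repository.
+from tokenizers import BertWordPieceTokenizer
+
+tokenizer = BertWordPieceTokenizer()
+tokenizer.train(files=["my-corpus.txt"], vocab_size=30000, min_frequency=2)
+
+encoding = tokenizer.encode("Who was Jim Henson?")
+print(encoding.tokens)
+```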
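+
+Loading a pretrained model and extracting its hidden states is the heart of the transformers
+notebook; here is a sketch against the transformers 2.x API of this PR (models return tuples):
+
+```python
+# Sketch: encode a sentence and grab BERT's last hidden states.
+import torch
+from transformers import BertModel, BertTokenizer
+
+tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
+model = BertModel.from_pretrained("bert-base-uncased")
+
+input_ids = torch.tensor([tokenizer.encode("Who was Jim Henson?")])
+with torch.no_grad():
+    last_hidden_states = model(input_ids)[0]  # first element of the output tuple
+print(last_hidden_states.shape)  # (batch_size, sequence_length, hidden_size)
+```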
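+
+Finally, the pipelines API reduces most downstream tasks to a couple of lines; a sketch using the
+task's default model, which is downloaded on first use:
+
+```python
+# Sketch: run a sentiment-analysis pipeline with its default model.
+from transformers import pipeline
+
+nlp = pipeline("sentiment-analysis")
+print(nlp("We are very happy to include pipelines in transformers."))
+# Expected output: a list like [{'label': 'POSITIVE', 'score': ...}]
+```
\ No newline at end of file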