
Introduction to NLP with TensorFlow

Module Source

Introduction to NLP with TensorFlow

Goals

In this workshop, we will cover how text is processed using TensorFlow, a popular platform for machine learning.

| Goal | Description |
|------|-------------|
| What will you learn | Analyze text using TensorFlow |
| What you'll need | |
| Duration | 1 hour |
| Slides | PowerPoint |

Video

TensorFlow and Keras for NLP

🎥 Click the image above to learn how to deliver this workshop

Pre-Learning

Prerequisites

You can activate the sandbox environments in each module or, alternatively, set up your local environment.

If you are trying the examples in a local environment, install the required libraries and download the dataset into the same directory as the notebooks:

$ pip install --quiet tensorflow_datasets==4.4.0
$ wget -q -O - https://mslearntensorflowlp.blob.core.windows.net/data/tfds-ag-news.tgz | tar xz

What you will learn

TensorFlow is a popular machine learning platform that supports a wide range of machine learning workflows and operations in Python. In this workshop, you'll learn how to process and analyze text so that you can generate text or answer questions based on context.

Images: an example of generated text and of sequential training.

Milestone 1: Represent text as Tensors

Complete the sandboxed Jupyter Notebook, which goes through the following:

  • Perform a text classification task using the AG News dataset
  • Convert text into numbers that can be represented as tensors
  • Create a Bag-of-Words text representation
  • Automatically calculate Bag-of-Words vectors with TensorFlow

Milestone 2: Represent words with embeddings

Go through the sandboxed Jupyter Notebook to work with the AG News dataset and represent the semantic meaning of words.

In this section you will:

  • Create and train a Classifier Neural Network
  • Work with semantic embeddings
  • Use pre-trained embeddings available in the Keras framework
  • Learn about potential pitfalls and limitations of traditional pre-trained embedding representations such as Word2Vec
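The idea behind semantic embeddings can be illustrated without any framework: each word becomes a dense vector, and words with similar meanings end up with a high cosine similarity. The three-dimensional vectors below are invented for illustration only; real embeddings such as Word2Vec have hundreds of dimensions and are learned from data:

```python
import math

# Toy "embeddings" with made-up values, purely for illustration.
embeddings = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.05, 0.90],
}

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words score higher than unrelated ones.
print(cosine(embeddings["king"], embeddings["queen"]) >
      cosine(embeddings["king"], embeddings["apple"]))  # True
```

In Keras, a `tf.keras.layers.Embedding` layer holds exactly such a lookup table of vectors and learns its values during training.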

Milestone 3: Capture patterns with recurrent neural networks

Work through the sandboxed Jupyter Notebook to capture not only the aggregated meaning of words but also their order. To capture the meaning of a text sequence, you'll use a recurrent neural network.

The notebook will go through the following items:

  • Load the AG News dataset and train a TensorFlow model on it
  • Use masking to minimize the amount of padding
  • Use LSTM to learn relationships between distant tokens
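Padding and masking can be sketched by hand: shorter sequences are padded to the batch maximum, and a parallel boolean mask records which positions hold real tokens so the network can ignore the padding. The token ids below are made up; in Keras, `tf.keras.layers.Embedding(mask_zero=True)` computes an equivalent mask for you automatically:

```python
def pad_and_mask(sequences, pad_id=0):
    # Pad variable-length token-id sequences to the batch maximum,
    # returning a parallel boolean mask (True = real token, False = padding).
    max_len = max(len(s) for s in sequences)
    padded = [s + [pad_id] * (max_len - len(s)) for s in sequences]
    mask = [[True] * len(s) + [False] * (max_len - len(s)) for s in sequences]
    return padded, mask

padded, mask = pad_and_mask([[5, 8, 2], [7, 1]])
print(padded)  # [[5, 8, 2], [7, 1, 0]]
print(mask)    # [[True, True, True], [True, True, False]]
```

Minimizing padding (for example by batching similar-length texts together) means less wasted computation on positions the mask tells the LSTM to skip.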

Milestone 4: Generate text with recurrent networks

Complete the sandboxed Jupyter Notebook to discover how to generate text using recurrent neural networks (RNNs). In this final section of the workshop, you'll cover the following:

  • Build character vocabulary with tokenization
  • Train an RNN to generate titles
  • Produce output with a custom decoding function
  • Sample generated strings during training to check for correctness
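Character-level tokenization, the first step above, can be sketched in a few lines of plain Python. The example text is a toy string, not the workshop dataset; the notebook builds its vocabulary with TensorFlow utilities instead:

```python
def char_vocab(texts):
    # Map every distinct character to an integer id, in sorted order.
    chars = sorted(set("".join(texts)))
    return {c: i for i, c in enumerate(chars)}

def encode(text, vocab):
    # Text -> list of character ids (input to the RNN).
    return [vocab[c] for c in text]

def decode(ids, vocab):
    # List of character ids -> text (a simple decoding function).
    inv = {i: c for c, i in vocab.items()}
    return "".join(inv[i] for i in ids)

vocab = char_vocab(["tensorflow news"])
ids = encode("news", vocab)
print(decode(ids, vocab))  # "news"
```

During generation, the trained RNN predicts a distribution over this vocabulary one character at a time, and the decoding function turns the sampled ids back into a string.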

Quiz

Verify your knowledge with a short quiz

Next steps

There are other Learn modules for TensorFlow, grouped in the TensorFlow fundamentals Learning Path.

Practice

In this workshop you used pre-trained models which may yield limited results. Try using other data sources to train your own model. What can you discover?

Feedback

Be sure to give feedback about this workshop!

Code of Conduct