- I am a newcomer and am having trouble figuring out how to even get started. Where do I begin?
- How do I use ktrain with documents in PDF, DOC, or PPT formats?
- Why am I seeing an ERROR when installing ktrain on Google Colab?
- Why does `texts_from_csv` throw an error on Google Cloud Storage?
- Why am I seeing a "list index out of range" error when calling predict?
- How do I train a transformers model from a saved checkpoint folder?
- How do I get the predicted class "probabilities" of a model?
- Running `predictor.explain` for text classification is slow. How can I speed it up?
- Running `preprocess_train` for Transformer models is slow. How can I speed it up?
- How do I make quantized predictions with `transformers` models?
Machine learning models (e.g., neural networks) are trained on example inputs and outputs to learn mappings between them. Once trained, given a new input, a correct output can be predicted. For example, if you train a neural network on documents as inputs and document categories (e.g., subject areas) as outputs, the neural network will learn to predict the categories of new documents.
Training neural network models can be computationally intensive due to the number of mathematical operations it takes to learn the mappings. GPUs (Graphics Processing Units) are devices that allow you to train neural networks faster by performing many mathematical operations at the same time.
ktrain is a Python library that allows you to train a neural network and make predictions using a minimal number of "commands" or lines of code. It is built on top of a library by Google called TensorFlow. Only very basic and minimal Python knowledge is required to use it.
A challenge for newcomers is setting up the programming environment. This includes 1) gaining access to a computer with a GPU, 2) installing and setting up the TensorFlow library to use the GPU, and 3) setting up Jupyter notebook.
(A Jupyter notebook is a programming environment that allows you to type code and see and save the results of that code in an interactive fashion.)
Fortunately, Google did a nice thing and made notebook environments with GPU access freely available "in the cloud" to anyone with a Gmail account.
Here is how you can quickly get started using ktrain:
- Go to Google Colab and sign in using your Gmail account.
- Go to this example notebook on image classification.
- Save the notebook to your Google Drive: File --> Save a copy in Drive
- Make sure the notebook is set up to use a GPU: Runtime --> Change runtime type and select GPU in the menu.
- Click on each cell in the notebook and execute it by pressing SHIFT and ENTER at the same time. The notebook shows you how to build a neural network that recognizes cats vs. dogs in photos.
If you're on a Windows laptop, you can follow these Windows installation instructions for TensorFlow and ktrain and try out ktrain locally.
Next, you can go through the tutorials to learn more. If you have questions about a method or function, type a question mark before the method and press ENTER in a Google Colab or Jupyter notebook to learn more (e.g., `?learner.autofit`).
- For more information on Python, see here.
- For more information on neural networks, see this page.
- For more information on Google Colab, see this video.
- For more information on Jupyter notebooks, see this video.
ktrain is inspired by some other libraries like `fastai` and `ludwig`. For a deeper dive into neural networks, the fastai MOOC and the TensorFlow and Deep Learning Without a PhD series are recommended.
If a call to `preprocess_train` is exceeding the limits of your memory/RAM, you can split up your training set into parts and preprocess/train each part separately. This way, you can train using as little or as much RAM as you want based on how large you make each part.
# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
# split training set into parts (optionally store on disk and read in only one at a time)
part1_x = x_train[:1000]
part1_y = y_train[:1000]
part2_x = x_train[1000:]
part2_y = y_train[1000:]
# preprocess/train on first part
import ktrain
from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=train_b.target_names)
trn = t.preprocess_train(part1_x, part1_y)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=None, batch_size=6)
learner.fit_onecycle(5e-5, 1)
# save partially-trained model
predictor = ktrain.get_predictor(model, t)
predictor.save('/tmp/part1_pred')
# to resume training during a different session, you can
# read back in partially-trained model
predictor = ktrain.load_predictor('/tmp/part1_pred')
model = predictor.model
t = predictor.preproc # Preprocessor object
# preprocess/train on second part
# Since we saved the predictor, this can occur in a different session or on a different day
trn = t.preprocess_train(part2_x, part2_y)
learner = ktrain.get_learner(model, train_data=trn, val_data=None, batch_size=6)
learner.fit_onecycle(5e-5, 1)
# learner.model is now fully trained on entire dataset
This answer shows two different ways to save/reload a model and resume training: using a ktrain Predictor (Method 1) or using the transformers library directly (Method 2).
# save Predictor (i.e., model and Preprocessor instance) after partially training
ktrain.get_predictor(model, preproc).save('/tmp/my_predictor')
# reload Predictor and extract model
model = ktrain.load_predictor('/tmp/my_predictor').model
# re-instantiate Learner and continue training
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=8)
learner.fit_onecycle(2e-5, 1)
Note that `preproc` here is a Preprocessor instance. If using a data-loading function like `texts_from_csv` or `images_from_folder`, it will be the third return value from the function. Or, if using the Transformer API for text classification, it will be the output of invoking `text.Transformer` (i.e., `preproc = text.Transformer('bert-base-uncased', ...)`). Also, `trn` and `val` are typically the result of invoking `preproc.preprocess_train` and `preproc.preprocess_test`, respectively.
If the model is a Hugging Face transformers model, you can use `transformers` directly (Method 2):
# save model using transformers API after partially training
learner.model.save_pretrained('/tmp/my_model')
# reload the model using transformers directly
from transformers import *
model = TFAutoModelForSequenceClassification.from_pretrained('/tmp/my_model')
model.compile(loss='categorical_crossentropy',optimizer='adam', metrics=['accuracy'])
# re-instantiate Learner and continue training
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=8)
learner.fit_onecycle(2e-5, 1)
Note: You may need to supply the number of classes as an argument to `TFAutoModelForSequenceClassification.from_pretrained`. See the transformers documentation for more detail. Method 1 does this automatically for you.
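For example, a minimal sketch (assuming a hypothetical four-class problem):

```python
from transformers import TFAutoModelForSequenceClassification
# supply the number of classes explicitly when reloading
model = TFAutoModelForSequenceClassification.from_pretrained('/tmp/my_model', num_labels=4)
```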
The `checkpoint_folder` argument (e.g., `learner.autofit(1e-4, 4, checkpoint_folder='/tmp/saved_weights')`) saves only the model's weights after each epoch. The weights of any epoch can be reloaded into the model using the `model.load_weights` method, as you normally would in `tf.Keras`. You just need to re-create the model first. For instance, if training an NER model, it would work as follows:
# recreate model from scratch
import ktrain
from ktrain import text
model = text.sequence_tagger(...
# load checkpoint weights into model
model.load_weights('../models/checkpoints/weights-10.hdf5')
# recreate learner
learner = ktrain.get_learner(model, ...
# continue training here
Finally, there are also `learner.save_model` and `learner.load_model` methods intended for saving and reloading models when training interactively during a single session.
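For example, a minimal sketch (for `transformers` models, `load_model` also needs the `preproc` argument, as shown in the multi-GPU example later in this FAQ):

```python
# within a single interactive session
learner.save_model('/tmp/mymodel')
# ... run other experiments ...
learner.load_model('/tmp/mymodel')  # for transformers models, add preproc=t
```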
How do I obtain the word or sentence embeddings after fine-tuning a Transformer-based text classifier?
Here is a self-contained example of generating word embeddings from a fine-tuned `Transformer` model:
# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test',categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)
# build, train, and validate model (Transformer is wrapper around transformers library)
import ktrain
from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(5e-5, 1)
# load model to generate embeddings
learner.model.save_pretrained('/tmp/mymodel')
from transformers import *
import tensorflow as tf
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = TFAutoModel.from_pretrained('/tmp/mymodel')
input_ids = tf.constant(tokenizer.encode("Hello, my dog is cute"))[None, :] # Batch size 1
outputs = model(input_ids)
last_hidden_states = outputs[0] # The last hidden-state is the first element of the output tuple
print(last_hidden_states.numpy().shape) # print shape of embedding vectors
This will produce a vector for each word (and subword) in the input string. For sentence embeddings, you can aggregate in various ways (e.g., average vectors).
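For example, a minimal sketch of mean-pooling the token vectors from the example above into a single sentence embedding:

```python
import numpy as np
# average the per-token vectors to obtain one sentence-level vector
sentence_embedding = last_hidden_states.numpy().mean(axis=1)
print(sentence_embedding.shape)  # (1, hidden_size)
```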
See also this post on the `transformers` GitHub repo.
Note that, once a `transformers` model is trained and saved (e.g., using `predictor.save` or `learner.save_model` or `learner.model.save_pretrained`), it can be reloaded into other libraries that support `transformers` (e.g., `sentence-transformers`).
Here are detailed instructions for getting started with ktrain and TensorFlow on a Windows 10 computer.
- Download and install the Miniconda Python distribution. You will most likely want the Python 3.8 Miniconda3 Windows 64-bit.
- Download and install the Microsoft Visual C++ Redistributable.
- Click on Anaconda Powershell Prompt in the Start Menu.
- Create a conda environment for ktrain: `conda create -n kt python=3.7; conda activate kt`
- Type: `pip install -U pip setuptools_scm jupyter` (run twice if you see an error, or use the `--user` option)
- Install TensorFlow 2: `pip install tensorflow==2.6`
- Type: `pip install ktrain`
If your machine has a GPU (which is needed for larger models), you'll need to perform GPU setup for TensorFlow.
- If you experience a Kernel Error when running `jupyter notebook`, follow the instructions here and copy the two files in `C:\Users\<your_user_name>\Miniconda3\envs\kt\Lib\site-packages\pywin32_system32` to `C:\Windows\System32`.
- If you experience SSL certificate problems with either `pip` or `conda`, run `conda config --set ssl_verify false` and replace all `pip` commands above with `pip --trusted-host pypi.org --trusted-host files.pythonhosted.org`.
- Note that there is a bug in both TensorFlow 2.2 and 2.3 affecting the Learning-Rate-Finder that was not fixed until TensorFlow 2.4. The bug causes the learning-rate finder to complete all epochs even after the loss has diverged (i.e., no automatic stopping). The instructions above install TensorFlow 2.6, which is not affected.
- If using `tensorflow<=2.1`, you must also downgrade transformers to `transformers==3.1` to avoid errors.
- We have selected Python 3.7 in STEP 4 above with `python=3.7` for illustration purposes, but Python 3.8 is the default if this is removed.
Once installed, you can fire up Jupyter notebook (type `jupyter notebook` at the command prompt) and test out ktrain with something like this:
# download Cats vs. Dogs image classification dataset
!curl -k --output C:/temp/cats_and_dogs_filtered.zip --url https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
import os
import zipfile
local_zip = 'C:/temp/cats_and_dogs_filtered.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('C:/temp')
zip_ref.close()
# train model
import ktrain
from ktrain import vision as vis
(trn, val, preproc) = vis.images_from_folder(
datadir='C:/temp/cats_and_dogs_filtered',
data_aug = vis.get_data_aug(horizontal_flip=True),
train_test_names=['train', 'validation'])
learner = ktrain.get_learner(model=vis.image_classifier('pretrained_mobilenet', trn, val, freeze_layers=15),
train_data=trn, val_data=val, workers=4, batch_size=64)
learner.fit_onecycle(1e-4, 1)
# make prediction
predictor = ktrain.get_predictor(learner.model, preproc)
predictor.predict_filename('C:/temp/cats_and_dogs_filtered/validation/cats/cat.2000.jpg')
When using pretrained models or pretrained word embeddings in ktrain, files are automatically downloaded. For instance, pretrained models and vocabulary files from the `transformers` library are downloaded to `<home_directory>/.cache/huggingface/transformers` (or `<home_directory>/.cache/torch/transformers` in older versions) by default. Other data like pretrained word vectors are downloaded to the `<home_directory>/ktrain_data` folder.
In some settings, it is necessary to either train models or make predictions in environments with no internet access (e.g., behind a firewall, air-gapped networks). Typically, it is sufficient to copy the above folders to the machine without internet access. For instance, if loading and using a `Predictor` instance associated with a `transformers` model as shown below, then all that is typically needed is a vocabulary file, which is retrieved from the cache:
# this should work on machine with no internet connectivity if cache folder is populated correctly
p = ktrain.load_predictor('/tmp/mypred')
p.predict(data)
In some cases (e.g., when training a model on a system with no internet access or using a pretrained model for question-answering), due to a current bug in the `transformers` library, files from `<home_directory>/.cache/torch/transformers` may not load when there is no internet access, even when present in the cache. To get around this, you can download the model files to a folder and point ktrain to the folder. There are typically three files you need, and it is important that the downloaded files are renamed to `tf_model.h5`, `config.json`, and `vocab.txt`. We will show two examples of training and/or applying Hugging Face `transformers` models without an internet connection.
- Download the model files. There are two different ways to do this:
  - Method 1: On a machine with public internet access, go to the Hugging Face model repository (https://huggingface.co/models), click on "List all files in model", and download `tf_model.h5`, `config.json`, and `vocab.txt`. It is important that these downloaded files are renamed specifically to the three aforementioned file names. If you do not see a link to one or more of the required files (e.g., `vocab.txt` is sometimes not listed), you will have to download them using Method 2.
  - Method 2:
    - Make sure the cache folder, `<home_directory>/.cache/torch/transformers`, is empty.
    - On a machine with public internet access, run the following to download the model files to the cache folder (replace `MODEL_NAME` with the model you want):

from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
dummy_texts = ['hello world', 'goodbye world', 'hi world']
dummy_labels = ['hello', 'bye', 'hello']
t = text.Transformer(MODEL_NAME, maxlen=500)
trn = t.preprocess_train(dummy_texts, dummy_labels)
model = t.get_classifier()
    - After the previous step, the cache folder will contain the three required files, but they will be named with random characters. Each of the model files has a corresponding `.json` file that contains the URL from which the model file was downloaded. On a Linux machine, you can type `grep etag *.json` to see which file names map to which required file:

$ grep etag *.json
26bc1ad6.542ce428.json:{"url": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt", "etag": "\"64800d5d8528ce344256daf115d4965e\""}
a41e817d.8949e27a.json:{"url": "https://s3.amazonaws.com/models.huggingface.co/bert/distilbert-base-uncased-config.json", "etag": "\"73e3e66b2b29478be775da997515e69a\""}
cce28882.e02bd57e.h5.json:{"url": "https://cdn.huggingface.co/distilbert-base-uncased-tf_model.h5", "etag": "\"b02023739d9f6377fc63d88926b29118-44\""}

      In the example above, you would rename `26bc1ad6.542ce428` to `vocab.txt`, rename `a41e817d.8949e27a` to `config.json`, and rename `cce28882.e02bd57e.h5` to `tf_model.h5`. Notice that we omitted the `.json` extension when renaming, as we want to rename the actual model files, not the `.json` files containing the URLs. Once the files are renamed, copy them to a folder of your choice (e.g., `my_model_files`). (With knowledge of the URLs, you can also download the three model files from the listed URLs to your `my_model_files` folder and rename them appropriately, if you prefer.)
- Copy the folder you created in the previous step (e.g., `my_model_files`) to the machine with no internet connectivity and point ktrain to the folder:

import ktrain
from ktrain import text
t = text.Transformer('/tmp/my_model_files', maxlen=500, class_names=label_list)
trn = t.preprocess_train(x_train, y_train)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, batch_size=8)
learner.fit_onecycle(5e-5, 1)
Note that the above steps are typically only necessary if training a model on the machine with no internet connectivity. The bug does not affect loading predictors on machines with no internet. That is, if all you're doing is making predictions on the machine with no internet connectivity, doing `p = ktrain.load_predictor('/tmp/path_to_predictor')` is sufficient, provided the cache folder (i.e., `<home_directory>/.cache/torch/transformers`) contains the required model files. The vocab file is typically the only thing that needs to be present in the cache for these scenarios.
Note also that the local path you supply to `Transformer` is stored in `t.model_name`, where `t` is a `Preprocessor` instance. If creating a `Predictor` and transferring it to another machine, you may need to update this path:
predictor.preproc.model_name = 'path/to/predictor/on/new/machine'
Here is a second example of how to run `SimpleQA` for open-domain question-answering without internet access:
- On a machine with public internet access, go to the Hugging Face model repository: https://huggingface.co/models
- Select the model you want and click "List all files in model". For `SimpleQA`, you will need `bert-large-uncased-whole-word-masking-finetuned-squad` and `bert-base-uncased`.
- Download the `tf_model.h5`, `config.json`, and `vocab.txt` files into a folder. It is important that these downloaded files are renamed specifically to the three aforementioned file names.
- Copy these folders to the machine without public internet access.
- When invoking `SimpleQA`, provide the folders containing the downloaded files as arguments to the `bert_squad_model` and `bert_emb_model` parameters:
qa = text.SimpleQA(INDEXDIR,
bert_squad_model='/path/to/bert/squad/model/folder',
bert_emb_model='/path/to/bert-base-uncased/folder')
You can use similar steps for other models that use the `transformers` library, such as `bilstm-bert` for NER or offline language translation.
Since ktrain is just a simple wrapper around TensorFlow, you can use multiple GPUs in the same way you would for a normal `tf.Keras` model. Here is a complete, self-contained example of using 2 GPUs with a `Transformer` model:
# use two GPUs to train
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test',categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)
# build, train, and validate model
import tensorflow as tf
mirrored_strategy = tf.distribute.MirroredStrategy()
import ktrain
from ktrain import text
BATCH_SIZE = 6 * 2 # desired BS times 2
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
with mirrored_strategy.scope():
    model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, batch_size=BATCH_SIZE)
learner.fit_onecycle(5e-5, 2)
learner.save_model('/tmp/my_model')
learner.load_model('/tmp/my_model', preproc=t)
learner.validate(val_data=val, class_names=t.get_classes())
See this post.
First, implement the Flask server with something like this:
# my_server.py
import flask
import ktrain

app = flask.Flask(__name__)
predictor = None

def load_predictor():
    global predictor
    predictor = ktrain.load_predictor('/tmp/my_saved_predictor')

@app.route('/predict', methods=['GET'])
def predict():
    data = {"success": False}
    if flask.request.method in ["GET"]:
        text = flask.request.args.get('text')
        if text is None:
            return flask.jsonify(data)
        prediction = predictor.predict(text)
        data['prediction'] = prediction
        data["success"] = True
    return flask.jsonify(data)

if __name__ == "__main__":
    load_predictor()
    port = 8888
    app.run(host='0.0.0.0', port=port)
Note that `/tmp/my_saved_predictor` is the path you supplied to `predictor.save`. The `predictor.save` method stores both the model and a `.preproc` object, so make sure both exist on the deployment server.
Next, start the server with `python3 my_server.py`.
Finally, point your browser to the following to get a prediction:
http://0.0.0.0:8888/predict?text=text%20you%20want%20to%20classify
In this toy example, we are supplying the text data to classify in the URL as a GET request.
Note that the above example requires both ktrain and TensorFlow to be installed on the deployment machine. If this footprint is too large, you can convert the model to ONNX. This allows you to deploy the model and make predictions without having TensorFlow, ktrain, and their many dependencies installed. This is particularly well-suited to Heroku deployments, which restrict slug sizes to 500MB.
The `Transformer.get_classifier`, `text.text_classifier`, and `vision.image_classifier` methods/functions all accept a `metrics` argument.
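For example, a minimal sketch using the `Transformer` API from the examples above:

```python
# request the metrics to track during training when building the model
model = t.get_classifier(metrics=['accuracy'])
```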
You can also use custom Keras callbacks:
# define a custom callback for ROC-AUC
from tensorflow.keras.callbacks import Callback
from sklearn.metrics import roc_auc_score
class RocAucEvaluation(Callback):
    def __init__(self, validation_data=(), interval=1):
        super().__init__()
        self.interval = interval
        self.X_val, self.y_val = validation_data

    def on_epoch_end(self, epoch, logs={}):
        if epoch % self.interval == 0:
            y_pred = self.model.predict(self.X_val, verbose=0)
            score = roc_auc_score(self.y_val, y_pred)
            print("\n ROC-AUC - epoch: %d - score: %.6f \n" % (epoch+1, score))
RocAuc = RocAucEvaluation(validation_data=(x_test, y_test), interval=1)
# train using our custom ROC-AUC callback
learner = ktrain.get_learner(model, train_data=train_data, val_data = val_data)
learner.autofit(0.005, 2, callbacks=[RocAuc])
All `predict` methods in `Predictor` instances accept a `return_proba` argument. Set it to `True` to obtain the class probabilities.
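For example, a minimal sketch (`get_classes` returns the class labels aligned with the probability array):

```python
probs = predictor.predict('My monitor is blurry.', return_proba=True)
print(predictor.get_classes())  # class labels corresponding to each probability
```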
All `*fit*` methods (e.g., `learner.fit`, `learner.autofit`, `learner.fit_onecycle`) accept a `class_weight` parameter, which is passed to the `model.fit` method in `tf.Keras`. See this StackOverflow post for more details.
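For example, a minimal sketch that up-weights a hypothetical minority class (class 1) by a factor of 5:

```python
learner.fit_onecycle(5e-5, 3, class_weight={0: 1.0, 1: 5.0})
```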
Alternatively, you can also try using focal loss:
import tensorflow as tf
from tensorflow.keras import activations
def focal_loss(gamma=2., alpha=4., from_logits=False):
    gamma = float(gamma)
    alpha = float(alpha)

    def focal_loss_fixed(y_true, y_pred):
        """Focal loss for multi-classification
        FL(p_t) = -alpha * (1 - p_t)^gamma * ln(p_t)
        Notice: y_pred is the probability after softmax if from_logits is False.
        The gradient is d(FL)/d(p_t), not d(FL)/d(x) as described in the paper:
        d(FL)/d(p_t) * [p_t * (1 - p_t)] = d(FL)/d(x)
        Focal Loss for Dense Object Detection: https://arxiv.org/abs/1708.02002
        Arguments:
            y_true {tensor} -- ground truth labels, shape of [batch_size, num_cls]
            y_pred {tensor} -- model's output, shape of [batch_size, num_cls]
        Keyword Arguments:
            gamma {float} -- (default: {2.0})
            alpha {float} -- (default: {4.0})
        Returns:
            [tensor] -- loss.
        """
        epsilon = 1.e-9
        y_true = tf.cast(y_true, dtype=tf.float32)
        y_pred = tf.cast(y_pred, dtype=tf.float32)
        if from_logits:
            y_pred = activations.softmax(y_pred)
        model_out = tf.add(y_pred, epsilon)
        ce = tf.multiply(y_true, -tf.math.log(model_out))
        weight = tf.multiply(y_true, tf.pow(tf.subtract(1., model_out), gamma))
        fl = tf.multiply(alpha, tf.multiply(weight, ce))
        reduced_fl = tf.reduce_max(fl, axis=1)
        return tf.reduce_mean(reduced_fl)

    return focal_loss_fixed
As mentioned in this issue, you must use `from_logits=True` if using `focal_loss` with a `transformers` model like DistilBert.
ktrain is just a lightweight wrapper around `tf.keras`, so this would be done in the exact same way as you would in Keras. More specifically, you can simply recompile your model with the loss function or optimizer you want by invoking `model.compile`.
For example, here is how to use focal loss with a DistilBert model:
# focal_loss() is the same function defined in the previous answer above
# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test',categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)
# preprocess data and build model
import ktrain
from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()
# recompile model with custom loss function
# using from_logits=True because outputs of transformer models are not run through softmax beforehand
model.compile(loss=focal_loss(alpha=1, from_logits=True),
optimizer='adam',
metrics=['accuracy'])
# train with focal loss
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(5e-5, 1)
As with normal `tf.Keras` models, all `*fit*` methods in ktrain return the training history data:
history = learner.autofit(...)
To visualize the training and validation loss by epochs:
learner.plot('loss')
To visualize the learning rate schedule, you can do this:
learner.plot('lr')
I have a model that accepts multiple inputs (e.g., both text and other numerical or categorical variables). How do I train it with ktrain?
See this tutorial.
Yes, but you'll need to wrap your dataset in a `ktrain.Dataset` instance so that ktrain can more easily inspect your data. For instance, you can directly wrap a `tf.data.Dataset` instance as a `ktrain.TFDataset`, as shown in this example.
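Here is a minimal sketch (assumptions: a toy in-memory dataset, and that `ktrain.TFDataset` takes the batched `tf.data.Dataset` along with the number of examples `n` and the targets `y`):

```python
import numpy as np
import tensorflow as tf
import ktrain

# toy data: 100 examples, 10 features, 2 one-hot-encoded classes
x = np.random.rand(100, 10).astype('float32')
y = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=100))
tfdata = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)

# wrap the tf.data.Dataset so ktrain can inspect it
trn = ktrain.TFDataset(tfdata, n=len(x), y=y)
```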
See this tutorial for more information.
The set of integer labels in your training set needs to be complete and consecutive (e.g., `[0,1]` or `[0,1,2,3,4]`, but not `[0,3]`). See this post.
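As a quick fix, you can remap the labels to a consecutive range; a minimal sketch using numpy:

```python
import numpy as np
y = np.array([0, 3, 3, 0, 3])
classes, y = np.unique(y, return_inverse=True)  # y is now [0, 1, 1, 0, 1]
```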
These errors (e.g., `tensorboard 2.1.1 requires setuptools>=41.0.0, but you'll have setuptools 39.0.1 which is incompatible.`) are related to TensorFlow, can usually be safely ignored, and shouldn't affect the operation of ktrain. The errors should go away if you perform the indicated upgrades (e.g., `pip install -U setuptools`).
The `TextPredictor.explain` method accepts a parameter called `n_samples`, which governs the number of synthetic samples created and used to generate the explanation. At the default value of 2500, `explain` returns results on Google Colab in ~25 seconds. If you pass `n_samples=500` to `explain`, results are returned in ~5 seconds on Google Colab. In theory, higher sample sizes yield better explanations. In practice, smaller sample sizes (e.g., 500, 1000) may be sufficient for your use case.
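For example, a minimal sketch:

```python
# trade some explanation fidelity for speed
predictor.explain('Why is my monitor blurry?', n_samples=500)
```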
Preprocessing data for `transformers` text classification models using the `Transformer` API typically looks something like this:
from ktrain import text
MODEL_NAME = 'distilbert-base-uncased'
t = text.Transformer(MODEL_NAME, maxlen=500, class_names=label_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
The `preprocess_train` and `preprocess_test` methods are not currently parallelized to use multiple CPU cores. Some users have used dask to parallelize the preprocessing with something like this:
import dask

# assumes `train` is a dask DataFrame and `client` is a dask distributed Client
def preproc(x, labels=labels):
    MODEL_NAME = 'distilbert-base-uncased'
    t = text.Transformer(MODEL_NAME, maxlen=80, class_names=labels, multilabel=True)
    res = t.preprocess_train(x['text_a'].values.tolist(), x['label'].values.tolist(), verbose=0)
    return res

results = []
partitions = train.to_delayed()
for part in partitions:
    results.append(dask.delayed(preproc)(part))
results = client.compute(results)
trn = results[0].result()
x = [r.result().x for r in results]
y = [r.result().y for r in results]
numlabels = np.max([yy.shape[1] for yy in y])
y = [np.pad(yy,[0,numlabels - yy.shape[1]], 'constant', constant_values = 0) for yy in y]
trn.x = np.concatenate(x, axis = 0)
trn.y = np.concatenate(y, axis = 0)
Note, however, that the power of transfer learning is being able to use smaller training sets to fine-tune your model. So, perhaps make sure you really need an extremely large training set before you try parallelizing the preprocessing.
The error is probably happening because ktrain tries to auto-detect the character encoding using `open(train_filepath, 'rb')`, which may be problematic with Google Cloud Storage. One solution is to explicitly provide the `encoding` to `texts_from_csv` as an argument so this step is skipped (the default is None, which activates auto-detection).
Alternatively, you can read the data in yourself as a pandas DataFrame using one of these methods. For instance, pandas evidently supports GCS, so you can simply do this: `df = pd.read_csv('gs://bucket/your_path.csv')`. Then, using ktrain, you can use `ktrain.text.texts_from_df` (or `ktrain.text.texts_from_array`) to load and preprocess your data.
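A minimal sketch (assumptions: the CSV has columns named `text` and `label`, and `preprocess_mode='distilbert'` matches the model family you plan to train):

```python
import pandas as pd
from ktrain import text

df = pd.read_csv('gs://bucket/your_path.csv')  # pandas reads directly from GCS
trn, val, preproc = text.texts_from_df(df, 'text', label_columns=['label'],
                                       maxlen=500, preprocess_mode='distilbert')
```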
You can safely ignore the error if it arises from downloading Hugging Face transformers models. The 404 error simply means that ktrain was not able to find a TensorFlow version of this particular model. In this case, the PyTorch version of the model checkpoint will be downloaded and then loaded by ktrain as a TensorFlow model for training/fine-tuning. If you type `model.summary()`, it should show that the model was loaded successfully.
If you have documents in formats like `.pdf`, `.docx`, or `.pptx` and want to use them in a training set or with various ktrain features like zero-shot learning or text summarization, they will need to be converted to plain text format first (i.e., `.txt` files). You can use the `ktrain.text.textutils.extract_copy` function to do this automatically. As of v0.28.x of ktrain, there is also the TextExtractor that can be used for conversion. Alternatively, you can use other tools like Apache Tika to do the conversion.
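For example, a minimal sketch (the folder paths here are hypothetical; extracted documents are written as plain text files under the output folder):

```python
from ktrain.text import textutils
textutils.extract_copy(corpus_path='/path/to/docs', output_path='/path/to/txt_output')
```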
With respect to question-answering, the `SimpleQA.index_from_folder` method includes a `use_text_extraction` argument. When set to `True`, question-answering can be performed on document sets comprised of many different file types. More information on this is included in the question-answering example notebook.
Each task in ktrain offers different model choices. Large models (e.g., fine-tuning BERT for text classification) definitely do require a GPU unless you have the patience for an unbearably slow training process. However, smaller models (which can often yield very good accuracy scores) can be trained on a normal laptop CPU. Examples of CPU-friendly models include the `nbsvm` model for text classification, the `pretrained_mobilenet` model for image classification, topic modeling, and models in the ShallowNLP module.
A number of models in ktrain can be used out-of-the-box on a CPU-based laptop with no training required, such as question-answering, language translation, and zero-shot topic classification.
Quantization can improve the efficiency of neural network computations by reducing the size of the weights. For instance, when making predictions, representing weights with 8-bit integers instead of 32-bit floats can speed up inferences.
TensorFlow has built-in support for quantization. Unfortunately, as of this writing, it only works for sequential and functional `tf.keras` models, which means it cannot be used with Hugging Face `transformers` models.
As a workaround, you can convert your saved TensorFlow model to PyTorch, quantize, and make predictions directly in PyTorch.
This code example assumes you've trained a DistilBERT model with ktrain, saved a `Predictor` in a folder called `/tmp/mypredictor`, and need to make quantized predictions on a CPU:
# Quantization Using PyTorch
# load the predictor, model, and tokenizer
from transformers import *
import ktrain
predictor = ktrain.load_predictor('/tmp/mypredictor')
model_pt = AutoModelForSequenceClassification.from_pretrained('/tmp/mypredictor', from_tf=True)
tokenizer = predictor.preproc.get_tokenizer() # or use AutoTokenizer.from_pretrained(predictor.preproc.model_name)
maxlen = predictor.preproc.maxlen
device = 'cpu'
class_names = predictor.preproc.get_classes()
# quantize model (INT8 quantization)
import torch
model_pt_quantized = torch.quantization.quantize_dynamic(
model_pt.to(device), {torch.nn.Linear}, dtype=torch.qint8)
# make quantized predictions (x_test is a list of strings representing documents)
import numpy as np
preds = []
for doc in x_test:
    model_inputs = tokenizer(doc, return_tensors="pt", max_length=maxlen, truncation=True)
    model_inputs_on_device = {arg_name: tensor.to(device)
                              for arg_name, tensor in model_inputs.items()}
    pred = model_pt_quantized(**model_inputs_on_device)
    preds.append(class_names[np.argmax(np.squeeze(pred[0].cpu().detach().numpy()))])
Note that the above example employs smaller inputs by eliminating padding in addition to using a quantized model. As discussed in this blog post, both of these steps can speed up predictions in CPU deployment scenarios.
Alternatively, you might also consider quantizing your `transformers` model with the convert_graph_to_onnx.py script included with the `transformers` library, which can also be used as a module, as shown below.
# Converting to ONNX (from PyTorch-converted model)
# set maxlen, class_names, and tokenizer (use settings employed when training the model - see above)
model_name = 'distilbert-base-uncased'
maxlen = 500
class_names = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
# imports
import numpy as np
from transformers.convert_graph_to_onnx import convert, optimize, quantize
from transformers import AutoModelForSequenceClassification
from pathlib import Path
# paths
predictor_path = '/tmp/mypredictor'
pt_path = predictor_path+'_pt'
pt_onnx_path = pt_path +'_onnx/model.onnx'
# convert to ONNX
AutoModelForSequenceClassification.from_pretrained(predictor_path,
from_tf=True).save_pretrained(pt_path)
convert(framework='pt', model=pt_path,output=Path(pt_onnx_path), opset=11,
tokenizer=model_name, pipeline_name='sentiment-analysis')
pt_onnx_quantized_path = quantize(optimize(Path(pt_onnx_path)))
# create ONNX session
def create_onnx_session(onnx_model_path, provider='CPUExecutionProvider'):
    """
    Creates an ONNX inference session from the provided onnx_model_path
    """
    from onnxruntime import GraphOptimizationLevel, InferenceSession, SessionOptions, get_all_providers
    assert provider in get_all_providers(), f"provider {provider} not found, {get_all_providers()}"

    # A few properties that can have an impact on performance (provided by MS)
    options = SessionOptions()
    options.intra_op_num_threads = 0
    options.graph_optimization_level = GraphOptimizationLevel.ORT_ENABLE_ALL

    # Load the model as a graph and prepare the CPU backend
    session = InferenceSession(onnx_model_path, options, providers=[provider])
    session.disable_fallback()
    return session
sess = create_onnx_session(pt_onnx_quantized_path.as_posix())
# tokenize document and make prediction
tokens = tokenizer.encode_plus('My computer monitor is blurry.', max_length=maxlen, truncation=True)
tokens = {name: np.atleast_2d(value) for name, value in tokens.items()}
print()
print()
print("predicted class: %s" % (class_names[np.argmax(sess.run(None, tokens)[0])]))
# output:
# predicted class: comp.graphics
The example above assumes the model saved at `predictor_path` was trained on a subset of the 20 Newsgroups corpus, as was done in this tutorial.
You can also use ktrain to create ONNX models directly from TensorFlow (this can be used for non-transformers TensorFlow models):
predictor.export_model_to_onnx(onnx_model_path)
However, note that conversions to ONNX from TensorFlow models appear to require a hard-coded input size (i.e., padding is used), whereas conversions to ONNX from PyTorch models do not appear to have this requirement.
In the ktrain `Transformer` API, you can train/fine-tune a text classification model from a local path:
t = text.Transformer(MODEL_LOCAL_PATH, maxlen=50, class_names=class_names)
This is useful, for example, if you first fine-tune a language model using Hugging-Face Trainer prior to fine-tuning your text classifier.
However, when supplying a local path to `Transformer`, ktrain will also look for the tokenizer files in that directory. So, you just need to ensure that tokenizer files like `vocab.txt` (which are quite small) exist in the local folder (and also in the folder created by `predictor.save`). Such files can be downloaded from the Hugging Face model hub. See this post and this FAQ entry for more details.
Note that the local path you supply to `Transformer` is stored in `t.model_name`, where `t` is a `Preprocessor` instance. If creating a `Predictor` and transferring it to another machine, you may need to update this path:
predictor.preproc.model_name = 'path/to/predictor/on/new/machine'
It is very easy to pretrain a `transformers` language model (either fine-tuning the language model or training it from scratch) using this Hugging Face script. This can sometimes boost performance, especially if your dataset has highly specialized terminology.
These Hugging Face scripts will save the fine-tuned/pretrained language model to a folder. One can then simply point ktrain to this folder to fine-tune a text classifier on top of this language model, using either of the following two approaches:
Approach 1: Copy the tokenizer files (which are very small) to the path of the saved language model. These files can be obtained from the Hugging Face model hub. This is also required when loading models without an internet connection, as described in this FAQ entry.
Note that, when you save the `Predictor` to a folder, you'll again need to make sure that folder has the tokenizer files. Otherwise, `predictor.predict` will yield the same errors.
Approach 2: Alternatively, you could load the tokenizer yourself with transformers and manually set `t.tok = tokenizer` prior to calling `preprocess_train`:
t = text.Transformer(MODEL_LOCAL_PATH, maxlen=50, class_names=class_names)
from transformers import *
tokenizer = AutoTokenizer.from_pretrained('roberta-base')
t.tok = tokenizer
t.preprocess_train(...
When loading a predictor, you'll also need to reset the tokenizer manually:
p = ktrain.load_predictor('/tmp/mypred')
p.preproc.tok = tokenizer
p.predict('Some text to predict')
Note that the local path you supply to `Transformer` is stored in `t.model_name`, where `t` is a `Preprocessor` instance. If creating a `Predictor` and transferring it to another machine, you may need to manually update this path:
predictor.preproc.model_name = 'path/to/predictor/on/new/machine'
In regard to train-test splits, the data-loading functions (e.g., `texts_from_folder`, `images_from_csv`) have a `random_state` parameter that will ensure the same dataset split across runs.
In regard to training, please see this post, which includes some suggestions for reproducible results in `tf.keras` and TensorFlow 2.
For instance, invoking the function below before each training run can help generate more consistent results across runs.
import tensorflow as tf
import numpy as np
import os
import random
def reset_random_seeds(seed=2):
    os.environ['PYTHONHASHSEED'] = str(seed)
    tf.random.set_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
Increasing the batch size used for inference and predictions can potentially speed up predictions on lists of examples.
The `get_predictor` and `load_predictor` functions both accept a `batch_size` argument that will be used when making predictions on lists of examples. The default is 32. The `batch_size` for `Predictor` instances can also be set manually:
predictor = ktrain.load_predictor('/tmp/my_predictor')
predictor.batch_size = 128
predictor.predict(list_of_examples)
The `get_learner` function accepts an `eval_batch_size` argument that will be used by the `Learner` instance when evaluating a validation dataset (e.g., `learner.predict`).
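For example, a minimal sketch:

```python
learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                             batch_size=6, eval_batch_size=128)
```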
Here is a quick self-contained example:
from ktrain import text
import ktrain
import pandas as pd
from sklearn.model_selection import train_test_split,KFold
from sklearn.metrics import accuracy_score
from sklearn.datasets import fetch_20newsgroups
# load text data
categories = ['alt.atheism', 'soc.religion.christian','comp.graphics', 'sci.med']
train_b = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
test_b = fetch_20newsgroups(subset='test',categories=categories, shuffle=True)
(x_train, y_train) = (train_b.data, train_b.target)
(x_test, y_test) = (test_b.data, test_b.target)
df = pd.DataFrame({'text':x_train, 'target': [train_b.target_names[y] for y in y_train]})
# CV with transformers
N_FOLDS = 2
EPOCHS = 3
LR = 5e-5
def transformer_cv(MODEL_NAME):
    predictions, accs = [], []
    data = df[['text', 'target']]
    for train_index, val_index in KFold(N_FOLDS).split(data):
        preproc = text.Transformer(MODEL_NAME, maxlen=500)
        train, val = data.iloc[train_index], data.iloc[val_index]
        x_train = train.text.values
        x_val = val.text.values
        y_train = train.target.values
        y_val = val.target.values
        trn = preproc.preprocess_train(x_train, y_train)
        model = preproc.get_classifier()
        learner = ktrain.get_learner(model, train_data=trn, batch_size=16)
        learner.fit_onecycle(LR, EPOCHS)
        predictor = ktrain.get_predictor(learner.model, preproc)
        pred = predictor.predict(x_val)
        acc = accuracy_score(y_val, pred)
        print('acc', acc)
        accs.append(acc)
    return accs
print( transformer_cv('distilbert-base-uncased') )
Examples include:
- medical informatics: analyzing doctors' written analyses of patients and medical imagery
- finance: financial crime analytics, mining stock-related news stories
- insurance: detecting fraud in insurance claims
- customer relationship management (CRM): making sense of feedback from customers and/or patients
- political science: studying targeted political messaging
- news media: prioritizing political claims for fact-checking
- social science: making sense of text-based responses in surveys and emotion-classification from text data
- linguistics: detecting sarcasm in the news
- education: analysis of attitudes towards educational institutions in social media
- local government: auto-categorizing citizen complaints to local governments
- federal government: extracting insights from documents about government programs and policies