Multivariate image analysis can be very helpful in many computer vision applications, since it performs effective data analysis in the intensity, spatial, and temporal domains. It helps data analysts achieve both descriptive and predictive goals. Meanwhile, through iterative knowledge discovery and understanding, human experts gain insight into the data generation process and can hence integrate their expertise to improve the machine-generated models. In this tutorial, we present a case study of multivariate data analysis in a thermal imaging application. It will help the audience understand the basic concept of multivariate image analysis and how the principle works, in a relatively intuitive manner.
The company Idletechs AS develops new methods, software, and solutions for better use of the Big Data streaming from modern sensors such as hyperspectral cameras and thermal cameras. Our methodology is based on more than 47 years of R&D, several books, and several hundred research publications within multivariate data modelling of measurements. This research is now cited more than 1000 times per year. Some of our most cited developments are Partial Least Squares Regression (PLSR) for multivariate calibration, Multiplicative Signal Correction (MSC), and Extended Multiplicative Signal Correction (EMSC) for the separation of physical and chemical effects in light measurements. We focus mainly on industrial, shipping, and subsea solutions, offering new ways to detect unwanted process changes and to help humans understand their overwhelming Big Data streams of measurements. But our physics- and observation-based hybrid modelling algorithms are generic, and we also have a number of remote-sensing and aerospace activities, as well as considerable generic competence of relevance for satellite-based earth observation.
Dr. Ping Zhao, Research Scientist, Email: ping.zhao@idletechs.com
This tutorial illustrates the basic concept of multivariate data analysis and how the principle works in practice via a thermal imaging application. Given a pre-recorded thermal image sequence, the task is to analyze the data with no prior information. The goal is to discover the enclosed thermal variation patterns, recognize potential objects, and identify thermal events. In the following sections, a typical workflow of multivariate image analysis is presented. Each section contains text describing the problem, figures illustrating the solution, code for detailed understanding and verification, and discussions for model interpretation.
Idletechs collected the data used in this tutorial from the experimental setup shown above. On the table, we placed a waffle maker to the left, an electric iron in the center with an occluded electronic welding tool, a hamburger grill to the right, and a bottle of water next to the window. The sequence has 18,717 thermal images, forming a time series of temperature measurements over a certain period. Each image has 60 by 80 pixels, and each pixel is a temperature measurement at a specific moment and at a specific spatial location from the camera's point of view. The temperature varies between 9 and 183 degrees Celsius.
The raw image data is a thermal time series stored in many CSV files. Loading them one by one every time we improve our data analysis pipeline would be time consuming, so we load all the images once and store them as a compact matrix. In later phases, we load only this matrix and avoid the loading and preprocessing overheads. A numerical matrix is a typical input to multivariate image analysis. It has 18,717 rows, each of which corresponds to one thermal image, in other words, one observation of all pixels at a specific moment; and it has 4800 (60 x 80) columns, each of which corresponds to the temperature variation of one pixel over the experimental period. In multivariate image analysis, we call each row in the input matrix an observation and each column a variable.
# Set up a logging system
import logging
logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', level=logging.INFO)
# Point to the data folder
import glob
import numpy as np
import pandas
from os import makedirs
from os.path import basename, exists, join
data_folder = join('..', 'Data', 'ThermalDataSet_20170221')
pp_folder = join(data_folder, 'Preprocessed')
pp_file = join(pp_folder, 'reshaped_data_matrix.npz')
# Load the existing input matrix if possible
if exists(pp_file):
    logging.info('Loading raw data matrix and frame indices')
    with np.load(pp_file) as loaded_data:
        raw_data_matrix = loaded_data['raw_data_matrix']
        frame_inds = loaded_data['frame_inds']
        img_height = int(loaded_data['img_height'])
        img_width = int(loaded_data['img_width'])
    obs_num = raw_data_matrix.shape[0]
    var_num = raw_data_matrix.shape[1]
    logging.info('Loaded raw data matrix and frame indices')
# Generate the compact data matrix if necessary
else:
    img_height = None
    img_width = None
    var_num = None
    raw_data_matrix = None
    frame_inds = None
    # Create a sorted list of CSV files
    csv_folder = join(data_folder, 'CSV')
    csv_list = sorted(glob.glob(join(csv_folder, '*.csv')))
    obs_num = len(csv_list)
    # Iterate through all CSV files
    logging.info('Reading all CSV files')
    for obs_ind in range(obs_num):
        csv_file = csv_list[obs_ind]
        base_name = basename(csv_file)
        # Report progress for every 10 percent of the data
        if obs_ind % int(np.ceil(obs_num * 0.1)) == 0:
            logging.info('Reading CSV file %s', base_name)
        # Generate image frame indices from the file names
        if frame_inds is None:
            frame_inds = np.empty(obs_num)
        frame_inds[obs_ind] = float(base_name[0:6])
        # Load one image frame
        data = pandas.read_csv(csv_file, header=None, skiprows=2).to_numpy()
        # Collect and verify image dimensions
        if img_height is None:
            img_height = data.shape[0]
        elif img_height != data.shape[0]:
            logging.error('Inconsistent image height in pixels: %s', base_name)
            continue
        if img_width is None:
            img_width = data.shape[1]
        elif img_width != data.shape[1]:
            logging.error('Inconsistent image width in pixels: %s', base_name)
            continue
        if var_num is None:
            var_num = img_height * img_width
        if raw_data_matrix is None:
            raw_data_matrix = np.empty((obs_num, var_num))
        raw_data_matrix[obs_ind, :] = data.flatten()
    # Matrix generation is finished
    logging.info('Read all CSV files')
    # Save the input matrix for future reloading
    logging.info('Saving raw data matrix and frame indices')
    makedirs(pp_folder, exist_ok=True)
    np.savez_compressed(pp_file,
                        raw_data_matrix=raw_data_matrix,
                        frame_inds=frame_inds,
                        img_height=img_height,
                        img_width=img_width)
    logging.info('Saved raw data matrix and frame indices')
The frame indices recorded in the variable frame_inds are discontinuous, most likely because the data recorder dropped some frames. This is not an obstacle to multivariate image analysis, but for the convenience of model interpretation we append frame_inds as one additional column to the input data matrix. The model construction process will then exploit the correlation, if any, between the temporal information and the thermal measurements.
import matplotlib.pyplot as plt
%matplotlib inline
# Normalize both the gap-free (continuous) and the actual frame indices
continuous_frame_inds = np.arange(len(frame_inds)) / np.max(frame_inds)
normalized_frame_inds = frame_inds / np.max(frame_inds)
# Generate a normalized dataset with the frame indices as an extra column
normalized_data_matrix = raw_data_matrix / np.max(raw_data_matrix)
original_input_matrix = np.hstack((normalized_frame_inds.reshape(obs_num, 1), normalized_data_matrix))
inserted_col_num = original_input_matrix.shape[1] - var_num
extended_var_num = var_num + inserted_col_num
plt.figure(figsize=(8, 8))
plt.plot(continuous_frame_inds, continuous_frame_inds, 'b',
         continuous_frame_inds, normalized_frame_inds, 'r')
plt.xlabel('Continuous Frame Indices')
plt.ylabel('Normalized Frame Indices')
plt.legend(['Continuous Frame Indices', 'Normalized Frame Indices'])
plt.grid(True)
plt.title('Actual vs Continuous Frame Indices')
plt.show()
Before constructing a learning model, it is important to make an initial inspection of the data. The purpose is to understand the sample distribution, detect unexpected system behaviour as early as possible, and perform any mandatory data calibration. One major challenge in this context is known as the curse of dimensionality. In this tutorial, the input matrix has a high dimensionality of 18,717 x 4,800, so it is impossible to visualize the sample points in a 2D or 3D coordinate system for an intuitive data exploration. However, data analysts are often more “blessed” than “cursed”: most data collected from man-made processes have a convenient intrinsic collinearity structure. That means each observation (corresponding to one row in the data matrix) can be approximated by a linear combination of a small number of vectors of the same length.
For example, consider a simple case where the recorded time series is just a sequence of floating-point numbers. Let us denote these observations by the variable $x$; then each observation can be rewritten as
$x = t \times q + c + \epsilon$,
where $q$ is a selected representative variable, $t$ is their scaling ratio, $c$ is their baseline difference (a constant for a specific dataset), and $\epsilon$ is the error, or lack of fit, of the model to the data. In this way, the original variable $x$ is losslessly represented by another variable $q$, as long as the error term is kept.
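To make this concrete, here is a minimal sketch of the univariate case. It uses made-up numbers rather than the thermal data, and estimates $t$ and $c$ by ordinary least squares given a reference series $q$:
import numpy as np
# Synthetic example: x is (approximately) a scaled and shifted copy of q
q = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x = 2.5 * q + 10.0 + np.array([0.01, -0.02, 0.0, 0.02, -0.01])  # t = 2.5, c = 10, small noise
# Least-squares fit of x = t * q + c using the design matrix [q, 1]
A = np.column_stack((q, np.ones_like(q)))
(t, c), *_ = np.linalg.lstsq(A, x, rcond=None)
print(t, c)  # close to 2.5 and 10.0
Since the collected data are essentially multivariate, we rewrite the equation above by expanding both the dependent and independent variables to vectors: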
$ \begin{pmatrix} X_{1}&...&X_{j}&...&X_{m} \end{pmatrix} = t \times \begin{pmatrix} Q_{1}&...&Q_{j}&...&Q_{m} \end{pmatrix} + \begin{pmatrix} C_{1}&...&C_{j}&...&C_{m} \end{pmatrix} + \begin{pmatrix} E_{1}&...&E_{j}&...&E_{m} \end{pmatrix} $.
To push this further, we stack all observations on top of each other and obtain the input data as a matrix $X$:
$ X = \begin{pmatrix} X_{11}&...&X_{1j}&...&X_{1m}\\ ...&...&...&...&...\\ X_{i1}&...&X_{ij}&...&X_{im}\\ ...&...&...&...&...\\ X_{n1}&...&X_{nj}&...&X_{nm} \end{pmatrix} = \begin{pmatrix} T_{1}\\...\\T_{i}\\...\\T_{n} \end{pmatrix} \times \begin{pmatrix} Q_{1}&...&Q_{j}&...&Q_{m} \end{pmatrix} + \begin{pmatrix} C_{1}&...&C_{j}&...&C_{m}\\ ...&...&...&...&...\\ C_{1}&...&C_{j}&...&C_{m}\\ ...&...&...&...&...\\ C_{1}&...&C_{j}&...&C_{m} \end{pmatrix} + \begin{pmatrix} E_{11}&...&E_{1j}&...&E_{1m}\\ ...&...&...&...&...\\ E_{i1}&...&E_{ij}&...&E_{im}\\ ...&...&...&...&...\\ E_{n1}&...&E_{nj}&...&E_{nm} \end{pmatrix} $,
where $n$ is the number of observations. This representation is lossless as long as the error matrix $E$ is kept, but it is not perfect as an approximation: if any observation, that is, the row vector $X_i$, has a relatively strong nonlinear relationship to the vector $Q$, the absolute values of the associated errors $E_i$ will be large. One efficient way to reduce this error is to increase the number of selected vectors $Q$, which gives the following equation:
$ X = \begin{pmatrix} T_{11}&...&T_{1r}&...&T_{1k}\\ ...&...&...&...&...\\ T_{i1}&...&T_{ir}&...&T_{ik}\\ ...&...&...&...&...\\ T_{n1}&...&T_{nr}&...&T_{nk} \end{pmatrix} \times \begin{pmatrix} Q_{11}&...&Q_{1j}&...&Q_{1m}\\ ...&...&...&...&...\\ Q_{r1}&...&Q_{rj}&...&Q_{rm}\\ ...&...&...&...&...\\ Q_{k1}&...&Q_{kj}&...&Q_{km} \end{pmatrix} + \begin{pmatrix} C_{1}&...&C_{j}&...&C_{m}\\ ...&...&...&...&...\\ C_{1}&...&C_{j}&...&C_{m}\\ ...&...&...&...&...\\ C_{1}&...&C_{j}&...&C_{m} \end{pmatrix} + \begin{pmatrix} E_{11}&...&E_{1j}&...&E_{1m}\\ ...&...&...&...&...\\ E_{i1}&...&E_{ij}&...&E_{im}\\ ...&...&...&...&...\\ E_{n1}&...&E_{nj}&...&E_{nm} \end{pmatrix} $,
where $k$ is the number of selected vectors.
In this context, if we define a matrix $P$ as the transpose of the matrix $Q$,
$ P = \begin{pmatrix} P_{11}&...&P_{1r}&...&P_{1k}\\ ...&...&...&...&...\\ P_{j1}&...&P_{jr}&...&P_{jk}\\ ...&...&...&...&...\\ P_{m1}&...&P_{mr}&...&P_{mk} \end{pmatrix} = \begin{pmatrix} Q_{11}&...&Q_{1j}&...&Q_{1m}\\ ...&...&...&...&...\\ Q_{r1}&...&Q_{rj}&...&Q_{rm}\\ ...&...&...&...&...\\ Q_{k1}&...&Q_{kj}&...&Q_{km} \end{pmatrix}^T, $
then we get the standard bilinear model written in matrix form
$X = TP^T + C + E$,
where, in multivariate data analysis, $T$ is commonly referred to as the score matrix and $P$ as the loading matrix. The interpretation of this model is that all row vectors in $X$ are represented by a few column vectors in the matrix $P$, scaled by the coefficients in $T$ and shifted by $C$, provided one is willing to accept the errors collected in $E$. Does this sound familiar? Yes, this process defines a coordinate system, though not necessarily a Euclidean (orthogonal) one: it may be oblique, since the selected column vectors in $P$ are not required to be orthogonal to each other. A geometrical illustration is shown below.
The original observation $\begin{pmatrix} X_{1}&...&X_{j}&...&X_{m} \end{pmatrix}$ is expressed in the coordinate system $\begin{pmatrix} E_{1}&...&E_{j}&...&E_{m} \end{pmatrix}$. After the linear transformation based on the bilinear model, the observation's new coordinates are $\begin{pmatrix} T_{1}&...&T_{r}&...&T_{k} \end{pmatrix}$ in the basis formed by the column vectors of $P$, where $k \ll m$. In this tutorial, $m = 4800$, while $k$ can go down even to 1 if the data analyst accepts large errors (lack of fit) in the matrix $E$. The original $n \times m$ elements in the matrix $X$ can then be approximated by $(n + m) \times k + m$ elements. For example, with $k = 3$, the transformation in this tutorial saves around $100 - ((18717 + 4800) \times 3 + 4800)/(18717 \times 4800) \times 100 \approx 99.92$ percent of the storage space at the cost of a small amount of the total variance in the data.
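As a sanity check, the storage arithmetic above is easy to reproduce in a few lines (a small sketch using this dataset's dimensions):
# Storage saved by a rank-k bilinear approximation of an n x m matrix
n, m, k = 18717, 4800, 3
full_size = n * m                   # elements in the original matrix X
compact_size = (n + m) * k + m      # elements in T (n x k), P (m x k), and C (m)
saving = 100 * (1 - compact_size / full_size)
print('%.2f percent of storage saved' % saving)  # about 99.92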
There are many methods and implementations that can find a reasonable representative matrix $P$. One popular option is Singular Value Decomposition (SVD). In this tutorial, we select the truncated version to achieve a fast and efficient matrix decomposition. Popular programming languages such as Python, Matlab, R, and Julia have very good, often built-in, support for this method. Truncated SVD creates a more “detailed” bilinear model $X = U \times S \times V^T + C + E$, where $V$ is mathematically equivalent to $P$, $U \times S$ is equivalent to $T$, and a small $k$ already preserves a large amount of the data variance in $X$. The principal components (column vectors) in $P$ are guaranteed to be orthogonal to each other, and they are optimized and ranked so that they always point in the directions where the data has the largest variances. A geometrical illustration of truncated SVD is shown in the figure below.
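Before applying it to the thermal data, the following minimal sketch on synthetic data shows one practical pitfall exploited below: scipy.sparse.linalg.svds returns the singular values in ascending order, so the components must be flipped to obtain the usual largest-variance-first ranking:
import numpy as np
from scipy.sparse.linalg import svds
rng = np.random.default_rng(0)
demo = rng.standard_normal((100, 20))
demo = demo - demo.mean(axis=0)     # remove the empirical mean (the C term)
u, s, vt = svds(demo, k=3)
print(s)                            # singular values in ascending order
s, vt, u = np.flip(s), np.flip(vt, axis=0), np.flip(u, axis=1)
recon = u @ np.diag(s) @ vt         # rank-3 approximation of the centered data
print(np.sum(recon ** 2) / np.sum(demo ** 2))  # fraction of variance explained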
import numpy as np
from numpy.linalg import svd
from scipy.sparse.linalg import svds
# Varimax rotation, adapted from
# https://en.wikipedia.org/wiki/Talk:Varimax_rotation
def varimax(Phi, gamma=1.0, q=20, tol=1e-6):
    # Iteratively find an orthogonal rotation R that maximizes the variance
    # of the squared loadings, which makes the loadings easier to interpret
    from numpy import eye, asarray, dot, sum, diag
    from numpy.linalg import svd
    p, k = Phi.shape
    R = eye(k)
    d = 0
    for i in range(q):
        d_old = d
        Lambda = dot(Phi, R)
        u, s, vh = svd(dot(Phi.T, asarray(Lambda)**3 - (gamma/p) * dot(Lambda, diag(diag(dot(Lambda.T, Lambda))))))
        R = dot(u, vh)
        d = sum(s)
        if d_old != 0 and d/d_old < 1 + tol:
            break
    return dot(Phi, R)
# Remove the empirical mean (the C term of the bilinear model)
#data_range = range(1, 18713)
#input_matrix = original_input_matrix[data_range, :]
#normalized_frame_inds = frame_inds / np.max(frame_inds)
#normalized_frame_inds = normalized_frame_inds[data_range]
input_matrix = original_input_matrix
center = np.mean(input_matrix, axis=0).reshape((1, input_matrix.shape[1]))
centered_matrix = input_matrix - center
total_var = np.sum(np.square(centered_matrix))
# Matrix decomposition by truncated SVD
extracted_comp_num = 6
[u, s, vt] = svds(centered_matrix, k=extracted_comp_num, return_singular_vectors=True)
# svds returns singular values in ascending order; flip to descending
s = np.flip(s)
vt = np.flip(vt, axis=0)
loadings = vt.T
scores = np.dot(centered_matrix, loadings)
# Calculate the explained variance per component
explained_var = np.zeros(extracted_comp_num)
for comp_ind in range(extracted_comp_num):
    score = scores[:, comp_ind].reshape((centered_matrix.shape[0], 1))
    loading = loadings[:, comp_ind].reshape((centered_matrix.shape[1], 1))
    recon_data = np.dot(score, loading.T)
    recon_comp_var = np.sum(np.square(recon_data))
    explained_var[comp_ind] = recon_comp_var / total_var * 100
explained_var = np.cumsum(explained_var)
# Rotate the loadings to sharpen interpretation
rotated_loadings = varimax(loadings)
# Flip each component so that its largest-magnitude loading is positive
for comp_ind in range(extracted_comp_num):
    max_ind = np.argmax(np.abs(rotated_loadings[:, comp_ind].flatten()))
    loading_sign = np.sign(rotated_loadings[max_ind, comp_ind])
    rotated_loadings[:, comp_ind] = loading_sign * rotated_loadings[:, comp_ind]
# Manually switch loading signs based on interpretation
#rotated_loadings[:, 0] = -rotated_loadings[:, 0]
rotated_loadings[:, 1] = -rotated_loadings[:, 1]
#rotated_loadings[:, 2] = rotated_loadings[:, 2]
rotated_scores = np.dot(centered_matrix, rotated_loadings)
rotated_explained_var = np.zeros(extracted_comp_num)
for comp_ind in range(extracted_comp_num):
    rotated_score = rotated_scores[:, comp_ind].reshape((centered_matrix.shape[0], 1))
    rotated_loading = rotated_loadings[:, comp_ind].reshape((centered_matrix.shape[1], 1))
    rotated_recon_comp_var = np.sum(np.square(np.dot(rotated_score, rotated_loading.T)))
    rotated_explained_var[comp_ind] = rotated_recon_comp_var / total_var * 100
rotated_explained_var = np.cumsum(rotated_explained_var)
import matplotlib.pyplot as plt
%matplotlib inline
plot_col_num = 2
plot_row_num = 1
plt.figure(figsize=(5 * plot_col_num, 5 * plot_row_num))
plt.subplot(1, 2, 1)
plt.bar(np.linspace(1, extracted_comp_num, extracted_comp_num), explained_var)
plt.grid(True)
plt.xlabel('Component Index')
plt.ylabel('Cumulative Explained Variance (%)')
plt.title('Explained Data Variance by Components')
plt.subplot(1, 2, 2)
plt.bar(np.linspace(1, extracted_comp_num, extracted_comp_num), rotated_explained_var)
plt.grid(True)
plt.xlabel('Component Index')
plt.ylabel('Cumulative Explained Variance After Rotation (%)')
plt.title('Explained Data Variance by Rotated Components')
plt.show()
import matplotlib.pyplot as plt
%matplotlib inline
plot_col_num = 2
plot_row_num = extracted_comp_num
plt.figure(figsize=(5 * plot_col_num, 5 * plot_row_num))
for comp_ind in range(extracted_comp_num):
    # Reshape each loading vector back into image form,
    # skipping the inserted frame-index column
    loading_img = loadings[inserted_col_num:, comp_ind].reshape((img_height, img_width))
    plt.subplot(plot_row_num, plot_col_num, comp_ind * plot_col_num + 1)
    plt.imshow(loading_img, cmap='gray')
    plt.xlabel('Image Width (pixels)')
    plt.ylabel('Image Height (pixels)')
    plt.title('Loading Image ' + str(comp_ind + 1) + ' Before Rotation')
    rotated_loading_img = rotated_loadings[inserted_col_num:, comp_ind].reshape((img_height, img_width))
    plt.subplot(plot_row_num, plot_col_num, comp_ind * plot_col_num + 2)
    plt.imshow(rotated_loading_img, cmap='gray')
    plt.xlabel('Image Width (pixels)')
    plt.ylabel('Image Height (pixels)')
    plt.title('Loading Image ' + str(comp_ind + 1) + ' After Rotation')
plt.show()
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(12, 8))
ax = plt.axes(projection="3d")
# Color the sample points by their (normalized) recording time
ax.scatter3D(scores[:, 0], scores[:, 1], scores[:, 2], c=normalized_frame_inds, cmap='rainbow')
ax.set_xlabel('Score 1')
ax.set_ylabel('Score 2')
ax.set_zlabel('Score 3')
plt.grid(True)
plt.title('Score Plot')
plt.figure(figsize=(12, 8))
ax = plt.axes(projection="3d")
ax.scatter3D(rotated_scores[:, 0], rotated_scores[:, 1], rotated_scores[:, 2], c=normalized_frame_inds, cmap='rainbow')
ax.set_xlabel('Score 1 After Rotation')
ax.set_ylabel('Score 2 After Rotation')
ax.set_zlabel('Score 3 After Rotation')
plt.grid(True)
plt.title('Score Plot After Rotation')
plt.show()
def inspect_thermal_stage(first_frame, last_frame, comp_inds, frame_inds, scores):
    # Translate the frame numbers into row indices of the data matrix
    first_frame_ind = np.flatnonzero(frame_inds == first_frame)[0]
    last_frame_ind = np.flatnonzero(frame_inds == last_frame)[0]
    obs_range = np.arange(first_frame_ind, last_frame_ind + 1)
    comp_num = np.max(comp_inds) + 1
    plot_row_num = 1
    plot_col_num = comp_num
    # Plot the score time series of the selected components for this stage
    plt.figure(figsize=(5 * plot_col_num, 5 * plot_row_num))
    for ind in range(comp_num):
        comp_ind = comp_inds[ind]
        plt.subplot(plot_row_num, plot_col_num, ind + 1)
        plt.plot(frame_inds[obs_range], scores[obs_range, comp_ind])
        plt.xlabel('Frame Index')
        plt.ylabel('Score ' + str(comp_ind))
        plt.grid(True)
        plt.title('Score ' + str(comp_ind))
    # Decompose only the observations of this stage to get local loadings
    [u, s, vt] = svds(centered_matrix[obs_range, :], k=comp_num, return_singular_vectors=True)
    s = np.flip(s)
    vt = np.flip(vt, axis=0)
    loadings = vt.T
    # Flip each component so that its largest-magnitude loading is positive
    for ind in range(comp_num):
        comp_ind = comp_inds[ind]
        max_ind = np.argmax(np.abs(loadings[:, comp_ind].flatten()))
        loading_sign = np.sign(loadings[max_ind, comp_ind])
        loadings[:, comp_ind] = loading_sign * loadings[:, comp_ind]
    # Show the local loading images
    plt.figure(figsize=(5 * plot_col_num, 5 * plot_row_num))
    for ind in range(comp_num):
        comp_ind = comp_inds[ind]
        loading_img = loadings[inserted_col_num:, comp_ind].reshape((img_height, img_width))
        plt.subplot(plot_row_num, plot_col_num, ind + 1)
        plt.imshow(loading_img, cmap='gray')
    plt.show()
%matplotlib inline
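# Inspect the full recording first, then each thermal stage separately
# (the frame ranges below mark stage boundaries visible in the score curves)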
inspect_thermal_stage(1, np.max(frame_inds), (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(1, 151, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(152, 407, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(408, 461, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(462, 3896, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(3897, 4875, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(4876, 12856, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(12857, 16432, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(16434, 21522, (0, 1, 2), frame_inds, rotated_scores)
inspect_thermal_stage(21523, 23826, (0, 1, 2), frame_inds, rotated_scores)
import numpy as np
from numpy.linalg import svd
from scipy.sparse.linalg import svds
# Focus on one thermal stage and remove its empirical mean
data_range = range(200, 407)
input_matrix = original_input_matrix[data_range, :]
normalized_frame_inds = frame_inds[data_range] / np.max(frame_inds[data_range])
center = np.mean(input_matrix, axis=0).reshape((1, input_matrix.shape[1]))
centered_matrix = input_matrix - center
total_var = np.sum(np.square(centered_matrix))
# Matrix decomposition by truncated SVD
extracted_comp_num = 3
[u, s, vt] = svds(centered_matrix, k=extracted_comp_num, return_singular_vectors=True)
# svds returns singular values in ascending order; flip to descending
s = np.flip(s)
vt = np.flip(vt, axis=0)
loadings = vt.T
# Rotate the loadings to sharpen interpretation
rotated_loadings = varimax(loadings)
# Flip each component so that its largest-magnitude loading is positive
for comp_ind in range(extracted_comp_num):
    max_ind = np.argmax(np.abs(rotated_loadings[:, comp_ind].flatten()))
    loading_sign = np.sign(rotated_loadings[max_ind, comp_ind])
    rotated_loadings[:, comp_ind] = loading_sign * rotated_loadings[:, comp_ind]
# Manually switch loading signs based on interpretation
#rotated_loadings[:, 0] = -rotated_loadings[:, 0]
#rotated_loadings[:, 1] = -rotated_loadings[:, 1]
#rotated_loadings[:, 2] = rotated_loadings[:, 2]
rotated_scores = np.dot(centered_matrix, rotated_loadings)
rotated_explained_var = np.zeros(extracted_comp_num)
for comp_ind in range(extracted_comp_num):
    rotated_score = rotated_scores[:, comp_ind].reshape((centered_matrix.shape[0], 1))
    rotated_loading = rotated_loadings[:, comp_ind].reshape((centered_matrix.shape[1], 1))
    rotated_recon_comp_var = np.sum(np.square(np.dot(rotated_score, rotated_loading.T)))
    rotated_explained_var[comp_ind] = rotated_recon_comp_var / total_var * 100
rotated_explained_var = np.cumsum(rotated_explained_var)
import matplotlib.pyplot as plt
%matplotlib inline
plot_col_num = 1
plot_row_num = 1
plt.figure(figsize=(6 * plot_col_num, 6 * plot_row_num))
plt.bar(np.linspace(1, extracted_comp_num, extracted_comp_num), rotated_explained_var)
plt.grid(True)
plt.xlabel('Component Index')
plt.ylabel('Cumulative Explained Variance After Rotation (%)')
plt.title('Explained Data Variance by Rotated Components')
plt.show()
import matplotlib.pyplot as plt
%matplotlib inline
plot_col_num = 1
plot_row_num = extracted_comp_num
plt.figure(figsize=(6 * plot_col_num, 6 * plot_row_num))
for comp_ind in range(extracted_comp_num):
    # Reshape each rotated loading vector back into image form
    rotated_loading_img = rotated_loadings[inserted_col_num:, comp_ind].reshape((img_height, img_width))
    plt.subplot(plot_row_num, plot_col_num, comp_ind * plot_col_num + 1)
    plt.imshow(rotated_loading_img, cmap='gray')
    plt.xlabel('Image Width (pixels)')
    plt.ylabel('Image Height (pixels)')
    plt.title('Loading Image ' + str(comp_ind + 1) + ' After Rotation')
plt.show()
from mpl_toolkits import mplot3d
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(8, 8))
ax = plt.axes(projection="3d")
# Color the sample points by their (normalized) recording time within the stage
ax.scatter3D(rotated_scores[:, 0], rotated_scores[:, 1], rotated_scores[:, 2], c=normalized_frame_inds, cmap='rainbow')
ax.set_xlabel('Score 1 After Rotation')
ax.set_ylabel('Score 2 After Rotation')
ax.set_zlabel('Score 3 After Rotation')
plt.grid(True)
plt.title('Score Plot After Rotation')
plt.show()
Multivariate data analysis is very helpful in thermal imaging applications, and this case study illustrates the following advantages:
- It needs no prior information or labels: the thermal patterns, objects, and events emerge from the data itself.
- A few bilinear components summarize thousands of correlated pixel variables, which compresses the data dramatically (around 99.92 percent storage saving here) while preserving most of the variance.
- Scores and loading images are directly interpretable, so human experts can relate the model to the underlying physical process and refine it iteratively.
It also has a few disadvantages:
- The bilinear model is linear, so strongly nonlinear thermal behaviour requires more components or separate models per thermal stage.
- The components are only defined up to rotation and sign, so manual steps such as varimax rotation and sign flipping are often needed before interpretation.
- The interpretation itself still relies on a human analyst; the method exposes variation patterns but does not name objects or events by itself.