Added exercises B.1-B.7
niuers committed Jul 16, 2020
1 parent e8da3ee commit 9855553
Showing 1 changed file with 165 additions and 5 deletions.
170 changes: 165 additions & 5 deletions Appendix B Linear Algebra.ipynb
@@ -57,15 +57,175 @@
"Suppose $u=\\sum^d_{i=1}\\alpha_i v_i$ holds for unique $\\alpha_i$, if $v_i$ are not independent, then we have non-trivial solution to $\\sum^d_{i=1}\\beta_i v_i = 0$, we can write, e.g. $v_d = \\sum^{d-1}_{i=1}\\gamma_i v_i$, take this into $u$, we have $u=\\sum^d_{i=1} \\delta_i v_i$ where $\\delta_d = 0$. This contradicts with the assumption that $\\alpha$ is uniuqe. So $v_i$ must be independent.\n",
"\n",
"* (d) The vectors in set 3 are a basis for $R^2$.\n",
"* (e) "
"* (e) Consider $\\sum^m_{i=1}\\alpha_iv_i = 0$, if one of the $v_i$ is zero, we can set the corresponding $\\alpha_i \\ne 0$ and all other $\\alpha$ to be zeros, so we have non-trivial solution $\\alpha$, which says the vectors are dependent.\n",
"\n",
"* (f) Since $v_1,\\dots,v_m$ are in space of $R^d$, let the basis of $R^d$ be $e_1,\\dots,e_d$. We have $v_i = \\sum^d_{k=1}a_{ik}e_k$. From problem (e), if $v_1,\\dots,v_m$ are independent, they can'be be zero. So their coordinates $a_{ik}$ on the basis $e_k$ won't be all zero. Pick the first non-zero coordinate, e.g. $a_{11}$, we have $e_1 = \\frac{1}{a_{11}}(v_1 -\\sum^d_{i=2}a_{ik}e_k)$, so we see that $v_1, e_2, \\dots, e_d$ can span $R^d$. Similarly, we can replace $e_2$ with $v_2$, thus have $v_1, v_2, e_3, \\dots, e_d$ span the space $R^d$. Continue doing this until we have $v_1,\\dots, v_d$. If we add one more vector $v_{d+1}$ into the list of vectors, we know it can be represented by a linear combination of $v_1, \\dots, v_d$ since by construction they space the $R^d$. This contradicts with the assumption that $v_1, \\dots, v_m$ are independent. So we must have $m\\le d$\n",
"\n",
"See the [$n+1$ vectors in $R^n$ cannot be linearly independent](https://math.stackexchange.com/questions/473853/n1-vectors-in-mathbbrn-cannot-be-linearly-independent) on math.stackexchange.com.\n",
"\n",
"* (g) We start from $v_1$ and construct a set of vectors that are perpendicular to each other. Let $u_1 = \\frac{v_1}{|v_1|}$. Let $u_2 = v_2 - v^T_2u_1u_1$. We see that $u_2$ is perpendicular to $u_1$ and both have unit length. Next, construct $u_3$ such that it's perpendicular to both $u_1,u_2$. Contintue in such pattern until we have $u_m$. As $u_i$ are constructed from $v_i$, we see that $u_i$ should span $R^d$ as well. If $m < d$, we can construct $u_{m+1}$ which is perpendicular to $u_1,\\dots, u_m$, this will contradict the assumption that $v_1, \\dots, v_m$ span $R^d$. So we must have $m \\ge d$.\n",
"\n",
"* (h) A basis spans $R^d$, from problem (g), we see that its cardinality $m \\ge d$. Also the vectors in basis are independent, from problem (f), we have $m \\le d$, combine both conditions, we have $m=d$.\n",
"\n",
"* (i) If $v_1,v_2$ are orthogonal, we have $v_1^Tv_2 = 0$, for any $\\alpha_1, \\alpha_2$, and \n",
"$\\alpha_1v_1 + \\alpha_2v_2$, multiply the expression by $v_1^T$, we have $\\alpha_1 v^T_1v_1 + \\alpha_2 v^T_1v_2 = \\alpha_1 |v_1|^2$, for this equal to zero, we would have $\\alpha_1 = 0$ unless $v_1=0$, which is not true by assumption. Similarly, we can show that $\\alpha_2 = 0$ as well. So $v_1,v_2$ are independent. \n",
"\n",
"If $v_1, v_2$ are independent, then we have $v^T_1(v_2 - \\lambda v_1) = v^T_1(v_2 - \\frac{v^T_1v_2}{v^T_1v_1}v_1) = v^T_1v_2 - \\frac{v^T_1v^T_1v_2v_1}{v^T_1v_1} = v^T_1v_2 - v^T_1v_2\\frac{v^T_1v_1}{v^T_1v_1} = v^T_1v_2 - v^T_1v_2 = 0$\n",
"\n",
"So $v_1$ and $v_2 - \\lambda v_1$ are orthogonal.\n",
"\n",
"If a vector $u$ is spanned by $v_1,v_2$, we have $u=a_1v_1 + a_2v_2 = (a_1+\\lambda a_2)v_1 + a_2 (v_2 - \\lambda v_1) = b_1 v_1 + b_2 (v_2 - \\lambda v_1)$\n",
"\n",
"So they have the same span. \n",
"\n",
"The above derivation doesn't assume $v_1, v_2$ are dependent, as long as $v_1, v_2 \\ne 0$, we still have $v_1, v_2-\\lambda v_1$ orthogonal to each other and have the same span as $v_1, v_2$. \n",
"\n",
"* (j) Follow the construction of problem (i), we can transform any set of independent vectors to pairwise orthogonal vectors with the same span. \n",
"\n",
"* (k) We have $\\hat{v}_1 = \\frac{v_1}{|v_1|}$, $\\hat{v}_2 = v_2 - \\lambda_{21}v_1$, $\\hat{v}_3 = v_3 - \\lambda_{31}v_1 - \\lambda{32}v_2$, etc. \n",
"\n",
"* (l) Let $u=\\sum a_i v_i$, multiply both sides by $v^T_j$, we have $uv^T_j = \\sum a_i v^T_j v_i = a_j$, so $a_i = v^T_i u$.\n",
"\n",
"* (m) For any set of $d$ linearly independent vectors, we can construct an orthonormal basis by problem (k), this basis has a dimension of $d$, it's thus a basis for $R^d$.\n",
"\n",
"\n",
"#### Exercise B.2\n",
"\n",
"* (a) \n",
"\n",
"$AB = \\begin{bmatrix}3 & 8 \\\\3 & 7 \\\\ 6 & 0 \\end{bmatrix}$\n",
"\n",
"$Ax = \\begin{bmatrix}5 \\\\4 \\\\ 9 \\end{bmatrix}$\n",
"\n",
"$Bx$: doesn't exist\n",
"\n",
"$BA$: doesn't exist\n",
"\n",
"$B^TA^T = \\begin{bmatrix}3 & 3 & 6\\\\8 & 7 & 0 \\end{bmatrix}$\n",
"\n",
"$x^TAx = 40$\n",
"\n",
"$B^TAB = \\begin{bmatrix}18 & 15\\\\15 & 37 \\end{bmatrix}$\n",
"\n",
"$A^{-1} = \\begin{bmatrix}-\\frac{1}{3} & \\frac{2}{3} & 0\\\\\\frac{2}{3} & -\\frac{1}{3} & 0 \\\\ 0 & 0 & \\frac{1}{3} \\end{bmatrix}$\n",
"\n",
"* (b) $\\sum\\sum x_ix_jA_{ij} = \\sum x_i (1A_{i1}+2A_{i2}+3A_{i3}) = 1(1A_{11}+2A_{12}+3A_{13}) + 2(1A_{21}+2A_{22}+3A_{23}) + 3(1A_{31}+2A_{32}+3A_{33}) = 1(1+4)+2(2+2)+3(9) = 5 + 8 + 27 = 40$\n",
"\n",
"Which is equal to $x^TAx$ above.\n",
"\n",
"* (c) Solve for eigenvalues, we have $\\lambda_1=\\lambda_2 = 3, \\lambda_3 = -1$ and $v_1 = \\begin{bmatrix}b \\\\ b \\\\ c \\end{bmatrix}$, $v_2 = \\begin{bmatrix}-b \\\\ -b \\\\ \\frac{2b^2}{c} \\end{bmatrix}$, $v_3 = \\begin{bmatrix}a \\\\ -a \\\\ 0 \\end{bmatrix}$\n",
"\n",
"where $a,b,c$ are any numbers.\n",
"\n",
"* (d) There are 3 linearly independent rows in $B$. There are 3 linearly independent columns in $B$\n",
"\n",
"* (e) As $C$ is the basis matrix, we have for each \n",
"$a_i = \\sum^r_{k=1}r_{ik}c_k = \\sum^r_{k=1} a_ic^T_k c_k = C\\begin{bmatrix}a_ic^T_1 \\\\ a_ic^T_2 \\\\ \\dots \\\\ a_ic^T_r\\end{bmatrix}$\n",
"\n",
"so we can construct $R=\\begin{bmatrix}a_1c^T_1 & \\dots & a_dc^T_1\\\\ \\dots & \\dots & \\dots\\\\ a_1c^T_r & \\dots & a_dc^T_r\\end{bmatrix}$ and $A=CR$.\n",
"\n",
" * (i) The column rank of $A$ is $r$ since the columns are spanned by basis $c_1, \\dots, c_r$\n",
" * (ii) The dimension of $R$ is $r\\times d$\n",
" * (iii) Let $R_{1,},\\dots, R_{r,}$ be the rows of matrix $R$, as $A=CR$, we have $A=c_1R_{1,} + \\dots + c_rR_{r,}$, clearly, the $i$-th row of $A$, $A_{i,} = c_{1i}R_{1,}+\\dots + c_{ri}R_{r,}$. It's a linear combination of rows in $R$.\n",
" \n",
" * (iv) Since rows in $A$ is a linear combination of rows in $R$, hence the dimension of the subspace spanned by the rows of $A$ is at most $r$, i.e. $\\text{column-rank}(A) \\ge \\text{row-rank}(A)$\n",
" * (v) Similarly apply (iii)-(iv) on $A^T = R^TC^T$, The row-rank of $A^T$ (i.e. the column rank of $A$) is less than or equal to the column rank of $A^T$, which is the row rank of $A$, i.e. $\\text{column-rank}(A) \\le \\text{row-rank}(A)$\n",
" \n",
" Combining (iv) and (v), we have $\\text{column-rank}(A) = \\text{row-rank}(A)$\n",
" \n",
"See [Proofs that column rank = row rank on wikipedia](https://en.wikipedia.org/wiki/Rank_(linear_algebra)) \n",
"\n",
"#### Exercise B.3\n",
"\n",
"* $AA^{+}A = U\\Gamma V^T V \\Gamma^{-1} U^TU\\Gamma V^T = U\\Gamma\\Gamma^{-1}\\Gamma V^T = U\\Gamma V^T = A$\n",
"\n",
"* $A^{+}AA^{+} = V \\Gamma^{-1} U^T U\\Gamma V^T V \\Gamma^{-1} U^T = V\\Gamma^{-1} U^T =A^{+}$\n",
"\n",
"* $(AA^{+})^T = (A^{+})^TA^T = (V \\Gamma^{-1} U^T)^T V\\Gamma U^T = U\\Gamma^{-1}V^TV\\Gamma U^T = UU^T = U\\Gamma V^TV \\Gamma^{-1} U^T = AA^{+}$\n",
"\n",
"* $(A^{+}A)^T = A^T(A^{+})^T = V\\Gamma U^T (V \\Gamma^{-1} U^T)^T = V\\Gamma U^T U\\Gamma^{-1}V^T = VV^T = V\\Gamma^{-1} U^TU\\Gamma V^T = A^{+}A$\n",
"\n",
"* $(A^T)^{+} = (V\\Gamma U^T)^{+} = U\\Gamma^{-1} V^T$ by definition, and $(A^{+})^T = U\\Gamma^{-1}V^T$ so we have $(A^T)^{+} =(A^{+})^T$\n",
"\n",
"#### Exercise B.4\n",
"\n",
"* (a) Since for $|A|$, every summand has exactly one term from each column, for $I$, only the diagonal entries are nonzero, so we have $|I| = \\epsilon_{1,\\dots ,k}I_{11}\\dots I_{kk} = 1$\n",
"\n",
"Similarly we have $|D| = \\prod^N_{i=1}D_{ii}$\n",
"\n",
"* (b) For a given order of $i_1,\\dots, i_N$ (from $1,\\dots, N$), we have $|A| = \\sum_{j_1,\\dots,j_N} \\epsilon_{j_1,\\dots,j_N}A^T_{i_1j_1}\\dots A^T_{i_Nj_N}$. There are $N!$ such orders of $1,\\dots, N$, so we have $|A| = \\frac{1}{N!}\\sum_{i_1,\\dots,i_N}\\sum_{j_1,\\dots,j_N}\\epsilon_{i_1,\\dots,i_N}\\epsilon_{j_1,\\dots,j_N}A_{i_1j_1}\\dots A_{i_Nj_N}$\n",
"\n",
"* (c) $|A^T| = \\frac{1}{N!}\\sum_{i_1,\\dots,i_N}\\sum_{j_1,\\dots,j_N} \\epsilon_{i_1,\\dots,i_N}\\epsilon_{j_1,\\dots,j_N}A^T_{i_1j_1}\\dots A^T_{i_Nj_N} = \\frac{1}{N!}\\sum_{i_1,\\dots,i_N}\\sum_{j_1,\\dots,j_N} \\epsilon_{i_1,\\dots,i_N}\\epsilon_{j_1,\\dots,j_N}A_{j_1i_1}\\dots A_{j_Ni_N} = \\frac{1}{N!}\\sum_{j_1,\\dots,j_N}\\sum_{i_1,\\dots,i_N}\\epsilon_{j_1,\\dots,j_N}\\epsilon_{i_1,\\dots,i_N}A_{j_1i_1}\\dots A_{j_Ni_N} = |A|$\n",
"\n",
"* (d) We have $1 = |I| = |O^TO| = |O||O^T| = |O|^2$, so $|O| = \\pm 1$\n",
"\n",
"$|OAO^T| = |O||A||O^T| = |O|^2|A| = |A|$\n",
"\n",
"* (e) $|A^{-1}A| = |A^{-1}||A| = |I_n| = 1$, so we have $|A^{-1}=\\frac{1}{|A|}$\n",
"\n",
"* (f) If $A$ is symmetric, we can write $A=U\\Lambda U^T$, thus $|A|=|U||\\Lambda||U^T| = |U|^2 |\\Lambda| = |\\Lambda| = \\prod^n_{i=1}\\lambda_i$\n",
"\n",
"#### Exercise B.5\n",
"\n",
"Consider $M=\\begin{bmatrix}I_n & A \\\\ -B & I_d\\end{bmatrix}$, according to the formula of block matrix, we have\n",
"\n",
"$|M| = |M_{22}||F_1| = |M_{11}||F_2|$, whereas\n",
"\n",
"$|M_{22}| = |M_{11}|= 1$, so we have $|F_1| = |M_{11}-M_{12}M^{-1}_{22}M_{21}| = |I_n + AI_dB| = |I_n + AB|$ and $|F_2| = |M_{22} - M_{21}M^{-1}_{11}M_{12}| = |I_d + BI_nA| = |I_d + BA|$\n",
"\n",
"so we have $|I_n + AB| = |I_d + BA|$\n",
"\n",
"This is the Sylvester's determinant theorem.\n",
"\n",
"We have $A+XBY^T = A(I + A^{-1}XBY^T)$, take determinant on both sides and apply Sylvester's theorem above, we have\n",
"\n",
"$|A+XBY^T| = |A||I + A^{-1}XBY^T| = |A| |I + BY^TA^{-1}X| = |A||B(B^{-1}+Y^TA^{-1}X| = |A||B||B^{-1}+Y^TA^{-1}X|$\n",
"\n",
"#### Exercise B.6\n",
"\n",
"$\\|A\\|_2 = \\max_{\\|x\\|=1}\\|Ax\\|$, consider instead $\\|Ax\\|^2 = (Ax)^T(Ax) = x^TA^TAx = x^TV\\Gamma U^T U \\Gamma V^T x = x^T V \\Gamma^2 V^T x = x^TSx $ \n",
"\n",
"where $S=V \\Gamma^2 V^T$, it's clear that the eigenvalues of $S$ are the diagonal entries in $\\Gamma^2$, i.e. $\\lambda_i = \\gamma_i^2$. Let $e_i$ be the eigenvectors of $S$, then we have $x=\\sum a_i e_i$, so we have $Sx = \\sum a_i \\lambda_i e_i = \\sum \\lambda_i a_i e_i$\n",
"\n",
"thus $x^TSx = \\sum a_i e^T_i \\sum \\lambda_j a_j e_j = \\sum \\lambda_i a_i^2 $ as $e_i$ are orthonormal. \n",
"Since we have $x^Tx = \\sum a_i^2 = 1$, so $x^TSx = \\sum \\lambda_i a_i^2 \\le \\sum \\lambda_1 a_i^2 = \\lambda_1$, where $\\lambda_1 $ is the largest eigenvalue of $S$, thus the maximum value of $\\|A\\|_2$ is $\\sqrt{\\lambda_1} = \\gamma_1$ the largest singular value. This is achieved when $x$ is the eigenvector corresponding to the $\\lambda_1$.\n",
"\n",
"See [the proof at math.stackexchange.com](https://math.stackexchange.com/questions/586663/why-does-the-spectral-norm-equal-the-largest-singular-value).\n",
"\n",
"$\\|A\\|_F = trace(AA^T) = trace(U\\Gamma V^T V\\Gamma U^T) = trace(U\\Gamma^2 U^T) = trace(\\Gamma^2 U^TU) = trace(\\Gamma^2) = \\sum^{\\rho}_{i=1}\\gamma_i^2$"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": []
"source": [
"#### Exercise B.7\n",
"\n",
"We express $x$ in terms of $z$, we have $x =U\\Lambda^{\\frac{1}{2}}z + \\mu$, so $x-\\mu = U\\Lambda^{\\frac{1}{2}}z$\n",
"\n",
"We have \n",
"\n",
"\\begin{align*}\n",
"P(x) &= \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}(x-\\mu)^T\\Sigma^{-1}(x-\\mu)} \\\\\n",
"&= \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}(U\\Lambda^{\\frac{1}{2}}z)^T\\Sigma^{-1}(U\\Lambda^{\\frac{1}{2}}z)} \\\\\n",
"&= \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}z^T\\Lambda^{\\frac{1}{2}}U^T\\Sigma^{-1}U\\Lambda^{\\frac{1}{2}}z}\\\\\n",
"&= \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}z^T\\Lambda^{\\frac{1}{2}}U^T(U\\Lambda U^T)^{-1}U\\Lambda^{\\frac{1}{2}}z}\\\\\n",
"&= \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}z^T\\Lambda^{\\frac{1}{2}}U^TU^{-T}\\Lambda^{-1} U^{-1}U\\Lambda^{\\frac{1}{2}}z}\\\\\n",
"&= \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}z^Tz}\\\\\n",
"\\end{align*}\n",
"\n",
"Also from exercise B.4 (d), we have $|\\Sigma| = |U\\Lambda U^T| = |\\Lambda|$, so $|\\Sigma|^{-\\frac{1}{2}} = |\\Lambda|^{-\\frac{1}{2}}$\n",
"\n",
"\\begin{align*}\n",
"E[x] &= \\int dx xP(x) \\\\\n",
"&= \\int \\frac{1}{|J|}dz (U\\Lambda^{\\frac{1}{2}}z+\\mu) \\frac{1}{(2\\pi)^{\\frac{d}{2}}|\\Sigma|^{\\frac{1}{2}}}e^{-\\frac{1}{2}z^Tz}\\\\\n",
"&= \\int dz (U\\Lambda^{\\frac{1}{2}}z+\\mu) \\frac{1}{(2\\pi)^{\\frac{d}{2}}}e^{-\\frac{1}{2}z^Tz}\\\\\n",
"\\end{align*}\n",
"\n",
"We also have $xx^T = (U\\Lambda^{\\frac{1}{2}}z + \\mu)(U\\Lambda^{\\frac{1}{2}}z + \\mu)^T = (U\\Lambda^{\\frac{1}{2}}z + \\mu)(z^T\\Lambda^{\\frac{1}{2}}U^T + \\mu^T) = U\\Lambda^{\\frac{1}{2}}zz^T\\Lambda^{\\frac{1}{2}}U^T + U\\Lambda^{\\frac{1}{2}}z\\mu^t +\\mu z^T\\Lambda^{\\frac{1}{2}}U^T + \\mu\\mu^T$\n",
"\n",
"take this into $E[xx^T]$ we obtain the corresponding formula."
]
}
],
"metadata": {