\chapter{From Vectors to Tensors} \label{ch:vectors}
\section{What we need to unlearn}
We are first introduced to vectors in two different yet closely related and simplified forms. We're now going to rethink them as an abstraction, which will require us to be careful not to depend on any intuitions derived from our earlier encounters.
That's not to say that we will be abandoning the schoolhouse version of vectors; rather, we will be properly placing them in the context of a more general framework. They will also often help ground us, as long as we recognise their limitations.
\subsection{Arrows with direction and length}
The first way to think of vectors is by visualising them as arrows that have a direction and a length. Two vectors $\vec{a}$ and $\vec{b}$ can be added (Figure \ref{fig:vector-addition}) by laying them head to tail, so the sum $\vec{c}$ is the vector starting at the tail of $\vec{a}$ and ending at the head of $\vec{b}$.
\begin{figure}[h]
\centering
\begin{tikzpicture}
\draw[thick,->] (0,0) -- (2,0.5);
\node at (1,0) {$\vec{a}$};
\draw[thick,->] (2,0.5) -- (3,2.5);
\node at (2.7,1.3) {$\vec{b}$};
\draw[thick,->] (0,0) -- (3,2.5);
\node at (1.5,1.6) {$\vec{c}$};
\end{tikzpicture}
\caption{Adding arrows.} \label{fig:vector-addition}
\end{figure}
Scaling a vector (multiplying it by a number) just alters its length without changing its direction, e.g. multiply by $0.5$ to shrink the vector to half its prior length.
We also learn about the dot product, a scalar-valued operator between two vectors, $\vec{p}\cdot\vec{q}$. If the two vectors $\vec{p}$ and $\vec{q}$ are separated by angle $\theta$, and we know the magnitude (length) of each vector, e.g. $\|\vec{p}\|$, then:
$$
\vec{p}\cdot\vec{q} = \|\vec{p}\| \|\vec{q}\|\cos{\theta}
$$
When we get onto the abstract definition of a vector it may seem like the geometric viewpoint has been relegated to a special case, less fundamental. But it is often useful to keep it in your mind as a way to visualise vectors of any kind, because however abstractly they are defined, they will always be closely analogous to the familiar arrows.
\subsection{Columns of numbers}
The second concrete way to think of vectors is as columns of ordinary numbers, and the number of \textit{dimensions} of the space tells us how many numbers a column vector has to contain. In this form, to add two vectors we just deal with the rows separately: add the numbers in row $1$, and then the numbers in row $2$ and so on for however many rows there are in a column vector, and thus obtain the sum as a column:
$$
\begin{bmatrix}2 \\ 0.5\end{bmatrix} +
\begin{bmatrix}1 \\ 2\end{bmatrix} =
\begin{bmatrix}3 \\ 2.5\end{bmatrix}
$$
More succinctly we can use index notation $a_n$ to mean the value in the $n$th row of the column associated with vector $\vec{a}$, so to add two vectors we just do this:
$$
a_n + b_n = c_n
$$
Scaling a vector just involves multiplying all the rows by the same number:
$$
d_n = x a_n
$$
The dot product is extremely simple in this representation: as with addition, you treat each row separately, multiplying the numbers in row $1$ and so on, but then you sum all the products to get a single numeric value:
$$
\sum_n a_n b_n
$$
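If it helps to see these row-by-row rules in executable form, here is a minimal sketch in Python with NumPy (purely illustrative; the variable names are ours):
\begin{verbatim}
import numpy as np

a = np.array([2.0, 0.5])    # the column vector a_n
b = np.array([1.0, 2.0])    # the column vector b_n

c = a + b                   # row-by-row addition: c_n = a_n + b_n
d = 0.5 * a                 # scaling: d_n = 0.5 a_n
dot = np.sum(a * b)         # dot product: sum over n of a_n b_n

print(c)     # [3.  2.5]
print(d)     # [1.   0.25]
print(dot)   # 3.0
\end{verbatim}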
\subsection{Coordinates}
These two perspectives are united by introducing a coordinate grid (Figure \ref{fig:vector-coordinate-grid}).
\begin{figure}[h]
\centering
\begin{tikzpicture}
\draw[step=1cm,gray,very thin] (0,0) grid (4,3);
\draw[thick,->] (0,0) -- (2,0.5);
\node at (1,0) {$\vec{a}$};
\draw[thick,->] (2,0.5) -- (3,2.5);
\node at (2.7,1.3) {$\vec{b}$};
\draw[thick,->] (0,0) -- (3,2.5);
\node at (1.5,1.6) {$\vec{c}$};
\end{tikzpicture}
\caption{Coordinate grid.} \label{fig:vector-coordinate-grid}
\end{figure}
Much of this subject is concerned with ensuring that our choice of coordinate grid doesn't get confused with the physical facts. We're trying to get answers about nature, and those answers better not change just because we used a different coordinate grid. One of the most important ideas in physics is that vectors are primarily geometric objects. They can be described with numeric coordinates, but there is no preferred coordinate basis. A vector has an independent existence, because it describes something in the physical world.
But however abstract things get, it can often be helpful to remember that you can visualise vectors as arrows and think of the basic operations on them geometrically, and equally it can be helpful to remember that we will always have a way of representing vectors as columns of numbers (indeed, columns of numbers \textit{are} vectors.)
The primary intuition we've been implicitly relying on so far is \textit{orthonormality}. With arrow vectors we can simply see when they are orthogonal, or to be more precise we can measure the angle between vectors, and we can measure their lengths, and we can choose a unit length, and so on. We can simply draw a unit vector, and then draw another unit vector that is orthogonal to it.
Likewise from the column vectors we have no difficulty choosing a set of orthonormal vectors, the \textit{standard basis}. They are \textit{one-hot}: all values are zero except for a single $1$. In an $n$-dimensional space there can only be $n$ such distinct vectors.
None of these intuitive leaps will be available with abstract vector spaces, and orthonormality cannot be used as an elemental building block. We will build a quite rich set of more fundamental concepts before we invent orthonormality.
By the way, when in physics we speak of a \textit{vector field}, that is, a vector at each point in space, such as wind speed and direction, or the electric field, we visualise arrows spread out over space. But the value of the field in two different places may be the same.
This is obvious (and less confusing) in the case of a scalar field, such as temperature. At two different locations in a room, the temperature may be the same. It's a numerical value that varies from place to place, and the same number may appear in two places.
But exactly the same is true for a vector field. If the wind is some particular speed and direction at two different places on the map, we say the vectors are equal: they are the \textit{same vector}. From the point of view of considering their equality, it is irrelevant that they are associated with different locations in physical space. In vector space, there is one vector with that direction and length.\footnote{Although it will turn out that the luxury of using the same vector space for every point in physical space is only afforded to us if we assume that physical space is flat, which we will for now.}
On to the abstract stuff.
\section{Vectors as elements of a vector space}\label{sec:vectors-space}
A vector space is a set of objects, called vectors, about which we assume nothing except that we can perform certain operations on them.
\subsection{They can be added}
There is an operator $+$ that takes two objects from the set and returns another from the same set (we say it's a \textit{closed} operator).
This operator is commutative:
$$\vec{u} + \vec{v} = \vec{v} + \vec{u}$$
and associative:
$$\vec{u} + (\vec{v} + \vec{w}) = (\vec{u} + \vec{v}) + \vec{w}$$
There is a special object called $0$ (the \textit{zero vector}), which makes no difference when added to any object from the set:
$$\vec{v} + 0 = \vec{v}$$
Also every object has an opposite, known as its additive inverse, so they pair up. The inverse of $\vec{v}$ is written as $-\vec{v}$, and:
$$\vec{v} + (-\vec{v}) = 0$$
The above can be written as $\vec{v} - \vec{v}$. Evidently $0$ is its own inverse.
Referring back to schoolhouse vectors, we can see how the arrows and the columns have an addition operation that satisfies all these requirements.
\subsection{They can be scaled}
For any vector space we must nominate an associated set of objects called scalars, having its own abstract requirements. In classical physics we almost always use the real numbers $\mathbb{R}$ as the scalars (in QM we use the complex numbers $\mathbb{C}$).
Our vectors can be multiplied by a scalar to get another object. Scaling them by $1$ makes no difference. Scaling them by $-1$ discovers the additive inverse.
Given two scalars $a$ and $b$, we can compute $c = ab$ and then scale an object $\vec{v}$ by it, or we can separately scale the object first by $a$ and then by $b$, and the result is the same:
$$(ab)\vec{v} = a(b\vec{v})$$
Scaling is distributive over addition of objects:
$$a(\vec{u} + \vec{v}) = a\vec{u} + a\vec{v}$$
And also over addition of scalars:
$$(a + b)\vec{v} = a\vec{v} + b\vec{v}$$
Again, arrows and columns have no problem meeting these requirements.
\subsection{Other Examples of Vector Spaces}
Any set of objects for which we can define these operations is a vector space, not just arrows and columns. The set of ordered tuples of real numbers $\mathbb{R}^n$ is just the column vectors with $n$ rows each. Also there is no reason why $n$ shouldn't be $1$, which means that the plain old set of real numbers $\mathbb{R}$ is also a vector space. Think of the real number line as My First Vector Space\texttrademark.
Also the complex numbers $\mathbb{C}$, and tuples of them $\mathbb{C}^n$, work just as well. The example of $\mathbb{C}$ as a vector space is particularly interesting because of its close similarity to $\mathbb{R}^2$. The major difference is that it has a definition of multiplication as a closed operation over its vectors (such that the product of two vectors is a vector), which is absolutely not a general feature of vector spaces.\footnote{Although it is also defined (very differently) in $\mathbb{R}^3$ as the cross product, $\times$.}
In quantum mechanics we will contend with infinite-dimensional complex vector spaces.
\subsection{Fields}
The kind of set that can serve as the scalars is called by mathematicians a \textit{field} (an unfortunate collision of terminology, given the very different meaning in physics): a set of objects on which we have defined addition, subtraction, multiplication and division. Real or complex numbers usually serve this purpose (and always do in physics). Vectors in general cannot serve as a field of scalars for other vector spaces, because the definition of a vector space says nothing about there being a natural way to multiply or divide pairs of vectors to obtain other vectors.
In the same way, you can't have a vector space of $\mathbb{R}$ over the field of $\mathbb{C}$, because although we can use regular multiplication to "scale" a vector from $\mathbb{R}$ by a scalar from $\mathbb{C}$, the result will in general be a member of $\mathbb{C}$ but not of $\mathbb{R}$, and thus not a vector from the same space.
Unless we say otherwise, we'll assume the field is $\mathbb{R}$.
\subsection{Finding a Basis}
If we select two vectors $\vec{a}$ and $\vec{b}$ from the space, we may find that they only differ by a scalar ratio $x$:
$$
\vec{a} = x \vec{b}
$$
If there is an $x$ that can scale $\vec{b}$ into $\vec{a}$ then those two vectors are \textit{collinear}\footnote{This literally means "on the same line". Note the interesting use of geometrical language, even though we're not supposed to be thinking about arrows in this abstract discussion.} (we are careful not to say they point in the same direction, because if $x$ is negative they point in exactly opposite directions, yet are still collinear.)
But if there is no such $x$ then they are not collinear. This gives them an interesting superpower:
$$
\vec{r} = x \vec{a} + y \vec{b}
$$
By varying the scalar coefficients $x$ and $y$ we can construct any vector $\vec{r}$ in a two-dimensional \textit{subspace} of the vector space.
We can generalise this idea a bit by rearranging the equation (absorbing a sign change into $x$). If for two vectors $\vec{a}$ and $\vec{b}$ we can find a scalar ${x}$ so that:
$$
\vec{a} + x \vec{b} = 0
$$
then they are collinear. Suppose the two vectors point in the same direction but $\vec{a}$ is twice the length of $\vec{b}$. Then we can set $x = -2$ and the sum will cancel out. This is only possible because they are collinear. If it's not possible, we've found a pair of \textit{linearly independent} vectors. Now we can look for a third:
$$
\vec{a} + x \vec{b} + y \vec{c} \ne 0
$$
Supposing we find such a vector $\vec{c}$, for which no choice of scalars $x$ and $y$ can make the left-hand side vanish, then we have found three linearly independent vectors. Or to put it another way, it is not possible to make $\vec{c}$ by any weighted sum of $\vec{a}$ and $\vec{b}$:
$$
\vec{c} \ne x \vec{a} + y \vec{b}
$$
Visualising geometrically, we could say that $\vec{c}$ points outside of the planar subspace reachable by linear combination of $\vec{a}$ and $\vec{b}$. Now we can construct any vector in three dimensions:
$$
\vec{r} = x \vec{a} + y \vec{b} + z \vec{c}
$$
Eventually we may find (assuming the space is finite dimensional) that it is not possible to extend our linearly independent set $\vec{a}, \vec{b}, \vec{c}, ...$ any further. The size of this set tells us how many dimensions the space has, and these vectors are said to \textit{span} the space.
The vitally important thing to realise about this is that at no point have we said that these vectors are orthogonal. We haven't even defined what that means yet. We've only defined the property of linear independence. Nevertheless we have arrived at the idea of a coordinate grid; it's just that our grid may be awkwardly slanted, made of identical parallelogram tiles rather than identical square tiles.
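As a quick computational aside (a sketch in Python with NumPy; nothing here is part of the formal development), we can test a proposed set of column vectors for linear independence by checking the rank of the matrix whose columns they form. Note that no notion of orthogonality is involved:
\begin{verbatim}
import numpy as np

def linearly_independent(*vectors):
    # True if no vector in the list is a weighted sum of the others.
    m = np.column_stack(vectors)
    return np.linalg.matrix_rank(m) == len(vectors)

a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0])   # not orthogonal to a, but independent of it
c = np.array([3.0, 2.0, 0.0])   # c = a + 2b, so it adds nothing new
d = np.array([0.0, 1.0, 1.0])

print(linearly_independent(a, b))      # True
print(linearly_independent(a, b, c))   # False: c lies in the span of a and b
print(linearly_independent(a, b, d))   # True: these three span the 3D space
\end{verbatim}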
So we don't have to keep choosing letters, we will label each dimension with an integer.\footnote{Note that this use of integers is quite wasteful, as usually they can be added, multiplied, etc. but we will just be using integers as mere labels.} The set of linearly independent vectors that we can use to construct any other vector in the space is called a \textit{basis}. The basis vectors are traditionally written as $\vec{e}_n$, where $n$ is often 1-based (although in Relativity it may be 0-based; this is purely a notational convention and makes no arithmetic difference). The scalar coefficients, which we will call coordinates, that construct a given vector $\vec{r}$ can also be numbered, conventionally with superscript $r^n$:
\begin{equation}
\begin{split}
\vec{r} &= r^1 \vec{e}_1 + r^2 \vec{e}_2 + ... + r^n \vec{e}_n \\
&= \sum_n r^n \vec{e}_n
\end{split}
\end{equation}
The use of a superscript index is obviously asking for trouble given that it looks like we're raising $r$ to a power\footnote{or possibly indicating an undetermined footnote?}, but this notation is universal in physics so we may as well get used to it.
Having chosen a basis, we can describe any vector with a tuple of coordinates $r^n$, so any vector space of dimension $N$ whose scalar field is $\mathbb{F}$ must be isomorphic with $\mathbb{F}^N$. In other words, all vectors can be described by column vectors, but the numbers in the columns will depend on our choice of basis.
But the laws of physics cannot possibly care what basis we choose, so we need ways of obtaining numeric facts about vectors that do not depend on the choice of basis.
\section{Covectors} \label{covector}
Think of a scalar-valued function of a vector. That is, a black box with a single input slot accepting a vector $\vec{a}$, and an output hole that gives us back a scalar $x$ (Figure \ref{fig:1-slot-box}).
\begin{figure}[h]
\centering
\begin{tikzpicture}
\draw[thick] (0,0) -- (2,0);
\draw[thick] (2,0) -- (2,2);
\draw[thick] (2,2) -- (0,2);
\draw[thick] (0,2) -- (0,0);
\draw[thick] (0,2) -- (-0.7,2.5);
\draw[thick] (-0.7,2.5) -- (-0.7,0.7);
\draw[thick] (-0.7,0.7) -- (0,0);
\draw[thick] (-0.7,2.5) -- (1.1,2.5);
\draw[thick] (1.1,2.5) -- (2,2);
\draw[thick] (1.4,2.1) -- (0.95,2.4);
\draw[thick] (0.95,2.4) -- (0.7,2.4);
\draw[thick] (0.7,2.4) -- (1.1,2.1);
\draw[thick] (1.1,2.1) -- (1.4,2.1);
\node at (1.1,3.1) { $\vec{a}$};
\draw[->] (1.1,2.85) -- (1.1,2.55);
\node at (-1.25,1) { $x$};
\draw[thick] (-0.4,1) ellipse (0.1 and 0.2);
\draw[->] (-0.6,1) -- (-1.1,1);
\node at (1,1) { $f$};
\end{tikzpicture}
\caption{Function $f$ with a single slot accepting a vector} \label{fig:1-slot-box}
\end{figure}
More precisely, this machine is a mapping from the vector space to the real numbers. There are infinitely many such mappings and they could be arbitrarily complicated. We will restrict ourselves to a simple subset of these mappings.
First, we note that it is possible to define the addition operator on mappings:
$$
f(\vec{a}) = g(\vec{a}) + h(\vec{a})
$$
That is, it could be that inside the box $f$, there are concealed two boxes $g$ and $h$. When $f$ receives an input vector $\vec{a}$, it passes it to both $g$ and $h$, and adds their results together to obtain its own result. Note that we haven't yet restricted the complexity of $g$ and $h$; we have no idea what they do to produce their individual results.
Likewise, it is possible to scale a mapping by a factor $x$:
$$
f(\vec{a}) = x g(\vec{a})
$$
The trick to restricting the complexity of our set of possible mappings is to require that each mapping is \textit{linear} in its input: $f(x\vec{a} + y\vec{b}) = x f(\vec{a}) + y f(\vec{b})$. Mappings that obey this rule can be added and scaled, using the definitions above, to produce other mappings that also obey it, and combinations of these operations produce consistent results. But if we do that, then we have also ensured that the set of allowed mappings actually \textit{is} a vector space. Every mapping we care about must be a vector chosen from that space.
Note that we haven't proven that every possible mapping is a vector in this space. We've merely restricted ourselves to only considering a subset of mappings, the linear ones, which can be scaled and added to find other mappings from the same restricted subset, such that scaling a mapping by 5 is the same as scaling that mapping by 2 and separately by 3 and then adding those two scaled mappings.
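To make the restriction concrete, here is a small illustrative sketch in Python with NumPy (our own construction, nothing canonical): a covector implemented as a linear mapping from $\mathbb{R}^3$ to $\mathbb{R}$, together with a check that it really is linear and that such mappings can be added and scaled to give more mappings of the same kind:
\begin{verbatim}
import numpy as np

def covector(components):
    # A linear mapping V -> R, represented here by a row of components.
    c = np.asarray(components, dtype=float)
    return lambda v: float(np.sum(c * v))

g = covector([1.0, -2.0, 0.5])
h = covector([3.0,  0.0, 1.0])

a = np.array([2.0, 1.0, 4.0])
b = np.array([0.0, 5.0, -1.0])

# Each mapping is linear in its input vector:
print(np.isclose(g(2 * a + 3 * b), 2 * g(a) + 3 * g(b)))   # True

# Mappings can be added and scaled pointwise, and the result is
# again a linear mapping of the same kind:
f = lambda v: 5 * g(v) + h(v)
print(np.isclose(f(2 * a + 3 * b), 2 * f(a) + 3 * f(b)))   # True
\end{verbatim}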
If we label the original vector space $V$ then this associated vector space of mappings $V \mapsto \mathbb{R}$ is written as $V^*$ and is called the dual space of $V$. All we've discovered so far about $V$ also applies to $V^*$, including the idea of a set of mappings being linearly independent, which means we can select a basis of mappings chosen from $V^*$ and thus construct any mapping from it by scaling and adding the basis mappings. That's quite a leap, so pause to digest it. The moment you discover a set of objects is a vector space, you know you can choose a basis, and then describe anything in that space in terms of a weighted sum of that basis.
We call the mappings taken from $V^*$ \textit{covectors}. The basis covectors are labelled with superscripts $\vec{e}^n$ and the coordinates with subscripts $f_n$, so we can build any covector from the chosen basis:
\begin{equation}
\begin{split}
\vec{f} &= f_1 \vec{e}^1 + f_2 \vec{e}^2 + ... + f_n \vec{e}^n \\
&= \sum_n f_n \vec{e}^n
\end{split}
\end{equation}
It follows that, just as $N$-dimensional vectors are isomorphic with columns of $N$ scalars, so too are their associated covectors.
It is sometimes suggested that all vector spaces have a dual space, as if this was some property hiding in the definition of a vector space. But in truth we have conjured the dual space into existence, first by inventing the idea of a mapping $V \mapsto \mathbb{R}$, then by defining operations on those mappings, then by considering the set of all possible mappings, and finally by imposing the rules of vector spaces, which restricts us to a subset of the possible mappings that we named covectors. There is nothing particularly automatic about this. We made it happen by being curious about ways in which vectors might be mapped to scalars.
Another important point to note is that as covectors are vectors, the operation we've been writing as $f(\vec{a})$ is in fact much more symmetrical than that notation implies. We combine a vector from $V$ and a covector from $V^*$ and this produces a scalar from $\mathbb{R}$.
So we could equally say that a vector "operates" on a covector to produce the scalar. From the point of view of $V^*$ it is $V$ that is the dual space. There is a more symmetrical notation\footnote{Unfortunately there is almost no consistency on notation in this topic; we're just picking one of many possible notations for this.} we can use to make this clear:
$$\langle \vec{f},\vec{a}\rangle$$
In this notation, the left and right sides of the operation are from mutually dual spaces, mirror opposites that annihilate one another leaving only a scalar residue.
Although we will mostly think of a covector as a function and a vector as something that can be a parameter to a covector, keep in mind that just as we've thought of a covector as machine that accepts a vector as input, we could just as well think of a vector as a machine that accepts a covector as input.
\subsection{Connecting the Dual Spaces}
We obviously have a lot of freedom when choosing a basis in either $V$ or $V^*$. What can we usefully do to relate the two sides? We've seen how a covector may be built as a weighted sum of basis covectors $\vec{e}^i$ from $V^*$:
$$
\vec{f} = \sum_i f_i \vec{e}^i
$$
And likewise a vector is built as a weighted sum of basis vectors $\vec{e}_i$ from $V$:
$$
\vec{v} = \sum_i v^i \vec{e}_i
$$
If we have the vector $\vec{v}$ and we want to extract its $i$th coordinate, $v^i$, that's a function from a vector in $V$ to a scalar, that is, it's a covector from $V^*$. We could choose the basis covectors so that the $i$th basis covector extracts the $i$th coordinate of the vector passed to it:
$$
v^i = \langle \vec{e}^i , \vec{v} \rangle
$$
Equivalently, if we have a covector $\vec{f}$ and we want to extract its $j$th coordinate, $f_j$, then we need to pass $\vec{f}$ an input vector chosen from $V$, and we could choose the basis vectors so that the $j$th basis vector makes $\vec{f}$ produce the $j$th coordinate of $\vec{f}$:
$$
f_j = \langle \vec{f} , \vec{e}_j\rangle
$$
It doesn't matter which of those two ways we approach this, because either will constrain the other. We can substitute $\vec{f}$ expressed as a sum:
$$
f_j = \langle \sum_i f_i \vec{e}^i , \vec{e}_j\rangle
$$
and linearity allows us to separately deal with each dimension and sum their results:
$$
f_j = \sum_i f_i \langle \vec{e}^i , \vec{e}_j\rangle
$$
But if that's true for \textit{any} $\vec{f}$, and not just a coincidence applying to some specific example, then the scalar factor $\langle \vec{e}^i , \vec{e}_j\rangle$ must be "selecting" just one of the $f_i$ terms, specifically the one where $i = j$, and eliminating all others, or to put it more succinctly using the Kronecker delta (§\ref{def:Kronecker}):
$$
f_j = \sum_i f_i \delta \indices{^i_j}
$$
So if the $f_j$ are indeed the components of the covector, discovered by making it act on the basis vectors, we've discovered the relationship that must exist between the dual bases:
\begin{equation}
\langle \vec{e}^i,\vec{e}_j\rangle = \delta\indices{^i_j}
\label{eqn:dual-bases-delta}
\end{equation}
So we could choose any basis at all in $V$, and then definition \eqref{eqn:dual-bases-delta} restricts the choice of basis in $V^*$, or vice versa. Such is the symmetry of this situation that we could instead have let a basis covector $\vec{e}^j$ act on a randomly chosen vector $\vec{a}$ and required that this give us the $j$th coordinate of $\vec{a}$, and we'd have reached the same conclusion.
From now on we'll assume that this alignment of the dual bases has been performed. That being the case, we can compute the action of a covector on a vector by arithmetic on their coordinates:
\begin{equation}
\begin{split}
\langle \vec{f},\vec{a}\rangle
&= \langle \sum_i f_i \vec{e}^i , \sum_j a^j \vec{e}_j \rangle \\
&= \sum_{ij} f_i a^j \langle\vec{e}^i,\vec{e}_j\rangle \\
&= \sum_{i} f_i a^i
\end{split}
\end{equation}
So the linearity allows us to sum over all combinations of $i, j$ and pull the coordinates outside of the action of the covector on the vector, and the dual bases yield the value $1$ where $i = j$ and $0$ otherwise, so we just end up with a simple sum over the products of the paired-up coordinates.
In a roundabout way we've discovered the dot product, albeit between a covector and a vector rather than two ordinary vectors. We still haven't introduced any concept of orthonormality, or even orthogonality, between pairs of vectors. Our "coordinate grid" is still not necessarily a lattice of squares, and our dot product is between elements of two different (dual) vector spaces, but we always choose their basis vectors so they are related by a definite requirement, which we can state in two ways:
\begin{enumerate}
\item The $n$th basis covector from $V^*$ can be used to extract the $n$th coordinate of a vector from $V$.
\item The $n$th basis vector from $V$ can be used to extract the $n$th coordinate of a covector from $V^*$.
\end{enumerate}
And as a consequence of this (dual) requirement we find that when the $n$th basis covector acts on the $m$th basis vector, the result is $1$ if $m = n$ and $0$ if $m \ne n$.
If we describe our covectors and vectors as sets of coordinates, to make a covector act on a vector we simply perform the dot product between their coordinates. Or equivalently, we write the covector as a single row matrix on the left, and the vector as a single column matrix on the right, and perform matrix multiplication to get a single scalar.
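A tiny sketch of that row-times-column computation of $\langle \vec{f},\vec{a}\rangle$ in Python with NumPy (illustrative only, with made-up components):
\begin{verbatim}
import numpy as np

f_row = np.array([[3.0, -1.0, 2.0]])    # covector components f_i as a 1x3 row
a_col = np.array([[1.0], [4.0], [2.0]]) # vector components a^i as a 3x1 column

result = f_row @ a_col                  # matrix multiplication: sum_i f_i a^i
print(result)                           # [[3.]] -- a single scalar: 3 - 4 + 4
\end{verbatim}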
Does this mean we've created a dual link between \textit{every} vector and a corresponding covector? Absolutely not. We've only linked up the basis vectors with the basis covectors. Still, there is an obvious mapping between vectors and covectors: let the covector have exactly the same coordinates as its vector pair, $f_i = v^i$. And indeed this is the assumption made with schoolhouse vectors, where we freely perform the dot product between vectors described by coordinates, without worrying about whether they are from mutually dual spaces.
But that is only one possible mapping out of an infinity of possibilities, as we could blend the coordinates by any weighted sum we like. So connecting the basis covectors with the basis vectors is a start, but still leaves something to be desired.
\subsection{Visualising the dual basis} \label{sec:dual-bases}
It may be worth pausing here to see how this result relates to our schoolhouse version of arrows and coordinates and the dot product. In that world-view, the coordinates are just the scaling factors that weight the orthonormal basis vectors to construct a vector, and the dot product of a basis vector $\vec{e}_i$ and a given vector $\vec{a}$ produces the $a_i$ coordinate of $\vec{a}$. This is how we understand the geometric dot product (with $\cos \theta$) to be related to the idea of simply plucking one of the numbers from a column vector.
But what happens if we deny all knowledge of orthogonality? If we choose any linearly independent basis vectors (Figure \ref{fig:vectors-non-orth-1}) we can still sum them to generate any vector in the space (Figure \ref{fig:vectors-non-orth-2}).
\begin{figure}[h]
\caption{Building a vector from any basis}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}
\node at (1.1,-0.25) {\scriptsize $\vec{e_1}$};
\draw[thick,->] (0,0) -- (0.342,0.940);
\node at (0,1) {\scriptsize $\vec{e_2}$};
\draw[thick,->] (0,0) -- (1,0);
\end{tikzpicture}
\caption{Any old basis $\vec{e}_n$} \label{fig:vectors-non-orth-1}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}
\draw[dashed] (0,0) -- (3,0);
\draw[dashed] (3,0) -- (3.684,1.879);
\draw[dashed] (0,0) -- (0.684,1.879);
\draw[dashed] (0.684,1.879) -- (3.684,1.879);
\node at (1.1,-0.25) {\scriptsize $\vec{e_1}$};
\draw[thick,->] (0,0) -- (0.342,0.940);
\node at (0,1) {\scriptsize $\vec{e_2}$};
\draw[thick,->] (0,0) -- (1,0);
\draw[thick,->] (0,0) -- (3.684,1.879);
\node at (3.85,1.9) {\scriptsize $\vec{a}$};
\end{tikzpicture}
\caption{$\vec{a} = a^1 \vec{e}_1 + a^2 \vec{e}_2$} \label{fig:vectors-non-orth-2}
\end{subfigure}
\end{figure}
The problem comes when we try to recover the coordinates by projecting $\vec{a}$ onto the two basis vectors (Figure \ref{fig:vectors-non-orth-3}).
\begin{figure}[h]
\caption{Projecting a vector onto a carelessly chosen basis}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}
\draw[dashed] (0,0) -- (3.684,0);
\draw[dashed] (3.683,0) -- (3.684,1.879);
\draw[dashed] (0,0) -- (0.992,2.726);
\draw[dashed] (0.992,2.726) -- (3.684,1.879);
\node at (1.1,-0.25) {\scriptsize $\vec{e_1}$};
\draw[thick,->] (0,0) -- (0.342,0.940);
\node at (0,1) {\scriptsize $\vec{e_2}$};
\draw[thick,->] (0,0) -- (1,0);
\draw[thick,->] (0,0) -- (3.684,1.879);
\node at (3.85,1.9) {\scriptsize $\vec{a}$};
\end{tikzpicture}
\caption{Projecting $\vec{a}$ back onto the basis} \label{fig:vectors-non-orth-3}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}
\draw[dashed] (0,0) -- (3.684,-1.339);
\draw[dashed] (3.683,-1.339) -- (3.684,1.879);
\draw[dashed] (0,0) -- (0,3.218);
\draw[dashed] (0,3.218) -- (3.684,1.879);
\node at (1.2,-0.20) {\scriptsize $\vec{e^1}$};
\draw[thick,->] (0,0) -- (0,1);
\node at (-0.2,1.2) {\scriptsize $\vec{e^2}$};
\draw[thick,->] (0,0) -- (0.992,-0.342);
\draw[thick,->] (0,0) -- (3.684,1.879);
\node at (3.85,1.9) {\scriptsize $\vec{a}$};
\end{tikzpicture}
\caption{$\vec{a} = a_1 \vec{e^1} + a_2 \vec{e^2}$} \label{fig:vectors-non-orth-4}
\end{subfigure}
\end{figure}
We can visualise this projection process by drawing lines from the tip of $\vec{a}$ so they meet at right angles with the lines extended from the basis vectors. But these imply different coordinates for $\vec{a}$ from the ones that we used to build it using the basis $\vec{e}_n$.
This raises the question: in what basis are these the coordinates for $\vec{a}$? There is such a basis (Figure \ref{fig:vectors-non-orth-4}), $\vec{e^n}$, and we label the coordinates with subscripts, $a_n$, so the reconstructed $\vec{a}$ is given by:
$$
\vec{a} = a_1 \vec{e^1} + a_2 \vec{e^2}
$$
This basis $\vec{e}^j$ is related to the original basis $\vec{e}_i$ by \eqref{eqn:dual-bases-delta}. Looking at it geometrically (that is, cheating), when choosing the dual basis vector for a given index, we must choose a vector that is visibly orthogonal to all the other basis vectors, and this means we will have a severely limited choice, because there can be only one alignment that meets this requirement. Furthermore the magnitude of the vector $\vec{e}^i$ is fully determined by the requirement that $\langle \vec{e}_i, \vec{e}^i \rangle = 1$, as the ratio between the coordinates is already fixed by the choice of alignment.
In this visualisation process we have shown the relationship between $V$ and $V^*$ by overlaying them on the same diagram, but they are in fact separate vector spaces: elements of $V$ are not elements of $V^*$, and vice versa. Still, the way we have calibrated these two sets of arrows to be mutually consistent is exactly the relationship \eqref{eqn:dual-bases-delta} between the bases of $V$ and $V^*$.
If the original basis vectors had been orthogonal, the dot product would have produced exactly the same coordinates we'd used to build the vector in the first place, i.e. figures \ref{fig:vectors-non-orth-2}, \ref{fig:vectors-non-orth-3} and \ref{fig:vectors-non-orth-4} would all be identical: a rectangle with the vector as its diagonal. But of course, we haven't yet said precisely what orthogonality means.
\subsection{The same ideas in coordinates}
We can make this concrete by playing with $\mathbb{R}^2$ as our vector space $V$, in which case the dual space of covectors $V^*$ contains mappings $\mathbb{R}^2 \mapsto \mathbb{R}$.
\begin{figure}[h]
\caption{Basis vectors in $\mathbb{R}^2$}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}
\draw[step=1cm,gray,very thin] (1,1) grid (4,4);
\draw[thick,->] (2,2) -- (3,2);
\node at (2.5,1.5) {$\vec{e_1}$};
\draw[thick,->] (2,2) -- (2,3);
\node at (1.5,2.5) {$\vec{e_2}$};
\end{tikzpicture}
\caption{Orthonormality} \label{fig:vectors-orthonormality}
\end{subfigure}
\begin{subfigure}{0.5\textwidth}
\centering
\begin{tikzpicture}
\draw[step=1cm,gray,very thin] (1,1) grid (4,4);
\draw[thick,->] (2,2) -- (4,3);
\node at (3.2,2.1) {$\vec{e_1}$};
\draw[thick,->] (2,2) -- (2,3);
\node at (1.5,2.5) {$\vec{e_2}$};
\end{tikzpicture}
\caption{Awkwardness} \label{fig:vectors-awkwardness}
\end{subfigure}
\end{figure}
With our schoolhouse foreknowledge it would be easy to choose an orthonormal basis in $\mathbb{R}^2$ (Figure \ref{fig:vectors-orthonormality}):
$$
\vec{e}_1 = \begin{bmatrix}1 \\ 0\end{bmatrix}\,,\,
\vec{e}_2 = \begin{bmatrix}0 \\ 1\end{bmatrix}
$$
But we still haven't defined what orthonormal means, so we'll just choose something awkward (Figure \ref{fig:vectors-awkwardness}):
$$
\vec{e}_1 = \begin{bmatrix}2 \\ 1\end{bmatrix}\,,\,
\vec{e}_2 = \begin{bmatrix}0 \\ 1\end{bmatrix}
$$
By the way, it is customary to put the basis vectors in a row matrix, $\begin{bmatrix}\vec{e}_1 & \vec{e}_2\end{bmatrix}$, so they can be matrix-multiplied by a column representation of a vector in $V$, but that's not what we're doing here. We are giving the definition of each basis vector as a matrix, and the basis vectors are ordinary vectors belonging to $V$, and customarily they are presented as column matrices.
What is the corresponding $V^*$ basis, $\vec{e}^i$? It has to obey:
$$
\langle \vec{e}^i,\vec{e}_j \rangle = \delta\indices{^i_j}
$$
Some straightforward equation building and substitution yields:
$$
\vec{e}^1 = \begin{bmatrix}0.5 & 0\end{bmatrix}\,,\,
\vec{e}^2 = \begin{bmatrix}-0.5 & 1\end{bmatrix}
$$
And these being covectors from $V^*$, we present them as row matrices. Using our $V$ basis we can construct a vector $\vec{v}$ from the coordinates $(2, 3)$:
$$
\vec{v} = 2\vec{e}_1 + 3\vec{e}_2
= \begin{bmatrix}4 \\ 2\end{bmatrix} + \begin{bmatrix}0 \\ 3\end{bmatrix}
= \begin{bmatrix}4 \\ 5\end{bmatrix}
$$
What happens if we evaluate the $V^*$ basis covectors against $\vec{v}$?
$$
\begin{bmatrix}0.5 & 0\end{bmatrix} \begin{bmatrix}4 \\ 5\end{bmatrix} = 2
\,,\,
\begin{bmatrix}-0.5 & 1\end{bmatrix} \begin{bmatrix}4 \\ 5\end{bmatrix} = 3
$$
We get back the correct coordinates. If we'd just transposed the $V$ basis vectors into rows and left-multiplied them, we would have obtained wrong answers: this is precisely the same problem we saw with projecting onto the non-orthogonal basis.
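The dual basis above wasn't pulled out of thin air: if we stack the chosen basis vectors as the columns of a matrix $E$, then the rows of $E^{-1}$ automatically satisfy \eqref{eqn:dual-bases-delta}, so they are the dual basis covectors. A short sketch in Python with NumPy (illustrative, using the numbers from this example) confirms the whole story:
\begin{verbatim}
import numpy as np

# The awkward basis of V, as the columns of a matrix.
E = np.array([[2.0, 0.0],
              [1.0, 1.0]])          # columns are e_1 = (2,1), e_2 = (0,1)

# The dual basis covectors are the rows of the inverse of E,
# because (E^-1) E = I is exactly <e^i, e_j> = delta^i_j.
E_dual = np.linalg.inv(E)
print(E_dual)                       # [[ 0.5  0. ]
                                    #  [-0.5  1. ]]

# Build v from coordinates (2, 3) in the awkward basis...
v = 2 * E[:, 0] + 3 * E[:, 1]
print(v)                            # [4. 5.]

# ...and recover those coordinates with the dual basis covectors.
print(E_dual @ v)                   # [2. 3.]
\end{verbatim}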
But the utility of these basis covectors is limited to their ability to extract a scalar coordinate from a vector. For example, there is nothing here that generally relates any vector (other than the basis vectors) with a specific covector partner, or anything that relates one vector with another.
\section{Tensors}
Let's upgrade our black box machine so it has two input slots, accepting vectors $\vec{a}$ and $\vec{b}$ from the same vector space $V$, but still one output hole that gives us back a scalar $x$ (Figure \ref{fig:2-slot-box}).
\begin{figure}[h]
\centering
\begin{tikzpicture}
\draw[thick] (0,0) -- (2,0);
\draw[thick] (2,0) -- (2,2);
\draw[thick] (2,2) -- (0,2);
\draw[thick] (0,2) -- (0,0);
\draw[thick] (0,2) -- (-0.7,2.5);
\draw[thick] (-0.7,2.5) -- (-0.7,0.7);
\draw[thick] (-0.7,0.7) -- (0,0);
\draw[thick] (-0.7,2.5) -- (1.1,2.5);
\draw[thick] (1.1,2.5) -- (2,2);
\draw[thick] (1.4,2.1) -- (0.95,2.4);
\draw[thick] (0.95,2.4) -- (0.7,2.4);
\draw[thick] (0.7,2.4) -- (1.1,2.1);
\draw[thick] (1.1,2.1) -- (1.4,2.1);
\draw[thick] (0.6,2.1) -- (0.15,2.4);
\draw[thick] (0.15,2.4) -- (-0.1,2.4);
\draw[thick] (-0.1,2.4) -- (0.3,2.1);
\draw[thick] (0.3,2.1) -- (0.6,2.1);
\node at (0.2,3.1) { $\vec{a}$};
\draw[->] (0.2,2.85) -- (0.2,2.55);
\node at (1.1,3.1) { $\vec{b}$};
\draw[->] (1.1,2.85) -- (1.1,2.55);
\node at (-1.25,1) { $x$};
\draw[thick] (-0.4,1) ellipse (0.1 and 0.2);
\draw[->] (-0.6,1) -- (-1.1,1);
\node at (1,1) { $\vec{h}$};
\end{tikzpicture}
\caption{Box $\vec{h}$ with two slots accepting vectors} \label{fig:2-slot-box}
\end{figure}
This is a mapping from pairs of vectors to scalars: $V \times V \mapsto \mathbb{R}$.
Suppose \textit{only to begin with} that the machine had an especially simple inner mechanism: inside the box $\vec{h}$, there are two single-slot boxes (covectors) $\vec{f}$ and $\vec{g}$. The machinery inserts input $\vec{a}$ into box $\vec{f}$, and input $\vec{b}$ into box $\vec{g}$, to obtain two scalars, which it simply multiplies together to produce its own resultant scalar that falls out of the hole of $\vec{h}$. In other words, it's really just two covectors glued together by scalar multiplication, and they act independently on the inputs:
$$
h(\vec{a}, \vec{b}) = \langle \vec{f},\vec{a} \rangle \langle \vec{g},\vec{b} \rangle
$$
In fact this design only accounts for a subset of the machines, but it will serve as an intuitive building block for us to construct all the machines we're interested in. We can write it as $\vec{f} \otimes \vec{g}$, which is called the \textit{tensor product} of the two covectors.
So sticking with this simplified design to begin with, consider the space of all such machines containing two covectors. We can define addition and scaling on these pairs in a way that satisfies the requirements of a vector space. Addition is easy. Much as we defined addition for covectors by simply adding their results when given the same input vector, we'll add these two-slot machines by adding their results when they act on the same pair of vectors:
\begin{equation}
\begin{split}
(\vec{f} \otimes \vec{g} + \vec{p} \otimes \vec{q})(\vec{a}, \vec{b})
&=
\vec{f} \otimes \vec{g} (\vec{a}, \vec{b}) + \vec{p} \otimes \vec{q}(\vec{a}, \vec{b}) \\
&= \langle \vec{f}, \vec{a} \rangle
\langle \vec{g}, \vec{b} \rangle
+ \langle \vec{p}, \vec{a} \rangle
\langle \vec{q}, \vec{b} \rangle
\end{split}
\end{equation}
Scaling by some $x$ is even easier:
$$
\left[x(\vec{f} \otimes \vec{g})\right](\vec{a}, \vec{b})
= x(\vec{f} \otimes \vec{g})(\vec{a}, \vec{b})
= x \langle \vec{f}, \vec{a} \rangle
\langle \vec{g}, \vec{b} \rangle
$$
Thus we have defined a vector space, and so these two-slot machines are also vectors. We can form a basis for that space by taking all possible pairs of basis covectors, $\vec{e}^i \otimes \vec{e}^j$. If the (co)vector space is $N$-dimensional, the pair-space will be $N^2$-dimensional, because it requires $N^2$ basis machines to span the space. Any two-slot machine can therefore be written as a linear combination (a weighted sum) of all the basis machines:
$$
\sum_{ij} M_{ij} (\vec{e}^i \otimes \vec{e}^j)
$$
And therefore to describe any two-slot machine in terms of the basis we will need $N^2$ numbers, which we can write as $M_{ij}$. By the way, there's no pressing need to think of it as a matrix, although we sometimes do. The $M$ stands for "machine" in this case. It's just a list of $N^2$ numbers, labelled with two indices that each can take on $N$ integer values, supplying the weighting for each basis machine.
Inserting two vectors $\vec{a}$ and $\vec{b}$ into the slots just means:
$$
\sum_{ij} M_{ij} \langle \vec{e}^i,\vec{a} \rangle \langle \vec{e}^j,\vec{b} \rangle
$$
Recall that an expression like $\langle \vec{e}^i,\vec{a} \rangle$ is the scalar resulting from $\vec{e}^i$ acting on $\vec{a}$. But the input vector $\vec{a}$ can itself be expanded in the basis of $V$ as $a^k \vec{e}_k$, giving:
$$
\sum_{ijk} M_{ij} \langle \vec{e}^i, a^k \vec{e}_k \rangle \langle \vec{e}^j,\vec{b} \rangle
$$
and the same for $\vec{b}$:
$$
\sum_{ijkl} M_{ij} \langle \vec{e}^i, a^k \vec{e}_k \rangle \langle \vec{e}^j,b^l \vec{e}_l \rangle
$$
Both these substitutions required us to introduce a new summation index, because we are essentially "multiplying out" between the existing expression's summation terms and those of the vector we are substituting. So if the vector space is $2$-dimensional, the above is summing $2 \times 2 \times 2 \times 2 = 16$ terms. By linearity:
$$
\sum_{ijkl} M_{ij} a^k \langle \vec{e}^i, \vec{e}_k \rangle b^l \langle \vec{e}^j, \vec{e}_l \rangle
$$
We know the basis covectors acting on the basis vectors have very simple results by definition: $\langle \vec{e}^i, \vec{e}_k \rangle$ evaluates to $1$ if $i = k$, but is $0$ otherwise. Likewise $\langle \vec{e}^j, \vec{e}_l \rangle$ is $1$ if $j = l$ but $0$ otherwise. Therefore the $12$ summation terms where either $i \ne k$ or $j \ne l$ must vanish, leaving only $4$ terms where they are equal and the covector-vector interactions are simply replaced with $1$. Therefore we can replace $k$ with $i$ and $l$ with $j$ throughout:
\begin{equation}
\sum_{ij} M_{ij} a^i b^j
\label{eqn:two-slot-computation}
\end{equation}
So to compute the scalar result we only need the coordinates of the two input vectors and a list of numeric parameters that fully defines how the machine operates, creating summation terms that contribute various weightings of every possible combination of coordinates from the two vectors.
This kind of machine is called a \textit{tensor}.
It will often be the case that $M_{ij} = M_{ji}$, which is quite a lot of redundancy, but that redundancy is the price of making the machine symmetrical: switching the inputs around does not affect the result. Of course, a machine doesn't have to be defined that way.
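To see \eqref{eqn:two-slot-computation} as an explicit computation, here is a small sketch in Python with NumPy (an illustration with made-up numbers), writing the double sum out as nested loops and checking it against the equivalent matrix expression:
\begin{verbatim}
import numpy as np

N = 2
M = np.array([[1.0, 2.0],
              [2.0, 5.0]])          # the machine's parameters M_ij (symmetric here)
a = np.array([1.0, 3.0])            # coordinates a^i
b = np.array([4.0, -1.0])           # coordinates b^j

# The double sum: sum over i and j of M_ij a^i b^j
total = 0.0
for i in range(N):
    for j in range(N):
        total += M[i, j] * a[i] * b[j]
print(total)                        # 11.0

# The same thing as row-matrix-column multiplication.
print(a @ M @ b)                    # 11.0
\end{verbatim}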
\subsection{Simple tensors} \label{simple-tensor}
We mentioned at the start that the simplified design (the ordinary product of two scalars obtained by two covectors operating separately on one vector input each) is not powerful enough to describe all these machines, even though we used it to define our basis machines. The simplistic machine is defined by a matrix that can be written as:
$$
M_{ij} = f_i g_j
$$
In other words, it can be decomposed into two separate columns of numbers. Such a machine is known as a \textit{decomposable}, \textit{elementary} or just \textit{simple} tensor.
One hint as to why it is so limited is that $f_i$ and $g_j$ provide $N$ values each for an $N$-dimensional space, which is only $2N$ adjustable parameters, even though $M$ appears to have $N^2$ independent values. So we aren't allowing the full flexibility of which $M$ is capable.
Of course, if $N=2$ then $N^2 = 2N = 4$, but even then, there is a restrictive pattern that applies regardless of the dimensions. Viewing $M_{ij}$ as a matrix, every row (labelled by $i$) would be a scaled version of the numbers in $g_j$, and every column (labelled by $j$) would likewise be a scaled version of the numbers in $f_i$. That is, the rows are all linearly dependent on one another, and so are the columns. This would not be the case if the elements of $M$ were truly independent.\footnote{Can every matrix be expressed as such a product of a column and a row? No. Try to find a column and a row whose product is the identity matrix: it's impossible, because the identity's columns (and rows) are linearly independent, whereas here the rows must all be scalings of one another.}
There's a subtlety here though: vector addition and scaling operators are meant to be closed. We've proposed a way of defining simple machines, which is a restricted set of objects, and then we've said that scaling and adding simple machines allows us to discover objects that are not in that simple set, which sounds like we're breaking the rules, reaching outside the initial set.
The resolution to this conundrum is that we are dealing with a general set of machines that can be described by the somewhat misleading notation $V^* \otimes V^*$, which we define as not only the simple machines made of any two covectors $f \otimes g$, but also those machines that are \textit{weighted sums} of one or more simple machines. This broader set includes machines that cannot be decomposed into two covectors. We can however choose a basis from the subset that \textit{can} be decomposed, and we do that because it is that subset for which we are able to directly explain how they operate. And from that basis we can build any machine of the form $V^* \otimes V^*$. Covectors (one-slot machines) are the most basic building block, from which we can make simple (two-slot) machines, from which in turn we can make any machines by linear combination of simple machines.
\subsection{Any number of slots}
Another point to note about these two-slot machines is that although here we focused on $V^* \otimes V^*$, we could instead have chosen $V \otimes V$, in which case the simple machine would have consisted of two vectors, and would have acted on two input covectors (recall how the notation $\langle \vec{f}, \vec{a} \rangle = \langle \vec{a}, \vec{f} \rangle$ emphasises symmetry, so we can think of a covector acting on a vector or a vector acting on a covector with no real difference in the result). We can define machines of the form:
$$
V \otimes V^* \otimes V \otimes \ldots
$$
having any number of slots accepting any mixture of vectors and covectors in some specific order. A machine with five slots will be represented by a list of $N^5$ numbers labelled with five indices. The slots that accept vectors (being defined by covectors) will have down indices, and the slots that accept covectors (being defined by vectors) will have up indices. For example, we can say our machine is from the space:
$$
V \otimes V^* \otimes V \otimes V \otimes V^*
$$
or we can say it is represented by the numerical parameters:
$$
M\indices{^i_j^k^l_m}
$$
These convey the same information. Sometimes the indexed parameter notation is used as a compact way to describe the structure of the machine, the only downside being that we have to arbitrarily choose symbols for the indices.\footnote{Some authors call this \textit{slot-naming index notation}, others \textit{abstract index notation}.}
\subsection{What is a tensor?}
These machines, which are used to compute a scalar value from some (co)vector inputs, are all tensors, though as we've seen, the space of machines of a given type is also a vector space, so tensors are vectors.
We have proposed creating a five-slot machine $M\indices{^i_j^k^l_m}$, and so it seems entirely proper to treat a 1-slot machine $M_i$ or $M^i$ as part of the same family of objects. We've been calling them covectors and vectors, which is accurate (we had to invent them first in order to build toward tensors), but they are also themselves tensors in this general sense.
Perhaps more surprising, but no less consistent, is the idea that a machine with no slots also belongs to the same family. It's just a scalar value, which can be thought of as a machine that produces a scalar value from \textit{no} inputs.
Sometimes the type of a tensor is written $(u, d)$ where $u$ tells you how many up indices and $d$ tells you how many down indices it has. So a scalar is a $(0, 0)$-tensor, a vector is a $(1, 0)$-tensor, a covector is a $(0, 1)$-tensor, and we will soon encounter practical uses for $(1, 1)$, $(0, 2)$ and $(2, 0)$-tensors, all of which are also elements of vector spaces.
In summary, everything is seemingly an example of everything else, and yet all are different things.
\subsection{Contraction} \label{tensor-contraction}
When we insert a vector or covector into a suitable slot of a machine, we are effectively merging two tensors, "wiring them up" so that they share an index variable, which makes that variable disappear due to summation over it.
Starting with a five-slot machine $M\indices{^i_j^k^l_m}$ (which incidentally is a $(3,2)$-tensor), we will insert a covector $a_k$ into the middle slot, resulting in a machine $N$ with four slots (a $(2,2)$-tensor):
$$
N\indices{^i_j^l_m} = \sum_{k} M\indices{^i_j^k^l_m} a_k
$$
The index $k$ effectively disappears. More formally, this is regarded as a two-stage process. First, we form the tensor product, which is an object with a separately named index for every index of the two source tensors:
$$
P\indices{^i_j^k^l_m_n} = M\indices{^i_j^k^l_m} a_n
$$
This step doesn't involve any summation. If the vector space is 4-dimensional, $P$ is a list of $4^6 = 4096$ numbers, each being the product of a distinct pair from the $4^5 = 1024$ numbers in $M$ and the $4$ numbers in $a$.
Then we link two of the slots by giving them the same index name and summing over that index (in this case by renaming $n$ to $k$):
$$
N\indices{^i_j^l_m} = \sum_{k} P\indices{^i_j^k^l_m_k}
$$
It's that second step, renaming $n$ to $k$ and summing over $k$, that is the actual contraction. We could then perform two contractions at once on $N$:
$$
x = \sum_{il} N\indices{^i_i^l_l}
$$
Each contraction ties two indices together and eliminates them, so this last double-contraction has eliminated four indices at once, leaving us with a scalar.
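Here is the same two-stage process (tensor product, then contraction) carried out numerically, as a sketch in Python with NumPy; the values are random and only serve to illustrate the bookkeeping:
\begin{verbatim}
import numpy as np

n = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n, n, n, n))   # M^i_j^k^l_m: 4^5 = 1024 numbers
a = rng.standard_normal(n)                 # a covector a_n: 4 numbers

# Step 1: the tensor product P^i_j^k^l_m_n = M^i_j^k^l_m a_n (4^6 numbers).
P = np.multiply.outer(M, a)
print(P.shape)                             # (4, 4, 4, 4, 4, 4)

# Step 2: the contraction -- rename n to k and sum over it,
# i.e. sum along the diagonal of axes 2 (k) and 5 (n).
N_tensor = np.trace(P, axis1=2, axis2=5)
print(N_tensor.shape)                      # (4, 4, 4, 4)

# Doing both steps at once: insert the covector straight into the middle slot.
direct = np.tensordot(M, a, axes=([2], [0]))
print(np.allclose(N_tensor, direct))       # True
\end{verbatim}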
\section{Einstein notation}
We have been following a rule where basis vectors are given subscript indices, while vector components are given superscript indices. Then we do the opposite with basis covectors and components. This means that whenever one basis object acts on another:
$$
\langle e^i, e_j \rangle
$$
they always have opposing index positions. This is mirrored exactly by the way components are allowed to be multiplied. We've found that the dot product is only valid between the coordinates of a vector and a covector, so a product like this inside a summation, where we have repeated the same index variable:
$$
a^i b_i
$$
is valid, but neither of these is allowed because they imply a dot product between two vectors or two covectors:
$$
a^i b^i \, , \, a_i b_i
$$
Here's a real example that we'll encounter later:
$$
\sum_{\mu\nu\beta\lambda} g_{\mu\nu} Z\indices{^\mu_\beta} \underline{a}^{\beta} Z\indices{^\nu_\lambda} \underline{b}^{\lambda}
$$
Every single one of the four indices appears in two places, once up and once down, so it's a quadruple contraction, eliminating all the indices and so the result is a scalar. A simpler example shows how index variables are not necessarily introduced by summation:
$$
b^\mu = \sum_{\nu} O\indices{^\mu_\nu} a^\nu
$$
The $\nu$ index is repeated up/down in the way that is characteristic of all summation variables, but $\mu$ is introduced on the left to indicate that the expression computes the value of one component of several that represent a vector (we know it's a vector rather than a covector because the index is up).
From these patterns Einstein deduced something tremendously helpful: it is completely unnecessary to write the summation symbol and state what the summation index variables are! If an expression consisting of indexed quantities multiplied together contains exactly two references to the same index, once up and once down, then that index is a summation index.
$$
g_{\mu\nu} Z\indices{^\mu_\beta} \underline{a}^{\beta} Z\indices{^\nu_\lambda} \underline{b}^{\lambda}
=
\sum_{\mu\nu\beta\lambda} g_{\mu\nu} Z\indices{^\mu_\beta} \underline{a}^{\beta} Z\indices{^\nu_\lambda} \underline{b}^{\lambda}
$$
$$
b^\mu = O\indices{^\mu_\nu} a^\nu = \sum_{\nu} O\indices{^\mu_\nu} a^\nu
$$
This shorthand applies just as well to basis vectors:
$$
\vec{a} = a^i \vec{e}_i = \sum_{i} a^i \vec{e}_i
$$
The recent example of a tensor product cannot be mistaken for an implied summation because there are no repeated indices:
$$
P\indices{^i_j^k^l_m_n} = M\indices{^i_j^k^l_m} a_n
$$
Whereas the contraction example unmistakably sums over $k$ alone:
$$
N\indices{^i_j^l_m} = P\indices{^i_j^k^l_m_k}
$$
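NumPy's einsum function is built around the same bookkeeping: indices that are repeated and omitted from the output are summed over (it does not track the up/down distinction, so that part stays in our heads). A sketch with made-up values, just to show the correspondence with the two examples above:
\begin{verbatim}
import numpy as np

n = 4
rng = np.random.default_rng(1)
g = rng.standard_normal((n, n))
g = (g + g.T) / 2                   # a symmetric g_{mu nu}
Z = rng.standard_normal((n, n))     # Z^mu_beta
a = rng.standard_normal(n)          # a^beta
b = rng.standard_normal(n)          # b^lambda
O = rng.standard_normal((n, n))     # O^mu_nu

# g_{mu nu} Z^mu_beta a^beta Z^nu_lambda b^lambda: every index is repeated,
# so everything is summed and the result is a scalar.
x = np.einsum('mn,mb,b,nl,l->', g, Z, a, Z, b)

# b^mu = O^mu_nu a^nu: nu is summed, mu is left free.
b_new = np.einsum('mn,n->m', O, a)

print(np.isclose(x, a @ Z.T @ g @ Z @ b))   # True
print(np.allclose(b_new, O @ a))            # True
\end{verbatim}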
\section{The Inner Product} \label{inner-product}
The most pressing need for a specific machine of the form $V^* \otimes V^*$ is to at last give us a way to define orthogonality, the norm (length) of a vector, and thus orthonormality, as well as a specific two-way pairing between every vector and a covector partner.
Nominating one such machine for a given vector space, we can call it the \textit{inner product}, and we say that the combination of the vector space and its inner product is an \textit{inner product space}.
As we've incessantly complained, we have so far had no way to judge whether two vectors are orthogonal to one another, even though we know that for arrows or column vectors there are some intuitive ways to choose orthogonal vectors. In our abstract development of the subject there was no such thing as orthogonal or orthonormal.
The choice of an inner product determines which vectors are mutually orthogonal, and also which are normalised (of unit norm). Importantly, it will also pair every vector with a single dual covector (and vice versa).
The notation $(\vec{a},\vec{b})$ is sometimes used for the inner product, similar to but deliberately distinct from the $\langle \vec{f},\vec{a}\rangle$ notation for the action of a covector on a vector.\footnote{Sadly the meanings of these notations are sometimes switched, and they aren't the only notations used.}
The \textit{squared-norm} of a vector $\vec{a}$ is $(\vec{a},\vec{a})$, so the norm is $\sqrt{(\vec{a},\vec{a})}$. In most situations (Newtonian and Quantum) there is a rule that the norm must be \textit{positive definite}, meaning that it is never negative and is only zero for the zero vector. The exception is just about anything involving Einstein, who favours inner products that break this rule in every way.
As the inner product is a two-slot machine, its coordinate representation can be thought of as a matrix, that is, a grid of numbers addressed by two indices, and this representation is called the \textit{metric}. In General Relativity the metric is usually written as $g$, and its indices are often $\mu$ and $\nu$, so applying it to vectors $\vec{a}$ and $\vec{b}$ will look like this:
$$
(\vec{a},\vec{b}) = \sum_{\mu\nu} g_{\mu\nu} a^\mu b^\nu
$$
Or, expressed as matrix multiplication, we put one of the vectors on the left of the matrix, transposed into a row, and the other on the right as a column.
$$
(\vec{a}, \vec{b}) =
\begin{bmatrix}
a^1 & a^2 & a^3
\end{bmatrix}
\begin{bmatrix}
g_{11} & g_{12} & g_{13} \\
g_{21} & g_{22} & g_{23} \\
g_{31} & g_{32} & g_{33}
\end{bmatrix}
\begin{bmatrix}
b^1 \\ b^2 \\ b^3
\end{bmatrix}
$$
The central square matrix is always symmetric, $g_{\mu\nu} = g_{\nu\mu}$, so the vector inputs can be switched without affecting the result. The square matrix can multiply with the right column first, or with the left row: the order of operations doesn't matter.
As we will eventually see, this transposition business will wind up being somewhat messier than simply writing down the summations, in which the symmetry is a lot more obvious. This is one reason why it may not be worth thinking of $g$ (or any other two-slot machine) as a matrix.\footnote{Another reason: what if the machine has three or more slots?}
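For example, in two dimensions the summation expands to
$$
(\vec{a},\vec{b}) = g_{11} a^1 b^1 + g_{12} a^1 b^2 + g_{21} a^2 b^1 + g_{22} a^2 b^2
$$
Swapping $\vec{a}$ and $\vec{b}$ merely swaps the two middle terms, and since $g_{12} = g_{21}$ the total is unchanged; no transposing required.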
Compare it to the familiar dot product, which would be:
$$
\vec{a} \cdot \vec{b} = \sum_{\mu} a^\mu b^\mu
$$
It's the same row/square/column matrix multiplication except the central square matrix is missing, or equivalently it's the identity matrix:
$$
(\vec{a}, \vec{b}) =
\begin{bmatrix}
a^1 & a^2 & a^3
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
b^1 \\ b^2 \\ b^3
\end{bmatrix}
$$
So all the times when you obediently used the dot product to operate on two vectors, you were implicitly setting $g$ to be the Kronecker delta (§\ref{def:Kronecker}):
$$
g_{\mu\nu} = \delta_{\mu\nu}
$$
\textit{By definition} if the inner product is represented by $\delta_{\mu\nu}$ then the basis vectors are orthonormal. In other words, the inner product being represented by the identity matrix doesn't really tell us anything about the inner product. It tells us that we've chosen a set of basis vectors so that they are orthonormal according to this space's nominated inner product. It is always possible to do this, regardless of what the inner product happens to be.\footnote{Strictly speaking it is always possible for a finite-dimensional inner product space.}
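To see why in one line: in its own basis, the basis vector $\vec{e}_\alpha$ has coordinates that are $1$ in position $\alpha$ and $0$ everywhere else, so when two basis vectors are fed into $\sum_{\mu\nu} g_{\mu\nu} a^\mu b^\nu$ the double sum collapses to a single term:
$$
(\vec{e}_\alpha, \vec{e}_\beta) = g_{\alpha\beta}
$$
In other words, the entries of the metric's matrix are exactly the inner products of the pairs of basis vectors, so the basis is orthonormal precisely when that matrix is $\delta_{\mu\nu}$.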
This means that if we can use a single inner product consistently, we may as well define the basis to be orthonormal according to that inner product, which means we will always be able to use the dot product between vectors, and the distinction between vectors and covectors becomes unimportant. We can even define all other multi-slot machines in terms of vectors acting on other vectors (via the inner product). That is, every machine would be of the form $V \otimes V \otimes V \otimes \ldots$, and would accept vectors as inputs. You may wonder if all that care we took to distinguish superscript and subscript indices was a waste of time. Indeed in many contexts, including all Newtonian classical mechanics and quantum mechanics, it is a waste of time.
The exception is General Relativity, where the inner product varies from place to place, becoming a \textit{tensor field}. To describe it, we have to fix the basis somehow and allow the metric to vary. As a result, if you want to understand GR, you need to get used to the idea that to find the squared-norm of a vector, you first obtain its equivalent covector, which then acts on the original vector. But in many other topics, you can just think of the vector acting on itself.
This also means that the previous examples of what we called awkward basis vectors would in fact be orthonormal if we chose a particular inner product.
\subsection{No such thing as awkward}
Starting with the standard basis in column vectors, we could specify this as the inner product:
$$
g_{\mu\nu} =
\begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
$$
We had to pick \textit{some} basis as a starting point or we would not have had a way to write down the inner product in numerical form. We used the standard basis, which is made of very simple one-hot vectors, but we now know that we must not call those vectors orthonormal.
Now we'll move the goalposts and choose a different set of vectors to be our basis, and they will be our usual awkward choice:
$$
\vec{e}_1 = \begin{bmatrix}2 \\ 1\end{bmatrix}\,,\,
\vec{e}_2 = \begin{bmatrix}0 \\ 1\end{bmatrix}
$$
Let's test our inner product on every possible pairing of these basis vectors:
$$
(\vec{e}_1, \vec{e}_1) =
\begin{bmatrix}
2 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
\begin{bmatrix}
2 \\ 1
\end{bmatrix}
= 1
$$
$$
(\vec{e}_1, \vec{e}_2) =
\begin{bmatrix}
2 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
\begin{bmatrix}
0 \\ 1
\end{bmatrix}
= 0
$$
$$
(\vec{e}_2, \vec{e}_1) =
\begin{bmatrix}
0 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
\begin{bmatrix}
2 \\ 1
\end{bmatrix}
= 0
$$
$$
(\vec{e}_2, \vec{e}_2) =
\begin{bmatrix}
0 & 1
\end{bmatrix}
\begin{bmatrix}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & 1
\end{bmatrix}
\begin{bmatrix}
0 \\ 1
\end{bmatrix}
= 1
$$
The basis is orthonormal if $(\vec{e}_i, \vec{e}_j) = \delta_{ij}$. Undeniably, these basis vectors are orthonormal. The inner product says so, and who are we to doubt it? Furthermore, the inner product's matrix \textit{when expressed in this basis} is $\delta_{\mu\nu}$.
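For anyone who wants to check the arithmetic by machine, here is a minimal numpy sketch that reproduces the four results above (the names \texttt{g}, \texttt{e1}, \texttt{e2} and the helper \texttt{inner} are ours, purely for illustration):
\begin{verbatim}
import numpy as np

# The inner product's matrix, written in the standard basis.
g = np.array([[ 0.5, -0.5],
              [-0.5,  1.0]])

# The "awkward" basis vectors, as coordinates in the standard basis.
e1 = np.array([2.0, 1.0])
e2 = np.array([0.0, 1.0])

# (a, b) = row * square * column, exactly as in the text.
def inner(a, b):
    return a @ g @ b

print(inner(e1, e1), inner(e1, e2), inner(e2, e1), inner(e2, e2))
# prints: 1.0 0.0 0.0 1.0
\end{verbatim}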
Every distinct basis provides a different way to describe any vector with coordinates, and likewise any covector. Regardless of the basis they are described in, the result of applying a specific covector to a specific vector will be the same scalar value. That is, the choice of basis is not a fact about the vector space, but merely a way to describe the vectors in it.
Choosing a different inner product (or equivalently a metric) is not like that. In fact a given physical situation may impose a metric and we have to accept it; it's a fact of nature. So there are two possible reasons why the matrix $g_{\mu\nu}$ might change:
\begin{itemize}
\item a change of basis, which will change the coordinates $a^{\mu}$ and $b^{\nu}$ of the two input vectors, as well as the matrix elements $g_{\mu\nu}$, so as to ensure the scalar result of the inner product $(\vec{a}, \vec{b})$ is unchanged (the transformation rule is written out just after this list), or
\item a change of inner product itself, which will not affect the coordinates of anything else, and thus the scalar result of $(\vec{a}, \vec{b})$ may be very different.
\end{itemize}
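For the first of these we don't need anything new: the quadruple-contraction expression from earlier (when we practised the summation convention) already contains the rule. Grouping the factors that sit against $\underline{a}^{\beta}$ and $\underline{b}^{\lambda}$ there defines the matrix that acts directly on the underlined coordinates,
$$
\underline{g}_{\beta\lambda} = g_{\mu\nu} Z\indices{^\mu_\beta} Z\indices{^\nu_\lambda}
$$
and by construction it produces the same scalar $(\vec{a},\vec{b})$. The second case needs no formula at all: the matrix elements simply become those of the new inner product, and nothing else changes to compensate.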
\subsection{The metric and its inverse}
As always we must be careful not to confuse the coordinate or matrix representation of an object with the object itself. The matrix $g_{\mu\nu}$ is merely a representation in some basis of a machine with two slots awaiting vectors.
We know that internal to that machine, it may be simple (two covectors, each a single-slot machine, one waiting for each of the two inserted vectors, with their scalar results multiplied together) or, more generally, it may be a linear combination of such simple two-slot machines.
Therefore if a single vector is inserted into the first slot only, it is fed into the first covector of every simple two-slot machine in that combination. Each of those covectors produces a scalar, which becomes a scaling factor on that machine's second covector. What remains is a weighted sum of covectors: a set of single-slot machines linearly combined into one single-slot machine.
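In index notation this half-filled machine is easy to write down (a standard identity, stated here just to fix the idea): with $\vec{a}$ in the first slot and the second slot left empty, the waiting single-slot machine has components
$$
a_\nu = g_{\mu\nu} a^\mu
$$
and by a near-universal convention the resulting covector borrows the vector's letter, signalling its new nature with a lowered index.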