# Linear Algebra for Quantum Mechanics

*Michael Fowler, UVa*

### Introduction

We’ve seen that in quantum mechanics, the state of an
electron in some potential is given by a wave function $\psi \left(\overrightarrow{x},t\right)$,
and physical variables are represented by operators on this wave function, such
as the momentum in the $x$ -direction ${p}_{x}=-i\hslash \partial /\partial x.$ The Schrödinger wave equation is a *linear* equation, which means that if ${\psi}_{1}$ and ${\psi}_{2}$ are solutions, then so is ${c}_{1}{\psi}_{1}+{c}_{2}{\psi}_{2}$,
where ${c}_{1},{c}_{2}$ are arbitrary complex numbers.

This linearity of the sets of possible solutions is true
generally in quantum mechanics, as is the representation of physical variables
by operators on the wave functions. The
mathematical structure this describes, the linear set of possible states and
sets of operators on those states, is in fact a *linear algebra* of operators acting on a *vector space*. From now on,
this is the language we’ll be using most of the time. To clarify, we’ll give some definitions.

### What is a Vector Space?

The prototypical vector space is of course the set of real vectors in ordinary three-dimensional space; these vectors can be represented by trios of real numbers $\left({v}_{1},{v}_{2},{v}_{3}\right)$ measuring the components in the $x,y$ and $z$ directions respectively.

*The basic properties
of these vectors are*:

• any vector multiplied by a number is another vector in the space, $a\left({v}_{1},{v}_{2},{v}_{3}\right)=\left(a{v}_{1},a{v}_{2},a{v}_{3}\right)$;

• the sum of two vectors is another vector in the space, that given by just adding the corresponding components together: $\left({v}_{1}+{w}_{1},{v}_{2}+{w}_{2},{v}_{3}+{w}_{3}\right).$

These two properties together are referred to as “*closure*”: adding vectors and multiplying
them by numbers cannot get you out of the space.

• A further property is that there is a unique null vector $\left(0,0,0\right)$ and each vector has an additive inverse $\left(-{v}_{1},-{v}_{2},-{v}_{3}\right)$ which added to the original vector gives the null vector.

Mathematicians have generalized the definition of a vector space: a general vector space has the properties we’ve listed above for three-dimensional real vectors, but the operations of addition and multiplication by a number are generalized to more abstract operations between more general entities. The operations are, however, restricted to being commutative and associative.

Notice that the list of necessary properties for a general
vector space does not include that the vectors have a magnitude—that would
be an additional requirement, giving what is called a *normed* vector space. More
about that later.

To go from the familiar three-dimensional vector space to
the vector spaces relevant to quantum mechanics, first the real numbers
(components of the vector and possible multiplying factors) are to be generalized
to complex numbers, and second the three-component vector goes to an $n$-component vector. The consequent $n$-dimensional complex space is sufficient to
describe the quantum mechanics of angular momentum, an important subject. But to describe the wave function of a
particle in a box requires an *infinite*
dimensional space, one dimension for each Fourier component, and to describe
the wave function for a particle on an infinite line requires the set of all
normalizable continuous differentiable functions on that line. Fortunately, all these generalizations are to
finite or infinite sets of complex numbers, so the mathematicians’ vector space
requirements of commutativity and associativity are always trivially
satisfied.

We use Dirac’s notation for vectors, $|1\rangle ,\text{\hspace{0.17em}}|2\rangle $ and call them “kets”, so, in his language, if $|1\rangle ,\text{\hspace{0.17em}}|2\rangle $ belong to the space, so does ${c}_{1}|1\rangle +{c}_{2}\text{\hspace{0.17em}}|2\rangle $ for arbitrary complex constants ${c}_{1},{c}_{2}.$ Since our vectors are made up of complex numbers, multiplying any vector by zero gives the null vector, and the additive inverse is given by reversing the signs of all the numbers in the vector.

Clearly, the set of solutions of Schrödinger’s equation for an electron in a potential satisfies the requirements for a vector space: $\psi \left(\overrightarrow{x},t\right)$ is just a complex number at each point in space, so only complex numbers are involved in forming ${c}_{1}{\psi}_{1}+{c}_{2}{\psi}_{2}$, and commutativity, associativity, etc., follow at once.
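To make the closure property concrete, here is a minimal sketch in plain Python, assuming a ket in a finite-dimensional space is stored as a list of complex components. The function name `linear_combination` and the sample kets are illustrative choices, not part of the lecture.

```python
def linear_combination(c1, ket1, c2, ket2):
    """Return c1*ket1 + c2*ket2, formed component by component."""
    if len(ket1) != len(ket2):
        raise ValueError("kets must have the same number of components")
    return [c1 * a + c2 * b for a, b in zip(ket1, ket2)]


# Two sample kets in a two-dimensional complex space (arbitrary values).
ket1 = [1 + 0j, 1j]
ket2 = [2 - 1j, 3 + 0j]

combo = linear_combination(0.5j, ket1, 2, ket2)   # another ket in the space
null = linear_combination(0, ket1, 0, ket2)       # multiplying by zero gives the null ket
inverse = linear_combination(-1, ket1, 0, ket2)   # additive inverse of ket1
```

Only complex arithmetic on the components is involved, which is why commutativity and associativity come for free.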

### Vector Space Dimensionality

The vectors $|1\rangle ,\text{\hspace{0.17em}}|2\rangle ,\text{\hspace{0.17em}}|3\rangle $ are *linearly independent* if

${c}_{1}|1\rangle +{c}_{2}\text{\hspace{0.17em}}|2\rangle +{c}_{3}\text{\hspace{0.17em}}|3\rangle =0$

implies

${c}_{1}={c}_{2}={c}_{3}=0.$

*A vector space is $n$-dimensional if the maximum number of
linearly independent vectors in the space is $n$.*

Such a space is often called ${V}^{n}\left(C\right),$ or ${V}^{n}\left(R\right)$ if only real numbers are used.

Now, vector spaces with finite dimension $n$ are clearly insufficient for describing functions of a continuous variable $x.$ But they are well worth reviewing here: as we’ve mentioned, they are fine for describing quantized angular momentum, and they serve as a natural introduction to the infinite-dimensional spaces needed to describe spatial wavefunctions.

A set of $n$ linearly independent vectors in $n$-dimensional space is a *basis*: any
vector can be written *in a unique way* as a sum over a basis:

$|V\rangle ={\displaystyle \sum {v}_{i}|i\rangle}.$

You can check the uniqueness by taking the difference between two supposedly distinct sums: it will be a linear relation between independent vectors, a contradiction.

Since all vectors in the space can be written as linear sums over the elements of the basis, the sum of multiples of any two vectors has the form:

$a|V\rangle +b|W\rangle ={\displaystyle \sum (a{v}_{i}+b{w}_{i})|i\rangle}.$
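As a small worked illustration (the two kets here are chosen purely as an example, they do not come from the lecture), take in ${V}^{2}\left(C\right)$ the kets $|1\rangle =\left(1,i\right)$ and $|2\rangle =\left(1,-i\right)$. The condition ${c}_{1}|1\rangle +{c}_{2}|2\rangle =0$ reads, component by component,

${c}_{1}+{c}_{2}=0,\quad i\left({c}_{1}-{c}_{2}\right)=0,$

which forces ${c}_{1}={c}_{2}=0$, so the two kets are linearly independent and form a basis. The ket $\left(1,0\right)$, for instance, then has the unique expansion

$\left(1,0\right)={\textstyle \frac{1}{2}}\left(1,i\right)+{\textstyle \frac{1}{2}}\left(1,-i\right).$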

### Inner Product Spaces

The vector spaces of relevance in quantum mechanics also have an operation associating a number with a pair of vectors, a generalization of the dot product of two ordinary three-dimensional vectors,

$\overrightarrow{a}\cdot \overrightarrow{b}={\displaystyle \sum {a}_{i}{b}_{i}}.$

Following Dirac, we write the inner product of two ket vectors $|V\rangle ,\text{\hspace{0.17em}}\text{\hspace{0.17em}}|W\rangle $ as $\langle W|V\rangle $. Dirac refers to this $\langle \text{}|\rangle $ form as a “bracket” made up of a “bra” and a “ket”. This means that each ket vector $|V\rangle $ has an associated bra $\langle V|.$ For the case of a real $n$-dimensional vector, $|V\rangle ,\text{\hspace{0.17em}}\langle V|$ have identical components—but we require for the more general case that

$\langle W|V\rangle ={\langle V|W\rangle}^{\ast}$

where $^{\ast}$ denotes the complex conjugate. This implies
that for a ket $\left({v}_{1},\dots ,{v}_{n}\right)$ the bra will be $\left({v}_{1}^{\ast},\dots ,{v}_{n}^{\ast}\right)$. (Actually, bras are usually written as rows,
kets as columns, so that the inner product follows the standard rules for
matrix multiplication.) Evidently, for
an $n$-dimensional complex vector, $\langle V|V\rangle $ is real and positive except for the null
vector:

$\langle V|V\rangle ={\displaystyle \sum _{1}^{n}{\left|{v}_{i}\right|}^{2}}.$

For the more general inner product spaces considered later we require $\langle V|V\rangle $ to be positive, except for the null vector. (These requirements do restrict the classes of vector spaces we are considering—no Lorentz metric, for example—but they are all satisfied by the spaces relevant to nonrelativistic quantum mechanics.)

The *norm* of $|V\rangle $ is then defined by

$\left|V\right|=\sqrt{\langle V|V\rangle}.$

If $|V\rangle $ is a member of ${V}^{n}\left(C\right),$ so is $a|V\rangle $, for any complex number $a.$

We require the inner product operation to commute with multiplication by a number, so

$\langle W|\left(a|V\rangle \right)=a\langle W|V\rangle .$

The complex conjugate of the right hand side is ${a}^{\ast}\langle V|W\rangle .$ For consistency, the bra corresponding to the ket $a|V\rangle $ must therefore be $\langle V|{a}^{\ast}$ -- in any case obvious from the definition of the bra in $n$ complex dimensions given above.
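These rules can be checked numerically. The sketch below assumes kets are lists of complex components in an orthonormal basis, so that the bra is obtained by conjugating each component; the names `inner` and `norm` and the sample values are illustrative only.

```python
import math


def inner(bra_source, ket):
    """<W|V>: conjugate the components of W, multiply by those of V, and sum."""
    return sum(w.conjugate() * v for w, v in zip(bra_source, ket))


def norm(ket):
    """|V| = sqrt(<V|V>); <V|V> is real, so we take its real part."""
    return math.sqrt(inner(ket, ket).real)


V = [1 + 2j, 3 - 1j]
W = [2j, 1 + 1j]
a = 2 - 3j
aV = [a * v for v in V]          # the ket a|V>

lhs = inner(W, aV)               # <W|(a|V>), equal to a<W|V>
bra_side = inner(aV, W)          # bra of a|V> acting on |W>, equal to a*<V|W>
```

The last line is the point of the consistency argument: the factor comes out of the bra as ${a}^{\ast}$, not $a$.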

It follows that if

$|V\rangle ={\displaystyle \sum _{i}{v}_{i}|i\rangle},\quad |W\rangle ={\displaystyle \sum _{j}{w}_{j}|j\rangle},\quad \text{then}\quad \langle V|W\rangle ={\displaystyle \sum _{i,j}{v}_{i}^{\ast}{w}_{j}\langle i|j\rangle .}$

### Constructing an Orthonormal Basis: the Gram-Schmidt Process

To have something better resembling the standard dot product
of ordinary three vectors, we need $\langle i|j\rangle ={\delta}_{ij},$ that is, we need to construct an *orthonormal
basis* in the space. There is a
straightforward procedure for doing this called the Gram-Schmidt process. We
begin with a linearly independent set of basis vectors, $|1\rangle ,\text{\hspace{0.17em}}|2\rangle ,\text{\hspace{0.17em}}|3\rangle ,\dots $ .

We first normalize $|1\rangle $ by dividing it by its norm. Call the normalized vector $|I\rangle .$ Now $|2\rangle $ cannot be parallel to $|I\rangle ,$ because the original basis was of linearly independent vectors, but $|2\rangle $ in general has a nonzero component parallel to $|I\rangle ,$ equal to $|I\rangle \langle I|2\rangle ,$ since $|I\rangle $ is normalized. Therefore, the vector $|2\rangle -|I\rangle \langle I|2\rangle $ is perpendicular to $|I\rangle ,$ as is easily verified. It is also easy to compute the norm of this vector, and divide by it to get $|II\rangle ,$ the second member of the orthonormal basis. Next, we take $|3\rangle $ and subtract off its components in the directions $|I\rangle $ and $|II\rangle ,$ normalize the remainder, and so on.
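The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: kets are lists of complex components with the standard inner product $\sum {w}_{i}^{\ast}{v}_{i}$, and the function names and the three starting kets are made up for the example.

```python
import math


def inner(w, v):
    """<w|v> with the components of w conjugated."""
    return sum(a.conjugate() * b for a, b in zip(w, v))


def gram_schmidt(kets):
    """Turn a linearly independent list of kets into an orthonormal list."""
    basis = []
    for ket in kets:
        rest = list(ket)
        for e in basis:
            overlap = inner(e, ket)                       # <e|ket>
            rest = [r - overlap * x for r, x in zip(rest, e)]  # remove |e><e|ket>
        length = math.sqrt(inner(rest, rest).real)
        if length < 1e-12:
            raise ValueError("input kets are not linearly independent")
        basis.append([r / length for r in rest])
    return basis


start = [[1 + 0j, 1j, 0j], [1 + 0j, 0j, 1 + 0j], [0j, 1 + 0j, 1 + 0j]]
ortho = gram_schmidt(start)
```

Each new ket has its components along the already-constructed members subtracted off before being normalized, exactly as in the text; the check on `length` is where a linearly dependent input would show up.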

In an $n$-dimensional space, having constructed an orthonormal basis with members $|i\rangle ,$ any vector $|V\rangle $ can be written as a column vector,

$|V\rangle ={\displaystyle \sum {v}_{i}|i\rangle =\left(\begin{array}{c}{v}_{1}\\ {v}_{2}\\ .\\ .\\ {v}_{n}\end{array}\right)}\text{, where }|1\rangle =\left(\begin{array}{c}1\\ 0\\ .\\ .\\ 0\end{array}\right)\text{ and so on.}$

The corresponding bra is $\langle V|={\displaystyle \sum {v}_{i}^{\ast}\langle i|}$,
which we write as a row vector with the elements complex conjugated, $\langle V|=\left({v}_{1}^{\ast},\text{\hspace{0.17em}}\text{\hspace{0.17em}}{v}_{2}^{\ast},\dots {v}_{n}^{\ast}\right)$.
This operation, going from columns to
rows and taking the complex conjugate, is called taking the *adjoint*, and
can also be applied to matrices, as we shall see shortly.

The reason for representing the bra as a row is that the inner product of two vectors is then given by standard matrix multiplication:

$\langle V|W\rangle =({v}_{1}^{\ast},{v}_{2}^{\ast},\mathrm{...},{v}_{n}^{\ast})\left(\begin{array}{c}{w}_{1}\\ .\\ .\\ {w}_{n}\end{array}\right).$

(Of course, this only works with an orthonormal basis.)

### The Schwarz Inequality

The Schwarz inequality is the generalization to any inner product space of the result ${\left|\overrightarrow{a}\cdot \overrightarrow{b}\right|}^{2}\le {\left|\overrightarrow{a}\right|}^{2}{\left|\overrightarrow{b}\right|}^{2}$ (or ${\mathrm{cos}}^{2}\theta \le 1$) for ordinary three-dimensional vectors. The equality sign in that result only holds when the vectors are parallel. To generalize to higher dimensions, one might just note that any two vectors lie in a two-dimensional subspace, but an illuminating way of understanding the inequality is to write the vector $\overrightarrow{a}$ as a sum of two components, one parallel to $\overrightarrow{b}$ and one perpendicular to $\overrightarrow{b}$. The component parallel to $\overrightarrow{b}$ is just $\overrightarrow{b}\left(\overrightarrow{a}\cdot \overrightarrow{b}\right)/{\left|\overrightarrow{b}\right|}^{2}$, so the component perpendicular to $\overrightarrow{b}$ is the vector ${\overrightarrow{a}}_{\perp}=\overrightarrow{a}-\overrightarrow{b}\left(\overrightarrow{a}\cdot \overrightarrow{b}\right)/{\left|\overrightarrow{b}\right|}^{2}$. Substituting this expression into ${\overrightarrow{a}}_{\perp}\cdot {\overrightarrow{a}}_{\perp}\ge 0$, the inequality follows.

This same point can be made in a general inner product space: if $|V\rangle ,\text{\hspace{0.17em}}\text{\hspace{0.17em}}|W\rangle $ are two vectors, then

$|Z\rangle =|V\rangle -\frac{|W\rangle \langle W|V\rangle}{{\left|W\right|}^{2}}$

is the component of $|V\rangle $ perpendicular to $|W\rangle $, as is easily checked by taking its inner product with $|W\rangle $.

Then

$\langle Z|Z\rangle \ge 0\text{ gives immediately }{\left|\langle V|W\rangle \right|}^{2}\le {\left|V\right|}^{2}{\left|W\right|}^{2}.$
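For readers who want the intermediate step spelled out, here is one way to write it, assuming $|W\rangle $ is not the null vector. The bra corresponding to $|Z\rangle $ is $\langle Z|=\langle V|-\langle V|W\rangle \langle W|/{\left|W\right|}^{2}$, so

$\langle Z|Z\rangle =\langle V|V\rangle -\frac{2\langle V|W\rangle \langle W|V\rangle}{{\left|W\right|}^{2}}+\frac{\langle V|W\rangle \langle W|W\rangle \langle W|V\rangle}{{\left|W\right|}^{4}}={\left|V\right|}^{2}-\frac{{\left|\langle V|W\rangle \right|}^{2}}{{\left|W\right|}^{2}}\ge 0,$

using $\langle W|W\rangle ={\left|W\right|}^{2}$ and $\langle V|W\rangle \langle W|V\rangle ={\left|\langle V|W\rangle \right|}^{2}$. Multiplying through by ${\left|W\right|}^{2}$ gives the inequality, with equality only when $|Z\rangle $ is the null vector, that is, when $|V\rangle $ is a multiple of $|W\rangle $.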

### Linear Operators

A *linear operator* $A$ takes any vector in a linear vector space to a
vector in that space, $A|V\rangle =|{V}^{\prime}\rangle ,$ and satisfies

$A\left({c}_{1}|{V}_{1}\rangle +{c}_{2}|{V}_{2}\rangle \right)={c}_{1}A|{V}_{1}\rangle +{c}_{2}A|{V}_{2}\rangle ,$

with ${c}_{1},{c}_{2}$ arbitrary complex constants.

The *identity operator* $I$ is (obviously!) defined by:

$I|V\rangle =|V\rangle \text{ for all }|V\rangle .$

For an $n$-dimensional vector space with an orthonormal basis $|1\rangle ,\dots ,|n\rangle ,$ since any vector in the space can be expressed as a sum $|V\rangle ={\displaystyle \sum {v}_{i}|i\rangle}$, the linear operator is completely determined by its action on the basis vectors -- this is all we need to know. It’s easy to find an expression for the identity operator in terms of bras and kets.

Taking the inner product of both sides of the equation $|V\rangle ={\displaystyle \sum {v}_{i}|i\rangle}$ with the bra $\langle i|$ gives $\langle i|V\rangle ={v}_{i},$ so

$|V\rangle ={\displaystyle \sum {v}_{i}|i\rangle}={\displaystyle \sum |i\rangle}\langle i|V\rangle .$

Since this is true for any vector in the space, it follows that the identity operator is just

$I={\displaystyle \sum _{1}^{n}|i\rangle \langle i|}\text{\hspace{0.17em}}.$

This is an important result: it will reappear in many disguises.
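As a quick numerical check of this result, the sketch below sums the outer products $|i\rangle \langle i|$ over an orthonormal basis of a two-dimensional complex space and recovers the unit matrix. The particular basis, $\left(1,i\right)/\sqrt{2}$ and $\left(1,-i\right)/\sqrt{2}$, and the helper names are assumptions made for the example.

```python
import math


def outer(ket, bra_source):
    """Matrix of |ket><bra_source|: element (r, c) is ket[r] * conj(bra_source[c])."""
    return [[k * b.conjugate() for b in bra_source] for k in ket]


def mat_add(A, B):
    """Element-by-element sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


s = 1 / math.sqrt(2)
basis = [[s + 0j, s * 1j], [s + 0j, -s * 1j]]   # an orthonormal pair

total = [[0j, 0j], [0j, 0j]]
for ket in basis:
    total = mat_add(total, outer(ket, ket))      # accumulate |i><i|
```

Neither outer product alone is the identity (each is a projection onto one direction); only the sum over the complete basis is.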

To analyze the action of a general linear operator $A,$ we just need to know how it acts on each basis vector. Beginning with $A|1\rangle ,$ this must be some sum over the basis vectors, and since they are orthonormal, the component in the $|i\rangle $ direction must be just $\langle i|A|1\rangle .$

That is,

$A|1\rangle ={\displaystyle \sum _{i=1}^{n}|i\rangle \langle i|A|1\rangle}={\displaystyle \sum _{i=1}^{n}{A}_{i1}|i\rangle},\quad \text{writing}\quad \langle i|A|1\rangle ={A}_{i1}.$

So if the linear operator $A$ acting on $|V\rangle ={\displaystyle \sum {v}_{i}|i\rangle}$ gives $|{V}^{\prime}\rangle ={\displaystyle \sum {{v}^{\prime}}_{i}|i\rangle}$, that is, $A|V\rangle =|{V}^{\prime}\rangle ,$ the linearity tells us that

${\displaystyle \sum _{i}{{v}^{\prime}}_{i}|i\rangle}=|{V}^{\prime}\rangle =A|V\rangle ={\displaystyle \sum _{j}{v}_{j}A|j\rangle}={\displaystyle \sum _{i,j}{v}_{j}|i\rangle \langle i|A|j\rangle}={\displaystyle \sum _{i,j}{v}_{j}{A}_{ij}|i\rangle}$

where in the fourth step we just inserted the identity operator.

Since the $|i\rangle $’s are all orthogonal, the coefficient of a particular $|i\rangle $ on the left-hand side of the equation must be identical with the coefficient of the same $|i\rangle $ on the right-hand side. That is, ${{v}^{\prime}}_{i}={\displaystyle \sum _{j}{A}_{ij}{v}_{j}}.$

Therefore the operator *A*
is simply equivalent to *matrix
multiplication*:

$\left(\begin{array}{c}{v}_{1}{}^{\prime}\\ {v}_{2}{}^{\prime}\\ .\\ .\\ {v}_{n}{}^{\prime}\end{array}\right)=\left(\begin{array}{ccccc}\langle 1|A|1\rangle & \langle 1|A|2\rangle & .& .& \langle 1|A|n\rangle \\ \langle 2|A|1\rangle & \langle 2|A|2\rangle & .& .& .\\ .& .& .& .& .\\ .& .& .& .& .\\ \langle n|A|1\rangle & \langle n|A|2\rangle & .& .& \langle n|A|n\rangle \end{array}\right)\left(\begin{array}{c}{v}_{1}\\ {v}_{2}\\ .\\ .\\ {v}_{n}\end{array}\right).$

Evidently, then, applying
two linear operators one after the other is equivalent to successive matrix
multiplication—and, therefore, *since
matrices do not in general commute, nor do linear operators*. (Of course, if
we hope to represent quantum variables as linear operators on a vector space,
this has to be true: the momentum operator $p=-i\hslash d/dx$ certainly doesn’t commute with $x$!)
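A minimal sketch of both points, successive operators as successive matrix multiplication and the failure to commute, follows. The two matrices are the Pauli matrices ${\sigma}_{x}$ and ${\sigma}_{z}$, chosen here only as a convenient example; the helper names are illustrative.

```python
def mat_vec(A, v):
    """Components v'_i = sum_j A_ij v_j."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def mat_mul(A, B):
    """Matrix product (AB)_ij = sum_k A_ik B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]
v = [2, 3]

xz = mat_mul(sigma_x, sigma_z)                    # sigma_z acts first, then sigma_x
zx = mat_mul(sigma_z, sigma_x)                    # the other order
one_after_other = mat_vec(sigma_x, mat_vec(sigma_z, v))
```

Here `one_after_other` coincides with `mat_vec(xz, v)`, while `xz` and `zx` differ by an overall sign, so the order of application matters.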

### Projection Operators

It is important to note that a linear operator applied
successively to the members of an orthonormal basis might give a new set of
vectors which *no longer span the entire
space*. To give an example, the
linear operator $|1\rangle \langle 1|$ applied to any vector in the space picks out
the vector’s component in the $|1\rangle $ direction. It’s called a *projection operator*. The operator $\left(|1\rangle \langle 1|+|2\rangle \langle 2|\right)$ projects a vector into its components in the
subspace spanned by the vectors $|1\rangle $ and $|2\rangle $,
and so on: if we extend the sum to be over the whole basis, we recover the
identity operator.

*Exercise*: prove
that the $n\times n$ matrix representation of the projection
operator $\left(|1\rangle \langle 1|+|2\rangle \langle 2|\right)$ has all elements zero except the first two
diagonal elements, which are equal to one.

There can be no *inverse*
operator to a nontrivial projection operator, since the information about
components of the vector perpendicular to the projected subspace is lost.
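The loss of information can be seen directly in a small example. The sketch below builds $|1\rangle \langle 1|+|2\rangle \langle 2|$ in a three-dimensional space from outer products of the basis kets (indices are 0-based in the code, and the helper names and sample vectors are assumptions for illustration).

```python
def unit(n, i):
    """Basis ket with a 1 in position i (0-based) and zeros elsewhere."""
    return [1 if k == i else 0 for k in range(n)]


def outer(ket, bra_source):
    """Matrix of |ket><bra_source|."""
    return [[k * b.conjugate() for b in bra_source] for k in ket]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


P = mat_add(outer(unit(3, 0), unit(3, 0)), outer(unit(3, 1), unit(3, 1)))

v = [5, -2, 7]
w = [5, -2, -4]          # differs from v only in the third component
Pv = mat_vec(P, v)
Pw = mat_vec(P, w)
```

Both `v` and `w` are sent to the same projected vector, so no operator could recover the original from the result; projecting a second time changes nothing further.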

### The Adjoint Operator and Hermitian Matrices

As we’ve discussed, if a ket $|V\rangle $ in the $n$ -dimensional space is written as a column
vector with $n$ (complex) components, the corresponding bra is
a *row* vector having as elements the
complex conjugates of the ket elements. $\langle W|V\rangle ={\langle V|W\rangle}^{\ast}$ then follows automatically from standard
matrix multiplication rules, and on multiplying $|V\rangle $ by a complex number $a$ to get $a|V\rangle $ (meaning that each element in the column of
numbers is multiplied by $a$ ) the corresponding bra goes to $\langle V|{a}^{\ast}={a}^{\ast}\langle V|.$

But suppose that instead of multiplying a ket by a number,
we operate on it with a linear operator. What generates the parallel
transformation among the bras? In other
words, if $A|V\rangle =|{V}^{\prime}\rangle ,$ what operator sends the bra $\langle V|$ to $\langle {V}^{\prime}|$? It must be a linear operator, because $A$ is linear: if under $A$ $|{V}_{1}\rangle \to |{{V}^{\prime}}_{1}\rangle ,\text{\hspace{0.17em}}|{V}_{2}\rangle \to |{{V}^{\prime}}_{2}\rangle $ and $|{V}_{3}\rangle =|{V}_{1}\rangle +|{V}_{2}\rangle ,$ then under $A$ $|{V}_{3}\rangle $ is required to go to $|{{V}^{\prime}}_{3}\rangle =|{{V}^{\prime}}_{1}\rangle +|{{V}^{\prime}}_{2}\rangle .$ Consequently, under the parallel *bra* transformation we must have $\langle {V}_{1}|\to \langle {{V}^{\prime}}_{1}|,\text{\hspace{0.17em}}\langle {V}_{2}|\to \langle {{V}^{\prime}}_{2}|$ and $\langle {V}_{3}|\to \langle {{V}^{\prime}}_{3}|=\langle {{V}^{\prime}}_{1}|+\langle {{V}^{\prime}}_{2}|$, so the bra
transformation is necessarily also linear.
Recalling that the bra is an $n$ -element *row*
vector, the most general linear transformation sending it to another bra is an $n\times n$ matrix operating on the bra from the *right*.

This bra operator is called the *adjoint* of $A$, written ${A}^{\dagger}.$ That
is, the ket $A|V\rangle $ has corresponding bra $\langle V|{A}^{\dagger}.$ In an orthonormal basis, using the notation $\langle Ai|$ to denote the bra $\langle i|{A}^{\dagger}$ corresponding to the ket $A|i\rangle =|Ai\rangle ,$ say,

${\left({A}^{\dagger}\right)}_{ij}=\langle i|{A}^{\dagger}|j\rangle =\langle Ai|j\rangle ={\langle j|Ai\rangle}^{\ast}={\langle j|A|i\rangle}^{\ast}={A}_{ji}^{\ast}.$

So *the adjoint operator is the transpose complex
conjugate*.

*Important*: for a *product*
of two operators (prove this!),

${(AB)}^{\dagger}={B}^{\dagger}{A}^{\dagger}.$
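The product rule is not a substitute for the proof, but it is easy to check on an example. The sketch below assumes matrices are stored as lists of rows of complex numbers; the two sample matrices and the helper names are arbitrary illustrative choices.

```python
def adjoint(A):
    """Transpose complex conjugate: element (i, j) is conj(A[j][i])."""
    return [[A[j][i].conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def mat_mul(A, B):
    """Matrix product (AB)_ij = sum_k A_ik B_kj."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[1 + 1j, 2 + 0j], [3j, 4 - 1j]]
B = [[0j, 1j], [1 + 0j, 2 + 2j]]

left = adjoint(mat_mul(A, B))                 # (AB)-dagger
right = mat_mul(adjoint(B), adjoint(A))       # B-dagger A-dagger, reversed order
wrong = mat_mul(adjoint(A), adjoint(B))       # A-dagger B-dagger, original order
```

For these matrices `left` and `right` agree element by element, whereas `wrong` does not, which is the reversal of order the formula asserts.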

**An operator equal to its adjoint, $A={A}^{\dagger}$, is called Hermitian**. As we shall find in the next lecture, Hermitian operators are of central importance in quantum mechanics. An operator equal to *minus* its adjoint, $A=-{A}^{\dagger}$, is *anti*-Hermitian (sometimes termed skew Hermitian). These two operator types are essentially generalizations of real and imaginary numbers: any operator can be expressed as a sum of a Hermitian operator and an anti-Hermitian operator,

$A={\textstyle \frac{1}{2}}(A+{A}^{\dagger})+{\textstyle \frac{1}{2}}(A-{A}^{\dagger}).$
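The decomposition can be illustrated with a short sketch, assuming a square matrix stored as a list of rows; the function names `hermitian_part` and `anti_hermitian_part` and the sample matrix are made up for the example.

```python
def adjoint(A):
    """Transpose complex conjugate of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]


def hermitian_part(A):
    """(A + A-dagger) / 2, which equals its own adjoint."""
    Ad = adjoint(A)
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, Ad)]


def anti_hermitian_part(A):
    """(A - A-dagger) / 2, which equals minus its own adjoint."""
    Ad = adjoint(A)
    return [[(a - b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(A, Ad)]


A = [[1 + 2j, 3 + 0j], [1j, 4 - 2j]]
H = hermitian_part(A)
K = anti_hermitian_part(A)
```

Note that the diagonal of `H` is real and the diagonal of `K` is purely imaginary, in line with the analogy to real and imaginary numbers.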

The definition of adjoint naturally extends to *vectors
and numbers*: the adjoint of a ket is the corresponding bra, the adjoint of
a number is its complex conjugate. This
is useful to bear in mind when taking the adjoint of an operator which may be
partially constructed of vectors and numbers, such as projection-type
operators. The adjoint of a product of
matrices, vectors and numbers is the product of the adjoints *in reverse
order*. (Of course, for numbers the
order doesn’t matter.)

### Unitary Operators

An operator is *unitary* if ${U}^{\dagger}U=1.$ This implies first that $U$ operating on any vector gives a vector having
the same norm, since the new norm $\langle V|{U}^{\dagger}U|V\rangle =\langle V|V\rangle $. Furthermore, inner products are preserved, $\langle W|{U}^{\dagger}U|V\rangle =\langle W|V\rangle .$ *Therefore, under a unitary transformation the
original orthonormal basis in the space must go to another orthonormal basis*.

Conversely, any transformation that takes one orthonormal
basis into another one is a unitary transformation. To see this, suppose that a linear
transformation *A* sends the members of
the orthonormal basis $\left({|1\rangle}_{1},{|2\rangle}_{1},\dots ,{|n\rangle}_{1}\right)$ to the different orthonormal set $\left({|1\rangle}_{2},{|2\rangle}_{2},\dots ,{|n\rangle}_{2}\right)$,
so $A{|1\rangle}_{1}={|1\rangle}_{2},$ etc.
Then the vector $|V\rangle ={\displaystyle \sum {v}_{i}{|i\rangle}_{1}}$ will go to $|{V}^{\prime}\rangle =A|V\rangle ={\displaystyle \sum {v}_{i}{|i\rangle}_{2}},$ having the same norm, $\langle {V}^{\prime}|{V}^{\prime}\rangle =\langle V|V\rangle ={\displaystyle \sum {\left|{v}_{i}\right|}^{2}}$. Similarly, the inner product $\langle {W}^{\prime}|{V}^{\prime}\rangle =\langle W|V\rangle ={\displaystyle \sum {w}_{i}^{\ast}{v}_{i}}$,
but also $\langle {W}^{\prime}|{V}^{\prime}\rangle =\langle W|{A}^{\dagger}A|V\rangle $.
That is, $\langle W|V\rangle =\langle W|{A}^{\dagger}A|V\rangle $ for *arbitrary*
kets $|V\rangle ,\text{\hspace{0.17em}}|W\rangle $. This is only possible if ${A}^{\dagger}A=I$,
so $A$ is unitary.

A unitary operation amounts to a rotation (possibly combined
with a reflection) in the space. Evidently,
since ${U}^{\dagger}U=1,$ the adjoint ${U}^{\dagger}$ rotates the basis back: it is the inverse
operation, and so $U{U}^{\dagger}=1$ also, that is, $U$ and ${U}^{\dagger}$ commute.
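These properties are easy to verify on a small example. The sketch below assumes a $2\times 2$ matrix whose columns are the orthonormal kets $\left(1,i\right)/\sqrt{2}$ and $\left(1,-i\right)/\sqrt{2}$; that choice, like the helper names and sample kets, is an illustrative assumption.

```python
import math


def adjoint(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def inner(w, v):
    return sum(a.conjugate() * b for a, b in zip(w, v))


s = 1 / math.sqrt(2)
U = [[s + 0j, s + 0j], [s * 1j, -s * 1j]]   # columns form an orthonormal basis

UdU = mat_mul(adjoint(U), U)                 # should be the unit matrix
UUd = mat_mul(U, adjoint(U))                 # likewise, in the other order

V = [1 + 2j, 3 - 1j]
W = [2j, 1 + 1j]
before = inner(W, V)
after = inner(mat_vec(U, W), mat_vec(U, V))  # <W|U-dagger U|V>
```

Up to rounding error, `after` reproduces `before`, so both norms and inner products survive the transformation.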

### Determinants

We review in this section the *determinant* of a
matrix, a function closely related to the operator properties of the matrix.

Let’s start with $2\times 2$ matrices:

$A=\left(\begin{array}{cc}{a}_{11}& {a}_{12}\\ {a}_{21}& {a}_{22}\end{array}\right).$

The determinant of this matrix is defined by:

$\mathrm{det}A=\text{\hspace{0.17em}}\left|A\right|\text{\hspace{0.17em}}={a}_{11}{a}_{22}-{a}_{12}{a}_{21}.$

Writing the two rows of the matrix as vectors:

$\begin{array}{l}{\overrightarrow{a}}_{1}^{R}=\left({a}_{11},{a}_{12}\right)\\ {\overrightarrow{a}}_{2}^{R}=\left({a}_{21},{a}_{22}\right)\end{array}$

( $R$ denotes row), $\mathrm{det}A=\text{\hspace{0.17em}}{\overrightarrow{a}}_{1}^{R}\times {\overrightarrow{a}}_{2}^{R}$ is just the *area* (with appropriate
sign) of the parallelogram having the two row vectors as adjacent sides.

This is zero if the two vectors are parallel (linearly dependent) and is not changed by adding any multiple of ${\overrightarrow{a}}_{2}^{R}$ to ${\overrightarrow{a}}_{1}^{R}$ (because the new parallelogram has the same base and the same height as the original—check this by drawing).

Let’s go on to the more interesting case of $3\times 3$ matrices:

$A=\left(\begin{array}{ccc}{a}_{11}& {a}_{12}& {a}_{13}\\ {a}_{21}& {a}_{22}& {a}_{23}\\ {a}_{31}& {a}_{32}& {a}_{33}\end{array}\right).$

The determinant of $A$ is defined as

$\mathrm{det}A={\epsilon}_{ijk}{a}_{1i}{a}_{2j}{a}_{3k}$

where ${\epsilon}_{ijk}=0$ if any two suffixes are equal, $+1$ if $ijk=123,\ 231\text{ or }312$ (that is to say, an even permutation of $123$) and $-1$ if $ijk$ is an odd permutation of $123.$ Repeated suffixes, of course, imply summation here.

Writing this out explicitly,

$\mathrm{det}A={a}_{11}{a}_{22}{a}_{33}+{a}_{21}{a}_{32}{a}_{13}+{a}_{31}{a}_{12}{a}_{23}-{a}_{11}{a}_{32}{a}_{23}-{a}_{21}{a}_{12}{a}_{33}-{a}_{31}{a}_{22}{a}_{13}.$

Just as in two dimensions, it’s worth looking at this expression in terms of vectors representing the rows of the matrix

$\begin{array}{l}{\overrightarrow{a}}_{1}^{R}=({a}_{11},{a}_{12},{a}_{13})\\ {\overrightarrow{a}}_{2}^{R}=({a}_{21},{a}_{22},{a}_{23})\\ {\overrightarrow{a}}_{3}^{R}=({a}_{31},{a}_{32},{a}_{33})\end{array}$

so

$A=\left(\begin{array}{c}{\overrightarrow{a}}_{1}^{R}\\ {\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{3}^{R}\end{array}\right)\text{, and we see that }\mathrm{det}A=({\overrightarrow{a}}_{1}^{R}\times {\overrightarrow{a}}_{2}^{R})\cdot {\overrightarrow{a}}_{3}^{R}.$

This is the volume of the parallelepiped having the three vectors as adjacent sides (meeting at one corner, the origin).

This parallelepiped volume will of course be zero if the
three vectors lie in a plane, and it is not changed if a multiple of one of the
vectors is added to another of the vectors.
That is to say, *the determinant of
a matrix is not changed if a multiple of one row is added to another row*. This is because the determinant is linear in
the elements of a single row,

$\mathrm{det}\left(\begin{array}{c}{\overrightarrow{a}}_{1}^{R}+\lambda {\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{3}^{R}\end{array}\right)=\mathrm{det}\left(\begin{array}{c}{\overrightarrow{a}}_{1}^{R}\\ {\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{3}^{R}\end{array}\right)+\lambda \mathrm{det}\left(\begin{array}{c}{\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{3}^{R}\end{array}\right)$

and the last term is zero because two rows are identical, so the scalar triple product vanishes.

A more general way of stating this, applicable to larger
determinants, is that for a determinant with two identical rows, the symmetry
of the two rows, together with the *antisymmetry* of ${\epsilon}_{ijk},$ ensures that the terms in the sum all cancel
in pairs.

Since the determinant is not altered by adding some multiple of one row to another, if the rows are linearly dependent, one row could be made identically zero by adding the right multiples of the other rows. Since every term in the expression for the determinant has one element from each row, the determinant would then be identically zero. For the three-dimensional case, the linear dependence of the rows means the corresponding vectors lie in a plane, and the parallelepiped is flat.

The algebraic argument generalizes easily to $n\times n$ determinants: they are *identically zero if the rows are linearly dependent*.

The generalization from $3\times 3$ to $n\times n$ determinants is that $\mathrm{det}A={\epsilon}_{ijk}{a}_{1i}{a}_{2j}{a}_{3k}$ becomes:

$\mathrm{det}A={\epsilon}_{ijk\dots p}{a}_{1i}{a}_{2j}{a}_{3k}\cdots {a}_{np}$

where $ijk\dots p$ is summed over all permutations of $123\dots n,$ and the $\epsilon $ symbol is zero if any two of its suffixes are
equal, $+1$ for an even permutation and $-1$ for an odd permutation. (*Note*: any permutation can be written as a product of swaps of
neighbors. Such a representation is in
general not unique, but for a given permutation, all such representations will
have either an odd number of swaps or an even number.)
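The permutation definition translates almost word for word into code. The following is a sketch for small matrices only (the sum has $n!$ terms, so it is not how one would compute a large determinant); the function names and the sample matrices are illustrative assumptions. The sign of each permutation is found by counting inversions, which has the same parity as any decomposition into neighbor swaps.

```python
from itertools import permutations


def parity_sign(perm):
    """+1 for an even permutation, -1 for an odd one, via the inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Sum over permutations p of sign(p) * a_{1 p(1)} * ... * a_{n p(n)}."""
    total = 0
    for perm in permutations(range(len(A))):
        term = parity_sign(perm)
        for row, col in enumerate(perm):
            term *= A[row][col]          # one element from each row
        total += term
    return total


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
# Add 5 times the second row to the first row.
A_shifted = [[7, 15, 11], [1, 3, 2], [1, 1, 4]]
# Second row is twice the first: linearly dependent rows.
A_dependent = [[1, 2, 3], [2, 4, 6], [0, 1, 5]]
```

For these inputs `det(A)` and `det(A_shifted)` agree, and `det(A_dependent)` vanishes, matching the two properties derived above.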

An important theorem is that for a product of two matrices $A,B$ the determinant of the product is the product of the determinants, $\mathrm{det}AB=\mathrm{det}A\times \mathrm{det}B.$ This can be verified by brute force for $2\times 2$ matrices, and a proof in the general case can be found in any book on mathematical physics (for example, Byron and Fuller).
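For completeness, the brute-force check for $2\times 2$ matrices can be written out as follows, with ${a}_{ij}$ and ${b}_{ij}$ denoting the elements of $A$ and $B$. The elements of the product are ${\left(AB\right)}_{ij}={a}_{i1}{b}_{1j}+{a}_{i2}{b}_{2j}$, so

$\mathrm{det}AB=({a}_{11}{b}_{11}+{a}_{12}{b}_{21})({a}_{21}{b}_{12}+{a}_{22}{b}_{22})-({a}_{11}{b}_{12}+{a}_{12}{b}_{22})({a}_{21}{b}_{11}+{a}_{22}{b}_{21}).$

On expanding, the terms ${a}_{11}{a}_{21}{b}_{11}{b}_{12}$ and ${a}_{12}{a}_{22}{b}_{21}{b}_{22}$ each appear once with each sign and cancel, leaving

$\mathrm{det}AB={a}_{11}{a}_{22}({b}_{11}{b}_{22}-{b}_{12}{b}_{21})-{a}_{12}{a}_{21}({b}_{11}{b}_{22}-{b}_{12}{b}_{21})=({a}_{11}{a}_{22}-{a}_{12}{a}_{21})({b}_{11}{b}_{22}-{b}_{12}{b}_{21}).$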

It can also be proved that if the *rows* are linearly independent, the determinant cannot be zero.

(*Here’s a proof*:
take an $n\times n$ matrix with the $n$ row vectors linearly independent. Now consider the components of those vectors
in the $n-1$ dimensional subspace perpendicular to $\left(1,0,\dots ,0\right).$ These $n$ vectors, each with only $n-1$ components, must be linearly dependent, since
there are more of them than the dimension of the space. So we can take some
combination of the rows below the first row and subtract it from the first row
to leave the first row $\left(a,0,\dots ,0\right),$ and $a$ cannot be zero since we have a matrix with $n$ linearly independent rows. We can then
subtract multiples of this first row from the other rows to get a determinant
having zeros in the first column below the first row. Now look at the $\left(n-1\right)\times \left(n-1\right)$ determinant to be multiplied by $a.$ Its rows must be linearly independent since
those of the original matrix were. Now
proceed by induction.)

To return to three dimensions, it is clear from the form of

$\mathrm{det}A={a}_{11}{a}_{22}{a}_{33}+{a}_{21}{a}_{32}{a}_{13}+{a}_{31}{a}_{12}{a}_{23}-{a}_{11}{a}_{32}{a}_{23}-{a}_{21}{a}_{12}{a}_{33}-{a}_{31}{a}_{22}{a}_{13}$

that we could equally have taken the *columns* of $A$ as three vectors, $A=({\overrightarrow{a}}_{1}^{C},{\overrightarrow{a}}_{2}^{C},{\overrightarrow{a}}_{3}^{C})$ in an obvious notation, $\mathrm{det}A=\text{\hspace{0.17em}}({\overrightarrow{a}}_{1}^{C}\times {\overrightarrow{a}}_{2}^{C})\cdot {\overrightarrow{a}}_{3}^{C}$,
and linear dependence among the *columns* will also ensure the vanishing
of the determinant—so, in fact, linear dependence of the columns ensures
linear dependence of the rows.

This, too, generalizes to $n\times n$:
in the definition of determinant $\mathrm{det}A={\epsilon}_{ijk\dots p}{a}_{1i}{a}_{2j}{a}_{3k}\cdots {a}_{np}$,
the row suffix is fixed and the column suffix goes over all permissible
permutations, with the appropriate sign—but the same terms would be
generated by having the *column*
suffixes kept in numerical order and allowing the row suffix to undergo the
permutations.

### An Aside: Reciprocal Lattice Vectors

It is perhaps worth mentioning how the inverse of a $3\times 3$ matrix operator can be understood in terms of vectors. For a set of linearly independent vectors $({\overrightarrow{a}}_{1},{\overrightarrow{a}}_{2},{\overrightarrow{a}}_{3})$, a reciprocal set $({\overrightarrow{b}}_{1},{\overrightarrow{b}}_{2},{\overrightarrow{b}}_{3})$ can be defined by

${\overrightarrow{b}}_{1}=\frac{{\overrightarrow{a}}_{2}\times {\overrightarrow{a}}_{3}}{{\overrightarrow{a}}_{1}\times {\overrightarrow{a}}_{2}\cdot {\overrightarrow{a}}_{3}}$

and the obvious cyclic definitions for the other two reciprocal vectors. We see immediately that

${\overrightarrow{a}}_{i}\cdot {\overrightarrow{b}}_{j}={\delta}_{ij}$

from which it follows that the inverse matrix to

$A=\left(\begin{array}{c}{\overrightarrow{a}}_{1}^{R}\\ {\overrightarrow{a}}_{2}^{R}\\ {\overrightarrow{a}}_{3}^{R}\end{array}\right)\text{ is }B=\left(\begin{array}{ccc}{\overrightarrow{b}}_{1}^{C}& {\overrightarrow{b}}_{2}^{C}& {\overrightarrow{b}}_{3}^{C}\end{array}\right).$

(These reciprocal vectors are important in *x*-ray crystallography, for example. If a crystalline lattice has certain atoms at
positions ${n}_{1}{\overrightarrow{a}}_{1}+{n}_{2}{\overrightarrow{a}}_{2}+{n}_{3}{\overrightarrow{a}}_{3}$,
where ${n}_{1},{n}_{2},{n}_{3}$ are
integers, the reciprocal vectors are the set of normals to possible planes of
the atoms, and these planes of atoms are the important elements in the
diffractive *x*-ray scattering.)
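As a quick numerical check of the statement above, here is a minimal sketch using NumPy. The function name `reciprocal_vectors` and the three sample vectors are our own illustrative choices, not part of the lecture; any linearly independent trio would do.

```python
import numpy as np

def reciprocal_vectors(a1, a2, a3):
    """Return b1, b2, b3 built from the cyclic cross-product definitions."""
    volume = np.dot(np.cross(a1, a2), a3)  # the triple product a1 x a2 . a3
    if np.isclose(volume, 0.0):
        raise ValueError("the vectors are linearly dependent")
    b1 = np.cross(a2, a3) / volume
    b2 = np.cross(a3, a1) / volume
    b3 = np.cross(a1, a2) / volume
    return b1, b2, b3

# Arbitrary linearly independent vectors, chosen only for illustration.
a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([1.0, 2.0, 0.0])
a3 = np.array([0.0, 1.0, 3.0])

A = np.vstack([a1, a2, a3])                          # rows are the a_i
B = np.column_stack(reciprocal_vectors(a1, a2, a3))  # columns are the b_j

print(np.allclose(A @ B, np.eye(3)))       # a_i . b_j = delta_ij
print(np.allclose(B, np.linalg.inv(A)))    # B agrees with the matrix inverse
```

The product `A @ B` is just the table of dot products ${\overrightarrow{a}}_{i}\cdot {\overrightarrow{b}}_{j}$, which is why rows of $A$ and columns of $B$ are the natural arrangement.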

### Eigenkets and Eigenvalues

If an operator $A$ operating on a ket $|V\rangle $ gives a multiple of the same ket,

$A|V\rangle \text{\hspace{0.17em}}=\lambda |V\rangle $

then $|V\rangle $ is said to be an *eigenket* (or, just as
often, *eigenvector*, or *eigenstate*!) of $A$ with *eigenvalue* $\lambda $.

Eigenkets and eigenvalues are of central importance in quantum mechanics: dynamical variables are operators, a physical measurement of a dynamical variable yields an eigenvalue of the operator, and forces the system into an eigenket.

In this section, we shall show how to find the eigenvalues and corresponding eigenkets for an operator $A.$ We’ll use the notation $A|{a}_{i}\rangle ={a}_{i}|{a}_{i}\rangle $ for the set of eigenkets $|{a}_{i}\rangle $ with corresponding eigenvalues ${a}_{i}.$ (Obviously, in the eigenvalue equation here the suffix $i$ is not summed over.)

The first step in solving $A|V\rangle \text{\hspace{0.17em}}=\lambda |V\rangle $ is to find the allowed eigenvalues ${a}_{i}.$

Writing the equation in matrix form,

$\left(\begin{array}{ccccc}{A}_{\text{\hspace{0.17em}}11}-\lambda & {A}_{\text{\hspace{0.17em}}12}& .& .& {A}_{\text{\hspace{0.17em}}1n}\\ {A}_{\text{\hspace{0.17em}}21}& {A}_{\text{\hspace{0.17em}}22}-\lambda & .& .& .\\ .& .& .& .& .\\ .& .& .& .& .\\ {A}_{\text{\hspace{0.17em}}n1}& .& .& .& {A}_{\text{\hspace{0.17em}}nn}-\lambda \end{array}\right)\left(\begin{array}{c}{v}_{1}\\ {v}_{2}\\ .\\ .\\ {v}_{n}\end{array}\right)=0.$

This equation is actually telling us that the *columns* of the matrix $A-\lambda I$ are linearly dependent! To see this, write the matrix as a row vector
each element of which is one of its columns, and the equation becomes

$({\overrightarrow{M}}_{1}^{C},{\overrightarrow{M}}_{2}^{C},\mathrm{...},{\overrightarrow{M}}_{n}^{C})\left(\begin{array}{c}{v}_{1}\\ .\\ .\\ .\\ {v}_{n}\end{array}\right)=0$

which is to say

${v}_{1}{\overrightarrow{M}}_{1}^{C}+{v}_{2}{\overrightarrow{M}}_{2}^{C}+\mathrm{...}+{v}_{n}{\overrightarrow{M}}_{n}^{C}=0,$

so (since the ${v}_{i}$ are not all zero) the columns of the matrix are indeed a *linearly dependent* set.

We know that means the determinant of the matrix $A-\lambda I$ is zero,

$\left|\begin{array}{ccccc}{A}_{\text{\hspace{0.17em}}11}-\lambda & {A}_{\text{\hspace{0.17em}}12}& .& .& {A}_{\text{\hspace{0.17em}}1n}\\ {A}_{\text{\hspace{0.17em}}21}& {A}_{\text{\hspace{0.17em}}22}-\lambda & .& .& .\\ .& .& .& .& .\\ .& .& .& .& .\\ {A}_{\text{\hspace{0.17em}}n1}& .& .& .& {A}_{\text{\hspace{0.17em}}nn}-\lambda \end{array}\right|=0.$

Evaluating the determinant using $\mathrm{det}A={\epsilon}_{ijk\mathrm{...}p}{a}_{1i}{a}_{2j}{a}_{3k}\mathrm{....}{a}_{np}$ gives an ${n}^{\text{th}}$ order polynomial in $\lambda $ sometimes called the *characteristic polynomial*.
Any polynomial can be written in terms of its roots:

$C(\lambda -{a}_{1})(\lambda -{a}_{2})\mathrm{....}(\lambda -{a}_{n})=0$

where the ${a}_{i}$’s are the roots of the polynomial, and *C* is an overall constant, which from
inspection of the determinant we can see to be ${\left(-1\right)}^{n}$ (it’s the coefficient of ${\lambda}^{n}$). The
polynomial roots (which we don’t yet know) are in fact the eigenvalues. For example, putting $\lambda ={a}_{1}$ in the matrix, $\mathrm{det}\left(A-{a}_{1}I\right)=0,$ which means that $\left(A-{a}_{1}I\right)|V\rangle =0$ has a nontrivial solution $|V\rangle ,$ and this is our eigenvector $|{a}_{1}\rangle .$

Notice that the diagonal term in the determinant $\left({A}_{11}-\lambda \right)\left({A}_{22}-\lambda \right)\dots \left({A}_{nn}-\lambda \right)$ generates the leading two orders in the polynomial, ${\left(-1\right)}^{n}\left({\lambda}^{n}-\left({A}_{11}+\dots +{A}_{nn}\right){\lambda}^{n-1}\right)$ (and some lower order terms too). Equating the coefficient of ${\lambda}^{n-1}$ here with that in ${\left(-1\right)}^{n}(\lambda -{a}_{1})(\lambda -{a}_{2})\mathrm{....}(\lambda -{a}_{n})$ gives

$\sum _{i=1}^{n}{a}_{i}=\sum _{i=1}^{n}{A}_{ii}=\text{Tr}A.$

Putting $\lambda =0$ in both the determinantal and the polynomial representations (in other words, equating the $\lambda $ -independent terms),

$\prod _{i=1}^{n}{a}_{i}=\mathrm{det}A.$

So we can find both the sum and the product of the eigenvalues directly from the matrix (its trace and its determinant), and for a $2\times 2$ matrix this is enough to solve the problem.
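For the $2\times 2$ case, knowing the sum and the product means the eigenvalues are the two roots of ${\lambda}^{2}-\left(\text{Tr}A\right)\lambda +\mathrm{det}A=0.$ A minimal sketch of that calculation follows; the helper name `eigenvalues_2x2` and the sample matrix are illustrative assumptions.

```python
import numpy as np

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant alone."""
    tr = A[0, 0] + A[1, 1]                        # sum of the eigenvalues
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # product of the eigenvalues
    # Roots of lambda^2 - tr*lambda + det = 0; complex sqrt covers every case.
    disc = np.sqrt(complex(tr * tr - 4 * det))
    return (tr + disc) / 2, (tr - disc) / 2

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)
print(np.isclose(lam1 + lam2, np.trace(A)), np.isclose(lam1 * lam2, np.linalg.det(A)))
```

Here the trace is 4 and the determinant is 3, so the eigenvalues are 3 and 1.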

For anything bigger, the method is to solve the polynomial equation $\mathrm{det}\left(A-\lambda I\right)=0$ to find the set of eigenvalues, then use them to calculate the corresponding eigenvectors. This is done one at a time.

Labeling the first eigenvalue found as ${a}_{1},$ the corresponding equation for the components ${v}_{i}$ of the eigenvector $|{a}_{1}\rangle $ is

$\left(\begin{array}{ccccc}{A}_{\text{\hspace{0.17em}}11}-{a}_{1}& {A}_{\text{\hspace{0.17em}}12}& .& .& {A}_{\text{\hspace{0.17em}}1n}\\ {A}_{\text{\hspace{0.17em}}21}& {A}_{\text{\hspace{0.17em}}22}-{a}_{1}& .& .& .\\ .& .& .& .& .\\ .& .& .& .& .\\ {A}_{\text{\hspace{0.17em}}n1}& .& .& .& {A}_{\text{\hspace{0.17em}}nn}-{a}_{1}\end{array}\right)\left(\begin{array}{c}{v}_{1}\\ {v}_{2}\\ .\\ .\\ {v}_{n}\end{array}\right)=0.$

This looks like $n$ equations for the $n$ numbers ${v}_{i},$ but it isn’t: remember the rows are linearly dependent, so there are only $n-1$ independent equations. However, that’s enough to determine the ratios of the vector components ${v}_{1},\dots ,{v}_{n};$ then finally the eigenvector is normalized. The process is then repeated for each eigenvalue. (Extra care is needed if the polynomial has coincident roots; we’ll discuss that case later.)
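The two-stage procedure just described (roots of the characteristic polynomial first, then one eigenvector per root) can be sketched numerically as below. This is an illustration under stated assumptions rather than a recommended algorithm: the function name is hypothetical, the roots are assumed distinct, and `np.poly` itself obtains the polynomial coefficients from a numerical eigenvalue routine, so it is used here only to mirror the steps in the text. The null vector of $A-\lambda I$ is taken from a singular value decomposition, which is one convenient way of solving the $n-1$ independent equations.

```python
import numpy as np

def eigen_from_characteristic_polynomial(A):
    """Eigenvalues as polynomial roots, then one normalized eigenvector per root."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    coefficients = np.poly(A)          # coefficients of det(lambda*I - A)
    eigenvalues = np.roots(coefficients)
    eigenvectors = []
    for lam in eigenvalues:
        # (A - lam*I) v = 0: v is the right singular vector belonging to the
        # smallest singular value, i.e. the conjugate of the last row of vh.
        _, _, vh = np.linalg.svd(A - lam * np.eye(n))
        v = vh[-1].conj()
        eigenvectors.append(v / np.linalg.norm(v))
    return eigenvalues, eigenvectors

# Sample symmetric matrix with distinct eigenvalues, chosen for illustration.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 4.0],
              [0.0, 4.0, 9.0]])
eigenvalues, eigenvectors = eigen_from_characteristic_polynomial(A)
for lam, v in zip(eigenvalues, eigenvectors):
    print(np.round(np.real(lam), 6), np.allclose(A @ v, lam * v))
```

In practice one would call a library eigenvalue routine directly; finding polynomial roots is numerically fragile for larger matrices.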

### Eigenvalues and Eigenstates of Hermitian Matrices

For a *Hermitian* matrix, it is easy to establish that the eigenvalues are always *real*. (*Note*:
A basic postulate of Quantum Mechanics, discussed in the next lecture, is that
physical observables are represented by Hermitian operators.) Taking (in this section) $A$ to be Hermitian, $A={A}^{\dagger},$ and labeling the eigenkets by the eigenvalue,
that is,

$A|{a}_{1}\rangle ={a}_{1}|{a}_{1}\rangle $

the inner product
with the bra $\langle {a}_{1}|$ gives $\langle {a}_{1}|A|{a}_{1}\rangle ={a}_{1}\langle {a}_{1}|{a}_{1}\rangle .$ But the
inner product of the *adjoint* equation
(remembering $A={A}^{\dagger}$)

$\langle {a}_{1}|A={a}_{1}^{\ast}\langle {a}_{1}|$

with $|{a}_{1}\rangle $ gives $\langle {a}_{1}|A|{a}_{1}\rangle ={a}_{1}^{\ast}\langle {a}_{1}|{a}_{1}\rangle ,$ so ${a}_{1}={a}_{1}^{\ast}$, and all the eigenvalues must be real.

They certainly don’t have to all be different: for example,
the unit matrix $I$ is Hermitian, and all its eigenvalues are of
course $1.$ But let’s first consider the case where they *are* all different.

It’s easy to show that the eigenkets belonging to *different*
eigenvalues are *orthogonal*.

If

$\begin{array}{l}A|{a}_{1}\rangle \text{\hspace{0.17em}}={a}_{\text{\hspace{0.05em}}1}|{a}_{1}\rangle \\ A|{a}_{2}\rangle \text{\hspace{0.17em}}={a}_{2}|{a}_{2}\rangle ,\end{array}$

take the adjoint of the first equation and then the inner product with $|{a}_{2}\rangle $, and compare it with the inner product of the second equation with $\langle {a}_{1}|$:

$\langle {a}_{1}|A|{a}_{2}\rangle \text{\hspace{0.17em}}={a}_{1}\langle {a}_{1}|{a}_{2}\rangle \text{\hspace{0.17em}}={a}_{2}\langle {a}_{1}|{a}_{2}\rangle $

so $\langle {a}_{1}|{a}_{2}\rangle =0$ unless the eigenvalues are equal. (If they *are*
equal, they are referred to as *degenerate*
eigenvalues.)

Let’s first consider the nondegenerate case: $A$ has all eigenvalues distinct. The eigenkets of $A,$ appropriately normalized, form an orthonormal basis in the space.

Write

$|{a}_{1}\rangle =\left(\begin{array}{c}{v}_{11}\\ {v}_{21}\\ \vdots \\ {v}_{n1}\end{array}\right),\text{ and consider the matrix }V=\left(\begin{array}{cccc}{v}_{11}& {v}_{12}& \cdots & {v}_{1n}\\ {v}_{21}& {v}_{22}& \cdots & {v}_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ {v}_{n1}& {v}_{n2}& \cdots & {v}_{nn}\end{array}\right)=\left(\begin{array}{cccc}|{a}_{1}\rangle & |{a}_{2}\rangle & \cdots & |{a}_{n}\rangle \end{array}\right).$

Now

$AV=A\left(\begin{array}{cccc}|{a}_{1}\rangle & |{a}_{2}\rangle & \cdots & |{a}_{n}\rangle \end{array}\right)=\left(\begin{array}{cccc}{a}_{1}|{a}_{1}\rangle & {a}_{2}|{a}_{2}\rangle & \cdots & {a}_{n}|{a}_{n}\rangle \end{array}\right)$

so

${V}^{\dagger}AV=\left(\begin{array}{c}\langle {a}_{1}|\\ \langle {a}_{2}|\\ \vdots \\ \langle {a}_{n}|\end{array}\right)\left(\begin{array}{cccc}{a}_{1}|{a}_{1}\rangle & {a}_{2}|{a}_{2}\rangle & \cdots & {a}_{n}|{a}_{n}\rangle \end{array}\right)=\left(\begin{array}{cccc}{a}_{1}& 0& \cdots & 0\\ 0& {a}_{2}& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & {a}_{n}\end{array}\right).$

Note also that, obviously, $V$ is unitary:

${V}^{\dagger}V=\left(\begin{array}{c}\langle {a}_{1}|\\ \langle {a}_{2}|\\ \vdots \\ \langle {a}_{n}|\end{array}\right)\left(\begin{array}{cccc}|{a}_{1}\rangle & |{a}_{2}\rangle & \cdots & |{a}_{n}\rangle \end{array}\right)=\left(\begin{array}{cccc}1& 0& \cdots & 0\\ 0& 1& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & 1\end{array}\right).$

We have established, then, that for a Hermitian matrix with
distinct eigenvalues (nondegenerate case), the unitary matrix $V$ having columns identical to the normalized
eigenkets of $A$ *diagonalizes*
$A,$ that is, ${V}^{\dagger}AV$ is diagonal.
Furthermore, its (diagonal) elements equal the corresponding eigenvalues
of $A.$

Another way of saying this is that the unitary matrix $V$ is the transformation from the original orthonormal basis in the space to the basis formed of the normalized eigenkets of $A.$
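The statements of this section are easy to confirm numerically. The sketch below uses NumPy's `numpy.linalg.eigh`, which returns the (real) eigenvalues of a Hermitian matrix together with a matrix whose columns are the normalized eigenkets; the sample matrix is an arbitrary Hermitian matrix chosen for illustration.

```python
import numpy as np

# An arbitrary 3x3 Hermitian matrix (illustrative values only).
A = np.array([[2.0, 1.0 - 1.0j, 0.0],
              [1.0 + 1.0j, 3.0, 1.0j],
              [0.0, -1.0j, 1.0]])
assert np.allclose(A, A.conj().T)        # A equals its adjoint

eigenvalues, V = np.linalg.eigh(A)       # columns of V are the eigenkets |a_i>
D = V.conj().T @ A @ V                   # V-dagger A V

print(eigenvalues.dtype)                         # a real floating-point type
print(np.allclose(V.conj().T @ V, np.eye(3)))    # V is unitary
print(np.allclose(D, np.diag(eigenvalues)))      # V-dagger A V is diagonal
```

The diagonal entries of `D` appear in the same order as the columns of `V`, matching the correspondence between eigenkets and eigenvalues in the text.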

### Proof that the Eigenvectors of a Hermitian Matrix Span the Space

We’ll now move on to the general case: what if some of the
eigenvalues of $A$ are the same?
In this case, any linear combination of eigenvectors sharing the same eigenvalue is also an eigenvector with
that eigenvalue. *Assuming they form a basis in the subspace*,
the Gram Schmidt procedure can be used to make it orthonormal, and so part of
an orthonormal basis of the whole space.

However, we have not actually established that the
eigenvectors *do* form a basis in a
degenerate subspace. Could it be that
(to take the simplest case) the two eigenvectors for the single eigenvalue turn
out to be parallel? This is actually the
case for some 2×2 matrices, for example $\left(\begin{array}{cc}1& 1\\ 0& 1\end{array}\right)$.
We need to prove that it is *not* true for
Hermitian matrices, and that the analogous statements for higher-dimensional
degenerate subspaces are not true either.
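To see the difficulty concretely, one can count the linearly independent eigenvectors belonging to a given eigenvalue $\lambda $: it is $n$ minus the rank of $A-\lambda I.$ The sketch below (the helper name is our own) compares the non-Hermitian example above with the unit matrix, which has the same doubly repeated eigenvalue.

```python
import numpy as np

def independent_eigenvectors(M, eigenvalue):
    """Number of linearly independent eigenvectors of M for this eigenvalue."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - eigenvalue * np.eye(n))

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # not Hermitian; eigenvalue 1 twice
I2 = np.eye(2)               # Hermitian; eigenvalue 1 twice

print(independent_eigenvectors(J, 1.0))    # only one direction survives
print(independent_eigenvectors(I2, 1.0))   # the full two-dimensional space
```

For the first matrix the count is 1, so its eigenvectors cannot span the plane; for the unit matrix it is 2.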

A clear presentation is given in Byron and Fuller, section 4.7. We follow it here. The procedure is by induction from the $2\times 2$ case. The general $2\times 2$ Hermitian matrix has the form

$\left(\begin{array}{cc}a& b\\ {b}^{\ast}& c\end{array}\right)$

where $a,c$ are real.
It is easy to check that if the eigenvalues are degenerate, this matrix
becomes a real multiple of the identity, and so trivially has two orthonormal
eigenvectors. Since we already know that
if the eigenvalues of a $2\times 2$ Hermitian matrix are distinct it can be
diagonalized by the unitary transformation formed from its orthonormal eigenvectors,
we have established that *any* $2\times 2$ Hermitian matrix can be so diagonalized.

To carry out the induction process, we now assume any $\left(n-1\right)\times \left(n-1\right)$ Hermitian matrix can be diagonalized by a unitary transformation. We need to prove this means it’s also true for an $n\times n$ Hermitian matrix $A.$ (Recall a unitary transformation takes one complete orthonormal basis to another. If it diagonalizes a Hermitian matrix, the new basis is necessarily the set of orthonormalized eigenvectors. Hence, if the matrix can be diagonalized, the eigenvectors do span the $n$ -dimensional space.)

Choose an eigenvalue ${a}_{1}$ of $A,$ with normalized eigenvector $|{a}_{1}\rangle ={\left({v}_{11},{v}_{21},\dots ,{v}_{n1}\right)}^{T}.$ (We put in $T$ for transpose, to save the awkwardness of filling the page with a few column vectors.) We construct a unitary operator $V$ by making this the first column, then filling in with $n-1$ other normalized vectors to construct, with $|{a}_{1}\rangle $, an $n$ -dimensional orthonormal basis.

Now, since $A|{a}_{1}\rangle ={a}_{1}|{a}_{1}\rangle $,
the first column of the matrix $AV$ will just be ${a}_{1}|{a}_{1}\rangle $,
and the *rows* of the matrix ${V}^{\dagger}={V}^{-1}$ will be $\langle {a}_{1}|$ followed by $n-1$ normalized vectors orthogonal to it, so the
first column of the matrix ${V}^{\dagger}AV$ will be ${a}_{1}$ followed by zeros. It is easy to check that ${V}^{\dagger}AV$ is Hermitian, since $A$ is, so its first row is also zero beyond the
first diagonal term.

This establishes that for an $n\times n$ Hermitian matrix, a unitary transformation exists to put it in the form:

${V}^{\dagger}AV=\left(\begin{array}{ccccc}{a}_{1}& 0& .& .& 0\\ 0& {M}_{22}& .& .& {M}_{2n}\\ 0& .& .& .& .\\ 0& .& .& .& .\\ 0& {M}_{n2}& .& .& {M}_{nn}\end{array}\right).$

But we can now perform a second unitary transformation in the $\left(n-1\right)\times \left(n-1\right)$ subspace orthogonal to $|{a}_{1}\rangle $ (this of course leaves $|{a}_{1}\rangle $ invariant), to complete the full diagonalization -- that is to say, the existence of the $\left(n-1\right)\times \left(n-1\right)$ diagonalization, plus the argument above, guarantees the existence of the $n\times n$ diagonalization: the induction is complete.
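A single step of this induction can be imitated numerically. In the sketch below (function name and sample matrix are illustrative assumptions) we take one eigenvector, complete it to an orthonormal basis, and check that the transformed matrix has the bordered form shown above. Two shortcuts are assumptions of the sketch, not of the proof: the eigenvector is taken from `numpy.linalg.eigh`, and the basis is completed by a QR factorization, which returns the chosen eigenvector as the first column up to an overall phase.

```python
import numpy as np

def first_induction_step(A):
    """Return a1, a unitary V with first column |a1>, and V-dagger A V."""
    n = A.shape[0]
    eigenvalues, kets = np.linalg.eigh(A)
    a1, ket1 = eigenvalues[0], kets[:, 0]
    # QR of [ |a1>, e_1, ..., e_n ]: Q is an n x n unitary matrix whose first
    # column is |a1> times a phase, followed by n-1 orthonormal vectors.
    V, _ = np.linalg.qr(np.column_stack([ket1, np.eye(n)]))
    return a1, V, V.conj().T @ A @ V

A = np.array([[2.0, 1.0 - 1.0j, 0.0],
              [1.0 + 1.0j, 3.0, 1.0j],
              [0.0, -1.0j, 1.0]])
a1, V, T = first_induction_step(A)

print(np.isclose(T[0, 0], a1))                     # top-left entry is a1
print(np.allclose(T[0, 1:], 0), np.allclose(T[1:, 0], 0))
M = T[1:, 1:]                                      # remaining (n-1) x (n-1) block
print(np.allclose(M, M.conj().T))                  # still Hermitian
```

Repeating the same step on the block `M` is exactly the second unitary transformation described in the text.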

### Diagonalizing a Hermitian Matrix

As discussed above, a Hermitian matrix is diagonal in the orthonormal basis of its set of eigenvectors: $|{a}_{1}\rangle ,|{a}_{2}\rangle ,\dots ,|{a}_{n}\rangle $, since

$\langle {a}_{i}|A|{a}_{j}\rangle =\langle {a}_{i}|{a}_{j}|{a}_{j}\rangle ={a}_{j}\langle {a}_{i}|{a}_{j}\rangle ={a}_{j}{\delta}_{ij}.$

If we are given the matrix elements of $A$ in some other orthonormal basis, to diagonalize it we need to rotate from the initial orthonormal basis to one made up of the eigenkets of $A.$

Denoting the initial orthonormal basis in the standard fashion

$|1\rangle =\left(\begin{array}{c}1\\ 0\\ 0\\ .\\ 0\end{array}\right),\text{\hspace{1em}}|2\rangle =\left(\begin{array}{c}0\\ 1\\ 0\\ .\\ 0\end{array}\right),\text{\hspace{1em}}|i\rangle =\left(\begin{array}{c}0\\ \vdots \\ 1\\ \vdots \\ 0\end{array}\right)\dots \text{ (1 in }{i}^{\text{th}}\text{ place down),}\text{\hspace{1em}}|n\rangle =\left(\begin{array}{c}0\\ 0\\ 0\\ .\\ 1\end{array}\right),$

the elements of the matrix are ${A}_{\text{\hspace{0.17em}}ij}=\text{\hspace{0.17em}}\langle i|A|j\rangle $.

A transformation from one orthonormal basis to another is a *unitary* transformation, as discussed
above, so we write it

$|V\rangle \text{\hspace{0.17em}}\to \text{\hspace{0.17em}}|{V}^{\prime}\rangle \text{\hspace{0.17em}}=U|V\rangle .$

Under this transformation, the matrix element

$\langle W|A|V\rangle \text{\hspace{0.17em}}\to \text{\hspace{0.17em}}\langle {W}^{\prime}|A|{V}^{\prime}\rangle \text{\hspace{0.17em}}=\text{\hspace{0.17em}}\langle W|{U}^{\dagger}AU|V\rangle .$

So we can find the appropriate
transformation matrix $U$ by requiring that ${U}^{\dagger}AU$ be diagonal with respect to the *original* set of basis vectors. (Transforming the operator in this way,
leaving the vector space alone, is equivalent to rotating the vector space and
leaving the operator alone. Of course,
in a system with more than one operator, the same transformation would have to
be applied to all the operators.)

In fact, just as we discussed for the nondegenerate (distinct eigenvalues) case, the unitary matrix $U$ we need is just composed of the normalized eigenkets of the operator $A,$

$U=\left(|{a}_{1}\rangle ,|{a}_{2}\rangle ,\mathrm{...},|{a}_{n}\rangle \right)$

and it follows as before that

${\left({U}^{\dagger}AU\right)}_{ij}=\langle {a}_{i}|{a}_{j}|{a}_{j}\rangle ={\delta}_{ij}{a}_{j},\text{ a diagonal matrix.}$

(The repeated suffixes here are of course *not* summed over.)

If some of the eigenvalues are the same, the Gram Schmidt procedure may be needed to generate an orthogonal set, as mentioned earlier.

### Functions of Matrices

The same unitary operator $U$ that diagonalizes an Hermitian matrix $A$ will also diagonalize ${A}^{2}$ because

${U}^{-1}{A}^{2}U={U}^{-1}AAU={U}^{-1}AU{U}^{-1}AU$

so

${U}^{\dagger}{A}^{2}U=\left(\begin{array}{ccccc}{a}_{1}^{2}& 0& 0& .& 0\\ 0& {a}_{2}^{2}& 0& .& 0\\ 0& 0& {a}_{3}^{2}& .& 0\\ .& .& .& .& .\\ 0& .& .& .& {a}_{n}^{2}\end{array}\right).$

Evidently, this same process works for any power of $A,$ and formally for any function of $A$ expressible as a power series, but of course convergence properties need to be considered, and this becomes trickier on going from finite matrices to operators on infinite spaces.
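For a finite Hermitian matrix this gives a practical recipe: apply the function to the eigenvalues and transform back, $f(A)=U\,f(D)\,{U}^{\dagger}.$ The sketch below (helper names and the sample matrix are our own) compares this with a truncated power series for the exponential.

```python
import numpy as np

def matrix_function(A, f):
    """f(A) for Hermitian A: apply f to the eigenvalues, then rotate back."""
    eigenvalues, U = np.linalg.eigh(A)
    return U @ np.diag(f(eigenvalues)) @ U.conj().T

def exp_power_series(A, terms=30):
    """Partial sum of the series sum_k A^k / k!, for comparison only."""
    result = np.zeros_like(A, dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)    # A^0 / 0!
    for k in range(terms):
        result = result + term
        term = term @ A / (k + 1)               # next term, A^(k+1) / (k+1)!
    return result

A = np.array([[1.0, 0.5j],
              [-0.5j, 2.0]])                    # a small Hermitian example

print(np.allclose(matrix_function(A, np.exp), exp_power_series(A)))
print(np.allclose(matrix_function(A, lambda x: x**2), A @ A))
```

Thirty terms are ample here because the eigenvalues are of order unity; for matrices with large eigenvalues the series converges slowly, which is one reason the eigenvalue route is preferred.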

### Commuting Hermitian Matrices

From the above, the set of powers of
an Hermitian matrix all commute with each other, and have a common set of
eigenvectors (but not the same eigen*values*,
obviously). In fact it is not difficult
to show that any two Hermitian matrices that commute with each other have the
same set of eigenvectors (after possible Gram Schmidt rearrangements in
degenerate subspaces).

If two $n\times n$ Hermitian matrices $A,B$ commute, that is, $AB=BA,$ and $A$ has a *nondegenerate* set of eigenvectors
$A|{a}_{i}\rangle \text{\hspace{0.05em}}={a}_{i}|{a}_{i}\rangle ,$ then $AB|{a}_{i}\rangle \text{\hspace{0.17em}}=BA|{a}_{i}\rangle \text{\hspace{0.17em}}=B{a}_{i}|{a}_{i}\rangle ={a}_{i}B|{a}_{i}\rangle ,$ that is, $B|{a}_{i}\rangle $ is an eigenvector of $A$ with eigenvalue ${a}_{i}.$ Since $A$ is nondegenerate, $B|{a}_{i}\rangle $ must be some multiple of $|{a}_{i}\rangle $,
and we conclude that $A,B$ have the same set of eigenvectors.

Now suppose $A$ *is* degenerate, and consider the $m$-dimensional subspace ${S}_{{a}_{i}}$ spanned by the eigenvectors $|{a}_{i},1\rangle ,|{a}_{i},2\rangle ,\dots $ of $A$ having eigenvalue ${a}_{i}.$ Applying the argument in the paragraph above,
$B|{a}_{i},1\rangle ,\text{\hspace{0.17em}}B|{a}_{i},2\rangle ,\dots $ must also lie in this subspace. Therefore, if
we transform $B$ with the same unitary transformation that
diagonalized $A,$ $B$ will not in general be diagonal in the
subspace ${S}_{{a}_{i}},$ but it will be what is termed *block
diagonal*, in that if $B$ operates on any vector in ${S}_{{a}_{i}}$ it gives a vector in ${S}_{{a}_{i}}$.

$B$ can
be written as two diagonal blocks: one $m\times m$,
one $\left(n-m\right)\times \left(n-m\right),$ with zeroes outside these diagonal blocks, for
example, for $m=2,\text{\hspace{0.17em}}n=5:$

$\left(\begin{array}{cc}\begin{array}{cc}{b}_{11}& {b}_{12}\\ {b}_{21}& {b}_{22}\end{array}& \begin{array}{c}\begin{array}{ccc}0& 0& 0\end{array}\\ \begin{array}{ccc}0& 0& 0\end{array}\end{array}\\ \begin{array}{c}\begin{array}{cc}0& 0\end{array}\\ \begin{array}{cc}0& 0\end{array}\\ \begin{array}{cc}0& 0\end{array}\end{array}& \begin{array}{ccc}{b}_{33}& {b}_{34}& {b}_{35}\\ {b}_{43}& {b}_{44}& {b}_{45}\\ {b}_{53}& {b}_{54}& {b}_{55}\end{array}\end{array}\right).$

And, in fact, if there is only *one*
degenerate eigenvalue that second block will only have nonzero terms on the
diagonal:

$\left(\begin{array}{cc}\begin{array}{cc}{b}_{11}& {b}_{12}\\ {b}_{21}& {b}_{22}\end{array}& \begin{array}{c}\begin{array}{ccc}0& 0& 0\end{array}\\ \begin{array}{ccc}0& 0& 0\end{array}\end{array}\\ \begin{array}{c}\begin{array}{cc}0& 0\end{array}\\ \begin{array}{cc}0& 0\end{array}\\ \begin{array}{cc}0& 0\end{array}\end{array}& \begin{array}{ccc}{b}_{3}& 0& 0\\ 0& {b}_{4}& 0\\ 0& 0& {b}_{5}\end{array}\end{array}\right).$

*$B$* therefore
operates on two subspaces, one $m$ -dimensional, one $\left(n-m\right)$ -dimensional, *independently*—a vector
entirely in one subspace stays there.

This means we can complete the diagonalization
of $B$ with a unitary operator that *only* operates
on the $m\times m$ block ${S}_{{a}_{i}}$. Such an operator will also affect the
eigenvectors of $A,$ but that doesn’t matter, because all vectors
in this subspace are eigenvectors of $A$ with the *same* eigenvalue, so as far as $A$ is concerned, we can choose any orthonormal
basis we like—the basis vectors will still be eigenvectors.

This establishes that any two *commuting*
Hermitian matrices can be diagonalized at the same time. Obviously, this can
never be true of *noncommuting*
matrices, since all diagonal matrices commute.
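The argument of this section translates into a short procedure: diagonalize $A,$ transform $B$ to that basis (where it is block diagonal), then diagonalize $B$ separately inside each degenerate block of $A.$ The sketch below is one way of doing this under stated assumptions: the function names, the tolerance used to decide that two eigenvalues coincide, and the sample matrices are all illustrative.

```python
import numpy as np

def simultaneously_diagonalize(A, B, tol=1e-9):
    """Unitary matrix diagonalizing two commuting Hermitian matrices A and B."""
    n = A.shape[0]
    a, U = np.linalg.eigh(A)          # eigenvalues of A in ascending order
    B_rot = U.conj().T @ B @ U        # block diagonal if AB = BA
    W = np.zeros((n, n), dtype=complex)
    start = 0
    while start < n:
        stop = start + 1
        while stop < n and abs(a[stop] - a[start]) < tol:
            stop += 1                 # extend the degenerate block of A
        # Diagonalize B inside this block; A is a multiple of 1 there.
        _, w = np.linalg.eigh(B_rot[start:stop, start:stop])
        W[start:stop, start:stop] = w
        start = stop
    return U @ W

def is_diagonal(M):
    return np.allclose(M, np.diag(np.diag(M)))

# Build a commuting pair: A has the eigenvalue 1 twice (sample values only).
c, s = 0.6, 0.8
Q = np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])
A = Q @ np.diag([1.0, 1.0, 2.0]) @ Q.T
B = Q @ np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 3.0]]) @ Q.T

U = simultaneously_diagonalize(A, B)
print(np.allclose(A @ B, B @ A))
print(is_diagonal(U.conj().T @ A @ U), is_diagonal(U.conj().T @ B @ U))
```

The second unitary factor `W` acts only within the degenerate subspaces of $A,$ which is why it leaves the diagonal form of $A$ undisturbed.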

### Diagonalizing a Unitary Matrix

Any unitary matrix can be diagonalized by a unitary
transformation. To see this, recall that
*any* matrix *M* can be written as a sum of a Hermitian matrix and an anti-Hermitian
matrix,

$M=\frac{M+{M}^{\dagger}}{2}+\frac{M-{M}^{\dagger}}{2}=A+iB$

where both *A*, *B* are Hermitian. This is the matrix analogue of writing an
arbitrary complex number as a sum of real and imaginary parts.

If $A,B$ commute, they can be simultaneously
diagonalized (see the previous section), and therefore $M$ can be diagonalized. Now, if a *unitary*
matrix is expressed in this form $U=A+iB$ with $A,B$ Hermitian, it follows from $U{U}^{\dagger}={U}^{\dagger}U=1$ that $A,B$ commute (expanding the products gives $U{U}^{\dagger}-{U}^{\dagger}U=-2i\left(AB-BA\right)$), so *any unitary matrix $U$ can be diagonalized by a unitary
transformation.* More generally, if a
matrix $M$ commutes with its adjoint ${M}^{\dagger}$,
it can be diagonalized.

(*Note*: it is *not* possible to diagonalize $M$ unless both $A$ and $B$ are simultaneously diagonalized. This follows from ${U}^{\dagger}AU,\text{\hspace{0.17em}}{U}^{\dagger}iBU$ being Hermitian and anti-Hermitian respectively for any
unitary operator $U,$ so their off-diagonal elements cannot cancel
each other: they must all be zero if $M$ has been diagonalized by $U.$ In that case the two transformed matrices ${U}^{\dagger}AU,\text{\hspace{0.17em}}{U}^{\dagger}iBU$ are diagonal and therefore commute, and so do the
original matrices $A,B.$)
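The decomposition $U=A+iB$ and the commutation of its two parts can be checked on any unitary matrix. As an illustrative choice (not taken from the lecture) the sketch below uses the $3\times 3$ discrete Fourier matrix, which is unitary.

```python
import numpy as np

n = 3
omega = np.exp(2j * np.pi / n)
# Discrete Fourier matrix, used here only as a convenient unitary example.
U = np.array([[omega ** (j * k) for k in range(n)] for j in range(n)]) / np.sqrt(n)

A = (U + U.conj().T) / 2      # Hermitian part
B = (U - U.conj().T) / 2j     # anti-Hermitian part divided by i, so B is Hermitian

print(np.allclose(U @ U.conj().T, np.eye(n)))                    # U is unitary
print(np.allclose(A, A.conj().T), np.allclose(B, B.conj().T))    # both Hermitian
print(np.allclose(A + 1j * B, U))                                # U = A + iB
print(np.allclose(A @ B, B @ A))                                 # A and B commute
```

Replacing `U` by a matrix that does not commute with its adjoint makes the last check fail, in line with the note above.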

It is worthwhile looking at a specific example, a simple rotation of one orthonormal basis into another in three dimensions. Obviously, the axis through the origin about which the basis is rotated is an eigenvector of the transformation. It’s less clear what the other two eigenvectors might be—or, equivalently, what are the eigenvectors corresponding to a two dimensional rotation of basis in a plane? The way to find out is to write down the matrix and diagonalize it.

For a rotation of basis through an angle $\theta $ in a plane, the matrix is

$U(\theta )=\left(\begin{array}{cc}\mathrm{cos}\theta & \mathrm{sin}\theta \\ -\mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right).$

Note that the determinant is equal to unity. The eigenvalues are given by solving

$\left|\begin{array}{cc}\mathrm{cos}\theta -\lambda & \mathrm{sin}\theta \\ -\mathrm{sin}\theta & \mathrm{cos}\theta -\lambda \end{array}\right|=0\text{ to give }\lambda ={e}^{\pm i\theta}\text{.}$

The corresponding eigenvectors satisfy

$\left(\begin{array}{cc}\mathrm{cos}\theta & \mathrm{sin}\theta \\ -\mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right)\left(\begin{array}{c}{u}_{1}^{\pm}\\ {u}_{2}^{\pm}\end{array}\right)={e}^{\pm i\theta}\left(\begin{array}{c}{u}_{1}^{\pm}\\ {u}_{2}^{\pm}\end{array}\right)\text{.}$

The eigenvectors, normalized, are:

$\left(\begin{array}{c}{u}_{1}^{\pm}\\ {u}_{2}^{\pm}\end{array}\right)=\frac{1}{\sqrt{2}}\left(\begin{array}{c}1\\ \pm \text{\hspace{0.17em}}i\end{array}\right).$

Note that, in contrast to a Hermitian matrix, the eigenvalues of a unitary matrix do not have to be real. In fact, from ${U}^{\dagger}U=1$, sandwiched between the bra and ket of an eigenvector, we see that any eigenvalue of a unitary matrix must have unit modulus: it’s a complex number on the unit circle. With hindsight, we should have realized that one eigenvalue of a two-dimensional rotation had to be ${e}^{i\theta}$: the product of two two-dimensional rotations is given by adding the angles of rotation, and a rotation through $\pi $ changes all signs, so has eigenvalue $-1.$ Note that the eigenvector itself is independent of the angle of rotation: in two dimensions, the rotations all commute, so they must have common eigenvectors. When successive rotation operators are applied to the plus eigenvector, their angles add in the phase; when they are applied to the minus eigenvector, all the angles are subtracted.
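These properties of the rotation matrix are simple to verify numerically. In the sketch below the function name `rotation` and the two sample angles are illustrative choices.

```python
import numpy as np

def rotation(theta):
    """The 2x2 matrix U(theta) written above."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s],
                     [-s, c]])

u_plus = np.array([1.0, 1.0j]) / np.sqrt(2)
u_minus = np.array([1.0, -1.0j]) / np.sqrt(2)

theta, phi = 0.3, 1.1    # arbitrary sample angles

# Eigenvalue equations with eigenvalues exp(+i theta) and exp(-i theta).
print(np.allclose(rotation(theta) @ u_plus, np.exp(1j * theta) * u_plus))
print(np.allclose(rotation(theta) @ u_minus, np.exp(-1j * theta) * u_minus))

# Successive rotations: the angles add for u_plus and subtract for u_minus.
both = rotation(phi) @ rotation(theta)
print(np.allclose(both @ u_plus, np.exp(1j * (theta + phi)) * u_plus))
print(np.allclose(both @ u_minus, np.exp(-1j * (theta + phi)) * u_minus))

# The two eigenvectors are orthogonal, and the rotations commute.
print(np.isclose(np.vdot(u_plus, u_minus), 0.0))
print(np.allclose(rotation(phi) @ rotation(theta), rotation(theta) @ rotation(phi)))
```

The same two vectors work for every angle, which is the numerical counterpart of the remark that commuting rotations share their eigenvectors.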