## The Dirac equation (3): heuristic derivation

In Section 3 we introduced the Klein-Gordon equation $\partial_t^2 \psi - \Delta \psi + m^2 \psi = 0$ (formula (3.2)). Although it is a fundamental equation of relativistic quantum mechanics, it does not really fit the framework of quantum theory: the wave function $\psi(t)$ does not completely describe the state of a particle at time $t$. This is because the Klein-Gordon equation is a second-order differential equation in the time variable (unlike the Schrödinger equation). In this post we discuss a way to take a ‘square-root’ of the Klein-Gordon equation.

## 6. Drawbacks of the Klein-Gordon equation

As Paul Dirac wrote in his article *The Quantum Theory of the Electron*, there are at least two difficulties with the Klein-Gordon equation. The first problem was already mentioned in the introduction: a second-order differential equation requires too much initial data; $\psi(0)$ alone does not completely determine the solution. Indeed, it is clear from the explicit formula for the solution (see formula (3.6)) that uniqueness requires both $\psi(0)$ and $\partial_t \psi(0)$. In the standard quantum interpretation, the probability of finding the particle in a domain $E$ at a given time is equal to $\int_E |\psi(t, \vec{x})|^2 \mathrm{d}\vec{x}$. This definition makes sense for the Klein-Gordon equation. However, it is difficult to express dynamical variables, such as momentum or angular momentum, in a similar way. As mentioned above, these problems would not appear if the evolution equation were of first order in time, because then the initial data $\psi(0)$ would completely determine the solution. And since time and space are, in a sense, symmetric in special relativity, we expect the better equation to be of first order in the spatial variables as well.

The other problem is related to the charge-time symmetry observed in Section 3: the solutions of the Klein-Gordon equation can be decomposed into positive/negative energy parts $\psi_\pm$, and the evolution equation of each part is the time reversal of the evolution equation of the other. Later we will see that these two parts correspond to opposite electric charges and, in some sense, opposite energies (hence the names). More accurately, these two families of solutions correspond to a particle and its anti-particle. We expect that the solutions of the better equation will describe the motion of just one particle, and that its anti-particle should have its own, different equation. One could try to resolve this issue by restricting the Hilbert space of admissible solutions to, say, the positive energy solutions $\psi = \psi_+$. However, the evolution equation then becomes non-local (Theorem 5.1), violating the causality principle: no information can propagate faster than light.

Dirac writes: *In the present paper we shall be concerned only with the removal of the first of these two difficulties. The resulting theory is therefore still only an approximation, but it appears to be good enough to account for all the duplexity phenomena without arbitrary assumptions.* As far as I understand the later development of quantum theory, it turned out that the interplay between the positive and negative energy parts is a physical fact rather than a problem, so the second issue mentioned above cannot really be resolved. The true drawback of the Dirac equation is instead related to difficulties with the description of interacting systems of particles. Please correct me if I am wrong.

## 7. Square roots of the Klein-Gordon equation

There are several possible ways to convert the Klein-Gordon equation $\partial_t^2 \psi - \Delta \psi + m^2 \psi = 0$ into a first-order equation. For example, one can move the time derivative to the other side of the equality sign, and then observe that $-\Delta + m^2$ and $-\partial_t^2$ are nonnegative self-adjoint operators, so that both have well-defined nonnegative square roots. Hence, one can consider the equation

$\displaystyle \pm \mathrm{i} \partial_t \psi = \sqrt{-\Delta + m^2} \, \psi . \hspace{\stretch{1}} (7.1)$

Applying (7.1) twice, we see that any solution of (7.1) satisfies the Klein-Gordon equation. Equation (7.1) is sometimes referred to as the Klein-Gordon square root (or semi-relativistic, or quasi-relativistic) equation, and the operator $\sqrt{-\Delta + m^2}$ is often called the Klein-Gordon square root operator. It can be defined using the Fourier transform, or spectral theory, as explained in Section 4.
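The Fourier definition is easy to illustrate numerically. The sketch below is my own illustration, not part of the original text (the one-dimensional periodic setting, grid size, mass $m$ and function names are arbitrary choices): it applies $\sqrt{-\Delta + m^2}$ on a periodic grid as a Fourier multiplier with symbol $\sqrt{k^2 + m^2}$, and checks that a plane wave $e^{\mathrm{i} k x}$ is simply multiplied by $\sqrt{k^2 + m^2}$.

```python
import numpy as np

# The operator sqrt(-Laplacian + m^2) on a 1-d periodic grid of length L,
# defined through the discrete Fourier transform as a multiplier with
# symbol sqrt(k^2 + m^2).
def kg_sqrt(psi, m, L):
    n = len(psi)
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # grid frequencies
    return np.fft.ifft(np.sqrt(k**2 + m**2) * np.fft.fft(psi))

# Sanity check: the plane wave e^{3ix} is an eigenfunction with
# eigenvalue sqrt(3^2 + m^2).
L, n, m = 2 * np.pi, 64, 1.5
x = np.linspace(0, L, n, endpoint=False)
wave = np.exp(3j * x)
out = kg_sqrt(wave, m, L)
assert np.allclose(out, np.sqrt(3**2 + m**2) * wave)
```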

Although equation (7.1) has some nice features (such as positivity of the square root operator on the right-hand side), it is clearly not what we are looking for. Indeed, it is a first-order differential equation in time, but it is not a differential equation in space (it is, however, a pseudo-differential equation), so the symmetry between space and time is broken. This is also the reason for the ambiguity in the sign of the time derivative: both $\mathrm{i} \partial_t$ and $-\mathrm{i} \partial_t$ are self-adjoint (but sign-indefinite, not nonnegative) square roots of $-\partial_t^2$. (Note that the nonnegative square root is again not a local operator.) Furthermore, the Klein-Gordon square root operator is nonlocal, hence (7.1) violates the causality principle in a way similar to what we observed in Theorem 5.1. In fact we will see that this is no coincidence: the Klein-Gordon square root operator is closely related to the positive and negative energy parts of the solution (whose evolution is described in Theorem 5.1), and at some point it will play an important role in our story.

One could try to find a square root of the wave operator $\partial_t^2 - \Delta$, or even of the complete Klein-Gordon operator $\partial_t^2 - \Delta + m^2$. This is, however, problematic, because again it leads to pseudo-differential equations (this time, also in the time variable) rather than differential ones, and again there is ambiguity in the definition of the square root (the operators under discussion are not even nonnegative).

It seems that our approach of finding a first-order version of the Klein-Gordon equation by taking a square root of some operator is wrong. We can try attacking the problem from the other side. We know rather well what we are looking for: a first-order differential equation with constant coefficients whose solutions also satisfy the Klein-Gordon equation. There are not many possibilities here, so the better equation, known as the Dirac equation, must take the following rather general form:

$\displaystyle -\mathrm{i} (\gamma_0 \partial_t \psi + \gamma_1 \partial_1 \psi + \gamma_2 \partial_2 \psi + \gamma_3 \partial_3 \psi) + m \psi = 0. \hspace{\stretch{1}} (7.2)$

The operators $-\mathrm{i} \partial_j$ are self-adjoint, which explains the presence of the factor $-\mathrm{i}$ in (7.2) (it could, of course, have been included in the coefficients $\gamma_j$). It turns out, however, that the coefficients $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ in (7.2) are necessarily operators (rather than just complex numbers). These operators must have a very regular structure, and they enjoy a certain uniqueness property; before we describe them, however, let us make one comment. Equation (7.2) is equivalent to

$\displaystyle \mathrm{i} \partial_t \psi = -\mathrm{i} (\alpha_1 \partial_1 \psi + \alpha_2 \partial_2 \psi + \alpha_3 \partial_3 \psi) + m \beta \psi, \hspace{\stretch{1}} (7.3)$

with $\alpha_j = \gamma_0^{-1} \gamma_j$ and $\beta = \gamma_0^{-1}$. Equation (7.3) has the classical Schrödinger form $\mathrm{i} \partial_t \psi = \boldsymbol{H} \psi$, where the Hamiltonian $\boldsymbol{H} = \boldsymbol{H}_0$ is the free (that is, not coupled to any external field) Dirac Hamiltonian

$\displaystyle \boldsymbol{H}_0 \psi = -\mathrm{i} (\alpha_1 \partial_1 \psi + \alpha_2 \partial_2 \psi + \alpha_3 \partial_3 \psi) + m \beta \psi . \hspace{\stretch{1}} (7.4)$

Equation (7.2) is the covariant form of the Dirac equation, and (7.3) is the Schrödinger (or classical) form.
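In momentum space, the free Dirac Hamiltonian (7.4) acts as multiplication by the matrix $H(\vec{k}) = \vec{\alpha} \cdot \vec{k} + m \beta$. The following numerical sketch is my own addition (it uses the standard $4 \times 4$ matrices introduced in the next section, and the particular $\vec{k}$ and $m$ are arbitrary): it confirms that $H(\vec{k})$ is self-adjoint, with eigenvalues $\pm\sqrt{|\vec{k}|^2 + m^2}$, each of multiplicity two, which is the relativistic dispersion relation.

```python
import numpy as np

# Pauli matrices and Dirac matrices in the standard representation
# (these are the matrices (8.4) and (8.2) introduced in the next section).
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]
Z = np.zeros((2, 2), dtype=complex)
beta = np.block([[np.eye(2), Z], [Z, -np.eye(2)]])      # beta = gamma_0
alpha = [np.block([[Z, sj], [sj, Z]]) for sj in s]      # alpha_j = gamma_0 gamma_j

# Momentum-space Dirac Hamiltonian H(k) = alpha . k + m beta, cf. (7.4).
def dirac_hamiltonian(k, m):
    return sum(kj * aj for kj, aj in zip(k, alpha)) + m * beta

k, m = np.array([1.0, -2.0, 0.5]), 2.0
Hk = dirac_hamiltonian(k, m)
E = np.sqrt(k @ k + m**2)

assert np.allclose(Hk, Hk.conj().T)                     # self-adjoint
assert np.allclose(np.linalg.eigvalsh(Hk), [-E, -E, E, E])
```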

## 8. The Dirac algebra

Let us now check for which coefficients $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ iterating (7.2) gives the Klein-Gordon equation. Any solution of (7.2) satisfies

$\displaystyle -(\gamma_0 \partial_t + \gamma_1 \partial_1 + \gamma_2 \partial_2 + \gamma_3 \partial_3)^2 \psi = m^2 \psi ,$

so the condition is

$\displaystyle (\gamma_0 \partial_t + \gamma_1 \partial_1 + \gamma_2 \partial_2 + \gamma_3 \partial_3)^2 = \partial_t^2 - \Delta.$

Since the coefficients $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ do not depend on $t$ and $\vec{x}$, the above condition simplifies to

\displaystyle \begin{aligned} & \gamma_0^2 = 1, \qquad && \gamma_1^2 = \gamma_2^2 = \gamma_3^2 = -1, \qquad && \gamma_i \gamma_j = -\gamma_j \gamma_i \end{aligned} \hspace{\stretch{1}} (8.1)

for all distinct $i, j \in \{0, 1, 2, 3\}$. In particular, the operators $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ anti-commute with each other.
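For completeness, here is the expansion behind (8.1) (a worked step of my own, with the convention $\partial_0 = \partial_t$); since the coefficients are constant and mixed partial derivatives commute, the cross terms involve only the symmetrized products of the coefficients:

```latex
(\gamma_0 \partial_0 + \gamma_1 \partial_1 + \gamma_2 \partial_2 + \gamma_3 \partial_3)^2
  = \gamma_0^2 \partial_0^2 + \gamma_1^2 \partial_1^2 + \gamma_2^2 \partial_2^2 + \gamma_3^2 \partial_3^2
    + \sum_{0 \le i < j \le 3} (\gamma_i \gamma_j + \gamma_j \gamma_i) \, \partial_i \partial_j .
```

Matching the right-hand side against $\partial_t^2 - \Delta$ term by term yields precisely the relations (8.1).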

We see that if (8.1) holds, then any solution $\psi$ of the first order linear equation (7.2) satisfies the Klein-Gordon equation. The price we pay is that $\psi$ can no longer be complex-valued — otherwise, $\gamma_j$ would be just multiplication operators and therefore would commute with each other (a formal proof of this statement is a nice exercise in Fourier methods).

The wave function $\psi$ must therefore take values in some Hilbert space $X$, so that $\psi(t) \in L^2(\mathbf{R}^3; X)$ for all $t$. The space $\mathcal{H} = L^2(\mathbf{R}^3; X)$ is isometrically isomorphic to the tensor product space $L^2(\mathbf{R}^3) \otimes X$, and the tensor product has the following natural representation: if $e_1, e_2, ...$ is an orthonormal basis of $X$, then

$\displaystyle \psi(t, \vec{x}) = \psi_1(t, \vec{x}) e_1 + \psi_2(t, \vec{x}) e_2 + ... \, ,$

so we may identify $\mathcal{H}$ with the space of sequences $(\psi_1, \psi_2, ...)$ of $L^2(\mathbf{R}^3)$ functions. The above sums and sequences can be finite or infinite, depending on whether $X$ is finite or infinite dimensional. In the standard representation of the Dirac equation, $X = \mathbf{C}^4$ is four-dimensional, and we often write $\mathcal{H} = (L^2(\mathbf{R}^3))^4$. However, it is usually more convenient to work with a general, abstract Hilbert space $X$.

It can be shown that $\gamma_j$ are necessarily operators acting only on the $X$ component; instead of proving this, let us agree that this is an assumption. In particular, we will often think of $\gamma_j$ as matrix multiplication operators in the basis $e_1, e_2, ...$, which explains the use of non-bold characters to denote $\gamma_j$. We see that each $\gamma_j$ is invertible, with $\gamma_0^{-1} = \gamma_0$ and $\gamma_j^{-1} = -\gamma_j$ for $j = 1, 2, 3$. Furthermore, the anti-commutation relations imply that $\gamma_j$ are linearly independent. Indeed, if $\gamma = \lambda_0 \gamma_0 + \lambda_1 \gamma_1 + \lambda_2 \gamma_2 + \lambda_3 \gamma_3$, then $\gamma \gamma_j + \gamma_j \gamma = 2 \lambda_j \gamma_j^2 = \pm 2 \lambda_j$, so that $\gamma = 0$ implies $\lambda_0 = \lambda_1 = \lambda_2 = \lambda_3 = 0$, as desired.

The operators $\gamma_j$ generate, over the field of complex numbers, an at most $16$-dimensional algebra, spanned by the products $\gamma_0^{\varepsilon_0} \gamma_1^{\varepsilon_1} \gamma_2^{\varepsilon_2} \gamma_3^{\varepsilon_3}$, where $\varepsilon_j \in \{0, 1\}$. It turns out that this algebra is exactly $16$-dimensional, and it is unique up to isomorphism. In order to prove this fact, we first take a look at a particular realization of $\gamma_j$ as $4 \times 4$ complex matrices, the so-called Dirac matrices:

\begin{aligned} \gamma_0 & = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&-1&0 \\ 0&0&0&-1\end{pmatrix} , \qquad & \gamma_1 & = \begin{pmatrix} 0&0&0&1 \\ 0&0&1&0 \\ 0&-1&0&0 \\ -1&0&0&0\end{pmatrix} , \\ \gamma_2 & = \begin{pmatrix} 0&0&0&-\mathrm{i} \\ 0&0&\mathrm{i}&0 \\ 0&\mathrm{i}&0&0 \\ -\mathrm{i}&0&0&0\end{pmatrix} , \qquad & \gamma_3 & = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&-1 \\ -1&0&0&0 \\ 0&1&0&0\end{pmatrix} . \end{aligned} \hspace{\stretch{1}} (8.2)

In this standard representation, $X = \mathbf{C}^4$, and $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ are simply matrix multiplication operators. The Dirac matrices have a $2 \times 2$ block form (which we denote using square brackets):

\begin{aligned} & \gamma_0 = \begin{bmatrix} 1&0 \\ 0&-1 \end{bmatrix} , \qquad && \gamma_j = \begin{bmatrix} 0&\sigma_j \\ -\sigma_j&0 \end{bmatrix} , \end{aligned} \hspace{\stretch{1}} (8.3)

for $j = 1, 2, 3$, where

\begin{aligned} & \sigma_1 = \begin{pmatrix} 0&1 \\ 1&0 \end{pmatrix} , \qquad && \sigma_2 = \begin{pmatrix} 0&-\mathrm{i} \\ \mathrm{i}&0 \end{pmatrix} , \qquad && \sigma_3 = \begin{pmatrix} 1&0 \\ 0&-1 \end{pmatrix} \end{aligned} \hspace{\stretch{1}} (8.4)

are the Pauli matrices. By a direct calculation, $\sigma_j^2 = 1$ and $\sigma_i \sigma_j = -\sigma_j \sigma_i$ for distinct $i, j \in \{1, 2, 3\}$, and (8.1) follows easily. Furthermore, again by a direct calculation, the matrices $1, \sigma_1, \sigma_2, \sigma_3$ span the entire algebra of complex $2 \times 2$ matrices. It follows that the following $16$ matrices span the entire algebra of complex $4 \times 4$ matrices: $1, \gamma_0 ( \, = \beta), \gamma_1, \gamma_2, \gamma_3$,
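These relations are easy to verify on a computer. The sketch below is an illustrative check of my own (not part of the original text): it builds the Dirac matrices from the Pauli matrices via the block form (8.3) and verifies the relations (8.1).

```python
import numpy as np

# Pauli matrices (8.4)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)

# Dirac matrices via the 2x2 block form (8.3)
gamma = [np.block([[I2, Z2], [Z2, -I2]])] + \
        [np.block([[Z2, sj], [-sj, Z2]]) for sj in (s1, s2, s3)]

I4 = np.eye(4)
assert np.allclose(gamma[0] @ gamma[0], I4)              # gamma_0^2 = 1
for j in (1, 2, 3):
    assert np.allclose(gamma[j] @ gamma[j], -I4)         # gamma_j^2 = -1
for i in range(4):
    for j in range(4):
        if i != j:                                       # anti-commutation
            assert np.allclose(gamma[i] @ gamma[j], -gamma[j] @ gamma[i])
```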

\begin{aligned} & \alpha_j = \gamma_0 \gamma_j = \begin{bmatrix} 0&\sigma_j \\ \sigma_j&0 \end{bmatrix} , \qquad && \mathrm{i} \gamma_i \gamma_j = \begin{bmatrix} \sigma_k & 0 \\ 0 & \sigma_k \end{bmatrix} , \\ & \mathrm{i} \gamma_1 \gamma_2 \gamma_3 = \begin{bmatrix} 0&1 \\ -1&0 \end{bmatrix} , && \mathrm{i} \gamma_0 \gamma_i \gamma_j = \begin{bmatrix} \sigma_k & 0 \\ 0 & -\sigma_k \end{bmatrix} , \\ & \mathrm{i} \gamma_0 \gamma_1 \gamma_2 \gamma_3 = \begin{bmatrix} 0&1 \\ 1&0 \end{bmatrix} , \end{aligned} \hspace{\stretch{1}} (8.5)

where $i, j, k \in \{1, 2, 3\}$, and $(i, j, k)$ is one of the even permutations $(1, 2, 3)$, $(2, 3, 1)$ or $(3, 1, 2)$.

We come back to the general case. It is easy to see that assigning to each Dirac matrix (8.2) the corresponding operator $\gamma_j$ extends to a homomorphism of the algebra of complex $4 \times 4$ matrices onto the algebra generated by the operators $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ (a homomorphism of algebras is a linear mapping that preserves multiplication). The kernel of this homomorphism must be a two-sided ideal of the algebra of $4 \times 4$ matrices, which is either $\{0\}$ or the entire algebra (another nice exercise). The latter is not possible, because $\gamma_j$ are nonzero operators. Hence, the kernel is trivial, and so the algebra generated by $\gamma_j$ is isomorphic to the algebra of complex $4 \times 4$ matrices. (This result can be generalized to any Clifford algebra related to a non-degenerate quadratic form over a complex vector space of even dimension. The proof is inductive, and in fact the construction of Dirac matrices using Pauli matrices is reminiscent of the induction step.)
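The dimension claim can also be checked numerically. The following sketch (my own illustration, not part of the original text) forms the $16$ products $\gamma_0^{\varepsilon_0} \gamma_1^{\varepsilon_1} \gamma_2^{\varepsilon_2} \gamma_3^{\varepsilon_3}$ for the Dirac matrices (8.2) and verifies that, viewed as vectors in $\mathbf{C}^{16}$, they are linearly independent, so they span the whole algebra of complex $4 \times 4$ matrices.

```python
import numpy as np
from itertools import product

# Dirac matrices (8.2), built from the Pauli matrices via (8.3).
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
gamma = [np.block([[I2, Z2], [Z2, -I2]])] + \
        [np.block([[Z2, sj], [-sj, Z2]]) for sj in (s1, s2, s3)]

# The 16 products gamma_0^e0 gamma_1^e1 gamma_2^e2 gamma_3^e3, e_j in {0, 1},
# flattened into vectors in C^16.
prods = []
for eps in product([0, 1], repeat=4):
    M = np.eye(4, dtype=complex)
    for e, g in zip(eps, gamma):
        if e:
            M = M @ g
    prods.append(M.ravel())

# Full rank means the products are linearly independent, hence they span
# the whole 16-dimensional algebra of complex 4x4 matrices.
rank = np.linalg.matrix_rank(np.array(prods))
assert rank == 16
```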

## 9. Notation for the Dirac equation

Let us summarize the above results before we proceed. The wave function $\psi(t, \vec{x})$ takes values in a Hilbert space $X$; in the standard representation, $X = \mathbf{C}^4$. Elements of $X$ are called spinors. Hence, for each $t$, $\psi(t)$ is an element of the Hilbert space $\mathcal{H} = L^2(\mathbf{R}^3; X) = L^2(\mathbf{R}^3) \otimes X$ (in the standard representation, $\mathcal{H} = (L^2(\mathbf{R}^3))^4$). The operators $\gamma_0, \gamma_1, \gamma_2, \gamma_3$ on $X$ (in the standard representation, $4 \times 4$ matrices (8.2)) satisfy the relations (8.1). The wave function satisfies the Dirac equation (7.2), which can be written in short as

$\displaystyle - \mathrm{i} \vec{\gamma} \cdot \partial \psi + m \psi = 0 , \hspace{\stretch{1}} (9.1)$

where $\vec{\gamma} = (\gamma_0, \gamma_1, \gamma_2, \gamma_3)$ and $\partial = (\partial_0, \partial_1, \partial_2, \partial_3)$. We also have the Schrödinger form (7.3), that is,

$\displaystyle \mathrm{i} \partial_t \psi = -\mathrm{i} \vec{\alpha} \cdot \nabla \psi + \beta m \psi . \hspace{\stretch{1}} (9.2)$

Here $\vec{\alpha} = (\alpha_1, \alpha_2, \alpha_3)$ and $\nabla = (\partial_1, \partial_2, \partial_3)$. (The notation may be slightly confusing here: remember that $\vec{\gamma}$ is always a space-time vector (Roman symbols are printed in an upright font), while $\vec{\alpha}$ is a space vector (Roman symbols are printed in italics). We do not distinguish between upright and italic fonts for Greek characters, though. Components of both vectors are operators on $X$, or, in the standard representation, $4 \times 4$ matrices.) The solutions of the Dirac equation satisfy the Klein-Gordon equation (3.2):

$\displaystyle \partial_t^2 \psi - \Delta \psi + m^2 \psi = 0 .$

This equation should be understood component-wise: in the standard representation, $\psi = (\psi_1, \psi_2, \psi_3, \psi_4)$, and each of the components $\psi_1, \psi_2, \psi_3, \psi_4$ satisfies the Klein-Gordon equation. The relation between the Dirac and Klein-Gordon equations can be viewed as a (much more complicated) analogue of the relation between the Cauchy-Riemann and Laplace equations.
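The component-wise claim is easy to verify for plane waves. The sketch below is an illustrative check of my own (arbitrary $\vec{k}$ and $m$): it confirms the operator identity $H(\vec{k})^2 = (|\vec{k}|^2 + m^2)$ in the standard representation, which is exactly the statement that applying $\mathrm{i} \partial_t$ twice to a solution of (9.2) reproduces the Klein-Gordon equation in each component.

```python
import numpy as np

# Dirac matrices in the standard representation (8.2), via the block form (8.3).
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]
Z = np.zeros((2, 2), dtype=complex)
beta = np.block([[np.eye(2), Z], [Z, -np.eye(2)]])
alpha = [np.block([[Z, sj], [sj, Z]]) for sj in s]

# For an arbitrary momentum k and mass m, the momentum-space Hamiltonian
# squares to (|k|^2 + m^2) times the identity, so each of the four
# components of a solution of (9.2) satisfies the Klein-Gordon equation.
k, m = np.array([0.3, -1.2, 2.0]), 0.7
Hk = sum(kj * aj for kj, aj in zip(k, alpha)) + m * beta
assert np.allclose(Hk @ Hk, (k @ k + m**2) * np.eye(4))
```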

From (9.2) and Theorem 5.2 it is clear that the solutions of the Dirac equation propagate with finite speed, in agreement with the causality principle. In the next sections we study the solutions in more detail.