For more than a century now, we have known that light exhibits some properties typical of particles (for example, in the photoelectric effect, explained in 1905 by Albert Einstein), and that matter can also behave as waves (which was first conjectured by Louis de Broglie in 1924, and then observed, for example, in electron diffraction experiments). This phenomenon is known as wave-particle duality, and it suggests that the equations of motion for matter should at least resemble Maxwell’s equations for light. This is the starting point for our first approximation to relativistic quantum dynamics.
3. The Klein-Gordon equation
We agree that the motion of a particle should be described in terms of a wave function $\psi(t, x)$, possibly taking vector values. We are looking for something Lorentz-invariant, so a good guess is
In principle, the coefficients in (3.1) may depend both on time and position. However, we first consider a free particle, that is, a particle which does not interact with any external force. The equation is therefore expected to be isotropic (invariant under translations and rotations of space), autonomous (invariant under translations of time) and homogeneous (invariant under multiplication of the unknown function by constants), just like the free Maxwell’s equations, that is, Maxwell’s equations with no electric charges and currents. (Homogeneity is not obvious at all here: one cannot ‘multiply’ particles by fractions, and we also have Pauli’s exclusion principle. On the other hand, light is quantized too, but this fact cannot be seen directly from Maxwell’s equations. Hence, a similar behavior is acceptable, or perhaps even expected, in our first approximation to quantum mechanics.)
The above considerations suggest that in equation (3.1) the only lower-order term that survives is a constant multiple of $\psi$. Furthermore, this constant should be nonnegative: otherwise, if at some initial time the wave function was constant (this idea can be localized, but consider a global constant here), it would either grow or decay exponentially with time, violating any reasonable energy conservation principle. Writing the constant as $m^2$ with $m \ge 0$, this leads to the Klein-Gordon equation:

$$\partial_t^2 \psi - \Delta \psi + m^2 \psi = 0. \tag{3.2}$$
Here $\psi(t, \cdot)$ is a square-integrable function of $x \in \mathbb{R}^3$ for each $t$, or a vector of such functions. Although in principle some regularity of $\psi$ is required for the derivatives of $\psi$ to be well-defined, we will see later in this post that (3.2) makes sense also in a more general context.
The choice of the constant $m$ is completely arbitrary here. However, it turns out that $m$ in (3.2) corresponds to the rest mass of a particle. If $m = 0$, we recover the potential formulation of free Maxwell’s equations (2.9′), describing the motion of massless photons; mathematically, this is just the classical wave equation. For general $m$, a plane wave $e^{i(k \cdot x - \omega t)}$ (with wave vector $k$ and frequency $\omega$; this function is clearly not square-integrable, but it is interesting as a basic building block of the Fourier transform) satisfies (3.2) if and only if $\omega^2 = |k|^2 + m^2$. This is very similar to the relativistic relation between mass, momentum and energy:

$$E^2 = |p|^2 + m^2,$$

better known in its non-natural units formulation:

$$E^2 = (|p| c)^2 + (m c^2)^2.$$
We will come back to this relation in the next post.
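The plane wave condition can be checked numerically. The following is a minimal sketch in natural units, assuming the Klein-Gordon operator has the form $\partial_t^2 - \Delta + m^2$; the names `plane_wave`, `klein_gordon_residual`, `k` and `omega` are chosen for this illustration only. Derivatives are approximated by central differences, so the residual is small rather than exactly zero.

```python
import cmath
import math

def plane_wave(t, x, k, omega):
    """Plane wave exp(i (k.x - omega t)) at time t and position x in R^3."""
    phase = sum(ki * xi for ki, xi in zip(k, x)) - omega * t
    return cmath.exp(1j * phase)

def klein_gordon_residual(t, x, k, omega, m, h=1e-3):
    """Central-difference estimate of (d_t^2 - Laplacian + m^2) applied to the plane wave."""
    psi = plane_wave(t, x, k, omega)
    d2t = (plane_wave(t + h, x, k, omega) - 2 * psi
           + plane_wave(t - h, x, k, omega)) / h**2
    laplacian = 0
    for j in range(3):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        laplacian += (plane_wave(t, x_plus, k, omega) - 2 * psi
                      + plane_wave(t, x_minus, k, omega)) / h**2
    return d2t - laplacian + m**2 * psi

m = 1.5
k = (0.3, -1.2, 2.0)
point = (0.1, 0.2, -0.4)
omega_on_shell = math.sqrt(m**2 + sum(ki**2 for ki in k))

# Frequency satisfying omega^2 = |k|^2 + m^2: residual is at discretization level.
res_on = abs(klein_gordon_residual(0.7, point, k, omega_on_shell, m))
# Frequency violating the relation: residual is of order one.
res_off = abs(klein_gordon_residual(0.7, point, k, omega_on_shell + 0.5, m))
print(res_on, res_off)
```

The residual equals $(-\omega^2 + |k|^2 + m^2)$ times the plane wave, up to discretization error, which is why it vanishes only on the mass shell.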
The Klein-Gordon equation (3.2) is fundamental for the relativistic quantum theory. It is believed that every relativistic quantum model describing a system without external interactions (a free system) is, in a sense, a special case of the Klein-Gordon equation; in particular, every solution of the potential formulation of free Maxwell’s equations satisfies (3.2), and the same is true for the solutions of the free Dirac equation, which will be introduced in the next post. For this reason, we briefly discuss some basic features of the Klein-Gordon equation.
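In the Fourier (or momentum) space, (3.2) reads

$$\partial_t^2 \hat\psi(t, \xi) + (m^2 + |\xi|^2)\, \hat\psi(t, \xi) = 0, \tag{3.3}$$

where $\hat\psi(t, \xi)$ is the Fourier transform of $\psi(t, x)$ with respect to $x$.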
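For a fixed $\xi$, this is an ordinary differential equation in time. The solution is given by a linear combination of two functions, $e^{\pm i t \sqrt{m^2 + |\xi|^2}}$. Hence, the general solution of the Klein-Gordon equation (3.2) is given by

$$\psi(t, x) = \psi_+(t, x) + \psi_-(t, x), \tag{3.4}$$

where

$$\hat\psi_\pm(t, \xi) = e^{\mp i t \sqrt{m^2 + |\xi|^2}}\, \hat\psi_\pm(0, \xi). \tag{3.5}$$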
The functions $\psi_+$ and $\psi_-$ will correspond to the spaces of positive and negative energies (and negative energy here is related to anti-matter). Observe that each of the evolution equations for $\psi_+$ and $\psi_-$ is the time-reversal of the other one. This gives the charge-time symmetry of quantum systems, our first approximation to the fundamental charge-parity-time (CPT) symmetry of physical laws. (Parity here corresponds to the orientation of space; for example, reflections are parity transformations.)
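The components $\psi_\pm$ can be easily found from the initial conditions $\psi(0, \cdot)$ and $\partial_t \psi(0, \cdot)$. Indeed, if $\psi_\pm$ evolves in the Fourier space with the factor $e^{\mp i t \sqrt{m^2 + |\xi|^2}}$, then evaluating $\hat\psi = \hat\psi_+ + \hat\psi_-$ and its time derivative at $t = 0$ gives two linear equations for $\hat\psi_\pm(0, \xi)$, and we have

$$\hat\psi_\pm(0, \xi) = \frac{1}{2} \left( \hat\psi(0, \xi) \pm \frac{i}{\sqrt{m^2 + |\xi|^2}}\, \partial_t \hat\psi(0, \xi) \right). \tag{3.6}$$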
We remark that for the free Maxwell’s equations, the four-potential is a real function, and therefore $\psi_-$ is conjugate to $\psi_+$. In particular, the positive and negative energy components of a photon have equal ‘weight’.
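For a single Fourier mode, the splitting into positive and negative energy components is a two-by-two linear algebra exercise. The sketch below assumes the sign convention in which the positive energy component carries the factor $e^{-i\omega t}$ with $\omega = \sqrt{m^2 + |\xi|^2}$; with the opposite convention the two components swap roles. The function names are chosen for this illustration only.

```python
import cmath
import math

def split_mode(psi0, dpsi0, omega):
    """Positive and negative energy parts of one Fourier mode at time 0.

    psi0 and dpsi0 are the values of the mode and of its time derivative
    at t = 0; omega = sqrt(m^2 + |xi|^2) must be positive.
    """
    plus = 0.5 * (psi0 + 1j * dpsi0 / omega)
    minus = 0.5 * (psi0 - 1j * dpsi0 / omega)
    return plus, minus

def evolve_mode(psi0, dpsi0, omega, t):
    """Evolve the mode as exp(-i omega t) plus + exp(i omega t) minus."""
    plus, minus = split_mode(psi0, dpsi0, omega)
    return cmath.exp(-1j * omega * t) * plus + cmath.exp(1j * omega * t) * minus

def evolve_mode_direct(psi0, dpsi0, omega, t):
    """Textbook solution of y'' + omega^2 y = 0 with y(0) = psi0, y'(0) = dpsi0."""
    return psi0 * math.cos(omega * t) + dpsi0 * math.sin(omega * t) / omega

psi0, dpsi0, omega, t = 0.3 + 0.4j, -1.1 + 0.2j, 2.5, 0.8
via_split = evolve_mode(psi0, dpsi0, omega, t)
direct = evolve_mode_direct(psi0, dpsi0, omega, t)
print(abs(via_split - direct))

# For real initial data of a single mode, the two parts are complex conjugates.
plus_real, minus_real = split_mode(1.0, 2.0, omega)
print(plus_real, minus_real)
```

Both routes solve the same ordinary differential equation with the same initial data, so they agree up to rounding; the split form just makes the two directions of time evolution explicit.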
Later in this post we rewrite formulas (3.5) and (3.6) in integral form. In the zero mass case, the formula is explicit, but for massive particles we are not able to avoid using the Fourier transform completely. First, however, we make a short digression about the spectral theorem.
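Recall that by $\psi_\pm(t)$ we denote $\psi_\pm(t, x)$ as a function of $x$. By (3.5), $\psi_\pm(t)$ is the image of $\psi_\pm(0)$ under a Fourier multiplier: an operator which acts in the Fourier space as a multiplication operator. The symbol of this Fourier multiplier is $e^{\mp i t \sqrt{m^2 + |\xi|^2}}$. Since the Laplace operator $\Delta$ is the Fourier multiplier with symbol $-|\xi|^2$, it seems reasonable to write

$$\psi_\pm(t) = e^{\mp i t \sqrt{m^2 - \Delta}}\, \psi_\pm(0). \tag{4.1}$$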
For a more general (say, bounded and measurable) function $f$, the operator $f(\Delta)$ can be defined using the Fourier transform,

$$\widehat{f(\Delta) u}(\xi) = f(-|\xi|^2)\, \hat u(\xi) \tag{4.2}$$

for smooth, rapidly decreasing functions $u$, and then extended continuously to $L^2(\mathbb{R}^3)$. Formula (4.1) corresponds to $f(s) = e^{\mp i t \sqrt{m^2 - s}}$. The definition given in (4.2) is a particular case of a more general construction of a function of an operator, which requires the spectral theorem.
If $A$ is a unitary or (possibly unbounded) self-adjoint operator (or, more generally, a normal operator) on a Hilbert space $H$, then there is a corresponding spectral measure (aka resolution of identity): a family of orthogonal projectors $P(E)$ for Borel sets $E \subseteq \mathbb{C}$, such that $P(\mathbb{C})$ is the identity operator, and $\langle P(E) u, v \rangle$ is a countably additive function of $E$ (a complex-valued measure) for any $u, v \in H$, and furthermore we have $\langle A u, v \rangle = \int_{\mathbb{C}} \lambda \, \langle P(d\lambda) u, v \rangle$ for all $u$ in the domain of $A$ and all $v \in H$.
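The smallest closed set $E$ such that $P(E)$ is the identity operator is the spectrum of $A$, denoted $\sigma(A)$ (this is equivalent to the classical definition). And for any measurable function $f$ defined on $\sigma(A)$, the operator $f(A)$ is given by the identity

$$\langle f(A) u, v \rangle = \int_{\sigma(A)} f(\lambda) \, \langle P(d\lambda) u, v \rangle \tag{4.3}$$

whenever $v \in H$ and $u$ is in

$$\left\{ u \in H : \int_{\sigma(A)} |f(\lambda)|^2 \, \langle P(d\lambda) u, u \rangle < \infty \right\}, \tag{4.4}$$

the domain of $f(A)$. Note that if $f$ is a bounded function, then $f(A)$ is a bounded operator, even if the original operator $A$ was unbounded. In particular, formulas (3.4) and (3.5) (or (4.1)) define the unique solution of the Klein-Gordon equation (3.2) for arbitrary square-integrable initial data $\psi_+(0)$ and $\psi_-(0)$, with no further regularity assumptions. Clearly, we also have the uniqueness of the solution given the initial data $\psi(0)$ and $\partial_t \psi(0)$ (see (3.6)), but $\psi_\pm(0)$ may fail to be square-integrable.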
We give (4.3) and (4.4) here mainly to fix the notation; a proper introduction to the spectral theory of operators in Hilbert spaces would take too long (and there are many good textbooks covering this subject). Readers who are unfamiliar with spectral measures but do not wish to spend too much time learning about them may find the following two examples helpful.
If there is a complete orthonormal set of eigenvectors $u_n$ of the operator $A$, $A u_n = \lambda_n u_n$, then $P(E)$ is simply the orthogonal projection on the subspace spanned by those $u_n$ for which $\lambda_n \in E$, and

$$f(A) u = \sum_n f(\lambda_n) \, \langle u, u_n \rangle \, u_n$$

whenever the orthogonal series on the right is convergent. In particular, this is a rather standard construction in finite-dimensional spaces.
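In finite dimensions the eigenvector construction can be carried out by hand. The sketch below treats a real symmetric $2 \times 2$ matrix with nonzero off-diagonal entry, computes its eigenpairs in closed form, and applies a function of the matrix to a vector by summing over the eigenvectors. The helper names `eig_sym_2x2` and `apply_function` are hypothetical and serve only this illustration.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenpairs (eigenvalue, unit eigenvector) of [[a, b], [b, d]], assuming b != 0."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (b, lam - a) solves (A - lam I) v = 0 whenever b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

def apply_function(f, pairs, u):
    """Compute f(A) u as the sum of f(lambda_n) <u, u_n> u_n over the eigenpairs."""
    out = [0.0, 0.0]
    for lam, v in pairs:
        coeff = f(lam) * (u[0] * v[0] + u[1] * v[1])
        out[0] += coeff * v[0]
        out[1] += coeff * v[1]
    return out

pairs = eig_sym_2x2(2.0, 1.0, 2.0)   # the matrix [[2, 1], [1, 2]]
u = [1.0, 0.0]

identity_case = apply_function(lambda s: s, pairs, u)       # should equal A u
square_case = apply_function(lambda s: s * s, pairs, u)     # should equal A^2 u
half = apply_function(math.sqrt, pairs, u)
sqrt_twice = apply_function(math.sqrt, pairs, half)         # sqrt(A) sqrt(A) u = A u
print(identity_case, square_case, sqrt_twice)
```

Since the eigenvalues here are positive, the square root is well defined on the spectrum, and applying it twice recovers the action of the matrix itself.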
The second example is the Laplace operator $\Delta$, which actually motivated the above discussion. The spectral measure of $\Delta$ is easily shown to be the Fourier multiplier with symbol $\mathbf{1}_E(-|\xi|^2)$ (here and below $\mathbf{1}_E$ is the indicator function, equal to $1$ if the argument belongs to $E$ and $0$ otherwise), that is,

$$\widehat{P(E) u}(\xi) = \mathbf{1}_E(-|\xi|^2) \, \hat u(\xi).$$

This proves that formulas (4.2) and (4.3) indeed define the same operator.
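A discrete analogue may help to see how a function of the Laplace operator acts. The sketch below replaces the whole space by a periodic one-dimensional grid on $[0, 2\pi)$ and the Fourier transform by a naive discrete Fourier transform; this is a simplification made for illustration, not the setting of the post. The helper names are hypothetical, and the symbol is taken to be $f(-k^2)$ at integer frequency $k$.

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform, normalized by 1/n."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def idft(coeffs):
    """Inverse of dft above."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def signed_frequency(k, n):
    """Map the DFT index k to an integer frequency in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n

def apply_multiplier(f, values):
    """Apply f(Laplacian) on the periodic grid: multiply mode k by f(-k^2)."""
    n = len(values)
    coeffs = dft(values)
    scaled = [f(-signed_frequency(k, n) ** 2) * c for k, c in enumerate(coeffs)]
    return idft(scaled)

n, k0, m, t = 16, 3, 1.0, 0.5
grid = [2 * math.pi * j / n for j in range(n)]
u = [cmath.exp(1j * k0 * x) for x in grid]   # a single Fourier mode

# Bounded symbol playing the role of exp(-i t sqrt(m^2 - Laplacian)).
evolved = apply_multiplier(lambda s: cmath.exp(-1j * t * math.sqrt(m**2 - s)), u)
expected = [cmath.exp(-1j * t * math.sqrt(m**2 + k0**2)) * v for v in u]
max_error = max(abs(a - b) for a, b in zip(evolved, expected))
print(max_error)
```

A single mode is an eigenfunction of the discrete multiplier, so the operator simply multiplies it by the value of the symbol at that frequency.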
5. Explicit solutions of the Klein-Gordon equation
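We begin with rewriting (3.6) as

$$\psi_\pm(0) = \frac{1}{2} \left( \psi(0) \pm i \, (m^2 - \Delta)^{-1/2} \, \partial_t \psi(0) \right). \tag{5.1}$$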
The operator $(m^2 - \Delta)^{-1/2}$ is closely related to the Bessel potential operator, and for smooth $u$ and $m > 0$ we have

$$(m^2 - \Delta)^{-1/2} u(x) = \frac{m}{2 \pi^2} \int_{\mathbb{R}^3} \frac{K_1(m |x - y|)}{|x - y|} \, u(y) \, dy, \tag{5.2}$$

where $K_1$ is the modified Bessel function of the second kind. Since this result will not be used in the sequel, we omit the proof, which can be found, for example, in the book (or the article) Theory of Bessel potentials by N. Aronszajn and K. T. Smith. Instead, we derive a formula for the evolution of $\psi_\pm$ in the position representation.
The result is rather complicated, and we need another definition. By a principal value integral, denoted $\mathrm{p.v.}\!\int$, we mean the limit of integrals with symmetric intervals around singularities removed. The limit is taken as the length of the removed intervals tends to zero. For example, in the statement of the theorem, one should remove a symmetric interval of length $2\varepsilon$ around the singularity of the integrand and let $\varepsilon \to 0^+$.
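The role of the symmetric removal can be seen on the simplest example, the integral of $1/x$ over $(-1, 2)$. The sketch below uses a plain midpoint rule (the helper names are hypothetical): with a symmetric gap $(-\varepsilon, \varepsilon)$ the two divergent contributions cancel and the result is close to $\log 2$, while an asymmetric gap $(-\varepsilon, 2\varepsilon)$ gives a different value, close to $0$.

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integral_with_gap(f, a, b, gap_left, gap_right):
    """Integral of f over [a, b] with the interval (gap_left, gap_right) removed."""
    return midpoint_integral(f, a, gap_left) + midpoint_integral(f, gap_right, b)

eps = 1e-3
inverse = lambda x: 1.0 / x

# Symmetric gap around the singularity at 0: approximates the principal value.
symmetric = integral_with_gap(inverse, -1.0, 2.0, -eps, eps)
# Asymmetric gap: the logarithmic divergences no longer cancel in the same way.
asymmetric = integral_with_gap(inverse, -1.0, 2.0, -eps, 2 * eps)
print(symmetric, math.log(2.0), asymmetric)
```

Exactly, the symmetric version equals $\log \varepsilon + (\log 2 - \log \varepsilon) = \log 2$ for every $\varepsilon$, whereas the asymmetric one equals $\log \varepsilon + (\log 2 - \log 2\varepsilon) = 0$, so the value of the limit depends on how the singularity is excised.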
Theorem 5.1 (propagation of positive and negative energy solutions of the Klein-Gordon equation)
Let denote the mean value of on the sphere , and let, as usual, be the derivative of with respect to . Suppose that is a smooth solution of the Klein-Gordon equation (3.2), and that are its positive and negative energy components. When , we have
For , there is an additional term,
where denotes the convolution in spatial variables, and is an function with Fourier transform
Before the proof, we note that the evolution of violates the causality principle in special relativity: the value of is expected to depend only on the values of for in the light cone , but in Theorem 5.1 this is not the case. It is quite clear when , and for the square-integrable term cannot compensate the highly singular kernel of the principal value integral; we omit the details. As a consequence, any physical solution of the Klein-Gordon equation must either comprise nonzero positive and negative parts, or present some additional symmetry, which makes possible a causal reformulation of (5.3) and (5.4). On the other hand, the evolution of agrees with the causality principle, as stated in the following result.
Theorem 5.2 (propagation of general solutions of the Klein-Gordon equation)
Let denote the mean value of on the sphere , and let be the derivative of with respect to . Suppose that is a smooth solution of the Klein-Gordon equation (3.2). When , we have
For , there are two additional terms,
where and are square-integrable functions vanishing outside the ball , and with Fourier transforms given by
In the proof, we need the following technical result from complex analysis.
Lemma 5.3
For any bounded function on which is smooth near , we have
The convergence is dominated by a constant depending only on , the supremum norm of on , and the supremum norm of , on a fixed neighborhood of .
The proof of Lemma 5.3 is given at the end of this section.
Proof of Theorem 5.1. According to (3.5), , where is the Fourier multiplier with symbol . Our goal is to find a more explicit description of .
For an symbol, the integral formula for the corresponding Fourier multiplier can be found using the convolution theorem. This method cannot be applied directly to , since its symbol is not in . However, when , we can use the convolution theorem for the operator with a square-integrable symbol . The theorem will then be proved by taking an appropriate limit.
The explicit formula can only be found for massless particles: we assume that . We find that for ,
Note that is a radial function, that is, it depends on only through its norm . For this reason it is convenient to compute first the (inverse) Fourier transform of the surface measure on the sphere with radius , centered at the origin. Symmetry, integration in spherical coordinates (rotated appropriately, so that the vector points upwards) and then substitution give
When is a radial function, then the function (), where is an arbitrary unit vector, is called the profile of , and it is sometimes denoted again by when this makes no confusion. Integration in spherical coordinates and Fubini’s theorem give
Therefore, the three-dimensional (inverse) Fourier transform of a radial function is again a radial function with profile equal to times the (inverse) Fourier sine transform of the profile of .
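Both steps can be checked numerically. The sketch below assumes the unnormalized convention $\hat f(\xi) = \int_{\mathbb{R}^3} e^{-i \xi \cdot x} f(x) \, dx$, for which the constant in front of the sine transform is $4\pi / |\xi|$; other normalizations of the Fourier transform change this constant. The function names, the test profile $e^{-r}$ and the truncation radius are choices made for this illustration only.

```python
import math

def sphere_transform(radius, xi_norm, n=400):
    """Integral of exp(i xi.x) over the sphere |x| = radius.

    By symmetry xi is taken along the vertical axis; the angle around that
    axis contributes 2*pi, the sine part cancels, and the remaining integral
    over s = cos(theta) in [-1, 1] is computed with the midpoint rule.
    """
    h = 2.0 / n
    total = h * sum(math.cos(radius * xi_norm * (-1.0 + (i + 0.5) * h)) for i in range(n))
    return 2.0 * math.pi * radius**2 * total

def radial_transform(profile, xi_norm, r_max=40.0, n=40_000):
    """(4 pi / |xi|) times the sine transform of r * profile(r), truncated at r_max."""
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += r * profile(r) * math.sin(r * xi_norm)
    return 4.0 * math.pi / xi_norm * h * total

radius, xi_norm = 2.0, 1.3

sphere_numeric = sphere_transform(radius, xi_norm)
sphere_exact = 4.0 * math.pi * radius * math.sin(radius * xi_norm) / xi_norm

# Test profile exp(-r): its three-dimensional transform is 8 pi / (1 + |xi|^2)^2.
radial_numeric = radial_transform(lambda r: math.exp(-r), xi_norm)
radial_exact = 8.0 * math.pi / (1.0 + xi_norm**2) ** 2
print(sphere_numeric, sphere_exact, radial_numeric, radial_exact)
```

The second function is just the first one integrated over the radius: slicing a radial function into spheres and using the closed form for each sphere leaves a one-dimensional sine transform of the profile.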
By the above observation,
An elementary calculation gives
If and converges to , then and converges to . Therefore, by Plancherel’s theorem and dominated convergence, converges in to . Furthermore, symmetry and integration in spherical coordinates gives
By Lemma 5.3, it follows that for all unit vectors , all smooth, rapidly decreasing functions , and all , we have
The last statement of Lemma 5.3 enables us to change the order of the integral and the limit, so that
Formula (5.3) for the function and is proved. The other cases follow by symmetry.
The case is now easy. Let denote the operator corresponding to mass . The Fourier symbol of , that is, , is a square-integrable function of . Formulas (5.4) and (5.5) follow by applying the convolution theorem, as described in the first part of the proof.
Proof of Theorem 5.2. We use the notation introduced in the proof of Theorem 5.1. Recall that under the assumption that is square-integrable for each , the solution is uniquely determined by and . Since the real and imaginary parts of a solution of the Klein-Gordon equation (3.2) are again solutions of (3.2), it suffices to consider real-valued solutions. As in the proof of Theorem 5.1, we first consider .
Suppose first that for all . By Theorem 5.1,
and is a solution of the Klein-Gordon equation (3.2). Furthermore, by (3.6), , and . Hence, the formula
defines a solution of (3.2) with given and with .
Suppose now that and that is real. Suppose for a moment that for all . Then is again a solution of the Klein-Gordon equation (3.2), and . Therefore, for ,
Since , integration in gives
and a similar formula for follows by symmetry. Either by a direct substitution to (3.2) or using an approximation argument, we conclude that (5.11) holds without the additional square-integrability assumption on . By combining two solutions given by (5.10) and (5.11), we obtain a solution for general initial data, and formula (5.6) follows by the uniqueness of the solution.
We can repeat the above argument when . In (5.10) we have an additional term , where . By (5.5) and the properties of the Fourier transform, we have
In a similar manner, in (5.11) we have an additional term , where
It remains to prove that when .
For , the holomorphic function has a branch cut along , but the boundary values of this function on approached from above and from below are opposite purely imaginary numbers. Since cosine is an even function, has a continuous extension to , and so it is an entire function of . By Morera’s theorem (or, more precisely, one of its corollaries), it follows that is an entire function. Furthermore, since and , we have .
The Fourier transform of is equal to . But , , is an entire function of three complex variables, and for . By a multivariate version of Paley-Wiener theorem (see a nice proof in the article by Y. Yang and T. Qian), we conclude that vanishes outside the ball , as desired.
The multivariate Paley-Wiener theorem is a rather advanced tool, and its proof is far beyond the scope of these notes. Although it would be difficult to avoid it completely in the proof of Theorem 5.2, we could have used only its one-dimensional version. Indeed, is a radial function, and so, by (5.9), its Fourier transform is expressed in terms of the one-dimensional Fourier sine transform of the profile of .
Proof of Lemma 5.3. Although we could simply refer to the Sokhotski-Weierstrass-Plemelj formulas, we give an explicit proof. First, we decompose into the sum of two parts, one vanishing in a neighborhood of , and the other smooth and vanishing outside a larger neighborhood of . The result for the first part is just dominated convergence. For the other part, we use methods of complex analysis.
We therefore assume that is a smooth function supported in a small neighborhood of . We extend to an even function on the real line, and define
Then is a holomorphic function in the half-plane . For any , we have
For the first integral on the right-hand side, we simply use dominated convergence. The other one is a Poisson integral, which converges to . (This is easy to prove directly, using just the continuity of at — an ‘approximate identity’ argument.) We conclude that
and the first statement of the lemma follows by a simple rearrangement. The proof of the other statement is very similar: again is split into two parts, and for smooth supported in a neighborhood of , we define
Integration by parts gives
the second equality is a consequence of . Hence, by the first part of the proof and the identity ,
Again integrating by parts (carefully: this is a principal value integral), we conclude that
as desired. Finally, by inspecting the above argument, one proves the last statement of the lemma; we omit the details.