As Paul Dirac wrote in his article *The Quantum Theory of the Electron*, there are at least two difficulties with the Klein-Gordon equation. The first problem was already mentioned in the introduction: a second-order differential equation requires too much initial data, so the initial value of the wave function alone does not completely determine the solution. Indeed, it is clear from the explicit formula for the solution (see formula (3.6)) that for uniqueness one needs both the initial value of the wave function and the initial value of its time derivative. In the standard quantum interpretation, the probability of finding the particle in a domain at a given time is equal to the integral of the squared modulus of the wave function over that domain. This definition makes sense for the Klein-Gordon equation. However, it is difficult to express *dynamical variables*, such as momentum or angular momentum, in a similar way. As mentioned above, these problems would not appear if the evolution equation were of first order in time, because then the initial value would completely determine the solution. But since time and space are in a sense symmetric in special relativity, we expect that the better equation will also be of first order in the spatial variables.

The other problem is related to the charge-time symmetry observed in Section 3: the solutions of the Klein-Gordon equation can be decomposed into positive and negative energy parts, and the evolution equation of each part is the time reversal of the evolution equation of the other. Later we will see that these two parts correspond to opposite electric charges and, in some sense, opposite energies (hence the names). More accurately, these two families of solutions correspond to anti-particles. We expect that the solutions of the better equation will describe the motion of just one particle, and that its anti-particle should have its own, different equation. One could try to resolve this issue by restricting the Hilbert space of admissible solutions to, say, the positive energy solutions. However, the evolution equation then becomes non-local (Theorem 5.1), violating the causality principle: no information can propagate faster than light.

Dirac writes: *In the present paper we shall be concerned only with the removal of the first of these two difficulties. The resulting theory is therefore still only an approximation, but it appears to be good enough to account for all the duplexity phenomena without arbitrary assumptions*. To my understanding of the later development of the quantum theory, it turned out that the interplay between positive and negative energy parts is a physical fact rather than a problem, so we cannot resolve the second issue mentioned above. And the true drawback of the Dirac equation is rather related to difficulties with the description of interacting systems of particles. Please correct me if I am wrong.

There are many possible attempts to convert the Klein-Gordon equation to a first-order equation. For example, one can move the time derivative to the other side of the equality sign, and then observe that and are nonnegative self-adjoint operators, so that both have well-defined square roots. Hence, one can consider the equation

By iterating (7.1) twice, we see that any solution of (7.1) satisfies the Klein-Gordon equation. Equation (7.1) is sometimes referred to as the *Klein-Gordon square root* (or *semi-relativistic*, or *quasi-relativistic*) *equation*, and the operator is often called the *Klein-Gordon square root operator*. It can be defined using the Fourier transform, or spectral theory, as explained in Section 4.
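The iteration can be written out as a short worked computation. This is only a sketch in assumed notation, since the displayed formulas are not reproduced here: $\psi$ stands for the wave function, $m$ for the mass constant, $\Delta$ for the spatial Laplacian, natural units are used, and (7.1) is taken in the form $i\partial_t\psi = \sqrt{-\Delta + m^2}\,\psi$.

```latex
\begin{align*}
i\partial_t \psi &= \sqrt{-\Delta + m^2}\,\psi
  && \text{(assumed form of (7.1))} \\
-\partial_t^2 \psi = i\partial_t (i\partial_t \psi)
  &= \sqrt{-\Delta + m^2}\,(i\partial_t \psi)
  && \text{($\partial_t$ commutes with the spatial operator)} \\
  &= (-\Delta + m^2)\,\psi
  && \text{(apply (7.1) once more)} \\
\partial_t^2 \psi - \Delta \psi + m^2 \psi &= 0
  && \text{(Klein-Gordon equation)}
\end{align*}
```

The same computation goes through with the opposite sign of the time derivative, which is the sign ambiguity discussed in the text.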

Although equation (7.1) has some nice features (such as positivity of the square root operator on the right-hand side), it is clearly not what we are looking for. Indeed, it is a first-order differential equation in time, but it is not a differential equation in space (it is, however, a pseudo-differential equation), so the symmetry between space and time is broken. This is also the reason for the ambiguity in the sign of the time derivative: both are (self-adjoint, but signed, not nonnegative) square roots of . (Note that the nonnegative square root is again not a local operator.) Furthermore, the Klein-Gordon square root operator is nonlocal, hence (7.1) violates the causality principle in the same way as we observed in Theorem 5.1. In fact, we will see that this is no coincidence: the Klein-Gordon square root operator is closely related to the positive and negative energy parts of the solution (whose evolution is described in Theorem 5.1), and at some point it will play an important role in our story.

One could try to find a square root of the wave operator , or even the complete Klein-Gordon operator . This is, however, problematic, because again this leads to pseudo-differential equations (this time, also in the time variable) rather than differential ones, and again there is ambiguity in the definition of the square root (the operators under discussion are not even nonnegative).

It seems that our approach of finding a first-order version of the Klein-Gordon equation by taking a square root of some operator is wrong. We can try attacking our problem from the other side. We know rather well what we are looking for: a first order differential equation with constant coefficients, whose solutions satisfy also the Klein-Gordon equation. There are not many possibilities here, so the better equation, known as the *Dirac equation*, must be of a rather general form:

The operators are self-adjoint, which explains the presence of the factor in (7.2) (which, of course, could have been included in the coefficients ). It turns out, however, that the coefficients in (7.2) are necessarily *operators* (instead of just complex numbers). These operators must have a very regular structure, and they admit some uniqueness property; before we describe them, however, let us make one comment. Equation (7.2) is equivalent to

with and . Equation (7.3) is the classical Schrödinger equation, of the form , where the Hamiltonian is the *free* (that is, not coupled to any external field) *Dirac Hamiltonian*

Equation (7.2) is the *covariant form* of the Dirac equation, and (7.3) is the Schrödinger (or classical) form.

Let us now check for which coefficients iteration of (7.2) gives the Klein-Gordon equation. Any solution of (7.2) satisfies

so the condition is

Since the coefficients do not depend on and , the above condition simplifies to

for all distinct . In particular, the operators anti-commute with each other.
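For orientation, the iteration step can be sketched as a worked computation. The notation is assumed, because the displayed formulas are not reproduced here: $\psi$ is the wave function, $m$ the mass constant, $\partial_0 = \partial_t$, the coefficients are written $\gamma^\mu$, and (7.2) is taken in the form $i\gamma^\mu\partial_\mu\psi = m\psi$. The signs in the post may follow a different convention.

```latex
\begin{align*}
i\gamma^\mu\partial_\mu\psi &= m\psi
  && \text{assumed form of (7.2), summed over } \mu = 0,\dots,3 \\
-\gamma^\nu\gamma^\mu\,\partial_\nu\partial_\mu\psi &= m^2\psi
  && \text{apply } i\gamma^\nu\partial_\nu \text{ to both sides} \\
-\tfrac12\bigl(\gamma^\mu\gamma^\nu+\gamma^\nu\gamma^\mu\bigr)\partial_\mu\partial_\nu\psi &= m^2\psi
  && \text{since } \partial_\mu\partial_\nu=\partial_\nu\partial_\mu \\
-\bigl(\partial_0^2-\partial_1^2-\partial_2^2-\partial_3^2\bigr)\psi &= m^2\psi
  && \text{Klein-Gordon equation, the target}
\end{align*}
```

Comparing the last two lines coefficient by coefficient, it suffices in this convention that $(\gamma^0)^2 = I$, $(\gamma^j)^2 = -I$ for $j = 1, 2, 3$, and $\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 0$ for distinct $\mu, \nu$.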

We see that if (8.1) holds, then any solution of the first order linear equation (7.2) satisfies the Klein-Gordon equation. The price we pay is that can no longer be complex-valued — otherwise, would be just multiplication operators and therefore would commute with each other (a formal proof of this statement is a nice exercise in Fourier methods).

The wave function must therefore take values in some Hilbert space , so that for all . The space is isometrically isomorphic to the tensor product space , and the tensor product has the following natural representation: if is an orthonormal basis of , then

so we may identify with the space of sequences of functions. The above sums and sequences can be finite or infinite, depending on whether is finite or infinite dimensional. In the standard representation of the Dirac equation, is four-dimensional, and we often write . However, it is usually more convenient to work with a general, abstract Hilbert space .

It can be shown that are necessarily operators acting only on the component; instead of proving this, let us agree that this is an assumption. In particular, we will often think that are matrix multiplication operators in the basis , which explains using non-bold characters to denote . We see that each is non-degenerate and . Furthermore, the anti-commutation relations imply that are linearly independent. Indeed, if , then , so that implies , as desired.
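One way to carry out the linear independence argument is sketched below, in assumed notation: the operators are written $\gamma^\mu$, the coefficients $c_\mu$ are complex numbers, and the relations are taken in the form $(\gamma^\nu)^2 = \pm I$ with distinct operators anti-commuting.

```latex
\begin{align*}
\sum_{\mu} c_\mu \gamma^\mu &= 0
  && \text{(assumption)} \\
0 = \gamma^\nu \Bigl(\sum_{\mu} c_\mu \gamma^\mu\Bigr)
    + \Bigl(\sum_{\mu} c_\mu \gamma^\mu\Bigr) \gamma^\nu
  &= \sum_{\mu} c_\mu \bigl(\gamma^\nu\gamma^\mu + \gamma^\mu\gamma^\nu\bigr)
  && \text{(fix an index } \nu) \\
  &= 2 c_\nu (\gamma^\nu)^2 = \pm 2 c_\nu I
  && \text{(only } \mu = \nu \text{ survives, no summation over } \nu)
\end{align*}
```

Hence $c_\nu = 0$ for every $\nu$, which is the claimed linear independence.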

The operators generate, over the field of complex numbers, an at most 16-dimensional algebra, spanned by the ordered products of the generators in which each generator appears at most once. It turns out that this algebra is exactly 16-dimensional, and it is unique up to an isomorphism. In order to prove this fact, we first take a look at a particular realization of these operators as complex $4 \times 4$ matrices, the so-called *Dirac matrices*:

In this *standard representation*, , and are simply matrix multiplication operators. Dirac matrices have a block form (which is denoted by using square brackets)

for , where

are the *Pauli matrices*. By a direct calculation, and for distinct , and (8.1) follows easily. Furthermore, again by a direct calculation, the matrices span the entire algebra of complex matrices. It follows that the following matrices span the entire algebra of complex matrices: ,

where , and is one of the even permutations , or .
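The direct calculations mentioned above can also be checked by machine. The following Mathematica sketch assumes one common sign convention for the block form (the Dirac representation, with squares $+I$ for the time component and $-I$ for the space components); the names `gammas`, `eta`, `relationsHold` and `rank` are introduced here for illustration only, and the matrices of the post may differ by signs.

```mathematica
(* Pauli matrices are built in as PauliMatrix[1], PauliMatrix[2], PauliMatrix[3] *)
sigma = Table[PauliMatrix[j], {j, 3}];
id2 = IdentityMatrix[2];
zero2 = ConstantArray[0, {2, 2}];

(* Assumed block form (Dirac representation); the post's signs may differ *)
gamma0 = ArrayFlatten[{{id2, zero2}, {zero2, -id2}}];
gammaSpace = Table[
   ArrayFlatten[{{zero2, sigma[[j]]}, {-sigma[[j]], zero2}}],
   {j, 3}];
gammas = Prepend[gammaSpace, gamma0];

(* Anticommutators: gamma^m gamma^n + gamma^n gamma^m == 2 eta[[m, n]] I *)
eta = DiagonalMatrix[{1, -1, -1, -1}];
relationsHold = And @@ Flatten[Table[
     gammas[[m]].gammas[[n]] + gammas[[n]].gammas[[m]] ==
      2 eta[[m, n]] IdentityMatrix[4],
     {m, 4}, {n, 4}]];

(* The 16 ordered products in which each generator appears at most once *)
products = Table[
   Dot @@ MapThread[MatrixPower, {gammas, e}],
   {e, Tuples[{0, 1}, 4]}];

(* Rank 16 means the products span all complex 4x4 matrices *)
rank = MatrixRank[Flatten /@ products];

{relationsHold, rank}
```

With this convention the last line should evaluate to `{True, 16}`: the relations hold, and the sixteen products are linearly independent.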

We come back to the general case. It is easy to see that assigning the operator to the corresponding Dirac matrix (8.2) extends to a homomorphism of the algebra of complex matrices onto the algebra generated by the operators (a homomorphism of algebras is a linear mapping that preserves multiplication). The kernel of this homomorphism must be a two-sided ideal of matrices, which is either the zero ideal or the entire algebra (another nice exercise). The latter is not possible, because are nonzero operators. Hence, the kernel is trivial, and so the algebra generated by is isomorphic to the algebra of complex matrices. (This result can be generalized to any Clifford algebra related to a quadratic form over a complex vector space of even dimension. The proof is inductive, and in fact the construction of Dirac matrices using Pauli matrices is reminiscent of the induction step.)

Let us summarize the above results before we proceed. The wave function takes values in a Hilbert space ; in the standard representation, . Elements of are called *spinors*. Hence, for each , is an element of the Hilbert space (in the standard representation, ). The operators on (in the standard representation, matrices (8.2)) satisfy the relations (8.1). The wave function satisfies the Dirac equation (7.2), which can be written in short as

where and . We also have the Schrödinger form (7.3), that is,

Here and . (The notation may be slightly confusing here: remember that is always a space-time vector (printed with upright font for roman symbols), while is a space vector (printed with italic font for roman symbols). We do not use upright and italic fonts for Greek characters, though. Components of both vectors are operators on , or, in the standard representation, matrices.) The solutions of the Dirac equation satisfy the Klein-Gordon equation (3.2):

This equation should be understood component-wise: in the standard representation, , and each of the components satisfies the Klein-Gordon equation. The relation between the Dirac and Klein-Gordon equations can be viewed as a (much more complicated) analogue of the relation between the Cauchy-Riemann and Laplace equations.
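For comparison, here is the classical model case written out: a first-order system whose solutions automatically satisfy a second-order equation. The symbols $u$, $v$ (real and imaginary parts of a holomorphic function of $x + iy$, assumed twice continuously differentiable) are introduced here only for illustration.

```latex
\begin{align*}
\partial_x u &= \partial_y v, \qquad \partial_y u = -\partial_x v
  && \text{(Cauchy-Riemann equations, first order)} \\
\partial_x^2 u + \partial_y^2 u
  &= \partial_x\partial_y v - \partial_y\partial_x v = 0
  && \text{(Laplace equation, second order)}
\end{align*}
```

The same computation with the roles of $u$ and $v$ exchanged shows that $v$ is harmonic as well, just as every component of a solution of the Dirac equation satisfies the Klein-Gordon equation.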

From (9.2) and Theorem 5.2 it is clear that the solutions of the Dirac equation propagate with finite speed, in agreement with the causality principle. In the next sections we study the solutions in more detail.

It is perhaps worth noting that if denotes the smallest such that , then , so this note could equally well have the title *Numerical computation of the distribution of the first passage time*.

This post is not about the supremum functional or first passage times. It is about numerical approximation to a rather complicated object. No knowledge of Lévy processes, complete Bernstein functions and other theoretical objects is required to follow this post.

**1.** Some properties of complete Bernstein functions are discussed in my previous post on a conjecture. Here it is enough to know that a complete Bernstein function $\psi$ extends to a holomorphic function on $\mathbf{C} \setminus (-\infty, 0]$ such that the imaginary part of $\psi$ is positive in the upper complex half-plane and negative in the lower half-plane. The most important examples are $\psi(\xi) = \xi$ (corresponding Lévy process: Brownian motion), $\psi(\xi) = \xi^{\alpha/2}$, $\alpha \in (0, 2)$ (symmetric $\alpha$-stable process), $\psi(\xi) = \log(1 + \xi)$ (variance gamma process) and $\psi(\xi) = \sqrt{1 + \xi} - 1$ (relativistic process).
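The half-plane property is easy to probe numerically. The sketch below samples random points in the upper half-plane and checks the sign of the imaginary part for three examples taken from the `psi` definitions used later in this post (stable with the arbitrary choice `alpha = 3/2`, variance gamma, relativistic). The names `examples`, `pts` and `allPositive` are introduced here for illustration, and a random sample is of course a sanity check, not a proof.

```mathematica
(* Three complete Bernstein functions, written as pure functions *)
examples = {#^(3/4) &, Log[1 + #] &, Sqrt[1 + #] - 1 &};

(* Random points in the rectangle -5 < Re z < 5, 0.01 < Im z < 5 *)
SeedRandom[1];
pts = RandomComplex[{-5 + 0.01 I, 5 + 5 I}, 1000];

(* Im psi(z) should be positive at every sampled point *)
allPositive = And @@ Flatten[
    Table[Im[f[z]] > 0, {f, examples}, {z, pts}]];

allPositive
```

Mathematica uses principal branches for `Power`, `Log` and `Sqrt`, which is exactly the holomorphic extension needed here.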

**2.** The formula for the cumulative distribution function of the supremum functional is rather complicated (apart from two well-known cases and ). It involves not only the values of for positive , but also the jump of along the negative half-axis. In general, this jump should be understood in the distributional sense and it can be a measure. However, for simplicity, we only consider the case when extends to a continuous function in the closed upper complex half-plane . We begin with the following auxiliary functions:

Here . Suppose that is integrable in . Then

For some special cases this formula has been proved in my older preprint, while the general theorem is a joint result with Jacek Małecki and Michał Ryznar. The role of the functions is explained in the articles and exceeds the scope of this note. Just a remark: and .

**3.** The numerical scheme based on formula (1) is not quite straightforward: there are four nested integrals involved, one of them in the exponent and another in the argument of the sine function. As if this were not enough, there are serious numerical stability issues due to the many cancellations in the formula.

One thing should be clarified here: I know next to nothing about numerical computations. There are two reasons for which I write this post. One of them is that I believe it may be useful for someone working in applications of Lévy processes. The other one is that I hope someone more experienced in numerical maths (perhaps the same person) would visit this page, read the code below and help me improve it. I welcome all comments.

The code below is written in the Mathematica language, and it was executed on a (rather old) HP laptop with a Pentium dual-core 1.66 GHz processor and 1 GB RAM, running Mathematica 8.0.1.0. The *MathematicaMark8* benchmark result is 0.27. Instead of just giving the final code, I prefer to start with a naive attempt, then describe the stability issues and how one can resolve them. The code itself is given at the end of this note.

**4.** We start with . After some manipulation, we obtain a slightly better formula,

Hence, we define:

```mathematica
theta[k_?NumericQ] := theta[k] =
  NIntegrate[
    Log[(psi[k^2]-psi[(k z)^2])/(psi[(k/z)^2]-psi[k^2])/z^2]/(1-z^2),
    {z, 0, 1}
  ] / Pi;
```

Note that we memoize computed values by mixing `Set` and `SetDelayed` in the definition `theta[k_] := theta[k] = ...`. The first reference to, say, `theta[1]` makes the kernel execute the command `theta[1] = NIntegrate[...]`. After computing the integral, a new definition `theta[1] = ` *number* is added to the kernel, and this definition takes precedence over the general definition of `theta[k_]`. Hence, when `theta[1]` is referred to again, no integration is performed.
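The effect of this idiom can be observed directly with a call counter. In the minimal sketch below, `slow` and `calls` are hypothetical names introduced for illustration, and the integrand is an arbitrary stand-in for the expensive one.

```mathematica
calls = 0;  (* counts how often the integral is actually evaluated *)

(* SetDelayed on the outside, Set on the inside: compute once, then store *)
slow[k_?NumericQ] := slow[k] = (calls++; NIntegrate[Exp[-k z^2], {z, 0, 1}]);

slow[1]; slow[1]; slow[2];

calls                       (* 2: the second slow[1] was looked up *)
Length[DownValues[slow]]    (* 3: two stored values plus the general rule *)
```

The stored values persist until they are removed, for example with `Clear[slow]`, so the definitions must be re-entered whenever the integrand or a parameter changes.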

The exponent in the definition of is computed in a similar manner, but we take the term outside, and again use a simple substitution. That is, we use the formula

We let:

```mathematica
exponent[k_?NumericQ, y_?NumericQ] := exponent[k, y] =
  NIntegrate[
    Log[(1-psi[y^2 z^2]/psi[k^2])/(1-y^2 z^2/k^2)]/(1+z^2),
    {z, 0, Infinity}
  ];
```

and we define the Laplace transform of :

```mathematica
G[k_?NumericQ, x_?NumericQ] := G[k, x] =
  Sqrt[psi[k^2] dpsi[k^2]] / Pi NIntegrate[
    Im[1/(psi[k^2]-psi[-y^2])] Exp[exponent[k, y]/Pi - x y],
    {y, 0, Infinity}
  ];
```

In the above definition we use a nice feature of Mathematica: functions with a branch cut along the negative half-axis, evaluated at negative arguments, return the boundary limit approached from above. This rule applies, for example, to power functions and logarithms. The function `dpsi` is simply the derivative of `psi`. The definition of `F` is now straightforward:

```mathematica
F[k_?NumericQ, x_?NumericQ] :=
  Sin[k x + theta[k]] - G[k, x];
```

It remains to define the main function. We use formula (1):

```mathematica
supremum[t_?NumericQ, x_?NumericQ] := supremum[t, x] =
  2/Pi NIntegrate[
    Exp[-t psi[k^2]] Sqrt[dpsi[k^2]/psi[k^2]] F[k, x],
    {k, 0, Infinity}
  ];
```

**5.** Let us check what happens if . It takes less than 15 seconds to make a plot of :

Plotting (which gives the same picture, since in this special case ) consumes much more time, nearly 9 minutes. This is because for the latter plot, the values of `exponent[k, y]` cannot be re-used: `k` is no longer a constant. For the same reason, the computation of `supremum[t, x]` would take a lot of time. But we can do little about it.

Consider now . In this case increases from to . Evaluation of `theta[1]`, `theta[2]` and `theta[10^10]` gives correct results, but after typing `theta[10^(-10)]`, we obtain an error: *The integrand (...) has evaluated to Overflow, Indeterminate, or Infinity for all sampling points in the region with boundaries {{0,1}}*. Numerical errors become visible when , as seen in the plot of :

This error is caused by numerical instability of the argument of `Log` in the definition of `theta` when `z` is close to . Replacing this argument by:

`(Sqrt[(k/z)^2 + 1] + Sqrt[k^2 + 1])/(Sqrt[k^2 + 1] + Sqrt[(k z)^2 + 1])`

solves the problem. When is small, Mathematica still gives a warning, *NIntegrate failed to converge to prescribed accuracy after 9 recursive bisections in z*, but this is simply because the value of the integral is close to zero. We can get rid of this message by setting `AccuracyGoal` to a finite value (instead of the default `Infinity`).

Similar stability problems may occur in the definition of `exponent` for `z` close to , so again we replace the argument of the `Log` by:

`(Sqrt[1 + k^2] + 1)/(Sqrt[1 + k^2] + Sqrt[1 + (y z)^2])`

Finally, the definition of `psi` should be changed to the stable form `psi[y_] = y/(Sqrt[1 + y] + 1)`. With these corrections, the computation of `supremum[1, 1]` is stable (well, at least Mathematica does not raise any warnings) and takes less than 3 minutes.
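The reason for preferring the stable form is catastrophic cancellation at small arguments. The sketch below compares it with the algebraically equal expression `Sqrt[1 + y] - 1`, which is assumed here to be the original, unstable definition; `psiNaive`, `psiStable` and `y0` are names introduced for illustration.

```mathematica
psiNaive[y_] := Sqrt[1 + y] - 1;         (* subtracts two nearly equal numbers *)
psiStable[y_] := y/(Sqrt[1 + y] + 1);    (* same function, no subtraction *)

y0 = 10.^-20;   (* far below machine epsilon, so 1 + y0 rounds to 1. *)

{psiNaive[y0], psiStable[y0]}   (* {0., 5.*10^-21} *)
```

In machine arithmetic the naive form loses every significant digit here, while the stable form returns the correct leading term `y0/2`. The rewritten arguments of `Log` in `theta` and `exponent` rely on the same trick.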

Let us now come back to the first example . The execution of `supremum[1, 1]`, after ca. 15 minutes, results in an integrability error in the integral in `k` near zero. Why? Well, in the present case decays when as a constant times . Hence, the integrand in (1) behaves near zero as , which is integrable. However, a tiny numerical instability in the computation of , the Laplace transform of , results in a large *relative* error in , which gives rise to a large *absolute* error in the integrand. The simplest solution is to replace for close to zero by an appropriate approximation, such as . In the general case, we choose the approximation , and the correct choice for is for some small (but not too small!) .

One more thing can be improved. We have just dealt with the unstable behavior of the integrand in (1) near zero. But what makes the computation of `supremum` so time-consuming is the oscillatory part. Note that only the sine part has oscillatory behavior. Hence, the calculation can be easily sped up (sometimes even more than 10 times!) by expanding the integral over, say, as the difference of two integrals.

**6.** The final code is as follows:

```mathematica
(*parameters*)
accuracy = 5;
thetaaccuracy = 10;
exponentaccuracy = 10;
gaccuracy = Infinity;
e = 10^(-2);

(*definitions*)
theta[k_?NumericQ] := theta[k] =
  NIntegrate[
    Log[thetalog[k, z]]/(1 - z^2),
    {z, 0, 1},
    AccuracyGoal -> thetaaccuracy
  ]/Pi;

exponent[k_?NumericQ, y_?NumericQ] := exponent[k, y] =
  NIntegrate[
    Log[exponentlog[k, y, z]]/(1 + z^2),
    {z, 0, Infinity},
    AccuracyGoal -> exponentaccuracy
  ];

G0[k_?NumericQ, x_?NumericQ] := G0[k, x] =
  NIntegrate[
    imag[k, y] Exp[exponent[k, y]/Pi - x y],
    {y, psimin, psimax},
    AccuracyGoal -> gaccuracy
  ]/Pi;

F[k_?NumericQ, x_?NumericQ] :=
  Sin[k x + theta[k]] - Sqrt[psi2[k] dpsi2[k]] G0[k, x];

supremum[t_?NumericQ, x_?NumericQ] := supremum[t, x] =
  2/Pi (
    (* near zero: F is replaced by its approximation *)
    F[e/x, x]/(e/x Sqrt[dpsi2[e/x]]) NIntegrate[
      Exp[-t psi2[k]] k dpsi2[k]/Sqrt[psi2[k]],
      {k, 0, e/x},
      AccuracyGoal -> accuracy
    ] +
    NIntegrate[
      Exp[-t psi2[k]] (Sqrt[dpsi2[k]/psi2[k]] Sin[k x + theta[k]] -
        dpsi2[k] G0[k, x]),
      {k, e/x, 1/x},
      AccuracyGoal -> accuracy
    ] +
    (* oscillatory sine part, integrated separately *)
    NIntegrate[
      Exp[-t psi2[k]] Sqrt[dpsi2[k]/psi2[k]] Sin[k x + theta[k]],
      {k, 1/x, Infinity},
      AccuracyGoal -> accuracy
    ] -
    NIntegrate[
      Exp[-t psi2[k]] dpsi2[k] G0[k, x],
      {k, 1/x, Infinity},
      AccuracyGoal -> accuracy
    ]
  );
```

Furthermore, the following definitions specific to a given `psi` are needed:

```mathematica
psi[y_] = (* complete Bernstein function of y goes here *);
psi2[y_] = psi[y^2];
dpsi[y_] = psi'[y];
dpsi2[y_] = dpsi[y^2];

thetalog[k_, z_] = FullSimplify[
  (psi2[k] - psi2[k z])/(psi2[k/z] - psi2[k])/z^2,
  k > 0 && z > 0
];

exponentlog[k_, y_, z_] = FullSimplify[
  (1 - psi2[y z]/psi2[k])/(1 - (y z/k)^2),
  k > 0 && y > 0 && z > 0
];

psimin = 0;
psimax = Infinity;

imag[k_, y_] = FullSimplify[
  Im[1/(psi2[k] - psi[-y^2])],
  k > 0 && psimin < y < psimax
];
```

If possible, the above definitions should be changed to numerically stable forms, and `psimin`, `psimax` should be changed to the minimal and the maximal `y` such that the imaginary part of `psi[-y^2]` is nonzero. For example, for the $\alpha$-stable process:

```mathematica
alpha = (* number goes here *);
psi[y_] = y^(alpha/2);
psi2[y_] = y^alpha;
dpsi[y_] = alpha/2 y^(alpha/2 - 1);
dpsi2[y_] = alpha/2 y^(alpha - 2);
thetalog[k_, z_] = z^(alpha - 2);
exponentlog[k_, y_, z_] = (1 - psi2[y z]/psi2[k])/(1 - y^2 z^2/k^2);
psimin = 0;
psimax = Infinity;
imag[k_, y_] = Sin[Pi alpha/2] y^alpha/
  (k^(2 alpha) - 2 (k y)^alpha Cos[Pi alpha/2] + y^(2 alpha));
```

When `alpha` is rational, it is a good idea to manually remove singularities in `exponentlog`. For the relativistic process:

```mathematica
psi[y_] = y/(Sqrt[1 + y] + 1);
psi2[y_] = psi[y^2];
dpsi[y_] = 1/(2 Sqrt[1 + y]);
dpsi2[y_] = dpsi[y^2];
thetalog[k_, z_] = (Sqrt[(k/z)^2 + 1] + Sqrt[k^2 + 1])/
  (Sqrt[k^2 + 1] + Sqrt[(k z)^2 + 1]);
exponentlog[k_, y_, z_] = (Sqrt[1 + k^2] + 1)/
  (Sqrt[1 + k^2] + Sqrt[1 + (y z)^2]);
psimin = 1;
psimax = Infinity;
imag[k_, y_] = Sqrt[y^2 - 1]/(k^2 + y^2);
```

Finally, for the variance gamma process:

```mathematica
psi[y_] = Log[1 + y];
psi2[y_] = psi[y^2];
dpsi[y_] = psi'[y];
dpsi2[y_] = dpsi[y^2];
thetalog[k_, z_] =
  Log[(1 + k^2)/(1 + (k z)^2)]/Log[(1 + (k/z)^2)/(1 + k^2)]/z^2;
exponentlog[k_, y_, z_] =
  Log[(1 + k^2)/(1 + (y z)^2)]/Log[1 + k^2]/(1 - (y z)^2/k^2);
psimin = 1;
psimax = Infinity;
(* Im[1/(psi2[k] - psi[-y^2])] for y > 1, where Log[1 - y^2] = Log[y^2 - 1] + I Pi *)
imag[k_, y_] = Pi/(Log[(1 + k^2)/(y^2 - 1)]^2 + Pi^2);
```

Here, however, the integral in (1) converges rather slowly when `t` is small, so extra care is needed with the oscillatory integral. In such cases, when `psi` grows slowly at infinity, it is a good idea to replace `theta[k]` in the third `NIntegrate` in the definition of `supremum` by a constant, so that Mathematica can apply its efficient algorithm for oscillatory integrals. For the variance gamma process, the constant is equal to , and this is a typical limit for slowly growing `psi`. Of course, one also needs to update the fourth `NIntegrate` in this case:

```mathematica
supremum[t_?NumericQ, x_?NumericQ] := supremum[t, x] =
  2/Pi (
    F[e/x, x]/(e/x Sqrt[dpsi2[e/x]]) NIntegrate[
      Exp[-t psi2[k]] k dpsi2[k]/Sqrt[psi2[k]],
      {k, 0, e/x},
      AccuracyGoal -> accuracy
    ] +
    NIntegrate[
      Exp[-t psi2[k]] (Sqrt[dpsi2[k]/psi2[k]] Sin[k x + theta[k]] -
        dpsi2[k] G0[k, x]),
      {k, e/x, 1/x},
      AccuracyGoal -> accuracy
    ] +
    (* pure oscillation with constant phase *)
    NIntegrate[
      Exp[-t psi2[k]] Sqrt[dpsi2[k]/psi2[k]] Sin[k x + thetainfinity],
      {k, 1/x, Infinity},
      AccuracyGoal -> accuracy
    ] +
    (* correction Sin[k x + theta[k]] - Sin[k x + thetainfinity], and the G0 part *)
    NIntegrate[
      Exp[-t psi2[k]] (2 Sqrt[dpsi2[k]/psi2[k]]
        Cos[k x + (theta[k] + thetainfinity)/2]
        Sin[(theta[k] - thetainfinity)/2] - dpsi2[k] G0[k, x]),
      {k, 1/x, Infinity},
      AccuracyGoal -> accuracy
    ]
  );
```

Here `thetainfinity` is the constant . The above modification slows down the computation significantly, but on the other hand, it makes the algorithm numerically stable even for small `t`.
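The form of the fourth integrand comes from the sum-to-product identity for sines. Writing $\theta(k)$ for `theta[k]` and $\theta_\infty$ for `thetainfinity` (symbols chosen here to mirror the code), the step is:

```latex
\begin{align*}
\sin A - \sin B &= 2 \cos\frac{A + B}{2}\, \sin\frac{A - B}{2}, \\
\sin\bigl(kx + \theta(k)\bigr) - \sin\bigl(kx + \theta_\infty\bigr)
  &= 2 \cos\Bigl(kx + \frac{\theta(k) + \theta_\infty}{2}\Bigr)
       \sin\frac{\theta(k) - \theta_\infty}{2}.
\end{align*}
```

The right-hand side is computed without subtracting two nearly equal sines, and its last factor is small whenever $\theta(k)$ is close to $\theta_\infty$, so the correction term is much better behaved than the original oscillatory integrand.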

**7.** When `psimin` is nonzero, the Laplace transform of vanishes exponentially, and so the computation is relatively fast. For example, for the relativistic case, the first call to `supremum[1, 1]` takes nearly 40 seconds to compute, but thanks to memoization, each subsequent call to `supremum[t, 1]` with `t` close to requires only tenths of a second. However, any change of the `x` argument in `supremum[t, x]`, or a significant change of `t`, requires another 40 seconds. Similar times are obtained for the variance gamma process; in this case, however, the computation time increases significantly as `t` goes to zero.

In the $\alpha$-stable case, the Laplace transform of has power-type behavior at infinity, and also the singularities in `G0` are of higher order. For these reasons, the computation of `supremum[1, 1]` takes significantly more time, approximately 5 minutes.

Obviously, in the special cases considered above we could do better. For example, for the symmetric $\alpha$-stable process, we can simply let `theta[_] = (2 - alpha)/8 Pi`. However, my goal was to study the general picture rather than focus on one particular case. Also, I only wrote about efficiency; it would be very interesting to compare the above algorithm with, say, Monte Carlo methods. Accuracy of the method described above is a completely different topic. I am rather convinced that it is possible to *prove* that the error is small in an appropriate sense. However, I know far too little about numerical methods to do that myself.

Let us recall the statement of the conjecture, which is now a theorem:

**Theorem** (Jacek Małecki, 2010)

Suppose that and are positive reals, and is a nonnegative, increasing, continuously differentiable function on . Suppose furthermore that is integrable at infinity. Then

(Compared to the original formulation, we rearranged the integral using the fact that .)


*Proof.* The proof basically identifies the integrand with a jump of a holomorphic function along the branch cut on the negative half-axis, and then uses contour integration and Cauchy's theorem to find the integral. With no loss of generality we may assume that . For simplicity, we assume that is unbounded; the argument for bounded is very similar. Below we prove (1) in three steps.

**1.** We begin with a brief study of the following auxiliary function:

Here means the principal branch of the complex logarithm, so that the definition of is well-formed for any and . Of course we need to assume that the integral converges, that is, that is integrable at infinity.

When and , then , and the limits of as and are and respectively. Furthermore, if and , we have

Here we use the relations and . Since , we have when . It follows that for any ,

(A function satisfying the above conditions (except perhaps ) is said to be a *complete Bernstein function*, see Appendix below.)

**2.** Let us find the boundary values of at . Thanks to the identity it is enough to consider the limit approached from the upper half-plane, denoted . For we have (with being the boundary limit of the principal branch of the logarithm approached from the lower half-plane)

Note that when . Therefore, with ,

The second integral on the right hand side of (2) is simply equal to , and therefore

Furthermore, we have the following result.

**Proposition**

For we have

*Proof.* The function is positive and harmonic in the upper complex half-plane, and as . Hence, by Poisson’s integral formula (see also Appendix below),

By combining (2), (3) and the above proposition, we obtain

Note that . Hence, the conjecture (1) states that

Substituting , we obtain the following equivalent form of (1):

**3.** This part is rather informal. For technical details, see the discussion below and Appendix. From the identity it follows that the integrand in (4) is, up to the factor , the jump of along the branch cut . By considering an appropriate family of contours, shown in the figure on the right, one can prove that:

The left hand side of (5) is equal to the left hand side of (4) multiplied by . By inspecting the definition of , one can show that as and as , and hence the right hand side of (5) is equal to . The proof of the conjecture is complete.

The last part of the proof is not rigorous. It can be made formal by a careful limiting procedure, accompanied by estimates of the integrand near the branch cut, and this is what Jacek primarily did. However, I prefer the following 'soft' argument: is a complete Bernstein function, and (4) is simply a property of complete Bernstein functions. Instead of giving references, let me explain this in detail right here.

**Definition**

A holomorphic function , , is a *complete Bernstein function* (CBF) if:

(a) for ;

(b) when ,

(c) when .

There are several alternative names for the class of complete Bernstein functions; one of them is *operator monotone functions*. This is because is a CBF if and only if we have whenever are linear operators satisfying . A Stieltjes function is a closely related concept. Despite its beauty and plenty of applications, the theory of complete Bernstein functions is not very widely known. There is an outstanding book by René Schilling, Renming Song and Zoran Vondraček covering this area. I also like the exposition in the first volume of Niels Jacob's book on pseudo-differential operators and Markov processes.

Examples of CBFs include for , and . Furthermore, if and are CBFs, then (), (or, more generally, for ) and are again CBFs. Below we prove a fundamental representation theorem for CBFs. In fact this is usually taken as the definition of a complete Bernstein function, and a similar result has been proved by several authors in the early 20th century.

**Theorem**

Every complete Bernstein function has the following form:

where and is a nonnegative measure on , for which the function is integrable. Furthermore, the measure can be recovered as the (distributional) jump of the imaginary part of along , that is,

*Proof.* The imaginary part of a CBF is a nonnegative harmonic function in the upper half-plane. This is a very classical object (see, for example, the book by Sheldon Axler, Paul Bourdon and Wade Ramey, which is available online). In particular, we have the representation theorem (Th. 7.26 in the book),

where and the nonnegative measure is the weak-* limit of absolutely continuous measures as . A priori, can be an arbitrary measure for which is integrable. (Compare this result with the proof of Proposition above.)

Since is continuous on and for , the measure is concentrated on . We write where is concentrated on and . Formula (8) can be rewritten as

Since , we obtain

Two holomorphic functions have equal imaginary part if and only if their difference is a real constant. Hence, for some real ,

Note that the term is necessary, because we do not know a priori that is integrable with respect to . It remains to prove that this is indeed the case, that , and that .

When , clearly , which proves that . Furthermore, by monotone convergence,

The conjecture follows easily from the following simple consequence of (6) and (7): when , we have

We apply the above identity for . Since is bounded on , clearly . By (7), . Furthermore, the right-hand side of (9) is equal to . This proves (4).

We agree that the motion of a particle should be described in terms of the wave function , possibly taking vector values. We are looking for something Lorentz-invariant, so a good guess is

In principle, the coefficients , may depend both on time and position. However, we first consider a *free particle*, that is, a particle which does not interact with any external force. The equation is therefore expected to be *isotropic* (invariant under translations and rotations of space), *autonomous* (invariant under translations of time) and *homogeneous* (invariant under multiplication of the unknown function by constants) — just as free Maxwell’s equations, that is, Maxwell’s equations with no electric charges and currents. (Homogeneity is not obvious at all here: one cannot ‘multiply’ particles by fractions, and we also have Pauli’s exclusion principle. On the other hand, light is quantized too, but this fact cannot be seen directly from Maxwell’s equations. Hence, a similar behavior is acceptable, or perhaps even expected, in our first approximation to quantum mechanics.)

The above considerations suggest that in equation (3.1) we should take , and should be a constant. Furthermore, should be nonnegative: otherwise, if at some initial time the wave function was constant (this idea can be localized, but consider a global constant here), it would either grow or decay exponentially with time, violating any reasonable energy conservation principle. This leads to the *Klein-Gordon equation*:

Here is a square-integrable function of for each , or a vector of such functions. Although in principle some regularity of is required for the derivatives of to be well-defined, we will see later in this post that (3.2) makes sense also in the more general context.

The choice of the constant is completely arbitrary here. However, it turns out that in (3.2) corresponds to the rest mass of a particle. If , we recover the potential formulation of free Maxwell’s equations (2.9′), describing the motion of massless photons; mathematically, this is just the classical wave equation. For general , a plane wave (moving with velocity and frequency ; this function is clearly not square integrable, but it is interesting as a basic building block of the Fourier transform) satisfies (3.2) if and only if . This is very similar to the relativistic relation between mass, momentum and energy:

$$E^2 = |\mathbf{p}|^2 + m^2,$$

better known in its formulation in non-natural units: $E^2 = (|\mathbf{p}| c)^2 + (m c^2)^2$.

We will come back to this relation in the next post.
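The plane wave condition can be checked numerically. The sketch below assumes one space dimension and the sign convention $(\partial_t^2 - \partial_x^2 + m^2) u = 0$ for the Klein-Gordon equation, and it takes the plane wave to be $u(t, x) = e^{i(kx - \omega t)}$; the helper name `kg_residual` and the sample values are ours, chosen only for illustration.

```python
import cmath
import math

def kg_residual(k, omega, m, t=0.3, x=0.7, h=1e-3):
    """Finite-difference value of (d_tt - d_xx + m^2) u at (t, x)
    for the plane wave u(t, x) = exp(i (k x - omega t))."""
    u = lambda t_, x_: cmath.exp(1j * (k * x_ - omega * t_))
    u_tt = (u(t + h, x) - 2 * u(t, x) + u(t - h, x)) / h**2
    u_xx = (u(t, x + h) - 2 * u(t, x) + u(t, x - h)) / h**2
    return u_tt - u_xx + m**2 * u(t, x)

k, m = 2.0, 1.5
omega_on_shell = math.sqrt(k**2 + m**2)   # omega^2 = k^2 + m^2
omega_massless = k                         # relation valid only for m = 0

residual_on_shell = abs(kg_residual(k, omega_on_shell, m))
residual_massless = abs(kg_residual(k, omega_massless, m))
print(residual_on_shell)   # small: only the discretization error remains
print(residual_massless)   # of order m^2: the equation is not satisfied
```

With the massless frequency the residual stays close to $m^2$, which is exactly the term that the relation $\omega^2 = k^2 + m^2$ is designed to cancel.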

The Klein-Gordon equation (3.2) is fundamental for the relativistic quantum theory. It is believed that every relativistic quantum model describing a system without external interactions (a free system) is, in a sense, a special case of the Klein-Gordon equation; in particular, every solution of the potential formulation of free Maxwell’s equations satisfies (3.2), and the same is true for the solutions of the free Dirac equation, which will be introduced in the next post. For this reason, we briefly discuss some basic features of the Klein-Gordon equation.

In the Fourier (or momentum) space, (3.2) reads

For a fixed , this is an ordinary differential equation in time. The solution is given by a linear combination of two functions, . Hence, the general solution of the Klein-Gordon equation (3.2) is given by

where

The functions and will correspond to the spaces of positive and negative energies (and negative energy here is related to anti-matter). Observe that each of the evolution equations for and is the time-reversal of the other one. This gives the *charge-time symmetry* of quantum systems, our first approximation to the fundamental *charge-parity-time (CPT) symmetry* of physical laws. (Parity here corresponds to the orientation of space; for example, reflections are parity transformations.)

The components can be easily found from the initial conditions

Indeed, we have

We remark that for the free Maxwell’s equations, the four-potential is a real function, and therefore is conjugate to . In particular, the positive and negative energy components of a photon have equal ‘weight’.
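For a single fixed frequency, the recipe above can be sketched in a few lines of code. We assume here that the two basic solutions are $e^{-i\omega t}$ and $e^{i\omega t}$ with $\omega = \sqrt{|\xi|^2 + m^2} > 0$; which of them is labelled "positive" is a convention, and the names `a_plus`, `a_minus`, `split_components` and `evolve` are hypothetical, not taken from the formulas of this post.

```python
import cmath
import math

def split_components(u0_hat, u1_hat, xi, m):
    """Solve a_plus + a_minus = u0_hat and
    -i*omega*a_plus + i*omega*a_minus = u1_hat for the two coefficients."""
    omega = math.sqrt(xi**2 + m**2)          # requires omega > 0
    a_plus = 0.5 * (u0_hat + 1j * u1_hat / omega)
    a_minus = 0.5 * (u0_hat - 1j * u1_hat / omega)
    return omega, a_plus, a_minus

def evolve(t, u0_hat, u1_hat, xi, m):
    """Fourier-space solution at time t for one fixed frequency xi."""
    omega, a_plus, a_minus = split_components(u0_hat, u1_hat, xi, m)
    return a_plus * cmath.exp(-1j * omega * t) + a_minus * cmath.exp(1j * omega * t)

# Sample initial data at one frequency (arbitrary illustrative values).
u0_hat, u1_hat, xi, m = 1.0 + 2.0j, -0.5 + 0.25j, 1.2, 0.8

value_at_zero = evolve(0.0, u0_hat, u1_hat, xi, m)
h = 1e-5
derivative_at_zero = (evolve(h, u0_hat, u1_hat, xi, m)
                      - evolve(-h, u0_hat, u1_hat, xi, m)) / (2 * h)
print(value_at_zero, derivative_at_zero)   # should reproduce u0_hat and u1_hat
```

The division by $\omega$ is the reason why, for $m = 0$, the component of the initial velocity near $\xi = 0$ needs extra care.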

Later in this post we rewrite formulas (3.5) and (3.6) in the integral form. In the zero mass case, the formula is explicit, but for massive particles we are not able to avoid using Fourier transform completely. Before, however, we make a short digression about the spectral theorem.

Recall that by we denote as a function of . By (3.5), is the image of under a *Fourier multiplier*: an operator which acts in the Fourier space as a multiplication operator. The *symbol* of this Fourier multiplier is . Since the Laplace operator is the Fourier multiplier with symbol , it seems reasonable to write

For a more general (say, bounded and measurable) function , the operator can be defined using the Fourier transform,

for smooth, rapidly decreasing functions , and then extended continuously to . Formula (4.1) corresponds to . The definition given in (4.2) is a particular case of a more general construction of a function of an operator, which requires the spectral theorem.

**Spectral Theorem**

If is a unitary or (possibly unbounded) self-adjoint operator (or, more generally, a normal operator) on a Hilbert space , then there is a corresponding *spectral measure* (aka *resolution of identity*): a family of orthogonal projectors for Borel sets , such that is the identity operator, and is a countably additive function of (a complex-valued measure) for any , and furthermore we have for all in the domain of and all .

The smallest closed set such that is the identity operator is the *spectrum* of , denoted (this is equivalent to the classical definition). And for any measurable function defined on , the operator is given by the identity

whenever and is in

the *domain* of . Note that if is a bounded function, then is a bounded operator, even if the original operator was unbounded. In particular, formulas (3.4) and (3.5) (or (4.1)) define the unique solution of the Klein-Gordon equation (3.2) for arbitrary square-integrable initial data and , with no further regularity assumptions. Clearly, we also have the uniqueness of the solution given the initial data and (see (3.6)), but may fail to be square-integrable.

We give (4.3) and (4.4) here mainly to fix the notation; a proper introduction to the spectral theory of operators in Hilbert spaces would take too long (and there are many good textbooks covering this subject). Readers who are unfamiliar with spectral measures but unwilling to spend much time learning about them may find the following two examples helpful.

If there is a complete orthonormal set of eigenvectors of the operator , , then is simply the orthogonal projection on the subspace spanned by those for which , and

whenever the orthogonal series on the right is convergent. In particular, this is a rather standard construction in finite-dimensional spaces.
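In finite dimensions the eigenvector formula can be tried out directly. The sketch below uses the symmetric matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, whose eigenvalues are $1$ and $3$; the matrix and the helper name `apply_function` are our own choices for illustration. It computes $f(A)x = \sum_n f(\lambda_n) \langle x, e_n \rangle e_n$ and checks that applying $\sqrt{A}$ twice agrees with applying $A$.

```python
import math

# Orthonormal eigenvectors of A = [[2, 1], [1, 2]] with their eigenvalues.
eigenpairs = [
    (1.0, (1 / math.sqrt(2), -1 / math.sqrt(2))),
    (3.0, (1 / math.sqrt(2), 1 / math.sqrt(2))),
]

def apply_function(f, x):
    """Return f(A) x = sum_n f(lambda_n) <x, e_n> e_n."""
    result = [0.0, 0.0]
    for lam, e in eigenpairs:
        coefficient = x[0] * e[0] + x[1] * e[1]   # inner product <x, e_n>
        for i in range(2):
            result[i] += f(lam) * coefficient * e[i]
    return result

x = [1.0, 0.0]
sqrt_twice = apply_function(math.sqrt, apply_function(math.sqrt, x))
direct = [2.0 * x[0] + 1.0 * x[1], 1.0 * x[0] + 2.0 * x[1]]   # A x
print(sqrt_twice, direct)   # the two vectors should agree up to rounding
```

Taking the indicator function of a set in place of `math.sqrt` gives the orthogonal projection described above.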

The second example is the Laplace operator , which actually motivated the above discussion. The spectral measure of is easily shown to be the Fourier multiplier with symbol (here and below is the indicator function, equal to if the argument belongs to and otherwise), that is,

This proves that formulas (4.2) and (4.3) indeed define the same operator.

We begin with rewriting (3.6) as

The operator is closely related to the *Bessel potential operator*, and for smooth we have

where is the modified Bessel function of the second kind. Since this result will not be used in the sequel, we omit the proof, which can be found, for example, in the book (or the article) *Theory of Bessel potentials* by N. Aronszajn and K. T. Smith. Instead, we derive a formula for the evolution of in the position representation.

The result is rather complicated, and we need another definition. By a *principal value integral*, denoted , we mean the limit of integrals with *symmetric* intervals around singularities removed. The limit is taken here as the length of the removed intervals tends to zero. For example, in the statement of the theorem, one should remove the interval and let .
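The role of the word *symmetric* in this definition is easy to see on a toy example, which is not the integral from the theorem below. We take $\int_{-1}^{2} \frac{dx}{x}$, remove a small interval around the singularity at $0$, and integrate the rest with a plain midpoint rule (the function names are ours). With the symmetric gap $(-\varepsilon, \varepsilon)$ the result approximates $\log 2$, while the lopsided gap $(-\varepsilon, 2\varepsilon)$ gives a value close to $0$ instead.

```python
import math

def midpoint(f, a, b, n=200_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integral_with_gap(left_gap, right_gap):
    """Integrate 1/x over [-1, 2] with the interval (-left_gap, right_gap) removed."""
    f = lambda x: 1.0 / x
    return midpoint(f, -1.0, -left_gap) + midpoint(f, right_gap, 2.0)

eps = 0.01
symmetric = integral_with_gap(eps, eps)        # approximates the principal value
asymmetric = integral_with_gap(eps, 2 * eps)   # a different limit
print(symmetric, math.log(2.0))
print(asymmetric)
```

Both numbers are independent of $\varepsilon$ in exact arithmetic, so the two ways of removing the singularity really lead to two different limits.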

**Theorem 5.1 (propagation of positive and negative energy solutions of the Klein-Gordon equation)**

Let denote the mean value of on the sphere , and let, as usual, be the derivative of with respect to . Suppose that is a smooth solution of the Klein-Gordon equation (3.2), and that are its positive and negative energy components. When , we have

For , there is an additional term,

where denotes the convolution in spatial variables, and is an function with Fourier transform

Before the proof, we note that the evolution of violates the *causality principle* in special relativity: the value of is expected to depend only on the values of for in the *light cone* , but in Theorem 5.1 this is not the case. It is quite clear when , and for the square-integrable term cannot compensate the highly singular kernel of the principal value integral; we omit the details. As a consequence, any physical solution of the Klein-Gordon equation must either comprise nonzero positive and negative parts, or present some additional symmetry, which makes possible a causal reformulation of (5.3) and (5.4). On the other hand, the evolution of agrees with the causality principle, as stated in the following result.

**Theorem 5.2 (propagation of general solutions of the Klein-Gordon equation)**

Let denote the mean value of on the sphere , and let be the derivative of with respect to . Suppose that is a smooth solution of the Klein-Gordon equation (3.2). When , we have

For , there are two additional terms,

where and are square-integrable functions vanishing outside the ball , and with Fourier transforms given by

In the proof, we need the following technical result from complex analysis.

**Lemma 5.3**

For any bounded function on which is smooth near , we have

and

The convergence is dominated by a constant depending only on , the supremum norm of on , and the supremum norm of , on a fixed neighborhood of .

The proof of Lemma 5.3 is given at the end of this section.

*Proof of Theorem 5.1*. According to (3.5), , where is the Fourier multiplier with symbol . Our goal is to find a more explicit description of .

For an symbol, the integral formula for the corresponding Fourier multiplier can be found using the convolution theorem. This method cannot be applied directly to , since its symbol is not in . However, when , we can use the convolution theorem for the operator with a square-integrable symbol . The theorem will then be proved by taking an appropriate limit.

The explicit formula can only be found for massless particles: we assume that . We find that for ,

where

Note that is a *radial* function, that is, it depends on only through its norm . For this reason it is convenient to compute first the (inverse) Fourier transform of the surface measure on the sphere with radius , centered at the origin. Symmetry, integration in spherical coordinates (rotated appropriately, so that the vector points upwards) and then substitution give

When is a radial function, then the function (), where is an arbitrary unit vector, is called the *profile* of , and it is sometimes denoted again by when this makes no confusion. Integration in spherical coordinates and Fubini’s theorem give

Therefore, the three-dimensional (inverse) Fourier transform of a radial function is again a radial function with profile equal to times the (inverse) Fourier sine transform of the profile of .

By the above observation,

An elementary calculation gives

If and converges to , then and converges to . Therefore, by Plancherel’s theorem and dominated convergence, converges in to . Furthermore, symmetry and integration in spherical coordinates give

By Lemma 5.3, it follows that for all unit vectors , all smooth, rapidly decreasing functions , and all , we have

The last statement of Lemma 5.3 enables us to change the order of the integral and the limit, so that

Formula (5.3) for the function and is proved. The other cases follow by symmetry.

The case is now easy. Let denote the operator corresponding to mass . The Fourier symbol of , that is, , is a square-integrable function of . Formulas (5.4) and (5.5) follow by applying the convolution theorem, as described in the first part of the proof.

*Proof of Theorem 5.2*. We use the notation introduced in the proof of Theorem 5.1. Recall that under the assumption that is square-integrable for each , the solution is uniquely determined by and . Since the real and imaginary parts of a solution of the Klein-Gordon equation (3.2) are again solutions of (3.2), it suffices to consider real-valued solutions. As in the proof of Theorem 5.1, we first consider .

Suppose first that for all . By Theorem 5.1,

and is a solution of the Klein-Gordon equation (3.2). Furthermore, by (3.6), , and . Hence, the formula

defines a solution of (3.2) with given and with .

Suppose now that and that is real. Suppose for a moment that for all . Then is again a solution of the Klein-Gordon equation (3.2), and . Therefore, for ,

Since , integration in gives

and a similar formula for follows by symmetry. Either by a direct substitution to (3.2) or using an approximation argument, we conclude that (5.11) holds without the additional square-integrability assumption on . By combining two solutions given by (5.10) and (5.11), we obtain a solution for general initial data, and formula (5.6) follows by the uniqueness of the solution.

We can repeat the above argument when . In (5.10) we have an additional term , where . By (5.5) and the properties of the Fourier transform, we have

In a similar manner, in (5.11) we have an additional term , where

It remains to prove that when .

For , the holomorphic function has a branch cut along , but the boundary values of this function on approached from above and from below are opposite purely imaginary numbers. Since cosine is an even function, has a continuous extension to , and so it is an entire function of . By Morera’s theorem (or, more precisely, one of its corollaries), it follows that is an entire function. Furthermore, since and , we have .

The Fourier transform of is equal to . But , , is an entire function of three complex variables, and for . By a multivariate version of Paley-Wiener theorem (see a nice proof in the article by Y. Yang and T. Qian), we conclude that vanishes outside the ball , as desired.

The multivariate Paley-Wiener theorem is a rather advanced tool, and its proof is far beyond the scope of these notes. Although it would be difficult to avoid it completely in the proof of Theorem 5.2, we could have used only its one-dimensional version. Indeed, is a radial function, and so, by (5.9), its Fourier transform is expressed in terms of the one-dimensional Fourier sine transform of the profile of .

*Proof of Lemma 5.3*. Although we could simply refer to the Sokhotski-Weierstrass-Plemelj formulas, we give an explicit proof. First, we decompose into the sum of two parts, one vanishing in a neighborhood of , and the other smooth and vanishing outside a larger neighborhood of . The result for the first part is just dominated convergence. For the other part, we use methods of complex analysis.

We therefore assume that is a smooth function supported in a small neighborhood of . We extend to an even function on the real line, and define

Then is a holomorphic function in the half-plane . For any , we have

and

For the first integral on the right-hand side, we simply use dominated convergence. The other one is a Poisson integral, which converges to . (This is easy to prove directly, using just the continuity of at — an ‘approximate identity’ argument.) We conclude that

and the first statement of the lemma follows by a simple rearrangement. The proof of the other statement is very similar: again is split into two parts, and for smooth supported in a neighborhood of , we define

Integration by parts gives

the second equality is a consequence of . Hence, by the first part of the proof and the identity ,

Again integrating by parts (carefully: this is a principal value integral), we conclude that

as desired. Finally, by inspecting the above argument, one proves the last statement of the lemma; we omit the details.

When snowy weather comes back to Wrocław, I will upload some winter pictures as well.

In order to properly understand the Dirac equation, one needs some background on the Lorentz transformation. In this post, we also briefly discuss some aspects of Maxwell’s equations, which will become important later when we couple the Dirac particle with the electromagnetic field.

To keep the notes readable for mathematicians, it is important to keep the notation as simple and as consistent as possible. In quantum mechanics, the state of a particle (or a system of particles) at time is described completely by an element of a fixed Hilbert space with norm . (This definition is already not strictly Lorentz-invariant; we will discuss this later.) In the Dirac model, is the space of square-integrable -valued functions on . Measurable values in this theory correspond to self-adjoint, typically unbounded linear operators on , called *observables*. If, for example, and are the position and the momentum vectors of a classical particle, then we denote the corresponding observables by and .

Three-dimensional vectors (space vectors) are denoted by , etc. By , we denote the length of vectors , . Relativistic *four-vectors* are written in roman font, for example . For that reason, we sometimes write . Mathematically, a four-vector is just a four-dimensional vector. We use the name four-vector to emphasize that a natural class of transformations acting on four-vectors is the group of Lorentz transformations (see below), and not isometries. The set of four-vectors is called *space-time*.

The partial derivative operators are denoted by (). Furthermore, is the vector of spatial partial derivatives and is the spatial Laplace operator. The gradient of a function, divergence of a vector field, and its curl are denoted by , and . The time derivative is denoted by , or when four-vectors are discussed. Sometimes we also use a dot placed over a symbol, like in .

The evolution of the state is described by the wave function . We often drop the arguments from the notation when they are clear from the context. Furthermore, we often write for , a function of the spatial variable. Perhaps it is worth noting that the wave function typically does not satisfy the usual wave equation, but a Schrödinger one. As already mentioned, in the Dirac approach the function takes values in , so that .

Physicists often use fixed variable names for fixed spaces, or *representations*. The symbol corresponds to the position, and the wave function is given in the so-called *position* (or *standard*) *representation*. However, it is often more convenient to work with the (spatial) Fourier transform of , which corresponds to the so-called *momentum representation*, related to the momentum variable . Instead of writing the Fourier transform explicitly, it is customary to write simply for the Fourier transform of . This is just a very convenient shorthand notation, but at first it may seem very informal. For that reason, we try to avoid it, and write instead. We use for the Fourier transform normalized to be an isometry,

The inverse Fourier transform is then given by the formula

Finally, we choose to use *natural units*. That is, we choose a unit system in which some chosen physical constants are equal to one, so that we can drop them from the formulas. We will assume that the speed of light , the electric and magnetic constants and and the (reduced) Planck constant are all equal to one.

One of the main features of the Dirac equation is its Lorentz invariance: the Dirac equation has the same form in all inertial (in the sense of special relativity) frames of reference. It is therefore reasonable to start with a short introduction to the Lorentz transformation. And since its discovery was mostly motivated by Maxwell’s equations, we shall begin with a brief introduction to classical electromagnetism.

The evolution of the electric field and the magnetic field is described by the system of four Maxwell’s equations:

where is the density of electric charge and is the density of electric current. These two objects are governed by a rather complicated mechanism which depends on the type of medium filling the space: conducting or not, magnetic or not, etc. Here we will simply consider and to be parameters describing an external source of electromagnetic waves, and only note the general continuity equation:

Intuitively, (2.2) is a form of electric charge conservation law: it says that describes the flow of the electric charge .

When there is no electric charge and no current, all components of and satisfy the classical wave equation: and . For example, using the identity , we obtain

Maxwell’s equations agree perfectly with experimental data. However, they are not preserved by Galilean transformations. This is true even in the absence of electric charge and current, because Galilean transformations do not preserve the classical wave operator . It is not difficult to find a linear transformation of coordinates preserving : the frame of reference moving with constant speed in the direction should be described by the Lorentz transformation (a *Lorentz boost*):

where is the *Lorentz factor*. It is an easy (but very instructive) exercise to verify explicitly that the transformation (2.3) indeed preserves the classical wave operator. The Lorentz boost corresponding to the frame of reference moving with arbitrary constant speed can be obtained from (2.3) by rotations, but the result is not very simple:

again with . Here is a matrix with entries , and is the orthogonal projection on the line containing . Lorentz boosts, rotations, translations and their compositions form the group of Lorentz transformations of space-time. Since Lorentz transformations do not preserve time, they were first considered as a purely mathematical notion, and it was Albert Einstein who first considered them (in special relativity) to be the true physical description of inertial frames of reference.
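The exercise mentioned above (checking that a boost preserves the classical wave operator) can also be done numerically. The sketch below works in one space dimension with $c = 1$ and assumes the standard form of the boost, $t' = \gamma (t - v x)$, $x' = \gamma (x - v t)$. It uses the fact that under a linear change of coordinates $y = M (t, x)$, the operator $\partial_t^2 - \partial_x^2$ with coefficient matrix $\eta = \mathrm{diag}(1, -1)$ becomes the operator with coefficient matrix $M \eta M^T$. The helper names are ours.

```python
import math

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def transformed_form(m):
    """Coefficient matrix M * eta * M^T of d_tt - d_xx in the coordinates y = M (t, x)."""
    eta = [[1.0, 0.0], [0.0, -1.0]]
    return matmul(matmul(m, eta), transpose(m))

v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v**2)            # Lorentz factor, c = 1
boost = [[gamma, -gamma * v], [-gamma * v, gamma]]
galilean = [[1.0, 0.0], [-v, 1.0]]             # t' = t, x' = x - v t

boost_form = transformed_form(boost)
galilean_form = transformed_form(galilean)
print(boost_form)      # again diag(1, -1), up to rounding
print(galilean_form)   # picks up mixed derivatives: not the wave operator
```

For the Galilean transformation the off-diagonal entries are equal to $-v$, which corresponds to the mixed derivative term that spoils the invariance of the wave equation.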

There are two types of Lorentz invariance. Suppose that two inertial frames are related by a Lorentz boost (2.3). A path (here corresponds to the time variable ) in the primed frame is given by , with

and a similar formula can be written for the derivatives of . Physicists say that four-vectors, such as and , transform in a *contravariant* way. On the other hand, a function (again is the time coordinate) in the primed frame is given by the formula , so that its derivatives are transformed in a *covariant* way:

This is a natural transformation for gradient-like operations. (The above discussion may seem completely trivial to most physicists. However, I have always found the covariant and contravariant terminology quite confusing, so I hope such a lay explanation will help many mathematicians. At least, I needed it.)
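A small numerical illustration of the two rules may help as well. The sketch assumes one space dimension, $c = 1$ and the standard boost matrix $\Lambda$. Components of a displacement are transformed by $\Lambda$ (the contravariant rule), components of a gradient by the transpose of $\Lambda^{-1}$ (the covariant rule), and the pairing of the two, that is, the directional derivative, is the same number in both frames. All names and sample values below are hypothetical.

```python
import math

v = 0.6
gamma = 1.0 / math.sqrt(1.0 - v**2)
boost = [[gamma, -gamma * v], [-gamma * v, gamma]]
inverse = [[gamma, gamma * v], [gamma * v, gamma]]   # the boost with velocity -v

def apply(m, w):
    """Matrix-vector product m w."""
    return [m[0][0] * w[0] + m[0][1] * w[1], m[1][0] * w[0] + m[1][1] * w[1]]

def apply_transpose(m, w):
    """Matrix-vector product m^T w."""
    return [m[0][0] * w[0] + m[1][0] * w[1], m[0][1] * w[0] + m[1][1] * w[1]]

displacement = [2.0, 1.0]   # components (dt, dx) of a displacement
gradient = [0.5, -3.0]      # components (d_t f, d_x f) of some function f

displacement_primed = apply(boost, displacement)        # contravariant rule
gradient_primed = apply_transpose(inverse, gradient)    # covariant rule

pairing = sum(g * d for g, d in zip(gradient, displacement))
pairing_primed = sum(g * d for g, d in zip(gradient_primed, displacement_primed))
print(pairing, pairing_primed)   # equal up to rounding
```

If both objects were transformed by the same matrix $\Lambda$, the pairing would in general change, which is why the two rules have to be kept apart.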

It is perhaps surprising (it was for me) to note that Maxwell’s equations (2.1) (or, more precisely, the electric and magnetic fields) are not *strictly* invariant under the Lorentz transformation. The easiest way to see this is to consider a single stationary charge. Since it is at rest, it generates no current, and so the magnetic field is identically zero. On the other hand, in a different inertial frame, the charge is no longer stationary, the electric current is no longer zero, and so cannot vanish. Therefore, the magnetic field cannot be measured absolutely, without fixing an inertial frame of reference. There are, however, relatively simple (but different from (2.5) and (2.6)) transformation rules for and , which is no longer true when Galilean transformations are considered.

What is Lorentz-invariant (well, contravariant) is the potential. To introduce this notion, we need the Helmholtz theorem. Informally, it states that any vector field can be written in the form for some function (the *scalar potential*) and some vector field (the *vector potential*). The first summand in the decomposition has no curl (it is *irrotational*), the other one has zero divergence (it is *solenoidal*).

The main topic of these notes is the Dirac equation, which deals with square integrable functions. Therefore, we give the simplest version of Helmholtz Theorem instead of the ‘continuous’ version typical in electrodynamics. This way we also introduce the notion of Sobolev spaces, a very important object in quantum mechanics. A function is said to belong to the Sobolev space if and partial derivatives of of order up to (defined in the distributional sense) are square integrable. Equivalently, if and only if is square integrable. A vector field is said to be square integrable etc., if so are all of its components.

**Helmholtz Theorem**

Suppose that is a square integrable vector field. Suppose furthermore that , and that is integrable. Then there exist a function and a vector field such that . Furthermore, and each component of are in the Sobolev space .

*Sketch of the proof*. By the assumptions, and are square integrable. Define the potentials using the Fourier transform, by the formulas

,

and verify the statements of the theorem.

In principle, the vector potential is not defined uniquely: the curl is not changed when is replaced by for any function . Choosing a particular vector potential is known as *gauge fixing*. When is the vector potential constructed in the proof of Helmholtz theorem, then . In this case we have . Hence, the function can be recovered from by the formula . This means that gauge fixing is equivalent to an arbitrary choice of the divergence of the vector potential.

We now come back to the electric and magnetic fields . Since , the magnetic field has zero scalar potential. Let be *a* vector potential of , the *magnetic potential*. The vector field has zero curl, and therefore its vector potential vanishes. Let be *the* scalar potential of , the *electric potential*. When fixing , we use the *Lorenz gauge* (note that Ludvig Lorenz and Hendrik Lorentz were two different physicists): we require that

Before we discuss why this condition can be satisfied, let us note that with the above definitions, we have

and Maxwell’s equations (after some manipulation) can be rewritten as

accompanied by the Lorenz gauge condition (2.7) and the continuity equation (2.2).

It was noted above that the classical wave operator is preserved by Lorentz transformations. This proves that the Maxwell’s equations (2.9) are Lorentz-invariant. On the other hand, (2.8) and (2.2) are not strictly Lorentz-invariant. In fact, these formulas say that and are vector fields on space-time and transform according to (2.5). For this is rather intuitive, and by (2.9), should transform in the same way. For this reason, it is sometimes convenient to define the electromagnetic *four-potential* by taking . Then (2.9) reduces to a single equation

where is the *four-current*, with . Furthermore, the relation (2.8) can be written in a more abstract form using the *electromagnetic tensor* (the word matrix may sound more familiar here, though):

Formula (2.8) says that , where and . Also (2.1) could be written in terms of , but it is no longer that elegant (see, for example, the Wikipedia article).

It remains to explain why the Lorenz gauge condition is in fact a gauge condition. The general magnetic and electric potentials, without fixing any gauge, can be described as follows. We start with the magnetic potential with divergence zero and the corresponding electric potential . (This corresponds to the *Coulomb gauge*.) In general, we have , where is an arbitrary, sufficiently smooth function. It follows that . The Lorenz gauge condition can be rewritten as , which transforms to an ordinary differential equation in the Fourier space, . General theory gives existence of a solution. Note, however, that is not defined uniquely: all Lorenz gauge functions differ from each other by a solution of the classical wave equation . Therefore, the pair of electric and magnetic potentials is defined uniquely up to the *four-gradient* of a solution of the classical wave equation: changing and to and does not affect (2.7)–(2.9).

One thing should be pointed out here: we do not discuss any regularity properties (smoothness, square integrability etc.) of the solutions of Maxwell’s equations. This issue will be partially addressed later, when we couple a wave function with the electromagnetic field. However, we will usually assume that the potentials are smooth enough and consider them as parameters of the environment.

This post is not about the hypergeometric function; it is enough to know that it accepts four arguments, usually typed as . For some parameters , , it reduces to elementary functions, for example . Of course, Wolfram Research products are aware of that (link). A slight modification of the parameters of the hypergeometric function does not change the value of the function significantly: it is smooth in all four arguments. But according to Wolfram Alpha, is not even continuous! Click on an image to see the plot at the Wolfram Alpha site.

There is clearly something wrong with the way Wolfram Alpha (and its kernel, Mathematica) computes . This issue is related to finite precision calculations: forcing Mathematica to work with higher precision helps a lot, but it does not resolve the problem completely. Numerical instability is nothing spectacular; it is a common issue in approximate computations. Here, however, the error cannot be explained simply by accumulation of numerical errors: the result is just completely wrong, zero instead of something close to 0.8. This kind of bug is quite unexpected in such an advanced application.
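A simple way to cross-check a suspicious value is to sum the defining power series directly and compare it with a known closed form. The sketch below is only an illustration of such a sanity check: it is not how Mathematica evaluates the function, the helper `hyp2f1_series` is our own, and the identity ${}_2F_1(1, 1; 2; z) = -\log(1 - z)/z$ is a standard one chosen for the example, not necessarily the one discussed in this post. The series converges only for $|z| < 1$.

```python
import math

def hyp2f1_series(a, b, c, z, terms=200):
    """Partial sum of the Gauss series sum_n (a)_n (b)_n / (c)_n * z^n / n!,
    which converges for |z| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        # Ratio of consecutive terms of the series.
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total

z = 0.5
series_value = hyp2f1_series(1.0, 1.0, 2.0, z)
closed_form = -math.log(1.0 - z) / z
print(series_value, closed_form)   # the two values should agree closely
```

A direct summation like this is slow and loses accuracy when $z$ is close to $1$, but for moderate arguments it gives an independent reference value.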

Let us play around with this bug for a while. When asked about , Wolfram Alpha answers correctly (link). The numerical approximation (N[…]) is “result: 0.81072238…”, which is fine; the next line, however, reads “number name: zero” (link). Things get even worse if we request five digits only: “decimal approximation: 0.810722, result: 0.0” (link).

I was in fact quite lucky to find this issue: there are not too many parameters for which is not computed accurately. In the contour plot of they are all concentrated near the line , when is close to . And this is exactly the case I was interested in.

Nearly a week ago, I submitted this bug to Wolfram Support Center. Unfortunately, apart from an automated confirmation letter (claiming that they typically answer within three business days), I have received no reply.

The above bug might be caused by Mathematica using a wrong hypergeometric identity (perhaps this one?) in numerical evaluation. If this is the case, it will be rather easy to fix. Until then, one can simply use another hypergeometric relation (like this one) explicitly. For an example, see the last two (theoretically identical) plots on the right.

Let me end this post with a few words about what I was trying to do when I discovered this bug. Unfortunately, this will be a little bit technical for those not used to non-local operators and jump processes.

Let be an open set in . Consider a Brownian motion (with variance ) starting at a fixed point , and let denote the (random) first time when hits the complement of . The mean amount of time that spent in a Borel set before can be expressed as for some function . It can be proved that is the Green function for , the Laplace operator in with zero Dirichlet boundary conditions. In fact, this is probably the easiest way to construct the Green function for a general open set.

Replace in the above paragraph the Brownian motion with the isotropic -stable Lévy process (here ), and will become the Green function for the fractional power of the Laplace operator , with zero exterior condition (this is no longer a local operator, so instead of *boundary* conditions, one must give *exterior* condition). This operator is usually denoted as , and it is important to note that this is something else than . Fractional powers of are inverse operators to Riesz potentials, and they are important examples of pseudo-differential operators. It is not my point to give a detailed introduction to probabilistic potential theory and Riesz potential theory here, and there is much literature on these topics, including classical textbooks by Landkof, Bliedtner and Hansen, and Blumenthal and Getoor.

If is a ball, then the formula for the Green function for is relatively easy to find using Kelvin transform. Surprisingly, a similar method works for , due to the calculation of M. Riesz in 1938. In this case the formula for the Green function for a ball is:

While I was working on numerical bounds for the eigenvalues of for (that is, a one-dimensional ball), I needed to compute an array of values of . Numerical approximation to the integral worked fine, but it was very slow. For that reason, I tried the formula with the hypergeometric function. It worked much faster, but the results were different (and plainly wrong). While I was checking the code, I plotted the graph of for . The correct picture should be like the one below (prepared using a better hypergeometric identity).

Instead, Mathematica produced the following image.

The erroneous plot can be reproduced in lower resolution using Wolfram Alpha (link).

I will update this post as soon as Wolfram contacts me.

*Update, Mar 11, 2011:*

I have just received a copy of Mathematica 8 and installed it on my computer. The first thing I checked was, of course, whether Wolfram had fixed the hypergeometric bug. And they had! A huge surprise to me, as they did not respond to either of my two bug submissions. And the bug is still present on Wolfram Alpha! Does Wolfram Alpha run an older version of Mathematica then?

*Update, Mar 23, 2011:*

I have just received a kind email from Wolfram Research confirming that the issue has been resolved in Mathematica 8.

The Dirac equation is a first-order differential equation for a $\mathbb{C}^4$-valued wavefunction $\psi$, which describes the evolution of the state of an electron. It is famous for a nice (but not perfect!) description of the hydrogen atom, and for the prediction of the existence of positrons, or, more generally, antimatter.

With no interaction with external fields, the Dirac equation is believed to be fully correct. However, for many-particle systems, like the hydrogen atom, it is in a sense an intermediate step between classical quantum mechanics and quantum field theory: it is Lorentz invariant (hence *relativistic*) and describes a spin-½ particle in an electromagnetic field, but it fails to capture the influence of (for example) the spin of the proton. Therefore, the Dirac equation explains most of the fine structure of the hydrogen atom spectrum, but it says nothing about its hyperfine structure. (Honestly, these Wikipedia articles did not explain much to *me*. I found this blog post much more informative for a greenhorn like me.)

The Dirac equation is related to the Klein-Gordon equation and the Klein-Gordon square-root operator. In my recent work I study operators of this kind, in the context of subordinate Brownian motion. I hope that my results may have something to do with the motion of an electron in the presence of an infinite potential wall, but I know too little about quantum physics to state this formally. My basic motivation to study the Dirac equation is to find a physical application of my mathematical work. But even if this fails, I will enjoy the seminar; in fact, I have always dreamed of being a physicist.

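Suppose that $x$ and $y$ are positive reals, and $\psi$ is an increasing function. Then, whenever the integral

$$\frac{2}{\pi} \int_0^\infty \frac{t^2 (x + y)}{(x^2 + t^2)(y^2 + t^2)} \exp\left( \frac{1}{\pi} \int_0^\infty \left( \frac{x}{x^2 + s^2} + \frac{y}{y^2 + s^2} \right) \log \frac{\psi'(t^2) (t^2 - s^2)}{\psi(t^2) - \psi(s^2)} \, ds \right) dt$$

makes sense, it is equal to $1$.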

Although the statement is quite elementary, I fail to find any elementary proof. Even for simple , except and perhaps , the conjecture seems to be highly non-trivial.
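As a sanity check, the linear case can be verified by hand. The computation below assumes $\psi(t) = t$ and takes the integrand as implemented in the Mathematica code in this post; both are assumptions made for illustration. In this case the logarithm vanishes identically, the exponential factor equals $1$, and partial fractions (for $x \ne y$) give:

```latex
\frac{2}{\pi} \int_0^\infty \frac{t^2 (x + y)}{(x^2 + t^2)(y^2 + t^2)} \, dt
  = \frac{2}{\pi} \cdot \frac{x + y}{x^2 - y^2}
    \int_0^\infty \left( \frac{x^2}{x^2 + t^2} - \frac{y^2}{y^2 + t^2} \right) dt
  = \frac{2}{\pi} \cdot \frac{x + y}{x^2 - y^2} \cdot \frac{\pi}{2} (x - y)
  = 1 .
```

The case $x = y$ follows by continuity.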

I showed this conjecture to several people. Let me cite here two comments: ‘*It is the worst formula for $1$ I have ever seen!*’, and ‘*Come on, this must be either elementary or false*’.

The conjecture originated in my recent preprint, while I was studying the spectral theory of the transition semigroup of a subordinate Brownian motion killed at the time of first exit from the half-line. Every subordinate Brownian motion corresponds to a Bernstein function $\psi$. For the theory developed in the article, I need $\psi$ to be a *complete* Bernstein function and to satisfy the above conjecture.

In my preprint I prove the conjecture for , where , and a class of complete Bernstein functions, including for example and for . Unfortunately, the argument is rather involved. As far as I know, the problem is open for all other functions, even for with .

One can easily verify the conjecture numerically for various . Try playing around with the following *Mathematica* code:

    (* test function and its derivative *)
    psi[t_] = t^2 + Exp[-t] + Sin[Sqrt[t]];
    dpsi[t_] = D[psi[t], t];
    x = 3; y = 2;
    (* outer integral over t; Hold keeps the inner NIntegrate from being evaluated symbolically *)
    2/Pi NIntegrate[
      Hold[
        t^2 (x + y)/(x^2 + t^2)/(y^2 + t^2) Exp[
          1/Pi NIntegrate[
            (x/(x^2 + s^2) + y/(y^2 + s^2)) Log[dpsi[t^2] (t^2 - s^2)/(psi[t^2] - psi[s^2])],
            {s, 0, Infinity}
          ]
        ]
      ],
      {t, 0, Infinity}, WorkingPrecision -> 50]

I used this code to convince myself that the conjecture is true for all complete Bernstein functions. In fact I tried some other functions (that is, not complete Bernstein ones) only to check if the code works correctly. I was quite surprised to see that it works in the general case. Or perhaps I was just not smart enough to find a good counterexample?

Every integer Heronian triangle is congruent to a lattice triangle.

A triangle is said to be an *integer Heronian triangle* if it has integer side lengths and integer area. A triangle is said to be a *lattice triangle* if all its vertices have integer coordinates.

Not much is known about this conjecture. Clearly:

- The area of any lattice triangle is an integer multiple of ½. If in addition it has integer side lengths, then its area is an integer. This follows from Heron’s formula.
- Any counterexample must have a side with length greater than 1000. This was checked by enumerating all small Heronian triangles and verifying the conjecture directly.
- Notably, there are integer Heronian triangles which are not congruent to a lattice triangle with one side parallel to a coordinate axis, as in the following example (side lengths 5, 29, 30):
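The claims in the list above are easy to explore by brute force. The following Python sketch (the function names are illustrative, and this is not the program used for the search up to 1000) places one vertex at the origin and looks for integer vectors $u$, $v$ with $|u| = a$, $|v| = b$ and $|u - v| = c$. For the triangle with side lengths 5, 29, 30 it finds, for instance, the vertices $(0, 0)$, $(3, 4)$, $(21, -20)$, and no embedding with a side parallel to a coordinate axis.

```python
from math import isqrt


def integer_area(a, b, c):
    """Return the area of triangle (a, b, c) if it is a positive integer, else None."""
    # 16 * area^2 by Heron's formula, kept in exact integer arithmetic.
    s16 = (a + b + c) * (-a + b + c) * (a - b + c) * (a + b - c)
    if s16 <= 0:
        return None  # degenerate or impossible triangle
    r = isqrt(s16)  # r == 4 * area when s16 is a perfect square
    if r * r != s16 or r % 4 != 0:
        return None
    return r // 4


def lattice_vectors(n):
    """All integer vectors (p, q) with p^2 + q^2 == n^2."""
    vecs = []
    for p in range(-n, n + 1):
        q2 = n * n - p * p
        q = isqrt(q2)
        if q * q == q2:
            vecs.append((p, q))
            if q != 0:
                vecs.append((p, -q))
    return vecs


def lattice_embeddings(a, b, c):
    """Pairs (u, v) of integer vectors with |u| = a, |v| = b, |u - v| = c."""
    return [
        (u, v)
        for u in lattice_vectors(a)
        for v in lattice_vectors(b)
        if (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2 == c * c
    ]


def has_axis_parallel_side(u, v):
    """True if one side of the triangle (0, u, v) is parallel to a coordinate axis."""
    w = (u[0] - v[0], u[1] - v[1])
    return any(0 in side for side in (u, v, w))


if __name__ == "__main__":
    emb = lattice_embeddings(5, 29, 30)
    print(integer_area(5, 29, 30), len(emb))
    print(any(has_axis_parallel_side(u, v) for u, v in emb))
```

Fixing one vertex at the origin loses no generality, because lattice triangles can be translated by integer vectors.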

The problem was first posed on the Polish math Usenet group pl.sci.matematyka in 2004.
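The first observation in the list above can be unpacked as follows. This is a standard parity argument, sketched here for convenience, with $a$, $b$, $c$ denoting the integer side lengths and $A$ the area (notation introduced only for this sketch). For a lattice triangle, the shoelace formula shows that $2A$ is an integer, and Heron's formula reads:

```latex
16 A^2 = (a + b + c)(-a + b + c)(a - b + c)(a + b - c) .
```

The four factors differ pairwise by even numbers, so they all have the same parity. The left-hand side equals $4 (2A)^2$, which is even, so all four factors are even and the right-hand side is divisible by 16. Hence $A^2$ is an integer, and since $2A$ is an integer, $A$ itself must be an integer.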
