05 October 2014

Nitpicking Distance

I recently wrote down what I thought is a completely standard definition of distance on graphs. I was asked if I'm sure that the definition is well-behaved. Well … let's see.

Let's define a distance on a directed graph: To go from $x$ to $z$, we first go to $y$, and then take one more step. $$ d(x,z) = 1 + \min \{\,d(x,y)\mid y \to z \,\} $$ Question: Is this definition good? Suppose the digraph is $0 \leftrightarrow 1$. According to the equation above, $$\begin{align} &\bbox[2px,border:2px solid red,pink]{d(0,1)} \\ &= 1 + \min \{ d(0,0) \} &&\text{by decomposing $0\leadsto 1$ as $0\leadsto 0\to 1$} \\ &= 1 + d(0,0) \\ &= 1 + 1 + \min \{ d(0,1) \} &&\text{by decomposing $0\leadsto 0$ as $0\leadsto 1\to 0$} \\ &= \bbox[2px,border:2px solid red,pink]{2 + d(0,1)} \end{align}$$ Something is fishy. There are in fact at least two problems with the definition. One problem is illustrated above: It leads to weird consequences when applied for $x=z$. Another problem: What happens when the minimum is taken over the empty set? This second problem leads to a third: What is the type of the function $d$? Let's say $$ d : V \times V \to \mathbb{N} \cup \{\infty\} $$ Now $\infty$ was promoted to a proper distance value. We can agree that $\min\emptyset=\infty$, by convention. Also, we can solve the equation $d(0,1)=2+d(0,1)$: The only solution in $\mathbb{N}\cup\{\infty\}$ is $\infty$. It's nice that we can solve that equation, but we probably don't want $d(0,1)$ to be $\infty$. It's time to fix the definition by adding a special case for $x=z$. $$\begin{align} d(x,z) = \begin{cases} 0 &\text{if $x=z$} \\ 1 + \min\{\,d(x,y)\mid y\to z\,\} &\text{otherwise} \end{cases} \end{align}$$

Question: How can we tell for sure that $d$ is uniquely determined by the equation above? This question really has two parts:

  1. Is there always a solution?
  2. Is there at most one solution?

The first subquestion is suitably addressed by a sledgehammer that goes by the name of the Knaster–Tarski theorem. In the limited form that we need here, it says that an order-preserving function on a complete lattice has a fixed-point. In our case, the lattice is the set $L\;=\;V \times V \to \mathbb{N}\cup\{\infty\}$. The partial order on the lattice is defined pointwise: $$ d \le e \quad\stackrel\Delta=\quad \bigl(\forall x \forall y\; d(x,y) \le e(x,y)\bigr)$$ The order-preserving function is $D:L\to L$, defined by $$ D(d)(x,z) = \begin{cases} 0 &\text{if $x=z$} \\ 1 + \min\{\,d(x,y) \mid y\to z\,\} &\text{otherwise} \end{cases} $$ Let's check that it is order-preserving. We pick $d,e : L$ such that $d\le e$, and we'd like to show that $D(d)\le D(e)$. The case $x=z$ reduces to showing $0\le 0$. The case $x\ne z$ reduces to showing that $$ \min\{\,d(x,y)\mid y\to z\,\} \le \min\{\,e(x,y)\mid y\to z\,\}$$ which follows from the assumption $d\le e$. At this point, we know that the conditions of the Knaster–Tarski theorem are fulfilled, so $D$ has at least one fixed-point. In other words, there is at least one function $d$ that satisfies our recursive equation, no matter on which graph we apply it. (This is true even for infinite graphs.)

Now on to the second subquestion: Is there at most one function in $L$ that satisfies the recursive equation? Let's assume that $D(d)=d$ and $D(e)=e$. By way of contradiction, assume also that $d\ne e$; that is, $d(x,z)\ne e(x,z)$ for some $(x,z)\in V\times V$. Since $d$ and $e$ are fixed-points of $D$, the second assumption is equivalent to $$ D(d)(x,z) \ne D(e)(x,z)$$ If $x=z$, then $D(d)(x,z)=0=D(e)(x,z)$, which is a contradiction. If $x\ne z$, then $$\min\{\,d(x,y)\mid y\to z\,\} \ne \min\{\,e(x,y)\mid y\to z\,\}$$ Without loss of generality, let's assume that $$\min\{\,d(x,y)\mid y\to z\,\} \lt \min\{\,e(x,y)\mid y\to z\,\}$$ We fix $y$ to make the left hand side minimum, and now we have $$d(x,y) \lt e(x,y) \qquad d(x,y) \lt d(x,z)$$ (Exercise: How do we know that $d(x,y)\lt d(x,z)$? Why not $d(x,y)=d(x,z)=\infty$?) To recap, if $d\ne e$, then it must be that $d(x,z)\lt e(x,z)$ (or, symmetrically, $d(x,z)\gt e(x,z)$), and then in turn it must be that $d(x,y)\lt e(x,y)$ and $d(x,y)\lt d(x,z)$ for some $y$. But, this process can't be repeated forever, because $\mathbb{N}$ has no infinite descending chains. Done. Again, this also works for infinite graphs.

You may expect that $d(x,y)=\infty$ means that $x$ and $y$ are disconnected. This is not the case. Let $V=\mathbb{N}\to\{0,1\}$, and say there are arcs $x\leftrightarrow y$ when $x$ and $y$ differ in exactly one place. In this graph you can get from any sequence to any other sequence, but you may need to walk forever to do so. (A very similar example would be to take $V=\mathbb{R}$ and define arcs in terms of decimal representations of real numbers.)

Believe it or not, after all this work, we're not done yet. To be allowed to use the word ‘distance’ with a straight face, we should check some axioms. Here's what Wikipedia says: $$\begin{align} &d : V\times V \to \mathbb{R} \\ &d(x,y)=0 \iff x = y \\ &d(x,y)=d(y,x) \\ &d(x,z)\le d(x,y)+d(y,z) \end{align}$$ I suppose we're in trouble, because $\mathbb{N}\cup\{\infty\}$ is not a subset of $\mathbb{R}$. Also, if the graph is directed then the sort-of distance we defined isn't symmetric. Oh, well.

The other axioms should hold. For finite connected undirected graphs we do have a metric. For finite directed strongly connected graphs we have a quasimetric, although Wikipedia says this terminology is not completely standard. (Is it?) I wouldn't lose sleep about calling it a distance for other graphs, though.

PS: Feel free to nitpick, obviously.

No comments:

Post a Comment

Note: (1) You need to have third-party cookies enabled in order to comment on Blogger. (2) Better to copy your comment before hitting publish/preview. Blogger sometimes eats comments on the first try, but the second works. Crazy Blogger.