12 July 2014

Minimizing Weighted Automata via Linear Algebra

In this post I present the minimization problem for weighted automata, closely following a paper by Stefan Kiefer and Björn Wachter. I do add lots of steps that aren't strictly necessary; to me, they make the result look less like magic. I really like the mixture of automata and linear algebra.

Definition. Finite automata describe functions $\Sigma^* \to \{0,1\}$. Weighted automata describe functions $\Sigma^* \to \mathbb{R}$. I will only describe weighted automata, because they are more general and less known. The configuration of a weighted automaton with $n$ states is a vector with $n$ components. Processing one letter amounts to multiplying this vector by a matrix. For example, the configuration after processing the word hello is $x^T A_h A_e A_l A_l A_o$. Here $x$ is the initial configuration. (When I write $x$, $y$, $z$, …, I mean column vectors. But, it is more convenient to see the configuration of the automaton as a row vector. Otherwise, you'd have to reverse words, as in $A_o A_l A_l A_e A_h x$.) Each configuration of the automaton determines a real number: just take the scalar product between the configuration and a fixed vector $y$. Thus, the word hello is mapped to the number $x^T A_h A_e A_l A_l A_o y$. To summarize, a weighted automaton of size $n$ has:

  • an initial configuration $x \in \mathbb{R}^n$,
  • a square matrix $A_\alpha \in \mathbb{R}^{n \times n}$ for each letter $\alpha \in \Sigma$, and
  • accepting weights, grouped in a vector $y \in \mathbb{R}^n$.

Example. To define the language of strings of odd length we could take
$$x = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \qquad A_\alpha = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \qquad y = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}$$
This is a deterministic finite automaton with 4 states, out of which 2 are accepting. ∎
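Here is how this looks in code. This is just my own illustration (assuming numpy is available), not something from the paper: it encodes the example automaton and computes the weight of a word by multiplying the matrices from left to right. The names `x`, `A`, `y`, and `weight` are my choice and are reused in the later snippets.

```python
import numpy as np

# The example automaton: 4 states, a one-letter alphabet {'a'},
# weight 1 for words of odd length and 0 otherwise.
x = np.array([1.0, 0.0, 0.0, 0.0])            # initial configuration
A = {'a': np.array([[0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0],
                    [1.0, 0.0, 0.0, 0.0]])}    # one matrix per letter
y = np.array([0.0, 1.0, 0.0, 1.0])             # accepting weights

def weight(word, x, A, y):
    """Weight of a word: x^T A_{w1} ... A_{wk} y."""
    config = x                       # the configuration, seen as a row vector
    for letter in word:
        config = config @ A[letter]  # processing one letter = one multiplication
    return config @ y                # scalar product with the accepting weights

print(weight('aaa', x, A, y))    # odd length  -> 1.0
print(weight('aaaa', x, A, y))   # even length -> 0.0
```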

Change of base. The automaton is represented by vectors $x$, $y$ and matrices $A_\alpha$. In turn, vectors and matrices are represented as collections of real numbers. These numbers implicitly depend on the choice of a basis. This observation gives us a simple way to design automata equivalent to a given one. Suppose we are given $x$, $y$, and $A_\alpha$. We pick some invertible matrix $C$ and set
$$x' = Cx \qquad y' = Cy \qquad A'_\alpha = C A_\alpha C^{-1}$$
This is a very standard thing to do in vector spaces. It is easy to see that the primed automaton computes exactly the same weight. For example, the weight of the word abc is
$$x'^T A'_a A'_b A'_c y' = (Cx)^T (C A_a C^{-1})(C A_b C^{-1})(C A_c C^{-1})(Cy) = x^T (C^T C) A_a (C^{-1}C) A_b (C^{-1}C) A_c (C^{-1}C) y = x^T (C^T C) A_a A_b A_c y$$
Hmmm … not quite the same: the matrix $C$ must also come from an orthonormal basis; that is, $C^T C = I_n$. But this is doable. And, of course, the same calculation works for any word, not just abc. Two automata that describe the same function $\Sigma^* \to \mathbb{R}$ are said to be equivalent.
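A quick numerical sanity check of this, continuing the snippet above (and reusing `x`, `A`, `y`, `weight`): take a random orthogonal $C$, obtained here from a QR factorization, transform the automaton, and compare weights. The names `x2`, `y2`, `A2` are mine.

```python
# Change of base: a random orthogonal C (so C^T C = I_4).
rng = np.random.default_rng(0)
C, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # columns of C are orthonormal

x2, y2 = C @ x, C @ y
A2 = {'a': C @ A['a'] @ C.T}          # C^{-1} = C^T because C is orthogonal

for w in ['', 'a', 'aa', 'aaa']:
    assert np.isclose(weight(w, x, A, y), weight(w, x2, A2, y2))
```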

Example. Continuing the previous example, we could pick $C = A_\alpha$. It is invertible and $C^T C = I_4$. The resulting automaton is the following:
$$x' = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \qquad A'_\alpha = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix} \qquad y' = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$
This example is perhaps too simple: it amounts to renaming states. One could also pick a more complicated $C$, though. ∎

Changing the number of states. The calculation for the change of base suggests that what we really want is this:
$$C^T C = I_n \qquad x' = Cx \qquad y' = Cy \qquad A'_\alpha = C A_\alpha C^T$$
We don't need to insist on $C$ being invertible, and therefore square! Now $n'$, the size of the primed automaton, may be different from $n$. Well, it must be that $n \le n'$: the matrix $C$ has dimension $n' \times n$, and $C^T C = I_n$ says that the columns of $C$ form an orthonormal basis for an $n$-dimensional subspace of $\mathbb{R}^{n'}$. In other words, we have a very simple recipe for increasing the number of states. For any $n' \ge n$ it's easy to find some $n$-dimensional subspace of $\mathbb{R}^{n'}$, then choose an orthonormal basis for it, call it $C$, and then compute $Cx$, $Cy$, and $C A_\alpha C^T$.
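The recipe is easy to try out, again continuing the earlier snippet: embed the 4-state example into $\mathbb{R}^6$, taking for $C$ the orthonormal columns of a QR factorization of a random $6 \times 4$ matrix, and check that the weights don't change.

```python
# Grow the example automaton from 4 states to 6.
rng = np.random.default_rng(1)
C, _ = np.linalg.qr(rng.standard_normal((6, 4)))   # C is 6x4 with C^T C = I_4

x_big, y_big = C @ x, C @ y
A_big = {'a': C @ A['a'] @ C.T}

for w in ['', 'a', 'aa', 'aaa', 'aaaa']:
    assert np.isclose(weight(w, x, A, y), weight(w, x_big, A_big, y_big))
```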

Reducing the number of states. Of course, increasing the number of states seems like a rather dumb thing to do. So, the next thing that comes to mind is to ask when the process from above can be reversed. That is, if we are given $x$, $y$, $A_\alpha$ with $n$ states, can we compute a smaller equivalent automaton $x'$, $y'$, $A'_\alpha$ with $n' \le n$ states, related to the given one by $x = Cx'$, $y = Cy'$, $A_\alpha = C A'_\alpha C^T$ for some $C$ with $C^T C = I_{n'}$? These are necessary conditions:
$$x' = C^T x \qquad y' = C^T y \qquad A'_\alpha = C^T A_\alpha C$$
(From $Cx' = x$ we get $C^T C x' = C^T x$ and finally $x' = C^T x$. The others are similar.) These conditions are not sufficient. For example, $x = Cx'$ does not follow from $x' = C^T x$, unless $x$ is in the image of $C$. If it is, then there must be a $z$ such that $x = Cz$ and it must be that $z = x'$:
$$x' = C^T x = C^T C z = z$$
The same argument works for $y$. So, we know that we probably want to pick $C$ such that its image includes the given $x$ and $y$.

But what about $A_\alpha$? If we'd follow the same line of reasoning, we'd want to find some sufficient condition such that $A'_\alpha = C^T A_\alpha C$ implies $A_\alpha = C A'_\alpha C^T$. So, let's try it. We are given $A_\alpha$ and, optimistically, we compute $A'_\alpha$ as $C^T A_\alpha C$. Then we check if we got the right result by plugging into the sufficient condition $A_\alpha = C A'_\alpha C^T$:
$$A_\alpha \overset{?}{=} C C^T A_\alpha C C^T$$
These matrices are the same if they affect in the same way all vectors $z$:
$$A_\alpha z \overset{?}{=} C C^T A_\alpha C C^T z$$
And this provides a strong hint as to what are the sufficient conditions we are looking for. The following (optimistic) calculation almost suggests itself:
$$\begin{aligned}
& C C^T A_\alpha C C^T z \\
&\qquad = \{\text{ASSUME } z = Cz' \text{ for some } z'\} \\
& C C^T A_\alpha C C^T C z' \\
&\qquad = \{\text{because } C^T C = I_{n'}\} \\
& C C^T A_\alpha C z' \\
&\qquad = \{\text{because } z = Cz'\} \\
& C C^T A_\alpha z \\
&\qquad = \{\text{denote } A_\alpha z \text{ by } u\} \\
& C C^T u \\
&\qquad = \{\text{ASSUME } u = Cu' \text{ for some } u'\} \\
& C C^T C u' \\
&\qquad = \{\text{because } C^T C = I_{n'}\} \\
& C u' \\
&\qquad = \{\text{because } Cu' = u = A_\alpha z\} \\
& A_\alpha z
\end{aligned}$$
So, the equality almost works, except we had to make two new assumptions. We now know that $A_\alpha$ and $C C^T A_\alpha C C^T$ affect a vector $z$ in the same way provided that: (1) $z$ is in the image of $C$, and (2) $A_\alpha z$ is in the image of $C$. This looks very promising.
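Since $CC^T$ is the orthogonal projection onto the image of $C$, the two conditions are easy to test numerically. In the snippet below (same setup as before; this particular $C$ is my choice for illustration, with image the plane spanned by $[0,1,0,1]^T$ and $[1,0,1,0]^T$), the identity holds for a vector in the image of $C$ and fails for one outside it.

```python
# CC^T projects orthogonally onto the image of C.
C = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]]) / np.sqrt(2.0)      # C^T C = I_2
P = C @ C.T                                     # the projection CC^T
Aa = A['a']

z_in  = np.array([0.0, 1.0, 0.0, 1.0])          # z_in and Aa @ z_in both lie in image(C)
z_out = np.array([1.0, 0.0, 0.0, 0.0])          # not in image(C)

print(np.allclose(Aa @ z_in,  P @ Aa @ P @ z_in))    # True
print(np.allclose(Aa @ z_out, P @ Aa @ P @ z_out))   # False
```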

Let's recap the setting. We are given $x$, $y$, $A_\alpha$. We pick some $C$ such that $C^T C = I_{n'}$. We compute $x' = C^T x$ and $y' = C^T y$ and $A'_\alpha = C^T A_\alpha C$. We hope that if the image of $C$ includes all the right things, then we have an equivalent automaton. Let's look at the weight associated to the word abc.
$$\begin{aligned}
& x'^T A'_a A'_b A'_c y' \\
&\qquad = \{\text{we computed } x', y', A'_\alpha \text{ as } x' = C^T x \text{ and so on}\} \\
& (C^T x)^T (C^T A_a C)(C^T A_b C)(C^T A_c C)(C^T y) \\
&\qquad = \{\text{associativity}\} \\
& x^T (C C^T) A_a (C C^T) A_b (C C^T) A_c (C C^T) y \\
&\qquad = \{\text{ASSUME } y \text{ and } A_c y \text{ are in the image of } C\} \\
& x^T (C C^T) A_a (C C^T) A_b A_c y \\
&\qquad = \{\text{ASSUME } A_b A_c y \text{ and } A_a A_b A_c y \text{ are in the image of } C\} \\
& x^T A_a A_b A_c y
\end{aligned}$$
Here we need to assume that the image of $C$ contains $y$, $A_c y$, $A_b A_c y$, and $A_a A_b A_c y$. We did not need to assume that the image of $C$ contains $x$. In other words, we set out to revert the procedure for growing automata, but found something more general.

The general case should now be clear, so let's summarize it. If the columns of $C$ form an orthonormal basis that covers all the vectors $A_{\alpha_1} \cdots A_{\alpha_k} y$ (one for each word $\alpha_1 \cdots \alpha_k$, including $y$ itself for the empty word), then $x'$, $y'$, $A'_\alpha$ computed as above give indeed an equivalent automaton. Let's call this the backward minimization procedure. By symmetry, there is also a forward minimization procedure. If the columns of $C$ form an orthonormal basis that covers all the vectors $A^T_{\alpha_1} \cdots A^T_{\alpha_k} x$, then $x'$, $y'$, $A'_\alpha$ computed as above give indeed an equivalent automaton.
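Here is a sketch of the backward minimization procedure, continuing the Python snippets from above. It is my own naive implementation, not the algorithm from the paper: it builds an orthonormal basis of the span of the vectors $A_{\alpha_1} \cdots A_{\alpha_k} y$ by repeatedly applying the matrices $A_\alpha$ and keeping only the new directions (Gram–Schmidt), then computes the reduced automaton.

```python
def backward_minimize(x, A, y, tol=1e-9):
    """Backward minimization: the columns of C are an orthonormal basis of
    span{ A_{a1} ... A_{ak} y : a1...ak a word }, built by Gram-Schmidt."""
    basis = []                            # orthonormal vectors found so far
    queue = [y]
    while queue:
        v = queue.pop()
        for b in basis:                   # project out the known directions
            v = v - (b @ v) * b
        norm = np.linalg.norm(v)
        if norm > tol:                    # a genuinely new direction
            v = v / norm
            basis.append(v)
            queue.extend(Aa @ v for Aa in A.values())
    C = np.column_stack(basis)            # n x n' matrix with C^T C = I
    return C.T @ x, {a: C.T @ Aa @ C for a, Aa in A.items()}, C.T @ y

# Applied to the example automaton, it finds 2 states.
x_min, A_min, y_min = backward_minimize(x, A, y)
print(x_min.shape[0])                          # 2
print(weight('aaa', x_min, A_min, y_min))      # ~1.0 (odd length)
print(weight('aaaa', x_min, A_min, y_min))     # ~0.0 (even length)
```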

Example. Let's minimize our (first) example automaton.

The forward minimization procedure doesn't help much. If we start with $x = [1,0,0,0]^T$ and keep applying $A^T_\alpha$, then we can also reach $[0,1,0,0]^T$ and $[0,0,1,0]^T$ and $[0,0,0,1]^T$. The space that includes all these vectors is clearly 4-dimensional.

What about the backward minimization procedure? If we start with $y = [0,1,0,1]^T$ and keep applying $A_\alpha$ we'll generate only one more vector, namely $[1,0,1,0]^T$. We can cover these two vectors with a 2-dimensional orthonormal basis:
$$C = \frac{1}{\sqrt{2}} \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$
The minimized automaton will have $x' = [0, 1/\sqrt{2}]^T$ and $y' = [\sqrt{2}, 0]^T$ and $A'_\alpha = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. It is not too hard to check that these indeed compute weight 1 for words of odd length, and weight 0 for the others. ∎
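And a final check, in code, that the hand-computed minimized automaton above agrees with the original example (same names as in the earlier snippets; `x_m`, `y_m`, `A_m` are mine).

```python
# Check the hand-computed minimized automaton against the original example.
x_m = np.array([0.0, 1.0 / np.sqrt(2.0)])
y_m = np.array([np.sqrt(2.0), 0.0])
A_m = {'a': np.array([[0.0, 1.0],
                      [1.0, 0.0]])}

for w in ['', 'a', 'aa', 'aaa', 'aaaa', 'aaaaa']:
    assert np.isclose(weight(w, x, A, y), weight(w, x_m, A_m, y_m))
```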

Reference. The paper I used is [Kiefer, Wachter, Stability and Complexity of Minimizing Probabilistic Automata, 2014]. If you read the paper, you'll find

  • that applying the two minimization procedures (in any order) yields a minimal automaton, and
  • that the algorithm is similar to Brzozowski's algorithm.

As the title of the paper suggests, many other topics are covered.

