Skip to main content

Chapter 10 Machine Minimization

When writing a computer program or mathematical proof, the first goal is to create one that works. The second goal is to make it as simple as possible. The simpler a program or proof is, the easier it is to be certain that it really does work. The simpler it is, the easier it is to maintain and modify.

The finite state acceptor model for computing machines gives us a context in which this simplification can be fully realized. In this chapter we will see how, given a finite state acceptor, we can construct a new finite state acceptor of minimal complexity that recognizes exactly the same language. This work will lead to a very useful test to tell whether or not a given language is recognized by some FSA.

In general it is difficult to measure the complexity of a computer program or mathematical proof. A short program or proof can utilize complex logic while a long program or proof can be straightforward and even repetitive. In the case of a finite state acceptor \(M = \langle \Sigma, Q,s,\delta,Y\rangle\text{,}\) the number of internal states, \(|Q|\text{,}\) provides us with a useful measure of the complexity of \(M\text{.}\) Our goal is to find a method to generate from an FSA \(M\) recognizing a language \(L\text{,}\) a new FSA \(M'\) such that

  1. \(M'\) also recognizes \(L\) and

  2. no other FSA that recognizes \(L\) has fewer states than \(M'\text{.}\)

Such an FSA \(M'\) is called a minimal FSA. To see an example of a minimal FSA, let \(L_0\) be the language recognized by the FSA \(M_0\) in Figure 10.1.

Figure 10.1. A Finite State Acceptor, \(M_0\)
Problem 10.2.

\(M_0\) has two accepting states, \(s\) and \(v\text{.}\) Find a regular expression for the language \(L_0\text{.}\)

Problem 10.3.

Construct an FSA \(M_0'\) that also recognizes \(L_0\) but has only 4 states, one of which will be a dead state.

Problem 10.4.

Let \(N\) be any FSA that recognizes \(L_0\) and let \(s'\) be its start state. Explain why \(s'\text{,}\) \(\delta(s',0)\text{,}\) \(\delta(s',1)\) and \(\Delta(s',10)=\delta(\delta(s',1),0)\) must all be distinct states.

You have just shown that any FSA \(N\) that recognizes \(L_0\) has at least as many states as \(M_0'\) that you constructed in Problem 10.3. Thus your FSA \(M'_0\) is a minimal FSA for \(M_0\) in Figure 10.1. Our goal is to find a general procedure that will produce, from any FSA \(M\text{,}\) a minimal FSA \(M'\) that recognizes the same language as \(M\text{.}\) Let \(\Sigma\) be any alphabet.

Definition 10.5.

Given strings \(u,v\in \Sigma^*\text{,}\) the concatenation of \(u\) and \(v\) is the string \(w = uv\) formed by tacking \(v\) onto the end of \(u\text{.}\) Note that \(\lambda\) is an identity relative to the binary operation of concatenation, as we have \(\lambda w = w = w\lambda\) for all \(w\in \Sigma^*\text{.}\)

Definition 10.6.

Let \(L \subseteq \Sigma^*\) be a language and \(w \in \Sigma^*\text{.}\) We define the right-set for \(w\) to be \(R_w = \{z\in \Sigma^* \mid wz\in L\}\text{.}\)

Problem 10.7.

Using the regular expression you found in Problem 10.2 for \(L_0\text{,}\) find a regular expression for each of these right sets.

\begin{equation*} R_0 \quad R_{001} \quad R_{00110} \quad R_{0011011} \quad R_{10} \quad R_{11} \quad R_{110000111} \quad R_{0011} \end{equation*}
Definition 10.8.

Let \(L \subseteq \Sigma^*\) be a language. If \(x,y \in \Sigma^*\text{,}\) then we write \(x\equiv_L y\) if \(R_x = R_y\text{.}\)

Problem 10.9.

Show that if \(L \subseteq \Sigma^*\) is any language, then \(\equiv_L\) is an equivalence relation on \(\Sigma^*\text{.}\)

Note that no string in \(L\) begins with \(w\) if and only if \(R_w = \emptyset\text{.}\) Thus the set of all strings that are not a prefix of any member of \(L\) is an \(\equiv_L\)-class. We denote this \(\equiv_L\)-class by \(Z(L)\text{,}\) the set of strings whose right set is empty.

Problem 10.10.

Referring to Problem 10.7, tell which of the strings 0, 001, 00110, 0011011, 10, 11, 110000111, 0011 are \(\equiv_{L_0}\)-equivalent.

Problem 10.11.

Let \(T = \{\lambda, 0,1,01\}\text{.}\)

  1. Show that no two strings in \(T\) are \(\equiv_{L_0}\)-equivalent.

  2. Explain why every string \(w\in \{0,1\}^*\) is \(\equiv_{L_0}\)-equivalent to some string in \(T\text{.}\)

If \(w\in \{0,1\}^*\text{,}\) let \([w]\) denote the \(\equiv_0-class\) of~\(w\text{.}\) Problem 10.11 shows that there are exactly four different \(\equiv_{L_0}\)--classes:

\begin{equation*} [\lambda] = L_0 \quad [0] = L_00 \quad [1] = L_01 \quad [01] = \Sigma^*\sim(L_0\cup L_00\cup L_01) = Z(L_0). \end{equation*}

The arrows in Figure 10.12 show where we go if we extend a string in any one of these classes by either 0 or~1.

Figure 10.12. \(M_0''\text{.}\) A Minimal Finite State Acceptor

Since \(\lambda\in L_0\text{,}\) we can take any string \(w\in \{0,1\}^*\) and find which \(\equiv_{L_0}\)-class \(w\) is in by tracing the diagram, building \(w\) by starting with \(\lambda\) in \(L_0\) and successively adding 0s or 1s on the right until we have \(w\text{.}\) Our ending point will be the \(\equiv_{L_0}\)–class of \(w\text{.}\) If we land in \(L_0\text{,}\) then \(w\in L_0\text{;}\) otherwise \(w \notin L_0\text{.}\) What we have done is to find a systematic way to build an FSA \(M_0''\)for the language \(L_0\) using the \(\equiv_{L_0}\)-classes as internal states and taking \(L_0\) as the only accepting state (see Figure 10.12). In fact, according to Problem 10.4, this is a minimal FSA to recognize \(L_0\text{!}\)

We would like to mimic this construction, starting with an arbitrary language \(L \subseteq \Sigma^*\) to see if we can construct a FSA for \(L\) with a minimum number of states. In order to do this, it will be helpful to start with a broader setting in which we make no restrictions on the number of states in \(Q\text{.}\)

Definition 10.13.

A General State Acceptor (abbreviated as GSA) is a quintuple \(G = \langle \Sigma, Q, s, \delta, Y \rangle\) defined exactly as we defined a FSA but without the restriction that the set \(Q\) of states must be finite.

While it is not possible to build a GSA with infinitely many states, it will be helpful here to look first for a GSA for \(L\) and then see when we can find one with only finitely many states. Following the construction of \(M_0''\text{,}\) we define

\begin{equation*} G_L = \langle \Sigma, Q_L, s_L, \delta_L, Y_L \rangle \end{equation*}

where

\begin{equation*} Q_L = \{[w] \mid w\in \Sigma^*\}, s_L = [\lambda],Y_L = \{[w] \mid w \in L\}, \text{ and } \delta_L([w],a) = [wa]. \end{equation*}
Problem 10.14.

Let \(\Sigma\) be an alphabet and let \(L \subseteq \Sigma^*\) be a language. Then, for all \(u, v \in \Sigma^*\) and \(a\in \Sigma\text{,}\)

  1. \([u] \in Y_L\) if and only if \(u\in L\text{,}\)

  2. \([u] = [v]\) implies \([ua] = [va]\text{.}\)

Part 2 of Problem 10.14 implies that \(\delta_L\) is well defined.

Recall that \(\Delta_L\) is the extension of \(\delta_L\) to all of \(Q_L \times \Sigma^*\text{.}\)

Problem 10.15.

Use mathematical induction on the length of \(v\) to show that, for all \(u,v \in \Sigma^*\text{,}\)

\begin{equation*} \Delta_L([u],v) = [uv]. \end{equation*}

This shows that every language \(L \subseteq \Sigma^*\) is recognized by some GSA. The GSA \(G_L\) is called the standard \(GSA\) for \(L\text{.}\)

Example 10.17.

Let \(\Sigma = \{a,b\}\) and construct the GSA \(G_{L_{10}}\) for the language \(L_{10} = \{a^nb^n \mid n = 1,2,3,\dots\}\) of Example 10.17 in the following steps.

  1. Show that the \(\equiv_{L_{10}}\)-classes \([b], [\lambda], [a], [a^2], [a^3], [a^4], \dots\) are all different, and that consequently \(Q_{L_{10}}\) is infinite.

  2. Show that every string is \(\equiv_{L_{10}}\)-equivalent to one of the strings in the list \(b, \lambda, a, a^2, a^3, a^4, \dots\text{.}\) Thus \(Q_{L_{10}} = \{[b], [\lambda], [a], [a^2], [a^3], [a^4], \dots\}\) is the set of all states of \(G_{L_{10}}\text{.}\)

  3. Figure 10.18 gives a skeleton of the transition diagram for \(G_{L_{10}}\text{.}\) Complete this diagram by assigning one of the states of \(Q_{L_{10}}\) to each of its states, identifying the start state and the accepting states.

  4. Explain from your diagram why \(G_{L_{10}}\) accepts exactly the strings in \(L_{10}\text{.}\)

Figure 10.18. Transition diagram for a GSA with infinitely many states.

Example 10.17 shows how GSAs with infinitely many states can naturally arise from simply defined languages. But our real interest is in FSAs, which might actually be built. Do you think that there could be an FSA that recognizes \(L_{10}\text{?}\) We can now exhibit a special feature of standard GSAs which at last proves for certain that there is not.

Problem 10.19.

Show that if \(L\) is a language, then every GSA that recognizes \(L\) has at least as many states as the standard GSA for \(L\text{.}\) Hint Let \(M = \langle \Sigma, Q, s, \delta, Y\rangle\) be any GSA that recognizes \(L\text{.}\) Consider a fixed list \(w_1, w_2, w_3, \dots\) of strings in \(\Sigma^*\text{,}\) one from each \(\equiv_L\)-class. Use Problem 10.15 to show that \(\Delta(s,w_1), \Delta(s,w_2), \Delta(s,w_3), \dots\) is a list of distinct states of \(M\text{,}\) one for each of the states of \(Q_L\text{.}\)

Given a language \(L\subseteq \Sigma^*\text{,}\) we can now determine whether or not it is recognized by some FSA and, if it is, construct a minimal FSA that recognizes it. All we need to do is to calculate right sets to determine the \(\equiv_L\)-classes of \(Q_L\text{.}\) We state what we have found as a final theorem.

Example 10.21.

The standard FSA for \(L_0 = ({\bf 00 \cup 11})^*\) is a minimal FSA that recognizes \(L_0\text{.}\)

To see this, let \(w\in \{0,1\}^*\text{.}\)

  1. If \(w\in ({\bf 00 \cup 11})^*\text{,}\) then its right set is \(({\bf 00 \cup 11})^*\text{.}\)

  2. If \(w\in ({\bf 00 \cup 11})^*{\bf 0}\text{,}\) then its right set is \({\bf 0}({\bf 00 \cup 11})^*\text{.}\)

  3. If \(w\in ({\bf 00 \cup 11})^*{\bf 1}\text{,}\) then its right set is \({\bf 1}({\bf 00 \cup 11})^*\text{.}\)

  4. Otherwise its right set is \(\pmb\varnothing\text{.}\)

Consequently there are only four different \(\equiv_{L_0}\)-classes in \(Q_{L_0}\text{.}\)

Example 10.22.

The language \(L_{10} = \{a^nb^n \mid n = 1,2,3,\dots\}\) is not recognized by any FSA.

To see this, consider the infinite set \(\{a, a^2, a^3, a^4, \dots\}\) of strings in \(\{a,b\}^*\text{.}\) For each positive integer \(n\text{,}\) the string \(b^n\) is in the right set of \(a^n\) but not in the right set of \(a^m\) for any \(m\neq n\text{.}\) Consequently \(Q_{L_{10}}\) has infinitely many different \(\equiv_{L_{10}}\)-classes.

For the next five languages \(L\text{,}\) show that \(Q_L\) is finite and construct its standard FSA.

Problem 10.23.

\(L = 0^*1^*2^*\)

Problem 10.24.

\(L = 00(1^*\cup 2^*)\)

Problem 10.25.

\(L = (1^*\cup 2^*)00\)

Problem 10.26.

\(L = 111\cup(020)^*\)

Problem 10.27.

\(L = (111)^*1\)

For the next four languages \(L\text{,}\) show that \(Q_L\) is infinite and therefore \(L\) is not recognized by any FSA.

Problem 10.28.

\(L = \{0^n10^n\mid n=1,2,3,\dots\}\)

Problem 10.29.

\(L = \{wcw\mid w\in \{a,b\}^*\}\)

Problem 10.30.

\(L_{re}\) is the set of regular expressions over the alphabet \(\{a,b,c\}\text{.}\)

Problem 10.31.

\(L\subseteq \{a,b\}^*\) is the set of strings \(w\) containing the same number of \(a\)'s as \(b\)'s.

Problem 10.32.

In 1931 the famous logician Kurt Gödel proved that there would never be a computing machine that could correctly tell us (Yes-or-No) if an arbitrary statement about positive integers is true. Prove the special case of this theorem for FSAs: “There is no FSA that can correctly tell us (Yes-or-No) if an arbitrary statement about the positive integers is true.”