Regular Expressions

Section Regular Expressions

There is a compact and convenient notation to describe languages recognized by an FSA which tells, in a simple way, exactly what strings of \(\Sigma^*\) the FSA accepts. This notation is build recursively.

\(\pmb \varnothing\) is the empty language with no strings.
\(\lda\) is the one string language \(\{\lambda\}\text{,}\) and
\(\ba\) is the one string language \(\{a\}\) for each \(a\in \Sigma\text{.}\)

If \(K\) and \(L\) are each a language, then

\(KL = \{uv \mid u\in K \mbox{ and } v\in L\}\text{,}\) the concatenation of \(K\) and \(L\text{,}\)
\(K\cup L = \{w \in \Sigma^* \mid w\in K \mbox{ or } w \in L\}\text{,}\) the union of \(K\) and \(L\text{,}\)
\(L^* = \{\lambda\} \cup\{w\in \Sigma^* \mid w=u_1u_2\dots u_n \mbox{ for some } u_1, u_2, \dots u_n \in L\}\text{,}\) the Kleene star of \(L\text{.}\)

For example, the language described in Problem 9.21 is

\begin{equation*} L_4 = bc \dots \mbox{ any string } \dots d \end{equation*}

which is the concatenation of three languages: \(\{bc\}\text{,}\) all strings (i.e., \(\Sigma^*\)) and \(\{d\}\text{.}\)

To express each of these as a regular expression, we can use \({\bb} = \{b\}\text{,}\) \({\bc} = \{c\}\) and \({\bd} = \{d\}\text{.}\) Then we have the concatenation \({\bb\bc} = \{b\}\{c\} = \{bc\}\text{.}\) In this example we have \(\Sigma = \{a,b,c,d\} = {\ba}\cup {\bb}\cup {\bc}\cup {\bd}\text{.}\) Notice that the set of all strings from \(\Sigma\text{,}\) that we have called \(\Sigma^*\text{,}\) is in fact an example of the Kleene star. Thus we have \(\Sigma^* = \{a,b,c,d\}^* = ({\ba}\cup {\bb} \cup {\bc} \cup{\bd})^*\text{.}\) The language \(L_4\) can now be described by the regular expression that is the concatenation of these three:

\begin{equation*} L_4 = \{bc\} \Sigma^* \{d\} = {\bb\bc}({\ba}\cup {\bb} \cup {\bc} \cup {\bd})^*\bd. \end{equation*}

Problem 9.31.

Find a regular expression for the language \(L_3\) described in Example 9.20.

Problem 9.32.

Find a regular expression for the language \(L_5\text{,}\) where \(M_5\) is the FSA drawn in Figure 9.23.

Problem 9.33.

Find a regular expression for the language \(L_6\) described in Problem 9.24.

Problem 9.34.

Find a regular expression for the language \(L_7\) described in Problem 9.25.

Problem 9.35.

Find a regular expression for the language \(L_8\) described in Problem 9.26.

Problem 9.36.

Find a regular expression for the language \(L_{10}\) described in Problem 9.29.

Problem 9.37.

Draw a transition diagram for an FSA that recognizes the language \((\ba\bb\ba)^*(\bd\bd\bd)(\bb\cup \be)^*\text{.}\)

Problem 9.38.

Consider FSAs \(M = \langle \Sigma,Q,s,Y,\delta\rangle\) recognizing language \(L\) and \(M' = \langle \Sigma,Q',s',Y',\delta'\rangle\) recognizing language \(L'\text{.}\) Tell how to use \(M\) and \(M'\) to design an FSA \(M^{\cup}\) that will recognize \(L\cup L'\text{.}\) HINT: Try using \(Q^{\cup}=Q\times Q'\) for the states of \(M^{\cup}\text{,}\) and then define \(s^{\cup}\text{,}\) \(Y^{\cup}\) and \(\delta^{\cup}\text{.}\)

The direct product construction used in Problem 9.38 for \(Q^\cup\) tends to produce an FSA with an excessively large number of states. In practice most of these states are unnecessary since they can not be reached from the start state. A practical way to construct the transition diagram for the FSA \(M\) in Problem 9.38 is to begin with the start state and add in only those states that can be reached by successive transitions from the start state. Try this in the following extension of Problem 9.37.

Problem 9.39.

Draw a transition diagram for an FSA that recognizes the language

\begin{equation*} (\ba\bb\ba)^*(\bd\bd\bd)(\bb\cup \be)^* \cup \ba^*(\bd^*\cup \bc^*)\text{.} \end{equation*}

These problems suggest an important theorem about languages which says that a language is recognized by some finite state acceptor if and only if it can be represented by a regular expression.