Sagnik's WebLog: 2010

Friday, March 5, 2010

NoPrefix and NoExtend

This is one interesting problem and I had some discussion about it with one of my classmates which gave a lot of insight to the problem. The problem can be found in Sipser 1.40.

Let $NOPREFIX(A)=\{w \in A| \textrm{ no proper prefix of }w \textrm{ is a member of }A\}$, and $NOEXTEND(A)=\{w \in A| w \textrm{ is not the proper prefix of any member of }A\}$. Show that the class of regular language is closed under these two operations.

The DFA construction for both the cases is pretty interesting. In first case, i.e., for the case of $NOPREFIX(A)$, we have to remove all transitions from the set of final states of the DFA of $A$. That will ensure that after one acceptance, we cannot append some symbols to the string and accept it yet again.
For the construction of the second DFA, we have to be choosy about the new final states. We are going to make those old final states of $A$ as new final states of $NOEXTEND(A)$ from which there is no path to the set of final states.
The question is whether $NOPREFIX(A)=NOEXTEND(A)$ or not. My friend says that they are not equal. I agree to that, and I extend the fact to the conclusion that $NOPREFIX(A) \subseteq NOEXTEND(A)$, as I find the construction of the former is more stringent than the latter. I am not sure about this claim though. Hope you people can help me out.

Thursday, March 4, 2010

Some Tricky CFLs

Here are three brand new problems about context-free language. (Sipser 2.22 - 24) There problems are pretty tricky, and needed a little time to work them out, some solutions are taken from online study materials.

Let $C=\{xay|x,y \in \{0,1\}^* \textrm{ and }x \neq y\}$. Show that $C$ is context free.

We are going to design an NPDA for this language. Note that a string of the form $xay$ is a member of language $C$ if and only if $x \neq y$, and the latter holds if and only if for some position $i, x_i \neq y_i$. A PDA can recognize language $C$ as
follows:

Skip input symbols and use the stack to remember the number of symbols skipped, until a position in $x$ is nondeterministically guessed.
Remember the symbol at this position in its nite control, and skip all the symbols until it seems $a$.
Using the stack, skip an equal number of symbols as in Step 1.
Check if the current symbol is diff erent from the symbol remembered in Step 2.

More precisely, the PDA does the following:

Read the next input symbol, and push a 1 onto the stack.
Nondeterministically jump to either Step 1 or Step 3.
Remember the current input symbol in the nite control.
Read the next input symbols until $a$ is read.
Read the next symbol, and pop the stack.
If the stack is empty, go to Step 7, else go to Step 5.
Accept if the current input symbol is di fferent from the symbol remembered in Step 3. Otherwise reject.

Another problem of the same type is as follows:

Let $D=\{xy|x,y \in \{0,1\}^* \textrm{ and }|x| = |y|\}$. Show that $D$ is context free.

We construct a PDA that recognizes language D. A string w is a member of language
$D$ if and only if $|w|$ is even and $w$ can be written as $w = xy$ where $|x|= |y|$and $x_i \neq y_i$ for some
position $i$. The challenge is to guess the right positions in $x$ and $y$. We make the key observation
that if $w = xy$ where $|x| = |y|$, then for each index $i$, the number of symbols in w from $x_{i+1}$ to
$y_i$, is exactly equal to the number of all other symbols in $w$. A PDA can recognize language D by
guessing a place in $x$ and a place in $y$, and using the stack to check whether the number of symbols
between the two places equals the number of other symbols. More precisely, the PDA does the
following:

Read the next input symbol and push a 1 onto the stack.
Nondeterministically jump to either Step 1 or Step 3.
Remember the current input symbol in the finite control.
Read the next input symbol and pop the stack. Repeat until the stack is empty.
Read the next input symbol and push a 1 onto the stack.
Nondeterministically jump to either Step 5 or Step 7.
Reject if the current symbol is the same as the symbol remembered in Step 3.
Read the next input symbol and pop the stack. Repeat until the stack is empty.
Accept if the input is empty.

The stack memory goes through two maximas (in general case - without considering the boundary condition) and the distance between the two peaks is the distance between $x_i$ and $y_i$. At the end of the input string, the stack is necessarily empty. So the only accepting condition is the empty stack condition.

Another problem worth discussing is as follows:

Let $E=\{a^ib^j| i \neq j \textrm{ and } 2i \neq j\}$. Show that $E$ is context free language.

This is pretty interesting, as this problem is the conglomeration of three different problems. To satisfy the second constraint, we can write $E$ as \[E=\{a^ib^j| j < i \textrm{ OR } i < j<2i \textrm{ OR } j>2i\}\]
Designing the CFG for the the two boundary cases ($ j < i $ and $j>2i$) is easy, and need not to be discussed. The interesting part is to design the CFG for the $i < j <2i$ part. A key observation is for this constraint to hold, $i$ and $j$ can be rewritten as $i=h+l$ and $j=h+2l$ for some $h,l > 0$. Then the language becomes $\{a^la^hb^hb^{2l}\}$, which is definitely a CFL. So union of these three CFLs is a CFL indeed. (Thanx to course material of NYU).

Tuesday, March 2, 2010

Mathematical Induction

This is a problem for which I think structural induction should suffice, but I cannot formalize the induction step. Hope you people will help me out.

Let $\Sigma=\{0,1\}$. For each $k \geq 1$, define $C_k=\Sigma^*a\Sigma^{k-1}$. Prove that for each $k$, no DFA can recognize $C_k$ with fewer than $2^k$ states.

Sipser 1.60-61.
I was trying to design the NFA (which is trivial to design) and then converting the NFA to an equivalent DFA. Base case is satisfied, no problem with that, i.e., for $k=1$, the equivalent DFA contains $2^1$ states.
One interesting observation is that for increasing the value of $k$ by 1, we have to modify the equivalent DFA by repeating all of its states once more (i.e., the number of states gets doubled, the power of 2 comes in play) and then making some connections among the old states and the new ones.
The new transitions are defined in an interesting way. We are defining four types of states, and their corresponding transitions -

Old states

Non-accepting states for previous $k$: The transitions remain same for them.
Accepting states for previous $k$: The transitions are modified to the new states with the new NFA state within them, i.e., if a transition was $\delta(\{1,4\},a)=\{1\}$, the new trastition will be $\delta(\{1,4\},a)=\{1,5\}$, where 5 is the new NFA state. ($k=4$).

New states

States not having the last pair of states of NFA: Define transitions from these states treating them as corresponding states of previous DFA (i.e., $\{q_0,q_1,..,q_k\}$ will be treated as $\{q_0,q_1,..,q_{k-1}\}$ for defining the trasition from)
States not having the last pair of states of NFA: Treat both from state and to state as corresponding states of previous DFA.

That should do I guess. But I am not sure if this formalizes the induction step. I will be glad to hear your views on it.

Saturday, February 27, 2010

Synchronizing Sequence

Today I found another interesting problem in Sipser (refer to 1.59). The problem is the problem of synchronizing sequence, and is a "starred" problem! We are to find the upper bound of a k-state synchronizable DFA (which we are to show as $k^3$).
The definition of synchronizing sequence is somewhat intuitive. Let $M$ be a DFA and $h$ be a state of it called home. A synchronizing sequence for $M$ and $h$ is a string $s \in \Sigma^*$, where $\delta(q,s)=h$ for every $q \in Q$.

A little bit of research in net yielded interesting facts about synchronizing sequence. The problem of estimating the length of synchronizing sequence has a long history and one very important result was Cerny's conjecture which says $(k-1)^2$ is the upper bound of shortest synchronizing sequence of a k-state synchronizable DFA. He was Czech, and the paper was in his native language, so I couldn't make out a single word of what he has written.

I got hold of a paper by A. N. Trahtman, in which he proved that

every n-state aperiodic DFA with a state that is accessible from every state of the automaton has a synchronizing word of length not greater than $\frac{n(n-1)}{2}$, and therefore, for aperiodic automata as well as for automata accepting only star-free languages, the Cerny´s conjecture holds true.

Certain terms seem alien to me at first sight. For example, a star-free language is a regular language if it can be described by a regular expression constructed from the letters of the alphabet, the empty set symbol, union and concatenation but no Kleen star. An aperiodic monoid needs a bit discussion, but I will shorten it because I do not have much of an idea about it right now. An aperiodic semigroup is a semigroup such that for every $x \in S$, there exists an integer $n$ so that $x^n=x^{n+1}$. An aperiodic monoid is, in short, an aperiodic semigroup which is monoid. A syntactic monoid $M(L)$ for a language $L$ is the smallest monoid that recognizes language $L$ (??). Schutzenberger concluded that a language is star free if it is accepted by a DFA with aperiodic syntactic transition monoid. Now this transition monoid need a lot of discussion on semiautomation, so I leave it for another blog. Readers may take the pain to write down something about it, I will appreciate it.

I am marking up this blog for further discussion, not before I learn a bit about these terms though. Before writing this blog, I didn't have much of an idea how group theory can be linked to language theory. This discussion gave an insight!

Friday, February 26, 2010

Another Interesting Regular Language

Refer to Sipser 2.49a.

Let $B=\{1^ky|y\in \{0,1\}^* \textrm{ and y contains at least k 1s }, k \geq 1\}$.

Again a language that seemingly looks irregular for memory involvement. But we can prove it to be regular instead. It is surprising to see that $B$ can be written as $10+1(1+0)^*$, which is a trivial regular expression. The forward subset relation is trivial. For backward subset relation (i.e., the regular expression is a subset of $B$), we see that other than $10$, we can adjust the first occurrences of 1's in two groups - one corresponding to $1^k$ and the other corresponding to $y$, such that the constraint holds.

An Almost Irregular Regular Language!

This problem has its own class. The problem says,

Let $\Sigma=\{0,1\}$ and let \[D=\{w|w \textrm{ contains an equal number of occurances of the substring }01 \textrm{ and }10\}\] Thus $101 \in D$ but $1010$ doesn't. Prove that $D$ is a regular language.

Refer to Sipser 1.48. Well, at a first glace, this language doesn't seem like regular at all, because it involves memory to an extent ("an equal occurance"). But I gave it a try, and tried to develop a DFA (well, an NFA, so to say). It gave me a hard time, so I tried to develop a regular grammar, which seemed even more difficult.

I came across this DFA somewhere in net, which looks flawless. It surfaces up some interesting properties indeed. Starting with $0$, no matter how many $10$ (and only $10$) you have in the string, you will have the same number of occurrences of $01$ and $10$ substrings. Symmetrically, starting with $1$, no matter how many $01$ (and only $01$) you have in the string, you will have the same number of occurrences of $01$ and $10$ substrings.
Another interesting problem which looks closely nonregular can be found in Sipser 1.44. The problem again deals with equal occurance, as follows:

Let $B$ and $C$ be languages over $\Sigma=\{0,1\}$. Define \[B \sqsubset C=\{w\in B| \textrm{ for some }y \in C, \textrm{ strings }w\textrm{ and }y\textrm{ contains equal number of 1s}\}\] Show that the class of regular language is closed under $\sqsubset$ operation.

The construction of $M$ from $M_B$ and $M_C$ can be done as follows.

$Q=Q_B X Q_C$
For $(q,r) \in Q$ and $a \in \Sigma$, we define

$\delta((q,r),a)=\{(\delta_B(q,0),r)\}$ if $a=0$
$\delta((q,r),a)=\{(\delta_B(q,1),\delta_C(r,1))\}$ if $a=1$
$\delta((q,r),a)=\{(q,\delta_C(r,0))\}$ if $a=\epsilon$. This step nondeterministically selects the strings of $C$. (Power of nondeterminism at its best!)

$q_0=(q_B,q_C)$
$F=F_B X F_C$

Interesting problem, ain't it?

Thursday, February 25, 2010

Shuffle and Perfect Shuffle

These two problems from the book of Mike Sipser (1.41 and 1.42) need a little bit of discussion, for I guess these two are the most coveted problems in all university coursework. Question# 1.41 deals with shuffle and question# 1.42 is an extension of shuffle called perfect shuffle. Let's look at the problem of perfect shuffle first, then we can modify our answer for shuffle.

For language $A$ and $B$, let the perfect shuffle of $A$ and $B$ be the language \[\{w|w=a_1b_1...a_kb_k. \textrm{ where }a_1...a_k \in A \textrm{ and } b_1...b_k \in B, \textrm{ each }a_i,b_i \in \Sigma\}\]. Show that the class of regular language is closed under perfect shuffle.

This is a proof by construction. Let $M_A = (Q_A;\Sigma; \delta_A; q_A; F_A)$ and $M_B = (Q_B;\Sigma; \delta_B; q_B; F_B)$ be two DFAs that recognize $A$ and $B$, respectively. Here, we shall construct a DFA $M = (Q;\Sigma; \delta; q; F)$ that recognizes the perfect shuffle of $A$ and $B$. The construction proceeds as follows.

$Q=Q_A \textrm{ x }Q_B \textrm{ x }\{A,B\}$.
$q=(q_A,q_B,A)$.
$F=F_A \textrm{ x }F_B\textrm{ x }{A}$, as the next character to read should be in $M_A$. (The last character read was in $M_B$).
$\delta$ is defined as -

$\delta((x,y,A),a)=(\delta_A(x,a),y,B)$.
$\delta((x,y,B),b)=(x,\delta_B(y,a),A)$.

The construction of shuffle DFA is more or less the same, but with fewer restrictions. Let me state the question first.

For language $A$ and $B$, let the shuffle of $A$ and $B$ be the language \[\{w|w=a_1b_1...a_kb_k. \textrm{ where }a_1...a_k \in A \textrm{ and } b_1...b_k \in B, \textrm{ each }a_i,b_i \in \Sigma^*\}\]. Show that the class of regular language is closed under shuffle.

Now let's take a look at the construction. Again we are going to design a combined DFA with the same tuples as defined for the last problem.

$Q=Q_A \textrm{ x }Q_B \cup {q_0}$, $q_0$ defines the state when nothing is read.
$q=q_A$.
$F=F_A \textrm{ x }F_B \cup {q_0}$, as this DFA can accept the null string.
$\delta$ is defined as -

$\delta(q_0,\epsilon)=(q_A,q_B)$.
$(\delta_A(x,a),y) \in \delta((x,y),a)$.
$(x,\delta_B(y,a)) \in \delta((x,y),a)$.

I guess these two construction answer the problems satisfactorily. Thanks to Wing-Kai Hon of National Tsing Hua University for the discussion of the above problems.

Sagnik's WebLog