Concepts of Programming Language -

Description

Hi there, can you please refer the textbook and draw the images. I submitted one time yesterday and its different from what i need. I only need the image like what’s shown in textbook and some explanations. Also try to not to copy from Internet sites. Also please create in a word docs.

Don't use plagiarized sources. Get Your Custom Assignment on

Concepts of Programming Language

From as Little as $13/Page

Unformatted Attachment Preview

CS 435
Homework 03
Name your .docx file with your answers: CS435HW03LastName.
For each of the following regular expressions in a) – d) do steps 1 – 3:
1. Draw the syntax (expression) tree.
2. Using Theorem 2.8.1 translate each regular expression into an NFA- ε. Give a state diagram
representation of the constructed NFA- ε. Be sure to indicate the start state and final
state(s) and be sure to label all states and transitions.
3. Using Algorithm 2.5 translate the NFA- ε into a DFA. Show all epsilon closures and all steps
as shown in the example in the textbook. (You can use 1, 2, … for state names as opposed to
q1, q2, … .) Show the state diagram of the DFA and show the tabular of the DFA. Be sure to
indicate the start state and final state(s) and be sure to label all states and transitions.
a)
b)
c)
d)
( a | b )*
( a* | b* )*
( ( ε | a ) b* )*
( a | b )* a b b ( a | b )*
Chapter 2
Introduction to Languages and Finite
tio
n
State Automata
Computer language designers must formally specify the set of legal programs in the designed language to
bu
users (programmers) and implementors of that language. Formally, a programming language is a set of
strings. Since the set may be infinite, a formal way of generating the set in a finite way is needed. To do this,
tri
formal grammars are used to generate the languages. Checking whether a program is syntactically correct is
is
tantamount to implementing an algorithm to realize set membership of strings in a possibly infinite set. To
rD
do this, implementors also use computational models, such as finite state automata, push-down automata,
or Turing machines to implement the membership test.
In this chapter and the next, formal languages, formal grammars, automata, and related concepts are
fo
covered. In addition, their application to building and engineering the front-end of translators (interpreters
and compilers) is covered. As presented in the previous chapter the first two phases of a translator usually
of a translator.
N
ot
include lexical analysis (scanning) and parsing. Together these phases are usually referred to as the front-end
To model lexical analysis, the lexemes are usually viewed as comprising the “sentences” of this lexical
language. The lexemes are categorized into groups called tokens. These tokens makes up the vocabulary of
the programming language being translated. To recognize lexemes, the computational model class known
as a finite state automata are utilized. There are different kinds of finite state automata: nondeterministic,
nondeterministic-with-ϵ-moves, and deterministic. Each are useful for different aspects of lexical analysis,
but, it turns out, all of these are equivalent in the sense that they recognize the same family of languages,
called regular languages. None of these finite automata recognize more languages than the other.
To help language designers express these lexemes of regular languages, regular expressions are used. We
say that regular expressions denote regular languages and finite state automata recognize regular languages.
This chapter looks at the formal definition of finite state automata and the simulation or execution of them
to test string membership in regular languages. We also examine how to express regular languages with
regular expressions along with the algebraic identities of regular expressions and the construction of finite
35
2.1 Formal Languages
Introduction to Languages and Finite State Automata
state automata from regular expressions. This latter translation is used by compiler building tools, such as
lex or antlr4.
The concepts presented in this chapter are intended to give the theoretical basis for building lexical
analyzers. The lexical analysis of a compiler not only involves recognizing the lexemes of a programming
language, but categorizing the lexeme as a token kind and returning the token and, where necessary, the
corresponding lexeme to a parser. In the next chapter grammars and automata related to parsing is examined.
The parser realizes the string membership function for a given language whose vocabulary is comprised of
the tokens recognized by the lexical analyzed (scanner).
2.1
Formal Languages
Computer scientists use finite state automata to model many types of hardware and software systems. The
n
primary focus here is to introduce finite state automata as regular language recognizers. This chapter
tio
ultimately provides the foundation of writing software for scanners in compilers and front ends for the
command and data entry components of user interfaces. It can be shown that given a set of regular expression,
bu
an finite state automata can be constructed that recognizes the regular language specified by the set of regular
expressions. We will study an algorithm that can translate a regular expression into a deterministic finite
tri
state automata. Being able to do this, allows one to develop compiler-writing tools that allows a compiler
is
writer specify the lexemes as a set of regular expression and a recognizer automatically generated. Flex
and Antl4 are examples of tools that use regular expressions to produce deterministic finite automata to be
rD
utilized in a scanner.
Regular expressions and the role as a denotation for regular languages are covered first. The syntax of
fo
regular expressions and their meanings (semantics) is given. To make regular expressions easier to express
and make more concise, extensions to meta-syntax and operators are shown. Also, identities surrounding
particular form.
N
ot
regular expressions are also shown for the purpose of simplifying or transforming a regular expression into a
A class of finite state automata, called deterministic finite automata, are defined first. Following their
definition and behavior, non-deterministic automata are introduced. It will be shown that non-deterministic
automata and deterministic automata recognize the same class of regular languages.
Following the presentation of the automata and their behavior, the theorems stating the equivalence of
finite automata to regular grammars, in sense of the class of languages recognized by finite automata is the
same as the class of languages generated, is shown. First, the transformation from a regular language that
generates a language L to a non-deterministic automata that recognizes that language L and the inverse
transformation are shown. In addition similar transformations between regular grammars and deterministic
automata are also established.
Copyright 2024
36
Introduction to Languages and Finite State Automata
2.2
2.2 Overview of Formal Languages
Overview of Formal Languages
Since the advent of the electronic digital computer the representation of data and instructions along with
the properties of the representations has been studied. In particular, the text representation of programming
languages and data along with the automatic translation of text from one natural or programming language to
another has been researched quite extensively by both computer scientists and linguists. Chomsky grammars
have played a central and foundational role in describing the syntax of these languages and the framework
for the formal semantics of these languages.
Although electronic digital computers are finite, the sets of finite strings representing programming
languages and abstract data are modeled as potentially infinite sets. Designers of a programming language
need a notation or metalanguage for formally specifying the potentially infinite number of legal strings of
their new language. Developers writing translators for the language must know the designer’s intent of what
n
the legal strings are comprising the new programming language.
tio
A particular family of formal grammars, first formulated by Noam Chomsky 1 , are widely used and studied
by computer scientists for describing formal languages. Chomsky grammars also allow one to formally
bu
describe the syntactic structure of a language, which can be useful in describing the semantics of that
language. This can help with proper use of the language and also translation of strings of that language to
tri
strings of other languages.
Many notations and constructions have been introduced to describe infinite sets. In mathematical litera-
is
ture, an author might describe an infinite set of non-negative, even integers to a reader through establishing
rD
a pattern sequence or through elementary arithmetic operations over constants and numbers chosen from a
given set. For example, one can describe the set of even non-negative integers using these two ways as:
= {0, 2, 4, . . .} = {2k|k ∈ N = {0, 1, 2, . . .}}.
fo
Ne
N
ot
To test membership of a number n in Ne one can either begin enumerating the members of Ne until n
is found. Another algorithmic approach is to test whether 2 divides n without remainder; if so, then n is in
Ne and if not, then it is not in the set.
Just as set construction notation involving functions and predicates is used to describe the numbers comprising an infinite set of those numbers, strings comprising the elements of an infinite set can be described
similarly. However, the operations and objects that comprise the functions and predicates must be operations over string objects. In this chapter we examine what these operations are and how formal grammars
serve as a finite way for constructing infinite (and finite) languages of strings and for as a means for constructing procedure or algorithms for testing membership of strings. This latter aspect of language theory
deals primarily with the study and design of algorithms and models that are often referred to as language
recognizers and automata. We shall concentrate the discussion first on the representation or description of
languages and postpone the study of recognizers, parsers, etc. until later.
The scope of this chapter is to introduce the reader to the fundamental concepts underlying formal language theory so that it may be applied later to translator writing systems, such as interpreters and compilers.
1 Noam Chomsky, a linguist, has provided the framework for the subject matter in this chapter.
was done in the late 1950’s.
Copyright 2024
37
Most of his original work
2.3 Strings
Introduction to Languages and Finite State Automata
Section 2.3 introduces definitions of concepts related to strings, string operations, and string properties, each
of which provide the foundational concepts necessary for discussing formal grammars. Next, we cover Chomsky grammars and the four hierarchical families of Chomsky languages along with the corresponding families
of Chomsky grammars. Context Free grammars, including regular grammars, and related concepts such as
leftmost and rightmost derivations, parse trees, ambiguity and properties of these grammars are introduced.
Finally, the relationship to parsing is introduced. A more complete discussion of parsing of context free
languages and related recognizers are discussed in a later chapter.
2.3
Strings
Within formal language theory, an alphabet (or vocabulary) is a finite set of symbols. A finite string is a
finite sequence over an alphabet A. Recall that a finite sequence is a total function restricted to an index set
n
domain: [n] = {0, 1, . . . , n − 1} where n ≥ 0; [0] represents the empty set. A string, α : [n] → A, has length
tio
n and is denoted |α| = n, where n ≥ 0. For each alphabet, A, there is only one string with the empty index
set; this string is called the empty string, denoted in this text as ϵA . Often the subscript A is dropped when
bu
the alphabet A is clear.
Notation: Although strings are finite sequences, we often replace the sequence delimiters , with
tri
quote ” symbol delimiters and leave out commas separating the symbols (terms). Where confusion will not
is
arise, the quote ” delimiters may be dropped.
rD
Example 2.1 Using the String definition
Let A = {a, b, c}. Example of strings are α1 = “a” and α2 = “aac”. More formally, we have:
N
ot
a),(1 7→ a), (2 7→ c)}.
fo
α1 : [1] → A, formally defined as: {(0 7→ a)} and α2 : [3] → A formally defined as: {(0 7→
Substrings, String Prefixes and String Suffixes
Given strings α, β, and ω, (each possibly empty) such that ϕ = αβω, α is the prefix string of ϕ and proper
prefix iff α ̸= ϕ and α ̸= ϵ; similarly, ω (ω possibly ϵ) is the suffix string of ϕ and the proper suffix iff ω ̸= ϕ
and ω ̸= ϵ; and α, β, and ω are all substrings of ϕ.
Example 2.2 Prefixes and Suffixes
Let ϕ = cabb be a string over an alphabet that contains {a, b, c}.
Prefixes of cabb: ϵ, c, ca, cab, and cabb;
Proper Prefixes of cabb: c, ca, cab;
Suffixes of cabb: cabb, abb,bb, b and ϵ;
Copyright 2024
38
Introduction to Languages and Finite State Automata
2.3 Strings
Proper Suffixes of cabb: abb,bb, and b .
Substrings of cabb: All prefixes and suffixes of cabb along with a, b, ab.
Concatenation
The concatenation of strings α : [m] → A and β : [n] → A is the string αβ : [m + n] → A, where αβ(i) = α(i)
for 0 ≤ i ≤ m and αβ(j + m) = β(j) for 0 ≤ j ≤ n. It follows that |αβ| = |α| + |β|. Note that for all strings
α, αϵ = ϵα = α. Also note that concatenation is not always commutative; that is, it does not always hold
that αβ = βα.
n
Example 2.3 Concatenation
tio
Again, let the alphabet A = {a, b, c} and let α1 = “a” and α2 = “aac”. The concatenation, α1 α2 = “aaac”.
= {(0 7→ a), (1 7→ a), (2 7→ a), (3 7→ c)}
rD
is
tri
α1 α2
bu
We have α1 α2 : [4] → A. Formally, we have:
Exponentiation and String Reverse Notation
fo
It is convenient to use exponentiation notation to represent repeated symbols in a string. For each symbol
a ∈ A, we may write an (n ≥ 0) as a shorthand notation for aa
· · · a}. For example, it will be convenient
| {z
n a′ s
N
ot
to write aaac as a3 b1 For each symbol a ∈ A, a1 = a. Usually, exponent of value 1, the exponent is left
implicit. The strings of length 1 are not only interpreted as strings, but as single symbols from an alphabet;
the context of usage is used for understanding the proper interpretation or explicit quotes are used: ”a”.
Also note that for any a ∈ A, a0 = ϵ (or more precisely, ϵA ).
For a ∈ A and m, n ≥ 0:
am an
= am+n .
It is also convenient to use exponentiation notation to represent repeated concatenation of repeated
strings over an alphabet. To denote the concatenation of n ≥ 0 copies of a string α, we write αn as
shorthand notation for |αα{z
· · · α}. Again, α0 = ϵ.
n α′ s
It is sometimes convenient to refer to the reverse of a string. The reverse of a string α = a1 a2 . . . an is a
string denoted and defined as αR = an . . . a2 a1 . A string α is a palindrome iff α = αR . The reverse of ababc,
written ababcR = cbaba. The string abba is a palindrome.
Copyright 2024
39
2.3 Strings
Introduction to Languages and Finite State Automata
Concatenation over Sets of Strings
Concatenation can also be extended over sets of strings. Let S and T each be sets of strings. The set
S • T = {αβ|α ∈ S and β ∈ T }. For an alphabet A, An (n ≥ 0) is recursively defined as follows:

 {ϵ}
if n = 0
An =
 An−1 • A if n > 0.
Kleene Closure
Note that An is the set of all strings of length n over A. The Kleene closure of an alphabet A is defined as:
A∗ = A0 ∪ A1 ∪ A2 ∪ · · · =
∞
[
Ai .
i=0
A∗ is the infinite set of all possible finite length strings over alphabet A. Any set L ⊆ A∗ is a language over
n
an alphabet A. Each string in language L is called a sentence.
bu
A+ = A∗ − {ϵ}
tio
It also will be convenient to use the following set and corresponding notation:
tri
An Algebraic Structure using String Concatenation
In summary, the concatenation operation over strings of A∗ forms an algebraic structure called a monoid.
rD
is
Such algebraic structures possess the following properties:
1. (Closure) For all α, β ∈ A∗ , αβ ∈ A∗ ;
fo
2. (Associative) For all α, β, γ ∈ A∗ , (αβ)γ = α(βγ);
N
ot
3. (Identity Element: ϵ) For each α ∈ A∗ , αϵ = ϵα = α;
These properties show that A∗ with concatenation forms an algebraic structure called an monoid. Because
a monoid is formed, properties of monoids prove to be important and useful in more theoretical studies of
language theory.
Examples of Languages
Example 2.4 Application of String and Language Definitions
Let A = {0, 1}.
A∗
= {ϵ, 0, 1, 00, 01, 10, 11,
000, 010, 100, 110, 001, 011, . . .}
The set of binary digit strings with no leading zeroes is one possible language.
The set L = {0n 1m | n, m ≥ 0} is another example of a language over this alphabet A. Some sentences of L
are 0 ∈ L since 01 10 = 0ϵ = 0, where n = 1 and m = 0. Similarly, 1 ∈ L since 00 11 = ϵ1 = 1, where n = 0
Copyright 2024
40
Introduction to Languages and Finite State Automata
2.3 Strings
and m = 1. For n = 1, m = 1, we have 01 11 ∈ L. For n = 5, m = 2, we have 05 12 = 0000011 ∈ L. Note that
ϵ ∈ L for n = m = 0. Languages, L, described in English is: any number of 0’s (possibly none), followed by
any number of 1’s (possibly none), but once a 1 is encountered (read from left to right) no 0 may occur.
As shown in the previous example some languages may contain the empty string. For each alphabet, is a
language that contains only the empty string, ϵ. Note that there is a difference between an empty language
and the language containing only the empty string. L = ∅ = { } and L = {ϵ} are both languages given any
alphabet.
Example 2.5 Language of Latin Letter Alphabet
= {ϵ, a, b, . . . , z,
tio
A∗
n
Let A = {a, b, . . . , z}.
bu
aa, ab, ac, . . . , az, ba, bb, . . . bz,
..
.
tri
. . . , za, zb, zc, . . . , zz,
is
aaa, aab, aac, . . . , aaz, aba, abb, . . . , abz,
..
.
rD
zza, zzb, zzc, . . . , zzz, . . . ,
. . .}
N
ot
in Webster’s dictionary.
fo
Here the alphabet is the set of lower-case Latin letters. One possible subset of A∗ is the set of English words
By using the terminology defined thus far, this example language has sentences that are words; this is
not commonly used terminology when discussing natural languages, such as English. It is more common to
think of the words of a dictionary as serving as the building blocks of the language as alphabet sets have
been defined above. We shall see in the development of formal grammars, the sentences of one language may
serve the role of an alphabet for another language. Such is the case here for English. When using alphabets
in this hierarchical way, the sentences forming the dictionary and, in turn, used as an alphabet for another
language, the latter alphabet is often called a vocabulary. This is shown in the next example.
Example 2.6 Language Over a Dictionary Alphabet (or Vocabulary)
Let the vocabulary V equal the set of words in Webster’s dictionary along with the usual punctuation marks
and a space. The set of syntactically legal English sentences is one possible language. Note that some of the
sentences might not make any sense semantically.
Copyright 2024
41
2.4 Regular Expressions
Introduction to Languages and Finite State Automata
Note that most natural languages, such as English, are dynamic in the sense that the set of syntactically legal sentences change; moreover, the vocabulary, which comprises an English dictionary, is also
ever-changing.
Programming languages, like many natural languages, can be viewed as being built from a vocabulary,
which can be viewed as a language built from an alphabet. In contrast to natural languages each of which
are based upon a dictionary of finite size that serves as its vocabulary or alphabet, programming language’s
alphabet may be theoretically infinite if there are no restrictions on the length of identifier (e.g. variable)
names. Each identifier can serve as an element in the alphabet and sentences are legally correct programs
in that language.
=
{ keywords: abstract , boolean, . . . , while }
tio
Let V
∪ { operators and/or delimiters:
{ , } , ( , ) , , , + , – , * , / , ++ , — , == , = , . . . }
x , X , x1 , Max , MAX , . . . , sort , . . . , }
bu
∪ { identifiers:
n
Example 2.7 Java Vocabulary
∪ { literals: abc, a , ’a’, 123 , -123 , +123 , 123L , 12.3 , 12.3f , 12.3e04 , . . . }
=
{ ϵ, abstract , class x { } , . . . , abstract class, if , . . . }
tri
V
∗
Regular Expressions
fo
2.4
rD
is
The Java language is one possible subset of V ∗ . Listed above are strings that are not in the Java language.
Regular expressions serve as a convenient way to describe string patterns that can be searched in a larger
N
ot
string, such as a text file. In language design and compiler engineering, regular expressions are used to
describe lexemes and, with these regular expressions, FSAs can be constructed and, in turn, can represent
scanners. We now look at the syntax and semantics of regular expressions.
Copyright 2024
42
Introduction to Languages and Finite State Automata
2.4.1
2.4 Regular Expressions
Regular Expression Syntax and Semantics
Definition 2.4.1 Regular Expression Syntax
Let T be an alphabet. Regular expressions over the alphabet, T ∪ {∅, ϵ, |, ∗, (, )}, with T ∩ {∅, ϵ, |, ∗, (, )} = ∅
are defined as follows:
∅ is a regular expression;
ϵ is a regular expression;
for each a ∈ T , a is a regular expression;
if r1 and r2 are regular expressions, then r1 |r2 is a regular expression;
if r1 and r2 are regular expressions, then r1 r2 is a regular expression;
n
if r is a regular expression, then r∗ is a regular expression;
tio
if r is a regular expression, the (r) is a regular expression.
bu
Definition 2.4.2 Regular Expression Semantics Let T be an alphabet and r1 , r2 , and r be regular expressions.
tri
∅ is denotes ∅;
is
ϵ is denotes {ϵ};
rD
each regular expression a is denotes {a};
if r1 and r2 denotes sets, R1 and R2 , r1 |r2 denotes R1 ∪ R2 = {α|α ∈ R1 or α ∈ R2 };
if r1 and r2 denotes sets, R1 and R2 , r1 r2 denotes R1 R2 = {αβ | α ∈ R1 , β ∈ R2 };
fo
if r denotes set R, then r∗ denotes R∗ ;
N
ot
if r denotes set R, the (r) denotes R.
The operators precedence levels are highest to lowest as follows: Kleene closure (∗), Concatenation, and then
Union (|). The parentheses can alter the precedence rules.
Example 2.8 Regular Expression
Identifiers in C/C++ can be denoted as follows. Let letter → a|b| . . . z|A|B| . . . Z
Let digit → 0|1| . . . 9
Let ident → (letter| )(letter| |digit)∗
Example 2.9 Regular Expression
Consider this regular expression: abc∗ is denoted by the set {a}{b}{c}∗ = {a}{b, bc, bcc, bccc, . . .} = {ab, abc, abcc, abccc, . . .}.
Copyright 2024
43
2.4 Regular Expressions
Introduction to Languages and Finite State Automata
Example 2.10 Regular Expression
a|bc∗ is denoted by the set {a} ∪ {b}{c}∗ = {a} ∪ {b, bc, bcc, bccc, . . .} = {a, b, bc, bcc, bccc, . . .}
2.4.2
Regular Expression Identities
In Figure 2.1 is a list of identities.
Let r, s, and t represent regular expressions in the following. For a regular expression, r, the following
notation is introduced for convenience: r+ = rr∗ = r∗ r.
r|s
= s|t
(Commutative)
(Associative)
= r(st)
r(s|t)
= rs|rt
(Distributive)
(s|t)r
= sr|tr
r|∅
= ∅|r = r
rϵ
= ϵr = r
r∅
= ∅r = ∅
tio
rD
is
tri
(Identity)
(Zero)
= r
(Idempotent)
(r∗ )∗
= r∗
(Closure-related Identities)
r∗
= ϵ + r+
ϵ∗
= ϵ
ϵ+
= ϵ
∗
= ϵ
N
ot
fo
r|r
n
(rs)t
bu
(r|s)|t = r|(s|t)
∅
Figure 2.1: Regular Expression Identities
2.4.3
Extended Notation for Regular Expressions
The class of regular languages is the family or set of languages that can be described with the set constructions
operations used for defining the semantics of regular expressions. The sets of lexemes for most programming
languages form small regular languages. Since many languages have are based upon fairly large character
sets, regular expressions used for denoting these regular languages can become quite long and cumbersome.
Copyright 2024
44
Introduction to Languages and Finite State Automata
2.5 Finite State Automata
For example, using the standard ASCII character set, a regular expression for denoting a string of upper-case
or lower-case letters, requires a regular expression like:
[a | b | c | … | z | A | B | C | … | Z]*
2.5
Finite State Automata
Abstractly, a finite state automaton (FSA) is a computational model for checking membership of strings
in a language over a given alphabet. Given a string over an alphabet, an FSA reads a string from left to
right and indicates whether that string is accepted or not as a member of the language. An FSA, includes
a read-head, a finite set of states and a (finite) set of transitions between any two states (possibly the same
state). A transition is dependent on a state and also a specific alphabet symbol for determining what state
n
becomes the current state. A subset of states are designated as final (or accepting) states. When ”running”
an FSA, an FSA is always in one state and over time the machine changes states based upon the specification
tio
of the transitions and what symbol is being read. As each symbol is read the read head is moved to the
bu
next symbol. When the end of a input string is reached and the automaton is in a final state, the string
is considered accepted. If not, the string is rejected. Depending on how an FSA’s transitions are specified,
tri
there may be a case where in a given state, there are no transitions for a particular alphabet symbol; in this
case, the machine is considered ”stuck” and, in such as case, the string is rejected.
ai
a2
an
fo
a1
N
ot
a0
rD
is
An abstract graphical depiction of is shown in Figure 2.2.
Finite Control
qj
…
tk
…
qi
Figure 2.2: Abstract view of a Finite State Automata
As a recognizer, a finite state automaton may recognize many prefixes in a given string. It does this informally
as follows. The machine starts in a start state. The machine transitions between state qi to qj if the symbol
that the read-head is reading matches the symbol label of the directed edge from qi to qj . The entire string
Copyright 2024
45
2.5 Finite State Automata
Introduction to Languages and Finite State Automata
is recognized only when the last symbol is read and the state of the automaton is in a final accepting state. If
not in a final state, the string is not recognized as being in the language. We now formally define a particular
kind of finite state automaton called a Deterministic Finite Automaton (DFA). It is deterministic because
there is no choice in a transition to a new state based upon the current symbol in the input. Following
the definition, graphical and matrix representations of DFAs are shown and formal meaning of a DFA as a
language is defined.
Definition 2.5.1 Deterministic Finite Automaton A Deterministic Finite Automaton is a 5-tuple < Q, T, δ, q0 , F > where
Q is a non-empty finite set of states;
T is an alphabet;
tio
n
δ : Q × T → Q is a (possibly partial) function, called the transition function;
q0 ∈ Q is the start state; and
Representations of DFAs
is
2.5.1
tri
bu
F ⊆ Q is a non-empty (finite) set of final states.
Each DFA = M = < Q, T, δ, q0 , F > is usually presented or expressed with a matrix or an edge-labeled and
rD
vertex-labeled directed graph with added notations for the start and final states. Assuming that |Q| = n
and |T | = m, a transition function δ can be specified with a n × m-matrix as follows:
fo
rows are bijectively labeled with states q ∈ Q;
columns are bijectively labeled with alphabet symbols t ∈ T ;
N
ot
each (row-q,column-t) entry is q ′ when δ is defined and q ‘ = δ(q, t); and
each(row-q,column-t)entry has a blank entry or the symbol ⊥ when δ(q, t) is undefined.
δ
..
.
···
..
.
t
..
.
···
..
.
q
..
.
···
..
.
q′
..
.
···
..
.
and when undefined
δ
..
.
···
..
.
t
..
.
···
..
.
q
..
.
···
..
.
⊥
..
.
···
..
.
One can also graphically present instances of a DFA as follows:
q
a state
/ q
qf
the start state
a final state
q
t / ′
q
a transition
In order to define what is meant by a string α being recognized by an DFA, we extend the δ transition
function to δ̂ : Q × T ∗ → Q and define δ̂ as follows:
Copyright 2024
δ̂(q, ϵ)
=
q
δ̂(q, βa)
=
δ(δ̂(q, β), a)
46
Introduction to Languages and Finite State Automata
2.5 Finite State Automata
Definition 2.5.2 String Recognition by an DFA A string α is recognized by an DFA, M =< Q, T, δ, q0 , F >
when δ(q0 , α) = qf , where qf ∈ F . A language L recognized by an DFA, M , denoted L(M ), is defined as
follows:
L(M ) = {α | δ̂(q0 , α) = qf , qf ∈ F }
A string α is not recognized (rejected) when either at some point in the trace, reading symbol a while in
state q, the transition function, δ, is undefined for that state-alphabet pair, (q, a), or the end of the string is
reached and the machine is in not in a final state.
Example 2.11 M2.11
a
b
q0
q0
q1
q1
q1
q0
/) q0
bu
tio
δ
tri
n
Let M2.11 be a finite state automaton: Q = {q0 , q1 }. Start state = q0 . T = {a, b}. F = {q1 }.
a
b
k
a
+ q
u
1
rD
is
b
Figure 2.3: State diagram of M2.11
fo
We trace the steps for δ̂(q0 , aba).
N
ot
δ̂(q0 , aba)
=
δ(δ̂(q0 , ab), a)
=
δ(δ(δ̂(q0 , a), b), a)
= δ(δ(δ(δ̂(q0 , ϵ), a), b), a)
=
δ(δ(δ(q0 , a), b), a)
=
δ(δ(q0 , b), a)
=
δ(q1 , a)
=
q1
Since q1 ∈ F , the string aba is recognized by M . To simplify the tracing of the acceptance or rejection of
strings the following notation is introduced. For each step δ(q , a) = q we use q a q . So, to denote the trace
i
j
i
j
of aba using machine M , we have the q a q b q a q . Shown below are several example traces of strings using
0
0
1
1
M:
1. abaa ∈ L(M ):
q aq bq aq aq
0
0
1
1
1
2. bbbab is not in L(M ):
q bq bq bq aq bq
0
1
0
Copyright 2024
1
1
0
47
2.5 Finite State Automata
3. bababa ∈ L(M )
q bq aq bq aq bq aq
0
1
1
0
0
1
Introduction to Languages and Finite State Automata
1
4. bababab is not in L(M ):
q bq aq bq aq bq aq bq
0
1
1
0
0
1
1
0
L(M ) consists of strings over {a, b}+ that contains an odd number of bs. Intuitively, we could use a phrase
to represent a state that would convey a more meaningful name than say, q0 or q1 . When in state q0 , we are
in a state of having read an even number of b’s; in state q1 we are in a state of having read an odd number of
b’s. Although this gives more insight into the motivation for machine and structure and intended semantics
of the language recognized by the machine, it is too cumbersome to label the states with such phrases or
assertions. Often, if such assertions prove useful legends indicating the associations of the state labels with
tio
n
the assertions can be included.
Example 2.12 M2.12
bu
Let M2.12 be a finite state automaton: Q = {q0 , q1 , q2 }. Start state = q0 . T = {a, b, c}. F = {q2 }.
a
b
c
q0
q0
q1
⊥
q1
q1
q2
⊥
⊥
q0
/) q1
b
q2
rD
q2
is
tri
δ
a
a
d
a
/ q2 u
c
N
ot
fo
/) q0
b
Figure 2.4: State diagram for M2.12
Shown below are several example traces of strings using M :
1. ababa ∈ L(M ) :
q aq bq aq bq aq
0
0
1
1
2
2
2. bb ∈ L(M ) :
q bq bq
0
1
2
3. ababacababa ∈ L(M ):
q aq bq aq bq aq cq aq bq aq bq aq
0
0
1
1
2
2
0
0
1
1
2
2
4. ac is not in L(M ):
q aq c
0
0
5. ababaca is not in L(M ):
q0 a q0 b q1 a q1 b q2 a q2 c q0 a q0 and q0 ∈
/F
Copyright 2024
48
Introduction to Languages and Finite State Automata
2.5 Finite State Automata
L(M ) is the set of strings that is comprised of one or more substrings separated (delimited) by symbol c and
with each of these substrings being a string over {a, b}+ with exactly two bs (not necessarily consecutive).
2.5.2
An alternative approach to DFA transition functions: Total transition
functions
An alternative to using partial transition function is to introduce an error state, qe , where qe is required not
to be a final state and to modify the transition function, δ into a transition function, δe , as follows. δe is
identical to δ except where δ is undefined. For all δ : (q, t) 7→ ⊥ define δe : (q, t) 7→ qe . and add another |T |
transitions, δ(qe , t) = qe , for all t ∈ T . This introduces many more transitions and complicates the graphical
n
representation.
tio
Example 2.13 Alternative DFA: M2.12
Consider the DFA, M2.12 , from Example 2.12. Using the alternative approach we have the alternative DFA,
bu
e
e
e
M2.12
, graphically presented below. It can be shown that L(M2.12 ) = L(M2.12
). M2.12
is graphically depicted
tri
e
in Figure 2.5. Tracing a string, α ∈ L(M2.12
) is identical to that shown above for L(M2.12 ) in Example 2.12.
b
a
)5/ qe u
T`
is
c
rD
c
a
a
d
b
/) q1
fo
/) q0
c
b
a
b
/ q2 u
c
N
ot
e
Figure 2.5: State diagram for alternative, total, transition function for M2.12
However, tracing a string that is not in the language can be different. When encountering the
1. ac is not in L(M ):
q aq cq
0
0
e
2. ababaca is not in L(M ):
q a q b q a q b q a q c qa
0
0
1
Copyright 2024
1
2
2
0
49
2.6 Non-deterministic Finite Automata (NFAs)
2.5.3
Introduction to Languages and Finite State Automata
Deterministic Finite Automata as Language Recognizers
We present an algorithm as a solution to the decision problem: Given a DFA M and a string α, does
α ∈ L(M )?
Algorithm 2.1: A

Related Questions: