| Course code | CTU03 |
| Course title | Text Searching Algorithms |
| Institution | Czech Technical University in Prague |
| Course address | |
| City | |
| Minimum year of study | 4th year |
| Minimum level of English | Good |
| Minimum level of French | None |
| Key words | Stringology, finite automata, exact and approximate string matching, forward and backward string matching, factor automata, subsequence automata |
| Language | English |
| Professor responsible | Borivoj Melichar |
| Telephone | +420 2234 357 287 |
| Fax | +420 224 923 325 |
| melichar@fel.cvut.cz | |
| Participating professors | Jan Holub |
| Number of places | Minimum: 10, Maximum: 15, Reserved for local students: 0 |
| Objectives | Text is the simplest and most natural representation of information in a range of areas. Text is a linear sequence of symbols from some alphabet. The text is manipulated in many application areas: processing of text in natural and formal languages, study of sequences in molecular biology, music analysis, etc. The design of algorithms that process texts goes back at least thirty years. In particular, the 1990s produced many new results. This progress is due in part to genome research, where text algorithms are often used.The basic problem of text processing concerns string matching. It is used to access information and this operation is used very frequently. We have recognized while working in this area that finite automata are very useful tools for understanding and solving many text processing problems. We have found in some cases that well known algorithms are in fact simulators of non-deterministic finite automata serving as models of these algorithms. For this reason the material used in this course is based mainly on results from the theory of finite automata.Because the string is a central notion in this area, Stringology has become the nickname of this subfield of algorithmic research.
|
| Programme to be followed | · Five 3-hour lectures: 1. Overview of Stringology, string matching problems, string matching and finite automata.2. Forward string matching, fail function, dynamic programming and bit parallelism.3. Factor automata, subsequence automata, repetition in text.4. Forward string matching, fail function.5. Backward string matching, models of backward string matching, Boyer-Moore algorithm. · Three 1-hour case studies: 1. Pattern matching in a two-dimensional text.2. Implementation of factor automata.3. String matching in a compressed text. · Three 2-hour seminars: 1. Mastering finite automata: determinisation, union, intersection, e-transitions removal, elimination of more than one initial states.2. Construction of string matching automata, factor and subsequence automata.3. Forward string matching. |
| Prerequisites | Sets, relations, oriented graphs, finite automata, regular expressions. |
| Course exam | Written exam with the duration of 1 hour, evaluation of the results. |
The ATHENS Programme