String Match

Published: Jun 3, 2016

The topic here is string search algorithm and not a regular expression. For example, find the position(s) of matched patterns in a loooong text.

The easiest implementation is:

  1. check each character of the given pattern one by one
  2. shift starting index by one
  3. repeat 1 and 2

A performance of this brute-force search is O(n^2), which considered bad. When the given text is huge, it’s apparent.

How to effectively find matched pattern

There’s a few algorithms to effectively find the pattern. Among them, Knuth-Morris-Pratt (KMP) algorithm would be well-known algorithm. KMP uses prefix function to skip or find a pattern effectively. The performance of KMP is O(N+M) where N, M are the lengths of text and pattern respectively.

KMP is quite effective, however, understanding its idea is quite tough. Especially, a prefix function, which is also called failure function or partial match table, is a challenging part. Actually, it is an array, neither of function nor table.

This famous algorithm is explained in books and websites, so we can easily find many. Unfortunately, most of them don’t help to understand the idea so much except:

The Knuth-Morris-Pratt Algorithm in my own word

This blog post is really good. The key idea of prefix function is to find “the longest proper prefix of a substring which is also a suffix.” The blog post tells us what is it clearly using good examples.

Also, the idea how to find the pattern using prefix function is described with easy-to-understand illustrations.

Prefix function (a.k.a. failure function, matching table)

The prefix function (an array) will be created only from a pattern. We don’t use a text at all to compute values in prefix function. The function is used to see how the pattern matches itself. Based on this information, we can safely skip useless comparisons.

Here’s the code to calculate prefix function:

static int[] computePrefixFunction(char[] pattern) {
    int m = pattern.length;
    int[] p = new int[m];
    p[0] = 0;
    int k = 0;
    for (int q=1; q<m; q++) {
        while (k > 0 && pattern[k] != pattern[q]) {
            k = p[k];
        }
        if (pattern[k] == pattern[q]) {
            k++;
        }
        p[q] = k;
    }
    return p;
}

When the pattern is “ababaca”, above computes:

i 0 1 2 3 4 5 6
pattern[i] a b a b a c a
p[i] 0 0 1 2 3 0 1

Pattern search in text by prefix function

Once prefix function is computed, it’s time to search the pattern. This part absolutely use the given text. The rule is:

  • find the length of characters matched
  • if the matched length is the same as pattern, the pattern found
  • look up the value in prefix function at index (matched length - 1)
  • skip characters in the value at index (matched length -1)
  • repeat

Below is the code to search the pattern:

static void kmp(char[] text, char[] pattern) {
    List<Integer> indexes = new ArrayList<>();
    int n = text.length;
    int m = pattern.length;
    int[] p = computePrefixFunction(pattern);
    int q = 0;
    for (int i=0; i<n; i++) {
        if (q > 0 && pattern[q] != text[i]) {
            q = p[q-1];
        }
        if (pattern[q] == text[i]) {
            q++;
        }
        if (q == m) {
            indexes.add(i-q+1);
            q = p[q-1];
        }
    }
    if (indexes.size()==0) {
        System.out.println((new String(pattern)) + " not found");
    } else {
        System.out.println((new String(pattern)) + " found at: " + indexes.toString());
    }
}

Run the code and find the pattern

All parts are ready. It’s time to run the code and see the result.

public static void main(String[] args) {
    char[] pat = "ababaca".toCharArray();
    char[] text = "bacbababaabcbab".toCharArray();
    kmp(text, pat);

    pat = "ababaca".toCharArray();
    text = "bacbababaabcbababaca".toCharArray();
    kmp(text, pat);

    pat = "aba".toCharArray();
    text = "bacbababaabcbababaca".toCharArray();
}

Above prints,

ababaca not found
ababaca found at: [13]
aba found at: [4, 6, 13, 15]

Resources of KMP

Latest Posts

Application Development by Rails Action Cable

The previous two blog posts introduced WebSocket and how to implement a WebSocket application on Ruby on Rails. This blog post digs deeper. It is a memo on creating a more realistic application by Action Cable.

Real-time App on Rails by Action Cable

The previous blog post, WebSocket on Rails by Action Cable, focused on WebSocket as a protocol. As in the previous post, by default, Rails app responds to WebSocket connection requests without any hassle. However, other than connecting and sending ping frames, it doesn’t do anything. This blog post focuses on an application side and explains how we can create a full-duplex, bidirectional app.

WebSocket on Rails by Action Cable

In the web application domain, we hear some protocol names. Absolutely, HTTP or HTTPS is the most famous protocol that all web developers know. Although there’s a mechanism of Keep-Alive, a single request/response sequence with a single client/server is all done by HTTP. The client initiates the HTTP request to the server. Once the client receives the HTTP response from the server, communication finishes. As far as HTTP is used, the server just waits and waits. Only when the request comes in, the server can send back some data to the client. This communication style is surprisingly capable of doing many things, so most web applications are satisfied with HTTP.