Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example

(6567 Views)


Knuth-Morris-Pratt (KMP) Substring Search Algorithm in Java Example

Substring Search Algorithm:

In a Substring search algorithm, we're finding whether the given pattern exists in the text of not. Lets say we have a string text "An apple Pie", and we need to search whether "pie" exists in it or not. Since our search string exists in the string, our method will return true, else it'll return false.

Naive method:

If a novice programmer is asked to write an algorithm to perform a substring search, he'll implement a brute-force method. Here's what it looks like.

Consider our text is "abcabcd", and
Substring to search is "abcd"

Naive method algorithm:

  • Initialize i=0 (for text) and j=0 (for pattern/substring)
  • Match text[i] with pattern[j]
  • If it's a match, match every other character in the pattern with text starting from position i
  • else increment i and reset j to 0

This algorithm works just fine, it'll tell you whether the substring is present in the text or not. But this algorithm takes more time than required to work. Consider a text of size m, and pattern of size n, this algorithm is O(m*n) complex with respect to time, since for every character in text, we're searching the pattern length.

Knuth-Morris-Pratt (KMP) Algorithm:

The KMP algorithm is able to search for the substring in O(m+n) time, this is why we don't use the above naive method. Lets now see how this algorithm works.

We'll take the following example to understand KMP:

Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
Lets match first character of both the strings
Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
Since it's a match, we'll check the next.
Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
We'll match until we find a mismatch.
Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
We found our mismatch at index 3. In naive method, we'll start checking from index 1 of text with index 0 of pattern. But in KMP, we don't follow this rule. Mismatch occured at index 3 in pattern. Now, we've to search for any prefix which is also a suffix in the pattern before mismatch. Mismatch occurs at index 3, before that, we have substring "ABC" in pattern, now search for any character/string which is a prefix as well as a suffix. In this case, there's no such character/string. So we start our match from index 4 in string and index 0 in pattern.
Note: In "the road is a path" text, "th" is both suffix as well prefix.
Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
Again, we'll match the characters until we find a mismatch.
Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
Since, ABC is both prefix and suffix, we don't need to check the first 3 characters of pattern, because we've already checked them. We would have started from index 0 in pattern, but prefix and suffix are both ABC, we can start matching from index 3 in pattern.
Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
This way, KMP reduces the number of comparisons and improve the time complexity.

In the above illustration, we always looked for any suffix matching the prefix. This is done by using a pattern array.

Knuth-Morris-Pratt (KMP) Substring Search Algorithm with Java Example
This pattern array tells us the positions we have to shift. Index 5 in pattern array is 2. We reach 5 by matching "ABCDAB", here we can see that "AB" is both prefix and suffix, so when we start matching again, we have to start from index 2 in pattern array which is character "C" and not from the starting.

Pseudocode for creating pattern array:

  • Create a patternArray arr of same size of pattern
  • Initialize i=0 , j=1, and arr[0]=0
  • If pattern[i] matches pattern[j], arr[j]=arr[i]+1 and increment both i and j
  • else set i to arr[i-1].
    • if pattern[i] matches pattern[j], arr[j]=arr[i]+1
    • else pattern[j]=0

Pseudocode for Knuth-Morris-Pratt (KMP):

  • Initialize j=0, i=0
  • Start comparing pattern[j] with current window of text (i)
  • If pattern[j] matches current window of text, increment i and j
  • If a mismatch occurs;
    • arr[j-1] gives the number of characters matched before mismatch
    • initialize j to arr[j-1] and repeat the process
  • if last character of pattern matches the one in text, Pattern is found

Knuth-Morris-Pratt (KMP) Java Program:

class KMP_String_Matching { void KMPSearch(String pat, String txt) { int M = pat.length(); int N = txt.length(); int arr[] = new int[M]; int j = 0; // for traversing through pattern computearrArray(pat, M, arr); int i = 0; // for traversing through text while (i < N) { if (pat.charAt(j) == txt.charAt(i)) { j++; i++; } if (j == M) { System.out.println("Found pattern "+ "at index " + (i - j)); j = arr[j - 1]; } // mismatch occurs else if (i < N && pat.charAt(j) != txt.charAt(i)) { if (j != 0) j = arr[j - 1]; else i = i + 1; } } } void computearrArray(String pat, int M, int arr[]) { int len = 0; int i = 1; arr[0] = 0; // arr[0] is always 0 while (i < M) { if (pat.charAt(i) == pat.charAt(len)) { len++; arr[i] = len; i++; } else // (pat[i] != pat[len]) { if (len != 0) { len = arr[len - 1]; } else // if (len == 0) { arr[i] = len; i++; } } } } public static void main(String args[]) { String txt = "ABCXABCDABXABCDABCDABCY"; String pat = "ABCDABCY"; new KMP_String_Matching().KMPSearch(pat, txt); } }

Solution Worked 9 UpvotesUpvote

        

Solution Didn't Worked 3 DownvotesDownvote



Comments

1 comment
  • Some random Computerscience student

    i have no idea whats going on



Search