The Boyer-Moore algorithm is a highly efficient string searching algorithm that finds occurrences of a pattern within a text.
It works by scanning the text from left to right, comparing the pattern to substrings of the text.
However, it employs two main heuristics to skip comparisons that are unlikely to result in a match, making it much faster than naive string search algorithms.
pseudocode:
function BoyerMooreSearch(text, pattern):
create bad_character_table for pattern
create good_suffix_table for pattern
i = 0
while i <= length(text) - length(pattern):
j = length(pattern) - 1
while j >= 0 and pattern[j] == text[i + j]:
j = j - 1
if j < 0:
// pattern found
return i
else:
shift = max(1, j - bad_character_table[text[i + j]])
shift = max(shift, good_suffix_table[j])
i = i + shift
return -1
function create_bad_character_table(pattern):
table = {} // dictionary to store last occurrence index of each character
for each character c in pattern:
table[c] = -1
for i from 0 to length(pattern) - 1:
table[pattern[i]] = i
return table
function create_good_suffix_table(pattern):
table = [0] * length(pattern)
last_prefix_position = length(pattern)
for i from length(pattern) - 1 down to 0:
if is_prefix(pattern, i + 1):
last_prefix_position = i + 1
table[length(pattern) - 1 - i] = last_prefix_position - i + length(pattern) - 1
for i from 0 to length(pattern) - 2:
suffix_length = get_suffix_length(pattern, i)
if pattern[i - suffix_length] != pattern[length(pattern) - 1 - suffix_length]:
table[suffix_length] = length(pattern) - 1 - i + suffix_length
return table
function is_prefix(pattern, p):
for i from p to length(pattern) - 1:
if pattern[i] != pattern[i - p]:
return False
return True
function get_suffix_length(pattern, p):
length = 0
i = p
j = length(pattern) - 1
while i >= 0 and pattern[i] == pattern[j]:
length = length + 1
i = i - 1
j = j - 1
return length
functionboyerMoore(text, pattern) {consttextLength=text.length;constpatternLength=pattern.length;constlastOccurrence=getLastOccurrence(pattern);let textIndex = patternLength -1;let patternIndex = patternLength -1;while (textIndex < textLength) {if (text[textIndex] === pattern[patternIndex]) {if (patternIndex ===0) {// Match foundreturn textIndex; } textIndex--; patternIndex--; } else {constlastOccur= lastOccurrence[text[textIndex]]; textIndex += patternLength -Math.min(patternIndex,1+ lastOccur); patternIndex = patternLength -1; } }// No match foundreturn-1;}functiongetLastOccurrence(pattern) {constlastOccurrence= {};for (let i =0; i <pattern.length; i++) { lastOccurrence[pattern[i]] = i; }return lastOccurrence;}// Example usage:consttext="ababcababcabcabc";constpattern="abcabc";constindex=boyerMoore(text, pattern);if (index !==-1) {console.log(`Pattern found starting at index ${index}`);} else {console.log("Pattern not found in the text.");}
In this implementation:
The boyerMoore function takes the text and pattern as input and returns the index where the pattern first occurs in the text, or -1 if it doesn't occur.
The getLastOccurrence function preprocesses the pattern to create a map of the last occurrence of each character in the pattern.
The main loop in boyerMoore iterates through the text, comparing characters from right to left in both the pattern and the text. If a mismatch is found, the algorithm uses the last occurrence heuristic to skip unnecessary comparisons.
If a match is found, the function returns the starting index of the pattern in the text.