relevanssi_remove_punctuation

apply_filters( 'relevanssi_remove_punctuation', string $string )

This filter hook removes the punctuation from a string.

Parameters

$string
(string) A string of text.

More information

Removing punctuation is a vital part of the indexing process. The Relevanssi tokenizer uses this filter hook to remove punctuation from all indexed content and all search terms. Depending on the character and the settings, Relevanssi removes the punctuation, replaces it with a space or, in some cases, keeps it.

The default behaviour of Relevanssi is in the relevanssi_remove_punct() function, which Relevanssi hooks to this filter hook on priority 10. In general, that function and Relevanssi settings provide many ways to adjust the punctuation removal, so you often don’t need to do anything with this filter hook.

From Relevanssi’s advanced indexing settings, you can control how Relevanssi handles hyphens and dashes, apostrophes, ampersands and decimal separators. For simple punctuation replacements, you can use the relevanssi_punctuation_filter filter hook. You can adjust the default replacement with the relevanssi_default_punctuation_replacement filter hook.

So, what does that leave for this filter hook? You need this filter hook for the cases where you want to keep some punctuation. Because Relevanssi eventually sweeps away all punctuation (as defined by the regular expression /[[:punct:]]/), the only way to keep a punctuation character is to temporarily convert it to something else and then convert it back. That’s what Relevanssi itself does.

When adding custom functions to this filter hook, use a priority lower than 10 to process the original content and a priority higher than 10 to process the content after the default punctuation removal.

Relevanssi uses this filter hook to support the wildcard searching in Premium. The wildcard symbols (* and ?) are punctuation, so they need to be protected to keep them.

Removing accents

Relevanssi runs remove_accents() on this hook on priority 9. This function strips all the accents from the letters. This behaviour is enabled by default because most database collations ignore accents. The letters a, á, à, â and ä are all considered equal.

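In German locales, remove_accents() converts ä, ö and ü to ae, oe and ue instead of the plain base letters. To get a, o and u, convert the umlauts yourself on priority 8, before remove_accents() runs: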

add_filter( 'relevanssi_remove_punctuation', function( $string ) {
  $chars = array(
    'ü' => 'u',
    'ä' => 'a',
    'ö' => 'o',
    'Ü' => 'U',
    'Ä' => 'A',
    'Ö' => 'O',
  );
  return strtr( $string, $chars );
}, 8 );

In Danish locales, use this:

add_filter( 'relevanssi_remove_punctuation', function( $string ) {
  $chars = array(
    'å' => 'a',
    'ø' => 'o',
    'Å' => 'A',
    'Ø' => 'O',
  );
  return strtr( $string, $chars );
}, 8 );

In Norwegian locales, use this:

add_filter( 'relevanssi_remove_punctuation', function( $string ) {
  $chars = array(
    'å' => 'a',
    'ø' => 'o',
    'æ' => 'ae',
    'Å' => 'A',
    'Ø' => 'O',
    'Æ' => 'AE',
  );
  return strtr( $string, $chars );
}, 8 );

If you use a database collation that does not ignore accents, unhook remove_accents():

// From a theme:
remove_filter( 'relevanssi_remove_punctuation', 'remove_accents', 9 );

// From a plugin:
add_action( 'init', function() {
  remove_filter( 'relevanssi_remove_punctuation', 'remove_accents', 9 );
} );

Keeping punctuation

To keep some punctuation, you must convert it to something else before Relevanssi runs the default punctuation removal and re-convert it afterwards. Here’s how you would keep brackets:

// Priority 9: runs before the default punctuation removal.
add_filter( 'relevanssi_remove_punctuation', 'rlv_keep_brackets_1', 9 );
function rlv_keep_brackets_1( $a ) {
	$a = str_replace( '[', 'OPENINGBRACKET', $a );
	$a = str_replace( ']', 'CLOSINGBRACKET', $a );
	return $a;
}

// Priority 11: runs after the default punctuation removal.
add_filter( 'relevanssi_remove_punctuation', 'rlv_keep_brackets_2', 11 );
function rlv_keep_brackets_2( $a ) {
	$a = str_replace( 'OPENINGBRACKET', '[', $a );
	$a = str_replace( 'CLOSINGBRACKET', ']', $a );
	return $a;
}

On priority 9, before the default process, the first function converts the brackets into keywords. It doesn’t matter what the keywords are, as long as they don’t appear anywhere in your posts; otherwise you get false positives. The keywords also can’t contain any punctuation, because the default process would remove it. On priority 11, after the default process, the second function converts the keywords back to brackets.
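
The round trip can be tried outside WordPress with a minimal sketch like the one below. It is not Relevanssi code: all the demo_* function names are hypothetical, and demo_strip_punctuation() is only a stand-in that assumes the default step replaces every punctuation character with a space. The real relevanssi_remove_punct() does more than that.

```php
<?php
// Stand-alone sketch, runnable without WordPress. All demo_* names are hypothetical.

// Equivalent of the priority 9 step: swap brackets for punctuation-free keywords.
function demo_protect_brackets( $text ) {
	return str_replace( array( '[', ']' ), array( 'OPENINGBRACKET', 'CLOSINGBRACKET' ), $text );
}

// Stand-in for the default priority 10 step. Assumption: each punctuation
// character simply becomes a space.
function demo_strip_punctuation( $text ) {
	return preg_replace( '/[[:punct:]]/', ' ', $text );
}

// Equivalent of the priority 11 step: turn the keywords back into brackets.
function demo_restore_brackets( $text ) {
	return str_replace( array( 'OPENINGBRACKET', 'CLOSINGBRACKET' ), array( '[', ']' ), $text );
}

$input = 'Use [gallery], not (gallery).';

// Without protection, the brackets disappear with the rest of the punctuation.
$unprotected = demo_strip_punctuation( $input );

// With protection, the brackets survive; the parentheses and the comma do not.
$protected = demo_restore_brackets( demo_strip_punctuation( demo_protect_brackets( $input ) ) );

echo $unprotected, "\n";
echo $protected, "\n";
```

The keywords pass through the middle step untouched only because they consist of letters alone, which is why they can’t contain punctuation.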

Keeping periods inside words

You can control decimal periods from the Relevanssi settings, but for periods inside words (as in domain names) you need this filter hook. The function below doesn’t actually keep the periods. It removes them without leaving a space behind, which is functionally the same as keeping them and easier to do. The string "www.example.com" becomes "wwwexamplecom" instead of "www example com", in both the index and the search terms, so the search still matches. Add this to your site and rebuild the index:

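add_filter( 'relevanssi_remove_punctuation', function( $a ) {
    // Remove periods that have a word character on both sides. The pattern
    // runs twice because one pass can skip every other period in strings
    // like "a.b.c", where the matches would overlap.
    $a = preg_replace( '/(\w)\.(\w)/', '\1\2', $a );
    $a = preg_replace( '/(\w)\.(\w)/', '\1\2', $a );
    return $a;
}, 8 );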
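
The function runs the same replacement twice, and the following stand-alone sketch illustrates why. It applies the same pattern a chosen number of times to plain strings, without WordPress; demo_join_periods() is a hypothetical helper name used only for this illustration.

```php
<?php
// Hypothetical helper: applies the period-removing pattern $passes times.
function demo_join_periods( $text, $passes = 2 ) {
	for ( $i = 0; $i < $passes; $i++ ) {
		$text = preg_replace( '/(\w)\.(\w)/', '\1\2', $text );
	}
	return $text;
}

// One pass misses the middle period: the match "a.b" consumes the "b",
// so ".c" has no word character left in front of it.
$one_pass = demo_join_periods( 'a.b.c.d', 1 );   // "ab.cd"

// The second pass picks up what the first one skipped.
$two_passes = demo_join_periods( 'a.b.c.d', 2 ); // "abcd"

// Periods at the end of a sentence are not between two word characters,
// so they are left for the default punctuation removal.
$domain = demo_join_periods( 'Visit www.example.com. Thanks.' );
// "Visit wwwexamplecom. Thanks."

echo $one_pass, "\n", $two_passes, "\n", $domain, "\n";
```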

Removing ordinals

If you want to search for numbers and use whole word matching, ordinal suffixes can be a headache: if you search for “30”, you won’t find “30th”. With partial matching, searching for “30” will find “30th”, but it will also find “300”. This function removes the ordinal suffixes:

add_filter( 'relevanssi_remove_punctuation', 'rlv_ordinals' );
function rlv_ordinals( $word ) {
  // Strip the suffix only when it directly follows a digit and ends the word.
  $word = preg_replace( '/(\d)(?:st|nd|rd|th)\b/', '\1', $word );
  return $word;
}
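
An ordinal pattern is easy to check on sample strings before rebuilding the index. The sketch below runs without WordPress and uses a hypothetical function name, demo_strip_ordinals(). It writes the suffixes as a grouped alternation followed by a word boundary, because square brackets would match only a single character rather than a whole suffix.

```php
<?php
// Hypothetical stand-alone helper for trying out the ordinal pattern.
function demo_strip_ordinals( $text ) {
	// (?:st|nd|rd|th) matches one whole suffix; \b requires the word to end there.
	return preg_replace( '/(\d)(?:st|nd|rd|th)\b/', '\1', $text );
}

$result = demo_strip_ordinals( 'the 1st, 2nd, 3rd and 30th of 300 items' );
// "the 1, 2, 3 and 30 of 300 items"

// The word boundary keeps longer words intact.
$untouched = demo_strip_ordinals( '5thousand' );
// "5thousand"

echo $result, "\n", $untouched, "\n";
```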