Relevanssi is language-agnostic in itself. It does not know any language and doesn’t care about which language the site uses.
However, there are a few things that you need to consider when using Relevanssi in languages other than English.
Characters: use UTF8
As long as your site uses UTF8 characters, Relevanssi can handle just about anything you throw at it – you can even search for emojis. UTF8 is the standard in WordPress, and you generally don’t have to worry about it.
Words: bad news for Chinese and Japanese
While Relevanssi can read Chinese, Japanese and many other characters without problems, the lack of distinct words in these languages is a problem for Relevanssi.
Relevanssi works by splitting the posts into words at spaces and then counting how many times those words appear. Since Chinese and Japanese texts don’t have spaces separating words, Relevanssi can’t do this.
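To make the problem concrete, here is a minimal sketch of splitting at whitespace and counting terms. It is not Relevanssi's actual indexing code, only a toy version of the idea described above, and the function name rlv_demo_count_terms() is made up for this example.

```php
<?php
/**
 * Toy illustration only: splits text at whitespace and counts each token.
 * rlv_demo_count_terms() is a hypothetical name, not a Relevanssi function.
 *
 * @param string $text UTF-8 text.
 *
 * @return array Token => number of occurrences.
 */
function rlv_demo_count_terms( $text ) {
	// Split at any run of whitespace; the "u" flag makes the pattern UTF-8 aware.
	$tokens = preg_split( '/\s+/u', trim( $text ), -1, PREG_SPLIT_NO_EMPTY );
	return array_count_values( $tokens );
}

// English: spaces mark the word boundaries, so counting words is easy.
$english = rlv_demo_count_terms( 'red apple green apple' );
// $english is array( 'red' => 1, 'apple' => 2, 'green' => 1 ).

// Chinese: there are no spaces, so the whole sentence becomes one "word".
$chinese = rlv_demo_count_terms( '我喜欢红苹果' );
// $chinese has a single entry: the full sentence with a count of 1.
```

With only one giant token per sentence, there is nothing meaningful to count, which is why the weights end up being of little use.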
As a result, Relevanssi can search for Chinese and Japanese characters or character sequences, especially if you enable one-character words and inside-word matching in Relevanssi settings. Still, since the weights for the posts are essentially random, the results won’t be of high quality.
Unfortunately, making the search work well in Chinese, Japanese and other languages with similar characteristics requires advanced linguistics and is far beyond our capabilities.
Update 25.11.2020: Matthew Wang has suggested using a Chinese language segmentation tool like phpjieba. If you have the jieba() function installed on your site, you can use it for tokenizing Chinese text like this:
add_filter( 'relevanssi_remove_punctuation', 'rlv_use_jieba' );
function rlv_use_jieba( $string ) {
	// Let jieba split the text into words.
	$string = jieba( $string, 1, 1500 );
	// Join the words with spaces so Relevanssi can tell them apart.
	$string = @implode( ' ', $string );
	return $string;
}
For Japanese, there’s Limelight.
Did you mean suggestions: limited to Latin characters
While Relevanssi can search Arabic, Russian or other non-Latin character sets, the “Did you mean” suggestions in Relevanssi Premium only support Latin characters.
These suggestions work by modifying the search term in different ways, adding or removing letters in it. Relevanssi makes these modifications with the Latin alphabet (mainly the English alphabet, with a few extra umlauts thrown in), which restricts the Premium “Did you mean” feature to text in the Latin alphabet.
The simpler “Did you mean” feature in the free version of Relevanssi should work with most character sets, as it uses the user searches, but it’s less reliable in other ways.
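To illustrate why the alphabet matters for the Premium suggestions, here is a rough sketch of generating candidates by removing one letter or adding one letter from a given alphabet. This is an assumption about the general approach based on the description above, not Relevanssi Premium's real code, and rlv_demo_candidates() is a made-up name.

```php
<?php
/**
 * Toy illustration only: builds spelling candidates by removing one letter
 * or adding one letter from the given alphabet. Hypothetical function, not
 * part of Relevanssi.
 *
 * @param string $word     The search term, UTF-8.
 * @param string $alphabet The letters that may be added, UTF-8.
 *
 * @return array Unique candidate strings.
 */
function rlv_demo_candidates( $word, $alphabet ) {
	// Split both strings into characters in a UTF-8 safe way.
	$chars   = preg_split( '//u', $word, -1, PREG_SPLIT_NO_EMPTY );
	$letters = preg_split( '//u', $alphabet, -1, PREG_SPLIT_NO_EMPTY );
	$length  = count( $chars );

	$candidates = array();
	for ( $i = 0; $i <= $length; $i++ ) {
		$head = implode( '', array_slice( $chars, 0, $i ) );
		$tail = implode( '', array_slice( $chars, $i ) );

		// Remove the letter at position $i.
		if ( $i < $length ) {
			$candidates[] = $head . implode( '', array_slice( $chars, $i + 1 ) );
		}
		// Add each letter of the alphabet at position $i.
		foreach ( $letters as $letter ) {
			$candidates[] = $head . $letter . $tail;
		}
	}
	return array_values( array_unique( $candidates ) );
}

$latin_alphabet = 'abcdefghijklmnopqrstuvwxyz';

// A Latin typo can be repaired by adding a Latin letter.
$latin = rlv_demo_candidates( 'serch', $latin_alphabet );
// 'search' is among the candidates.

// A Cyrillic typo gets no useful candidates from Latin letters...
$cyrillic_latin = rlv_demo_candidates( 'поск', $latin_alphabet );
// ...but does once the alphabet contains Cyrillic letters.
$cyrillic = rlv_demo_candidates( 'поск', 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя' );
// 'поиск' is among the candidates.
```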
Relevanssi has a filter hook relevanssi_didyoumean_alphabet for replacing the alphabet used. Here are some replacement alphabets:
Russian
add_filter( 'relevanssi_didyoumean_alphabet', function() { return 'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'; } );
Arabic
add_filter( 'relevanssi_didyoumean_alphabet', function() { return 'ابتثجحخدذرزسشصضطظعغفقكلمنهويآإأؤئى'; } );
Polish
add_filter( 'relevanssi_didyoumean_alphabet', function() { return 'aąbcćdeęfghijklłmnńoóprsśtuwyzźż'; } );
Vietnamese
add_filter( 'relevanssi_didyoumean_alphabet', function() { return 'aáàâậăằảbcdđeẹêệềghiịklmnoóọôộơớợpqrstuụủưựứvxyỹ'; } );
Hebrew
add_filter( 'relevanssi_didyoumean_alphabet', function() { return 'אבגדהוזחטיכלמנעפצקרשתםןףךץ'; } );
Stemming and suffix stripping
Relevanssi Premium includes a simple English-language stemmer that changes word forms to more basic forms to make the searching less dependent on exact word form.
To enable the English stemmer, add this to your site and rebuild the index:
add_filter( 'relevanssi_stemmer', 'relevanssi_simple_english_stemmer' );
Other languages:
- Finnish: Simple Finnish stemmer.
- French: Simple French plural support.
- German: Simple German stemmer.
- Korean: Korean postposition stripping.
These simple stemmers are not very good, though, so I recommend using a proper Snowball stemmer. It’s available as an add-on plugin and is slightly harder to set up, but the results are better, and the plugin supports over a dozen languages.
Get the Snowball Stemmer add-on plugin here.
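If you are curious what suffix stripping means in practice, here is a deliberately crude sketch. It assumes the relevanssi_stemmer filter passes a single word and expects the stemmed word back, as the registration of relevanssi_simple_english_stemmer above suggests. The function rlv_demo_strip_plural() is a made-up example and is far simpler than any of the real stemmers.

```php
<?php
/**
 * Toy suffix stripper for illustration; far cruder than a real stemmer.
 * rlv_demo_strip_plural() is a hypothetical name.
 *
 * @param string $term A single lowercase word.
 *
 * @return string The word with a plural-looking ending removed.
 */
function rlv_demo_strip_plural( $term ) {
	// Leave short words alone so that "is" or "bus" are not mangled.
	if ( strlen( $term ) < 4 ) {
		return $term;
	}
	// "stories" -> "story".
	if ( substr( $term, -3 ) === 'ies' ) {
		return substr( $term, 0, -3 ) . 'y';
	}
	// "cats" -> "cat", but keep "glass" as it is.
	if ( substr( $term, -1 ) === 's' && substr( $term, -2 ) !== 'ss' ) {
		return substr( $term, 0, -1 );
	}
	return $term;
}

// Register the function only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'relevanssi_stemmer', 'rlv_demo_strip_plural' );
}
```

Because the same function handles both the indexed words and the search terms, a search for "cats" and a post containing "cat" meet at the same stem. As with the other stemmers, the index needs a rebuild after a change like this.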
Arabic diacritics
You can improve the Relevanssi Arabic support by removing diacritics with this function. Add this to your site:
add_filter( 'relevanssi_remove_punctuation', 'rlv_arabic_remap', 9 );
/**
 * Remove Arabic diacritics.
 *
 * @param string $a The text to remove punctuation from.
 *
 * @return string The same text with punctuation and diacritics removed.
 */
function rlv_arabic_remap( $a ) {
	$remap = array(
		'إ' => 'ا',
		'آ' => 'ا',
		'أ' => 'ا',
		'ئ' => 'ى',
		'ة' => 'ه',
		'ؤ' => 'و',
		'ـ' => '',
		'آ' => 'ا',
	);
	$diacritics = array(
		'~[\x{0600}-\x{061F}]~u',
		'~[\x{063B}-\x{063F}]~u',
		'~[\x{064B}-\x{065E}]~u',
		'~[\x{066A}-\x{06FF}]~u',
	);
	$a = preg_replace( $diacritics, '', $a );
	$a = str_replace( array_keys( $remap ), array_values( $remap ), $a );
	return $a;
}
After adding the code, make sure you rebuild the index. This function will remove the diacritics and map some characters to their simpler forms in the index and user searches, enabling the search to find more results.
Hi guys, please help me figure this out. I use your plugin (free version) on my site, which has two languages, English and Estonian. For some reason the search only finds English content, and I can’t find the reason. The translation plugin is TranslatePress – Multilingual. Please point me to where to look. Thank you very much!
Aleksandr, Relevanssi can’t work with TranslatePress. It uses a method that is unfortunately incompatible with Relevanssi.
Hi, I would like to know if there is a detailed tutorial for the Chinese Segmentation Tool? I don’t quite understand how to install and use the phpjieba tool. A detailed way of using it would be much appreciated!
Harry, unfortunately not. It’s a complex tool, and I don’t have the resources to produce a detailed tutorial for it at the moment.
Hey Mikko;
We use the Persian language. It uses the Arabic characters plus four more.
Is there any way for Relevanssi to guess the Persian equivalent of an English character and vice versa?
Especially with names and brands, people search either in Persian or in English.
For example, SAMSUNG is written as “سامسونگ” in Persian:
S=س
A=ا
M=م
S=س
U=و
N=ن
G=گ
and for a sentence they can search like:
SAMSUNG mobile=
موبایل سامسونگ =
موبایل SAMSUNG
Is there any way for Relevanssi to understand these equivalent characters?
Or should I make a synonym for every brand and name in the plugin settings?
Yes, it’s possible – you could have Relevanssi transliterate everything so all the content would be indexed in Latin or Persian characters. I suppose in your case, it would make more sense to transliterate Latin characters to Persian.
You can find examples of such transliteration functions in the relevanssi_remove_punctuation documentation. If you create a function that converts each Latin character to the matching Persian character, your search will work in Persian and Latin.
For the SAMSUNG example above I should use this:
add_filter( 'relevanssi_remove_punctuation', function( $string ) {
	$chars = array(
		'S' => 'س',
		'A' => 'ا',
		'M' => 'م',
		'U' => 'و',
		'N' => 'ن',
		'G' => 'گ',
		'س' => 'S',
		'ا' => 'A',
		'م' => 'M',
		'و' => 'U',
		'ن' => 'N',
		'گ' => 'G',
	);
	return strtr( $string, $chars );
}, 8 );
Is it right?
And can I assign one character to more than one? Like:
'S' => 'س',
'S' => 'ث',
'S' => 'ص',
Because all these characters are pronounced as S in English.
And what’s that number “8” at the end of the code?
I appreciate your help.
Do it one way only. So in this case, you’d do 'S' => 'س', but not 'س' => 'S'. You can only map each letter to one letter; after the first replacement is made, there are no S characters remaining in the text, so the other replacements won’t happen.
The number 8 is the priority of the filter function. 8 is good there.
You mean if I add
“‘S’ => ‘س’”
in the filter, it will work in both ways, I mean:
If there is a word in text with “س” , plugin will see it as a “S”
If there is a word in text with “S” , plugin will see it as a “س”
Is it right?
No, it means that if there’s an “S” in the text, the plugin will see it as “س”, and if there’s an “S” in the search terms, it will also be seen as “س”. This way it’ll work out fine: it doesn’t matter which form is in the text and in the search terms, internally it’s all Persian.
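A minimal sketch of that one-way approach could look like this. The letter table is just the small subset from the SAMSUNG example, the function name rlv_demo_latin_to_persian() is made up, and mapping lowercase letters as well is my assumption that the text may arrive in either case.

```php
<?php
/**
 * One-way Latin-to-Persian mapping, applied to indexed text and search
 * terms alike. The table is an illustrative subset, and
 * rlv_demo_latin_to_persian() is a hypothetical name.
 *
 * @param string $string Text from a post or from the search query.
 *
 * @return string The text with the mapped Latin letters replaced.
 */
function rlv_demo_latin_to_persian( $string ) {
	$chars = array(
		'S' => 'س',
		'A' => 'ا',
		'M' => 'م',
		'U' => 'و',
		'N' => 'ن',
		'G' => 'گ',
	);
	// Assumption: the text may arrive in either case, so map lowercase too.
	foreach ( $chars as $latin => $persian ) {
		$chars[ strtolower( $latin ) ] = $persian;
	}
	return strtr( $string, $chars );
}

// Register the function only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'relevanssi_remove_punctuation', 'rlv_demo_latin_to_persian', 8 );
}

// Both spellings end up identical, so they match each other.
$from_latin   = rlv_demo_latin_to_persian( 'SAMSUNG' );
$from_persian = rlv_demo_latin_to_persian( 'سامسونگ' );
```

Since the filter affects what goes into the index, rebuild the index after adding something like this.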
I’ve added this snippet:
add_filter( ‘relevanssi_remove_punctuation’, function( $string ) {
$chars = array(
‘A’ => ‘ا’,
‘B’ => ‘ب’,
‘C’ => ‘س’,
‘D’ => ‘د’,
‘E’ => ‘ی’,
‘F’ => ‘ف’,
‘G’ => ‘ج’,
‘H’ => ‘ح’,
‘I’ => ‘ی’,
‘J’ => ‘ج’,
‘K’ => ‘ک’,
‘L’ => ‘ل’,
‘M’ => ‘م’,
‘N’ => ‘ن’,
‘O’ => ‘و’,
‘P’ => ‘پ’,
‘Q’ => ‘ک’,
‘R’ => ‘ر’,
‘S’ => ‘س’,
‘T’ => ‘ت’,
‘U’ => ‘ئ’,
‘V’ => ‘و’,
‘W’ => ‘و’,
‘X’ => ‘ز’,
‘Y’ => ‘ی’,
‘Z’ => ‘ز’,
);
return strtr( $string, $chars );
}, 8 );
And my WP got down with this error in error_log:
[10-Oct-2024 05:43:24 UTC] PHP Fatal error: Uncaught Error: Undefined constant “‘relevanssi_remove_punctuation’” in /functions.php:207
Stack trace:
#0 /wp-settings.php(668): include()
#1 /wp-config.php(113): require_once(‘/home/kazemibi/…’)
#2 /wp-load.php(50): require_once(‘/home/kazemibi/…’)
#3 /wp-blog-header.php(13): require_once(‘/home/kazemibi/…’)
#4 //index.php(17): require(‘/home/kazemibi/…’)
#5 {main}
thrown in /functions.php on line 207
Replace the curly quotes (‘ ’) around relevanssi_remove_punctuation with plain apostrophes ('). The same goes for the other quotes in the snippet.
How can I make these characters work? (Because of the special characters, I get a PHP error.)
‘,’ => ‘و’,
‘;’ => ‘ک’,
‘’’ => ‘گ’,
‘[’ => ‘ج’,
‘]’ => ‘چ’,
‘\’ => ‘پ’,
Persian keyboard map is like: https://kbdlayout.info/KBDFA/
I want to replace these characters so that the search still makes sense when a user types Persian with the English keyboard layout active.
Reza, what’s the exact error? In general, you don’t need to make punctuation work – it’s ignored in searching anyway, so you should instead remove the matching Persian characters.
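As a side note on the PHP error itself: one likely cause, assuming the snippet was pasted as shown, is that an apostrophe and a backslash inside a single-quoted PHP string must each be escaped with a backslash. Here is a small sketch of the same table with valid quoting. The function name rlv_demo_keyboard_map() is made up, and whether mapping punctuation is worth doing at all is a separate question.

```php
<?php
/**
 * Maps a few keys of the English layout to the Persian letters on the same
 * keys. rlv_demo_keyboard_map() is a hypothetical name.
 *
 * @param string $string Text typed with the wrong keyboard layout.
 *
 * @return string The text with the listed keys replaced.
 */
function rlv_demo_keyboard_map( $string ) {
	$chars = array(
		','  => 'و',
		';'  => 'ک',
		'\'' => 'گ', // Apostrophe, escaped as \'.
		'['  => 'ج',
		']'  => 'چ',
		'\\' => 'پ', // Backslash, escaped as \\.
	);
	return strtr( $string, $chars );
}

// The input here is an apostrophe, a semicolon and a backslash.
$mapped = rlv_demo_keyboard_map( "';\\" );
```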