
Indexing attachment file names

Relevanssi has been working nicely for the normal use case. But how does one set up indexing of attachment files? When someone searches by a file name or an extension like pdf, there are no results. I have enabled ‘attachment’ in the Relevanssi ‘Indexing options’ and still there are no results.

Relevanssi doesn’t index attachment file names. For attachments, Relevanssi indexes title and description.

However, adding the file name to the Relevanssi index is simple using the relevanssi_content_to_index filter hook. Add this function to your site and rebuild the index:

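// Append the attachment file name to the content Relevanssi indexes.
add_filter( 'relevanssi_content_to_index', 'rlv_add_filenames', 10, 2 );
function rlv_add_filenames( $content, $post ) {
    if ( 'attachment' === $post->post_type ) {
        // basename() keeps only the last part of the GUID, e.g. "sample.pdf".
        $content .= ' ' . basename( $post->guid );
    }
    return $content;
}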

Notice that by default Relevanssi replaces periods with spaces, so “sample.pdf” is indexed as “sample pdf”. That should not be a problem, because the same replacement is applied to search queries: a search for “sample.pdf” also becomes “sample pdf”.

However, if you are using OR search, this can lead to lots of useless results, because a search for one PDF file will match every PDF file through the shared “pdf” term.
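
If the shared “pdf” term is a problem, one option is to index the file name without its extension. The following is a minimal sketch, not something taken from the Relevanssi documentation: the function name rlv_add_filenames_without_extension is hypothetical, and it assumes, like the function above, that $post->guid ends with the attachment's file name. It is meant to replace the function above, not to run alongside it.

```php
// Sketch: index "sample" instead of "sample.pdf" so the extension does not
// become a search term shared by every PDF attachment.
// The function name is hypothetical; use it instead of rlv_add_filenames.
add_filter( 'relevanssi_content_to_index', 'rlv_add_filenames_without_extension', 10, 2 );
function rlv_add_filenames_without_extension( $content, $post ) {
    if ( 'attachment' === $post->post_type ) {
        // PATHINFO_FILENAME returns the base name without its last extension.
        $content .= ' ' . pathinfo( $post->guid, PATHINFO_FILENAME );
    }
    return $content;
}
```

The trade-off is that a search for “pdf” no longer finds attachments by their extension. As with the original function, rebuild the index after adding it.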

Originally asked here.

7 comments on “Indexing attachment file names”

  1. I have a post type named: asset
    It has a custom field named: attachment-asset
    So, how do I index by the filename of the files attached in the custom field?
    Is it possible in the free version?

    1. Use the relevanssi_content_to_index filter hook as described on the page. It works in the free version as well. Exactly how to get the filename from your custom field depends on how the data is stored, so I can’t comment on that.

  2. Is there a way to do the opposite? It looks like it’s now the default behavior… I was searching for “2020” and it finds all the files uploaded in 2020.

    It looks a bit weird, because the search result looks like this:

    …standard no edgtf-masonry-image-square no standard edgtf-masonry-image-square no social_networks standard edgtf-masonry-image-square no standard edgtf-masonry-image-square no standard edgtf-masonry-image-square no standard edgtf-masonry-image-square no no no no no sidebar-33-right yes https://www.clientdomain.ca/wp-content/uploads/2020/11/takeno.jpg no yes header-bottom…

    It’s set to expand shortcodes and all, but since the URL is in the shortcode, I found it to be problematic.

    1. It’s not the default behaviour, but it looks like your site is indexing garbage data. Are you perhaps indexing all custom fields and including meta fields? If not, you probably need to use filters to clean up the contents of your posts before indexing.

      1. My website relies heavily on ACF, so yes, I’m indexing meta fields. Do you have any clues or hints for me on how to get that fixed?

      2. My setting for Custom fields is “Visible”. If I choose “Some”, I need to know the field name of every ACF field I want to index, and that’s fine, but I use one repeater field, so I think that will be impossible, right? (It can have 1 or 2 sub fields; can I just add fieldname_0_subfieldname and fieldname_1_subfieldname?)

        1. “Visible” should exclude the ACF metadata fields (those start with an underscore and thus are not visible). If you want to list specific fields, Relevanssi supports the % notation, so you can list fieldname_%_subfieldname to capture all repeater fields.

          The search result looks like it contains CSS classes; those in general should be stripped out, but it may be your data is in a format where the HTML doesn’t get removed correctly and the CSS classes remain.

          You can filter out the content with relevanssi_excerpt_content and relevanssi_custom_field_value to make sure unwanted parts of the content don’t end up in the excerpts and the index.
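
The cleanup mentioned in the last reply could look something like the sketch below. It assumes that the relevanssi_custom_field_value filter passes the field values as an array of strings in its first argument. The function name rlv_strip_urls_from_fields is hypothetical, and removing URLs is only one example of cleanup, chosen because the upload URLs are what made the “2020” search match.

```php
// Sketch: remove URLs from custom field values before Relevanssi uses them.
// The function name is hypothetical.
add_filter( 'relevanssi_custom_field_value', 'rlv_strip_urls_from_fields' );
function rlv_strip_urls_from_fields( $values ) {
    // Assumption: $values is an array of field values; leave anything else alone.
    if ( ! is_array( $values ) ) {
        return $values;
    }
    foreach ( $values as $key => $value ) {
        if ( is_string( $value ) ) {
            // Replace anything that looks like an http(s) URL with a space.
            $values[ $key ] = preg_replace( '~https?://\S+~i', ' ', $value );
        }
    }
    return $values;
}
```

Rebuild the index after adding a filter like this so that the cleaned values replace the ones already indexed.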
