Relevanssi can automatically index PDF content for the parent post if the PDF (or other attachment) is attached to the parent post in WordPress. However, that’s not always the case. Sometimes the PDF is attached to the page using an embed, which doesn’t create a connection between the posts in WordPress. Thus, Relevanssi won’t know the PDF is embedded in the post and cannot index the PDF contents for the parent post.
Most PDF embedder plugins use shortcodes to embed the PDF viewer on a page. To get Relevanssi to index the embedded PDF contents for the parent post, you need to establish a connection between the PDF and the post, based on the URL in the shortcode.
The same code works with different PDF embedders; you only have to adjust the regex to match the shortcode used by the plugin.
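As a rough illustration of adjusting the regex per plugin, the pattern can be built from the shortcode name and the attribute that holds the URL. The helper name rlv_shortcode_urls() and the sample content are hypothetical and not part of Relevanssi; this is only a sketch that runs as plain PHP outside WordPress.

```php
<?php
// Hypothetical helper (not part of Relevanssi): returns the values of one
// attribute from every occurrence of a shortcode in the given content.
function rlv_shortcode_urls( $content, $shortcode, $attribute ) {
    $pattern = '/\[' . preg_quote( $shortcode, '/' )
        . '\b[^\]]*?\b' . preg_quote( $attribute, '/' )
        . '=["\'](.*?)["\']/';
    preg_match_all( $pattern, $content, $matches );
    return $matches[1];
}

// Made-up sample content with two different embedder shortcodes.
$sample = 'Intro [pdfjs-viewer url="https://example.com/wp-content/uploads/2023/05/report.pdf" viewer_width=100%] '
    . "and [wonderplugin_pdf src='https://example.com/wp-content/uploads/guide.pdf']";

$pdfjs_urls  = rlv_shortcode_urls( $sample, 'pdfjs-viewer', 'url' );
$wonder_urls = rlv_shortcode_urls( $sample, 'wonderplugin_pdf', 'src' );

print_r( $pdfjs_urls );
print_r( $wonder_urls );
```

Unlike the fixed patterns in the snippets on this page, this sketch also accepts shortcodes where the URL attribute is not the first attribute.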
WordPress Core File block
If you use the default File block from WordPress, this snippet will index the PDF contents for the post where the file is embedded:
add_filter( 'relevanssi_block_to_render', function( $block ) {
    // Skip File blocks without an attachment ID (for example, external files).
    if ( 'core/file' === $block['blockName'] && isset( $block['attrs']['id'] ) ) {
        $file_content = get_post_meta( $block['attrs']['id'], '_relevanssi_pdf_content', true );
        if ( $file_content ) {
            $block['innerContent'][0] = $file_content;
        }
    }
    return $block;
} );
PDF.js Viewer Shortcode
PDF.js Viewer Shortcode uses a shortcode with the file URL in the url parameter.
add_filter( 'relevanssi_content_to_index', 'rlv_pdfjs_content', 10, 2 );
function rlv_pdfjs_content( $content, $post ) {
    $m = preg_match_all( '/\[pdfjs-viewer url=["\'](.*?)["\']/', $post->post_content, $matches );
    if ( $m ) {
        global $wpdb;
        $upload_dir = wp_upload_dir();
        foreach ( $matches[1] as $pdf ) {
            // Convert the URL to the relative path stored in _wp_attached_file.
            $pdf_url     = ltrim( str_replace( $upload_dir['baseurl'], '', urldecode( $pdf ) ), '/' );
            $pdf_content = $wpdb->get_var(
                $wpdb->prepare(
                    "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id IN ( SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s )",
                    $pdf_url
                )
            );
            // The space keeps the text of consecutive PDFs from running together.
            $content .= ' ' . $pdf_content;
        }
    }
    return $content;
}
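The lookup in the snippet above depends on turning the URL from the shortcode into the relative path WordPress stores in the _wp_attached_file custom field. Here is a minimal plain-PHP sketch of that one step. The helper name rlv_relative_upload_path() is hypothetical, and the hard-coded base URL is an assumed sample value standing in for wp_upload_dir()['baseurl'].

```php
<?php
// Hypothetical helper: mirrors the ltrim( str_replace( ... ) ) step of the
// snippet above. $base_url stands in for wp_upload_dir()['baseurl'].
function rlv_relative_upload_path( $pdf_url, $base_url ) {
    return ltrim( str_replace( $base_url, '', urldecode( $pdf_url ) ), '/' );
}

$base_url = 'https://example.com/wp-content/uploads'; // Assumed sample value.

// A URL inside the uploads directory becomes a relative path.
$relative = rlv_relative_upload_path(
    'https://example.com/wp-content/uploads/2023/05/annual%20report.pdf',
    $base_url
);

// A URL that does not start with the base URL is left unchanged.
$external = rlv_relative_upload_path( 'https://cdn.example.org/files/a.pdf', $base_url );

echo $relative . "\n";
echo $external . "\n";
```

If the URL in the shortcode does not begin with the uploads base URL (a different domain, or http instead of https), the value stays a full URL, no attachment matches it, and no PDF content is added.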
PDF Embedder
PDF Embedder uses the same method, so the only change is the name of the shortcode:
add_filter( 'relevanssi_content_to_index', 'rlv_pdfembedder_content', 10, 2 );
function rlv_pdfembedder_content( $content, $post ) {
    $m = preg_match_all( '/\[pdf-embedder url=["\'](.*?)["\']/', $post->post_content, $matches );
    if ( $m ) {
        global $wpdb;
        $upload_dir = wp_upload_dir();
        foreach ( $matches[1] as $pdf ) {
            // Convert the URL to the relative path stored in _wp_attached_file.
            $pdf_url     = ltrim( str_replace( $upload_dir['baseurl'], '', urldecode( $pdf ) ), '/' );
            $pdf_content = $wpdb->get_var(
                $wpdb->prepare(
                    "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id IN ( SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s )",
                    $pdf_url
                )
            );
            // The space keeps the text of consecutive PDFs from running together.
            $content .= ' ' . $pdf_content;
        }
    }
    return $content;
}
If you use the Gutenberg block, the code is different and uses the relevanssi_block_to_render filter hook:
add_filter( 'relevanssi_block_to_render', 'rlv_pdfembedder_block_content', 10 );
// Named differently from the shortcode function so both snippets can coexist.
function rlv_pdfembedder_block_content( $block ) {
    if ( 'pdfemb/pdf-embedder-viewer' === $block['blockName'] && isset( $block['attrs']['pdfID'] ) ) {
        $block['innerContent'] = array( get_post_meta( $block['attrs']['pdfID'], '_relevanssi_pdf_content', true ) );
    }
    return $block;
}
Wonderplugin PDF Embed
Wonderplugin PDF Embed uses a similar method; the URL of the attachment is in the src attribute.
add_filter( 'relevanssi_content_to_index', 'rlv_wonderpdf_content', 10, 2 );
function rlv_wonderpdf_content( $content, $post ) {
    $m = preg_match_all( '/\[wonderplugin_pdf src=["\'](.*?)["\']/', $post->post_content, $matches );
    if ( $m ) {
        global $wpdb;
        $upload_dir = wp_upload_dir();
        foreach ( $matches[1] as $pdf ) {
            // Convert the URL to the relative path stored in _wp_attached_file.
            $pdf_url     = ltrim( str_replace( $upload_dir['baseurl'], '', urldecode( $pdf ) ), '/' );
            $pdf_content = $wpdb->get_var(
                $wpdb->prepare(
                    "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id IN ( SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s )",
                    $pdf_url
                )
            );
            // The space keeps the text of consecutive PDFs from running together.
            $content .= ' ' . $pdf_content;
        }
    }
    return $content;
}
3D Flipbook
3D Flipbook has the flipbook post ID as the shortcode parameter, and you can find the attachment post ID in the post meta for the flipbook post:
add_filter( 'relevanssi_content_to_index', 'rlv_3dflipbook_content', 10, 2 );
function rlv_3dflipbook_content( $content, $post ) {
    $m = preg_match_all( '/\[3d-flip-book.*?id=["\'](.*?)["\']/', $post->post_content, $matches );
    if ( $m ) {
        global $wpdb;
        foreach ( $matches[1] as $flipbook_id ) {
            // The flipbook post meta holds the ID of the PDF attachment.
            $data = get_post_meta( $flipbook_id, '3dfb_data', true );
            if ( empty( $data['post_ID'] ) ) {
                continue;
            }
            $pdf_content = $wpdb->get_var(
                $wpdb->prepare(
                    "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id = %d",
                    $data['post_ID']
                )
            );
            // The space keeps the text of consecutive PDFs from running together.
            $content .= ' ' . $pdf_content;
        }
    }
    return $content;
}
TNC Flipbook
The TNC Flipbook PDF Viewer stores the PDF file name in the tnc_pvfw_pdf_viewer_fields custom field:
add_filter( 'relevanssi_content_to_index', 'rlv_pdfviewer_content', 10, 2 );
function rlv_pdfviewer_content( $content, $post ) {
    global $wpdb;
    $field = get_post_meta( $post->ID, 'tnc_pvfw_pdf_viewer_fields', true );
    if ( isset( $field['file'] ) ) {
        $upload_dir = wp_upload_dir();
        // Convert the URL to the relative path stored in _wp_attached_file.
        $pdf_url     = ltrim( str_replace( $upload_dir['baseurl'], '', urldecode( $field['file'] ) ), '/' );
        $pdf_content = $wpdb->get_var(
            $wpdb->prepare(
                "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id IN ( SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s )",
                $pdf_url
            )
        );
        // The space separates the PDF text from any content added earlier.
        $content .= ' ' . $pdf_content;
    }
    return $content;
}
DearFlip
DearFlip uses a shortcode that points to the flipbook post, which has a custom field with the PDF URL:
add_filter( 'relevanssi_content_to_index', 'rlv_dearflip_content', 10, 2 );
function rlv_dearflip_content( $content, $post ) {
    $m = preg_match_all( '/\[dflip.*?id=["\'](.*?)["\']/', $post->post_content, $matches );
    if ( $m ) {
        global $wpdb;
        $upload_dir = wp_upload_dir();
        foreach ( $matches[1] as $flipbook_id ) {
            // The flipbook post meta holds the URL of the PDF.
            $data = get_post_meta( $flipbook_id, '_dflip_data', true );
            if ( empty( $data['pdf_source'] ) ) {
                continue;
            }
            $pdf_url     = ltrim( str_replace( $upload_dir['baseurl'], '', urldecode( $data['pdf_source'] ) ), '/' );
            $pdf_content = $wpdb->get_var(
                $wpdb->prepare(
                    "SELECT meta_value FROM $wpdb->postmeta WHERE meta_key = '_relevanssi_pdf_content' AND post_id IN ( SELECT post_id FROM $wpdb->postmeta WHERE meta_key = '_wp_attached_file' AND meta_value = %s )",
                    $pdf_url
                )
            );
            // The space keeps the text of consecutive PDFs from running together.
            $content .= ' ' . $pdf_content;
        }
    }
    return $content;
}
Algori PDF Viewer
Algori PDF Viewer uses a Gutenberg block:
add_filter( 'relevanssi_block_to_render', 'rlv_algoriviewer_content', 10 );
function rlv_algoriviewer_content( $block ) {
    if ( 'algori-pdf-viewer/block-algori-pdf-viewer' === $block['blockName'] && isset( $block['attrs']['id'] ) ) {
        $block['innerContent'] = array( get_post_meta( $block['attrs']['id'], '_relevanssi_pdf_content', true ) );
    }
    return $block;
}
PDF Content in Excerpts
To get excerpts from the PDF content, you can use the same function with the relevanssi_excerpt_content filter hook, like this:
add_filter( 'relevanssi_excerpt_content', 'rlv_pdfjs_content', 10, 2 );
This function will include the PDF content for excerpt-building. There’s a performance cost, so test whether including the content slows down the search too much.
One option is to read the PDF content to a custom field in the relevanssi_content_to_index hook and then use the data in the custom field in excerpt-building, which may be faster.
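The custom field approach could look roughly like the sketch below. It assumes the rlv_pdfjs_content() function from earlier on this page is defined but not hooked directly, because these two registrations replace both of its add_filter() calls. The function names rlv_pdfjs_cache_content() and rlv_pdfjs_cached_excerpt() and the custom field name _rlv_embedded_pdf_content are hypothetical. The snippet needs WordPress and Relevanssi to run.

```php
// Assumption: rlv_pdfjs_content() is defined as shown earlier and only
// appends PDF text to the content it receives.
add_filter( 'relevanssi_content_to_index', 'rlv_pdfjs_cache_content', 10, 2 );
function rlv_pdfjs_cache_content( $content, $post ) {
    $with_pdf = rlv_pdfjs_content( $content, $post );
    // Whatever was appended to the original content is the PDF text.
    $pdf_only = trim( (string) substr( $with_pdf, strlen( $content ) ) );
    // Hypothetical custom field name; store the PDF text for later use.
    update_post_meta( $post->ID, '_rlv_embedded_pdf_content', $pdf_only );
    return $with_pdf;
}

add_filter( 'relevanssi_excerpt_content', 'rlv_pdfjs_cached_excerpt', 10, 2 );
function rlv_pdfjs_cached_excerpt( $content, $post ) {
    // One custom field read replaces the regex match and database lookups.
    $cached = get_post_meta( $post->ID, '_rlv_embedded_pdf_content', true );
    return $cached ? $content . ' ' . $cached : $content;
}
```

The custom field is only refreshed when Relevanssi indexes the post, so excerpts use the PDF text as it was at the last indexing. Whether this is actually faster on a given site needs to be measured.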
Good morning, I’m trying to index embedded PDFs as in your tutorial “indexing embedded PDFs for the parent post”, but I’m having some problems. I’m using a Relevanssi permanent access subscription, and my PDF embedder is 3D FlipBook.
I’ve updated the post to include the code for 3D FlipBook. Does that help? If not, please use the support form and tell me how it is not working.
Thank you very much! It works!
Hi there – great plugin. I’m trying to get this to work with PDF Embedder Premium, any chance you could provide the code for this as well?
I have the site PW protected during development. Thanks in advance!
Dave, at least the PDF Embedder free version is straightforward. I’ve added that to the post. If the Premium version does something different, I’d need to know what; I don’t have any access to the Premium version.