Multisite Apache Solr Search with Domain Access

Using one Apache Solr search core with more than one Drupal website isn't too difficult; you simply use a module like Apache Solr Multisite Search, or a technique like the one mentioned in Nick Veenhof's post, Let's talk Apache Solr Multisite. This kind of technique can save you time (and even money!), since you can use one Hosted Apache Solr subscription for multiple sites. The only caveat: any site using the Solr core can see every other site's content (which shouldn't be a problem if you control all the sites and don't expose private data through Solr).

There are two ways to make Apache Solr Search Integration work with Domain Access (one of which works similarly to the methods mentioned above for multisite), and which method you use depends on how your site's content is structured.

Solr Search with Domain Access - Siloed Content

If the content you are indexing and searching is unique per domain (just like it would be unique per multisite Drupal instance), then you can set up Domain Access to index content with a different Apache Solr hash per site, like so:

First, in a custom module, use hook_domain_batch() to tell Domain Access to add variable configuration for the apachesolr_site_hash variable per-domain (this requires the Domain Configuration module to be enabled, as well as, obviously, the Apache Solr module):

<?php
/**
 * Implements hook_domain_batch().
 *
 * Adds batch settings to Domain Configuration for Apache Solr Search.
 */
function MODULENAME_domain_batch() {
  $batch = array();
  // Only offer the setting when the Apache Solr module is available.
  if (function_exists('apachesolr_site_hash')) {
    $batch['apachesolr_site_hash'] = array(
      '#form' => array(
        '#title' => t('Apache Solr site hash'),
        '#type' => 'textfield',
        '#description' => t('Unique site hash for Apache Solr.'),
      ),
      // Store the value per domain through Domain Configuration.
      '#domain_action' => 'domain_conf',
      '#permission' => 'administer site configuration',
      // Default value: the site-wide Apache Solr hash.
      '#system_default' => apachesolr_site_hash(),
      '#meta_description' => t('Set Apache Solr site hash for each domain.'),
      '#variable' => 'apachesolr_site_hash',
      '#data_type' => 'string',
      '#weight' => -1,
      '#group' => t('Apache Solr'),
      '#update_all' => TRUE,
      '#module' => t('Apache Solr'),
    );
  }
  return $batch;
}
?>


Second, visit /admin/structure/domain/batch/apachesolr_site_hash and enter a different hash for each domain.
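
If you'd rather not invent the hashes by hand, a small standalone PHP script can print one per domain. This is only a sketch, not part of the Apache Solr or Domain Access modules: the function name is hypothetical, the domain IDs are examples, and it assumes a short lowercase hex string is acceptable as a hash (the module's own site hashes are also short alphanumeric strings).

```php
<?php
// Hypothetical helper: generate one short, unique hash per domain ID, to be
// pasted into the Domain Configuration batch form. Not module code.
function hypothetical_generate_domain_hashes(array $domainIds, int $length = 6): array {
  $hashes = [];
  foreach ($domainIds as $id) {
    do {
      // bin2hex() only emits [0-9a-f], so the value is safe as a filter term.
      $hash = substr(bin2hex(random_bytes($length)), 0, $length);
    } while (in_array($hash, $hashes, TRUE)); // Retry on the rare collision.
    $hashes[$id] = $hash;
  }
  return $hashes;
}

// Example: three domains with IDs 1, 2 and 3 (random output each run).
$hashes = hypothetical_generate_domain_hashes([1, 2, 3]);
foreach ($hashes as $id => $hash) {
  echo "Domain $id: $hash\n";
}
```

Because the values are random, run it once and copy the results into the form rather than regenerating them later; changing a domain's hash means reindexing that domain's content.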

Third, use hook_apachesolr_query_alter() to alter Solr queries so they search using the site-specific hash:

<?php
/**
 * Implements hook_apachesolr_query_alter().
 */
function MODULENAME_apachesolr_query_alter($query) {
  // Get the current domain.
  $domain = domain_get_domain();
  $hash = domain_conf_variable_get($domain['domain_id'], 'apachesolr_site_hash');

  // Add the current domain's apachesolr site hash to the query.
  $query->addFilter('hash', $hash);
}
?>


At this point, if you reindex all your content on all your domains, each domain will find only its own content. (This method was discussed in this issue in Domain Access's issue queue.)

Problem: Single node, multiple domains

There's a major issue that I've seen a few times with this setup: what if a node (or many nodes) is published to multiple domains? In this case, the content will show up only when searching on the domain where Solr indexing was run first. So, if a piece of content is published to domain A and domain B, but Solr indexes the node on domain A, the content won't show in results for domain B, because the Apache Solr site hash stored for that content is domain A's hash.
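
To make the failure concrete, here is a tiny in-memory model in plain PHP (no Drupal or Solr involved; the function name, domain labels and hash values are made up for illustration). Each indexed document carries a single hash, so a node shared by domains A and B but indexed on A can only ever match A's filter:

```php
<?php
// Toy model of the siloed approach (plain PHP, not Drupal or Solr code).
// Each indexed document carries exactly one hash: the hash of the domain
// on which it was indexed. All names and values here are hypothetical.

// Mimics the hash filter added in hook_apachesolr_query_alter(): keep only
// documents whose hash equals the current domain's hash.
function hypothetical_search_by_hash(array $index, string $hash): array {
  $nids = [];
  foreach ($index as $doc) {
    if ($doc['hash'] === $hash) {
      $nids[] = $doc['nid'];
    }
  }
  return $nids;
}

$domainHashes = ['A' => 'aaa111', 'B' => 'bbb222'];

$index = [
  // Node 1 is published to domains A and B, but was indexed on domain A,
  // so it is stored with A's hash only.
  ['nid' => 1, 'hash' => $domainHashes['A']],
  // Node 2 is published only to domain B (and indexed there).
  ['nid' => 2, 'hash' => $domainHashes['B']],
];

echo 'Domain A finds: ' . implode(', ', hypothetical_search_by_hash($index, $domainHashes['A'])) . "\n";
echo 'Domain B finds: ' . implode(', ', hypothetical_search_by_hash($index, $domainHashes['B'])) . "\n";
// Prints "Domain A finds: 1" then "Domain B finds: 2"; node 1 never shows up on B.
```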

So, to avoid this issue, we can't actually use Apache Solr's site hash when indexing nodes (or at least, we can't only use it). Instead, we need to add an array of assigned domains for each document in Apache Solr's index, and use that array to filter search results when searching on each individual domain.
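
The same toy model, switched to storing a list of domain IDs per document, shows why this fixes the shared-content case. Again this is plain PHP with hypothetical names and data, not the Solr query itself; in Solr the list becomes a multi-valued field and the membership test becomes a filter query.

```php
<?php
// Toy model of the multi-valued approach (plain PHP, not Drupal or Solr code).
// Each document stores every domain ID it is assigned to; a search on a
// domain keeps the documents whose list contains that domain's ID.
function hypothetical_search_by_domain(array $index, int $domainId): array {
  $nids = [];
  foreach ($index as $doc) {
    if (in_array($domainId, $doc['domain_ids'], TRUE)) {
      $nids[] = $doc['nid'];
    }
  }
  return $nids;
}

$index = [
  // Node 1 is shared by domains 1 and 2. It no longer matters which domain
  // it was indexed on, because both IDs are stored with the document.
  ['nid' => 1, 'domain_ids' => [1, 2]],
  // Node 2 is published only to domain 2.
  ['nid' => 2, 'domain_ids' => [2]],
];

echo 'Domain 1 finds: ' . implode(', ', hypothetical_search_by_domain($index, 1)) . "\n"; // Domain 1 finds: 1
echo 'Domain 2 finds: ' . implode(', ', hypothetical_search_by_domain($index, 2)) . "\n"; // Domain 2 finds: 1, 2
```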

Solution: Adding domain access info to the index for shared content

The fix involves three parts:

First, when indexing a document in Solr, we need to add domain information to the index so we can filter our queries with it later. We'll do this with hook_apachesolr_index_document_build():

<?php
/**
 * Implements hook_apachesolr_index_document_build().
 */
function MODULENAME_apachesolr_index_document_build(ApacheSolrDocument $document, $entity, $entity_type, $env_id) {
  if (isset($entity->domains)) {
    // Build an apachesolr-compatible domain search index key. For a
    // multi-valued integer field this yields 'im_domain_id'.
    $key = apachesolr_index_key(array(
      'name' => 'domain_id',
      'multiple' => TRUE,
      'index_type' => 'integer',
    ));

    foreach ($entity->domains as $domain) {
      // The gid in the {domain} table is unsigned, but the domain module makes
      // it -1 for the default domain. Also, Solr doesn't like it if we query
      // for domain ID -1.
      if ($domain == -1) {
        $domain = 0;
      }

      // Add the domain ID to the document's multi-valued domain field.
      $document->setMultiValue($key, $domain);
    }
  }
}
?>

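
The ID handling inside that loop can be pulled out and checked in isolation. The sketch below is a hypothetical standalone helper, assuming $entity->domains holds domain IDs as its values and uses -1 for the default domain (as the code comment above states); unlike the hook, it also drops duplicate IDs. Its return values are what end up in the im_domain_id field, a name apachesolr_index_key() builds from the integer prefix "i", the multi-valued marker "m_" and the field name.

```php
<?php
// Hypothetical helper mirroring the loop in the indexing hook: turn an
// entity's domain assignments into the integers stored in im_domain_id.
function hypothetical_domain_index_values(array $domains): array {
  $values = [];
  foreach ($domains as $domain) {
    $id = (int) $domain;
    // Map the default domain's -1 to 0, as the original hook does.
    if ($id === -1) {
      $id = 0;
    }
    $values[] = $id;
  }
  // Store each domain ID once and return a plain, zero-indexed list.
  return array_values(array_unique($values));
}

// Example: a node on the default domain (-1) and on domains 2 and 3.
echo implode(', ', hypothetical_domain_index_values([-1 => -1, 2 => 2, 3 => 3])) . "\n"; // 0, 2, 3
```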

Second, we need to filter the search query sent to Apache Solr using hook_apachesolr_query_alter() so it filters based on the domain where the search is being performed:

<?php
/**
 * Implements hook_apachesolr_query_alter().
 */
function MODULENAME_apachesolr_query_alter($query) {
  // Add domain key to filter all queries.
  $domain = domain_get_domain();
  $query->addParam('fq', 'im_domain_id:' . $domain['domain_id']);
}
?>

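
For a sense of what this filter adds to the request, here is a hypothetical standalone function that assembles the corresponding Solr query-string parameters. The apachesolr module builds and sends the real request itself, with many more parameters, so treat this only as an illustration of how the domain filter is encoded. It also applies the -1 to 0 mapping from the indexing hook on the query side; that is a defensive assumption added here, since the hook above uses the domain ID as-is.

```php
<?php
// Hypothetical helper: encode a search keyword plus the domain filter query
// as Solr request parameters. Illustration only, not apachesolr module code.
function hypothetical_solr_query_string(string $keys, int $domainId): string {
  // Assumption: mirror the index-time mapping of the default domain's -1.
  if ($domainId === -1) {
    $domainId = 0;
  }
  // A list of [name, value] pairs, so repeated 'fq' entries stay possible.
  $params = [
    ['q', $keys],
    ['fq', 'im_domain_id:' . $domainId],
  ];
  $pairs = [];
  foreach ($params as [$name, $value]) {
    $pairs[] = $name . '=' . rawurlencode($value);
  }
  return implode('&', $pairs);
}

echo hypothetical_solr_query_string('drupal hosting', 2) . "\n";
// q=drupal%20hosting&fq=im_domain_id%3A2
```

Building the string by hand leaves room for several fq parameters, which Solr accepts; http_build_query() would instead encode an array of filters with bracketed keys like fq[0].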

Third, we need to change the 'url' passed into the search result, so it's a relative URL that will work on all your domains (by default, Apache Solr seems to use an absolute URL to the domain on which the node was indexed, meaning some links will send users from one domain to another!). You can do this by implementing hook_preprocess_search_result() in your theme, which runs after template_preprocess_search_result() has built the default variables (or in a custom module, substituting MODULENAME for THEMENAME below):

<?php
/**
 * Implements hook_preprocess_search_result().
 */
function THEMENAME_preprocess_search_result(&$vars) {
  // Create a relative link to the node, so it points to the correct domain.
  $nid = $vars['result']['node']->entity_id;
  $vars['url'] = check_plain(url('node/' . $nid));
}
?>

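
What the override accomplishes can be seen with a plain-PHP stand-in: take an absolute result URL pointing at whichever domain indexed the node, and keep only the path, query and fragment, so the link resolves on the domain the visitor is browsing. The helper name is hypothetical, and in Drupal url('node/' . $nid) already produces the relative path, so this is only an illustration; htmlspecialchars() with ENT_QUOTES and UTF-8 mirrors what check_plain() does.

```php
<?php
// Hypothetical helper: reduce an absolute URL to a root-relative one and
// escape it for HTML output. Illustration only, not Drupal code.
function hypothetical_relative_url(string $absoluteUrl): string {
  $parts = parse_url($absoluteUrl);
  // Drop scheme and host; keep the path (default "/"), query and fragment.
  $relative = $parts['path'] ?? '/';
  if (isset($parts['query'])) {
    $relative .= '?' . $parts['query'];
  }
  if (isset($parts['fragment'])) {
    $relative .= '#' . $parts['fragment'];
  }
  // Escape for HTML, the same way check_plain() does.
  return htmlspecialchars($relative, ENT_QUOTES, 'UTF-8');
}

echo hypothetical_relative_url('http://a.example.com/node/42') . "\n"; // /node/42
echo hypothetical_relative_url('http://b.example.com/node/7?page=1&sort=asc') . "\n";
// /node/7?page=1&amp;sort=asc
```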

When all this is in place, reindex all your content (across all domains) before expecting everything to work correctly. After that, you'll have multi-domain Apache Solr search working with shared content. Nice!

The inspiration for this method of filtering Solr content across multiple domains comes from the Domain Access Solr Facet module, which doesn't yet have a Drupal 7 release but is relatively simple and has a patch for the D7 port in the issue queue.

Comments

Hi Jeff, this is a great article, thanks. Is there any way it would work for Drupal 6? The Solr domain facet module doesn't seem to be compatible with version 6.3 of the Apache Solr module. Any help you could give would be greatly appreciated. My situation is nodes that have been assigned to multiple domains (sometimes using the publish to all affiliates option).
Thanks in advance
Adam

Thanks for this. Very clear and just what I needed.

Is domain_solr.module (now in version 7.x) a replacement for the solution described here?

That module could do most of what's suggested in this post, yes. For some situations, you'll still need to set it up in custom code, but that module may be all that's required in simpler use cases.

Hi Jeff, thanks for all the great content. I know you have moved away from Drupal, but you still have lots of great resources for Drupal people. Unfortunately, the site's code formatting seems to have broken recently. This is more of a heads-up than anything. Keep up the good new content.

Eek, sorry about that! I had to remove the php filter module in my D8 upgrade, so PHP code blocks got messed up. I try to fix those when I notice them but don't always see them :(