Solr for two sites running Drupal 6 Search on Tomcat 6 / CentOS 6

Note: this tutorial sets up two separate Solr applications in Tomcat, not a multi-core setup in one Java application.

ApacheSolr for Drupal 6 improves on the out-of-the-box search experience for Drupal users. The easiest way to get Solr running on your Drupal web site is to use the hosted service provided by Acquia; it is way easier than running your own Solr. You simply point your queries to their Solr server and you’re done.

For various reasons, you might want to run your own Solr web service on your own machine. In this article, I will walk you through setting up a working Solr installation using Tomcat 6 on CentOS 6. The end result of this walkthrough will be two separate Solr indexes (via two separate Solr web apps) for two different web sites running on a single Tomcat. I will assume that you are using Acquia’s Drupal (which ships with SolrPHPClient).

Warning: This article assumes all services are on a single machine (suitable for a small organization). Running Solr on a separate machine is possible but raises security implications that are outside the scope of this article.

These are the tasks that we will work on:

  1. Set-up Solr
  2. Set-up Tomcat
  3. Tweak CentOS security thinger (SELinux)
  4. Configure Acquia Drupal

Prerequisites

The prerequisites are:

  • CentOS 6 Web Server w/ PHP 5.3, MySQL 5, Tomcat 6, Java 6 (all services running w/ no problemos)
  • Acquia Drupal 6 installed
  • Familiarity with Drupal (basic skills – enabling modules, setting permissions on nodes, etc)
  • Familiarity with Java & Tomcat (basic skills)
  • Familiarity working with Linux in a terminal and vi (intermediate skills)

This is my system (a web server set-up with Anaconda):

# uname -a
Linux templeton.localdomain 2.6.32-71.29.1.el6.i686 #1 SMP Mon Jun 27 18:07:00 BST 2011 i686 i686 i386 GNU/Linux
# cat /etc/redhat-release
CentOS Linux release 6.0 (Final)
# yum list installed | grep mysql-server
mysql-server.i686 5.1.52-1.el6_0.1 @updates
# yum list installed | grep php
php.i686 5.3.2-6.el6_0.1 @updates
php-cli.i686 5.3.2-6.el6_0.1 @updates
php-common.i686 5.3.2-6.el6_0.1 @updates
php-gd.i686 5.3.2-6.el6_0.1 @updates
php-mysql.i686 5.3.2-6.el6_0.1 @updates
php-pdo.i686 5.3.2-6.el6_0.1 @updates
php-pear.noarch 1:1.9.0-2.el6 @anaconda-centos-201106051823.i386/6.0
php-xml.i686 5.3.2-6.el6_0.1 @updates
# java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.31.b17.el6_0-i386)
OpenJDK Client VM (build 14.0-b16, mixed mode)
# yum list installed | grep tomcat6
tomcat6.noarch 6.0.24-24.el6_0 @updates
tomcat6-el-2.1-api.noarch
tomcat6-jsp-2.1-api.noarch
tomcat6-lib.noarch 6.0.24-24.el6_0 @updates
tomcat6-servlet-2.5-api.noarch
# /sbin/service tomcat6 status
tomcat6 (pid 1790) is running... [ OK ]
# sestatus
SELinux status: enabled
SELinuxfs mount: /selinux
Current mode: enforcing
Mode from config file: enforcing
Policy version: 24
Policy from config file: targeted

Notice the hashmark (#) as my terminal prompt. It denotes that I am executing all these commands as root (use ‘su -’). You can also prefix the following commands with ‘sudo’.

Download Solr

Obtain a copy of the Solr tarball from a nearby mirror:

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Select Solr 1.4.1 or the latest recommended Solr:

e.g. http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/

I’m using the 54M GZipped Tarball and downloading it using wget:

# wget http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/apache-solr-1.4.1.tgz 
--2011-09-02 02:06:05-- http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
Resolving apache.sunsite.ualberta.ca... 129.128.5.190
Connecting to apache.sunsite.ualberta.ca|129.128.5.190|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56374837 (54M) [application/x-tar]
Saving to: “apache-solr-1.4.1.tgz”
100%[=============================================>] 56,374,837 261K/s in 6m 20s
2011-09-02 02:12:42 (145 KB/s) - “apache-solr-1.4.1.tgz” saved [56374837/56374837]
# tar zxvf apache-solr-1.4.1.tgz
apache-solr-1.4.1/client/
...
# pwd
/root

Copy the Solr package somewhere reasonable like in the /opt folder:

# mkdir -p /opt/solr 
# cp -r -p /root/apache-solr-1.4.1 /opt/solr

Link it (the Solr WAR file) to the Tomcat library directory:

# ln -s /opt/solr/apache-solr-1.4.1/dist/apache-solr-1.4.1.war /usr/share/tomcat6/lib/solr.war

In the future, when you upgrade your software, install the Solr upgrade and update the symlink.
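
The symlink update at upgrade time can be sketched as a small shell function. This is only a sketch under assumptions: the function name relink_solr_war is made up for illustration, and X.Y.Z in the usage comment is a placeholder for whichever Solr release you actually install.

```shell
# Hypothetical helper: repoint the solr.war symlink at a newly installed WAR.
# Usage: relink_solr_war /path/to/new.war /path/to/solr.war
relink_solr_war() {
    local new_war="$1" link="$2"
    # Refuse to create a dangling link if the new WAR is missing.
    if [ ! -f "$new_war" ]; then
        echo "WAR not found: $new_war" >&2
        return 1
    fi
    # -s symbolic, -f replace the existing link, -n do not follow the old link.
    ln -sfn "$new_war" "$link"
}

# Example (as root; X.Y.Z is a placeholder version), followed by a Tomcat restart:
# relink_solr_war /opt/solr/apache-solr-X.Y.Z/dist/apache-solr-X.Y.Z.war /usr/share/tomcat6/lib/solr.war
# /sbin/service tomcat6 restart
```

Because both Context fragments point at the symlink rather than at a versioned file, nothing else in the Tomcat configuration should need to change.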

Create Solr directories

You need to choose where your Solr indexes will be kept. I put them into the /var directory and that’s where I’m assuming that you will put yours:

# mkdir -p /var/solr 
# cp -r -p /opt/solr/apache-solr-1.4.1/example/solr/ /var/solr/
# mv /var/solr/solr /var/solr/example.com
# ls -l /var/solr/example.com/
total 12
drwxr-xr-x. 2 root root 4096 Sep 2 02:44 bin
drwxr-xr-x. 3 root root 4096 Sep 2 02:44 conf
-rw-r--r--. 1 root root 2259 Sep 2 02:44 README.txt

Each domain has its own Solr indexes located in ‘data’ and its own configuration files in ‘conf’. There are two optional directories: ‘bin’ (for replication scripts) and ‘lib’ (for plugins). Unless your other apps use them, chances are they will be missing.
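
Before pointing Tomcat at a Solr home it can help to confirm that the two configuration files Solr reads from ‘conf’ are actually there. The following is a minimal sketch; check_solr_home is a hypothetical name, not something shipped with Solr.

```shell
# Hypothetical helper: report whether a directory looks like a usable Solr home.
# Usage: check_solr_home /var/solr/example.com
check_solr_home() {
    local home="$1" missing=0 f
    for f in conf/solrconfig.xml conf/schema.xml; do
        if [ ! -f "$home/$f" ]; then
            echo "missing: $home/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when both files exist, 1 otherwise.
    return "$missing"
}

# Example:
# check_solr_home /var/solr/example.com && echo "looks OK"
```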

Install Drupal ApacheSolr plugin protwords, schema and solrconfig

You should already have Acquia Drupal 6 running or Drupal 6 with the ApacheSolr plugin installed. You can copy the ‘protwords.txt’, ‘schema.xml’, and ‘solrconfig.xml’ files from the plugin directory in your respective distribution rather than downloading it, but adjust the paths accordingly.

If you don’t already have the ApacheSolr plugin, get it from the Drupal web site.

http://drupal.org/project/apachesolr

 

Choose the latest tarball and use wget to download it to your server, then copy the ApacheSolr configuration files (backing up the originals with the ‘b’ flag):

# wget http://ftp.drupal.org/files/projects/apachesolr-6.x-1.5.tar.gz
# tar zxvf apachesolr-6.x-1.5.tar.gz
...
# echo 'If ur root cp may give u a scary msg next cmd! Ignore it! Y to overwrite!'
If ur root cp may give u a scary msg next cmd! Ignore it! Y to overwrite!
#
# cp -b -p -f apachesolr/protwords.txt /var/solr/example.com/conf
# cp -b -p -f apachesolr/schema.xml /var/solr/example.com/conf
# cp -b -p -f apachesolr/solrconfig.xml /var/solr/example.com/conf
#
# echo 'Fix group so tomcat can use this!'
Fix group so tomcat can use this!
#
# chown -R root:tomcat /var/solr/example.com
# chmod -R 775 /var/solr/

Warning! If you are not using the Acquia distribution and instead installed the ApacheSolr plugin from the main Drupal web site then you should check that you have a copy of the SolrPhpClient (version r22 – see module README for the gory details). The Acquia distribution includes the correct SolrPhpClient (so you might want to use that instead?).

Make the two Solr instances for the two domains

This walkthrough will create two domains, but you can create more. Using the example.com folder as a prototype, just recursively copy it twice to make two domains (use ‘p’ switch to ‘preserve’ the file permissions and settings):

# cp -r -p /var/solr/example.com /var/solr/www1.kelvinwong.ca
# cp -r -p /var/solr/example.com /var/solr/www2.kelvinwong.ca

In the future, to add a new domain, copy the example.com folder you just made and customize it. This works for any number of additional domains.
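
If you expect to add domains regularly, the copy step can be wrapped in a loop. This is a sketch only; clone_solr_homes is a hypothetical helper name, and it deliberately skips any domain whose directory already exists so an existing index is never copied over.

```shell
# Hypothetical helper: clone the prototype Solr home once per domain.
# Usage: clone_solr_homes PROTOTYPE_DIR BASE_DIR domain [domain ...]
clone_solr_homes() {
    local proto="$1" base="$2" domain
    shift 2
    for domain in "$@"; do
        if [ -e "$base/$domain" ]; then
            echo "skipping $domain: $base/$domain already exists" >&2
            continue
        fi
        # -p preserves ownership and permissions set on the prototype.
        cp -r -p "$proto" "$base/$domain"
    done
}

# Example (as root):
# clone_solr_homes /var/solr/example.com /var/solr www1.kelvinwong.ca www2.kelvinwong.ca
```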

Configure Tomcat 6

It’s All About Context: The Context element represents a web application run within a particular Tomcat virtual host. Each web application is based on a Web Application Archive (WAR) file or a corresponding unpacked directory. The web application used to process each web request is determined by matching the request to the path of each Context. You may define as many Context elements as you wish, but each Context MUST have a unique path.

Contexts are no longer put into Tomcat’s server.xml file since that file is only read at server start-up. Instead, Contexts are placed into a folder hierarchy under the conf directory of CATALINA_BASE (on CentOS 6 that directory is /etc/tomcat6). Create and configure the following files:

# touch /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml
# touch /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml
# chown tomcat:root /etc/tomcat6/Catalina/localhost/{www1.kelvinwong.ca.xml,www2.kelvinwong.ca.xml}
# chmod 664 /etc/tomcat6/Catalina/localhost/{www1.kelvinwong.ca.xml,www2.kelvinwong.ca.xml}

Tomcat will use these files to find the WAR and deploy the application using the settings in the Context. Note: Contexts can be overridden (they often are) and there are more than a few in Tomcat. Review Tomcat’s documentation if they give you any trouble.

Make sure your Context fragments have .xml suffixes!

Place the following into /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml

# vi /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/www1.kelvinwong.ca" override="true" />
</Context>

The Context fragment simply tells Tomcat where to find the Context root (document base): docBase is an absolute path to the web app archive (WAR) file. The crossContext attribute allows Solr to get a request dispatcher from ServletContext.getContext() for access to other web apps on the virtual host. The Environment tag defines the ‘solr/home’ setting and allows it to be overridden. That’s all you need.

Change the other fragment:

# vi /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml

Change the paths:

 
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/www2.kelvinwong.ca" override="true" />
</Context>
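
Since the two fragments differ only in the domain name, they can also be generated rather than typed. The sketch below assumes the same paths used in this article; write_solr_contexts is a hypothetical helper name, and you would still apply the chown and chmod shown earlier to the generated files.

```shell
# Hypothetical helper: write one Context fragment per domain.
# Usage: write_solr_contexts CONTEXT_DIR SOLR_BASE WAR_PATH domain [domain ...]
write_solr_contexts() {
    local ctx_dir="$1" solr_base="$2" war="$3" domain
    shift 3
    for domain in "$@"; do
        # The fragment file name (minus .xml) becomes the web app path.
        cat > "$ctx_dir/$domain.xml" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="$war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="$solr_base/$domain" override="true" />
</Context>
EOF
    done
}

# Example (as root):
# write_solr_contexts /etc/tomcat6/Catalina/localhost /var/solr \
#     /usr/share/tomcat6/lib/solr.war www1.kelvinwong.ca www2.kelvinwong.ca
```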

Bind Tomcat to Local Port

By default, Tomcat listens on port 8080. The default iptables ruleset in CentOS 6 does not allow remote connections to port 8080. For our purposes this is fine since we want our Drupal sites to connect locally on port 8080. Local good, remote bad.

You can also tell Tomcat to bind to localhost and not any of the other network adapters. Open Tomcat’s server.xml file:

# vi /etc/tomcat6/server.xml

Change Tomcat’s binding address to the localhost address (127.0.0.1) in the Connector tag:

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443"
           URIEncoding="UTF-8"
           maxHttpHeaderSize="65535"
           address="127.0.0.1" />

Solr is a web service that takes many requests from Drupal using the HTTP GET method, similar to you typing into your browser’s web address bar. These requests routinely get very long; you can increase the GET request character limit by increasing the maxHttpHeaderSize attribute (from 8k to 64k as shown). To handle non-English characters, you should also set the request encoding to UTF-8. The Connector as-shown does both.

Restart Tomcat to reload the server.xml file:

# /sbin/service tomcat6 restart
Stopping tomcat6: [ OK ]
Starting tomcat6: [ OK ]
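
After the restart you may want to confirm that port 8080 is only listening on the loopback address. The sketch below is one way to do that, assuming netstat is available and prints the local address in the fourth column and the state in the sixth, as `netstat -tln` normally does on Linux. The function name port_is_local_only is hypothetical. Java may report the address in an IPv4-mapped form such as ::ffff:127.0.0.1:8080, so the check only looks at how the address ends.

```shell
# Hypothetical helper: read `netstat -tln` output on stdin and succeed only if
# the given port has a listener and every listener is bound to 127.0.0.1.
# Usage: netstat -tln | port_is_local_only 8080
port_is_local_only() {
    awk -v suffix=":$1" '
        # Field 4 is the local address, field 6 the socket state.
        $6 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix {
            found = 1
            # Anything not ending in 127.0.0.1:PORT counts as exposed.
            if ($4 !~ /127\.0\.0\.1:[0-9]+$/) exposed = 1
        }
        END { if (found && !exposed) exit 0; exit 1 }
    '
}

# Example:
# netstat -tln | port_is_local_only 8080 && echo "8080 is local only"
```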

View Solr Admin (optional)

You should now be able to view the Solr administration page if you open a local web browser on the server. If you don’t have a desktop on the server (as should be the case), you can use a text-browser like elinks.

View http://localhost:8080/www1.kelvinwong.ca/admin:

# elinks http://localhost:8080/www1.kelvinwong.ca/admin

You should see the Solr administration page in your browser.
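
If you would rather script the check than open a browser, something like the following may work. It is a sketch under assumptions: it relies on curl being installed and on the Solr configuration exposing a ping handler at admin/ping, which I believe the stock configuration does, but check your solrconfig.xml. The function names solr_ping_url and solr_is_up are made up for illustration.

```shell
# Hypothetical helper: build the ping URL for one Solr web app.
# Usage: solr_ping_url HOST PORT PATH
solr_ping_url() {
    local host="$1" port="$2" path="$3"
    echo "http://$host:$port$path/admin/ping"
}

# Hypothetical helper: succeed if the web app answers the ping with HTTP 200.
solr_is_up() {
    local status
    status=$(curl -s -o /dev/null -w '%{http_code}' "$(solr_ping_url "$@")")
    [ "$status" = "200" ]
}

# Example, run on the server itself:
# for site in www1.kelvinwong.ca www2.kelvinwong.ca; do
#     if solr_is_up localhost 8080 "/$site"; then echo "$site: up"; else echo "$site: down"; fi
# done
```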

SELinux

“Apache Solr: Your site was unable to contact the Apache Solr server,” reports Drupal; SELinux chuckles.

SELinux is enabled by default on CentOS 6, so you will likely have it running and it will not appreciate Apache trying to talk to Tomcat/Solr on port 8080 (check /var/log/audit/audit.log):

type=AVC msg=audit(1315100262.891:17629): avc: denied { name_connect } for pid=2064 comm="httpd" dest=8080 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:http_cache_port_t:s0 tclass=tcp_socket
type=SYSCALL msg=audit(1315100262.891:17629): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfbe6590 a2=b70426f4 a3=11 items=0 ppid=2060 pid=2064 auid=500 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4 comm="httpd" exe="/usr/sbin/httpd" subj=unconfined_u:system_r:httpd_t:s0 key=(null)

You can either turn off SELinux (not recommended) or fix the attributes so that SELinux allows Apache to talk to Tomcat. The handy tool sealert gives helpful advice:

# sealert -a /var/log/audit/audit.log | less
Summary:
SELinux is preventing the http daemon from connecting to itself or the relay ports
Detailed Description:
SELinux has denied the http daemon from connecting to itself or the relay ports. An httpd script is trying to make a network connection to an http/ftp port. If you did not setup httpd to make network connections, this could signal an intrusion attempt.
Allowing Access:
If you want httpd to connect to httpd/ftp ports you need to turn on the httpd_can_network_relay boolean: "setsebool -P httpd_can_network_relay=1"
Fix Command:
setsebool -P httpd_can_network_relay=1
Additional Information:
Source Context unconfined_u:system_r:httpd_t:s0
Target Context system_u:object_r:http_cache_port_t:s0
Target Objects None [ tcp_socket ]
Source httpd
Source Path /usr/sbin/httpd
Port 8080
Host <Unknown>
Source RPM Packages httpd-2.2.15-5.el6.centos
Target RPM Packages
Policy RPM selinux-policy-3.7.19-54.el6_0.5
Selinux Enabled True
Policy Type targeted
Enforcing Mode Enforcing
Plugin Name httpd_can_network_relay
Host Name templeton.localdomain
Platform Linux templeton.localdomain 2.6.32-71.29.1.el6.i686 #1 SMP Mon Jun 27 18:07:00 BST 2011 i686 i686
Alert Count 14
First Seen Sat Sep 3 18:25:40 2011
Last Seen Sat Sep 3 18:37:42 2011
Local ID 4b66d238-ddf7-4b74-bbe5-3fb54be5b3e4
Line Numbers 178, 179, 180, 181, 182, 183, 184, 185, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211

The quick fix is to set the network relay flag (‘P’ flag makes the change persistent across reboots):

# setsebool -P httpd_can_network_relay=1
# getsebool httpd_can_network_relay
httpd_can_network_relay --> on

You don’t need sealert to use setsebool but it is a useful utility to debug errors with SELinux. If you don’t have sealert installed, it is a simple thing to install it since it is part of the setroubleshoot package:

# yum install setroubleshoot
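
If you only want a quick look at what httpd is being denied, plain grep over the audit log may be enough. The sketch below assumes the AVC record format shown earlier in this section (a comm="httpd" field and a dest=PORT field); httpd_denied_ports is a hypothetical helper name.

```shell
# Hypothetical helper: summarize the destination ports httpd was denied from
# connecting to, printing one "PORT COUNT" line per port.
# Usage: httpd_denied_ports /var/log/audit/audit.log
httpd_denied_ports() {
    local log="$1"
    grep 'avc: *denied' "$log" \
        | grep 'comm="httpd"' \
        | sed -n 's/.* dest=\([0-9][0-9]*\).*/\1/p' \
        | sort -n | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example (as root):
# httpd_denied_ports /var/log/audit/audit.log
```

A line for port 8080 in the output suggests you are hitting the same denial described above.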

Configure Drupal to use Solr

Turning now to your Drupal installation…

Enable the Solr Search service module…

Configure the Apache Solr Search module by visiting http://www1.kelvinwong.ca/?q=admin/settings/apachesolr

  • Solr host name: localhost
  • Solr port: 8080
  • Solr path: /www1.kelvinwong.ca

The Solr path is the name of your Context fragment minus the .xml suffix (e.g. the fragment /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml gives the path /www1.kelvinwong.ca).



The cron job indexes 50 nodes at a time by default. Once nodes are indexed, you can search for them by keyword.
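
To get a feel for how long gradual indexing takes, you can work out how many cron runs a site needs at 50 nodes per run. The sketch below is plain integer arithmetic; the function name cron_runs_needed and the node count of 1234 are made-up examples, not figures from any real site.

```shell
# Hypothetical helper: cron runs needed to index NODES nodes at BATCH per run.
# Usage: cron_runs_needed NODES [BATCH]   (BATCH defaults to 50)
cron_runs_needed() {
    local nodes="$1" batch="${2:-50}"
    # Integer ceiling division: a partial final batch still needs a run.
    echo $(( (nodes + batch - 1) / batch ))
}

cron_runs_needed 1234   # prints 25
```

With cron running hourly, that example site would take roughly a day to index fully, which is why forcing a re-index can be attractive on larger sites.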

Save the settings. You should see:

  • The configuration options have been saved.
  • Apache Solr: Your site has contacted the Apache Solr server.
  • Apache Solr PHP Client Library: Correct version “Revision: 22”.

Try a search

You can re-index the site by force or let cron do it gradually. Either way, it takes a while for Solr to process the data.


http://www1.kelvinwong.ca/?q=admin/settings/apachesolr/index


Once you have indexed your site and adjusted the permissions on the search form (so anonymous users can use the search form), visit it:


http://www1.kelvinwong.ca/?q=search/apachesolr_search


Intentionally misspell something and let Solr give you hints!

What about the other one??? www2?

Ah, yes…the other one is set-up in a similar manner, just use the following configuration in Drupal:

  • Solr host name: localhost
  • Solr port: 8080
  • Solr path: /www2.kelvinwong.ca