SOLR Installation Type 3 with ZooKeeper

Installation

Hardware

As CLAIMS Direct SOLR is a pre-configured, bundled distribution of Apache SOLR, it can be deployed on any number of nodes (individual instances). A group of nodes works together to expose a collection, and multiple collections can be searched across the distribution.

There are many scenarios for a CLAIMS Direct deployment, ranging from indexing the entire content of CLAIMS Direct XML to sparse indexing of certain fields and ranges of publication dates for application-specific usage. There may also be specific quality-of-service (QoS) requirements, such as minimum supported queries per second or average response time. All of these factors play a role in planning a CLAIMS Direct SOLR deployment. Generally speaking, a full index with the entire content of CLAIMS Direct XML requires, at a minimum:

Number | Type | Specs (minimum)
8 | SOLR search server (nodes 1-3 housing the ZooKeeper quorum) | CPU: 2 cores; RAM: 16GB; Disk: 1TB
1 | processing server | CPU: 4 cores; RAM: 16GB; Disk: 1TB

The ZooKeeper quorum can be placed on three of the SOLR search servers or, optionally, broken out onto 3 additional, separate servers:

Number | Type | Specs (minimum)
3 | ZooKeeper configuration server | CPU: 1 core; RAM: 2GB; Disk: 50GB

The following diagram represents the minimum architecture required to support a full CLAIMS Direct index.

[Figure: Primary Architecture]


Currently, the delivery of a fully populated CLAIMS Direct index requires the above SOLR and ZooKeeper configuration (8 SOLR servers + 3 ZooKeepers). Load balancers and web servers are required only if CLAIMS Direct Web Services (CDWS) will be installed as well. A customized deployment with select data to index is currently not offered fully populated. With a custom configuration, complete indexing will need to be done at the installation site.

Software

The CLAIMS Direct SOLR installation is a self-contained package suitable for deployment on any Linux server running Java 8. The following tools are prerequisites:

Name | Used By
java | ZooKeeper, SOLR and various support tools
wget | Configuration tools (bootstrap-*.sh)
lsof | Start/stop scripts (solrctl/zookeeperctl)
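
A quick pre-flight check for these tools could look like the sketch below. The function name missing_tools is hypothetical and not part of the distribution; it only reports which of the named commands cannot be found on PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of the CLAIMS Direct package):
# print the name of every listed command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the three prerequisite tools listed above.
missing=$(missing_tools java wget lsof)
if [ -n "$missing" ]; then
  echo "missing prerequisites:" $missing
else
  echo "all prerequisites found"
fi
```

Running this on every node before setup.sh gives an early warning; it does not verify the Java version.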


The configuration script setup.sh assumes that each node in the cluster has the same directory structure. For example, if you download the archive to a machine and unpack it into the path /cdsolr, the full path to the package will be /cdsolr/alexandria-solr-v2.1.2-distribution, and each node must have the path /cdsolr available for deployment. You are free to choose any mount point or path as long as it is identical on all nodes in the cluster and offers at least 1TB of available disk space on each SOLR node.
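
The disk-space requirement can be checked per node with a sketch like the following. The helper avail_kb and the variable DEPLOY_PATH are hypothetical names; /cdsolr is simply the example path used above.

```shell
#!/bin/sh
# Hypothetical helper: print the available space, in kilobytes, of the
# filesystem that holds the given path (POSIX `df -P -k` output, column 4).
avail_kb() {
  df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed deployment path from the example above; adjust to your own.
DEPLOY_PATH=${DEPLOY_PATH:-/cdsolr}
REQUIRED_KB=$((1024 * 1024 * 1024))   # 1TB expressed in KB

if [ -d "$DEPLOY_PATH" ]; then
  if [ "$(avail_kb "$DEPLOY_PATH")" -ge "$REQUIRED_KB" ]; then
    echo "$DEPLOY_PATH: enough space"
  else
    echo "$DEPLOY_PATH: less than 1TB available"
  fi
else
  echo "$DEPLOY_PATH does not exist on this node"
fi
```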

Create User asolr

It is recommended to create the user asolr.

useradd -m asolr
passwd asolr
  => <password>

Configuring SolrCloud

CLAIMS Direct SOLR uses the Apache SOLR distribution for indexing. The fully populated delivery of the index is an 8-node collection as described above. The collection name is alexandria; although it is configurable, changing it is not recommended.

Configuration

The following table lists the applicable nodes and the function in the environment:

Note

Your IP address allocation may be different. These are configurable in solr-alexandria-vars.

IP | Function | Description
10.234.1.91 | node-1: SOLR | basic SOLR node
10.234.1.92 | node-2: SOLR |
10.234.1.93 | node-3: SOLR |
10.234.1.94 | node-4: SOLR |
10.234.1.95 | node-5: SOLR |
10.234.1.96 | node-6: SOLR |
10.234.1.97 | node-7: SOLR |
10.234.1.98 | node-8: SOLR |

The following variables should be configured in solr-alexandria-vars:

Variable | Value | Description
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS | 8 | The number of nodes in the alexandria collection
ALEXANDRIA_SOLR_CLOUD_NODES | 10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94,10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98 (note: this should be one line in the configuration) | The IP addresses of each node
ALEXANDRIA_SOLR_PORT | 8080 | The port SOLR (Jetty) should listen on and accept requests
ALEXANDRIA_SOLR_CLOUD_USER | asolr | This is the user that will deploy and run the ZooKeeper and SOLR services
ALEXANDRIA_SOLR_URL | 10.234.1.91 | The URL to configure collections. The IP address should be any one of the nodes listed under ALEXANDRIA_SOLR_CLOUD_NODES
ALEXANDRIA_SOLR_JVM_MEM | 8g | This is the java heap setting. Generally speaking, you should allocate at least 8g.
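
Because the shard count is described as the number of nodes in the collection, a consistency check between the two settings can be sketched as follows. The helper count_nodes is hypothetical, and the sketch assumes that solr-alexandria-vars uses shell-style NAME=value lines; the values are set inline here so the example is self-contained.

```shell
#!/bin/sh
# Values copied from the table above; in practice they would come from
# solr-alexandria-vars (assumed here to use shell-style NAME=value lines).
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS=8
ALEXANDRIA_SOLR_CLOUD_NODES=10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94,10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98

# Hypothetical helper: count the entries of a comma-delimited list.
count_nodes() {
  echo "$1" | tr ',' '\n' | grep -c .
}

node_count=$(count_nodes "$ALEXANDRIA_SOLR_CLOUD_NODES")
if [ "$node_count" -eq "$ALEXANDRIA_SOLR_CLOUD_NUMSHARDS" ]; then
  echo "ok: $node_count nodes for $ALEXANDRIA_SOLR_CLOUD_NUMSHARDS shards"
else
  echo "mismatch: $node_count nodes, $ALEXANDRIA_SOLR_CLOUD_NUMSHARDS shards"
fi
```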

Configuring ZooKeeper

ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an essential component in a functioning CLAIMS Direct SolrCloud environment. The CLAIMS Direct SolrCloud distribution comes bundled with a preconfigured, 3-node ZooKeeper quorum.

Configuration

A CLAIMS Direct ZooKeeper deployment requires 3 individual servers. These servers could be dedicated machines or they could share duties with 3 nodes in the SolrCloud collection. We will assume the following SOLR/ZooKeeper environment:

Note

Your IP address allocation may be different. These are configurable in solr-alexandria-vars.

IP | Function | Description
10.234.1.91 | node-1: ZooKeeper | basic ZooKeeper node
10.234.1.92 | node-2: ZooKeeper |
10.234.1.93 | node-3: ZooKeeper |

The following variables should be configured in solr-alexandria-vars:

Variable | Value | Description
ALEXANDRIA_SOLR_ZK_NODES | 10.234.1.91,10.234.1.92,10.234.1.93 | Comma-delimited list of IP addresses
ALEXANDRIA_SOLR_ZK_HOST | 10.234.1.91 | First configured node for SolrCloud configuration bootstrap


Preparing Deployment

After ZooKeeper and SolrCloud node configuration is complete, the following prerequisites need to be installed and configured on each node:

Prerequisite | Description
java version 8 (although 7 will work, this install has only been tested with 8) | An up-to-date java install is required
public key ssh access for ALEXANDRIA_SOLR_CLOUD_USER | Facilitates configuration and index deployment and allows the start/stop scripts to be managed from one location
a location that is read/write accessible by ALEXANDRIA_SOLR_CLOUD_USER to hold the programs, configurations, and index |

Note

If you don't want to or can't enable public key access, the setup.sh script will exit with an error. You will then need to follow the manual instructions below.
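
Before running setup.sh, key-based access can be verified by hand. The sketch below only prints the test commands so they can be reviewed first; ssh_check_commands is a hypothetical helper, and whether setup.sh performs its own test in the same way is not known.

```shell
#!/bin/sh
# Hypothetical helper: print (rather than run) one non-interactive ssh
# test command per node. BatchMode=yes makes ssh fail instead of
# prompting for a password, so a failure means key access is not set up.
ssh_check_commands() {
  user=$1
  nodes=$2
  for node in $(echo "$nodes" | tr ',' ' '); do
    echo "ssh -o BatchMode=yes -o ConnectTimeout=5 $user@$node true"
  done
}

# Example values from this guide; review the output, then pipe it to sh.
ssh_check_commands asolr 10.234.1.91,10.234.1.92,10.234.1.93
```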


Deployment

Running setup.sh.

The steps taken by setup.sh are described in sections that can be managed manually if desired.

Configuration Creation

The first step is to create the configuration files for both SOLR and ZooKeeper based on the main variable configuration file solr-alexandria-vars. setup.sh uses the following variables to create ZooKeeper configuration files for each node of the quorum. Note: italic entries should not be modified unless absolutely necessary.

Variable | Description
ALEXANDRIA_SOLR_ZK_NODES | Comma-separated list of nodes on which ZooKeeper will run
ALEXANDRIA_SOLR_ZK_PORT | ZooKeeper listening port
ALEXANDRIA_SOLR_ZK_CONFIG_DIR | Main configuration directory for SOLR
ALEXANDRIA_SOLR_ZK_CONFIG_NAME | Configuration name
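
For orientation, a ZooKeeper quorum configuration identifies each member with a server.N line. The sketch below derives such lines from a comma-separated node list. The helper zk_server_lines is hypothetical, and ports 2888 and 3888 are the peer and leader-election ports conventionally used in ZooKeeper's own documentation; the zoo-N.cfg files that setup.sh actually generates may differ.

```shell
#!/bin/sh
# Hypothetical helper: emit one "server.N=host:2888:3888" line per node.
# 2888/3888 are conventional ZooKeeper ports, assumed here for illustration.
zk_server_lines() {
  n=1
  for node in $(echo "$1" | tr ',' ' '); do
    echo "server.$n=$node:2888:3888"
    n=$((n + 1))
  done
}

zk_server_lines 10.234.1.91,10.234.1.92,10.234.1.93
```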


In addition, a valid solr.xml is generated from the template: solrxml.in. The following variables are used:

Variable | Description
ALEXANDRIA_SOLR_HOME | The directory in which the distribution is unpacked. This is generated dynamically with $(pwd)
ALEXANDRIA_SOLR_CONTEXT | The Jetty web application context
ALEXANDRIA_SOLR_ZK_CLIENT_TIMEOUT | Client ZooKeeper timeout in milliseconds
ALEXANDRIA_SOLR_ZK | Dynamically generated comma-separated list of ZooKeeper nodes in the form ip:port
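
The ip:port list can be illustrated with a short sketch. The helper zk_connect_string is hypothetical and is not the code setup.sh uses; it only shows how a node list and a port combine into the format described above.

```shell
#!/bin/sh
# Hypothetical helper: turn "ip1,ip2,..." plus a port into
# "ip1:port,ip2:port,...".
zk_connect_string() {
  echo "$1" | sed "s/,/:$2,/g; s/\$/:$2/"
}

zk_connect_string 10.234.1.91,10.234.1.92,10.234.1.93 2181
```

With the example node list and port 2181, this prints 10.234.1.91:2181,10.234.1.92:2181,10.234.1.93:2181.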


Code Deployment

The next section in setup.sh deploys the code and configuration to all nodes. If you haven't enabled public key ssh access, the following commands need to be issued manually for each IP address in ALEXANDRIA_SOLR_CLOUD_NODES and ALEXANDRIA_SOLR_ZK_NODES:

rsync -va * $ALEXANDRIA_SOLR_CLOUD_USER@$IPADDRESS:$ALEXANDRIA_SOLR_HOME
# e.g.
rsync -va * asolr@10.234.1.91:/home/asolr/alexandria-solr-v2.1.2-distribution
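
When many nodes are involved, the manual commands can be generated rather than typed. The sketch below prints one rsync command per distinct node so the list can be reviewed before anything is copied. The helper rsync_commands is hypothetical, and de-duplicating nodes that serve both SOLR and ZooKeeper is a choice made here, not something the distribution prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print one rsync command per distinct node found in
# the two comma-delimited lists (shared SOLR/ZooKeeper nodes appear once).
rsync_commands() {
  user=$1
  target=$2
  solr_nodes=$3
  zk_nodes=$4
  echo "$solr_nodes,$zk_nodes" | tr ',' '\n' | sort -u | while read -r ip; do
    echo "rsync -va * $user@$ip:$target"
  done
}

# Example values from this guide.
rsync_commands asolr /home/asolr/alexandria-solr-v2.1.2-distribution \
  10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94,10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98 \
  10.234.1.91,10.234.1.92,10.234.1.93
```

The printed commands are meant to be run from inside the unpacked distribution directory, as with the manual rsync above.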


Example Walk-Through

Assumptions:

8 SOLR nodes | IP: 10.234.1.91-98 (as01-08)
3 ZooKeeper nodes | IP: 10.234.1.91-93 (shared resources with SOLR)
1 processing node | IP: 10.234.1.99 (ap01)
deployment user | asolr
deployment location | /home/asolr


Create user on all machines

# as root on all machines 10.234.1.91-99
useradd -m asolr
passwd asolr # create your own password


Enable public key ssh access

# as user asolr on 10.234.1.99 (ap01)
# generate a key pair (creates ~/.ssh if it does not exist yet)
ssh-keygen
# copy the public key to each node from 10.234.1.91 -> .98
for i in 91 92 93 94 95 96 97 98; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub asolr@10.234.1.$i
done


Download package onto ap01 (10.234.1.99) and unpack into /home/asolr

cd /home/asolr
wget http://alexandria.fairviewresearch.com/software/alexandria-solr-v2.1.2-distribution.tar.gz
tar zxvf alexandria-solr-v2.1.2-distribution.tar.gz
cd /home/asolr/alexandria-solr-v2.1.2-distribution


Adjust solr-alexandria-vars 

The following variables need to be confirmed or set:

Variable | Value
ALEXANDRIA_SOLR_CLOUD_NUMSHARDS | 8
ALEXANDRIA_SOLR_CLOUD_NODES | 10.234.1.91,10.234.1.92,10.234.1.93,10.234.1.94,10.234.1.95,10.234.1.96,10.234.1.97,10.234.1.98 (one line, no spaces)
ALEXANDRIA_SOLR_CLOUD_USER | asolr
ALEXANDRIA_SOLR_URL | http://10.234.1.91:$ALEXANDRIA_SOLR_PORT/$ALEXANDRIA_SOLR_CONTEXT
ALEXANDRIA_SOLR_ZK_NODES | 10.234.1.91,10.234.1.92,10.234.1.93
ALEXANDRIA_SOLR_ZK_HOST | 10.234.1.91


Run setup.sh

$ ./setup.sh 
Initializing ZooKeeper 3-node quorum configuration ...
  * -> zoo-1.cfg
  * -> zoo-2.cfg
  * -> zoo-3.cfg
Initializing solr.xml ... done
Testing ssh access to solr:
  * -> node: asolr@10.234.1.91 ... ok
  * -> node: asolr@10.234.1.92 ... ok
  * -> node: asolr@10.234.1.93 ... ok
  * -> node: asolr@10.234.1.94 ... ok
  * -> node: asolr@10.234.1.95 ... ok
  * -> node: asolr@10.234.1.96 ... ok
  * -> node: asolr@10.234.1.97 ... ok
  * -> node: asolr@10.234.1.98 ... ok
Testing ssh access to zookeeper:
  * -> node: asolr@10.234.1.91 ... ok
  * -> node: asolr@10.234.1.92 ... ok
  * -> node: asolr@10.234.1.93 ... ok
Deploying SOLR ...
  * -> asolr@10.234.1.91:/home/asolr ... ok
  * -> asolr@10.234.1.92:/home/asolr ... ok
  * -> asolr@10.234.1.93:/home/asolr ... ok
  * -> asolr@10.234.1.94:/home/asolr ... ok
  * -> asolr@10.234.1.95:/home/asolr ... ok
  * -> asolr@10.234.1.96:/home/asolr ... ok
  * -> asolr@10.234.1.97:/home/asolr ... ok
  * -> asolr@10.234.1.98:/home/asolr ... ok
Deploying ZooKeeper ...
  * -> asolr@10.234.1.91:/home/asolr ... ok
  * -> asolr@10.234.1.92:/home/asolr ... ok
  * -> asolr@10.234.1.93:/home/asolr ... ok


Starting and Bootstrapping the ZooKeeper Quorum

./zookeeperctl start
Starting zookeeper ... STARTED
Starting zookeeper ... STARTED
Starting zookeeper ... STARTED


# check the quorum is running:
./zookeeperctl status
Mode: follower
Mode: follower
Mode: leader
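
A healthy quorum reports exactly one leader, as in the output above. A scripted version of that check could look like this sketch, which assumes that zookeeperctl status prints one "Mode:" line per node; has_single_leader is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "Mode: ..." lines on stdin and succeed only
# if exactly one node reports itself as leader.
has_single_leader() {
  leaders=$(grep -c '^Mode: leader' || true)
  [ "$leaders" -eq 1 ]
}

# Usage sketch:
#   ./zookeeperctl status | has_single_leader && echo "quorum ok"
printf 'Mode: follower\nMode: follower\nMode: leader\n' | has_single_leader \
  && echo "quorum ok"
```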


# install SOLR configuration files
./bootstrap-zookeeper.sh 
Bootstrapping 10.234.1.91 ...
  -> name: cdidx-v2.1.2
  -> directory: /home/asolr/alexandria-solr-v2.1.2-distribution/conf-2.1.2

Starting and Bootstrapping the SOLR Collection

 $ ./solrctl start
Initializing solr.xml ... done
asolr@10.234.1.91: SOLR ( start ) ...Waiting up to 30 seconds to see Solr running on port 8080 [/]  
Started Solr server on port 8080 (pid=10520). Happy searching!
### etc ...
 
# check cluster status
./solrctl status
asolr@10.234.1.91: SOLR ( status ) ...
Found 1 Solr nodes: 
Solr process 11273 running on port 8080
{
  "solr_home":"/home/asolr/alexandria-solr-v2.1.2-distribution",
  "version":"5.3.1 1703449 - noble - 2015-09-17 01:48:15",
  "startTime":"2016-02-23T08:18:36.059Z",
  "uptime":"0 days, 0 hours, 1 minutes, 9 seconds",
  "memory":"63.6 MB (%6.5) of 981.4 MB",
  "cloud":{
    "ZooKeeper":"10.234.1.91:2181,10.234.1.92:2181,10.234.1.93:2181",
    "liveNodes":"8",
    "collections":"0"}}
### etc.
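
Before creating the collection, it is worth confirming that liveNodes matches the number of SOLR nodes (8 in this walk-through). The sketch below pulls that value out of status output in the format shown above; live_nodes is a hypothetical helper, and it reads only the first liveNodes entry it encounters.

```shell
#!/bin/sh
# Hypothetical helper: extract the first "liveNodes" value from status
# output supplied on stdin.
live_nodes() {
  sed -n 's/.*"liveNodes":"\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Usage sketch: ./solrctl status | live_nodes
printf '%s\n' '  "cloud":{' '    "liveNodes":"8",' '    "collections":"0"}}' | live_nodes
```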

# create the collection
./bootstrap-alexandria.sh
 
Creating (action=CREATE) on http://10.234.1.91:8080/alexandria-v2.1/admin/collections ...
  -> name=alexandria
  -> numShards=8
  -> replicationFactor=1
  -> maxShardsPerNode=1
  -> collection.config=cdidx-v2.1.2
  -> property.config=solrconfig-alexandria.xml
  HTTP/1.1 200 OK
  Content-Type: application/xml; charset=UTF-8
  Transfer-Encoding: chunked
<?xml version="1.0" encoding="UTF-8"?>
<response>
 <lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3136</int>
 </lst>
  # etc ..
</response> 
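
The parameters in this output correspond to a Solr Collections API CREATE request. The sketch below assembles an equivalent URL for reference. The helper create_collection_url is hypothetical, and whether bootstrap-alexandria.sh sends exactly this request is an assumption. Solr's Collections API names the configuration-set parameter collection.configName, which the script output labels collection.config.

```shell
#!/bin/sh
# Hypothetical helper: assemble a Collections API CREATE URL.
# Arguments: base URL, collection name, shard count, config set name.
create_collection_url() {
  echo "$1/admin/collections?action=CREATE&name=$2&numShards=$3&replicationFactor=1&maxShardsPerNode=1&collection.configName=$4&property.config=solrconfig-alexandria.xml"
}

url=$(create_collection_url http://10.234.1.91:8080/alexandria-v2.1 alexandria 8 cdidx-v2.1.2)
echo "$url"
# To send the request by hand (only if the collection does not exist yet):
#   curl -i "$url"
```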


SOLR Interface

At this point, the ZooKeeper quorum and all SOLR nodes should be running. You can now visit http://10.234.1.91:8080/alexandria-v2.1/old.html

Stopping SOLR Cluster and ZooKeeper Quorum

./solrctl stop
asolr@10.234.1.91: SOLR ( stop ) ...
# etc.
 
./zookeeperctl stop 


Next Steps

Once SOLR has been installed, set up the indexing daemon aidxd.