Component Configuration and Requirements

Note:

  • For remote installations that include both PostgreSQL and Solr, we recommend a minimum of two server machines.
  • We highly recommend that you select a disk subsystem (hardware RAID, software RAID, LVM, or any combination) that supports extending device capacity. New and updated content is continually flowing through CLAIMS Direct, and it is very important that your disk subsystem can be expanded.
  • IFI CLAIMS has produced a patched release of libxml2-2.9.2 as an RPM. We recommend locally installing this package and replacing the package in the distribution. Download the RPM at: http://alexandria.fairviewresearch.com/software/libxml2/f20/libxml2-2.9.2-1.fc20.x86_64.rpm. Contact support@ificlaims.com if other versions are required.

PostgreSQL Requirements

Hardware Requirements

Requirement        Recommended
CPU                4 cores
System Memory      24GB
Storage Capacity   6TB (SSD required)

Software Requirements

Requirement        Supported Versions             Notes
Operating System   RHEL/Rocky 8, Amazon Linux 2   We do not support Ubuntu or any operating system not explicitly listed.
PostgreSQL         11 - 14                        For the appropriate repository, see https://www.postgresql.org/download/linux/redhat/

IFI CLAIMS Repository
Amazon Linux 2
sudo yum -y install \
 https://repo.ificlaims.com/ifi-claims-direct/amzn2/x86_64/ifi-claims-direct-1.0-1.amzn2.x86_64.rpm

RHEL/Rocky 8
sudo dnf -y install \
https://repo.ificlaims.com/ifi-claims-direct/rocky/8/x86_64/ifi-claims-direct-1.0-1.el8.x86_64.rpm
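
Before installing, it can be useful to confirm that a PostgreSQL version falls in the supported 11 - 14 range. The following is a minimal sketch, not part of the CLAIMS Direct distribution: the function name pg_version_supported is hypothetical, and the example assumes that `psql --version` prints the version as its third field (e.g. "psql (PostgreSQL) 13.9").

```shell
# Return 0 when the major version of a PostgreSQL version string
# (e.g. "13.9") falls in the supported range 11 - 14, non-zero otherwise.
# NOTE: hypothetical helper, not shipped with CLAIMS Direct.
pg_version_supported() {
  major="${1%%.*}"                  # "13.9" -> "13"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;        # reject empty or non-numeric input
  esac
  [ "$major" -ge 11 ] && [ "$major" -le 14 ]
}

# Example: check the version reported by a local psql client, if one exists.
if command -v psql >/dev/null 2>&1; then
  installed="$(psql --version | awk '{ print $3 }')"
  if pg_version_supported "$installed"; then
    echo "PostgreSQL $installed is in the supported range"
  else
    echo "PostgreSQL $installed is outside the supported range (11 - 14)" >&2
  fi
fi
```

The helper only inspects the major version, so any minor release of a supported major version passes.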

Solr Basic Distributed Requirements (Type 2)

The Solr information in this section is deprecated and should not be implemented. Updated information will be available soon.

Hardware Requirements

Since CLAIMS Direct Solr is a pre-configured, bundled distribution of Apache Solr, it can be deployed on any number of nodes (individual instances). This documentation describes installation and configuration on a single node without the use of SolrCloud.

There are many scenarios for a CLAIMS Direct deployment that range from indexing the entire content of CLAIMS Direct XML to the sparse indexing of certain fields and ranges of publication dates for application-specific usage. There could also be specific QoS requirements: minimum supported queries per second, average response time, etc. All of these factors play a role in planning for a CLAIMS Direct Solr deployment. Generally speaking, a stand-alone full index with the entire content of CLAIMS Direct XML requires, at a minimum, the following:

Requirement     Minimum    Recommended
CPU             16 cores   32 cores
System Memory   128GB      256GB
Storage         Basic: 6TB (SSD)
                Premium: 8TB (SSD)
                Premium+: 8TB (SSD)

The minimum required storage allows for a full index and approximately 1-2 years of growth. It doesn't allow space for Solr optimization (see "Commit and Optimize Operations" in Uploading Data with Index Handlers) unless carefully planned. Please contact support@ificlaims.com for more information about optimization with minimum requirements.

Currently, the delivery of a fully populated CLAIMS Direct index requires the above Solr hardware requirements. A customized deployment with select data to index is currently not offered fully populated. With a custom configuration, hardware requirements are dependent on use case and complete indexing will need to be done at the installation site.

Software Requirements

The CLAIMS Direct Solr installation is a self-contained package suitable for deployment on any Linux server running Java 8. The simple prerequisite tool list follows:

Name   Used By
java   ZooKeeper, Solr and various support tools
wget   Configuration tools (bootstrap-*.sh)
lsof   Start/stop scripts (solrctl/zookeeperctl)
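
Whether the three prerequisite tools are on the PATH can be checked before unpacking the distribution. This is a minimal sketch under stated assumptions: the function name missing_tools is hypothetical and is not part of the CLAIMS Direct tooling, and the check does not verify the Java version.

```shell
# Print the name of every listed tool that is NOT found on PATH, one per line.
# NOTE: hypothetical helper, not shipped with CLAIMS Direct.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the prerequisite tools named in the table above.
missing="$(missing_tools java wget lsof)"
if [ -n "$missing" ]; then
  # $missing is left unquoted so the names print on a single line.
  echo "Missing prerequisite tools:" $missing >&2
else
  echo "All prerequisite tools found"
fi
```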

Solr Advanced Distribution Requirements (Type 3)

The Solr information in this section is deprecated and should not be implemented. Updated information will be available soon.

Hardware Requirements

As CLAIMS Direct Solr is a pre-configured, bundled distribution of Apache Solr, it can be deployed on any number of nodes (individual instances). A group of nodes functions together to expose a collection. Further, multiple collections can be searched across the distribution.

There are many scenarios for a CLAIMS Direct deployment that range from indexing the entire content of CLAIMS Direct XML to the sparse indexing of certain fields and ranges of publication dates for application-specific usage. There could also be specific QoS requirements: minimum supported queries per second, average response time, etc. All of these factors play a role in planning for a CLAIMS Direct Solr deployment. Generally speaking, a full index with the entire content of CLAIMS Direct XML requires, at a minimum:

Number   Type                 Specs

8        Solr search server   nodes 1-3 housing the ZooKeeper quorum
                              minimum:
                                • CPU: 2 cores
                                • RAM: 16GB
                                • Disk: 1TB

1        processing server    minimum:
                                • CPU: 4 cores
                                • RAM: 16GB
                                • Disk: 1TB

The ZooKeeper quorum can be placed on the Solr search servers or, optionally, you can break out the ZooKeeper configuration onto 3 additional, separate servers.

Number   Type                             Specs

3        ZooKeeper configuration server   minimum:
                                            • CPU: 1 core
                                            • RAM: 2GB
                                            • Disk: 50GB

Currently, the delivery of a fully populated CLAIMS Direct index requires the above Solr and ZooKeeper configuration (8 Solr servers + 3 ZooKeepers). Load balancers and web servers are required only if CLAIMS Direct Web Services (CDWS) will be installed as well. A customized deployment with select data to index is currently not offered fully populated. With a custom configuration, complete indexing will need to be done at the installation site.

Software Requirements

The CLAIMS Direct Solr installation is a self-contained package suitable for deployment on any Linux server running Java 8. The simple prerequisite tool list follows:

Name   Used By
java   ZooKeeper, Solr and various support tools
wget   Configuration tools (bootstrap-*.sh)
lsof   Start/stop scripts (solrctl/zookeeperctl)

The configuration script setup.sh assumes that each node in the cluster has the same directory structure. For example, if you download to a machine and unpack the archive into the path /cdsolr, the full path to the package will be /cdsolr/alexandria-solr-v2.1.2-distribution. Each node must have the path /cdsolr available for deployment. You are free to choose any mount point or path as long as it is uniform across all nodes in the cluster and the mount point or path for each Solr node has at least 1TB of available disk space.
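
The per-node requirement above (a uniform path with at least 1TB available) can be verified with a small helper run on each node. The following is a minimal sketch, not part of the CLAIMS Direct distribution: the function names available_kib and path_has_space are hypothetical, /cdsolr is only the example path from the text, and 1TB is assumed to mean 1024^3 KiB.

```shell
# Print the space available (in KiB) on the filesystem that holds path $1.
# Uses the POSIX output format of df, where field 4 is "Available".
# NOTE: hypothetical helpers, not shipped with CLAIMS Direct.
available_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 when $1 is an existing directory with at least $2 KiB available.
path_has_space() {
  [ -d "$1" ] || return 1
  avail="$(available_kib "$1")"
  [ "$avail" -ge "$2" ]
}

# 1TB expressed in KiB (assumption: 1TB is read as 1024^3 KiB).
ONE_TB_KIB=1073741824
```

On each node, a call such as `path_has_space /cdsolr "$ONE_TB_KIB" || echo "/cdsolr is missing or too small" >&2` reports nodes that do not meet the requirement.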

Processing Server Requirements

Hardware Requirements

Requirement        Recommended
CPU                2 cores
System Memory      8GB
Storage Capacity   500GB (100GB SSD for fast temporary processing space)

Software Requirements

Requirement        Minimum Version                Notes
Operating System   RHEL/Rocky 8, Amazon Linux 2   We do not support Ubuntu or any operating system not explicitly listed.
IFI CLAIMS Repository
Amazon Linux 2
yum -y install \
 https://repo.ificlaims.com/ifi-claims-direct/amzn2/x86_64/ifi-claims-direct-1.0-1.amzn2.x86_64.rpm

RHEL/Rocky 8
yum -y install \
https://repo.ificlaims.com/ifi-claims-direct/rocky/8/x86_64/ifi-claims-direct-1.0-1.el8.x86_64.rpm

Web Server Requirements

Hardware Requirements

Requirement      Recommended
CPU              2 cores
System Memory    4GB
System Storage   100GB

Software Requirements

Requirement             Recommended            Notes

Apache httpd            Distribution version   yum -y install httpd

Perl Modules            Distribution version   # overkill, but saves an incredible amount of time
                                               yum -y install \
                                                 perl-open \
                                                 'perl-Catalyst*' \
                                                 perl-Module-Install \
                                                 perl-DBD-Pg \
                                                 perl-XML-LibXML \
                                                 perl-XML-LibXSLT \
                                                 perl-CPAN

CLAIMS Direct Library   Latest Version         Contact support@ificlaims.com for a link to the latest version.
CLAIMS Direct CDWS      Latest Version         Contact support@ificlaims.com for a link to the latest version.

Logging

The logging configuration file is located in the same place as the distributed alexandria.xml, e.g.,

/usr/share/perl5/vendor_perl/auto/share/dist/Alexandria-Library/alexandria-log.conf

If you want to customize logging, copy the distribution alexandria-log.conf file to /etc.

cp /usr/share/perl5/vendor_perl/auto/share/dist/Alexandria-Library/alexandria-log.conf /etc

Modify as desired.

If you make no changes, default logging is output to /tmp/alexandria.log.
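
If the copy step is scripted, it may be worth guarding against overwriting a configuration that has already been customized. The following is a minimal sketch under that assumption: the function name install_log_conf is hypothetical and is not part of the CLAIMS Direct tooling.

```shell
# Copy a logging configuration file ($1) into a directory ($2),
# but leave any existing copy in that directory untouched.
# NOTE: hypothetical helper, not shipped with CLAIMS Direct.
install_log_conf() {
  src="$1"
  dest="$2/$(basename "$src")"
  if [ -e "$dest" ]; then
    echo "keeping existing $dest"
  else
    cp "$src" "$dest" && echo "copied $src to $dest"
  fi
}
```

With sufficient privileges to write to /etc, it would be called as `install_log_conf /usr/share/perl5/vendor_perl/auto/share/dist/Alexandria-Library/alexandria-log.conf /etc`.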

For more information about how the alexandria tools log, see:

Credentials

There are two sets of credentials:

  1. --IFIuser/--IFIpassword passed to apgupd – issued by IFI CLAIMS
  2. --PGSuser/--PGSpassword used to connect to PostgreSQL – created during the PostgreSQL database installation

apgupd requires the IFIuser/IFIpassword credentials, e.g.,

apgupd --user=IFIuser --password=IFIpassword

The connection string to PostgreSQL is configurable in the main configuration file, alexandria.xml. You can find that configuration file using acfg, e.g.,

$ acfg

Using configuration from: /etc/alexandria.xml

Configured Databases:

  • alexandria: [alexandria; 127.0.0.1; 5432]
  • alexandria-dummy: [alexandria; 127.0.0.1; 5432]

 Configured Indices:

If you used a different user to create and load the alexandria database, you need to modify the database entry in the file pointed to by:

Using configuration from: /etc/alexandria.xml

<database name="alexandria" host="127.0.0.1" port="5432" user="alexandria" password="alexandria">
  <atts pg_errorlevel="0" AutoCommit="1" RaiseError="1" PrintError="0" LongTruncOk="0" LongReadLen="10485760" />
</database>

If the defaults are incorrect, change the @user and @password attributes to the correct values.
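
After editing the entry, the configured values can be read back and tried against the server. This is a minimal sketch, not part of the CLAIMS Direct tooling: the function names db_attr and check_db_login are hypothetical, the xmllint and psql clients are assumed to be installed, and the PostgreSQL database itself is assumed to be named like the entry (here "alexandria").

```shell
# Print one attribute of a <database> entry in an alexandria.xml-style file.
#   $1 = configuration file, $2 = entry name, $3 = attribute (host, port, user, password)
# NOTE: hypothetical helpers, not shipped with CLAIMS Direct.
db_attr() {
  xmllint --xpath "string(//database[@name='$2']/@$3)" "$1"
}

# Run a trivial query using the settings of entry $2 in file $1.
# Assumption: the PostgreSQL database has the same name as the entry.
check_db_login() {
  conf="$1"
  name="$2"
  PGPASSWORD="$(db_attr "$conf" "$name" password)" \
    psql -h "$(db_attr "$conf" "$name" host)" \
         -p "$(db_attr "$conf" "$name" port)" \
         -U "$(db_attr "$conf" "$name" user)" \
         -d "$name" -c 'SELECT 1' >/dev/null
}
```

A call such as `check_db_login /etc/alexandria.xml alexandria && echo "login OK"` would then confirm that the user and password in the file are accepted by the server.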