Solr Index - aidxd


Indexing into Solr is controlled by an indexing daemon, aidxd. The daemon probes PostgreSQL for load-ids that are ready to index. This queue is represented by the table reporting.t_client_index_process; see Data Warehouse Design for more information on the structure of this table. When a load has been successfully processed into PostgreSQL, apgupd registers a new, index-ready load-id. aidxd recognizes this as an available load-id and begins indexing it.

aidxd is installed as part of the CLAIMS Direct Client Tools. Please see the Client Tools Installation Instructions for more information about how to install this tool.

If you have chosen to deploy Solr as Type 3, you must specify --core with the value corresponding to your subscription level:

  • Basic: --core=alexandria-standard 
  • Premium: --core=alexandria-premium 
  • Premium-Plus: --core=alexandria-premium-plus 
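
The mapping above can be captured in a small shell helper so the core name is not typed by hand. This is only a sketch: the function name core_for_level is hypothetical and not part of the Client Tools; the level and core names are taken from the list above.

```shell
# Hypothetical helper (not shipped with aidxd): print the --core value
# for a given subscription level, using the mapping listed above.
core_for_level() {
  case "$1" in
    Basic)        echo "alexandria-standard" ;;
    Premium)      echo "alexandria-premium" ;;
    Premium-Plus) echo "alexandria-premium-plus" ;;
    *)
      # Unknown level: report on stderr and signal failure to the caller.
      echo "unknown subscription level: $1" >&2
      return 1
      ;;
  esac
}
```

A Type 3 deployment could then start the daemon with, for example, $INSTALL_BASE/bin/aidxd --core="$(core_for_level Premium)".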


Usage

	aidxd [ Options ... ]
  --nodaemon    don't put process into background
    --once      only process one load-id
  --pidfile=s   specify location of PIDFILE
                  (default=/var/log/alexandria/aidxd.pid)
  --interval=i  n-seconds between probing for new loads
  --tmp=dir     specify temporary file storage (default=/tmp)
  --clean       remove temporary processing directory
  --batchsize=i maximum number of documents to parallelize
  --nthreads=i  maximum number of processes to parallelize
  --facility=s  logging facility (default=aidxd)
  --help        print this usage and exit
  --------
  --idxversion= 21
  --idxcls=s    Alexandria::DWH::Index::DocumentEx
  --dbfunc=s    specify an alternative extraction function (default=xml.f_patent_document_s)
  --idxexe=s    specify indexing script (default aidx)
    --quiet     suppress output from sub-process
                NOTE: suppressing this output will make it difficult
                      to track down errors originating in --idxexe
  --pgdbname=s   source postgresql instance as defined in /etc/alexandria.xml
  --solrdbname=s base url for index (default=alexandria)
    --core=s     index core (default=alexandria)
  --tolerate     tolerate indexing errors and attempt again
  --autooptimize issue an 'optimize' call to Solr after optinterval
                 continuous load-id(s)
    --optinterval  # of load-id(s) after which an optimize is issued (default=100)
    --optsegments=n optimize down to n-segments (default=16)
  --nostatistics do not gather indexing statistics

Options

Argument | Description | Default Value
--nodaemon, --once | When specified, aidxd will run in the foreground. If --once is given, --nodaemon is implied and only one load-id will be processed. | N/A
--interval | Time (in seconds) between successive indexing queue probes | 10
--tmp | Temporary processing area which holds the transformed XML before being POSTed to Solr | /tmp
--batchsize | Number of documents to extract for indexing | 250
--nthreads | Number of parallel extraction processes. This value should be adjusted depending on available processing power on the PostgreSQL data warehouse server. A rule of thumb would be to set this to the number of cores. | 8
--idxversion | The version of the index | 21
--idxcls | The indexing class used in XML transformation | Alexandria::DWH::Index::DocumentEx
--dbfunc | Specify an alternative extraction function | xml.f_patent_document_s
--pgdbname | Source PostgreSQL instance as defined in /etc/alexandria.xml | alexandria
--solrdbname, --core | Base URL for indexing. If different from the default, it should have an index entry in /etc/alexandria.xml. | alexandria
--tolerate (v2.6-1) | Tolerate a wide variety of errors and re-try failed index | N/A
--autooptimize | DO NOT USE | N/A
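
As an illustration of the --nthreads rule of thumb, the sketch below builds the tuning arguments from a core count. The function name aidxd_tuning_args is hypothetical. The --batchsize and --interval values are simply the documented defaults written out explicitly. Note that the core count should describe the PostgreSQL data warehouse server, which is not necessarily the machine where aidxd runs.

```shell
# Hypothetical helper (not shipped with aidxd): print tuning options with
# --nthreads set to the given core count and the documented defaults for
# --batchsize (250) and --interval (10).
aidxd_tuning_args() {
  case "$1" in
    ''|*[!0-9]*)
      # Empty or non-numeric input is rejected.
      echo "core count must be a positive integer" >&2
      return 1
      ;;
  esac
  if [ "$1" -lt 1 ]; then
    echo "core count must be a positive integer" >&2
    return 1
  fi
  echo "--nthreads=$1 --batchsize=250 --interval=10"
}
```

If aidxd happens to run on the database server itself, the count can come from nproc, for example: $INSTALL_BASE/bin/aidxd $(aidxd_tuning_args "$(nproc)"). Otherwise, pass the server's core count by hand.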

Daemon Execution

Starting

# v2.1: all defaults
$INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx
 
# v2.1: Only process one load-id
$INSTALL_BASE/bin/aidxd --idxversion=21 --idxcls=Alexandria::DWH::Index::DocumentEx --once

Pausing/Resuming/Stopping

# pause the daemon
kill -s USR1 <pid>
 
# resume processing
kill -s USR2 <pid>
 
# stop daemon entirely
kill -s INT <pid>
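
The signals above can be wrapped in a small control helper that looks up the PID for you. This is a sketch under two assumptions: that the PID file sits at the default --pidfile location shown in the usage output, and that the file contains only the PID. The function names aidxd_signal_for and aidxd_ctl are hypothetical.

```shell
# Hypothetical helper: translate an action name into the signal
# documented above (USR1 = pause, USR2 = resume, INT = stop).
aidxd_signal_for() {
  case "$1" in
    pause)  echo "USR1" ;;
    resume) echo "USR2" ;;
    stop)   echo "INT" ;;
    *)
      echo "usage: pause|resume|stop" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: send the signal for an action to the PID recorded
# in the PID file. The second argument overrides the default location.
# Assumption: the PID file contains nothing but the PID.
aidxd_ctl() {
  pidfile="${2:-/var/log/alexandria/aidxd.pid}"
  sig="$(aidxd_signal_for "$1")" || return 1
  if [ ! -r "$pidfile" ]; then
    echo "pidfile not readable: $pidfile" >&2
    return 1
  fi
  kill -s "$sig" "$(cat "$pidfile")"
}
```

For example, aidxd_ctl pause suspends processing and aidxd_ctl resume continues it. If the daemon was started with a custom --pidfile, pass that path as the second argument.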