asrch

Usage

asrch is a command-line tool used to search an optional on-site installation of Solr and extract data either in Solr response format or complete CLAIMS Direct XML. It is installed as part of the CLAIMS Direct repository. Please see the Client Tools Installation Instructions for more information about how to install this tool.

asrch [Options ...] query
  --url=s       search URL (excluding /select)
                  (default=http://solr.alexandria.com:8080/alexandria-index/alexandria)
  --raw         output raw Solr XML
  --count       output total documents found
  --maxrows=i   maximum documents to output
                  this argument is ignored when using --table
  --output=file specify output file
  --dtdpublic=pi  Public Identifier for DTD
  --dtdsystem=si  System Identifier for DTD
  Output Options
  --------
  --archive     archive result set documents into predictable path
                directory structure (Alexandria XML only)
  --archiveroot=dir
                root directory to place result set (default=.)
  --wrapper=s   wrap multiple documents in wrapper-named element
                default=patent-documents
  --pretty      indent output
  SOLR Options
  --------
  --solropt=s@  Solr options.
    e.g., --solropt=sort=f1,f2,f3 --solropt=rows=30
    See: http://wiki.apache.org/solr/CommonQueryParameters
  DWH Options
  --------
  --pgdbname     as defined in /etc/alexandria.xml (default=alexandria)
  --dbfunc       extract UDF (default=xml.f_patent_document_s)
  --table=s      If specified, a table of UCIDs/publication_ids is
                 created -- could later be used for indexing
    --truncate  truncate --table if it currently exists
  --help         print this usage and exit

Detailed Description of the Parameters

Connectivity

Parameter    Description
pgdbname     As configured in /etc/alexandria.xml, the database entry pointing to the on-site CLAIMS Direct PostgreSQL instance. The default value is alexandria, as this value is pre-configured in /etc/alexandria.xml.
url          The URL of the CLAIMS Direct Solr instance.

Output Options

The following parameters specify output possibilities.

Parameter    Description
output       Write results to the named file. By default, output goes to stdout.
archive      Archive results in a predictable path structure. See aext.
archiveroot  The root directory of the archive. See aext.
wrapper      Top-level XML element used to wrap multiple documents. The default is patent-documents.
pretty       Indent the output XML.
count        Output only the number of documents found.
maxrows      Maximum number of documents to output. This parameter is ignored when --table is used.
table        If specified, a table of UCIDs/publication_ids is created.
raw          Output the raw Solr response XML instead of CLAIMS Direct XML.
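
As an illustration of how the output options combine, the following is a minimal sketch of a wrapper that writes indented results to a file. The function name asrch_export and the fallback of 10 rows are assumptions made for this example; only the --url, --pretty, --maxrows, and --output flags come from the usage text above.

```shell
# asrch_export URL QUERY FILE [MAXROWS]
# Hypothetical helper (not part of the CLAIMS Direct tools): writes indented
# results for QUERY to FILE, limited to MAXROWS documents.
asrch_export() {
  # ${4:-10} falls back to 10 rows when MAXROWS is omitted; 10 is an
  # arbitrary value chosen for this sketch, not an asrch default.
  asrch --url="$1" --pretty --maxrows="${4:-10}" --output="$3" "$2"
}
```

A call might look like: asrch_export http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria 'loadid:261358' results.xml 5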

Solr Options

Parameter    Description
solropt      Raw Solr query parameters. This parameter can be used multiple times, e.g.,

--solropt='sort=pd desc' --solropt='fq=pnctry:us'
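
Because each Solr parameter needs its own --solropt flag, and values such as 'sort=pd desc' contain spaces, quoting matters when the parameters are assembled in a script. The following is a sketch of one way to do this in POSIX shell; the function name asrch_solr is an assumption made for this example and is not part of the tool.

```shell
# asrch_solr QUERY [SOLR_PARAM ...]
# Hypothetical helper: passes each raw Solr parameter to asrch as its own
# --solropt flag, keeping parameters that contain spaces intact.
asrch_solr() {
  query=$1
  shift
  # The word list of the loop is expanded once, so the positional
  # parameters can be rewritten safely inside the loop body.
  for param in "$@"; do
    set -- "$@" "--solropt=$param"   # append the converted flag
    shift                            # drop the unconverted original
  done
  # Options first, query last, as in the usage line.
  asrch "$@" "$query"
}
```

For example, asrch_solr 'loadid:261358' 'sort=pd desc' 'fq=pnctry:us' runs asrch with the two --solropt flags shown above followed by the query.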

Examples

Search and Count Results

asrch  --count \
       --url=http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria \
'loadid:261358'
-> executing search ...  (found 4613; done in 0.095)
4613
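
To use the count in a script, the progress line has to be separated from the number itself. The sketch below assumes the count is printed on a line of its own, as in the output above; whether the progress line goes to stdout or stderr is not stated here, so both streams are merged and filtered. The function name asrch_count is hypothetical.

```shell
# asrch_count URL QUERY
# Hypothetical helper: prints only the number of documents found.
asrch_count() {
  # Merge stderr into stdout, keep lines consisting solely of digits,
  # and take the last such line as the count.
  asrch --count --url="$1" "$2" 2>&1 | grep -E '^[0-9]+$' | tail -n 1
}
```

The result can then be captured with, for example, total=$(asrch_count http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria 'loadid:261358').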

Output Select Fields in Solr XML

The following example searches Solr and returns the results in XML format.

You can return Solr results in a variety of formats using the query parameter wt. For a detailed list of output format options, see https://cwiki.apache.org/confluence/display/solr/Response+Writers.

asrch  --raw \
       --url=http://SOLR-INSTANCE-URL/alexandria-v2.1/alexandria \
       --solropt='wt=xml' \
       --solropt='fl=ucid,score' \
       --solropt='rows=1' \
       --solropt='shards.info=false' \
'loadid:261358'
-> executing search ... 200 OK
 
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">14</int>
  <lst name="params">
    <str name="q">loadid:261358</str>
    <str name="qt">premium</str>
    <str name="echoParams">all</str>
    <str name="indent">true</str>
    <str name="fl">ucid,score</str>
    <str name="shards.info">false</str>
    <str name="sort">pd desc</str>
    <str name="rows">1</str>
    <str name="wt">xml</str>
  </lst>
</lst>
<result name="response" numFound="4613" start="0" maxScore="9.676081">
  <doc>
    <str name="ucid">JP-2013257331-A</str>
    <float name="score">9.617687</float></doc>
</result>
</response>
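
When only the ucid values are needed from a raw response, a line-based filter can be enough. The following sketch assumes each ucid element sits on its own line, as in the response above. It is not a real XML parser and will miss values if the response is formatted differently. The function name extract_ucids is hypothetical.

```shell
# extract_ucids
# Hypothetical helper: reads Solr response XML on stdin and prints the
# text of each <str name="ucid"> element, one per line.
extract_ucids() {
  # -n suppresses default output; the p flag prints only lines where the
  # substitution matched, so status lines and other elements are dropped.
  sed -n 's/.*<str name="ucid">\([^<]*\)<\/str>.*/\1/p'
}
```

It is meant to be used at the end of a pipeline, for example: asrch --raw --solropt='fl=ucid' ... 'loadid:261358' | extract_ucids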