Technical Blogs

Finding Backward Citations in Patent Data
Backward citations are a primary component of proving inventiveness in new patent applications. These citations reference previous work or prior art that is considered relevant to a current patent application. However, when looking for these citations in patent data, it is important to know that they are not always published in the applications. Here,…
Updating PostgreSQL from 9.x to 10.x
As of June 2020, CLAIMS Direct added support for PostgreSQL 10.x. The changes required to support 10.x unfortunately broke backward compatibility with 9.x. Therefore, to migrate from 9.x to 10.x, one needs to use the new 10.x schema delivered in the package alexandria-schema-tools. This new package, available through the IFI CLAIMS Direct yum repository, also provides tools for quality control (cd-count.sh) as well as tools for bulk extraction and loading (cd-extract.sh and cd-load.sh).…
Reclaiming Disk Space in Your PostgreSQL Database
Introduction: IFI recommends 6 TB of disk space for an on-site PostgreSQL instance. Normally, this will accommodate approximately 3 years of database growth (depending on your subscription level). If you are running out of disk space more quickly than anticipated, the reason may be an increased number of deletes occurring during the update process. This is especially likely to occur when IFI replaces content at a more aggressive rate than usual,…
Leveraging On-Site Citation and Family Functionality
Introduction: With the deployment of the cumulative patch alexandria-sql-patch-alpa-3636-20191101, on-site CLAIMS Direct installations can now use family and citation functionality in-house. What was previously available only through the remote CLAIMS Direct shared API is now possible internally with simple SQL functions. The following post outlines the steps required to prepare the data tables and presents a brief walk-through of the functionality.…
SOLR Indexing Process Explained
Processes: The main executable script used for indexing is aidx, delivered as part of Alexandria::Library. This script is responsible for pulling source data, converting it into SOLR XML, and submitting it via HTTP POST to SOLR for indexing. The conversion process from CLAIMS Direct XML to SOLR XML is handled by the indexer class (default is Alexandria::DWH::Index::Document). Alexandria::Client::Tools also provides an indexing daemon, aidxd, which monitors an index process queue.…
Re-indexing Data from CLAIMS Direct Data Warehouse
Introduction: There are a number of reasons one would need to re-index data from the data warehouse. These range from simply repeating a load-id to a complete re-index of the entire contents of the data warehouse. In this blog, I'm going to go over the mechanisms that move data from the data warehouse to the index and ways in which these mechanisms can be used to trigger partial or full re-indexing of data. Background: In all installations of CLAIMS Direct,…
Sorting Through Data Warehouse Updates
The CLAIMS Direct data warehouse is updated continuously. On average, 1-3% of the entire database is touched weekly. These 1-3 million updates fall into two categories: new documents (roughly 20%) and updated documents (roughly 80%). New documents are, of course, new publications from issuing authorities such as US and EP. Updates include, but are not limited to, CPC changes, family changes (adding family ID), IFI integrated content (patent status, standard names and claims summaries),…
XML Functionality Inside CLAIMS Direct Data Warehouse
Overview: The CLAIMS Direct Web Services (CDWS) offer a variety of entry points into both the data warehouse and SOLR index. These are mid-to-high-level entry points and can satisfy most requirements pertaining to searching and extracting data for a typical search/view application. There are, however, corner cases which may require more intricate extraction of particular information. On the other hand, there may also be situations where massive amounts of data need to be extracted for further,…
Understanding the SOLR Result Set - Sort Parameter
Paging results is cumbersome and inefficient. In this next segment, I'd like to talk about simple and complex sorting. Sorting, used effectively with the rows parameter, can push relevant documents into the first page of results. Generally, you can sort on any indexed field, but you can also utilize query boosting and functions to influence sort order. CLAIMS Direct is configured to return empty fields at the top when asc is the direction and at the bottom when desc is the direction.…
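As a rough illustration of combining the sort and rows parameters, the following minimal Python sketch assembles a SOLR query string. The query text and the field name publication_date are hypothetical placeholders, not fields confirmed by the CLAIMS Direct schema, and the helper build_sorted_query is invented for this example.

```python
from urllib.parse import urlencode


def build_sorted_query(q, sort_clauses, rows=10):
    """Build a URL-encoded SOLR query string with sort and rows set.

    sort_clauses is a list of (field, direction) pairs; direction must be
    "asc" or "desc", matching SOLR's sort syntax.
    """
    for field, direction in sort_clauses:
        if direction not in ("asc", "desc"):
            raise ValueError(f"invalid sort direction for {field}: {direction}")
    # SOLR accepts several sort clauses separated by commas.
    sort = ", ".join(f"{field} {direction}" for field, direction in sort_clauses)
    return urlencode({"q": q, "sort": sort, "rows": rows})


# Hypothetical field name; substitute an indexed field from your own schema.
sorted_query = build_sorted_query(
    "title:battery",
    [("score", "desc"), ("publication_date", "desc")],
    rows=25,
)
print(sorted_query)
```

The resulting string would be appended to the select handler URL of whichever SOLR endpoint is in use. Sorting by score first and a date field second is only one possible ordering.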
Understanding the SOLR Result Set - fl parameter
In this first of a series of blogs about SOLR result sets, I'd like to talk about returned fields, both static and dynamic. Stored Fields: Any field that is stored in SOLR can be returned in the result set.…
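To make the idea of requesting specific stored fields concrete, here is a minimal Python sketch that builds a query string with SOLR's fl parameter. The field names doc_id and title are hypothetical and not taken from the CLAIMS Direct schema; the helper build_fl_query is likewise invented for illustration.

```python
from urllib.parse import urlencode


def build_fl_query(q, fields, include_score=False):
    """Build a URL-encoded SOLR query string that limits returned fields.

    fields is a list of stored field names; include_score appends SOLR's
    built-in "score" pseudo-field to the list.
    """
    if not fields:
        raise ValueError("at least one field is required")
    field_list = list(fields)
    if include_score and "score" not in field_list:
        field_list.append("score")
    # The fl parameter takes a comma-separated list of field names.
    return urlencode({"q": q, "fl": ",".join(field_list)})


# Hypothetical field names; substitute stored fields from your own schema.
fl_query = build_fl_query("title:battery", ["doc_id", "title"], include_score=True)
print(fl_query)
```

Requesting only the fields an application needs keeps responses small, which matters when result sets are large.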