Public Patent Data Now Available on Google BigQuery

  • Added
  • Author:

At IFI CLAIMS, we are very excited about the release of the Google Patents Public Datasets on Google’s BigQuery.  We are proud to be the source of the core, public data table which provides global bibliographic data and US full text data free of charge.  It is our hope that researchers around the world will use this data table to make new patent related discoveries.  We are also announcing the availability of IFI patent data enrichments on BigQuery as a paid table.  
Modest processing fees from Google BigQuery will apply when querying the data, but the first 1TB of data processed per month is free.

The Free Public Patent Data Table

The Google Patents Public Data table on BigQuery is different from traditional patent search systems, including Google Patents.  The two main differences are:

  • The ability to access the very large patent database using SQL commands instead of Boolean search.
  • The ability to join to other databases – either public (free), paid or private.   
 
The Public Patent Data table on BigQuery is not a relational database.  Rather than normalize the patent database into many separate tables, the entire patent database appears to users as one big flat table. The user interface is SQL.  Having everything in one big flat table makes query writing fairly simple and reduces the need for complicated JOIN clauses.  The technology under the covers provides for great efficiency, even for very large data sets.  For more information on the technology behind BigQuery, see this Google Technical White Paper An Inside Look at Google BigQuery.

The main features of the Public Patent Table are:
  • Quarterly Updates
  • Basic Bibliographic Data for Global Patents
  • Full Text for US Patents and Applications
  • No IFI Value Added Data
  • No IFI full text translations
Please explore the public database – you can use the sample queries below to get started.  

IFI CLAIMS® Data Enrichments available as a Paid Table on Big Query

IFI CLAIMS is also announcing that it is offering IFI Data Enrichments as a paid table.  This table contains IFI's standardized assignee/applicant names, current assignee/applicant names and legal status information.  BigQuery users can link this data to other public and private datasets.  

 

SQL Query Examples on the Patents Public Data

To get started with Google BigQuery, follow these instructions: https://cloud.google.com/bigquery/quickstart-web-ui.  Note, the examples below are based on using Standard SQL. 

1.  Publication Numbers and Application Data

This simple example shows application information for the documents in a patent family:
SELECT publication_number, application_number, country_code, kind_code, application_kind,
     application_number_formatted, publication_date, filing_date, priority_date
     FROM `patents-public-data.patents.publications` WHERE family_id = '46149598'

The result is:

Row publication_number application_number country_code kind_code application_kind application_number_formatted publication_date filing_date priority_date
1 JP-2015509925-A JP-2014555162-A JP A A JP2014555162A 20150402 20130128 20120206
2 CN-104080435-A CN-201380008172-A CN A A CN 201380008172 20141001 20130128 20120206
3 WO-2013117449-A1 EP-2013051556-W WO A1 W PCT/EP2013/051556 20130815 20130128 20120206
4 US-2015010601-A1 US-201314376687-A US A1 A US14376687 20150108 20130128 20120206
5 EP-2811969-A1 EP-13703345-A EP A1 A EP20130703345 20141217 20130128 20120206
6 FR-2986422-A1 FR-1251073-A FR A1 A FR1251073A 20130809 20120206 20120206
7 FR-2986422-B1 FR-1251073-A FR B1 A FR1251073A 20140221 20120206 20120206

2.  Dealing with Array Data - Inventor Example


In BigQuery, fields with multiple values 'arrays'. For example, a patent document may contain multiple inventors.  The list of inventor names is stored in an array.  This query will not work:
SELECT publication_number, inventor_harmonized.name
     FROM `patents-public-data.patents.publications`
     WHERE publication_number IN
     ('US-8543680-B2',
     'US-8429119-B2') ORDER BY publication_number

You will receive the error: Cannot access field name on a value with type ARRAY<STRUCT<name STRING, country_code STRING>> at [1:48]

The query below shows how to use the "UNNEST" feature for array data:
SELECT p.publication_number, i.name
     FROM `patents-public-data.patents.publications` AS p, UNNEST(inventor_harmonized) AS i
     WHERE publication_number IN
     ('US-8543680-B2',
     'US-8429119-B2') ORDER BY publication_number
The result is: 
 
Row publication_number name
1 US-8429119-B2 GORTI SREENIVASA
2 US-8429119-B2 DAILEY ROBERT
3 US-8429119-B2 MUKERJI ANKUR
4 US-8429119-B2 THOMAS SCOTT
5 US-8543680-B2 BAVISHI PANKAJ S
6 US-8543680-B2 NADGOWDA SHRIPAD J
7 US-8543680-B2 PATIL SANDEEP R
8 US-8543680-B2 SHAH DHAVAL K

3.  Dates and Counts

Dates are in the format YYYYMMDD.  To report by year, you must divide by 10000.  See the example below, which shows how many US records there are by year:
     SELECT FLOOR(patent.filing_date / 10000) AS year, COUNT(patent.publication_number) AS count
     FROM `patents-public-data.patents.publications` AS patent
     WHERE patent.country_code = "US" GROUP BY year ORDER BY year DESC;
The first ten rows of the result are:
 
Row year count
1 2017.0 50248
2 2016.0 271737
3 2015.0 481289
4 2014.0 591171
5 2013.0 635164
6 2012.0 625414
7 2011.0 590193
8 2010.0 550806
9 2009.0 526088
10 2008.0 556206

4.  One Patent Per Family

The query below selects one patent per family.  It select the documents with the earliest filing date.  
    SELECT
    ARRAY_AGG((p.publication_number, p.filing_date) 
    ORDER BY CASE WHEN p.publication_date > 0 THEN p.filing_date ELSE 99999999 END ASC)[OFFSET(0)], 
    p.family_id
    FROM `patents-public-data.patents.publications` AS p
    WHERE (SELECT MAX(TRUE) FROM UNNEST(assignee_harmonized) AS a WHERE a.name = "CROSSBAR INC")
    GROUP BY p.family_id;

The result is 129 rows.  The first 10 rows are shown below.  A search for all "CROSSBAR INC." documents returns 362 documents.
 

Row f0_._field_1 f0_._field_2 family_id
1 US-9118007-B2 20130816 51526436
2 US-8467227-B1 20111104 48578169
3 US-2016351625-A1 20150529 57399047
4 US-8971088-B1 20120322 52575106
5 US-2011305064-A1 20100611 45096119
6 US-8411485-B2 20100614 45096120
7 US-2011305066-A1 20100614 45096121
8 US-2016343937-A1 20160519 57325202
9 CN-205992530-U 20160520 58102024
10 US-9087576-B1 20120329 53540197

 

5.  Citations

Here are three examples that illustrate how to deal with patent citations.  A patent or application can cite multiple other patents.  So, Citations are contained in a BigQuery array.  You must use "UNNEST" to report on these. 

5.1  Simple Patent Citations

To generate a list of patents that a list of patents cite, use a query like this. 

     SELECT p.publication_number AS Pub, c.publication_number AS Cited, c.Category AS Cat 
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c 
     WHERE p.publication_number IN ('US-8616463-B2', 
          'US-8619792-B1', 
          'EP-2677384-A1', 
          'US-2013333342-A1', 
          'US-2013338856-A1', 
          'JP-2013544697-A', 
          'JP-2013544696-A', 
          'US-2013338825-A1', 
          'US-2013338868-A1', 
          'US-2013338854-A1', 
          'US-2013334892-A1', 
          'JP-2013544695-A',
          'WO-2013188383-A2')

 

Result (first 10 rows out of 116)
 

Row Pub Cited Cat
1 US-8616463-B2 US-3858875-A APP
2 US-8616463-B2 US-4169334-A APP
3 US-8616463-B2 US-4537577-A APP
4 US-8616463-B2 US-4575354-A APP
5 US-8616463-B2 US-5232154-A SEA
6 US-8616463-B2 US-5660595-A APP
7 US-8616463-B2 US-6328500-B1 SEA
8 US-8616463-B2 US-2007037480-A1 APP
9 US-8616463-B2 US-2007259600-A1 APP
10 US-8616463-B2 US-2010081356-A1 APP

5.2  Count of Patent Citations

To count the backward citations, use COUNT.  Note the GROUP BY line at the bottom.

     SELECT p.publication_number AS Pub, COUNT(c.publication_number) AS Citations 
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c 
     WHERE p.publication_number IN ('US-8616463-B2', 
          'US-8619792-B1', 
          'EP-2677384-A1', 
          'US-2013333342-A1', 
          'US-2013338856-A1', 
          'JP-2013544697-A', 
          'JP-2013544696-A', 
          'US-2013338825-A1', 
          'US-2013338868-A1', 
          'US-2013338854-A1', 
          'US-2013334892-A1', 
          'JP-2013544695-A', 
          'WO-2013188383-A2') 
     GROUP BY p.publication_number

Result (first 10 rows):

Row Pub Citations
1 US-2013338854-A1 1
2 WO-2013188383-A2 2
3 US-2013334892-A1 2
4 US-2013338856-A1 4
5 JP-2013544695-A 6
6 EP-2677384-A1 6
7 JP-2013544697-A 9
8 JP-2013544696-A 10
9 US-8616463-B2 11
10 US-2013338868-A1 14

5.3  Count of Forward Patent Citations

To count the forward citations, switch the role of the patent.publication_number and citation.publication_number.
 
     SELECT c.publication_number AS Pub, COUNT(p.publication_number) AS CitedBy 
     FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c 
     WHERE c.publication_number IN ('US-8616463-B2', 
         'US-8619792-B1', 
         'EP-2677384-A1', 
         'US-2013333342-A1', 
         'US-2013338856-A1', 
         'JP-2013544697-A', 
         'JP-2013544696-A', 
         'US-2013338825-A1', 
         'US-2013338868-A1', 
         'US-2013338854-A1', 
         'US-2013334892-A1', 
         'JP-2013544695-A', 
         'WO-2013188383-A2') 
     GROUP BY c.publication_number

Result:

Row Pub CitedBy
1 JP-2013544697-A 1
2 US-2013338825-A1 2
3 US-8616463-B2 2
4 JP-2013544696-A 2
5 US-2013338854-A1 3
6 JP-2013544695-A 3
7 US-2013333342-A1 3
8 US-2013338856-A1 9
9 US-2013338868-A1 15
10 US-2013334892-A1 82
 

6.  JOIN patents with USPTO PTAB Data 

In the example below, we  JOIN the "patents.publication" patent data table and the "uspto_ptab.trails" table.  Patent number formats can be a challenge.  In the ptab.trials table, the "PatentNumber" field is in number format, for example "7499872".  This will not join to the "patents.publication" table where the publication_number field needs to include country and kind codes - for example "US-7499872-B1".  Each dataset has a "match" table to translate the number formats.  In the example below, the "ptab.match" table is used for the join.  

      SELECT patents.publication_number,
      patents.family_id AS Family,
      ptab.PatentOwnerName AS PatentOwner,
      ptab.PetitionerPartyName AS Petitioner,
      ptab.TrialNumber AS Trial,
      ptab.ProsecutionStatus AS Status,
      ptab_match.application_number
      FROM `patents-public-data.uspto_ptab.trials` AS ptab
      JOIN `patents-public-data.uspto_ptab.match` AS ptab_match
          ON ptab.ApplicationNumber = ptab_match.ApplicationNumber
      JOIN `patents-public-data.patents.publications` AS patents
          ON ptab_match.application_number = patents.application_number
      WHERE ptab.PetitionerPartyName = 'Mylan Pharmaceuticals Inc.'

The result shows patent information (patent number and family) and PTAB Trial and Case Status information for PTAB cases petitioned by Mylan Pharmaceuticals. Note that the "publication_number" field may be used as a key to link patent related databases. 

The result (first 10  records out of 366:
 

Row publication_number Family PatentOwner Petitioner Trial Status application_number
1 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
2 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
3 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
4 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
5 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
6 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
7 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
8 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
9 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
10 US-8685930-B2 34421506 Allergan, Inc. Mylan Pharmaceuticals Inc. IPR2016-01127 Instituted US-201313961828-A
 

There are many interesting public tables to JOIN with - for example, USPTO PAIR data.  You can also JOIN public patent data with private data tables. 

We hope these simple examples help you get started.