At IFI CLAIMS, we are very excited about the release of the Google Patents Public Datasets on Google’s BigQuery. We are proud to be the source of the core, public data table which provides global bibliographic data and US full text data free of charge. It is our hope that researchers around the world will use this data table to make new patent related discoveries. We are also announcing the availability of IFI patent data enrichments on BigQuery as a paid table.
Modest processing fees from Google BigQuery will apply when querying the data, but the first 1TB of data processed per month is free.
The Free Public Patent Data Table
The Google Patents Public Data table on BigQuery is different from traditional patent search systems, including Google Patents. The two main differences are:
- The ability to access the very large patent database using SQL commands instead of Boolean search.
- The ability to join to other databases – either public (free), paid or private.
The main features of the Public Patent Table are:
- Quarterly Updates
- Basic Bibliographic Data for Global Patents
- Full Text for US Patents and Applications
- No IFI Value Added Data
- No IFI full text translations
IFI CLAIMS® Data Enrichments available as a Paid Table on Big Query
IFI CLAIMS is also announcing that it is offering IFI Data Enrichments as a paid table. This table contains IFI's standardized assignee/applicant names, current assignee/applicant names and legal status information. BigQuery users can link this data to other public and private datasets.
SQL Query Examples on the Patents Public Data
To get started with Google BigQuery, follow these instructions: https://cloud.google.com/bigquery/quickstart-web-ui. Note, the examples below are based on using Standard SQL.
1. Publication Numbers and Application Data
This simple example shows application information for the documents in a patent family:SELECT publication_number, application_number, country_code, kind_code, application_kind, application_number_formatted, publication_date, filing_date, priority_date FROM `patents-public-data.patents.publications` WHERE family_id = '46149598'
The result is:
Row | publication_number | application_number | country_code | kind_code | application_kind | application_number_formatted | publication_date | filing_date | priority_date |
---|---|---|---|---|---|---|---|---|---|
1 | JP-2015509925-A | JP-2014555162-A | JP | A | A | JP2014555162A | 20150402 | 20130128 | 20120206 |
2 | CN-104080435-A | CN-201380008172-A | CN | A | A | CN 201380008172 | 20141001 | 20130128 | 20120206 |
3 | WO-2013117449-A1 | EP-2013051556-W | WO | A1 | W | PCT/EP2013/051556 | 20130815 | 20130128 | 20120206 |
4 | US-2015010601-A1 | US-201314376687-A | US | A1 | A | US14376687 | 20150108 | 20130128 | 20120206 |
5 | EP-2811969-A1 | EP-13703345-A | EP | A1 | A | EP20130703345 | 20141217 | 20130128 | 20120206 |
6 | FR-2986422-A1 | FR-1251073-A | FR | A1 | A | FR1251073A | 20130809 | 20120206 | 20120206 |
7 | FR-2986422-B1 | FR-1251073-A | FR | B1 | A | FR1251073A | 20140221 | 20120206 | 20120206 |
2. Dealing with Array Data - Inventor Example
In BigQuery, fields with multiple values 'arrays'. For example, a patent document may contain multiple inventors. The list of inventor names is stored in an array. This query will not work:
SELECT publication_number, inventor_harmonized.name FROM `patents-public-data.patents.publications` WHERE publication_number IN ('US-8543680-B2', 'US-8429119-B2') ORDER BY publication_number
You will receive the error: Cannot access field name on a value with type ARRAY<STRUCT<name STRING, country_code STRING>> at [1:48]
The query below shows how to use the "UNNEST" feature for array data:SELECT p.publication_number, i.name FROM `patents-public-data.patents.publications` AS p, UNNEST(inventor_harmonized) AS i WHERE publication_number IN ('US-8543680-B2', 'US-8429119-B2') ORDER BY publication_numberThe result is:
Row | publication_number | name |
---|---|---|
1 | US-8429119-B2 | GORTI SREENIVASA |
2 | US-8429119-B2 | DAILEY ROBERT |
3 | US-8429119-B2 | MUKERJI ANKUR |
4 | US-8429119-B2 | THOMAS SCOTT |
5 | US-8543680-B2 | BAVISHI PANKAJ S |
6 | US-8543680-B2 | NADGOWDA SHRIPAD J |
7 | US-8543680-B2 | PATIL SANDEEP R |
8 | US-8543680-B2 | SHAH DHAVAL K |
3. Dates and Counts
Dates are in the format YYYYMMDD. To report by year, you must divide by 10000. See the example below, which shows how many US records there are by year:SELECT FLOOR(patent.filing_date / 10000) AS year, COUNT(patent.publication_number) AS count FROM `patents-public-data.patents.publications` AS patent WHERE patent.country_code = "US" GROUP BY year ORDER BY year DESC;The first ten rows of the result are:
Row | year | count |
---|---|---|
1 | 2017.0 | 50248 |
2 | 2016.0 | 271737 |
3 | 2015.0 | 481289 |
4 | 2014.0 | 591171 |
5 | 2013.0 | 635164 |
6 | 2012.0 | 625414 |
7 | 2011.0 | 590193 |
8 | 2010.0 | 550806 |
9 | 2009.0 | 526088 |
10 | 2008.0 | 556206 |
4. One Patent Per Family
The query below selects one patent per family. It select the documents with the earliest filing date.SELECT ARRAY_AGG((p.publication_number, p.filing_date) ORDER BY CASE WHEN p.publication_date > 0 THEN p.filing_date ELSE 99999999 END ASC)[OFFSET(0)], p.family_id FROM `patents-public-data.patents.publications` AS p WHERE (SELECT MAX(TRUE) FROM UNNEST(assignee_harmonized) AS a WHERE a.name = "CROSSBAR INC") GROUP BY p.family_id;
The result is 129 rows. The first 10 rows are shown below. A search for all "CROSSBAR INC." documents returns 362 documents.
Row | f0_._field_1 | f0_._field_2 | family_id |
---|---|---|---|
1 | US-9118007-B2 | 20130816 | 51526436 |
2 | US-8467227-B1 | 20111104 | 48578169 |
3 | US-2016351625-A1 | 20150529 | 57399047 |
4 | US-8971088-B1 | 20120322 | 52575106 |
5 | US-2011305064-A1 | 20100611 | 45096119 |
6 | US-8411485-B2 | 20100614 | 45096120 |
7 | US-2011305066-A1 | 20100614 | 45096121 |
8 | US-2016343937-A1 | 20160519 | 57325202 |
9 | CN-205992530-U | 20160520 | 58102024 |
10 | US-9087576-B1 | 20120329 | 53540197 |
5. Citations
5.1 Simple Patent Citations
SELECT p.publication_number AS Pub, c.publication_number AS Cited, c.Category AS Cat
FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c
WHERE p.publication_number IN ('US-8616463-B2',
'US-8619792-B1',
'EP-2677384-A1',
'US-2013333342-A1',
'US-2013338856-A1',
'JP-2013544697-A',
'JP-2013544696-A',
'US-2013338825-A1',
'US-2013338868-A1',
'US-2013338854-A1',
'US-2013334892-A1',
'JP-2013544695-A',
'WO-2013188383-A2')
Result (first 10 rows out of 116)
Row | Pub | Cited | Cat |
---|---|---|---|
1 | US-8616463-B2 | US-3858875-A | APP |
2 | US-8616463-B2 | US-4169334-A | APP |
3 | US-8616463-B2 | US-4537577-A | APP |
4 | US-8616463-B2 | US-4575354-A | APP |
5 | US-8616463-B2 | US-5232154-A | SEA |
6 | US-8616463-B2 | US-5660595-A | APP |
7 | US-8616463-B2 | US-6328500-B1 | SEA |
8 | US-8616463-B2 | US-2007037480-A1 | APP |
9 | US-8616463-B2 | US-2007259600-A1 | APP |
10 | US-8616463-B2 | US-2010081356-A1 | APP |
5.2 Count of Patent Citations
SELECT p.publication_number AS Pub, COUNT(c.publication_number) AS Citations
FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c
WHERE p.publication_number IN ('US-8616463-B2',
'US-8619792-B1',
'EP-2677384-A1',
'US-2013333342-A1',
'US-2013338856-A1',
'JP-2013544697-A',
'JP-2013544696-A',
'US-2013338825-A1',
'US-2013338868-A1',
'US-2013338854-A1',
'US-2013334892-A1',
'JP-2013544695-A',
'WO-2013188383-A2')
GROUP BY p.publication_number
Result (first 10 rows):
Row | Pub | Citations |
---|---|---|
1 | US-2013338854-A1 | 1 |
2 | WO-2013188383-A2 | 2 |
3 | US-2013334892-A1 | 2 |
4 | US-2013338856-A1 | 4 |
5 | JP-2013544695-A | 6 |
6 | EP-2677384-A1 | 6 |
7 | JP-2013544697-A | 9 |
8 | JP-2013544696-A | 10 |
9 | US-8616463-B2 | 11 |
10 | US-2013338868-A1 | 14 |
5.3 Count of Forward Patent Citations
FROM `patents-public-data.patents.publications` AS p, UNNEST(citation) AS c
WHERE c.publication_number IN ('US-8616463-B2',
'US-8619792-B1',
'EP-2677384-A1',
'US-2013333342-A1',
'US-2013338856-A1',
'JP-2013544697-A',
'JP-2013544696-A',
'US-2013338825-A1',
'US-2013338868-A1',
'US-2013338854-A1',
'US-2013334892-A1',
'JP-2013544695-A',
'WO-2013188383-A2')
GROUP BY c.publication_number
Result:
Row | Pub | CitedBy |
---|---|---|
1 | JP-2013544697-A | 1 |
2 | US-2013338825-A1 | 2 |
3 | US-8616463-B2 | 2 |
4 | JP-2013544696-A | 2 |
5 | US-2013338854-A1 | 3 |
6 | JP-2013544695-A | 3 |
7 | US-2013333342-A1 | 3 |
8 | US-2013338856-A1 | 9 |
9 | US-2013338868-A1 | 15 |
10 | US-2013334892-A1 | 82 |
6. JOIN patents with USPTO PTAB Data
In the example below, we JOIN the "patents.publication" patent data table and the "uspto_ptab.trails" table. Patent number formats can be a challenge. In the ptab.trials table, the "PatentNumber" field is in number format, for example "7499872". This will not join to the "patents.publication" table where the publication_number field needs to include country and kind codes - for example "US-7499872-B1". Each dataset has a "match" table to translate the number formats. In the example below, the "ptab.match" table is used for the join.
SELECT patents.publication_number,
patents.family_id AS Family,
ptab.PatentOwnerName AS PatentOwner,
ptab.PetitionerPartyName AS Petitioner,
ptab.TrialNumber AS Trial,
ptab.ProsecutionStatus AS Status,
ptab_match.application_number
FROM `patents-public-data.uspto_ptab.trials` AS ptab
JOIN `patents-public-data.uspto_ptab.match` AS ptab_match
ON ptab.ApplicationNumber = ptab_match.ApplicationNumber
JOIN `patents-public-data.patents.publications` AS patents
ON ptab_match.application_number = patents.application_number
WHERE ptab.PetitionerPartyName = 'Mylan Pharmaceuticals Inc.'
The result shows patent information (patent number and family) and PTAB Trial and Case Status information for PTAB cases petitioned by Mylan Pharmaceuticals. Note that the "publication_number" field may be used as a key to link patent related databases.
The result (first 10 records out of 366:
Row | publication_number | Family | PatentOwner | Petitioner | Trial | Status | application_number |
---|---|---|---|---|---|---|---|
1 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
2 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
3 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
4 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
5 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
6 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
7 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
8 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
9 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
10 | US-8685930-B2 | 34421506 | Allergan, Inc. | Mylan Pharmaceuticals Inc. | IPR2016-01127 | Instituted | US-201313961828-A |
There are many interesting public tables to JOIN with - for example, USPTO PAIR data. You can also JOIN public patent data with private data tables.
We hope these simple examples help you get started.