Select the Right Patent Data Source for Your Big Data Project

Have you wondered how emerging technologies such as artificial intelligence and deep learning can be applied to patent data? Sumeet Sandhu, CEO of Elementary IP, has written an informative paper on the possibilities, titled “Deep Learning AI for Patent Search and Analytics”. She also presented the material at the Patent Information User Group (PIUG) 2017 annual conference.

Ms. Sandhu describes how her company uses deep learning and neural networks to search patents semantically and to auto-classify US patents. Instead of using traditional Boolean search to find “the needle in the haystack”, big data methods let you analyze the entire haystack to discover not just needles but also trends and predictions, and to make better decisions based on the data.

For these new techniques to work properly, it's essential to work with accurate patent data. When there are more than 2,000 variations of the company name "IBM" in public records, you need a data provider that offers enhancements such as standardized names. 
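
To make the name-variation problem concrete, here is a minimal sketch of rule-based name cleanup in Python. It is an illustration only, not IFI's standardization method: the suffix list, the alias table, and the function names `normalize_assignee` and `group_by_owner` are all hypothetical, and a curated data source would cover far more cases than a few rules can.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to strip (assumption, not exhaustive).
LEGAL_SUFFIXES = {"INC", "CORP", "CORPORATION", "CO", "COMPANY", "LTD", "LLC"}

# Hypothetical alias table mapping cleaned variants to one standard name.
ALIASES = {
    "INTERNATIONAL BUSINESS MACHINES": "IBM",
    "INTL BUSINESS MACHINES": "IBM",
    "I B M": "IBM",
}


def normalize_assignee(raw: str) -> str:
    """Return a simplified, standardized form of a raw assignee name."""
    # Uppercase, then turn every run of punctuation into a space.
    cleaned = re.sub(r"[^A-Z0-9 ]+", " ", raw.upper())
    tokens = cleaned.split()
    # Drop trailing legal-form suffixes such as "CORP" or "CO LTD".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    # Map known variants to the standard name; otherwise keep the cleaned key.
    return ALIASES.get(key, key)


def group_by_owner(names):
    """Group raw names under their standardized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_assignee(name)].append(name)
    return dict(groups)
```

Calling `group_by_owner` on a list of raw names returns a dictionary keyed by the standardized name, so records filed under "IBM Corp" and "International Business Machines Corp." end up in the same group. Variants that the alias table does not anticipate stay separate, which is why hand-written rules alone rarely scale to thousands of spellings.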

At IFI CLAIMS, we provide data analysts and product developers with a high-quality global patent database backed by more than 30 years of data development, curation, and quality assurance. Our CLAIMS Direct product delivers over 100 million patent records: the whole haystack. Analysts and developers can use the complete database for:

  • Big data analysis, including deep learning
  • Semantic Search using proprietary indexing methods
  • Automated classification using very large training sets
  • Joining patent data to internal document repositories

The features of IFI’s patent database are:
  • Normalized and Standardized Data.  Every record from every authority is delivered in the same XML format (conforms to the same DTD).  Numbers and dates are standardized so that it is simple to link records.
  • Integrated Data.  IFI acquires data from over 44 sources and integrates it so that all relevant data is readily available in each patent document.  National full text, patent family IDs, legal status, reassignments, and national registry data are all combined to create a single record with all the data you need.
  • Value-Added Data.  IFI CLAIMS delivers standardized assignee/applicant names for both the original and current patent owner.  Legal status indicators, calculated expiration dates, and claims tagging are also available.
  • Full text for over 23 countries including US, EP, PCT, CN, JP, KR, IN, DE and FR.  We have recently added Brazil and Taiwan.
  • Original language and English.  Machine-translated English is available for most non-English documents.
  • Document PDFs, Drawing Sheets and Referenced Images available through IFI’s attachment web service. 
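
As a rough illustration of why a uniform record format and standardized numbers and dates make linking simple, the sketch below parses a few records and joins them to an internal document list. Everything in it is an assumption made for the example: the element and attribute names (`patent`, `pub-number`, `family-id`, `date`), the sample values, and the helper functions are invented and do not reflect the actual CLAIMS Direct DTD.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Invented sample data; the structure and numbers are placeholders only.
SAMPLE_XML = """
<records>
  <patent pub-number="US-1234567-B2" family-id="111" date="20150106"><title>Widget</title></patent>
  <patent pub-number="EP-7654321-A1" family-id="111" date="20140312"><title>Widget (EP)</title></patent>
  <patent pub-number="JP-2016000001-A" family-id="222" date="20160107"><title>Gadget</title></patent>
</records>
"""


def parse_records(xml_text):
    """Parse every <patent> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.iter("patent"):
        records.append({
            "pub_number": node.get("pub-number"),
            "family_id": node.get("family-id"),
            # One date format for every authority means one parsing rule.
            "published": datetime.strptime(node.get("date"), "%Y%m%d").date(),
            "title": node.findtext("title"),
        })
    return records


def group_by_family(records):
    """Map each family ID to the publication numbers in that family."""
    families = defaultdict(list)
    for rec in records:
        families[rec["family_id"]].append(rec["pub_number"])
    return dict(families)


def link_internal_docs(records, internal_docs):
    """Join internal documents to patent records by publication number.

    internal_docs maps a document ID to the publication numbers it cites.
    """
    index = {rec["pub_number"]: rec for rec in records}
    return {
        doc_id: [index[num] for num in cited if num in index]
        for doc_id, cited in internal_docs.items()
    }
```

Because every record shares one structure, a single parser handles all authorities, and the join to an internal repository reduces to a dictionary lookup on the publication number.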

The availability of English language full text across all major authorities means that your BigQuery analysis can be made truly global.  Your semantic search can reach China, Korea and Japan.  Classifiers can be trained on EP data and on data from every other country.

IFI’s patent database can be accessed through the cloud using IFI’s CLAIMS Direct web services API.  Alternatively, the entire database can be installed on-site in your data center.  The database is updated daily.  Within your data center, you can access the complete XML patent repository and analyze it using whichever platform and toolset you prefer.

If you would like to talk to us about how IFI’s global patent database can power your data analysis, contact us at