IFI CLAIMS recently caught up with Cipher’s co-founder and CTO, Steve Harris, who is responsible for the design and implementation of the company’s offerings, in order to learn more about the recent launch of Cipher’s Universal Technology Taxonomy (UTT). The UTT gathers patents across the globe and maps them into a manageable system that comprises nine high-level groupings arranged on top of 114 technology classes—a much easier structure to navigate compared to the more than 300,000 CPC codes from patenting authorities. Our conversation was wide-ranging. We talked about why he chose to focus on a patent startup after coming off a stint in fintech, the inspiration behind Cipher, and why CPC codes really exist. Oh, and don’t get him started on the ambiguity around artificial intelligence and misconceptions around that topic (we apparently did get him started!). To gain insights from an expert practitioner on the front lines of understanding the value within intangible assets, read on for Harris’ views on all of these things, along with why Cipher uses IFI CLAIMS as the patent data bedrock for the deep learning that the company performs. The interview has been edited for clarity and length.
IFI CLAIMS: How did you get into the patent business?
Harris: I was a computer science researcher for a long time and then I did a fintech startup, which was sold to a big financial services firm several years ago. After I sold that company, I was introduced by an investor in my previous company to Nigel Swycher, who started Cipher in 2013. The Cipher story was a good match for what I was looking for in my next business. I was after something that was niche but with lots of potential. I had looked at loads of different startups and lots of founding teams. Either it was too much competition, or it wasn’t that interesting, or they didn’t seem to know enough about what they were doing. At that point, I knew nothing about patents, so I did a bunch of research into what kind of companies file for them and the sums of money involved. I thought there was potential. At that point, there was no sophisticated analysis at all. There were lots of search tools for patents but nothing else. So, it seemed like a good opportunity.
IFI CLAIMS: What was it like to make the leap from fintech to patents?
Harris: I’m an AI guy by background so it’s the sort of thing where you have to turn to technology to solve an economic problem. Fintech, clearly, has lots of money involved. We were working in anti-fraud, which is a big problem. They can afford to pay for the technology, but over time, it has gotten more and more cost effective to apply this level of technology. There is stuff that we’re doing at Cipher in a minute or two that back in my academic days would have cost us millions of dollars and taken months. It’s orders of magnitude more cost effective now than it ever was. Around the time we started Cipher, it was just about plausible to apply the technology to patents. At the time, it was quite a lot of money. Now, it’s a lot more reasonable.
IFI CLAIMS: What does Cipher’s technology do for your clients?
Harris: Cipher’s core competency is providing strategic information. Big companies pay huge amounts of money to maintain their patent portfolios. There was nothing on the market helping them understand whether they were spending that money wisely or not. Our core idea was: let’s give people the data they need to set and improve their patent filing strategy. They have strategic decisions to make, such as: Which technology areas am I going to file into more? Where do I have too many patents that are going to expire? All of these are important questions, and they’re impossible to answer across a big portfolio without a lot of data. So that’s what Cipher does. That’s the reason we exist.
IFI CLAIMS: Can you give an example?
Harris: The main thing Cipher brings to bear from a product perspective is classification. If you manufacture wind turbines, for example, you want to know whether there is a balance of patents in blades versus generators, if you have too many of one or too many of the other. The hard part is figuring out what everyone else has because you want to be more advanced than other companies in your industry. There is no point having four times more blade patents than a competitor because it’s not going to help you strategically. So getting those numbers in a cost effective way is difficult. It’s also difficult to get that information in a consistent manner. So you might run a report that keeps track of what everybody else is doing in order for you to set your strategy. If you’re using data searching and manual review, you just can’t do that because the results are too inconsistent. Across two different patent searchers, you’ll get very different answers, so you can’t really compare the results from one report to the next. But with an algorithm, it will always say the same thing. Given the same input, you will always get the same output. It gives you the ability to track changes in the industry and respond to them, which you can’t do by any other means. So classification is the core of our technology. Typically what we do is start with the client’s internal taxonomy. If they build wind turbines, maybe they’ve got a few key technologies that are important to them: manufacturing the blade, battery backup, battery power distribution. We take that client’s taxonomy and turn it into a set of machine learning classifiers so that they can get a view of what’s going on and answer questions that they might have around their portfolio.
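To make the consistency point concrete, here is a minimal sketch of counting a portfolio against a small taxonomy. It is an editorial illustration, not Cipher's code: the keyword rules in `TAXONOMY` are a hypothetical stand-in for trained machine learning classifiers, and the patent titles are invented. The point is only that a fixed function gives the same counts on every run, so reports stay comparable.

```python
from collections import Counter

# Hypothetical keyword rules standing in for trained classifiers (assumption).
TAXONOMY = {
    "blades": ("blade", "airfoil"),
    "generators": ("generator", "stator"),
}

def classify(text):
    """Return the first taxonomy category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in TAXONOMY.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"

def portfolio_balance(patent_texts):
    """Count patents per category for one portfolio."""
    return Counter(classify(text) for text in patent_texts)

# Invented titles for two portfolios.
ours = [
    "Rotor blade with serrated trailing edge",
    "Airfoil profile for low wind speeds",
    "Blade root attachment",
    "Generator stator cooling",
]
theirs = [
    "Direct-drive generator",
    "Stator winding layout",
    "Blade de-icing system",
    "Tower foundation",
]

print("ours:", dict(portfolio_balance(ours)))
print("theirs:", dict(portfolio_balance(theirs)))
```

Because `classify` has no randomness and no human in the loop, rerunning the report on the same input always reproduces the same balance of blades versus generators.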
But while this classification gives you great insight into what’s going on in the key technologies in your own business, there are undoubtedly adjacent technologies that you’re interested in but for which you’re not going to put effort into defining and training a machine learning classifier. In our wind turbines example, a company like that probably cares a lot about power cables and connectors, but they’re unlikely to have them in their taxonomy. Since they don’t manufacture those parts themselves, they want to find them in their suppliers’ portfolios. In addition, if you want to know what’s going on inside a company like Samsung, for instance, that’s an awfully big taxonomy. They make everything: trains, generators, air conditioners, batteries, everything. And that’s where Cipher’s idea of the UTT came from. What if we could design a radically small taxonomy which covers every patent in the world and allows you to take any portfolio and break it down into the chief technologies so that you can understand companies outside of your industry or companies that overlap? That was the inspiration: to give our clients a way to categorize every patent in the world.
IFI CLAIMS: Tell us more about the Universal Technology Taxonomy and how it works.
Harris: We wanted to make the taxonomy reflect what actually happens in the patent world without any person’s or company’s bias brought to it. Because if you work in a particular field, you have life experience and bias based on the areas and technologies that you work on. But what we were interested in was what was actually happening: which companies are actually filing patents, and for what. So we took every patent granted in the U.S. in the last 10 years and used an automatic categorization algorithm to break them down into about 1,000 technologies. These were technologies discovered by the machine without any human input. Then we made notes on each of those technologies to figure out what they’re about. And then we grouped them together in clusters that were each roughly one percent of the granted patents in the last decade, evenly spread across all industries. We did add a few exceptions though. There are technologies like blockchain and machine learning where the rate of patenting was so high that they were going to be one percent soon, so we added exceptions for those.
Then we brought everything together into the superclasses. These superclasses were arranged by a person deciding which category everything belonged in. Underneath the superclasses are 114 subclasses. We were trying to make the UTT work across every industry and for every company, so it was important that the taxonomy reflected what was really going on in the patent world, rather than someone’s opinion of it. It took us about two years to build.
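The grouping step Harris describes can be sketched roughly as follows. This is an illustration under stated assumptions, not Cipher's actual procedure: it assumes related technologies already sit next to each other in the input list (the human note-taking step), it packs them greedily until a class reaches the target share, and it keeps any named exception as its own class. The function name, technology names, and counts are all hypothetical, and the toy example uses a 25 percent target instead of one percent so the arithmetic is easy to follow.

```python
def group_into_classes(technologies, target_share=0.01, exceptions=frozenset()):
    """Greedily pack (name, patent_count) pairs, in order, into classes.

    A class is closed once it holds at least target_share of all patents.
    Names listed in exceptions always become a class of their own.
    """
    total = sum(count for _, count in technologies)
    threshold = target_share * total
    classes, current, current_count = [], [], 0
    for name, count in technologies:
        if name in exceptions:
            classes.append([name])  # fast-growing topic kept separate
            continue
        current.append(name)
        current_count += count
        if current_count >= threshold:
            classes.append(current)
            current, current_count = [], 0
    if current:
        classes.append(current)  # leftovers form a final, smaller class
    return classes

# Invented machine-discovered technologies with invented patent counts.
technologies = [
    ("rotor blades", 30),
    ("blade coatings", 20),
    ("generators", 40),
    ("power converters", 20),
    ("blockchain", 5),
    ("battery storage", 45),
    ("grid control", 40),
]

classes = group_into_classes(
    technologies, target_share=0.25, exceptions={"blockchain"}
)
for members in classes:
    print(members)
```

With 200 patents in total and a 25 percent target, each regular class closes at 50 patents or more, while "blockchain" is carved out despite holding only 5.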
Consider BlackBerry. People think of BlackBerry as the small phones and devices company, but actually their portfolio was quite diverse. The UTT gives you a way to put every patent inside the BlackBerry portfolio into a category, and then there are all kinds of analyses you can do. You can look at how often each area is cited or how litigious an area can be. Cipher did a study on BlackBerry in order to show what the UTT can do.
IFI CLAIMS: What do your clients find most helpful about the UTT?
Harris: The aspect of being able to analyze portfolios outside of their core interest. In areas of their business where they file a lot of patents, they’re covered by the classifiers that match up to their internal terminology. But there are always patents you have to deal with that fall outside of that. If an assertion comes in from another company where they say, for example, that they have 2,000 patents that they think relate to your business that you might want to license, the UTT gives you an easy way to break those down and say, Well three quarters of these relate to areas where we don’t do business, so we’re not interested in licensing those ones, but let’s talk about the others. Or, if you’re trying to negotiate a cross license with another company in your space, you’ll want to know what portion of their portfolio doesn’t relate to you and what might be beneficial for you to license even if it’s not your core technology.
There are two ways of using the UTT. It gives you a general landscape, or you can apply it to a set of companies.
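The licensing triage described above comes down to simple counting once every patent carries a class label. The sketch below assumes the labels have already been assigned; the class names are invented for illustration and are not actual UTT classes, and `triage_assertion` is a hypothetical helper.

```python
from collections import Counter

def triage_assertion(patent_classes, our_classes):
    """Split an asserted portfolio into relevant and irrelevant shares.

    patent_classes: one taxonomy class label per asserted patent.
    our_classes: set of class labels where our business operates.
    """
    counts = Counter(patent_classes)
    total = sum(counts.values())
    relevant = sum(n for label, n in counts.items() if label in our_classes)
    return {
        "total": total,
        "relevant": relevant,
        "irrelevant_share": (total - relevant) / total if total else 0.0,
        "by_class": dict(counts),
    }

# 2,000 asserted patents with invented class labels.
asserted = ["wind power"] * 500 + ["displays"] * 900 + ["medical imaging"] * 600
result = triage_assertion(asserted, our_classes={"wind power"})
print(result["irrelevant_share"])  # 0.75: three quarters fall outside the business
```

In this made-up case, 1,500 of the 2,000 asserted patents sit in classes where the company does no business, which leaves 500 worth discussing.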
IFI CLAIMS: How do you address the problem that the CPC changes over time?
Harris: We don’t actually make that much use of CPC codes. They are a feature that we consider when we classify, but they don’t really add that much value. There is a big gap between the way CPC codes are defined and the technology of a patent. The CPC codes exist to help the examiners find prior art. The UTT is mostly driven off the text of the patent, particularly the claims. In the machine learning world, there is a thing called embedding, which converts textual information into numbers. What the embedding does is normalize some of these differences. Say, for example, that historically there were two CPC codes, one for a particular kind of polymer coating and another for coatings with a particular kind of aerodynamic effect, and over time the patent office decides the area is important enough to create a single CPC code that catches both. The embedding representation of the new combined code is very similar, if not identical, to that of the two separate codes it replaces. So it deals with all of those changes. You see this all the time. Every year, the EPO announces new CPC codes. And they’re always related to earlier ones in some way. What the embedding does is capture some of those similarities and represent them numerically in a way that the classifier can understand.
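A rough way to picture what an embedding buys you: if the representation is driven by the patent text, a relabeled CPC code changes very little. The sketch below is an editorial illustration only. It uses plain word counts as a crude stand-in for a learned embedding (real embeddings are dense vectors produced by a trained model), and the claim snippets and code names are invented.

```python
import math
from collections import Counter

def embed(text):
    """Crude stand-in for a learned embedding: word counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented claim snippets; the "cpc" values are placeholders, not real codes.
old_patent = {
    "cpc": "OLD-CODE-A",
    "claims": "polymer coating applied to a turbine blade surface",
}
new_patent = {
    "cpc": "NEW-MERGED-CODE",
    "claims": "polymer coating applied to a turbine blade surface for reduced drag",
}
unrelated = {
    "cpc": "OTHER-CODE",
    "claims": "battery cell with a lithium anode",
}

same_topic = cosine(embed(old_patent["claims"]), embed(new_patent["claims"]))
different_topic = cosine(embed(old_patent["claims"]), embed(unrelated["claims"]))
print(same_topic > different_topic)
```

The two coating patents land close together even though their code labels differ, while the battery patent lands far away. A classifier fed such vectors sees the similarity regardless of how the codes were renamed.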
It’s a bit confusing. AI is a broad topic. Within AI, there is a topic called machine learning, and within machine learning there is deep learning. What we do inside Cipher is deep learning. I tend not to use the term AI because it’s technically ambiguous, plus people have all these weird connotations if you talk about artificial intelligence. They imagine it’s kind of the machine thinking for itself. You hear all these horror stories in the papers about robots taking over and replacing mankind. It’s all nonsense, but people have these perfectly reasonable preconceptions if you talk about AI. And it’s not like that at all. It’s just a bunch of math.
IFI CLAIMS: Why do you use IFI’s data?
Harris: We were looking for a company that processed the data quickly, that organized it in a way that was consistent across the different patent offices and that provided all the information we need. We feed the machine with a lot of different information in order to help it make the different classifications. We’ve got the title, abstract, claims, citations, CPC codes and a whole raft of other things we use. When we evaluated the data, IFI was the best organized, the most up to date, the cleanest. I’ve worked in many fields and I think the source data that comes from the patent offices is awful. But I think IFI does the best job of tidying up and normalizing it. It just saves us a huge amount of time. We put a lot of effort into our data processing in order to run Cipher’s back end, and we save months of development effort by sourcing better data. It’s high value.
Cipher analytics software provides clients with the analysis needed to support their strategic IP decisions. The company is a recognized leader in applying machine learning and deep learning to patent, licensing and litigation data to provide insight into complex technology landscapes. By using ML to categorize patents, Cipher also offers sector-specific products with hundreds of pre-built technology categories for instant automatic categorization.