How accurate are AI patent counts? A new tool suggests the standard measure misses most of them

When policymakers, investors, and researchers want to know who is winning the global race in artificial intelligence, they often turn to patent counts. But there’s a problem hiding in plain sight: the tools used to identify which patents actually count as “AI” are surprisingly unreliable. If the ruler is bent, every measurement built on top of it is off.

That’s the starting point for a new NBER working paper by Hanming Fang of the University of Pennsylvania and colleagues Xian Gu (Durham University Business School), Hanyin Yan (Tsinghua University), and Wu Zhu (Tsinghua University). The team built a more accurate classifier for identifying AI patents, applied it to more than 13 million patent records from the United States and China, and used the resulting dataset to document how AI innovation differs between the two countries.

A measurement problem at the foundation

In 2023, the U.S. Patent and Trademark Office released its Artificial Intelligence Patent Dataset, which used a machine learning model to flag AI-related inventions. The dataset has been widely used in economic research. But when the authors examined how well that classifier performed, they found it flagged AI patents with only about 40% precision and 38% recall. In plain terms: fewer than half of the patents it called “AI” actually were, and it missed more than 60% of the real ones.
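To make those two numbers concrete, here is a minimal Python sketch of how precision and recall are computed. The counts are hypothetical, chosen only so the output matches the reported 40% and 38%; they are not taken from the paper.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    # Precision: of the patents flagged as AI, what share really are AI?
    precision = true_pos / (true_pos + false_pos)
    # Recall: of the patents that really are AI, what share got flagged?
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts: 100 true AI patents, of which only 38 are flagged,
# alongside 57 non-AI patents that are flagged by mistake.
precision, recall = precision_recall(true_pos=38, false_pos=57, false_neg=62)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case, 57 of the 95 flagged patents are false alarms and 62 of the 100 real AI patents go undetected, which is the situation the authors describe.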

That kind of noise creates serious problems for any downstream analysis of firm innovation, productivity, or technological competition. So the researchers set out to build a better tool.

Their approach, which they call the FGYZ classifier, fine-tunes a language model called PatentSBERTa that has been pre-trained on patent texts. They trained it on the same hand-labeled examples used to build the USPTO’s original classifier, splitting the data into training and test sets and validating performance across eight AI subfields: machine learning, natural language processing, speech, vision, planning, knowledge processing, hardware, and evolutionary computation.

On the test data, the new classifier hit 97% precision and 91% recall across seven of the eight subfields. (Evolutionary computation, which had only 128 labeled examples to train on, performed poorly and was dropped from later analyses.) The authors then ran two additional validation checks: they showed that patents flagged as AI by their classifier cite, and are cited by, other AI patents far more often than non-AI patents, and that the technical vocabulary of these patents closely matches that of a high-confidence AI benchmark set.
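The citation check can be pictured with a short Python sketch. It assumes a simple measure, the fraction of each patent's citations that point to AI-flagged patents, averaged within the AI and non-AI groups. The patent IDs and citation lists are invented for illustration, and the paper's actual test may be constructed differently.

```python
def ai_citation_share(cited_ids: list[str], ai_ids: set[str]) -> float:
    """Fraction of one patent's citations that point to AI-flagged patents."""
    if not cited_ids:
        return 0.0
    return sum(pid in ai_ids for pid in cited_ids) / len(cited_ids)


def mean_share(group: list[str], citations: dict[str, list[str]], ai_ids: set[str]) -> float:
    """Average AI-citation share across a group of citing patents."""
    shares = [ai_citation_share(citations[p], ai_ids) for p in group]
    return sum(shares) / len(shares)


# Hypothetical data: patents P1-P3 are flagged as AI, the rest are not.
ai_flagged = {"P1", "P2", "P3"}
citations = {
    "P1": ["P2", "P3", "P9"],
    "P2": ["P3", "P1"],
    "P7": ["P8", "P9", "P1", "P6"],
    "P8": ["P9", "P6"],
}

ai_mean = mean_share([p for p in citations if p in ai_flagged], citations, ai_flagged)
non_ai_mean = mean_share([p for p in citations if p not in ai_flagged], citations, ai_flagged)
print(f"AI-flagged patents: {ai_mean:.2f}, non-AI patents: {non_ai_mean:.2f}")
```

A classifier that is picking up a real technological category should produce a clear gap between the two averages, as the toy data does here.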

Extending the tool to China

Because Chinese patent filings include standardized English abstracts and claims, the researchers could apply the same classifier to records from the China National Intellectual Property Administration. To check whether the tool generalized, they looked at cross-border citations and lexical overlap. Chinese patents flagged as AI cited U.S. AI patents far more often than U.S. non-AI patents, and used vocabulary much closer to U.S. AI patents than to non-AI ones. The team took these patterns as evidence that the classifier, though trained on U.S. data, works reasonably well on Chinese filings.
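The article does not say which lexical-overlap measure the authors used. As a rough illustration only, the sketch below uses Jaccard similarity between word sets, a common stand-in for this kind of comparison. The three snippets of text are invented, not drawn from real patents.

```python
import re


def vocabulary(text: str) -> set[str]:
    """Lowercase a text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical abstracts standing in for each group of patents.
us_ai = vocabulary("neural network trained to classify image features")
us_non_ai = vocabulary("hinge assembly for a folding cabinet door")
cn_flagged = vocabulary("convolutional neural network to classify image data")

sim_ai = jaccard(cn_flagged, us_ai)
sim_non_ai = jaccard(cn_flagged, us_non_ai)
print(f"overlap with U.S. AI: {sim_ai:.2f}, with U.S. non-AI: {sim_non_ai:.2f}")
```

If the classifier transfers well, Chinese patents it flags should sit much closer to the U.S. AI vocabulary than to the non-AI one, which is the pattern the researchers report.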

Applying the classifier across both patent systems, they identified roughly 877,000 AI patents granted by the USPTO between 1976 and 2023, and about 652,000 AI patents granted by CNIPA between 2010 and 2023.

Convergence in what, divergence in how

With the cleaner dataset in hand, several patterns emerge. AI patenting grew rapidly in both countries, with acceleration after the mid-2010s. By 2023, AI patents made up roughly 20% of all patents granted in both countries. China surpassed the U.S. in annual AI patent counts starting around 2020. The mix of AI subfields is also broadly similar: planning, vision, and hardware are the three largest categories in both countries.

One notable divergence involves natural language processing. U.S. patenting in NLP grew steadily through the 2010s, while Chinese NLP patenting stayed modest until 2020, when it accelerated sharply, coinciding with the rise of large language models. The authors read this as evidence that the U.S. retains a lead in NLP, even as Chinese activity catches up quickly.

The organizational picture looks quite different across the two countries. In the U.S., AI patents come overwhelmingly from a small group of large multinationals: IBM, Microsoft, Google, Amazon, Samsung, Intel. These firms dominate nearly every subfield. In China, leading private firms like Tencent, Baidu, and Huawei appear at the top of most lists, but universities (Tsinghua, Zhejiang, UESTC) and state-owned enterprises (State Grid) also rank prominently, especially in hardware, planning, and vision.

Geography tells a similar story

The researchers also mapped where AI patents are produced. In the U.S., AI activity remains anchored in early hubs, mainly the San Francisco Bay Area and the Northeast Corridor, with some spillover to Austin and Seattle. Their “diffusion share” measure, which tracks the fraction of new AI patents coming from outside the original top-ten hubs, rose from about 28% in 1981 to around 48% by 2010, then flattened.
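The diffusion share is simple enough to sketch in a few lines of Python. This version assumes the hubs are the most frequent locations in a base period and that each patent carries a single location label; the place names and counts are hypothetical, and the paper's geographic units may differ.

```python
from collections import Counter


def top_hubs(base_period_locations: list[str], n: int = 10) -> set[str]:
    """Return the n most common patent locations in the base period."""
    return {loc for loc, _ in Counter(base_period_locations).most_common(n)}


def diffusion_share(patent_locations: list[str], original_hubs: set[str]) -> float:
    """Fraction of patents filed from outside the original hubs."""
    if not patent_locations:
        return 0.0
    outside = sum(loc not in original_hubs for loc in patent_locations)
    return outside / len(patent_locations)


# Hypothetical base period, using the top 2 hubs instead of 10 to keep it small.
base = ["Bay Area"] * 5 + ["Boston"] * 3 + ["Austin"]
hubs = top_hubs(base, n=2)

# Hypothetical later year: two of five patents come from outside the hubs.
later = ["Bay Area", "Boston", "Austin", "Seattle", "Bay Area"]
share = diffusion_share(later, hubs)
print(f"diffusion share: {share:.0%}")
```

Holding the hub list fixed at its base-period value is what lets the measure track spread over time: a rising share means new locations are gaining ground on the early leaders.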

China started from a much more concentrated base (about 15% of AI patents came from outside the initial hubs in the mid-2000s) but has seen steadier geographic spread, with AI activity moving into provincial capitals further inland. The authors link this pattern to state-led initiatives aimed at distributing research capacity across regions.

Do these patents have economic value?

A long-running debate in the economics of innovation concerns whether China’s rapid patent expansion reflects genuine technological advances or is partly driven by government subsidies and administrative targets. The authors examined this by looking at stock market reactions to patent grants, following a method developed by Kogan and co-authors. The idea is that if a patent creates real economic value, investors should react to its announcement.

Across all AI subfields and in both countries, AI patents were associated with higher market value than non-AI patents. The premium was largest in software- and data-intensive domains like machine learning and NLP, and smaller in hardware-oriented areas. Chinese AI patents showed lower absolute values than U.S. ones (reflecting differences in market size and firm capitalization), but the relative AI premium over non-AI patents held in both countries. The authors interpret this as evidence that, at least for publicly listed firms, Chinese AI patents reflect economically meaningful innovation rather than simply paperwork generated to capture subsidies.

How knowledge flows between universities and firms

The researchers also studied citation patterns across different types of assignees. In the U.S., universities and research institutions mostly cite one another: roughly 90% of citations by U.S. institutional patents go to other institutional patents, for both AI and non-AI. The authors describe this as an “ivory tower” dynamic with limited direct engagement between academia and industry.

The Chinese pattern looks different. Private firms in China cite patents from state-owned enterprises and research institutions more often than patents from other private firms, and the pattern is stronger in AI than in non-AI fields. Academic and corporate sectors also cite each other more frequently. The authors interpret this as evidence that, contrary to the view that non-market Chinese patents are largely administrative, universities and SOEs produce knowledge that private firms actively build on.

Competition without decoupling

Finally, the team looked at whether rising U.S.-China tensions have led to technological separation in AI. Using a citation-propensity measure that adjusts for the rapid growth of both countries’ patent pools, they found the opposite: cross-border citations have generally risen over time in both AI and non-AI fields. The relationship is asymmetric, though. Chinese AI inventors cite U.S. patents more intensively than Chinese non-AI inventors do, while U.S. AI inventors cite Chinese patents less intensively than U.S. non-AI inventors do. The authors read this as “asymmetric cross-border learning”: Chinese AI researchers lean heavily on the U.S. frontier, while U.S. researchers draw on Chinese work more selectively, particularly outside core AI domains.
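Why adjust for pool size at all? A raw count of cross-border citations rises mechanically when the other country simply has more patents available to cite. The article does not give the authors' formula, so the Python sketch below uses one plausible normalization as an assumption: the observed share of citations going abroad, divided by the foreign share of the citable pool. All numbers are hypothetical.

```python
def citation_propensity(
    cites_to_foreign: int, total_cites: int, foreign_pool: int, total_pool: int
) -> float:
    """Observed foreign-citation share relative to the foreign share of the pool.

    A value of 1.0 means foreign patents are cited exactly in proportion to
    their availability; below 1.0 means they are under-cited.
    """
    if total_cites == 0 or foreign_pool == 0 or total_pool == 0:
        return 0.0
    observed = cites_to_foreign / total_cites
    expected = foreign_pool / total_pool
    return observed / expected


# Hypothetical early period: 5% of citations go abroad, foreign pool is 10%.
early = citation_propensity(cites_to_foreign=10, total_cites=200,
                            foreign_pool=1_000, total_pool=10_000)
# Hypothetical later period: 15% of citations go abroad, foreign pool is 20%.
late = citation_propensity(cites_to_foreign=90, total_cites=600,
                           foreign_pool=4_000, total_pool=20_000)
print(f"early={early:.2f}, late={late:.2f}")
```

In this toy case the raw foreign share triples, but the adjusted measure rises only from 0.50 to 0.75, because part of the increase reflects a larger foreign pool. A finding of rising cross-border citation is more convincing when it survives this kind of adjustment.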

Two caveats are worth flagging. First, the classifier, while validated across several dimensions, was trained on U.S.-labeled data, and the authors acknowledge its weakness in evolutionary computation, which they drop from analysis. Second, the market-value estimates apply only to publicly listed firms, leaving out the many smaller and private firms, universities, and SOEs that produce a significant share of patents, especially in China.
