Can ChatGPT beat the S&P 500? Eight months of daily picks suggest no

Ask a popular chatbot to build you a stock portfolio and it will probably oblige, no license required. That willingness has attracted a growing number of retail investors who now treat large language models as informal financial advisors. But what exactly are these systems recommending, and do the resulting portfolios beat the market?

A new NBER working paper by Bruce I. Carlin of Rice University, Ryan D. Israelsen of Michigan State University, and Christopher F. Wazzan of the University of California, Berkeley attempts to answer those questions by tracking AI-generated portfolios in real time, day after day, for roughly eight months.

The question behind the experiment

Prior studies of AI stock picking have tended to rely on historical data, testing how well a model would have performed if it had been used in the past. That approach has a weakness: the models were trained on data that may include the very period being tested, which can inflate apparent performance. The authors wanted to see how AI behaves when it is making forward-looking calls in real time, knowing nothing about returns that have not yet happened.

They were also interested in the behavior a typical household would encounter, rather than what a sophisticated analyst could coax from a model with careful prompt engineering. The queries they used look like what a curious retail investor might type into a chatbot: build me a portfolio to beat the S&P 500 over the next year.

How the study was built

Starting in August 2025, the researchers began submitting daily prompts to several large language models: OpenAI’s ChatGPT 5.0 and 5.2, Anthropic’s Claude Sonnet 4.5, Google’s Gemini 2.5 Flash, and xAI’s Grok 4.1 Fast. Some prompts asked for a buy-and-hold portfolio. Others asked the model to actively manage a portfolio it had previously recommended, feeding yesterday’s holdings back in each morning.

In pre-tests, the team found that the specific wording of a prompt did not meaningfully change the recommendations. Asking for stocks to beat an index produced results similar to those from asking the model to pretend it was a financial advisor. Each day, the researchers recorded the tickers, weights, justifications, search queries, and websites the models visited. They then paired these with daily open-to-open returns from Compustat and news article counts from RavenPack.

By April 2026, the dataset covered more than 1,200 portfolio-day observations across seven actively managed portfolio variants and five buy-and-hold prompts.

What AI actually picks

The first striking pattern is concentration. ChatGPT 5.0 tended to hold roughly 18 to 19 stocks, but Gemini’s portfolios held a median of just 4.5 names. Across the board, the AI-selected portfolios were heavily tilted toward a small group of large technology companies. ChatGPT 5.0 put 18 to 20 percent of portfolio wealth into Nvidia alone throughout the sample. And as the models actively managed their portfolios, the portfolios tended to become more concentrated over time rather than more diversified.

Industry composition tells the same story. Semiconductors made up an average of 41 percent of AI portfolios, compared with about 20 percent of the S&P 500 and 18 percent of the broader U.S. equity market. Computer hardware was similarly overrepresented. Banks and other financial firms, which make up about 10 percent of the S&P 500, did not even crack the top seven industries in ChatGPT’s picks.

The authors also measured risk-taking. When left unconstrained, the models chose high-beta stocks, meaning stocks that tend to swing more than the market. The average beta of a selected stock was about 1.6. When the researchers explicitly asked one version of ChatGPT to keep portfolio beta between 0.9 and 1.1, it persistently hugged the upper bound and occasionally broke through it.

Selected stocks were also much larger than the typical public firm, had lower book-to-market ratios (a growth tilt), and showed positive recent momentum. Perhaps most telling, they were the subject of about ten times as many news articles as the average Compustat firm.

Why the models pick what they pick

To understand what drives selection, the researchers ran statistical models predicting whether a stock appeared in an AI portfolio and how much weight it received. Size, momentum, high market beta, and low leverage all predicted selection. News coverage mattered as well, though it was tangled with size (large firms are written about more often).

The models’ own search behavior reinforces this picture. When prompted, the AI systems visited an average of 16 websites per query. About two-thirds of those visits were to corporate websites, with heavy traffic toward semiconductor, computer, and data services firms. The researchers interpret this as evidence that media attention and firm prominence play a central role in AI stock selection beyond what traditional financial factors explain.

Do the portfolios actually beat the market?

At first glance, the AI portfolios look good. Cumulative returns ran above the S&P 500 for all seven active portfolios through the sample’s end. Sharpe ratios, a measure of return per unit of risk, were higher than the benchmark across all but one specification.
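
As a rough illustration of these two headline metrics, the Python sketch below compounds daily returns into a cumulative return and computes an annualized Sharpe ratio (mean excess return divided by its standard deviation). The 252-trading-day annualization, the constant daily risk-free rate, and the function names are assumptions for the example; the article does not say exactly how the authors computed their Sharpe ratios, and the return series are hypothetical.

```python
import math
from statistics import mean, stdev


def cumulative_return(daily_returns):
    """Compound simple daily returns: (1 + r1)(1 + r2)... - 1."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def annualized_sharpe(daily_returns, daily_risk_free=0.0, periods_per_year=252):
    """Mean daily excess return over its sample standard deviation, scaled by sqrt(periods)."""
    excess = [r - daily_risk_free for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


# Hypothetical daily returns for a portfolio and a benchmark (not study data).
portfolio = [0.012, -0.008, 0.015, -0.004, 0.009]
benchmark = [0.006, -0.003, 0.007, -0.002, 0.004]
for name, series in [("portfolio", portfolio), ("benchmark", benchmark)]:
    print(name, round(cumulative_return(series), 4), round(annualized_sharpe(series), 2))
```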

But the authors argue that this apparent outperformance mostly reflects the portfolios’ tilts rather than genuine stock-picking skill. To separate the two, they used a technique from a 1997 paper by Daniel, Grinblatt, Titman, and Wermers, which compares each stock’s return against a benchmark of peer stocks matched on size, book-to-market, and momentum. If an AI simply loaded up on large growth stocks during a period when large growth stocks did well, the characteristic-matched benchmark will also have done well, and the apparent alpha will disappear.
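
The logic of this peer-group adjustment can be sketched in a few lines of Python. This is a simplified version under stated assumptions: each stock's size, book-to-market, and momentum bucket is taken as given, and the benchmark is an equal-weighted average of all stocks in the same bucket, whereas the published procedure builds value-weighted benchmark portfolios from sorts on the three characteristics. The tickers, buckets, and returns (in percent) are hypothetical, as are the function names.

```python
from collections import defaultdict
from statistics import mean


def characteristic_adjusted_returns(returns, buckets):
    """
    returns: {ticker: one-period return} for the whole stock universe
    buckets: {ticker: characteristic bucket}, e.g. (size, book-to-market, momentum)
    Returns {ticker: own return minus the equal-weighted mean return of its bucket}.
    """
    members = defaultdict(list)
    for ticker, r in returns.items():
        members[buckets[ticker]].append(r)
    benchmark = {bucket: mean(rs) for bucket, rs in members.items()}
    return {t: r - benchmark[buckets[t]] for t, r in returns.items()}


def weighted_return(weights, returns):
    """Portfolio return given {ticker: weight} and {ticker: return}."""
    return sum(w * returns[t] for t, w in weights.items())


# Hypothetical universe (returns in percent): two large growth winners, two small value losers.
returns = {"AAA": 3.0, "BBB": 1.0, "CCC": 0.0, "DDD": -2.0}
buckets = {"AAA": ("large", "growth", "winner"), "BBB": ("large", "growth", "winner"),
           "CCC": ("small", "value", "loser"), "DDD": ("small", "value", "loser")}
ai_portfolio = {"AAA": 0.5, "BBB": 0.5}

adjusted = characteristic_adjusted_returns(returns, buckets)
raw = weighted_return(ai_portfolio, returns)
alpha = weighted_return(ai_portfolio, adjusted)
print(f"raw {raw:.2f}%, adjusted {alpha:.2f}%")  # prints: raw 2.00%, adjusted 0.00%
```

In this toy example the portfolio earns 2 percent, but so does the average stock with the same characteristics, so the characteristic-adjusted return is zero: the gain came from the tilt, not from picking the right stocks within it.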

That is largely what happened. For the buy-and-hold portfolios, excess returns over the S&P 500 were mostly statistically insignificant across horizons from one day to six months. The sole significant result, at the six-month horizon, shrank and lost statistical significance after the peer-group adjustment. For the actively managed portfolios, daily adjusted returns were positive on average but not statistically distinguishable from zero, with one borderline exception (Gemini) based on only 18 trading days.
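
Whether an average adjusted return is statistically distinguishable from zero is typically judged with a t-statistic: the sample mean divided by its standard error. The Python sketch below assumes independent daily observations and a plain sample standard deviation; the helper names are hypothetical. It also shows why a short window such as 18 trading days makes significance hard to reach, since the standard error shrinks only with the square root of the number of days.

```python
import math
from statistics import mean, stdev


def t_statistic(daily_adjusted_returns):
    """Mean divided by its standard error, stdev / sqrt(n); assumes independent days."""
    n = len(daily_adjusted_returns)
    if n < 2:
        raise ValueError("need at least two observations")
    return mean(daily_adjusted_returns) / (stdev(daily_adjusted_returns) / math.sqrt(n))


def t_from_summary(mean_return, sd_return, n_days):
    """Same statistic computed from a mean, a standard deviation, and a sample size."""
    return mean_return / (sd_return / math.sqrt(n_days))


# Illustrative inputs: a 0.1% average daily edge with 2% daily volatility.
for n in (18, 180, 1800):
    print(n, round(t_from_summary(0.001, 0.02, n), 2))
```

With those made-up inputs, the t-statistic stays well below the conventional threshold of roughly 2 at 18 or 180 days and only clears it at around 1,800 days.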

Caveats and implications

The authors flag several limits. The sample for Claude, Gemini, and Grok is short, covering roughly a month of unusually strong market performance. The buy-and-hold return series has overlapping windows, which the researchers address with a specialized standard-error estimator. And the study examines how AI responds to queries a typical household might ask, not what a professional could extract with refined prompts.
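
Overlapping windows matter because consecutive multi-month returns share most of their days, so they are strongly correlated, and an ordinary standard error overstates how much independent information the sample contains. The article does not name the estimator the authors used. One widely used option for this problem is the Newey-West standard error, sketched below in Python for the mean of a return series; the function name, the lag choice, and the daily returns are assumptions for illustration.

```python
import math


def newey_west_se_of_mean(x, lags):
    """
    Newey-West (Bartlett-kernel) standard error of the sample mean of x.
    With lags=0 this reduces to sqrt(variance / n), using the 1/n variance convention.
    """
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    long_run_var = sum(e * e for e in d) / n  # lag-0 autocovariance
    for k in range(1, lags + 1):
        gamma_k = sum(d[t] * d[t - k] for t in range(k, n)) / n  # lag-k autocovariance
        long_run_var += 2.0 * (1.0 - k / (lags + 1)) * gamma_k  # Bartlett weight
    return math.sqrt(long_run_var / n)


# Hypothetical daily returns turned into overlapping 3-day returns (summed for simplicity).
daily = [0.01, -0.02, 0.015, 0.005, -0.01, 0.02, -0.005, 0.01, 0.0, -0.015]
overlapping = [sum(daily[i:i + 3]) for i in range(len(daily) - 2)]
print(newey_west_se_of_mean(overlapping, lags=0), newey_west_se_of_mean(overlapping, lags=2))
```

A common rule of thumb for overlapping h-day returns is to include at least h - 1 lags, since that is how many neighboring observations share days with each one.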

For retail investors, the descriptive findings may matter more than the performance tests. Anyone asking a chatbot for a portfolio is likely to receive a concentrated set of large, high-beta technology stocks heavily tilted toward whichever firms dominate recent news coverage. That kind of portfolio exposes a household to substantial idiosyncratic risk, leaves entire sectors of the economy unrepresented, and, based on the authors’ analysis, does not appear to deliver returns beyond what comparable stocks would produce anyway.

Carlin and his colleagues write that their findings suggest “more oversight is needed to assure that people do not mis-use this powerful source of information and experience welfare losses.” They also note that AI providers could respond by offering formal investment services with appropriate guardrails. The project is ongoing, and the authors say they will continue adding data and revising conclusions as the sample grows.
