Imagine you manage money for a small fund and want to spread your investment across a thousand stocks in a way that keeps the overall ride as smooth as possible. To do that, you need to estimate how every stock moves in relation to every other stock. The problem is that this web of relationships shifts constantly, and the math you rely on tends to break down when you have many stocks but only a limited stretch of history to learn from.
A team of researchers set out to investigate whether a neural network could handle this estimation task better than the standard statistical tools. Their work appears in The Journal of Finance and Data Science, and it offers evidence that a network designed around the underlying math of portfolio theory can produce portfolios with lower realized risk and steadier returns than several widely used methods.
The estimation problem at the heart of portfolio building
Since Harry Markowitz introduced modern portfolio theory in 1952, investors have tried to combine assets so that their ups and downs partly cancel out. A common goal is the global minimum-variance portfolio, which is the mix of stocks expected to swing the least. Building it requires a covariance matrix, a large grid that records how each pair of stocks tends to move together.
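For readers who want to see the calculation, the minimum-variance weights have a well-known closed form: solve the linear system "covariance matrix times x equals a vector of ones," then rescale x so that it sums to one. The sketch below is a minimal illustration in plain Python, not the researchers' code. The function names and the three-stock covariance matrix are invented for the example.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def min_variance_weights(cov):
    """Global minimum-variance weights: solve cov @ x = 1, then normalize."""
    x = solve_linear_system(cov, [1.0] * len(cov))
    total = sum(x)
    return [v / total for v in x]


def portfolio_variance(weights, cov):
    """Compute the quadratic form w' cov w."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))


# Hypothetical covariance matrix for three stocks (illustrative numbers only).
example_cov = [
    [0.040, 0.006, 0.004],
    [0.006, 0.090, 0.010],
    [0.004, 0.010, 0.160],
]
weights = min_variance_weights(example_cov)
print([round(w, 3) for w in weights])
print(round(portfolio_variance(weights, example_cov), 5))
```

The resulting portfolio variance can never exceed that of the least volatile single stock, because holding only that stock is one of the mixes the calculation is allowed to choose.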
Here is where things get difficult. When you estimate this grid from past returns, small amounts of random noise contaminate it, and the contamination grows worse as the number of stocks approaches the number of days in your sample. The researchers explain that this noise inflates the actual risk of the resulting portfolio, sometimes by a wide margin. Stretching your sample over more years does not solve the issue, because markets change over time, so old data describes a market that no longer exists.
To clean up the noise, statisticians adjust the matrix using a family of techniques that tweak its “eigenvalues,” which can be thought of as the strengths of the independent directions of risk hidden inside the data. A leading method called nonlinear shrinkage pulls these values toward the middle to dampen noise. The authors point out a limitation: these methods are tuned to make the matrix mathematically close to some ideal version of itself, which is not the same as making the final portfolio behave well.
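The simplest member of this family is linear shrinkage, which moves every eigenvalue the same fraction of the way toward their average. It is a cruder relative of the nonlinear method described above, not the method itself, and it appears here only to make the idea concrete. Because the eigenvectors stay fixed, the operation is equivalent to blending the matrix with a scaled identity matrix, so no eigenvalue decomposition is needed. The function names, the intensity value and the two-stock matrix are assumptions for illustration.

```python
import math


def shrink_toward_identity(cov, intensity):
    """Blend cov with (average variance) * identity.

    Each eigenvalue lam becomes (1 - intensity) * lam + intensity * mean_lam,
    where mean_lam = trace(cov) / n; the eigenvectors are unchanged.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    n = len(cov)
    mean_eigenvalue = sum(cov[i][i] for i in range(n)) / n
    return [
        [
            (1.0 - intensity) * cov[i][j]
            + (intensity * mean_eigenvalue if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def eigenvalues_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first (closed form)."""
    half_trace = (m[0][0] + m[1][1]) / 2.0
    radius = math.hypot((m[0][0] - m[1][1]) / 2.0, m[0][1])
    return [half_trace + radius, half_trace - radius]


# Hypothetical noisy sample covariance for two stocks.
sample_cov = [[0.09, 0.05], [0.05, 0.04]]
cleaned = shrink_toward_identity(sample_cov, intensity=0.3)

print(eigenvalues_2x2(sample_cov))  # widely spread eigenvalues
print(eigenvalues_2x2(cleaned))     # same average, narrower spread
```

The sum of the eigenvalues is preserved while the gap between the largest and smallest narrows, which is the "pull toward the middle" in its most basic form.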
A network shaped like the math it replaces
Christian Bongiorno of Université Paris-Saclay and CentraleSupélec, working with Efstratios Manolakis of the University of Catania and Rosario N. Mantegna of the University of Palermo and the Complexity Science Hub in Vienna, took a different approach. Rather than training a network to copy a cleaned covariance matrix, they trained it to directly minimize the thing investors actually care about: the realized risk of the portfolio over the days that follow.
Their network is built in three connected pieces, each mirroring a step in the classic calculation. The first piece is a learnable filter that decides how much weight to give recent versus older returns, while taming extreme outliers. The second, and most involved, piece cleans the eigenvalues. The third converts each stock’s individual volatility into a usable scale. All three are trained together, so the system learns the whole pipeline at once instead of optimizing each part separately.
The eigenvalue-cleaning piece deserves a closer look. The researchers needed an architecture that respects a basic rule: reshuffling the order of the eigenvalues should not change the answer. They drew on a physics analogy in which eigenvalues behave like charged particles that repel their neighbors, an idea suggesting that each value is influenced mostly by those near it. To capture this, they used a bidirectional recurrent network, a type of model that reads a sequence in both directions and carries forward a memory of what it has seen. Treating the ranked eigenvalues as a sequence let the model learn local interactions without its size ballooning as the number of stocks grows.
That design choice has a practical payoff. Because the network’s parameters are shared across stocks and across eigenvalue ranks, a single trained model does not depend on how many assets it sees. The team calibrated it on panels of between 50 and 350 stocks and then applied it, without any retraining, to a universe of 1,000 stocks.
Testing on a quarter-century of US stocks
The researchers assembled a large panel of US equities listed on the NYSE or NASDAQ, spanning 1990 through 2024 and covering roughly 2.7 million stock-day pairs. They applied filters to keep only liquid, tradable stocks and took care to use only information available before each trading day, guarding against the trap of accidentally peeking at future data. They also removed stocks about to be delisted, since these can quietly distort results.
They evaluated the model in two ways. The first was a frictionless test that ran 1,000 separate five-year simulations on portfolios of 300 stocks, ignoring trading costs. In the unconstrained version, where short-selling is allowed, the network reached a Sharpe ratio (a measure of return earned per unit of risk) of 1.01, ahead of the nonlinear shrinkage method at 0.94. When short-selling was forbidden, a more realistic setting, the network produced the lowest annualized volatility in the group, 13.5 percent, and the highest Sharpe ratio, 0.79.
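The Sharpe ratio and annualized volatility quoted here are both simple to compute from a series of daily returns. The following sketch assumes 252 trading days per year and a risk-free rate of zero, conventions the article does not specify, and the return series is invented.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 252  # common convention, assumed here


def annualized_volatility(daily_returns):
    """Sample standard deviation of daily returns, scaled to one year."""
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS_PER_YEAR)


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized mean excess return divided by annualized volatility."""
    excess = [r - daily_risk_free for r in daily_returns]
    annual_mean = statistics.mean(excess) * TRADING_DAYS_PER_YEAR
    return annual_mean / annualized_volatility(excess)


# Invented daily returns, for illustration only.
daily_returns = [0.004, -0.002, 0.001, 0.003, -0.005, 0.002, 0.000, 0.001]
print(round(annualized_volatility(daily_returns), 4))
print(round(sharpe_ratio(daily_returns), 2))
```

With a zero risk-free rate, doubling every return doubles both the numerator and the denominator, so the Sharpe ratio is unchanged. That is why it is useful for comparing portfolios that run at different levels of risk.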
The second test was built to resemble real trading. The researchers wrote a simulator modeling an Interactive Brokers account, including commissions, exchange and regulatory fees, slippage, financing charges on borrowed money, and dividends, all aligned to the actual exchange calendar. They ran a single continuous simulation on the 1,000 largest stocks from 2000 through 2024, rebalancing every five trading days and starting with one million dollars.
In this setting, the network again delivered the lowest short-term variance and the best Sharpe ratio at 1.06, with the static “Average Oracle” method second at 0.94 and a dynamic correlation model third at 0.93. The network's tail-risk measures, which capture losses on the worst days, were also the most favorable. The authors note that the network’s edge in returns grew larger after 2012, the opposite of what many machine-learning trading studies report.
What the network learned, and what it cannot do
Because the architecture maps onto the underlying math, the researchers could inspect what each piece learned. The lag filter settled on a gentle power-law decay, weighting the recent past somewhat more heavily, rather than the sharper exponential decay used in many classic models. The eigenvalue cleaner compressed the bulk of the spectrum into a narrow band and made it nearly insensitive to the specific stocks fed in, echoing earlier findings that, under strong market change, fixed eigenvalue corrections can work surprisingly well. The volatility piece flattened the low end and stretched the high end of the volatility range.
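The difference between the two decay shapes is easy to see numerically. The sketch below compares normalized exponential weights with normalized power-law weights over the same lookback window. The functional forms and the parameter values (`half_life`, `exponent`, `window`) are assumptions chosen for illustration; the article does not report the values the network actually learned.

```python
def exponential_weights(window, half_life):
    """Weights proportional to 0.5 ** (lag / half_life); lag 0 is the most recent day."""
    raw = [0.5 ** (lag / half_life) for lag in range(window)]
    total = sum(raw)
    return [w / total for w in raw]


def power_law_weights(window, exponent):
    """Weights proportional to (1 + lag) ** -exponent; lag 0 is the most recent day."""
    raw = [(1.0 + lag) ** -exponent for lag in range(window)]
    total = sum(raw)
    return [w / total for w in raw]


window = 250
exp_w = exponential_weights(window, half_life=20)
pow_w = power_law_weights(window, exponent=0.5)

# Share of total weight placed on observations more than 100 days old.
print(round(sum(exp_w[100:]), 4))
print(round(sum(pow_w[100:]), 4))
```

With these illustrative settings the exponential filter has all but forgotten anything older than 100 days, while the power-law filter still gives that older stretch a substantial share of the weight.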
The model came with trade-offs. Its turnover was high, around 57 percent of the portfolio every five days in the realistic test, compared with 18 percent for the Average Oracle. That means more trading and more cost, though the network’s risk advantage held up even after those costs were charged. The authors suggest a turnover penalty could be added externally if needed.
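Turnover can be defined in more than one way, and the article does not say which convention the authors use. A common choice, assumed in this sketch, is the sum of absolute weight changes at a rebalance; some practitioners halve that figure. The weights shown are invented.

```python
def turnover(old_weights, new_weights):
    """Sum of absolute changes in portfolio weights at one rebalance."""
    if len(old_weights) != len(new_weights):
        raise ValueError("weight vectors must have the same length")
    return sum(abs(new - old) for old, new in zip(old_weights, new_weights))


# Invented weights for a four-stock portfolio before and after rebalancing.
before = [0.40, 0.30, 0.20, 0.10]
after = [0.25, 0.35, 0.15, 0.25]
print(round(turnover(before, after), 2))  # 0.15 + 0.05 + 0.05 + 0.15 = 0.4
```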
The researchers are also clear about the boundaries of their method. The cleaning step adjusts only the eigenvalues and leaves the eigenvectors, the directions of risk themselves, untouched. They write that the model “should not be interpreted as a fully dynamic covariance forecaster or as a mechanism that detects regime shifts,” but rather as a data-driven way to filter out finite-sample noise. They outline several extensions, including conditioning the cleaning on market conditions and denoising the eigenvectors as well.
One more finding stands out for practitioners. Although the network was trained to build an unconstrained portfolio that can hold short positions, the researchers show its learned covariance estimate can be plugged into a standard long-only optimizer with almost no loss of its advantage. Two caveats remain: the results come from US equities over a specific historical window, and the strong performance of a static fixed-eigenvalue baseline is a reminder that added complexity does not always pay off in this domain.




