
Wuppertal. Deep Neural Networks Overcome the Curse of Dimensionality for Stochastic Control Problems

I’m Konrad Kleinberg, a PhD student in the Applied and Computational Mathematics (ACM) working group at the University of Wuppertal, Germany.

Within the broader scope of the DFG-funded priority program 2298, Theoretical Foundations of Deep Learning, one of our primary goals is to rigorously understand why deep neural networks are so effective at solving high-dimensional problems, specifically stochastic control problems.

Consider the following scenarios. A robot navigating a cluttered environment must determine a sequence of actions that reaches a target while avoiding obstacles; each action influences the robot's position and all future decisions, and navigation requires processing high-dimensional sensor data (such as camera and LiDAR inputs). In finance, particularly in portfolio optimization, traders make sequential decisions about asset allocations; here, high dimensionality arises from the large number of assets and complex market data, with each decision affecting future portfolio values and the associated risks. These are just two applied examples of sequential decision-making problems in which agents interact with their environment in order to maximize some specified accumulated reward over the whole time period. Specifically, one is interested in identifying an optimal strategy – a decision rule that prescribes an optimal action in each given state of the system.

Mathematically, such stochastic control problems are typically modeled with Markov decision processes (MDPs) – stochastic processes in which the transition probabilities are – to some degree – influenced by the agent's actions. In real-world applications like the examples above, the state space of an MDP can be high-dimensional. To find an optimal strategy that maximizes cumulative rewards, one must approximate the optimal value function, the solution of a specific (stochastic) fixed-point equation known as the Bellman equation. Solving this equation in high-dimensional settings is notoriously difficult due to the curse of dimensionality (CoD) [1]. In fact, it has been shown (cf. [2], [3]) that all deterministic algorithms suffer from this curse, meaning that the computational effort required to approximate the solution of the Bellman equation grows exponentially with the dimension of the state space.
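To make these objects concrete, here is a minimal value iteration on a toy MDP with three states and two actions; all transition probabilities and rewards below are invented for illustration. Classical methods of this kind tabulate the value function over the whole state space, which is exactly where the curse of dimensionality bites: discretizing a d-dimensional continuous state space produces exponentially many such table entries.

```python
import numpy as np

# Toy MDP with 3 states and 2 actions (all numbers invented for illustration).
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.3, 0.0, 0.7]],  # action 1
])
# R[s, a] = expected one-step reward for taking action a in state s.
R = np.array([[1.0, 0.5],
              [0.0, 1.5],
              [2.0, 0.0]])
gamma = 0.9  # discount factor

# Value iteration: repeatedly apply the Bellman operator
#   (T V)(s) = max_a [ R(s, a) + gamma * sum_t P(t | s, a) V(t) ],
# whose unique fixed point is the optimal value function.
V = np.zeros(3)
for _ in range(1000):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)  # state-action values
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-12:  # converged to the fixed point
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy optimal strategy
print("optimal values:", V, "optimal actions:", policy)
```

Since the Bellman operator is a contraction with factor gamma, the iteration converges geometrically to the optimal value function; the optimal strategy is then simply the action maximizing the state-action values in each state.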

We approached this issue by combining the full-history recursive multilevel Picard approximation method, which was recently introduced to solve certain nonlinear partial differential equations [4], with ideas from Q-learning [5]. By reformulating the original Bellman equation and introducing a Monte Carlo method based on these ideas, we approximate the solution of the Bellman equation with a computational effort that grows at most polynomially in both the dimension of the state space and the reciprocal of the desired approximation accuracy [6]. In particular, this algorithm overcomes the curse of dimensionality.
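The multilevel Picard idea can be illustrated on a toy scalar fixed-point equation; the equation, the function phi, and all parameters below are invented for this sketch and are not the setting of our actual algorithm. The key trick is the telescoping sum over Picard iterates: differences between consecutive iterates are small, so the expensive fine levels need only few Monte Carlo samples, while the cheap coarse levels get many.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-dimensional fixed-point equation of Bellman type
# (all choices here are illustrative):
#     v = E[ cos(v + 0.1 * W) ],  W ~ N(0, 1).
def phi(v, w):
    return np.cos(v + 0.1 * w)

def mlp(n, M):
    """Full-history recursive multilevel Picard estimate of the fixed point.

    Level n estimates the telescoping sum of the differences
    phi(V_l) - phi(V_{l-1}) by Monte Carlo with M**(n - l) samples each,
    so coarse (cheap) levels get many samples and fine (expensive)
    levels only a few.
    """
    if n == 0:
        return 0.0
    est = 0.0
    for l in range(n):
        w = rng.standard_normal(M ** (n - l))
        acc = phi(mlp(l, M), w).mean()  # fresh lower-level approximation
        if l > 0:
            acc -= phi(mlp(l - 1, M), w).mean()
        est += acc
    return est

# Reference value via deterministic Picard iteration, using the
# closed form E[cos(v + 0.1 * W)] = exp(-0.005) * cos(v).
v_ref = 0.0
for _ in range(100):
    v_ref = np.exp(-0.005) * np.cos(v_ref)

approx = mlp(5, 5)
print(f"multilevel Picard estimate: {approx:.3f}, reference: {v_ref:.3f}")
```

The recursion draws fresh, independent lower-level approximations inside each level, which is the "full-history recursive" part; balancing sample sizes across levels in this way is what keeps the total cost polynomial rather than exponential in the accuracy requirement.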

In order to rigorously prove that deep neural networks are able to overcome the curse of dimensionality in the approximation of the solution of the Bellman equation, we aim to demonstrate (work in progress; see [7] for similar techniques) that – combined with stability and complexity analyses – the aforementioned algorithm can be represented by deep neural networks. This representation yields an approximation of the solution of the Bellman equation in which the number of network parameters grows at most polynomially with both the dimension of the state space and the reciprocal of the approximation accuracy, thus overcoming the curse of dimensionality.
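To give a feeling for what a polynomial parameter-count statement means, here is a small sketch that counts the weights and biases of a fully connected network whose width scales linearly with the state-space dimension d. The scaling width = 4d and the fixed depth are arbitrary illustrative choices, not the construction from our analysis.

```python
# Parameter count of a fully connected network of fixed depth whose
# width scales linearly with the state-space dimension d
# (an illustrative scaling, not the construction from our work).
def param_count(d, width, depth):
    n = d * width + width                      # input layer: d -> width
    n += (depth - 1) * (width * width + width)  # hidden layers: width -> width
    n += width + 1                              # output layer: width -> 1 (scalar value)
    return n

for d in [10, 100, 1000]:
    print(d, param_count(d, width=4 * d, depth=3))
```

With this scaling the count grows on the order of d squared – polynomially in the dimension – whereas a tabular method on a discretized state space would need a number of entries exponential in d.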

Many open questions remain about why deep learning is as effective as its numerous successes across applications suggest.

References
