Gaussian Process Regression for FX Forecasting

A Case Study





Summary

These documents show the start-to-finish process of quantitative analysis on the buy-side to produce a forecasting model. The code demonstrates the use of Gaussian processes in a dynamic linear regression: we assume that the relationship between the xs and the ys varies smoothly with respect to time, but is static across values of the xs at any given time. More generally, Gaussian processes can be used for nonlinear regression, in which the relationship between x and y is assumed to vary smoothly with respect to the values of the xs themselves, somewhat like a continuous version of a random forest regression.
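To make the idea concrete, here is a minimal sketch (not the notebooks' actual code) of a single-factor, single-asset version of such a model in Stan, with a squared-exponential GP prior letting the regression coefficient drift smoothly over time. The variable names and priors are illustrative assumptions of mine, not the project's.

```stan
// Sketch only: a single-factor, single-asset dynamic regression in which the
// coefficient beta[t] follows a GP over time. Names and priors are illustrative.
data {
  int<lower=1> T;          // number of weekly observations
  vector[T] x;             // factor value each week
  vector[T] y;             // asset return each week
  real t_obs[T];           // time index (e.g. 1..T), the GP's input
}
parameters {
  real<lower=0> rho;       // GP length-scale: how quickly beta can change
  real<lower=0> alpha;     // GP marginal standard deviation
  real<lower=0> sigma;     // observation noise
  vector[T] eta;           // standard-normal draws, for a non-centered GP
}
transformed parameters {
  vector[T] beta;          // time-varying regression coefficient
  {
    matrix[T, T] K = cov_exp_quad(t_obs, alpha, rho);
    for (t in 1:T) K[t, t] += 1e-9;      // jitter for numerical stability
    beta = cholesky_decompose(K) * eta;  // beta ~ GP(0, K)
  }
}
model {
  rho ~ inv_gamma(5, 5);
  alpha ~ normal(0, 1);
  sigma ~ normal(0, 1);
  eta ~ std_normal();
  y ~ normal(beta .* x, sigma);          // returns regressed on the factor
}
```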

The full code is available as a github project here. As I'm attempting to show how an analyst might use R or Python, coupled with Stan, to develop a model like this one, the data processing and testing have been done alongside extensive commentary in a series of RStudio Notebooks. These were compiled to html and are linked below.

Unless you take a deep personal interest in the trials and tribulations of data sourcing and cleanup, you will probably want to skip ahead to Section 3, Model Overview and Simple Implementation. I've omitted subheadings for the first two sections to encourage this behavior.

Process

  1. Gathering Data

    Initial raw data retrieval and cleanup.
  2. Calculating Factors

    Turning raw data into normalized factors.
  3. Model Overview and Simple Implementation

    State the full model, then start by validating a single-factor, single-asset version using known parameters (a sketch of this known-parameter approach follows the list).
  4. Model Validation - Single X, Multiple Y

    Validate a single-factor, multiple-asset version using known parameters.
  5. Model Validation - Full Model

    Validate the full multi-factor, multi-asset version using known parameters.
  6. Forecasting

    Test the model using the actual data.
  7. Backtesting

    Test all historical weeks using a processor cluster.
  8. [In Progress] Optimizing

    Compare the in-sample results of a few options for optimized portfolios built on the basis of these forecasts.
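As a rough illustration of the known-parameter approach used in the validation notebooks, one way to generate fake data in Stan itself is a fixed-parameter program like the sketch below: the true values are passed in as data, simulated returns are drawn in generated quantities, and the real model is then fit to y_sim to check that it recovers beta_true and sigma_true. The program and names here are my own illustration, not the project's code.

```stan
// Sketch only: simulate fake data from known parameters (run with the
// Fixed_param sampler), then fit the real model to y_sim and check that
// beta_true and sigma_true are recovered.
data {
  int<lower=1> T;            // number of simulated weeks
  vector[T] x;               // factor values to condition on
  real beta_true;            // known coefficient
  real<lower=0> sigma_true;  // known noise level
}
generated quantities {
  vector[T] y_sim;
  for (t in 1:T)
    y_sim[t] = normal_rng(beta_true * x[t], sigma_true);
}
```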

Conclusions

In these pages, we built from scratch a time-varying, factor-based model to forecast weekly FX returns. We started with a simple, univariate Gaussian Process regression, then added complications to the model one by one, validating each step as we went. We now have a full backtest of the factor model, and we’ve validated the forecasts against actual values. We’ve also confirmed that the signals from these factors are swamped by the noise of weekly movements.

The true model for returns in any asset class is the combination of carry and price movements. In FX, the carry is relatively stable but subject to punctuated equilibrium as new data comes to light. In developed markets, this data primarily consists of central bank rate decisions and the economic releases that might affect those decisions. Price movements, however, are the deterministic result of countless iterative, interacting agents. While we may know many of these agents' motives, it is not possible to aggregate their behavior with any accuracy, because the actions of each agent are affected by those of all the other agents, and small measurement errors compound. It is impossible to tell, for example, the timescale of market data: a day's worth of price movements at 5-minute increments looks much the same as a year's worth of daily movements. The asset price measures the result of a chaotic, nonlinear dynamical system.

So why bother? The chaotic nature of asset prices is the reason I believe no number of layers of neural nets (or any other algorithm that assumes the data is stochastic) will be able to predict market movements accurately on any human timescale. However, we know that for currencies, interest rate expectations (i.e., forecasts of carry) and other fundamental factors influence the decision-making of the agents described above. The hope is that, over time, the small differences in predicted means, together with better-than-nothing modeling of the relationships among these factors and among the resulting forecasts, will add up. The investor will have an edge in harvesting the carry over a portfolio of currencies, whilst weathering idiosyncratic price movements.

What Next?

The next step would be to improve the risk forecasting by directly modeling the covariance of assets within this model. We're leaving that to one side until the Stan algorithms improve enough to permit faster fitting. Because we have used generative modeling, we have the covariance of the forecasts in addition to their point estimates, which should be equivalent to the medians. An asset manager presented with asset forecasts will next have to build a portfolio with weights optimized according to those forecasts, plus an estimate of asset risk. If we are satisfied with closed-form Markowitz optimization, we could integrate that step directly into our Stan code by generating optimized weights in the generated quantities block, directly after creating our Y forecasts. However, this would lead us to yet another distribution of possible best weights, and at some point we need to collapse the wave function and come to a decision. With judicious risk management, we can build a winning portfolio on this basis that is largely uncorrelated with equities and other traditional assets.
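As an illustration only, the sketch below shows how the textbook unconstrained mean-variance weights could be computed inside generated quantities. The names mu_fcast, Sigma_fcast, gamma, and N_assets are my own; to keep the snippet standalone they arrive via the data block, whereas in the full model the forecast mean and covariance would instead be produced earlier in the same program, directly after the Y forecasts.

```stan
// Sketch only: closed-form, unconstrained mean-variance (Markowitz) weights.
// The forecast mean and covariance are supplied as data here so the snippet
// stands alone; in the full model they would be generated quantities.
data {
  int<lower=1> N_assets;
  vector[N_assets] mu_fcast;          // forecast mean returns
  cov_matrix[N_assets] Sigma_fcast;   // forecast return covariance
  real<lower=0> gamma;                // risk-aversion constant
}
generated quantities {
  // w = (1 / gamma) * Sigma^{-1} * mu, the textbook unconstrained solution
  vector[N_assets] w_markowitz = (1 / gamma) * (Sigma_fcast \ mu_fcast);
}
```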

Why not a Kalman Filter?

The Kalman filter, especially in later iterations such as the Unscented Kalman Filter or van der Merwe's sigma-point Kalman filter, provides a powerful and computationally efficient method of tracking the movement of an endogenous time series given a set of correlated, but error-prone, exogenous time series. As it has a closed-form solution, and operates under the Markovian assumption that $D_t \perp D_{t-2}, D_{t-3}, \dots \mid D_{t-1}$, i.e. that all information about the past is encapsulated in the last observation, it is particularly suitable for use in a production environment.
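For reference, the classic Kalman filter assumes a linear-Gaussian state-space model of the form

$$
\begin{aligned}
\theta_t &= F_t\,\theta_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, Q_t),\\
y_t &= H_t\,\theta_t + v_t, \qquad\; v_t \sim \mathcal{N}(0, R_t),
\end{aligned}
$$

and propagates only the mean and covariance of $\theta_t$ forward one step at a time, which is exactly what makes it both closed-form and Markovian.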

However, in order to achieve this efficiency, the Kalman filter throws away anything we might learn about the nature of past intertemporal relationships, beyond what we can see in the means and covariance matrices for all variables. It also presumes Gaussian distributions. Advances on the original filter relax the Gaussian assumption, but not the Markovian one.

With a Gaussian process (GP), we can assume that parameters are related to one another across time via an arbitrary covariance function. The disadvantage in comparison to Kalman filters is that, in order to calculate parameter values, we wind up inverting a T × T matrix, where T is the total number of time periods in which we are interested. GP models are also not generally solvable in closed form, so we must rely on MCMC or its variants to evaluate the posterior distribution.
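To see where that inversion comes from: with Gaussian noise, the standard GP posterior mean at new points is

$$
\mathbb{E}[f_* \mid y] = K_*^{\top}\,(K + \sigma^2 I)^{-1}\,y,
$$

where $K$ is the $T \times T$ covariance matrix over the observed time points, so a naive solve costs $O(T^3)$.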

With advances in processing power, this is less of a problem than it used to be. An upcoming version of Stan is promising GPU-powered matrix inversion, and should really kick off the use of GPs in production.

Acknowledgements

In preparing these notebooks, I referred heavily to the Stan Manual, and the work of Jim Savage, Rob Trangucci, and Michael Betancourt. I had my initiation into Bayesian forecasting from Jose Mario Quintana, currently principal at BEAM, LLC.

For backtesting, this research was supported in part through computational resources provided by Syracuse University, particularly the OrangeGrid distributed computing system. I also enjoyed the support of Syracuse University's Cyberinfrastructure Engineer, Larne Pekowsky. OrangeGrid is supported by NSF award ACI-1341006, and Larne is supported by NSF award ACI-1541396.

Questions, Comments, Short Speeches?

Please feel free to contact me on Twitter at @CRTNaylor, or reach out to me directly.