Using Machine Learning Models to Predict S&P500 Price Level and Spread Direction

semanticscholar(2020)

Abstract
Pairs trading is an investment strategy applied by many buy-side financial institutions. In pairs trading, investment decisions are based not on a forecast of the future stock price level but on a prediction of the future spread between two stocks. Consider two stocks that move in a similar trend, follow the same market dynamics, and trade at a spread that is mean-reverting. If the spread between the two stocks widens, we can short the overpriced stock and buy the underpriced stock. Later, as the spread mean-reverts to its equilibrium level, the profit is realized. Traditionally, pairs trading has relied on linear regression, on time series models such as ARIMA and SARIMA, and on GARCH models, which capture the stochasticity of volatility. These classical methods proved quite effective in the old regime of the financial world. However, as finance shifted into a new regime, particularly after dramatic improvements in computing speed, faster diffusion of news, and more transparent markets, the quantitative trading industry started employing more innovative strategies, such as machine learning, deep learning, and reinforcement learning algorithms. In this project, we explore how best to apply learning algorithms to pairs trading by implementing various machine learning models and comparing them against each other. We are inspired by the papers of van der Have [2017] and Wu [2015] and by the models implemented by Alex Dai (Dai), which model the spread between two stocks as an Ornstein-Uhlenbeck (OU) process. Following their ideas, we selected co-integrated S&P 500 stock time series using a cointegration test and created input features using the OU process; this procedure is detailed in Section 5.1.
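As an illustration of what modeling a spread as an OU process can look like, the sketch below simulates a mean-reverting path and recovers its parameters from a least-squares fit of each observation on the previous one. It is not the feature pipeline of the paper: the function names `simulate_ou` and `fit_ou`, the parameter names `theta` (reversion speed), `mu` (equilibrium level) and `sigma` (volatility), and all numeric values are assumptions chosen for the example.

```python
import math
import random


def simulate_ou(theta, mu, sigma, x0, n_steps, dt=1.0, seed=0):
    """Simulate dX = theta * (mu - X) dt + sigma dW with the exact discretization.

    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the Gaussian shock over one step of length dt.
    noise_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        path.append(mu + (path[-1] - mu) * decay + noise_sd * rng.gauss(0.0, 1.0))
    return path


def fit_ou(path, dt=1.0):
    """Estimate (theta, mu, sigma) from the regression x[t+1] = a + b * x[t] + e."""
    x, y = path[:-1], path[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    if not 0.0 < b < 1.0:
        raise ValueError("slope outside (0, 1): no mean reversion detected")
    resid_var = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    theta = -math.log(b) / dt           # reversion speed
    mu = a / (1.0 - b)                  # equilibrium level
    sigma = math.sqrt(resid_var * 2.0 * theta / (1.0 - b ** 2))
    return theta, mu, sigma


# Hypothetical spread series; real spreads would come from a co-integrated pair.
spread = simulate_ou(theta=0.1, mu=2.0, sigma=0.5, x0=2.0, n_steps=20000)
theta_hat, mu_hat, sigma_hat = fit_ou(spread)
```

On a long enough path the estimates should land close to the simulation inputs. Quantities such as the fitted parameters or the current deviation from `mu_hat` are the kind of values that could serve as input features, although which features the authors actually use is described only in their Section 5.1.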
To make our model more realistic, instead of using the absolute spread level as the label, we convert the problem into a classification problem that predicts the direction of the future spread move, since predicting the general trend of the spread is more feasible than predicting its absolute level. The label is therefore 1 when the spread between the two stocks will tighten in the future and 0 otherwise, so a correctly predicted label guarantees that the profit is realized. Our baseline is a traditional time series model; we then apply logistic regression, Gaussian Discriminant Analysis, a Support Vector Machine (SVM), and a neural network. Finally, the accuracy of the different models is compared and analyzed.
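The labeling scheme can be sketched in a few lines. The abstract does not say how "tighten" is measured or how far ahead the label looks, so this example assumes that the spread tightens when its absolute distance from an equilibrium level is smaller `horizon` steps later; the helper names `direction_labels` and `accuracy`, and the toy numbers, are hypothetical.

```python
def direction_labels(spread, equilibrium, horizon=1):
    """Return 1 where the spread is closer to equilibrium `horizon` steps later, else 0.

    Assumption: "tightening" means the absolute deviation from `equilibrium` shrinks.
    The last `horizon` observations get no label because their future is unknown.
    """
    labels = []
    for t in range(len(spread) - horizon):
        now = abs(spread[t] - equilibrium)
        later = abs(spread[t + horizon] - equilibrium)
        labels.append(1 if later < now else 0)
    return labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy spread around an assumed equilibrium of 0.0.
toy_spread = [0.0, 1.0, 2.0, 1.5, 0.5, 0.8]
labels = direction_labels(toy_spread, equilibrium=0.0)   # [0, 0, 1, 1, 0]
score = accuracy(labels, [0, 1, 1, 1, 0])                # 0.8
```

Each classifier named above would be trained to predict these labels, and the same accuracy measure would then be computed on held-out data for every model.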