FinML: A Practical Machine Learning Framework for Dynamic Stock Selection

Abstract:

Stock recommendation is vital to investment companies and investors. However, no single stock selection strategy always wins, and analysts may not have enough time to check all stocks in the S&P 500 (the Standard & Poor’s 500). In this paper, we propose a practical scheme that recommends stocks from the S&P 500 using machine learning. Our basic idea is to dynamically buy and hold the top 20% of stocks. First, we select representative stock indicators with good explanatory power. Second, we take five frequently used machine learning methods (linear regression, ridge regression, stepwise regression, random forest, and generalized boosted regression) to model stock indicators and quarterly log-return in a rolling window. Third, we choose the model with the lowest mean squared error (MSE) in each period to rank stocks. Finally, we test the selected stocks with portfolio allocation methods such as equally weighted, mean-variance, and minimum-variance. Our empirical results show that the proposed scheme outperforms the long-only strategy on the S&P 500 index in terms of Sharpe ratio and cumulative returns.

Index Terms:

Stock recommendation, fundamental value investing, machine learning, model selection, risk management

Project summary:

We developed a practical approach that uses machine learning methods to select S&P 500 stocks based on financial ratios (e.g., EPS, ROA, ROE). The selected portfolio outperformed the S&P 500 index on out-of-sample data, achieving a Sharpe ratio of 0.5 versus 0.19 for SPX.
We performed feature selection separately for each of the 11 GICS sectors and, in each rolling window, chose the model with the lowest MSE among linear regression, stepwise regression, ridge regression, random forest, and GBM. We also applied a model ensemble method.
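
The per-window model choice described above can be sketched in a few lines. This is a minimal illustration, not the code in ml_model.py: it compares only two of the five candidates (linear and ridge regression, both in closed form), uses synthetic data, and assumes rows are ordered by time. The function names, the window sizes, and the ridge penalty alpha are all hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    # Ordinary least squares with an intercept column prepended.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_ridge(X, y, alpha=1.0):
    # Closed-form ridge regression; the intercept is left unpenalized.
    A = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def select_model(X_train, y_train, X_val, y_val, candidates):
    # Fit every candidate on the training window, score it on the
    # validation window, and keep the one with the lowest MSE.
    fitted = {name: fit(X_train, y_train) for name, fit in candidates.items()}
    scores = {name: mse(y_val, predict(coef, X_val)) for name, coef in fitted.items()}
    best = min(scores, key=scores.get)
    return best, fitted[best], scores

def rolling_selection(X, y, train_size, val_size, candidates):
    # Slide the train/validation window forward by one validation block at a time.
    choices = []
    start = 0
    while start + train_size + val_size <= len(X):
        tr = slice(start, start + train_size)
        va = slice(start + train_size, start + train_size + val_size)
        name, _, _ = select_model(X[tr], y[tr], X[va], y[va], candidates)
        choices.append(name)
        start += val_size
    return choices

CANDIDATES = {"linear": fit_ols, "ridge": fit_ridge}

# Synthetic stand-in for financial ratios (X) and next-quarter log-returns (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([0.5, -0.2, 0.1]) + rng.normal(scale=0.05, size=60)

best_name, best_coef, scores = select_model(X[:40], y[:40], X[40:], y[40:], CANDIDATES)
choices = rolling_selection(X, y, train_size=40, val_size=10, candidates=CANDIDATES)
```

Keeping the candidates in a dictionary of fit functions means the remaining models could be added without touching the selection logic, provided each returns something predict can consume.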

Data:

Retrieved from WRDS (Wharton Research Data Services), Compustat Industrial (27 years of daily and quarterly data)

S&P 500 Fundamental Quarterly Data (fundamental_final_table.xlsx)
- Database: Compustat North America (Fundamentals Quarterly) and (Index Constituents)
- Timeline: 27 years (1990-2017)
- Tickers: 1193 stocks (all historical S&P 500 component stocks)
- Value: 20 financial ratios calculated from raw accounting report data
S&P 500 Historical Component Stocks Adjusted Daily Price (1-sp500_adj_price.csv.zip)
- Database: Compustat North America (Security Daily)
- Timeline: 27 years (1990-2017)
- Tickers: 1193 stocks (all historical S&P 500 component stocks)
- Value: Adjusted Daily Close Price
S&P 500 Index Daily Price (1-spx_price.xlsx)
- Database: Yahoo Finance
- Timeline: 27 years (1990-2017)
- Tickers: SPX
- Value: Adjusted Daily Close Price
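
The abstract models quarterly log-returns, and the adjusted daily close prices listed above are the raw material for them. The sketch below shows one way to get from daily prices to quarterly log-returns. It assumes calendar quarters and takes the last available close in each quarter; the project's own scripts may align quarters differently (for example, by report or trade date). The function names and the sample prices are hypothetical.

```python
import math
from datetime import date

def quarter_end_prices(daily):
    """Return the last close of each calendar quarter.

    `daily` is a list of (date, adjusted_close) pairs sorted by date.
    """
    last = {}
    for day, price in daily:
        key = (day.year, (day.month - 1) // 3 + 1)
        last[key] = price  # later dates in the same quarter overwrite earlier ones
    return [last[key] for key in sorted(last)]

def log_returns(prices):
    # Log-return between consecutive quarter-end prices.
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Hypothetical adjusted closes for one stock.
daily = [
    (date(2016, 12, 30), 100.0),
    (date(2017, 1, 3), 101.0),
    (date(2017, 3, 31), 110.0),
    (date(2017, 6, 30), 121.0),
]
quarterly = quarter_end_prices(daily)
returns = log_returns(quarterly)
```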

Code:

Forecasting Model:

Input: 11 Excel files of cleaned data about fundamental financial ratios (sector 10-Energy, sector 15-Materials, sector 20-Industrials, sector 25-Consumer Discretionary, sector 30-Consumer Staples, sector 35-Health Care, sector 40-Financials, sector 45-Information Technology, sector 50-Telecommunication Services, sector 55-Utilities, sector 60-Real Estate)
Python Script: 2 Scripts
- ml_model.py: The forecasting function (cornerstone of this project)
- fundamental_run_model.py: The main function to run the forecasting model


python3 fundamental_run_model.py \
  -sector_name sector10 \
  -fundamental Data/fundamental_final_table.xlsx \
  -sector Data/1-focasting_data/sector10_clean.xlsx

Old R Script: 3 R Scripts
- fundamental_run_model.R: The main function to run the forecasting model
- fundamental_ML_model.R: The forecasting function (cornerstone of this project)
- fundamental_select_stock.R: The function to select top 20% stocks in each sector
Output: a CSV file with three columns: tic (the stock name), predicted_return (the return our model predicts for the next quarter), and trade_date (the date to execute the trades)
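
To illustrate the top-20% selection step, here is a minimal sketch that ranks one sector's forecasts for a single trade date and keeps the best fifth. It is not a port of fundamental_select_stock.R. The tickers, returns, and date are made up, and rounding the cutoff down (with a floor of one stock) is an assumption.

```python
def select_top_percent(rows, top_percent=20):
    """Keep the rows with the highest predicted_return.

    The number kept is floor(len(rows) * top_percent / 100), but at least 1.
    """
    if not rows:
        return []
    keep = max(1, (len(rows) * top_percent) // 100)
    ranked = sorted(rows, key=lambda r: r["predicted_return"], reverse=True)
    return ranked[:keep]

# Hypothetical forecasts for one sector on one trade date.
forecasts = [
    {"tic": tic, "predicted_return": ret, "trade_date": "2017-03-01"}
    for tic, ret in [
        ("AAA", 0.01), ("BBB", 0.08), ("CCC", -0.02), ("DDD", 0.03), ("EEE", 0.12),
        ("FFF", 0.00), ("GGG", 0.05), ("HHH", -0.04), ("III", 0.02), ("JJJ", 0.04),
    ]
]
selected = select_top_percent(forecasts)
```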

Portfolio Allocation:

Input: 2 files
- The CSV file generated by forecasting model
- The adjusted close price data of S&P 500 stocks to calculate covariance matrix
Script: fundamental_portfolio.ipynb
Output: 3 Excel files each with the following 4 columns
1. tic: the stock name
2. predicted_return: predicted return of next quarter by our model
3. weights: the weights to trade
4. trade_date: the date to execute the trades
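
As a rough illustration of how weights can be derived from a covariance matrix, the sketch below computes equal weights and closed-form minimum-variance weights. It assumes a fully invested portfolio with short positions allowed; the notebook may impose different constraints (such as long-only) and may use a different solver. The function names and the simulated returns are hypothetical.

```python
import numpy as np

def equal_weights(n):
    # Every selected stock receives the same share of capital.
    return np.full(n, 1.0 / n)

def min_variance_weights(cov):
    # Closed-form solution of: minimize w' C w subject to sum(w) = 1,
    # which gives w proportional to inv(C) @ 1.
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

# Simulated daily returns for three hypothetical stocks with different volatility.
rng = np.random.default_rng(1)
daily_returns = rng.normal(scale=[0.01, 0.02, 0.03], size=(250, 3))
cov = np.cov(daily_returns, rowvar=False)

w_eq = equal_weights(3)
w_min = min_variance_weights(cov)
var_eq = float(w_eq @ cov @ w_eq)
var_min = float(w_min @ cov @ w_min)
```

Under these assumptions, var_min can never exceed var_eq, because equal weighting is one of the feasible portfolios the minimization considers.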

Back-testing Model:

Input: 5 files
- equally_weighted: equally-weighted portfolio (Portfolio Benchmark)
- mean_weighted: mean-variance portfolio
- minimum_weighted: minimum-variance portfolio (our model)
- adjusted daily close price of S&P 500 stocks: to calculate quarterly return
- SPX adjusted daily close price: The Market Index (Overall Benchmark)
Script: 1 Python Jupyter notebook
- fundamental_back_testing.ipynb: The back-testing function
Output:
1. Quarterly return of our portfolio with transaction cost
2. Performance Evaluation: total return, annualized return and standard deviation, maximum drawdown, Sharpe ratio
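
The metrics listed above can be computed from a series of quarterly returns roughly as follows. This is a sketch under stated assumptions, not the notebook's code: returns are simple (not log) quarterly returns, the Sharpe ratio is annualized excess return over annualized standard deviation, and the transaction cost is a flat rate on turnover. The 0.1% cost rate and the sample returns are hypothetical.

```python
import math
import statistics

PERIODS_PER_YEAR = 4  # quarterly data

def net_return(gross_return, turnover, cost_rate=0.001):
    # Subtract a proportional cost on the fraction of the portfolio traded.
    return gross_return - turnover * cost_rate

def total_return(returns):
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

def annualized_return(returns):
    years = len(returns) / PERIODS_PER_YEAR
    return (1.0 + total_return(returns)) ** (1.0 / years) - 1.0

def annualized_std(returns):
    # Sample standard deviation scaled by the square root of periods per year.
    return statistics.stdev(returns) * math.sqrt(PERIODS_PER_YEAR)

def sharpe_ratio(returns, risk_free_annual=0.0):
    return (annualized_return(returns) - risk_free_annual) / annualized_std(returns)

def max_drawdown(returns):
    # Largest peak-to-trough decline of the cumulative value, as a negative number.
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst

# Hypothetical quarterly returns covering one year.
quarterly_returns = [0.10, -0.50, 1.00, 0.00]
summary = {
    "total_return": total_return(quarterly_returns),
    "annualized_return": annualized_return(quarterly_returns),
    "annualized_std": annualized_std(quarterly_returns),
    "max_drawdown": max_drawdown(quarterly_returns),
    "sharpe_ratio": sharpe_ratio(quarterly_returns),
}
```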

An IEEE TrustCom 2018 Paper (http://www.cloud-conf.net/trustcom18/)

Hongyang Yang, Xiao-Yang Liu, and Qingwei Wu. 2018. A practical machine learning approach for dynamic stock recommendation. In IEEE TrustCom/BigDataSE, 2018, pp. 1693–1697. Download from (https://ieeexplore.ieee.org/abstract/document/8456121) and (https://ssrn.com/abstract=3302088)