cMDA feature selection can improve both predictive accuracy and your intuition about the regime dependence of your trading strategy.
- Cluster together features that are similar and should receive the same importance ranking.
- Rank these clusters by importance score.
- Use only clusters with high scores to predict profitability.
- This improves out-of-sample predictive accuracy by building more parsimonious models. (See our recent paper.)
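The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the paper: it assumes hierarchical clustering on a correlation distance, a random forest classifier, accuracy on a single hold-out split as the score, and synthetic data in place of real features. The helper names `cluster_features` and `clustered_mda` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def cluster_features(X, n_clusters):
    """Group the columns of X by correlation distance (hypothetical helper)."""
    corr = np.corrcoef(X, rowvar=False)
    # Highly correlated features get a distance near 0; clip guards against
    # tiny negative values caused by floating-point error.
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return {int(c): np.flatnonzero(labels == c).tolist() for c in np.unique(labels)}


def clustered_mda(model, X_test, y_test, clusters, n_repeats=10, seed=0):
    """Mean drop in accuracy when all features of a cluster are shuffled together."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X_test, y_test)
    scores = {}
    for cid, cols in clusters.items():
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            # One row permutation for the whole cluster, so the features inside
            # the cluster stay consistent with each other.
            X_perm[:, cols] = X_test[rng.permutation(len(X_test))][:, cols]
            drops.append(baseline - model.score(X_perm, y_test))
        scores[cid] = float(np.mean(drops))
    return scores


# Synthetic stand-in data (an assumption for illustration): columns 0-2 are
# noisy copies of a signal that drives the label, columns 3-5 are unrelated.
rng = np.random.default_rng(42)
n = 600
z1, z2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [z1 + 0.1 * rng.normal(size=n) for _ in range(3)]
    + [z2 + 0.1 * rng.normal(size=n) for _ in range(3)]
)
y = (z1 > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: cluster similar features.
clusters = cluster_features(X_train, n_clusters=2)

# Step 2: rank the clusters by importance score.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
scores = clustered_mda(model, X_test, y_test, clusters)
ranked = sorted(scores, key=scores.get, reverse=True)

# Step 3: keep only the top cluster and refit a more parsimonious model.
selected = clusters[ranked[0]]
small_model = RandomForestClassifier(n_estimators=100, random_state=0)
small_model.fit(X_train[:, selected], y_train)

print("cluster scores:", scores)
print("selected columns:", selected)
print("hold-out accuracy, selected only:", small_model.score(X_test[:, selected], y_test))
```

Shuffling a whole cluster at once is the point of the clustering step: if only one of several near-duplicate features were shuffled, the model could fall back on its siblings and the importance of the group would be understated. A production version would likely score with cross-validation suited to time series rather than a single random split.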
Example: S&P 500 index excess monthly returns
- We use a public dataset, as described in this paper. The dataset is available under the “Get Database” tab as SPX train and test data.
- Following the paper, the dependent variable is the equity premium, that is, the total rate of return on the stock market minus the prevailing short-term interest rate.
- The independent variables are stock characteristics, including fundamental and technical indicators such as:
- dfy : Default yield spread
- infl : Inflation
- svar : Stock variance
- d/e : Dividend payout ratio
- lty : Long term yield
- tms : Term spread
- tbl : Treasury-bill rate
- dfr : Default return spread
- d/p : Dividend price ratio
- d/y : Dividend yield
- ltr : Long term return
- e/p : Earnings price ratio
- and more…
- As the feature plot shows, cMDA feature selection tells us:
- Two clusters (C_1, C_2) were found to be the most important in terms of feature importance.
- On closer inspection, these two clusters can clearly be interpreted as fundamental vs. technical indicators.
- Owing to the robust nature of cMDA, the feature rankings do not change: fundamental indicators are found to be more important than technical indicators in all 100 runs with different random seeds.
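A seed-stability check of this kind can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic rather than the SPX dataset, the two clusters are fixed by hand with placeholder names, the model is a random forest, and the loop runs 20 seeds instead of 100 to keep it quick. It counts how often each cluster comes out on top when the model is retrained with a different random seed.

```python
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the SPX dataset): columns 0-2 carry the signal
# that determines the label, columns 3-5 are unrelated to it.
data_rng = np.random.default_rng(0)
n = 500
signal, other = data_rng.normal(size=n), data_rng.normal(size=n)
X = np.column_stack(
    [signal + 0.1 * data_rng.normal(size=n) for _ in range(3)]
    + [other + 0.1 * data_rng.normal(size=n) for _ in range(3)]
)
y = (signal > 0).astype(int)

# Placeholder cluster names and memberships, fixed by hand for this sketch.
clusters = {"informative": [0, 1, 2], "uninformative": [3, 4, 5]}

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)


def top_cluster(seed, n_repeats=5):
    """Retrain with the given seed and return the cluster with the largest accuracy drop."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_tr, y_tr)
    rng = np.random.default_rng(seed)
    baseline = model.score(X_te, y_te)
    drops = {}
    for name, cols in clusters.items():
        losses = []
        for _ in range(n_repeats):
            X_perm = X_te.copy()
            # Shuffle all columns of the cluster with the same row permutation.
            X_perm[:, cols] = X_te[rng.permutation(len(X_te))][:, cols]
            losses.append(baseline - model.score(X_perm, y_te))
        drops[name] = float(np.mean(losses))
    return max(drops, key=drops.get)


n_seeds = 20  # the text above reports 100 runs; fewer here for speed
winners = Counter(top_cluster(seed) for seed in range(n_seeds))
print(winners)
```

If the ranking is robust, one cluster should win in every run; a split count would indicate that the importance ordering depends on the seed.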