Feature Selection

cMDA (clustered mean decrease accuracy) feature selection can improve both predictive accuracy and your intuition about the regime dependence of your trading strategy.

  • Cluster together features that are similar and should therefore receive the same importance ranking.
  • Rank these clusters by importance score.
  • Use only clusters with high scores to predict profitability.
  • The resulting, more parsimonious models improve out-of-sample predictive accuracy. (See our recent paper.)
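
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the production implementation: the data are synthetic, the model is ordinary least squares, and the grouping rule (an absolute-correlation threshold) and the names cluster_features, clustered_mda and threshold are hypothetical choices made for this example. The key idea it shows is that all features in a cluster are shuffled together, so correlated features cannot stand in for one another and mask each other's importance.

```python
import numpy as np


def cluster_features(X, threshold=0.7):
    """Group columns whose absolute correlation is at least `threshold`.

    Hypothetical grouping rule chosen for this sketch.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    n = corr.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Flood-fill over the "is correlated with" graph to collect one cluster.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and corr[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return [[j for j in range(n) if labels[j] == c] for c in range(current)]


def fit_ols(X, y):
    """Least-squares fit with an intercept (stand-in for any predictive model)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def r2(beta, X, y):
    """Out-of-sample R^2 of the fitted model."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def clustered_mda(X_train, y_train, X_test, y_test, clusters, n_repeats=20, seed=0):
    """Mean drop in test R^2 when each cluster's columns are shuffled jointly."""
    rng = np.random.default_rng(seed)
    beta = fit_ols(X_train, y_train)
    base = r2(beta, X_test, y_test)
    scores = []
    for cols in clusters:
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            perm = rng.permutation(len(X_test))
            # Same row permutation for every column in the cluster.
            X_perm[:, cols] = X_test[perm][:, cols]
            drops.append(base - r2(beta, X_perm, y_test))
        scores.append(float(np.mean(drops)))
    return scores


# Synthetic data: two latent drivers, each observed through two noisy features.
rng = np.random.default_rng(42)
n = 600
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    f1 + 0.1 * rng.normal(size=n),
    f1 + 0.1 * rng.normal(size=n),
    f2 + 0.1 * rng.normal(size=n),
    f2 + 0.1 * rng.normal(size=n),
])
y = 2.0 * f1 + 0.2 * f2 + 0.5 * rng.normal(size=n)

clusters = cluster_features(X)                                     # step 1: cluster
scores = clustered_mda(X[:400], y[:400], X[400:], y[400:], clusters)
ranking = sorted(range(len(clusters)), key=lambda c: scores[c], reverse=True)  # step 2: rank
top_columns = clusters[ranking[0]]                                 # step 3: keep the best cluster
print(clusters, scores, top_columns)
```

Shuffling a whole cluster with one permutation keeps the dependence inside the cluster intact while breaking its link to the target, which is what distinguishes the clustered variant from permuting one feature at a time.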

Example: S&P 500 index excess monthly returns

  • We use a public dataset described in this paper. The dataset is available under the “Get Database” tab as SPX train and test data.
  • Following the paper, the dependent variable is the equity premium, that is, the total rate of return on the stock market minus the prevailing short-term interest rate.
  • The independent variables are stock market characteristics, including fundamental and technical indicators, such as:
    • dfy : Default yield spread
    • infl : Inflation
    • svar : Stock variance
    • d/e : Dividend payout ratio
    • lty : Long term yield
    • tms : Term spread
    • tbl : Treasury-bill rate
    • dfr : Default return spread
    • d/p : Dividend price ratio
    • d/y : Dividend yield
    • ltr : Long term return
    • e/p : Earnings price ratio
    • and more…
  • As the feature plot shows, cMDA feature selection tells us:
    • Two clusters (C_1, C_2) were found to be the most important in terms of feature importance.
    • On closer inspection, the two clusters can clearly be interpreted as fundamental vs. technical indicators.
    • Because cMDA is robust, the feature rankings don’t change: fundamental indicators are found to be more important than technical indicators in all 100 runs with different random seeds.
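
A seed-stability check of the kind described in the last point could look like the sketch below. It does not reproduce the S&P 500 result: the data are synthetic, the model is ordinary least squares, the two column groups are hypothetical stand-ins for C_1 and C_2, and the helper name cluster_importance is an assumption made for this example. Each seed changes both the bootstrap sample used for fitting and the permutation used for scoring.

```python
import numpy as np


def cluster_importance(X, y, clusters, seed):
    """Increase in test MSE per cluster for one random seed (illustrative only)."""
    rng = np.random.default_rng(seed)
    half = len(X) // 2
    # Bootstrap the training half so the fitted model also depends on the seed.
    idx = rng.choice(half, size=half, replace=True)
    A = np.column_stack([np.ones(half), X[idx]])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    X_te, y_te = X[half:], y[half:]

    def mse(X_eval):
        pred = np.column_stack([np.ones(len(X_eval)), X_eval]) @ beta
        return float(np.mean((y_te - pred) ** 2))

    base = mse(X_te)
    scores = []
    for cols in clusters:
        X_perm = X_te.copy()
        # Shuffle all columns of the cluster with the same row permutation.
        X_perm[:, cols] = X_te[rng.permutation(len(X_te))][:, cols]
        scores.append(mse(X_perm) - base)
    return scores


# Synthetic data in which the first group drives the target much more strongly.
rng = np.random.default_rng(0)
n = 600
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([
    f1 + 0.1 * rng.normal(size=n),
    f1 + 0.1 * rng.normal(size=n),
    f2 + 0.1 * rng.normal(size=n),
    f2 + 0.1 * rng.normal(size=n),
])
y = 1.5 * f1 + 0.3 * f2 + 0.5 * rng.normal(size=n)
clusters = [[0, 1], [2, 3]]  # hypothetical stand-ins for C_1 and C_2

n_runs = 100
top_counts = [0] * len(clusters)
for seed in range(n_runs):
    scores = cluster_importance(X, y, clusters, seed)
    top_counts[int(np.argmax(scores))] += 1

share_c1_top = top_counts[0] / n_runs
print(f"C_1 ranked first in {share_c1_top:.0%} of {n_runs} runs")
```

A share close to 100% indicates that the ranking does not depend on the random seed; a share near 50% would suggest the two clusters cannot be reliably ordered.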