To request a copy of Xin Man and Dr. Ernest Chan’s “Cluster-based Feature Selection” paper, please enter your name and email below. If you have any questions, feel free to contact us directly at firstname.lastname@example.org
Feature importance in machine learning measures how much information a feature contributes when building a supervised learning model. It lets us exclude uninformative features from the predictive model (“feature selection”) and makes the resulting model easier for humans to interpret. Recently, Man & Chan (2021) compared the stability of features selected by different methods, such as MDA, SHAP, or LIME, when those methods are subject to the computational randomness of the selection algorithms. In this article, we study whether the cluster-based MDA (cMDA) method proposed by López de Prado (2020) improves predictive performance, feature stability, and model interpretability. We applied cMDA to two synthetic datasets, a public clinical dataset, and two financial datasets. In all cases, the stability and interpretability of the cMDA-selected features are superior to those of the MDA-selected features.
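To make the idea of cluster-based MDA concrete, here is a minimal sketch in plain Python. It is not the implementation used in the paper. MDA (mean decrease accuracy) scores a feature by how much a model's accuracy falls when that feature's values are shuffled; the clustered variant shuffles every feature in a cluster at once, so redundant features cannot mask each other. Several things here are assumptions made for illustration: the function names `accuracy` and `clustered_mda`, the toy data, the hand-written model, and the cluster assignments, which are supplied by hand rather than learned. For brevity the sketch scores a fixed model in-sample and applies one shared row permutation to all columns of a cluster, whereas a full treatment would typically use cross-validated, out-of-sample scores.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the correct label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def clustered_mda(model, X, y, clusters, n_repeats=10, seed=0):
    """Mean drop in accuracy when all columns of a cluster are shuffled together.

    `clusters` maps a cluster name to a list of column indices (hypothetical
    input; here the clusters are given, not estimated from the data).
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = {}
    for name, cols in clusters.items():
        drops = []
        for _ in range(n_repeats):
            # One row permutation shared by every column in the cluster.
            perm = list(range(len(X)))
            rng.shuffle(perm)
            X_perm = [list(row) for row in X]
            for i, src in enumerate(perm):
                for c in cols:
                    X_perm[i][c] = X[src][c]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances[name] = sum(drops) / n_repeats
    return importances


# Toy data (assumed): columns 0 and 1 are identical copies of the signal,
# column 2 is pure noise, and the label is the sign of the signal.
data_rng = random.Random(42)
X = []
y = []
for _ in range(200):
    signal = data_rng.gauss(0.0, 1.0)
    X.append([signal, signal, data_rng.gauss(0.0, 1.0)])
    y.append(1 if signal > 0 else 0)


def model(row):
    """Stand-in for a fitted classifier: averages the two redundant columns."""
    return 1 if (row[0] + row[1]) / 2 > 0 else 0


importances = clustered_mda(model, X, y, {"signal": [0, 1], "noise": [2]})
print(importances)
```

Shuffling column 0 alone would leave column 1 intact, so this model would still recover part of the signal and the single-feature importance would understate it. Shuffling both columns as one cluster removes the signal entirely, which is the motivation for scoring clusters instead of individual features.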