Carlotta Montorsi: Tree based algorithms: implementing Conditional Inference Trees and Forest for old age Frailty Index predictions

Machine Learning Seminar presentation

Topic: Tree based algorithms: implementing Conditional Inference Trees and Forest for old age Frailty Index predictions.

Speaker: Carlotta Montorsi, Luxembourg Institute of Socio-Economic Research (LISER)

Time: Wednesday, 2021.02.24, 12:00 CET

How to join: Please contact Jakub Lengiewicz

Abstract:

Tree-based algorithms are prediction algorithms introduced by Morgan and Sonquist (1963) and popularized by Breiman et al. (1984) about two decades later. These algorithms aim to predict an outcome “out of sample” from a set of covariates. They do so by partitioning the space of the regressors into non-overlapping regions. When the task is regression, the predicted outcome is simply the average outcome of the units reaching each terminal node. Various methods exist to grow trees while avoiding overfitting: Conditional Inference Trees, introduced by Hothorn et al. (2006), prevent overfitting by making each split conditional on a sequence of statistical tests.
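
To make the idea concrete, here is a minimal pure-Python sketch of a regression tree whose splits are gated by a statistical test. It is an illustration under simplifying assumptions, not the procedure of Hothorn et al. (2006) and not the code used in the talk: the test is a plain permutation test on an absolute covariance statistic with a Bonferroni adjustment, the cut point is chosen by minimising the sum of squared errors, and all names (`perm_pvalue`, `best_cut`, `grow`, `predict`) and the toy data are hypothetical. The reference implementation of Conditional Inference Trees is `ctree` in the R packages party and partykit.

```python
import random
from statistics import mean


def perm_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the association between covariate x and outcome y."""
    rng = random.Random(seed)
    mx, my = mean(x), mean(y)  # both means are unchanged by permuting y

    def stat(ys):
        # Absolute (unnormalised) covariance between x and the outcome.
        return abs(sum((a - mx) * (b - my) for a, b in zip(x, ys)))

    observed = stat(y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += stat(shuffled) >= observed
    return (hits + 1) / (n_perm + 1)


def best_cut(x, y):
    """Cut point on x that minimises the within-region sum of squared errors."""
    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    best_score, best_c = None, None
    for c in sorted(set(x))[:-1]:  # the largest value would leave one side empty
        left = [b for a, b in zip(x, y) if a <= c]
        right = [b for a, b in zip(x, y) if a > c]
        score = sse(left) + sse(right)
        if best_score is None or score < best_score:
            best_score, best_c = score, c
    return best_c


def grow(X, y, alpha=0.05, min_size=4):
    """Grow a tree; split only if the most associated covariate passes the test."""
    n_vars = len(X[0])
    if len(y) >= 2 * min_size:
        cols = [[row[j] for row in X] for j in range(n_vars)]
        pvals = [perm_pvalue(c, y) if len(set(c)) > 1 else 1.0 for c in cols]
        j = min(range(n_vars), key=pvals.__getitem__)
        # Bonferroni adjustment for testing n_vars covariates at once.
        if min(1.0, pvals[j] * n_vars) < alpha:
            cut = best_cut(cols[j], y)
            left = [i for i, row in enumerate(X) if row[j] <= cut]
            right = [i for i, row in enumerate(X) if row[j] > cut]
            if len(left) >= min_size and len(right) >= min_size:
                return {
                    "var": j,
                    "cut": cut,
                    "left": grow([X[i] for i in left], [y[i] for i in left], alpha, min_size),
                    "right": grow([X[i] for i in right], [y[i] for i in right], alpha, min_size),
                }
    # Terminal node: the prediction is the average outcome of its units.
    return {"mean": mean(y)}


def predict(tree, row):
    """Send a unit down the tree and return the mean of its terminal node."""
    while "mean" not in tree:
        tree = tree["left"] if row[tree["var"]] <= tree["cut"] else tree["right"]
    return tree["mean"]


# Noise-free toy data (hypothetical): covariate 0 drives the outcome, covariate 1 is noise.
rng = random.Random(1)
X = [[i, rng.random()] for i in range(40)]
y = [10.0 if row[0] < 20 else 20.0 for row in X]

tree = grow(X, y)
print(tree["var"], tree["cut"])   # index of the splitting covariate and its cut point
print(predict(tree, [5, 0.3]), predict(tree, [30, 0.9]))
```

Because every split must pass a significance test, the tree stops growing once no covariate shows a detectable association with the outcome in a node, which is what replaces post-hoc pruning in this family of methods. A forest would repeat `grow` on resampled data and average the resulting predictions.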

In the upcoming presentation, I will give a brief theoretical introduction to tree-based models, with a particular focus on Conditional Inference Trees and Conditional Inference Forests. Then, I will show their implementation on real data, namely for predicting a Frailty Index of individuals aged 50+ from different European countries. Moreover, I will compare the predictive performance of these algorithms with that of other traditional ML methods and across different subsamples of the training set. Finally, I will discuss the “best predictors” identified by the most accurate of these algorithms in the different subsamples.

Additional material:

[1] Morgan, J. N. and Sonquist, J. A. (1963). Problems in the Analysis of Survey Data, and a Proposal. Journal of the American Statistical Association, 58(302), 415–434.
[2] Breiman, L., Friedman, J., Stone, C. and Olshen, R. (1984). Classification and Regression Trees. Taylor & Francis, Belmont. https://doi.org/10.1201/9781315139470.
[3] Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics.
[4] Brunori, P. and Neidhöfer, G. (2021). The Evolution of Inequality of Opportunity in Germany: A Machine Learning Approach. Review of Income and Wealth.