A random forest guided tour

G Biau, E Scornet - Test, 2016 - Springer
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely
successful as a general-purpose classification and regression method. The approach, which …

Trend analysis of climate time series: A review of methods

M Mudelsee - Earth-science reviews, 2019 - Elsevier
The increasing trend curve of global surface temperature against time since the 19th century
is the icon for the considerable influence humans have on the climate since the …

Text as data

M Gentzkow, B Kelly, M Taddy - Journal of Economic Literature, 2019 - aeaweb.org
An ever-increasing share of human interaction, communication, and culture is recorded as
digital text. We provide an introduction to the use of text as an input to economic research …

Deltagrad: Rapid retraining of machine learning models

Y Wu, E Dobriban, S Davidson - International Conference on …, 2020 - proceedings.mlr.press
Machine learning models are not static and may need to be retrained on slightly
changed datasets, for instance, with the addition or deletion of a set of data points. This has …

The geometry of culture: Analyzing the meanings of class through word embeddings

AC Kozlowski, M Taddy… - American Sociological …, 2019 - journals.sagepub.com
We argue word embedding models are a useful tool for the study of culture using a historical
analysis of shared understandings of social class as an empirical case. Word embeddings …

Estimation and inference of heterogeneous treatment effects using random forests

S Wager, S Athey - Journal of the American Statistical Association, 2018 - Taylor & Francis
Many scientific and engineering challenges—ranging from personalized medicine to
customized marketing recommendations—require an understanding of treatment effect …

Integrated machine learning methods with resampling algorithms for flood susceptibility prediction

E Dodangeh, B Choubin, AN Eigdir, N Nabipour… - Science of the Total …, 2020 - Elsevier
Flood susceptibility projections relying on standalone models, with one-time train-test data
splitting for model calibration, yield biased results. This study proposed novel integrative …

Consistency of random forests

E Scornet, G Biau, JP Vert - 2015 - projecteuclid.org
The Annals of Statistics 2015, Vol. 43, No. 4, 1716–1741
DOI: 10.1214/15-AOS1321 © Institute of Mathematical Statistics, 2015 …

Measuring group differences in high‐dimensional choices: method and application to congressional speech

M Gentzkow, JM Shapiro, M Taddy - Econometrica, 2019 - Wiley Online Library
We study the problem of measuring group differences in choices when the dimensionality of
the choice set is large. We show that standard approaches suffer from a severe finite …

A survey of cross-validation procedures for model selection

S Arlot, A Celisse - 2010 - projecteuclid.org
Used to estimate the risk of an estimator or to perform model selection, cross-validation is a
widespread strategy because of its simplicity and its (apparent) universality. Many results …