R - Memory usage of caret train() function with cforest model
I am trying to train() a cforest model using the caret package, and 32 GB of RAM are being hoovered up by the end of the process (at 92%, as seen on the screenshot below). I cannot share the data, and the question is more about the cforest() implementation (or maybe the theory behind it?) than about my actual code, so I hope it still fits here.
Data and code:

The dataset is not big: 230k obs. of 19 variables plus the target. The str() output follows:
Classes ‘data.table’ and 'data.frame': 227358 obs. of 20 variables:
 $ var_n : factor w/ 11 levels
 $ var_n : factor w/ 2 levels
 $ var_n : factor w/ 2 levels
 $ var_n : factor w/ 2 levels
 $ var_n : factor w/ 2 levels
 $ var_n : factor w/ 2 levels
 $ var_n : factor w/ 16 levels
 $ var_n : factor w/ 2 levels
 $ var_n : num
 $ var_n : factor
 $ var_n : factor
 $ var_n : int
 $ var_n : int
 $ var_n : factor w/ 15 levels
 $ var_n : factor w/ 16 levels
 $ var_n : int
 $ var_n : factor w/ 7 levels
 $ var_n : factor w/ 31 levels
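For context, a quick back-of-envelope check (my own estimate, not from profiling) shows the raw data cannot explain the memory use: even if every one of the 20 columns were stored as 8-byte doubles, the table would fit well under 40 MB.

```r
# Rough upper bound on the in-memory size of the data itself.
# Assumption: every value stored as an 8-byte double (factors and ints
# are actually 4 bytes each, so the real footprint is smaller).
n_obs  <- 227358
n_cols <- 20
upper_bound_mb <- n_obs * n_cols * 8 / 2^20
round(upper_bound_mb, 1)  # ~34.7 MB, nowhere near 32 GB
```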
My whole control and train call looks as follows:

ctrl_params <- trainControl(
  summaryFunction = twoClassSummary,
  # selectionFunction = 'oneSE',
  selectionFunction = 'best',
  index = cv_ids_caret,
  method = "oob",
  # method = "cv",
  number = folds,
  verboseIter = TRUE,
  classProbs = TRUE
)

set.seed(2016)
cforest_caret_model <- train(
  x = x_train,
  y = y_train,  # factor(y_train),
  method = 'cforest',
  metric = 'ROC',
  trControl = ctrl_params,
  controls = cforest_unbiased(ntree = 11, trace = TRUE)
)
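As an aside on how I size things up: object.size() and gc() are the base-R tools I use to see where memory goes. The lm() below is just a stand-in model (not cforest) to keep the sketch self-contained; the same object.size() call works on whatever train() returns.

```r
# Sketch: measure an object's size and total memory in use (base R only).
# lm() is a placeholder model; fitted-model objects typically hold copies
# of the training data, so they come out larger than the data itself.
x <- data.frame(matrix(rnorm(1e5 * 10), ncol = 10))
fit <- lm(X1 ~ ., data = x)
print(object.size(x),   units = "Mb")  # size of the data
print(object.size(fit), units = "Mb")  # larger: model frame, qr, residuals...
used_mb <- sum(gc(verbose = FALSE)[, 2])  # total Mb currently allocated by R
```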
My sessionInfo():

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

attached base packages:
[1] stats4    grid      stats
[4] graphics  grDevices utils
[7] datasets  methods   base

other attached packages:
 [1] randomForest_4.6-12
 [2] MDmisc_0.0.0.9002
 [3] glmnet_2.0-5
 [4] foreach_1.4.3
 [5] Matrix_1.2-8
 [6] rpart_4.1-10
 [7] party_1.2-1
 [8] strucchange_1.5-1
 [9] sandwich_2.3-4
[10] zoo_1.7-14
[11] modeltools_0.2-21
[12] mvtnorm_1.0-5
[13] caret_6.0-73
[14] Hmisc_4.0-2
[15] ggplot2_2.2.1
[16] Formula_1.2-1
[17] survival_2.40-1
[18] lattice_0.20-34
[19] magrittr_1.5
[20] lubridate_1.6.0
[21] stringr_1.1.0
[22] data.table_1.10.4
[23] tidyr_0.6.1
[24] dplyr_0.5.0
What I have tried:

As you can see above, I have tried to:

- use oob resampling instead of cv, as suggested here;
- limit the number of trees to a ridiculously small number (11); and
- use 'best' model selection instead of 'oneSE',

hoping for smaller memory usage, but still during training I see that:
After seeing the memory error:

the training continues right at the maximum:
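For reference, this is how I sample R's memory footprint from within a script while training runs (base R only; on Windows, memory.size() is an alternative):

```r
# Helper to log R's current memory footprint at points in a script.
# gc(verbose = FALSE) returns a matrix whose second column is Mb used
# (rows: Ncells and Vcells); summing the column gives the total.
mem_used_mb <- function() sum(gc(verbose = FALSE)[, 2])

before <- mem_used_mb()
tmp <- numeric(1e6)            # allocate roughly 8 MB of doubles
after <- mem_used_mb()
cat(sprintf("allocated ~%.1f Mb\n", after - before))
rm(tmp)
```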
Summary:

I am not fixed on using this model, but I like the conditional inference trees' variable selection process (not biased towards variables with more categories, etc.), so I would rather understand the issue than wander away from the conditional inference forest.
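My current working hypothesis (an assumption on my part, not verified against the party sources) is that each tree in the forest keeps per-node case-weight vectors of length n_obs, so memory would grow roughly with n_obs × nodes × ntree, which explodes on 227k rows even with only 11 trees:

```r
# Back-of-envelope under my (unverified) assumption that every node keeps
# a full double-precision weights vector: n_obs values of 8 bytes each.
n_obs          <- 227358
nodes_per_tree <- 1000   # pure guess: deep unpruned trees on 227k rows
ntree          <- 11
est_gb <- n_obs * 8 * nodes_per_tree * ntree / 2^30
round(est_gb, 1)  # ~18.6 GB with these guessed numbers
```

With these guessed numbers the estimate already lands in the tens of gigabytes, which would be consistent with what I am seeing.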