R - Memory usage of caret train() function with cforest model


I am trying to train() a cforest model using the caret package, and all 32 GB of my RAM get hoovered up by the end of the process (at 92%, as seen in the screenshot below). I cannot share the data, and the question is more about the cforest() implementation (or maybe the theory behind it?) than about the actual code, so I hope it still fits here.

Data and code:

The dataset is not big: ~230k observations of 19 predictor variables plus the target. The str() output follows:

    Classes 'data.table' and 'data.frame':  227358 obs. of  20 variables:
     $ var_n : Factor w/ 11 levels
     $ var_n : Factor w/ 2 levels
     $ var_n : Factor w/ 2 levels
     $ var_n : Factor w/ 2 levels
     $ var_n : Factor w/ 2 levels
     $ var_n : Factor w/ 2 levels
     $ var_n : Factor w/ 16 levels
     $ var_n : Factor w/ 2 levels
     $ var_n : num
     $ var_n : Factor
     $ var_n : Factor
     $ var_n : int
     $ var_n : int
     $ var_n : Factor w/ 15 levels
     $ var_n : Factor w/ 16 levels
     $ var_n : int
     $ var_n : Factor w/ 7 levels
     $ var_n : Factor w/ 31 levels

My whole control and train call looks as follows:

    ctrl_params <- trainControl(
      summaryFunction   = twoClassSummary,
      # selectionFunction = 'oneSE',
      selectionFunction = 'best',
      index             = cv_ids_caret,
      method            = "oob",
      # method = "cv", number = folds,
      verboseIter       = TRUE,
      classProbs        = TRUE
    )

    set.seed(2016)
    cforest_caret_model <- train(
      x         = x_train,
      y         = y_train,  # factor(y_train),
      method    = 'cforest',
      metric    = 'ROC',
      trControl = ctrl_params,
      controls  = cforest_unbiased(ntree = 11, trace = TRUE)
    )
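One thing I have not yet tried: trainControl() has documented arguments that limit what caret keeps around after fitting (trim, returnData, returnResamp, savePredictions). A sketch of my control with those added — whether they actually reduce the peak usage here is untested:

```r
# Same control as above, plus trainControl() arguments that limit what caret
# retains after fitting. Whether these lower the peak memory here is untested.
library(caret)

ctrl_params_lean <- trainControl(
  summaryFunction   = twoClassSummary,
  selectionFunction = "best",
  # index = cv_ids_caret,     # same resampling index as above, omitted here
  method            = "oob",
  verboseIter       = TRUE,
  classProbs        = TRUE,
  returnData        = FALSE,  # do not keep a copy of the training set
  returnResamp      = "none", # drop per-resample performance copies
  savePredictions   = FALSE,  # do not store hold-out predictions
  trim              = TRUE    # trim the final model object where supported
)
```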

My sessionInfo():

    R version 3.3.2 (2016-10-31)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 7 x64 (build 7601) Service Pack 1

    attached base packages:
    [1] stats4    grid      stats
    [4] graphics  grDevices utils
    [7] datasets  methods   base

    other attached packages:
     [1] randomForest_4.6-12
     [2] mdmisc_0.0.0.9002
     [3] glmnet_2.0-5
     [4] foreach_1.4.3
     [5] Matrix_1.2-8
     [6] rpart_4.1-10
     [7] party_1.2-1
     [8] strucchange_1.5-1
     [9] sandwich_2.3-4
    [10] zoo_1.7-14
    [11] modeltools_0.2-21
    [12] mvtnorm_1.0-5
    [13] caret_6.0-73
    [14] Hmisc_4.0-2
    [15] ggplot2_2.2.1
    [16] Formula_1.2-1
    [17] survival_2.40-1
    [18] lattice_0.20-34
    [19] magrittr_1.5
    [20] lubridate_1.6.0
    [21] stringr_1.1.0
    [22] data.table_1.10.4
    [23] tidyr_0.6.1
    [24] dplyr_0.5.0

What I have tried:

As you can see, I have tried to:

  • use oob instead of cv, as suggested here;
  • limit the number of trees to a ridiculously small number (11); and
  • use 'best' model selection instead of 'oneSE', hoping for smaller memory usage,

but during training I still see this:

[screenshot: training memory usage 01]

Then a memory error appears:

[screenshot: memory error]

but the training continues right at the maximum:

[screenshot: training memory usage 02]

Summary:

I am not fixed on using this particular model, but I like that the conditional inference trees' variable selection process is not biased towards variables with more categories etc., so I would like to understand this issue and wander a bit further into conditional inference forests.
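As far as I understand (this is an assumption I would like confirmed), party's cforest() keeps the learning sample plus per-tree bookkeeping in the fitted object, so memory should grow roughly with ntree × nrow. A rough way to check how much of the usage comes from party itself rather than caret, on simulated data of a similar shape (all names and level counts below are made up, since I cannot share the data):

```r
# Simulated stand-in for the real data; column names and level counts are
# made up. Fit cforest() directly (no caret) and measure memory before/after.
library(party)

set.seed(2016)
n <- 50000  # start small; scale up toward 227358 to approach the real size
sim <- data.frame(
  y  = factor(sample(c("yes", "no"), n, replace = TRUE)),
  f1 = factor(sample(letters[1:11], n, replace = TRUE)),
  f2 = factor(sample(letters[1:16], n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = sample.int(100L, n, replace = TRUE)
)

gc(reset = TRUE)  # reset the max-used counters before the fit
fit <- cforest(y ~ ., data = sim,
               controls = cforest_unbiased(ntree = 11))
print(gc())                            # peak memory used during the fit
print(object.size(fit), units = "Mb")  # size of the forest object itself
```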

