R - different accuracy values when fitting a boosted tree twice
I use the R package adabag to fit boosted trees to a (large) data set (140 observations, 3,845 predictors).
I executed the method twice with the same parameters on the same data set, and each time a different accuracy value was returned (I defined a simple function that gives the accuracy for a given data set). Did I make a mistake, or is it usual that each fitting returns a different accuracy value? Could the problem be due to the data set being large?
The function below returns the absolute and relative accuracy given the predicted values and the true test set values.
err <- function(pred_d, test_d) {
  abs.acc <- sum(pred_d == test_d)     # number of correctly predicted observations
  rel.acc <- abs.acc / length(test_d)  # proportion of correct predictions
  v <- c(abs.acc, rel.acc)
  return(v)
}
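To make the setting concrete, here is a minimal sketch of the kind of workflow described, run twice without a seed. The iris data, the train/test split, and the mfinal value are stand-ins (the original data set and parameters are not shown), so treat this only as an illustration:

library(adabag)

# Hypothetical split; the real data set (140 obs., 3,845 predictors) is not available here.
idx   <- sample(seq_len(nrow(iris)), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Two fits with identical parameters on identical data ...
fit1 <- boosting(Species ~ ., data = train, mfinal = 10)
fit2 <- boosting(Species ~ ., data = train, mfinal = 10)

# ... can still give different accuracies, because no seed was fixed between the runs.
err(predict(fit1, newdata = test)$class, test$Species)
err(predict(fit2, newdata = test)$class, test$Species)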
New edit (9.1.2017): The following question is important in the above context.
As far as I can see, I do not use any "pseudo-random objects" (such as generated random numbers, etc.) in my code, because I only fit trees (using the R package rpart) and boosted trees (using the R package adabag) to a large data set. Can somebody explain to me where the "pseudo-randomness" enters when I execute my code?
Edit 1: A similar phenomenon happens with a single tree (using the R package rpart).
Edit 2: A similar phenomenon did not happen with trees (using rpart) on the iris data set.
There's no reason you should expect the same results if you didn't set the seed (with set.seed()).
It doesn't matter which seed you set if you're doing statistics rather than information security. You might even run the model with several different seeds to check its sensitivity. You just have to set the seed before anything involving pseudo-randomness; most people set it at the beginning of their code.
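As a minimal sketch (again using iris purely as a stand-in data set, with an arbitrary seed value), setting the seed immediately before each fit makes repeated runs agree:

library(adabag)

set.seed(2017)   # fix the RNG state before the first fit
fit1 <- boosting(Species ~ ., data = iris, mfinal = 10)

set.seed(2017)   # reset to the same state before the second fit
fit2 <- boosting(Species ~ ., data = iris, mfinal = 10)

# With the same seed, both runs draw the same (weighted) bootstrap samples,
# so the predicted classes (and hence the accuracy) are identical.
identical(predict(fit1, newdata = iris)$class,
          predict(fit2, newdata = iris)$class)   # TRUE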
This is ubiquitous in statistics; it affects probabilistic models and processes across all languages.
Note that in the case of information security it is important to have a (pseudo-)random seed that cannot be guessed by brute-force attacks, because (in a nutshell) knowing the seed value used internally by a security program paves the way for it to be hacked. In science and statistics it is the opposite: you, and anyone you share your code or research with, should be aware of the seed to ensure reproducibility.
https://en.wikipedia.org/wiki/random_seed
http://www.grasshopper3d.com/forum/topics/what-are-random-seed-values