neural network - How to fine-tune an FCN-32s for interactive object segmentation
I'm trying to implement the model proposed in the CVPR paper "Deep Interactive Object Selection" on a data set where each input sample contains 5 channels:

1. red
2. blue
3. green
4. Euclidean distance map associated with the positive clicks
5. Euclidean distance map associated with the negative clicks
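For concreteness, here is a rough sketch of how such a 5-channel input could be assembled; this is not the paper's code, and the SciPy distance transform, the truncation at 255, and all names are my own assumptions:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def click_distance_map(clicks, height, width, clip=255.0):
    """Per-pixel Euclidean distance to the nearest click, truncated at `clip`."""
    if not clicks:
        # no clicks of this kind: fill the channel with the truncation value
        return np.full((height, width), clip, dtype=np.float32)
    mask = np.ones((height, width), dtype=bool)
    for r, c in clicks:
        mask[r, c] = False            # EDT measures the distance to the False pixels
    dist = distance_transform_edt(mask)
    return np.minimum(dist, clip).astype(np.float32)

def build_input(image, pos_clicks, neg_clicks):
    """Stack RGB (H x W x 3, float32) with the two click channels -> H x W x 5."""
    h, w = image.shape[:2]
    pos_map = click_distance_map(pos_clicks, h, w)
    neg_map = click_distance_map(neg_clicks, h, w)
    return np.dstack([image, pos_map, neg_map])
```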
To do so, I should fine-tune the FCN-32s network using "object binary masks" as labels.
As you can see, the first conv layer now has 2 extra channels, so I did net surgery to reuse the pretrained parameters for the first 3 channels and Xavier initialization for the 2 extra ones.
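For reference, a minimal net-surgery sketch along these lines (following the pattern of the Caffe net_surgery notebook) could look like the following; the prototxt/caffemodel file names and the renamed layer "conv1_1_5ch" are placeholders, and a simple Gaussian fill stands in for Xavier:

```python
import numpy as np
import caffe

# original 3-channel FCN-32s with its pretrained weights
old_net = caffe.Net('fcn32s_3ch_deploy.prototxt', 'fcn32s.caffemodel', caffe.TEST)
# new 5-channel definition; because the first layer is renamed ("conv1_1_5ch"),
# loading the same caffemodel copies every layer except that one
new_net = caffe.Net('fcn32s_5ch_deploy.prototxt', 'fcn32s.caffemodel', caffe.TEST)

old_w = old_net.params['conv1_1'][0].data      # shape (64, 3, 3, 3)
new_w = new_net.params['conv1_1_5ch'][0].data  # shape (64, 5, 3, 3)

new_w[:, :3] = old_w                           # reuse the pretrained RGB filters
fan_in = 5 * 3 * 3                             # crude stand-in for Xavier init
new_w[:, 3:] = np.random.randn(64, 2, 3, 3) * np.sqrt(1.0 / fan_in)
new_net.params['conv1_1_5ch'][1].data[...] = old_net.params['conv1_1'][1].data

new_net.save('fcn32s_5ch_init.caffemodel')
```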
For the rest of the FCN architecture, I have these questions:
1. Should I freeze all the layers before "fc6" (except the first conv layer)? If yes, how will the extra channels of the first conv layer be learned? Are the gradients strong enough to reach the first conv layer during training?
2. What should the kernel size of "fc6" be? Should I keep it at 7? I saw in the Caffe "net_surgery" notebook that it depends on the output size of the last layer ("pool5").
3. The main problem is the number of outputs of the "score_fr" and "upscore" layers. Since I'm not doing class segmentation (which uses 21 outputs for 20 classes plus background), how should I change it? To 2 (one for the object and the other for the non-object (background) area)?
4. Should I change the "offset" of the "crop" layer to 32 to get center crops?
5. If I change each of these layers, what is the best initialization strategy for them? "bilinear" for "upscore" and "xavier" for the rest?
6. Should I convert the binary label matrix to zero-centered values ({-0.5, 0.5}), or is it OK to use values in {0, 1}?
Any useful ideas are appreciated.
PS: I'm using Euclidean loss, with "1" as the number of outputs for the "score_fr" and "upscore" layers. If I use 2 for that, I guess I should use softmax instead.
I can answer some of your questions.

The gradients do reach the first layer, so it should be possible to learn its weights even if you freeze the other layers.
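For reference, a common way to freeze the earlier layers in Caffe is to set their learning-rate multipliers to 0 in the train prototxt. A small sketch that does this through the protobuf API; the file names, and the assumption that the layers are ordered as in the reference FCN-32s definition, are mine:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('train_val.prototxt') as f:           # placeholder file name
    text_format.Merge(f.read(), net)

first_conv = 'conv1_1'                           # or whatever you renamed it to
for layer in net.layer:
    if layer.name == 'fc6':
        break                                    # fc6 and everything after stay trainable
    if layer.name == first_conv:
        continue                                 # keep the modified first conv layer trainable
    if layer.type in ('Convolution', 'InnerProduct'):
        del layer.param[:]                       # replace any existing multipliers
        for _ in range(2):                       # one ParamSpec for weights, one for bias
            p = layer.param.add()
            p.lr_mult = 0
            p.decay_mult = 0

with open('train_val_frozen.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```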
Change "num_output" to 2 and finetune; you should get the output you want. I think you'll need to experiment with each of those options and see how the accuracy turns out.
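As a concrete illustration (not part of the original answer), that change could also be made by editing the train prototxt through Caffe's protobuf API; the layer names "score_fr", "upscore", and "loss" follow the reference FCN-32s definition, and the file names are placeholders:

```python
from caffe.proto import caffe_pb2
from google.protobuf import text_format

net = caffe_pb2.NetParameter()
with open('train_val.prototxt') as f:            # placeholder file name
    text_format.Merge(f.read(), net)

for layer in net.layer:
    if layer.name in ('score_fr', 'upscore'):
        layer.convolution_param.num_output = 2   # object vs. background scores
    elif layer.name == 'loss':
        layer.type = 'SoftmaxWithLoss'           # 2-channel scores go through softmax

with open('train_val_2class.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```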
You can use the values {0, 1}.