Intuition on Deep Residual Network


I am reading the Deep Residual Network paper, and there is a concept in it that I cannot understand:

Questions:

  1. What does "hope the 2 weight layers fit f(x)" mean?

  2. Here f(x) is the result of processing x with 2 weight layers (+ ReLU non-linear function), so is the desired mapping h(x) = f(x)? Where is the residual?

What does "hope the 2 weight layers fit f(x)" mean?

The residual unit shown obtains f(x) by processing x with 2 weight layers. It then adds x to f(x) to obtain h(x). Now, assume that h(x) is the ideal predicted output, the one that matches the ground truth. Since h(x) = f(x) + x, obtaining the desired h(x) depends on getting the right f(x). That means the 2 weight layers in the residual unit should be able to produce the desired f(x); once they do, getting the ideal h(x) is guaranteed.

Here f(x) is the result of processing x with 2 weight layers (+ ReLU non-linear function), so is the desired mapping h(x) = f(x)? Where is the residual?

The first part is correct. f(x) is obtained from x as follows.

x -> weight_1 -> relu -> weight_2 

h(x) is obtained from f(x) as follows.

f(x) + x -> relu  

So, I don't understand the second part of your question. The residual is f(x).
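The two pipelines above can be sketched in NumPy. This is only a minimal illustration, assuming fully connected weight layers without biases; the names `relu`, `residual_unit`, `w1` and `w2` are hypothetical and do not come from the paper.

```python
import numpy as np

def relu(z):
    # Element-wise ReLU non-linearity.
    return np.maximum(z, 0.0)

def residual_unit(x, w1, w2):
    """Return (f(x), output) for a two-layer residual unit (biases omitted)."""
    f = relu(x @ w1) @ w2   # x -> weight_1 -> relu -> weight_2
    out = relu(f + x)       # f(x) + x -> relu
    return f, out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))            # batch of 4 inputs with 8 features
w1 = rng.normal(size=(8, 8)) * 0.1     # hypothetical weights of layer 1
w2 = rng.normal(size=(8, 8)) * 0.1     # hypothetical weights of layer 2

f, out = residual_unit(x, w1, w2)
```

Here `f` is the residual f(x) produced by the two weight layers, and `out` is what the unit passes on after adding x back and applying the final ReLU.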

The authors hypothesize that the residual mapping (i.e. f(x)) may be easier to optimize than h(x). To illustrate with a simple example, assume that the ideal h(x) = x. For a direct mapping, it would be difficult to learn this identity mapping, because there is a stack of non-linear layers as follows.

x -> weight_1 -> relu -> weight_2 -> relu -> ... -> x 

So, approximating the identity mapping with all these weights and ReLUs in the middle would be difficult.

Now, if we define the desired mapping as h(x) = f(x) + x, then we only need f(x) = 0, as follows.

x -> weight_1 -> relu -> weight_2 -> relu -> ... -> 0  # note the final 0

Achieving the above is easy. Just set the weights to zero and the output is zero. Add x back and you get the desired mapping.
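A quick numerical check of this argument, as a sketch only: it assumes a two-layer residual branch without biases, and the names `w1`, `w2_zero`, `f` and `h` are chosen here for illustration.

```python
import numpy as np

def relu(z):
    # Element-wise ReLU non-linearity.
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))       # batch of 4 inputs with 8 features
w1 = rng.normal(size=(8, 8))      # first weight layer, arbitrary values
w2_zero = np.zeros((8, 8))        # second weight layer set to zero

f = relu(x @ w1) @ w2_zero        # residual branch: every entry is 0
h = f + x                         # adding x back recovers the identity mapping

print(np.array_equal(h, x))
```

With the last weight layer at zero, the residual branch contributes nothing, so h(x) equals x regardless of what the first layer does.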

The other factor in the success of residual networks is the uninterrupted gradient flow from the first layer to the last layer. That is out of scope for this question. You can read the paper "Identity Mappings in Deep Residual Networks" for more information on this.

