Deep Learning: UNIT- II: CNN : 7. CNN VS Fully Connected

 7. CNN vs Fully Connected

  • The basic difference between the two types of layers is the density of the connections. FC layers are densely connected: every output neuron is connected to every input neuron. In a Conv layer, by contrast, each output neuron is connected only to a small neighbourhood of input neurons, whose size is set by the convolutional kernel.
  • The second main difference is weight sharing. In an FC layer, every output neuron is connected to every input neuron through its own weight. In a Conv layer, the same kernel weights are shared across output neurons. Together with local connectivity, this keeps the parameter count small, which is what makes Conv layers practical when the number of neurons is large.
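The two differences above can be made concrete with a minimal 1D sketch in plain Python. The function names, the input `x`, and the kernel values are illustrative assumptions, not part of the notes. As in most deep learning libraries, the "convolution" here is computed without flipping the kernel. The sketch shows that a Conv layer is equivalent to an FC layer whose weight matrix is mostly zeros (local connectivity) and repeats the same few values in every row (weight sharing).

```python
def dense_1d(x, W):
    """Fully connected layer without bias: one row of weights per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def conv_1d(x, kernel):
    """'Valid' 1D convolution (no kernel flip): one shared kernel slides over x."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def conv_as_dense_matrix(kernel, n_in):
    """Build the FC weight matrix that reproduces conv_1d for inputs of length n_in."""
    k = len(kernel)
    n_out = n_in - k + 1
    W = [[0.0] * n_in for _ in range(n_out)]
    for i in range(n_out):
        for j in range(k):
            W[i][i + j] = kernel[j]  # same kernel values reused in every row
    return W


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative input
kernel = [1.0, 0.0, -1.0]       # illustrative kernel, 3 shared weights

W = conv_as_dense_matrix(kernel, len(x))
conv_out = conv_1d(x, kernel)
dense_out = dense_1d(x, W)

print(conv_out)                  # [-2.0, -2.0, -2.0]
print(dense_out == conv_out)     # True
print(len(kernel), len(W) * len(W[0]))  # 3 free weights vs 15 matrix entries
```

A general FC layer of the same shape would be free to set all 15 entries independently; the Conv layer only ever learns 3 values.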

What are the problems with using an MLP for image data?

a.      MLP will react differently to an image and its shifted version

b.     MLP does not consider spatial relations

c.      Includes too many Parameters

 

a.     MLP will react differently to an image and its shifted version

  • An MLP takes a flattened image as input, so it is not position invariant.
  • To feed an image to an ANN, we convert it into a single feature vector.
  • The vector form ignores which pixels are neighbours and, most importantly, how the values are organised into image channels (R-G-B).

Let's take an example:

  • Suppose we have two images of the same dog at two different positions:
  • one in the upper left and the other in the middle right.




  • Because the MLP flattens the image matrix, the neurons that are most active for the first image may be dormant for the second one.
  • As a result, the MLP treats the two images as if they contained completely different objects.
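The dog example can be imitated with a toy sketch in plain Python. Everything here is an assumption made for illustration: a 6 x 6 image, a 2 x 2 bright patch standing in for the dog, a single dense neuron whose weights happen to match the first image, and a 2 x 2 kernel of ones followed by a global maximum. It is not a trained network, only a demonstration of why the flattened representation is sensitive to position while a sliding kernel is not.

```python
def make_image(size, top, left, patch=2):
    """Blank size x size image with a patch of ones at (top, left)."""
    img = [[0.0] * size for _ in range(size)]
    for r in range(top, top + patch):
        for c in range(left, left + patch):
            img[r][c] = 1.0
    return img


def flatten(img):
    """Row-major flattening, as done before feeding an image to an MLP."""
    return [v for row in img for v in row]


def conv2d_valid(img, kernel):
    """'Valid' 2D convolution (no kernel flip) with one shared kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [sum(kernel[i][j] * img[r + i][c + j]
             for i in range(kh) for j in range(kw))
         for c in range(len(img[0]) - kw + 1)]
        for r in range(len(img) - kh + 1)
    ]


img_a = make_image(6, top=0, left=0)  # "dog" in the upper left
img_b = make_image(6, top=2, left=4)  # same "dog" in the middle right

# A dense neuron whose weights match the pattern in img_a (assumed, not learned).
w = flatten(img_a)
dense_a = sum(wi * xi for wi, xi in zip(w, flatten(img_a)))
dense_b = sum(wi * xi for wi, xi in zip(w, flatten(img_b)))

# One shared kernel slid over the whole image, then a global max.
kernel = [[1.0, 1.0], [1.0, 1.0]]
conv_a = max(max(row) for row in conv2d_valid(img_a, kernel))
conv_b = max(max(row) for row in conv2d_valid(img_b, kernel))

print(dense_a, dense_b)  # 4.0 0.0
print(conv_a, conv_b)    # 4.0 4.0
```

The dense neuron fires strongly for the first image and not at all for the shifted one, because the active entries of the two flattened vectors do not overlap. The convolutional feature map simply moves with the patch, so its maximum response is the same for both images.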

b. MLP does not consider spatial relations

  • Spatial information (for example, that a person is standing to the right of a car, or that the red car is to the left of the blue bike) is lost when the image is flattened.
  • Flattening also discards the internal 2D structure of the image.
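A small sketch of what flattening does to neighbourhoods, assuming row-major flattening and an image 1280 pixels wide (the helper `flat_index` and the chosen coordinates are illustrative only):

```python
def flat_index(row, col, width):
    """Position of pixel (row, col) in a row-major flattened image."""
    return row * width + col


width = 1280  # assumed image width

center = flat_index(10, 20, width)
right = flat_index(10, 21, width)   # horizontal neighbour
below = flat_index(11, 20, width)   # vertical neighbour

print(right - center)  # 1
print(below - center)  # 1280

# The last pixel of one row and the first pixel of the next row are far apart
# in the image, yet they sit next to each other in the flattened vector.
end_of_row = flat_index(10, width - 1, width)
start_of_next = flat_index(11, 0, width)
print(start_of_next - end_of_row)  # 1
```

Nothing in the flattened vector tells the MLP that entries 1280 positions apart are vertical neighbours, or that two adjacent entries may lie on opposite edges of the image; it would have to learn this from data.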

c. Includes too many Parameters

  • Since an MLP is a fully connected model, it needs one input neuron for every pixel of the image, and each neuron in the next layer has a separate weight for every one of those inputs.
  • Take an image of size 1280 x 720 as an example.
    • For a single-channel image of this size, the input vector is 921600 x 1. If a Dense layer of 128 units is used, the number of weights is 921600 * 128 = 117,964,800 (plus 128 biases).
    • This makes an MLP impractical for large images, and so many parameters make overfitting more likely.
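The arithmetic above can be checked with a few lines of Python. The comparison with a convolutional layer is an added illustration: the choice of 128 filters of size 3 x 3 on a single input channel is an assumption, picked only so that both layers produce 128 features.

```python
height, width = 720, 1280
hidden_units = 128

# Fully connected: one weight per (input pixel, hidden unit) pair, plus biases.
n_inputs = height * width
dense_weights = n_inputs * hidden_units
dense_params = dense_weights + hidden_units

# The same layer on an RGB image has three times as many inputs.
dense_weights_rgb = 3 * n_inputs * hidden_units

# Assumed comparison: 128 filters of size 3 x 3 on one input channel.
# The count does not depend on the image size at all.
conv_params = 3 * 3 * 1 * 128 + 128

print(n_inputs)           # 921600
print(dense_weights)      # 117964800
print(dense_params)       # 117964928
print(dense_weights_rgb)  # 353894400
print(conv_params)        # 1280
```

Under these assumptions the dense layer needs roughly 118 million parameters, while the convolutional layer needs 1280, and the gap grows with image resolution.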

 

Do we even require global connectivity?

  • The global connectivity created by densely connected neurons introduces many redundant parameters, which makes the MLP prone to overfitting.

Given the discussion above, we need:

  • to make the system translation (position) invariant
  • to leverage the spatial correlation between pixels
  • to focus only on local connectivity

 

What should be the special features of a CNN?

Drawing on the discussion above and taking inspiration from the visual cortex, a CNN should exploit 3 essential properties of image data:

1. LOCALITY: Correlation between neighbouring pixels in an image

2. STATIONARITY: Similar patterns appearing multiple times in an image

3. COMPOSITIONALITY: Extracting higher-level features by pooling lower-level features
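The three properties can be seen together in a minimal 1D sketch. The signal, the kernel, and the pooling size are illustrative assumptions. The kernel looks at only three neighbouring values at a time (locality), the same kernel responds to the pattern wherever it occurs (stationarity), and max pooling condenses the low-level responses into a shorter summary that a later layer could build on (compositionality).

```python
def conv_1d(x, kernel):
    """'Valid' 1D convolution (no kernel flip) with one shared kernel."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]


def max_pool_1d(x, size):
    """Non-overlapping max pooling; leftover values at the end are dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


# Illustrative signal: the bump 1, 2, 1 appears twice.
signal = [0, 1, 2, 1, 0, 0, 0, 1, 2, 1, 0, 0]
kernel = [1, 2, 1]  # illustrative detector for that bump

feature_map = conv_1d(signal, kernel)
peaks = [i for i, v in enumerate(feature_map) if v == max(feature_map)]
pooled = max_pool_1d(feature_map, 2)

print(feature_map)  # [4, 6, 4, 1, 0, 1, 4, 6, 4, 1]
print(peaks)        # [1, 7]
print(pooled)       # [6, 4, 1, 6, 4]
```

The two peaks in the feature map mark the two occurrences of the pattern, found with a single set of three weights.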

 

 

 


