3. History of Deep Learning, from Biology to Programming
3.1. How did everything start?
I am not going to include a deep explanation here (redundancies apart (; ) of the history of Deep Learning; but if you are interested, here are some useful resources I have found:
However, I am going to explain its evolution from its Biological origin (sorry, I am a Biomedical Engineer, I had to include my “bio-” prefix somewhere!).
3.2. Biological Neural Networks
3.3. The Basis of Biological Neural Networks: The Perceptron
The psychologist Frank Rosenblatt conceived the Perceptron from the idea of a neuron. He defined it as a simplified mathematical model of how neurons operate:
- It takes a set of binary inputs (nearby neurons)
- Each input is multiplied by a continuous valued weight (the synapse strength to each nearby neuron)
- It thresholds the sum of these weighted inputs, outputting 1 if the sum is large enough and 0 otherwise (much like the way neurons either fire or do not)
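The three steps above can be sketched in a few lines of Python. The weights and threshold here are illustrative values, not taken from the text:

```python
def perceptron(inputs, weights, threshold):
    """Rosenblatt-style Perceptron: threshold the weighted sum of binary inputs."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Example with two binary inputs and continuous-valued weights:
print(perceptron([1, 0], [0.6, 0.4], threshold=0.5))  # fires: 0.6 > 0.5
```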
The biological inspiration can also be seen in the McCulloch-Pitts neuron model, which sums binary inputs and outputs 1 if a certain threshold is exceeded, and 0 otherwise.
Here the concept of Activation Function is introduced: the non-linear function applied to the weighted sum to produce the output of the artificial neuron.
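In code, the activation function is simply a parameter of the neuron. The step function below matches the McCulloch-Pitts threshold behaviour; the sigmoid is a common smooth alternative I include for contrast (it is not part of the original models described above):

```python
import math

def neuron(inputs, weights, activation):
    """Apply an activation function to the weighted sum of the inputs."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

step = lambda s: 1 if s > 0 else 0           # McCulloch-Pitts style threshold
sigmoid = lambda s: 1 / (1 + math.exp(-s))   # a smooth, later alternative
```

Swapping `step` for `sigmoid` changes the neuron's output from a hard 0/1 decision to a graded value between 0 and 1.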
Donald Hebb then stated: “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”
Perceptrons did not follow this rule exactly, but they could “learn” a function: their weights were modified after each training example was evaluated.
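This weight-update idea is what became the Perceptron learning rule: after each example, nudge the weights in the direction that reduces the output error. A minimal sketch, where the learning rate and the training data (the logical OR function) are my own illustrative choices:

```python
def train_step(weights, bias, x, target, lr=0.1):
    """One Perceptron update: adjust weights toward the correct output."""
    s = sum(xi * wi for xi, wi in zip(x, weights)) + bias
    out = 1 if s > 0 else 0
    error = target - out                       # -1, 0, or +1
    weights = [wi + lr * error * xi for wi, xi in zip(weights, x)]
    bias += lr * error
    return weights, bias

# Hypothetical data: learn logical OR from binary inputs
w, b = [0.0, 0.0], 0.0
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
for _ in range(10):                            # a few passes over the data
    for x, t in data:
        w, b = train_step(w, b, x, t)
```

After a few passes the weights settle on values that classify every example correctly.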
But a single Perceptron can only learn to output a 1 or a 0; multiple Perceptrons are therefore combined in a layer to form neural networks that can tackle classification tasks, e.g. identifying the letters and digits in human handwriting.
3.4. How hidden layers appeared
Why hidden layers? They can find features within the data and allow subsequent layers to operate on those features (rather than on the large, noisy raw data).
With some calculus, we can assign an error to each output neuron and split the blame between it and the preceding hidden layer. This backwards process is known as “error backpropagation”: it tells us how the net error changes when we change each weight, so that an optimization technique can be used to minimize this error.
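The blame-splitting described above can be seen on a toy network with one hidden unit and one output unit, both with sigmoid activations. All values (input, target, weights, learning rate) are illustrative assumptions; real networks use many units and matrix operations:

```python
import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

x, target = 1.0, 0.0          # hypothetical training example
w1, w2, lr = 0.5, -0.3, 0.1   # hypothetical weights and learning rate

for _ in range(100):
    # Forward pass: input -> hidden -> output
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    error = 0.5 * (y - target) ** 2
    # Backward pass: split the blame via the chain rule
    dy = (y - target) * y * (1 - y)   # error gradient at the output unit
    dw2 = dy * h                      # blame assigned to the output weight
    dh = dy * w2                      # blame passed back to the hidden layer
    dw1 = dh * h * (1 - h) * x        # blame assigned to the hidden weight
    # Gradient descent step: move both weights to reduce the error
    w1 -= lr * dw1
    w2 -= lr * dw2
```

Each iteration the error shrinks a little: the chain rule distributes the output error backwards through the layers, which is exactly what makes training deep networks feasible.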