
Glossary of terms




 







      • Weights: indicate the strength (importance) of each input to the node
      • Bias value: shifts the activation function
      • Activation function: maps the input into a required range, such as (0, 1) or (-1, 1).

   

Fig : Perceptron
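As a rough illustration of how the three pieces above fit together, here is a minimal sketch of a single perceptron. The sigmoid is just one possible activation function, and the input, weight and bias values are hypothetical, chosen only for the example.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs, shifted by the bias, passed through the AF."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical values, for illustration only.
output = perceptron(inputs=[1.0, 0.5], weights=[0.4, -0.2], bias=0.1)
print(output)  # a value strictly between 0 and 1
```

Changing the bias moves the point at which the output crosses 0.5, which is what "shifting the activation function" means in practice.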




  • Deep learning (also known as deep structured learning or hierarchical learning): a family of ML methods based on learning data representations, as opposed to task-specific algorithms. These methods use ANNs, which are loosely inspired by the neuronal structure of the human brain. Learning can be:
    • Supervised: used on problems where the goal is to learn a mapping from inputs to outputs. “Like a teacher supervising a student”: the model continually makes predictions, the predictions are compared to the expected outcomes, the error is calculated, and the model is corrected using these errors. Examples:
      • Problems: classification (mapping input variables to a label) and regression (mapping input variables to a quantity)
      • Algorithms: k-nearest neighbours, support vector machines, multilayer perceptron NN.
    • Semi-supervised Ref
    • Unsupervised: used on problems with only inputs, where the goal is to learn or capture the inherent structure in the data. There is no teacher; instead, the models are updated based on repeated exposure to examples from the problem domain. Examples:
      • Problems: clustering (learning the groups in the data) and association (learning the relationships in the data).
      • Algorithms: k-means, apriori, self-organizing map NN
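To make the supervised case more concrete, the following is a small sketch of k-nearest neighbours, one of the algorithms listed above. The points and labels are made up for illustration; a real project would normally use a library implementation instead.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)  # squared Euclidean distance
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical labelled examples: the labels play the role of the "teacher".
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_points, train_labels, query=(1.1, 1.0))
print(prediction)
```

An unsupervised method such as k-means would receive only `train_points`, without `train_labels`, and would have to discover the two groups by itself.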









Credits: Yam Peleg (@Yampeleg)



  • Epoch: one epoch occurs when the entire dataset is passed forward and backward through the NN exactly once. More than one epoch is usually needed to update the weights properly and optimise the learning. The number of epochs chosen affects the fit of the model, which we understand as the result of running a learning algorithm on a dataset.

    • Underfitting: with too few epochs the model does not capture enough of the structure in the data sample, so it performs poorly on both the training and the test datasets. More training or a better fit is required.
    • Optimal fitting: occurs when the number of times the weights are changed in the NN (i.e. the number of epochs) is appropriate. The model finds a suitable balance between capturing the structure in the dataset and generalizing to new data. It performs well on both the training and the test datasets. There is no ‘right’ number of epochs, so it has to be determined experimentally, according to the diversity of the data.
    • Overfitting: the model fits the random noise in the data sample, so it may perform well on the training data but does not generalize to new data and performs poorly on the test data.
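One common way to look for a reasonable number of epochs is to track the loss on held-out data after every epoch. The sketch below assumes such a record already exists; the loss values are hypothetical, and the three-way diagnosis is a deliberate simplification of the definitions above.

```python
# Hypothetical per-epoch losses, made up purely for illustration.
train_loss = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10, 0.06]
val_loss = [0.95, 0.70, 0.52, 0.45, 0.43, 0.47, 0.55, 0.66]

def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest loss on held-out data."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def diagnose(epoch, val_losses):
    """Roughly classify stopping at `epoch` relative to the best epoch."""
    best = best_epoch(val_losses)
    if epoch < best:
        return "underfitting"  # held-out loss was still going down
    if epoch > best:
        return "overfitting"   # training loss keeps falling, held-out loss rises
    return "optimal"

print(best_epoch(val_loss))
print(diagnose(2, val_loss), diagnose(8, val_loss))
```

Note that the training loss keeps decreasing in every epoch, so it cannot be used on its own to detect overfitting.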

 


  • Batch: a smaller division of the dataset used in one epoch, since the whole dataset is usually too big to feed to the computer at once.
    • Batch size: total number of training examples in a single batch
    • Number of batches: number of divisions/sets/parts the dataset is split into, and thus the number of iterations in one epoch
  • Iteration: one pass of a single batch through the NN; the number of iterations needed to complete one epoch equals the number of batches
  • Example: if we have 4500 training examples and we divide the dataset into batches of 500 (batch size), then it will take 9 iterations to complete 1 epoch
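The arithmetic in the example can be checked with a few lines of code. This is only a sketch: `make_batches` is a hypothetical helper, and the dataset is a list of placeholder integers.

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    """Number of batches (and therefore iterations) needed for one epoch."""
    return math.ceil(num_examples / batch_size)

def make_batches(examples, batch_size):
    """Split the dataset into consecutive batches; the last one may be smaller."""
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]

dataset = list(range(4500))           # placeholder for 4500 training examples
batches = make_batches(dataset, 500)  # batch size of 500

print(iterations_per_epoch(4500, 500))  # 9
print(len(batches))                     # 9
```

When the dataset size is not a multiple of the batch size, the rounding up accounts for the final, smaller batch.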

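  • Loss function: quantifies the error between the predictions of the model and the expected outcomes; training aims to minimise it.

Types of loss (chosen together with an optimizer, e.g. loss='mean_squared_error', optimizer='sgd'):

mean_squared_error
mean_absolute_error
mean_absolute_percentage_error
mean_squared_logarithmic_error
squared_hinge
hinge
categorical_hinge
logcosh
categorical_crossentropy
sparse_categorical_crossentropy
binary_crossentropy
kullback_leibler_divergence
poisson
cosine_proximity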
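The names in the list look like the loss identifiers accepted by Keras, although the post does not say so explicitly. To show what three of them compute, here is a plain-Python sketch; these are illustrative re-implementations written for this glossary, not the library's own code.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; penalises large errors heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences; less sensitive to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

print(mean_squared_error([1, 2, 3], [1, 2, 5]))
print(mean_absolute_error([1, 2, 3], [1, 2, 5]))
print(binary_crossentropy([1, 0], [0.9, 0.2]))
```

The first two are typically used for regression problems and the cross-entropy for classification problems.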

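  • Optimizer: algorithm that updates the weights in order to reduce the loss.
    • Fixed learning rate: SGD
    • Adaptive learning rates: Adam, AdaGrad, Adadelta, RMSProp, ...
  • Learning Rate: sets the size of each weight update, i.e. the rate at which we want our program to learn.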
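The effect of the learning rate can be seen on a toy problem. The sketch below applies plain gradient descent to the hypothetical function f(w) = (w - 3)^2, whose minimum is at w = 3; the function and the three learning rates are assumptions chosen only for the demonstration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient  # step against the gradient
    return w

too_small = gradient_descent(learning_rate=0.01)  # still far from 3 after 50 steps
good = gradient_descent(learning_rate=0.1)        # very close to 3
too_large = gradient_descent(learning_rate=1.1)   # overshoots more on every step

print(too_small, good, too_large)
```

Adaptive optimizers such as Adam adjust the effective step size during training, which reduces (but does not remove) the sensitivity to this choice.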
  • Activation functions (AF):
    • Definition AF: function attached to the output end of a NN or placed between two layers, in order to determine the output (e.g. yes/no). It maps the resulting values into a range such as 0 to 1 or -1 to 1 (depending on the function).  url
The AF can be:
      • Linear (Identity AF): the output is not confined to any range. It does not help with the complexity or the various parameters of the data usually fed to the NN.







Equation: f(x) = x
Range: (-infinity, infinity)
      • Non-linear: the most used. They make it easier for the model to generalize or adapt to a variety of data and to differentiate between outputs. The main terms needed to understand non-linear functions are:
        • Derivative or differential (slope): change along the y-axis w.r.t. change along the x-axis.
        • Relevance: when updating the curve, the slope tells us in which direction and by how much to change it.
        • Monotonic function: a function that is either entirely non-increasing or entirely non-decreasing.




The Nonlinear Activation Functions are mainly divided on the basis of their range or curves:

| Function | Shape | Range | Differentiable? | Monotonic? | Derivative monotonic? | Specially used to | Others |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sigmoid or logistic activation | Sigmoidal (S-shaped) | (0, 1) | Yes | Yes | No | Predict a probability | The softmax function is a generalization used for multiclass classification. |
| Tanh or hyperbolic tangent | Sigmoidal (S-shaped) | (-1, 1) | Yes | Yes | No | Classification between 2 classes | Negative inputs are mapped strongly negative; zero inputs are mapped near zero. |
| ReLU | Half rectified (from bottom) | [0, inf) | Yes (except at x = 0) | Yes | Yes | Hidden layers of deep networks such as CNNs | Negative inputs are not appropriately mapped (the value turns to zero). |
| Leaky ReLU (a = 0.01) and Randomized ReLU (a ≠ 0.01) | | (-inf, inf) | Yes (except at x = 0) | Yes | Yes | Same uses as ReLU | Same as ReLU, but the negative part maps to a line with a small positive slope a instead of to zero. |
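The four functions compared above are short enough to write out directly. This is a minimal sketch using only the standard library; the default slope a = 0.01 for Leaky ReLU follows the value given in the table.

```python
import math

def sigmoid(x):
    """Maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Maps any real input into (-1, 1); zero stays at zero."""
    return math.tanh(x)

def relu(x):
    """Keeps positive inputs unchanged and turns negative inputs into zero."""
    return max(0.0, x)

def leaky_relu(x, a=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope a."""
    return x if x > 0 else a * x

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), leaky_relu(x))
```

Comparing the outputs for x = -2.0 shows the difference noted in the last column: ReLU discards the negative input entirely, while Leaky ReLU keeps a small negative value.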





Sigmoid:


Tanh:



Sigmoid vs ReLU:



ReLU vs Leaky ReLU:





  • Neural Network (NN): “it is a multi-layer perceptron”
 


  • Artificial neural networks (ANNs), or connectionist systems, are computing systems inspired by the biological neural networks of animal brains, capable of learning to do tasks by considering examples. They are based on units called artificial neurons, which are connected to each other (by synapses) in order to transmit signals. Ref

    • Neurons have a state, generally represented by a number between 0 and 1.
    • Neurons and synapses may have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal sent downstream.
    • Neurons are typically organized in layers, which may perform different input transformations.

    • Classes of ANNs:
      • Recurrent: connections between nodes form a directed graph along a sequence, which allows the network to exhibit dynamic temporal behavior for a time sequence. Recurrent networks can use their internal state (memory) to process sequences of inputs, so they can be applied to tasks such as unsegmented, connected handwriting recognition or speech recognition.

      • Feedforward: connections between the nodes do not form a cycle, so the information moves in only one direction, forward, from the input nodes to the output nodes.

        • Deep (DNN): it contains multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear or non-linear relationship. The network moves through the layers calculating the probability of each output.

          • DNN architectures generate compositional models where the object is expressed as a layered composition of primitives. The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network. Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures, unless they have been evaluated on the same data sets.

          • DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to the connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the network does not accurately recognize a particular pattern, an algorithm adjusts the weights. That way the algorithm can make certain parameters more influential, until it determines the correct mathematical manipulation to fully process the data.
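The forward pass described above (random initial weights, multiply weights by inputs, output between 0 and 1) can be sketched as follows. The layer sizes, the input values and the use of a sigmoid in every layer are assumptions made for this example, and the weight-adjustment step (training) is deliberately left out.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_inputs, n_outputs, rng):
    """Assign random weights to every connection; biases start at zero."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(n_outputs)]
    biases = [0.0] * n_outputs
    return weights, biases

def dense_layer(inputs, weights, biases):
    """Each neuron multiplies inputs by its weights, adds its bias, applies the AF."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Data flows from the input layer to the output layer without looping back."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

rng = random.Random(0)  # fixed seed so the sketch is reproducible
layers = [init_layer(3, 4, rng), init_layer(4, 2, rng)]  # 3 inputs, 4 hidden, 2 outputs
outputs = forward([0.2, 0.7, 0.1], layers)
print(outputs)
```

With untrained random weights the outputs are meaningless; they only become useful once an algorithm has adjusted the weights against known examples.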

        • Convolutional neural network (CNN, or ConvNet): a class of deep, feedforward ANN, most commonly applied to analyzing visual imagery.
CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. This is an advantage for image processing, since CNNs can learn the filters that were hand-engineered in traditional algorithms.
          • Convolution: mathematical operation on two functions (f and g) that produces a third function, typically viewed as a modified version of one of the original functions. It gives the integral of the pointwise multiplication of the two functions as a function of the amount by which one of the original functions is translated.

          • Design of a CNN (layers):

            • Convolutional layers: apply a convolution operation to the input (emulates neuron response to visual stimuli) passing the result to the next layer.

              • Parameters: a set of learnable filters (kernels) with a small receptive field, but which extend through the full depth of the input volume.
              • How they work: during the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a 2D activation map of that filter. As a result, the network learns filters that activate when they detect some specific type of feature at some spatial position in the input.

            • Pooling layers (optional): combine the outputs of neuron clusters at one layer into a single neuron in the next layer
              • MaxPooling: uses the maximum value from each cluster of neurons at the prior layer
              • MinPooling: uses the minimum value from each cluster of neurons at the prior layer
              • AveragePooling: uses the average value from each cluster of neurons at the prior layer

            • Fully Connected layers:  connect every neuron in one layer to every neuron in another layer (as in the traditional multi-layer perceptron neural network)

            • Weights: CNNs share weights in convolutional layers, which means that the same filter is used for each receptive field in the layer; this reduces memory footprint and improves performance.
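The convolutional and pooling layers described above can be illustrated on a tiny single-channel image. In this sketch the image and the filter values are hypothetical, the filter is fixed by hand rather than learned, and, as in most deep learning libraries, the filter is slid over the input without being flipped (strictly speaking a cross-correlation).

```python
def conv2d(image, kernel):
    """Slide the filter over the image and take a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size cluster."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to a dark-to-bright vertical edge

activation_map = conv2d(image, kernel)  # 4x4, every row is [0, 2, 0, 0]
pooled = max_pool2d(activation_map)     # [[2, 0], [2, 0]]
print(activation_map)
print(pooled)
```

The same four filter values are reused at every position of the image, which is the weight sharing mentioned in the last bullet; the activation map is non-zero only where the edge is located.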

 
 

 
  • Kaggle: platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective. Ref
 
 
