Glossary of terms

Image Source

Machine learning (ML): Subfield of Artificial Intelligence (AI) in the field of computer science focused on the development of intelligent systems, capable of "learning" (i.e., progressively improve performance on a specific task) with data, without being explicitly programmed.Ref

Chronology: Build with GoogleCloud - Chronology tree for ML
Perceptron: algorithm for supervised learning of binary classifiers (functions that can decide whether an input, represented by a vector of numbers, belongs to some specific class or not → classification in two parts). “It is a single layer NN”, thus in order to learn how a NN works, we should learn how a perceptron works.

Weights: show the strength of the particular node
Bias value: allows to shift the activation function
Activation function: map input between required values like (0, 1) or (-1, 1).

Fig : Perceptron

Deep learning: (also known as deep structured learning or hierarchical learning) type of ML method based on learning data representations, as opposed to task-specific algorithms. They use ANN which are slightly inspired by the neuronal structure in the human brain. Learning can be:

Supervised: used on problems where the goal is to learn a mapping from inputs to outputs. “Like a teacher supervising a student”: the model continually makes predictions, the predictions are compared to the expected outcomes, error is calculated, and the model is corrected using these errors. Examples:
- Problems: classifying (mapping input variables to a label) and regression (mapping input variables to a quantity)
- Algorithms: k-nearest neighbours, support vector machines, multilayer perceptron NN.

Semi-supervised Ref
Unsupervised: used on a problem with only inputs, and the goal is to learn or capture the inherent interesting structure in the data. There is no teacher, instead the models are updated based on repeated exposure to examples from the problem domain. Examples:
- Problems: clustering (learning of the groups in the data) and association of the learning of relationships in the data.
- Algorithms: k-means, apriori, self-organizing map NN

Source: (Un)Supervised Learning

Credits: Yam Peleg (@Yampeleg)

Central processing unit (CPU) is the electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions. The computer industry has used the term "central processing unit" at least since the early 1960s.[1] Traditionally, the term "CPU" refers to a processor, more specifically to its processing unit and control unit (CU), distinguishing these core elements of a computer from external components such as main memory and I/O circuitry Ref
Graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms where the processing of large blocks of data is done in parallel. In a personal computer, a GPU can be present on a video card, or it can be embedded on the motherboard or—in certain CPUs—on the CPU die Ref

Kernel: In image processing, a kernel, convolution matrix, or mask is a small matrix. It is used for blurring, sharpening, embossing, edge detection, and more. This is accomplished by doing a convolution between a kernel and an image Ref

Epoch: occurs when an entire dataset is passed forward and backward through the NN only once. However, more than one epoch is needed to properly update the weights and optimise the learning. The number of epochs chosen will determine the fitting of the model, which we understand as the result of running a learning algorithm on a dataset.

Underfitting: if not enough epochs the model does not capture enough of the structure in the data sample, thus it will perform poor on the training and the test datasets. More fit or a better fit is required.
Optimal fitting: occurs when the number of times the weight are changed in the NN (i.e. number of epochs) is the proper. The model finds a suitable balance of capturing the structure in the dataset and generalizing to new data. It performs well on the training and test datasets. There is no ‘right’ number of epochs, so it has to be determined experimentally, according to the diversity of the data.
Overfitting: the model fits the random noise in the data sample, thus the model may perform well on the training data, but it does not generalize to new data and performs well on test data.

Source

Batch: smaller divisions of one epoch, since this is too big to feed to the computer at once.

Batch size: total number of training examples in a single size
Number of batches: number of divisions/sets/parts made from one epoch, thus the number of iterations for one epoch

Iteration: number of batches needed to complete one epoch
Example: if we have 4500 training examples and we divide such dataset into batches of 500 (batch size), then it will take 9 iterations to complete 1 epoch

Source: Epoch vs Iterations vs Batch Size

Loss function:

Types of loss:

loss=‘mean_squared_error’, optimizer=‘sgd’

mean_absolute_error

mean_absolute_percentage_Error

mean_squared_logarithmic_error

squared_hinge

hinge

categorical_hinge

logcosh

categorical_crossentropy

sparse_categorical_crossentropy

binary_crossentropy

kullback_leiber_divergence

poisson

cosine_proximity

Optimizer:

Adaptive learning rates: Adam, AdaGrad, Adadelta, RMSProp, SGD, ...

Learning Rate: We can establish the rate at which we want our program to learn.
Activation functions (AF):

Definition AF: node attached to the output end of any NN or in between two NN; in order to determine the output of NN (yes/no). It maps the resulting values in between 0 to 1 or -1 to 1 etc. (depending upon the function). url

The AF can be:

Linear (Identity AF): the output is not be confined between any range. It doesn’t help with the complexity or various parameters of usual data that is fed to the NN.

Equation : f(x) = x

Range : (-infinity to infinity)

Non-linear: the most used. It makes it easy for the model to generalize or adapt with variety of data and to differentiate between the output. The main terminologies needed to understand for nonlinear functions are:
- Derivative or Differential (slope): Change in y-axis w.r.t. change in x-axis.
- Relevance: When updating the curve, in order to know in which direction and how much to change or update the curve depending upon the slope.
- Monotonic function: it is either entirely non-increasing or non-decreasing.

The Nonlinear Activation Functions are mainly divided on the basis of their range or curves:

Function	Shape	Range	Diff? Monot? and its der?	Specially used to	Others
Sigmoid or Logistic Activation	Sigmoidal (s)	(0, 1)	Yes, yes, no	Predict the probability	Softmax function is more generalized, used for multiclass classification.
Tanh or hyperbolic tangent	Sigmoidal (s)	(-1, 1)	Yes, yes, no	Classification of 2 classes	Negative inputs mapped strongly negative, zero inputs mapped near zero.
ReLU	Half rectified (from bottom)	[0, inf)	Yes, yes, yes	For linear regression models	Negative input not appropriately mapped (value turns zero)
Leaky ReLU (a = 0.01) and Randomized ReLU (if a ~= 0.01)		(-inf, inf)	Yes, yes, yes	For linear regression models	Same as ReLU but the negative part maps to a function with positive slope a instead of to zero

Sigmoid:

Tanh:

Sigmoid vs ReLU:

ReLU vs Leaky ReLU:

Neural Network (NN): “it is a multi-layer perceptron”

Ref

Artificial neural networks (ANNs) or connectionist systems are computing systems inspired by the biological neural networks of animal brains, able of learning to do tasks by considering examples. They are based in units called artificial neurons, which are connected (synapse) between each other, in order to transmit signal. Ref

Neurons have a state generally represented between 0 and 1.
Neurons and synapses may have a weight that varies as learning proceeds, which can increase or decrease the strength of the signal that it sends downstream.
Neurons are typically organized in layers, which may perform different input transformations.

Classes of ANNs:

Recurrent: connections between nodes form a directed graph along a sequence, which allows it to exhibit dynamic temporal behavior for a time sequence. They can use their internal state (memory) to process sequences of inputs, thus can be applied for unsegmented, connected handwriting recognition or speech recognition.

Feedforward: connections between the nodes do not form a cycle, so the information moves in only one direction, forward, from the input nodes to the output nodes.

Deep (DNN): it contains multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear or non-linear relationship. The network moves through the layers calculating the probability of each output.

DNN architectures generate compositional models where the object is expressed as a layered composition of primitives. The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network. Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures, unless they have been evaluated on the same data sets.

DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back. At first, the DNN creates a map of virtual neurons and assigns random numerical values, or "weights", to connections between them. The weights and inputs are multiplied and return an output between 0 and 1. If the network didn’t accurately recognize a particular pattern, an algorithm would adjust the weights That way the algorithm can make certain parameters more influential, until it determines the correct mathematical manipulation to fully process the data.

Convolutional neural network (CNN, or ConvNet) is a class of deep, feedforward ANN, most commonly applied to analyzing visual imagery.

CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing; advantage for image processing since CNNs can learn the filters that in traditional algorithms were hand-engineered.

Convolution: mathematical operation on two functions (f and g) to produce a third function, typically viewed as a modified version of one of the original functions, giving the integral of the pointwise multiplication of the two functions as a function of the amount that one of the original functions is translated

Design of a CNN (layers):

Convolutional layers: apply a convolution operation to the input (emulates neuron response to visual stimuli) passing the result to the next layer.

Parameters: set of learnable filters (kernels) with small receptive field, but which sted through the full depth of the input volume.
How they work: During the forward pass, each filter is convolved across the width and height of the input volume, computing the dot product between the entries of the filter and the input and producing a 2D activation map of that filter. As a result, the network learns filters that activate when it detects some specific type of feature at some spatial position in the input.

Pooling layers (optional): combine the outputs of neuron clusters at one layer into a single neuron in the next layer

MaxPooling: uses the maximum value from each of a cluster of neurons at the prior layer
MinPooling:
AveragePooling: uses the average value from each of a cluster of neurons at the prior layer

Fully Connected layers: connect every neuron in one layer to every neuron in another layer (as in the traditional multi-layer perceptron neural network)

Weights: CNNs share weights in convolutional layers, which means that the same filter is used for each receptive field in the layer; this reduces memory footprint and improves performance.

Python: interpreted high-level programming language for general-purpose programming, with a design philosophy that emphasizes code readability,using significant whitespace. It provides constructs that enable clear programming on both small and large scales, and has a large and comprehensive standard library. Ref

GitHub Inc.: web-based hosting service for version control using Git. It is mostly used for computer code. It offers all of the distributed version control and source code management (SCM) functionality of Git as well as adding its own features. It provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project.It is the largest host of source code in the world Ref

Keras: open source neural network library written in Python, capable of running on top of TensorFlow, or Theano. Designed to enable fast experimentation with DNN, it focuses on being user-friendly, modular, and extensible. Ref

TensorFlow: open-source software library for dataflow programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks. Ref

Kaggle: platform for predictive modelling and analytics competitions in which statisticians and data miners compete to produce the best models for predicting and describing the datasets uploaded by companies and users. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know beforehand which technique or analyst will be most effective. Ref

Scikit-learn (formerly scikits.learn) is a free software machine learning library for the Python programming language.[3] It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy Ref

Application programming interface (API) is a set of subroutine definitions, protocols, and tools for building software. In general terms, it is a set of clearly defined methods of communication between various components. A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer. An API may be for a web-based system, operating system, database system, computer hardware, or software library. An API specification can take many forms, but often includes specifications for routines, data structures, object classes, variables, or remote calls. POSIX, Windows API and ASPI are examples of different forms of APIs. Documentation for the API is usually provided to facilitate usage and implementation.

Diving into Deep Learning

Search This Blog

Glossary of terms

Comments

Post a Comment

Popular posts from this blog

4. Understand how a Neural Network works

3. History of Deep Learning, from Biology to Programming