NN4N: Neural Network for Numerics
Android App
Keywords
1. Neuron
A neuron is a mathematical model that mimics the function of a nerve cell.
2. Neural Network
A neural network is a mathematical model that mimics brain function.
For this app, a neural network consists of an input layer, an output layer, and up to 3 hidden layers. Neurons in a layer are fully connected to all neurons in the previous layer.
The larger the number of neurons, the more freedom the prediction has, but the harder the training convergence.
3. Parameters & Neuron I/O
Every neuron except those in the input layer has a bias and weights as parameters.
weight (w): Connection strength with each neuron in the previous layer.
bias (b): Ignition sensitivity of the neuron.
x = Σ(w*y/n) + b: Input value to a neuron.
y: Output value from each neuron in the previous layer.
n: Number of neurons in the previous layer.
y = act(x): Output value from a neuron. act(x) is called the activation function.
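As a concrete illustration, here is a minimal Python sketch of a single neuron's input/output calculation following the formulas above (the function name is illustrative; the app's internal implementation is not exposed):

    import math

    def neuron_output(prev_outputs, weights, bias):
        # x = sum(w*y/n) + b, with n the number of neurons in the
        # previous layer (Section 3).
        n = len(prev_outputs)
        x = sum(w * y / n for w, y in zip(weights, prev_outputs)) + bias
        # y = act(x); hidden layers use act(x) = 2*tanh(x) (Section 4).
        return 2.0 * math.tanh(x)

    # Example: a hidden-layer neuron fed by 3 previous-layer outputs.
    print(neuron_output([0.5, -1.0, 2.0], [0.2, 0.4, -0.1], 0.05))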
4. Activation Function
Input Layer: y = x/absmax
Hidden Layer: y = 2*tanh(x)
Output Layer 'Regression': y = 2*softsign(x)*absmax
Output Layer 'Classification': y = sigmoid(x)
softsign(x) = x/(1+abs(x))
sigmoid(x) = 1/(1+exp(-x))
absmax: Maximum absolute value of the training data.
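For reference, the activation functions above can be written as a short Python sketch (a plain transcription of the formulas, not the app's code):

    import math

    def input_act(x, absmax):
        return x / absmax                      # Input layer

    def hidden_act(x):
        return 2.0 * math.tanh(x)              # Hidden layers

    def softsign(x):
        return x / (1.0 + abs(x))

    def regression_act(x, absmax):
        return 2.0 * softsign(x) * absmax      # Output layer, 'Regression'

    def classification_act(x):
        return 1.0 / (1.0 + math.exp(-x))      # Output layer, 'Classification'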
5. Regression & Classification
For this app, both 'Regression' and 'Classification' are available. If only one training data item is selected as the output training data and its values are only the integers 0 or 1, it is recognized as 'Classification'. Otherwise, it is recognized as 'Regression'.
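The recognition rule can be sketched in Python as follows (the function name and data layout are assumptions for illustration):

    def is_classification(output_items):
        # 'Classification' requires exactly one output training data item
        # whose values are only the integers 0 or 1.
        if len(output_items) != 1:
            return False
        return all(v in (0.0, 1.0) for v in output_items[0])

    print(is_classification([[0.0, 1.0, 1.0, 0.0]]))  # True  -> 'Classification'
    print(is_classification([[0.3, 1.0, 2.5]]))       # False -> 'Regression'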
6. Supervised Learning
For this app, the machine learning type is supervised learning. Supervised learning requires training data to train a neural network.
7. Training Data
In order to train a neural network, it is first necessary to prepare training data.
Training data should be provided in a MECE manner. MECE is an acronym for 'Mutually Exclusive, Collectively Exhaustive'. For example, if the output value is affected by temperature, temperature data should be provided as an input training data item. Also, the input training data should cover its max & min values, because extrapolation accuracy may be low outside that range. In addition, the input training data should have a fine-grained distribution to prevent overfitting. Overfitting causes a decrease in prediction accuracy.
The larger the number of training data, the more accurate the prediction, but the longer the training time.
8. Training Data File
The file format is CSV, compliant with RFC 4180. Additionally, unescaped strings after the trailing double quote are concatenated. Training data files can be created using a spreadsheet. They can also be edited using a text editor, but double quotes require careful handling.
The decimal marker must follow British practice (a period, not a comma).
Line breaks within a field are not allowed. Blank lines are ignored.
The first row contains the training data item names, and each name must be unique. If a training data item name is blank, that item is invalid. The maximum length of a training data item name is 96 characters.
Training data consists of input training data and output training data. At least 2 training data items are required for the input training data.
Training data values must be convertible to a single-precision floating point data type. No training data item may have zero sigma (a constant column). Exact duplicate training data rows are ignored.
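A minimal Python sketch of these file checks (simplified: quoting/escaping is delegated to the csv module, error handling is omitted, and Python's float is double precision rather than the single precision the app requires):

    import csv
    import statistics

    def load_training_csv(path):
        with open(path, newline="") as f:
            # Blank lines are ignored.
            rows = [r for r in csv.reader(f) if any(c.strip() for c in r)]
        names = rows[0]
        assert len(set(names)) == len(names), "item names must be unique"
        assert all(0 < len(n) <= 96 for n in names), "blank or overlong name"
        data, seen = [], set()
        for row in rows[1:]:
            values = tuple(float(v) for v in row)  # must convert to float
            if values not in seen:                 # exact duplicates ignored
                seen.add(values)
                data.append(values)
        for column in zip(*data):
            assert statistics.pstdev(column) > 0, "zero-sigma column"
        return names, data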
9. Training
In supervised learning, the neural network is trained to minimize deviations between the output training data and the neural network output. For this app, several optimization methods & overfitting remedies are available.
10. Optimization Methods & Overfitting Remedies
10.1 Gradient Descent: An optimization method for neural networks that uses parameter gradients to minimize the cost.
10.2 Cost Function: The cost is calculated from the deviation between the training data and the neural network output. 'Mean Squared Error' is used as the cost function for both 'Regression' and 'Classification'.
10.3 Backpropagation: Parameter gradients are calculated by backpropagating deviations through the neural network, from the output layer towards the input layer.
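As a toy illustration of 10.1-10.3, the sketch below computes the 'Mean Squared Error' cost and the deviation term that backpropagation starts from at the output layer (not the app's implementation):

    def mse_cost(targets, outputs):
        # Cost = mean of squared deviations (Section 10.2).
        return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

    def mse_output_grad(targets, outputs):
        # d(cost)/d(output): the starting point of backpropagation (10.3).
        return [2.0 * (o - t) / len(targets) for t, o in zip(targets, outputs)]

    print(mse_cost([1.0, 0.0], [0.8, 0.2]))         # approx. 0.04
    print(mse_output_grad([1.0, 0.0], [0.8, 0.2]))  # approx. [-0.2, 0.2]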
10.4 Full-batch Gradient Descent: Processes the entire training data set as a single batch.
10.5 Mini-batch Gradient Descent: Randomly groups the training data into batches and iterates the batch processes.
10.6 Batch Process: A unit of training that uses all or part of the training data.
10.7 Epoch: Number of training passes, each of which uses all of the training data.
10.8 Batch Size: Number of training data per batch process. The minimum batch size is 32.
10.9 Iteration: Number of batch processes per epoch.
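The relationship between epoch, batch size, and iteration (10.7-10.9) can be sketched as a mini-batch loop (illustrative only):

    import random

    def minibatches(n_samples, batch_size):
        # One epoch: every sample appears exactly once; iterations per
        # epoch = ceil(n_samples / batch_size) (Section 10.9).
        indices = list(range(n_samples))
        random.shuffle(indices)            # random grouping (Section 10.5)
        for start in range(0, n_samples, batch_size):
            yield indices[start:start + batch_size]

    # 100 samples with batch size 32 -> 4 iterations per epoch.
    print(sum(1 for batch in minibatches(100, 32)))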
10.10 Batch Normalization: Normalizes the training data for each batch process. The exception is the 'Classification' output value, which does not need to be normalized.
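A generic per-batch standardization sketch (how the app normalizes internally is not exposed; this illustrates the idea, with any 'Classification' output column simply left untouched):

    import statistics

    def normalize_column(column):
        # Zero mean, unit variance within one batch process; sigma > 0
        # is guaranteed by the zero-sigma rule in Section 8.
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column)
        return [(v - mu) / sigma for v in column]

    print(normalize_column([1.0, 2.0, 3.0, 4.0]))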
10.11 Parameter Initialization: Initial weights are set with a shuffled Normal or Uniform distribution.
Hidden Layer #1: U(-10^-n, +10^-n)
Hidden Layer #2: N(0, (2/n)^0.5)
Hidden Layer #3: N(0, (2/n)^0.5)
Output Layer: N(0, (2/n)^0.5)
n: Number of neurons in the previous layer.
Initial bias is set to zero.
Fixed initial parameters can also be assigned by changing the preference.
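A Python sketch of the scheme above, assuming the (2/n)^0.5 reading of the Normal distribution's standard deviation (He-style initialization); the shuffling of drawn values is omitted:

    import random

    def init_layer(n_prev, n_neurons, layer):
        # layer: 'hidden1', 'hidden2', 'hidden3', or 'output'.
        if layer == "hidden1":
            bound = 10.0 ** -n_prev                  # U(-10^-n, +10^-n)
            draw = lambda: random.uniform(-bound, bound)
        else:
            sigma = (2.0 / n_prev) ** 0.5            # N(0, (2/n)^0.5)
            draw = lambda: random.gauss(0.0, sigma)
        weights = [[draw() for _ in range(n_prev)] for _ in range(n_neurons)]
        biases = [0.0] * n_neurons                   # initial bias is zero
        return weights, biases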
10.12 Hyperparameters: Adjust the training efficiency. Some of the optimization methods & overfitting remedies require hyperparameter settings. Hyperparameter settings can be modified by interrupting training.
10.13 Step Size: Scales the magnitude of the gradient step. The larger the step size, the faster the training, but the harder the training convergence.
10.14 Dropout: Ensembles the training by randomly dropping hidden-layer neurons.
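A common way to realize dropout is an 'inverted dropout' mask, sketched below (the keep-rate rescaling is an assumed detail, not confirmed by the app):

    import random

    def dropout(outputs, drop_rate):
        # Randomly zero hidden-layer outputs during training; survivors
        # are rescaled so the expected output is unchanged.
        keep = 1.0 - drop_rate
        return [y / keep if random.random() < keep else 0.0 for y in outputs]

    print(dropout([1.0] * 8, drop_rate=0.5))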
10.15 Momentum: Applies a moving average to the gradient vector.
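Momentum as a moving average of the gradient vector, in sketch form (the smoothing factor name `beta` and its default value are assumptions):

    def momentum_step(velocity, gradient, step_size, beta=0.9):
        # Exponential moving average of the gradient (Section 10.15).
        velocity = [beta * v + (1.0 - beta) * g
                    for v, g in zip(velocity, gradient)]
        update = [-step_size * v for v in velocity]  # parameter change
        return velocity, update

    v, update = momentum_step([0.0, 0.0], [0.5, -1.0], step_size=0.1)
    print(v, update)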
10.16 Weight Regularization: Penalizes the weights to prevent overfitting.
10.16.1 L1 Regularization: Performs dimensional reduction of the weights (drives some weights to exactly zero).
10.16.2 L2 Regularization: Causes weight decay (shrinks the weights toward zero).
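The two penalties add the following terms to the weight gradients, sketched here (the coefficient names `l1` and `l2` are assumed hyperparameter names):

    def regularization_grad(weights, l1=0.0, l2=0.0):
        # L1 adds l1*sign(w): pushes some weights to exactly zero
        # (dimensional reduction). L2 adds 2*l2*w: weight decay.
        sign = lambda w: (w > 0) - (w < 0)
        return [l1 * sign(w) + 2.0 * l2 * w for w in weights]

    print(regularization_grad([0.5, -0.3, 0.0], l1=0.01, l2=0.001))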
11. Prediction
Numerical predictions can be made using a trained neural network. Prediction results may contain errors, so verifying the prediction accuracy is strongly recommended.
12. Neural Network File
The neural network file contains the training results and can be exported after training. Editing is not recommended, but secondary use is possible by pasting the contents into Excel in R1C1 reference style.
13. Training History
Training history can be appended to the neural network file as required. The training history contains the epoch, cost, and hyperparameters.