Check https://developers.google.com/machine-learning/glossary

### iterations, batch, batch size and epoch

- A **batch** is the set of examples used in one iteration; the number of examples in that set is the **batch size**.
- For example, the batch size of SGD is 1, while the batch size of a mini-batch is usually between 10 and 1000. Batch size is usually fixed during training and inference; however, TensorFlow does permit dynamic batch sizes.
- Each **iteration** is the span in which the system processes one **batch** of **batch size** examples.
- An **epoch** spans enough iterations to process every example in the dataset, i.e. an **epoch** represents $\frac{N}{\text{batchSize}}$ training iterations, where $N$ is the total number of examples.
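
As a rough illustration of how these terms relate, the sketch below counts the iterations in one epoch and slices a toy dataset into batches. The function names are hypothetical, and rounding up when $N$ is not a multiple of the batch size is an assumption (some pipelines drop the final partial batch instead).

```python
import math


def iterations_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of iterations needed to see every example once.

    Assumption: a final, smaller batch is kept, hence the ceiling.
    """
    return math.ceil(num_examples / batch_size)


def iter_batches(examples, batch_size):
    """Yield consecutive batches; each yielded batch is one iteration."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


dataset = list(range(10))                 # N = 10 toy examples
batches = list(iter_batches(dataset, 4))  # one epoch with batch size 4
print(len(batches), iterations_per_epoch(len(dataset), 4))
```

With a batch size of 1 (plain SGD) the same helper reports one iteration per example.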

### k-fold cross validation

From https://www.analyticsvidhya.com/blog/2018/05/improve-model-performance-cross-validation-in-python-r/

- Randomly split your entire dataset into k “folds”
- For each fold, build your model on the other k – 1 folds of the dataset. Then test the model on the held-out fold to check its effectiveness
- Record the error you see on the held-out fold's predictions
- Repeat this until each of the k folds has served as the test set
- The average of your k recorded errors is called the cross-validation error and will serve as your performance metric for the model
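
The procedure above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names, the toy data, and the "model" (which just predicts the mean training target) are all assumptions made for the example. In practice a library routine such as scikit-learn's `KFold` would normally do the splitting.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_error(xs, ys, k, fit, error, seed=0):
    """Average the test error over k train/test splits."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for held_out, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [j for m, fold in enumerate(folds) if m != held_out for j in fold]
        model = fit([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        # Record the error on the held-out fold.
        errors.append(error(model, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
    return sum(errors) / k


# Toy stand-ins (assumptions): the "model" is just the mean training target.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mean_squared_error(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)


xs = list(range(20))
ys = [2.0 * x + 1.0 for x in xs]
cv_error = cross_validation_error(xs, ys, k=5, fit=fit_mean, error=mean_squared_error)
print(f"cross-validation error: {cv_error:.2f}")
```

Passing `fit` and `error` as arguments keeps the splitting logic independent of any particular model or metric.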

### feature extraction

Merge several correlated features into a single one. See also dimensionality reduction.
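
One possible way to merge two correlated features is to project them onto their direction of greatest variance (the first principal component). The sketch below does this for exactly two features using the closed-form angle for a 2x2 covariance matrix. The function name and the car data (age and mileage merged into a "wear" score) are hypothetical, and PCA is only one of several techniques that fit this definition.

```python
import math


def merge_two_features(a, b):
    """Project two features onto their first principal component.

    Returns one combined, mean-centred value per example.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    var_a = sum(v * v for v in da) / n
    var_b = sum(v * v for v in db) / n
    cov_ab = sum(p * q for p, q in zip(da, db)) / n
    # Angle of the direction of maximum variance in the (a, b) plane.
    theta = 0.5 * math.atan2(2.0 * cov_ab, var_a - var_b)
    return [p * math.cos(theta) + q * math.sin(theta) for p, q in zip(da, db)]


# Hypothetical data: car age in years and mileage in units of 10,000 km.
age = [1.0, 2.0, 3.0, 4.0, 5.0]
mileage = [1.2, 2.1, 2.9, 4.2, 5.1]
wear = merge_two_features(age, mileage)
print([round(w, 2) for w in wear])
```

Features measured on very different scales would usually be standardized first, otherwise the larger-scale feature dominates the merged value.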

### sampling noise/bias

- **Sampling noise**: nonrepresentative sample data as a result of chance (typically when the sample is too small).
- **Sampling bias**: nonrepresentative sample data as a result of a flaw in the sampling method.
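
The difference between the two can be seen in a small simulation. This is a sketch under assumed conditions: the population is simply the integers 0 to 999, and the "flawed" sampling method is one that can only reach the upper half of the population.

```python
import random

rng = random.Random(0)
population = list(range(1000))            # true mean is 499.5
true_mean = sum(population) / len(population)


def sample_mean(pool, size):
    """Mean of a simple random sample drawn without replacement."""
    return sum(rng.sample(pool, size)) / size


# Sampling noise: the method is fair, but small samples scatter widely.
small = [sample_mean(population, 5) for _ in range(200)]
large = [sample_mean(population, 500) for _ in range(200)]
print("spread of means, n=5:  ", max(small) - min(small))
print("spread of means, n=500:", max(large) - min(large))

# Sampling bias: the method is flawed (only the upper half is reachable),
# so even a large sample misses the true mean.
reachable = [v for v in population if v >= 500]
biased = sample_mean(reachable, 400)
print("true mean:", true_mean, "biased estimate:", biased)
```

Increasing the sample size shrinks the noise, but it does nothing for the bias: the biased estimate stays far from the true mean however many examples are drawn.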