RegERMs.jl

Regularized empirical risk minimization (RegERM) is a general concept that defines a family of optimization problems in machine learning, such as Support Vector Machines, Logistic Regression, and Ridge Regression.


API

The following main concepts are implemented:

Loss functions

Loss functions measure the disagreement between the true label \(y\in\{-1,1\}\) and the prediction.

Loss functions implement the following main methods:

value(l::Loss)

Compute the value of the loss.

gradient(l::Loss)

Compute the gradient of the loss.

The following loss functions are implemented:

Logistic(w::Vector, X::Matrix, y::Vector)

Return a vector of the logistic loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)

\[\begin{split}\ell({\bf w}, {\bf x}, y)&=\log(1+\exp(-y{\bf x}^T{\bf w})),\end{split}\]

where \({\bf w}\) is the weight vector of the decision function.

Note

The logistic loss corresponds to a likelihood function under the assumption that the class-conditional distributions \(p({\bf x}|y;{\bf w})\) belong to an exponential family.
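A minimal usage sketch with assumed toy data; following the description above, Logistic(w, X, y) yields one loss value per training instance, which can be checked against the formula directly:

    X = [1.0 2.0; -1.0 0.5; 0.3 -2.0]   # 3 examples (rows), 2 features
    y = [1.0, -1.0, 1.0]                # labels in {-1, 1}
    w = [0.1, -0.2]                     # weight vector of the decision function

    losses = Logistic(w, X, y)          # per-instance logistic losses, as described above

    # the same values computed directly from the formula
    losses_direct = log.(1 .+ exp.(-y .* (X * w)))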

Squared(w::Vector, X::Matrix, y::Vector)

Return a vector of the squared loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)

\[\begin{split}\ell({\bf w}, {\bf x}, y)&=(y-{\bf x}^T{\bf w})^2,\end{split}\]

where \({\bf w}\) is the weight vector of the decision function.

Hinge(w::Vector, X::Matrix, y::Vector)

Return a vector of the hinge loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)

\[\begin{split}\ell({\bf w}, {\bf x}, y)&=\max(0, 1-y{\bf x}^T{\bf w}),\end{split}\]

where \({\bf w}\) is the weight vector of the decision function.

Note

The hinge loss corresponds to a max-margin assumption.
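For comparison, both the squared and the hinge loss can be evaluated directly from the formulas above; the following is plain Julia on assumed toy data, not part of the package API:

    X = [1.0 2.0; -1.0 0.5]     # assumed toy data: 2 examples, 2 features
    y = [1.0, -1.0]
    w = [0.1, -0.2]
    margins = y .* (X * w)      # y xᵀw per instance

    squared = (y .- X * w) .^ 2          # squared loss per instance
    hinge   = max.(0, 1 .- margins)      # hinge loss per instance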

Regularizer

Regularization prevents overfitting and introduces additional information (prior knowledge) to solve an ill-posed problem.

Regularizers implement the following main methods:

value(r::Regularizer)

Compute the value of the regularizer.

gradient(r::Regularizer)

Compute the gradient of the regularizer.

The following regularizers are implemented:

L2reg(w::Vector, λ::Float64)

Implements an \(L^2\)-norm regularization of the weight vector \({\bf w}\) of the decision function:

\[\begin{split}\Omega({\bf w})&=\frac{1}{2\lambda}\|{\bf w}\|^2,\end{split}\]

where the regularization parameter \(\lambda\) controls the influence of the regularizer.

Note

The \(L^2\)-norm regularization corresponds to a Gaussian prior assumption \({\bf w}\sim\mathcal{N}({\bf 0},\lambda{\bf I})\) on the weight vector.
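A short sketch with assumed toy values; value and gradient follow the formula above:

    w = [0.1, -0.2, 0.3]
    λ = 0.5

    reg = L2reg(w, λ)       # construct the regularizer
    value(reg)              # regularizer value, ‖w‖² / (2λ) per the formula above
    gradient(reg)           # its gradient with respect to w, i.e. w / λ

    # direct computation of the formula for comparison
    sum(abs2, w) / (2λ)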

Machine learning methods

The framework implements the following learning algorithms:

Ridge Regression

Ridge regression models the relationship between an input variable \({\bf x}\) and a continuous output variable \(y\) by fitting a linear function.

LinReg(X::Matrix, y::Vector; kernel::Symbol=:linear)

Initialize a ridge regression object with a data matrix \({\bf X} \in \mathbb{R}^{n\times m}\), a real-valued label vector \({\bf y} \in \mathbb{R}^{n}\) of \(n\) \(m\)-dimensional examples, and a kernel function.

Implements: optimize
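A minimal ridge-regression sketch on assumed synthetic data; the choice of λ = 0.1 is arbitrary:

    # assumed synthetic regression data: 100 examples, 3 features
    X = randn(100, 3)
    y = X * [1.0, -2.0, 0.5] + 0.1 * randn(100)

    ridge = LinReg(X, y)              # linear kernel by default
    model = optimize(ridge, 0.1)      # regularization parameter λ = 0.1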

Logistic Regression

Logistic regression models the relationship between an input variable \({\bf x}\) and a binary output variable \(y\) by fitting a logistic function.

LogReg(X::Matrix, y::Vector; kernel::Symbol=:linear)

Initialize a logistic regression object with a data matrix \({\bf X} \in \mathbb{R}^{n\times m}\), a binary label vector \({\bf y} \in \mathbb{R}^{n}\) of \(n\) \(m\)-dimensional examples, and a kernel function.

Implements: optimize

Support Vector Machine

Support vector machines model the relationship between an input variable \({\bf x}\) and a binary output variable \(y\) by finding a hyperplane that separates examples belonging to different classes with maximal margin.

SVM(X::Matrix, y::Vector; kernel::Symbol=:linear)

Initialize an SVM object with a data matrix \({\bf X} \in \mathbb{R}^{n\times m}\), a binary label vector \({\bf y} \in \mathbb{R}^{n}\) of \(n\) \(m\)-dimensional examples, and a kernel function.

Implements: optimize
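A classification sketch on assumed synthetic data with labels in {-1, 1}; LogReg follows the same pattern as SVM:

    # assumed synthetic binary classification data
    X = vcat(randn(50, 2) .+ 2.0, randn(50, 2) .- 2.0)
    y = vcat(ones(50), -ones(50))

    svm   = SVM(X, y; kernel=:linear)
    model = optimize(svm, 1.0)        # λ = 1.0, default L-BFGS optimizer

    # logistic regression follows the same interface
    logreg = LogReg(X, y)
    model2 = optimize(logreg, 1.0)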

Implement: optimize

Let \({\bf x}_i\) be a vector of features describing an instance \(i\) and \(y_i\) be its target value. Then, for a given set of \(n\) training instances \(\{({\bf x}_i,y_i)\}_{i=1}^n\), the goal is to find a model \({\bf w}\) that minimizes the regularized empirical risk:
\[\sum_{i=1}^n \ell({\bf w}, {\bf x}_i, y_i) + \Omega({\bf w}).\]

The loss function \(\ell\) measures the disagreement between the true label \(y\) and the model prediction, and the regularizer \(\Omega\) penalizes the model's complexity.
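Using the types documented above, the objective can be assembled directly; the toy values below are assumptions for illustration only:

    w = [0.1, -0.2]
    X = [1.0 2.0; -1.0 0.5; 0.3 -2.0]
    y = [1.0, -1.0, 1.0]
    λ = 0.5

    # Σᵢ ℓ(w, xᵢ, yᵢ) + Ω(w) with the logistic loss and L² regularization
    risk = sum(Logistic(w, X, y)) + value(L2reg(w, λ))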

optimize(method::RegERM, λ::Float64, optimizer::Symbol=:l_bfgs)

Perform the optimization of method for a given regularization parameter λ and return a prediction model that can be used for classification. Stochastic gradient descent (:sgd) and limited-memory BFGS (:l_bfgs) are valid optimizers.
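The optimizer can be selected via the third argument; the sketch below reuses the svm object from the classification example above:

    model_lbfgs = optimize(svm, 0.1, :l_bfgs)   # limited-memory BFGS (the default)
    model_sgd   = optimize(svm, 0.1, :sgd)      # stochastic gradient descent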
