RegERMs.jl¶
Regularized empirical risk minimization (RegERM) is a general concept that defines a family of optimization problems in machine learning, including Support Vector Machines, Logistic Regression, and Ridge Regression.
Contents:
API¶
The following main concepts are implemented:
Loss functions¶
Loss functions measure the disagreement between the true label \(y\in\{-1,1\}\) and the prediction.
Loss functions implement the following main methods:
- value(l::Loss)¶
Compute the value of the loss.
- gradient(l::Loss)¶
Compute the gradient of the loss.
The following loss functions are implemented:
- Logistic(w::Vector, X::Matrix, y::Vector)¶
Return a vector of the logistic loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)
\[\begin{split}\ell({\bf w}, {\bf x}, y)&=\log(1+\exp(-y{\bf x}^T{\bf w})),\end{split}\]where \({\bf w}\) is the weight vector of the decision function.
Note
The logistic loss corresponds to a likelihood function under an exponential family assumption of the class-conditional distributions \(p({\bf x}|y;{\bf w})\).
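For illustration, here is a minimal standalone sketch of this computation in Julia (the function names are hypothetical and not part of the package's API); it evaluates the per-instance loss values and the gradient of the summed loss for all instances at once:

    # Hypothetical standalone sketch; X is n×m, y ∈ {-1,1}^n, w ∈ R^m.
    logistic_values(w, X, y) = log.(1 .+ exp.(-y .* (X * w)))   # loss per instance

    # Gradient of the summed logistic loss with respect to w:
    # ∇_w Σ_i ℓ(w, x_i, y_i) = -Σ_i y_i x_i / (1 + exp(y_i x_i' w))
    logistic_gradient(w, X, y) = -X' * (y ./ (1 .+ exp.(y .* (X * w))))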
- Squared(w::Vector, X::Matrix, y::Vector)¶
Return a vector of the squared loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)
\[\begin{split}\ell({\bf w}, {\bf x}, y)&=(y-{\bf x}^T{\bf w})^2,\end{split}\]where \({\bf w}\) is the weight vector of the decision function.
- Hinge(w::Vector, X::Matrix, y::Vector)¶
Return a vector of the hinge loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)
\[\begin{split}\ell({\bf w}, {\bf x}, y)&=\max(0, 1-y{\bf x}^T{\bf w}),\end{split}\]where \({\bf w}\) is the weight vector of the decision function.
Note
The hinge loss corresponds to a max-margin assumption.
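The squared and hinge losses can be evaluated analogously; again a hedged standalone sketch with hypothetical names, not the package's internals:

    # Hypothetical standalone sketches; X is n×m, w ∈ R^m.
    squared_values(w, X, y) = (y .- X * w) .^ 2           # squared loss per instance
    hinge_values(w, X, y)   = max.(0, 1 .- y .* (X * w))  # hinge loss per instance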
Regularizer¶
Regularization prevents overfitting and introduces additional information (prior knowledge) to solve an ill-posed problem.
Regularizers implement the following main methods:
- value(r::Regularizer)¶
Compute the value of the regularizer.
- gradient(r::Regularizer)¶
Compute the gradient of the regularizer.
The following regularizers are implemented:
- L2reg(w::Vector, λ::Float64)¶
Implements an \(L^2\)-norm regularization of the weight vector \({\bf w}\) of the decision function:
\[\begin{split}\Omega({\bf w})&=\frac{1}{2\lambda}\|{\bf w}\|^2,\end{split}\]where the regularization parameter \(\lambda\) controls the influence of the regularizer.
Note
The \(L^2\)-norm regularization corresponds to a Gaussian prior assumption on the weights, \({\bf w}\sim\mathcal{N}({\bf 0},\lambda{\bf I})\).
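As a hedged standalone sketch (hypothetical names, not the package's internals), the value and gradient of the \(L^2\) regularizer are:

    using LinearAlgebra  # for norm

    l2_value(w, λ)    = norm(w)^2 / (2λ)  # Ω(w) = ‖w‖² / (2λ)
    l2_gradient(w, λ) = w ./ λ            # ∇Ω(w) = w / λ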
Machine learning methods¶
The framework implements the following learning algorithms:
Ridge Regression¶
Ridge regression models the relationship between an input variable \({\bf x}\) and a continuous output variable \(y\) by fitting a linear function.
- LinReg(X::Matrix, y::Vector; kernel::Symbol=:linear)¶
Initialize a ridge regression object with a data matrix \({\bf X} \in \mathbb{R}^{n\times m}\), a label vector \({\bf y} \in \mathbb{R}^{n}\) of \(n\) \(m\)-dimensional examples, and a kernel function.
Implements: optimize
Logistic Regression¶
Logistic regression models the relationship between an input variable \({\bf x}\) and a binary output variable \(y\) by fitting a logistic function.
- LogReg(X::Matrix, y::Vector; kernel::Symbol=:linear)¶
Initialize a logistic regression object with a data matrix \({\bf X} \in \mathbb{R}^{n\times m}\), a binary label vector \({\bf y} \in \mathbb{R}^{n}\) of \(n\) \(m\)-dimensional examples, and a kernel function.
Implements: optimize
Support Vector Machine¶
Support vector machines model the relationship between an input variable \({\bf x}\) and a binary output variable \(y\) by finding a hyperplane that separates examples belonging to different classes with maximal margin.
- SVM(X::Matrix, y::Vector; kernel::Symbol=:linear)¶
Initialize an SVM object with a data matrix \({\bf X} \in \mathbb{R}^{n\times m}\), a binary label vector \({\bf y} \in \mathbb{R}^{n}\) of \(n\) \(m\)-dimensional examples, and a kernel function.
Implements: optimize
Let \({\bf x}_i\) be a vector of features describing an instance \(i\) and \(y_i\) be its target value. Then, for a given set of \(n\) training instances \(\{({\bf x}_i,y_i)\}_{i=1}^n\), the goal is to find a model \({\bf w}\) that minimizes the regularized empirical risk:
\[\sum_{i=1}^n \ell({\bf w}, {\bf x}_i, y_i) + \Omega({\bf w}).\]
The loss function \(\ell\) measures the disagreement between the true label \(y\) and the model prediction, and the regularizer \(\Omega\) penalizes the model's complexity.
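Putting the pieces together, the following sketch shows the kind of objective that is minimized, here for the logistic loss with \(L^2\) regularization (a standalone illustration with hypothetical names, not the package's internal code):

    # Regularized empirical risk for logistic loss + L2 regularization;
    # w ∈ R^m, X is n×m, y ∈ {-1,1}^n, λ > 0.
    function regularized_risk(w, X, y, λ)
        loss = sum(log.(1 .+ exp.(-y .* (X * w))))  # Σ_i ℓ(w, x_i, y_i)
        reg  = sum(abs2, w) / (2λ)                  # Ω(w) = ‖w‖² / (2λ)
        return loss + reg
    end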
- optimize(method::RegERM, λ::Float64, optimizer::Symbol=:l_bfgs)¶
Perform the optimization of method for a given regularization parameter λ and return a prediction model that can be used for classification. Stochastic gradient descent (:sgd) and limited-memory BFGS (:l_bfgs) are valid optimizers.
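A minimal usage sketch of the documented interface (assuming the package is installed; the toy data and the value of λ are illustrative only):

    using RegERMs

    # Toy binary classification data: 20 examples, 2 features, labels in {-1, 1}.
    X = [randn(10, 2) .+ 1.0; randn(10, 2) .- 1.0]
    y = [ones(10); -ones(10)]

    svm = SVM(X, y)                      # LogReg(X, y) and LinReg(X, y) work analogously
    model = optimize(svm, 0.1, :l_bfgs)  # fit with regularization parameter λ = 0.1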