Loss functions

Loss functions measure the disagreement between the true label \(y\in\{-1,1\}\) and the prediction \({\bf x}^T{\bf w}\) of the decision function.

Loss functions implement the following main methods:

value(l::Loss)

Compute the value of the loss.

gradient(l::Loss)

Compute the gradient of the loss with respect to the weight vector \({\bf w}\).

The following loss functions are implemented:

Logistic(w::Vector, X::Matrix, y::Vector)

Return a vector of the logistic loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)

\[\begin{split}\ell({\bf w}, {\bf x}, y)&=\log(1+\exp(-y{\bf x}^T{\bf w})),\end{split}\]

where \({\bf w}\) is the weight vector of the decision function.

Note

The logistic loss is the negative log-likelihood of the class posterior \(p(y|{\bf x};{\bf w})\); this posterior takes the logistic form when the class-conditional distributions \(p({\bf x}|y)\) belong to an exponential family.
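The formula above can be sketched in a few lines. The following is an illustrative Python translation, not the library's own (Julia) implementation; the function names `logistic_loss` and `logistic_gradient` are hypothetical:

```python
import math

def logistic_loss(w, X, y):
    """Per-instance logistic loss: log(1 + exp(-y * x'w))."""
    margins = [yi * sum(wj * xj for wj, xj in zip(w, xi))
               for xi, yi in zip(X, y)]
    # log1p improves accuracy for small exp(-m); very negative margins
    # would need a numerically stable variant to avoid overflow.
    return [math.log1p(math.exp(-m)) for m in margins]

def logistic_gradient(w, X, y):
    """Gradient w.r.t. w, summed over instances: -y * x * sigmoid(-y * x'w)."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        m = yi * sum(wj * xj for wj, xj in zip(w, xi))
        coeff = -yi / (1.0 + math.exp(m))  # equals -y * sigmoid(-y * x'w)
        for j, xj in enumerate(xi):
            grad[j] += coeff * xj
    return grad

w = [1.0, -0.5]
X = [[2.0, 1.0], [0.5, 2.0]]
y = [1, -1]
print(logistic_loss(w, X, y))  # small losses: both margins are positive
```

Correctly classified instances with large margins \(y{\bf x}^T{\bf w}\gg 0\) contribute almost nothing, while misclassified ones are penalized roughly linearly.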

Squared(w::Vector, X::Matrix, y::Vector)

Return a vector of the squared loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)

\[\begin{split}\ell({\bf w}, {\bf x}, y)&=(y-{\bf x}^T{\bf w})^2,\end{split}\]

where \({\bf w}\) is the weight vector of the decision function.
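Evaluated per instance, the squared loss is just the squared residual of the linear prediction. A minimal Python sketch (the name `squared_loss` is illustrative, not the library's API):

```python
def squared_loss(w, X, y):
    """Per-instance squared loss: (y - x'w)^2."""
    return [(yi - sum(wj * xj for wj, xj in zip(w, xi))) ** 2
            for xi, yi in zip(X, y)]

w = [1.0, -0.5]
X = [[2.0, 1.0], [0.5, 2.0]]
y = [1, -1]
print(squared_loss(w, X, y))  # [(1 - 1.5)^2, (-1 - (-0.5))^2] = [0.25, 0.25]
```

Note that, unlike the logistic and hinge losses, the squared loss also penalizes predictions that overshoot the correct label, since it measures distance to \(y\) rather than the classification margin.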

Hinge(w::Vector, X::Matrix, y::Vector)

Return a vector of the hinge loss evaluated for all given training instances \(\bf X\) and the labels \(\bf y\)

\[\begin{split}\ell({\bf w}, {\bf x}, y)&=\max(0, 1-y{\bf x}^T{\bf w}),\end{split}\]

where \({\bf w}\) is the weight vector of the decision function.

Note

The hinge loss corresponds to a max-margin assumption: it is zero for instances classified with margin \(y{\bf x}^T{\bf w}\geq 1\) and grows linearly for instances that are misclassified or fall inside the margin.
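The margin behavior described above can be seen in a short Python sketch (the name `hinge_loss` is illustrative, not part of the library's API):

```python
def hinge_loss(w, X, y):
    """Per-instance hinge loss: max(0, 1 - y * x'w)."""
    return [max(0.0, 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi)))
            for xi, yi in zip(X, y)]

w = [1.0, -0.5]
X = [[2.0, 1.0], [0.5, 2.0]]
y = [1, -1]
# First instance has margin 1.5 (outside the margin, zero loss);
# second has margin 0.5 (inside the margin, positive loss).
print(hinge_loss(w, X, y))  # [0.0, 0.5]
```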