10.2 Connection to likelihood functions

We call the function \(S(b)\) the cost function. There is something big going on here. In Section 9 we also defined the log-likelihood function (Equation (10.3), obtained from Equation (9.2) with \(N=4\) and \(\sigma=1\)):

\[\begin{equation} \ln(L(\vec{\alpha} | \vec{x},\vec{y} )) = -2 \ln(2) - 2 \ln (\pi) -(3-b)^{2}-(5-2b)^{2}-(4-4b)^{2}-(10-4b)^{2} \tag{10.3} \end{equation}\]
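To make the upcoming comparison concrete, we can regroup the terms of Equation (10.3). (This regrouping assumes, as the next paragraph states, that Equation (10.1) defines \(S(b)=(3-b)^{2}+(5-2b)^{2}+(4-4b)^{2}+(10-4b)^{2}\).)

\[ \ln(L(\vec{\alpha} | \vec{x},\vec{y} )) = \underbrace{-2 \ln(2) - 2 \ln (\pi)}_{\text{constant in } b} \; \underbrace{- \left[ (3-b)^{2}+(5-2b)^{2}+(4-4b)^{2}+(10-4b)^{2} \right]}_{-S(b)} \]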

If we compare this log-likelihood equation (Equation (10.3)) with Equation (10.1), then we see that \(\ln(L(\vec{\alpha} | \vec{x},\vec{y} )) = -2 \ln(2) - 2 \ln (\pi) - S(b)\). This is no coincidence. Likelihood functions are similar in nature to cost functions. You may be thinking that Equation (10.3) contains the extra terms \(-2 \ln(2) - 2 \ln (\pi)\), but you need not worry. Here is why: our goal is to optimize a cost or log-likelihood function. What these extra terms do (for constant \(\sigma\)) is shift the graph of the log-likelihood function vertically but not horizontally. Vertically shifting a function does not change the location of an optimum value (Why? Think back to derivatives from Calculus I).
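Here is that calculus check sketched out. Writing \(c = -2 \ln(2) - 2 \ln (\pi)\) for the constant terms,

\[ \frac{d}{db} \ln(L(\vec{\alpha} | \vec{x},\vec{y} )) = \frac{d}{db} \left[ c - S(b) \right] = -\frac{dS}{db}, \]

so the log-likelihood has a critical point exactly where \(S(b)\) does: the value of \(b\) that maximizes \(\ln(L)\) is the same value that minimizes \(S(b)\).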

Recognizing the connection between cost and likelihood functions and their shared goal of optimization leads to a key observation: minimizing a quadratic cost function yields the same result as maximizing the likelihood function when the residuals are independent and normally distributed.
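As a quick numerical check of this equivalence, here is a minimal sketch in Python (the text itself does not use this code; the names `S` and `log_likelihood`, and the use of `scipy.optimize.minimize_scalar`, are our choices for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def S(b):
    # Quadratic cost function from Equation (10.1)
    return (3 - b)**2 + (5 - 2*b)**2 + (4 - 4*b)**2 + (10 - 4*b)**2

def log_likelihood(b):
    # Log-likelihood from Equation (10.3): a vertical shift of -S(b)
    return -2 * np.log(2) - 2 * np.log(np.pi) - S(b)

# Minimize the cost, and maximize the log-likelihood by minimizing its negative.
b_cost = minimize_scalar(S).x
b_likelihood = minimize_scalar(lambda b: -log_likelihood(b)).x

print(b_cost, b_likelihood)  # both approximately 1.8649 (= 69/37)
```

Both optimizations return the same value of \(b\) (up to floating-point tolerance), because the two objective functions differ only by a constant vertical shift.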