The function is evaluated at a point P = (10, 10) to see its usage. The value of the objective function is easily calculated as a function of dist(U, V). Likewise, the gradient of f is defined as a function of the gradient of dist(U, V) with V fixed, as presented below:

∇dist(U, V) = (U − V) / dist(U, V)

then,

∇f(E) = 12∇dist(E, A) + 13∇dist(E, B) + 11∇dist(E, C) + 18∇dist(E, D)
These functions are easily defined in Python as follows:
def g_d(U, V):
    "gradient of the distance"
    # return a NumPy array so the result can be scaled and summed
    return np.array([U[0] - V[0], U[1] - V[1]]) / dist(U, V)

def grad_f(E):
    "gradient of the objective function"
    return 12*g_d(E, A) + 13*g_d(E, B) + 11*g_d(E, C) + 18*g_d(E, D)
Now the gradient method consists of applying the iteration given by Equation 2.30, as presented below:
t = 0.5
E = np.array([10,10])
for iter in range(50):
    E = E - t*grad_f(E)
f = 12*dist(E, A) + 13*dist(E, B) + 11*dist(E, C) + 18*dist(E, D)
print("Position:", E)
print("Gradient:", grad_f(E))
print("Cost:", f)
In this case, t = 0.5 and an initial point E = (10, 10) with 50 iterations were enough to find the solution. The reader is invited to try other values and analyze their effect on the algorithm's convergence, as in the short experiment below.
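As a quick experiment, the same loop can be run for several step sizes. The values below are illustrative, and grad_f, dist, and the points A, B, C, D from the previous listings are assumed to be in scope:

import numpy as np

for t in [0.05, 0.5, 2.0]:                  # illustrative step sizes
    E = np.array([10.0, 10.0])
    for iter in range(50):
        E = E - t*grad_f(E)                 # gradient step
    f = 12*dist(E, A) + 13*dist(E, B) + 11*dist(E, C) + 18*dist(E, D)
    print("t =", t, "position:", E, "cost:", f)

A step that is too small makes slow progress, while a step that is too large can make the iteration oscillate or even diverge.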
The step t is very important for the convergence of the gradient method. It can be constant or variable, according to a well-defined update rule. There are many variants of this algorithm, most of them with sophisticated ways to calculate this step. A plot of ‖∇f‖ versus the number of iterations may be useful for selecting a suitable value of t and showing the convergence rate of the algorithm, as presented in the next example. We expect linear convergence for the gradient method, although the algorithm can oscillate and even diverge if the parameter t is not selected carefully. Fortunately, there are modules in Python that automate this work, as in the sketch below.
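For instance, scipy.optimize.minimize selects the step internally. A minimal sketch, assuming dist and the points A, B, C, D from the previous listings are in scope (BFGS is an illustrative choice of method):

import numpy as np
from scipy.optimize import minimize

def f_obj(E):
    "total cost as a function of the position E"
    return 12*dist(E, A) + 13*dist(E, B) + 11*dist(E, C) + 18*dist(E, D)

res = minimize(f_obj, np.array([10.0, 10.0]), method="BFGS")
print("Position:", res.x, "Cost:", res.fun)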
Example 2.7
The convergence of the algorithm can be visualized by using the module Matplotlib as follows:
import matplotlib.pyplot as plt
t = 0.5
conv = []
E = np.array([10.0, 10.0])
for iter in range(50):
    E += -t*grad_f(E)
    conv += [np.linalg.norm(grad_f(E))]
plt.semilogy(conv)
plt.grid()
plt.xlabel("Iteration")
plt.ylabel("|Gradient|")
plt.show()
The result of the script is shown in Figure 2.7. As expected, the convergence rate is linear; that is to say, the convergence plot describes almost a straight line on a semi-logarithmic scale. The value of ε can be used as a convergence criterion (a gradient with ‖∇f‖ ≤ 10⁻⁴ can be considered as the local optimum for this problem). A direct implementation of this criterion is sketched below.
Figure 2.7 Convergence of the gradient method.
Notice that the addition was simplified by the statement +=. In general, a statement such as x = x + 1 is equivalent to x += 1. More details about this aspect are presented in Appendix C.
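The criterion ‖∇f‖ ≤ 10⁻⁴ mentioned above can be implemented directly. A minimal sketch, reusing grad_f and t from the previous listings (the iteration cap is an illustrative safeguard, not from the text):

import numpy as np

eps = 1e-4                                  # tolerance on the gradient norm
E = np.array([10.0, 10.0])
for iter in range(1000):                    # safety cap on the iterations
    if np.linalg.norm(grad_f(E)) <= eps:
        break                               # converged to a local optimum
    E = E - t*grad_f(E)
print("Position:", E, "Iterations:", iter)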
2.6 Lagrange multipliers
Reality imposes physical constraints on the problems, and these constraints must be considered in the model. For example, an optimization model may include equality constraints, as presented below:

min f(x)
subject to: g(x) = 0
For solving this problem, a function called the lagrangian is defined as follows:

L(x, λ) = f(x) + λg(x)
This new function depends on the original decision variables x and a new variable λ, known as the Lagrange multiplier or dual variable. By means of the lagrangian function, a constrained optimization problem is transformed into an unconstrained optimization problem that can be solved numerically, namely:

∂L/∂x = ∂f/∂x + λ ∂g/∂x = 0
∂L/∂λ = g(x) = 0
By a small abuse of notation, ∂f/∂x is used instead of ∇f, which is the formal representation for the n-dimensional case (the same applies for L and g). Notice that the optimality conditions of L imply not only an optimal solution of f but also feasibility in terms of the constraint.
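To make these conditions concrete, consider a small illustrative problem that is not from the text: minimize f(x) = x₁² + x₂² subject to g(x) = x₁ + x₂ − 1 = 0. A minimal sketch solves the resulting optimality conditions numerically with scipy.optimize.fsolve:

import numpy as np
from scipy.optimize import fsolve

def conditions(z):
    "stationarity and feasibility conditions of the lagrangian"
    x1, x2, lam = z
    return [2*x1 + lam,      # dL/dx1 = 0
            2*x2 + lam,      # dL/dx2 = 0
            x1 + x2 - 1]     # dL/dlambda = g(x) = 0

print(fsolve(conditions, np.zeros(3)))      # -> [0.5, 0.5, -1.0]

The solution x₁ = x₂ = 0.5 with λ = −1 satisfies both the stationarity condition and the constraint.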
The first condition implies that the gradient of the objective function must be parallel to the gradient of the constraint, and the Lagrange multiplier is the proportionality constant. Besides this geometrical interpretation, Lagrange multipliers have another interesting interpretation. Suppose a local optimum x̃ is found for a constrained