Description
Python, QUESTION 3 ONLY: implement the 19th function, which is the cross-entropy loss of classifying the first 100 MNIST images with a fully connected neural network of size 784x64x10 and ReLU activation. Then run your implementations of the Nelder-Mead method, the Steepest Descent method, and the BFGS method to minimize this loss. Include the ReLU activation where needed, run the implementation of each method, and display the minimization results. Please work on question 3 first and write a detailed analysis of the results.
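Below is a minimal Python sketch of what the question 3 objective can look like. It assumes MNIST is downloaded with scikit-learn's fetch_openml (28x28 images give the 784-dimensional input layer); the function names, data-loading route, and initialization are illustrative choices, not part of the assignment. The idea is to expose the cross-entropy loss as a function of one flat parameter vector so that the same callable can be handed to the Nelder-Mead, Steepest Descent, and BFGS implementations.

```python
# Sketch only: cross-entropy loss of a 784x64x10 fully connected ReLU network
# on the first 100 MNIST images, written as a function of one flat parameter
# vector. Data loading via fetch_openml is an assumption, not a requirement.
import numpy as np
from sklearn.datasets import fetch_openml

def load_first_100_mnist():
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X = X[:100].astype(np.float64) / 255.0        # first 100 images, pixels in [0, 1]
    y = y[:100].astype(int)                       # labels 0..9
    return X, y

# Parameter shapes: W1, b1, W2, b2 for the 784 -> 64 -> 10 network.
SIZES = [(784, 64), (64,), (64, 10), (10,)]

def unpack(theta):
    """Split the flat parameter vector into W1, b1, W2, b2."""
    parts, idx = [], 0
    for shape in SIZES:
        n = int(np.prod(shape))
        parts.append(theta[idx:idx + n].reshape(shape))
        idx += n
    return parts

def cross_entropy_loss(theta, X, y):
    """Mean cross-entropy of the softmax outputs over the given images."""
    W1, b1, W2, b2 = unpack(theta)
    h = np.maximum(X @ W1 + b1, 0.0)              # ReLU activation
    logits = h @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

if __name__ == "__main__":
    X, y = load_first_100_mnist()
    rng = np.random.default_rng(0)
    theta0 = 0.1 * rng.standard_normal(sum(int(np.prod(s)) for s in SIZES))
    print("initial loss:", cross_entropy_loss(theta0, X, y))
```

Note that the flattened parameter vector has 784*64 + 64 + 64*10 + 10 = 50,890 entries, so the derivative-free Nelder-Mead method will struggle at this dimensionality, while steepest descent and BFGS need a gradient (analytic or finite-difference); that contrast is worth covering in the analysis. scipy.optimize.minimize can serve as an independent sanity check, but the assignment asks for your own implementations of the three methods.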
Unformatted Attachment Preview
Testing Unconstrained Optimization
Software
JORGE J. MORÉ, BURTON S. GARBOW, and KENNETH E. HILLSTROM
Argonne National Laboratory
Much of the testing of optimization software is inadequate because the number of test functions is
small or the starting points are close to the solution. In addition, there has been too much emphasis
on measuring the efficiency of the software and not enough on testing reliability and robustness. To
address this need, we have produced a relatively large but easy-to-use collection of test functions and
designed guidelines for testing the reliability and robustness of unconstrained optimization software.
Key Words and Phrases: performance testing, systems of nonlinear equations, nonlinear least squares,
unconstrained minimization, optimization software
CR Categories: 4.6, 5.15, 5.41
The Algorithm: FORTRAN Subroutines for Testing Unconstrained Optimization Software. ACM
Trans. Math. Software 7, 1 (March 1981), 136-140.
1. INTRODUCTION
When an algorithm is presented in the optimization literature, it has usually been
tested on a set of functions. The purpose of this testing is to show that the
algorithm works and, indeed, that it works better than other algorithms in the
same problem area. In our opinion these claims are usually unwarranted because
it is often the case that there are only a small number of test functions, and that
the starting points are close to the solution.
Testing an algorithm on a relatively large set of test functions is bothersome
because it requires the coding of the functions. This is a tedious and error-prone
job that is avoided by many. However, not testing the algorithm on a large
number of functions can easily lead the cynical observer to conclude that the
algorithm was tuned to particular functions. Even aside from the cynical observer,
the algorithm is just not well tested.
It is harder to understand why the standard starting points are usually close to
the solution. One possible reason is that the algorithm developer is interested in
testing the ability of the algorithm to deal with only one type of problem (e.g., a
curved valley), and it is easier to force the algorithm to deal with this problem if
the starting point is close to the solution.
Thus a test function like Rosenbrock’s is useful because it tests the ability of
the algorithm to follow curved valleys. However, test functions like Rosenbrock’s
are the exception rather than the rule; other test functions have much more
complicated features, and it has been observed that algorithms that succeed from
This work was performed under the auspices of the U.S. Department of Energy.
Authors' address: Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439.
the standard starting points often have problems from points farther away and
fail. Hillstrom [15] was one of the first to point out the need to test optimization
software at nonstandard starting points. He proposed using random starting
points chosen from a box surrounding the standard starting point. This approach
is much more satisfactory, but it tends to produce large amounts of data which
can be hard to interpret. Moreover, the use of a random number generator
complicates the reproduction of the results at other computing centers.
A final complaint against most of the testing procedures that have appeared in
the literature is that there has been too much emphasis on comparing the
efficiency of optimization routines and not enough emphasis on testing the
reliability and robustness of optimization software–the ability of a computer
program to solve an optimization problem. It is important to measure the
efficiency of optimization software, and this can be done, for example, by counting
function evaluations or by timing the algorithm. However, either measure has
problems, and with the standard starting points it is usually fairly hard to
differentiate between similar algorithms (e.g., two quasi-Newton methods) on
either count. In contrast, the use of points farther away from the solution will
frequently reveal drastic differences in reliability and robustness between the
programs, and hence in the number of function evaluations and in the timing of
the algorithms.
To deal with the above problems, we have produced a relatively large collection
of carefully coded test functions and designed very simple procedures for testing
the reliability and robustness of unconstrained optimization software. The heart
of our testing procedure is a set of basic subroutines, described in Sections 2 and
3, which define the test functions and the starting points. The attraction of these
subroutines lies in their flexibility; with them it is possible to design many
different kinds of tests for optimization software. Finally, in Sections 4 and 5 we
describe some of the tests that we have been using to measure reliability and
robustness.
It should be emphasized that the testing described in this paper is only a
beginning and that other tests are necessary. For example, the ability of an
algorithm to deal with small tolerances should be tested. However, the testing of
Sections 4 and 5 does examine reliability and robustness in ways that other
testing procedures have ignored.
2. THE BASIC SUBROUTINES
Testing of optimization software requires a basic set of subroutines that define
the test functions and the starting points. We consider the following three
problem areas:
I. Systems of nonlinear equations. Given f_i : R^n -> R for i = 1, ..., n, solve
    f_i(x) = 0, 1 <= i <= n.
II. Nonlinear least squares. Given f_i : R^n -> R for i = 1, ..., m, m >= n, minimize
    sum_{i=1}^{m} f_i(x)^2.
III. Unconstrained minimization. Given f : R^n -> R, minimize f(x).
[...]
(18) Biggs EXP6 function
(a) n = 6, m variable
(b) f_i(x) = x_3 exp[-t_i x_1] - x_4 exp[-t_i x_2] + x_6 exp[-t_i x_5] - y_i
    where t_i = (0.1)i and y_i = exp[-t_i] - 5 exp[-10 t_i] + 3 exp[-4 t_i]
(c) x_0 = (1, 2, 1, 1, 1, 1)
(d) f = 5.65565... * 10^-3 if m = 13
    f = 0 at (1, 10, 1, 5, 4, 3)
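As a concrete illustration of how such a test function can be coded, here is a small numpy sketch of the residuals above and the corresponding least-squares objective sum_i f_i(x)^2; the function and variable names are my own, not the paper's FORTRAN interface.

```python
# Sketch of the Biggs EXP6 residuals and the sum-of-squares objective.
import numpy as np

def biggs_exp6_residuals(x, m=13):
    i = np.arange(1, m + 1)
    t = 0.1 * i
    y = np.exp(-t) - 5.0 * np.exp(-10.0 * t) + 3.0 * np.exp(-4.0 * t)
    return (x[2] * np.exp(-t * x[0]) - x[3] * np.exp(-t * x[1])
            + x[5] * np.exp(-t * x[4]) - y)

def biggs_exp6(x, m=13):
    r = biggs_exp6_residuals(x, m)
    return float(r @ r)

x0 = np.array([1.0, 2.0, 1.0, 1.0, 1.0, 1.0])       # standard starting point (c)
x_star = np.array([1.0, 10.0, 1.0, 5.0, 4.0, 3.0])   # minimizer listed in (d)
print(biggs_exp6(x0), biggs_exp6(x_star))            # the latter should be ~0
```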
(19) Osborne 2 function [21]
(a) n = 11, m = 65
(b) f_i(x) = y_i - (x_1 exp[-t_i x_5] + x_2 exp[-(t_i - x_9)^2 x_6]
             + x_3 exp[-(t_i - x_10)^2 x_7] + x_4 exp[-(t_i - x_11)^2 x_8])
    where t_i = (i - 1)/10 and y_i is given by the following table:

     i     y_i        i     y_i        i     y_i
     1    1.366      23    0.694      45    0.672
     2    1.191      24    0.644      46    0.708
     3    1.112      25    0.624      47    0.633
     4    1.013      26    0.661      48    0.668
     5    0.991      27    0.612      49    0.645
     6    0.885      28    0.558      50    0.632
     7    0.831      29    0.533      51    0.591
     8    0.847      30    0.495      52    0.559
     9    0.786      31    0.500      53    0.597
    10    0.725      32    0.423      54    0.625
    11    0.746      33    0.395      55    0.739
    12    0.679      34    0.375      56    0.710
    13    0.608      35    0.372      57    0.729
    14    0.655      36    0.391      58    0.720
    15    0.616      37    0.396      59    0.636
    16    0.606      38    0.405      60    0.581
    17    0.602      39    0.428      61    0.428
    18    0.626      40    0.429      62    0.292
    19    0.651      41    0.523      63    0.162
    20    0.724      42    0.562      64    0.098
    21    0.649      43    0.607      65    0.054
    22    0.649      44    0.653

(c) x_0 = (1.3, 0.65, 0.65, 0.7, 0.6, 3, 5, 7, 2, 4.5, 5.5)
(d) f = 4.01377... * 10^-2
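A numpy sketch of the Osborne 2 residuals and least-squares objective, using the y_i values tabulated above; the names are illustrative, not the paper's FORTRAN interface.

```python
# Sketch of the Osborne 2 residuals f_i(x) = y_i - model(t_i; x).
import numpy as np

Y = np.array([
    1.366, 1.191, 1.112, 1.013, 0.991, 0.885, 0.831, 0.847, 0.786, 0.725,
    0.746, 0.679, 0.608, 0.655, 0.616, 0.606, 0.602, 0.626, 0.651, 0.724,
    0.649, 0.649, 0.694, 0.644, 0.624, 0.661, 0.612, 0.558, 0.533, 0.495,
    0.500, 0.423, 0.395, 0.375, 0.372, 0.391, 0.396, 0.405, 0.428, 0.429,
    0.523, 0.562, 0.607, 0.653, 0.672, 0.708, 0.633, 0.668, 0.645, 0.632,
    0.591, 0.559, 0.597, 0.625, 0.739, 0.710, 0.729, 0.720, 0.636, 0.581,
    0.428, 0.292, 0.162, 0.098, 0.054,
])
T = (np.arange(1, 66) - 1) / 10.0                    # t_i = (i - 1)/10

def osborne2_residuals(x):
    model = (x[0] * np.exp(-T * x[4])
             + x[1] * np.exp(-(T - x[8]) ** 2 * x[5])
             + x[2] * np.exp(-(T - x[9]) ** 2 * x[6])
             + x[3] * np.exp(-(T - x[10]) ** 2 * x[7]))
    return Y - model

def osborne2(x):
    r = osborne2_residuals(x)
    return float(r @ r)

x0 = np.array([1.3, 0.65, 0.65, 0.7, 0.6, 3.0, 5.0, 7.0, 2.0, 4.5, 5.5])  # point (c)
print(osborne2(x0))      # the minimum reported in (d) is about 4.01377e-2
```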
(20) Watson function [17]
(a) 2 <= n <= 31, m = 31
(b) f_i(x) = sum_{j=2}^{n} (j - 1) x_j t_i^{j-2} - ( sum_{j=1}^{n} x_j t_i^{j-1} )^2 - 1
    where t_i = i/29, 1 <= i <= 29,
    f_30(x) = x_1, f_31(x) = x_2 - x_1^2 - 1
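A numpy sketch of the Watson residuals as written in (b); since the preview cuts off before items (c) and (d), the starting point used below (the origin) is an assumption rather than something taken from this attachment.

```python
# Sketch of the Watson residuals; m = 31 for any 2 <= n <= 31.
import numpy as np

def watson_residuals(x):
    n = len(x)
    t = np.arange(1, 30) / 29.0          # t_i = i/29, i = 1..29
    j = np.arange(1, n + 1)
    # sum_{j=2}^{n} (j-1) x_j t^{j-2} - ( sum_{j=1}^{n} x_j t^{j-1} )^2 - 1
    d1 = (t[:, None] ** (j[1:] - 2)) @ ((j[1:] - 1) * x[1:])
    d2 = (t[:, None] ** (j - 1)) @ x
    return np.concatenate([d1 - d2 ** 2 - 1.0, [x[0], x[1] - x[0] ** 2 - 1.0]])

def watson(x):
    r = watson_residuals(x)
    return float(r @ r)

print(watson(np.zeros(6)))               # assumed starting point: the origin, n = 6
```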