Description
Hello, I need help solving the questions below, with the best possible answers please.
Unformatted Attachment Preview
EEL 4810 HW 2 (Due Mar 22)

Problem 1: Optimality of l1 regularization
Recall that when l1 regularization is used, and assuming that the Hessian matrix H is
diagonal and positive definite, the objective function can be approximated by
L̂_R(θ) = Σ_{i=1}^{d} [ (1/2) H_ii (θ_i − θ_i*)^2 + α |θ_i| ].
Find the optimal solution of this approximated objective function, and compare it with the
optimal solution of the objective function without regularization.
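A quick numerical sanity check (not the graded derivation): for this diagonal quadratic-plus-l1 objective, the per-coordinate minimizer is the soft-thresholded version of the unregularized optimum, θ_i = sign(θ_i*) · max(|θ_i*| − α/H_ii, 0). The sketch below brute-forces each one-dimensional problem and compares it against that closed form; all values of θ*, H, and α are made up for illustration.

```python
# Sketch (not the graded derivation): numerically check the standard closed form
# for the diagonal-quadratic-plus-l1 objective. For each coordinate,
#   J_i(t) = 0.5 * H_ii * (t - theta_star_i)^2 + alpha * |t|
# the minimizer is the soft-thresholded unregularized optimum:
#   t_opt = sign(theta_star_i) * max(|theta_star_i| - alpha / H_ii, 0)
# whereas without regularization the minimizer is simply theta_star_i.
import numpy as np

def soft_threshold(theta_star, H_diag, alpha):
    """Closed-form minimizer of 0.5*H_ii*(t - theta_star_i)^2 + alpha*|t|, per coordinate."""
    return np.sign(theta_star) * np.maximum(np.abs(theta_star) - alpha / H_diag, 0.0)

def numeric_min(theta_star_i, H_ii, alpha, grid=np.linspace(-5, 5, 200001)):
    """Brute-force minimizer over a fine grid, used only as a sanity check."""
    J = 0.5 * H_ii * (grid - theta_star_i) ** 2 + alpha * np.abs(grid)
    return grid[np.argmin(J)]

rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)          # unregularized optimum (example values)
H_diag = rng.uniform(0.5, 3.0, size=5)   # positive diagonal Hessian entries
alpha = 0.7

closed = soft_threshold(theta_star, H_diag, alpha)
brute = np.array([numeric_min(t, h, alpha) for t, h in zip(theta_star, H_diag)])
print(np.round(closed, 3))
print(np.round(brute, 3))   # should match the closed form up to grid resolution
```

Coordinates with |θ_i*| ≤ α/H_ii are pushed exactly to zero, whereas without regularization the minimizer is simply θ* itself.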
Problem 2: Backpropagation of a fully connected network
Consider the following neural network, where h1 = σ(W1 x), h2 = σ(W2 h1), and f(x) = w3 h2. Compute ∂f/∂W1_{i,j}. Use σ′ to denote the derivative of the activation function σ.
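A minimal sketch for checking whatever expression you derive, assuming σ is the logistic sigmoid, w3 is a vector so that f(x) is a scalar, and small made-up layer sizes (none of these are specified in the assignment). It compares the chain-rule gradient ∂f/∂W1 = ((W2⊤(w3 ⊙ σ′(a2))) ⊙ σ′(a1)) x⊤, where a1 = W1 x and a2 = W2 h1, against a finite-difference estimate.

```python
# Minimal sketch, assuming sigma is the logistic sigmoid and small made-up shapes
# (none of these sizes are given in the assignment). It checks the chain-rule
# expression for df/dW1 against a finite-difference estimate.
import numpy as np

def sigma(z):            # activation
    return 1.0 / (1.0 + np.exp(-z))

def sigma_prime(z):      # sigma'(z)
    s = sigma(z)
    return s * (1.0 - s)

def forward(W1, W2, w3, x):
    a1 = W1 @ x; h1 = sigma(a1)
    a2 = W2 @ h1; h2 = sigma(a2)
    return w3 @ h2, a1, a2      # f(x) = w3 h2 (scalar), plus pre-activations

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3)); W2 = rng.normal(size=(5, 4))
w3 = rng.normal(size=5); x = rng.normal(size=3)

f, a1, a2 = forward(W1, W2, w3, x)
# Chain rule: df/dW1[i, j] = [W2^T (w3 * sigma'(a2))]_i * sigma'(a1_i) * x_j
delta1 = (W2.T @ (w3 * sigma_prime(a2))) * sigma_prime(a1)
grad_analytic = np.outer(delta1, x)

# Finite-difference check of a single entry (i, j) = (2, 1)
eps = 1e-6
W1p = W1.copy(); W1p[2, 1] += eps
fp, _, _ = forward(W1p, W2, w3, x)
print(grad_analytic[2, 1], (fp - f) / eps)   # the two numbers should agree closely
```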
Problem 3: Feedforward neural network
Design a feedforward neural network to solve the XOR problem. The network is required to have at least 1 hidden layer with 3 neurons.
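One hand-constructed network that meets the requirement (many others exist), assuming ReLU hidden units and binary inputs; the weights below are a choice made for illustration, not given in the assignment.

```python
# One hand-constructed solution (many exist), assuming ReLU hidden units and
# binary inputs. The weights below are my own choice, not given in the assignment.
# Hidden layer (3 neurons): h1 = ReLU(x1), h2 = ReLU(x2), h3 = ReLU(x1 + x2 - 1)
# Output:                   y = h1 + h2 - 2*h3
# On {0,1}^2 this yields y = x1 + x2 - 2*x1*x2 = XOR(x1, x2).
import numpy as np

W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, 0.0, -1.0])
w2 = np.array([1.0, 1.0, -2.0])
b2 = 0.0

def xor_net(x):
    h = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer with 3 neurons
    return w2 @ h + b2                 # linear output neuron

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x, dtype=float)))   # expect 0, 1, 1, 0
```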
Problem 4: Convolution
Consider the following convolution of the image X and kernel K:
(1). Calculate their convolution.
(2). Calculate their cross-correlation.
(3). Wide convolution: for the image X ∈ R^{M×N} and kernel K ∈ R^{m×n}, zero-padding is applied to both dimensions of the image X, padding each end with m − 1 and n − 1 zeros respectively, resulting in the fully padded image X̃. The convolution of X̃ and K is called the wide convolution. Calculate the wide convolution of X and K.
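The matrices X and K are given in the attachment's figure and do not appear in this text preview, so the sketch below uses small made-up placeholders just to contrast the three operations: convolution flips the kernel, cross-correlation does not, and SciPy's 'full' mode reproduces the wide convolution defined in part (3).

```python
# The matrices X and K from the assignment figure are not in this text preview,
# so this sketch uses small hypothetical placeholders just to contrast the three
# operations. scipy flips the kernel for convolution but not for correlation.
import numpy as np
from scipy.signal import convolve2d, correlate2d

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])          # placeholder image (not the one in the HW)
K = np.array([[1, 0],
              [0, -1]])            # placeholder 2x2 kernel (not the one in the HW)

# (1) Convolution (kernel flipped), output size (M-m+1) x (N-n+1)
print(convolve2d(X, K, mode='valid'))

# (2) Cross-correlation (no kernel flip), same output size
print(correlate2d(X, K, mode='valid'))

# (3) Wide (full) convolution: pad each end of X with m-1 and n-1 zeros,
#     giving an output of size (M+m-1) x (N+n-1); 'full' mode does exactly this.
print(convolve2d(X, K, mode='full'))

# Equivalent by hand: zero-pad X, then take the 'valid' convolution.
m, n = K.shape
X_pad = np.pad(X, ((m - 1, m - 1), (n - 1, n - 1)))
print(np.array_equal(convolve2d(X_pad, K, mode='valid'), convolve2d(X, K, mode='full')))
```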
Problem 5: Universal Approximation Theorem
[Cybenko, 1989, Hornik et al., 1989] showed the following theorem.
Assume ϕ(·) is a bounded, non-decreasing function. For any continuous function f : [0, 1]^d → R and any ϵ > 0, there exist M ∈ N, v_i, b_i ∈ R, and w_i ∈ R^d such that

|f(x) − Σ_{i=1}^{M} v_i ϕ(w_i⊤ x + b_i)| ≤ ϵ for all x ∈ [0, 1]^d,   (1)

i.e., any continuous function defined on [0, 1]^d can be approximated by the constructed function class.
(1). Verify that any continuous function defined on [0, 1]^d can be approximated by a neural network with one hidden layer and the sigmoid activation function.
(2). Show that any continuous function defined on [0, 1]^d can also be approximated by a neural network with one hidden layer and the ReLU activation function. (Hint: apply the theorem above to some ‘approximation’ of ReLU.)
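For part (1), the theorem applies directly, since the sigmoid is bounded and non-decreasing. For part (2), here is a sketch under one reading of the hint: the clipped-linear ("hard sigmoid") unit ϕ(z) = min(max(z, 0), 1) is bounded and non-decreasing, so the theorem applies to it, and each such unit equals a difference of two ReLUs, ϕ(z) = ReLU(z) − ReLU(z − 1), so an approximant built from M ϕ-units can be rewritten as a one-hidden-layer ReLU network with 2M units. The code below checks both claims numerically with made-up weights.

```python
# Sketch for part (2), under one common reading of the hint: the clipped-linear
# ("hard sigmoid") unit phi(z) = min(max(z, 0), 1) is bounded and non-decreasing,
# so the theorem applies to it, and each phi unit is exactly a difference of two
# ReLU units: phi(z) = ReLU(z) - ReLU(z - 1). Hence an approximant with M phi
# units can be rewritten as a ReLU network with 2M hidden units.
import numpy as np

relu = lambda z: np.maximum(z, 0.0)
phi = lambda z: np.clip(z, 0.0, 1.0)

z = np.linspace(-3, 3, 1001)
print(np.allclose(phi(z), relu(z) - relu(z - 1.0)))   # identity check -> True

# Conversion: sum_i v_i * phi(w_i^T x + b_i) equals a sum over 2M ReLU units.
rng = np.random.default_rng(2)
M, d = 4, 3
v, b, W = rng.normal(size=M), rng.normal(size=M), rng.normal(size=(M, d))
x = rng.normal(size=d)

phi_net = np.sum(v * phi(W @ x + b))
relu_net = np.sum(v * relu(W @ x + b)) - np.sum(v * relu(W @ x + b - 1.0))
print(np.isclose(phi_net, relu_net))                  # -> True
```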