For my education, I am trying to implement an N-dimensional convolutional layer in a convolutional neural network.
I would like to implement a backpropagation function. However, I am not sure of the most efficient way of doing so.
At present, I am using signal.fftconvolve to:
In the forward step, convolve the input with each kernel, looping over all filters;
In the backpropagation step, convolve the derivatives (reversed in all dimensions with the FlipAllAxes function) with the array (following https://jefkine.com/general/2016/09/05/backpropagation-in-convolutional-neural-networks/), looping over all filters, and sum the results. I take the output to be the sum of each image convolved with each derivative, for each filter.
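For context, my understanding is that signal.fftconvolve computes a true convolution, i.e. it already flips the kernel itself, so flipping an argument by hand (as FlipAllAxes does) effectively turns the operation into a cross-correlation. A minimal 1-D check of that assumption:

import numpy as np
from scipy import signal

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0, -1.0])

conv = signal.fftconvolve(x, w, mode="same")        # true convolution (kernel flipped internally)
corr = signal.fftconvolve(x, w[::-1], mode="same")  # flipping by hand gives a cross-correlation

assert np.allclose(corr, signal.correlate(x, w, mode="same"))
# conv and corr differ unless the kernel is symmetric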
I am particularly confused about how to convolve the derivatives. Using the class below to backpropagate results in an explosion in the size of the weights.
What is the correct way to program the convolution of the derivative with the output and filters?
EDIT:
According to this paper (Fast Training of Convolutional Networks through FFTs), which seeks to do exactly what I wish to do:
The derivatives for the previous layer are given by the convolution of the derivatives of the current layer with the weights: dL/dy_f = dL/dx * w_f^T
The derivatives for the weights are the piecewise sum of the convolution of the derivatives with the original input: dL/dw_f = dL/dx * y
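To check that I am reading these formulas correctly, I wrote a small 1-D, single-channel gradient check (my own notation: x is the input, w the kernel, d the incoming derivative dL/d(out); I use "valid" mode here because the cropping is unambiguous, unlike the "same" mode in my class below):

import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(10)   # input
w = rng.standard_normal(3)    # kernel
d = rng.standard_normal(8)    # dL/d(out), same shape as the "valid" output

def loss(x, w):
    return np.sum(signal.fftconvolve(x, w, mode="valid") * d)

# Derivative w.r.t. the input: convolve d with the flipped kernel;
# "full" mode restores the input size.
d_x = signal.fftconvolve(d, w[::-1], mode="full")
# Derivative w.r.t. the weights: convolve the flipped input with d;
# "valid" mode crops the result to the kernel size.
d_w = signal.fftconvolve(x[::-1], d, mode="valid")

# Finite-difference check of both gradients
eps = 1e-6
num_dx = np.array([(loss(x + eps * np.eye(len(x))[i], w) - loss(x, w)) / eps
                   for i in range(len(x))])
num_dw = np.array([(loss(x, w + eps * np.eye(len(w))[i]) - loss(x, w)) / eps
                   for i in range(len(w))])
assert np.allclose(d_x, num_dx, atol=1e-4)
assert np.allclose(d_w, num_dw, atol=1e-4)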
I have implemented this below as best I know how. However, it does not seem to give the intended result, as the network I have written using this layer exhibits wild fluctuations during training.
import numbers

import numpy as np
import cupy as cp
from cupy.lib.stride_tricks import as_strided
from scipy import signal
class ConvNDLayer:
    def __init__(self, channels, kernel_size, dim):
        self.channels = channels
        self.kernel_size = kernel_size
        self.dim = dim
        self.last_input = None
        self.filt_dims = np.ones(dim + 1).astype(int)
        self.filt_dims[1:] *= kernel_size
        self.filt_dims[0] *= channels
        self.filters = np.random.randn(*self.filt_dims) / kernel_size ** dim

    def FlipAllAxes(self, array):
        sl = slice(None, None, -1)
        return array[tuple([sl] * array.ndim)]
    def ViewAsWindows(self, array, window_shape, step=1):
        # -- basic checks on arguments
        if not isinstance(array, cp.ndarray):
            raise TypeError("`array` must be a Cupy ndarray")
        ndim = array.ndim
        if isinstance(window_shape, numbers.Number):
            window_shape = (window_shape,) * ndim
        if not (len(window_shape) == ndim):
            raise ValueError("`window_shape` is incompatible with `arr_in.shape`")
        if isinstance(step, numbers.Number):
            if step < 1:
                raise ValueError("`step` must be >= 1")
            step = (step,) * ndim
        if len(step) != ndim:
            raise ValueError("`step` is incompatible with `arr_in.shape`")
        arr_shape = np.array(array.shape)
        window_shape = np.array(window_shape, dtype=arr_shape.dtype)
        if ((arr_shape - window_shape) < 0).any():
            raise ValueError("`window_shape` is too large")
        if ((window_shape - 1) < 0).any():
            raise ValueError("`window_shape` is too small")

        # -- build rolling window view
        slices = tuple(slice(None, None, st) for st in step)
        window_strides = array.strides
        indexing_strides = array[slices].strides
        win_indices_shape = ((arr_shape - window_shape) // np.array(step)) + 1
        new_shape = tuple(list(win_indices_shape) + list(window_shape))
        strides = tuple(list(indexing_strides) + list(window_strides))
        arr_out = as_strided(array, shape=new_shape, strides=strides)
        return arr_out
    def UnrollAxis(self, array, axis):
        # This so it works with a single dimension or a sequence of them
        axis = np.atleast_1d(axis)
        axis2 = np.arange(len(axis))
        # Put unrolled axes at the beginning
        array = cp.moveaxis(array, axis, axis2)
        # Unroll
        return array.reshape((-1,) + array.shape[len(axis):])
    def Forward(self, array):
        # One output channel per filter, each the same shape as the input.
        output_shape = (self.channels,) + array.shape
        output = cp.zeros(output_shape)
        self.last_input = array
        for i, kernel in enumerate(self.filters):
            # Convolve the input with this filter (signal.fftconvolve,
            # as described above).
            output[i] = signal.fftconvolve(array, kernel, "same")
        return output
    def Backprop(self, d_L_d_out, learn_rate):
        d_A = cp.zeros_like(self.last_input)
        d_W = cp.zeros_like(self.filters)
        for i, (kernel, d_L_d_out_f) in enumerate(zip(self.filters, d_L_d_out)):
            d_A += signal.fftconvolve(d_L_d_out_f, kernel.T, "same")
            conv = signal.fftconvolve(d_L_d_out_f, self.last_input, "same")
            conv = self.ViewAsWindows(conv, kernel.shape)
            axes = np.arange(kernel.ndim)
            conv = self.UnrollAxis(conv, axes)
            d_W[i] = np.sum(conv, axis=0)
        output = d_A * learn_rate
        self.filters = self.filters - d_W * learn_rate
        return output
To implement the backpropagation for an N-dimensional convolutional layer, you can use the following steps:
Calculate the gradient of the loss with respect to the output of the convolutional layer (dL/dy). This will be provided by the backpropagation of the next layer in the network.
Flip the filters in all dimensions and convolve the output gradient with them (dL/dx = dL/dy * w_f^T). This gives you the gradient of the loss with respect to the input of the convolutional layer (dL/dx).
Compute the gradient of the loss with respect to the weights as the piecewise sum of the convolution of the output gradient with the original input (dL/dw_f = dL/dy * x), and use it to update the filters.
Return the gradient of the loss with respect to the input (dL/dx) to the previous layer in the network, so that it can continue the backpropagation process.
It's worth noting that the convolution operation in the backpropagation process should be performed using the same padding and strides as in the forward pass.
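As a rough sketch of how these steps can fit together with scipy.signal.fftconvolve (this is not the code from the question: it assumes a single-channel N-dimensional input, one kernel per output channel, and a "valid"-mode forward pass, so that the gradient shapes line up without any extra cropping):

import numpy as np
from scipy import signal

class SimpleConvND:
    def __init__(self, channels, kernel_size, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.filters = (rng.standard_normal((channels,) + (kernel_size,) * dim)
                        / kernel_size ** dim)
        self.last_input = None

    def forward(self, x):
        self.last_input = x
        return np.stack([signal.fftconvolve(x, w, mode="valid")
                         for w in self.filters])

    def backprop(self, d_out, learn_rate):
        # Step 1: d_out is dL/dy, supplied by the next layer.
        d_x = np.zeros_like(self.last_input)
        d_w = np.zeros_like(self.filters)
        flip = tuple([slice(None, None, -1)] * self.last_input.ndim)
        for i, w in enumerate(self.filters):
            # Step 2: gradient w.r.t. the input -- convolve the output
            # gradient with the flipped kernel; "full" restores the input shape.
            d_x += signal.fftconvolve(d_out[i], w[flip], mode="full")
            # Step 3: gradient w.r.t. the weights -- convolve the flipped
            # input with the output gradient; "valid" crops to the kernel shape.
            d_w[i] = signal.fftconvolve(self.last_input[flip], d_out[i],
                                        mode="valid")
        self.filters -= learn_rate * d_w
        # Step 4: return dL/dx to the previous layer.
        return d_x

With a "valid"-mode forward pass, the weight gradient is cropped to the kernel shape by the convolution itself, so no windowed views or summing is needed, and the value returned to the previous layer is the input gradient d_x itself, not d_x scaled by the learning rate.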