Oct 21, 2018
Thank you, Piotr, for this great article! It is very nicely explained and coded.
I like how you created the nn_architecture using a dictionary; it reminds me of TensorFlow (each output connects to the next input, so the dimensions have to match).
I do have one question regarding the bias: why is its shape always (layer_output_size, 1)?
def init_layers(self, seed=None):
    # Initialize weights and biases
    # weights have shape (n_prev, n_hidden)
    # set the random seed
    np.random.seed(seed)
    # get the number of layers
    number_of_layers = len(self.nn_architecture)
    params_values = {}

    for idx, layer in enumerate(self.nn_architecture):
        layer_idx = idx + 1
        layer_input_size = layer["input_dim"]
        layer_output_size = layer["output_dim"]

        params_values['W' + str(layer_idx)] = np.random.randn(
            layer_input_size, layer_output_size) * 0.1
        params_values['b' + str(layer_idx)] = np.random.randn(
            self.batch, layer_output_size) * 0.1

    self.params_values = params_values
    return params_values
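(For context, here is roughly how I exercise it: a standalone sketch with a made-up two-layer nn_architecture and an assumed batch of 8, just to show the parameter shapes my version produces. The names and sizes are illustrative, not from your article.)

import numpy as np

# hypothetical two-layer architecture in the same dictionary style
nn_architecture = [
    {"input_dim": 2, "output_dim": 4},
    {"input_dim": 4, "output_dim": 1},
]
batch = 8  # assumed batch size (stands in for self.batch)

np.random.seed(0)
params_values = {}
for idx, layer in enumerate(nn_architecture):
    layer_idx = idx + 1
    # weights: (input_dim, output_dim); biases: (batch, output_dim) in my version
    params_values['W' + str(layer_idx)] = np.random.randn(
        layer["input_dim"], layer["output_dim"]) * 0.1
    params_values['b' + str(layer_idx)] = np.random.randn(
        batch, layer["output_dim"]) * 0.1

for name, value in params_values.items():
    print(name, value.shape)
# prints: W1 (2, 4), b1 (8, 4), W2 (4, 1), b2 (8, 1)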
Does this make sense?
Batch size is the number of examples we feed at once, so shouldn't the shape of the bias reflect the batch size, i.e. a bias of shape (batch, layer_output_size)?
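To illustrate what I mean, here is a minimal numpy sketch (mine, not from the article) assuming the forward step is written row-wise as Z = A·W + b with A of shape (batch, input_dim); if I read your code correctly, your column-wise bias of shape (layer_output_size, 1) plays the role of the (1, output_dim) case below, just transposed.

import numpy as np

batch, input_dim, output_dim = 8, 2, 4
A = np.random.randn(batch, input_dim)                 # one batch of examples
W = np.random.randn(input_dim, output_dim) * 0.1

# one bias value per output neuron, broadcast over the batch dimension
b_shared = np.random.randn(1, output_dim) * 0.1       # shape (1, output_dim)
Z_shared = A @ W + b_shared                           # broadcasts to (batch, output_dim)

# what I was asking about: a separate bias row per example in the batch
b_per_example = np.random.randn(batch, output_dim) * 0.1
Z_per_example = A @ W + b_per_example                 # shapes already match, no broadcasting

print(Z_shared.shape, Z_per_example.shape)            # (8, 4) (8, 4)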