I am working on a reinforcement learning program and I am using this article as a reference. I am using Python with Keras (Theano backend) to build the neural network, and the pseudocode I am following is:
1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
2. Do a feedforward pass for the next state s' and calculate the maximum over all network outputs, max_a' Q(s', a').
3. Set the Q-value target for the chosen action to r + γ max_a' Q(s', a') (use the max calculated in step 2). For all other actions, set the Q-value target to the value originally returned in step 1, making the error 0 for those outputs.
4. Update the weights using backpropagation.
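The steps above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: `build_q_target` is a hypothetical helper, the two Q-value arrays stand in for the network's forward passes in steps 1 and 2, and the sketch assumes one network output per action (the model posted below has a single output, so it would need adapting).

```python
import numpy as np

def build_q_target(q_s, q_s_next, action, reward, gamma):
    """Build the training target for one transition (hypothetical helper).

    q_s      : predicted Q-values for the current state, one per action (step 1)
    q_s_next : predicted Q-values for the next state, one per action (step 2)
    """
    # Copy the predictions so every other action keeps an error of 0.
    target = np.array(q_s, dtype=float)
    # Step 3: overwrite only the entry of the action that was taken.
    target[action] = reward + gamma * float(np.max(q_s_next))
    return target

# Made-up network outputs for three actions, for illustration only.
q_s = np.array([0.2, 0.6892, 0.1])
q_s_next = np.array([0.3, 0.8375, 0.5])

target = build_q_target(q_s, q_s_next, action=1, reward=1.0, gamma=1.0)
print(target)
```

Only the entry for the chosen action differs from the prediction, so backpropagation in step 4 receives a non-zero error for that output alone.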
The loss function equation here is this:

$$L = \frac{1}{2}\left[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]^2$$
where my reward is r = +1, max_a' Q(s', a') = 0.8375 and Q(s, a) = 0.6892.
With these values (and no discounting, i.e. γ = 1), my L would be 1/2 * (1 + 0.8375 - 0.6892)^2 = 0.659296445.
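The arithmetic above can be checked with a few lines of Python. The variable names are illustrative, and `gamma = 1.0` is an assumption that matches the calculation as written (no discount applied).

```python
reward = 1.0
gamma = 1.0          # assumption: the calculation above applies no discount
max_q_next = 0.8375  # max over a' of Q(s', a')
q_sa = 0.6892        # Q(s, a)

# Difference between the target and the current prediction
td_error = reward + gamma * max_q_next - q_sa
# Half squared error, as in the loss calculation above
loss = 0.5 * td_error ** 2
print(td_error, loss)
```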
How should I update the weights of my neural network using the above loss value, given the following model structure?
model = Sequential()
model.add(Dense(150, input_dim=150))
model.add(Dense(10))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='mse', optimizer='adam')
You do not pass the loss value itself to Keras. Instead, you give the model the input (the state) and the target output (the Q-value target), and Keras computes the loss and the gradients for you. The fit() method does this: it takes the input and target data, plus the number of epochs and the batch size, and updates the weights using the loss function and optimizer you specified in compile().
Here is an example of how you might use fit() to perform one update toward the Q-value target you calculated:
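import numpy as np

# Create the input data: one sample with 150 features, matching input_dim=150
X = np.asarray(state, dtype=float).reshape(1, 150)  # Replace state with the current state vector
# Create the output data: the Q-value target for the current action
y = np.array([[Q_target]])  # Shape (1, 1), matching the single output unit
# Fit the model to the data (one update on this single sample)
model.fit(X, y, epochs=1, batch_size=1)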
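Here, X holds the current state and y holds the Q-value target for the current action, i.e. r + γ max_a' Q(s', a') from your loss equation. Note that the final sigmoid layer in your model restricts predictions to the range (0, 1), so a target such as 1 + 0.8375 = 1.8375 can never be matched exactly; a linear output layer is the usual choice when predicting Q-values.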
In this example, the model is trained for 1 epoch with a batch size of 1, which amounts to a single weight update on one sample; you can adjust the number of epochs and the batch size as needed. Keras computes the loss between the model's prediction and y, backpropagates it, and lets the optimizer specified in compile() (here Adam) adjust the weights to reduce that loss.
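To make concrete what a single update does, here is a minimal NumPy sketch of one gradient step on one sample. It is not what Keras runs internally: it assumes a single linear unit instead of the three-layer model, plain gradient descent instead of Adam, a randomly generated 150-feature state, and an illustrative learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
state = rng.random(150)   # hypothetical 150-feature state
q_target = 1.8375         # r + max_a' Q(s', a') from the question
lr = 0.005                # illustrative learning rate

w = np.zeros(150)         # weights of a single linear unit
b = 0.0

def predict(w, b, x):
    return float(w @ x + b)

loss_before = (q_target - predict(w, b, state)) ** 2

# Gradient of (prediction - target)^2 with respect to w and b
err = predict(w, b, state) - q_target
w = w - lr * 2.0 * err * state
b = b - lr * 2.0 * err

loss_after = (q_target - predict(w, b, state)) ** 2
print(loss_before, loss_after)
```

The loss on this sample is smaller after the step, which is the effect one call to fit() with epochs=1 and batch_size=1 aims for.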
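It's worth noting that the loss used here is MSE (mean squared error), while your calculation uses half the squared error. For a single output and a single sample the two differ only by the factor 1/2, which scales the gradient but does not change where the minimum is, so 'mse' is normally fine. If you want exactly your loss, you can pass a custom loss function to the loss parameter of compile().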
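The relationship between the two losses can be checked numerically. The sketch below uses plain NumPy rather than Keras, and the function names are illustrative; it assumes one sample with one output, as in the question.

```python
import numpy as np

def half_squared_error(target, pred):
    # Loss from the question: 1/2 * (target - prediction)^2
    return 0.5 * np.sum((target - pred) ** 2)

def mean_squared_error(target, pred):
    # MSE: mean of the squared differences
    return np.mean((target - pred) ** 2)

target = np.array([1.0 + 0.8375])  # r + max_a' Q(s', a')
pred = np.array([0.6892])          # Q(s, a)

half_se = half_squared_error(target, pred)
mse = mean_squared_error(target, pred)
print(half_se, mse, mse / half_se)
```

With a single output the MSE is exactly twice the half squared error, so switching between them changes the step size of an update but not its direction.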