Gradient descent is one of the most widely used algorithms for optimizing a function. Optimizing a function means finding the parameter values that give us the best possible outcome.
Gradient descent has broad applications, but in this text we will focus on its use in Machine Learning to minimize the loss of a linear model.
Mean Squared Error (MSE)
Before talking about MSE, let’s first look at the model whose error it measures: linear regression.
Linear regression
In linear regression, the model is represented as:

\[ \hat{y} = mx + b \]

Where:
- (m) is the slope,
- (b) is the intercept.
The goal of linear regression is to minimize the error between the predicted values and the actual data points. This error is often measured as the average squared distance from the predicted values to the actual values, also known as the Mean Squared Error (MSE). Since our model generates the predicted values, we must adjust (m) and (b) to reduce the discrepancy between predictions and real data.
Gradient descent helps achieve this by iteratively updating (m) and (b) in the direction that reduces the error. This process continues until no further improvement is found or a predefined stopping criterion is reached, ensuring the model fits the data as accurately as possible.
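To make this concrete, here is a minimal sketch in Python (using NumPy) of how a linear model's predictions and their MSE could be computed; the data arrays and variable names are illustrative choices of mine, not values from this text.

```python
import numpy as np

# Hypothetical data for illustration only (not this text's data points).
x = np.array([1.0, 2.0, 3.0])     # inputs (independent variable)
y = np.array([2.0, 2.8, 4.1])     # actual values (targets)

m, b = 0.5, 0.1                   # a candidate slope and intercept

y_pred = m * x + b                # model predictions: y_hat = m*x + b
mse = np.mean((y - y_pred) ** 2)  # average squared distance = MSE
print(mse)
```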
Sum of Squared Errors (SSE)
The SSE for our linear regression model is:

\[ \text{SSE}(m, b) = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2 \]

Where:
- (y_i) are the actual data points (true values),
- (x_i) are the input values (independent variable),
- (m) and (b) are the parameters to be optimized.

We want to minimize this function with respect to (m) and (b).
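Since the SSE is just the MSE without the division by the number of points, minimizing one also minimizes the other. A small helper computing it could look like the following sketch (Python/NumPy; the function name and example data are mine, not from this text).

```python
import numpy as np

def sse(m, b, x, y):
    """Sum of squared errors of the line y_hat = m*x + b on data (x, y)."""
    residuals = y - (m * x + b)
    return np.sum(residuals ** 2)

# Example: SSE of a flat line (m = 0, b = 0) on hypothetical data.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.8, 4.1])
print(sse(0.0, 0.0, x, y))
```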
Step 1: Calculate the Gradients
To update (m) and (b) using gradient descent, we first calculate the partial derivatives of the SSE with respect to both (m) and (b).

Derivative with respect to (m) (Slope):

\[ \frac{\partial \text{SSE}}{\partial m} = -2 \sum_{i=1}^{n} x_i \left( y_i - (m x_i + b) \right) \]

Derivative with respect to (b) (Intercept):

\[ \frac{\partial \text{SSE}}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right) \]
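Evaluated in code, these two partial derivatives could look like the sketch below (Python/NumPy; `sse_gradients` is a name I am introducing for illustration).

```python
import numpy as np

def sse_gradients(m, b, x, y):
    """Partial derivatives of the SSE with respect to m and b."""
    residuals = y - (m * x + b)            # y_i - (m*x_i + b)
    grad_m = -2.0 * np.sum(x * residuals)  # dSSE/dm
    grad_b = -2.0 * np.sum(residuals)      # dSSE/db
    return grad_m, grad_b
```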
Step 2: Update the Parameters (Slope and Intercept)
Once we have the gradients, we update (m) and (b) using the following formulas:

\[ m \leftarrow m - \eta \, \frac{\partial \text{SSE}}{\partial m} \]

\[ b \leftarrow b - \eta \, \frac{\partial \text{SSE}}{\partial b} \]

Where:
- (\eta) is the learning rate, a hyperparameter that controls how big the step is during each update.
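A single update step, continuing from the `sse_gradients` sketch above and assuming current values `m`, `b` and data arrays `x`, `y`, would then be:

```python
eta = 0.01  # learning rate

grad_m, grad_b = sse_gradients(m, b, x, y)
m = m - eta * grad_m  # move m in the direction that lowers the SSE
b = b - eta * grad_b  # move b in the direction that lowers the SSE
```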
Step 3: Apply to Our Linear Model
Let’s assume we are working with the following values for our linear function (\hat{y} = mx + b):
- Initial slope: (m_0 = 0),
- Initial intercept: (b_0 = 0),
- Learning rate: (\eta = 0.01),
- Data points: three pairs ((x_i, y_i)).
Now, let’s go step by step to find the new values of (m) and (b).
Compute the Derivative with Respect to (m):
First, we need to calculate the predicted values using the current parameters. Initially, both (m) and (b) are set to 0, so the prediction for every data point is

\[ \hat{y}_i = 0 \cdot x_i + 0 = 0. \]

Now, calculate the partial derivative with respect to (m):

\[ \frac{\partial \text{SSE}}{\partial m} = -2 \sum_{i=1}^{n} x_i \left( y_i - 0 \right) = -2 \sum_{i=1}^{n} x_i y_i \]
Compute the Derivative with Respect to (b):
Now, calculate the partial derivative with respect to (b):

\[ \frac{\partial \text{SSE}}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - 0 \right) = -2 \sum_{i=1}^{n} y_i \]
Update the Parameters:
Now, use the learning rate (\eta = 0.01) to update the slope and intercept:

\[ m_1 = m_0 - \eta \, \frac{\partial \text{SSE}}{\partial m} = 0 - 0.01 \left( -2 \sum_{i=1}^{n} x_i y_i \right) = 0.02 \sum_{i=1}^{n} x_i y_i \]

\[ b_1 = b_0 - \eta \, \frac{\partial \text{SSE}}{\partial b} = 0 - 0.01 \left( -2 \sum_{i=1}^{n} y_i \right) = 0.02 \sum_{i=1}^{n} y_i \]
Step 4: Repeat the Process
Now that we have the updated slope and intercept, we would repeat this process for multiple iterations to gradually converge to the optimal values for (m) and (b).
With each iteration, the model moves closer to the best-fit line for the given data points.
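Putting the pieces together, a complete (if bare-bones) gradient-descent loop for fitting the line could look like the sketch below. The data array, the iteration limit, and the stopping tolerance are illustrative choices of mine, not values taken from this text.

```python
import numpy as np

# Hypothetical data points (x_i, y_i); this text's actual data are not shown here.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.8, 4.1])

m, b = 0.0, 0.0  # initial slope and intercept (m_0 = 0, b_0 = 0)
eta = 0.01       # learning rate

for step in range(5000):
    residuals = y - (m * x + b)
    grad_m = -2.0 * np.sum(x * residuals)  # dSSE/dm
    grad_b = -2.0 * np.sum(residuals)      # dSSE/db
    m -= eta * grad_m
    b -= eta * grad_b
    # Stop once the updates become negligibly small (no further improvement).
    if max(abs(eta * grad_m), abs(eta * grad_b)) < 1e-10:
        break

print(f"fitted line: y_hat = {m:.3f} * x + {b:.3f}")
```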