Linear Regression

1. Regression Model

In statistics, linear regression is a linear approach to modelling the relationship between a dependent variable and one or more independent variables. Let $x_1$ be the independent variable and $y$ be the dependent variable. We define a linear relationship between these two variables as follows:
$$y = \theta_0 + \theta_1 x_1$$
Here $\theta_0$ is the intercept and $\theta_1$ is the slope of the line.

2. Define Loss Function

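We will use the Mean Squared Error (MSE) function:
$$L = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y}_i)^2$$
where $n$ is the number of samples, $y_i$ is the true value for the $i$-th sample $x_i$, and $\bar{y}_i = \theta_0 + \theta_1 x_i$ is the predicted value.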
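As a quick illustration, the loss can be computed with NumPy as in the sketch below. The helper name `mse_loss` and the sample values are made up for this example and are not part of the original implementation.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error between true and predicted values (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Average of the squared differences over all n samples
    return np.mean((y_true - y_pred) ** 2)

# Made-up values: the errors are 1, 0 and -2, so the loss is (1 + 0 + 4) / 3
loss = mse_loss([2, 4, 6], [1, 4, 8])
print(loss)
```

A perfect prediction gives a loss of 0, and because the errors are squared, large errors are penalized more heavily than small ones.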

3. Utilize the Gradient Descent Algorithm

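At a minimum of a differentiable function, its partial derivatives are all equal to 0. Gradient descent uses this idea to estimate the parameters (weights) of the model: starting from an initial guess, it repeatedly moves each weight a small step against the corresponding partial derivative of the loss function, so that the loss decreases and the derivatives approach 0. The steps are: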
    1. Initialize the weights, $\theta_0 = 0$ and $\theta_1 = 0$.
    2. Calculate the partial derivatives of the loss with respect to $\theta_0$ and $\theta_1$:
       $$d_{\theta_0} = -\frac{2}{n} \sum_{i=1}^n (y_i - \bar{y}_i)$$
       $$d_{\theta_1} = -\frac{2}{n} \sum_{i=1}^n (y_i - \bar{y}_i) \times x_i$$
    3. Update the weights, where $l$ is the learning rate:
       $$\theta_0 = \theta_0 - l \times d_{\theta_0}$$
       $$\theta_1 = \theta_1 - l \times d_{\theta_1}$$
    4. Repeat steps 2 and 3 for a fixed number of iterations (epochs).
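
For reference, the derivatives in step 2 can be obtained by applying the chain rule to the mean squared error. The sketch below assumes the prediction for the $i$-th sample is $\bar{y}_i = \theta_0 + \theta_1 x_i$, which matches the model defined in section 1:

```latex
\begin{aligned}
L &= \frac{1}{n}\sum_{i=1}^n \left(y_i - (\theta_0 + \theta_1 x_i)\right)^2 \\
\frac{\partial L}{\partial \theta_0}
  &= \frac{1}{n}\sum_{i=1}^n 2\,(y_i - \bar{y}_i)\cdot(-1)
   = -\frac{2}{n}\sum_{i=1}^n (y_i - \bar{y}_i) \\
\frac{\partial L}{\partial \theta_1}
  &= \frac{1}{n}\sum_{i=1}^n 2\,(y_i - \bar{y}_i)\cdot(-x_i)
   = -\frac{2}{n}\sum_{i=1}^n (y_i - \bar{y}_i)\, x_i
\end{aligned}
```

The factor $-1$ (respectively $-x_i$) is the derivative of the inner term $y_i - (\theta_0 + \theta_1 x_i)$ with respect to $\theta_0$ (respectively $\theta_1$).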

Python Implementation

# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Preparing the dataset (here label = 2 * feature)
data = pd.DataFrame({'feature': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], 'label': [2,4,6,8,10,12,14,16,18,20,22,24,26,28,30]})
# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(data['feature'], data['label'], test_size=0.30)

# Function to make predictions
def predict(X, theta0, theta1):
    # The model is: theta0 + theta1 * x
    return np.array([(theta0 + theta1*x) for x in X])

def linear_regression(X, Y):
    # Initializing variables
    theta0 = 0
    theta1 = 0
    learning_rate = 0.001
    epochs = 300
    n = len(X)

    # Training iteration
    for epoch in range(epochs):
        y_pred = predict(X, theta0, theta1)

        # The loss function is (1/n) * sum((y - y_pred)^2), a.k.a. mean squared error (MSE)
        # Derivative of loss w.r.t. theta0
        theta0_d = -(2/n) * sum(Y-y_pred)
        # Derivative of loss w.r.t. theta1
        theta1_d = -(2/n) * sum(X*(Y-y_pred))

        # Gradient descent update
        theta0 = theta0 - learning_rate * theta0_d
        theta1 = theta1 - learning_rate * theta1_d

    return theta0, theta1

# Training the model
theta0, theta1 = linear_regression(X_train, y_train)

# Making predictions
y_pred = predict(X_test, theta0, theta1)

# Evaluating the model: compare the true labels with the predictions
print(list(y_test))
print(y_pred)
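
One way to sanity-check a result like this is to compare the learned weights with a closed-form least-squares fit. The sketch below is an illustration under stated assumptions, not the article's own evaluation: it uses a vectorized variant of the training loop (the name `fit_gradient_descent` is hypothetical), fits on the full 15-point dataset rather than a random train/test split so that the result is deterministic, and uses `np.polyfit` as the reference.

```python
import numpy as np

def fit_gradient_descent(x, y, learning_rate=0.001, epochs=300):
    """Fit y = theta0 + theta1 * x by batch gradient descent on the MSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    theta0, theta1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Residuals are computed once so both weights use the same predictions
        residual = y - (theta0 + theta1 * x)
        theta0_d = -(2 / n) * residual.sum()
        theta1_d = -(2 / n) * (x * residual).sum()
        theta0 -= learning_rate * theta0_d
        theta1 -= learning_rate * theta1_d
    return theta0, theta1

# Same data as in the article: label = 2 * feature
x = np.arange(1, 16)
y = 2 * x

theta0, theta1 = fit_gradient_descent(x, y)

# Reference: degree-1 least-squares fit; coefficients come highest degree first
slope, intercept = np.polyfit(x, y, deg=1)

# Mean squared error of the gradient descent fit on the same data
mse = np.mean((y - (theta0 + theta1 * x)) ** 2)

print("gradient descent:", theta0, theta1)
print("least squares:   ", intercept, slope)
print("MSE:", mse)
```

With the learning rate and epoch count used here, the slope is expected to end up close to the least-squares value, while the intercept converges more slowly. Increasing `epochs` should bring both closer to the reference fit.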