Logistic Regression

1. Logistic Model

Consider a model with features $x_1, x_2, x_3, \ldots, x_n$. Let the binary output be denoted by $y$, which can take the values 0 or 1. Let $p$ be the probability of $y = 1$; we can write this as $p = P(y=1)$. The mathematical relationship between these variables can be written as:

$\ln\left(\frac{p}{1-p}\right) = \theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3$

Here the term $\frac{p}{1-p}$ is known as the odds and denotes the likelihood of the event taking place. Thus $\ln\left(\frac{p}{1-p}\right)$ is known as the log odds and simply maps the probability, which lies between 0 and 1, to the range $(-\infty, +\infty)$. The terms $\theta_0, \theta_1, \theta_2, \theta_3, \ldots$ are the parameters (or weights) that we will estimate during training.
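For example, if $p = 0.8$, the odds are $\frac{0.8}{1-0.8} = 4$ (the event is four times as likely to happen as not), and the log odds are $\ln(4) \approx 1.386$.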

It is actually the sigmoid!

$\ln\left(\frac{p}{1-p}\right) = \theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3$

$\frac{p}{1-p} = e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}$

$e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3} - p\,e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3} = p$

$p + p\,e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3} = e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}$

$p\left(1 + e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}\right) = e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}$

$p = \frac{e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}}{1 + e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}}$

Dividing the numerator and the denominator by $e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}$:

$p = \frac{1}{1+\frac{1}{e^{\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3}}}$

$p = \frac{1}{1+e^{-(\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3)}}$

which is exactly the sigmoid function $S(x)=\frac{1}{1+e^{-x}}$ applied to the linear combination $\theta_0+\theta_1x_1+\theta_2x_2+\theta_3x_3$.
Now we will use the above equation to make our predictions. Before that, we will train our model to obtain the values of the parameters $\theta_0, \theta_1, \theta_2, \theta_3, \ldots$ that result in the least error.
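As a quick illustration of how such a prediction works, here is a minimal sketch that applies the sigmoid to a linear combination of features; the weights and feature values below are made up for illustration and are not the trained parameters used later:

import numpy as np

def sigmoid(z):
    # S(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

# Illustrative (made-up) weights and one observation
theta = np.array([0.5, 1.2, -0.7, 0.3])   # theta_0 ... theta_3
x = np.array([1.0, 2.0, 0.5, 1.5])        # leading 1 for the intercept, then x_1, x_2, x_3

z = theta @ x      # theta_0 + theta_1*x_1 + theta_2*x_2 + theta_3*x_3
p = sigmoid(z)     # probability that y = 1
print(p)           # always lies strictly between 0 and 1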

2. Define the Loss Function

An L2 loss function such as the least squared error will do the job.

$L = \sum_{i=1}^n (y_{true} - y_{predicted})^2$
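In code, this loss is simply the sum of squared differences between the true labels and the predicted probabilities. A minimal sketch, using made-up labels and predictions:

import numpy as np

y_true = np.array([0, 0, 1, 1])            # made-up labels
y_pred = np.array([0.2, 0.4, 0.7, 0.9])    # made-up predicted probabilities

# L = sum over all samples of (y_true - y_pred)^2
loss = np.sum((y_true - y_pred) ** 2)
print(loss)   # 0.2^2 + 0.4^2 + 0.3^2 + 0.1^2 = 0.30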

3. Utilize the Gradient Descent Algorithm

You might know that the partial derivatives of a function are equal to 0 at its minimum. Gradient descent uses this idea to estimate the parameters (weights) of our model by iteratively minimizing the loss function.
    1. Initialize the weights: $\theta_0 = 0$ and $\theta_1 = 0$.
    2. Calculate the partial derivatives of the loss with respect to $\theta_0$ and $\theta_1$, where $\bar{y_i}$ denotes the predicted probability for sample $i$ (see the chain-rule sketch after this list):
    $d_{\theta_0} = -2 \sum_{i=1}^n (y_i - \bar{y_i}) \times \bar{y_i} \times (1 - \bar{y_i})$
    $d_{\theta_1} = -2 \sum_{i=1}^n (y_i - \bar{y_i}) \times \bar{y_i} \times (1 - \bar{y_i}) \times x_i$
    3. Update the weights, i.e. the values of $\theta_0$ and $\theta_1$, using the learning rate $l$:
    $\theta_0 = \theta_0 - l \times d_{\theta_0}$
    $\theta_1 = \theta_1 - l \times d_{\theta_1}$
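The two derivative expressions above follow from the chain rule. Writing $\bar{y_i}$ for the predicted probability $S(\theta_0 + \theta_1 x_i)$ and using the sigmoid identity $S'(z) = S(z)\,(1 - S(z))$:

$\frac{\partial \bar{y_i}}{\partial \theta_0} = \bar{y_i}(1-\bar{y_i}) \quad\text{and}\quad \frac{\partial \bar{y_i}}{\partial \theta_1} = \bar{y_i}(1-\bar{y_i})\,x_i$

so that

$d_{\theta_0} = \frac{\partial L}{\partial \theta_0} = \sum_{i=1}^n 2\,(y_i-\bar{y_i})\left(-\frac{\partial \bar{y_i}}{\partial \theta_0}\right) = -2\sum_{i=1}^n (y_i-\bar{y_i})\,\bar{y_i}(1-\bar{y_i})$

$d_{\theta_1} = \frac{\partial L}{\partial \theta_1} = \sum_{i=1}^n 2\,(y_i-\bar{y_i})\left(-\frac{\partial \bar{y_i}}{\partial \theta_1}\right) = -2\sum_{i=1}^n (y_i-\bar{y_i})\,\bar{y_i}(1-\bar{y_i})\,x_i$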

Python Implementation

# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from math import exp

# Preparing the dataset
data = pd.DataFrame({'feature': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
                     'label':   [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]})

# Divide the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(data['feature'], data['label'], test_size=0.30)

## Logistic Regression Model
# Helper function to normalize data (centre it around the mean)
def normalize(X):
    return X - X.mean()

# Method to make predictions
def predict(X, theta0, theta1):
    # Here the prediction function is the sigmoid: 1 / (1 + e^(-(theta0 + theta1*x)))
    return np.array([1 / (1 + exp(-(theta0 + theta1 * x))) for x in X])

# Method to train the model
def logistic_regression(X, Y):
    # Normalizing the data
    X = normalize(X)

    # Initializing variables
    theta0 = 0
    theta1 = 0
    learning_rate = 0.001
    epochs = 300

    # Training iterations
    for epoch in range(epochs):
        y_pred = predict(X, theta0, theta1)

        ## Here the loss function is: sum((y - y_pred)^2), a.k.a. least squared error (LSE)
        # Derivative of the loss w.r.t. theta0
        theta0_d = -2 * sum((Y - y_pred) * y_pred * (1 - y_pred))
        # Derivative of the loss w.r.t. theta1
        theta1_d = -2 * sum(X * (Y - y_pred) * y_pred * (1 - y_pred))

        # Gradient descent update step
        theta0 = theta0 - learning_rate * theta0_d
        theta1 = theta1 - learning_rate * theta1_d

    return theta0, theta1

# Training the model
theta0, theta1 = logistic_regression(X_train, y_train)

# Making predictions
# Centre the test data with the training mean (not its own mean) so it matches the scale used in training
X_test_norm = X_test - X_train.mean()
y_pred = predict(X_test_norm, theta0, theta1)
y_pred = [1 if p >= 0.5 else 0 for p in y_pred]

# Evaluating the model
print(list(y_test))
print(y_pred)
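For comparison, the same toy problem can be solved with scikit-learn's built-in LogisticRegression, which minimizes log loss with a standard solver instead of the hand-rolled squared-error gradient descent above. A minimal sketch reusing the X_train/X_test split from the code above:

from sklearn.linear_model import LogisticRegression

# scikit-learn expects a 2-D feature matrix, so reshape the single feature column
clf = LogisticRegression()
clf.fit(X_train.values.reshape(-1, 1), y_train)

sk_pred = clf.predict(X_test.values.reshape(-1, 1))
print(list(y_test))
print(list(sk_pred))

On a dataset this small the two approaches usually agree, but the library version needs no manual normalization, learning rate, or epoch count.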