Consider a model with features x1, x2, x3, ..., xn. Let the binary output be denoted by y, which can take the values 0 or 1. Let p be the probability that y = 1, that is, p = P(y = 1). The relationship between these variables can be written as:
$$\ln\left(\frac{p}{1-p}\right) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 + \dots + \theta_n x_n$$
Here the term p/(1−p) is known as the odds: the ratio of the probability that the event occurs to the probability that it does not. Its logarithm, ln(p/(1−p)), is known as the log odds and maps a probability that lies between 0 and 1 to the range (−∞, +∞). The terms θ0, θ1, θ2, θ3, ... are parameters (or weights) that we will estimate during training.
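To make the mapping concrete, here is a minimal sketch of the log odds and its inverse, the sigmoid. The function names `log_odds` and `sigmoid` are chosen here for illustration and do not come from any library.

```python
import math

def log_odds(p):
    """Map a probability in (0, 1) to a real number in (-inf, +inf)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of log_odds: map a real number back to a probability."""
    return 1 / (1 + math.exp(-z))

# Probabilities below 0.5 give negative log odds, above 0.5 positive ones,
# and applying sigmoid to the log odds recovers the original probability.
for p in (0.1, 0.5, 0.9):
    z = log_odds(p)
    print(f"p={p}  odds={p / (1 - p):.3f}  log_odds={z:+.3f}  back={sigmoid(z):.3f}")
```

Solving the log-odds equation for p gives exactly this sigmoid form, which is why the prediction function in the implementation applies 1/(1+e^(−z)) to the linear term.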
We will use the above equation to make our predictions. Before that, we train the model to find the values of the parameters θ0, θ1, θ2, θ3, ... that result in the least error.
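You might know that at a minimum of a differentiable function, every partial derivative is equal to 0. Gradient descent uses this idea to estimate the parameters (weights) of our model: it repeatedly moves each parameter a small step in the direction opposite to the partial derivative of the loss function, so the loss decreases until the derivatives are close to 0.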
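As a small illustration of the idea, the sketch below runs gradient descent on a one-variable function, f(w) = (w − 3)^2, whose minimum is known to be at w = 3. The function `gradient_descent` and its default values are assumptions made for this example only.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the derivative, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# f(w) = (w - 3)^2 has derivative f'(w) = 2 * (w - 3), which is 0 at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)  # very close to 3
```

The same loop structure is what the logistic regression training uses, only with two parameters and the derivatives of the model's loss.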
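Initialize the weights: θ0 = 0 and θ1 = 0.
Calculate the partial derivatives of the loss with respect to θ0 and θ1, summing over the n training examples, where $\bar{y}_i$ is the predicted probability for example i:
$$d\theta_0 = -2 \sum_{i=1}^{n} (y_i - \bar{y}_i) \times \bar{y}_i \times (1 - \bar{y}_i)$$
$$d\theta_1 = -2 \sum_{i=1}^{n} (y_i - \bar{y}_i) \times \bar{y}_i \times (1 - \bar{y}_i) \times x_i$$
Update the weights θ0 and θ1, where l is the learning rate:
$$\theta_0 = \theta_0 - l \times d\theta_0$$
$$\theta_1 = \theta_1 - l \times d\theta_1$$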
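For readers who want to see where the derivative formulas come from, the following is a sketch of the derivation. It assumes the loss is the sum of squared errors, as stated in the comments of the implementation, and a single feature x. The symbols L, σ and z_i are introduced here for the derivation.

```latex
\begin{aligned}
L &= \sum_{i=1}^{n} (y_i - \bar{y}_i)^2,
  \qquad \bar{y}_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}},
  \qquad z_i = \theta_0 + \theta_1 x_i \\[4pt]
\frac{d\sigma}{dz} &= \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\[4pt]
\frac{\partial L}{\partial \theta_0}
  &= \sum_{i=1}^{n} 2\,(y_i - \bar{y}_i)\left(-\frac{\partial \bar{y}_i}{\partial \theta_0}\right)
   = -2 \sum_{i=1}^{n} (y_i - \bar{y}_i)\,\bar{y}_i\,(1 - \bar{y}_i) \\[4pt]
\frac{\partial L}{\partial \theta_1}
  &= \sum_{i=1}^{n} 2\,(y_i - \bar{y}_i)\left(-\frac{\partial \bar{y}_i}{\partial \theta_1}\right)
   = -2 \sum_{i=1}^{n} (y_i - \bar{y}_i)\,\bar{y}_i\,(1 - \bar{y}_i)\,x_i
\end{aligned}
```

The factor $\bar{y}_i (1 - \bar{y}_i)$ is the derivative of the sigmoid, and the extra $x_i$ in the second formula comes from $\partial z_i / \partial \theta_1 = x_i$.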
Python Implementation
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from math import exp
# Preparing the dataset
data = pd.DataFrame({'feature' : [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], 'label' : [0,0,0,0,0,0,0,1,1,1,1,1,1,1,1]})
# Divide the data to training set and test set
X_train, X_test, y_train, y_test = train_test_split(data['feature'], data['label'], test_size=0.30)
## Logistic Regression Model
# Helper function to normalize data (centers it by subtracting the mean)
def normalize(X):
    return X - X.mean()
# Method to make predictions
def predict(X, theta0, theta1):
    # Sigmoid of the linear term: 1 / (1 + e^(-(theta0 + theta1*x)))
    return np.array([1 / (1 + exp(-(theta0 + theta1*x))) for x in X])
# Method to train the model
def logistic_regression(X, Y):
    # Normalizing (centering) the data
    X = normalize(X)
    # Initializing variables
    theta0 = 0
    theta1 = 0
    learning_rate = 0.001
    epochs = 300
    # Training iteration
    for epoch in range(epochs):
        y_pred = predict(X, theta0, theta1)
        ## Here the loss function is sum((y - y_pred)^2), the sum of squared errors
        # Derivative of loss w.r.t. theta0
        theta0_d = -2 * sum((Y - y_pred) * y_pred * (1 - y_pred))
        # Derivative of loss w.r.t. theta1
        theta1_d = -2 * sum(X * (Y - y_pred) * y_pred * (1 - y_pred))
        # Gradient descent update of both weights
        theta0 = theta0 - learning_rate * theta0_d
        theta1 = theta1 - learning_rate * theta1_d
    return theta0, theta1
# Training the model
theta0, theta1 = logistic_regression(X_train, y_train)
# Making predictions
# Center the test data with the training mean, the same shift applied during training
X_test_norm = X_test - X_train.mean()
y_pred = predict(X_test_norm, theta0, theta1)
y_pred = [1 if p >= 0.5 else 0 for p in y_pred]
# Evaluating the model
print(list(y_test))
print(y_pred)
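Comparing the two printed lists by eye works for a handful of points. A single number such as accuracy is easier to read. The helper below is a minimal sketch. The name `accuracy` and the example labels are hypothetical, chosen only to illustrate the calculation, and are not results from the model above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels for illustration: 4 of the 5 predictions are correct.
example_true = [0, 0, 1, 1, 1]
example_pred = [0, 1, 1, 1, 1]
print(accuracy(example_true, example_pred))  # 0.8
```

With the variables from the listing above, the call would be `accuracy(y_test, y_pred)`. Because the split is random and the dataset is tiny, the value can change from run to run.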