About

This notebook is a quick walkthrough of Linear Regression's inner workings. We will not be using any external Machine Learning library (except for numpy, which is really a scientific computing library rather than a Machine Learning one).

Every piece of the algorithm (from the model, to the cost function, to gradient descent) will be built from scratch.

What We Need

From the outset, as explained in the blog post, we need four important things to build any Machine Learning model:

  1. Data from which we need to learn
  2. Model
  3. Cost Function
  4. Optimization Algorithm (we'll be using Gradient Descent here, as most production systems use some variation of this algorithm and it suits our purpose well)

Part 1: The Data

Your dataset can contain anywhere between a few hundred and a few million datapoints. These datapoints are fed into an Optimization Algorithm multiple times. On each iteration, Gradient Descent (our Optimization Algorithm) calculates the output of the model, computes the cost based on that output and the real value (which comes from the data itself), and then updates the parameters to minimize the cost.

For our case here, we will be using just two data points: x = [10, 20], with corresponding y = [105, 205].

import numpy as np

x_train = np.array([10, 20])
y_train = np.array([105, 205])

# also some test data
x_test = np.array([30, 40])
y_test = np.array([305, 405])

Part 2: The Model

Let's now build the function which will represent our model. Since we are doing linear regression, we will be using: y = wx + b

Here w is a weight and b is a bias term – they're generally referred to as model parameters.

> Important: All in all, all we need are the right parameters to represent the data well. i.e. Machine Learning is really the process of obtaining the best parameters w and b that fit our data – also called training. Once we have these parameters, we use them to predict on unseen data – also called inference. These two steps are all there is to it. We'll be doing both in this tutorial!

Now, let's build a function that will help us compute the output y, based on the model wx + b:

def compute_model_output(w, x, b):
    # y = w*x + b; works for a scalar x as well as a numpy array of inputs
    model_output = w*x + b
    return model_output

Let's assume that our input x is 2, and our learned parameters are w=10, b=5. Thus our output should be 10*2 + 5, which is 25. Let's verify:

compute_model_output(w=10, x=2, b=5)
25

That's correct! Our codified model in Python is returning the expected values. Feel free to experiment with your own examples here.

As you can guess, since y = [105, 205] for x = [10, 20], the parameters we'd like to end up with after training are w=10 and b=5. Let's see how we can arrive at these values!
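
Since compute_model_output works on numpy arrays too, we can check it against the whole training set at once (using the target parameters w=10, b=5 mentioned above):

compute_model_output(w=10, x=x_train, b=5)
array([105, 205])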

Part 3: The Cost Function

Now, as explained in the blog post, we need a way to measure the performance of our learned model.

i.e. we need a way to quantify how close our model's predictions are to the real values.
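
Concretely, we'll use the (halved) mean squared error over all m datapoints – exactly what the code below computes:

J(w, b) = (1/(2m)) * sum((y_pred_i - y_i)^2)

The extra factor of 1/2 is a common convention: it cancels the 2 that appears when we differentiate the squared term during gradient descent.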

def compute_cost(y_pred, y, m):
    # y_pred: shape (m,), the values predicted by the model
    # y: shape (m,), the ground truth (in our case, [105, 205])

    cost_per_datapoint = (y_pred - y)**2  # squared error for each datapoint
    total_cost = np.sum(cost_per_datapoint)/(2*m)
    return total_cost
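
As a quick sanity check (a worked example using the training data defined above): with the target parameters w=10, b=5 the cost should be exactly 0, while with w=10, b=3 every prediction is off by 2, giving (4 + 4)/(2*2) = 2.0 – a number you'll see again at the start of training.

compute_cost(y_pred=compute_model_output(w=10, x=x_train, b=5), y=y_train, m=2)
0.0
compute_cost(y_pred=compute_model_output(w=10, x=x_train, b=3), y=y_train, m=2)
2.0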

Part 4: The Optimization Algorithm
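
Gradient Descent repeatedly nudges the parameters in the direction that decreases the cost J(w, b) defined above. The gradients and the update rules (with learning rate alpha) are:

dJ/dw = (1/m) * sum((y_pred_i - y_i) * x_i)
dJ/db = (1/m) * sum(y_pred_i - y_i)

w = w - alpha * dJ/dw
b = b - alpha * dJ/db

The code below implements exactly this loop: predict, compute the cost (for monitoring), compute the gradients, and update the parameters.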

def gradient_descent(iterations, w, x, b, y, alpha):
    """
    returns the learned parameters w, b and a list of the cost per iteration
    """
    m = len(x)
    all_costs = []
    print_at = max(1, iterations // 10)  # only print ~10 progress updates
    for i in range(iterations):
        model_prediction = compute_model_output(w, x, b)
        cost = compute_cost(y_pred=model_prediction, y=y, m=m)
        dw, db = compute_gradients(model_prediction, y, x, m)
        w, b = update_parameters(w, b, dw, db, alpha)
        all_costs.append(cost)
        if i % print_at == 0:
            print(f"Iteration # {i}, cost: {cost}")
    return w, b, all_costs

def compute_gradients(y_pred, y, x, m):
    # per-datapoint pieces of dJ/dw and dJ/db; they get summed up in update_parameters
    diff_per_example = (y_pred - y)/m
    dj_dw = diff_per_example*x
    dj_db = diff_per_example
    return dj_dw, dj_db

def update_parameters(w, b, dw, db, alpha):
    # gradient descent step: move each parameter against its gradient
    w = w - alpha*sum(dw)
    b = b - alpha*sum(db)
    return w, b
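
To make one update concrete (a worked example with the starting values we'll use below): from w=10, b=3 the predictions are [103, 203], so diff_per_example = [-1, -1], sum(dj_dw) = -30 and sum(dj_db) = -2. With alpha = 0.0001 the parameters move just enough to drop the cost below 2.0 on the next iteration:

dw, db = compute_gradients(compute_model_output(10, x_train, 3), y_train, x_train, m=2)
update_parameters(w=10, b=3, dw=dw, db=db, alpha=0.0001)  # roughly (10.003, 3.0002)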

Now that we have all the components, let's train our model using our data!

w_init = 10
b_init = 3
iterations = 1000
alpha = 0.0001
w, b, all_costs = gradient_descent(iterations, w_init, x_train, b_init, y_train, alpha)
Iteration # 0, cost: 2.0
Iteration # 100, cost: 0.2093535825341508
Iteration # 200, cost: 0.19784645051025793
Iteration # 300, cost: 0.19738372726634273
Iteration # 400, cost: 0.19699033884724906
Iteration # 500, cost: 0.19659815826785115
Iteration # 600, cost: 0.19620676109658386
Iteration # 700, cost: 0.19581614315416213
Iteration # 800, cost: 0.19542630287300639
Iteration # 900, cost: 0.19503723870480114
print(f"Learned parameters: w: {w}, b: {b}, Expected parameters: w: {10}, b: {5}")
Learned parameters: w: 10.118430918661392, b: 3.026938061811292, Expected parameters: w: 10, b: 5
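
Finally, the inference step promised earlier: let's use the learned parameters to predict on the test data we set aside at the start. (The exact outputs depend on the learned w and b above, so treat these numbers as approximate.)

compute_model_output(w=w, x=x_test, b=b)
array([306.57986562, 407.76417481])

These land within a few units of y_test = [305, 405]; running more iterations should pull b closer to 5 and tighten the predictions further.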

And... congratulations! You have now built a linear regression machine learning model from scratch!