Regression

Concept of Regression in Machine Learning

Basic concepts and mathematics

  • The input (predictor) variables are the variables that help predict the value of the output variable. They are commonly referred to as X.
  • The output variable is the variable that we want to predict. It is commonly referred to as Y.
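Actual LR equation: $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is the residual (error) term
Fitted model: $\hat{Y} = b_0 + b_1 X$
Estimated parameter (slope): $b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Estimated parameter (intercept): $b_0 = \bar{y} - b_1 \bar{x}$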
  1. Simple linear regression is used to predict future values of continuous data.
  2. The data should follow a linear relationship between a single independent variable and a single dependent variable.
Fundamental process of Machine Learning model
# Import necessary libraries
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
# Generate 'random' data
np.random.seed(0)
# Array of 100 values with mean = 1.5, stddev = 2.5
x = 2.5 * np.random.randn(100) + 1.5
# Generate 100 residual terms
res = 0.5 * np.random.randn(100)
# Actual values of Y
Y = 2 + 0.3 * x + res
# Create pandas dataframe to store our x and Y values
df = pd.DataFrame({'x': x, 'Y': Y})
# Show the first five rows of our dataframe
df.head()
Random Data generated from the above code
# Calculate the mean of x and Y
xmean = np.mean(x)
ymean = np.mean(Y)
# Calculate the terms needed for the numerator and denominator of b1
df['xycov'] = (df['x'] - xmean) * (df['Y'] - ymean)
df['xvar'] = (df['x'] - xmean)**2
# Calculate b1 (slope) and b0 (intercept)
b1 = df['xycov'].sum() / df['xvar'].sum()
b0 = ymean - (b1 * xmean)
print(f'b0 = {b0}')
print(f'b1 = {b1}')
Estimated parameters
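As a sanity check, the hand-computed estimates can be compared with NumPy's built-in least-squares polynomial fit. The sketch below is not part of the original walkthrough: it regenerates the same synthetic data (seed 0) so that it runs on its own, and the names `slope` and `intercept` are introduced here only for illustration.

```python
import numpy as np

# Regenerate the same synthetic data used in the article (seed 0).
np.random.seed(0)
x = 2.5 * np.random.randn(100) + 1.5
res = 0.5 * np.random.randn(100)
Y = 2 + 0.3 * x + res

# Closed-form least-squares estimates, as in the article's code.
xmean, ymean = np.mean(x), np.mean(Y)
b1 = np.sum((x - xmean) * (Y - ymean)) / np.sum((x - xmean) ** 2)
b0 = ymean - b1 * xmean

# np.polyfit with deg=1 returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, Y, deg=1)

print(f'manual : b0 = {b0:.4f}, b1 = {b1:.4f}')
print(f'polyfit: b0 = {intercept:.4f}, b1 = {slope:.4f}')
```

Both lines should print matching values, close to the true parameters 2 and 0.3 used to generate the data.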
ypred = b0 + b1 * x
Output
# Plot regression against actual data
plt.figure(figsize=(12, 6))
plt.plot(x, ypred) # regression line
plt.plot(x, Y, 'ro') # scatter plot showing actual data
plt.title('Actual vs Predicted')
plt.xlabel('x')
plt.ylabel('Y')
plt.show()
Actual vs predicted plot
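Besides looking at the plot, one simple way to put a number on how closely the line follows the data is the mean squared residual. The sketch below is an optional addition, not a step from the original walkthrough: it rebuilds the same data and fit so that it runs standalone, and the names `residuals` and `mse` are chosen here for illustration.

```python
import numpy as np

# Same synthetic data and closed-form fit as in the article (seed 0).
np.random.seed(0)
x = 2.5 * np.random.randn(100) + 1.5
res = 0.5 * np.random.randn(100)
Y = 2 + 0.3 * x + res

xmean, ymean = np.mean(x), np.mean(Y)
b1 = np.sum((x - xmean) * (Y - ymean)) / np.sum((x - xmean) ** 2)
b0 = ymean - b1 * xmean
ypred = b0 + b1 * x

# Residuals are the vertical distances between actual and predicted values.
residuals = Y - ypred

# Mean squared error: the average of the squared residuals.
mse = np.mean(residuals ** 2)
print(f'MSE = {mse:.4f}')
```

Since the noise was generated with a standard deviation of 0.5, an MSE in the neighbourhood of 0.25 would suggest the line has captured most of the structure in the data.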
  1. Multiple linear regression is also used to predict future values of continuous data.
  2. The relationship should be linear, with multiple independent variables and a single dependent variable.
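A minimal sketch of multiple linear regression with two independent variables is shown below. The data, the true coefficients (1.0, 2.0, -0.5), and the variable names are all made up for illustration, and `np.linalg.lstsq` is just one of several ways to solve the least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two independent variables, one dependent variable.
n = 200
X = rng.normal(size=(n, 2))
noise = 0.1 * rng.normal(size=n)
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + noise

# Prepend a column of ones so the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ coef ≈ y.
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print('intercept, b1, b2 =', coef)
```

The recovered coefficients should land close to the values used to generate the data, because the noise level is small relative to the signal.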
  1. Polynomial regression is used to predict future values of continuous, non-linear data.
  2. The relationship should be non-linear, with one or more independent variables and a single dependent variable.
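To illustrate, the sketch below fits a degree-2 polynomial to synthetic curved data with a single independent variable. The data, the true coefficients, and the choice of degree are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical non-linear data: y = 1 - 2x + 0.5x^2 plus noise.
x_poly = np.linspace(-3, 3, 100)
y_poly = 1.0 - 2.0 * x_poly + 0.5 * x_poly**2 + 0.2 * rng.normal(size=x_poly.size)

# Fit a degree-2 polynomial; coefficients come back highest power first.
c2, c1, c0 = np.polyfit(x_poly, y_poly, deg=2)

# Evaluate the fitted curve at the same x values.
y_fit = np.polyval([c2, c1, c0], x_poly)

print(f'c0 = {c0:.3f}, c1 = {c1:.3f}, c2 = {c2:.3f}')
```

The model is still linear in its coefficients, so the same least-squares machinery applies. Only the inputs (x and x squared) are non-linear.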
  1. Multivariate regression is used to predict multiple response (dependent) variables from one or more predictor variables.
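As a rough sketch of the multivariate case, the example below predicts two response variables from two predictors in a single least-squares solve. The data and the coefficient matrix `B_true` are hypothetical, chosen only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: two predictors and two response variables.
n = 300
X = rng.normal(size=(n, 2))
X_design = np.column_stack([np.ones(n), X])  # intercept column first

# Rows: intercept, x1, x2. Columns: one per response variable.
B_true = np.array([[1.0, -1.0],
                   [2.0,  0.5],
                   [0.0,  3.0]])
Y = X_design @ B_true + 0.1 * rng.normal(size=(n, 2))

# lstsq accepts a 2-D right-hand side and fits every response column at once.
B_hat, *_ = np.linalg.lstsq(X_design, Y, rcond=None)

print(B_hat)
```

Each column of `B_hat` holds the intercept and slopes for one response, which is the same as fitting a separate multiple linear regression per response.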

“Don’t dream to join a tech giant company, dream to make a company gigantic.” — Sai

A machine learning researcher starting a new blog series to make ML concepts simple and easy for everyone.

sai krishna
