
ML | STUDENTS GRADE PREDICTION

Predict your percentage using machine learning

Machine learning, a subfield of artificial intelligence, is growing so rapidly that it is finding a place in almost every aspect of life. You have probably already come across its applications in image recognition, stock market trading, traffic prediction, product recommendation, online fraud detection, and more. It may not be long before machine learning is applied to many of the everyday problems in your life.

In this particular machine learning project, we are going to predict a student's percentage score based on the number of hours studied. We use linear regression to train our percentage prediction model.

First, we need to import the necessary libraries. After that, we import our dataset, which contains only two columns, Hours and Scores. Next, we have to check whether there are any missing values in our data, and then we find the correlation between our variables. The correlation analysis gives a coefficient of about 0.97, which is a very strong positive linear relationship: students who study more hours tend to score higher. Note that 0.97 measures how closely the points follow a straight line. It is not a 97% chance that a change in study hours will lead to a change in grade.
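The missing-value check and the correlation step can be sketched as below. This is only an illustration: the five rows are made-up stand-ins for the real dataset, and only the column names Hours and Scores come from the description above.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset (values are made up).
sample = pd.DataFrame({
    "Hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Scores": [12.0, 25.0, 29.0, 45.0, 50.0],
})

# Count the missing values in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Pearson correlation between the two columns (always between -1 and 1).
correlation = sample["Hours"].corr(sample["Scores"])
print(round(correlation, 3))
```

A value close to 1 means the points lie close to a rising straight line, which is what makes linear regression a reasonable choice here.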

Correlation and Causation

Although correlation helps us determine the degree of relationship between two or more variables, it tells us nothing about cause and effect. Correlation does not imply causation, and a causal relationship does not always show up as a strong linear correlation either. Let’s understand this better with examples.

The presence of more firemen at a fire signifies that the fire is big, but the fire is not caused by the firemen.

People who sleep with their shoes on are more likely to wake up with a headache. The shoes are not the cause; both may be due to alcohol intoxication.

 

Plot using Python

A simple scatter plot with hours studied on the x-axis and test scores on the y-axis shows that the score increases steadily as the hours studied increase. This suggests that there is a linear relationship between the two variables. When we fit a straight line through the points, most points do not lie exactly on it. The vertical distance between a point and the line is the error for that point.

The error can be positive or negative, depending on whether the point lies above or below the line.
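To make the sign of the error concrete, here is a small sketch. The three points and the line (slope 10, intercept 2) are made up for illustration and are not taken from the dataset.

```python
import numpy as np

# Made-up points and a made-up line, for illustration only.
hours = np.array([1.0, 2.0, 3.0])
actual = np.array([15.0, 20.0, 35.0])
m, c = 10.0, 2.0  # hypothetical slope and intercept

predicted = m * hours + c    # points on the line: 12, 22, 32
errors = actual - predicted  # positive above the line, negative below it
print(errors)
```

The first and third points sit above the line (error +3), while the second sits below it (error -2).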

The equation of the line is Y = mX + c, where Y is the predicted value for a given value of X.

m is the slope of the line: the change in Y divided by the change in X. It indicates how much the predicted Y increases with every unit increase in X.

c is the intercept, the point where the line crosses the Y-axis. It is a constant equal to the predicted value of Y when X is zero.
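For a single input variable, ordinary least squares (the method behind scikit-learn's LinearRegression) has a simple closed-form solution for m and c. The sketch below computes both by hand on made-up data and compares the result with NumPy's polyfit. The numbers are illustrative only and are not the dataset used in this post.

```python
import numpy as np

# Made-up data for illustration.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([12.0, 25.0, 29.0, 45.0, 50.0])

x_mean, y_mean = hours.mean(), scores.mean()

# Slope: how x and y vary together, divided by how much x varies.
m = np.sum((hours - x_mean) * (scores - y_mean)) / np.sum((hours - x_mean) ** 2)

# Intercept: chosen so the line passes through the point (x_mean, y_mean).
c = y_mean - m * x_mean

print(m, c)

# Cross-check with NumPy's degree-1 polynomial fit, which returns [slope, intercept].
print(np.polyfit(hours, scores, 1))
```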

Now we will train our model with the help of the scikit-learn library and find the values of the intercept (c) and the slope (m). Together the slope and intercept define the linear relationship between the two variables and can be used to predict the score for a given number of study hours. Say a new student is planning to study a total of 9.25 hours in preparation for the test. Putting the values into the line equation (m * X + c = Y) gives 9.77580339 * 9.25 + 2.48367341 = 92.91, which means a student studying 9.25 hours is predicted to score about 92.91.

Drawing a vertical line from 9.25 on the x-axis up to the fitted line, and then a horizontal line across to the y-axis, gives the same predicted score of 92.91. We can use the line equation to predict the score for any given number of hours of study.
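The same arithmetic can be checked in a few lines of Python. The slope and intercept are the values reported above; the helper name predict_score is just an illustrative choice.

```python
m = 9.77580339  # slope reported above
c = 2.48367341  # intercept reported above

def predict_score(hours):
    """Apply the line equation Y = m * X + c."""
    return m * hours + c

print(round(predict_score(9.25), 2))  # 92.91
```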

Performance of model

R-Squared for Goodness of Fit

The R-squared metric is one of the most popular ways of evaluating how well your model fits the data. The R-squared value gives the proportion of the variance in the dependent variable that is explained by the independent variable. For a linear regression evaluated on its own training data it lies between 0 and 1; a value closer to 1 indicates a better model fit.
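As a rough sketch of what the metric computes, R-squared is one minus the ratio of the unexplained variation to the total variation. The actual and predicted values below are made up for illustration.

```python
import numpy as np

# Made-up actual and predicted scores, for illustration only.
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 37.0])

ss_res = np.sum((actual - predicted) ** 2)      # variation the model leaves unexplained
ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation around the mean

r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

Here the unexplained variation is 26 out of a total of 500, so R-squared comes to 0.948.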

Root Mean Squared Error (RMSE)

This is the square root of the mean of the squared errors. RMSE indicates how close the predicted values are to the actual values; hence a lower RMSE value signifies that the model performance is good. One of the key properties of RMSE is that the unit will be the same as the target variable.
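Written out step by step, the calculation looks like this. The values are made up for illustration; they are not results from our model.

```python
import numpy as np

# Made-up actual and predicted scores, for illustration only.
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 37.0])

squared_errors = (actual - predicted) ** 2  # 4, 4, 9, 9
mse = squared_errors.mean()                 # mean of the squared errors
rmse = np.sqrt(mse)                         # back in the same unit as the scores
print(rmse)
```

The RMSE here is about 2.55, meaning the predictions are off by roughly two and a half score points on average.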

Python code

#Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

#now we will import our dataset
dataset=pd.read_csv('students_score.csv')
dataset.head(5) #first 5 rows of dataset

#we need to divide our data into input and output variable
hours=dataset.iloc[:,[0]] #input data(study hours)
score=dataset.iloc[:, [1]] #output data(student score)

#training the model
from sklearn.linear_model import LinearRegression
model=LinearRegression()
model.fit(hours,score)

#finding intercept and slope
print('Intercept C: ', model.intercept_)
print('Coefficient m: ', model.coef_)

#we have trained our model
#now we are going to predict the score
predicted_score=model.predict(hours)

#checking performance of model
from sklearn.metrics import r2_score, mean_squared_error
#both metrics expect the actual values first, then the predicted values
print('R-squared: ', r2_score(score, predicted_score))
print('Root mean squared error: ', np.sqrt(mean_squared_error(score, predicted_score)))

#visualizing our linear regression model
plt.scatter(hours,score, c='blue')
plt.plot(hours, predicted_score, c='black', linewidth=3)
plt.xlabel('Hours of study')
plt.ylabel('Student score')
plt.show()

#our model is performing well
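#now we will predict the score, if a student studies a total of 9.25 hours

#use a DataFrame with the same column name as the training input,
#so scikit-learn does not warn about missing feature names
test_hour=pd.DataFrame([[9.25]], columns=hours.columns)
test_score=model.predict(test_hour)
print(test_score)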


