
Predict Gas Consumption Using a Decision Tree for Regression

Project objective: to predict gas consumption from data on US states. Predictions like these can support decisions on climate change, government policy and more. Solving a regression problem with a decision tree in Scikit-Learn is very similar to solving a classification problem. For regression, however, we use the DecisionTreeRegressor class of the tree module, and the evaluation metrics differ from those used for classification. The rest of the process is almost the same as for other regression models.
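To make the "very similar to classification" point concrete, here is a minimal from-scratch sketch of a single regression split (a depth-1 tree, or stump). It assumes the squared-error criterion, which is scikit-learn's default for DecisionTreeRegressor: each candidate threshold is scored by the total squared error around the mean of each side, and each leaf predicts the mean of its training targets. The helpers `fit_stump` and `predict_stump` and the toy data are hypothetical, written only for illustration; the sketch checks its result against `DecisionTreeRegressor(max_depth=1)`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_stump(x, y):
    """Hypothetical helper: best single threshold on a 1-D feature under squared error.

    Returns (threshold, left_mean, right_mean).
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best = None
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # equal feature values cannot be separated by a threshold
        left, right = y_sorted[:i], y_sorted[i:]
        # Squared error when each side predicts its own mean
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            threshold = (x_sorted[i - 1] + x_sorted[i]) / 2  # midpoint between neighbours
            best = (sse, threshold, left.mean(), right.mean())
    _, threshold, left_mean, right_mean = best
    return float(threshold), float(left_mean), float(right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return np.where(x <= threshold, left_mean, right_mean)


# Made-up toy data: one feature, a clear jump in the target after x = 3
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_toy = np.array([10.0, 12.0, 11.0, 30.0, 31.0, 29.0])

stump = fit_stump(x_toy, y_toy)
print(stump)  # (3.5, 11.0, 30.0)

stump_tree = DecisionTreeRegressor(max_depth=1).fit(x_toy.reshape(-1, 1), y_toy)
print(stump_tree.tree_.threshold[0])  # 3.5
print(np.allclose(predict_stump(stump, x_toy), stump_tree.predict(x_toy.reshape(-1, 1))))  # True
```

A full tree simply repeats this search recursively inside each side; a classification tree runs the same search with a different impurity measure and predicts a class instead of a mean.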

Importing the libraries

First, we import the libraries this model needs (NumPy, Matplotlib and pandas).

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Importing the dataset

Now we read the CSV file petrol_consumption.csv. We will use this dataset to try to predict gas consumption (in millions of gallons) in 48 US states based on gas tax (in cents), per capita income (in dollars), paved highways (in miles) and the proportion of the population with a driver's license.

In [2]:

dataset = pd.read_csv('petrol_consumption.csv')

The dataset has 48 rows (one per US state) and 5 columns of information related to petrol consumption. We can use the dataframe's head method to see what the data actually looks like, and its shape attribute to confirm its size:

In [3]:

# Only the last expression in a notebook cell is echoed, so print the preview explicitly
print(dataset.head())
dataset.shape

Out[3]:

(48, 5)

To see statistical details of the dataset, execute the following command:

In [4]:

dataset.describe()

Out[4]:

	Petrol_tax	Average_income	Paved_Highways	Population_Driver_licence(%)	Petrol_Consumption
count	48.000000	48.000000	48.000000	48.000000	48.000000
mean	7.668333	4241.833333	5565.416667	0.570333	576.770833
std	0.950770	573.623768	3491.507166	0.055470	111.885816
min	5.000000	3063.000000	431.000000	0.451000	344.000000
25%	7.000000	3739.000000	3110.250000	0.529750	509.500000
50%	7.500000	4298.000000	4735.500000	0.564500	568.500000
75%	8.125000	4578.750000	7156.000000	0.595250	632.750000
max	10.000000	5342.000000	17782.000000	0.724000	968.000000
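describe() reports the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column. Two details are easy to miss: std is the sample standard deviation (divided by n - 1), and the quartiles are linearly interpolated between neighbouring values. The sketch below recomputes these statistics with NumPy for one small, made-up column and compares them with describe(); it does not use the petrol data.

```python
import numpy as np
import pandas as pd

# Made-up values, used only to show how describe() computes its statistics
values = pd.Series([5.0, 7.0, 7.5, 8.0, 10.0])
summary = values.describe()

manual = {
    "count": len(values),
    "mean": values.to_numpy().mean(),
    # Sample standard deviation: divide by (n - 1), i.e. ddof=1
    "std": values.to_numpy().std(ddof=1),
    "min": values.min(),
    # Linear interpolation between order statistics (NumPy's default percentile method)
    "25%": np.percentile(values, 25),
    "50%": np.percentile(values, 50),
    "75%": np.percentile(values, 75),
    "max": values.max(),
}

for name, value in manual.items():
    print(f"{name:>5}: describe={summary[name]:.4f} manual={value:.4f}")
```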

Preparing the Data

As with a classification task, in this section we divide the data into attributes and labels, and then into training and test sets. Execute the following commands to divide the data into attributes and labels:

In [5]:

X = dataset.drop('Petrol_Consumption', axis=1)
y = dataset['Petrol_Consumption']

Here the X variable contains all the columns of the dataset except 'Petrol_Consumption', which is the label. The y variable contains the values of the 'Petrol_Consumption' column. In other words, X holds the attribute set and y holds the corresponding labels.

Execute the following code to divide our data into training and test sets:

In [6]:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
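With test_size=0.2, scikit-learn rounds the number of test samples up, so the 48 rows become 10 test rows and 38 training rows (0.2 × 48 = 9.6, rounded up to 10); this matches the 10 rows in the comparison table further down. Because random_state=0 fixes the shuffle, repeating the call returns exactly the same split. The sketch checks both points on a stand-in frame with the same number of rows; the random data and the column names a and b are placeholders, not the petrol dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame with 48 rows, the same size as petrol_consumption.csv
rng = np.random.default_rng(42)
X_demo = pd.DataFrame(rng.normal(size=(48, 2)), columns=["a", "b"])
y_demo = pd.Series(rng.normal(size=48))

first = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
second = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

X_tr, X_te, y_tr, y_te = first
print(len(X_tr), len(X_te))  # 38 10

# Same random_state, same rows in the test set
print(first[1].index.equals(second[1].index))  # True
```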

Training and Making Predictions

As mentioned earlier, a regression task uses a different Scikit-Learn class than a classification task: here we use the DecisionTreeRegressor class instead of DecisionTreeClassifier.

To train the tree, we'll instantiate the DecisionTreeRegressor class and call the fit method:

In [7]:

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor()
regressor.fit(X_train, y_train)

Out[7]:

DecisionTreeRegressor(ccp_alpha=0.0, criterion='mse', max_depth=None,
                      max_features=None, max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, presort='deprecated',
                      random_state=None, splitter='best')
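The parameters printed above include max_depth=None, min_samples_split=2 and min_samples_leaf=1, so the tree keeps splitting until every leaf is pure. When no two training rows share identical feature values, the fitted tree therefore reproduces every training target exactly, which is why only the test-set error tells us how well it generalises. The sketch illustrates this on made-up data and contrasts it with a depth-limited tree; max_depth=3 is an arbitrary value chosen for illustration, not a tuned setting.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data: 40 rows, 4 continuous features, noisy linear target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4))
y_demo = X_demo @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=40)

full_tree = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_demo, y_demo)

# Unlimited depth: each training row ends up in its own leaf
print(np.allclose(full_tree.predict(X_demo), y_demo))  # True
print(full_tree.get_n_leaves())  # 40

# max_depth=3 allows at most 2**3 = 8 leaves, so training predictions are leaf averages
print(shallow_tree.get_n_leaves() <= 8)  # True
```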

To make predictions on the test set, use the predict method:

In [8]:

y_pred = regressor.predict(X_test)

Now let's compare some of our predicted values with the actual values and see how accurate we were:

In [9]:

df = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df

Out[9]:

	Actual	Predicted
29	534	547.0
4	410	414.0
26	577	574.0
30	571	554.0
32	577	631.0
37	704	644.0
34	487	628.0
40	587	540.0
7	467	414.0
10	580	464.0
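An error column makes the comparison easier to read. The sketch below rebuilds the ten rows shown above (values copied from the output, indexed by their original row labels) in a frame named comparison, chosen here so it does not overwrite df, and sorts the rows by absolute error; in the notebook you could add the same two columns directly to df.

```python
import pandas as pd

# Actual and predicted values copied from the Out[9] table above
comparison = pd.DataFrame(
    {
        "Actual": [534, 410, 577, 571, 577, 704, 487, 587, 467, 580],
        "Predicted": [547.0, 414.0, 574.0, 554.0, 631.0, 644.0, 628.0, 540.0, 414.0, 464.0],
    },
    index=[29, 4, 26, 30, 32, 37, 34, 40, 7, 10],
)

# Signed error (positive means the model overestimated) and its magnitude
comparison["Error"] = comparison["Predicted"] - comparison["Actual"]
comparison["AbsError"] = comparison["Error"].abs()

print(comparison.sort_values("AbsError", ascending=False).head(3))
```

For this split the largest misses are row 34 (overestimated by 141 million gallons) and row 10 (underestimated by 116).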

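The split itself is reproducible: because random_state=0 is passed to train_test_split, every run selects the same training and test rows. Your predictions may still differ slightly from the ones shown here, because DecisionTreeRegressor was created without a random_state, so ties between equally good splits can be broken differently from run to run.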
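scikit-learn randomly permutes the features at each split, even with splitter='best', so when two candidate splits improve the criterion equally the chosen one depends on the random state. Passing a fixed random_state to DecisionTreeRegressor makes the fitted tree repeatable. The sketch shows this on made-up data in which one column deliberately duplicates another, so exact ties are guaranteed; random_state=0 is an arbitrary choice.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up data where column 1 duplicates column 0, so many splits tie exactly
rng = np.random.default_rng(1)
base = rng.normal(size=(30, 1))
X_demo = np.hstack([base, base, rng.normal(size=(30, 1))])
y_demo = rng.normal(size=30)

# A fixed random_state fixes the feature permutation, and thus the fitted tree
tree_a = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)
tree_b = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# New points where columns 0 and 1 differ, so the choice between them matters
X_new = rng.normal(size=(5, 3))
print(np.array_equal(tree_a.predict(X_new), tree_b.predict(X_new)))  # True
print(np.array_equal(tree_a.tree_.feature, tree_b.tree_.feature))  # True
```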

Evaluating the Algorithm

To evaluate the performance of a regression algorithm, the most commonly used metrics are mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE). Scikit-Learn's metrics module provides functions to calculate these values:

In [10]:

from sklearn import metrics
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Mean Absolute Error: 50.8
Mean Squared Error: 4535.4
Root Mean Squared Error: 67.34537846058926
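The three metrics follow directly from their definitions: MAE is the mean of |actual - predicted|, MSE is the mean of the squared differences, and RMSE is the square root of MSE, which puts it back in the units of the target (millions of gallons). This sketch recomputes them with plain NumPy from the ten actual/predicted pairs in the comparison table and reproduces the values printed above.

```python
import numpy as np

# The ten actual/predicted pairs from the comparison table
actual = np.array([534, 410, 577, 571, 577, 704, 487, 587, 467, 580], dtype=float)
predicted = np.array([547, 414, 574, 554, 631, 644, 628, 540, 414, 464], dtype=float)

errors = actual - predicted
mae = np.mean(np.abs(errors))  # mean absolute error
mse = np.mean(errors ** 2)     # mean squared error
rmse = np.sqrt(mse)            # root mean squared error, same units as the target

print(f"MAE={mae:.1f} MSE={mse:.1f} RMSE={rmse:.2f}")  # MAE=50.8 MSE=4535.4 RMSE=67.35
```

Squaring weights large misses more heavily, which is why RMSE (67.3) is noticeably higher than MAE (50.8): the two largest errors, 141 and 116, account for most of the squared sum.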

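The mean absolute error of our algorithm is 50.8, which is less than 10 percent of the mean of the 'Petrol_Consumption' column (576.77). This suggests that the algorithm does a reasonable prediction job, although with only 10 test rows the estimate is rough, and the individual errors range widely (from 3 to 141).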
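The 10 percent comparison is a quick rule of thumb rather than a formal criterion. Using the mean from the describe() output and the MAE printed above, the arithmetic is:

```python
# Mean of Petrol_Consumption from describe() and the MAE reported above
mean_consumption = 576.770833
mae = 50.8

threshold = 0.10 * mean_consumption  # 10 percent of the mean
relative_mae = mae / mean_consumption

print(f"10% of mean = {threshold:.1f}, MAE = {mae}, relative MAE = {relative_mae:.1%}")
# 10% of mean = 57.7, MAE = 50.8, relative MAE = 8.8%
```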

Conclusion: We showed how to use Python's Scikit-Learn library to apply decision trees to regression tasks. Decision trees are a fairly simple algorithm, and the same technique serves both classification and regression: switching to regression only means using DecisionTreeRegressor and regression metrics.
