Predicting The Class Of Flower On The Basis Of Data In K-Nearest Neighbour Classification

The competition goal is to predict the flower name on the basis of the data provided in the dataset file (iris.csv). This file contains five columns: SepalLength, SepalWidth, PetalLength, PetalWidth and Name. The Name column is the one to be predicted, using K Nearest Neighbour (K-NN) classification.

We use K-NN because it can be applied to both classification and regression problems, although in industry it is more widely used for classification. To evaluate any technique we generally look at three important aspects:

  1. Ease of interpreting the output
  2. Calculation time
  3. Predictive power

Import Libraries

First, we import the libraries needed for this model: pandas and numpy, plus math and operator from the Python standard library.

In [1]:
import pandas as pd
import numpy as np
import math
import operator

Importing data

In [2]:
data = pd.read_csv('iris.csv')

Show the shape of the data and the first five rows of the iris.csv file.

In [3]:
print(data.head(5))
data.shape
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
Out[3]:
(150, 5)

Defining a function which calculates Euclidean distance between two data points

We calculate the distance between the test data and each row of the training data. Here we use Euclidean distance as our distance metric since it is one of the most commonly used. Other metrics that can be used include Chebyshev distance and cosine distance.

In [4]:
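def euclideanDistance(data1, data2, length):
    # data1: one-row DataFrame holding the test point (columns 0..length-1)
    # data2: one training row as a Series; .iloc reads its values by position
    distance = 0
    for x in range(length):
        distance += np.square(data1[x] - data2.iloc[x])
    return np.sqrt(distance)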
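
As a cross-check on the loop above, the same distance can be computed in a single vectorised NumPy expression. The sketch below is not part of the original notebook: the function names euclidean and chebyshev and the two sample points are chosen purely for illustration. It also shows Chebyshev distance, one of the alternative metrics mentioned earlier.

```python
import numpy as np

def euclidean(p, q):
    # Square root of the summed squared differences between coordinates
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

def chebyshev(p, q):
    # Largest absolute difference along any single coordinate
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.max(np.abs(p - q)))

# Two illustrative points: sepal length, sepal width, petal length, petal width
a = [5.1, 3.5, 1.4, 0.2]
b = [4.9, 3.0, 1.4, 0.2]

d_euclid = euclidean(a, b)
d_cheby = chebyshev(a, b)
print(d_euclid, d_cheby)
```

The two metrics can rank neighbours differently, because Euclidean distance accumulates the differences in every feature while Chebyshev distance only looks at the single largest one.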

Defining our KNN model

This function first calculates the Euclidean distance between the test data and each row of the training data. It then sorts the rows by distance, extracts the top k neighbours, and finally returns the most frequent class among those neighbours.

In [5]:
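def knn(trainingSet, testInstance, k):
    # Number of feature columns in the test point (4 for the iris data)
    length = testInstance.shape[1]

    # Distance from the test point to every training row, keyed by row position
    distances = {}
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet.iloc[x], length)
        distances[x] = dist.iloc[0]

    # Sort (row, distance) pairs by ascending distance
    sorted_d = sorted(distances.items(), key=operator.itemgetter(1))

    # Keep the row positions of the k closest training rows
    neighbors = []
    for x in range(k):
        neighbors.append(sorted_d[x][0])

    # Count how often each class label occurs among the neighbours
    classVotes = {}
    for x in range(len(neighbors)):
        # The class label is the last column; .iloc makes the positional lookup explicit
        response = trainingSet.iloc[neighbors[x]].iloc[-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1

    # Return the label with the most votes, plus the neighbour row positions
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return(sortedVotes[0][0], neighbors)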
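
The same steps (measure distances, sort, take the top k, vote) can also be written more compactly. The following is a minimal sketch under stated assumptions, not the notebook's own implementation: the name knn_predict, its argument layout (features and labels passed separately), and the toy data are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter
import numpy as np

def knn_predict(features, labels, query, k):
    # Euclidean distance from the query to every training row at once
    features = np.asarray(features, dtype=float)
    diffs = features - np.asarray(query, dtype=float)
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    # Positions of the k smallest distances (stable sort keeps ties in row order)
    nearest = np.argsort(dists, kind="stable")[:k]
    # Majority vote over the labels of those rows
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest.tolist()

# Toy training data, made up for this example (not taken from iris.csv)
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["small", "small", "large", "large", "large"]

label_k1, idx_k1 = knn_predict(X, y, [4.9, 5.0], k=1)
label_k3, idx_k3 = knn_predict(X, y, [4.9, 5.0], k=3)
print(label_k1, idx_k1)
print(label_k3, idx_k3)
```

With k = 1 the prediction rests on a single row, so one unusual training point can decide the outcome; a larger k lets several neighbours vote.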

Inputting Data

Here we enter the data for the flower to be predicted. The input consists of the flower's SepalLength, SepalWidth, PetalLength and PetalWidth.

In [6]:
testSet = [[7.2, 3.6, 5.1, 2.5]]
test = pd.DataFrame(testSet)

Setting number of neighbors = 1

Initialise k to 1 and run the model.

In [7]:
print('\n\nWith 1 Nearest Neighbour \n\n')
k = 1
# Running KNN model
result,neigh = knn(data, test, k)

With 1 Nearest Neighbour 


Predicted class

Here the flower is predicted by means of K-NN classification.

In [8]:
print('\nPredicted Class of the datapoint = ', result)
Predicted Class of the datapoint =  Iris-virginica

Nearest neighbor

We can also see the nearest neighbour of the data point. Because we set k = 1, we get a single data point here, identified by its row position in the training data.

In [9]:
print('\nNearest Neighbour of the datapoints = ',neigh)
Nearest Neighbour of the datapoints =  [141]
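
The neighbour list holds row positions, so the matching rows can be looked up with .iloc to see their measurements and class. The sketch below is an illustration only: the small toy table stands in for iris.csv (same column names, illustrative values), and the neigh list is set by hand rather than returned by knn.

```python
import pandas as pd

# Small stand-in table with the same column names as iris.csv (illustrative values)
toy = pd.DataFrame({
    "SepalLength": [5.1, 6.9, 6.0],
    "SepalWidth": [3.5, 3.1, 2.7],
    "PetalLength": [1.4, 5.1, 5.1],
    "PetalWidth": [0.2, 2.3, 1.6],
    "Name": ["Iris-setosa", "Iris-virginica", "Iris-versicolor"],
})

# Assume knn() returned these row positions as the nearest neighbours
neigh = [1]

# .iloc selects rows by position, matching how knn() numbers the training rows
neighbour_rows = toy.iloc[neigh]
print(neighbour_rows)
```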