
Python Flower Classification ¶

Using k-nn classification for predicting the class of flowers. ¶

The competition goal is to predict the flower name based on the data provided in the dataset file (iris.csv). This file contains five columns, i.e. SepalLength, SepalWidth, PetalLength, PetalWidth and Name, the last of which has to be predicted by K-Nearest Neighbour (K-NN) classification.

We use K-NN because it can be applied to both classification and regression predictive problems, though in industry it is more widely used for classification. To evaluate any technique we generally look at three important aspects:

1. Ease of interpreting the output
2. Calculation time
3. Predictive Power

Import Library ¶

First, we import the necessary libraries (pandas and numpy, plus the standard-library operator module) for this model.

In [1]:
import pandas as pd
import numpy as np
import operator


Importing data ¶

In [2]:
data = pd.read_csv('iris.csv')


Show the data shape and the first five rows of the iris.csv file. ¶

In [3]:
print(data.head(5))
data.shape

   SepalLength  SepalWidth  PetalLength  PetalWidth         Name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa

Out[3]:
(150, 5)

Defining a function which calculates euclidean distance between two data points ¶

Calculate the distance between the test data and each row of the training data. Here we use Euclidean distance as our distance metric, since it is the most popular choice; other metrics, such as Chebyshev or cosine distance, could be used instead.

In [4]:
def euclideanDistance(data1, data2, length):
    # Sum the squared differences over the first `length` features
    distance = 0
    for x in range(length):
        distance += np.square(data1[x] - data2[x])
    return np.sqrt(distance)


Defining our KNN model ¶

In this function we first calculate the Euclidean distance between the test instance and each row of the training data. We then sort the rows by distance, extract the top k neighbours, and finally determine the most frequent class among those neighbours.

In [5]:
def knn(trainingSet, testInstance, k):

    distances = {}

    length = testInstance.shape[1]

    # Calculate the Euclidean distance between the test instance and each training row
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet.iloc[x], length)
        distances[x] = dist[0]

    # Sort the rows on the basis of distance
    sorted_d = sorted(distances.items(), key=operator.itemgetter(1))

    # Extract the indices of the top k neighbours
    neighbors = []
    for x in range(k):
        neighbors.append(sorted_d[x][0])

    # Count the class votes among the neighbours
    classVotes = {}
    for x in range(len(neighbors)):
        response = trainingSet.iloc[neighbors[x], -1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1

    # Return the most frequent class and the neighbour indices
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0], neighbors


Inputting Data ¶

Here we input the data for the flower to be predicted; this input consists of the SepalLength, SepalWidth, PetalLength and PetalWidth of the flower.

In [6]:
testSet = [[7.2, 3.6, 5.1, 2.5]]
test = pd.DataFrame(testSet)


Setting number of neighbors = 1¶

Initialise the value of k, i.e. k = 1.

In [7]:
print('\n\nWith 1 Nearest Neighbour \n\n')
k = 1
# Running KNN model
result,neigh = knn(data, test, k)


With 1 Nearest Neighbour



Predicted class¶

Here the class of the flower is predicted by means of K-NN classification.

In [8]:
print('\nPredicted Class of the datapoint = ', result)

Predicted Class of the datapoint =  Iris-virginica


Nearest neighbor¶

We can see the nearest neighbour of the data point; since we set k = 1, we get a single neighbour index here.

In [9]:
print('\nNearest Neighbour of the datapoints = ',neigh)

Nearest Neighbour of the datapoints =  [141]