• For any query, contact us at
  • +91-9872993883
  • +91-8283824812
  • info@ris-ai.com

Hello guys! Here we will work with some basic operations of pandas. We will learn the pandas operations in modules, and this is the first one: in it we go through how a DataFrame is created, how it is read, and how we can apply various operations on rows and columns. The rest we will discuss in another module.

Pandas DataFrame Operation 1

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In other words, the data is aligned in a tabular fashion in rows and columns, each column can hold a different data type, and rows and columns can be added or removed. A DataFrame consists of three principal components: the data, the rows and the columns.
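To make the three components concrete, here is a minimal sketch that pulls each one out of a DataFrame with `to_numpy()`, `.index` and `.columns`. The two-row frame is illustrative data, not part of the tutorial's example.

```python
import pandas as pd

# A small DataFrame built from a dict of lists (illustrative data)
df = pd.DataFrame({"Name": ["ram", "sham"], "Age": [20, 21]})

# 1. The data: the underlying values as a 2-D NumPy array
values = df.to_numpy()
# 2. The rows: the row labels (here the default RangeIndex 0, 1)
rows = df.index
# 3. The columns: the column labels
columns = df.columns

print(values)
print(list(rows))     # [0, 1]
print(list(columns))  # ['Name', 'Age']
print(df.shape)       # (2, 2) -> (number of rows, number of columns)
```

Because the columns hold different types (strings and integers), the combined array has `object` dtype; each individual column keeps its own dtype inside the DataFrame.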

In [3]:
import pandas as pd
 
# Initialise a dict of lists (one list per column).
data = {'Name':['ram', 'sham', 'alpha', 'gamma'],
        'Age':[20, 21, 19, 18],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
    Name  Age    Address Qualification
0    ram   20      Delhi           Msc
1   sham   21     Kanpur            MA
2  alpha   19  Allahabad           MCA
3  gamma   18    Kannauj           Phd
In [6]:
# Write the DataFrame to a CSV file; by default the integer index is also
# written, as an unnamed first column.
df.to_csv("name.csv")

Column Selection

Column Selection: To select columns from a pandas DataFrame, we access them by their column names. Passing a list of names inside the brackets returns a new DataFrame that contains only those columns.

In [4]:
df[['Name', 'Qualification']]
Out[4]:
    Name Qualification
0    ram           Msc
1   sham            MA
2  alpha           MCA
3  gamma           Phd
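The number of brackets matters when selecting columns. This small sketch (reusing two of the tutorial's columns) shows that a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["ram", "sham", "alpha", "gamma"],
    "Age": [20, 21, 19, 18],
})

# Single brackets with one label return a Series (one column)
ages = df["Age"]
# Double brackets (a list of labels) return a DataFrame, even for one column
ages_frame = df[["Age"]]

print(type(ages).__name__)        # Series
print(type(ages_frame).__name__)  # DataFrame
# A column whose name is a valid identifier can also be read as an attribute
print(df.Age.equals(df["Age"]))   # True
```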

Row Selection

Pandas provides the DataFrame.loc[] indexer to retrieve rows from a DataFrame by their index label. Rows can also be selected by integer position with the DataFrame.iloc[] indexer.

In [8]:
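# Read the CSV back, using the Name column as the row index.
# "Unnamed: 0" is the old integer index that to_csv() wrote by default.
df = pd.read_csv("name.csv", index_col="Name")
# .loc selects a row by its index label
first = df.loc["ram"]
second = df.loc["gamma"]
print(first, "\n\n\n", second)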
Unnamed: 0           0
Age                 20
Address          Delhi
Qualification      Msc
Name: ram, dtype: object


 Unnamed: 0             3
Age                   18
Address          Kannauj
Qualification        Phd
Name: gamma, dtype: object
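`.loc[]` accepts more than a single label. The sketch below builds the same table in memory with Name as the index (mirroring `index_col="Name"` above, without the `Unnamed: 0` column) and selects several rows, a row/column combination, a label slice, and a boolean condition.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [20, 21, 19, 18],
        "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
        "Qualification": ["Msc", "MA", "MCA", "Phd"],
    },
    index=pd.Index(["ram", "sham", "alpha", "gamma"], name="Name"),
)

# Several rows at once: pass a list of index labels
some_rows = df.loc[["ram", "gamma"]]
# Rows and a column together: .loc[rows, columns]
ages = df.loc[["ram", "gamma"], "Age"]
# Label slices include BOTH endpoints, unlike Python slices
first_three = df.loc["ram":"alpha"]
# A boolean condition keeps only the rows where it is True
older_than_19 = df.loc[df["Age"] > 19, ["Address", "Qualification"]]

print(some_rows)
print(ages.tolist())                  # [20, 18]
print(len(first_three))               # 3
print(older_than_19.index.tolist())   # ['ram', 'sham']
```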

Indexing a DataFrame using .iloc[]

The .iloc[] indexer retrieves rows and columns by position. To use it, we specify the integer positions of the rows we want and, optionally, the positions of the columns we want as well. df.iloc is very similar to df.loc, but it selects only by integer position rather than by label.

In [10]:
row2 = df.iloc[2]
row2
Out[10]:
Unnamed: 0               2
Age                     19
Address          Allahabad
Qualification          MCA
Name: alpha, dtype: object
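Selecting rows and columns together by position looks like this. A minimal sketch on a three-column version of the tutorial's table:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["ram", "sham", "alpha", "gamma"],
    "Age": [20, 21, 19, 18],
    "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
})

# .iloc[row_positions, column_positions]: rows 0 and 2, columns 0 and 1
subset = df.iloc[[0, 2], [0, 1]]
# Integer slices exclude the end position, as in plain Python
first_two = df.iloc[0:2]
# Negative positions count from the end
last_row = df.iloc[-1]
# A single cell by (row, column) position
cell = df.iloc[1, 2]

print(subset)
print(len(first_two))    # 2
print(last_row["Name"])  # gamma
print(cell)              # Kanpur
```

Note the contrast with `.loc[]`: a label slice such as `df.loc["ram":"alpha"]` includes its end label, while a position slice such as `df.iloc[0:2]` stops before position 2.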

Working with Missing Data

Missing data occurs when no information is provided for one or more items, or for a whole unit. It is a very common problem in real-life datasets. In pandas, missing data is referred to as NA (Not Available) values, and in numeric columns it usually appears as NaN (Not a Number).
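As a quick illustration, pandas treats both Python's `None` and NumPy's `np.nan` as missing. In this sketch `isna()` is used, which is an alias of `isnull()`.

```python
import numpy as np
import pandas as pd

# Both None and np.nan are treated as missing by pandas
s = pd.Series([1.0, None, np.nan, 4.0])
print(s)                    # None is stored as NaN in a float Series
print(s.isna().tolist())    # [False, True, True, False]

# Missing values are skipped by default in reductions such as mean()
print(s.mean())             # 2.5
```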

Checking for missing values using isnull() and notnull()

In [24]:
import numpy as np

scores = {'First': [100, 90, np.nan, 95, 89, 0, 100, np.nan],
          'Second': [30, 45, 56, np.nan, 1, 40, np.nan, 70],
          'Third': [np.nan, 40, 80, 98, np.nan, np.nan, 13, 55]}

# Create a DataFrame from the dict of lists (np.nan marks a missing value)
df = pd.DataFrame(scores)
df
Out[24]:
   First  Second  Third
0  100.0    30.0    NaN
1   90.0    45.0   40.0
2    NaN    56.0   80.0
3   95.0     NaN   98.0
4   89.0     1.0    NaN
5    0.0    40.0    NaN
6  100.0     NaN   13.0
7    NaN    70.0   55.0
In [25]:
df.isnull()
Out[25]:
   First  Second  Third
0  False   False   True
1  False   False  False
2   True   False  False
3  False    True  False
4  False   False   True
5  False   False   True
6  False    True  False
7   True   False  False
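The heading also mentions `notnull()`, the element-wise inverse of `isnull()`. A short sketch on the same scores table shows it, together with two common follow-ups: counting missing cells per column and keeping only fully complete rows.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "First": [100, 90, np.nan, 95, 89, 0, 100, np.nan],
    "Second": [30, 45, 56, np.nan, 1, 40, np.nan, 70],
    "Third": [np.nan, 40, 80, 98, np.nan, np.nan, 13, 55],
})

# notnull() is the element-wise inverse of isnull()
present = scores.notnull()
print(present.equals(~scores.isnull()))   # True

# Summing booleans counts the True values, i.e. missing cells per column
missing_per_column = scores.isnull().sum()
print(missing_per_column.tolist())        # [2, 2, 3]

# Rows that have no missing value in any column
complete_rows = scores[scores.notnull().all(axis=1)]
print(complete_rows.index.tolist())       # [1]
```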

Filling missing values using fillna(), replace() and interpolate()

To fill null values in a dataset, we use the fillna(), replace() and interpolate() functions. Each of them replaces NaN values with some other value: fillna() fills them with a value we supply, and replace() substitutes one specified value (such as np.nan) with another.

The interpolate() function also fills NA values in a DataFrame, but instead of a hard-coded value it estimates each missing value with an interpolation technique (linear interpolation by default).
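The difference between a hard-coded fill and interpolation is easiest to see on a tiny Series with a gap in the middle (illustrative values, not from the tutorial's table):

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, np.nan, np.nan, 40.0])

# fillna() hard-codes the replacement value
print(s.fillna(0).tolist())        # [10.0, 0.0, 0.0, 40.0]

# interpolate() estimates each gap from its neighbours; the default
# method="linear" treats the values as equally spaced points on a line
print(s.interpolate().tolist())    # [10.0, 20.0, 30.0, 40.0]
```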

In [26]:
df.fillna(0)
Out[26]:
   First  Second  Third
0  100.0    30.0    0.0
1   90.0    45.0   40.0
2    0.0    56.0   80.0
3   95.0     0.0   98.0
4   89.0     1.0    0.0
5    0.0    40.0    0.0
6  100.0     0.0   13.0
7    0.0    70.0   55.0
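Beyond a single constant, the other fill options look like this. A minimal sketch on a four-row slice of the scores (the choice of -99 as a sentinel and of the column mean are illustrative, not recommendations from the text):

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
})

# replace() swaps one specific value for another; here every NaN becomes -99
replaced = scores.replace(np.nan, -99)

# fillna() also accepts a dict of per-column fill values
per_column = scores.fillna({"First": scores["First"].mean(), "Second": 0})

# ffill() propagates the last valid value forward down each column
forward = scores.ffill()

print(replaced["First"].tolist())    # [100.0, 90.0, -99.0, 95.0]
print(per_column["First"].tolist())  # [100.0, 90.0, 95.0, 95.0]
print(forward["Second"].tolist())    # [30.0, 45.0, 56.0, 56.0]
```

The mean used for `First` is (100 + 90 + 95) / 3 = 95.0, because `mean()` skips the missing value.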

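Dropping missing values using dropna()

To drop null values from a DataFrame, we use the dropna() function. It can drop rows or columns containing null values in different ways: by default it drops every row that contains at least one NaN, axis=1 drops columns instead, and how="all" drops a row (or column) only if all of its values are missing.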

In [28]:
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)
print(df)
df.dropna()
   First Score  Second Score  Third Score  Fourth Score
0        100.0          30.0           52           NaN
1         90.0           NaN           40           NaN
2          NaN          45.0           80           NaN
3         95.0          56.0           98          65.0
Out[28]:
   First Score  Second Score  Third Score  Fourth Score
3         95.0          56.0           98          65.0
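Since the default `dropna()` keeps only row 3 here, it is worth seeing the other ways it can drop data. This sketch runs several `dropna()` variants on the same table:

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, np.nan, 45, 56],
    "Third Score": [52, 40, 80, 98],
    "Fourth Score": [np.nan, np.nan, np.nan, 65],
})

# Default: drop every row that contains at least one NaN
print(scores.dropna().index.tolist())           # [3]

# axis=1 drops columns that contain any NaN instead of rows
print(scores.dropna(axis=1).columns.tolist())   # ['Third Score']

# how="all" drops a row only if every value in it is NaN (none here)
print(len(scores.dropna(how="all")))            # 4

# thresh=3 keeps rows with at least 3 non-missing values
print(scores.dropna(thresh=3).index.tolist())   # [0, 3]

# subset= restricts the NaN check to the listed columns
print(scores.dropna(subset=["First Score"]).index.tolist())  # [0, 1, 3]
```

Each call returns a new DataFrame and leaves `scores` unchanged.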