Getting Started with the most popular Data Science Library: Pandas, Lecture 4

DataFrames

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In other words, the data is aligned in a tabular fashion in rows and columns. A DataFrame consists of three principal components: the data, the rows, and the columns.
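
The three components can be inspected directly on any DataFrame. The following is a minimal sketch; the sample values and the row labels r1 and r2 are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data and row labels, used only for illustration
df = pd.DataFrame(
    {"Name": ["Jai", "Princi"], "Age": [27, 24]},
    index=["r1", "r2"],
)

print(df.to_numpy())      # the data: a 2-D array of values
print(list(df.index))     # the row labels
print(list(df.columns))   # the column labels
print(df.shape)           # (number of rows, number of columns)
```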

We will take a brief look at the basic operations that can be performed on a Pandas DataFrame:

• Creating a DataFrame
• Dealing with Rows and Columns
• Indexing and Selecting Data
• Working with Missing Data
• Iterating over rows and columns 

Creating a Pandas DataFrame
In the real world, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, a dictionary, a list of dictionaries, and so on. Here are some of the ways to create one:

A DataFrame can be created using a single list or a list of lists.

# import the pandas package
import pandas as pd
# list of strings (named words to avoid shadowing the built-in list)
words = ['OrangeTree', 'Global', 'institute', 'is', 'centre', 'for', 'Geeks']
# Calling DataFrame constructor on the list
df = pd.DataFrame(words)
print(df)
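
The other constructions mentioned above, a list of lists and a list of dictionaries, might look like the sketch below. The variable names and sample values are assumptions made for illustration.

```python
import pandas as pd

# List of lists: each inner list becomes one row;
# column names are supplied separately
rows = [["Jai", 27], ["Princi", 24], ["Gaurav", 22]]
df_from_lists = pd.DataFrame(rows, columns=["Name", "Age"])

# List of dictionaries: keys become column names;
# a key missing from one dictionary yields NaN in that row
records = [{"Name": "Jai", "Age": 27}, {"Name": "Anuj"}]
df_from_dicts = pd.DataFrame(records)

print(df_from_lists)
print(df_from_dicts)
```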

Dealing with Rows and Columns
Because a DataFrame arranges data in rows and columns, we can perform basic operations on either axis, such as selecting, deleting, adding, and renaming.
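
As a rough sketch of the adding, renaming, and deleting operations, assuming a small made-up DataFrame (the column name City is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi"], "Age": [27, 24]})

# Add a column by assigning a list of matching length
df["Address"] = ["Delhi", "Kanpur"]

# Rename a column; rename() returns a new DataFrame by default
df = df.rename(columns={"Address": "City"})

# Delete a column; drop() also returns a new DataFrame
df = df.drop(columns=["Age"])

# Delete a row by its index label
df = df.drop(index=[0])

print(df)
```

Reassigning the result, rather than modifying in place, keeps each step explicit and easy to follow.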
Column Selection: To select a column in a Pandas DataFrame, we access it by its column name. Passing a list of names selects several columns at once.

# Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)
# select two columns by passing a list of column names
print(df[['Name', 'Qualification']])

Output:

     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd


Row Selection: Pandas provides dedicated indexers for retrieving rows from a DataFrame. DataFrame.loc[] retrieves rows by index label, as in the example below. Rows can also be selected by integer position with DataFrame.iloc[].
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving rows by label with loc
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)

Each call returns a Series, because only a single row label was passed each time.
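
To compare label-based and position-based row selection without the CSV file, here is a small sketch. The inline DataFrame is a stand-in that reuses a few of the employee values from earlier and is an assumption, not the NBA data.

```python
import pandas as pd

# Small stand-in for the CSV data (assumed values, not the NBA file)
data = pd.DataFrame(
    {"Age": [27, 24, 22], "Address": ["Delhi", "Kanpur", "Allahabad"]},
    index=pd.Index(["Jai", "Princi", "Gaurav"], name="Name"),
)

by_label = data.loc["Princi"]   # row whose index label is "Princi"
by_position = data.iloc[1]      # second row, counting from 0

# Both expressions pick out the same row
print(by_label.equals(by_position))
```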

Indexing and Selecting Data
Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also known as subset selection.
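
The three kinds of subset described above can be sketched as follows, assuming a small DataFrame with the default integer index (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
    "Age": [27, 24, 22, 32],
    "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
})

# All of the rows, some of the columns
all_rows_some_cols = df.loc[:, ["Name", "Age"]]

# Some of the rows, all of the columns
some_rows_all_cols = df.loc[[0, 2], :]

# Some of the rows and some of the columns
some_of_each = df.loc[[0, 2], ["Name", "Address"]]

print(all_rows_some_cols.shape, some_rows_all_cols.shape, some_of_each.shape)
```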


Indexing a DataFrame using .loc[]:
The .loc[] indexer selects data by the labels of the rows and columns. It works differently from the plain indexing operator []: it can select subsets of rows, subsets of columns, or both simultaneously.
Selecting a single row
To select a single row using .loc[], we pass a single row label to it.

# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving rows by label with loc
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)

Each call returns a Series, because only a single row label was passed each time.
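
The return type depends on what is passed to .loc[]: a single label gives a Series, while a list of labels gives a DataFrame. A minimal sketch, using an assumed inline DataFrame in place of the CSV file:

```python
import pandas as pd

# Assumed stand-in data indexed by name
data = pd.DataFrame(
    {"Age": [27, 24, 22]},
    index=["Jai", "Princi", "Gaurav"],
)

one = data.loc["Jai"]                    # single label -> Series
one_as_frame = data.loc[["Jai"]]         # list with one label -> DataFrame
several = data.loc[["Jai", "Gaurav"]]    # list of labels -> DataFrame

print(type(one).__name__, type(one_as_frame).__name__, several.shape)
```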
Indexing a DataFrame using iloc[]:
The .iloc[] indexer allows us to retrieve rows and columns by position. To do that, we specify the positions of the rows that we want and the positions of the columns that we want as well. It is very similar to .loc[] but uses only integer positions to make its selections.
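
A short sketch of selecting by row and column position together, assuming a small made-up DataFrame. Note that positions start at 0 and that slices exclude their end position.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jai", "Princi", "Gaurav", "Anuj"],
    "Age": [27, 24, 22, 32],
    "Address": ["Delhi", "Kanpur", "Allahabad", "Kannauj"],
})

cell = df.iloc[1, 0]            # second row, first column
block = df.iloc[0:2, [0, 2]]    # rows 0 and 1 (end excluded), columns 0 and 2
last_row = df.iloc[-1]          # negative positions count from the end

print(cell)
print(block)
print(last_row)
```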

Selecting a single row
To select a single row using .iloc[], we pass a single integer position to it.

import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving a row by position with iloc;
# position 3 is the fourth row, because counting starts at 0
fourth_row = data.iloc[3]
print(fourth_row)

