Getting Started with the most popular Data Science Library: Pandas, Lecture 5

Different applications and functions of Pandas

Pandas is one of the most widely used tools for data munging. It provides high-level data structures and manipulation tools designed to make data analysis fast and easy. In this module, we discuss the most frequently used pandas features, using the olive oil data set throughout. So, let’s get started!
Below we walk through several common applications of the Pandas library for Python.

1) Loading Data

“The Olive Oils data set has eight explanatory variables (levels of fatty acids in the oils) and nine classes (areas of Italy)”. We start by importing the numpy, pandas and matplotlib modules.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

We use pd.read_csv to load the olive oil data set from ‘olive.csv’ into a DataFrame. The head method returns the first n rows of that DataFrame; here we return the first 5 rows.
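The loading step can be sketched as follows. Because ‘olive.csv’ is not reproduced here, this sketch reads a small in-memory stand-in instead; the column names and all values in it are made up for illustration and are not the real data.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'olive.csv': column names and values are invented.
sample_csv = io.StringIO(
    "area_name,palmitic,oleic\n"
    "1.North-Apulia,1075,7823\n"
    "1.North-Apulia,1088,7709\n"
    "2.Calabria,1315,7037\n"
    "2.Calabria,1321,7067\n"
    "3.South-Apulia,1410,6885\n"
    "3.South-Apulia,1392,6913\n"
)

# With the real file this would be: olive_oil = pd.read_csv('olive.csv')
olive_oil = pd.read_csv(sample_csv)

# head(n) returns the first n rows; n defaults to 5 when omitted.
first_rows = olive_oil.head(5)
print(first_rows)
```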

3) Map

One thing we want to do is clean the area_Idili column by removing the numbers. We use the map method for this: map applies a function to every element of a column (a Series). Here the function applied to area_Idili calls split on each string. split returns a list, and indexing that list with -1 returns its last element.
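A minimal sketch of how split and map fit together is shown here. It assumes the area strings carry a numeric prefix separated by a dot (for example "1.North-Apulia"); both the separator and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical area labels with a numeric prefix; the "." separator is assumed.
areas = pd.Series(["1.North-Apulia", "2.Calabria", "3.South-Apulia"])

# split returns a list of the pieces around the separator.
parts = "1.North-Apulia".split(".")
print(parts)        # ['1', 'North-Apulia']
print(parts[-1])    # North-Apulia (index -1 is the last element)

# map applies the same function to every element of the Series.
cleaned = areas.map(lambda s: s.split(".")[-1])
print(cleaned.tolist())
```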

4) Apply and Apply Map

We have a list of acids called list_of_acids. apply is a flexible method that applies a function along either axis of the DataFrame. Here we use apply to divide every value in the acid columns by 100.

list_of_acids = ['palmitic', 'palmitoleic', 'stearic', 'oleic', 'linoleic', 'linolenic', 'arachidic', 'eicosenoic']

df = olive_oil[list_of_acids].apply(lambda x: x / 100.00)
df.head(5)

Similar to apply, the applymap method applies a function to a DataFrame, but it works element-wise.

Summing up: apply works on a row/column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
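The three behaviours in the summary can be compared side by side in a short sketch. The tiny DataFrame below uses invented values. Note also that recent pandas releases (2.1 and later) deprecate DataFrame.applymap in favour of DataFrame.map, so the sketch picks whichever is available.

```python
import pandas as pd

# Made-up values, just large enough to show the difference in behaviour.
df = pd.DataFrame({"palmitic": [1075, 1088], "oleic": [7823, 7709]})

# apply: the function receives a whole column (axis=0) or a whole row (axis=1).
col_means = df.apply(lambda col: col.mean())
row_sums = df.apply(lambda row: row.sum(), axis=1)

# applymap: the function receives one cell at a time, across the DataFrame.
# pandas >= 2.1 renames this to DataFrame.map, so use it when it exists.
def to_percent(value):
    return value / 100.0

if hasattr(pd.DataFrame, "map"):
    scaled = df.map(to_percent)
else:
    scaled = df.applymap(to_percent)

# map: the function receives one element at a time, on a single Series.
palmitic_scaled = df["palmitic"].map(to_percent)

print(col_means, row_sums, scaled, palmitic_scaled, sep="\n\n")
```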

5) Shape and Columns

The shape property returns a tuple describing the dimensions of the data frame: (number of rows, number of columns).

olive_oil.columns gives you the column labels.
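As a quick illustration of both properties, here is a sketch on a small DataFrame with invented values (the real olive oil frame would report its own dimensions and labels):

```python
import pandas as pd

# Invented three-row frame standing in for olive_oil.
df = pd.DataFrame({
    "region": [1, 1, 2],
    "palmitic": [1075, 1088, 1315],
    "oleic": [7823, 7709, 7037],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape

# columns is an Index of labels; list() turns it into a plain Python list.
column_labels = list(df.columns)

print(n_rows, n_cols)
print(column_labels)
```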

6) Plotting

You can plot a histogram with the plt.hist function, for example plt.hist(olive_oil.palmitic).

You can also generate subplots from a pandas data frame. Here we generate 4 different subplots for the palmitic and linolenic columns. You set the size of the figure with the figsize argument; nrows and ncols are the number of rows and columns in the subplot grid.
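One way to lay out four subplots for these two columns is sketched here. The choice of the four plot types is an assumption, and the data is randomly generated rather than taken from the olive oil file. The Agg backend is selected only so the sketch runs without a display; in a notebook with %matplotlib inline that line is unnecessary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the two olive oil columns (values are random).
rng = np.random.default_rng(0)
olive_oil = pd.DataFrame({
    "palmitic": rng.normal(1200, 150, 50),
    "linolenic": rng.normal(30, 10, 50),
})

# A 2x2 grid of axes on a 10x10 inch figure.
fig, axes = plt.subplots(figsize=(10, 10), nrows=2, ncols=2)

axes[0, 0].plot(olive_oil.palmitic, olive_oil.linolenic)        # line plot
axes[0, 1].plot(olive_oil.palmitic, olive_oil.linolenic, ".")   # point markers
axes[1, 0].scatter(olive_oil.palmitic, olive_oil.linolenic)     # scatter plot
axes[1, 1].hist(olive_oil.palmitic)                             # histogram

fig.tight_layout()
```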

To apply your own or another library’s functions to Pandas objects, you should be aware of three important methods, discussed below. The appropriate method depends on whether your function expects to operate on an entire DataFrame, row- or column-wise, or element-wise.

  • Table wise Function Application: pipe()
  • Row or Column Wise Function Application: apply()
  • Element wise Function Application: applymap()

Table-wise Function Application

Custom operations can be performed by passing a function, along with any additional arguments it needs, to pipe. The function receives the whole DataFrame as its first argument, so the operation is performed on the entire DataFrame.
For example, let us add the value 2 to all the elements in the DataFrame.
The adder function takes two numeric values as parameters and returns their sum.
def adder(ele1, ele2):
    return ele1 + ele2
We will now use the custom function to perform the operation on the DataFrame.
df = pd.DataFrame(np.random.randn(5, 3), columns=['col1', 'col2', 'col3'])
df.pipe(adder, 2)
Its output is as follows:

        col1       col2       col3
0   2.176704   2.219691   1.509360
1   2.222378   2.422167   3.953921
2   2.241096   1.135424   2.696432
3   2.355763   0.376672   1.182570
4   2.308743   2.714767   2.130288
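Since the random input above makes the result hard to check by eye, here is a sketch of the same pipe call on a fixed DataFrame. It also chains a second step to show why pipe is convenient; the scale helper and its factor parameter are hypothetical names introduced only for this illustration.

```python
import numpy as np
import pandas as pd

def adder(ele1, ele2):
    """Return the sum of the two arguments."""
    return ele1 + ele2

def scale(frame, factor=1):
    """Hypothetical second step: multiply every element by factor."""
    return frame * factor

# Fixed input so the result is easy to verify.
df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["col1", "col2"])

# df.pipe(adder, 2) calls adder(df, 2): the DataFrame is the first argument.
result = df.pipe(adder, 2)

# pipe calls can be chained, reading left to right.
chained = df.pipe(adder, 2).pipe(scale, factor=10)

print(result)
print(chained)
```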

Row or Column Wise Function Application

Arbitrary functions can be applied along the axes of a DataFrame using the apply() method, which, like the descriptive statistics methods, takes an optional axis argument. (Older pandas versions also offered this on the Panel type, which has since been removed from the library.) By default, the operation is performed column-wise, taking each column as an array-like.

Example 1
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), columns=['col1', 'col2', 'col3'])
# Apply np.mean to each column (the default, axis=0)
print(df.apply(np.mean))

Its output is as follows:

col1   -0.288022
col2    1.044839
col3   -0.187009
dtype: float64
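To make the role of the axis argument concrete, the following sketch applies functions both column-wise and row-wise to a small fixed DataFrame (the values are chosen for easy checking and are not from the example above).

```python
import numpy as np
import pandas as pd

# Fixed values so the means can be verified by hand.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": [4.0, 5.0, 6.0]})

# Default axis=0: the function receives each column in turn.
col_means = df.apply(np.mean)

# axis=1: the function receives each row in turn.
row_means = df.apply(np.mean, axis=1)

# Any function that accepts a Series works, for example a per-column range.
col_range = df.apply(lambda col: col.max() - col.min())

print(col_means)
print(row_means)
print(col_range)
```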

 
