Python - Pandas

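This post is a short introduction to pandas: creating DataFrames and Series, reading CSV files, and indexing, selecting, and assigning data.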

Hello Pandas!

DataFrame

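import pandas as pd
from IPython.display import display  # available automatically in Jupyter; needed when run as a script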

sales = {'Product A': [80, 100, 120], 'Product B': [45, 50, 55]}

df = pd.DataFrame(sales)
display(df)

# Same data, this time with the years as row labels
sales = {'Product A': [80, 100, 120], 'Product B': [45, 50, 55]}

df = pd.DataFrame(sales, index=[2018, 2019, 2020])
display(df)

Series

sales_A = pd.Series([80, 100, 120])
display(sales_A)

print()
sales_A = pd.Series([80, 100, 120], index=[2018, 2019, 2020])
display(sales_A)

print()
sales_A = pd.Series([80, 100, 120], index=[2018, 2019, 2020], name='Product_A')
display(sales_A)
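
Each column of a DataFrame is itself a Series, and the Series name becomes the column label. The following is a minimal sketch of that relationship, reusing the sales figures above; `sales_B` and `df_from_series` are names introduced here for illustration only.

```python
import pandas as pd

years = [2018, 2019, 2020]
sales_A = pd.Series([80, 100, 120], index=years, name='Product_A')
sales_B = pd.Series([45, 50, 55], index=years, name='Product_B')

# Concatenating along axis=1 aligns the Series on their shared index
# and uses each Series name as the column label.
df_from_series = pd.concat([sales_A, sales_B], axis=1)
print(df_from_series)

# Selecting a single column gives back a Series with the same name.
col = df_from_series['Product_A']
print(type(col).__name__, col.name)
```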

Reading Data Files

# Titanic training set from Kaggle: https://www.kaggle.com/c/titanic/data
csv_titanic = './data/titanic/train.csv'

df_titanic = pd.read_csv(csv_titanic)

print(df_titanic.shape)

display(df_titanic.head())
display(df_titanic.head(2))

display(df_titanic.tail())
display(df_titanic.tail(2))
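
To try `read_csv`, `head`, and `tail` without downloading the Kaggle file first, one option is to read from an in-memory string. This is only a sketch: the CSV text below is made-up toy data, not Titanic records, and `df_toy` is a hypothetical name.

```python
import io
import pandas as pd

# Toy CSV text standing in for a file on disk (made-up values).
csv_text = """id,name,score
1,a,10
2,b,20
3,c,30
4,d,40
5,e,50
6,f,60
"""

# read_csv accepts a path or any file-like object.
df_toy = pd.read_csv(io.StringIO(csv_text))

print(df_toy.shape)     # (number of rows, number of columns)
print(df_toy.head())    # first 5 rows by default
print(df_toy.tail(2))   # last 2 rows
```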

Indexing, Selecting & Assigning

Native accessors

csv_titanic = './data/titanic/train.csv'

df_titanic = pd.read_csv(csv_titanic)

display(df_titanic.head(1))

print(df_titanic['Survived'])
print(df_titanic.Survived)

print(df_titanic.Name[0])
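
Attribute access such as `df_titanic.Survived` only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute; bracket access works for any column name. A small sketch using the sales DataFrame from earlier, whose column names contain a space (`col_a` is a name introduced here):

```python
import pandas as pd

df = pd.DataFrame({'Product A': [80, 100, 120], 'Product B': [45, 50, 55]},
                  index=[2018, 2019, 2020])

# 'Product A' contains a space, so attribute access is not possible;
# bracket access is the only option for this column.
col_a = df['Product A']
print(col_a)

# Chained access: pick the column first, then a row by its label.
print(df['Product A'][2019])
```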

Indexing in pandas

# iloc selects by integer position: rows first, columns second

display(df_titanic.iloc[0])

display(df_titanic.iloc[0:3])

display(df_titanic.iloc[0:3, 0:4])

display(df_titanic.iloc[:5, 3])

print()
display(df_titanic.iloc[[0, 5, -2], 3])

Label-based Selection

display(df_titanic.loc[0])

display(df_titanic.loc[0:3])  # rows labelled 0 through 3; the end label is included

display(df_titanic.loc[:, ['Name', 'Survived', 'Age']])

display(df_titanic.loc[:3, 'Name'])

display(df_titanic.loc[[1, 10, 100], ['Name', 'Survived']])
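
One difference between the two accessors is easy to miss: an `iloc` slice excludes its end position, like a Python list slice, while a `loc` slice includes its end label. A minimal sketch on a toy DataFrame (`df_demo` is a hypothetical name, not part of the Titanic data):

```python
import pandas as pd

df_demo = pd.DataFrame({'value': [10, 20, 30, 40, 50]})  # default index 0..4

by_position = df_demo.iloc[0:3]   # positions 0, 1, 2 (end excluded)
by_label = df_demo.loc[0:3]       # labels 0, 1, 2, 3 (end included)

print(len(by_position))  # 3
print(len(by_label))     # 4
```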

Manipulating the index

df_titanic = pd.read_csv(csv_titanic)
df_titanic.set_index('PassengerId', inplace=True)
display(df_titanic.head(2))
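
After changing the index, `loc` looks rows up by the new labels while `iloc` still counts positions from 0. The sketch below illustrates this on a toy DataFrame; the values and the names `df_ids`, `df_by_id`, and `df_restored` are assumptions made for the example.

```python
import pandas as pd

df_ids = pd.DataFrame({'PassengerId': [1, 2, 3], 'Name': ['a', 'b', 'c']})

# set_index returns a new DataFrame unless inplace=True is passed.
df_by_id = df_ids.set_index('PassengerId')

print(df_by_id.loc[1, 'Name'])    # label 1 is the first row
print(df_by_id.iloc[0]['Name'])   # position 0 is also the first row

# reset_index moves the index back into a regular column.
df_restored = df_by_id.reset_index()
print(df_restored.columns.tolist())
```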

Conditional Selection

display(df_titanic.Survived==1)

display(df_titanic.loc[df_titanic.Survived==1])

query = (df_titanic.Survived==1) & (df_titanic.Age < 20)
display(df_titanic.loc[query])

query = (df_titanic.Survived==1) & ( (df_titanic.Age < 20) | (df_titanic.Pclass==1))
display(df_titanic.loc[query])

query = df_titanic.Cabin.isin(['C123', 'C85'])
display(df_titanic.loc[query])

query = df_titanic.Age.notnull()
display(df_titanic.loc[query])
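
Each condition above is a boolean Series, and `loc` keeps the rows where it is `True`. The parentheses around every comparison matter, because `&` and `|` bind more tightly than `==` and `<`. A self-contained sketch on made-up rows (`df_people` and `mask` are names introduced here):

```python
import pandas as pd

df_people = pd.DataFrame({
    'Survived': [1, 0, 1, 1],
    'Age': [15.0, 40.0, None, 30.0],   # None becomes NaN (missing)
    'Pclass': [3, 1, 2, 1],
})

# Survived AND (younger than 20 OR first class)
mask = (df_people.Survived == 1) & ((df_people.Age < 20) | (df_people.Pclass == 1))
print(mask)
print(df_people.loc[mask])

# notnull() is False for missing values, so this drops the row without an age.
print(df_people.loc[df_people.Age.notnull()])
```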

Assigning data

df_titanic['NewClass'] = 'everyone'
display(df_titanic.head(3))

df_titanic['PassengerIdBackwards'] = range(len(df_titanic), 0, -1)

display(df_titanic.head(3))
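
The two assignments above behave differently: a scalar is copied to every row, while an iterable has to supply exactly one value per row. A minimal sketch on a toy DataFrame (`df_small` and the column names `Backwards` and `TooShort` are hypothetical):

```python
import pandas as pd

df_small = pd.DataFrame({'Name': ['a', 'b', 'c']})

df_small['NewClass'] = 'everyone'                     # scalar: copied to every row
df_small['Backwards'] = range(len(df_small), 0, -1)   # iterable: one value per row

print(df_small)

# An iterable of the wrong length is rejected with a ValueError.
try:
    df_small['TooShort'] = [1, 2]
except ValueError as err:
    print('ValueError:', err)
```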