Pandas Tutorial

What is Pandas?

Pandas is an open-source Python library that helps you analyse and manipulate data.

Why pandas?

Pandas provides a simple-to-use but very capable set of functions you can apply to your data.

Pandas is the most popular Python library for data analysis. It delivers highly optimized performance because its performance-critical back-end code is written in C and Cython, with the rest written in Python.

It's integrated with many other Python-based data science and machine learning tools, so understanding it will be helpful throughout your journey.

One of the main use cases you'll come across is using pandas to transform your data in a way which makes it usable with machine learning algorithms.

Importing pandas

To get started using pandas, the first step is to import it.

The most common way (and the one you should use) is to import pandas under the alias pd.

If you see the letters pd in pandas code, they almost always refer to the pandas library.

import pandas as pd

Data Structures

Pandas has two main data structures: Series and DataFrame.

1. Series - 1-dimensional column of data.

2. DataFrame - 2-dimensional table of data with rows and columns.

Series

A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.

A pandas Series can be created using the following constructor:

Syntax

pd.Series(data, index, dtype, copy)

Here, data can be many different things:

1. a Python dict

2. an ndarray

3. a scalar value (like 5)
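As a rough illustration of the three kinds of input listed above, the sketch below builds one Series from each. The variable names and values are made up for this example, and NumPy is assumed to be installed alongside pandas.

```python
import numpy as np
import pandas as pd

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a NumPy ndarray: a default integer index (0, 1, 2) is assigned.
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a scalar: the value is repeated once for every index label supplied.
from_scalar = pd.Series(5, index=["x", "y", "z"])

print(from_dict)
print(from_array)
print(from_scalar)
```

Note that a scalar only makes sense together with an index, since the index decides how many times the value is repeated.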

Example 1

# Creating a Series of student names
StudentName = pd.Series(["Michael", "John", "Sachin"])
print(StudentName)

Output:

0    Michael
1       John
2     Sachin
dtype: object

Example 2

# Creating a series of age
StudentAge = pd.Series([30, 28, 35])
print(StudentAge)

Output:

0    30
1    28
2    35
dtype: int64
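Both examples above rely on the default integer index. The sketch below shows how the index and dtype parameters of the constructor might be used to attach your own labels and fix the data type. The marks values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical marks; the labels and values are invented for this sketch.
marks = pd.Series([81.5, 74.0, 90.0],
                  index=["Michael", "John", "Sachin"],
                  dtype="float64")

print(marks.index.tolist())  # the axis labels (the index)
print(marks["John"])         # look up a value by its label
print(marks.iloc[0])         # look up a value by its position
```

Label-based lookup is what distinguishes a Series from a plain list: the value stays attached to its label regardless of where it sits in the array.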

DataFrame

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns. It is similar to a spreadsheet or a SQL table.

A pandas DataFrame can be created using the following constructor:

Syntax

pd.DataFrame(data, index, columns, dtype, copy)

DataFrame accepts many different kinds of input:

1. Dict of 1D ndarrays, lists, dicts, or Series

2. 2-D numpy.ndarray

3. Structured or record ndarray

4. A Series

5. Another DataFrame
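To make a few of these input kinds concrete, here is a minimal sketch that builds a DataFrame from a dict of lists, from a 2-D ndarray, and from a single Series. All names and values are invented for illustration, and NumPy is assumed to be available.

```python
import numpy as np
import pandas as pd

# From a dict of lists: the keys become the column names.
from_dict = pd.DataFrame({"name": ["Michael", "John"], "age": [30, 28]})

# From a 2-D ndarray: row and column labels are supplied separately.
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]),
                          index=["r1", "r2"],
                          columns=["a", "b"])

# From a named Series: the result is a one-column DataFrame.
from_series = pd.DataFrame(pd.Series([30, 28], name="age"))

print(from_dict)
print(from_array)
print(from_series)
```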

Let's use our two Series as the values.

Example 1

# Creating a DataFrame of student and age
student_detail = pd.DataFrame({"StudentName": StudentName,
                               "StudentAge": StudentAge})
print(student_detail)

Output:

  StudentName  StudentAge
0     Michael          30
1        John          28
2      Sachin          35

You can see that the keys of the dictionary became the column headings and the values of the two Series became the values in the DataFrame.

Note that many different types of data can go into a DataFrame, and each column can hold its own type.
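As a small sketch of that point, the DataFrame below mixes text, integer, floating point and boolean columns. The Score and Enrolled columns are hypothetical additions for this example, not part of the data used earlier.

```python
import pandas as pd

# Score and Enrolled are hypothetical columns added purely for illustration.
students = pd.DataFrame({
    "StudentName": ["Michael", "John", "Sachin"],  # text
    "StudentAge": [30, 28, 35],                    # integers
    "Score": [81.5, 74.0, 90.0],                   # floating point values
    "Enrolled": [True, False, True],               # booleans
})

# Each column reports its own data type.
print(students.dtypes)
```

The exact dtype names printed can differ between pandas versions, particularly for the text column.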
