
Viewing a DataFrame


Let's create a DataFrame by passing a dict that maps each column name to a pd.Series of values:

import pandas as pd
car_details = pd.DataFrame({ "Make"  : pd.Series(["Toyota", "Toyota", "Nissan","Honda", "Toyota"]),
                             "Colour": pd.Series(["White", "Blue", "White","Blue", "White"]),
                             "Odometer (KM)": pd.Series([150043, 32549, 213095, 45698, 60000]),
                             "Doors" : pd.Series([4, 3, 4, 4, 4]),
                             "Price" : pd.Series(["$4,000.00", "$7,000.00", "$3,500.00","$7,500.00", "$6,250.00"]) })
print(car_details)

Output:

     Make Colour  Odometer (KM)  Doors      Price
0  Toyota  White         150043      4  $4,000.00
1  Toyota   Blue          32549      3  $7,000.00
2  Nissan  White         213095      4  $3,500.00
3   Honda   Blue          45698      4  $7,500.00
4  Toyota  White          60000      4  $6,250.00
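Wrapping every column in pd.Series is optional. Here is a minimal sketch, using the same car data, showing that a dict of plain Python lists produces an equal DataFrame. The names data, from_lists and from_series are illustrative only.

```python
import pandas as pd

# Illustrative name: the same car data as plain Python lists
data = {
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
}

# Build one frame straight from the lists...
from_lists = pd.DataFrame(data)

# ...and one by wrapping each list in a pd.Series first, as in the example above
from_series = pd.DataFrame({name: pd.Series(values) for name, values in data.items()})

# equals() is True only if the labels, dtypes and values all match
print(from_lists.equals(from_series))  # True
```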

Anatomy of a DataFrame


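A pandas DataFrame consists of three major components: the data, the rows, and the columns.

The data is laid out in a tabular fashion, in rows and columns.

Rows run along axis 0 (the index axis) and columns run along axis 1 (the columns axis). Many methods accept an axis argument that tells them which of the two directions to work along.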
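A small sketch of how the axis number changes the direction of an aggregation, using only the two numeric columns from car_details. The names numbers, column_totals and row_totals are illustrative.

```python
import pandas as pd

# Only the two numeric columns, so the sums are meaningful
numbers = pd.DataFrame({
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
})

# axis=0: collapse the rows, giving one total per column
column_totals = numbers.sum(axis=0)
print(column_totals)

# axis=1: collapse the columns, giving one total per row
row_totals = numbers.sum(axis=1)
print(row_totals)
```

Summing with axis=0 returns a Series indexed by column name, while axis=1 returns a Series indexed by the row labels 0 to 4.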

.dtypes

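.dtypes returns the data type of each column. Text columns such as Make and Price are stored with the generic object dtype, while whole numbers are stored as int64: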

print(car_details.dtypes)

Output:

Make             object
Colour           object
Odometer (KM)     int64
Doors             int64
Price            object
dtype: object
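Because Price is stored as text, it is often handy to pick out only the numeric columns. A minimal sketch using select_dtypes; numeric_cols is an illustrative name.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# Keep only the columns whose dtype is numeric (here both are int64)
numeric_cols = car_details.select_dtypes(include="number")
print(numeric_cols.columns.tolist())  # ['Odometer (KM)', 'Doors']

# A single column's dtype is available through .dtype on the Series
print(car_details["Doors"].dtype)     # int64
```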

.describe()

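.describe() gives a quick statistical summary (count, mean, standard deviation, minimum, quartiles and maximum) of the numeric columns. Price is left out because it is stored as text: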

print(car_details.describe())

Output:

       Odometer (KM)     Doors
count       5.000000  5.000000
mean   100277.000000  3.800000
std     78090.879483  0.447214
min     32549.000000  3.000000
25%     45698.000000  4.000000
50%     60000.000000  4.000000
75%    150043.000000  4.000000
max    213095.000000  4.000000
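When .describe() is called on text columns only, it reports a different set of statistics: count, number of unique values, the most frequent value (top) and how often it occurs (freq). A short sketch; text_summary is an illustrative name.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# Select just the text columns, then summarize them
text_summary = car_details[["Make", "Colour", "Price"]].describe()
print(text_summary)
```

For Make this shows 3 unique values, with Toyota as the most frequent (3 times).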

.info()

.info() prints a concise summary of a DataFrame, including:

1. How many entries (rows) there are

2. Whether there are missing values (if a column's non-null count is less than the number of entries, that column has missing values)

3. The data type of each column

4. Approximately how much memory the DataFrame uses

car_details.info()

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Make           5 non-null      object
 1   Colour         5 non-null      object
 2   Odometer (KM)  5 non-null      int64
 3   Doors          5 non-null      int64
 4   Price          5 non-null      object
dtypes: int64(2), object(3)
memory usage: 328.0+ bytes
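Our frame has no gaps, so every column reports 5 non-null values. The sketch below uses a hypothetical variant of the data (cars_with_gap) with one odometer reading missing, to show how the gap appears and how isna().sum() counts missing values per column.

```python
import pandas as pd

# Hypothetical variant of the car data: the third odometer reading is missing
cars_with_gap = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Odometer (KM)": [150043, 32549, None, 45698, 60000],
})

# Odometer (KM) now shows 4 non-null out of 5 entries; its dtype becomes
# float64 because an int64 column cannot hold the missing value (NaN)
cars_with_gap.info()

# isna() marks missing cells as True; sum() counts them per column
missing = cars_with_gap.isna().sum()
print(missing)
```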

.columns

.columns returns the column labels of a DataFrame as an Index object.

print(car_details.columns)

Output:

Index(['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price'], dtype='object')
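Because .columns is an Index, it can be turned into a plain list, tested for membership, and indexed by position. A short sketch; "Year" is a hypothetical column name used only to show a failed lookup.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# Convert the column Index to a plain Python list
column_names = car_details.columns.tolist()
print(column_names)

# Membership tests work directly on the Index
print("Price" in car_details.columns)  # True
print("Year" in car_details.columns)   # False ("Year" is hypothetical)

# Positional access to a single column label
print(car_details.columns[2])          # Odometer (KM)
```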

.index

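.index returns the row labels (the index) of a DataFrame. Because we did not supply an index when creating car_details, pandas created a default RangeIndex running from 0 to 4: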

print(car_details.index)

Output:

RangeIndex(start=0, stop=5, step=1)
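The index also tells you how many rows there are, and each row keeps its label when the rows are reordered. That is why sorted output later in this chapter shows labels such as 3, 1, 4 rather than 0, 1, 2. A minimal sketch; sorted_by_odometer is an illustrative name.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# The number of row labels equals the number of rows
print(len(car_details.index))  # 5

# .shape gives (rows, columns) in one tuple
print(car_details.shape)       # (5, 5)

# Sorting reorders the rows but each row keeps its original label
sorted_by_odometer = car_details.sort_values(by="Odometer (KM)")
print(sorted_by_odometer.index.tolist())  # [1, 3, 4, 0, 2]
```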

.head() & .tail(3)

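.head(n) returns the first n rows and .tail(n) returns the last n rows; n defaults to 5. Since car_details has only 5 rows, .head() shows all of them, while .tail(3) shows the last three: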

print(car_details.head())
print(car_details.tail(3))

Output:

     Make Colour  Odometer (KM)  Doors      Price
0  Toyota  White         150043      4  $4,000.00
1  Toyota   Blue          32549      3  $7,000.00
2  Nissan  White         213095      4  $3,500.00
3   Honda   Blue          45698      4  $7,500.00
4  Toyota  White          60000      4  $6,250.00

     Make Colour  Odometer (KM)  Doors      Price
2  Nissan  White         213095      4  $3,500.00
3   Honda   Blue          45698      4  $7,500.00
4  Toyota  White          60000      4  $6,250.00
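The row count n can be any integer. A short sketch showing a smaller n, and a negative n with head(), which returns every row except the last |n| rows. The variable names are illustrative.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

first_two = car_details.head(2)          # rows labelled 0 and 1
last_two = car_details.tail(2)           # rows labelled 3 and 4
all_but_last_two = car_details.head(-2)  # negative n drops rows from the end

print(first_two.index.tolist())          # [0, 1]
print(last_two.index.tolist())           # [3, 4]
print(all_but_last_two.index.tolist())   # [0, 1, 2]
```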

Transposing data:

.T transposes a DataFrame, swapping rows and columns: the row labels become column labels and the column labels become row labels.

print(car_details.T)

Output:

                       0          1          2          3          4
Make              Toyota     Toyota     Nissan      Honda     Toyota
Colour             White       Blue      White       Blue      White
Odometer (KM)     150043      32549     213095      45698      60000
Doors                  4          3          4          4          4
Price          $4,000.00  $7,000.00  $3,500.00  $7,500.00  $6,250.00
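A sketch of what transposing does to the labels and dtypes. After .T, each new column holds one car's mix of text and numbers, so pandas stores every column with the generic object dtype. The name transposed is illustrative.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

transposed = car_details.T

# The old column names are now the row labels, and vice versa
print(transposed.index.tolist())    # ['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price']
print(transposed.columns.tolist())  # [0, 1, 2, 3, 4]

# Each column mixes text and numbers, so its dtype is object
print(transposed.dtypes)

# Individual values are still reachable by (row label, column label)
print(transposed.loc["Doors", 1])   # 3
```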

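Sorting by an axis:

.sort_index() sorts by the labels along an axis. With axis=1 it orders the columns by name, and ascending=False reverses that order. It returns a new DataFrame, so car_details itself is left unchanged:

print(car_details.sort_index(axis=1, ascending=False))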

Output:

       Price  Odometer (KM)    Make  Doors Colour
0  $4,000.00         150043  Toyota      4  White
1  $7,000.00          32549  Toyota      3   Blue
2  $3,500.00         213095  Nissan      4  White
3  $7,500.00          45698   Honda      4   Blue
4  $6,250.00          60000  Toyota      4  White
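With the default axis=0, sort_index() sorts the rows by their index labels instead. A short sketch of both directions; rows_reversed and cols_ascending are illustrative names.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# axis=0 (the default) sorts the rows by their index labels
rows_reversed = car_details.sort_index(ascending=False)
print(rows_reversed.index.tolist())    # [4, 3, 2, 1, 0]

# axis=1 with the default ascending=True sorts column names alphabetically
cols_ascending = car_details.sort_index(axis=1)
print(cols_ascending.columns.tolist())
# ['Colour', 'Doors', 'Make', 'Odometer (KM)', 'Price']
```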

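Sorting by value:

.sort_values() sorts the rows by the values in a column, and each row keeps its original index label. Because Price holds strings, the values are compared as text rather than as numbers; this gives the numeric order here only because every price has the same format:

print(car_details.sort_values(by="Price", ascending=False))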

Output:

     Make Colour  Odometer (KM)  Doors      Price
3   Honda   Blue          45698      4  $7,500.00
1  Toyota   Blue          32549      3  $7,000.00
4  Toyota  White          60000      4  $6,250.00
0  Toyota  White         150043      4  $4,000.00
2  Nissan  White         213095      4  $3,500.00
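To sort Price by its numeric value regardless of format, sort_values accepts a key function (available since pandas 1.1) that transforms the column before comparing. The helper price_to_number and the two-row demo frame are hypothetical, introduced only for this sketch; the demo shows text sorting placing "$9,500.00" above "$10,000.00".

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

def price_to_number(prices: pd.Series) -> pd.Series:
    # Hypothetical helper: strip "$" and "," then convert, e.g. "$4,000.00" -> 4000.0
    return prices.str.replace(r"[$,]", "", regex=True).astype(float)

# key= receives the Price column and returns the values to compare
by_price = car_details.sort_values(by="Price", ascending=False, key=price_to_number)
print(by_price)

# Why it matters: as text, "$9,500.00" sorts above "$10,000.00"
demo = pd.DataFrame({"Price": ["$10,000.00", "$9,500.00"]})
as_text = demo.sort_values(by="Price", ascending=False)["Price"].tolist()
as_number = demo.sort_values(by="Price", ascending=False, key=price_to_number)["Price"].tolist()
print(as_text)    # ['$9,500.00', '$10,000.00']
print(as_number)  # ['$10,000.00', '$9,500.00']
```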

Different ways to iterate over rows in a pandas DataFrame

Method 1:

Using the index attribute of the DataFrame to look up each row's values by label:

for index in car_details.index:
    print(car_details['Make'][index], car_details['Colour'][index],
          car_details['Odometer (KM)'][index], car_details['Doors'][index],
          car_details['Price'][index])

Output:

Toyota White 150043 4 $4,000.00
Toyota Blue 32549 3 $7,000.00
Nissan White 213095 4 $3,500.00
Honda Blue 45698 4 $7,500.00
Toyota White 60000 4 $6,250.00
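If the goal is only to produce one line of text per row, the loop can be left to pandas. A sketch, assuming every value may simply be converted with str(): astype(str) turns each cell into text, and apply(" ".join, axis=1) joins each row's values with spaces. apply still calls the function once per row, so this is shorter rather than vectorized. The name lines is illustrative.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# Convert every cell to text, then join each row's values with a space
lines = car_details.astype(str).apply(" ".join, axis=1)

for line in lines:
    print(line)
```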

Method 2:

Using the iterrows() method of the DataFrame, which yields an (index, row) pair for every row, where row is a Series keyed by column name:

for index, row in car_details.iterrows():
    # row already holds this row's values, so read them from it directly
    print(row["Make"], row["Colour"], row["Odometer (KM)"],
          row["Doors"], row["Price"])

Output:

Toyota White 150043 4 $4,000.00
Toyota Blue 32549 3 $7,000.00
Nissan White 213095 4 $3,500.00
Honda Blue 45698 4 $7,500.00
Toyota White 60000 4 $6,250.00
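A further option is itertuples(), which yields one namedtuple per row and is generally faster than iterrows(). One caveat: "Odometer (KM)" is not a valid Python attribute name, so this sketch reads that field by position instead of by name. The name rows in the check is illustrative.

```python
import pandas as pd

car_details = pd.DataFrame({
    "Make": ["Toyota", "Toyota", "Nissan", "Honda", "Toyota"],
    "Colour": ["White", "Blue", "White", "Blue", "White"],
    "Odometer (KM)": [150043, 32549, 213095, 45698, 60000],
    "Doors": [4, 3, 4, 4, 4],
    "Price": ["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"],
})

# index=False leaves the index label out of each tuple
for row in car_details.itertuples(index=False):
    # row[2] is the "Odometer (KM)" value (third field)
    print(row.Make, row.Colour, row[2], row.Doors, row.Price)

rows = list(car_details.itertuples(index=False))
```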

If you have any doubts or queries related to this chapter, get them clarified from our Python Team experts on ibmmainframer Community!