Viewing DataFrame
Let's create a DataFrame by passing a dict of objects:
import pandas as pd
car_details = pd.DataFrame({
    "Make": pd.Series(["Toyota", "Toyota", "Nissan", "Honda", "Toyota"]),
    "Colour": pd.Series(["White", "Blue", "White", "Blue", "White"]),
    "Odometer (KM)": pd.Series([150043, 32549, 213095, 45698, 60000]),
    "Doors": pd.Series([4, 3, 4, 4, 4]),
    "Price": pd.Series(["$4,000.00", "$7,000.00", "$3,500.00", "$7,500.00", "$6,250.00"]),
})
print(car_details)
|
Output:
Make Colour Odometer (KM) Doors Price
0 Toyota White 150043 4 $4,000.00
1 Toyota Blue 32549 3 $7,000.00
2 Nissan White 213095 4 $3,500.00
3 Honda Blue 45698 4 $7,500.00
4 Toyota White 60000 4 $6,250.00
|
Anatomy of a DataFrame
A pandas DataFrame consists of three major components: the data, the rows, and the columns.
The data is laid out as a table of rows and columns.
Axis 0 runs down the rows (the index) and axis 1 runs across the columns.
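The axis convention is easiest to see with an aggregation. The following is a minimal sketch on a hypothetical two-row frame (the "Seats" column and its values are made up for illustration and are not part of car_details):

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the axis convention.
df = pd.DataFrame({"Doors": [4, 3], "Seats": [5, 4]})

n_rows, n_cols = df.shape      # shape is (number of rows, number of columns)
col_totals = df.sum(axis=0)    # axis=0 collapses the rows: one total per column
row_totals = df.sum(axis=1)    # axis=1 collapses the columns: one total per row

print(n_rows, n_cols)
print(col_totals.to_dict())
print(row_totals.tolist())
```

Passing axis=0 gives one result per column, while axis=1 gives one result per row.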
.dtypes
.dtypes shows us what datatype each column contains.
print(car_details.dtypes)
|
Output:
Make object
Colour object
Odometer (KM) int64
Doors int64
Price object
dtype: object
|
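Price is reported as object because its values are text such as "$4,000.00". If a numeric column is needed, one possible approach (a sketch, not the only way) is to strip the currency formatting and convert to float. The variable names below are illustrative:

```python
import pandas as pd

# A few price strings in the same format as the Price column.
prices = pd.Series(["$4,000.00", "$7,000.00", "$3,500.00"])

# Remove the dollar sign and thousands separators, then convert to float.
prices_numeric = prices.str.replace(r"[$,]", "", regex=True).astype(float)

print(prices_numeric.dtype)
print(prices_numeric.tolist())
```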
.describe()
.describe() gives you a quick statistical overview of the numerical columns.
print(car_details.describe())
|
Output:
Odometer (KM) Doors
count 5.000000 5.000000
mean 100277.000000 3.800000
std 78090.879483 0.447214
min 32549.000000 3.000000
25% 45698.000000 4.000000
50% 60000.000000 4.000000
75% 150043.000000 4.000000
max 213095.000000 4.000000
|
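Text columns such as Make are left out of the summary above. As a small sketch, calling .describe() directly on a text column reports the count, the number of unique values, the most frequent value, and its frequency instead:

```python
import pandas as pd

# Same values as the Make column of car_details.
makes = pd.Series(["Toyota", "Toyota", "Nissan", "Honda", "Toyota"])

summary = makes.describe()   # index: count, unique, top, freq
print(summary)
```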
.info()
.info() shows a handful of useful details about a DataFrame, such as:
1. How many entries (rows) there are
2. Whether there are missing values (if a column's non-null count is less than the number of entries, it has missing values)
3. The datatype of each column
car_details.info()
|
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Make 5 non-null object
1 Colour 5 non-null object
2 Odometer (KM) 5 non-null int64
3 Doors 5 non-null int64
4 Price 5 non-null object
dtypes: int64(2), object(3)
memory usage: 328.0+ bytes
|
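car_details has no missing values, so every column shows 5 non-null. To illustrate point 2, here is a sketch on a hypothetical frame with one missing odometer reading, comparing the non-null count of each column with the number of rows:

```python
import pandas as pd

# Hypothetical frame with one missing value, for illustration only.
cars = pd.DataFrame({
    "Make": ["Toyota", "Honda", "Nissan"],
    "Odometer (KM)": [150043, None, 213095],
})

non_null = cars.count()          # non-null values per column, as reported by .info()
missing = len(cars) - non_null   # entries minus non-null values

print(missing.to_dict())
print(cars.isna().sum().to_dict())   # the same numbers, computed directly
```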
.columns
.columns will show you all the columns of a DataFrame.
print(car_details.columns)
|
Output:
Index(['Make', 'Colour', 'Odometer (KM)', 'Doors', 'Price'], dtype='object')
|
.index
.index will display the index (row labels) of the DataFrame.
print(car_details.index)
|
Output:
RangeIndex(start=0, stop=5, step=1)
|
.head() & .tail(3)
Here is how to view the top and bottom rows of the frame. .head() returns the first five rows by default, and .tail(3) returns the last three:
print(car_details.head())
print(car_details.tail(3))
|
Output:
Make Colour Odometer (KM) Doors Price
0 Toyota White 150043 4 $4,000.00
1 Toyota Blue 32549 3 $7,000.00
2 Nissan White 213095 4 $3,500.00
3 Honda Blue 45698 4 $7,500.00
4 Toyota White 60000 4 $6,250.00
Make Colour Odometer (KM) Doors Price
2 Nissan White 213095 4 $3,500.00
3 Honda Blue 45698 4 $7,500.00
4 Toyota White 60000 4 $6,250.00
|
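Both methods accept the number of rows to return. A short sketch on a hypothetical ten-row frame (car_details has only five rows, so the default of five would return the whole frame):

```python
import pandas as pd

# Hypothetical frame with ten rows numbered 0 to 9.
numbers = pd.DataFrame({"n": range(10)})

default_head = numbers.head()            # no argument: first five rows
first_two = numbers.head(2)["n"].tolist()
last_three = numbers.tail(3)["n"].tolist()

print(len(default_head))
print(first_two)
print(last_three)
```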
Transposing data:
.T transposes the DataFrame, so rows become columns and columns become rows.
print(car_details.T)
|
Output:
0 1 2 3 4
Make Toyota Toyota Nissan Honda Toyota
Colour White Blue White Blue White
Odometer (KM) 150043 32549 213095 45698 60000
Doors 4 3 4 4 4
Price $4,000.00 $7,000.00 $3,500.00 $7,500.00 $6,250.00
|
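One way to confirm what a transpose does is to compare shapes and labels before and after. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame: 2 rows, 3 columns.
cars = pd.DataFrame({
    "Make": ["Toyota", "Honda"],
    "Doors": [4, 4],
    "Colour": ["White", "Blue"],
})

flipped = cars.T

print(cars.shape)                 # rows, columns before transposing
print(flipped.shape)              # the two numbers swap places
print(flipped.index.tolist())     # old column labels become the row labels
print(flipped.columns.tolist())   # old row labels become the column labels
```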
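Sorting by an axis:
sort_index(axis=1, ascending=False) sorts the column labels in descending order. The result is printed without reassigning car_details, so the original frame keeps its column order.
print(car_details.sort_index(axis=1, ascending=False))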
|
Output:
Price Odometer (KM) Make Doors Colour
0 $4,000.00 150043 Toyota 4 White
1 $7,000.00 32549 Toyota 3 Blue
2 $3,500.00 213095 Nissan 4 White
3 $7,500.00 45698 Honda 4 Blue
4 $6,250.00 60000 Toyota 4 White
|
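Sorting by value:
Price is stored as text, so sort_values compares the strings character by character. For these five prices, which all share the same format and length, that happens to match the numeric order. The result is printed without reassigning car_details, so the original row order is kept.
print(car_details.sort_values(by="Price", ascending=False))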
|
Output:
Make Colour Odometer (KM) Doors Price
3 Honda Blue 45698 4 $7,500.00
1 Toyota Blue 32549 3 $7,000.00
4 Toyota White 60000 4 $6,250.00
0 Toyota White 150043 4 $4,000.00
2 Nissan White 213095 4 $3,500.00
|
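Sorting price strings as text can give a misleading order once the prices differ in length. The sketch below uses a hypothetical frame with a five-digit price and the key parameter of sort_values (available in pandas 1.1 and later) to sort by numeric value instead. The helper price_to_float is an illustrative name, not a pandas function:

```python
import pandas as pd

# Hypothetical frame; the BMW row is made up to show the pitfall.
cars = pd.DataFrame({
    "Make": ["Toyota", "Honda", "BMW"],
    "Price": ["$4,000.00", "$7,500.00", "$22,000.00"],
})

def price_to_float(prices):
    """Strip '$' and ',' from a Series of price strings and convert to float."""
    return prices.str.replace(r"[$,]", "", regex=True).astype(float)

# Plain text sort: "$22,000.00" lands last because "2" sorts before "4" and "7".
as_text = cars.sort_values(by="Price", ascending=False)

# Numeric sort: the key function is applied to the column before comparing.
as_number = cars.sort_values(by="Price", ascending=False, key=price_to_float)

print(as_text["Make"].tolist())
print(as_number["Make"].tolist())
```

The Price column itself is left unchanged; only the comparison uses the converted values.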
Iterating over rows:
Method 1:
Using index attribute of the DataFrame
for index in car_details.index:
    # Look up each column's value by the row label
    print(car_details['Make'][index], car_details['Colour'][index],
          car_details['Odometer (KM)'][index], car_details['Doors'][index],
          car_details['Price'][index])
|
Output:
Toyota White 150043 4 $4,000.00
Toyota Blue 32549 3 $7,000.00
Nissan White 213095 4 $3,500.00
Honda Blue 45698 4 $7,500.00
Toyota White 60000 4 $6,250.00
|
Method 2:
Using iterrows() method of the DataFrame
for index, row in car_details.iterrows():
    # row is a Series holding the values of the current row, keyed by column name
    print(row["Make"], row["Colour"],
          row["Odometer (KM)"], row["Doors"],
          row["Price"])
|
Output:
Toyota White 150043 4 $4,000.00
Toyota Blue 32549 3 $7,000.00
Nissan White 213095 4 $3,500.00
Honda Blue 45698 4 $7,500.00
Toyota White 60000 4 $6,250.00
|
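A third option, not covered above, is itertuples(), which yields each row as a namedtuple. This is a minimal sketch on a hypothetical frame whose column names are valid Python identifiers. Names such as "Odometer (KM)" are not valid identifiers, so pandas replaces them with positional names and attribute access by the original name is not available for them:

```python
import pandas as pd

# Hypothetical frame with identifier-friendly column names.
cars = pd.DataFrame({"Make": ["Toyota", "Honda"], "Doors": [4, 4]})

lines = []
for row in cars.itertuples(index=False):
    # Each row is a namedtuple; columns are read as attributes.
    lines.append(f"{row.Make} {row.Doors}")

print(lines)
```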