Creating Series and DataFrame's from scratch is nice but what you'll usually be doing is importing your data in the form of a .csv (comma separated value) or spreadsheet file or other types of files.
Pandas allows for easy importing of data like this through functions such as pd.read_csv() and pd.read_excel() (for Microsoft Excel files).
Say you wanted to get this information from excel Sheet document into a pandas DataFrame.
let's create a sample input file Input_File.xlsx
Using with pd.ExcelFile() will automatically close the file once read\write the data.
import pandas as pd with pd.ExcelFile("E:\\Input_File.xlsx") as reader: Car_Detail_df = pd.read_excel(reader) print(Car_Detail_df) |
You can also read specific sheet in the excel using sheet_name keyword
import pandas as pd with pd.ExcelFile("E:\\Input_File.xlsx") as reader: Car_Detail_df = pd.read_excel(reader, sheet_name='Sheet1') print(Car_Detail_df) |
Make Color Odometer (KM) Doors Price Date Launched 0 Toyota White 150043 4 4000 28/02/2000 1 Toyota Blue 32549 3 7000 28/02/2000 2 Nissan White 213095 4 3500 22/09/2005 3 Honda Blue 45698 4 7500 22/09/2005 4 Toyota White 60000 4 6250 28/02/2000 |
To read the date column correctly, we can use the argument parse_dates to specify a list of date columns.
with pd.ExcelFile("E:\\Input_File.xlsx") as reader: Car_Detail_df = pd.read_excel(reader, parse_dates=["Date Launched"]) Car_Detail_df.info() |
|
Sometimes date is split up into multiple columns, for example, year, month, and day.
To combine them into a datetime, we can pass a nested list to parse_dates.
import pandas as pd with pd.ExcelFile("E:\\Input_File.xlsx") as reader: Car_Detail_df = pd.read_excel(reader, parse_dates={"Date Launched" : ['Year', 'Month', 'Day']}) Car_Detail_df.info() |
|
By default, date columns are parsed using the Pandas built-in parser from dateutil.parser.parse. Sometimes, you might need to write your own parser to support a different date format, for example, YYYY-DD-MM HH:MM:SS
There are nubmer of ways available in python to customize the date but The easiest way is to write a lambda function which can read the data in this format, and pass the lambda function to the argument date_parser.
import pandas as pd from datetime import datetime custom_date_parser = lambda x: datetime.strptime(x, "%d/%m/%Y") with pd.ExcelFile("E:\\Input_File.xlsx") as reader: Car_Detail_df = pd.read_excel(reader, parse_dates=["Date Launched"], date_parser=custom_date_parser) print(Car_Detail_df) |
Make Color Odometer (KM) Doors Price Date Launched 0 Toyota White 150043 4 4000 2000-02-28 1 Toyota Blue 32549 3 7000 2000-02-28 2 Nissan White 213095 4 3500 2005-09-22 3 Honda Blue 45698 4 7500 2005-09-22 4 Toyota White 60000 4 6250 2000-02-28 |
After you've made a few changes to your data, you might want to export it and save it so someone else can access the changes.
pandas allows you to export DataFrame's to .csv format using .to_csv() or spreadsheet format using .to_excel() or other types of files using pandas functions.
We haven't made any changes yet to the Car_Detail_df DataFrame but let's try export it.
Car_Detail_df.to_csv("E:\\Output_File.xlsx") |
You see some additional column append in the dataframe as it's a index column which created while importing the excel data. To get rid off this index columns use index=False
Car_Detail_df.to_csv("E:\\Output_File.xlsx", index=False) |
If you have any doubts or queries related to this chapter, get them clarified from our Python Team experts on ibmmainframer Community!