Python practice: Using Excel data
Page 2: Writing to an Excel worksheet with pandas
For this purpose, the pandas library provides the DataFrame method to_excel()
. The following instruction creates a worksheet in the Demo1.xlsx workbook with the name Turnover 2nd quarter 2023 and writes the data stored in the previously created DataFrame (see previous listing) to it.
data_frame.to_excel('Demo1.xlsx', sheet_name='Umsätze 2. Quartal 2023', index=False)
Please note that the target workbook must already exist and that to_excel()
overwrites any existing content there. The specification index=False
prevents the data line indices stored in the DataFrame from being written; the default setting is True
.
Two further parameters of the to_excel()
method are columns
and startcol
. columns
is the counterpart to the read_excel()
parameter usecols
. If developers have not already made a preselection when importing the Excel data into the DataFrame, they can assign a list with the desired columns to usecols
. If the columns are next to each other, the startcol
parameter can be used as an alternative. The integer value defines the column from which to_excel()
writes to the Excel table. The startrow
parameter behaves in the same way and specifies the first row of data. To redefine the column names, a list with the desired new names is passed to the header
parameter. With the specification header=False
, to_excel()
writes the data without column headers to the Excel table.
Creating a DataFrame
If developers want to write calculated data to a workbook, they must first create their own DataFrame. There are several ways to do this, for example by passing the desired data to the __init__()
method of the DataFrame class. The columns
parameter defines the column names. The following instruction saves the sales of districts A and B of the first half of the year (first quarter and second quarter) in the DataFrame.
data_frame = pandas.DataFrame([['Bezirk A', 19500, 13500], ['Bezirk B', 18400, 25100]], columns=['', 'I. Quartal', 'II. Quartal'])
Alternatively, developers first save the data in a dictionary and then transfer this to the __init__()
method. In this case, __init__()
- obtains the column names for the DataFrame from the keys and the data from the values of the Dictionary elements. The following instructions create a DataFrame with the total sales per quarter and then write them to a worksheet Total sales 2023 of the workbook Demo1.xlsx.
import pandas
data_frame = pandas.read_excel('Demo.xlsx')
summe_quartal1_2023 = data_frame['I. Quartal'].sum()
summe_quartal2_2023 = data_frame['II. Quartal'].sum()
summe_quartal3_2023 = data_frame['III. Quartal'].sum()
summe_quartal4_2023 = data_frame['IV. Quartal'].sum()
umsaetze_dic = {
'Umsatz 1. Quartal': [summe_quartal1_2023],
'Umsatz 2. Quartal': [summe_quartal2_2023],
'Umsatz 3. Quartal': [summe_quartal3_2023],
'Umsatz 4. Quartal': [summe_quartal4_2023]
}
data_frame_umsaetze = pandas.DataFrame(umsaetze_dic)
data_frame_umsaetze.to_excel('Demo1.xlsx', sheet_name='Gesamtumsätze 2023', index=False)
The values in the dictionary umsaetze_dic
are specified as a list, even though they have one element, as the length of the values in the dictionary must match when a dictionary is passed to the __init__()
method. This is not usually the case for simple values such as the sales here, as the totals result in different amounts. there is another way to create the DataFrame in order to also note simple values in the dictionary: from_dict()
converts a Dictionary directly into a DataFrame. The parameter orient
specifies that the keys of the dictionary are used as indexes instead of column headings, i.e. as row headings of the DataFrame to be created. The mandatory parameter columns
contains the desired column headings:
…
umsaetze_dic = {
'Umsatz 1. Quartal': summe_quartal1_2023,
'Umsatz 2. Quartal': summe_quartal2_2023,
'Umsatz 3. Quartal': summe_quartal3_2023,
'Umsatz 4. Quartal': summe_quartal4_2023
}
data_frame_umsaetze = pandas.DataFrame.from_dict(umsaetze_dic, orient='index', columns=['Summe'])
data_frame_umsaetze.to_excel('Demo1.xlsx', sheet_name='Gesamtumsätze 2023')
The data is then written to Excel in the following order:
Summe
Umsatz 1. Quartal 63400
Umsatz 2. Quartal 73500
Umsatz 3. Quartal 77500
Umsatz 4. Quartal 84000
The specification index=False
is omitted in this variant when calling to_excel()
, as the indices, which here consist of the quarterly data, should be written explicitly.