How to name dataframes dynamically in Python? - python

I have an Excel file that contains more than 30 sheets. The operation I perform on each sheet is more or less the same, but I want to create a separate dataframe for each sheet so that I can refer to them later.
This is what I tried, but it throws an error:
xls = pd.ExcelFile('DC_Measurement.xlsx')
sheets = xls.sheet_names
for s in sheets:
    print(s)
    'df '+ s = pd.read_excel(xls, sheet_name=s)
So I want 30 dataframes to be created, each with the sheet name as the suffix of its variable name. I tried using the "+" operator to build the name, but it threw the error message shown below:
SyntaxError: can't assign to operator
How can I create dataframes on the fly and name them?

You could use something like this:
for s in sheets:
    vars()['df'+ s] = pd.read_excel(xls, sheet_name=s)

Strictly speaking not an answer to your question but this will create a dictionary where the key is the sheet name and the value is the dataframe.
workbook = pd.read_excel('DC_Measurement.xlsx', sheet_name = None)
Then you can retrieve the dataframe you need like this.
df = workbook['sheet_name']
I think this is tidier than other solutions.
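Since the question mentions running more or less the same operation on every sheet, here is a minimal sketch of how that could look once the sheets are in a dictionary. The helper name apply_to_all, the sheet names and the "voltage" column are made up for illustration, and the small in-memory dictionary stands in for the result of pd.read_excel(..., sheet_name=None).

```python
import pandas as pd


def apply_to_all(workbook, func):
    """Apply the same function to every sheet; return a new dict keyed by sheet name."""
    return {name: func(df) for name, df in workbook.items()}


# Stand-in for: workbook = pd.read_excel('DC_Measurement.xlsx', sheet_name=None)
# (sheet names and the "voltage" column are hypothetical)
workbook = {
    "Sheet1": pd.DataFrame({"voltage": [1.0, None, 3.0]}),
    "Sheet2": pd.DataFrame({"voltage": [None, 5.0]}),
}

# The same operation (here: drop rows with missing values) runs on each sheet.
cleaned = apply_to_all(workbook, lambda df: df.dropna())
for name, df in cleaned.items():
    print(name, len(df))
```

Each result is still reachable by sheet name, e.g. cleaned["Sheet1"].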

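Or use locals:
for s in sheets:
    locals()['df'+ s] = pd.read_excel(xls, sheet_name=s)
This works at module level, where locals() is the same dictionary as globals(). Inside a function, change locals to globals, because assignments made through locals() there do not create variables.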
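To see why locals() is unreliable inside a function, here is a small sketch in plain Python (no pandas needed). The function and variable names are invented for the demonstration.

```python
def assign_via_locals():
    # Inside a function, writing to the dict returned by locals()
    # does not create a real local variable.
    locals()["df_local_demo"] = "created via locals()"
    try:
        return df_local_demo  # looked up as a global, which does not exist
    except NameError:
        return None


def assign_via_globals():
    # Writing to globals() creates a module-level name that can be read back.
    globals()["df_global_demo"] = "created via globals()"
    return df_global_demo


print(assign_via_locals())
print(assign_via_globals())
```

Even where the globals() version works, it leaves module-level names that are hard to track, which is one reason a dictionary is usually easier to live with.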

The best approach is usually to store the dataframes in a list or dictionary, where you can work with them systematically, like this:
xls = pd.ExcelFile('DC_Measurement.xlsx')
sheets = {}
for s in xls.sheet_names:
    print(s)
    sheets[s] = pd.read_excel(xls, sheet_name=s)
Or just this:
xls = pd.ExcelFile('DC_Measurement.xlsx')
sheets = {
    s: pd.read_excel(xls, sheet_name=s)
    for s in xls.sheet_names
}
This will make it easy to work with the sheets programmatically later (just access sheets[s], where s is a sheet name). Otherwise you will next face the tricky problem of how to access all the dataframes that you've just created as free-floating variables.

Related

how to use element as dataframe name when looping over a list

I need to read data from several sheets in an xlsx file and save each sheet as a dataframe with the same name as the sheet. Here is the code I use. It can read data from the different sheets; however, all dataframes are named temp. How should I change it? Thanks.
import pandas as pd
sheet_name_list = ['sheet1','sheet2','sheet3']
for temp in sheet_name_list:
    temp = pd.read_excel("data_spreadsheet.xlsx", sheet_name = temp)
You can use a dictionary:
pd_dict = {}
for temp in sheet_name_list:
    pd_dict[temp] = pd.read_excel("data_spreadsheet.xlsx", sheet_name=temp)

How to read separate Excel sheets into separate DataFrames?

I have an Excel file with 13 tabs, and I want to write a function that takes specified sheets from the file, converts them into separate DataFrames, then bundles them into a list of DataFrames. In this case, I want to take the sheets labeled 'tblProviderDetails', 'tblSubmissionStatus', and 'Data Validation Ref Data', convert them into DataFrames and make a list. The reason I want the dfs in a list is that I eventually want to take the input dfs and return a dictionary, which will then be used to create a YAML file.
This is ultimately what I want:
dfs = [ 'tblProviderDetails', 'tblSubmissionStatus', 'Data Validation Ref Data']
The reason that I want to use a user-defined function is that I want the flexibility to call any sheet and any number of sheets into a list.
I was able to write a function that converts single specified sheets to dataframes, but I'm not sure how to call any number of sheets in the Excel file or create a list within the function. This is as far as I've gotten:
def read_excel(path, sheet_name, header):
    dfs = pd.read_excel(path, sheet_name=sheet_name, header=header)
    return dfs
df1 = read_excel(path=BASEDIR, sheet_name='tblProviderDetails', header=2)
df2 = read_excel(path=BASEDIR, sheet_name='tblSubmissionStatus', header=2)
df3 = read_excel(path=BASEDIR, sheet_name='Data Validation Ref Data', header=2)
Thank you for your help.
There are multiple ways to do this but perhaps the simplest way is to first get all the sheet names and then in a loop for every sheet name, load the result in a data frame and append it to the required list.
dfList = []
def read_excel(path, h):
    xls = pd.ExcelFile(path)
    # Now you can access all sheet names in the file
    sheetsList = xls.sheet_names
    # ['sheet1', 'sheet2', ...]
    for sheet in sheetsList:
        dfList.append(pd.read_excel(path, sheet_name=sheet, header=h))
read_excel('book.xlsx', 2)
print(dfList)
You can pass a list of sheet names and/or sheet numbers to the sheet_name parameter; read_excel then returns a dictionary of DataFrames keyed by those names or numbers.
def read_excel(path, sheet_name, header):
    # a list selects several sheets at once
    sheet_name = ['tblProviderDetails', 'tblSubmissionStatus', 'Data Validation Ref Data']
    dfs = pd.read_excel(path, sheet_name=sheet_name, header=header)
    return dfs
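Because the question asks for a list rather than a dictionary, a possible sketch is to read the requested sheets in one call and then turn the resulting dictionary into a list in the requested order. The helper names frames_in_order and read_sheets are hypothetical, not part of pandas.

```python
import pandas as pd


def frames_in_order(sheet_dict, sheet_names):
    """Turn the dict returned by pd.read_excel into a list ordered like sheet_names."""
    return [sheet_dict[name] for name in sheet_names]


def read_sheets(path, sheet_names, header=0):
    """Read only the requested sheets and return them as a list of DataFrames."""
    sheet_dict = pd.read_excel(path, sheet_name=list(sheet_names), header=header)
    return frames_in_order(sheet_dict, sheet_names)


# Example call, using the names from the question:
# dfs = read_sheets(BASEDIR,
#                   ['tblProviderDetails', 'tblSubmissionStatus', 'Data Validation Ref Data'],
#                   header=2)
```

Any number of sheet names can be passed, which gives the flexibility the question asks for.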

Using Pandas (Python) with Excel to loop through multiple worksheets to return all rows where a value in a list appears in a column

I have a list of values; if any of them appears in the column 'Books', I would like that row to be returned.
I think I have achieved this with the below code:
def return_Row():
    file = 'TheFile.xls'
    df = pd.read_excel(file)
    listOfValues = ['A','B','C']
    return(df.loc[df['Column'].isin(listOfValues)])
This currently only seems to work on the first worksheet, but there are multiple worksheets in 'TheFile.xls'. How would I go about looping through these worksheets to return any rows where a value from listOfValues is found in the 'Books' column of all the other sheets?
Any help would be greatly appreciated.
Thank you
The thing is, if you don't specify the sheet_name argument, pd.read_excel() returns a dataframe for the first sheet only. If you want to get all the sheets in the Excel file without specifying their names, you can pass None to sheet_name as follows:
df = pd.read_excel(file, sheet_name=None)
This will give you a dictionary with one dataframe per sheet, keyed by sheet name, which you can loop over and do whatever you want. For example, you can append the results that you need to a list and return the list:
def return_Row():
    file = 'TheFile.xls'
    results = []
    dfs = pd.read_excel(file, sheet_name=None)
    listOfValues = ['A','B','C']
    for df in dfs.values():
        results.append(df.loc[df['Column'].isin(listOfValues)])
    return(results)
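If a single table is more convenient than a list of per-sheet results, one possible variation is sketched below. It assumes the column is really called 'Books' as in the question text, skips sheets that lack it, and adds a "sheet" column (my own addition) to record where each row came from. The in-memory dictionary stands in for pd.read_excel(file, sheet_name=None).

```python
import pandas as pd


def matching_rows(dfs, column, values):
    """Collect rows whose `column` is in `values` from every sheet that has that column."""
    matches = []
    for sheet_name, df in dfs.items():
        if column not in df.columns:
            continue  # skip sheets without the column instead of raising KeyError
        hit = df.loc[df[column].isin(values)].copy()
        hit["sheet"] = sheet_name  # remember where each row came from
        matches.append(hit)
    if not matches:
        return pd.DataFrame()
    return pd.concat(matches, ignore_index=True)


# Stand-in for: dfs = pd.read_excel('TheFile.xls', sheet_name=None)
# (sheet names and contents are hypothetical)
dfs = {
    "first": pd.DataFrame({"Books": ["A", "X"], "n": [1, 2]}),
    "second": pd.DataFrame({"Books": ["C", "B", "Y"], "n": [3, 4, 5]}),
    "notes": pd.DataFrame({"text": ["no Books column here"]}),
}
result = matching_rows(dfs, "Books", ["A", "B", "C"])
print(result)
```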

Extract Partial Data from multiple excel sheets in the same workbook using pandas

I have an Excel workbook with more than 200 sheets of data. Sheet names are as shown in the figure. I would like to assign each sheet to an individual variable as a data frame and later extract some required data from each sheet. The extracted information from all the sheets needs to be stored in a single Excel sheet. As I cannot keep writing this 200 times, I would like to know if I can write a function or use a for loop to automate this process.
df1 = pd.read_excel("C:\\Users\\RECL\\Documents\\PRADYUMNA\\Experiment Data\\CNN\\CCCV Data.xlsx", sheet_name=5)
df2 = pd.read_excel("C:\\Users\\RECL\\Documents\\PRADYUMNA\\Experiment Data\\CNN\\CCCV Data.xlsx", sheet_name=10)
df3 = pd.read_excel("C:\\Users\\RECL\\Documents\\PRADYUMNA\\Experiment Data\\CNN\\CCCV Data.xlsx", sheet_name=15)
df1 = df1[0::100]
df2 = df2[0::200]
df3 = df3[0::300]
df1
i=0
for i in range(0,1035), i+5 :
    df = pd.read_excel(xlsx, sheet_name=i)
df
I tried something like this, but it isn't working. Please let me know if there is a simple way to do it.
Thank you :)
Not sure exactly what you are trying to do, but an easier way to traverse the sheet names is a for-each loop (assuming xlsx is a pd.ExcelFile opened on your workbook):
for sheet in xlsx.sheet_names:
    ...  # do something with each sheet name
Now you can do something for all the sheets no matter their name.
Regarding "would like to assign each sheet to an individual variable", you could use a dictionary:
sheets = {}
for sheet in xlsx.sheet_names:
    sheets[sheet] = pd.read_excel(xlsx, sheet)
Now to get a sheet from the dictionary sheets by its name:
sheets.get("15")
Or to traverse all the sheets:
for name, df in sheets.items():
    # do something with each sheet, e.g.
    print(df)
This will print the data for each sheet in sheets.
Hope this helps.
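The question also takes every n-th row of each sheet (df1[0::100]) and wants everything in one place afterwards. A rough sketch of that step on a dictionary of sheets might look like this. The helper names, the "current" column and the sample data are invented, and a single step is used for all sheets, whereas the question uses a different step per sheet.

```python
import pandas as pd


def every_nth_row(sheets, step):
    """Keep every `step`-th row of each sheet (rows 0, step, 2*step, ...)."""
    return {name: df.iloc[::step] for name, df in sheets.items()}


def combine(sheets):
    """Stack all sheets into one DataFrame; the sheet name becomes the outer index level."""
    return pd.concat(sheets, names=["sheet", "row"])


# Stand-in for sheets read from the workbook; names and columns are made up.
sheets = {
    "5": pd.DataFrame({"current": range(10)}),
    "10": pd.DataFrame({"current": range(10, 16)}),
}
sampled = every_nth_row(sheets, step=3)
combined = combine(sampled)
print(combined)
```

The combined frame could then be written out once with combined.to_excel(...), instead of handling 200 separate variables.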

Convert excel file with many sheets (with spaces in the name of the sheet) in pandas data frame

I would like to convert an Excel file to a pandas dataframe. All the sheet names contain spaces, for instance 'part 1 of 22', 'part 2 of 22', and so on. In addition, the first column is the same for all the sheets.
I would like to convert this Excel file into a single dataframe. However, I don't know what happens to the names in Python: I was able to import the sheets, but I do not know the names of the resulting dataframes.
The sheets are imported, but I do not know their names. After this I would like to use another for loop with pd.merge() to create a single dataframe.
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    print(sheet_name.info())
Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.
To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.
Try this:
all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(sheet_name)
Or, even better, using list comprehension:
all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]
You can then concatenate them into one DataFrame like this:
final_df = pd.concat(all_my_sheets, sort=False)
You might consider using the openpyxl package:
from openpyxl import load_workbook
import pandas as pd
# file_path is the path to your workbook
wb = load_workbook(filename=file_path, read_only=True)
# Assuming every sheet has the same header in its first row
header = None
records = []
for ws in wb.worksheets:
    rows = ws.iter_rows(values_only=True)
    first_row = next(rows, None)  # header row of this sheet
    if first_row is None:
        continue  # skip empty sheets
    # Make sure you don't duplicate the header
    if header is None:
        header = list(first_row)
    records.extend(rows)
# Create your df
df = pd.DataFrame(records, columns=header)
It may be easiest to call read_excel() once with a list of sheet names, which returns a dictionary of dataframes keyed by sheet name.
So, the first step would look like this (file_path is a placeholder for the path to your workbook):
dfs = pd.read_excel(file_path, sheet_name=["Sheet 1", "Sheet 2", "Sheet 3"])
Note that the sheet names you use in the list should be the same as those in the excel file. Then, if you wanted to vertically concatenate these sheets, you would just call:
final_df = pd.concat(list(dfs.values()), axis=0)
Note that this solution would result in a final_df that includes column headers from all three sheets. So, ideally they would be the same. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.
I hope this helps!
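Since the question mentions that the first column is shared by all sheets and that pd.merge() is the goal, here is one possible sketch of merging a list of dataframes on a common key column. The key name "id", the other column names and the choice of an outer join are assumptions for illustration; the real column name and join type depend on your data.

```python
from functools import reduce

import pandas as pd


def merge_on_key(frames, key):
    """Outer-merge a list of DataFrames pairwise on a shared key column."""
    return reduce(lambda left, right: pd.merge(left, right, on=key, how="outer"), frames)


# Stand-in for the list of sheets; "id" plays the role of the shared first column.
frames = [
    pd.DataFrame({"id": [1, 2], "a": [10, 20]}),
    pd.DataFrame({"id": [2, 3], "b": [200, 300]}),
    pd.DataFrame({"id": [1, 3], "c": [1000, 3000]}),
]
merged = merge_on_key(frames, "id")
print(merged)
```

Unlike concatenation, this places the columns of each sheet side by side, with one row per key value and NaN where a sheet has no entry for that key.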
