Creating Multiple Dataframes from Multiple key-value pairs in dictionary - python

Beginner coder here
I am trying to create multiple dataframes from multiple Excel sheets in a single notebook, with the dataframe names being the same as the sheet names, but I am unable to do so.
I have tried this but to no avail.
Kindly help me on this.
file_name = 'file.xlsx'
xl = pd.ExcelFile(file_name)
dfs = {sh: xl.parse(sh) for sh in xl.sheet_names}
for key in dfs.keys():
    dfs[key] = pd.DataFrame()
Expected result:
The Excel workbook contains sheet1 and sheet2.
I need to create two dataframes, sheet1 and sheet2, containing all the columns of sheet1 and sheet2 respectively.
Result that I am getting:
I am able to create a dictionary with the sheet names as keys and the dataframes as values, but I need them all separately, out of the dictionary, as
dfs[sheet1]
dfs[sheet2]
I created a loop like this
for key in dfs.keys():
    dfs[key] = pd.DataFrame()
but it is creating a dataframe for the first key-value pair only.
df_sheet1

You can use the read_excel function to read each sheet from the Excel file:
import pandas as pd
xls = pd.ExcelFile('sample.xlsx')
dfs = {sh: pd.read_excel(xls, sh) for sh in xls.sheet_names}
This will create a dictionary of DataFrames corresponding to each sheet in the Workbook.
Source: https://stackoverflow.com/a/26521726/5236575
Edit:
Assuming you have sheet1 and sheet2 in your workbook, you can access them as
df_sheet1 = dfs['sheet1']
df_sheet2 = dfs['sheet2']
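The dictionary keys are the sheet names, so each sheet is already available as its own DataFrame. A minimal sketch, assuming openpyxl is installed as the Excel engine; the workbook is a throwaway file written to a temporary directory, and its sheet names and values are invented for the demo:

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook with two sheets, created only for this demo.
file_name = os.path.join(tempfile.mkdtemp(), "file.xlsx")
with pd.ExcelWriter(file_name) as writer:
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(writer, sheet_name="sheet1", index=False)
    pd.DataFrame({"c": [5, 6, 7]}).to_excel(writer, sheet_name="sheet2", index=False)

# One DataFrame per sheet, keyed by sheet name.
with pd.ExcelFile(file_name) as xls:
    dfs = {sh: pd.read_excel(xls, sheet_name=sh) for sh in xls.sheet_names}

# Pull individual DataFrames out of the dictionary by name.
df_sheet1 = dfs["sheet1"]
df_sheet2 = dfs["sheet2"]

print(list(dfs))                    # the sheet names
print(df_sheet1.columns.tolist())   # columns of the first sheet
print(df_sheet2.shape)              # (rows, columns) of the second sheet
```

Assigning pd.DataFrame() to dfs[key] in a loop, as in the question, replaces each parsed sheet with an empty DataFrame, so that loop can simply be dropped.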

Related

Multiple sheets of an Excel workbook into different dataframes using Pandas

I have an Excel workbook which has 5 sheets containing data.
I want each sheet to be a different dataframe.
I tried the code below for one sheet of my workbook:
df = pd.read_excel("path",sheet_name = ['Product Capacity'])
df
But this returns a dictionary containing the sheet, not a dataframe.
I need a dataframe.
Please suggest code that will return a dataframe.
If you want separate dataframes without a dictionary, you have to read the individual sheets (passing the sheet name as a string rather than as a list returns a DataFrame):
with pd.ExcelFile('data.xlsx') as xlsx:
    prod_cap = pd.read_excel(xlsx, sheet_name='Product Capacity')
    load_cap = pd.read_excel(xlsx, sheet_name='Load Capacity')
    # and so on
But you can also load all sheets and use a dict:
dfs = pd.read_excel('data.xlsx', sheet_name=None)
# dfs['Product Capacity']
# dfs['Load Capacity']
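The return type of read_excel depends on what is passed as sheet_name. A small sketch of the three cases, assuming openpyxl is installed; the workbook is a temporary file built for the demo, with sheet names borrowed from the question and invented data:

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"units": [10, 20]}).to_excel(writer, sheet_name="Product Capacity", index=False)
    pd.DataFrame({"tons": [1.5]}).to_excel(writer, sheet_name="Load Capacity", index=False)

# A string selects one sheet and returns a DataFrame.
as_frame = pd.read_excel(path, sheet_name="Product Capacity")
# A list returns a dict of DataFrames, even with a single name in it.
as_dict = pd.read_excel(path, sheet_name=["Product Capacity"])
# None returns a dict holding every sheet in the workbook.
all_sheets = pd.read_excel(path, sheet_name=None)

print(type(as_frame).__name__)
print(type(as_dict).__name__, list(as_dict))
print(list(all_sheets))
```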

pd.ExcelFile does not get the real sheet_names

I'm trying to read in an Excel file with multiple sheets (such that all columns are strings). The code below works for that, but it doesn't get the correct sheet names. So my dic_excel, which is a dictionary with all sheet names and the corresponding data, has the following keys: 'Sheet1', 'Sheet2', 'Sheet3', etc. But the actual names of the sheets are different. How do I get the actual names of the sheets?
dic_excel = {}
excel = pd.ExcelFile(excel_path)
for sheet in excel.sheet_names:
    print(sheet)
    columns = excel.parse(sheet).columns
    converters = {col: str for col in columns}
    dic_excel[sheet] = excel.parse(sheet, converters=converters)
Here are two ways to get the real names of your Excel sheets:
By using dict.keys on the result of pandas.read_excel
import pandas as pd
excel = pd.read_excel(excel_path, sheet_name=None)
dic_excel = excel.keys()
This will return a view of the dictionary's keys, which are the sheet names
By using Workbook.sheetnames with openpyxl
import openpyxl
wb = openpyxl.load_workbook(excel_path)
list_excel = wb.sheetnames
This will return a list of the sheetnames
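For the original goal of reading every sheet with all columns as strings, one possible shortcut is to combine sheet_name=None with dtype=str, which avoids parsing each sheet twice. This is only a sketch, assuming openpyxl is installed; the workbook, its sheet names and its values are invented for the demo:

```python
import os
import tempfile

import pandas as pd

# Hypothetical workbook with non-default sheet names.
excel_path = os.path.join(tempfile.mkdtemp(), "book.xlsx")
with pd.ExcelWriter(excel_path) as writer:
    pd.DataFrame({"id": [1, 2], "price": [9.5, 12.25]}).to_excel(writer, sheet_name="Orders", index=False)
    pd.DataFrame({"id": [7], "name": ["x"]}).to_excel(writer, sheet_name="Customers", index=False)

# sheet_name=None loads every sheet; dtype=str converts all cell values to strings.
dic_excel = pd.read_excel(excel_path, sheet_name=None, dtype=str)

# The keys are the sheet names as stored in the workbook.
for sheet, frame in dic_excel.items():
    print(sheet, frame.iloc[0].tolist())
```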

Extract Partial Data from multiple excel sheets in the same workbook using pandas

I have an Excel workbook with more than 200 sheets of data. Sheet names are as shown in the figure. I would like to assign each sheet to an individual variable as a data frame and later extract some required data from each sheet. The extracted information from all the sheets needs to be stored in a single Excel sheet. As I cannot keep writing this 200 times, I would like to know if I can write a function or use a for loop to automate this process.
df1 = pd.read_excel("C:\\Users\\RECL\\Documents\\PRADYUMNA\\Experiment Data\\CNN\\CCCV Data.xlsx", sheet_name=5)
df2 = pd.read_excel("C:\\Users\\RECL\\Documents\\PRADYUMNA\\Experiment Data\\CNN\\CCCV Data.xlsx", sheet_name=10)
df3 = pd.read_excel("C:\\Users\\RECL\\Documents\\PRADYUMNA\\Experiment Data\\CNN\\CCCV Data.xlsx", sheet_name=15)
df1 = df1[0::100]
df2 = df2[0::200]
df3 = df3[0::300]
df1
i=0
for i in range(0,1035), i+5 :
    df = pd.read_excel(xlsx, sheet_name=i)
    df
I tried something like this but it isn't working. Please let me know if there is any simple way to do it.
Thank you :)
Not sure exactly what you are trying to do, but an easier way to traverse the sheet names would be a for-each loop over an ExcelFile object (called xlsx here; path is a placeholder for the path to your workbook):
xlsx = pd.ExcelFile(path)
for sheet in xlsx.sheet_names:
    pass  # do something with the sheet here
Now you can do something for all the sheets no matter their name.
Regarding "I would like to assign each sheet to an individual variable", you could use a dictionary:
sheets = {}
for sheet in xlsx.sheet_names:
    sheets[sheet] = pd.read_excel(xlsx, sheet)
Now to get a sheet from the dictionary sheets:
sheets.get("15")
Or to traverse all the sheets:
for name, df in sheets.items():
    # do something with each sheet, e.g.
    print(name)
    print(df)
This will print the name and the data of each sheet in sheets.
Hope this helps / brings you further
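The extraction step from the question (keeping every n-th row of each sheet and collecting everything in one table) could look roughly like this. It is a sketch only: the sheets dictionary below stands in for the result of reading the workbook, the helper every_nth_row is a made-up name, and the step of 5 is arbitrary:

```python
import pandas as pd

# Stand-in for a dictionary of sheets read from the workbook
# (sheet name -> DataFrame); names and values are invented.
sheets = {
    str(n): pd.DataFrame({"t": range(10), "v": [n * x for x in range(10)]})
    for n in (5, 10, 15)
}


def every_nth_row(frame, step):
    """Keep rows 0, step, 2*step, ... of a DataFrame."""
    return frame.iloc[::step]


# Extract part of every sheet and tag each row with the sheet it came from.
extracted = [every_nth_row(frame, 5).assign(sheet=name) for name, frame in sheets.items()]
combined = pd.concat(extracted, ignore_index=True)

print(combined)
```

Calling combined.to_excel(...) afterwards would store the collected rows in a single output sheet.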

Convert Excel file with many sheets (with spaces in the sheet names) into a pandas data frame

I would like to convert an Excel file to a pandas dataframe. All the sheet names contain spaces, for instance 'part 1 of 22', 'part 2 of 22', and so on. In addition, the first column is the same in all the sheets.
I would like to convert this Excel file into a single dataframe. However, I don't know what happens to the names in Python. I mean, I was able to import the sheets, but I do not know the names of the resulting dataframes.
After this I would like to use another 'for' loop with pd.merge() in order to create a single dataframe.
for sheet_name in Matrix.sheet_names:
    sheet_name = pd.read_excel(Matrix, sheet_name)
    print(sheet_name.info())
Using only the code snippet you have shown, each sheet (each DataFrame) will be assigned to the variable sheet_name. Thus, this variable is overwritten on each iteration and you will only have the last sheet as a DataFrame assigned to that variable.
To achieve what you want to do you have to store each sheet, loaded as a DataFrame, somewhere, a list for example. You can then merge or concatenate them, depending on your needs.
Try this:
all_my_sheets = []
for sheet_name in Matrix.sheet_names:
    # use a separate name for the DataFrame so the loop variable is not overwritten
    df = pd.read_excel(Matrix, sheet_name)
    all_my_sheets.append(df)
Or, even better, using list comprehension:
all_my_sheets = [pd.read_excel(Matrix, sheet_name) for sheet_name in Matrix.sheet_names]
You can then concatenate them into one DataFrame like this:
final_df = pd.concat(all_my_sheets, sort=False)
You might consider using the openpyxl package:
from openpyxl import load_workbook
import pandas as pd
# file_path is the path to your workbook
wb = load_workbook(filename=file_path, read_only=True)
# Assuming all sheets have the same header in their first row
header = None
records = []
for ws in wb.worksheets:
    rows = ws.iter_rows(values_only=True)
    # Read the header row of each sheet, but keep only the first one
    # so the header is not duplicated in the data
    first_row = next(rows)
    if header is None:
        header = list(first_row)
    # Everything after the header row is data
    records.extend(rows)
# Create your df
df = pd.DataFrame(records, columns=header)
It may be easiest to call read_excel() once and save the contents into a dictionary of dataframes.
So, the first step would look like this (file_path is a placeholder for the path to your workbook):
dfs = pd.read_excel(file_path, sheet_name=["Sheet 1", "Sheet 2", "Sheet 3"])
Note that the sheet names you use in the list should be the same as those in the Excel file. Then, if you wanted to vertically concatenate these sheets, you would just call:
final_df = pd.concat(list(dfs.values()), axis=0)
Note that this solution would result in a final_df that includes column headers from all three sheets. So, ideally they would be the same. It sounds like you want to merge the information, which would be done differently; we can't help you with the merge without more information.
I hope this helps!
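If the shared first column is meant to act as a join key, the merge the question asks about could be sketched as below. This assumes the first column has the same name in every sheet and holds matching values; the sheets dictionary stands in for the sheets read from the workbook, and its names and data are invented:

```python
from functools import reduce

import pandas as pd

# Stand-in for the sheets read from the workbook (sheet name -> DataFrame).
sheets = {
    "part 1 of 3": pd.DataFrame({"id": [1, 2, 3], "x": [10, 20, 30]}),
    "part 2 of 3": pd.DataFrame({"id": [1, 2, 3], "y": [0.1, 0.2, 0.3]}),
    "part 3 of 3": pd.DataFrame({"id": [1, 2, 3], "z": ["a", "b", "c"]}),
}

# Use the first column of the first sheet as the join key
# (assumed to be identical in all sheets).
key = next(iter(sheets.values())).columns[0]

# Merge the sheets pairwise, left to right, on the shared key column.
merged = reduce(
    lambda left, right: pd.merge(left, right, on=key, how="outer"),
    sheets.values(),
)

print(merged)
```

An outer join keeps rows whose key appears in only some sheets; how="inner" would keep only the keys present in all of them.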

Can I modify specific sheet from Excel file and write back to the same without modifying other sheets using Pandas | openpyxl

I'll try to explain my problem with an example:
Let's say I have an Excel file test.xlsx which has five tabs (aka worksheets): Sheet1, Sheet2, Sheet3, Sheet4 and sheet5. I am interested to read and modify data in sheet2.
My sheet2 has some columns whose cells are dropdowns and those dropdown values are defined in sheet4 and sheet5. I don't want to touch sheet4 and sheet5. (I mean sheet4 & sheet5 have some references to cells on Sheet2).
I know that I can read all the sheets in the Excel file using pd.read_excel('test.xlsx', sheet_name=None), which gives all the sheets as a dictionary (OrderedDict) of DataFrames.
Now I want to modify my sheet2 and save it without disturbing the others. Is it possible to do this using the Python pandas library?
[UPDATE - 4/1/2019]
I am using pandas read_excel to read whatever sheet I need from my Excel file, validating the data against the data in the database and updating the status column in the Excel file.
So for writing back the status column in Excel I am using openpyxl, as shown in the pseudo code below.
import pandas as pd
import openpyxl
df = pd.read_excel(input_file, sheet_name=my_sheet_name)
df = df.where((pd.notnull(df)), None)
write_data = {}
# Doing some validations with the data and building my write_data with key
# as (row_number, column_number) and value as actual value to put in that
# cell.
At the end my write_data looks something like this:
{(2,1): 'Hi', (2,2): 'Hello'}
Now I have defined a separate class named WriteData for writing data using openpyxl
# WriteData(input_file, sheet_name, write_data)
book = openpyxl.load_workbook(input_file, data_only=True, keep_vba=True)
sheet = book.get_sheet_by_name(sheet_name)
for k, v in write_data.items():
    row_num, col_num = k
    sheet.cell(row=row_num, column=col_num).value = v
book.save(input_file)
Now when I am doing this operation it is removing all the formulas and diagrams. I am using openpyxl 2.6.2
Please correct me if I am doing anything wrong! Is there a better way to do this?
Any help on this will be greatly appreciated :)
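To modify a single sheet at a time, you can use the pandas Excel writer in append mode, so that the other sheets are kept (if_sheet_exists requires pandas 1.3 or later and the openpyxl engine):
sheet2 = pd.read_excel("test.xlsx", sheet_name="sheet2")
## modify sheet2 as needed, then save it back:
with pd.ExcelWriter("test.xlsx", mode="a", engine="openpyxl", if_sheet_exists="replace") as writer:
    sheet2.to_excel(writer, sheet_name="sheet2", index=False)
Note that the replaced sheet is rewritten from the DataFrame, so cell formatting and dropdowns on sheet2 itself are not preserved.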
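For the cell-by-cell openpyxl route described in the question, one likely cause of the lost formulas is data_only=True: openpyxl then loads the cached values instead of the formulas, and saving writes those values back. Below is a sketch of the same write loop without that flag, assuming openpyxl is installed. The workbook is a temporary file made up for the demo. Charts and images may still be dropped, since openpyxl does not read every kind of object in a workbook.

```python
import os
import tempfile

import openpyxl

# Hypothetical workbook: a formula on sheet2 plus a second, untouched sheet.
input_file = os.path.join(tempfile.mkdtemp(), "test.xlsx")
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "sheet2"
ws["A1"] = 1
ws["B1"] = "=A1*2"
wb.create_sheet("sheet4")["A1"] = "keep me"
wb.save(input_file)

# (row, column) -> value, in the shape used by the question.
write_data = {(2, 1): "Hi", (2, 2): "Hello"}

# Load WITHOUT data_only=True so formulas stay formulas.
book = openpyxl.load_workbook(input_file)
sheet = book["sheet2"]
for (row_num, col_num), value in write_data.items():
    sheet.cell(row=row_num, column=col_num).value = value
book.save(input_file)

# Reload to confirm the formula and the other sheet survived.
check = openpyxl.load_workbook(input_file)
print(check["sheet2"]["B1"].value)
print(check["sheet4"]["A1"].value)
```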
