Pandas concat dataframe per excel file

My code reads multiple Excel files in a directory, and every file has more than 10 sheets. I need to exclude some sheets from every file and extract the rest.
I get all the data I need, but every sheet produces a new DataFrame even though I use concat, so when I save to JSON only the last sheet's DataFrame per file is saved instead of the whole data.
Here's my code:
excluded_sheet = ['Sheet 2', 'Sheet 6']
for index, xls_path in enumerate(file_paths):
    data_file = pd.ExcelFile(xls_path)
    sheets = [sheet for sheet in data_file.sheet_names if sheet not in excluded_sheet]
    for sheet_name in sheets:
        file = xls_path.rfind(".")
        head, tail = os.path.split(xls_path[1:file])
        df = pd.concat([pd.read_excel(xls_path, sheet_name=sheet_name, header=None)], ignore_index=True)
        df.insert(loc=0, column='sheet name', value=sheet_name)
        pd.DataFrame(df.to_json(f"{json_folder_path}{tail}.json", orient='records', indent=4))
I didn't use sheet_name=None because I need to read each sheet name and add it as a column value.
Current state of my DataFrames: I get many DataFrames because every sheet creates a new one, instead of only 2 DataFrames for the 2 files in the directory. Thanks for your help.

You can use a list comprehension to read all the sheets, then join them into one DataFrame per file:
...
...
    sheets = [sheet for sheet in data_file.sheet_names if sheet not in excluded_sheet]
    file = xls_path.rfind(".")
    head, tail = os.path.split(xls_path[1:file])
    dfs = [pd.read_excel(xls_path, sheet_name=sheet_name, header=None) for sheet_name in sheets]
    # keys=sheets puts the sheet name in index level 0; level 1 is each sheet's row number
    df = pd.concat(dfs, keys=sheets)
    # drop the row numbers (level 1) and turn the sheet names into a column
    df = df.reset_index(level=1, drop=True).rename_axis('sheet name').reset_index()
    # to_json writes the file itself, so no pd.DataFrame wrapper is needed
    df.to_json(f"{json_folder_path}{tail}.json", orient='records', indent=4)
Or collect the DataFrames in a helper list dfs inside the loop, then call concat once outside the loop:
...
...
    sheets = [sheet for sheet in data_file.sheet_names if sheet not in excluded_sheet]
    dfs = []
    for sheet_name in sheets:
        df = pd.read_excel(xls_path, sheet_name=sheet_name, header=None)
        df.insert(loc=0, column='sheet name', value=sheet_name)
        dfs.append(df)
    # after the sheet loop: one DataFrame and one JSON file per Excel file
    file = xls_path.rfind(".")
    head, tail = os.path.split(xls_path[1:file])
    df1 = pd.concat(dfs, ignore_index=True)
    df1.to_json(f"{json_folder_path}{tail}.json", orient='records', indent=4)
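
For reference, the second approach can be sketched end to end without a workbook on disk. In this minimal sketch, the workbooks dict stands in for what pd.read_excel(..., header=None) would return per sheet, and combine_sheets, the sample data, the file names, and the temporary json_folder_path are all hypothetical. It also derives the output name with os.path.splitext and os.path.basename, which may be easier to read than slicing with rfind.

```python
import json
import os
import tempfile

import pandas as pd

excluded_sheet = ['Sheet 2', 'Sheet 6']

# Stand-in for the sheets of two Excel files (assumed data, not from the question).
workbooks = {
    '/data/file_a.xlsx': {
        'Sheet 1': pd.DataFrame([[1, 'a'], [2, 'b']]),
        'Sheet 2': pd.DataFrame([[9, 'z']]),
        'Sheet 3': pd.DataFrame([[3, 'c']]),
    },
    '/data/file_b.xlsx': {
        'Sheet 1': pd.DataFrame([[4, 'd']]),
        'Sheet 6': pd.DataFrame([[8, 'y']]),
    },
}


def combine_sheets(sheets, excluded):
    """Return one DataFrame holding every non-excluded sheet."""
    dfs = []
    for sheet_name, df in sheets.items():
        if sheet_name in excluded:
            continue
        df = df.copy()  # leave the caller's DataFrame untouched
        df.insert(loc=0, column='sheet name', value=sheet_name)
        dfs.append(df)
    # concat once, after every sheet has been collected
    return pd.concat(dfs, ignore_index=True)


json_folder_path = tempfile.mkdtemp()  # hypothetical output folder
written = {}
for xls_path, sheets in workbooks.items():
    # 'file_a' from '/data/file_a.xlsx'
    stem = os.path.splitext(os.path.basename(xls_path))[0]
    out_path = os.path.join(json_folder_path, f"{stem}.json")
    combine_sheets(sheets, excluded_sheet).to_json(out_path, orient='records', indent=4)
    written[stem] = out_path

print(sorted(written))
```

Each input file ends up with exactly one JSON file, because the write happens once per file rather than once per sheet.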

Related

Multiple sheets of an Excel workbook into different dataframes using Pandas

I have an Excel workbook with 5 sheets containing data.
I want each sheet to be a different DataFrame.
I tried the code below for one sheet of my workbook:
df = pd.read_excel("path", sheet_name=['Product Capacity'])
df
But this returns a dictionary containing the sheet, not a DataFrame.
I need a DataFrame.
Please suggest code that will return a DataFrame.

If you want separate DataFrames without a dictionary, you have to read the sheets individually (passing sheet_name as a string rather than a list returns a DataFrame):
with pd.ExcelFile('data.xlsx') as xlsx:
    prod_cap = pd.read_excel(xlsx, sheet_name='Product Capacity')
    load_cap = pd.read_excel(xlsx, sheet_name='Load Capacity')
    # and so on
But you can also load all sheets and use a dict:
dfs = pd.read_excel('data.xlsx', sheet_name=None)
# dfs['Product Capacity']
# dfs['Load Capacity']

How to read separate Excel sheets into separate DataFrames?

I have an Excel file with 13 tabs, and I want to write a function that takes specified sheets from the file, converts them into separate DataFrames, then bundles them into a list of DataFrames. In this case, I want to take the sheets labeled 'tblProviderDetails', 'tblSubmissionStatus', and 'Data Validation Ref Data', convert them into DataFrames, and make a list. I want the DataFrames in a list because I eventually want to take the input DataFrames and return a dictionary, which will then be used to create a YAML file.
This is ultimately what I want:
dfs = [ 'tblProviderDetails', 'tblSubmissionStatus', 'Data Validation Ref Data']
I want to use a user-defined function so that I have the flexibility to load any sheet, and any number of sheets, into a list.
I was able to write a function that converts a single specified sheet to a DataFrame, but I'm not sure how to load any number of sheets from the Excel file or how to create a list within the function. This is as far as I've gotten:
def read_excel(path, sheet_name, header):
    dfs = pd.read_excel(path, sheet_name=sheet_name, header=header)
    return dfs
df1 = read_excel(path=BASEDIR, sheet_name='tblProviderDetails', header=2)
df2 = read_excel(path=BASEDIR, sheet_name='tblSubmissionStatus', header=2)
df3 = read_excel(path=BASEDIR, sheet_name='Data Validation Ref Data', header=2)
Thank you for your help.

There are multiple ways to do this, but perhaps the simplest is to first get all the sheet names, then loop over them, load each sheet into a DataFrame, and append it to the required list.
dfList = []
def read_excel(path, h):
    xls = pd.ExcelFile(path)
    # Now you can access all sheet names in the file
    sheetsList = xls.sheet_names
    # ['sheet1', 'sheet2', ...]
    for sheet in sheetsList:
        dfList.append(pd.read_excel(path, sheet_name=sheet, header=h))
read_excel('book.xlsx', 2)
print(dfList)

You can pass a list of sheet names and/or sheet numbers to the sheet_name parameter. The result is a dictionary of DataFrames keyed by sheet name.
def read_excel(path, sheet_name, header):
    sheet_name = ['tblProviderDetails', 'tblSubmissionStatus', 'Data Validation Ref Data']
    dfs = pd.read_excel(path, sheet_name=sheet_name, header=header)
    return dfs
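
Since the stated goal is a dictionary that can later be dumped to YAML, one possible next step is to turn each DataFrame into a list of plain records. This is only a minimal sketch, assuming the frames are keyed by sheet name as pd.read_excel returns them when sheet_name is a list; the helper name frames_to_dict and the sample data are hypothetical.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=[...], header=2) (assumed data).
frames = {
    'tblProviderDetails': pd.DataFrame({'id': [1, 2], 'name': ['A', 'B']}),
    'tblSubmissionStatus': pd.DataFrame({'id': [1], 'status': ['ok']}),
}


def frames_to_dict(frames, sheet_names):
    """Map each requested sheet name to a list of row dictionaries."""
    return {name: frames[name].to_dict(orient='records') for name in sheet_names}


wanted = ['tblProviderDetails', 'tblSubmissionStatus']
result = frames_to_dict(frames, wanted)

# The list of DataFrames the question asks for, in the requested order.
dfs = [frames[name] for name in wanted]

print(result['tblSubmissionStatus'])
```

Keeping the sheet names as dictionary keys means the caller can request any number of sheets without the function needing one variable per sheet.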

excel sheets name in pandas dataframe

I have an Excel workbook that I have already loaded, concatenating all of its sheets. Now I would like to add a column holding the name of each row's original sheet. I don't know whether I have to do this before concatenating, or how I could do it. I am using pandas. This is my code so far; I want the sheet name or number to be in the "Week" column.
xlsx = pd.ExcelFile('archivo.xlsx')
hojas = []
for hojaslibro in xlsx.sheet_names:
    hojas.append(xlsx.parse(hojaslibro))
estado = pd.concat(hojas, ignore_index=True)
estado['Week'] = 0

This should work:
xl = pd.ExcelFile('archivo.xlsx')
frames = []
for sheet_name in xl.sheet_names:
    df = xl.parse(sheet_name)
    df['Week'] = sheet_name  # this adds `sheet_name` into the column `Week`
    frames.append(df)
# DataFrame.append was removed in pandas 2.0, so collect the frames and concat once
df_combined = pd.concat(frames, ignore_index=True)
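
The same idea can also be written without an explicit loop. A minimal sketch, assuming the sheets are already available as a dict of DataFrames keyed by sheet name (which is what pd.read_excel returns with sheet_name=None); the hojas dict and its sample data below are made up for illustration.

```python
import pandas as pd

# Stand-in for pd.read_excel('archivo.xlsx', sheet_name=None) (assumed data).
hojas = {
    'Week 1': pd.DataFrame({'ventas': [10, 20]}),
    'Week 2': pd.DataFrame({'ventas': [30]}),
}

# assign() returns a copy with the extra column, so the original frames are unchanged.
estado = pd.concat(
    [df.assign(Week=nombre) for nombre, df in hojas.items()],
    ignore_index=True,
)

print(estado)
```

The column has to be added before concatenating, because after pd.concat with ignore_index=True nothing in the result records which sheet a row came from.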

Adding sheet name to the concatenated final merged sheet in excel

I want to merge multiple Excel sheets into one and add a new column with the name of the original sheet.
I'm using the following code:
list_of_sheets = list(df.keys())
cdf = pd.concat(df[sheet] for sheet in list_of_sheets)
# tried
cdf = pd.concat(df[sheet]["Brand"] for sheet in list_of_sheets)
# and
list_of_sheets = list(df.keys())
for sheet in list_of_sheets:
    df[sheet]["Brand"] = sheet
cdf = pd.concat(df[sheet])
but none of them works.

Does this accomplish what you want?
import pandas as pd
pd.concat(pd.read_excel("my_excel_file.xlsx", sheet_name=None))
The sheet names will be the outer level of the resulting DataFrame's index.
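
If the sheet name should end up in a regular column such as "Brand" rather than in the index, the index level can be named and then moved into the columns. A minimal sketch, assuming df is a dict of DataFrames keyed by sheet name as in the question; the brand names, the price column, and the level name 'row' are invented for illustration.

```python
import pandas as pd

# Stand-in for pd.read_excel("my_excel_file.xlsx", sheet_name=None) (assumed data).
df = {
    'Acme': pd.DataFrame({'price': [1.0, 2.0]}),
    'Globex': pd.DataFrame({'price': [3.0]}),
}

cdf = (
    pd.concat(df, names=['Brand', 'row'])  # dict keys become the outer index level
    .reset_index(level='Brand')            # move that level into a "Brand" column
    .reset_index(drop=True)                # discard the per-sheet row numbers
)

print(cdf)
```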

First read the file:
xl = pd.ExcelFile(file)
Which should produce the following:
<pandas.io.excel.ExcelFile at 0x12cad0860>
Then iterate over the sheets, add the sheet name as a separate column, and store all the DataFrames in a list:
dfs = []
for sheet in xl.sheet_names:
    df = xl.parse(sheet)
    df['sheet_name'] = sheet
    dfs.append(df)
Finally, concatenate them:
pd.concat(dfs)

How to combine multiple Excel files that each have the same number of sheets

I am currently able to combine multiple Excel files that have one sheet each.
I want to combine multiple Excel files that each have two different sheets, and give a name to each sheet in the output. How can I achieve this?
Below is my current code, which combines the single sheet from multiple Excel files without naming the sheet in the combined Excel file:
import pandas as pd
# filenames
excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"]
# read them in
excels = [pd.ExcelFile(name) for name in excel_names]
# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]
# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]
# concatenate them..
combined = pd.concat(frames)
# write it out
combined.to_excel("c.xlsx", header=False, index=False)

First combine the first sheets and the second sheets separately:
import pandas as pd
# filenames
excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"]
def combine_excel_to_dfs(excel_names, sheet_name):
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df
df_first = combine_excel_to_dfs(excel_names, 0)
df_second = combine_excel_to_dfs(excel_names, 1)
Then use pd.ExcelWriter to write both DataFrames to the same Excel file:
# Create a pandas Excel writer using XlsxWriter as the engine.
# The with block saves and closes the file on exit (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('two_sheets_combined.xlsx', engine='xlsxwriter') as writer:
    # Write each DataFrame to a different worksheet.
    df_first.to_excel(writer, sheet_name='Sheet1')
    df_second.to_excel(writer, sheet_name='Sheet2')

You can use:
# number of sheets
N = 2
# get all sheets into nested lists (excels is the list of pd.ExcelFile objects from the question)
frames = [[x.parse(y, index_col=None) for y in x.sheet_names] for x in excels]
# print(frames)
# combine the first DataFrame of each file, then the second DataFrame of each file, ...
combined = [pd.concat([x[i] for x in frames], ignore_index=True) for i in range(N)]
# print(combined)
# write to file; the with block saves and closes the workbook (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('c.xlsx', engine='xlsxwriter') as writer:
    for i, x in enumerate(combined):
        x.to_excel(writer, sheet_name='Sheet{}'.format(i + 1))
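
The grouping step can also be expressed with zip, which avoids hard-coding the number of sheets. This is only a sketch under the assumption that every file has the same number of sheets in the same order; the nested frames list below is made-up data standing in for the parsed sheets.

```python
import pandas as pd

# Stand-in for the parsed sheets of three workbooks (assumed data):
# frames[i][j] is sheet j of file i.
frames = [
    [pd.DataFrame({'a': [1]}), pd.DataFrame({'b': [10]})],
    [pd.DataFrame({'a': [2]}), pd.DataFrame({'b': [20]})],
    [pd.DataFrame({'a': [3]}), pd.DataFrame({'b': [30]})],
]

# zip(*frames) groups the first sheets together, then the second sheets, and so on.
combined = [pd.concat(list(group), ignore_index=True) for group in zip(*frames)]

for i, sheet_df in enumerate(combined, start=1):
    print(f"Sheet{i}: {len(sheet_df)} rows")
```

Note that zip stops at the shortest file, so a workbook with fewer sheets would silently truncate the result.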
