Put two dataframes on the same sheet in Excel - Python

I am doing some analysis on several different categories. I want all the analysis to be on the same tab in a spreadsheet. I have two dataframes for the information, but their columns and contents are different.
dataframe 1
colA colB calC
row 1
row 2
row 3
dataframe 2
colD colE calD
row 1
row 2
row 3
I want to export both of these dataframes to one Excel sheet, one after the other. The analyses are different lengths, and I want dataframe 2 to be right below dataframe 1 on the sheet.

import pandas as pd
# mode='a' appends to the existing workbook; if_sheet_exists='overlay' needs pandas >= 1.4
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a', if_sheet_exists='overlay') as writer:
    # max_row is the last used row, so writing starts on the next free row
    startrow = writer.sheets['Sheet1'].max_row
    df1.to_excel(writer, sheet_name='Sheet1', startrow=startrow, index=False, header=False)
# Repeat the same steps for any further dataframes.
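If there is no existing workbook to append to, the same idea can be sketched with a single writer: write the first frame, then use startrow to place the second one below it. The data values, the file name two_frames.xlsx, and the one-row gap are assumptions made up for illustration, not part of the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Illustrative data; the column names echo the question, the values are made up.
df1 = pd.DataFrame({"colA": [1, 2, 3], "colB": [4, 5, 6], "colC": [7, 8, 9]})
df2 = pd.DataFrame({"colD": ["x", "y"], "colE": ["p", "q"]})

path = os.path.join(tempfile.mkdtemp(), "two_frames.xlsx")  # hypothetical file name
gap = 1  # blank rows between the two tables (an arbitrary choice)

with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df1.to_excel(writer, sheet_name="Sheet1", index=False)
    # df1 takes one header row plus len(df1) data rows, starting at row 0.
    second_start = len(df1) + 1 + gap
    df2.to_excel(writer, sheet_name="Sheet1", startrow=second_start, index=False)

# Read the sheet back with openpyxl to check where the second header landed.
ws = load_workbook(path)["Sheet1"]
second_header = ws.cell(row=second_start + 1, column=1).value  # openpyxl rows are 1-based
print(second_header)
```

Because startrow is computed from len(df1), the frames can have different lengths and different columns without any renaming.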

You can add an extra row to the second DataFrame whose values are its own column names, and then simply use pd.concat():

df2.columns = df1.columns
pd.concat([df1, df2])
First make the columns of both dataframes the same, then use pd.concat to append df2 to the end of df1.
You can create a new dataframe from this and export it to CSV:
df = pd.concat([df1, df2])
df.to_csv('filename.csv')
If you also want the header of the second dataframe in the final CSV file, create df2 like this:
df2 = pd.read_csv('df2.csv', names=df1.columns)

import numpy as np
import pandas as pd
# Convert the column names into the first data row
df1 = pd.DataFrame(np.vstack([df1.columns, df1]))
# Same with the other dataframe
df2 = pd.DataFrame(np.vstack([df2.columns, df2]))
# Concatenate the dataframes and save to Excel without index or column headers
pd.concat((df1, df2)).to_excel('filename.xlsx', header=False, index=False)

Related

Add empty rows at the beginning of dataframe before export to xlsx

I have a pandas dataframe and I need to add 3 blank rows above the column headers before exporting it to xlsx.
I'm using this code based on this question:
df1 = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns)
df = pd.concat([df1, df], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
But it adds the rows at index 0, and I need the blank rows before the row with the column names in the xlsx.
Is this possible to do?
Use the startrow parameter to skip the first N rows:
N = 3
df.to_excel('data.xlsx', sheet_name='Sheet1', startrow=N, index=False)

Is there a pandas function that can read multiple excel sheets but with only sheet1 having a header

Here is my code to read multiple sheets.
df = pd.read_excel('excelfile.xls',sheet_name=['Sheet1','Sheet2','Sheet3'])
But only sheet1 has a header. Sheet2 and sheet3 have no header.
You can read the first sheet with header and the remaining sheets without. Apply the first sheet's column header to the remaining sheets and concatenate the lot. Since dict values enumerate in insertion order, the sheet read order should be the same. Alternately you could sort by sheet name or other criteria.
import pandas as pd
sheets = pd.read_excel('excelfile.xls', sheet_name=['Sheet1'])
columns = sheets["Sheet1"].columns
sheets.update(pd.read_excel('excelfile.xls', header=None,
                            sheet_name=['Sheet2', 'Sheet3']))
for sheet in sheets.values():
    sheet.columns = columns
df = pd.concat(sheets.values())
print(df)
Read the first sheet (df) with its header.
Read the second sheet (df2) without a header.
Add the header to df2:
df2.columns = ['A', 'B']
Append df2 to df:
df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0

Adding rows with excel countif condition before the headline row of a pandas dataframe

So I have this pandas dataframe (after pivot action):
I'd like to add two COUNTIF rows at the top: the first counts the values equal to 0 and the second counts the values greater than 0, which would give me this result:
Note that to get the dataframe I use:
df = pd.DataFrame(data)
df.columns = ['patient_id', 'id', 'date', 'num', 'num_valid', 'db_valid']
df = df.pivot(index='id', columns='date')['num_valid']
The end result is exported to excel:
with pd.ExcelWriter('report.xlsx') as writer:  # the file needs an extension so pandas can pick an engine
    df.to_excel(writer, sheet_name='report')
I had to use openpyxl for this.
After creating the Excel file with pandas.ExcelWriter, open it with openpyxl, add rows as needed with insert_rows, and then set the value of each cell.
The COUNTIF calculations are done beforehand in the code rather than in Excel itself.
Not the most elegant solution, but it works.
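A rough sketch of that approach, assuming a small made-up pivot table in place of the real data. The file name report.xlsx and the two row labels are hypothetical; the counts are computed in pandas and written as plain values, not as Excel formulas.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

# Made-up pivot-like data standing in for the question's table.
df = pd.DataFrame(
    {"2020-01-01": [0, 3, 0], "2020-01-02": [5, 0, 2]},
    index=pd.Index([101, 102, 103], name="id"),
)

# Compute the counts in pandas instead of using COUNTIF in Excel.
zero_counts = (df == 0).sum()
positive_counts = (df > 0).sum()

path = os.path.join(tempfile.mkdtemp(), "report.xlsx")  # hypothetical file name
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="report")

wb = load_workbook(path)
ws = wb["report"]
ws.insert_rows(1, amount=2)  # push the header row down by two rows

ws.cell(row=1, column=1, value="count == 0")  # hypothetical labels
ws.cell(row=2, column=1, value="count > 0")
# Column 1 holds the index, so the data columns start at column 2.
for offset, col in enumerate(df.columns, start=2):
    ws.cell(row=1, column=offset, value=int(zero_counts[col]))
    ws.cell(row=2, column=offset, value=int(positive_counts[col]))
wb.save(path)
```

An alternative that avoids reopening the file would be to write the dataframe with startrow=2 and fill the first two rows through the writer's worksheet, but the version above follows the steps described in the answer.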

How Do I Sort Same Columns Across Multiple Sheets?

I have a spreadsheet with 12 tabs, one for each month. They have the exact same columns, but possibly in a different order. Eventually, I want to combine all 12 tabs into one dataset and export a file. I know how to do everything except make sure the columns match before merging the datasets together.
Here's what I have so far:
# Import the Excel file and create an ordered dictionary of all sheets
sheets_dict = pd.read_excel("Monthly Campaign Data.xlsx", sheet_name=None, parse_dates=["Date", "Create Date"])
I want to do this for every sheet:
sorted(sheets_dict["January"].columns)
and combine it with this and capitalize each column:
new_df = pd.DataFrame()
for name, sheet in sheets_dict.items():
    sheet['sheet'] = name
    sheet = sheet.rename(columns=lambda x: x.title().split('\n')[-1])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    new_df = pd.concat([new_df, sheet])
new_df.reset_index(inplace=True, drop=True)
print(new_df)
If all the sheets have exactly the same columns, pd.concat() aligns those columns by name and concatenates all the DataFrames, so the column order within each sheet does not matter.
Then you can group the DataFrame by year and sort each part.
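A small sketch of that alignment, using two made-up sheets in place of the workbook. The column names Clicks and Cost are invented for illustration; in the real case sheets_dict would come from pd.read_excel(..., sheet_name=None).

```python
import pandas as pd

# Two made-up "monthly" sheets with the same columns in a different order.
sheets_dict = {
    "January": pd.DataFrame({"Date": ["2020-01-05"], "Clicks": [10], "Cost": [1.5]}),
    "February": pd.DataFrame({"Cost": [2.0], "Date": ["2020-02-07"], "Clicks": [20]}),
}

# Passing a dict uses its keys as an outer index level; naming that level
# "sheet" and resetting it turns the sheet name into an ordinary column.
combined = (
    pd.concat(sheets_dict, names=["sheet"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)
print(combined)
```

The values from February end up under the right headers even though its columns were stored in a different order, so no explicit sorting of column names is needed before combining.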

Pandas - Write multiple dataframes to single excel sheet

I have a dataframe with 45 columns and 1000 rows. My requirement is to create a single excel sheet with the top 2 values of each column and their percentages (suppose col 1 has the value 'python' present 500 times in it, the percentage should be 50)
I used:
writer = pd.ExcelWriter('abc.xlsx')
df = pd.read_sql('select * from table limit 1000', <db connection string>)
column_list = df.columns.tolist()
df.fillna("NULL", inplace=True)
for obj in column_list:
    pd.DataFrame(df[obj].value_counts().nlargest(2)).to_excel(writer, sheet_name=obj)
writer.close()  # writer.save() was removed in pandas 2.0
This writes the output in separate excel tabs of the same document. I need them in a single sheet in the below format:
Column Name Value Percentage
col1 abc 50
col1 def 30
col2 123 40
col2 456 30
....
Let me know any other functions as well to get to this output.
The first thing that jumps out to me is that you are changing the sheet name each time by passing sheet_name=obj. If you get rid of that, that alone might fix your problem.
If not, I would suggest concatenating the results into one large DataFrame and then writing that DataFrame to Excel.
df_master = None
for obj in column_list:
    # Use a separate name so the source DataFrame df is not overwritten
    counts = pd.DataFrame(df[obj].value_counts().nlargest(2))
    if df_master is None:
        df_master = counts
    else:
        df_master = pd.concat([df_master, counts])
df_master.to_excel("abc.xlsx")
Here's more information on stacking/concatenating dataframes in Pandas
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
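To get closer to the Column Name / Value / Percentage layout from the question, one possible sketch is to build a small three-column frame per source column and concatenate them. The sample data below is made up to stand in for the SQL result, and using value_counts(normalize=True) for the percentage is an assumption about what the asker wants.

```python
import pandas as pd

# Made-up data standing in for the SQL result.
df = pd.DataFrame(
    {
        "col1": ["abc"] * 5 + ["def"] * 3 + ["ghi"] * 2,
        "col2": [123] * 4 + [456] * 3 + [789] * 2 + [0],
    }
)

parts = []
for col in df.columns:
    # normalize=True returns fractions; multiply by 100 for percentages.
    top = df[col].value_counts(normalize=True).nlargest(2).mul(100)
    parts.append(
        pd.DataFrame(
            {
                "Column Name": col,
                "Value": top.index.tolist(),
                "Percentage": top.tolist(),
            }
        )
    )

summary = pd.concat(parts, ignore_index=True)
print(summary)
# summary.to_excel("abc.xlsx", index=False) would then write everything to one sheet.
```

If two values are tied for second place, which one nlargest keeps depends on their order in the value_counts result, so the output for tied columns should be checked by hand.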
