One can write data to a specific cell, using:
xlsworksheet.write('B5', 'Hello')
But if you try to write a whole dataframe, df2, starting in cell 'B5':
xlsworksheet.write('B5', df2)
TypeError: Unsupported type <class 'pandas.core.frame.DataFrame'> in write()
What is the right way to write a whole dataframe starting at a specific cell?
I ask because I need to put 2 different pandas dataframes on the same sheet in Excel.
XlsxWriter doesn't write Pandas dataframes directly. However, it is integrated with Pandas so you can do it the other way around.
Here is a small example of writing 2 dataframes to the same worksheet using the startrow parameter of Pandas to_excel:
import pandas as pd
df1 = pd.DataFrame({'Data': [10, 20, 30, 40]})
df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet1', startrow=6)
writer.close()  # save the workbook to disk
You can turn off the column headers and the row index with the header=False and index=False options of to_excel.
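To tie this back to the original question (a frame starting at cell B5), here is a minimal sketch that uses both startrow and startcol. Both are zero-based offsets, so B5 corresponds to startrow=4, startcol=1. It assumes an Excel engine (XlsxWriter or openpyxl) is installed, and it writes to an in-memory buffer only to stay self-contained; in practice you would pass a file name instead.

```python
import io
import zipfile

import pandas as pd

df1 = pd.DataFrame({'Data': [10, 20, 30, 40]})
df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})

# In-memory stand-in for a path such as 'pandas_simple.xlsx' (illustrative choice).
buffer = io.BytesIO()

with pd.ExcelWriter(buffer) as writer:
    # df1 goes to the default position, cell A1, without the row index.
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    # startrow/startcol are zero-based: row 4, column 1 is cell B5.
    df2.to_excel(writer, sheet_name='Sheet1', startrow=4, startcol=1, index=False)

# Sanity check only: an .xlsx file is a zip archive, so the raw sheet XML
# shows which cell references were written.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    sheet_xml = archive.read('xl/worksheets/sheet1.xml').decode('utf-8')
```

With index=False the header of df2 lands in B5 and its four values in B6 to B9.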
You can only write scalar values such as strings and numbers to a cell. To convert a DataFrame to a string you have several options, of which to_csv looks like your best bet. Called without a path, it returns the CSV text:
>>> string1 = df1.to_csv()
>>> xlsworksheet.write('B5', string1)
>>> string2 = df2.to_csv()
>>> xlsworksheet.write('C5', string2)
Note that this will write your entire dataframe into one cell. The only way I know of to write a frame to individual cells in an Excel sheet is to combine the frames and then use to_excel:
>>> writer = pd.ExcelWriter('output.xlsx')
>>> frame = pd.concat([df1, df2])
>>> frame.to_excel(writer, sheet_name='Sheet1')
>>> writer.close()
Hope this helps...
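For what it's worth, pd.concat takes a list of frames and can combine them in two directions. A small sketch, reusing the df1 and df2 values from the example further up; the keys 'df1' and 'df2' are just labels picked for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({'Data': [10, 20, 30, 40]})
df2 = pd.DataFrame({'Data': [13, 24, 35, 46]})

# Stack the frames on top of each other and renumber the rows 0..7.
stacked = pd.concat([df1, df2], ignore_index=True)

# Put the frames next to each other; keys adds an outer column level
# so the two 'Data' columns stay distinguishable.
side_by_side = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
```

Either result is a single DataFrame, so it can then be written with one to_excel call.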
There are many similar questions to this one. I have looked through them all and I can't figure out how to fix my issue.
I have 11 dataframes. I would like to export all of these dataframes to one Excel file, with one sheet per dataframe. I have 2 lists: one is a list of dataframe objects, and one is a list of the names I want for each df. The lists are ordered so that iterating through both at the same time yields each df together with the name I want for it.
Here is my code:
for (df, df_name) in zip(df_list, df_name_list):
    sheetname = "{}".format(df_name)
    df.to_excel(r"myfolder\myfile.xlsx", index=False, sheet_name=sheetname)
It exports to excel, but it appears to overwrite the sheet each time. The final sheet has the same name as the final dataframe, so it looped through both lists but it won't save separate sheets. Any help would be much appreciated!
UPDATE - ISSUE FIXED - EDITING TO ADD THE CODE THAT WORKED
with pd.ExcelWriter(r"myfolder\myfile.xlsx") as writer:
    for (df, df_name) in zip(df_list, df_name_list):
        sheetname = "{}".format(df_name)
        df.to_excel(writer, sheet_name=sheetname)
I just tried something based on the docs example and it seems to work OK:
import pandas as pd
data = [(f'Sheet {j}', pd.DataFrame({'a': [i for i in range(20)], 'b': [j for i in range(20)]})) for j in range(10)]
with pd.ExcelWriter('output.xlsx') as writer:
    for sheet, df in data:
        df.to_excel(writer, sheet_name=sheet)
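The reason the original loop kept only the last sheet is that every df.to_excel(path, ...) call creates the file from scratch, whereas a single ExcelWriter keeps the workbook open across calls. Here is a small sketch of that pattern; the three tiny frames and their names are made-up stand-ins for df_list and df_name_list, and the in-memory buffer replaces the real path only to keep the example self-contained. It assumes an Excel engine (XlsxWriter or openpyxl) is installed.

```python
import io
import zipfile

import pandas as pd

# Hypothetical stand-ins for df_list and df_name_list from the question.
df_list = [
    pd.DataFrame({'a': [1, 2]}),
    pd.DataFrame({'a': [3, 4]}),
    pd.DataFrame({'a': [5, 6]}),
]
df_name_list = ['first', 'second', 'third']

buffer = io.BytesIO()  # in-memory stand-in for the real output path

# One writer for the whole loop, so each call adds a sheet
# instead of replacing the file.
with pd.ExcelWriter(buffer) as writer:
    for df, df_name in zip(df_list, df_name_list):
        df.to_excel(writer, sheet_name=df_name, index=False)

# Sanity check only: list the worksheet parts inside the .xlsx zip archive.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    sheet_files = sorted(
        name for name in archive.namelist()
        if name.startswith('xl/worksheets/sheet')
    )
    workbook_xml = archive.read('xl/workbook.xml').decode('utf-8')
```

One thing to keep in mind when the names come from a list: Excel limits sheet names to 31 characters.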
I am trying to write code that deletes the unnamed column that comes right before Unix Timestamp. After deleting it, I will save the modified dataframe to data.csv. How can I get the expected output below?
import pandas as pd
data = pd.read_csv('data.csv')
data.drop('')
data.to_csv('data.csv')
data.csv file:
,Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
2,1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
Updated csv (Expected Output):
Unix Timestamp,Date,Symbol,Open,High,Low,Close,Volume
1635686220,2021-10-31 13:17:00,BTCUSD,60638.0,60640.0,60636.0,60638.0,0.4357009185659157
1635686160,2021-10-31 13:16:00,BTCUSD,60568.0,60640.0,60568.0,60638.0,3.9771881707839967
1635686100,2021-10-31 13:15:00,BTCUSD,60620.0,60633.0,60565.0,60568.0,1.3977284440628714
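This is the index that was written out when the file was saved. Pass index=False to to_csv to stop it from being written. If the file already contains the column, also read it back with index_col=0 (or drop the column), otherwise it comes back as a regular column named 'Unnamed: 0'.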
data.to_csv('data.csv', index=False)
Read the first column as the index with df = pd.read_csv('data.csv', index_col=0), then set index=False when writing the results.
You can follow the code below. It keeps every column from the second one onward, and then you can save that df to CSV without the index values.
df = df.iloc[:,1:]
df.to_csv("data.csv",index=False)
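To see where the extra column comes from and how reading with index_col=0 and writing with index=False work together, here is a small round trip on a trimmed copy of the sample data (only the first four value columns are kept, and a string buffer stands in for data.csv):

```python
import io

import pandas as pd

# Trimmed copy of the sample file; note the empty first header field.
raw = (
    ",Unix Timestamp,Date,Symbol,Open\n"
    "0,1635686220,2021-10-31 13:17:00,BTCUSD,60638.0\n"
    "1,1635686160,2021-10-31 13:16:00,BTCUSD,60568.0\n"
)

# Read naively: the old index shows up as a column named 'Unnamed: 0'.
naive = pd.read_csv(io.StringIO(raw))

# Read with index_col=0: the first column becomes the index again.
data = pd.read_csv(io.StringIO(raw), index_col=0)

# Write without the index: the column is gone from the output.
# to_csv returns the text when no path is given.
cleaned = data.to_csv(index=False)
```

With a real file you would pass 'data.csv' to both read_csv and to_csv instead of the buffer.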
I have an .xlsx file with 5 sheets, each sheet has 4 columns and I need to read the first column of the 5th sheet into a column of a dataframe.
I've tried this:
df = read_excel('file_path.xlsx', sheet_names='sheet_5', index_col='column_name')
However this seems to copy the whole sheet into the dataframe rather than just the first column.
Thanks to @Quang Hoang's comment, I found the solution.
df = pd.read_excel('file_path.xlsx', sheet_name, usecols=['column_name'])
The usecols option in read_excel reads only the column I wanted into the dataframe.
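If you would rather not depend on the sheet or column names, both can also be given by position. The sketch below builds a throwaway workbook with 5 sheets of 4 columns (all names and values are made up) and reads back only the first column of the fifth sheet. It assumes openpyxl is installed, which is why the engine is spelled out.

```python
import io

import pandas as pd

# Build a throwaway workbook with 5 sheets of 4 columns each (made-up data).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine='openpyxl') as writer:
    for n in range(1, 6):
        frame = pd.DataFrame({
            'first': [n, n + 1, n + 2],
            'second': [0, 0, 0],
            'third': [1, 1, 1],
            'fourth': [2, 2, 2],
        })
        frame.to_excel(writer, sheet_name=f'sheet_{n}', index=False)

buffer.seek(0)
# An integer sheet_name is zero-based, so 4 is the fifth sheet;
# usecols=[0] keeps only the first column.
df = pd.read_excel(buffer, sheet_name=4, usecols=[0], engine='openpyxl')
```

The result is still a DataFrame with a single column; use df.iloc[:, 0] if you want it as a Series.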
Hey, let's try it this way.
import pandas as pd
# sheet_name=4 selects the fifth sheet (integer positions are zero-based)
df = pd.read_excel('path/to/file.xlsx', sheet_name=4)
print(df[['column_name']])
Tell me if it works. I recommend reading the documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
I have a dataframe that I imported using pandas.read_csv that has two columns. I manipulated one column, and now would like to save all three columns as a .csv file. I have been able to save one column at a time, but am unable to get all three (df.Time, df.Range, and df.Velocity). Here is what I'm working with.
import pandas as pd
df=pd.read_csv('/Users/path/file.csv', delimiter=',', usecols=['A', 'B'])
df.columns = ['Time', 'Range']
df.Time = df['Time'].round(14)
df.Range = df['Range'].round(14)
df.Velocity = (df.Range.shift(1) - df.Range) / (df.Time.shift(1) -df.Time)
df2 = [df.Time, df.Range, df.Velocity]
df2.to_csv('test5.csv', columns = header)
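Your assignment makes df2 a list, not a DataFrame (df2 = [df.Time, df.Range, df.Velocity]), and a list has no to_csv method. Also, df.Velocity = ... sets a plain attribute instead of creating a column; create it with df['Velocity'] = ... instead.
You probably want:
df[['Time', 'Range', 'Velocity']].to_csv('test5.csv')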
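Putting it together, a minimal sketch with made-up Time and Range values in place of the asker's file. The new column is created with bracket assignment so that it really becomes part of the frame, and to_csv is called without a path here only so the result can be inspected as text:

```python
import pandas as pd

# Made-up sample values standing in for the data read from file.csv.
df = pd.DataFrame({
    'Time': [0.0, 1.0, 2.0, 3.0],
    'Range': [0.0, 2.0, 6.0, 12.0],
})

# Bracket assignment creates a real column (attribute assignment would not).
df['Velocity'] = (df['Range'].shift(1) - df['Range']) / (df['Time'].shift(1) - df['Time'])

# Select the three columns and serialise them; pass a file name such as
# 'test5.csv' to write to disk instead of returning a string.
csv_text = df[['Time', 'Range', 'Velocity']].to_csv(index=False)
```

The first Velocity value is NaN because shift(1) has no previous row to take the difference from.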
You can select the columns you want and write them like this:
import pandas as pd
data=pd.read_csv('filename.csv')
data[['column1','column2','column3',...]].to_csv('fileNameWhereYouwantToWrite.csv')
I have constructed a matrix with integer values for the columns and the index. The matrix is actually hierarchical, by month. My problem is that indexing and selecting data no longer works as before once I write the data to CSV and load it back as a pandas dataframe.
Selecting data before writing and reading data to file:
matrix.ix[1][4][3] would for example give 123
In words: select the month January and get the (travel) flow from origin 4 to destination 3.
After writing and reading the data to csv and back into pandas, the original referencing fails but if I convert the column indexing to string it works:
matrix.ix[1]['4'][3]
... the column names have automatically been transformed from integers into strings. But I would prefer the original indexing.
Any suggestions?
My current quick fix for handling the data after loading from csv is:
#Writing df to file
mulitindex_df_Travel_monthly.to_csv(r'result/Final_monthly_FlightData_countrylevel_v4.csv')
#Loading df from csv
test_matrix = pd.read_csv(filepath_inputdata+'/Final_monthly_FlightData_countrylevel_v4.csv',
                          index_col=[0, 1])
test_matrix.rename(columns = int, inplace = True) # Thx, @ayhan
CSV FILE:
https://www.dropbox.com/s/4u2opzh65zwcn81/travel_matrix_SO.csv?dl=0
I used something like this:
df = df.rename(columns={str(c): c for c in columns})
where df is a pandas dataframe and columns holds the original (non-string) column labels to restore.
You could also do
df.columns = df.columns.astype(int)
or
df.columns = df.columns.map(int)
Related: what is difference between .map(str) and .astype(str) in dataframe
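Here is a small round trip that reproduces the effect and the fix. The 2x2 matrix and the level names 'month' and 'destination' are invented for illustration, and a string buffer stands in for the CSV file:

```python
import io

import pandas as pd

# Tiny made-up matrix: rows are (month, destination), columns are origins.
matrix = pd.DataFrame(
    [[0, 123], [45, 0]],
    index=pd.MultiIndex.from_tuples([(1, 3), (1, 4)], names=['month', 'destination']),
    columns=[3, 4],
)

# Round trip through CSV text; index_col=[0, 1] rebuilds the two index levels.
loaded = pd.read_csv(io.StringIO(matrix.to_csv()), index_col=[0, 1])

# The header row of a CSV is always read as text.
labels_after_loading = loaded.columns.tolist()

# Convert the column labels back to integers.
loaded.columns = loaded.columns.astype(int)

# Month 1, destination 3, origin 4.
flow = loaded.loc[(1, 3), 4]
```

The index levels come back as integers on their own because they are parsed like ordinary data columns; only the header row needs the explicit conversion.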